Used Car Transaction Price Prediction: Task 1


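Competition Overview

The task is to predict the transaction price of used cars. The dataset becomes visible and downloadable after registration. It comes from the used car transaction records of a trading platform, with more than 400,000 records in total and 31 columns of variables, 15 of which are anonymous variables. To keep the competition fair, 150,000 records are drawn as the training set, 50,000 as test set A, and 50,000 as test set B, and fields such as name, model, brand, and regionCode are anonymized.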

The evaluation metric for this competition is MAE (Mean Absolute Error):

$$
MAE = \frac{\sum_{i=1}^{n} \left| y_{i} - \hat{y}_{i} \right|}{n}
$$

where $y_{i}$ is the true value of the $i$-th sample and $\hat{y}_{i}$ is the predicted value of the $i$-th sample.
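As a quick check of the MAE definition, here is a minimal sketch that computes it with NumPy. The function name `mean_absolute_error_np` and the four price values are illustrative assumptions, not competition data.

```python
import numpy as np

def mean_absolute_error_np(y_true, y_pred):
    """Average of |y_i - y_hat_i| over all n samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Hypothetical prices, made up for illustration only.
y_true = [1850.0, 3600.0, 6222.0, 2400.0]
y_pred = [2000.0, 3500.0, 6000.0, 2400.0]

# Absolute errors are 150, 100, 222 and 0, so the mean is 472 / 4.
mae = mean_absolute_error_np(y_true, y_pred)
print(mae)
```

`sklearn.metrics.mean_absolute_error(y_true, y_pred)` computes the same quantity.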

Code Example
## Basic tools
import numpy as np
import pandas as pd
import warnings
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.special import jn
from IPython.display import display, clear_output
import time
import csv
warnings.filterwarnings('ignore')
%matplotlib inline

## Model prediction
from sklearn import linear_model
from sklearn import preprocessing
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

## Dimensionality reduction
from sklearn.decomposition import PCA, FastICA, FactorAnalysis, SparsePCA
import lightgbm as lgb
import xgboost as xgb

## Parameter search and evaluation
from sklearn.model_selection import GridSearchCV, cross_val_score, StratifiedKFold, train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error
def read_file(path, count=None):
    '''
    Read a data file.
    path: file path
    count: number of rows to read
    '''
    # 1
    #     list = []
    #     with open(path, "r", encoding='UTF-8') as f:  # open the file read-only
    #         read_scv = csv.reader(f)  # read the file with csv.reader and assign it to read_scv
    #         for i, line in enumerate(read_scv):
    #             if i == count:
    #                 break
    #             list.append(line)  # append each row to the list
    #     list = pd.DataFrame(list)
    #     list.columns = ['id', 'heartbeat_signals', 'label']
    #     return list[1:]  # return the data without the header row
    # 2
    return pd.read_csv(path, sep=' ', nrows=count)

Train_data = read_file('used_car_train_20200313.csv')
TestA_data = read_file('used_car_testA_20200313.csv')
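
To illustrate what `sep=' '` and `nrows` do in the pandas call, here is a small sketch that reads an in-memory sample instead of the competition files. The three sample rows and the reduced column set are assumptions made up for the example.

```python
import io
import pandas as pd

# Hypothetical three-row sample in a space-separated layout.
sample = io.StringIO(
    "SaleID name regDate price\n"
    "0 736 20040402 1850\n"
    "1 2262 20030301 3600\n"
    "2 14874 20040403 6222\n"
)

# sep=' ' splits fields on single spaces; nrows=2 stops after two data rows.
df_sample = pd.read_csv(sample, sep=" ", nrows=2)
print(df_sample.shape)
print(df_sample.columns.tolist())
```

Passing `nrows=None`, the default used by `read_file`, reads the whole file.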

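Data Fields

  • SaleID: transaction ID, unique code
  • name: car listing name, anonymized
  • regDate: car registration date, e.g. 20160101 means January 1, 2016
  • model: car model code, anonymized
  • brand: car brand, anonymized
  • bodyType: body type. Limousine: 0, micro car: 1, van: 2, bus: 3, convertible: 4, two-door car: 5, business vehicle: 6, mixer truck: 7
  • fuelType: fuel type. Gasoline: 0, diesel: 1, liquefied petroleum gas: 2, natural gas: 3, hybrid: 4, other: 5, electric: 6
  • gearbox: transmission. Manual: 0, automatic: 1
  • power: engine power, range [0, 600]
  • kilometer: distance the car has been driven, in units of 10,000 km
  • notRepairedDamage: whether the car has damage that has not been repaired. Yes: 0, no: 1
  • regionCode: region code, anonymized
  • seller: seller type. Individual: 0, non-individual: 1
  • offerType: offer type. Offer: 0, request: 1
  • creatDate: date the car was listed, i.e. when it went on sale
  • price: used car transaction price (the prediction target)
  • v_0, v_1, v_2, v_3, v_4, v_5, v_6, v_7, v_8, v_9, v_10, v_11, v_12, v_13, v_14: anonymous features, 15 in total (v_0 through v_14), embedding vectors derived from a large amount of information such as car reviews and tags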
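
Since regDate and creatDate are stored as integers such as 20160101, a common first step is to parse them into real dates. The sketch below is one possible way to do it, not the author's method. The creatDate values and the derived `used_days` column are hypothetical, and `errors="coerce"` is included only as a precaution in case a value is not a valid calendar date (the source does not say this occurs).

```python
import pandas as pd

# regDate values follow the YYYYMMDD layout described in the field list;
# the creatDate values are hypothetical.
toy = pd.DataFrame({
    "regDate": [20040402, 20030301, 19960908],
    "creatDate": [20160404, 20160309, 20160314],
})

for col in ["regDate", "creatDate"]:
    # Invalid dates, if any, become NaT instead of raising an error.
    toy[col + "_dt"] = pd.to_datetime(
        toy[col].astype(str), format="%Y%m%d", errors="coerce"
    )

# Hypothetical derived feature: days between registration and listing.
toy["used_days"] = (toy["creatDate_dt"] - toy["regDate_dt"]).dt.days
print(toy[["regDate_dt", "creatDate_dt", "used_days"]])
```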

Data Exploration

# Data overview: first and last five rows
pd.concat([Train_data.head(), Train_data.tail()])
# pd.concat([TestA_data.head(), TestA_data.tail()])
        SaleID    name   regDate  model  brand  bodyType  fuelType  gearbox  power  kilometer  ...       v_5       v_6       v_7       v_8       v_9       v_10       v_11       v_12       v_13       v_14
     0       0     736  20040402   30.0      6       1.0       0.0      0.0     60       12.5  ...  0.235676  0.101988  0.129549  0.022816  0.097462  -2.881803   2.804097  -2.420821   0.795292   0.914762
     1       1    2262  20030301   40.0      1       2.0       0.0      0.0      0       15.0  ...  0.264777  0.121004  0.135731  0.026597  0.020582  -4.900482   2.096338  -1.030483  -1.722674   0.245522
     2       2   14874  20040403  115.0     15       1.0       0.0      0.0    163       12.5  ...  0.251410  0.114912  0.165147  0.062173  0.027075  -4.846749   1.803559   1.565330  -0.832687  -0.229963
     3       3   71865  19960908  109.0     10       0.0       0.0      1.0    193       15.0  ...  0.274293  0.110300  0.121964  0.033395  0.000000  -4.509599   1.285940  -0.501868  -2.438353  -0.478699
     4       4  111080  20120103  110.0      5       1.0       0.0      0.0     68        5.0  ...  0.228036  0.073205  0.091880  0.078819  0.121534  -1.896240   0.910783   0.931110   2.834518   1.923482
149995  149995  163978  20000607  121.0     10       4.0       0.0      1.0    163       15.0  ...  0.280264  0.000310  0.048441  0.071158  0.019174   1.988114  -2.983973   0.589167  -1.304370  -0.302592
149996  149996  184535  20091102  116.0     11       0.0       0.0      0.0    125       10.0  ...  0.253217  0.000777  0.084079  0.099681  0.079371   1.839166  -2.774615   2.553994   0.924196  -0.272160
149997  149997  147587  20101003   60.0     11       1.0       1.0      0.0     90        6.0  ...  0.233353  0.000705  0.118872  0.100118  0.097914   2.439812  -1.630677   2.290197   1.891922   0.414931
149998  149998   45907  20060312   34.0     10       3.0       1.0      0.0    156       15.0  ...  0.256369  0.000252  0.081479  0.083558  0.081498   2.075380  -2.633719   1.414937   0.431981  -1.659014
149999  149999  177672  19990204   19.0     28       6.0       0.0      1.0    193       12.5  ...  0.284475  0.000000  0.040072  0.062543  0.025819   1.978453  -3.179913   0.031724  -1.483350  -0.342674

10 rows × 31 columns

# Number of rows and columns
print('Train data shape:',Train_data.shape)
print('Test data shape:',TestA_data.shape)
Train data shape: (150000, 31)
Test data shape: (50000, 30)
# Inspect the data
# Use info() to see the dtype of each column
# Train_data.info()
TestA_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 30 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   SaleID             50000 non-null  int64
 1   name               50000 non-null  int64
 2   regDate            50000 non-null  int64
 3   model              50000 non-null  float64
 4   brand              50000 non-null  int64
 5   bodyType           48587 non-null  float64
 6   fuelType           47107 non-null  float64
 7   gearbox            48090 non-null  float64
 8   power              50000 non-null  int64
 9   kilometer          50000 non-null  float64
 10  notRepairedDamage  50000 non-null  object
 11  regionCode         50000 non-null  int64
 12  seller             50000 non-null  int64
 13  offerType          50000 non-null  int64
 14  creatDate          50000 non-null  int64
 15  v_0                50000 non-null  float64
 16  v_1                50000 non-null  float64
 17  v_2                50000 non-null  float64
 18  v_3                50000 non-null  float64
 19  v_4                50000 non-null  float64
 20  v_5                50000 non-null  float64
 21  v_6                50000 non-null  float64
 22  v_7                50000 non-null  float64
 23  v_8                50000 non-null  float64
 24  v_9                50000 non-null  float64
 25  v_10               50000 non-null  float64
 26  v_11               50000 non-null  float64
 27  v_12               50000 non-null  float64
 28  v_13               50000 non-null  float64
 29  v_14               50000 non-null  float64
dtypes: float64(20), int64(9), object(1)
memory usage: 11.4+ MB
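
The info() output shows that bodyType, fuelType and gearbox have fewer than 50000 non-null entries in TestA_data. A minimal sketch of counting missing values per column follows. The toy frame is hypothetical, and the last few lines only redo the subtraction on the non-null counts printed above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few gaps, standing in for the real data.
toy = pd.DataFrame({
    "bodyType": [1.0, np.nan, 2.0, 0.0],
    "fuelType": [0.0, 0.0, np.nan, np.nan],
    "gearbox": [0.0, 1.0, 0.0, 0.0],
})

# Count NaN per column and keep only the columns that have any.
missing = toy.isnull().sum()
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)

# Same idea applied to the non-null counts reported by TestA_data.info().
non_null = {"bodyType": 48587, "fuelType": 47107, "gearbox": 48090}
missing_testA = {col: 50000 - n for col, n in non_null.items()}
print(missing_testA)
```

On the real data the same pattern would be `TestA_data.isnull().sum()`.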
# Use .columns to view the column names
# Train_data.columns
TestA_data.columns
Index(['SaleID', 'name', 'regDate', 'model', 'brand', 'bodyType', 'fuelType',
       'gearbox', 'power', 'kilometer', 'notRepairedDamage', 'regionCode',
       'seller', 'offerType', 'creatDate', 'v_0', 'v_1', 'v_2', 'v_3', 'v_4',
       'v_5', 'v_6', 'v_7', 'v_8', 'v_9', 'v_10', 'v_11', 'v_12', 'v_13',
       'v_14'],
      dtype='object')
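
The training set has 31 columns and the test set has 30. A short sketch of finding the extra column with `Index.difference` is given below. It assumes the training columns are the test columns plus the target `price`, which matches the field list but is not printed in the source.

```python
import pandas as pd

# The 30 column names reported for TestA_data.
feature_cols = [
    'SaleID', 'name', 'regDate', 'model', 'brand', 'bodyType', 'fuelType',
    'gearbox', 'power', 'kilometer', 'notRepairedDamage', 'regionCode',
    'seller', 'offerType', 'creatDate',
] + [f'v_{i}' for i in range(15)]

test_cols = pd.Index(feature_cols)
# Assumption: the training set adds only the target column 'price'.
train_cols = pd.Index(feature_cols + ['price'])

# Columns present in the training set but absent from the test set.
extra_in_train = train_cols.difference(test_cols)
print(len(train_cols), len(test_cols), list(extra_in_train))
```

With the real frames this would read `Train_data.columns.difference(TestA_data.columns)`.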
# .describe() shows summary statistics for the numeric columns:
# count, mean, standard deviation (std), min, the 25%/50%/75% quantiles (50% is the median), and max
# Train_data.describe()
TestA_data.describe()
              SaleID           name       regDate         model         brand      bodyType      fuelType       gearbox         power     kilometer  ...           v_5           v_6           v_7           v_8           v_9          v_10          v_11          v_12          v_13          v_14
count   50000.000000   50000.000000  5.000000e+04  50000.000000  50000.000000  48587.000000  47107.000000  48090.000000  50000.000000  50000.000000  ...  50000.000000  50000.000000  50000.000000  50000.000000  50000.000000  50000.000000  50000.000000  50000.000000  50000.000000  50000.000000
mean   174999.500000   68542.223280  2.003393e+07     46.844520      8.056240      1.782185      0.373405      0.224350    119.883620     12.595580  ...      0.248669      0.045021      0.122744      0.057997      0.062000     -0.017855     -0.013742     -0.013554     -0.003147      0.001516
std     14433.901067   61052.808133  5.368870e+04     49.469548      7.819477      1.760736      0.546442      0.417158    185.097387      3.908979  ...      0.044601      0.051766      0.195972      0.029211      0.035653      3.747985      3.231258      2.515962      1.286597      1.027360
min    150000.000000       0.000000  1.991000e+07      0.000000      0.000000      0.000000      0.000000      0.000000      0.000000      0.500000  ...      0.000000      0.000000      0.000000      0.000000      0.000000     -9.160049     -5.411964     -8.916949     -4.123333     -6.112667
25%    162499.750000   11203.500000  1.999091e+07     10.000000      1.000000      0.000000      0.000000      0.000000     75.000000     12.500000  ...      0.243762      0.000044      0.062644      0.035084      0.033714     -3.700121     -1.971325     -1.876703     -1.060428     -0.437920
50%    174999.500000   52248.500000  2.003091e+07     29.000000      6.000000      1.000000      0.000000      0.000000    109.000000     15.000000  ...      0.257877      0.000815      0.095828      0.057084      0.058764      1.613212     -0.355843     -0.142779     -0.035956      0.138799
75%    187499.250000  118856.500000  2.007110e+07     65.000000     13.000000      3.000000      1.000000      0.000000    150.000000     15.000000  ...      0.265328      0.102025      0.125438      0.079077      0.087489      2.832708      1.262914      1.764335      0.941469      0.681163
max    199999.000000  196805.000000  2.015121e+07    246.000000     39.000000      7.000000      6.000000      1.000000  20000.000000     15.000000  ...      0.291618      0.153265      1.358813      0.156355      0.214775     12.338872     18.856218     12.950498      5.913273      2.624622

8 rows × 29 columns
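
The field list gives the range of power as [0, 600], yet the describe() output reports a maximum of 20000 for power in TestA_data. The sketch below shows one way to flag such out-of-range rows. The toy values are hypothetical, and clipping to 600 is just one possible treatment, not something the source prescribes.

```python
import pandas as pd

# Hypothetical power values, including two above the documented upper bound.
toy = pd.DataFrame({"power": [60, 0, 163, 20000, 193, 750]})

POWER_MAX = 600  # upper bound taken from the data field description

# Boolean mask of rows outside the documented range.
out_of_range = toy["power"] > POWER_MAX
n_out = int(out_of_range.sum())
print("rows above the documented range:", n_out)

# One possible treatment (an assumption): cap the values at the bound.
toy["power_clipped"] = toy["power"].clip(upper=POWER_MAX)
print(toy)
```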

Published: 2024-02-01 17:16:22

Article link: https://www.4u4v.net/it/170677935938209.html
