
1st place solution: everyone can be a winner!

Tabular Playground Series - Jun 2021 | tabular-playground-series-jun-2021

Start: 2021-06-01 | End: 2021-06-30 | Tabular data competition

Author: kailai | Rank: 1st | Upvotes: 57

0. Preface

If you practiced the January–May TPS (Tabular Playground Series) competitions seriously, anyone could solve this problem.

1. Step 1: Build XGBoost models

I built roughly 15 XGBoost models using the following 3 parameter sets, and selected about 90 of their predictions as features to train a neural network (NNet).

1.1 Parameter set 1

```r
params <- list(
  tree_method       = "hist",
  max_bin           = 512,
  max_leaves        = 150,
  min_child_weight  = 110,
  grow_policy       = "lossguide",
  eta               = 0.009,
  max_depth         = 0,
  subsample         = 0.7,
  colsample_bytree  = 0.11,
  colsample_bylevel = 0.90,
  # colsample_bynode = 0.80,
  lambda            = 0,
  alpha             = 22,
  objective         = "multi:softprob",
  eval_metric       = "mlogloss",
  num_class         = 9,
  max_delta_step    = 10
)
```
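The post doesn't show the training call itself. A minimal sketch with `xgb.train` from the `xgboost` R package, where `train_x`/`train_y` and `valid_x`/`valid_y` are hypothetical stand-ins for the competition data (labels coded 0..8 for the 9 classes), and `nrounds`/early stopping are illustrative, not the author's values:

```r
library(xgboost)

# Build DMatrix objects from hypothetical train/validation splits
dtrain <- xgb.DMatrix(data = as.matrix(train_x), label = train_y)
dvalid <- xgb.DMatrix(data = as.matrix(valid_x), label = valid_y)

bst <- xgb.train(
  params                = params,          # the list defined above
  data                  = dtrain,
  nrounds               = 10000,           # small eta needs many rounds
  watchlist             = list(valid = dvalid),
  early_stopping_rounds = 200
)

# Per-class probabilities; columns like these feed the NNet stage
pred <- predict(bst, dvalid, reshape = TRUE)   # n_valid x 9 matrix
```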

1.2 Parameter set 2

```r
params <- list(
  eta               = 0.006,
  max_depth         = 22,
  min_child_weight  = 110,
  gamma             = 0.01,
  subsample         = 0.7,
  colsample_bytree  = 0.1,
  colsample_bylevel = 0.90,
  colsample_bynode  = 0.80,
  lambda            = 1.5,
  alpha             = 21,
  objective         = "multi:softprob",
  eval_metric       = "mlogloss",
  num_class         = 9,
  max_delta_step    = 10
)
```

1.3 Parameter set 3

```r
params <- list(
  tree_method       = "hist",
  max_bin           = 512,
  max_leaves        = 200,
  grow_policy       = "lossguide",
  # max_depth       = 3,
  min_child_weight  = 110,
  eta               = 1,
  alpha             = 22,
  lambda            = 0,
  subsample         = 0.70,
  colsample_bytree  = 0.11,
  colsample_bylevel = 0.90,
  num_parallel_tree = 110,    # eta = 1 + parallel trees: random-forest style
  objective         = "multi:softprob",
  eval_metric       = "mlogloss",
  num_class         = 9
)
```
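The write-up doesn't show how the ~90 prediction features are produced from the ~15 models. One common way to do this — an assumption, not the author's exact code — is out-of-fold stacking, sketched here with hypothetical `train_x`/`train_y`:

```r
library(xgboost)

# Hypothetical 5-fold out-of-fold predictions for one parameter set
set.seed(2021)
k     <- 5
folds <- sample(rep(1:k, length.out = nrow(train_x)))
oof   <- matrix(0, nrow(train_x), 9)     # 9 class probabilities

for (i in 1:k) {
  dtr <- xgb.DMatrix(as.matrix(train_x[folds != i, ]),
                     label = train_y[folds != i])
  dva <- xgb.DMatrix(as.matrix(train_x[folds == i, ]))
  m   <- xgb.train(params, dtr, nrounds = 2000)
  oof[folds == i, ] <- predict(m, dva, reshape = TRUE)
}

# Repeating this per model (different params/seeds) and cbind-ing the
# oof matrices yields the ~90 prediction columns used as NNet features.
```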

2. Step 2: Model blending

The ensemble was built with the ANN2 package in R.

```r
bst <- neuralnetwork(
  X, Y,
  hidden.layers   = c(63, 27),
  standardize     = TRUE,
  optim.type      = "adam",
  learn.rates     = 0.0004,
  val.prop        = 0.2,
  batch.size      = 320,
  random.seed     = 8888688,
  L1              = 2,
  L2              = 0,
  activ.functions = c("sigmoid", "sigmoid"),
  n.epochs        = 200
)
```
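Once trained, the network's class probabilities form the submission. A sketch of the prediction step, where `X_test` is a hypothetical matrix of the same prediction features computed on the test set (ANN2's `predict` returns a list with `predictions` and, for classification, `probabilities`):

```r
library(ANN2)

pred <- predict(bst, newdata = X_test)
submission_probs <- pred$probabilities   # n_test x 9 probability matrix
```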

At this point, using only the tree models' predictions as features gives a Private LB score of about 1.73900.

Adding some neural-network predictions as features improves the Private LB score to about 1.73890.

3. Step 3: Weighted averaging
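The post gives no details for this step. As a sketch of what weighted averaging of two probability submissions could look like — the weight here is illustrative, not the author's:

```r
# p_tree: NNet on tree features only          (Private LB ~1.73900)
# p_mix:  NNet with extra NN-prediction feats (Private LB ~1.73890)
w <- 0.5                               # illustrative weight
p_final <- w * p_tree + (1 - w) * p_mix
# Rows still sum to 1, since this is a convex combination of
# row-stochastic matrices; p_final is the submitted probability matrix.
```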

Thanks, Kaggle!
