
1st place solution: everyone can be a winner!

Tabular Playground Series - Jun 2021 | tabular-playground-series-jun-2021

Start: 2021-06-01 | End: 2021-06-30 | Tabular data competition

Author: kailai | Rank: 1st | Upvotes: 57

0. Preface

If you practiced the January–May TPS (Tabular Playground Series) competitions seriously, anyone could solve this problem.

1. Step 1: Build XGBoost models

I built roughly 15 XGBoost models using the following 3 parameter sets, and selected about 90 of their predictions as features to train a neural network (NNet).

1.1 Parameter set 1

```r
params <- list(
  tree_method       = "hist",
  max_bin           = 512,
  max_leaves        = 150,
  min_child_weight  = 110,
  grow_policy       = "lossguide",
  eta               = 0.009,
  max_depth         = 0,
  subsample         = 0.7,
  colsample_bytree  = 0.11,
  colsample_bylevel = 0.90,
  # colsample_bynode = 0.80,
  lambda            = 0,
  alpha             = 22,
  objective         = "multi:softprob",
  eval_metric       = "mlogloss",
  num_class         = 9,
  max_delta_step    = 10
)
```
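The post doesn't show the training call itself. A minimal sketch with `xgb.train` from the `xgboost` R package, where `train_x`/`train_y` and `valid_x`/`valid_y` are hypothetical stand-ins for the competition data (labels coded 0..8 for the 9 classes), and `nrounds`/early stopping are illustrative, not the author's values:

```r
library(xgboost)

# Build DMatrix objects from hypothetical train/validation splits
dtrain <- xgb.DMatrix(data = as.matrix(train_x), label = train_y)
dvalid <- xgb.DMatrix(data = as.matrix(valid_x), label = valid_y)

bst <- xgb.train(
  params                = params,          # the list defined above
  data                  = dtrain,
  nrounds               = 10000,           # small eta needs many rounds
  watchlist             = list(valid = dvalid),
  early_stopping_rounds = 200
)

# Per-class probabilities; columns like these feed the NNet stage
pred <- predict(bst, dvalid, reshape = TRUE)   # n_valid x 9 matrix
```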

1.2 Parameter set 2

```r
params <- list(
  eta               = 0.006,
  max_depth         = 22,
  min_child_weight  = 110,
  gamma             = 0.01,
  subsample         = 0.7,
  colsample_bytree  = 0.1,
  colsample_bylevel = 0.90,
  colsample_bynode  = 0.80,
  lambda            = 1.5,
  alpha             = 21,
  objective         = "multi:softprob",
  eval_metric       = "mlogloss",
  num_class         = 9,
  max_delta_step    = 10
)
```

1.3 Parameter set 3

```r
params <- list(
  tree_method       = "hist",
  max_bin           = 512,
  max_leaves        = 200,
  grow_policy       = "lossguide",
  # max_depth       = 3,
  min_child_weight  = 110,
  eta               = 1,
  alpha             = 22,
  lambda            = 0,
  subsample         = 0.70,
  colsample_bytree  = 0.11,
  colsample_bylevel = 0.90,
  num_parallel_tree = 110,    # eta = 1 + parallel trees: random-forest style
  objective         = "multi:softprob",
  eval_metric       = "mlogloss",
  num_class         = 9
)
```
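The write-up doesn't show how the ~90 prediction features are produced from the ~15 models. One common way to do this — an assumption, not the author's exact code — is out-of-fold stacking, sketched here with hypothetical `train_x`/`train_y`:

```r
library(xgboost)

# Hypothetical 5-fold out-of-fold predictions for one parameter set
set.seed(2021)
k     <- 5
folds <- sample(rep(1:k, length.out = nrow(train_x)))
oof   <- matrix(0, nrow(train_x), 9)     # 9 class probabilities

for (i in 1:k) {
  dtr <- xgb.DMatrix(as.matrix(train_x[folds != i, ]),
                     label = train_y[folds != i])
  dva <- xgb.DMatrix(as.matrix(train_x[folds == i, ]))
  m   <- xgb.train(params, dtr, nrounds = 2000)
  oof[folds == i, ] <- predict(m, dva, reshape = TRUE)
}

# Repeating this per model (different params/seeds) and cbind-ing the
# oof matrices yields the ~90 prediction columns used as NNet features.
```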

2. Step 2: Model blending

The ensemble was built with the ANN2 package in R.

```r
bst <- neuralnetwork(
  X, Y,
  hidden.layers   = c(63, 27),
  standardize     = TRUE,
  optim.type      = "adam",
  learn.rates     = 0.0004,
  val.prop        = 0.2,
  batch.size      = 320,
  random.seed     = 8888688,
  L1              = 2,
  L2              = 0,
  activ.functions = c("sigmoid", "sigmoid"),
  n.epochs        = 200
)
```
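Once trained, the network's class probabilities form the submission. A sketch of the prediction step, where `X_test` is a hypothetical matrix of the same prediction features computed on the test set (ANN2's `predict` returns a list with `predictions` and, for classification, `probabilities`):

```r
library(ANN2)

pred <- predict(bst, newdata = X_test)
submission_probs <- pred$probabilities   # n_test x 9 probability matrix
```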

At this point, using only the tree models' predictions as features gives a Private LB score of about 1.73900.

Adding some neural-network predictions as features improves the Private LB score to about 1.73890.

3. Step 3: Weighted averaging
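The post gives no details for this step. As a sketch of what weighted averaging of two probability submissions could look like — the weight here is illustrative, not the author's:

```r
# p_tree: NNet on tree features only          (Private LB ~1.73900)
# p_mix:  NNet with extra NN-prediction feats (Private LB ~1.73890)
w <- 0.5                               # illustrative weight
p_final <- w * p_tree + (1 - w) * p_mix
# Rows still sum to 1, since this is a convex combination of
# row-stochastic matrices; p_final is the submitted probability matrix.
```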

Thanks, Kaggle!
