17th place solution | Hand-Tuning and features

Author: JEANMPIA | Competition rank: 17th place

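First, our core approach was to build 3 different solutions without telling each other our plans, and then ensemble them at the end. Like this:

Diagram of the ensemble approach

In this topic I only discuss my own solution; you can find most of the code here. The ensembling was done by @shashwatraman.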

@shashwatraman's solution: https://www.kaggle.com/competitions/playground-series-s3e11/discussion/399485
@ifreenibrahim's solution: https://www.kaggle.com/competitions/playground-series-s3e11/discussion/399438
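
For readers who want to picture the final step, here is a minimal sketch of blending three independently built prediction sets. The post does not say how the ensemble was actually computed, so the weighted average, the `blend` helper, and the `pred_a`/`pred_b`/`pred_c` arrays are all assumptions made up for illustration.

```python
import numpy as np


def blend(predictions, weights=None):
    """Weighted average of several models' predictions for the same rows.

    Hypothetical helper: the real ensemble in the write-up may differ.
    """
    preds = np.asarray(predictions, dtype=float)  # shape: (n_models, n_rows)
    if weights is None:
        weights = np.ones(len(preds))  # default to a plain average
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so the weights sum to 1
    return weights @ preds


# Toy predictions from three teammates for the same three test rows.
pred_a = np.array([100.0, 80.0, 120.0])
pred_b = np.array([110.0, 70.0, 130.0])
pred_c = np.array([96.0, 84.0, 116.0])

equal = blend([pred_a, pred_b, pred_c])
weighted = blend([pred_a, pred_b, pred_c], weights=[2, 1, 1])
print(equal)     # plain average of the three
print(weighted)  # first solution counted twice as heavily
```

In practice the weights would be chosen on out-of-fold predictions rather than guessed, so that the blend is not tuned on the test set.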

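Diagram of my part

General idea:

My main focus was on the models' hyperparameters, and on making sure my cross-validation (CV) was robust enough to support that search without overfitting the training set, or even the validation set. I did all of the tuning by hand, so I can't offer an Optuna guide. I think that past a certain point Optuna costs too much GPU time for only marginal gains. Hand-tuning worked best for me, and you also learn a lot about the model architecture along the way.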
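
As a rough illustration of this workflow, the sketch below scores one hand-picked setting at a time over several shuffled K-fold splits and reports the mean and spread of the fold scores. Everything in it is an assumption: the post does not describe the CV scheme, RMSLE is assumed as the metric, and a closed-form ridge regression stands in for XGBoost/CatBoost/LightGBM so the example runs with NumPy alone. In real use, `fit_predict` would wrap one of the actual models.

```python
import numpy as np


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error (assumed metric)."""
    y_pred = np.clip(y_pred, 0, None)  # log1p needs non-negative predictions
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


def repeated_kfold_score(fit_predict, X, y, params, n_splits=5, seeds=(0, 1, 2)):
    """Score one hand-picked `params` dict over several shuffled K-fold splits.

    Returns (mean, std) of all fold scores; a large std is a hint that a
    small improvement in the mean may just be noise.
    """
    scores = []
    for seed in seeds:
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(len(X)), n_splits)
        for k in range(n_splits):
            val_idx = folds[k]
            trn_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
            preds = fit_predict(X[trn_idx], y[trn_idx], X[val_idx], **params)
            scores.append(rmsle(y[val_idx], preds))
    return float(np.mean(scores)), float(np.std(scores))


def ridge_fit_predict(X_trn, y_trn, X_val, alpha=1.0):
    """Stand-in model: closed-form ridge regression (not the author's model)."""
    A = X_trn.T @ X_trn + alpha * np.eye(X_trn.shape[1])
    w = np.linalg.solve(A, X_trn.T @ y_trn)
    return X_val @ w


# Toy regression data: a bias column plus four random features.
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(300), rng.normal(size=(300, 4))])
y = X @ np.array([20.0, 1.0, 0.5, 0.0, 2.0]) + rng.normal(scale=0.1, size=300)

# "Hand-tuning": try a few values, look at mean and std, then decide what to try next.
results = {}
for alpha in (0.01, 1.0, 100.0):
    results[alpha] = repeated_kfold_score(ridge_fit_predict, X, y, {"alpha": alpha})
    print(f"alpha={alpha}: mean={results[alpha][0]:.4f} std={results[alpha][1]:.4f}")
```

Repeating the split with several seeds is one way to keep a manual search honest: a change is only kept if it beats the previous setting by more than the fold-to-fold spread.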

Feature engineering:

As many people have pointed out, not all features are relevant. My selection was:

FEATS = ["total_children", "num_children_at_home", "avg_cars_at home(approx).1", "store_sqft", "coffee_bar", "video_store", "florist", "prepared_food"]

I also created some features for my CatBoost model, as follows:

import pandas as pd

# Pool every split so the per-store averages use all available rows.
concat_train_hold_test = pd.concat([train, hold, test], ignore_index=True)

for feature in INIT_FEATS:
    if feature in ['units_per_case', 'store_sales(in millions)', 'total_children']:
        # Mean of the feature within each store_sqft group, computed once.
        group_mean = concat_train_hold_test.groupby('store_sqft')[feature].mean()
        # Series assignment aligns on index; this assumes each avg_df* frame
        # is indexed by store_sqft (their construction is not shown here).
        for frame in (avg_df, avg_df_test, avg_df_hold, avg_df_original):
            frame[f'avg_{feature}'] = group_mean

        # Track the new column name in CAT_FEATS.
        CAT_FEATS.append(f'avg_{feature}')

You may be wondering what this code does and what these features are for; please check out this topic.
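
To make the loop above more concrete, here is a minimal sketch on toy data: each row receives the average of a feature over all rows that share the same `store_sqft`, computed on train and test pooled together. The toy frames and the use of `Series.map` to write the value back onto each row are assumptions for illustration. The original code assigns the grouped result directly, and how its `avg_df*` frames are indexed is not shown.

```python
import pandas as pd

# Toy stand-ins for the real splits (values are made up).
train = pd.DataFrame({"store_sqft": [100, 100, 200], "total_children": [1, 3, 2]})
test = pd.DataFrame({"store_sqft": [200, 100], "total_children": [4, 2]})

# Pool the splits so each group's mean uses every available row.
combined = pd.concat([train, test], ignore_index=True)
store_mean = combined.groupby("store_sqft")["total_children"].mean()

# Map each row's store_sqft to its group mean.
train["avg_total_children"] = train["store_sqft"].map(store_mean)
test["avg_total_children"] = test["store_sqft"].map(store_mean)

print(train)
print(test)
```

Here the rows with `store_sqft` 100 share one value and the rows with `store_sqft` 200 share another, so the new column acts as a per-group summary attached to every row of that group.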

Models:

XGB
CAT
LGBM

That's my solution! Please check out the solutions posted by my teammates!
