42nd place solution

CommonLit - Evaluate Student Summaries | commonlit-evaluate-student-summaries

Start: 2023-07-12 End: 2023-10-11 Intelligent assessment, data algorithm competition

Author: Lawliet (Contributor)
Rank: 42nd place (out of more than 2,000 teams)

This was my first Kaggle competition, and I finished 42nd out of more than 2,000 teams.

Training code: cless/run_sweep.py
Inference code: Kaggle Notebook

Solution

The final solution consists of:

6 DeBERTa models (2 base and 4 large)
An LGBM on top of each model
A late-fusion ensemble
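
To illustrate the last step, here is a minimal sketch of late fusion: each model (DeBERTa plus its LGBM) produces its own content and wording predictions, and only those final predictions are combined. The write-up does not state how the predictions were weighted, so the equal weights below are an assumption, and `late_fusion` is a hypothetical helper, not a function from cless/run_sweep.py.

```python
def late_fusion(per_model_preds, weights=None):
    """Combine (content, wording) predictions from several models.

    per_model_preds: one entry per model; each entry is a list of
    (content, wording) tuples, one per summary, all in the same order.
    weights: optional per-model weights. Equal weights are an assumption;
    the original write-up does not specify them.
    """
    if weights is None:
        weights = [1.0] * len(per_model_preds)
    total = sum(weights)
    fused = []
    # zip(*...) groups the predictions of all models for the same summary.
    for rows in zip(*per_model_preds):
        content = sum(w * r[0] for w, r in zip(weights, rows)) / total
        wording = sum(w * r[1] for w, r in zip(weights, rows)) / total
        fused.append((content, wording))
    return fused


# Toy example with made-up numbers: two models, two summaries.
model_a = [(0.2, 0.4), (-1.0, -0.5)]
model_b = [(0.4, 0.0), (-0.8, -0.7)]
fused = late_fusion([model_a, model_b])
```

With six models the call is the same, just with six prediction lists instead of two.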

Results

Cross-validation:

No.	Checkpoint name	Model	Adds	Pseudo-labels	Content RMSE	Wording RMSE	MCRMSE	LGBM content RMSE	LGBM wording RMSE	LGBM MCRMSE
1	lrg-add-nops-202310101531	large	True	False	0.436815	0.563804	0.500309	0.416913	0.547992	0.482452
2	lrg-add2-nops-202310071225	large	True	False	0.422034	0.557535	0.489785	0.422094	0.552657	0.487376
3	deberta-large-pseudo-20231004	large	False	True	0.447833	0.599299	0.523566	0.431393	0.559015	0.495204
4	cless-deberta-20230919-2131-ensamble	base	False	False	0.478477	0.6168	0.547638	0.436959	0.556711	0.496835
5	base-noadd-pseu-202310101211	base	False	True	0.450015	0.631051	0.540533	0.426784	0.570818	0.498801
6	lrg-add2-ps-frz8-202310061415	large	True	True	0.450208	0.574412	0.51231	0.439451	0.563622	0.501536

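Column descriptions:

Checkpoint name - name of the model checkpoint (each contains 4 folds)
Model - the backbone used: microsoft/deberta-v3-base or microsoft/deberta-v3-large
Adds - if True, both prompt_text and prompt_question are added to the model input
Pseudo-labels - whether the model was pretrained on the Feedback Prize - English Language Learning dataset
Content RMSE, Wording RMSE, MCRMSE - metrics of the raw DeBERTa regression model
LGBM content RMSE, LGBM wording RMSE, LGBM MCRMSE - metrics of the LGBM regressor on top of each DeBERTa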
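
As a rough illustration of what the Adds flag changes, the sketch below builds the text that would be passed to the tokenizer. The field order, the use of the `[SEP]` token as a separator, and the helper name `build_model_input` are all assumptions made for this example; the write-up only says that prompt_text and prompt_question are added to the input.

```python
def build_model_input(summary, prompt_question, prompt_text, adds, sep="[SEP]"):
    """Return the string fed to the tokenizer for one student summary.

    adds=False: the model sees only the summary.
    adds=True:  prompt_question and prompt_text are appended to the summary.
    The order summary -> question -> text is an assumption; putting the
    summary first means right-side truncation cuts the long prompt text
    rather than the summary itself.
    """
    if not adds:
        return summary
    return f" {sep} ".join([summary, prompt_question, prompt_text])


# Made-up example row.
example = build_model_input(
    summary="The author explains how pyramids were built.",
    prompt_question="Summarize the main idea of the passage.",
    prompt_text="Egyptian pyramids were built over many decades...",
    adds=True,
)
```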

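Note: models that do not receive prompt_text and prompt_question in the input (adds=False) have a much worse CV MCRMSE. The LGBM stage mitigates this: it uses handcrafted text-mining features computed from prompt_text, and the LGBM MCRMSE of these models improves significantly (for example, model 4 goes from 0.5476 to 0.4968).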
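
The kind of handcrafted feature that can give the LGBM access to prompt_text is sketched below. The specific features (summary length and word overlap with the prompt) and every name in the code are hypothetical; the write-up does not list the features actually used. The resulting dictionary stands for one row of the LGBM training table, with the DeBERTa predictions as additional columns.

```python
import re


def tokenize(text):
    """Lowercase and split into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def overlap_features(summary, prompt_text):
    """Hypothetical text-mining features relating a summary to its prompt."""
    summary_tokens = tokenize(summary)
    summary_vocab = set(summary_tokens)
    prompt_vocab = set(tokenize(prompt_text))
    shared = summary_vocab & prompt_vocab
    return {
        "summary_length": len(summary_tokens),
        "word_overlap_count": len(shared),
        # Share of distinct summary words that also occur in the prompt.
        "word_overlap_ratio": len(shared) / len(summary_vocab) if summary_vocab else 0.0,
    }


def lgbm_row(deberta_content, deberta_wording, summary, prompt_text):
    """One feature row: DeBERTa predictions plus handcrafted features."""
    row = {"deberta_content": deberta_content, "deberta_wording": deberta_wording}
    row.update(overlap_features(summary, prompt_text))
    return row


# Made-up example values.
row = lgbm_row(0.1, -0.2, "The dog sat on a mat.", "A cat sat on the mat near the door.")
```

Because these features are computed directly from prompt_text, they are available even to a model trained with adds=False, which is one plausible reason the LGBM stage helps those models most.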

Ensemble metrics

No.	Metric	Value
1	Content RMSE	0.40780695722821314
2	Wording RMSE	0.5389858871892348
3	MCRMSE	0.47339642220872397
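
For reference, MCRMSE is the mean of the per-column RMSE values, here over the two targets content and wording. The sketch below computes it in plain Python and checks that the ensemble MCRMSE in the table is the average of the two RMSE values reported there; the function names are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error for one target column."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mcrmse(targets, preds):
    """Mean columnwise RMSE.

    targets, preds: dicts mapping a column name ("content", "wording")
    to a list of values in the same row order.
    """
    scores = [rmse(targets[col], preds[col]) for col in targets]
    return sum(scores) / len(scores)


# Consistency check against the ensemble table above.
ensemble_content_rmse = 0.40780695722821314
ensemble_wording_rmse = 0.5389858871892348
ensemble_mcrmse = (ensemble_content_rmse + ensemble_wording_rmse) / 2
```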

Leaderboard

No.	Metric	Value
1	MCRMSE (public)	0.432
2	MCRMSE (private)	0.468

Thanks to the author of this notebook; it helped me a lot.

Training code: cless/run_sweep.py. Inference code: Kaggle Notebook.
