7th Place Solution

Author: Ethan (Kaggle Grandmaster)
Posted: October 12, 2023

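First of all, thanks to the organizers for hosting such an interesting competition, and thanks to my teammate @emiria. I learned a lot from emiria's ideas and code. Congratulations to emiria on a fourth gold medal; a new Grandmaster (GM) is on the way. This is also my first gold medal in an NLP competition.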

Key points of our strategy:

What did not work for us:

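Below is a description of the models used in our final submissions. For local validation we used GroupKFold with "prompt_id" as the group, and the models used for inference were trained on all prompts.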

id	backbone	input	max length	loss	CV
model1	deberta-v3-large	text+sep+prompt_text+sep+prompt_question	1280	mseloss	0.500
model2	deberta-v3-large	text+sep+prompt_title+sep+prompt_question+sep+prompt_text	1280	mseloss	0.489
model3	deberta-v3-large	prompt_title+sep+prompt_question+sep+text+sep+prompt_text	1280	mseloss	0.506
model4	deberta-v3-large+lgb	prompt_question+sep+text	512	mseloss	0.520
model5	deberta-v3-large	text+sep+prompt_title+sep+prompt_question+sep+prompt_text	768	mseloss	-
model6	deberta-v3-large	text+sep+prompt_title+sep+prompt_question+sep+prompt_text	768	logloss	-
model7	deberta-large	text+sep+prompt_title+sep+prompt_question+sep+prompt_text	1024	mseloss	-
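
As a rough illustration of the setup described above, here is a minimal sketch of how an input string could be assembled from the columns in the table and how folds could be assigned with GroupKFold on "prompt_id". This is not our training code: the `[SEP]` string, the helper names `build_input` and `add_folds`, the one-fold-per-prompt choice, and the toy data are all assumptions made for the example.

```python
from typing import Sequence

import pandas as pd
from sklearn.model_selection import GroupKFold

SEP = "[SEP]"  # assumption: the post only says "sep", not the literal token

# Column order for model1 in the table: text + sep + prompt_text + sep + prompt_question
MODEL1_LAYOUT = ["text", "prompt_text", "prompt_question"]


def build_input(row: pd.Series, layout: Sequence[str], sep: str = SEP) -> str:
    """Join the named columns of one row, in order, with the separator between them."""
    return f" {sep} ".join(str(row[col]) for col in layout)


def add_folds(df: pd.DataFrame, group_col: str = "prompt_id") -> pd.DataFrame:
    """Add a 'fold' column so that all rows of a prompt share one validation fold."""
    out = df.copy()
    out["fold"] = -1
    # Assumption: one fold per distinct prompt (leave-one-prompt-out).
    gkf = GroupKFold(n_splits=out[group_col].nunique())
    for fold, (_, val_idx) in enumerate(gkf.split(out, groups=out[group_col])):
        out.loc[out.index[val_idx], "fold"] = fold
    return out


# Toy data standing in for the merged summaries/prompts table (hypothetical values).
toy = pd.DataFrame(
    {
        "prompt_id": ["p1", "p1", "p2", "p2", "p3", "p4"],
        "prompt_title": ["title A", "title A", "title B", "title B", "title C", "title D"],
        "prompt_question": ["q A", "q A", "q B", "q B", "q C", "q D"],
        "prompt_text": ["passage A", "passage A", "passage B", "passage B", "passage C", "passage D"],
        "text": ["s1", "s2", "s3", "s4", "s5", "s6"],
    }
)

toy = add_folds(toy)
toy["model_input"] = toy.apply(lambda r: build_input(r, MODEL1_LAYOUT), axis=1)
print(toy[["prompt_id", "fold", "model_input"]])
```

Grouping by prompt keeps every summary of a given prompt out of the training folds when that prompt is being validated, so the CV score reflects performance on unseen prompts. The other input layouts in the table would only change the column list passed to `build_input`.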

Here are our model combinations and their best scores.
Each model is the average of two seeds, except "model4" (which includes LightGBM).

PB (private)	LB (public)	selected	model combination
0.456	0.427	yes	0.32*model1+0.32*model2+0.16*model3+0.2*model7
0.453	0.428	no	0.32*model1+0.32*model2+0.16*model4+0.1*model5+0.1*model6
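
To make the arithmetic of the combinations above concrete, here is a minimal sketch of seed averaging followed by a weighted blend. The weights are the ones from the selected row of the table; the function names `seed_average` and `blend`, the use of a plain arithmetic mean over seeds, and the toy prediction arrays are assumptions for illustration only.

```python
import numpy as np

# Weights from the selected submission in the table above.
SELECTED_WEIGHTS = {"model1": 0.32, "model2": 0.32, "model3": 0.16, "model7": 0.2}


def seed_average(seed_preds):
    """Average predictions of the same model trained with different seeds
    (assumption: a plain arithmetic mean)."""
    return np.mean(np.stack(seed_preds), axis=0)


def blend(preds, weights):
    """Weighted sum of per-model predictions; the weights are expected to sum to 1."""
    if not np.isclose(sum(weights.values()), 1.0):
        raise ValueError("blend weights should sum to 1")
    return sum(w * preds[name] for name, w in weights.items())


# Toy predictions for two samples (hypothetical numbers, not competition outputs).
preds = {
    "model1": seed_average([np.array([0.0, 1.0]), np.array([0.2, 1.2])]),
    "model2": seed_average([np.array([0.1, 0.9]), np.array([0.1, 1.1])]),
    "model3": seed_average([np.array([0.3, 1.0]), np.array([0.1, 1.0])]),
    "model7": seed_average([np.array([0.0, 0.8]), np.array([0.0, 1.0])]),
}

final = blend(preds, SELECTED_WEIGHTS)
print(final)
```

Because the weights sum to 1, the blend stays on the same scale as the individual model predictions.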
