2nd place solution

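Author: Takoi (Grandmaster) | Competition rank: 2nd place

First of all, I would like to thank Kaggle and the hosts for holding such an interesting competition!

Summary

I selected two models as my final submissions: the one with the best Public LB score and the one with the best CV score.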

	CV	Public	Private
Best Public	0.4503	0.444	0.446
Best CV	0.4449	0.447	0.447

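The rest of this post describes the best Public model. (For the best CV model, the weights were determined with the Nelder-Mead algorithm, without the constraint that they sum to 1. Its set of models also differs somewhat from the best Public model.)
I ensembled 19 models and applied post-processing. I tuned the model weights by looking at both LB and CV, and I used negative weights as well as positive ones. In post-processing, I multiplied the predictions by different coefficients depending on the predicted value.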
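
As a rough illustration of the two steps just described (not the exact code used), a weighted blend followed by range-dependent scaling could look like the sketch below. The function names, the range boundaries, and the coefficients are all hypothetical placeholders; the actual values are not given in this post.

```python
import numpy as np

def blend(model_preds, weights):
    """Weighted sum of per-model predictions.

    model_preds has shape (n_samples, n_models). The weights may be
    negative and are not required to sum to 1.
    """
    model_preds = np.asarray(model_preds, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return model_preds @ weights

def scale_by_range(preds, bounds=(-2.0, 0.0), coefs=(1.05, 1.00, 0.95)):
    """Multiply each prediction by a coefficient chosen by its value range.

    bounds and coefs are hypothetical placeholders, not values from the post.
    len(coefs) must equal len(bounds) + 1.
    """
    preds = np.asarray(preds, dtype=float)
    # np.digitize maps each prediction to the index of the range it falls in:
    # 0 for p < bounds[0], 1 for bounds[0] <= p < bounds[1], 2 for p >= bounds[1]
    idx = np.digitize(preds, bounds)
    return preds * np.asarray(coefs, dtype=float)[idx]
```

In practice the boundaries and coefficients would have to be tuned on out-of-fold predictions, since scaling that helps on one range of targets can hurt on another.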

Cross-validation

I used the following method:
https://www.kaggle.com/abhishek/step-1-create-folds
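
For readers who do not want to open the link, here is a minimal sketch of one common way to build stratified folds for a regression target: bin the target, then stratify on the bins. The column name `target`, the use of Sturges' rule for the bin count, the 5 folds, and the seed are assumptions of this sketch; the linked notebook may differ in its details.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

def create_folds(df, target_col="target", n_splits=5, seed=42):
    """Return a shuffled copy of df with an integer 'kfold' column."""
    # Shuffle rows and reset the index so labels match positions
    df = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    # Sturges' rule for the number of bins (an assumption of this sketch)
    num_bins = int(np.floor(1 + np.log2(len(df))))
    # Discretize the continuous target so it can be used for stratification
    bins = pd.cut(df[target_col], bins=num_bins, labels=False)
    df["kfold"] = -1
    skf = StratifiedKFold(n_splits=n_splits)
    for fold, (_, valid_idx) in enumerate(skf.split(df, bins)):
        df.loc[valid_idx, "kfold"] = fold
    return df
```

Stratifying on binned targets keeps the target distribution similar across folds, which tends to make CV scores more comparable from fold to fold.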

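Models and weights

Except for models 1 and 2, I trained all models with dropout set to 0. In addition, I applied MLM pretraining only to model 3. The weights were computed with the Nelder-Mead algorithm and then adjusted by hand toward a higher LB score.

Model	CV	Public	Weight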
1. roberta-base -> svr	0.500	0.476	0.020
2. roberta-base -> ridge	0.500		0.020
3. roberta-base	0.485	0.476	0.040
4. roberta-large	0.483	0.463	0.088
5. muppet-roberta-large	0.480	0.466	0.022
6. bart-large	0.476	0.469	0.090
7. electra-large	0.483	0.470	0.050
8. funnel-large-base	0.479	0.471	0.050
9. deberta-large	0.481	0.460	0.230
10. deberta-v2-xlarge	0.486	0.466	0.050
11. mpnet-base	0.482	0.470	0.130
12. deberta-v2-xxlarge	0.482	0.465	0.140
13. funnel-large	0.475	0.464	0.110
14. gpt2-medium	0.498	0.478	0.170
15. albert-v2-xxlarge	0.486	0.467	0.120
16. electra-base	0.493
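
To make the weight-fitting step concrete, the sketch below fits blend weights on out-of-fold predictions with SciPy's Nelder-Mead optimizer. It is an illustration under stated assumptions, not the code used for this solution: RMSE is assumed as the objective, the function names are hypothetical, and the weights are left unconstrained, so they can go negative and need not sum to 1.

```python
import numpy as np
from scipy.optimize import minimize

def rmse(y_true, y_pred):
    """Root mean squared error (assumed here as the blend objective)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def fit_blend_weights(oof_preds, y_true):
    """Fit unconstrained blend weights on out-of-fold predictions.

    oof_preds has shape (n_samples, n_models). Returns (weights, score).
    """
    oof_preds = np.asarray(oof_preds, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    n_models = oof_preds.shape[1]
    # Start from a simple average of all models
    x0 = np.full(n_models, 1.0 / n_models)
    result = minimize(
        lambda w: rmse(y_true, oof_preds @ w),
        x0,
        method="Nelder-Mead",
    )
    return result.x, result.fun
```

Nelder-Mead is derivative-free, which makes it convenient here, but it can converge slowly or stall as the number of models grows, so checking the fitted weights against CV and LB afterwards remains worthwhile.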
