485. NBME - Score Clinical Patient Notes | nbme-score-clinical-patient-notes
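First of all, I would like to thank the competition hosts for organizing such an interesting competition. Thanks to my excellent teammates @bestpredict and @lxf615712 for more than a month of hard work. Thanks also to the Kaggle community: I learned a lot from the notebooks and discussions. Thanks in particular for the excellent baseline and experiment results, and for the discussion about case5.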
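Our solution is an ensemble of 5 models, blended on character-level probabilities. In the last few days we added a v2 xxlarge model with a CV score of 884 and an LB score of 883 (2 folds). It was not particularly strong on its own; in the ensemble it improved the LB score but hurt the PB (private leaderboard) score. As a result, we did not select any of our 30+ submissions that scored better on the PB. Trust your CV (cross-validation).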
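A minimal sketch of what blending on character-level probabilities can look like. The write-up does not show this step, so everything here is an assumption for illustration: the helper names `token_probs_to_char_probs` and `blend_char_probs` are hypothetical, and the simple (optionally weighted) average may differ from what the team actually used. The point is that models with different tokenizers can only be averaged once their token scores are projected onto a shared character axis.

```python
import numpy as np

def token_probs_to_char_probs(text_len, offsets, token_probs):
    """Spread each token's probability over the characters it covers.

    `offsets` are (start, end) character spans, as returned by a tokenizer's
    offset mapping. Characters covered by no token keep probability 0.
    """
    char_probs = np.zeros(text_len, dtype=float)
    for (start, end), p in zip(offsets, token_probs):
        if end > start:  # skip special tokens, whose span is empty, e.g. (0, 0)
            char_probs[start:end] = p
    return char_probs

def blend_char_probs(per_model_char_probs, weights=None):
    """Average per-model character probabilities for a single text."""
    stacked = np.stack(per_model_char_probs)  # shape: (n_models, text_len)
    return np.average(stacked, axis=0, weights=weights)

# Two hypothetical models that tokenize the same text differently.
text = "chest pain"
model_a = token_probs_to_char_probs(len(text), [(0, 5), (5, 10)], [0.9, 0.7])
model_b = token_probs_to_char_probs(len(text), [(0, 2), (2, 5), (6, 10)], [0.8, 0.6, 0.5])
blended = blend_char_probs([model_a, model_b])
```

Because the blend happens per character, any thresholding can be applied afterwards on the averaged array, independently of each model's tokenization.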
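Space shift + OOF ensemble (based on word probs). We set specific thresholds for certain features. This scored lower in CV than using the best threshold for every case_num, but it did better on the PB.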
def convert_offsets_to_word_indices(preds_offsets, texts, case_nums, feature_nums, th=0.5):
    # Assumes `np` (numpy), `tokenizer`, and `CFG` are already defined in the notebook.
    predicts = []
    for text, preds, case_num, feature_num in zip(texts, preds_offsets, case_nums, feature_nums):
        encoded_text = tokenizer(text, add_special_tokens=True, max_length=CFG.max_len, padding="max_length", return_offsets_mapping=True)
        offset_mapping = encoded_text['offset_mapping']
        sep_index = encoded_text["input_ids"].index(tokenizer.sep_token_id)
        result = np.zeros(len(preds))
        results = np.zeros(sep_index)
        # Score each token with the character-level probability at its start offset
        for idx, (offset, pred) in enumerate(zip(offset_mapping[:sep_index], preds)):
            start = offset[0]
            results[idx] = preds[start]
        sample_pred_scores = results
        # Use different thresholds for specific case_num / feature_num combinations
        if str(feature_num)[-1] == '3' and (str(case_num) == '0' or str(case_num) == '3'):
            result = [1 if s >= 0.54 else 0 for s in results]
        elif str(feature_num)[-1] == '3' and (str(case_num) == '1'):
            result = [1 if s >= 0.45 else 0 for s in results]
        elif str(feature_num)[-1] == '3' and str(case_num) == '6':
            result = [1 if s >= 0.52 else 0 for s in results]
        elif str(case_num) == '5' and (str(feature_num) == '503'):
            result = [1 if s >= 0.49 else 0 for s in results]
        elif str(case_num) == '5' and (str(feature_num) == '504'):
            result = [1 if s >= 0.4 else 0 for s in results]
        elif str(case_num) == '5' and (str(feature_num) == '508'):
            result = [1 if s >= 0.49 else 0 for s in results]
        elif str(case_num) == '5' and (str(feature_num) == '509'):
            result = [1 if s >= 0.4 else 0 for s in results]
        elif str(case_num) == '5' and (str(feature_num) == '510'):
            result = [1 if s >= 0.55 else 0 for s in results]
        elif str(case_num) == '5' and (str(feature_num) == '511'):
            result = [1 if s >= 0