673. MAP - Charting Student Math Misunderstandings | map-charting-student-math-misunderstandings
I trained a Qwen3-14B model with 4-fold cross-validation (4 KFold).
I applied the following feature-engineering function.
# =========================================================================================================================
# Preprocess
# =========================================================================================================================
import torch

def preprocess(sample, cfg=CFG):
    # Build a single prompt string from the engineered fields.
    text = (
        'Math question: ' + sample['QuestionText'] + '.'
        + ' Possible Answer A: ' + sample['pos_answer1'] + '.'
        + ' Possible Answer B: ' + sample['pos_answer2'] + '.'
        + ' Possible Answer C: ' + sample['pos_answer3'] + '.'
        + ' Possible Answer D: ' + sample['pos_answer4'] + '.'
        + ' Possible Misconceptions: ' + sample['Possible_Misconception'] + '.'
        + ' Student Answer: ' + sample['MC_Answer'] + '.'
        + ' Student Explanation: ' + sample['StudentExplanation'] + '.'
        + ' Is correct: ' + sample['is_correct'] + '.'
    )
    inputs = cfg.tokenizer.encode_plus(
        text,
        return_tensors=None,
        add_special_tokens=True,
        max_length=cfg.max_len,
        padding=False,
        truncation=True,
    )
    for k, v in inputs.items():
        inputs[k] = torch.tensor(v, dtype=torch.long)
    return inputs
For each question I appended its possible answers, together with the possible misconceptions associated with that question.
Training was done with QLoRA for 2 epochs, learning rate 2e-4, batch size 16, lora_alpha = 16, r = 8.
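A minimal sketch of how the per-question misconception list could be assembled. The column names `QuestionId` and `Misconception` and the comma-joined format are my assumptions; the writeup does not specify how `Possible_Misconception` was built.

```python
import pandas as pd

# Toy training frame; field names are assumptions based on the writeup.
df = pd.DataFrame({
    "QuestionId": [1, 1, 2],
    "Misconception": ["Adds denominators", "Ignores signs", "Adds denominators"],
})

# Collect the distinct misconceptions seen for each question into one string.
per_q = (
    df.groupby("QuestionId")["Misconception"]
    .apply(lambda s: ", ".join(sorted(s.unique())))
    .rename("Possible_Misconception")
    .reset_index()
)

# Attach the per-question list back onto every row of that question.
df = df.merge(per_q, on="QuestionId")
print(df["Possible_Misconception"].tolist())
```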
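The QLoRA setup above could look roughly like this with `transformers` + `peft`. The reported hyperparameters (2 epochs, lr 2e-4, batch size 16, r = 8, lora_alpha = 16) are from the writeup; the 4-bit NF4 quantization settings, `target_modules`, and dropout are my assumptions. This is a config fragment, not a full training script.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit quantization for QLoRA (assumed NF4 setup; not stated in the writeup).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter with the reported r=8, lora_alpha=16.
# target_modules is an assumption (typical attention projections).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
)

# Reported hyperparameters: 2 epochs, lr 2e-4, batch size 16.
training_args = TrainingArguments(
    output_dir="qwen3_14b_qlora",
    num_train_epochs=2,
    learning_rate=2e-4,
    per_device_train_batch_size=16,
)
```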
I then trained another Qwen3-14B model with 4-fold cross-validation but a different seed, which makes the bagging stronger.
Because the labels are noisy (probably due to disagreement between annotators), bagging is useful: it simulates the situation the annotators themselves face.
The final model is the average of the predictions of 8 Qwen3-14B models.
I believe many people trained on 100% of the data and trusted the public leaderboard because the scores were so close; I guided my submissions by my out-of-fold (OOF) scores instead.
I tried one more technique, Adversarial Weight Perturbation (AWP). It actually improved my CV considerably, but I ran out of time to submit it. Maybe it would have been an easy gold xD.
Here is the AWP code:
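The averaging step can be sketched as follows: softmax each model's logits, then take the mean over the 8 fold/seed models (4 folds × 2 seeds). Shapes and values here are illustrative, not the competition's actual class count.

```python
import numpy as np

# Illustrative logits from 8 models over 3 samples and 4 classes.
n_models, n_samples, n_classes = 8, 3, 4
rng = np.random.default_rng(0)
logits = rng.normal(size=(n_models, n_samples, n_classes))

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

probs = softmax(logits)           # (8, n_samples, n_classes)
ensemble = probs.mean(axis=0)     # average over the 8 models
pred = ensemble.argmax(axis=-1)   # final class per sample
```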
class AWPTrainer(Trainer):
    def __init__(self, *args, adv_lr=1e-4, adv_eps=1e-3, **kwargs):
        super().__init__(*args, **kwargs)
        self.adv_lr = adv_lr  # kept for configurability; unused by _attack_step below
        self.adv_eps = adv_eps
        self.backup = {}

    def _attack_step(self):
        # Perturb each trainable weight in the direction of its gradient,
        # scaled by the weight norm, after backing up the original values.
        e = 1e-6
        for name, param in self.model.named_parameters():
            if param.requires_grad and param.grad is not None:
                self.backup[name] = param.data.clone()
                norm_grad = torch.norm(param.grad)
                norm_data = torch.norm(param.data)
                if norm_grad != 0 and not torch.isnan(norm_grad):
                    p_adv = param.grad / (norm_grad + e) * self.adv_eps * norm_data
                    param.data.add_(p_adv)

    def _restore_step(self):
        # Put the original (unperturbed) weights back.
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # Check the model's state instead of the trainer's:
        # the Trainer automatically sets model.training to False for evaluation.
        if not model.training:
            return super().compute_loss(model, inputs, return_outputs=return_outputs, **kwargs)

        # --- AWP logic for training steps only ---
        # 1. Compute the original loss and its gradients.
        loss = super().compute_loss(model, inputs, return_outputs=False, **kwargs)
        loss.backward(retain_graph=True)
        # 2. Apply the adversarial perturbation.
        self._attack_step()
        # 3. Compute the loss at the perturbed weights.
        adv_loss = super().compute_loss(model, inputs, return_outputs=False, **kwargs)
        # 4. Restore the original weights and clear the extra gradients.
        model.zero_grad()
        self._restore_step()
        # The Trainer will call .backward() on the returned loss.
        return adv_loss
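Since `AWPTrainer` only overrides `compute_loss`, it drops in wherever a standard Hugging Face `Trainer` would be used. A usage sketch; `model`, `training_args`, `train_ds`, and `val_ds` are assumed to be defined as in a typical fine-tuning setup and are not part of the original code.

```python
# Hypothetical usage; variable names are placeholders, not from the writeup.
trainer = AWPTrainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    adv_lr=1e-4,   # class default
    adv_eps=1e-3,  # class default; controls perturbation size
)
trainer.train()
```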
I hope this helps everyone, bye.