673. MAP - Charting Student Math Misunderstandings | map-charting-student-math-misunderstandings
I trained a Qwen3-14B model with 4-fold cross-validation (4 KFold).
I applied the following feature-engineering function.
# =========================================================================================================================
# Preprocess
# =========================================================================================================================
import torch

def preprocess(sample, cfg=CFG):
    # Build a single prompt string from the engineered fields.
    text = (
        'Math question: ' + sample['QuestionText'] + '.'
        + ' Possible Answer A: ' + sample['pos_answer1'] + '.'
        + ' Possible Answer B: ' + sample['pos_answer2'] + '.'
        + ' Possible Answer C: ' + sample['pos_answer3'] + '.'
        + ' Possible Answer D: ' + sample['pos_answer4'] + '.'
        + ' Possible Misconceptions: ' + sample['Possible_Misconception'] + '.'
        + ' Student Answer: ' + sample['MC_Answer'] + '.'
        + ' Student Explanation: ' + sample['StudentExplanation'] + '.'
        + ' Is correct: ' + sample['is_correct'] + '.'
    )
    inputs = cfg.tokenizer.encode_plus(
        text,
        return_tensors=None,
        add_special_tokens=True,
        max_length=cfg.max_len,
        padding=False,
        truncation=True,
    )
    for k, v in inputs.items():
        inputs[k] = torch.tensor(v, dtype=torch.long)
    return inputs
For each question I appended its possible answers, together with the possible misconceptions associated with that question.
Training was done with QLoRA for 2 epochs, learning rate 2e-4, batch size 16, lora_alpha = 16, r = 8.
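A minimal sketch of how the per-question misconception list could be assembled. The column names `QuestionId` and `Misconception` and the comma-joined format are my assumptions; the writeup does not specify how `Possible_Misconception` was built.

```python
import pandas as pd

# Toy training frame; field names are assumptions based on the writeup.
df = pd.DataFrame({
    "QuestionId": [1, 1, 2],
    "Misconception": ["Adds denominators", "Ignores signs", "Adds denominators"],
})

# Collect the distinct misconceptions seen for each question into one string.
per_q = (
    df.groupby("QuestionId")["Misconception"]
    .apply(lambda s: ", ".join(sorted(s.unique())))
    .rename("Possible_Misconception")
    .reset_index()
)

# Attach the per-question list back onto every row of that question.
df = df.merge(per_q, on="QuestionId")
print(df["Possible_Misconception"].tolist())
```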
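The QLoRA setup above could look roughly like this with `transformers` + `peft`. The reported hyperparameters (2 epochs, lr 2e-4, batch size 16, r = 8, lora_alpha = 16) are from the writeup; the 4-bit NF4 quantization settings, `target_modules`, and dropout are my assumptions. This is a config fragment, not a full training script.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit quantization for QLoRA (assumed NF4 setup; not stated in the writeup).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter with the reported r=8, lora_alpha=16.
# target_modules is an assumption (typical attention projections).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
)

# Reported hyperparameters: 2 epochs, lr 2e-4, batch size 16.
training_args = TrainingArguments(
    output_dir="qwen3_14b_qlora",
    num_train_epochs=2,
    learning_rate=2e-4,
    per_device_train_batch_size=16,
)
```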
I then trained another Qwen3-14B model with 4-fold cross-validation but a different seed, which makes the bagging stronger.
Because the labels are noisy (probably due to disagreement between annotators), bagging is useful: it simulates the situation the annotators themselves face.
The final model is the average of the predictions of 8 Qwen3-14B models.
I believe many people trained on 100% of the data and trusted the public leaderboard because the scores were so close; I guided my submissions by my out-of-fold (OOF) scores instead.
I tried one more technique, Adversarial Weight Perturbation (AWP). It actually improved my CV considerably, but I ran out of time to submit it. Maybe it would have been an easy gold xD.
Here is the AWP code:
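The averaging step can be sketched as follows: softmax each model's logits, then take the mean over the 8 fold/seed models (4 folds × 2 seeds). Shapes and values here are illustrative, not the competition's actual class count.

```python
import numpy as np

# Illustrative logits from 8 models over 3 samples and 4 classes.
n_models, n_samples, n_classes = 8, 3, 4
rng = np.random.default_rng(0)
logits = rng.normal(size=(n_models, n_samples, n_classes))

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

probs = softmax(logits)           # (8, n_samples, n_classes)
ensemble = probs.mean(axis=0)     # average over the 8 models
pred = ensemble.argmax(axis=-1)   # final class per sample
```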
class AWPTrainer(Trainer):
    def __init__(self, *args, adv_lr=1e-4, adv_eps=1e-3, **kwargs):
        super().__init__(*args, **kwargs)
        self.adv_lr = adv_lr  # kept for configurability; unused by _attack_step below
        self.adv_eps = adv_eps
        self.backup = {}

    def _attack_step(self):
        # Perturb each trainable weight in the direction of its gradient,
        # scaled by the weight norm, after backing up the original values.
        e = 1e-6
        for name, param in self.model.named_parameters():
            if param.requires_grad and param.grad is not None:
                self.backup[name] = param.data.clone()
                norm_grad = torch.norm(param.grad)
                norm_data = torch.norm(param.data)
                if norm_grad != 0 and not torch.isnan(norm_grad):
                    p_adv = param.grad / (norm_grad + e) * self.adv_eps * norm_data
                    param.data.add_(p_adv)

    def _restore_step(self):
        # Put the original (unperturbed) weights back.
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # Check the model's state instead of the trainer's:
        # the Trainer automatically sets model.training to False for evaluation.
        if not model.training:
            return super().compute_loss(model, inputs, return_outputs=return_outputs, **kwargs)

        # --- AWP logic for training steps only ---
        # 1. Compute the original loss and its gradients.
        loss = super().compute_loss(model, inputs, return_outputs=False, **kwargs)
        loss.backward(retain_graph=True)
        # 2. Apply the adversarial perturbation.
        self._attack_step()
        # 3. Compute the loss at the perturbed weights.
        adv_loss = super().compute_loss(model, inputs, return_outputs=False, **kwargs)
        # 4. Restore the original weights and clear the extra gradients.
        model.zero_grad()
        self._restore_step()
        # The Trainer will call .backward() on the returned loss.
        return adv_loss
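Since `AWPTrainer` only overrides `compute_loss`, it drops in wherever a standard Hugging Face `Trainer` would be used. A usage sketch; `model`, `training_args`, `train_ds`, and `val_ds` are assumed to be defined as in a typical fine-tuning setup and are not part of the original code.

```python
# Hypothetical usage; variable names are placeholders, not from the writeup.
trainer = AWPTrainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    adv_lr=1e-4,   # class default
    adv_eps=1e-3,  # class default; controls perturbation size
)
trainer.train()
```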
I hope this helps everyone, bye.