For logit distillation I used several training setups (spectrogram settings, model types, data augmentation) and assembled an ensemble of four distilled models with different configurations. Using the $L_{MLD}$ loss from Multi-Label Knowledge Distillation improved model accuracy.
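As I read it from the implementation below (my notation; with $p^{t}$ and $p^{s}$ the sigmoid probabilities of teacher and student, clamped to $[\epsilon, 1-\epsilon]$), the loss is a per-label binary KL divergence averaged over the batch $B$ and classes $C$:

$$
L_{MLD} = \frac{1}{BC} \sum_{b=1}^{B} \sum_{c=1}^{C} \left[ p^{t}_{b,c} \log \frac{p^{t}_{b,c}}{p^{s}_{b,c}} + \left(1 - p^{t}_{b,c}\right) \log \frac{1 - p^{t}_{b,c}}{1 - p^{s}_{b,c}} \right]
$$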
```python
def forward(
    self,
    logits_student: torch.Tensor,
    logits_teacher: torch.Tensor,
    target: torch.Tensor,
    epoch: int,
) -> torch.Tensor:
    # focal loss
    loss_focal = self.focal_weight * self.focal_loss(logits_student, target.float())
    # MLD loss: per-label binary KL between teacher and student probabilities
    prob_student = torch.sigmoid(logits_student)
    prob_teacher = torch.sigmoid(logits_teacher)
    prob_student = torch.clamp(prob_student, min=self.eps, max=1 - self.eps)
    prob_teacher = torch.clamp(prob_teacher, min=self.eps, max=1 - self.eps)
    loss_mld = (
        self.kl_div_loss(torch.log(prob_student), prob_teacher)
        + self.kl_div_loss(torch.log(1 - prob_student), 1 - prob_teacher)
    )
    loss_mld = loss_mld.mean()
    # linearly warm up the distillation term over the first `warmup` epochs
    loss_mld = min(epoch / self.warmup, 1.0) * loss_mld
    return loss_focal + loss_mld
```
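A self-contained sketch of the MLD term as a free function (the name `mld_loss` and the default `eps` are mine; the focal term is omitted, and `kl_div_loss` is assumed to be `KLDivLoss(reduction='none')` as the elementwise sum followed by `.mean()` suggests):

```python
import torch
import torch.nn.functional as F


def mld_loss(logits_student: torch.Tensor,
             logits_teacher: torch.Tensor,
             eps: float = 1e-6) -> torch.Tensor:
    """Per-label binary KL(teacher || student), averaged over batch and classes."""
    p_s = torch.sigmoid(logits_student).clamp(eps, 1 - eps)
    p_t = torch.sigmoid(logits_teacher).clamp(eps, 1 - eps)
    # F.kl_div(input=log q, target=p) computes p * (log p - log q) elementwise
    kl = (F.kl_div(torch.log(p_s), p_t, reduction='none')
          + F.kl_div(torch.log(1 - p_s), 1 - p_t, reduction='none'))
    return kl.mean()
```

When student and teacher logits coincide the loss is zero, and it grows as the per-label probabilities diverge.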
| Model | Public | Private |
|---|---|---|
| ECA NFNet L0 (teacher) | 0.685730 | 0.637945 |
| EfficientNet B0 (student) | 0.681507 | 0.638541 |
| Focal + KD | 0.674996 | 0.639671 |
| Focal + DKD | 0.657009 | 0.629826 |
| Focal + MLD | 0.695467 | 0.655356 |
| Model | Public | Private |
|---|---|---|
| ECA NFNet L1 (teacher) | 0.690174 | 0.630097 |
| EfficientNet B0 (student) | 0.661563 | 0.622148 |
| Focal + KD | N/A | N/A |
| Focal + DKD | N/A | N/A |
| Focal + MLD | 0.704317 | 0.633722 |
| Model | Public | Private |
|---|---|---|
| ECA NFNet L1 (teacher) | 0.680272 | 0.658853 |
| EfficientNet B1 (student) | 0.661441 | 0.620913 |
| Focal + KD | N/A | N/A |
| Focal + DKD | N/A | N/A |
| Focal + MLD | 0.693386 | 0.667936 |
KD: standard logit distillation; DKD: Decoupled Knowledge Distillation; MLD: the $L_{MLD}$ loss from Multi-Label Knowledge Distillation.
The following code converts the `eca_nfnet_l*` models from timm into a form that can be exported to OpenVINO:
```python
import copy
from functools import reduce

import timm
import torch.nn as nn
import torch.nn.functional as F


def get_module_by_name(module: nn.Module, access_string: str) -> nn.Module:
    """Resolve a dotted module path (e.g. 'stages.0.blocks.1.conv1') to a submodule."""
    names = access_string.split(sep='.')
    return reduce(getattr, names, module)


def convert_scaled_std_conv2d_to_conv2d(model: nn.Module) -> nn.Module:
    converted_model = copy.deepcopy(model)
    module_table = dict(converted_model.named_modules())
    for name, m in converted_model.named_modules():
        if isinstance(m, timm.layers.std_conv.ScaledStdConv2d):
            # Fold the weight standardization into the weights themselves,
            # using batch_norm over the flattened filters as in timm's forward pass.
            scaled_weight = F.batch_norm(
                m.weight.reshape(1, m.out_channels, -1), None, None,
                weight=(m.gain * m.scale).view(-1),
                training=True, momentum=0., eps=m.eps).reshape_as(m.weight).detach()
            bias = m.bias is not None
            conv = nn.Conv2d(m.in_channels, m.out_channels, m.kernel_size,
                             stride=m.stride, padding=m.padding, dilation=m.dilation,
                             groups=m.groups, bias=bias, padding_mode=m.padding_mode)
            conv.weight.data = scaled_weight
            if bias:
                conv.bias.data = m.bias.data
            # replace ScaledStdConv2d with nn.Conv2d in the parent module
            parent_name, child_name = name.rsplit('.', 1)
            parent_module = module_table[parent_name]
            setattr(parent_module, child_name, conv)
    return converted_model
```
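The `F.batch_norm` call above is just a vectorized way of standardizing each filter over its fan-in and scaling it by the gain. A quick sanity check of that identity (the shapes and `eps` here are illustrative, not taken from the model):

```python
import torch
import torch.nn.functional as F

out_channels, fan_in, eps = 8, 27, 1e-5  # e.g. eight 3x3 filters over 3 channels
weight = torch.randn(out_channels, fan_in)
gain = torch.randn(out_channels)

# timm-style: batch_norm over flattened filters, with the gain as the affine weight
via_bn = F.batch_norm(weight.reshape(1, out_channels, -1), None, None,
                      weight=gain, training=True, momentum=0., eps=eps)
via_bn = via_bn.reshape_as(weight)

# manual: per-filter standardization scaled by the gain
mean = weight.mean(dim=1, keepdim=True)
var = weight.var(dim=1, unbiased=False, keepdim=True)
manual = gain.view(-1, 1) * (weight - mean) / torch.sqrt(var + eps)

assert torch.allclose(via_bn, manual, atol=1e-5)
```

Because the standardization is baked into static weights, the resulting plain `nn.Conv2d` layers export cleanly (e.g. via ONNX) to OpenVINO.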