
18th place

BirdCLEF 2024 | birdclef-2024

Start: 2024-04-03 | End: 2024-06-10 | Life Sciences | Data Algorithm Competition

18th Place Solution

Author: Moyashii | Rank: 18th

Summary

I performed logit distillation under varied training settings (spectrogram settings, model types, data augmentation) and submitted an ensemble of four distilled models with different configurations.
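The writeup does not state how the four distilled models are blended; a minimal sketch, assuming a uniform average of per-model sigmoid probabilities (the function name and the uniform weighting are my assumptions, not confirmed by the author):

```python
import torch
import torch.nn as nn


def ensemble_predict(models: list[nn.Module], batch: torch.Tensor) -> torch.Tensor:
    """Average sigmoid probabilities across the distilled models.

    A uniform mean is an assumption -- the writeup does not specify the
    blending weights used for the four-model ensemble.
    """
    with torch.no_grad():
        probs = [torch.sigmoid(model(batch)) for model in models]
    # stack -> (n_models, batch, n_classes), then average over models
    return torch.stack(probs).mean(dim=0)
```

Averaging probabilities (rather than raw logits) keeps every model's contribution on the same [0, 1] scale regardless of its logit magnitudes.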

Distillation

Using the $L_{MLD}$ loss from Multi-Label Knowledge Distillation improved model accuracy.

def forward(
    self,
    logits_student: torch.Tensor,
    logits_teacher: torch.Tensor,
    target: torch.Tensor,
    epoch: int,
) -> torch.Tensor:
    # focal loss
    loss_focal = self.focal_weight * self.focal_loss(logits_student, target.float())

    # MLD loss: two-sided per-class binary KL between student and teacher
    prob_student = torch.sigmoid(logits_student)
    prob_teacher = torch.sigmoid(logits_teacher)
    prob_student = torch.clamp(prob_student, min=self.eps, max=1 - self.eps)
    prob_teacher = torch.clamp(prob_teacher, min=self.eps, max=1 - self.eps)
    loss_mld = (
        self.kl_div_loss(torch.log(prob_student), prob_teacher)
        + self.kl_div_loss(torch.log(1 - prob_student), 1 - prob_teacher)
    )
    loss_mld = loss_mld.mean()
    # linearly ramp the distillation term up over the first `warmup` epochs
    loss_mld = min(epoch / self.warmup, 1.0) * loss_mld

    return loss_focal + loss_mld
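The forward above relies on attributes (`focal_weight`, `focal_loss`, `eps`, `kl_div_loss`, `warmup`) whose definitions are not shown. A self-contained sketch, assuming a standard sigmoid focal loss and `nn.KLDivLoss(reduction='none')`; the class names, focal-loss choice, and default hyperparameters are placeholders, not the author's exact values:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinaryFocalLoss(nn.Module):
    """Sigmoid focal loss (an assumption; the writeup does not show its focal loss)."""

    def __init__(self, gamma: float = 2.0):
        super().__init__()
        self.gamma = gamma

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
        p_t = torch.exp(-bce)  # probability assigned to the true class
        return ((1 - p_t) ** self.gamma * bce).mean()


class FocalMLDLoss(nn.Module):
    """Focal loss + multi-label distillation term with linear warmup."""

    def __init__(self, focal_weight: float = 1.0, eps: float = 1e-6, warmup: int = 5):
        super().__init__()
        self.focal_weight = focal_weight
        self.eps = eps
        self.warmup = warmup
        self.focal_loss = BinaryFocalLoss()
        # reduction='none' keeps per-element values so both KL terms can be
        # summed before averaging, as in the forward shown above
        self.kl_div_loss = nn.KLDivLoss(reduction="none")

    def forward(self, logits_student, logits_teacher, target, epoch: int):
        loss_focal = self.focal_weight * self.focal_loss(logits_student, target.float())
        prob_s = torch.sigmoid(logits_student).clamp(self.eps, 1 - self.eps)
        prob_t = torch.sigmoid(logits_teacher).clamp(self.eps, 1 - self.eps)
        # per-class binary KL: positive-side plus negative-side terms
        loss_mld = (
            self.kl_div_loss(prob_s.log(), prob_t)
            + self.kl_div_loss((1 - prob_s).log(), 1 - prob_t)
        ).mean()
        return loss_focal + min(epoch / self.warmup, 1.0) * loss_mld
```

With `epoch` below `warmup`, the distillation term is down-weighted, so early training is dominated by the focal loss on the hard labels.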

Setting 1

Model                      Public LB  Private LB
ECA NFNet L0 (teacher)     0.685730   0.637945
EfficientNet B0 (student)  0.681507   0.638541
Focal + KD                 0.674996   0.639671
Focal + DKD                0.657009   0.629826
Focal + MLD                0.695467   0.655356

Setting 2

Model                      Public LB  Private LB
ECA NFNet L1 (teacher)     0.690174   0.630097
EfficientNet B0 (student)  0.661563   0.622148
Focal + KD                 N/A        N/A
Focal + DKD                N/A        N/A
Focal + MLD                0.704317   0.633722

Setting 3

Model                      Public LB  Private LB
ECA NFNet L1 (teacher)     0.680272   0.658853
EfficientNet B1 (student)  0.661441   0.620913
Focal + KD                 N/A        N/A
Focal + DKD                N/A        N/A
Focal + MLD                0.693386   0.667936

KD: vanilla logit distillation; DKD: Decoupled Knowledge Distillation; MLD: the $L_{MLD}$ loss from Multi-Label Knowledge Distillation.

Reflections

  • ✅ Distillation with the MLD loss worked remarkably well.
  • ✅ Training with diverse settings made the ensemble robust to score fluctuations and produced a solution slightly more accurate than the public notebooks.
  • ❌ We had difficulty identifying our best submission.

Appendix

The code below prepares the eca_nfnet_l* models from timm for OpenVINO conversion, by replacing each ScaledStdConv2d layer with an equivalent plain nn.Conv2d whose weights have the standardization baked in:

import copy
from functools import reduce

import torch.nn as nn
import torch.nn.functional as F

import timm

def get_module_by_name(module: nn.Module, access_string: str) -> nn.Module:
    names = access_string.split(sep='.')
    return reduce(getattr, names, module)

def convert_scaled_std_conv2d_to_conv2d(model: nn.Module) -> nn.Module:
    converted_model = copy.deepcopy(model)
    module_table = dict(converted_model.named_modules())
    for name, m in converted_model.named_modules():
        if isinstance(m, timm.layers.std_conv.ScaledStdConv2d):
            scaled_weight = F.batch_norm(
                m.weight.reshape(1, m.out_channels, -1), None, None,
                weight=(m.gain * m.scale).view(-1),
                training=True, momentum=0., eps=m.eps).reshape_as(m.weight).detach()

            bias = m.bias is not None
            conv = nn.Conv2d(m.in_channels, m.out_channels, m.kernel_size,
                             stride=m.stride, padding=m.padding, dilation=m.dilation,
                             groups=m.groups, bias=bias, padding_mode=m.padding_mode)
            conv.weight.data = scaled_weight
            if bias:
                conv.bias.data = m.bias

            # replace ScaledStdConv2d to nn.Conv2d
            parent_name, child_name = name.rsplit('.', 1)
            parent_module = module_table[parent_name]
            setattr(parent_module, child_name, conv)

    return converted_model