For logit distillation I used several training setups (spectrogram settings, model types, data augmentation) and assembled an ensemble of four distilled models with different configurations. Using the $L_{MLD}$ loss from Multi-Label Knowledge Distillation improved model accuracy.
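As I read it from the implementation below (my notation; with $p^{t}$ and $p^{s}$ the sigmoid probabilities of teacher and student, clamped to $[\epsilon, 1-\epsilon]$), the loss is a per-label binary KL divergence averaged over the batch $B$ and classes $C$:

$$
L_{MLD} = \frac{1}{BC} \sum_{b=1}^{B} \sum_{c=1}^{C} \left[ p^{t}_{b,c} \log \frac{p^{t}_{b,c}}{p^{s}_{b,c}} + \left(1 - p^{t}_{b,c}\right) \log \frac{1 - p^{t}_{b,c}}{1 - p^{s}_{b,c}} \right]
$$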
```python
def forward(
    self,
    logits_student: torch.Tensor,
    logits_teacher: torch.Tensor,
    target: torch.Tensor,
    epoch: int,
) -> torch.Tensor:
    # focal loss
    loss_focal = self.focal_weight * self.focal_loss(logits_student, target.float())
    # MLD loss: per-label binary KL between teacher and student probabilities
    prob_student = torch.sigmoid(logits_student)
    prob_teacher = torch.sigmoid(logits_teacher)
    prob_student = torch.clamp(prob_student, min=self.eps, max=1 - self.eps)
    prob_teacher = torch.clamp(prob_teacher, min=self.eps, max=1 - self.eps)
    loss_mld = (
        self.kl_div_loss(torch.log(prob_student), prob_teacher)
        + self.kl_div_loss(torch.log(1 - prob_student), 1 - prob_teacher)
    )
    loss_mld = loss_mld.mean()
    # linearly warm up the distillation term over the first `warmup` epochs
    loss_mld = min(epoch / self.warmup, 1.0) * loss_mld
    return loss_focal + loss_mld
```
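A self-contained sketch of the MLD term as a free function (the name `mld_loss` and the default `eps` are mine; the focal term is omitted, and `kl_div_loss` is assumed to be `KLDivLoss(reduction='none')` as the elementwise sum followed by `.mean()` suggests):

```python
import torch
import torch.nn.functional as F


def mld_loss(logits_student: torch.Tensor,
             logits_teacher: torch.Tensor,
             eps: float = 1e-6) -> torch.Tensor:
    """Per-label binary KL(teacher || student), averaged over batch and classes."""
    p_s = torch.sigmoid(logits_student).clamp(eps, 1 - eps)
    p_t = torch.sigmoid(logits_teacher).clamp(eps, 1 - eps)
    # F.kl_div(input=log q, target=p) computes p * (log p - log q) elementwise
    kl = (F.kl_div(torch.log(p_s), p_t, reduction='none')
          + F.kl_div(torch.log(1 - p_s), 1 - p_t, reduction='none'))
    return kl.mean()
```

When student and teacher logits coincide the loss is zero, and it grows as the per-label probabilities diverge.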
| Model | Public | Private |
|---|---|---|
| ECA NFNet L0 (teacher) | 0.685730 | 0.637945 |
| EfficientNet B0 (student) | 0.681507 | 0.638541 |
| Focal + KD | 0.674996 | 0.639671 |
| Focal + DKD | 0.657009 | 0.629826 |
| Focal + MLD | 0.695467 | 0.655356 |
| Model | Public | Private |
|---|---|---|
| ECA NFNet L1 (teacher) | 0.690174 | 0.630097 |
| EfficientNet B0 (student) | 0.661563 | 0.622148 |
| Focal + KD | N/A | N/A |
| Focal + DKD | N/A | N/A |
| Focal + MLD | 0.704317 | 0.633722 |
| Model | Public | Private |
|---|---|---|
| ECA NFNet L1 (teacher) | 0.680272 | 0.658853 |
| EfficientNet B1 (student) | 0.661441 | 0.620913 |
| Focal + KD | N/A | N/A |
| Focal + DKD | N/A | N/A |
| Focal + MLD | 0.693386 | 0.667936 |
KD: standard logit distillation; DKD: Decoupled Knowledge Distillation; MLD: the $L_{MLD}$ loss from Multi-Label Knowledge Distillation.
The following code converts the `eca_nfnet_l*` models from timm into a form that can be exported to OpenVINO:
```python
import copy
from functools import reduce

import timm
import torch.nn as nn
import torch.nn.functional as F


def get_module_by_name(module: nn.Module, access_string: str) -> nn.Module:
    """Resolve a dotted module path (e.g. 'stages.0.blocks.1.conv1') to a submodule."""
    names = access_string.split(sep='.')
    return reduce(getattr, names, module)


def convert_scaled_std_conv2d_to_conv2d(model: nn.Module) -> nn.Module:
    converted_model = copy.deepcopy(model)
    module_table = dict(converted_model.named_modules())
    for name, m in converted_model.named_modules():
        if isinstance(m, timm.layers.std_conv.ScaledStdConv2d):
            # Fold the weight standardization into the weights themselves,
            # using batch_norm over the flattened filters as in timm's forward pass.
            scaled_weight = F.batch_norm(
                m.weight.reshape(1, m.out_channels, -1), None, None,
                weight=(m.gain * m.scale).view(-1),
                training=True, momentum=0., eps=m.eps).reshape_as(m.weight).detach()
            bias = m.bias is not None
            conv = nn.Conv2d(m.in_channels, m.out_channels, m.kernel_size,
                             stride=m.stride, padding=m.padding, dilation=m.dilation,
                             groups=m.groups, bias=bias, padding_mode=m.padding_mode)
            conv.weight.data = scaled_weight
            if bias:
                conv.bias.data = m.bias.data
            # replace ScaledStdConv2d with nn.Conv2d in the parent module
            parent_name, child_name = name.rsplit('.', 1)
            parent_module = module_table[parent_name]
            setattr(parent_module, child_name, conv)
    return converted_model
```
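The `F.batch_norm` call above is just a vectorized way of standardizing each filter over its fan-in and scaling it by the gain. A quick sanity check of that identity (the shapes and `eps` here are illustrative, not taken from the model):

```python
import torch
import torch.nn.functional as F

out_channels, fan_in, eps = 8, 27, 1e-5  # e.g. eight 3x3 filters over 3 channels
weight = torch.randn(out_channels, fan_in)
gain = torch.randn(out_channels)

# timm-style: batch_norm over flattened filters, with the gain as the affine weight
via_bn = F.batch_norm(weight.reshape(1, out_channels, -1), None, None,
                      weight=gain, training=True, momentum=0., eps=eps)
via_bn = via_bn.reshape_as(weight)

# manual: per-filter standardization scaled by the gain
mean = weight.mean(dim=1, keepdim=True)
var = weight.var(dim=1, unbiased=False, keepdim=True)
manual = gain.view(-1, 1) * (weight - mean) / torch.sqrt(var + eps)

assert torch.allclose(via_bn, manual, atol=1e-5)
```

Because the standardization is baked into static weights, the resulting plain `nn.Conv2d` layers export cleanly (e.g. via ONNX) to OpenVINO.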