6th Place Solution

Thanks to the organizers for hosting this excellent competition, and to all the participants who shared valuable insights during it.

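I was heavily inspired by top solutions from past competitions. Thank you to everyone who shared their approach.
In this post I will mainly highlight what is different about my approach.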

Model / Loss Function

I used an SED-style model, shown below:

import timm
import torch
import torch.nn as nn


class AttBlockV2(nn.Module):
    def __init__(self, in_features: int, out_features: int, activation="sigmoid"):
        ...
        self.activation = activation
        ...

    def forward(self, x):
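        # Attention weights: tanh-bounded scores, softmax-normalized over the last axis
        norm_att = torch.softmax(torch.tanh(self.att(x)), dim=-1)
        # Per-segment class scores (sigmoid outputs when activation="sigmoid")
        cla = self.nonlinear_transform(self.cla(x))
        # Attention-weighted sum over axis 2 gives the clip-level output
        x = (norm_att * cla).sum(2)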
        return x, norm_att, cla

    def nonlinear_transform(self, x):
        if self.activation == "linear":
            return x
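        elif self.activation == "sigmoid":
            return torch.sigmoid(x)
        # Fail loudly instead of silently returning None for an unknown activation
        raise ValueError(f"Unsupported activation: {self.activation}")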
            
class BirdModel(nn.Module):
    def __init__(self, cfg, pretrained: bool = True):
        ...
        self.encoder = timm.create_model(
            cfg.backbone,
            pretrained=cfg.pretrained,
            num_classes=0,
            global_pool="",
            in_chans=cfg.in_chans,
            drop_path_rate=0.2,
            drop_rate=0.5,
        )
        ...
        self.att_block = AttBlockV2(in_features, self.num_classes, activation="sigmoid")
        ...
       
    def forward(self, x, y=None):
        ...
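        # clipwise_output: attention-pooled sigmoid outputs, already in (0, 1)
        clipwise_output, norm_att, segmentwise_output = self.att_block(x)
        # segmentwise_logit: raw (pre-sigmoid) per-segment scores from the same head
        segmentwise_logit = self.att_block.cla(x).transpose(1, 2)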
        if self.training:
            return clipwise_output, segmentwise_logit.max(1)[0], y
        else:
            return clipwise_output, segmentwise_logit.max(1)[0]

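During training, I applied nn.BCEWithLogitsLoss to both clipwise_output and segmentwise_logit.max(1)[0].
clipwise_output has already passed through a sigmoid before the loss is computed, so BCEWithLogitsLoss is technically not the right fit for it, but this setup significantly improved my public score.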
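
To make the two-term loss concrete, here is a minimal pure-Python sketch for one class of one clip. It is a sketch under stated assumptions, not the actual training code: the helper name bce_with_logits, the example numbers, and the equal weighting of the two terms are all hypothetical. The helper evaluates the standard per-element binary cross-entropy on a raw score, and the max over frames stands in for segmentwise_logit.max(1)[0].

```python
import math


def bce_with_logits(z: float, target: float) -> float:
    """Numerically stable binary cross-entropy on a raw score z.

    Equivalent to -[t * log(sigmoid(z)) + (1 - t) * log(1 - sigmoid(z))].
    """
    return max(z, 0.0) - z * target + math.log1p(math.exp(-abs(z)))


# Hypothetical values for one class in one clip.
clipwise_prob = 0.8                     # already in (0, 1): attention-pooled sigmoid output
frame_logits = [-3.0, -1.0, 2.5, 0.5]   # raw per-frame logits for the same class
target = 1.0

# Stands in for segmentwise_logit.max(1)[0]: the strongest frame decides.
max_logit = max(frame_logits)

# The clip term treats a probability as if it were a logit,
# so a sigmoid is effectively applied a second time.
loss_clip = bce_with_logits(clipwise_prob, target)
loss_seg = bce_with_logits(max_logit, target)

# Equal weighting of the two terms is an assumption.
loss = loss_clip + loss_seg

print(f"clip={loss_clip:.4f} seg={loss_seg:.4f} total={loss:.4f}")
```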

With timm/tf_efficientnet_b3.ns_jft_in1k as the backbone and clipwise_output as the submission (without train_soundscapes), I reached a public score of 0.900 and a private score of 0.908. At this stage, clipwise_output gave better results than segmentwise_logit.

Pseudo Labeling

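I added pseudo labels to train_soundscapes and included them as training data over several training rounds.
In the first round, I generated pseudo labels using only the clipwise_output of a single model (timm/tf_efficientnet_b3.ns_jft_in1k).
From the second round onward, I pseudo-labeled with an ensemble of the segmentwise_logit.max(1)[0] outputs from multiple models.
- Because clipwise_output was trained somewhat unnaturally with BCEWithLogitsLoss, its values were too small to help the score when reused as pseudo labels.
- Pseudo-labeling with segmentwise_logit.max(1)[0] led to a higher public score.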
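
One possible way to see why the clipwise values could end up small is to look at the gradient. For binary cross-entropy on a raw score z with target t, the derivative with respect to z is sigmoid(z) - t. Since clipwise_output is already a probability, the value fed to the loss is confined to [0, 1]. The sketch below only evaluates that formula at a few points; the function names are hypothetical, and whether this asymmetry fully accounts for the small values is an assumption, not something verified in the experiments.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def grad_bce_with_logits(z: float, target: float) -> float:
    """d/dz of BCE-with-logits for a single element: sigmoid(z) - target."""
    return sigmoid(z) - target


# clipwise_output is a probability, so the value fed to the loss lies in [0, 1].
for p in (0.0, 0.5, 1.0):
    g_neg = grad_bce_with_logits(p, 0.0)  # class absent: positive gradient, pushes p down
    g_pos = grad_bce_with_logits(p, 1.0)  # class present: negative gradient, pushes p up
    print(f"p={p:.1f}  grad(t=0)={g_neg:+.3f}  grad(t=1)={g_pos:+.3f}")
```

Over the whole interval [0, 1], the gradient for an absent class has magnitude at least 0.5, the gradient for a present class has magnitude at most 0.5, and neither ever reaches zero. With most classes absent from any given clip, the downward push would tend to dominate.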
The models used to generate pseudo labels were:
- timm/tf_efficientnet_b3.ns_jft_in1k
- timm/tf_efficientnet_b5.ns_jft_in1k
- timm/tf_efficientnetv2_b3.in21k
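
The ensembling step can be sketched as follows. The write-up only states that the segmentwise_logit.max(1)[0] outputs of several models were ensembled; converting each logit to a probability with a sigmoid and taking an unweighted mean is an assumption, as are the function name ensemble_pseudo_labels and the toy numbers.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ensemble_pseudo_labels(logits_per_model: List[List[float]]) -> List[float]:
    """Average per-class probabilities across models for one soundscape segment.

    logits_per_model[m][c] is model m's max-over-frames logit for class c.
    """
    n_models = len(logits_per_model)
    n_classes = len(logits_per_model[0])
    return [
        sum(sigmoid(model[c]) for model in logits_per_model) / n_models
        for c in range(n_classes)
    ]


# Toy logits from three hypothetical models for three classes.
logits = [
    [2.0, -4.0, 0.0],
    [3.0, -5.0, 0.0],
    [1.0, -3.0, 0.0],
]
soft_labels = ensemble_pseudo_labels(logits)
print([round(p, 3) for p in soft_labels])
```

The resulting soft values could be used directly as training targets, since binary cross-entropy accepts targets anywhere in [0, 1]; whether the labels were thresholded first is not stated here.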
Finally, I trained the following two models on the combined training data and pseudo labels and used them for the final submission:
- timm/tf_efficientnet_b3.ns_jft_in1k
- timm/tf_efficientnetv2_b3.in21k

Score

Public score: 0.928
Private score: 0.923
