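11th Place Solution - BirdCLEF 2024

Author: lhwcv (Grandmaster)
Team members: Donghui Zhang, HZM, ITK8191, lhwcv, zby
Published: 2024-06-11
Competition rank: 11th

I am very grateful to the organizers. This competition was very challenging for me, and we were lucky to take the last gold medal spot.

Here is a brief summary of our solution, which does not differ much from our earlier ones: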

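Data

We used the data from BirdCLEF 2021, 2022, 2023, and 2024 (this competition), plus ff1010bird_nocall, and no additional mp3 files. Adding extra mp3 files actually lowered the LB score, which I found a bit strange.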

Model

We used last year's 4th-place model, but swapped the backbone for EfficientNet B0/B1/B2.

Features

Log-mel spectrograms with the following parameters:

CFG.n_mels = 128
CFG.fmin = 20
CFG.fmax = 16000
CFG.n_fft = 2048
CFG.hop_length = 512
CFG.sample_rate = 32000
CFG.secondary_coef = 1.0
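
As a rough check on what these settings produce, the sketch below works out the spectrogram dimensions for one 5-second clip. It assumes a centered STFT (the default in librosa and torch.stft), which the post does not state, and the helper name `spectrogram_shape` is made up for illustration.

```python
def spectrogram_shape(clip_seconds, sample_rate=32000, n_fft=2048,
                      hop_length=512, n_mels=128):
    """Return (n_mels, n_frames, n_freq_bins, hz_per_bin) for one clip.

    Assumption: the STFT is centered, i.e. the signal is padded by
    n_fft // 2 on both sides, so n_frames = 1 + n_samples // hop_length.
    """
    n_samples = int(clip_seconds * sample_rate)
    n_frames = 1 + n_samples // hop_length
    n_freq_bins = n_fft // 2 + 1      # linear-frequency bins before the mel filterbank
    hz_per_bin = sample_rate / n_fft  # spacing between linear-frequency bins
    return n_mels, n_frames, n_freq_bins, hz_per_bin


print(spectrogram_shape(5))  # (128, 313, 1025, 15.625)
```

Note that fmax = 16000 equals the Nyquist frequency at a 32 kHz sample rate, so the mel filters cover the whole usable band above fmin = 20 Hz.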

Data augmentation

Strong augmentation improved my CV but lowered the LB score.

A.HorizontalFlip(p=0.5),
A.OneOf([
    A.Cutout(max_h_size=5, max_w_size=16),
    A.CoarseDropout(max_holes=4),
], p=0.5),

Plus MixupV2:

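import torch
import torch.nn as nn


class MixupV2(nn.Module):
    # Mixes each sample with a randomly chosen partner from the same batch.
    # mix_range: bounds of the uniform distribution the mixing coefficient is drawn from.
    # add_label: if True, labels are merged as a clamped sum (union) instead of interpolated.
    def __init__(self, mix_range=(0.3, 0.7), add_label=True):
        super(MixupV2, self).__init__()
        self.distribution = torch.distributions.Uniform(low=mix_range[0], high=mix_range[1])
        self.add_label = add_label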

    def forward(self, X, Y, weight=None, teacher_preds=None):
        bs = X.shape[0]
        n_dims = len(X.shape)
        perm = torch.randperm(bs)
        coeffs = self.distribution.rsample(torch.Size((bs,))).to(X.device)

        if n_dims == 2:
            X = coeffs.view(-1, 1) * X + (1 - coeffs.view(-1, 1)) * X[perm]
        elif n_dims == 3:
            X = coeffs.view(-1, 1, 1) * X + (1 - coeffs.view(-1, 1, 1)) * X[perm]
        else:
            X = coeffs.view(-1, 1, 1, 1) * X + (1 - coeffs.view(-1, 1, 1, 1)) * X[perm]

        if self.add_label:
            Y = Y + Y[perm]
            Y = torch.clamp(Y, 0, 1.0)
        else:
            Y = coeffs.view(-1, 1) * Y + (1 - coeffs.view(-1, 1)) * Y[perm]

        if weight is None:
            return X, Y
        else:
            weight = coeffs.view(-1) * weight + (1 - coeffs.view(-1)) * weight[perm]
            if self.add_label:
                teacher_preds = teacher_preds + teacher_preds[perm]
                teacher_preds = torch.clamp(teacher_preds, 0, 1.0)
            else:
                teacher_preds = coeffs.view(-1, 1) * teacher_preds + (1 - coeffs.view(-1, 1)) * teacher_preds[perm]
            return X, Y, weight, teacher_preds
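
The two label modes in MixupV2 behave quite differently, so here is a tiny torch-free illustration of what happens to Y for a single pair of samples. The helper `mix_labels` is hypothetical and only mirrors the label branch of the class above; it is not part of our code.

```python
def mix_labels(y_a, y_b, coeff, add_label=True):
    """Combine two multi-hot label vectors the way MixupV2 treats Y."""
    if add_label:
        # Union of labels: sum, then clamp to [0, 1]. The coefficient is ignored.
        return [min(max(a + b, 0.0), 1.0) for a, b in zip(y_a, y_b)]
    # Classic mixup: labels follow the same convex combination as the inputs.
    return [coeff * a + (1 - coeff) * b for a, b in zip(y_a, y_b)]


y_a = [1.0, 0.0, 0.0]  # clip A contains class 0
y_b = [0.0, 1.0, 0.0]  # clip B contains class 1

print(mix_labels(y_a, y_b, coeff=0.5, add_label=True))   # [1.0, 1.0, 0.0]
print(mix_labels(y_a, y_b, coeff=0.5, add_label=False))  # [0.5, 0.5, 0.0]
```

With add_label=True (the default), both birds count as fully present in the mixed clip, whatever the mixing coefficient is.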

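Training

First, we pretrained on the first 15 seconds of each recording from the BirdCLEF 2021, 2022, and 2023 data, excluding the classes that also appear in the 2024 data to avoid leakage. Then we trained the final model on the first 5 seconds of the 2024 data. This was the most stable training recipe we found.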
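
The data selection for the two stages can be sketched roughly as follows. This is a simplified illustration, not our actual pipeline: the record layout (dicts with a "label" key), the placeholder class names, and both helper names are assumptions.

```python
def select_pretrain_records(records, classes_2024):
    """Keep only older-competition records whose class is absent from 2024."""
    return [r for r in records if r["label"] not in classes_2024]


def crop_first_seconds(samples, sample_rate, seconds):
    """Return the first `seconds` of a waveform stored as a sequence."""
    return samples[: int(sample_rate * seconds)]


# Placeholder metadata; the class names are invented for the example.
old_records = [
    {"file": "a.ogg", "label": "class_a"},
    {"file": "b.ogg", "label": "class_b"},
    {"file": "c.ogg", "label": "class_c"},
]
classes_2024 = {"class_a", "class_c"}

pretrain = select_pretrain_records(old_records, classes_2024)
print([r["file"] for r in pretrain])  # ['b.ogg']

# Toy waveform: 20 "seconds" at a sample rate of 8 to keep the example small.
wave = list(range(20 * 8))
print(len(crop_first_seconds(wave, 8, 15)))  # 120 samples for pretraining
print(len(crop_first_seconds(wave, 8, 5)))   # 40 samples for the final stage
```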

Post-processing

It gave a slight improvement on both the public and private leaderboards. My teammates will add more details later.

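Things we tried that did not work

HuBERT and distillation
Distilling or fine-tuning BirdNET
Pseudo-labeling
Selecting multiple segments from each audio file based on score (we do not know why this failed, which is very strange)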

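Model performance

Our best single model was the B2-based one, scoring 0.68 on the public LB and 0.67 on the private LB. For the final submission we chose an ensemble of 6 models, scoring 0.7 on the public LB and 0.67 on the private LB (without post-processing).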
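
This post does not spell out how the 6 models were combined. A common choice, shown here purely as an assumption, is the arithmetic mean of the per-class probabilities; the function name `average_ensemble` is hypothetical.

```python
def average_ensemble(model_probs):
    """Average per-class probabilities across models.

    model_probs: one list per model, each holding one probability per class.
    """
    n_models = len(model_probs)
    # zip(*...) groups the predictions of all models for the same class.
    return [sum(per_class) / n_models for per_class in zip(*model_probs)]


# Three toy models, two classes each.
probs = [
    [0.50, 0.25],
    [0.75, 0.25],
    [0.25, 0.25],
]
print(average_ensemble(probs))
```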
