4th Place Solution - BirdCLEF 2025

Author: Dylan Liu (Master)
Competition: BirdCLEF 2025
Published: 2025-06-11
Rank: 4th place

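First of all, this is my first time winning a prize on Kaggle, and I am very grateful to the organizers and all participants.

My solution is an SED (sound event detection) approach inspired by the BirdCLEF 2023 2nd place solution, combined with a custom soft AUC loss and semi-supervised learning.

Main concept: soft AUC loss

I have always believed that the best loss function is the metric itself, so I looked for an AUC loss and found the following implementation: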

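import torch
import torch.nn as nn


class AUCLoss(nn.Module):
    """Pairwise logistic surrogate for AUC on hard (0/1) multi-label targets."""

    def __init__(self, margin=1.0, pos_weight=1.0, neg_weight=1.0):
        super().__init__()
        self.margin = margin
        self.pos_weight = pos_weight
        self.neg_weight = neg_weight

    def forward(self, preds, labels, sample_weights=None):
        # preds, labels: [batch, num_classes]; sample_weights: [batch] or None
        pos_preds = preds[labels == 1]
        neg_preds = preds[labels == 0]

        if len(pos_preds) == 0 or len(neg_preds) == 0:
            # No (positive, negative) pair to rank. Return a zero that stays
            # attached to the graph so that backward() still works.
            return preds.sum() * 0.0

        if sample_weights is not None:
            # Repeat the per-sample weights for every class: [batch, num_classes].
            # Note that pos_weight / neg_weight are not applied in this branch.
            sample_weights = torch.stack([sample_weights] * labels.shape[1], dim=1)
            pos_weights = sample_weights[labels == 1]  # [num_pos]
            neg_weights = sample_weights[labels == 0]  # [num_neg]
        else:
            pos_weights = torch.ones_like(pos_preds) * self.pos_weight
            neg_weights = torch.ones_like(neg_preds) * self.neg_weight

        # Score difference for every (positive, negative) pair
        diff = pos_preds.unsqueeze(1) - neg_preds.unsqueeze(0)  # [num_pos, num_neg]
        # Numerically stable form of log(1 + exp(-diff * margin))
        loss_matrix = nn.functional.softplus(-diff * self.margin)  # [num_pos, num_neg]

        weighted_loss = loss_matrix * pos_weights.unsqueeze(1) * neg_weights.unsqueeze(0)

        return weighted_loss.mean()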

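This AUC loss seems to be very resistant to overfitting. In all of my experiments, models trained with cross-entropy loss had significantly better CV scores than models trained with the soft AUC loss, but significantly worse LB scores.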
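To make the link between this loss and the metric concrete, here is a minimal pure-Python sketch. It is an illustration, not part of the original solution: the function names `pairwise_auc` and `pairwise_logistic_loss` are hypothetical. ROC AUC is the fraction of (positive, negative) pairs in which the positive is scored higher, and the loss above replaces that hard 0/1 comparison with a smooth logistic penalty on the same score differences.

```python
import math


def pairwise_auc(pos_scores, neg_scores):
    """Exact ROC AUC: share of (positive, negative) pairs ranked correctly (ties count 0.5)."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


def pairwise_logistic_loss(pos_scores, neg_scores, margin=1.0):
    """Smooth surrogate: mean of log(1 + exp(-margin * (p - n))) over all pairs."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            total += math.log1p(math.exp(-margin * (p - n)))
    return total / (len(pos_scores) * len(neg_scores))


pos = [2.0, 0.5]   # scores of positive examples
neg = [1.0, -1.0]  # scores of negative examples
print(pairwise_auc(pos, neg))            # 0.75 (3 of 4 pairs ranked correctly)
print(pairwise_logistic_loss(pos, neg))  # decreases as positives move above negatives
```

Because the surrogate only looks at score differences between positives and negatives, it rewards ranking rather than calibrated probabilities, which matches what AUC measures.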

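This AUC loss has one problem: unlike cross-entropy loss, it does not support soft labels. For knowledge distillation and semi-supervised learning I needed a loss function that accepts soft labels, so I made some modifications to the AUC loss above: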

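class SoftAUCLoss(nn.Module):
    """AUCLoss variant that accepts soft labels in [0, 1]."""

    def __init__(self, margin=1.0, pos_weight=1.0, neg_weight=1.0):
        super().__init__()
        self.margin = margin
        self.pos_weight = pos_weight
        self.neg_weight = neg_weight

    def forward(self, preds, labels, sample_weights=None):
        # preds, labels: [batch, num_classes]; sample_weights: [batch] or None
        # Labels above 0.5 count as positives, below 0.5 as negatives;
        # labels exactly equal to 0.5 are skipped.
        pos_mask = labels > 0.5
        neg_mask = labels < 0.5
        pos_preds = preds[pos_mask]
        neg_preds = preds[neg_mask]
        pos_labels = labels[pos_mask]
        neg_labels = labels[neg_mask]

        if len(pos_preds) == 0 or len(neg_preds) == 0:
            # No (positive, negative) pair to rank. Return a zero that stays
            # attached to the graph so that backward() still works.
            return preds.sum() * 0.0

        # Weight each element by its distance from 0.5 (label confidence)
        pos_weights = torch.ones_like(pos_preds) * self.pos_weight * (pos_labels - 0.5)
        neg_weights = torch.ones_like(neg_preds) * self.neg_weight * (0.5 - neg_labels)
        if sample_weights is not None:
            # Repeat the per-sample weights for every class, then pick the
            # entries that line up with each positive / negative element.
            sample_weights = torch.stack([sample_weights] * labels.shape[1], dim=1)
            pos_weights = pos_weights * sample_weights[pos_mask]
            neg_weights = neg_weights * sample_weights[neg_mask]

        # Score difference for every (positive, negative) pair
        diff = pos_preds.unsqueeze(1) - neg_preds.unsqueeze(0)  # [num_pos, num_neg]
        # Numerically stable form of log(1 + exp(-diff * margin))
        loss_matrix = nn.functional.softplus(-diff * self.margin)  # [num_pos, num_neg]

        weighted_loss = loss_matrix * pos_weights.unsqueeze(1) * neg_weights.unsqueeze(0)

        return weighted_loss.mean()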

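This soft AUC loss, together with semi-supervised learning, improved the LB score of my single tf_efficientnetv2_b0 model from 0.850 to 0.901. More importantly, my jump from 11th to 4th place on the private LB may be thanks to this loss function.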
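To see how soft labels enter the loss, the sketch below reproduces only the weighting step of SoftAUCLoss in plain Python. The helper name `soft_label_weights` is hypothetical and does not appear in the original code; it is meant as an illustration of the idea, not as the training implementation.

```python
def soft_label_weights(labels, pos_weight=1.0, neg_weight=1.0):
    """Split soft labels into positives (> 0.5) and negatives (< 0.5) with confidence weights.

    Hypothetical helper mirroring the weighting step of SoftAUCLoss: a label's
    weight grows with its distance from 0.5, and labels equal to 0.5 are dropped.
    Returns two lists of (index, weight) tuples.
    """
    pos = [(i, pos_weight * (y - 0.5)) for i, y in enumerate(labels) if y > 0.5]
    neg = [(i, neg_weight * (0.5 - y)) for i, y in enumerate(labels) if y < 0.5]
    return pos, neg


labels = [1.0, 0.75, 0.5, 0.25, 0.0]
pos, neg = soft_label_weights(labels)
# Each (positive, negative) pair is weighted by the product of the two weights
pair_weights = [[pw * nw for _, nw in neg] for _, pw in pos]
print(pos)           # [(0, 0.5), (1, 0.25)]
print(neg)           # [(3, 0.25), (4, 0.5)]
print(pair_weights)  # [[0.125, 0.25], [0.0625, 0.125]]
```

With the default pos_weight and neg_weight, a pair of hard labels (1 and 0) gets weight 0.25, so on purely hard labels the soft loss is the original AUC loss scaled by a constant factor of 0.25. Uncertain pseudo-labels close to 0.5 contribute proportionally less.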

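Other things that helped

Semi-supervised learning: the labeling models were 10 SED models, including efficientnet_b0-b4, efficientnetv2_b0-b3 and efficientnetv2_s, trained on the first 10 seconds of audio.
A smaller hop_length (64) and a larger n_mels (256).
Audio mixup augmentation: add two audio clips together to form the new audio, and take the element-wise maximum of their labels as the new label. This augmentation did not directly improve my models' performance, but I still used it when training some models to add diversity to the final solution.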
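The audio mixup described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the function name `audio_mixup` is hypothetical, both clips are assumed to be already cropped to the same length, and no rescaling of the summed waveform is applied because the writeup does not mention any.

```python
import numpy as np


def audio_mixup(wave_a, labels_a, wave_b, labels_b):
    """Add two equal-length waveforms and take the element-wise max of their labels."""
    if wave_a.shape != wave_b.shape:
        raise ValueError("waveforms must have the same shape")
    mixed_wave = wave_a + wave_b                   # both sources are audible in the mix
    mixed_labels = np.maximum(labels_a, labels_b)  # a class is present if it is in either clip
    return mixed_wave, mixed_labels


# Toy example: 4 audio samples, 3 classes
wave_a = np.array([0.1, -0.2, 0.3, 0.0])
wave_b = np.array([0.3, 0.2, -0.1, 0.5])
labels_a = np.array([1.0, 0.0, 0.0])
labels_b = np.array([0.0, 0.0, 1.0])

mixed_wave, mixed_labels = audio_mixup(wave_a, labels_a, wave_b, labels_b)
print(mixed_labels)  # [1. 0. 1.]
```

Taking the maximum rather than a weighted average keeps the labels valid for multi-label detection: every species audible in either source clip stays fully labeled in the mix, and the same rule also works when the inputs are soft labels.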

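Things that did not help or made results worse

Pretraining of any kind.
Knowledge distillation.
Models other than efficientnet.
Data normalization methods other than 2D batch normalization.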

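Final models

16 models, including efficientnet_lite0-4, efficientnet_b2-3, efficientnetv2_b2-3 and efficientnetv2_s. They were trained for 17-25 epochs with a learning rate of 5e-4. I used 3 sets of mel spectrogram parameters and 2 kinds of data augmentation. Training clips were taken from either the first 10 seconds or a random 10-second window of each recording.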
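The two clip-selection modes mentioned above (first 10 seconds versus a random 10-second window) could look roughly like the sketch below. Everything beyond the 10-second length is an assumption made for illustration: the function name `crop_clip`, the 32 kHz sample rate, and zero-padding of recordings shorter than 10 seconds are not taken from the writeup.

```python
import numpy as np

SAMPLE_RATE = 32_000  # assumption: not stated in the writeup
CLIP_SECONDS = 10     # clip length used in the writeup


def crop_clip(wave, rng=None, sample_rate=SAMPLE_RATE, seconds=CLIP_SECONDS):
    """Return a fixed-length clip from a 1-D waveform.

    rng=None takes the first `seconds` of audio; passing a numpy Generator
    takes a random window of the same length instead.
    """
    n = sample_rate * seconds
    if len(wave) <= n:
        # Assumption: short recordings are zero-padded at the end
        return np.pad(wave, (0, n - len(wave)))
    start = 0 if rng is None else int(rng.integers(0, len(wave) - n + 1))
    return wave[start:start + n]


recording = np.random.default_rng(0).standard_normal(SAMPLE_RATE * 25)  # fake 25 s recording
first_clip = crop_clip(recording)                             # first 10 seconds
random_clip = crop_clip(recording, np.random.default_rng(1))  # random 10-second window
print(first_clip.shape, random_clip.shape)  # (320000,) (320000,)
```

Mixing models trained on the two kinds of clips is one plausible source of the ensemble diversity the writeup aims for, since the random windows expose a model to parts of each recording that the first-10-second models never see.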

Code

GitHub repository: https://github.com/dylanliu2/BirdCLEF2025-4th-place-solution
