11th Place Solution - BirdCLEF 2025

Author: Baiph (Grandmaster)
Rank: 11th place
Published: 2025-06-06
Competition: BirdCLEF 2025

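Thanks to the organizers for hosting such an interesting competition, and to the excellent solutions from previous years for the inspiration. Congratulations to all the top finishers as well. Below I describe my solution.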

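1. Training data

The 206 classes were expanded to 316

(1) 2025 competition data: train_audio and train_soundscapes

(2) 110 classes selected from previous years' competition data (classes with fewer than 10 samples)

These extra classes were mainly intended for building a local cross-validation (CV) set. Unfortunately, this CV strategy did not work. However, the mixed training improved my leaderboard score, so I kept it.

Each class is capped at 500 samples. Classes with fewer than 10 samples are upsampled.
Human voice was removed from only part of the CAS data.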
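
The capping and upsampling rule above can be sketched as follows. This is only an illustration: the function name, the `(path, label)` sample format, random subsampling for large classes, and upsampling rare classes up to exactly 10 items are all assumptions, since the post only states the cap of 500 and the threshold of 10.

```python
import random
from collections import defaultdict

def balance_classes(samples, max_per_class=500, min_per_class=10, seed=0):
    """Cap large classes and upsample rare ones.

    samples: list of (path, label) pairs (hypothetical format).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    balanced = []
    for label, paths in by_label.items():
        if len(paths) > max_per_class:
            # Cap: random subsample without replacement.
            paths = rng.sample(paths, max_per_class)
        elif len(paths) < min_per_class:
            # Upsample: repeat randomly chosen recordings (assumed target: min_per_class).
            paths = paths + rng.choices(paths, k=min_per_class - len(paths))
        balanced.extend((p, label) for p in paths)
    return balanced
```

Classes whose size already lies between the two limits pass through unchanged.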

2. Model architecture

Based on the open-sourced SED model from the 2023 2nd place solution. On top of it I added the pseudo-labeling code.

Backbones:

tf_efficientnetv2_b3
tf_efficientnetv2_s

All models were trained on 10-second clips.
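
A minimal sketch of producing fixed 10-second training clips, assuming a 32 kHz mono waveform, a random crop for long recordings, and zero padding for short ones. The cropping and padding strategy is an assumption; the post only states the clip length.

```python
import numpy as np

SAMPLE_RATE = 32000   # matches the sample_rate used for the mel spectrograms
CLIP_SECONDS = 10

def fixed_length_clip(wave, rng, sample_rate=SAMPLE_RATE, seconds=CLIP_SECONDS):
    """Return exactly sample_rate * seconds samples from a 1-D waveform."""
    target = sample_rate * seconds
    if len(wave) >= target:
        # Random crop (assumed); the upper bound of integers() is exclusive.
        start = rng.integers(0, len(wave) - target + 1)
        return wave[start:start + target]
    # Zero-pad short recordings at the end (assumed).
    return np.pad(wave, (0, target - len(wave)))
```

Usage: `clip = fixed_length_clip(wave, np.random.default_rng(0))`.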

3. Loss function

I used CE loss. Compared with BCE loss, it gave a substantial improvement for multiple models (0.83 → 0.88).
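
To make the difference between the two losses concrete, here is a small NumPy sketch. It assumes CE means softmax cross-entropy over the class logits with (possibly soft) target distributions, and BCE means independent per-class sigmoid cross-entropy; the post does not give the exact formulation, so both function names and details are illustrative.

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy, each class treated independently."""
    # Numerically stable form: max(x, 0) - x * t + log(1 + exp(-|x|))
    per_elem = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    return float(per_elem.mean())

def soft_target_cross_entropy(logits, targets):
    """Mean softmax cross-entropy; each target row is a distribution over classes."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())
```

With softmax CE the classes compete for probability mass, whereas BCE scores each class on its own.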

4. Mel spectrogram parameters

{'sample_rate': 32000, 'n_mels': 256, 'image_size': 300, 'f_min': 90, 'f_max': 14000, 'n_fft': 1536, 'normalized': True, 'hop_length': 535}

{'sample_rate': 32000, 'n_mels': 256, 'image_size': 300, 'f_min': 50, 'f_max': 14000, 'n_fft': 1024, 'normalized': True, 'hop_length': 535}
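
As a quick sanity check on what these settings imply for a 10-second clip, the arithmetic below computes the number of STFT frames and the frequency resolution for both configurations. It assumes centered STFT framing (the default in librosa and torchaudio); the helper names are illustrative.

```python
SAMPLE_RATE = 32000
CLIP_SECONDS = 10
HOP_LENGTH = 535

def num_frames(num_samples, n_fft, hop_length, center=True):
    """Number of STFT frames for a signal of num_samples samples."""
    if center:
        # Centered framing pads n_fft // 2 samples on each side (even n_fft).
        return 1 + num_samples // hop_length
    return 1 + (num_samples - n_fft) // hop_length

def bin_width_hz(sample_rate, n_fft):
    """Spacing between adjacent FFT frequency bins in Hz."""
    return sample_rate / n_fft

clip_samples = SAMPLE_RATE * CLIP_SECONDS
for n_fft in (1536, 1024):
    frames = num_frames(clip_samples, n_fft, HOP_LENGTH)
    print(n_fft, frames, round(bin_width_hz(SAMPLE_RATE, n_fft), 2))
```

Both configurations yield 599 frames per clip; they differ in frequency resolution (about 20.83 Hz per bin for n_fft = 1536 versus 31.25 Hz for n_fft = 1024) and in the lower frequency cutoff.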

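5. Pseudo-labeling

High-quality pseudo-labels were selected with the entropy-based filtering strategy described in last year's 10th place solution.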

# Example code
import numpy as np

epsilon = 1e-12  # avoids log(0)

# sub_df is assumed to hold the model predictions; columns lists the class columns
pre_probs = sub_df[columns].values
print(pre_probs.shape)

# Per-row entropy of the predicted class probabilities
probs = pre_probs.copy()
entropies = -np.sum(probs * np.log(probs + epsilon), axis=1)
print(entropies.shape)

# Keep the 20% of rows with the lowest entropy (most confident predictions)
selected_indices = np.argsort(entropies)[:int(len(entropies) * 0.2)]
selected_pseudo_probs = probs[selected_indices]
print(selected_pseudo_probs.shape)

# For each class, set values below that class's 92nd percentile to 0
for i in range(selected_pseudo_probs.shape[1]):
    class_probs = selected_pseudo_probs[:, i]
    threshold = np.percentile(class_probs, 92)
    selected_pseudo_probs[class_probs < threshold, i] = 0
print(selected_pseudo_probs.shape)

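The labeled dataset and the pseudo-labeled dataset are concatenated and fed to the model together, with a reduced loss weight for the pseudo-labeled part.

batch_size = 96, pl_batch_size = 16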
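
One way to down-weight the pseudo-labeled part of a concatenated batch is sketched below. It assumes each step concatenates 96 labeled and 16 pseudo-labeled samples; the weight value 0.5 and the normalization by the sum of weights are hypothetical, as the post does not state them.

```python
import numpy as np

BATCH_SIZE = 96        # labeled samples per step (from the post)
PL_BATCH_SIZE = 16     # pseudo-labeled samples per step (from the post)
PL_LOSS_WEIGHT = 0.5   # hypothetical; the actual weight is not given

def combined_loss(per_sample_loss, is_pseudo, pl_weight=PL_LOSS_WEIGHT):
    """Weighted mean of per-sample losses with pseudo-labeled rows down-weighted."""
    weights = np.where(is_pseudo, pl_weight, 1.0)
    return float((weights * per_sample_loss).sum() / weights.sum())

# Mask marking which rows of the concatenated batch are pseudo-labeled.
is_pseudo = np.concatenate([np.zeros(BATCH_SIZE, dtype=bool),
                            np.ones(PL_BATCH_SIZE, dtype=bool)])
```

In a training loop, `per_sample_loss` would be the unreduced loss of the 112-sample batch.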

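6. Ensemble and post-processing

Ensemble

5 × tf_efficientnetv2_b3 + 1 × tf_efficientnetv2_s

Public LB score: 0.920
Private LB score: 0.919

Post-processing

The post-processing is the same as in the 2024 6th place solution and improved the score by about 0.001.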
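
A minimal sketch of combining the six models' predictions, assuming a plain (optionally weighted) average of per-model probability arrays of identical shape. The post does not say how the models were blended, so equal weights are an assumption and `ensemble_mean` is an illustrative name.

```python
import numpy as np

def ensemble_mean(model_probs, weights=None):
    """Average a list of (rows, classes) prediction arrays across models."""
    stacked = np.stack(model_probs, axis=0)   # (models, rows, classes)
    # weights=None gives the unweighted mean over the model axis.
    return np.average(stacked, axis=0, weights=weights)
```

Usage: `final = ensemble_mean([p1, p2, p3, p4, p5, p6])`, where each `p` is one model's prediction array.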

import numpy as np

def smooth_array_general(array, w=(0.1, 0.2, 0.4, 0.2, 0.1)):
    """Smooth predictions of shape (timesteps, classes) along the time axis."""
    smoothed_array = np.zeros_like(array)
    timesteps = array.shape[0]
    radius = len(w) // 2

    # Weighted moving average over time; out-of-range neighbours reuse the edge row
    for t in range(timesteps):
        for i, weight in enumerate(w):
            index = t - radius + i
            if index < 0:
                smoothed_array[t] += array[0] * weight
            elif index >= timesteps:
                smoothed_array[t] += array[-1] * weight
            else:
                smoothed_array[t] += array[index] * weight
    # Blend each class with its mean over all timesteps (80% smoothed, 20% mean)
    for c in range(array.shape[1]):
        smoothed_array[:, c] = smoothed_array[:, c] * 0.8 + smoothed_array[:, c].mean(keepdims=True) * 0.2
    return smoothed_array
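
The same smoothing can be written without the explicit loops. The sketch below is an assumed equivalent, not the author's code: it pads the time axis by repeating the edge rows, sums the shifted slices with the kernel weights, and then blends in the per-class temporal mean. It presumes the rows are consecutive windows of a single soundscape.

```python
import numpy as np

def smooth_vectorized(array, w=(0.1, 0.2, 0.4, 0.2, 0.1), mean_weight=0.2):
    """Vectorized temporal smoothing for a (timesteps, classes) array."""
    array = np.asarray(array, dtype=float)
    timesteps = array.shape[0]
    radius = len(w) // 2
    # Repeat the first and last rows so the kernel is defined at the borders.
    padded = np.pad(array, ((radius, radius), (0, 0)), mode="edge")
    # Weighted sum of time-shifted copies implements the moving average.
    smoothed = sum(weight * padded[i:i + timesteps] for i, weight in enumerate(w))
    # Blend with each class's mean over time.
    return (1 - mean_weight) * smoothed + mean_weight * smoothed.mean(axis=0, keepdims=True)
```

For a single impulse in the middle of five timesteps, the moving average spreads it as the kernel itself, and the mean blend then lifts every timestep slightly.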

7. What did not work

CNN
rms sample
Removing all human voice
