6th Place Solution

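First, I would like to thank the organizers for hosting such an interesting competition, and the Kaggle team for supporting it so smoothly. Congratulations as well to all the top finishers.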

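My solution is based on last year's 2nd place solution by @honglihang, and I am very grateful to him. The main differences between his solution from last year and mine are pseudo-labeling and post-processing.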

CV Strategy

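Cross-validation (CV) was not much help to me, so I tuned the model by checking the public leaderboard (Public LB). To avoid overfitting to the public LB, I limited the number of submissions and gave up on tuning minor hyperparameters.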

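One of my final submissions was the model that scored best on the public LB. Using additional training data lowered the public LB score by about 0.02, but that was counterintuitive and I expected it to reverse on the private LB (Private LB), so I chose that model as my other submission.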

Pseudo Labeling

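The model.forward() method takes several 5-second audio chunks together. For pseudo-labeled data, the loss is computed per chunk without aggregation, whereas for competition data the chunks are aggregated with max before the loss is computed. Hard labeling with a threshold did not improve the score.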

Before ensembling, this improved the public LB by 0.02~0.03 and the private LB by 0.015~0.04.

""" model.forward() """
total_loss = 0
for i in range(bs):
    start = i * self.factor
    end = (i + 1) * self.factor

    if is_pseudo[i] == 0:
        # Competition data: max-aggregate logits, labels, and weights over the chunks
        this_logits = torch.max(logits[start:end], dim=0, keepdim=True).values
        this_y = torch.max(y[i], dim=0, keepdim=True).values
        this_weight = torch.max(weight[i], dim=0, keepdim=True).values

        loss = self.loss_function(this_logits, this_y)
        if self.loss == "ce":  # loss: (n_sample, )
            loss = (loss * this_weight) / weight.sum()
        elif self.loss == "bce":  # loss: (n_sample, n_class)
            loss = (loss.sum(dim=1) * this_weight) / weight.sum()
        else:
            raise NotImplementedError
        loss = loss.sum() * self.factor
        total_loss += loss
    else:
        # Pseudo-labeled data: keep per-chunk logits and labels (no aggregation)
        this_logits = logits[start:end]
        this_y = y[i]
        this_weight = weight[i]

        loss = self.loss_function(this_logits, this_y)
        if self.loss == "ce":  # loss: (n_sample, )
            loss = (loss * this_weight) / weight.sum()
        elif self.loss == "bce":  # loss: (n_sample, n_class)
            loss = (loss.sum(dim=1) * this_weight) / weight.sum()
        else:
            raise NotImplementedError
        loss = loss.sum() / self.factor
        total_loss += loss
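
To make the two branches easier to compare, here is a minimal pure-Python sketch of the same idea for a single clip. It is a simplification and not the code used in the solution: it assumes a BCE loss, drops the sample weights and the `self.factor` scaling, and the helper names `bce_with_logits` and `clip_loss` are hypothetical.

```python
import math


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy for one logit/target pair."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def clip_loss(chunk_logits, chunk_targets, is_pseudo):
    """Loss for one clip; inputs are nested lists of shape (n_chunks, n_classes)."""
    n_classes = len(chunk_logits[0])
    if not is_pseudo:
        # Competition data: max over chunks, then a single clip-level loss.
        logits = [max(c[k] for c in chunk_logits) for k in range(n_classes)]
        targets = [max(c[k] for c in chunk_targets) for k in range(n_classes)]
        return sum(bce_with_logits(l, t) for l, t in zip(logits, targets))
    # Pseudo-labeled data: every chunk keeps its own soft label (no aggregation);
    # the per-chunk losses are averaged here, which is an assumption of this sketch.
    per_chunk = [
        sum(bce_with_logits(l, t) for l, t in zip(cl, ct))
        for cl, ct in zip(chunk_logits, chunk_targets)
    ]
    return sum(per_chunk) / len(per_chunk)


# Two 5-second chunks, two classes.
logits = [[2.0, -1.0], [-3.0, 0.5]]
targets = [[1.0, 0.0], [0.0, 0.0]]
aggregated = clip_loss(logits, targets, is_pseudo=False)
per_chunk_mean = clip_loss(logits, targets, is_pseudo=True)
```

In the aggregated branch only the strongest chunk per class contributes, which suits clip-level labels; in the pseudo-label branch every chunk is supervised by its own soft target.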

Post Processing

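I compute a moving average over time with weights [0.1, 0.2, 0.4, 0.2, 0.1], then blend in each species' global mean with weight 0.2 (the smoothed values keep weight 0.8).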

This consistently improved both the public and private LB by 0.014~0.016 across my two final submissions.

import numpy as np

def smooth_array_general(array, w=(0.1, 0.2, 0.4, 0.2, 0.1)):
    smoothed_array = np.zeros_like(array)
    timesteps = array.shape[0]
    radius = len(w) // 2

    for t in range(timesteps):
        for i, weight in enumerate(w):
            index = t - radius + i
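            if index < 0:
                # Before the first timestep: reuse the first row (edge padding)
                smoothed_array[t] += array[0] * weight
            elif index >= timesteps:
                # Past the last timestep: reuse the last row (edge padding)
                smoothed_array[t] += array[-1] * weight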
            else:
                smoothed_array[t] += array[index] * weight
    # Blend each class's smoothed scores with that class's mean over all timesteps
    for c in range(array.shape[1]):
        smoothed_array[:, c] = smoothed_array[:, c] * 0.8 + smoothed_array[:, c].mean(keepdims=True) * 0.2
    return smoothed_array
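
As a small worked check of the smoothing described above, the sketch below applies the same weights and the same 0.8 / 0.2 blend to a single species' scores, using plain Python lists instead of NumPy. The function name `smooth_1d` and the `mean_weight` parameter are hypothetical and only serve this illustration.

```python
def smooth_1d(values, w=(0.1, 0.2, 0.4, 0.2, 0.1), mean_weight=0.2):
    """Weighted moving average with edge replication, then blend with the mean."""
    n = len(values)
    radius = len(w) // 2
    smoothed = []
    for t in range(n):
        acc = 0.0
        for i, weight in enumerate(w):
            # Clamp the index so positions outside the array reuse the edge value.
            idx = min(max(t - radius + i, 0), n - 1)
            acc += values[idx] * weight
        smoothed.append(acc)
    mean = sum(smoothed) / n
    return [(1.0 - mean_weight) * s + mean_weight * mean for s in smoothed]


# A single confident detection in the middle of five 5-second windows.
result = smooth_1d([0.0, 0.0, 1.0, 0.0, 0.0])
# result is approximately [0.12, 0.20, 0.36, 0.20, 0.12]
```

The isolated peak is spread over its neighbours, and the mean term lifts every window of a species that appears anywhere in the recording.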

Other Settings

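Resampling became a bottleneck, so resampling to 32 kHz during preprocessing and storing the result on disk helped speed up training.
RMS-based sampling (+0.010)
- RMS sampling > using the first 5 seconds > random sampling
Backbones
- resnet18d, resnet34d, and efficientnetv2s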
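
The post does not spell out how the RMS-based sampling works. One plausible reading, sketched below purely as an assumption, is to pick the start of the training crop with probability proportional to the RMS energy of each candidate window, so that louder segments are chosen more often. The function names, the `hop` parameter, and the fallback for silent clips are all hypothetical.

```python
import math
import random


def window_rms(samples, start, length):
    """Root-mean-square amplitude of samples[start:start + length]."""
    window = samples[start:start + length]
    return math.sqrt(sum(x * x for x in window) / len(window))


def sample_crop_start_by_rms(samples, crop_len, hop, rng=random):
    """Pick a crop start with probability proportional to the window's RMS.

    Assumes `samples` is a non-empty sequence of floats.
    """
    last_start = max(len(samples) - crop_len, 0)
    starts = list(range(0, last_start + 1, hop))
    weights = [window_rms(samples, s, crop_len) for s in starts]
    if sum(weights) == 0:
        # Completely silent clip: fall back to uniform random sampling.
        return rng.choice(starts)
    return rng.choices(starts, weights=weights, k=1)[0]


# Toy signal: silence followed by a loud segment.
signal = [0.0] * 100 + [1.0] * 100
start = sample_crop_start_by_rms(signal, crop_len=100, hop=100, rng=random.Random(0))
```

With real audio at 32 kHz, `crop_len` would be 5 * 32000 samples for a 5-second crop; the toy values here only keep the example small.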

Thanks for reading.
