29th Place Findings and Solution - Training Robust Neural Networks on Noisy Data

Author: shanzhong8 and team

Team members: @ajinomoto132, @chenzhenyuan, @atamazian, @larrylin666

Rank: 29th place (silver medal)

Published: 2025-09-28

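I would like to sincerely thank Kaggle and University College London for hosting this competition. I learned a lot, and it was an exciting and rewarding experience. I would also like to thank my teammates @ajinomoto132 @chenzhenyuan @atamazian @larrylin666. Together we earned a silver medal in this competition.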

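Summary

Preprocessing: denoising
Stage 1: transit time prediction and smoothing of noisy samples
Stage 2: transit modeling and an end-to-end neural network
Post-processing: simple refinements for the transit model, and ML-based sigma prediction for the neural network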

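Data Preparation

Starting from the official submission baseline, we made two changes: we used a binning factor of 5 to preserve richer temporal information, and we applied background noise removal to improve signal quality.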
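As a rough illustration of what a binning factor of 5 means, the sketch below averages consecutive groups of 5 frames along the time axis. The function name `bin_time_axis`, the `(n_time, n_wavelengths)` layout, and the choice of averaging rather than summing are assumptions made for this example and are not taken from our pipeline or the official baseline. Background removal is not shown.

```python
import numpy as np

def bin_time_axis(signal, factor=5):
    """Average consecutive groups of `factor` frames along the time axis.

    Hypothetical helper: `signal` is assumed to have shape
    (n_time, n_wavelengths). Trailing frames that do not fill a
    complete bin are dropped.
    """
    n_time = (signal.shape[0] // factor) * factor
    trimmed = signal[:n_time]
    # Group frames as (n_bins, factor, ...) and average within each bin
    return trimmed.reshape(-1, factor, *signal.shape[1:]).mean(axis=1)

# Toy input: 12 time frames, 3 wavelengths
frames = np.arange(12 * 3, dtype=float).reshape(12, 3)
binned = bin_time_axis(frames, factor=5)
print(binned.shape)  # (2, 3)
```

A smaller factor keeps more time steps per observation at the cost of noisier individual bins.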

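Data Augmentation

We augmented the dataset by pairing each signal with every available calibration file, rather than only its nominal pairing, to increase sample diversity and improve model robustness.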

signal_0 + calibration_0 → sample1 
signal_0 + calibration_1 → sample2 
signal_1 + calibration_0 → sample3 
signal_1 + calibration_1 → sample4
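
The pairing scheme above amounts to a Cartesian product of signals and calibrations. A minimal sketch, assuming the files are identified by simple string names (the helper `build_augmented_pairs` is hypothetical and only illustrates the enumeration, not how the files are loaded or combined):

```python
from itertools import product

def build_augmented_pairs(signal_ids, calibration_ids):
    """Return every (signal, calibration) combination as a list of tuples."""
    return list(product(signal_ids, calibration_ids))

pairs = build_augmented_pairs(
    ["signal_0", "signal_1"],
    ["calibration_0", "calibration_1"],
)
for i, (sig, cal) in enumerate(pairs, start=1):
    print(f"{sig} + {cal} -> sample{i}")
```

With S signals and C calibration files this yields S * C samples instead of the S nominal pairs.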

Smoothing and Normalization

import numpy as np

def smooth_data_lambda_batch(train_signal, win=3):
    """
    Smooth spectral data with Gaussian filter (batch version).
    
    Args:
        train_signal: numpy array of shape (batch_size, n_channels, n_wavelengths)
        win: window half-size (default=3)
    
    Returns:
        Smoothed signal of shape (batch_size, n_channels, 283):
        the first column followed by the 282 ROI columns (indices 40..321),
        in reversed order
    """
    batch_size, n_channels, n_wavelengths = train_signal.shape

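    def gaussian_kernel(half_width, sigma=1.0):
        # Odd-length kernel with 2*half_width + 1 taps, normalized to sum to 1
        x = np.arange(-half_width, half_width + 1)
        g = np.exp(-(x**2) / (2*sigma**2))
        return g / g.sum()
    # Kernel length must match the 2*win+1 window used in the loop below
    gauss_coefs = gaussian_kernel(win, sigma=1.0)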

    # Slice region of interest
    q = train_signal[:, :, 40-win:322+win]  # (B, C, slice_len)

    # Normalize each channel by its mean (per batch & channel)
    q = q / q.mean(axis=2, keepdims=True)

    # Reference spectrum: mean across batch & channels
    q_coef = q.mean(axis=(0,1))  # shape (slice_len,)

    # Copy ROI for smoothing
    t_smooth = train_signal[:, :, 40-win:322+win].copy()

    # Loop over wavelengths inside ROI
    for l in range(win, t_smooth.shape[2]-win):
        coefs = q_coef[l-win:l+win+1] / q_coef[l]              # (2*win+1,)
        window = train_signal[:, :, 40-win+l-win:40-win+l+win+1]  # (B, C, 2*win+1)
        
        # Weighted Gaussian smoothing
        t_smooth[:, :, l] = np.tensordot(window * coefs, gauss_coefs, axes=([2],[0]))

    # Trim edges and reverse order
    if win > 0:
        t_smooth = t_smooth[:, :, win:-win][:, :, ::-1]
    else:
        t_smooth = t_smooth[:, :, ::-1]

    # Concatenate first column (wavelength=0) back
    first_col = train_signal[:, :, 0:1]  # (B, C, 1)
    return np.concatenate([first_col, t_smooth], axis=2)

We then normalize the smoothed data.