
10th Place Solution

545. IceCube - Neutrinos in Deep Ice | icecube-neutrinos-in-deep-ice

Start: 2023-01-19 | End: 2023-04-19 | Physics & Astronomy | Data/Algorithm Competition


Author: CroDoc | Rank: 10th | Posted: 2023-04-20

The past few weeks were intense, but in the end I am very happy to have earned my first individual gold medal. It was a great competition overall! It was reassuring to see how stable the relationship between local validation and LB scores was (even though only 1.5% of the data was used as the validation set).

Training used a train-validation split in which batches 11-660 were used for training and batches 1-10 for validation. An ensemble of 8 models was formed using hill-climbing and Nelder-Mead optimization, with a submission time of roughly 3 to 3.5 hours. Including more models in the ensemble gave better scores; however, since I only submitted my best blend a few hours before the deadline, there was not enough time to run inference with a larger ensemble. The same model architectures were trained multiple times and blended.
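Blend weights like these can be tuned with SciPy's Nelder-Mead implementation. A minimal sketch, assuming the ensemble averages unit direction vectors and minimizes mean angular error on a held-out set (the synthetic predictions and the `blend_score` objective here are illustrative, not the solution's exact code):

```python
import numpy as np
from scipy.optimize import minimize

def angular_error(pred, true):
    """Mean angle (radians) between unit direction vectors."""
    cos = np.clip(np.sum(pred * true, axis=1), -1.0, 1.0)
    return np.arccos(cos).mean()

def blend_score(w, preds, true):
    """Angular error of the weighted, renormalized average of model outputs."""
    w = np.abs(w) / np.abs(w).sum()  # keep weights positive, summing to 1
    blend = sum(wi * p for wi, p in zip(w, preds))
    blend /= np.linalg.norm(blend, axis=1, keepdims=True)
    return angular_error(blend, true)

# Synthetic stand-ins for three models' validation predictions.
rng = np.random.default_rng(0)
true = rng.normal(size=(100, 3))
true /= np.linalg.norm(true, axis=1, keepdims=True)
preds = [true + 0.3 * rng.normal(size=true.shape) for _ in range(3)]
preds = [p / np.linalg.norm(p, axis=1, keepdims=True) for p in preds]

res = minimize(blend_score, x0=np.ones(3), args=(preds, true),
               method="Nelder-Mead")
weights = np.abs(res.x) / np.abs(res.x).sum()
```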

The ensemble's local validation score was 0.97747; the public and private LB scores were both 0.976.

The model input contains 9 features per pulse for each event: sensor_x, sensor_y, sensor_z, time, charge, auxiliary, is_main_sensor, is_deep_veto, and is_deep_core.

Four model architectures were selected for the final submission.

Model 1

Validation score 1: 1.0016
Validation score 2: 1.0005
Loss function: CrossEntropyLoss

Data preprocessing steps:

  • sensor_x, sensor_y, and sensor_z divided by 600
  • time divided by 1000, then shifted by subtracting its minimum
  • charge divided by 300
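The steps above can be sketched as follows; the per-event dict layout is an assumption for illustration:

```python
import numpy as np

def preprocess_model1(ev):
    """ev: dict of per-pulse NumPy arrays for one event (assumed layout)."""
    out = {}
    # Sensor coordinates scaled to roughly [-1, 1].
    for k in ("sensor_x", "sensor_y", "sensor_z"):
        out[k] = ev[k] / 600.0
    # Time scaled, then shifted so the first pulse is at 0.
    t = ev["time"] / 1000.0
    out["time"] = t - t.min()
    out["charge"] = ev["charge"] / 300.0
    return out
```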
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
import pytorch_lightning as pl

class Model1(pl.LightningModule):
    def __init__(self):
        super().__init__()

        self.bin_num = 31

        self.gru = nn.GRU(9, 192, num_layers=3, dropout=0.0, batch_first=True, bidirectional=True)
        self.fc1 = nn.Sequential(nn.Linear(384, 256), nn.ReLU())
        self.fc2 = nn.Linear(256, self.bin_num * self.bin_num)

    def forward(self, x, batch_sizes):

        batch_sizes = batch_sizes.cpu()
        x = pack_padded_sequence(x, batch_sizes, batch_first=True, enforce_sorted=False)
        x, _ = self.gru(x)
        x, _ = pad_packed_sequence(x, batch_first=True)

        x = x.sum(dim=1)
        x = x.div(batch_sizes.unsqueeze(-1).cuda())
        
        x = self.fc1(x)
        x = self.fc2(x)

        return x

The output bins were generated using code from @rsmits.
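One common binning scheme for this task (not necessarily @rsmits's exact code, so treat this as an assumption) uses uniform azimuth bins and cosine-uniform zenith bins, so each class covers roughly equal solid angle:

```python
import numpy as np

BIN_NUM = 31  # matches self.bin_num in Model1

# Uniform edges over azimuth [0, 2π); cosine-uniform edges over zenith [0, π].
azimuth_edges = np.linspace(0.0, 2.0 * np.pi, BIN_NUM + 1)
zenith_edges = np.arccos(np.linspace(1.0, -1.0, BIN_NUM + 1))

def to_class(azimuth, zenith):
    """Map (azimuth, zenith) to one of BIN_NUM * BIN_NUM class indices."""
    a = np.clip(np.digitize(azimuth, azimuth_edges) - 1, 0, BIN_NUM - 1)
    z = np.clip(np.digitize(zenith, zenith_edges) - 1, 0, BIN_NUM - 1)
    return a * BIN_NUM + z
```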

Models 2-4

Loss function: VonMisesFisher3DLoss

Data preprocessing steps:

  • sensor_x, sensor_y, and sensor_z divided by 500
  • time scaled by subtracting 1.0e4 and dividing by 3.0e4
  • charge transformed with a base-10 logarithm, then divided by 3.0
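As with Model 1, these steps can be sketched directly; the per-event dict layout is again an illustrative assumption:

```python
import numpy as np

def preprocess_model234(ev):
    """ev: dict of per-pulse NumPy arrays for one event (assumed layout)."""
    out = {}
    for k in ("sensor_x", "sensor_y", "sensor_z"):
        out[k] = ev[k] / 500.0
    out["time"] = (ev["time"] - 1.0e4) / 3.0e4
    # log10 compresses the heavy-tailed charge distribution.
    out["charge"] = np.log10(ev["charge"]) / 3.0
    return out
```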

Validation score 3: 0.9847
Validation score 4: 0.9859

class Model2(pl.LightningModule):
    def __init__(self):
        super().__init__()

        self.bilstm = nn.LSTM(9, 256, num_layers=3, dropout=0.2, batch_first=True, bidirectional=True)

        self.fc1 = nn.Sequential(nn.Linear(512, 256), nn.ReLU())
        self.dropout = nn.Dropout(0.2)
        self.fc2 = nn.Linear(256, 3)

    def forward(self, x, batch_sizes):

        batch_sizes = batch_sizes.cpu()
        x = pack_padded_sequence(x, batch_sizes, batch_first=True, enforce_sorted=False)
        x, _ = self.bilstm(x)
        x, _ = pad_packed_sequence(x, batch_first=True)

        x = x.sum(dim=1)
        x = x.div(batch_sizes.unsqueeze(-1).cuda())
        
        x = self.fc1(x)
        x = self.dropout(x)
        pred = self.fc2(x)

        kappa = pred.norm(dim=1, p=2) + 1e-8
        pred_x = pred[:, 0] / kappa
        pred_y = pred[:, 1] / kappa
        pred_z = pred[:, 2] / kappa
        pred = torch.stack([pred_x, pred_y, pred_z, kappa], dim=1)

        return pred

Validation score 5: 0.9872
Validation score 6: 0.9887

class Model3(pl.LightningModule):
    def __init__(self):
        super().__init__()

        self.embedding = nn.Linear(9, 512)
        self.bilstm = nn.LSTM(512, 256, num_layers=3, dropout=0.0, batch_first=True, bidirectional=True)

        self.fc1 = nn.Sequential(nn.Linear(512, 256), nn.ReLU())
        self.fc2 = nn.Linear(256, 3)

    def forward(self, x, batch_sizes):

        x = self.embedding(x)

        batch_sizes = batch_sizes.cpu()
        x = pack_padded_sequence(x, batch_sizes, batch_first=True, enforce_sorted=False)
        x, _ = self.bilstm(x)
        x, _ = pad_packed_sequence(x, batch_first=True)

        x = x.sum(dim=1)
        x = x.div(batch_sizes.unsqueeze(-1).cuda())
        
        x = self.fc1(x)
        pred = self.fc2(x)

        kappa = pred.norm(dim=1, p=2) + 1e-8
        pred_x = pred[:, 0] / kappa
        pred_y = pred[:, 1] / kappa
        pred_z = pred[:, 2] / kappa
        pred = torch.stack([pred_x, pred_y, pred_z, kappa], dim=1)

        return pred

Validation score 7: 0.9842
Validation score 8: 0.9841

class Model4(pl.LightningModule):
    def __init__(self):
        super().__init__()

        self.embedding = nn.Linear(9, 192)

        self.bilstm = nn.LSTM(192, 96, num_layers=3, dropout=0.0, batch_first=True, bidirectional=True)

        self.fc1 = nn.Sequential(nn.Linear(192, 256), nn.ReLU())  # 192 = 2 * 96 (bidirectional)
        self.fc2 = nn.Linear(256, 3)

    def forward(self, x, batch_sizes):

        batch_sizes = batch_sizes.cpu()

        x = self.embedding(x)

        x = pack_padded_sequence(x, batch_sizes, batch_first=True, enforce_sorted=False)
        x, _ = self.bilstm(x)
        x, _ = pad_packed_sequence(x, batch_first=True)

        x = x.sum(dim=1)
        x = x.div(batch_sizes.unsqueeze(-1).cuda())
        
        x = self.fc1(x)
        pred = self.fc2(x)

        kappa = pred.norm(dim=1, p=2) + 1e-8
        pred_x = pred[:, 0] / kappa
        pred_y = pred[:, 1] / kappa
        pred_z = pred[:, 2] / kappa
        pred = torch.stack([pred_x, pred_y, pred_z, kappa], dim=1)

        return pred
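Models 2-4 output a unit direction vector plus a concentration kappa. To compare against the competition's azimuth/zenith targets, the vector can be converted with the standard spherical-coordinate relations (a sketch; the solution's exact postprocessing is not shown in the post):

```python
import numpy as np

def vector_to_angles(x, y, z):
    """Unit direction (x, y, z) -> (azimuth in [0, 2π), zenith in [0, π])."""
    azimuth = np.arctan2(y, x) % (2.0 * np.pi)
    zenith = np.arccos(np.clip(z, -1.0, 1.0))
    return azimuth, zenith
```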

Hyperparameters:

  • Optimizer: Adam
  • Scheduler: CosineAnnealingLR
  • Batch size: 2048
  • Max pulses: 128
  • Max learning rate: 1e-3 to 5e-4
  • Min learning rate: 1e-6
  • Warmup steps: 2000
  • Epochs: 10-15 (possibly a few extra fine-tuning epochs)
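Wired together in plain PyTorch, the optimizer and scheduler portion of this setup might look like the following (the stand-in model and the warmup handling being omitted are assumptions; the post does not show the exact training loop):

```python
import torch

model = torch.nn.Linear(9, 3)  # stand-in for any of the models above
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)  # max LR
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=10, eta_min=1e-6)  # anneal to the minimum LR over 10 epochs

for epoch in range(10):
    # ... training steps with batch size 2048, sequences capped at 128 pulses ...
    scheduler.step()
```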

The deep learning library used for the competition was PyTorch (Lightning), which gave better results than TensorFlow. When blending multiple models with TensorFlow, the code produced unexpected errors that I could not debug; switching to PyTorch was much simpler.

I tried several different transformer architectures but did not have enough time to finish them. I also retrained some graphnet-like models (the code from @amoshuangyc was very helpful), but they did not improve the final ensemble.
