545. IceCube - Neutrinos in Deep Ice | icecube-neutrinos-in-deep-ice
The past few weeks have been very intense, but I'm thrilled to finally win my first solo gold medal. The competition overall was great! It was nice to see the stability between local validation and LB scores (even though only 1.5% of the data was used as the validation set).
Training used a train-validation split, with batches 11-660 for training and batches 1-10 for validation. An ensemble of 8 models was formed using hill climbing or Nelder-Mead optimization, with a submission time of roughly 3 to 3.5 hours. Including more models in the ensemble gave better scores; however, since I only submitted my best blend a few hours before the competition ended, there wasn't enough time to run inference with a larger ensemble. The same model architectures were trained multiple times and blended.
The ensemble scored 0.97747 on local validation and 0.976 on both the public and private LB.
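The hill-climbing weight search mentioned above can be sketched as follows. This is my own illustration, not the author's actual code: `hill_climb_weights` and `angular_error` are hypothetical helpers, and the real pipeline would score blends with the competition's angular-distance metric on the validation batches.

```python
import numpy as np

def angular_error(pred, true):
    """Mean angle (radians) between predicted and true unit vectors."""
    cos = np.clip((pred * true).sum(axis=1), -1.0, 1.0)
    return np.arccos(cos).mean()

def hill_climb_weights(preds, true, steps=200, step_size=0.05, seed=0):
    """Greedily perturb one ensemble weight at a time, keeping improvements.

    preds: list of (N, 3) arrays of unit direction vectors, one per model.
    true:  (N, 3) array of true unit direction vectors.
    """
    rng = np.random.default_rng(seed)
    w = np.ones(len(preds)) / len(preds)  # start from an equal-weight blend

    def score(weights):
        blend = sum(wi * p for wi, p in zip(weights, preds))
        blend /= np.linalg.norm(blend, axis=1, keepdims=True)  # renormalize to unit vectors
        return angular_error(blend, true)

    best = score(w)
    for _ in range(steps):
        i = rng.integers(len(w))
        cand = w.copy()
        cand[i] = max(0.0, cand[i] + rng.choice([-step_size, step_size]))
        if cand.sum() == 0:
            continue
        cand /= cand.sum()  # keep weights normalized
        s = score(cand)
        if s < best:        # keep only improving moves
            w, best = cand, s
    return w, best
```

Nelder-Mead would replace the random coordinate perturbation with simplex moves (e.g. via `scipy.optimize.minimize(method="Nelder-Mead")`), but the greedy version above is often good enough for a handful of weights.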
The model input contains the following 9 features per pulse in each event: sensor_x, sensor_y, sensor_z, time, charge, auxiliary, is_main_sensor, is_deep_veto, and is_deep_core.
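Each event is therefore a variable-length sequence of 9-dimensional pulse vectors. A minimal sketch of padding such a batch for the RNNs below (the random feature values are illustrative only, not the actual preprocessing):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# two toy events with 3 and 2 pulses; each pulse is a 9-dim feature vector:
# (sensor_x, sensor_y, sensor_z, time, charge, auxiliary,
#  is_main_sensor, is_deep_veto, is_deep_core)
events = [torch.randn(3, 9), torch.randn(2, 9)]
lengths = torch.tensor([e.shape[0] for e in events])  # true pulse count per event

# pad to a dense (batch, max_len, 9) tensor; the lengths later feed
# pack_padded_sequence so the RNN skips the zero padding
batch = pad_sequence(events, batch_first=True)
```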
Four model architectures were chosen for the final submission.
Validation score 1: 1.0016
Validation score 2: 1.0005
Loss function: CrossEntropyLoss
Data preprocessing steps:
    class Model1(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.bin_num = 31
            self.gru = nn.GRU(9, 192, num_layers=3, dropout=0.0, batch_first=True, bidirectional=True)
            self.fc1 = nn.Sequential(nn.Linear(384, 256), nn.ReLU())
            self.fc2 = nn.Linear(256, self.bin_num * self.bin_num)

        def forward(self, x, batch_sizes):
            batch_sizes = batch_sizes.cpu()  # pack_padded_sequence expects lengths on CPU
            x = pack_padded_sequence(x, batch_sizes, batch_first=True, enforce_sorted=False)
            x, _ = self.gru(x)
            x, _ = pad_packed_sequence(x, batch_first=True)
            x = x.sum(dim=1)
            x = x.div(batch_sizes.unsqueeze(-1).to(x.device))  # mean-pool over true lengths
            x = self.fc1(x)
            x = self.fc2(x)
            return x
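Model1 casts direction prediction as classification over a 31×31 azimuth-zenith grid, which is why it pairs `CrossEntropyLoss` with a `bin_num * bin_num` output. A sketch of the binning, under the assumption of uniform bins over [0, 2π) × [0, π] (the actual bin edges used are not stated in the writeup):

```python
import torch

BIN_NUM = 31  # matches self.bin_num in Model1

def angles_to_class(azimuth, zenith):
    """Map (azimuth, zenith) in radians to a flat class index in [0, BIN_NUM**2)."""
    az_bin = torch.clamp((azimuth / (2 * torch.pi) * BIN_NUM).long(), 0, BIN_NUM - 1)
    ze_bin = torch.clamp((zenith / torch.pi * BIN_NUM).long(), 0, BIN_NUM - 1)
    return az_bin * BIN_NUM + ze_bin

def class_to_angles(idx):
    """Invert a class index back to the bin-center angles."""
    az = (idx // BIN_NUM + 0.5) * 2 * torch.pi / BIN_NUM
    ze = (idx % BIN_NUM + 0.5) * torch.pi / BIN_NUM
    return az, ze
```

At inference, taking the argmax (or a probability-weighted average) over the 961 logits and mapping back through `class_to_angles` recovers a direction estimate, at the cost of quantization error bounded by half a bin width.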
Loss function: VonMisesFisher3DLoss
Data preprocessing steps:
Validation score 3: 0.9847
Validation score 4: 0.9859
    class Model2(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.bilstm = nn.LSTM(9, 256, num_layers=3, dropout=0.2, batch_first=True, bidirectional=True)
            self.fc1 = nn.Sequential(nn.Linear(512, 256), nn.ReLU())
            self.dropout = nn.Dropout(0.2)
            self.fc2 = nn.Linear(256, 3)

        def forward(self, x, batch_sizes):
            batch_sizes = batch_sizes.cpu()  # pack_padded_sequence expects lengths on CPU
            x = pack_padded_sequence(x, batch_sizes, batch_first=True, enforce_sorted=False)
            x, _ = self.bilstm(x)
            x, _ = pad_packed_sequence(x, batch_first=True)
            x = x.sum(dim=1)
            x = x.div(batch_sizes.unsqueeze(-1).to(x.device))  # mean-pool over true lengths
            x = self.fc1(x)
            x = self.dropout(x)
            pred = self.fc2(x)
            # split the 3-vector into a unit direction and a concentration kappa
            kappa = pred.norm(dim=1, p=2) + 1e-8
            pred_x = pred[:, 0] / kappa
            pred_y = pred[:, 1] / kappa
            pred_z = pred[:, 2] / kappa
            pred = torch.stack([pred_x, pred_y, pred_z, kappa], dim=1)
            return pred
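Models 2-4 regress a unit direction plus a concentration κ and are trained with `VonMisesFisher3DLoss`. A sketch of the 3D von Mises-Fisher negative log-likelihood such models optimize; this is my own reconstruction from the standard vMF density (graphnet's implementation may differ in constants and stabilization details):

```python
import math
import torch

def vmf3d_nll(pred, target):
    """Negative log-likelihood of the 3D von Mises-Fisher distribution.

    pred:   (B, 4) tensor of (x, y, z, kappa), with (x, y, z) a unit vector.
    target: (B, 3) tensor of true unit direction vectors.
    """
    kappa = pred[:, 3]
    cos_angle = (pred[:, :3] * target).sum(dim=1)  # cosine of angular error
    # log normalizer: log C3(k) = log k - log(4*pi) - log sinh k, with
    # log sinh k = k + log(1 - exp(-2k)) - log 2 for numerical stability
    log_sinh = kappa + torch.log1p(-torch.exp(-2 * kappa)) - math.log(2.0)
    log_c3 = torch.log(kappa) - math.log(4 * math.pi) - log_sinh
    # log density at the target is log C3(k) + k * cos(angle)
    return (-(log_c3 + kappa * cos_angle)).mean()
```

The appeal of this loss is that κ is learned per event, so the model is rewarded for being confident (large κ) only when its direction is accurate, giving a built-in uncertainty estimate.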
Validation score 5: 0.9872
Validation score 6: 0.9887
    class Model3(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.embedding = nn.Linear(9, 512)
            self.bilstm = nn.LSTM(512, 256, num_layers=3, dropout=0.0, batch_first=True, bidirectional=True)
            self.fc1 = nn.Sequential(nn.Linear(512, 256), nn.ReLU())
            self.fc2 = nn.Linear(256, 3)

        def forward(self, x, batch_sizes):
            x = self.embedding(x)
            batch_sizes = batch_sizes.cpu()  # pack_padded_sequence expects lengths on CPU
            x = pack_padded_sequence(x, batch_sizes, batch_first=True, enforce_sorted=False)
            x, _ = self.bilstm(x)
            x, _ = pad_packed_sequence(x, batch_first=True)
            x = x.sum(dim=1)
            x = x.div(batch_sizes.unsqueeze(-1).to(x.device))  # mean-pool over true lengths
            x = self.fc1(x)
            pred = self.fc2(x)
            # split the 3-vector into a unit direction and a concentration kappa
            kappa = pred.norm(dim=1, p=2) + 1e-8
            pred_x = pred[:, 0] / kappa
            pred_y = pred[:, 1] / kappa
            pred_z = pred[:, 2] / kappa
            pred = torch.stack([pred_x, pred_y, pred_z, kappa], dim=1)
            return pred
Validation score 7: 0.9842
Validation score 8: 0.9841
    class Model4(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.embedding = nn.Linear(9, 192)
            self.bilstm = nn.LSTM(192, 96, num_layers=3, dropout=0.0, batch_first=True, bidirectional=True)
            self.fc1 = nn.Sequential(nn.Linear(192, 256), nn.ReLU())  # 2 * 96 (bidirectional)
            self.fc2 = nn.Linear(256, 3)

        def forward(self, x, batch_sizes):
            batch_sizes = batch_sizes.cpu()  # pack_padded_sequence expects lengths on CPU
            x = self.embedding(x)
            x = pack_padded_sequence(x, batch_sizes, batch_first=True, enforce_sorted=False)
            x, _ = self.bilstm(x)
            x, _ = pad_packed_sequence(x, batch_first=True)
            x = x.sum(dim=1)
            x = x.div(batch_sizes.unsqueeze(-1).to(x.device))  # mean-pool over true lengths
            x = self.fc1(x)
            pred = self.fc2(x)
            # split the 3-vector into a unit direction and a concentration kappa
            kappa = pred.norm(dim=1, p=2) + 1e-8
            pred_x = pred[:, 0] / kappa
            pred_y = pred[:, 1] / kappa
            pred_z = pred[:, 2] / kappa
            pred = torch.stack([pred_x, pred_y, pred_z, kappa], dim=1)
            return pred
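All four models share the same pooling pattern: pack the padded batch, run the RNN, unpad, then divide the summed outputs by the true lengths so padding does not dilute the mean. That pattern in isolation, with toy dimensions:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

gru = nn.GRU(9, 16, batch_first=True, bidirectional=True)
x = torch.randn(2, 5, 9)        # batch of 2 events, padded to 5 pulses each
lengths = torch.tensor([5, 3])  # true pulse counts per event

# pack so the GRU never processes the padding timesteps
packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
out, _ = gru(packed)
out, _ = pad_packed_sequence(out, batch_first=True)  # (2, 5, 32); zeros past each length
# summing then dividing by the true length is a mean over real timesteps only
pooled = out.sum(dim=1) / lengths.unsqueeze(-1)
```

Because `pad_packed_sequence` zero-fills positions beyond each sequence's length, the sum-then-divide is an exact mean over the real pulses, which is why the models divide by `batch_sizes` rather than by the padded length.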
The deep learning library used in the competition was PyTorch (Lightning), which gave better results than TensorFlow for me. When blending multiple models in TensorFlow, the code produced unexpected errors I couldn't debug; switching to PyTorch was much simpler.
I tried several different transformer architectures but didn't have enough time to finish them. I also retrained some graphnet-style models (the code from @amoshuangyc was very helpful), but they didn't improve the final ensemble.