22nd place solution

Author: Zichen Wang
Competition rank: 22nd place

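First of all, many thanks to everyone who contributed to this competition. I learned a lot from it, thank you all! My solution does not contain much innovation, but I would like to share a few tricks that improved performance in practice.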

Preprocessing

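DICOM to PNG
I mainly referred to the following resources:
3hr tensorRT NextVIT example
Use nvJPEG2000 to load images easily (5x faster)
How to process DICOM images to PNG
Region of interest (ROI)
The ROI is extracted with OpenCV's findContours(), following the notebook on ROI extraction with OpenCV.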

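Data augmentation

Augmentations: horizontal flip, vertical flip, translation, and scaling.
The aspect ratio is kept unchanged, and the image is resized (downscaled) only once to avoid distortion or blur. (The images before the transform are PNG files at the original resolution.)
Mixup (with the maximum of the two labels as the target).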
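
To make the mixup variant concrete: standard mixup would use lam * y_a + (1 - lam) * y_b as the target, whereas here the larger of the two labels is used, so a blend containing any positive image stays positive. The sketch below works on toy flattened images stored as plain lists. The Beta parameter 0.4 and the helper name mixup_max are assumptions for illustration only.

```python
import random

def mixup_max(img_a, img_b, label_a, label_b, lam):
    """Blend two images pixel-wise and take the larger label as the target."""
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(img_a, img_b)]
    target = max(label_a, label_b)
    return mixed, target

rng = random.Random(0)
lam = rng.betavariate(0.4, 0.4)  # Beta(alpha, alpha); alpha=0.4 is an assumed value
img_pos = [0.0, 0.5, 1.0, 0.5]   # toy flattened "image" with label 1
img_neg = [1.0, 1.0, 0.0, 0.0]   # toy flattened "image" with label 0
mixed, target = mixup_max(img_pos, img_neg, 1.0, 0.0, lam)
```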

Training settings

Data
Resolution: 1536x960
Batch size: 8
Positive images upsampled x6 (a fairly high dropout rate is used because of the upsampling)
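
One simple way to realise the x6 upsampling is to repeat every positive entry in the training list before shuffling. This is only a sketch: it assumes "x6" means six copies in total per positive image, and the (image_id, label) tuples and the helper name upsample_positives are hypothetical.

```python
import random

def upsample_positives(samples, factor=6):
    """Repeat each positive sample `factor` times; negatives appear once."""
    out = []
    for image_id, label in samples:
        repeats = factor if label == 1 else 1
        out.extend([(image_id, label)] * repeats)
    return out

samples = [("img_0", 0), ("img_1", 1), ("img_2", 0), ("img_3", 0)]
epoch_list = upsample_positives(samples, factor=6)
random.Random(0).shuffle(epoch_list)  # spread the copies over different batches
```

Because the same positive image is seen several times per epoch, stronger regularisation such as the high dropout mentioned above helps against memorising those copies.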

Model

from timm.models import efficientnet
backbone = efficientnet.tf_efficientnetv2_b2(drop_rate=0.4, drop_path_rate=0.4)

Optimizer

from torch.optim import AdamW
optimizer = AdamW(param_group, lr=1e-4, betas=(0.9, 0.935), weight_decay=1e-2)

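Loss function
BCEWithLogitsLoss()

Soft targets (or self-distillation?)
This may help the model pay less attention to noise in the dataset, i.e. the hard positive and hard negative samples.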

logits = ddp_model(images)
with torch.no_grad():
    # blend the hard labels with the model's own (detached) predictions
    lam = 0.7
    targets = lam * targets + (1 - lam) * logits.sigmoid()
loss = loss_func(logits, targets)
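
A small worked example for a single sample may help show what the blended target does. It uses plain Python instead of PyTorch, with the per-element binary cross-entropy formula on a raw logit. The logit value -2.0 is an arbitrary illustration, and the helper names are made up for this sketch.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_target(target, logit, lam=0.7):
    """Blend the hard label with the current prediction (treated as a constant)."""
    return lam * target + (1.0 - lam) * sigmoid(logit)

def bce_with_logits(logit, target):
    """Binary cross-entropy on a raw logit for one element."""
    p = sigmoid(logit)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

logit, hard = -2.0, 1.0            # a positive sample the model currently gets wrong
soft = soft_target(hard, logit)    # pulled from 1.0 towards the prediction
loss_hard = bce_with_logits(logit, hard)
loss_soft = bce_with_logits(logit, soft)
# Gradient of BCE w.r.t. the logit is sigmoid(logit) - target.
grad_hard = sigmoid(logit) - hard
grad_soft = sigmoid(logit) - soft  # equals lam * grad_hard when the target is detached
```

For a confidently misclassified sample the blended target lies closer to the prediction, so both the loss and the gradient are smaller than with the hard label.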

SWA (Stochastic Weight Averaging)
SWA is done as described in the blog post Stochastic Weight Averaging in PyTorch.

from torchcontrib.optim import SWA
opt = SWA(optimizer, swa_start=3000, swa_freq=100, swa_lr=None)
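
To illustrate what swa_start and swa_freq control, here is a library-free sketch of the running average: after swa_start steps, the weights are folded into a mean every swa_freq steps. The exact step condition reflects my understanding of torchcontrib's automatic mode and should be read as an assumption. The toy one-dimensional "weights" are made up for the example.

```python
def swa_average(weight_history, swa_start=3000, swa_freq=100):
    """Running mean of the weights at steps > swa_start that are multiples of swa_freq."""
    avg, n = None, 0
    for step, weights in weight_history:
        if step > swa_start and step % swa_freq == 0:
            n += 1
            if avg is None:
                avg = list(weights)
            else:
                # incremental mean: avg += (w - avg) / n
                avg = [a + (w - a) / n for a, w in zip(avg, weights)]
    return avg, n

# Toy trajectory: a single scalar "weight" equal to step / 1000 (illustrative only).
history = [(step, [step / 1000.0]) for step in range(1, 3501)]
avg, n = swa_average(history)
```

With torchcontrib, the averaged weights are copied into the model by calling opt.swap_swa_sgd() once training has finished.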

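Submission

The final submission is an ensemble of 8 tf_efficientnetv2_b2 models trained under different settings, e.g. with or without mixup, at 1536x960 or 1024x640 resolution, with or without focal loss. The pF1 curve on the LB (leaderboard) is quite stable and has a wide flat region around its maximum.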
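
The sketch below shows one way such an ensemble could be scored: average the per-model probabilities, binarize at a threshold, and compute the probabilistic F1 (pF1) used in the competition. Averaging with equal weights, the threshold sweep, and all numbers are assumptions for illustration. The writeup does not state how the 8 models were combined.

```python
def pf1(labels, preds):
    """Probabilistic F1: precision and recall computed from summed probabilities."""
    ptp = sum(p for y, p in zip(labels, preds) if y == 1)
    pfp = sum(p for y, p in zip(labels, preds) if y == 0)
    n_pos = sum(labels)
    if ptp == 0 or n_pos == 0:
        return 0.0
    precision = ptp / (ptp + pfp)
    recall = ptp / n_pos
    return 2 * precision * recall / (precision + recall)

def ensemble_mean(per_model_preds):
    """Average the probabilities of several models sample by sample."""
    n_models = len(per_model_preds)
    return [sum(col) / n_models for col in zip(*per_model_preds)]

labels = [1, 0, 0, 1, 0]
per_model_preds = [
    [0.9, 0.2, 0.1, 0.6, 0.4],  # toy outputs of model 1
    [0.7, 0.4, 0.1, 0.8, 0.2],  # toy outputs of model 2
]
mean_preds = ensemble_mean(per_model_preds)
# Sweep the binarization threshold to see how flat the score is around its maximum.
scores = {t: pf1(labels, [1.0 if p > t else 0.0 for p in mean_preds])
          for t in (0.2, 0.35, 0.5, 0.65)}
```

A wide plateau in such a sweep means the final score is not very sensitive to the exact threshold, which makes the choice safer on unseen data.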

Data augmentation code

import torch
from torchvision import transforms
from torchvision.transforms import functional as F
from torchvision.transforms.functional import InterpolationMode

class ScaleTransform(torch.nn.Module):
    def __init__(self, size=1024, scale=(0.7, 1.3), transl=0.1, train=True):
        super().__init__()
        self.size = size
        self.scale = scale
        self.transl = transl
        self.train = train
        self.interpolation = InterpolationMode.BILINEAR

# ScaleTransform has to be defined before it is used in the pipeline
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    ScaleTransform(size=1536, scale=(0.85, 1.13), transl=0.1),
])
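
The forward pass of ScaleTransform is not shown here. As a rough illustration of the "resize only once" idea, the sketch below samples a single isotropic scale factor and a shift, so that fitting to the output size, random zoom, and translation can all be applied in one resampling pass from the original-resolution image. This is not the actual implementation: the function names, the meaning given to scale and transl, the 1536x960 output shape, and the source size are all assumptions.

```python
import random

def sample_resize_params(src_hw, out_hw, scale=(0.85, 1.13), transl=0.1, rng=random):
    """Return one isotropic scale factor and a pixel shift for a single resampling pass."""
    src_h, src_w = src_hw
    out_h, out_w = out_hw
    # Largest factor that fits the source inside the output without changing aspect ratio.
    fit = min(out_h / src_h, out_w / src_w)
    factor = fit * rng.uniform(*scale)               # random zoom around the fitted size
    shift_x = rng.uniform(-transl, transl) * out_w   # shift as a fraction of output width
    shift_y = rng.uniform(-transl, transl) * out_h   # shift as a fraction of output height
    return factor, shift_x, shift_y

def output_to_source(x_out, y_out, src_hw, out_hw, factor, shift_x, shift_y):
    """Map an output pixel back to source coordinates (centres aligned, then shifted)."""
    src_h, src_w = src_hw
    out_h, out_w = out_hw
    x_src = (x_out - out_w / 2 - shift_x) / factor + src_w / 2
    y_src = (y_out - out_h / 2 - shift_y) / factor + src_h / 2
    return x_src, y_src

rng = random.Random(0)
src_hw, out_hw = (4096, 3328), (1536, 960)  # source size is an illustrative value
factor, sx, sy = sample_resize_params(src_hw, out_hw, rng=rng)
```

Since the same factor is used for both axes, the aspect ratio is preserved, and because every output pixel is sampled directly from the source image, no intermediate resize is needed.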
