TL;DR

MaxVit tiny model
Inference along three axes: xy, xz, yz
Strong regularization
Probability threshold

Solution

Model

MaxVit tiny

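from typing import List, Optional

import timm
import torch.nn as nn
import torch.nn.functional as F

# Conv2dReLU, UnetDecoder and SegmentationHead are not defined in this excerpt.

class SenUNetStem(nn.Module):
    def __init__(self, encoder_name="resnest26d", output_stride=32,
                 encoder_depth=5, in_chans=1,
                 decoder_use_batchnorm: bool = True,
                 decoder_channels: List[int] = (256, 128, 64, 32, 16),
                 decoder_attention_type: Optional[str] = None, classes=1, activation=None):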
        super(SenUNetStem, self).__init__()
        kwargs = dict(
            in_chans=in_chans,
            features_only=True,
            # output_stride=output_stride,
            pretrained=True,
            out_indices=tuple(range(encoder_depth)),
        )
        self.conv_stem = Conv2dReLU(in_chans, 16, 3, use_layernorm=False, padding=1)
        self.encoder = timm.create_model(encoder_name, **kwargs)
        self._out_channels = [
            32,
        ] + self.encoder.feature_info.channels()

        self.decoder = UnetDecoder(
            encoder_channels=self._out_channels,
            decoder_channels=decoder_channels,
            n_blocks=encoder_depth,
            use_batchnorm=decoder_use_batchnorm,
            center=True if encoder_name.startswith("vgg") else False,
            attention_type=decoder_attention_type,
        )

        self.segmentation_head = SegmentationHead(
            in_channels=decoder_channels[-1] + 16,
            out_channels=classes,
            activation=activation,
            kernel_size=3,
        )

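    def forward(self, x):
        B, C, H, W = x.shape
        # Trim H and W down to the largest multiples of 32.
        h = (H // 32) * 32
        w = (W // 32) * 32
        x = x[:, :, :h, :w]
        stem = self.conv_stem(x)  # full-resolution stem features
        features = self.encoder(x)
        features = [
            stem,
        ] + features

        decoder_output = self.decoder(*features)
        masks = self.segmentation_head(decoder_output)
        # Zero-pad the right and bottom edges back to the input size.
        masks = F.pad(masks, [0, W - w, 0, H - h, 0, 0, 0, 0], mode='constant', value=0)

        # Return channel 0 as a (B, H, W) tensor.
        return masks[:, 0]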

SenUNetStem(
    encoder_name="maxvit_tiny_tf_512.in1k",
    classes=1,
    activation=None,
)
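
The forward pass trims the input to the largest height and width divisible by 32 and zero-pads the predicted mask back to the input size afterwards. Below is a minimal NumPy sketch of just that shape bookkeeping. The helper names `crop_to_multiple` and `pad_back` are hypothetical and do not appear in the original code; no network is involved.

```python
import numpy as np

def crop_to_multiple(x, multiple=32):
    """Trim the last two axes down to the largest size divisible by `multiple`."""
    H, W = x.shape[-2:]
    h, w = (H // multiple) * multiple, (W // multiple) * multiple
    return x[..., :h, :w]

def pad_back(mask, H, W):
    """Zero-pad the bottom and right edges so the last two axes are H x W again."""
    h, w = mask.shape[-2:]
    pad = [(0, 0)] * (mask.ndim - 2) + [(0, H - h), (0, W - w)]
    return np.pad(mask, pad, mode="constant", constant_values=0)

x = np.ones((2, 1, 70, 100), dtype=np.float32)  # (B, C, H, W)
cropped = crop_to_multiple(x)                    # shape (2, 1, 64, 96)
restored = pad_back(cropped, 70, 100)            # shape (2, 1, 70, 100)
```

With the 512-size crops used in this solution nothing is trimmed, since 512 is already a multiple of 32; the trimming only matters for inputs of other sizes.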

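Dataset

kidney1 and kidney3 datasets

Training tricks

I focused on regularization because CV scores in this competition were unstable and the public leaderboard correlated only weakly with the final results. The organizers also shared information about the images behind the public and private leaderboards, which I took as a sign of that instability.

EMA (exponential moving average)
50 epochs
AdamW optimizer (weight decay 1e-2)
CutMix (first 25 epochs)
MixUp (first 25 epochs)
DiceLoss (smoothing factor 0.1)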
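
Two of the items above, the smoothed Dice loss and the EMA of weights, can be sketched in a framework-agnostic way with NumPy. This is only an illustration under assumptions: the write-up does not say where the smoothing term enters the Dice formula (here it is added to both numerator and denominator), and the EMA decay is not given, so `decay=0.999` is a placeholder. The function names are hypothetical.

```python
import numpy as np

def soft_dice_loss(logits, target, smooth=0.1):
    """1 - Dice between sigmoid(logits) and a binary (or mixed) target."""
    prob = 1.0 / (1.0 + np.exp(-logits))
    intersection = (prob * target).sum()
    denom = prob.sum() + target.sum()
    # Assumption: smoothing is added to numerator and denominator alike.
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)

def ema_update(ema_weights, weights, decay=0.999):
    """In-place update: ema = decay * ema + (1 - decay) * current."""
    for name, w in weights.items():
        ema_weights[name] *= decay
        ema_weights[name] += (1.0 - decay) * w
    return ema_weights

target = np.array([[0.0, 1.0], [1.0, 1.0]])
confident = np.where(target > 0, 20.0, -20.0)   # logits that match the target
loss_good = soft_dice_loss(confident, target)   # close to 0
loss_bad = soft_dice_loss(-confident, target)   # close to 1

ema = {"w": np.zeros(3)}
ema_update(ema, {"w": np.ones(3)}, decay=0.9)   # ema["w"] moves 10% toward 1
```

In training, the EMA copy would be updated after each optimizer step and used for validation and inference instead of the raw weights.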

Strong data augmentation

import albumentations as A
from albumentations.pytorch import ToTensorV2

train_aug = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomBrightness(limit=0.1, p=0.7),
    # Apply one noise or blur transform 40% of the time.
    A.OneOf([
        A.GaussNoise(var_limit=[10, 50]),
        A.GaussianBlur(),
        A.MotionBlur(),
        A.MedianBlur(blur_limit=3),
    ], p=0.4),
    # Apply one geometric distortion 20% of the time.
    A.OneOf([
        A.GridDistortion(num_steps=5, distort_limit=0.3, p=1.0),
        A.OpticalDistortion(distort_limit=1., p=1.0),
    ], p=0.2),
    A.ShiftScaleRotate(p=0.7, scale_limit=0.5, shift_limit=0.2, rotate_limit=30),
    A.CoarseDropout(max_holes=1, max_height=0.25, max_width=0.25),
    ToTensorV2(transpose_mask=True),
])

512-size crops

Inference

Inference runs along the xy, xz, and yz axes, using 512-size crops with a stride of 256.
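
A minimal NumPy sketch of three-axis sliding-window inference follows. It rests on assumptions not stated in the write-up: overlapping crops are averaged, the three per-axis volumes are averaged with equal weight, and `predict_fn` is a hypothetical stand-in for the model that maps a 2D tile to per-pixel probabilities of the same shape.

```python
import numpy as np

def window_starts(size, crop, stride):
    """Start offsets covering [0, size); the last window is shifted to fit."""
    if size <= crop:
        return [0]
    starts = list(range(0, size - crop + 1, stride))
    if starts[-1] != size - crop:
        starts.append(size - crop)
    return starts

def predict_slice(img, predict_fn, crop=512, stride=256):
    """Average overlapping crop predictions over one 2D slice."""
    H, W = img.shape
    prob = np.zeros((H, W), dtype=np.float32)
    count = np.zeros((H, W), dtype=np.float32)
    for y in window_starts(H, crop, stride):
        for x in window_starts(W, crop, stride):
            tile = img[y:y + crop, x:x + crop]
            prob[y:y + crop, x:x + crop] += predict_fn(tile)
            count[y:y + crop, x:x + crop] += 1
    return prob / count

def predict_volume(vol, predict_fn, crop=512, stride=256):
    """Slice a (z, y, x) volume along each axis and average the three results."""
    total = np.zeros(vol.shape, dtype=np.float32)
    for axis in range(3):
        moved = np.moveaxis(vol, axis, 0)    # slices along `axis` come first
        pred = np.stack([predict_slice(s, predict_fn, crop, stride) for s in moved])
        total += np.moveaxis(pred, 0, axis)  # back to (z, y, x) order
    return total / 3.0

# Tiny check with an identity "model": the output reproduces the input.
vol = np.random.default_rng(0).random((6, 7, 8)).astype(np.float32)
out = predict_volume(vol, lambda tile: tile, crop=4, stride=2)
```

Slicing along axis 0 yields xy planes, axis 1 yields xz planes, and axis 2 yields yz planes, so every voxel receives a prediction from each of the three views.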

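Post-processing

I used a probability threshold on the sigmoid output. I did not use the percentile method that appears in many public notebooks and in past segmentation competitions (such as Volcano), because percentile thresholds were unstable in my local CV.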
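
The difference between the two thresholding styles can be illustrated with a small NumPy sketch. The actual threshold value is not given in the write-up, so `0.5` and the 99th percentile are placeholders, and the synthetic volume only shows one way the two rules can diverge; it is not a reconstruction of the author's CV experiment.

```python
import numpy as np

def sigmoid(logits):
    return 1.0 / (1.0 + np.exp(-logits))

def threshold_by_probability(logits, threshold=0.5):
    """Fixed cut on the sigmoid output; the mask size follows the prediction."""
    return sigmoid(logits) > threshold

def threshold_by_percentile(logits, percentile=99.0):
    """Keep the top (100 - percentile)% of voxels, whatever their confidence."""
    cut = np.percentile(logits, percentile)
    return logits > cut

# A synthetic volume with no foreground: every logit is strongly negative.
rng = np.random.default_rng(0)
empty = rng.normal(loc=-6.0, scale=0.5, size=(8, 64, 64))
n_prob = int(threshold_by_probability(empty).sum())  # no voxel passes
n_pct = int(threshold_by_percentile(empty).sum())    # about 1% of voxels pass
```

A percentile cut always marks a fixed fraction of voxels as foreground, so its output depends on how much true foreground a volume contains, whereas a probability cut depends only on the model's confidence at each voxel.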

What did not work

Larger models (maxvit base, small)
Large-size inference (1024); 512 was sufficient for this competition
Rotate90
Pretraining on other volume data (kidney_2/kidney_1_volumes)

Full code notebook: https://www.kaggle.com/code/tereka/simpleunet-xy-xz-yz-v2-nbp-b749ff/notebook

9th Place Solution
