10th place solution

10th Place Solution - Vesuvius Challenge Ink Detection

Author: Feng Qilong (Kaggle Master)
Competition rank: 10th
Published: June 15, 2023

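Summary

Pretrained the 3D encoder with the ink-id InkClassifier3DCNN, using a sample size of 64 and a stride of 32 (improved CV by 0.03-0.06)
3D encoder → pooling along the z axis → 2D FPN decoder
A simple custom attention pooling mechanism
Data sampling strategy: 1 ink region : 8 no-ink regions
Denoiser module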

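Data Preparation

5-fold cross-validation; the sample with ink_id=2 is split evenly into 3 regions along its height
Resolution 224, stride 56, slices 20-36
Ink and no-ink regions are sampled at a 1:8 ratio to generate more data, which made training more stable than sliding-window sampling
Data augmentation: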

[
        A.HorizontalFlip(p=0.5),
        A.VerticalFlip(p=0.5),
        A.RandomRotate90(p=0.5),
        A.Affine(rotate=0, translate_percent=0.1, scale=[0.9,1.5], shear=0, p=0.5),
        A.OneOf([
            A.RandomToneCurve(scale=0.3, p=0.2),
            A.RandomBrightnessContrast(brightness_limit=(-0.1, 0.2), contrast_limit=(-0.4, 0.5), brightness_by_max=True, always_apply=False, p=0.8)
        ], p=0.5),
        A.OneOf([
            A.ShiftScaleRotate(shift_limit=None, scale_limit=[-0.15, 0.15], rotate_limit=[-30, 30], interpolation=cv2.INTER_LINEAR, border_mode=cv2.BORDER_CONSTANT, value=0, mask_value=None, shift_limit_x=[-0.1, 0.1], shift_limit_y=[-0.2, 0.2], rotate_method='largest_box', p=0.5),
            A.ElasticTransform(alpha=1, sigma=20, alpha_affine=10, interpolation=cv2.INTER_LINEAR, border_mode=cv2.BORDER_CONSTANT, value=0, mask_value=None, approximate=False, same_dxdy=False, p=0.5),
            A.GridDistortion(num_steps=5, distort_limit=0.3, interpolation=cv2.INTER_LINEAR, border_mode=cv2.BORDER_CONSTANT, value=0, mask_value=None, normalized=True, p=0.5),
        ], p=0.5),
        A.OneOf([
            A.GaussNoise(var_limit=[10, 50], p=0.5),
            A.GaussianBlur(p=0.5),
            A.MotionBlur(p=0.5),
        ], p=0.5),
        A.CoarseDropout(max_holes=3, max_width=0.15, max_height=0.25, mask_fill_value=0, p=0.5),
        A.Normalize(
            mean=[0]*in_chans, 
            std=[1]*in_chans, 
        ),
        ToTensorV2(transpose_mask=True),
]
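The 1:8 ink / no-ink sampling described under data preparation can be sketched as below. This is a minimal illustration, not the author's code: the function name `sample_patch_centers`, the rule that a patch counts as "ink" when the label pixel under its center is 1, and the toy mask are all assumptions made for the example.

```python
import numpy as np

def sample_patch_centers(ink_mask, patch_size=224, n_ink=100, neg_per_pos=8, seed=0):
    """Draw patch centers at a 1:neg_per_pos ink / no-ink ratio.

    Assumption: a center is an 'ink' center when the label pixel under it is 1.
    """
    rng = np.random.default_rng(seed)
    half = patch_size // 2
    h, w = ink_mask.shape
    # Only keep centers whose full patch fits inside the fragment.
    valid = np.zeros(ink_mask.shape, dtype=bool)
    valid[half:h - half, half:w - half] = True
    pos = np.argwhere(valid & (ink_mask > 0))
    neg = np.argwhere(valid & (ink_mask == 0))
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one ink and one no-ink candidate center")
    n_neg = n_ink * neg_per_pos
    # Sample with replacement only when there are too few candidates.
    pos_idx = rng.choice(len(pos), size=n_ink, replace=len(pos) < n_ink)
    neg_idx = rng.choice(len(neg), size=n_neg, replace=len(neg) < n_neg)
    return pos[pos_idx], neg[neg_idx]

# Toy label mask: one square of ink in the middle of an empty fragment.
mask = np.zeros((1000, 1000), dtype=np.uint8)
mask[400:600, 400:600] = 1
ink_centers, bg_centers = sample_patch_centers(mask, n_ink=10)
print(len(ink_centers), len(bg_centers))
```

Each returned row is a `(row, col)` center; cropping `patch_size` pixels around it yields one training patch.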

Model Architecture

ResNet3D as the backbone
Attention pooling layer:

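import torch
import torch.nn as nn


class AttentionPool(torch.nn.Module):
    def __init__(self, depth, height, width):
        super().__init__()
        # One learnable logit per (depth, height, width) position, shared across batch and channels
        self.attention_weights = nn.Parameter(torch.ones(1, 1, depth, height, width))
        self.softmax = nn.Softmax(dim=2)

    def forward(self, x):
        # Apply softmax along the depth dimension to get the attention weights
        attention_weights = self.softmax(self.attention_weights)
        # Attention pooling: multiply the input by the attention weights
        pooled_output = torch.mul(attention_weights, x)
        # Sum along the depth dimension
        pooled_output = torch.sum(pooled_output, dim=2)
        return pooled_output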
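
As a quick check of what this layer does, the sketch below runs it on a random feature map. The tensor sizes are hypothetical and chosen only for the demo. Because the logits are initialized to ones, the softmax weights start out uniform, so at initialization the layer behaves like a plain mean over the depth axis; training can then shift weight toward more informative depths.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    # Same layer as above, condensed so this snippet runs on its own.
    def __init__(self, depth, height, width):
        super().__init__()
        self.attention_weights = nn.Parameter(torch.ones(1, 1, depth, height, width))
        self.softmax = nn.Softmax(dim=2)

    def forward(self, x):
        return torch.sum(self.softmax(self.attention_weights) * x, dim=2)

# Hypothetical encoder output: (batch, channels, depth, height, width).
batch, channels, depth, height, width = 2, 8, 4, 7, 7
features = torch.randn(batch, channels, depth, height, width)

pool = AttentionPool(depth, height, width)
pooled = pool(features)

print(pooled.shape)  # the depth axis is gone: (batch, channels, height, width)
# Uniform initial weights make the layer equal to a depth-wise mean.
print(torch.allclose(pooled, features.mean(dim=2), atol=1e-5))
```

Note that the weights do not depend on the input, and that the parameter shape fixes the feature-map size the layer can accept.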

FPN as the decoder
Denoiser (inspired by diffusion models; the model predicts the noise):

if cfg.use_denoiser:
    self.denoiser = smp.Unet(
        encoder_name="tu-resnet10t",  # "tu-resnet10t" or "resnet18"
        encoder_weights="imagenet",
        in_channels=1,
        classes=1,
        activation=None,
    )

# Inference stage
if self.cfg.use_denoiser:
    noise = self.denoiser(masks)
    masks = masks - noise
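
The residual refinement above can be tried in isolation with a small stand-in network. The sketch below is illustrative only: `TinyDenoiser` is a hypothetical replacement for the `smp.Unet` denoiser, and the zero initialization of its last layer is an assumption of this example, not something stated in the write-up. It shows that when the predicted noise is zero, the masks pass through unchanged.

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Hypothetical stand-in for the smp.Unet denoiser (1 channel in, 1 out)."""

    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(8, 1, 3, padding=1)
        # Assumption of this sketch: zero-init the last layer so the predicted
        # noise starts at 0 and the refinement begins as an identity mapping.
        nn.init.zeros_(self.out.weight)
        nn.init.zeros_(self.out.bias)

    def forward(self, x):
        return self.out(self.body(x))

def refine(masks, denoiser):
    noise = denoiser(masks)  # the denoiser predicts the noise in the raw masks
    return masks - noise     # subtract it, as in the snippet above

masks = torch.randn(2, 1, 64, 64)  # toy raw mask predictions
refined = refine(masks, TinyDenoiser())
print(refined.shape)
```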

Loss function: binary cross-entropy (BCE)

Training Strategy

Automatic mixed precision (AMP)
Optimizer: AdamW
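
A single training step combining BCE, AMP, and AdamW might look like the sketch below. It is a minimal example under stated assumptions: the one-layer `model` is a placeholder for the real segmentation network, the learning rate and weight decay are illustrative values that do not come from the write-up, and `BCEWithLogitsLoss` is used on the assumption that the model outputs raw logits. AMP is only enabled when a CUDA device is available.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"  # mixed precision only on GPU in this sketch

model = nn.Conv2d(16, 1, kernel_size=3, padding=1).to(device)  # placeholder model
criterion = nn.BCEWithLogitsLoss()  # BCE on logits; safe under autocast
# lr and weight_decay are illustrative, not the author's settings.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

def train_step(images, targets):
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device.type, enabled=use_amp):
        logits = model(images)
        loss = criterion(logits, targets)
    # The scaler is a no-op when AMP is disabled.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()

# Toy batch: 16 input slices per patch, sparse binary ink labels.
images = torch.randn(4, 16, 32, 32, device=device)
targets = (torch.rand(4, 1, 32, 32, device=device) > 0.9).float()
loss_value = train_step(images, targets)
print(loss_value)
```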

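Inference

Test-time augmentation (TTA): 3 rot90 rotations + the original image
The threshold was chosen mainly from cross-validation. I tried thresholding at the 93rd percentile but did not adopt it, because I was unsure about the ink content of the private dataset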
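
The rot90 TTA and the two thresholding options can be sketched with NumPy as follows. The names `predict_with_rot90_tta` and `binarize`, and the `predict_fn` callable that maps a `(C, H, W)` array to an `(H, W)` probability map, are hypothetical and stand in for the real model call.

```python
import numpy as np

def predict_with_rot90_tta(predict_fn, image):
    """Average predictions over the original image and three 90-degree rotations.

    image: array of shape (C, H, W); predict_fn maps (C, H, W) -> (H, W).
    """
    preds = []
    for k in range(4):  # k=0 is the original image
        rotated = np.rot90(image, k=k, axes=(1, 2))
        pred = predict_fn(rotated)
        # Rotate the prediction back so all four maps are aligned.
        preds.append(np.rot90(pred, k=-k, axes=(0, 1)))
    return np.mean(preds, axis=0)

def binarize(prob, threshold=None, percentile=None):
    """Threshold with a fixed value, or at a percentile of the predictions."""
    if percentile is not None:
        threshold = np.percentile(prob, percentile)
    if threshold is None:
        raise ValueError("pass either threshold or percentile")
    return (prob >= threshold).astype(np.uint8)

# Toy usage with a dummy predictor that averages the input slices.
image = np.random.default_rng(0).random((3, 5, 7))
tta_pred = predict_with_rot90_tta(lambda x: x.mean(axis=0), image)

prob = np.linspace(0.0, 1.0, 100).reshape(10, 10)
top_mask = binarize(prob, percentile=93)  # marks the top 7% of pixels as ink
```

A percentile threshold forces a fixed fraction of pixels to be labeled ink, which is why it is risky when the ink fraction of the test data is unknown; a fixed threshold tuned on cross-validation does not make that assumption.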

Full Pipeline

Pretrain with the ink-id InkClassifier3DCNN → save the encoder → load it into the segmentation model → train the segmentation model → run inference
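
The "save the encoder → load it into the segmentation model" step can be illustrated with toy modules. Everything in this sketch is a stand-in: `PretrainClassifier` and `SegmentationModel` are hypothetical classes, the tiny `Conv3d` stack replaces the real ResNet3D encoder, and mean pooling over depth replaces the attention pooling. The only point is that both models share an identically defined `encoder` whose `state_dict` is transferred.

```python
import os
import tempfile

import torch
import torch.nn as nn

def make_encoder():
    # Toy 3D encoder standing in for the ResNet3D backbone.
    return nn.Sequential(
        nn.Conv3d(1, 4, 3, padding=1), nn.ReLU(), nn.Conv3d(4, 8, 3, padding=1)
    )

class PretrainClassifier(nn.Module):
    """Stand-in for the pretraining model: encoder + classification head."""

    def __init__(self):
        super().__init__()
        self.encoder = make_encoder()
        self.head = nn.Linear(8, 2)

    def forward(self, x):
        return self.head(self.encoder(x).mean(dim=(2, 3, 4)))

class SegmentationModel(nn.Module):
    """Stand-in for the segmentation model: same encoder, z-pooling, 2D head."""

    def __init__(self):
        super().__init__()
        self.encoder = make_encoder()
        self.decoder = nn.Conv2d(8, 1, kernel_size=1)

    def forward(self, x):
        return self.decoder(self.encoder(x).mean(dim=2))

pretrained = PretrainClassifier()
# ... pretraining would happen here ...

# Save only the encoder weights, then load them into the segmentation model.
path = os.path.join(tempfile.mkdtemp(), "encoder.pth")
torch.save(pretrained.encoder.state_dict(), path)

seg_model = SegmentationModel()
seg_model.encoder.load_state_dict(torch.load(path, map_location="cpu"))

out = seg_model(torch.randn(2, 1, 4, 16, 16))  # (batch, 1, depth, H, W) input
print(out.shape)
```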

Results

CV score: 0.685
Public LB: 0.69
Private LB: 0.65

Code Links

Inference code: https://www.kaggle.com/code/fengqilong/vesuvius-inference
Training code: https://github.com/fengql123/kaggle-vesuvius-10th-place-solution/tree/main
