14th Place Solution - ISIC 2024

Competition rank: 14th place (gold medal)

Published: 2024-09-08

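First, I would like to sincerely thank the organizers of the ISIC 2024 competition. I also want to thank @daikon99 and @yuyagi for teaming up with me! On behalf of my team, I'm excited to share our solution.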

Inference Pipeline Overview

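We built on this excellent notebook (ISIC 2024 Borrowed .179LB Tabular/OOF ImageNet). To make the model robust to variance, we maximized the number of ensemble members by increasing the diversity of the CNN models used for feature extraction. For the final submissions, we selected one trust-LB submission and one trust-CV submission. In the end, the trust-LB submission earned us a gold medal!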

Solution

Validation Strategy

We designed our validation strategy with reference to a previous competition (Triple Stratified Leak-Free KFold CV).

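To reduce the influence of patients with a very large number of negative samples, we split the folds as follows:

1. For patients with more than 100 negatives, cap the number of negatives at 100 per patient.
2. After applying this cap, split the samples by patient ID so that the number of positives is balanced across folds.
3. Finally, for each patient ID whose negatives were capped, assign all of that patient's negatives to the same fold as the patient.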

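In our experiments, CNN and stacking GBDT models trained with this split showed a higher correlation between CV and private LB than models trained with other splits (GroupKFold or StratifiedGroupKFold on patient ID).
They also scored higher on the private LB than models trained with the other splits.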

import pandas as pd

neg_counts = 100  # maximum number of negatives (target == 0) kept per patient

# Per-patient counts of negatives and positives, largest patients first
target_counts = train_df.groupby('patient_id')['target'].value_counts().unstack(fill_value=0).reset_index()
target_counts.columns = ['patient_id', 'target_0_count', 'target_1_count']
target_counts['total_images'] = target_counts['target_0_count'] + target_counts['target_1_count']
sorted_target_counts = target_counts.sort_values(by='total_images', ascending=False).reset_index(drop=True)

# Replace the negatives of each patient with more than neg_counts negatives by a random sample of neg_counts
for patient_id, target_0_count in zip(sorted_target_counts['patient_id'], sorted_target_counts['target_0_count']):
    if target_0_count > neg_counts:
        is_patient_neg = (train_df['patient_id'] == patient_id) & (train_df['target'] == 0)
        patient_data_sample = train_df[is_patient_neg].sample(n=neg_counts, random_state=42)
        train_df = pd.concat([train_df[~is_patient_neg], patient_data_sample])

Inference Pipeline

Results: CV 0.1793, public LB 0.186 (2nd place on the public LB), private LB 0.171

The final pipeline is a weighted average of three stacking GBDT models, one GBDT model without CNN predictions, and a single CNN model, as shown below.
[Figure: inference pipeline diagram]

Stage 1. CNN with Metadata

Input features

Raw features (ISIC2024 train_metadata.csv)
Engineered features (ISIC 2024 Borrowed .179LB Tabular/OOF ImageNet)

CNN model architecture

The CNN models are based on the 1st place solution of a previous competition.
For EfficientNetV2, we adapted the model from this excellent notebook.

# tf_efficientnet_b0_ns, swin_small_patch4_window7_224.ms_in21k_ft_in1k, convnext_tiny.in22k_ft_in1k
import torch
import torch.nn as nn
import timm


class ISICModel(nn.Module):
    def __init__(self, model_name, n_meta_features, model_output_size, num_classes=1, pretrained=False):
        super().__init__()
        self.model = timm.create_model(model_name, pretrained=pretrained)
        # Removes the head only for backbones whose head is named `classifier` (EfficientNet);
        # Swin and ConvNeXt keep their 1000-way ImageNet head, so they output 1000 values
        self.model.classifier = nn.Identity()

        # Metadata branch, following the previous competition's 1st place solution
        n_meta_dim = [512, 128]
        self.meta = nn.Sequential(
            nn.Linear(n_meta_features, n_meta_dim[0]),
            nn.BatchNorm1d(n_meta_dim[0]),
            nn.Dropout(p=0.3),
            nn.Linear(n_meta_dim[0], n_meta_dim[1]),
            nn.BatchNorm1d(n_meta_dim[1]),
        )

        # model_output_size is taken here as the backbone output size; the original
        # hard-coded 1128 (= 1000 logits + 128 metadata) for Swin
        self.fc1 = nn.Linear(model_output_size + n_meta_dim[1], 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, num_classes)

    def forward(self, x):
        images, meta = x
        x1 = self.model(images)         # image features
        x2 = self.meta(meta)            # 128-d metadata embedding
        x = torch.cat((x1, x2), dim=1)  # late fusion of image and metadata
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

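# tf_efficientnetv2_s_in1k
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm


class GeM(nn.Module):
    """Generalized-mean pooling with a learnable exponent p (common implementation; not shown in the post)."""

    def __init__(self, p=3.0, eps=1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.ones(1) * p)
        self.eps = eps

    def forward(self, x):
        # (B, C, H, W) -> (B, C, 1, 1)
        return F.avg_pool2d(x.clamp(min=self.eps).pow(self.p), (x.size(-2), x.size(-1))).pow(1.0 / self.p)


class ISICModel(nn.Module):
    def __init__(self, model_name, num_classes=1, pretrained=True, config=None, meta_features=200):
        super().__init__()
        self.is_feature_only = True
        try:
            # Feature-pyramid backbone; only the last stage is used
            self.model = timm.create_model(
                model_name, in_chans=3, pretrained=pretrained, features_only=True
            )
            in_features = self.model.feature_info[-1]['num_chs']
        except Exception as e:
            # Backbones without features_only support fall back to forward_features()
            print(e)
            self.model = timm.create_model(
                model_name, in_chans=3, num_classes=1, pretrained=pretrained)
            in_features = self.model.head.in_features
            self.model.head = nn.Identity()
            self.is_feature_only = False

        self.pooling = GeM()
        # Pooled image features are concatenated with the compressed metadata (meta_features // 2)
        self.linear = nn.Linear(in_features + meta_features // 2, num_classes)
        self.criterion = nn.BCEWithLogitsLoss()
        self.config = config
        self.training_step_outputs = []
        self.validation_step_outputs = []

        # Metadata branch: BatchNorm, then a projection to half the width
        self.meta_bn = nn.BatchNorm1d(meta_features)
        self.meta_linear = nn.Linear(meta_features, meta_features // 2)

    def forward(self, images, meta_data):
        if self.is_feature_only:
            features = self.model(images)[-1]
            # Heuristic: equal dims 1 and 2 mean a channels-last (B, H, W, C) map
            if features.shape[1] == features.shape[2]:
                features = features.permute(0, 3, 1, 2)
        else:
            features = self.model.forward_features(images).unsqueeze(1)
            features = features.permute(0, 3, 1, 2)
        pooled_features = self.pooling(features).flatten(1)

        meta_data = self.meta_bn(meta_data)
        meta_data = self.meta_linear(meta_data)

        combined_features = torch.cat([pooled_features, meta_data], dim=1)
        return self.linear(combined_features)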
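For reference, GeM (generalized-mean) pooling, used via GeM() in the second model, is commonly defined per channel c of an H × W feature map as below, with p usually a learnable parameter. This is the standard formulation, not something stated in the post; the limit cases assume non-negative activations, which is why implementations typically clamp the input to a small ε first.

```latex
\mathrm{GeM}(X)_c = \left( \frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} x_{c,h,w}^{\,p} \right)^{1/p},
\qquad p = 1 \;\Rightarrow\; \text{average pooling}, \qquad p \to \infty \;\Rightarrow\; \text{max pooling}
```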

Data augmentation

For data augmentation, we followed the 1st place solution of the previous competition.

# tf_efficientnet_b0_ns, swin_small_patch4_window7_224.ms_in21k_ft_in1k, convnext_tiny.in22k_ft_in1k
# Written against the albumentations 1.x API; CONFIG['img_size'] is the per-backbone image size (128 / 224 / 288)
import albumentations as A

train_transform = A.Compose([
    A.Resize(CONFIG['img_size'], CONFIG['img_size']),
    A.RandomRotate90(p=0.5),
    A.Flip(p=0.5),
    A.RandomBrightnessContrast(
        brightness_limit=0.2, contrast_limit=0.2, p=0.5),
    A.OneOf([
        A.OpticalDistortion(distort_limit=1.0),
        A.GridDistortion(num_steps=5, distort_limit=1.),
        A.ElasticTransform(alpha=3),
    ], p=0.7),
    A.Downscale(p=0.25),
    A.HueSaturationValue(hue_shift_limit=10, sat_shift_limit=20, val_shift_limit=10, p=0.5),
    A.ShiftScaleRotate(shift_limit=0.1,
                       scale_limit=0.15,
                       rotate_limit=60,
                       p=0.5),
    A.CoarseDropout(max_height=int(CONFIG['img_size'] * 0.375),
                    max_width=int(CONFIG['img_size'] * 0.375),
                    max_holes=1, p=0.7),
    A.Normalize(),
])

# tf_efficientnetv2_s_in1k
# Written against the albumentations 1.x API; CONFIG['img_size'] = 224 for this backbone
import albumentations as A

train_transform = A.Compose([
    A.Resize(CONFIG['img_size'], CONFIG['img_size']),
    A.RandomRotate90(p=0.5),
    A.Flip(p=0.5),
    A.OneOf([
        A.OpticalDistortion(distort_limit=1.0),
        A.GridDistortion(num_steps=5, distort_limit=1.),
        A.ElasticTransform(alpha=3),
    ], p=0.5),
    A.HueSaturationValue(hue_shift_limit=1, sat_shift_limit=1, val_shift_limit=1, p=0.5),
    A.RandomBrightnessContrast(
        brightness_limit=0.2, contrast_limit=0.2, p=0.5),
    A.Normalize(),
])

CV and public LB results of the CNN models

Backbone	Raw features	Engineered features	Undersampling	Learning rate	Image size	CV	Public LB
tf_efficientnet_b0_ns	Yes	No	At most 100 negatives per patient	3e-4	128	0.151	0.161
swin_small_patch4_window7_224.ms_in21k_ft_in1k	Yes	No	At most 100 negatives per patient	1e-5	224	0.153	0.162
convnext_tiny.in22k_ft_in1k	Yes	No	At most 100 negatives per patient	1e-4	288	0.155	0.163
tf_efficientnetv2_s_in1k	Yes	Yes	Random undersampling of all negatives	1e-4	224	0.165	0.170

Stage 2. Stacking GBDT and Ensemble

Input features

Raw features (ISIC2024 train_metadata.csv)
Engineered features (ISIC 2024 Borrowed .179LB Tabular/OOF ImageNet)
Norm2 (aggregated features)
CNN predictions

Feature selection

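Following the idea in this notebook, we computed the feature importance of each stacking model and removed features whose importance was 0.0. This improved both the CV and LB scores.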
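A minimal sketch of this pruning step, assuming each fitted model exposes a scikit-learn-style feature_importances_ attribute (LightGBM's LGBMClassifier, XGBoost's XGBClassifier and CatBoost's CatBoostClassifier all do) and assuming a feature is dropped only when its importance is zero in every model passed in; drop_zero_importance is a hypothetical helper.

```python
import numpy as np


def drop_zero_importance(models, feature_names):
    """Hypothetical helper: split features into kept / dropped by summed importance."""
    total = np.zeros(len(feature_names))
    for m in models:
        # Sum importances across models (e.g. one model per fold)
        total += np.asarray(m.feature_importances_, dtype=float)
    kept = [f for f, imp in zip(feature_names, total) if imp > 0.0]
    dropped = [f for f, imp in zip(feature_names, total) if imp == 0.0]
    return kept, dropped
```

Summing across fold models keeps a feature that any fold found useful; a stricter variant could drop features that are unused in some folds.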

Norm2: aggregated features (patient_id + tbp_lv_location_simple)

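In addition to the base notebook's features, we added the following aggregated features, normalizing each numeric feature within groups of patient_id and tbp_lv_location_simple.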

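import polars as pl

err = 1e-5  # avoids division by zero (assumed value); path, num_cols, new_num_cols come from the base notebook
df = (
    pl.read_csv(path)
    .with_columns(  # group key: patient + coarse anatomical site
        (pl.col('patient_id') + '_' + pl.col('tbp_lv_location_simple')).alias('patient_id_location')
    )
    .with_columns(  # z-score each numeric feature within its patient_id + location group
        ((pl.col(col) - pl.col(col).mean().over('patient_id_location')) / (pl.col(col).std().over('patient_id_location') + err)).alias(f'{col}_patient_location_norm') for col in (num_cols + new_num_cols)
    )
)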
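For readers working in pandas rather than polars, a minimal equivalent sketch of the same per-group z-score. add_patient_location_norm is a hypothetical helper and err = 1e-5 is an assumed value; pandas' groupby std uses ddof=1, the same default as polars' std().

```python
import pandas as pd


def add_patient_location_norm(df, cols, err=1e-5):
    """Hypothetical helper: z-score each column within patient_id + tbp_lv_location_simple."""
    out = df.copy()
    key = out["patient_id"] + "_" + out["tbp_lv_location_simple"]
    grouped = out.groupby(key)
    for col in cols:
        mean = grouped[col].transform("mean")
        std = grouped[col].transform("std")  # sample std (ddof=1)
        out[f"{col}_patient_location_norm"] = (out[col] - mean) / (std + err)
    return out
```

Single-lesion groups have an undefined sample standard deviation, so their normalized value comes out as NaN; GBDTs such as LightGBM handle missing values natively.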

CV results of the models used in the ensemble

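For the stacking models, we considered two options: increasing the number of CNN models whose predictions are used as features, or increasing the number of GBDTs. Experiments showed that limiting each GBDT to at most two CNN models as features, while increasing the number of GBDTs, gave better results on both CV and LB. Each GBDT_n below combines LightGBM, CatBoost and XGBoost (lgb+cb+xgb).
In addition, tf_efficientnetv2_s_in1k had the highest CV and public LB scores among the CNNs, so we also added it to the ensemble directly.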

Model	OOF feature 1	OOF feature 2	Norm2 used	Ensemble weight	CV
GBDT_1 (lgb+cb+xgb)	tf_efficientnet_b0_ns	swin_small_patch4_window7_224.ms_in21k_ft_in1k	No	1.882e-01	0.178
GBDT_2 (lgb+cb+xgb)	tf_efficientnet_b0_ns	swin_small_patch4_window7_224.ms_in21k_ft_in1k	Yes	4.963e-01	0.179
GBDT_3 (lgb+cb+xgb)	convnext_tiny.in22k_ft_in1k	tf_efficientnetv2_s_in1k	Yes	1.534e-01	0.180
GBDT_4 (lgb+cb+xgb)	-	-	Yes	5.075e-02	0.171
tf_efficientnetv2_s_in1k only	-	-	No	2.702e-04	0.165
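A minimal sketch of how CNN out-of-fold predictions can be attached as stacking features while enforcing the two-model limit described above. add_oof_features and the oof_<name> column naming are hypothetical, and the predictions are assumed to be pandas Series keyed by isic_id. Using out-of-fold predictions means the GBDT never sees a CNN score produced by a model that was trained on that same image.

```python
import pandas as pd


def add_oof_features(df, oof_preds, max_models=2):
    """Hypothetical helper: add one column per CNN with its OOF prediction.

    oof_preds maps a model name to a Series of OOF predictions indexed by isic_id.
    """
    if len(oof_preds) > max_models:
        raise ValueError(f"use at most {max_models} OOF feature models per GBDT")
    out = df.copy()
    for name, preds in oof_preds.items():
        # map() aligns on isic_id, so the row order of the OOF file does not matter
        out[f"oof_{name}"] = out["isic_id"].map(preds)
    return out
```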

Ensemble method

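Our ensemble is a weighted average of the models' predictions. The weights are found with Nelder-Mead by minimizing 1 - (competition score) on the out-of-fold predictions.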

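import numpy as np
from scipy.optimize import minimize

# oof_1 ... oof_5: OOF predictions of the five ensemble members, each with an 'oof_pred'
# column aligned with df_train; comp_score: the competition metric (defined elsewhere)
oofs = [oof_1, oof_2, oof_3, oof_4, oof_5]


def ensemble_loss(weights):
    weights = np.asarray(weights)
    weights = weights / np.sum(weights)  # normalize so the weights sum to 1
    ensemble_preds = sum(w * oof['oof_pred'] for w, oof in zip(weights, oofs))
    # Minimizing 1 - score maximizes the competition metric
    return 1 - comp_score(df_train["target"], ensemble_preds)


result = minimize(
    ensemble_loss,
    [1 / len(oofs)] * len(oofs),  # start from equal weights
    method='Nelder-Mead',
)
print(1 - result.fun)  # best OOF score
print(result)          # result.x holds the weights before normalization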
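comp_score is not defined in the post. Assuming it is the ISIC 2024 metric (partial AUC above 80% TPR, with a maximum of 0.2), a sketch built on scikit-learn's roc_auc_score(max_fpr=...) could look like this. Flipping labels and scores turns the high-TPR region into a low-FPR region, and the last line undoes sklearn's McClish standardization to return the raw partial area.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def comp_score(y_true, y_pred, min_tpr=0.80):
    """Sketch: partial AUC above `min_tpr` TPR on the raw scale (max = 1 - min_tpr)."""
    # Flip labels and scores so that "TPR above 0.8" becomes "FPR below 0.2"
    v_gt = 1 - np.asarray(y_true)
    v_pred = 1.0 - np.asarray(y_pred, dtype=float)
    max_fpr = 1.0 - min_tpr
    # sklearn returns the McClish-standardized partial AUC
    scaled = roc_auc_score(v_gt, v_pred, max_fpr=max_fpr)
    # Undo the standardization to recover the raw partial area
    return 0.5 * max_fpr**2 + (max_fpr - 0.5 * max_fpr**2) / (1.0 - 0.5) * (scaled - 0.5)
```

A perfect ranking scores 0.2 and a fully reversed ranking scores 0, which is the scale of the CV numbers reported above.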

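What didn't work

Adding features that exist only in the training data as extra CNN prediction targets
Using soft labels for biopsied lesions
Post-processing based on age_approx
TTA
Fold splits whose validation sets contain hospitals unseen in training
Adding 3 or more OOF features to a single GBDT
Hair augmentation
Switching to focal loss
Pretraining on data from past competitions
Rank ensembling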
