My 23rd Place Approach

Author: Ertuğrul Demir
Originally published: August 18, 2020

Update:

Now that my TPU quota has reset, I'm sharing a light version of my approach here:

https://www.kaggle.com/datafan07/final-melanoma-model-18th-place-solution-light-v

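First of all, thanks to Kaggle and to everyone who took part in this competition. This was the first competition I took seriously, and I learned a lot along the way. I want to write down what worked for me (or what I think worked). I learned so much from this community that I want to give something back!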

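My highest-scoring submissions on the public leaderboard were trained on 2020 data only. They did well with EfficientNet + metadata fusion, but I noticed they performed poorly on unseen data. I think this was caused by parts of the test set that look unlike anything in the training data, which Chris and I pointed out in a few public discussions.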

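I got unstable results on certain test cases: predictions from models trained only on 2020 data differed widely from predictions from models trained only on 2019 data. My hunch was that this came from medical differences between melanoma stages, or from different scanning devices, but that is well outside my expertise. To get around it I decided to use external data, on the assumption that more examples would help my models predict these odd cases better. Thanks to Chris, I used the external TFRecords and upsampled the malignant samples on top of my existing models.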
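The post does not say exactly how the malignant samples were upsampled. As a minimal sketch of the general idea, assuming plain in-memory lists instead of TFRecords, the hypothetical helper `upsample_minority` below repeats every positive example a fixed number of times and then shuffles. The function name, the `factor` parameter, and the toy file names are illustrative assumptions, not taken from the original notebook.

```python
import random


def upsample_minority(samples, labels, factor, seed=0):
    """Repeat each positive (label == 1) sample `factor` times, then shuffle.

    Hypothetical helper for illustration only; negatives are kept once.
    """
    rng = random.Random(seed)
    out = []
    for sample, label in zip(samples, labels):
        copies = factor if label == 1 else 1
        out.extend([(sample, label)] * copies)
    rng.shuffle(out)
    return out


# Toy data: 100 images, of which only the first 2 are malignant.
samples = [f"img_{i}" for i in range(100)]
labels = [1 if i < 2 else 0 for i in range(100)]

balanced = upsample_minority(samples, labels, factor=10)
positives = sum(label for _, label in balanced)
print(len(balanced), positives)
```

Repeating positives raises their share of each batch without changing the negatives. The repeated images are identical copies, so this is usually paired with random augmentation so that the model does not see the exact same pixels each time.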

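Well... this boosted my CV (cross-validation) score a lot, but the LB (leaderboard) score did not follow. I decided to add the external datasets one at a time, and ended up leaving the 2019 portion out of my models and using only the 2018 data. That helped a little, but one big problem remained: overfitting. I tried some augmentation and regularization, but it did not seem to be enough. I was interested in Chris's coarse dropout, but at the dropout level I wanted it slowed my models down somewhat. Then I found @benboren's great sprinkles method and tuned it for my models: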

import tensorflow as tf


def make_mask(num_holes, side_length, rows, cols, num_channels):
    """Builds the mask for all sprinkles."""
    row_range = tf.tile(tf.range(rows)[..., tf.newaxis], [1, num_holes])
    col_range = tf.tile(tf.range(cols)[..., tf.newaxis], [1, num_holes])
    # Random centre for every hole.
    r_idx = tf.random.uniform([num_holes], minval=0, maxval=rows-1,
                              dtype=tf.int32)
    c_idx = tf.random.uniform([num_holes], minval=0, maxval=cols-1,
                              dtype=tf.int32)
    # Hole boundaries, clipped to the image.
    r1 = tf.clip_by_value(r_idx - side_length // 2, 0, rows)
    r2 = tf.clip_by_value(r_idx + side_length // 2, 0, rows)
    c1 = tf.clip_by_value(c_idx - side_length // 2, 0, cols)
    c2 = tf.clip_by_value(c_idx + side_length // 2, 0, cols)
    row_mask = (row_range > r1) & (row_range < r2)
    col_mask = (col_range > c1) & (col_range < c2)

    # Combine masks into one layer and duplicate over channels.
    mask = row_mask[:, tf.newaxis] & col_mask
    mask = tf.reduce_any(mask, axis=-1)
    mask = mask[..., tf.newaxis]
    mask = tf.tile(mask, [1, 1, num_channels])
    return mask


# CFG is the notebook's configuration dict, defined elsewhere.
def sprinkles(image, cfg=CFG):
    num_holes = cfg['num_holes']
    side_length = cfg['side_length']
    mode = cfg['sprinkles_mode']
    PROBABILITY = cfg['sprinkles_prob']

    # Apply the augmentation only with probability PROBABILITY.
    RandProb = tf.cast(tf.random.uniform([], 0, 1) < PROBABILITY, tf.int32)
    if (RandProb == 0) | (num_holes == 0):
        return image

    img_shape = tf.shape(image)
    # Compare strings with ==, not `is` (identity is not guaranteed).
    if mode == 'normal':
        rejected = tf.zeros_like(image)
    elif mode == 'salt_pepper':
        num_holes = num_holes // 2
        rejected_high = tf.ones_like(image)
        rejected_low = tf.zeros_like(image)
    elif mode == 'gaussian':
        rejected = tf.random.normal(img_shape, dtype=tf.float32)
    else:
        raise ValueError(f'Unknown mode "{mode}" given.')

    rows = img_shape[0]
    cols = img_shape[1]
    num_channels = img_shape[-1]
    if mode == 'salt_pepper':
        mask1 = make_mask(num_holes, side_length, rows, cols, num_channels)
        mask2 = make_mask(num_holes, side_length, rows, cols, num_channels)
        filtered
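
The listing above ends before the masks are applied to the image. As a rough illustration of that final step, here is a minimal NumPy sketch of the same idea: build a boolean mask of random squares, then overwrite the masked pixels. It is not the notebook's code. The names `make_mask_np` and `sprinkles_np`, the explicit `rng` argument, and the assumption that pixel values lie in [0, 1] are all mine, and the squares are drawn with array slicing rather than the strict inequalities used above.

```python
import numpy as np


def make_mask_np(num_holes, side_length, rows, cols, rng):
    """Boolean (rows, cols) mask that is True inside randomly placed squares."""
    mask = np.zeros((rows, cols), dtype=bool)
    half = side_length // 2
    centers_r = rng.integers(0, rows, size=num_holes)
    centers_c = rng.integers(0, cols, size=num_holes)
    for r, c in zip(centers_r, centers_c):
        # Clip every square to the image borders.
        r1, r2 = max(r - half, 0), min(r + half, rows)
        c1, c2 = max(c - half, 0), min(c + half, cols)
        mask[r1:r2, c1:c2] = True
    return mask


def sprinkles_np(image, num_holes, side_length, mode="normal", rng=None):
    """Overwrite random squares of a float image of shape (rows, cols, channels)."""
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = image.shape[:2]

    if mode == "salt_pepper":
        # Half of the holes become white (salt), the other half black (pepper).
        salt = make_mask_np(num_holes // 2, side_length, rows, cols, rng)
        pepper = make_mask_np(num_holes // 2, side_length, rows, cols, rng)
        out = np.where(salt[..., None], 1.0, image)
        return np.where(pepper[..., None], 0.0, out)

    if mode == "normal":
        fill = np.zeros_like(image)
    elif mode == "gaussian":
        fill = rng.normal(size=image.shape)
    else:
        raise ValueError(f'Unknown mode "{mode}" given.')

    mask = make_mask_np(num_holes, side_length, rows, cols, rng)
    # The mask is broadcast over the channel axis.
    return np.where(mask[..., None], fill, image)


rng = np.random.default_rng(0)
image = np.full((32, 32, 3), 0.5, dtype=np.float32)
augmented = sprinkles_np(image, num_holes=8, side_length=6, rng=rng)
print(augmented.shape, float((augmented == 0.0).mean()))
```

With `mode="salt_pepper"`, half of the holes are filled with 1.0 and half with 0.0, mirroring the `num_holes // 2` split in the listing.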
