39th place solution

Google Universal Image Embedding | google-universal-image-embedding

Start: 2022-07-11 End: 2022-10-10 Category: computer vision, data algorithm competition

Author: imori
Competition rank: 39th place

Thanks to the competition hosts and all participants!

This is the 39th place solution.

Model

A two-head ensemble model implemented in PyTorch.

Backbone: ViT-H-14 + LAION2B
Head1: Dropout(0.1) -> Linear(64) -> Normalize
Head2: Linear(64) -> Normalize
Output: Head1 + Head2 (element-wise sum)

Pipeline: ViT -> Head1 / Head2 -> Normalize -> Ensemble
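
The two heads and the additive ensemble can be sketched roughly as follows. This is a minimal illustration, not the original code: the class name TwoHeadEmbedder is hypothetical, the input width of 1024 is an assumption about the backbone's output size (the write-up only states the 64-dimensional head outputs), and random tensors stand in for the ViT features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoHeadEmbedder(nn.Module):
    """Two projection heads applied to precomputed backbone features.

    in_dim=1024 is an assumption; only the 64-dim outputs come from the write-up.
    """

    def __init__(self, in_dim: int = 1024, out_dim: int = 64, p_drop: float = 0.1):
        super().__init__()
        # Head1: Dropout(0.1) -> Linear(64)
        self.head1 = nn.Sequential(nn.Dropout(p_drop), nn.Linear(in_dim, out_dim))
        # Head2: Linear(64)
        self.head2 = nn.Linear(in_dim, out_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        e1 = F.normalize(self.head1(feats), dim=-1)  # L2-normalize Head1 output
        e2 = F.normalize(self.head2(feats), dim=-1)  # L2-normalize Head2 output
        return e1 + e2  # ensemble by element-wise addition


torch.manual_seed(0)
model = TwoHeadEmbedder().eval()  # eval() disables dropout at inference time
feats = torch.randn(4, 1024)      # stand-in for frozen ViT features
with torch.no_grad():
    emb = model(feats)            # shape (4, 64)
```

Because each head's output has unit length, the summed embedding has norm at most 2. In this sketch both heads live in one module only for inference convenience; the heads themselves would be trained separately on their own datasets.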

During training, the backbone was frozen and only the heads were trained, using ArcFace.
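
A rough sketch of that training setup is shown below. Everything concrete in it is an assumption made for illustration: the scale s=30 and margin m=0.3, the tiny nn.Linear that stands in for the ViT backbone, the feature width, the class count, and the optimizer settings are not taken from the write-up. The ArcFace loss is also a simplified variant that omits the usual handling of angles where theta + m exceeds pi.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ArcFaceLoss(nn.Module):
    """Simplified ArcFace: add an angular margin m to the target-class angle."""

    def __init__(self, emb_dim: int, n_classes: int, s: float = 30.0, m: float = 0.3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_classes, emb_dim) * 0.01)
        self.s, self.m = s, m  # s and m are illustrative values

    def forward(self, emb: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine between normalized embeddings and normalized class centers
        cos = F.linear(F.normalize(emb, dim=-1), F.normalize(self.weight, dim=-1))
        cos = cos.clamp(-1 + 1e-7, 1 - 1e-7)  # keep acos numerically stable
        theta = torch.acos(cos)
        target = F.one_hot(labels, cos.size(1)).bool()
        # Margin is applied only to the ground-truth class logit
        logits = torch.where(target, torch.cos(theta + self.m), cos)
        return F.cross_entropy(self.s * logits, labels)


torch.manual_seed(0)
backbone = nn.Linear(32, 128)           # hypothetical stand-in for the ViT
backbone.requires_grad_(False).eval()   # freeze the backbone
head = nn.Sequential(nn.Dropout(0.1), nn.Linear(128, 64))
criterion = ArcFaceLoss(emb_dim=64, n_classes=10)
params = list(head.parameters()) + list(criterion.parameters())
opt = torch.optim.AdamW(params, lr=1e-2)

x = torch.randn(16, 32)                 # toy inputs
y = torch.randint(0, 10, (16,))         # toy labels
w_before = backbone.weight.clone()
for _ in range(5):
    with torch.no_grad():               # no gradients flow into the backbone
        feats = backbone(x)
    loss = criterion(head(feats), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Since the backbone never changes, its features could also be computed once and cached, which makes each head-training run cheap.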

Datasets

Head1: GLR2021 + ImageNet1K + Products10K
Head2: GLR2021 + ImageNet1K + Products10K + AlibabaGoods + FoodRecognition2022 + MET

What did not work

Some of these attempts were implemented in TensorFlow.

ArcFace (large s + small margin)
SubCenter ArcFace (several values of K)
AdaCos
Head: SelfAttention + Linear
Head: Linear, ReLU, Dropout, Linear
Head: Linear, ReLU, Dropout, Linear, BatchNormalization
Contrastive learning
Focal Loss
Single models or ensembles of other CLIP models (RN50, ViT-B, ViT-L (224px, 336px), ViT-g), CNNs, and other ArcFace-based heads
Ensemble methods (concat + AveragePooling, weighted sum)
Using other datasets
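
For readers unfamiliar with the two ensemble variants named in the list, here is one possible reading of them, sketched in PyTorch. The write-up does not say exactly how the pooling or weighting was done, so the pooling to 64 dimensions, the weight w=0.7, and the random stand-in embeddings are all assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
e1 = F.normalize(torch.randn(4, 64), dim=-1)  # hypothetical embeddings from model A
e2 = F.normalize(torch.randn(4, 64), dim=-1)  # hypothetical embeddings from model B

# Variant 1: concatenate, then average-pool 128 dims down to 64.
cat = torch.cat([e1, e2], dim=-1)             # shape (4, 128)
pooled = F.adaptive_avg_pool1d(cat.unsqueeze(1), 64).squeeze(1)  # shape (4, 64)

# Variant 2: weighted addition of the two embeddings.
w = 0.7                                       # hypothetical weight
weighted = w * e1 + (1 - w) * e2              # shape (4, 64)
```

With 128 inputs and 64 outputs, the adaptive pooling averages each adjacent pair of dimensions, so it mixes neighboring coordinates that have no particular relationship to each other.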

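Questions

Does data augmentation not help?
Ensembling by concatenation along the channel axis looked promising, but it scored lower than a simple 64-dim model
model1 -> 32dim, model2 -> 32dim, ensemble (64 dim)
Do instance-level labels not help?
Would 2-stage classification work? Stage 1: category classification, stage 2: embedding (I did not have time to try this)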
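
One property of the concatenation ensemble that may be relevant to the question above: when two L2-normalized 32-dim embeddings are concatenated, the cosine similarity of the 64-dim result is exactly the mean of the two per-model cosine similarities. The pure-Python sketch below checks this on random vectors; the helper names and the random data are hypothetical, and this is not offered as an explanation of the observed score gap.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rand_unit(dim):
    """Random unit vector, standing in for a model's embedding."""
    return l2_normalize([random.gauss(0, 1) for _ in range(dim)])


random.seed(0)
# Two images (a, b), each embedded by two hypothetical 32-dim models.
a1, a2 = rand_unit(32), rand_unit(32)
b1, b2 = rand_unit(32), rand_unit(32)

concat_sim = cosine(a1 + a2, b1 + b2)  # list + list concatenates to 64 dims
mean_sim = (cosine(a1, b1) + cosine(a2, b2)) / 2
print(abs(concat_sim - mean_sim) < 1e-9)
```

In other words, a 32 + 32 concatenation behaves like averaging the similarity scores of two 32-dim models, each of which has half the capacity of a single 64-dim head.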
