GAIIC2024 · Multimodal Object Detection · Solution Code Guide

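Dual-spectrum (RGB + thermal infrared) fusion detection based on Co-DETR · data cleaning + two-stage augmentation + model ensembling

📁 Code repository: https://gitee.com/mr_xia_dada/gaiic2024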

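Training Configuration

GPUs: 2 x A100 40G
RAM: 100G (about 40G actually used)
Training parameters: batch_size=3, backbone.with_cp=False, transformer.encoder.with_cp=True. GPU memory usage is about 38G, and each model takes about 1.5 days to train.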
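
The dotted names above (backbone.with_cp, transformer.encoder.with_cp) read as paths into the nested model config. The following is a minimal sketch of how such an override could be applied to a plain nested dict. The helper set_by_dotted_key and the toy model_cfg layout are hypothetical and are not taken from the repository; the real config nesting may differ.

```python
from typing import Any


def set_by_dotted_key(cfg: dict, dotted_key: str, value: Any) -> None:
    """Set cfg[a][b][c] = value for dotted_key 'a.b.c', creating dicts as needed."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value


# Toy config for illustration only; the repository's real layout may differ.
model_cfg = {"backbone": {"type": "SwinTransformer"}, "transformer": {"encoder": {}}}

# Settings quoted in the text: checkpointing off in the backbone, on in the encoder.
set_by_dotted_key(model_cfg, "backbone.with_cp", False)
set_by_dotted_key(model_cfg, "transformer.encoder.with_cp", True)
```

with_cp enables activation checkpointing, which recomputes activations during the backward pass to save GPU memory at the cost of extra compute.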

Environment Setup

The project has few dependencies; see init.sh for details. Key package versions (for reference only):

mmcv                      2.1.0                    pypi_0    pypi
mmdet                     3.3.0                    pypi_0    pypi
mmengine                  0.10.3                   pypi_0    pypi
pytorch                   2.2.2           py3.10_cuda11.8_cudnn8.7.0_0    pytorch
        

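Data Preparation

⚠️ Note on external data: for the leaderboard B submission, additionally ensembling the models trained with external data brought no significant gain, and that code was written in a hurry with hard-coded paths. If strict reproduction is not required, the external data processing can be skipped. The relevant code is commented out in the default training script.

Two public datasets are used:

aistudio: download link (must be downloaded manually)
vedai: https://downloads.greyc.fr/vedai/ (see train.sh for how to download)

Directory structure after extracting the data: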

data
├── aistudio
│   ├── original
│   │   └── original
│   │       ├── annotations
│   │       └── imgs
│   ├── data.csv
│   └── roundabouts.csv
└── vedai
    ├── Annotations512
    └── Vehicules512
        

Running python tools/external_data.py converts both external datasets to COCO annotation format for later use.
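
For readers unfamiliar with the target format, here is a minimal sketch of what a COCO-style detection annotation looks like when built from corner-format boxes. The function to_coco and its input layout are hypothetical; the actual logic in tools/external_data.py is not shown in this document and may differ.

```python
import json


def to_coco(images, boxes, categories):
    """Build a COCO-style detection dict.

    images: list of (file_name, width, height)
    boxes: list of (image_index, category_name, x_min, y_min, x_max, y_max)
    categories: list of category names
    """
    cat_ids = {name: i + 1 for i, name in enumerate(categories)}
    coco = {
        "images": [
            {"id": i, "file_name": f, "width": w, "height": h}
            for i, (f, w, h) in enumerate(images)
        ],
        "categories": [{"id": cid, "name": name} for name, cid in cat_ids.items()],
        "annotations": [],
    }
    for ann_id, (img_idx, cat, x1, y1, x2, y2) in enumerate(boxes, start=1):
        w, h = x2 - x1, y2 - y1
        coco["annotations"].append({
            "id": ann_id,
            "image_id": img_idx,
            "category_id": cat_ids[cat],
            "bbox": [x1, y1, w, h],  # COCO stores [x, y, width, height]
            "area": w * h,
            "iscrowd": 0,
        })
    return coco


# Made-up example data, not taken from either dataset.
example = to_coco(
    images=[("0001.png", 512, 512)],
    boxes=[(0, "car", 10, 20, 50, 60)],
    categories=["car", "truck"],
)
coco_json = json.dumps(example)
```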

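Pretrained Models

The Co-DETR-SwinL-16Epoch-DETR-o365+COCO pretrained weights are used (the last row of the table). It is recommended to also download the swin-large backbone weights; otherwise the pretrained field in the config must be commented out.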

wget https://download.openmmlab.com/mmdetection/v3.0/codetr/co_dino_5scale_swin_large_16e_o365tococo-614254c9.pth
wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth
        

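Algorithm Details

Overall Approach

Two versions were submitted to the final leaderboard B:

9-model version: score 0.48887175566920904 (includes two models trained with external data)
3-model version: score 0.48128947817837825 (no external data needed)

All models are based on Co-DETR; they differ in training data, data augmentation pipeline, network structure, and pretrained parameters.

Data Cleaning

The training and validation sets were cleaned manually, mainly correcting annotations of the truck, van, and freight_car classes. Two versions of the training annotations are kept:

data/track1-A/annotations/train_0518.json
data/track1-A/annotations/train_0527.json (more aggressive cleaning of the van class)

Validation set: data/track1-A/annotations/val_0527.json

Data Augmentation Pipeline

Regular Single-Stage Augmentation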

train_pipeline = [
    dict(type="LoadImageFromFile"),
    dict(type="LoadTirFromPath"),
    dict(type="LoadAnnotations", with_bbox=True),
    dict(type='AdaptiveHistEQU'),       # adaptive contrast enhancement
    dict(type='RandomShiftOnlyImg', max_shift_px=10, prob=0.5),
    dict(
        type="BugFreeTransformBroadcaster",
        mapping={
            "img": ["tir", "img"],
            "img_shape": ["img_shape", "img_shape"],
            "gt_bboxes": ['gt_bboxes', 'gt_bboxes'],
            "gt_bboxes_labels": ['gt_bboxes_labels', 'gt_bboxes_labels'],
            "gt_ignore_flags": ['gt_ignore_flags', 'gt_ignore_flags'],
        },
        auto_remap=True,
        share_random_params=True,
        transforms=[
            dict(type='RandomResize', scale=image_size, ratio_range=(0.75, 1.5), keep_ratio=True),
            dict(type='RandomCrop', crop_type='absolute', crop_size=image_size, allow_negative_crop=False),
            dict(type='RandomFlip', prob=0.5),
            dict(type='Pad', size=image_size, pad_val=dict(img=(114, 114, 114))),
        ],
    ),
    dict(type="CustomPackDetInputs"),
]
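
The share_random_params=True setting in the pipeline above suggests that the RGB image (img) and the thermal image (tir) receive the same random resize, crop, and flip, so the shared boxes stay valid for both. The sketch below illustrates that idea with a horizontal flip on toy list-based images. All function names are hypothetical, and this is not the implementation of BugFreeTransformBroadcaster.

```python
import random


def sample_flip_params(rng, prob=0.5):
    """Draw the random decision once so both modalities can reuse it."""
    return {"flip": rng.random() < prob}


def hflip(image):
    """Horizontally flip an image stored as a list of rows."""
    return [row[::-1] for row in image]


def apply_shared(sample, rng):
    """Apply one sampled flip decision to both the RGB and the thermal image."""
    params = sample_flip_params(rng)
    out = dict(sample)
    if params["flip"]:
        out["img"] = hflip(sample["img"])
        out["tir"] = hflip(sample["tir"])
    return out, params


rng = random.Random(0)
sample = {"img": [[1, 2, 3]], "tir": [[10, 20, 30]]}
augmented, params = apply_shared(sample, rng)
```

If each modality drew its own random parameters instead, one image could be flipped while the other is not, and a single set of boxes could no longer describe both.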
        

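Two-Stage Augmentation Pipeline

Strong augmentations such as mosaic are used for the first few epochs, after which training switches to the regular pipeline (implemented via PipelineSwitchHook). An example pipeline with mosaic: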

train_pipeline_stage1 = [
    dict(
        type='MultiInputMosaic',
        keys=['img', 'tir'],
        prob=1.0,
        img_scale=image_size,
        center_ratio_range=(0.5, 1.05),
        pad_val=114.0,
        bbox_clip_border=True,
        individual_pipeline=[
            dict(type="LoadImageFromFile"),
            dict(type="LoadTirFromPath"),
            dict(type="LoadAnnotations", with_bbox=True),
            dict(type='AdaptiveHistEQU'),
            dict(type='RandomShiftOnlyImg', max_shift_px=10, prob=0.5),
            dict(**transform_broadcast, transforms=[
                dict(type='RandomRotate90', prob=0.5, dir=['l']),
                dict(type='RandomFlip', prob=0.5, direction=['horizontal', 'vertical', 'diagonal']),
            ])
        ]
    ),
    dict(
        **transform_broadcast,
        transforms=[
            dict(type='RandomAffine', scaling_ratio_range=(0.65, 1.5), max_translate_ratio=0.1, max_rotate_degree=0, max_shear_degree=0, border=(-image_size[0]//2, -image_size[1]//2)),
            dict(type='RandomResize', scale=image_size, ratio_range=(0.8, 1.1), keep_ratio=True),
            dict(type='Pad', size_divisor=4, pad_val=dict(img=(114, 114, 114))),
        ],
    ),
    dict(type='FilterAnnotations', min_gt_bbox_wh=(3, 3), keep_empty=False),
    dict(type="CustomPackDetInputs"),
]
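
The switch from the mosaic pipeline to the regular one could be registered roughly as follows. This is a config sketch under assumptions: the switch epoch value and the name train_pipeline_stage2 are hypothetical (the document only says "the first few epochs"), and the stage-2 list is abbreviated. The hook arguments follow the switch_epoch and switch_pipeline parameters of mmdet's PipelineSwitchHook.

```python
# Hypothetical value: the document does not state the exact switch epoch.
switch_epoch = 4

# Placeholder for the regular single-stage pipeline (abbreviated here).
train_pipeline_stage2 = [
    dict(type="LoadImageFromFile"),
    # ... remaining regular transforms ...
    dict(type="CustomPackDetInputs"),
]

# From switch_epoch onward, the hook replaces the training dataset's pipeline.
custom_hooks = [
    dict(
        type="PipelineSwitchHook",
        switch_epoch=switch_epoch,
        switch_pipeline=train_pipeline_stage2,
    ),
]
```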
        

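Network Structure

To support dual-modality input while reusing as much of the pretrained parameters as possible, only the Encoder's attention is modified, in two variants:

mean: self-attention is computed separately on the RGB and infrared features, and the two results are averaged to form the output.
concat: the RGB and infrared features are concatenated along the length dimension to form the query, and the feature maps are concatenated as well (4→8 scales); only the first half of the queries is kept at the output. This requires changing the shape of the DeformAttn reference point parameters (done by scripts/patch_ckpt.py).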
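
The data flow of the two variants can be sketched with a toy attention function. This is only an illustration under assumptions: self_attention below is a plain, projection-free dot-product attention used as a stand-in for the encoder's deformable attention, and the names fuse_mean and fuse_concat are hypothetical.

```python
import math


def self_attention(tokens):
    """Toy single-head self-attention without learned projections.

    Stand-in for the encoder's deformable attention, used only to show data flow.
    tokens: list of equal-length float vectors.
    """
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * v[d] for w, v in zip(weights, tokens))
            for d in range(dim)
        ])
    return out


def fuse_mean(rgb, tir):
    """'mean': attend within each modality, then average the two outputs."""
    a, b = self_attention(rgb), self_attention(tir)
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_concat(rgb, tir):
    """'concat': attend over the joined sequence, keep the first half of the output."""
    joint = self_attention(rgb + tir)
    return joint[: len(rgb)]


rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tir = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
mean_out = fuse_mean(rgb, tir)
concat_out = fuse_concat(rgb, tir)
```

In both variants the output has the same length as the RGB sequence, so the rest of the pretrained network can consume it unchanged.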

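Pretrained Parameter Types

official: the official Co-DETR weights are loaded directly.
pretrain: the official weights are loaded first, a model is trained on the external data, and the other models are then fine-tuned starting from it.

Model Ensembling

9-Model Version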

#	Augmentation	Network	Data version	Pretrained params	Notes	File name
1	single-stage	concat	0518	official		codetr_full_0518data
2	two-stage	concat	0527	official		codetr_full_0527data
3	two-stage	mean	0527	official		mean_fuse_full
4	two-stage	mean	0527	pretrain		mean_fuse_with_pretrained
5	two-stage	concat	0527	official	fold 0	codetr_0527fold0
6	two-stage	concat	0527	official	fold 1	codetr_0527fold1
7	two-stage	concat	0527	official	fold 2	codetr_0527fold2
8	two-stage (stronger)	concat	0527	official		codetr_full_0527data_strong_aug
9	two-stage (stronger)	concat	0527	pretrain		codetr_full_0527data_strong_aug_with_pretrain

Ensemble parameters: skip/nms = 0.07/0.75
Leaderboard A: 0.5384122069865042, leaderboard B: 0.48887175566920904
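
The document does not say which fusion method the skip and nms parameters belong to. One possible reading, sketched below purely as an assumption, is that detections from all models are pooled, boxes scoring below the skip threshold are dropped, and class-wise NMS is applied at the given IoU threshold. The function names are hypothetical, and the repository's actual ensemble code may use a different method.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2, ...)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ensemble(per_model_dets, skip_thr, nms_thr):
    """Pool detections, drop low scores, then run class-wise greedy NMS.

    per_model_dets: one list per model of (x1, y1, x2, y2, score, label).
    """
    pooled = [d for dets in per_model_dets for d in dets if d[4] >= skip_thr]
    pooled.sort(key=lambda d: d[4], reverse=True)
    kept = []
    for det in pooled:
        # Keep a box unless a higher-scoring box of the same class overlaps it too much.
        if all(det[5] != k[5] or iou(det, k) <= nms_thr for k in kept):
            kept.append(det)
    return kept


# Made-up detections from two models for a single image.
model_a = [(0, 0, 10, 10, 0.9, 0), (50, 50, 60, 60, 0.05, 1)]
model_b = [(1, 0, 11, 10, 0.8, 0), (100, 100, 120, 120, 0.6, 2)]
fused = ensemble([model_a, model_b], skip_thr=0.07, nms_thr=0.75)
```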

Models 5/6/7 are trained on the first three folds of a five-fold split of model 2's data (scripts/split_nfold.py).
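
A five-fold split of this kind could look roughly like the sketch below. The function names, the seed, and the convention that fold k is the held-out part are assumptions; scripts/split_nfold.py may organize the split differently.

```python
import random


def split_nfold(image_ids, n_folds=5, seed=0):
    """Shuffle image ids deterministically and deal them into n_folds groups."""
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::n_folds] for i in range(n_folds)]


def train_ids_for_fold(folds, k):
    """Assumed convention: fold k is held out, the remaining folds are used for training."""
    return [i for j, fold in enumerate(folds) if j != k for i in fold]


folds = split_nfold(range(20))
# Only the first three folds are used, matching models 5, 6, and 7.
train_sets = [train_ids_for_fold(folds, k) for k in range(3)]
```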

3-Model Version

#	Augmentation	Network	Data version	Pretrained params	File name
2	two-stage	concat	0527	official	codetr_full_0527data
3	two-stage	mean	0527	official	mean_fuse_full
5	two-stage	concat	0527	official	codetr_0527fold0

Ensemble parameters: skip/nms = 0.05/0.7
Leaderboard A: 0.5316742459794518, leaderboard B: 0.48128947817837825

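Other Details

SWA (stochastic weight averaging)
Learning rate scheduling
Judging by leaderboard A, the score peaks at 10 epochs, so the last epoch's weights are generally not used
Inference hyperparameter: soft_nms iou_thres=0.7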
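
As a reminder of what SWA does at its core, the sketch below averages parameter values element-wise across several checkpoints. The checkpoint contents and the choice of which epochs to average are made up for illustration; the document does not state which checkpoints the authors averaged, and average_checkpoints is a hypothetical helper.

```python
def average_checkpoints(state_dicts):
    """Element-wise mean of parameter values across checkpoints with identical keys."""
    n = len(state_dicts)
    averaged = {}
    for key in state_dicts[0]:
        # Group the values at each position across all checkpoints.
        columns = zip(*(sd[key] for sd in state_dicts))
        averaged[key] = [sum(col) / n for col in columns]
    return averaged


# Toy checkpoints with flat parameter lists; real ones hold tensors.
ckpt_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
ckpt_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
ckpt_c = {"layer.weight": [5.0, 6.0], "layer.bias": [2.0]}
swa_weights = average_checkpoints([ckpt_a, ckpt_b, ckpt_c])
```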

Training

Run bash train.sh to start training.

Testing

Run bash test.sh [input_dir] [data_root] [output_json], for example:

bash test.sh data/track1-A/test data/track1-A data/result/pred.json

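See index.py for the exact invocation.

⚠️ Notes:

train.sh has not been fully tested and may not run as-is.
When reproducing, the two models that use external data can be ignored: ensembling only the other 7 models does not affect the online score, and reproducing only the 3 models does not affect the leaderboard B ranking.
If GPU memory is insufficient, set backbone.with_cp=True or reduce batch_size to 1, at the cost of longer training time.
Code layout: projects holds the core code, scripts holds utility scripts, tools holds training/testing scripts adapted from mmdet, and pretrained weights go in the ckpt folder.
For training, use the cleaned annotation files (the versions in data/track1-A/annotations).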

6th-place solution
