7th place solution - 3D nnU-Net + blob regression (again)

Author: Stefan Denner (and team members)
Published: 2025-10-16
Competition: RSNA Intracranial Aneurysm Detection

Thanks to @evancalabrese, @ryanholbrook, RSNA, and Kaggle for organizing the Intracranial Aneurysm Detection competition!

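Overview (TLDR)

Here is a brief overview of our solution, which is straightforward and easy to implement:

We formulate the task as multi-channel blob regression, optimize it with a TopK (20%) BCE loss, and take the maximum of each channel as the predicted probability.
We build on nnU-Net, the leading framework for 3D medical image segmentation. We had already adapted it for our 2nd place solution in BYU - Locating Bacterial Flagellar Motors 2025.
Our model is a 3D U-Net with a residual encoder, trained from scratch.
Inference uses a single model without test-time augmentation (due to time constraints).
Our model scored 0.83 / 0.83 on the public / private leaderboard.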
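
The TopK BCE loss named in the overview can be sketched as follows. This is a minimal NumPy illustration, not the training code: it assumes the predictions are already probabilities, and that the 20% hardest voxels are selected jointly over all samples and channels of the batch. The function name `topk_bce` is made up for this example.

```python
import numpy as np

def topk_bce(pred, target, k=0.2, eps=1e-7):
    """Mean binary cross-entropy over the fraction k of voxels with the highest loss."""
    pred = np.clip(pred, eps, 1.0 - eps)   # avoid log(0)
    loss = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    flat = loss.ravel()                    # assumption: pool all voxels of the whole batch
    n_keep = max(1, int(round(k * flat.size)))
    worst = np.sort(flat)[-n_keep:]        # the n_keep largest per-voxel losses
    return float(worst.mean())

rng = np.random.default_rng(0)
target = rng.random((2, 14, 8, 8, 8))      # toy batch: 2 samples, 14 channels
pred = rng.random((2, 14, 8, 8, 8))
full = topk_bce(pred, target, k=1.0)       # plain BCE over every voxel
hard = topk_bce(pred, target, k=0.2)       # only the hardest 20%
```

Averaging only over the hardest voxels keeps the large, easy background from dominating the gradient, which is why `hard` is never smaller than `full`.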

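Who are we?

We are a group of colleagues (scientists and PhD students) from the Division of Medical Image Computing at the German Cancer Research Center and from Helmholtz Imaging. Our expertise is 3D image analysis, in particular solving 3D segmentation problems and developing infrastructure that brings algorithms into clinical practice.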

Method

We model the task as heatmap regression, building on our work for the 2nd place solution in BYU - Locating Bacterial Flagellar Motors 2025.

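Data used

We used the pydicom library to convert the DICOM series to .nii.gz format, processing series that consist of 2D DICOM files in parallel. We also used this library to extract the spacing, origin, and direction information. The resulting image was then converted to a SimpleITK image and reoriented to RAS.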

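To reduce compute, we derived a box-shaped region of interest (ROI) of [200, 160, 160] mm in the central superior part of the image. We made sure that all aneurysms in the training set lie inside the ROI. Figure 1 shows the ROI overlaid on an example image.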
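
To make the ROI arithmetic concrete, the sketch below converts a [200, 160, 160] mm box into voxel bounds for a given spacing. Several things are assumptions for illustration only: the axis order (z, y, x), that the last z index is the most superior slice, that the box is centered in-plane, and the helper name `roi_bounds`.

```python
def roi_bounds(shape, spacing, roi_mm=(200.0, 160.0, 160.0)):
    """Return (start, stop) voxel indices per axis, in (z, y, x) order."""
    # Physical size divided by spacing gives the size in voxels, capped at the image size.
    size = [min(n, int(round(mm / sp))) for n, mm, sp in zip(shape, roi_mm, spacing)]
    z0 = shape[0] - size[0]          # assumption: last z index is the most superior slice
    y0 = (shape[1] - size[1]) // 2   # centered in-plane
    x0 = (shape[2] - size[2]) // 2
    return [(z0, z0 + size[0]), (y0, y0 + size[1]), (x0, x0 + size[2])]

# Toy volume of 400 x 512 x 512 voxels at 0.7 x 0.47 x 0.47 mm spacing.
bounds = roi_bounds(shape=(400, 512, 512), spacing=(0.7, 0.47, 0.47))
```

For this toy volume the box covers 286 slices along z and 340 voxels along each in-plane axis.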

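We observed that several series were flawed, with issues such as unexpected orientations, empty series, shunt artifacts, motion artifacts, or images with empty superior space. We decided to keep series with minor defects (such as a wrong orientation) and repaired them by flipping. Series with more severe defects (such as completely empty images) were discarded. In total, ten volumes were discarded.

Figure 1. Axial, coronal, and sagittal views of an example image series with the ROI box in red.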

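Preprocessing

Data preprocessing was done with the self-configuring segmentation framework nnU-Net, following its 3D full-resolution configuration. Volumes were loaded with a SimpleITK reader that also attempts to enforce RAS orientation. All images were resampled to the median spacing found across all training images, [0.70, 0.47, 0.47] mm, and normalized with z-scoring based on the global mean and global standard deviation extracted from the training dataset.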
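
The two preprocessing steps boil down to simple arithmetic, sketched here in plain Python. The helper names and the toy numbers are assumptions; the actual interpolation is handled by nnU-Net and is not shown.

```python
def resampled_shape(shape, spacing, target_spacing=(0.70, 0.47, 0.47)):
    """Voxel grid size after resampling to the target spacing (same physical extent)."""
    return tuple(int(round(n * s / t)) for n, s, t in zip(shape, spacing, target_spacing))

def zscore(values, mean, std):
    """Normalize intensities with a global mean and standard deviation."""
    return [(v - mean) / max(std, 1e-8) for v in values]

# A toy series with twice the target spacing doubles in size along every axis.
new_shape = resampled_shape(shape=(100, 256, 256), spacing=(1.4, 0.94, 0.94))
# Toy intensities with made-up global statistics.
normalized = zscore([0.0, 100.0, 200.0], mean=100.0, std=50.0)
```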

Images were resampled with a dedicated PyTorch-based resampler which, given the strict time limit, is faster than other commonly used functions such as scipy.ndimage.zoom.

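Network architecture

We use nnU-Net's ResEnc, which is essentially a U-Net with a residual encoder and a lightweight convolutional decoder. The architecture has six stages with [32, 64, 128, 256, 320, 320] features, respectively.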

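Training procedure

We split the provided challenge data into five cross-validation folds, stratifying by modality across folds rather than by vessel class, because we wanted to ensure adequate performance across all image modalities. Since we joined the challenge relatively late and had many potential design choices to test, most hyperparameter tuning happened exclusively on the first fold of the cross-validation scheme.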
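
A modality-stratified split can be sketched as a round-robin assignment within each modality. This is only an illustration of the idea: the series ids and modality labels are made up, and a real split would normally shuffle each group with a fixed seed first.

```python
from collections import defaultdict

def stratified_folds(modalities, n_folds=5):
    """Map series id -> fold index so that each modality is spread evenly over the folds."""
    by_modality = defaultdict(list)
    for series_id, modality in sorted(modalities.items()):
        by_modality[modality].append(series_id)
    assignment = {}
    for modality in sorted(by_modality):
        for i, series_id in enumerate(by_modality[modality]):
            assignment[series_id] = i % n_folds   # round-robin within one modality
    return assignment

# Toy data: 10 series of one modality and 10 of another (hypothetical ids and labels).
toy = {f"s{i:02d}": ("CTA" if i < 10 else "MRA") for i in range(20)}
folds = stratified_folds(toy)
```

In this toy case every fold receives two series of each modality.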

Blob regression with nnU-Net

nnU-Net is built for semantic segmentation, and that includes the data structure it expects. To make it compatible with aneurysm regression, we stored the ground truth as semantic segmentation maps in which each aneurysm is encoded as a sphere (r = 5 voxels) carrying an integer label for the vessel class of the ground-truth aneurysm. nnU-Net treats these spheres as segmentations and passes them through its usual data loading and augmentation pipeline, so rotations, mirroring, etc. are applied properly. We did not apply mirroring augmentation along the left/right axis, because several labels encode left or right. At the end of the data loading pipeline we inject a custom transform that converts each aneurysm instance into a blob in its respective channel. We model the 14 classes as separate channels, where the 14th class (Aneurysm Present) is the pixelwise maximum of the 13 anatomical classes.
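
The channel layout described above can be illustrated with a small NumPy sketch. It only shows how an integer label map maps to 14 channels with the last one as the voxelwise maximum; it uses binary masks instead of blobs, and the function name `to_channels` and the label convention (0 = background, 1 to 13 = vessel class) are assumptions.

```python
import numpy as np

def to_channels(label_map, n_anatomical=13):
    """Turn an integer label map into 13 anatomical channels plus 'Aneurysm Present'."""
    channels = np.stack([(label_map == c).astype(np.float32)
                         for c in range(1, n_anatomical + 1)])
    present = channels.max(axis=0, keepdims=True)   # voxelwise max over the 13 channels
    return np.concatenate([channels, present], axis=0)

label_map = np.zeros((8, 8, 8), dtype=np.int64)
label_map[2, 2, 2] = 3    # toy aneurysm voxel of class 3
label_map[5, 5, 5] = 7    # toy aneurysm voxel of class 7
target = to_channels(label_map)
```

In the real pipeline each instance would be replaced by a blob before taking the maximum, so the 14th channel would hold the union of all blobs.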

We use 'EDT blobs', which are basically 3D spheres transformed with the Euclidean Distance Transform (EDT) and rescaled to the value range [0, 1]. The sphere size optimized on the first cross-validation fold was 65 voxels. We experimented with sphere sizes from 15 to 95 voxels in radius.
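
A blob of this kind can be approximated in closed form. For a single sphere, the distance transform is roughly the radius minus the distance to the center, so after rescaling the value falls linearly from 1 at the center to 0 at the surface. The sketch below uses that shortcut instead of an actual EDT call, which is an assumption made to keep the example dependency-free; `edt_blob` is a made-up name.

```python
import numpy as np

def edt_blob(shape, center, radius):
    """Sphere with values falling linearly from 1 at the center to 0 at the surface."""
    grid = np.indices(shape, dtype=np.float32)
    offset = grid - np.asarray(center, dtype=np.float32).reshape(3, 1, 1, 1)
    dist = np.sqrt((offset ** 2).sum(axis=0))       # distance of every voxel to the center
    return np.clip(1.0 - dist / radius, 0.0, 1.0)   # approximates EDT(sphere) / max(EDT)

# Small toy blob; the writeup reports radius 65 on full-size patches.
blob = edt_blob(shape=(32, 32, 32), center=(16, 16, 16), radius=10)
```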

Figure 2. Blob (EDT, radius 65) located at the right middle cerebral artery (image not yet resampled).

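Hyperparameters

Our final model was trained with a batch size of 32 and a patch size of 96x160x128 voxels. The initial learning rate was 0.01 and was decayed over the course of training with a polyLR schedule (same as default nnU-Net). We trained for 3000 epochs (250 iterations per epoch) with SGD. The loss function was binary cross-entropy computed only on the 20% worst voxels (the voxels with the highest loss values, determined across the whole batch).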
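
The polynomial learning rate decay can be written in one line. The exponent of 0.9 is the value commonly cited as the nnU-Net default and is an assumption here, since the text only says the schedule matches default nnU-Net.

```python
def poly_lr(epoch, max_epochs=3000, initial_lr=0.01, exponent=0.9):
    """Polynomial decay from initial_lr at epoch 0 down to 0 at max_epochs."""
    return initial_lr * (1.0 - epoch / max_epochs) ** exponent

# Learning rate at the start, at the halfway checkpoint, and at the end of training.
schedule = [poly_lr(e) for e in (0, 1500, 3000)]
```

With these settings the learning rate at epoch 1500 is still a bit more than half of its initial value.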

Our final model was trained on 4x A100 40GB using PyTorch DDP. Training took 4.5 days.

Inference

We largely use nnU-Net's inference infrastructure. The input series is split into a set of patches, and blobs are regressed for each patch. We take the maximum of each channel as the class probability and max-aggregate across patches to obtain the final predictions.
We used the 2x T4 instance for prediction and split the patches of each input series evenly across the two GPUs. We always used a single model, no ensemble. Inference took about 8 hours.
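
The aggregation step above can be sketched with NumPy: take the maximum per channel inside each patch, then the maximum over patches. The round-robin split is one simple way to divide patches between two workers and is an assumption, as are both function names.

```python
import numpy as np

def series_probabilities(patch_outputs):
    """patch_outputs: list of arrays shaped (14, z, y, x) with values in [0, 1]."""
    per_patch = [p.reshape(p.shape[0], -1).max(axis=1) for p in patch_outputs]  # max per channel
    return np.max(np.stack(per_patch), axis=0)   # max over patches, one value per class

def split_evenly(items, n_workers=2):
    """Round-robin split, e.g. of patch indices over two GPUs."""
    return [items[i::n_workers] for i in range(n_workers)]

patches = [np.zeros((14, 4, 4, 4), dtype=np.float32) for _ in range(2)]
patches[0][3, 1, 1, 1] = 0.8     # toy response for class index 3 in the first patch
patches[1][3, 0, 0, 0] = 0.6
patches[1][13, 2, 2, 2] = 0.9    # toy response in the last channel
probs = series_probabilities(patches)
work = split_evenly(list(range(5)))
```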

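Results

Unfortunately, we were hit quite hard by instabilities of the Kaggle platform.
Although our model finished training for 3000 epochs, only submissions with checkpoints up to epoch 1500 (three days before the deadline) succeeded (private and public LB 0.83). Later submissions timed out, even though only the model weights differed.
Our internal validation showed that later checkpoints, TTA, and Gaussian weighting of patches could have further improved our performance (earlier submissions showed this as well).
Surprisingly, our internal performance reached 0.9, which we did not achieve on the leaderboard. We do not know where this shift comes from. One reason might be that we had to wrap inference in a try/catch block, because otherwise an error was thrown after 15 minutes. We do not know the exact cause.
We did not exploit the segmentation masks, which could have been added as auxiliary outputs during training to help the model localize better. Because we joined late, we did not have time to investigate this. Other teams, however, showed that this improved their performance.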

What did not work?

We also framed the problem as a detection problem and tried to solve it with the self-configuring detection framework nnDetection. This approach performed better than the solution presented here on the public leaderboard, but underperformed on the private leaderboard and in our internal validation.
- nnDetection was trained on instance segmentation versions of the labels, also on cropped data, and consists of a self-configured Retina U-Net architecture that learns from aneurysm positions encoded as boxes and from aneurysm segmentations. Unlike the solution here, it resampled the input series to an isotropic spacing of [1.0, 1.0, 1.0] mm and was trained for 100 epochs (2500 iterations per epoch) with a batch size of 4, a hybrid loss combining an L1 loss for box coordinate regression and a focal loss for box classification, polynomial learning rate scheduling starting from an initial value of 0.001, and an SGD optimizer with Nesterov momentum. Predicted boxes were postprocessed with non-maximum suppression at an intersection-over-union threshold of 0.1. We additionally managed to run inference with 8 test-time augmentations and an inference patch overlap of 0.25.
We also implemented resampling to an isotropic spacing of [1.0, 1.0, 1.0] mm, given its potential for faster image processing. However, it substantially worsened our results, so we discontinued it early on.
We also explored co-training with external aneurysm datasets containing binary classes to better model the Aneurysm Present class (ADAM, Large IA Segmentation dataset, INSTED, Lausanne TOF-MRA Aneurysm Cohort, Royal Brisbane TOFMRA Intracranial Aneurysm Database, Jianxiaokuang aneurysm dataset). We realized afterwards that some of these datasets (ADAM, Large IA Segmentation dataset) were not allowed, so we discarded them and ran the co-training without them. In the end, co-training did not really help, so we went back to training from scratch on the challenge cases only.
We first started by processing the whole image, but the time limit forced us to crop around the ROI, which also resulted in better performance.
As mentioned above, in our final model we simply max-aggregate the patch predictions. However, models are known to be somewhat uncertain near patch edges. A common strategy to mitigate this is Gaussian weighting of each patch's predictions (high weight at the center, low weight at the borders). In earlier submissions we saw that this improved our performance. In our final model, however, we could not apply this strategy because of the platform instabilities.
We also tried training with larger patch sizes, which, surprisingly, did not improve our scores.
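
The Gaussian weighting of patch predictions mentioned above can be sketched as a weighted average of overlapping patches. This is a generic illustration, not the code from the earlier submissions: the sigma of one eighth of the patch size, the weight floor of 1e-4, and the function names are all assumptions.

```python
import numpy as np

def gaussian_weights(patch_size, sigma_scale=0.125, floor=1e-4):
    """Separable Gaussian importance map: 1 at the patch center, small at the borders."""
    weights = np.ones(patch_size, dtype=np.float32)
    for axis, n in enumerate(patch_size):
        coords = np.arange(n, dtype=np.float32) - (n - 1) / 2.0
        profile = np.exp(-0.5 * (coords / (n * sigma_scale)) ** 2)
        shape = [1] * len(patch_size)
        shape[axis] = n
        weights = weights * profile.reshape(shape)
    # The floor keeps border voxels from receiving a weight of (numerically) zero.
    return np.maximum(weights / weights.max(), floor)

def blend(volume_shape, patches, corners, weights):
    """Weighted average of overlapping patch predictions placed at the given corners."""
    acc = np.zeros(volume_shape, dtype=np.float32)
    norm = np.zeros(volume_shape, dtype=np.float32)
    for patch, corner in zip(patches, corners):
        region = tuple(slice(c, c + s) for c, s in zip(corner, patch.shape))
        acc[region] += patch * weights
        norm[region] += weights
    return acc / np.maximum(norm, 1e-8)

w = gaussian_weights((9, 9, 9))
patches = [np.full((9, 9, 9), 0.5, dtype=np.float32) for _ in range(2)]
blended = blend((9, 9, 13), patches, corners=[(0, 0, 0), (0, 0, 4)], weights=w)
```

With this scheme the per-class probability would be the channel maximum of the blended volume rather than the maximum over raw patch outputs.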

What would we have wished for?

As we already stated in the discussion forum, the signature of the predict function limited us quite a lot in how we could parallelize processing. More on this here.
This requirement made things even more difficult for us, since we required a specific numpy version, which forced us to run our code as a subprocess. We ended up spawning a proxy worker and communicating with it via standard output. This added significant boilerplate and complexity, and felt unnatural.
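
The proxy worker pattern can be sketched with the standard library alone. Everything here is hypothetical: the JSON-lines protocol, the class name `ProxyWorker`, and the dummy worker body, which only echoes the request size. In a setup like the one described, the `python` argument would point at an interpreter from an environment with the required numpy version.

```python
import json
import subprocess
import sys

# Hypothetical worker: reads one JSON request per line on stdin, answers on stdout.
WORKER_CODE = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    reply = {"series": request["series"], "n_values": len(request["values"])}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
"""

class ProxyWorker:
    def __init__(self, python=sys.executable):
        # Start the worker once and keep it alive across predict calls.
        self.proc = subprocess.Popen([python, "-c", WORKER_CODE],
                                     stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

    def predict(self, series, values):
        self.proc.stdin.write(json.dumps({"series": series, "values": values}) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())   # blocks until the worker replies

    def close(self):
        self.proc.stdin.close()    # end of input lets the worker loop exit
        self.proc.wait()
        self.proc.stdout.close()

worker = ProxyWorker()
reply = worker.predict("series_001", [0.1, 0.2, 0.3])
worker.close()
```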

We would also have wished for longer submission time limits and a server architecture that depends less on the number of submissions sent by other teams, since many of our submissions timed out during the last few days of the challenge due to the increasing workload.
On a broader scope: Kaggle's notebook-style submissions made things quite hard for us (also the last time). Being able to simply use Docker containers would have eased our lives a lot, because they allow much higher flexibility.

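Acknowledgements

We thank RSNA for organizing and Kaggle for hosting this competition. We furthermore want to give a shoutout to our Division of Medical Image Computing at the German Cancer Research Center and to Helmholtz Imaging.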
