590. UBC Ovarian Cancer Subtype Classification and Outlier Detection (UBC-OCEAN) | UBC-OCEAN
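WSI images contain black pixels (all three channels are zero), while TMA images do not. Therefore, an image whose width and height are both below 6000 pixels is classified as WSI if black pixels cover more than 5% of its total area (in the training data, every WSI has more than 10% black pixels), and as TMA otherwise.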
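A minimal sketch of this rule, assuming the image is loaded as an H×W×3 NumPy array. The write-up only spells out the rule for images below 6000 pixels on both sides, so treating larger images as WSI is an assumption here, and `classify_wsi_or_tma` is a hypothetical name.

```python
import numpy as np


def classify_wsi_or_tma(img, size_limit=6000, black_ratio=0.05):
    """Return "WSI" or "TMA" for an H x W x 3 image array."""
    h, w = img.shape[:2]
    if h >= size_limit or w >= size_limit:
        # Assumption: the rule is only given for small images; large ones are taken as WSI.
        return "WSI"
    # Black pixel: all three channels are zero.
    black = np.all(img == 0, axis=2)
    return "WSI" if black.mean() > black_ratio else "TMA"
```

The 5% threshold leaves a wide margin below the roughly 10% minimum observed on training WSIs.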
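Each WSI is first downscaled to 0.33× its original size and then split into 512×512-pixel tiles. The tiles are then sorted into three quality tiers by their share of background pixels, counted with `np.sum(np.ptp(tile, axis=2) < 20)`, i.e. pixels whose channel spread (max minus min) is below 20. The tiers are a background share of at most 50%, 50 to 65%, and 65 to 75%.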
Tile-generation code used at inference:
```python
import os
import random

import cv2
import numpy as np

TILE = 512

# Quality tiers, tried in order. Each entry is (lower, upper, min_count, max_count):
# a tile qualifies when lower < background fraction <= upper; the tier is only tried
# while fewer than min_count tiles have been saved, and it stops once max_count
# tiles have been saved in total.
TIERS = [
    (float("-inf"), 0.50, 20, 60),
    (0.50, 0.65, 20, 40),
    (0.65, 0.75, 10, 10),
]


def resize_image_and_make_tile(name, out_path, scale, split="test"):
    # split: "train" or "test" (the original read it from a global `inference`)
    path = f"/kaggle/input/UBC-OCEAN/{split}_images/{name}.png"
    image = cv2.imread(path)
    if image is None:
        raise FileNotFoundError(path)
    image = cv2.resize(image, (0, 0), fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
    # The original also loaded f"{pred_mask_512_folder}/{name}.npy" here but never used it.
    os.makedirs(f"{out_path}/{name}", exist_ok=True)

    # (row, column) indices of every full 512x512 tile
    idxs = [(y, x) for y in range(image.shape[0] // TILE)
            for x in range(image.shape[1] // TILE)]
    count = 0
    for lower, upper, min_count, max_count in TIERS:
        if count >= min_count:
            continue  # enough tiles from the better tiers already
        random.shuffle(idxs)
        for y, x in idxs:
            tile = image[y * TILE:(y + 1) * TILE, x * TILE:(x + 1) * TILE, :]
            # Background pixel: channel spread (max - min) below 20
            bg_frac = np.sum(np.ptp(tile, axis=2) < 20) / (TILE * TILE)
            if lower < bg_frac <= upper:
                cv2.imwrite(f"{out_path}/{name}/{x}_{y}.png", tile)
                count += 1
                if count >= max_count:
                    break
    return count
```
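The tier rule applied by the tiling loops can be isolated in a small helper, which makes the thresholds easy to check on synthetic tiles. A minimal sketch; `background_fraction` and `tile_tier` are hypothetical names, and the thresholds are the ones in the tiling code.

```python
import numpy as np


def background_fraction(tile, spread=20):
    """Share of pixels whose channel spread (max - min) is below `spread`."""
    return float(np.mean(np.ptp(tile, axis=2) < spread))


def tile_tier(tile):
    """0 = best tier, 1 and 2 = fallback tiers, None = too much background to keep."""
    frac = background_fraction(tile)
    if frac <= 0.50:
        return 0
    if frac <= 0.65:
        return 1
    if frac <= 0.75:
        return 2
    return None
```

Note that a flat grey or white pixel and a black pixel both count as background, since only the spread between channels matters.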
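Only WSI tiles are used for training. In each batch, 6 tiles are randomly sampled from each image.
Loss function: binary cross-entropy (BCE)
Backbones: efficientnetb4, efficientnet_v2s, maxvit_tiny (the model settings differ slightly between backbones)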
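A minimal sketch of the per-image sampling and the loss, written in NumPy rather than a deep-learning framework. Assumptions: an image with fewer than 6 tiles is sampled with replacement, and each tile's target is its image's one-hot subtype label (the write-up does not describe how the aux target is built). `sample_tiles` and `bce_with_logits` are hypothetical names.

```python
import numpy as np


def sample_tiles(tile_paths, k=6, rng=None):
    """Pick k tiles of one image at random (with replacement only if it has fewer than k)."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(len(tile_paths), size=k, replace=len(tile_paths) < k)
    return [tile_paths[i] for i in idx]


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy, computed from logits in a numerically stable form."""
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    # max(x, 0) - x * t + log(1 + exp(-|x|)) equals -[t log s(x) + (1 - t) log(1 - s(x))]
    loss = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    return float(loss.mean())
```

With BCE each class output is an independent sigmoid, so an image can score low on every subtype, which fits the later thresholding for "Other".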
Each tile is predicted by the model, and the tile predictions are aggregated per image:
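```python
import numpy as np

# tile_df: one row per tile with image_id, class scores pred_0..pred_4 and aux
pred_cols = ["pred_0", "pred_1", "pred_2", "pred_3", "pred_4"]
# Per tile: score of the top class and that class's index
tile_df["prob"] = tile_df[pred_cols].max(axis=1)
tile_df["pred"] = np.argmax(tile_df[pred_cols].values, axis=1)
# Per (image, predicted class): mean score and mean aux output over those tiles
tile_df = (tile_df.groupby(["image_id", "pred"])[["prob", "aux"]]
           .mean()
           .reset_index())
# Per image: keep the class whose tiles have the highest mean score
idx = tile_df.groupby("image_id")["prob"].idxmax()
wsi_df = tile_df.loc[idx].reset_index(drop=True)
```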
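If the mean aux_label prediction is below 0.5, the image is predicted as "Other" (the score is almost the same as never predicting "Other", perhaps +0.01).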
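A minimal sketch of the "Other" rule applied to a per-image table shaped like `wsi_df` from the aggregation step, assuming the mean aux prediction is the `aux` column kept there (the mean over the tiles of the winning class). The five names in `CLASS_NAMES` are the UBC-OCEAN subtype labels, but which `pred_i` column maps to which label is not stated, so that order is hypothetical, as is the name `assign_labels`.

```python
import numpy as np
import pandas as pd

# Hypothetical order of pred_0..pred_4; the write-up does not state it.
CLASS_NAMES = ["CC", "EC", "HGSC", "LGSC", "MC"]


def assign_labels(wsi_df, class_names=CLASS_NAMES, aux_threshold=0.5):
    """Map each image's class index to a name, or to "Other" when its mean aux output is low."""
    out = wsi_df.copy()
    names = np.asarray(class_names)[out["pred"].to_numpy()]
    out["label"] = np.where(out["aux"] < aux_threshold, "Other", names)
    return out


# Toy per-image table with the same columns as wsi_df
wsi_df = pd.DataFrame({
    "image_id": [101, 102],
    "pred": [2, 0],
    "prob": [0.91, 0.80],
    "aux": [0.72, 0.31],
})
result = assign_labels(wsi_df)
```

Here image 101 keeps its subtype label and image 102 becomes "Other" because its aux value is below 0.5.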
Step 1. Crop the TMA image
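```python
import cv2
import numpy as np


def crop_tma(img):
    # Kernel size scales with the image (1/150 of the shorter side), capped at 20
    ks = min(min(img.shape[0], img.shape[1]) // 150, 20)
    # Tissue mask: channel spread (max - min) above 20
    mask = (img.max(axis=2) - img.min(axis=2)) > 20
    # Erode to drop small specks before taking the bounding box
    kernel = np.ones((ks, ks), np.uint8)
    mask = cv2.erode(mask.astype(np.uint8), kernel)
    nonzero_pixels = np.column_stack(np.where(mask > 0))  # (N, 2) array of (y, x)
    # Too little tissue (2*N < H*W*3/60, i.e. under ~2.5% of pixels): keep the image as is
    if nonzero_pixels.size < (img.size // 60):
        return img
    # Crop to the tissue bounding box, padded by ks pixels
    min_y, min_x = np.min(nonzero_pixels, axis=0)
    max_y, max_x = np.max(nonzero_pixels, axis=0)
    return img[max(0, min_y - ks):max_y + ks + 1, max(0, min_x - ks):max_x + ks + 1, :]
```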
Step 2. Resize to 512×512 (TMA size × 0.33 × 0.5 ≈ 512, so the cropped image can be resized directly to 512 for prediction).
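Step 3. Predict with the model trained on WSI tiles.
If the aux_label prediction is below 0.5, predict "Other" (compared with never predicting "Other" for TMAs: public LB +0.03, private LB +0.06).
Voting ensemble (probably only about +0.01 over a single model).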
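The write-up does not say how votes are cast or how ties are broken. A minimal sketch, assuming each model contributes one label per image (after the "Other" rule) and a tie goes to the label predicted by the model listed first; `majority_vote` is a hypothetical helper.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label that appears first in `labels`."""
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


# One list of labels per image, one label per model
preds = {
    "img_a": ["HGSC", "EC", "HGSC"],
    "img_b": ["CC", "Other", "EC"],
}
final = {image_id: majority_vote(labels) for image_id, labels in preds.items()}
```

With three models a full three-way split falls back to the first model, so the ordering of the backbones matters only in that case.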
Image segmentation