6th place solution

633. Child Mind Institute — Problematic Internet Use | child-mind-institute-problematic-internet-use

Start: 2024-09-19 End: 2024-12-19 Health management and public health data algorithm competition

Author: m_furu (EXPERT)
Published: 2024-12-24
Competition rank: 6th place

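Hello, fellow participants.
Thank you to the competition organizers for giving us this valuable experience.

Two months ago I gave up trying to catch up with the more skilled participants. I was surprised and puzzled by how my rank changed after I gave up. I recognize that this result is due to luck, not my ability.

Below I share the solution that brought me this unexpected result.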

Overview

Model
Ensemble of three models (simple average).
The models are a Vision Transformer (with modified input-layer code), lightgbm, and catboost.
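Preprocessing
Aggregate the parquet data to one row per id using mean, std, etc.
Convert categorical variables with 'to_dummies (polars)'.
Fill nulls with 'group_by('Basic_Demos-Age', 'Basic_Demos-Sex').mean()' computed on the training data.
Normalize features with min-max scaling based on the training data.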
Training
Metric: mae
Cross-validation with 'StratifiedKFold (n_splits=5, y:'sii')'.
Like many other participants, I used 'threshold_rounder' (after ensembling).
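As a rough illustration of the training setup above, the sketch below runs 5-fold stratified cross-validation, averages three sets of out-of-fold predictions, and then tunes rounding thresholds on the averaged predictions. The data is synthetic, the three "models" are one-feature least-squares fits standing in for the ViT, LightGBM, and CatBoost models, and this `threshold_rounder` follows the pattern commonly seen in public notebooks for this competition; the author's version may differ.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import StratifiedKFold


def threshold_rounder(pred, thresholds):
    """Map continuous predictions to classes 0..3 using three cut points."""
    return np.digitize(pred, np.sort(thresholds))


def neg_qwk(thresholds, y_true, pred):
    """Negative quadratic weighted kappa, so that minimize() maximizes QWK."""
    rounded = threshold_rounder(pred, thresholds)
    return -cohen_kappa_score(y_true, rounded, weights="quadratic")


rng = np.random.default_rng(0)
y = rng.integers(0, 4, size=200)                 # synthetic 'sii' labels (assumption)
X = y[:, None] + rng.normal(0, 1.0, (200, 3))    # synthetic features (assumption)

# Stratify the folds on the target, as in the post.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
oof = np.zeros((3, len(y)))                      # out-of-fold predictions per model
for train_idx, valid_idx in skf.split(X, y):
    for m in range(3):
        # Stand-in model: a least-squares line on one feature.
        slope, intercept = np.polyfit(X[train_idx, m], y[train_idx], 1)
        oof[m, valid_idx] = slope * X[valid_idx, m] + intercept

ensemble = oof.mean(axis=0)                      # simple average of the three models

# Tune the thresholds on the ensembled out-of-fold predictions.
init = np.array([0.5, 1.5, 2.5])
res = minimize(neg_qwk, init, args=(y, ensemble), method="Nelder-Mead")
qwk_default = -neg_qwk(init, y, ensemble)
qwk_tuned = -res.fun
```

Because the thresholds are fitted on the same out-of-fold predictions they are scored on, the score after rounding is optimistic, which is worth keeping in mind when comparing CV with leaderboard scores.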
Notebook
Click here to view

Results

CV (before threshold_rounder): 0.408, CV (after threshold_rounder): 0.481, Public LB: 0.471, Private LB: 0.476

What worked

Using the Transformer
Best result before using it (two-model ensemble)
CV (before threshold_rounder): 0.389, CV (after threshold_rounder): 0.483, Public LB: 0.463, Private LB: 0.472

What did not work

Optimizing the ensemble weights
Best result
CV (before threshold_rounder): 0.398, CV (after threshold_rounder): 0.483, Public LB: 0.454, Private LB: 0.463
Metric: quadratic weighted kappa
Best result
CV (before threshold_rounder): 0.429, CV (after threshold_rounder): 0.485, Public LB: 0.457, Private LB: 0.470
Using 'PCIAT-PCIAT_Total' as the target
Best result
CV (before threshold_rounder): not calculated, CV (after threshold_rounder): 0.481, Public LB: 0.461, Private LB: 0.471
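The post does not say how the ensemble weights were optimized. One plausible approach, sketched here purely as an assumption, is to search for non-negative weights summing to 1 that minimize MAE on out-of-fold predictions. The data, the three stand-in prediction vectors, and the function names are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def blend(raw_weights, preds):
    """Weighted average of model predictions; weights are made non-negative and sum to 1."""
    w = np.abs(raw_weights)
    w = w / w.sum()
    return w @ preds


def blend_mae(raw_weights, preds, y):
    """Mean absolute error of the blended prediction."""
    return float(np.mean(np.abs(blend(raw_weights, preds) - y)))


rng = np.random.default_rng(0)
y = rng.uniform(0, 3, size=300)                  # synthetic target (assumption)
noise_scales = np.array([0.3, 0.6, 0.9])         # three stand-in models of differing quality
preds = y + rng.normal(0, 1, (3, 300)) * noise_scales[:, None]

equal = np.ones(3) / 3                           # the simple average used in the final solution
res = minimize(blend_mae, equal, args=(preds, y), method="Nelder-Mead")
best_weights = np.abs(res.x) / np.abs(res.x).sum()
mae_equal = blend_mae(equal, preds, y)
mae_tuned = res.fun
```

Weights tuned this way can only match or improve the score on the data they are fitted to. In the results above, the tuned ensemble scored slightly higher in CV after rounding (0.483 vs 0.481) but lower on both leaderboards, which is consistent with the weights overfitting the out-of-fold predictions.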

I hope this post helps you make sense of the surprising shake-up in the rankings.
Thanks for reading!
