1. Leaderboard Scores

Public leaderboard score: 0.15187; private leaderboard score: 0.15349

2. Data Preprocessing

2.1 Graph Node Features

Input node features (one-hot encoding of each nucleotide):

RNA_ONE_HOT = {
    'A': [1, 0, 0, 0],
    'U': [0, 1, 0, 0],
    'G': [0, 0, 1, 0],
    'C': [0, 0, 0, 1]
}
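The mapping above can be turned into a per-node feature matrix as shown below. This is a minimal sketch assuming NumPy; the helper name `encode_sequence` is hypothetical, and any base outside A/U/G/C simply raises a `KeyError`. The 4 output columns correspond to `dim_in: 4` in the config in section 3.3.

```python
import numpy as np

# Same mapping as RNA_ONE_HOT above: A, U, G, C -> 4-dim one-hot vectors.
RNA_ONE_HOT = {
    'A': [1, 0, 0, 0],
    'U': [0, 1, 0, 0],
    'G': [0, 0, 1, 0],
    'C': [0, 0, 0, 1],
}


def encode_sequence(seq: str) -> np.ndarray:
    """Return an (L, 4) float32 node-feature matrix for an RNA sequence (hypothetical helper)."""
    return np.array([RNA_ONE_HOT[base] for base in seq.upper()], dtype=np.float32)


x = encode_sequence("GGAUC")
print(x.shape)  # (5, 4)
```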

Output node features (targets): the DMS_MaP and 2A3_MaP reactivity scores, clipped to the range [0, 1] during training.
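A minimal sketch of the target clipping, paired with an L1 loss to match `loss_fun: l1` in the config in section 3.3. It assumes that missing reactivities are stored as NaN and excluded from the loss; the original does not say how missing values were handled, so `masked_l1` and the toy values are illustrative assumptions. Note that `np.clip` leaves NaN entries as NaN, so the mask still works after clipping.

```python
import numpy as np


def clip_targets(y: np.ndarray) -> np.ndarray:
    """Clip DMS_MaP / 2A3_MaP reactivities to [0, 1]; NaN entries stay NaN."""
    return np.clip(y, 0.0, 1.0)


def masked_l1(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error over positions whose target is not NaN (assumed masking)."""
    mask = ~np.isnan(target)
    return float(np.abs(pred[mask] - target[mask]).mean())


# Columns: [DMS_MaP, 2A3_MaP] for three nucleotides (made-up illustrative values).
y = np.array([[1.7, -0.2], [0.4, np.nan], [0.9, 0.3]])
y_clipped = clip_targets(y)
pred = np.full_like(y, 0.5)
loss = masked_l1(pred, y_clipped)  # averages |pred - target| over the 5 non-NaN entries
```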

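2.2 Edge Features

For edges between adjacent nucleotides in the original RNA sequence, the edge feature is 1.0.
For Ribonanza position-pair edges, the edge feature is 1.0 + 10 × the Watson-Crick base-pair probability recorded in the provided Ribonanza_bpp_files.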
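The two edge types can be assembled into PyTorch Geometric-style `edge_index` / `edge_attr` arrays roughly as below. Several details are assumptions not stated in the original: each bpp file line is taken to be `i j p` with 1-based positions and each pair listed once, every edge is added in both directions, and the helper names `parse_bpp_lines` and `build_edges` are hypothetical. The single edge-feature column matches `edge_dim_in: 1` in the config.

```python
import numpy as np


def parse_bpp_lines(lines):
    """Parse 'i j p' lines into 0-based (i, j, p) triples.

    ASSUMPTION: each line holds two 1-based positions and a pairing probability.
    """
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        pairs.append((int(parts[0]) - 1, int(parts[1]) - 1, float(parts[2])))
    return pairs


def build_edges(seq_len, pairs):
    """Return edge_index (2, E) and edge_attr (E, 1), storing both directions of every edge."""
    src, dst, attr = [], [], []

    def add_undirected(i, j, w):
        src.extend([i, j])
        dst.extend([j, i])
        attr.extend([w, w])

    # Backbone edges between neighbouring nucleotides: feature 1.0.
    for i in range(seq_len - 1):
        add_undirected(i, i + 1, 1.0)
    # Position-pair edges from the bpp file: feature 1.0 + 10 * probability.
    for i, j, p in pairs:
        add_undirected(i, j, 1.0 + 10.0 * p)

    edge_index = np.array([src, dst], dtype=np.int64)
    edge_attr = np.array(attr, dtype=np.float32).reshape(-1, 1)
    return edge_index, edge_attr


pairs = parse_bpp_lines(["1 5 0.8", "2 4 0.1"])
edge_index, edge_attr = build_edges(5, pairs)
```

If a bpp file happens to list both (i, j) and (j, i), this sketch would create duplicate edges, so deduplicating the pairs first may be worthwhile.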

3. Model

The model I chose is GraphGPS: https://github.com/rampasek/GraphGPS

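3.1 Base Layer

Each base GraphGPS layer combines a graph Transformer (global attention over all nodes) with a Gated GCN (local message passing along edges).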
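To illustrate how one layer combines the two components, here is a simplified single-graph NumPy forward pass in the spirit of GraphGPS: a Gated GCN branch (local message passing with sigmoid edge gates) and a dense self-attention branch run in parallel on the same input, their residual outputs are summed, and a feed-forward block follows. This is a sketch, not the GraphGPS code: batch norm, dropout and multi-head attention are omitted, edge features are assumed to be already projected to the hidden size, and the random weights in `P` are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gated_gcn(h, e, edge_index, P):
    """Simplified Gated GCN message passing (no batch norm); edges run src -> dst."""
    src, dst = edge_index
    # Edge gates combine the receiving node, the sending node and the edge feature.
    e_hat = h[dst] @ P["D"] + h[src] @ P["E"] + e @ P["C"]
    gate = sigmoid(e_hat)
    num = np.zeros_like(h)
    den = np.zeros_like(h)
    np.add.at(num, dst, gate * (h[src] @ P["B"]))  # gated sum of neighbour messages
    np.add.at(den, dst, gate)                      # normaliser: sum of gates per node
    h_out = h @ P["A"] + num / (den + 1e-6)
    return np.maximum(h_out, 0.0), np.maximum(e_hat, 0.0)


def global_attention(h, P):
    """Single-head dense self-attention over all nodes of one graph."""
    q, k, v = h @ P["Wq"], h @ P["Wk"], h @ P["Wv"]
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v


def gps_layer(h, e, edge_index, P):
    """Local MPNN branch and global attention branch run in parallel and are summed."""
    h_local, e_new = gated_gcn(h, e, edge_index, P)
    h_local = h + h_local                # residual on the local branch
    h_attn = h + global_attention(h, P)  # residual on the global branch
    h = h_local + h_attn
    h = h + np.maximum(h @ P["W1"], 0.0) @ P["W2"]  # feed-forward block with residual
    return h, e + e_new


d = 8  # toy hidden size (the config uses 256)
P = {name: rng.normal(scale=0.1, size=(d, d))
     for name in ["A", "B", "C", "D", "E", "Wq", "Wk", "Wv"]}
P["W1"] = rng.normal(scale=0.1, size=(d, 2 * d))
P["W2"] = rng.normal(scale=0.1, size=(2 * d, d))

h = rng.normal(size=(5, d))                        # 5 nodes
edge_index = np.array([[0, 1, 1, 2, 2, 3, 3, 4],
                       [1, 0, 2, 1, 3, 2, 4, 3]])  # chain graph, both directions
e = rng.normal(size=(edge_index.shape[1], d))      # edge features, already embedded to d
h_new, e_new = gps_layer(h, e, edge_index, P)
```

Stacking 16 such layers with hidden size 256 and 8 heads would correspond to the `gt` block of the config in section 3.3.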

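3.2 Positional Encoding

I initially tried Laplacian positional encodings, but they did not improve GraphGPS's performance, which may indicate that Laplacian positional encodings are not effective when the graphs are topologically diverse.
I therefore removed positional encodings from GraphGPS.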
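For reference, the Laplacian positional encoding that was tried can be sketched as the eigenvectors of the symmetric normalized Laplacian with the smallest non-trivial eigenvalues. This NumPy version is an illustrative assumption (the function name and the number of eigenvectors `k` are hypothetical) and assumes a connected graph, so that only the first eigenvector is trivial. Because the eigenvectors depend on the whole graph (and are only defined up to sign), the same sequence position can receive very different encodings in RNAs with different pairing patterns, which fits the observation above.

```python
import numpy as np


def laplacian_pe(edge_index, num_nodes, k):
    """Return k eigenvectors of the symmetric normalized Laplacian with the smallest
    non-trivial eigenvalues: one k-dim positional vector per node."""
    A = np.zeros((num_nodes, num_nodes))
    A[edge_index[0], edge_index[1]] = 1.0
    A = np.maximum(A, A.T)  # make the adjacency symmetric
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    L = np.eye(num_nodes) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]            # drop the trivial first eigenvector


chain = np.array([[0, 1, 1, 2, 2, 3, 3, 4],
                  [1, 0, 2, 1, 3, 2, 4, 3]])
pe = laplacian_pe(chain, num_nodes=5, k=2)
```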

3.3 Model Hyperparameters

model:
  type: GPSModel
  loss_fun: l1
gt:
  layer_type: CustomGatedGCN+Transformer
  layers: 16
  n_heads: 8
  dim_hidden: 256
  dropout: 0.1
  attn_dropout: 0.1
  layer_norm: False
  batch_norm: True
gnn:
  head: inductive_node
  layers_pre_mp: 0
  layers_post_mp: 3
  dim_inner: 256
  batchnorm: True
  act: relu
  dropout: 0.0
  agg: mean
  normalize_adj: False
optim:
  clip_grad_norm: True
  optimizer: adamW
  weight_decay: 1e-5
  base_lr: 0.001
  max_epoch: 50
  scheduler: cosine_with_warmup
  num_warmup_epochs: 3
share:
  dim_in: 4
  dim_out: 2
  num_splits: 3
  edge_dim_in: 1
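The `cosine_with_warmup` scheduler with `base_lr: 0.001`, `num_warmup_epochs: 3` and `max_epoch: 50` can be sketched as a linear warmup followed by a half-cosine decay to zero. This is an assumed schedule shape evaluated once per epoch, and the function name `lr_at_epoch` is hypothetical; the actual GraphGPS implementation may differ in details such as step granularity.

```python
import math


def lr_at_epoch(epoch, base_lr=0.001, num_warmup_epochs=3, max_epoch=50):
    """Linear warmup followed by half-cosine decay to zero (assumed schedule shape)."""
    if epoch < num_warmup_epochs:
        return base_lr * epoch / max(1, num_warmup_epochs)
    progress = (epoch - num_warmup_epochs) / max(1, max_epoch - num_warmup_epochs)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


schedule = [lr_at_epoch(ep) for ep in range(50)]
```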

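3.4 Data Augmentation

I tried two data augmentation methods: subgraph sampling and RNA sequence reversal.
Subgraph sampling randomly selects a center node in the original graph and samples its k-hop neighborhood to form a new graph; RNA sequence reversal randomly reverses the input sequence during training.
In my experiments, RNA sequence reversal was the more effective of the two for this task.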
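Both augmentations can be sketched as follows. Reversal maps node i to L-1-i and flips the node features, targets and edge endpoints consistently; k-hop sampling is a breadth-first search from a center node that keeps only the edges among the visited nodes. The helper names, the use of NumPy, and any reversal probability (for example 0.5 per training sample) are assumptions, not details from the original. PyTorch Geometric's `torch_geometric.utils.k_hop_subgraph` offers similar neighborhood extraction.

```python
from collections import deque

import numpy as np


def reverse_graph(x, y, edge_index):
    """Reverse an RNA graph: node i becomes node L-1-i; edge order and features are unchanged."""
    L = x.shape[0]
    return x[::-1].copy(), y[::-1].copy(), (L - 1) - edge_index


def khop_subgraph(center, k, edge_index, num_nodes):
    """Return node ids within k hops of `center`, relabelled edges among them, and kept edge ids."""
    src, dst = edge_index.tolist()
    neighbours = [[] for _ in range(num_nodes)]
    for s, d in zip(src, dst):
        neighbours[s].append(d)
    dist = {center: 0}
    queue = deque([center])
    while queue:  # breadth-first search limited to depth k
        u = queue.popleft()
        if dist[u] == k:
            continue
        for v in neighbours[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    nodes = sorted(dist)
    new_id = {n: i for i, n in enumerate(nodes)}
    keep = [i for i, (s, d) in enumerate(zip(src, dst)) if s in new_id and d in new_id]
    sub_edges = np.array([[new_id[src[i]] for i in keep],
                          [new_id[dst[i]] for i in keep]], dtype=np.int64).reshape(2, -1)
    return np.array(nodes, dtype=np.int64), sub_edges, keep


x = np.eye(4, dtype=np.float32)[[2, 2, 0, 1, 3]]   # one-hot "GGAUC" in A, U, G, C order
y = np.arange(10, dtype=np.float32).reshape(5, 2)  # toy targets
edge_index = np.array([[0, 1, 1, 2, 2, 3, 3, 4, 0, 4],
                       [1, 0, 2, 1, 3, 2, 4, 3, 4, 0]])  # backbone plus one 0-4 pair edge

x_r, y_r, ei_r = reverse_graph(x, y, edge_index)
nodes, sub_edges, keep = khop_subgraph(0, 1, edge_index, num_nodes=5)
```

Since `keep` indexes the original edges, the matching edge features can be selected with `edge_attr[keep]`, and the node features and targets with `x[nodes]` and `y[nodes]`.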


[37th place solution🥈] Single Graph Transformer + Gated GCN Model
