363. Bengali.AI Handwritten Grapheme Classification | bengaliai-cv19
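Updated 2020/03/19
Thanks to the hosts and to everyone who took part in the helpful discussions. Special thanks to @hengck23 for sharing so generously.
I was lucky to get a gold medal. I had assumed the public/private leaderboard split was random, so I did nothing special to handle unseen graphemes.
In hindsight, post-processing tricks aimed at the macro-averaged recall metric really mattered.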
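The competition was scored with a hierarchical macro-averaged recall: macro recall is computed separately for grapheme_root, vowel_diacritic and consonant_diacritic, and the three scores are averaged with weights 2:1:1 (root counted twice). The sketch below implements that metric in plain NumPy; the helper names are hypothetical, and this `macro_recall` averages only over classes present in `y_true`. It pairs the metric with one commonly discussed post-processing idea, dividing predicted probabilities by the training class prior before the argmax. That prior correction is an illustration of why post-processing can move a macro-recall score, not necessarily the trick the author had in mind.

```python
import numpy as np


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(per_class))


def hierarchical_macro_recall(true_cols, pred_cols, weights=(2, 1, 1)):
    """Weighted average of macro recall for (root, vowel, consonant)."""
    scores = [macro_recall(t, p) for t, p in zip(true_cols, pred_cols)]
    return float(np.average(scores, weights=weights))


def prior_corrected_argmax(probs, train_counts):
    """Argmax of p(y|x) / p(y): rare classes get a boost, which favours macro recall."""
    prior = np.asarray(train_counts, dtype=np.float64)
    prior = prior / prior.sum()
    return np.argmax(np.asarray(probs) / prior, axis=1)


# Toy example (made-up numbers): class 1 is rare, 10% of the training data.
y_true = np.array([0, 0, 0, 1])
probs = np.array([[0.97, 0.03],
                  [0.95, 0.05],
                  [0.80, 0.20],
                  [0.70, 0.30]])
plain = probs.argmax(axis=1)                       # picks class 0 for every row
corrected = prior_corrected_argmax(probs, [9, 1])  # divides by the prior [0.9, 0.1]
print(macro_recall(y_true, plain), macro_recall(y_true, corrected))
```

Plain argmax never predicts the rare class, so its recall for that class is 0 and the macro average is stuck at 0.5. After the prior correction one majority sample is misclassified, but the rare class is recovered, which macro recall rewards more than it penalizes the lost majority hit.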

import numpy as np
import pandas as pd

# C is the author's config object (not shown in the post); C.datadir is
# presumably a pathlib.Path pointing at the competition data directory.

# Load data
train = pd.read_csv(C.datadir / 'train.csv')
train_labels = train[['grapheme_root', 'vowel_diacritic', 'consonant_diacritic']].values.astype(np.int64)

# Component labels: split every class_map component string into single characters
parts_pre = pd.read_csv(C.datadir / 'class_map.csv').component
parts = np.sort(np.unique(np.concatenate([list(e) for e in parts_pre])))
parts = parts[parts != '0']  # '0' is the "no diacritic" placeholder, not a real component
print("parts:", parts)  # shows 61 parts

# Multi-hot target per training image: 1 if the part occurs in its grapheme
train_labels_comp = []
for grapheme in train['grapheme'].values:
    train_labels_comp.append([part in list(grapheme) for part in parts])
train_labels_comp = np.array(train_labels_comp).astype(np.int64)
print("train_labels_comp.shape", train_labels_comp.shape)
if True:  # debug: inspect the first three rows
    for i in range(3):
        print("train_labels_comp", train_labels_comp[i].tolist())
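The component-label loop above can be checked on a toy scale without the competition files. The sketch below is a hypothetical NumPy-only rewrite: it builds the same sorted part list (unique characters of the class_map components, minus '0') and the same 0/1 matrix, but looks each character up in a character-to-column dictionary instead of testing every part against every grapheme. The components and graphemes are made-up examples in the style of class_map.csv, written as Unicode escapes.

```python
import numpy as np


def build_parts(components):
    """Sorted unique characters of all components, dropping the '0' placeholder."""
    chars = sorted({ch for comp in components for ch in comp})
    return [ch for ch in chars if ch != "0"]


def multi_hot(graphemes, parts):
    """Row i, column j is 1 if parts[j] occurs in graphemes[i]."""
    column = {ch: j for j, ch in enumerate(parts)}
    out = np.zeros((len(graphemes), len(parts)), dtype=np.int64)
    for i, grapheme in enumerate(graphemes):
        for ch in set(grapheme):
            if ch in column:
                out[i, column[ch]] = 1
    return out


# Toy components: a root (ka + virama + ssa), a vowel sign (aa), the '0'
# placeholder, and a consonant diacritic (ra + virama).
components = ["\u0995\u09cd\u09b7", "\u09be", "0", "\u09b0\u09cd"]
# Toy graphemes: ka + virama + ssa + aa, and ra + virama + ka.
graphemes = ["\u0995\u09cd\u09b7\u09be", "\u09b0\u09cd\u0995"]

parts = build_parts(components)
labels = multi_hot(graphemes, parts)
print([hex(ord(p)) for p in parts])
print(labels.tolist())
```

Python's `sorted` orders characters by code point, matching `np.sort(np.unique(...))` on the same characters, so the column order agrees with the original snippet. For the real training set either version runs quickly; the dictionary form is mainly a readability choice.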