11th Place Solution: A Threshold-Oriented Competition Is Not Good

Author: oohara (Expert) | Competition rank: 11th

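The model I built 23 days ago actually performs better than the one I finally submitted. I tried many things after that, but the only thing that really worked was tuning the thresholds, that is, deciding what fraction of the smallest values in each column I should set to zero.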

I "wasted" a lot of precious time tuning thresholds, and none of it will help me in my next competition.

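def zero_out(df, column, percent):
    """Zero out the smallest `percent` fraction of values in `column` (those closest to 0)."""
    # Index labels of the int(percent * len(df)) smallest predictions in this column
    index = df[column].nsmallest(int(percent * len(df))).index
    df.loc[index, column] = 0
    print(column, " remain ", len(df) - len(index))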

# (column, fraction of the smallest values to set to 0)
zero_tuples = [
    ("question_type_spelling", 0.99),
    ("question_not_really_a_question", 0.94),
    ("question_type_consequence", 0.92),
    ("question_type_definition", 0.91),
    ("question_type_entity", 0.88),
    ("question_type_compare", 0.96),
    ("question_type_choice", 0.44),
    ("question_conversational", 0.78),
    ("question_multi_intent", 0.55),
]

for column, percent in zero_tuples:
    zero_out(df_sub, column, percent)
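
To make the effect of the zeroing rule easier to see, here is a minimal sketch of the same idea on a plain Python list instead of a DataFrame. The helper name `zero_out_smallest` and the toy values are assumptions made up for illustration; they are not part of the submitted solution.

```python
def zero_out_smallest(values, percent):
    """Return a copy of `values` with the smallest `percent` fraction set to 0."""
    k = int(percent * len(values))  # same rounding as int(percent * len(df))
    # Positions ordered by value; Python's sort is stable, so ties keep input order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = list(values)
    for i in order[:k]:
        result[i] = 0
    return result


# Made-up predictions for one column, for illustration only.
toy_predictions = [0.05, 0.40, 0.10, 0.90, 0.02, 0.70, 0.15, 0.30, 0.60, 0.08]

zeroed = zero_out_smallest(toy_predictions, 0.44)
print(zeroed)
print("remain", len(zeroed) - zeroed.count(0))
```

With 10 values and a fraction of 0.44, `int(4.4)` gives 4, so the four smallest predictions become 0 and the other six keep their original values and relative order.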

Here is the score comparison (PB, LB):

Without zeroing out: 0.386, 0.409
With zeroing out (23 days ago): 0.424, 0.450
After 23 days of "hard work": 0.424, 0.465

After tuning the thresholds, everything I did afterwards actually hurt the PB score.
