8th place solution

Author: Kuro (Master) | Published: 2021-02-23

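This is my first gold medal 🎉
Thanks to everyone who took part in this competition.
Many participants shared their ideas and solutions during the competition,
and I learned a lot from them.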

Here is the notebook that earned me the gold medal.
https://www.kaggle.com/enukuro/rps-8th-place-solution

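First attempt

In the early stage, I first tried to train a neural network to imitate the strategies of the top players, but it failed completely.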

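Developing a baseline

Next, I read the discussion forum and decided to follow the ideas shared there.
I thought Going meta with Kumoko was a good starting point, so I started developing a baseline agent along the lines of Kumoko.

I knew that ensembling was the key, so I wanted to integrate many algorithms, but that turned out to be very tiring work...
I'm not good at dealing with variable scope in Python 😅
So I needed to come up with a solution to this problem.
Here is my solution:
Load each agent's source file, rewrite it, and save it as a new module.
This way each algorithm keeps its globals in its own module, so I can keep variable scopes clean and easily integrate any algorithm 😄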

import importlib
from types import SimpleNamespace

# BASE_PATH and BEAT are defined elsewhere in the notebook.

# Load the original agent, append a setter for its internal state, and save two copies.
with open("/kaggle_simulations/agent/rps-dojo-data/black_belt/centrifugal_bumblepuppy_v4.py") as f:
    s = f.read()
s = s + "\n\ndef set_output(_output):\n    global gg\n    gg['output'] = _output\n\n"
with open(BASE_PATH + "centrifugal_bumblepuppy_v4.py", mode='w') as f:
    f.write(s)
with open(BASE_PATH + "centrifugal_bumblepuppy_v4_mirror.py", mode='w') as f:
    f.write(s)

import centrifugal_bumblepuppy_v4
import centrifugal_bumblepuppy_v4_mirror

def centrifugal_bumblepuppy_v4_agent(observation, configuration, my_last_action):
    if observation.step == 0:
        # Reset the module's global state at the start of each episode.
        importlib.reload(centrifugal_bumblepuppy_v4)
    else:
        # Tell the agent which move the ensemble actually played last turn.
        centrifugal_bumblepuppy_v4.set_output(['R','P','S'][my_last_action])

    return centrifugal_bumblepuppy_v4.run(observation, configuration)

def centrifugal_bumblepuppy_v4_mirror_agent(observation, configuration, my_last_action):
    if observation.step == 0:
        importlib.reload(centrifugal_bumblepuppy_v4_mirror)
        observation_mirror = observation
    else:
        # Swap roles: my last move is presented as the opponent's move, and vice versa.
        # SimpleNamespace assumes run() reads the observation through attribute access.
        observation_mirror = SimpleNamespace(step=observation.step, lastOpponentAction=my_last_action)
        centrifugal_bumblepuppy_v4_mirror.set_output(['R','P','S'][observation.lastOpponentAction])

    # The mirrored agent predicts the opponent's move, so play the move that beats it.
    return BEAT[centrifugal_bumblepuppy_v4_mirror.run(observation_mirror, configuration)]
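
The load, rewrite, and save idea can be tried outside Kaggle with a toy agent. The following is only a minimal sketch under stated assumptions: TOY_AGENT_SOURCE, install_copies, and the module names toy_agent and toy_agent_mirror are hypothetical and do not come from the notebook. It shows that each saved copy becomes a separate module with its own globals, and that importlib.reload restores a copy to its initial state.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a third-party agent that keeps its state in a
# module-level global dictionary.
TOY_AGENT_SOURCE = '''
gg = {"output": None, "calls": 0}

def run(observation, configuration):
    gg["calls"] += 1
    gg["output"] = "R"
    return 0
'''

# Setter appended to the source so the ensemble can overwrite the move the
# agent believes it played last turn.
SETTER_SOURCE = '''

def set_output(_output):
    global gg
    gg["output"] = _output
'''


def install_copies(source, directory, names):
    """Write the patched source under each name and import every copy."""
    patched = source + SETTER_SOURCE
    for name in names:
        (Path(directory) / f"{name}.py").write_text(patched)
    if str(directory) not in sys.path:
        sys.path.insert(0, str(directory))
    importlib.invalidate_caches()
    return [importlib.import_module(name) for name in names]


work_dir = tempfile.mkdtemp()
plain, mirror = install_copies(
    TOY_AGENT_SOURCE, work_dir, ["toy_agent", "toy_agent_mirror"]
)

plain.run(None, None)
plain.set_output("P")
# Each copy is a separate module object, so their globals do not collide.
print(plain.gg, mirror.gg)
```

Because the state lives in module globals, no changes inside the agent's own code are needed; the cost is one file on disk per copy.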

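The other parts are nothing special.

Action selection

Agents are weighted by their dllu score (negative scores are clipped to zero), and one is picked with random.choices.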
best_index = random.choices(range(len(scores)),weights=([max(0, score) for score in scores]))[0]
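
One edge case is worth noting: if every score is zero or negative, all weights become zero, and random.choices raises ValueError on Python 3.9 and later. The sketch below adds a uniform fallback for that case. The function name pick_agent and the fallback policy are assumptions for illustration, not part of the original notebook.

```python
import random


def pick_agent(scores, rng=random):
    """Return an index sampled in proportion to the positive part of each score."""
    weights = [max(0.0, score) for score in scores]
    if sum(weights) <= 0:
        # No agent has a positive score, so weighted sampling is undefined.
        # Falling back to a uniform pick is an assumed policy.
        return rng.randrange(len(scores))
    return rng.choices(range(len(scores)), weights=weights)[0]


# Only the third agent has a positive score, so it is always selected.
print(pick_agent([-1.0, 0.0, 5.0]))
```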

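A tweak

The agent changes its strategy based on the score history.
When things are going badly, it uses more randomness.
But I'm not sure whether this works.
Before the evaluation period, this strategy did not seem to work well, so I didn't explore it in depth.
Many other participants probably tried similar strategies.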

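    # strategy_type 0: normal play. Switch to type 1 when the recent score window is bad enough.
    if observation.step > 0 and strategy_type == 0 and sum(score_history[-strategy_assess_length:]) < -strategy_change_point:
        strategy_type = 1
    # strategy_type 1: play the move that beats the ensemble's pick, optionally mixed with random moves.
    if strategy_type == 1:
        my_action = BEAT[my_action]
        if random_strategy and random.random() < 0.5:
            my_action = random.randint(0, 2)
        strategy1_count += 1
    if strategy1_count > strategy1_count_length:
        strategy_type = 2
        strategy1_count = 0
    # strategy_type 2: purely random play for a while, then return to normal play.
    if strategy_type == 2:
        strategy2_count += 1
        my_action = random.randint(0, 2)
    if strategy2_count > strategy2_count_length:
        strategy_type = 0
        strategy2_count = 0
    # strategy_type 3: always random; it is never entered in this excerpt.
    if strategy_type == 3:
        my_action = random.randint(0, 2)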
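
The switching logic above can be condensed into a small state machine, which makes it easier to test in isolation. This is a simplified sketch, not the notebook's code: the class name FallbackSwitch, its parameter names and default values, and the BEAT mapping are assumptions, and the 50% random mix inside strategy 1 and the always-random strategy 3 are omitted.

```python
import random

# Assumed mapping from a move to the move that beats it (0=rock, 1=paper, 2=scissors).
BEAT = {0: 1, 1: 2, 2: 0}


class FallbackSwitch:
    """Mode 0: keep the action. Mode 1: shift it with BEAT. Mode 2: play randomly."""

    def __init__(self, assess_length=20, change_point=5,
                 shift_length=10, random_length=10, rng=None):
        self.assess_length = assess_length
        self.change_point = change_point
        self.shift_length = shift_length
        self.random_length = random_length
        self.rng = rng or random.Random()
        self.mode = 0
        self.count = 0

    def adjust(self, action, score_history):
        """Return the action to play given the ensemble's pick and past per-step scores."""
        recent = sum(score_history[-self.assess_length:])
        if self.mode == 0 and score_history and recent < -self.change_point:
            self.mode = 1
        if self.mode == 1:
            action = BEAT[action]
            self.count += 1
            if self.count > self.shift_length:
                self.mode, self.count = 2, 0
        if self.mode == 2:
            action = self.rng.randint(0, 2)
            self.count += 1
            if self.count > self.random_length:
                self.mode, self.count = 0, 0
        return action


switch = FallbackSwitch(assess_length=3, change_point=1)
print(switch.adjust(0, [1, 1, 1]))     # recent window is positive, action unchanged
print(switch.adjust(0, [-1, -1, -1]))  # recent window is bad, action shifted via BEAT
```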

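Conclusion

In the late stage, all of the agents I had submitted during the middle stage dropped out, possibly because

Other solutions from this competition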

1st Place Solution - Where is my bag's Journey

2nd Place Solution

4th Place Solution - Taaha Khan

10th Place - Layered Multi-window Counting

12th place solution - lightGBM model