2nd Place Solution


Author: Georg Streich
Published: 2021-02-23

This code ended up being my highest-scoring submission, so I would like to give a brief overview of how it works.

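Like many other top submissions, it uses an ensemble of public agents, including the top 50 agents from rpscontest.com and some strong agents that people posted here. In particular, I would like to thank the authors of the following:

memory patterns v7 by @yegorbiryukov
geometry bot by @superant
multiarmed bandit v32 by @ilialar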

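I also added an "evil twin" for each of the agents above: the same agent run with the two players' roles swapped (see reverse_agent in the code).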
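To make the role swap concrete, here is a minimal sketch. It uses a simplified two-argument agent signature (`my_last`, `opp_last`) and a toy agent, `copy_last`, both of which are assumptions made for illustration. The agents in the post take an observation object instead.

```python
def copy_last(my_last, opp_last):
    """Toy agent (hypothetical): repeat the opponent's previous move."""
    return opp_last


def evil_twin(agent):
    """Run `agent` from the other player's seat by swapping the two moves."""
    def twin(my_last, opp_last):
        # The wrapped agent sees our move as "the opponent's" and vice versa.
        return agent(opp_last, my_last)
    return twin


twin = evil_twin(copy_last)
# With my_last=1 (paper) and opp_last=2 (scissors), the twin answers as
# copy_last would from the opponent's side: it repeats *our* last move.
move = twin(1, 2)
```

In this sketch the twin's output is the move the wrapped agent would choose if the opponent were running it against us.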

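Probably the most important aspect of this competition was how to select an action from the ensemble. I had a few thoughts on what properties a good selection strategy should have:

It should adapt quickly to the opponent
Because the matches are very random, its decisions should not be based on only a few observations
It should be fairly random, so that it does not expose detectable patterns in the underlying agents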

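Based on these ideas, I came up with the following strategy: first select a group of agents that performed well over a longer period of time, then pick the final agent from this group based on its performance over the last few steps. Finally, I introduced some randomness by drawing each agent's score from a distribution instead of computing it deterministically. I also tried many other, more complicated strategies, but this one seemed to work best.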
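The two-stage selection described above could be sketched roughly as follows. This is one possible reading, not the submission's actual selection code: the window lengths, the shortlist size `k`, the function names, and the use of a Dirichlet sample in both stages are all assumptions made for illustration.

```python
import numpy as np


def sampled_win_rate(opp_moves, preds, rng):
    """Draw one plausible win probability per agent.

    opp_moves: shape (w,), opponent moves in the window (0=R, 1=P, 2=S).
    preds:     shape (w, m), the move each of m agents proposed per step.
    """
    # 0 = tie, 1 = the agent's move beats the opponent's, 2 = it loses.
    diff = (preds - opp_moves[:, None]) % 3
    counts = np.stack([(diff == i).sum(axis=0) for i in range(3)], axis=1)
    # Uniform prior (+1), one Dirichlet draw per agent; keep the "win" part.
    return np.array([rng.dirichlet(c + 1)[1] for c in counts])


def two_stage_pick(opp_moves, preds, rng, long_w=100, short_w=10, k=5):
    """Return the index of the agent to follow on the next step."""
    n, m = preds.shape
    # Stage 1: shortlist the k agents that look best over the long window.
    lw = min(n, long_w)
    q_long = sampled_win_rate(opp_moves[-lw:], preds[-lw:], rng)
    k = min(k, m)
    shortlist = np.argpartition(q_long, -k)[-k:]
    # Stage 2: among the shortlist, pick the best over the short window.
    sw = min(n, short_w)
    q_short = sampled_win_rate(opp_moves[-sw:], preds[-sw:, shortlist], rng)
    return int(shortlist[np.argmax(q_short)])


rng = np.random.default_rng(0)
opp = np.zeros(200, dtype=np.int8)  # opponent always plays rock
# Agent 0 always plays paper, agent 1 scissors, agent 2 rock.
preds = np.tile(np.array([1, 2, 0], dtype=np.int8), (200, 1))
choice = two_stage_pick(opp, preds, rng, long_w=200, short_w=200, k=2)
```

Because the scores are sampled rather than computed exactly, agents with little evidence behind them get noisy scores, so the choice varies from step to step while still favouring agents with a consistent record.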

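Finally, I would like to thank Kaggle for hosting this competition, and everyone else for participating and for the interesting discussions in the forum.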

import numpy as np
from copy import deepcopy
from kaggle_environments import agent
from glob import glob
from functools import partial

# path = '/kaggle_simulations/agent/'
path = 'top/'

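def rps_agent(filename):
    # Wrap an rpscontest.com-style script, which reads the global `input`
    # and writes its move to the global `output`, as a callable agent.
    with open(filename) as f:
        code = compile(f.read(), filename, 'exec')

    gg = {}  # globals shared across calls so the script keeps its state
    def run(observation, actual):
        if observation.step > 0:
            inp = 'RPS'[observation.lastOpponentAction]
            outp = 'RPS'[int(actual)]
        else:
            inp = ''
            outp = ''

        gg['input'] = inp
        # Overwrite `output` with the move that was actually played, which
        # may differ from what this script proposed on the previous step.
        gg['output'] = outp

        exec(code, gg)

        return {'R': 0, 'P': 1, 'S': 2}[gg['output']]

    return run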

def kaggle_agent(filename):
    # Load a Kaggle agent file and return the last callable it defines.
    return agent.get_last_callable(agent.read_file(filename))

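def score(history, predictions):
    # The last history column is the move each prediction is scored against.
    actual = history[:, -1]

    n = actual.shape[0]

    def p(x):
        # Per agent: fraction of steps where its prediction equals
        # actual + x (mod 3).
        return np.sum(predictions == (actual[:, np.newaxis] + x) % 3, axis=0) / n

    p_win = p(1)    # prediction beats `actual`
    p_lose = p(-1)  # prediction loses to `actual`

    # Randomized score: uniform on [0, max(0, win rate - loss rate)].
    return np.random.uniform(0, np.maximum(0, p_win - p_lose))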

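def dirichlet(history, predictions):
    actual = history[:, -1]

    n, m = predictions.shape[:2]

    # n_outcome[j] = [ties, wins, losses] of agent j against `actual`.
    n_outcome = np.array([
        np.sum(predictions == (actual[:, np.newaxis] + i) % 3, axis=0)
        for i in range(3)
    ]).T

    # Sample each agent's outcome probabilities from a Dirichlet posterior
    # (uniform prior via the +1) and return the sampled win probability.
    return np.array([np.random.dirichlet(n_outcome[i] + 1) for i in range(m)])[:, 1]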

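def select_best(history, predictions, w, scoring_func, k):
    # Score every agent on the last w steps and return the indices of the
    # k highest-scoring agents (in no particular order).
    w = min(history.shape[0], w)

    q = scoring_func(history[-w:], predictions[-w:])
    best = np.argpartition(q, -k)[-k:]

    return best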

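def reverse_agent(agent):
    # "Evil twin": run `agent` with the two players' roles swapped, so it
    # sees our last move as the opponent's and the opponent's as its own.
    def f(observation, actual):
        if observation.step > 0:
            actual_actual = observation.lastOpponentAction
        else:
            actual_actual = np.random.randint(0, 3)

        # Copy first so the caller's observation is not modified.
        observation = deepcopy(observation)
        observation.lastOpponentAction = actual

        return agent(observation, actual_actual)

    return f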

rps_agents = [
    partial(rps_agent, filename) for filename in glob(path + 'rps/*.py')
]

kaggle_agents = [
    partial(kaggle_agent, filename) for filename in glob(path + 'kaggle/*.py')
]

agents = rps_agents + kaggle_agents

instances = [
    *[agent() for agent in agents],
    *[reverse_agent(agent()) for agent in agents],
]

N = 1000

predictions = np.zeros(shape=(N, len(instances))).astype(np.int8)
predictions[0] = np.random.randint(0, 3, size=len(instances))

history = np.zeros(shape=(N, 2)).astype(np.int8)

n = 0

def run(observation, configuration):
