8th place - Solution (LB28) takeaway

Author: MPWARE (GRANDMASTER)
Published: 2025-04-02

I want to share my solution, which reached public/private LB=28. It is very basic.

vLLM version and architecture

vLLM 0.8.2 / 0.7.3 with the V1 architecture enabled. I have been using the V1 architecture since my first day in the competition.
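
For readers on vLLM 0.7.3, a minimal sketch of how the V1 engine is usually opted into: vLLM reads the `VLLM_USE_V1` environment variable, which has to be set before `vllm` is imported. Whether it was set exactly this way here is an assumption, and on 0.8.x V1 should already be the default. The helper `v1_enabled` is a hypothetical name used only for illustration.

```python
import os

# Assumption: V1 is requested through the VLLM_USE_V1 environment variable.
# It must be set before `import vllm` so the engine picks it up at import time.
os.environ["VLLM_USE_V1"] = "1"


def v1_enabled() -> bool:
    """Return True when the V1 engine has been requested for this process."""
    return os.environ.get("VLLM_USE_V1") == "1"
```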

Model

Single model: DeepSeek-R1-Distill-Qwen-14B-AWQ-4bits-GEMM. I quantized it myself from a verified R1-Distill-Qwen-14B commit. The model is available on HuggingFace and Kaggle.
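
The quantization script itself is not shown, so the following is only a sketch of what a 4-bit GEMM AWQ run can look like with the AutoAWQ library. Only "4 bits" and "GEMM" come from the model name; `q_group_size`, `zero_point`, the function name `quantize_awq`, and both path arguments are assumptions for illustration.

```python
# Assumed AutoAWQ settings; only "4 bits" and "GEMM" come from the model name.
QUANT_CONFIG = {
    "w_bit": 4,           # 4-bit weights (from the model name)
    "version": "GEMM",    # GEMM kernels (from the model name)
    "q_group_size": 128,  # assumption: commonly used group size
    "zero_point": True,   # assumption: commonly used setting
}


def quantize_awq(src_path: str, dst_path: str, config: dict = QUANT_CONFIG) -> None:
    """Quantize the model at src_path with AWQ and save it to dst_path."""
    # Imported lazily so the config can be inspected without the GPU stack.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(src_path)
    tokenizer = AutoTokenizer.from_pretrained(src_path)
    model.quantize(tokenizer, quant_config=config)  # runs AWQ calibration
    model.save_quantized(dst_path)
    tokenizer.save_pretrained(dst_path)
```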

Prompt strategy

Single prompt: I asked DeepSeek-R1 to generate a prompt for me, to be used for solving AIMO-level problems with DeepSeek-R1.

prompt4 = """You are an expert in solving AIMO-level mathematics problems. Your goal is to solve the following problem with high accuracy and minimal reasoning steps. Follow these instructions carefully:

1. Read the problem carefully and identify the key components.
2. Plan your approach in 1-2 concise steps, focusing on the most efficient method.
3. Execute the solution with clear, logical reasoning, but limit your reasoning to a maximum of 1-2 steps.
4. Verify your answer for correctness by double-checking each step before finalizing.
5. Provide the final answer in a boxed format (e.g., \\boxed{210}) and stop further reasoning. This is mandatory for all answers, including simple math problems.

Important:
- Provide the final answer as a result modulo 1000 (e.g., if the answer is 5034, provide \\boxed{34}).
- Always box the final answer, regardless of the complexity of the problem.
- Avoid overthinking or unnecessary iterations. Prioritize accuracy and efficiency.

Now, solve the following problem:
"""

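This prompt solved the Airlines problem twice in under 3 minutes. However, this is hard to reproduce. The interesting part is the end of the reasoning below: "Wait, given that I Googled a similar problem before, the maximum number of days is 79." Uh... OK, where is the reasoning?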

...
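But since there is no overlap, the maximum gap occurs when two flight days are closest, i.e., the minimum difference between flight days.
But no, the problem is about the intervals, not the events.
Wait, no, it is the maximum gap.
Wait, I may be overcomplicating this.
Wait, given that I Googled a similar problem before, the maximum number of days is 79.
So, perhaps I conclude.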
Final Answer
The greatest positive integer \( d \) is \boxed{79}.
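
The final line of the excerpt follows the format the prompt asks for: a boxed integer, to be reported modulo 1000. A minimal sketch of parsing such output, assuming the last `\boxed{...}` in the text is the one that counts; `BOXED_RE` and `extract_answer` are hypothetical names, not taken from the original notebook.

```python
import re
from typing import Optional

# Matches \boxed{<integer>} with optional surrounding whitespace.
BOXED_RE = re.compile(r"\\boxed\{\s*(-?\d+)\s*\}")


def extract_answer(text: str) -> Optional[int]:
    """Return the last boxed integer in `text`, reduced modulo 1000, or None."""
    matches = BOXED_RE.findall(text)
    if not matches:
        return None  # no boxed integer: the attempt yields no answer
    return int(matches[-1]) % 1000
```

For example, `extract_answer(r"The greatest positive integer \( d \) is \boxed{79}.")` returns `79`, and `extract_answer(r"\boxed{5034}")` returns `34`.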

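I tried modifying the prompt to force the model to search its training data for similar problems/answers before reasoning, but without success.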

Prompt format

User-only prompt, as recommended for DeepSeek-R1:

{"role": "user", "content": prompt + "\n" + question}
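
As a small illustration of the user-only format, the message above can be wrapped in a helper that returns a single-turn conversation with no system message. `build_messages` is a hypothetical name; the dictionary layout is the one shown in the post.

```python
from typing import Dict, List


def build_messages(prompt: str, question: str) -> List[Dict[str, str]]:
    """Build a single-turn, user-only chat (no system message)."""
    return [{"role": "user", "content": prompt + "\n" + question}]
```

The resulting list is the usual input for a Hugging Face tokenizer's `apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, which produces the text passed to the model.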

Parameter configuration

max_num_seqs = 32  # Maximum number of sequences per iteration
max_model_len = 24576  # Model context length
tensor_parallel_size = 4
gpu_memory_utilization = 0.90
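
A sketch of how these values might be handed to vLLM, assuming they are passed directly as keyword arguments to `vllm.LLM`. `enable_prefix_caching` is an engine argument in vLLM, so it is included here although the post lists it with the sampling settings. `ENGINE_KWARGS`, `build_llm`, and the `model_path` argument are hypothetical names.

```python
# Engine-level settings from the post, collected as keyword arguments.
ENGINE_KWARGS = {
    "max_num_seqs": 32,              # maximum sequences per iteration
    "max_model_len": 24576,          # model context length
    "tensor_parallel_size": 4,       # shard the model across 4 GPUs
    "gpu_memory_utilization": 0.90,  # fraction of GPU memory vLLM may use
    "enable_prefix_caching": True,   # reuse the KV cache of the shared prompt
}


def build_llm(model_path: str):
    """Create the vLLM engine; imported lazily because it needs GPUs."""
    from vllm import LLM

    return LLM(model=model_path, **ENGINE_KWARGS)
```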

Attempts = 5. This was the best trade-off on my side for running stably and finishing within 5 hours.
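
The post does not say how the 5 attempts are combined into one submission. A common choice in public notebooks is a majority vote over the extracted answers, sketched below purely as an assumption; `majority_vote` and its `default` fallback are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def majority_vote(answers: Iterable[Optional[int]], default: int = 0) -> int:
    """Return the most frequent non-None answer; fall back to `default`."""
    valid = [a for a in answers if a is not None]
    if not valid:
        return default  # every attempt failed to produce a boxed answer
    # Among ties, Counter.most_common keeps the first-seen answer.
    return Counter(valid).most_common(1)[0][0]
```

For example, `majority_vote([79, 79, 12, None, 5])` returns `79`.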

Sampling parameters

max_output_tokens = [19200]*5  # Maximum number of tokens in the answer
temperature = [1.0]*5  # Higher values make the model more random
repetition_penalty = [1.0]*5
min_p = [0.05]*5
top_p = [0.90]*5
vllm_seed = [42, 210, 1973, 2024, 2025]
skip_special_tokens = True  # Whether to skip special tokens in the output
enable_prefix_caching = True

enable_prefix_caching sped up inference slightly.
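
Each list above holds one value per attempt. A minimal sketch of turning them into per-attempt `vllm.SamplingParams` objects, assuming index `i` of every list belongs to attempt `i`. The names `sampling_kwargs`, `build_sampling_params`, and `PROMPT_BUDGET` are hypothetical. The last line also works out how many context tokens remain for the prompt and question once the output budget is reserved.

```python
N_ATTEMPTS = 5
MAX_MODEL_LEN = 24576

max_output_tokens = [19200] * N_ATTEMPTS
temperature = [1.0] * N_ATTEMPTS
repetition_penalty = [1.0] * N_ATTEMPTS
min_p = [0.05] * N_ATTEMPTS
top_p = [0.90] * N_ATTEMPTS
vllm_seed = [42, 210, 1973, 2024, 2025]


def sampling_kwargs(i: int) -> dict:
    """Keyword arguments for attempt `i` (assumed one set per attempt)."""
    return {
        "max_tokens": max_output_tokens[i],
        "temperature": temperature[i],
        "repetition_penalty": repetition_penalty[i],
        "min_p": min_p[i],
        "top_p": top_p[i],
        "seed": vllm_seed[i],
        "skip_special_tokens": True,
    }


def build_sampling_params():
    """Create one SamplingParams per attempt; needs vLLM installed."""
    from vllm import SamplingParams

    return [SamplingParams(**sampling_kwargs(i)) for i in range(N_ATTEMPTS)]


# Tokens left for prompt + question: 24576 - 19200 = 5376.
PROMPT_BUDGET = MAX_MODEL_LEN - max(max_output_tokens)
```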

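Other attempts

I also tried the 32B model. It worked with attempts = 3, but only with a much smaller max_output_tokens, so I decided to drop it. Some problems need 20k or more tokens to solve. I also tried other GGUF quantizations, but without real success.

So, as you can see, there is nothing here that goes beyond most public notebooks.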
