It feels pretty intuitive to me that the ability of an LLM to break a complex problem down into smaller, more easily solvable pieces will unlock the next level of complexity.
This pattern feels like a technique often taught to junior engineers: how to break a multi-week project up into bite-sized tasks. This model is obviously math-focused, but I see no reason why this wouldn't be incredibly powerful for code-based problem solving.
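To make the decomposition idea concrete, here's a minimal Lean sketch (assuming Mathlib) of a goal factored into subgoals: each `have` is one of the bite-sized tasks, provable on its own before the pieces are recombined. This is purely illustrative, not how DeepSeek-Prover-V2 actually decomposes problems.

example (a b c : ℕ) (h₁ : a ≤ b) (h₂ : b ≤ c) : a ≤ c + 1 := by
  -- Subgoal 1: chain the two hypotheses.
  have step₁ : a ≤ c := le_trans h₁ h₂
  -- Subgoal 2: a number is at most its successor.
  have step₂ : c ≤ c + 1 := Nat.le_succ c
  -- Recombine the solved subgoals.
  exact le_trans step₁ step₂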
Work from 2023 [1] showed that general-purpose models did better when they were able to incorporate error feedback. Humans incorporate error feedback, yet none of the SOTA models on miniF2F seem to.
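The loop itself is simple. Here's a rough Python sketch of what incorporating checker feedback could look like; the model and checker objects and their methods are hypothetical stand-ins, not any real API:

def prove_with_feedback(theorem: str, model, checker, max_rounds: int = 4):
    # Generate a candidate proof, run it through the Lean checker, and feed
    # any error messages back into the next attempt. All names are illustrative.
    errors = None
    for _ in range(max_rounds):
        prompt = theorem if errors is None else f"{theorem}\n-- previous attempt failed with:\n{errors}"
        candidate = model.generate(prompt)     # candidate proof script
        ok, errors = checker.check(candidate)  # e.g. compile the file with Lean
        if ok:
            return candidate
    return None  # give up after max_rounds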
theorem convergesTo_unique {s : ℕ → ℝ} {a b : ℝ}
    (sa : ConvergesTo s a) (sb : ConvergesTo s b) : a = b := by
For fun I tried it on the free model on openrouter.ai. Got the answer the first time.
https://leanprover-community.github.io/mathematics_in_lean/m...
Here's the answer just to give you a feel.
  by_contra h
  have h₁ : a ≠ b := h
  have h₂ : |a - b| > 0 := by
    apply abs_pos.mpr
    exact sub_ne_zero.mpr h₁
  -- Use the definition of convergence to find N₁ and N₂
  have h₃ := sa (|a - b| / 2) (by linarith)
  have h₄ := sb (|a - b| / 2) (by linarith)
  cases' h₃ with N₁ h₃
  cases' h₄ with N₂ h₄
  -- Choose N to be the maximum of N₁ and N₂
  let N := max N₁ N₂
  have h₅ := h₃ N (by simp [N, le_max_left])
  have h₆ := h₄ N (by simp [N, le_max_right])
  -- Derive a contradiction using the triangle inequality
  have h₇ : |s N - a| < |a - b| / 2 := by simpa using h₅
  have h₈ : |s N - b| < |a - b| / 2 := by simpa using h₆
  have h₉ : |a - b| < |a - b| := by
    calc
      |a - b| = |a - s N + (s N - b)| := by ring_nf
      _ ≤ |a - s N| + |s N - b| := by apply abs_add
      _ = |s N - a| + |s N - b| := by rw [abs_sub_comm]
      _ < |a - b| / 2 + |a - b| / 2 := by linarith
      _ = |a - b| := by ring
  linarith
A prover model might be used as a tool by general-purpose models in the near future.
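A rough sketch of what that could look like, with a hypothetical tool schema and dispatcher (none of these names are a real API):

PROVE_TOOL = {
    "name": "prove",
    "description": "Attempt a formal Lean proof of a theorem statement.",
    "parameters": {"theorem": "Lean 4 statement ending in ':= by'"},
}

def dispatch(tool_call, prover_model):
    # A general-purpose agent emits a tool call; route it to the prover and
    # hand back either a proof script or an error the agent can reason about.
    if tool_call["name"] == "prove":
        return prover_model.generate(tool_call["arguments"]["theorem"])
    raise ValueError(f"unknown tool: {tool_call['name']}")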
> The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench.
Which is roughly 7.4% (49/658) for PutnamBench
Like ollama run deepseek-ai/DeepSeek-Prover-V2-7B
Is it really open source? Has something changed?
- Making correct and smart assumptions. Currently all LLM bots are bad at making good assumptions. I don't want to keep repeating my own assumptions when the context is clear enough. Hey bots, try harder.
- LLM bots need to bring their own secondary and contextual memory into their reasoning; I don't want to do it for you, OK? You're the bot.
- Thinking outside the box. This is the final stage of intelligence: adapting old techniques into your own new techniques to solve problems that didn't exist before.