DeepMind AI solves open maths problems, but it is not AGI

London: In a significant development in artificial intelligence research, Google DeepMind has announced that its advanced system, AlphaProof Nexus, has autonomously solved several long-standing mathematical problems. While the breakthrough highlights rapid progress in AI capabilities, the company’s CEO Demis Hassabis has clarified that such achievements do not yet amount to artificial general intelligence (AGI).

The claim comes shortly after OpenAI reported that one of its models had solved a well-known mathematical problem originally posed by Paul Erdős. Together, these developments have intensified discussions around the evolving capabilities of AI in advanced mathematics.

AI solves long-standing Erdős problems

According to DeepMind researchers, AlphaProof Nexus has solved nine open Erdős problems, some of which had remained unsolved for as long as 56 years. These problems have resisted the efforts of mathematicians for decades.

In addition to these, the AI system has reportedly:

Proven 44 open conjectures from the On-Line Encyclopedia of Integer Sequences (OEIS)
Resolved a 15-year-old problem in algebraic geometry
Discovered a previously unidentified optimisation parameter

Researchers also claimed that each problem was solved at a relatively low computational cost of just a few hundred dollars, indicating improvements not only in capability but also in efficiency.

Lean-based verification ensures accuracy

One of the most notable aspects of this development is the method used to verify the mathematical proofs. DeepMind combined large language model reasoning with Lean, a proof assistant and programming language used to formalise and machine-check mathematical proofs.

In this setup, the AI generates proof attempts, while Lean checks each step against strict logical rules. A proof that Lean accepts is therefore guaranteed to follow from the formal statement it was checked against, which greatly reduces the need for extensive manual verification by human experts.
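As a rough illustration of what checking each step means, the Lean 4 snippet below proves a toy statement. The theorem and its name are chosen here purely for illustration; they are not taken from AlphaProof Nexus or any proof it produced.

```lean
-- Toy statement for illustration only; not one of the problems in the article.
-- Lean accepts the theorem only if every step below type-checks.
theorem swap_and (p q : Prop) (h : p ∧ q) : q ∧ p := by
  constructor
  · exact h.right   -- first goal: q
  · exact h.left    -- second goal: p
```

If either `exact` line supplied the wrong half of `h`, Lean would reject the proof rather than accept a plausible-looking argument.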

This approach is being seen as a major advancement over earlier AI systems, which often relied on human review to validate results.

Tackling hallucinations in AI-generated maths

DeepMind’s announcement also draws attention to a known limitation in artificial intelligence — hallucinations. In the context of mathematics, hallucinations occur when AI systems generate results that appear convincing but contain logical errors.

Researchers highlighted several common issues:

AI may invent mathematical lemmas that are not proven
It may skip critical steps in complex proofs
It may present incomplete reasoning as a finished solution

Such errors can be difficult to detect through informal review because the explanations often sound technically correct. By using formal verification tools like Lean, DeepMind aims to eliminate these risks and ensure complete accuracy.
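The failure modes listed above can be sketched in Lean 4 as follows. This is a minimal illustration with hypothetical toy statements and names, not output from DeepMind's system.

```lean
-- Toy examples for illustration only; all theorem names are hypothetical.

-- Skipped step: `sorry` is a placeholder for a missing proof. Lean still
-- processes the file but warns that the declaration depends on `sorry`,
-- so the gap stays visible.
theorem zero_add_gap (n : Nat) : 0 + n = n := by
  sorry

-- Invented lemma: citing a name that was never proven does not compile,
-- so the attempt below is left commented out.
-- theorem zero_add_invented (n : Nat) : 0 + n = n :=
--   obviously_true_lemma n

-- Complete proof: cites `Nat.zero_add`, a lemma from Lean's core library.
theorem zero_add_done (n : Nat) : 0 + n = n :=
  Nat.zero_add n
```

Running `#print axioms zero_add_gap` should report a dependency on `sorryAx`, which is one way an incomplete proof can be told apart from a finished one.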

Not AGI, says Hassabis

Despite the scale of the achievement, Hassabis has emphasised that the system does not represent AGI — a form of AI that can think and reason like humans across a wide range of domains.

He stated that solving complex mathematical problems, while impressive, reflects expertise in a narrow domain rather than general intelligence. True AGI would require the ability to demonstrate creativity, adaptability and understanding across multiple disciplines.

Hassabis also pointed out that current AI systems still fall short of the kind of deep originality demonstrated by legendary mathematicians such as Srinivasa Ramanujan.

Implications for the future of mathematics

The development could significantly impact how mathematical research is conducted. By automating parts of the proof-generation process, AI systems like AlphaProof Nexus can assist mathematicians in exploring complex problems more efficiently.

Researchers noted that even when the AI does not fully solve a problem, its intermediate steps and proof attempts can help human experts better understand difficult concepts and focus on unresolved areas.

This collaborative model, in which AI handles computation and verification while humans supply intuition and creativity, could redefine research workflows in mathematics and related fields.

Conclusion

DeepMind’s AlphaProof Nexus represents a major milestone in the use of artificial intelligence for solving complex mathematical problems. By combining autonomous reasoning with formal verification, the system addresses long-standing concerns about accuracy in AI-generated proofs.

However, as the company’s leadership has emphasised, such advances fall well short of true artificial general intelligence. For now, AI is emerging as a powerful assistant in specialised domains, but the journey towards human-like general intelligence continues.