 
 Artificial intelligence researchers claim to have made the world’s first scientific discovery using a large language model, a breakthrough that suggests the technology behind ChatGPT and similar programs can generate information that goes beyond human knowledge.
The finding emerged from Google DeepMind, where scientists are investigating whether large language models, which underpin modern chatbots such as OpenAI’s ChatGPT and Google’s Bard, can do more than repackage information learned in training and come up with new insights.
“When we started the project there was no indication that it would produce something that’s genuinely new,” said Pushmeet Kohli, the head of AI for science at DeepMind. “As far as we know, this is the first time that a genuine, new scientific discovery has been made by a large language model.”
Large language models, or LLMs, are powerful neural networks that learn the patterns of language, including computer code, from vast amounts of text and other data. Since the whirlwind arrival of ChatGPT last year, the technology has debugged faulty software and churned out everything from college essays and travel itineraries to poems about climate change in the style of Shakespeare.
But while the chatbots have proved extremely popular, they do not generate new knowledge and are prone to confabulation, leading to answers that, in keeping with the best pub bores, are fluent and plausible but badly flawed.
To build “FunSearch”, short for “searching in the function space”, DeepMind harnessed an LLM to write solutions to problems in the form of computer programs. The LLM is paired with an “evaluator” that automatically ranks the programs by how well they perform. The best programs are then combined and fed back to the LLM to improve on. This drives the system to steadily evolve poor programs into more powerful ones that can discover new knowledge.
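The propose-score-select loop described above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not DeepMind's implementation: the "programs" are just coefficient pairs for a line `a*x + b`, the hypothetical `propose` function uses random blending and noise as a stand-in for the LLM, and every name here (`evaluate`, `propose`, `evolve`) is invented for the example.

```python
import random
from typing import List, Tuple

# Stand-in for a program's source code: coefficients (a, b) of f(x) = a*x + b.
Program = Tuple[float, float]


def evaluate(program: Program) -> float:
    """Evaluator: higher is better. Scores how closely a*x + b matches 3x + 1."""
    a, b = program
    return -sum((a * x + b - (3 * x + 1)) ** 2 for x in range(-5, 6))


def propose(parents: List[Program], rng: random.Random) -> Program:
    """Hypothetical stand-in for the LLM: blend two parents, then perturb."""
    p, q = rng.choice(parents), rng.choice(parents)
    return (
        (p[0] + q[0]) / 2 + rng.gauss(0, 0.3),
        (p[1] + q[1]) / 2 + rng.gauss(0, 0.3),
    )


def evolve(seed_program: Program, rounds: int = 200,
           pool_size: int = 5, seed: int = 0) -> Program:
    """Repeatedly propose a candidate, score it, and keep only the best few."""
    rng = random.Random(seed)
    pool = [seed_program]
    for _ in range(rounds):
        pool.append(propose(pool, rng))
        pool.sort(key=evaluate, reverse=True)  # rank by evaluator score
        del pool[pool_size:]                   # discard the weakest candidates
    return pool[0]


best = evolve((0.0, 0.0))
print(best, evaluate(best))
```

Because the best candidates are always retained, the top score can never get worse from one round to the next, which is the property that lets a loop of this shape improve on a weak starting point.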
The researchers set FunSearch loose on two puzzles. The first was a longstanding and somewhat arcane challenge in pure mathematics known as the cap set problem. It deals with finding the largest set of points in a high-dimensional grid in which no three points lie on a straight line. FunSearch churned out programs that generate new large cap sets that go beyond the best that mathematicians have come up with.
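To make the "no three points on a line" condition concrete, here is a small checker. It assumes the standard formulation of the cap set problem, which the article does not spell out: points are tuples with coordinates 0, 1 or 2 (arithmetic modulo 3), and three distinct points lie on a line exactly when their coordinates sum to zero modulo 3 in every position. The function name `is_cap_set` is made up for this sketch.

```python
from itertools import combinations
from typing import Iterable, Tuple

Point = Tuple[int, ...]


def is_cap_set(points: Iterable[Point]) -> bool:
    """Return True if no three distinct points lie on a line (mod 3)."""
    distinct = list(set(points))
    for x, y, z in combinations(distinct, 3):
        # Three distinct points are collinear mod 3 exactly when every
        # coordinate of x + y + z is divisible by 3.
        if all((a + b + c) % 3 == 0 for a, b, c in zip(x, y, z)):
            return False
    return True


# Four points in two dimensions with no three on a line.
print(is_cap_set([(0, 0), (0, 1), (1, 0), (1, 1)]))
# These three points form a line, so this is not a cap set.
print(is_cap_set([(0, 0), (0, 1), (0, 2)]))
```

A checker like this is the kind of automatic test an evaluator needs: it can confirm that a proposed set is valid, while the size of the set serves as the score.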
The second puzzle was the bin packing problem, which looks for the best ways to pack items of different sizes into containers. While it applies to physical objects, such as the most efficient way to arrange boxes in a shipping container, the same maths applies in other areas, such as scheduling computing jobs in datacentres. The problem is typically solved by either packing items into the first bin that has room, or into the bin with the least available space where the item will still fit. FunSearch found a better approach that avoided leaving small gaps that were unlikely ever to be filled, according to results published in Nature.
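The two conventional strategies mentioned above are usually called first fit and best fit. The sketch below shows both for items arriving one at a time; it assumes integer sizes, a single shared bin capacity, and no item larger than a bin. It does not attempt to reproduce the heuristic FunSearch found, which the article does not describe in enough detail to code.

```python
from typing import List


def first_fit(items: List[int], capacity: int) -> int:
    """Place each item in the first bin with room; return the number of bins used."""
    free: List[int] = []  # remaining space in each open bin
    for item in items:
        for i, space in enumerate(free):
            if item <= space:
                free[i] -= item
                break
        else:
            free.append(capacity - item)  # no bin had room: open a new one
    return len(free)


def best_fit(items: List[int], capacity: int) -> int:
    """Place each item in the fullest bin that still has room; return bins used."""
    free: List[int] = []
    for item in items:
        fitting = [i for i, space in enumerate(free) if item <= space]
        if fitting:
            tightest = min(fitting, key=lambda i: free[i])
            free[tightest] -= item
        else:
            free.append(capacity - item)
    return len(free)


items = [4, 7, 3, 6]
print(first_fit(items, 10))  # 3
print(best_fit(items, 10))   # 2
```

In this example first fit drops the 3 into the first bin, leaving two bins with three units free and nowhere to put the 6, whereas best fit fills the second bin exactly and keeps room for the 6 in the first.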
“In the last two or three years there have been some exciting examples of human mathematicians collaborating with AI to obtain advances on unsolved problems,” said Sir Tim Gowers, professor of mathematics at Cambridge University, who was not involved in the research. “This work potentially gives us another very interesting tool for such collaborations, enabling mathematicians to search efficiently for clever and unexpected constructions. Better still, these constructions are humanly interpretable.”
Researchers are now exploring the range of scientific problems FunSearch can handle. A major limiting factor is that the problems need to have solutions that can be verified automatically, which rules out many questions in biology, where hypotheses often need to be tested with lab experiments.
The more immediate impact may be for computer programmers. For the past 50 years, coding has largely improved through humans creating ever more specialised algorithms. “This is actually going to be transformational in how people approach computer science and algorithmic discovery,” said Kohli. “For the first time, we’re seeing LLMs not taking over, but definitely assisting in pushing the boundaries of what is possible in algorithms.”
Jordan Ellenberg, professor of mathematics at the University of Wisconsin-Madison and a co-author of the paper, said: “What I find really exciting, even more so than the specific results we found, is the prospects it suggests for the future of human-machine interaction in math.
“Instead of generating a solution, FunSearch generates a program that finds the solution. A solution to a specific problem might give me no insight into how to solve other related problems. But a program that finds the solution, that’s something a human being can read and interpret and hopefully thereby generate ideas for the next problem and the next and the next.”