
LLMs need to solve hard problems

I owe much of Standard Form's existence to Patrick Blackett. He almost single-handedly gave birth to the field of operations research, which is concerned with solving combinatorially difficult problems. The field has since reached the plateau of productivity that we expect of impactful legacy technologies like the ATM and the mainframe.

During World War II, Patrick gathered physicists and mathematicians to help the Allies counter the formidable German U-boat attacks that were crippling Allied shipping in the Atlantic. One of their solutions: set depth charges to detonate at a shallower depth of about 25 feet, since U-boats were usually attacked near the surface, thereby increasing the kill rate as much as ten-fold. They would make countless more recommendations that eventually broke the U-boat campaign and helped turn the tide of the war.

Patrick and his team modeled the real-world constraints of the war, such as how hard the U-boats were to track, and used those models to prescribe decisions for the military. Today, we see operations research applied to train timetabling, routing problems, bin-packing, and even the efficient packing of geometric shapes.

I've been building software that uses LLMs to solve what is commonly called the 'nurse scheduling' problem: creating a schedule from staffing data. The process starts simple: an LLM writes the algorithm, runs it, and returns the solution. Most code generation products today can handle this, but building it, and talking daily to people who face these problems, made me realize that we need a new way of interacting with LLMs, now and post-AGI, that is more than just a chat interface.
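To give a flavor of the simple case, here is a deliberately toy sketch of the kind of code an LLM might generate for a small scheduling request. The names and numbers are invented, and it brute-forces the search; a real system would hand the model to a proper constraint solver instead:

```python
from itertools import product

def schedule(nurses, days, max_shifts):
    """Brute-force a one-nurse-per-day roster: try every assignment of
    a nurse to each day, and return the first one in which no nurse
    works more than max_shifts shifts."""
    for assignment in product(nurses, repeat=len(days)):
        if all(assignment.count(n) <= max_shifts for n in nurses):
            return dict(zip(days, assignment))
    return None  # no roster satisfies the constraints

roster = schedule(["Ann", "Ben", "Cai"], ["Mon", "Tue", "Wed", "Thu"],
                  max_shifts=2)
print(roster)
```

The brute-force search blows up combinatorially as staff, days, and rotation rules grow, which is precisely why operations research exists as a field.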

For one, problems like workforce scheduling and demand forecasting require quantitative models, which today still take a fleet of engineers to build. What we're building at Standard Form is a simpler case: a hundred-person schedule across the year with ten different rotations. At the limit, we're talking about simulating an entire country's railway network, accounting for demand fluctuations and disruptions like a tree falling onto the track. That requires a large simulation to model reality; LLMs alone cannot build accurate world models.

Simulating reality matters because a constraint programming solver's verdict is only as good as its model: the solver reasons about the model, not the world, so a mis-specified model can declare a problem infeasible even when a real-world solution exists. The gap between what is truly feasible and what the algorithm claims is feasible is determined by how accurately the model captures reality.
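A minimal sketch of that gap, again with toy data and brute force standing in for a real solver: the same real-world situation, modeled twice. The over-constrained model reports "infeasible" while a more faithful one finds a roster:

```python
from itertools import product

def feasible(nurses, days, max_shifts):
    """True iff some roster covers every day with one nurse per day,
    no nurse working more than max_shifts shifts."""
    return any(
        all(a.count(n) <= max_shifts for n in nurses)
        for a in product(nurses, repeat=len(days))
    )

nurses = ["Ann", "Ben", "Cai"]
days = ["Mon", "Tue", "Wed", "Thu"]

# Over-constrained model: assumes nobody ever works twice in a week.
# Four days, three nurses: the solver correctly says "infeasible",
# but only for this model of reality.
print(feasible(nurses, days, max_shifts=1))  # False

# Corrected model: in reality, a nurse may take a second shift.
print(feasible(nurses, days, max_shifts=2))  # True
```

The solver is right both times; what changed is how well the model reflects what is actually allowed on the ward.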

Therefore, an LLM writing such algorithms needs the most accurate understanding of the world to produce a viable solution. This cannot be done from human input alone; we're simply too inefficient to generate fine-grained simulations. We therefore need compute not only for the algorithm itself[1], but also for building the model. Oh, and lots of sensors to observe the real world.

A product that does this may look more like a simulation tool than an enterprise software product. LLMs would then take on a cognitive role, generating code to solve real-world problems from real-world simulations, rather than the front-facing character that defines many AI products today.

That may be enough to convince people to build new LLM interfaces, but that's not all. From experience, the bar for a usable solution is high in physical scenarios, since collateral damage can directly cause bodily harm, yet datasets on these kinds of problems are scarce. For reinforcement learning, we need exa-scale data to reliably build LLMs that can competently replace operations researchers. Current frontier lab models rely on the abundance of data on the internet, but optimization problems rarely appear there; they often sit within the confines of corporate seclusion. We're in a similar situation with real-world AI: too little data.

But here’s the exciting part: the data exists, because these problems are not new. We have a good sense of the constraint programming problems around us and how many there are. Only this year, with the arrival of GPT-5 and Claude Sonnet 4.5, have LLMs started to solve operations research problems without much assistance. Imagine if Patrick Blackett had had, at his disposal, an LLM-powered constraint programming tool to help win the war. The war would have ended much sooner.

There has never been a better time to build LLM applications. People like Patrick modeled the chaos of war to make better decisions. Now, we should expect LLMs to write the next chapter of operations research.

Notes

[1] Like calculators, the compute cost of running these algorithms will trend toward zero, so significant cost will remain only for the most complex, combinatorially difficult problems.