The 80/20 of AI

Written By

Brentan Alexander, Ph.D.

Edward Shenderovich

ChatGPT’s public launch in 2022 was one of those rare moments in life when the science-fiction world of tomorrow is violently accelerated into the present. That viral moment reshaped our understanding of what is possible and of the world in which we live, and what followed was a disorienting mixture of unbounded euphoria and existential dread. Everyone agreed that the world would never be the same, but predictions of how it would change ranged from boundless riches to the total destruction of life itself. Nearly four years on, AI has indeed reshaped how several industries operate, yet the extinction of white-collar labor has failed to materialize.

Where AI can deliver and where it can fall short

It’s not hard to see why. Ask Claude to help write a post like this, and you’ll get an overconfident suck-up that fawns over the brilliance of your insights before delivering a frustratingly vague, multi-paragraph essay. The result, complete with refined vocabulary and em dashes, masquerades as elite thought: a new entry in the uncanny valley that somehow says very little despite the impressive volume of content delivered to your screen.

Which is not to argue that LLMs are valueless. Quite the opposite! Even as you inevitably review and rehabilitate the work these tools deliver, you keep coming back for more: writing code, drafting a pitch deck, reviewing a contract, interpreting a dream. You dutifully copy and paste the results and move on to the next task. LLMs are excellent at getting you 80% of the way there at unprecedented speed, and getting to 80% instantly is genuinely useful. In cases like prototyping, where 80% really is good enough, LLMs are downright transformative, especially when they enable people without expert knowledge to reach the same bar.

The last 20% is where judgment lives

This is where both the seductiveness and the peril of LLMs lie: the gratification is instant, and yet the result is slightly off. Every response is lopsided, every response is a compromise; each gets you near your destination, but none gets you exactly there. What remains is judgment: the capacity to know what is wrong with the output and the specific expertise to fix it. The cost of catching and fixing that last 20% becomes the real challenge, and success here will separate the solutions that deliver meaningful impact from those that merely fake it. The path to success lies in understanding how domain knowledge, combined with mechanisms for testing and feedback, can identify and fix that missing 20%.

From this point of view, it is not surprising that software development is where LLMs have had by far the most impact. Software is ripe for LLM disruption because of two conveniences. First, domain knowledge has been collected, organized, and centralized for decades on resources such as GitHub and Stack Overflow, putting a wealth of expert information a search away. Second, generated code can be tested instantly and cheaply in a production-like environment, and fixed just as fast. Linters deterministically check code for errors and formatting. Executing the code is a literal test of whether it behaves as intended. Even issues that slip through can be patched readily, with updates deployed at the click of a button. In software, the knowledge and capability to catch and fix the last 20% come at virtually no cost.

Unfortunately, other domains are harder. This is why Claude can’t get this article right on its own: despite an overwhelming corpus of language and written material to draw from (the expert knowledge), LLMs lack any deterministic or automated feedback mechanism to check the quality of their work. It’s no surprise that, two years after “slop” took off as the descriptor for AI output, our inboxes are still full of “personalized” and “targeted” messages that are obviously machine-produced. The bot has no way to know it sounds like a bot.

Sending a bad email is virtually cost-free, but once you start dealing with the physical world, errors are hardened in concrete. As a result, the industrial infrastructure sector relies on a shared set of workflows and best practices for developing any facility, with a long list of “if you know, you know” acronyms: FEL, HMB, FEED, HAZID, FMEA, BOD, and more. Whether you are designing a gas cracker, a geothermal plant, a critical minerals refinery, or a liquid-cooled data center, those who build in the physical world follow these predefined steps to manage risk and lower the probability that an expensive bug slips through. Unfortunately, this world has no equivalent of open source. Industrial designs, site databases, and permitting know-how are all locked away in proprietary data vaults, and lessons learned are treated as trade secrets: when you make a $10M mistake, you don’t share the solution with competitors.

This presents a significant challenge to using LLMs in this domain. Just enough public material exists for models to produce something that looks like the deliverables infrastructure developers expect, with the right sections, credible terminology, and plausible numbers. Those outputs will also be wrong in ways that range from obvious on their face to completely invisible unless you have spent decades doing this work. The mass balance will not close; yields will be lifted from a grant application describing ideal lab-scale conditions; equipment lists will be missing balance-of-plant skids and regulatory requirements, such as firewater systems, that contribute significantly to project cost. It’s 80% of the way there, masquerading as 100% complete. When getting things wrong means multimillion-dollar failures, “80% there” suddenly becomes zero.
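To make “the mass balance will not close” concrete: the simplest deterministic check on a process design is steady-state mass conservation, where total mass in must equal total mass out. The sketch below is an illustration only, not Roebling’s implementation; the `Stream` type, the stream names and flows, and the 1% tolerance are hypothetical assumptions. It plays the same role for a stream table that a linter plays for code: a cheap, automatic flag on an output that merely looks plausible.

```python
from dataclasses import dataclass


# Hypothetical stream record; real stream tables also track composition, energy, and phase.
@dataclass
class Stream:
    name: str
    mass_flow_kg_h: float  # mass flow rate in kg/h
    direction: str         # "in" for feeds, "out" for products and wastes


def closure_error(streams: list[Stream]) -> float:
    """Relative mass-balance error (in - out) / in, assuming steady state (no accumulation)."""
    for s in streams:
        if s.direction not in ("in", "out"):
            raise ValueError(f"unknown direction for stream {s.name!r}: {s.direction!r}")
    total_in = sum(s.mass_flow_kg_h for s in streams if s.direction == "in")
    total_out = sum(s.mass_flow_kg_h for s in streams if s.direction == "out")
    if total_in <= 0:
        raise ValueError("total inlet flow must be positive")
    return (total_in - total_out) / total_in


def balance_closes(streams: list[Stream], tolerance: float = 0.01) -> bool:
    """True if the balance closes within the (assumed) relative tolerance."""
    return abs(closure_error(streams)) <= tolerance


# A plausible-looking but inconsistent stream table: 80 kg/h is unaccounted for.
design = [
    Stream("feedstock", 1000.0, "in"),
    Stream("product", 620.0, "out"),
    Stream("byproduct", 300.0, "out"),
]
print(f"closure error: {closure_error(design):.1%}")  # closure error: 8.0%
print("balance closes:", balance_closes(design))       # balance closes: False
```

A real check would balance each chemical element or component rather than total mass alone, and would include an energy balance; this sketch only shows the shape of a deterministic feedback loop.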

More training and better harnesses will never solve this on their own, and the solution is not a better prompt. In domains where specialized expertise is locked away and mistakes are financially catastrophic and physically irreversible, a different approach is required, one built on better knowledge and effective feedback loops. What’s needed are domain-specific systems that know when they do not know, and whose architecture treats the last 20% not as a gap to be ignored but as a constraint to be respected: systems that drive down the cost of catching and fixing that 20%.

How Roebling closes the 80/20 gap

This is what we are building at Roebling. We spent four years doing the R&D required to tackle the complexity of real-world industrial infrastructure development, and poured that knowledge into a deterministically grounded AI model for capital projects: a solution for a domain where the stakes of the last 20% are categorically different. Roebling puts domain-specific expertise back at the center, using hard-won experiential knowledge and deterministic physical modeling to improve model outputs. We can generate and validate a Class 5 estimate in less time than it takes to write an RFP for the same work, while identifying areas of risk and open unknowns. It’s a framework that helps users reach higher levels of accuracy at equally impressive speed. Roebling runs the numbers, cites its sources, and shows its work, generating a digital original before the physical twin is built.

For domains where “Move fast and break things” is a recipe for literal failure, specialized solutions like these will be needed to take LLMs from “fast” to “fast and trustworthy.” Such systems must ensure that model results actually obey the rules of the discipline, in the same way linters and direct code execution force software LLMs to write ‘good’ code. Any other path leads to results that merely appear reasonable, right up to the moment they fail in the real world.

About the authors

Brentan Alexander is the chief technology officer of Roebling, an AI-native platform for industrial infrastructure planning and development, with clients across biomanufacturing, chemicals, critical minerals, and energy. Prior to this role, he held successive executive roles at New Energy Risk, a technology performance insurer working to support first-of-a-kind projects, serving as chief science officer, chief commercial officer, and president. Over his career, he has reviewed over one thousand projects and supported the structuring and financing of more than $3 billion in total project capital across novel energy and industrial technologies. He has also served as an independent engineer for clients including Shell, BP, Microsoft, and General Motors, and has been a contributing writer to Forbes. Alexander holds BS and MS degrees in mechanical engineering from MIT and a PhD in mechanical engineering from Stanford University, where he studied solid fuel gasification and electrochemical energy conversion.

Edward Shenderovich is co-founder and CEO of Roebling, an AI infrastructure engineering company. He previously co-founded Knotel (acquired by Newmark), Merchantry (acquired by Tradeshift), and Kite Ventures, where he invested and served as a board member at Delivery Hero, Plated, and Domo. He started his career as an Industry Analyst at Silicon Valley Bank. Edward is a graduate of UC Berkeley and a fellow at Università Cattolica del Sacro Cuore.

See Roebling in Action

Explore how AI-powered process engineering accelerates design, cost analysis, and infrastructure decisions.


Designed for those who build.

Roebling is where the most ambitious industrial projects start. Roebling offers a first-of-its-kind platform for industrial process engineers and R&D teams in biomanufacturing, chemicals, critical minerals, and beyond.
