Towards AGI via Specialized Intelligence
The case for the Bayesian modular mind... or scaling is NOT all you need
The artificial intelligence (AI) industry currently stands at a thermodynamic and architectural inflection point. For the past decade, the dominant narrative has been simple: scaling is all you need. This conviction drove an industrial arms race to train models of staggering scale, operating on the premise that general reasoning would emerge naturally from huge pre-training runs. Yet, despite these massive investments, recent frontier models like GPT-5 and Llama 4 showed signs of diminishing returns, hitting what may be a "cognitive scaling wall."
The AI2027 scenario, proposed in April 2025, predicted that super-exponential growth of AI capabilities would lead to a superintelligence explosion by 2027. However, we are not on track. Reality is forcing a recalibration (see Figure 1). Even authors of the AI2027 essay, such as Daniel Kokotajlo, have revised their artificial general intelligence (AGI) estimates, pushing dates back to 2034. Improvements in AI capabilities have not lived up to expectations, and progress has been further constrained by the inertia and complexities of the real world.
Figure 1: The influential AI2027 essay predicted a superintelligence explosion by 2027. With recent data added to the graph, however, this looks increasingly unlikely as we hit the hard limits of monolithic scaling. Source: Post by Samuel Albanie.
While recent developments like Gemini 3 do show impressive results—potentially bucking the trend—the nuance lies in how this was achieved. Sebastian Borgeaud, who leads pre-training for Gemini 3 at Google DeepMind, recently shared in a podcast that they are "not really building a model anymore" but are instead "building a system."
The shift from monolithic models to systems reignites the debate around Rich Sutton's famous "bitter lesson". Sutton argued that leveraging general computation and scaling always outperforms human-designed approaches. However, as Gary Marcus and others have countered, we are learning "the bitter lesson about the bitter lesson". Scaling works exceptionally well for some problems, typically pattern recognition and fluency, but it hits a wall for others, particularly robust reasoning and logic. Pure compute without structure leads to inefficiency and fragility. As Ilya Sutskever stated recently, the field is moving "from the age of scaling, to the age of research."
The Dangers of Digital Monoliths
To understand why a change of course is critical, we need to examine the inherent dangers of the current paradigm. We are currently building "digital monoliths"—systems where memory, reasoning, facts, and safety protocols are all mashed into a single, opaque set of weights. The danger of this approach is best illustrated by a tragic lesson from engineering history: the Therac-25 radiation therapy disaster.
Figure 2: The Therac-25 radiation therapy machine. Its catastrophic failure was due to a reliance on a single monolithic software system for both operation and safety, without independent checks.
Therac-25 was a radiation therapy machine that, in a series of accidents in the mid-1980s, delivered massive radiation overdoses to patients, some of them fatal. It relied on a single software system to handle both the beam control and the safety checks. In previous models, independent electromechanical switches served as hardware interlocks. When the Therac-25 designers removed these modular checks, believing the software monolith could handle everything, they created a single point of catastrophic failure. When the software failed, there was no independent system to stop it.
Current large language models (LLMs) are the Therac-25s of intelligence. Most safety initiatives ask the single monolithic LLM system to police itself. When the monolith "hallucinates", there is no independent check to catch it.
This fragility is a primary reason why 95% of enterprise AI pilots fail to reach production, as found in a recent MIT study. Businesses operate on reliability, auditability, and state management—qualities that stochastic monoliths fundamentally lack. Safety and reliability must be architected in, not just trained in.
Biological & Philosophical Inspiration
The solution to the monolith lies in revisiting the biological and philosophical foundations of intelligence. AGI will not be a singular "ghost in the machine" but an emergent property of a structured system. To build a robust AGI, we must synthesize the insights of several intellectual giants.
Figure 3: The philosophical lineage of the modular mind: from Minsky’s Society of Mind, to Fodor’s Modularity of Mind.
Marvin Minsky argued in The Society of Mind that intelligence is not the product of a single, unified "self," but rather the emergent property of a vast, organized society of simple processes or agents. He famously rejected the idea of a single internal executive intelligence, as he feared such a "ghost in the machine" would elude further explanation. Instead, he proposed that complex intelligence arises from the organization of simple parts, much like a brain arises from simple neurons. However, we know from the physiology of the brain that it is not simply a soup of neurons. Rather, it has an architecture and structure.
Jerry Fodor took this structural view further with his theory of The Modularity of Mind. Fodor suggested the mind consists of encapsulated, specialized modules—like vision or language parsing—that operate in isolation. However, Fodor also posited a "central reasoning system" to integrate these inputs. This concept clashed with Minsky's view, as Minsky feared a central system would simply reintroduce the unexplained "ghost" or homunculus, kicking the can of understanding intelligence down the road. But perhaps there is another way to provide a central reasoning system.
One alternative is the Bayesian brain hypothesis, closely associated with the theoretical neuroscientist Karl Friston. Friston proposes that the brain acts as an inference engine: it combines existing prior beliefs with new sensory information to form posterior beliefs, following Bayes' theorem. Crucially, a Bayesian central system is a statistical mechanism, not an intelligent agent in itself. Thus, it can provide the necessary coordination for Fodor's modules without introducing the "ghost" Minsky feared.
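To make the belief update concrete, here is a minimal sketch of Bayes' theorem over a small set of discrete hypotheses. The function name bayes_update, the cat/dog hypotheses, and all the probabilities are illustrative assumptions, not something taken from Friston's work.

```python
def bayes_update(prior, likelihood):
    """Return the posterior over hypotheses.

    prior:      dict mapping hypothesis -> P(hypothesis)
    likelihood: dict mapping hypothesis -> P(observation | hypothesis)
    """
    # Numerator of Bayes' theorem for each hypothesis.
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    # Denominator: total probability of the observation.
    evidence = sum(unnormalized.values())
    if evidence == 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return {h: p / evidence for h, p in unnormalized.items()}


# Hypothetical example: is the animal nearby a cat or a dog?
prior = {"cat": 0.5, "dog": 0.5}
# Illustrative probabilities of hearing a bark under each hypothesis.
likelihood_bark = {"cat": 0.1, "dog": 0.9}

posterior = bayes_update(prior, likelihood_bark)
print(posterior)
```

The update is nothing more than multiplication and normalization, which is the point of the argument above: the coordinating mechanism can be purely statistical.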
The specialized modules of Fodor's modular mind could take a variety of forms, including fast and slow thinkers. Daniel Kahneman popularized the distinction between System I and System II thinking in his best-seller, Thinking, Fast and Slow. System I thinking is typically fast, intuitive and emotional. System II thinking, in contrast, is typically slow, deliberate and logical. LLMs are essentially System I thinkers, emitting the next word (token) that comes to mind. System II thinking is critically lacking in most of today's AI systems. Some may argue that chain-of-thought reasoning is System II thinking, but it is difficult to justify that running System I thinking for longer really constitutes true System II thinking.
The Bayesian Modular Mind
Synthesizing these foundations gives us a blueprint for a new architecture, where a Bayesian modular mind emerges. The core thesis is that AGI will be a federated society of specialized expert modules governed by a rigorous Bayesian central executive.
Figure 4: The components of the Bayesian modular mind.
In this synthesis, Minsky provides the philosophical rationale that intelligence emerges from simple parts. Fodor provides the architectural structure: encapsulated modules integrated by a central system. Friston provides the consensus mechanism for that central system, ensuring it is a Bayesian inference mechanism rather than a "ghost in the machine." Fodor's specialized modules can be either Kahneman's System I or System II thinkers, allowing modules to be fast "knowers" or slow "checkers." The result is a system of specialized intelligences coordinated by a unifying Bayesian framework.
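One way to picture this arrangement is a toy executive that holds a belief over hypotheses and folds in a likelihood reported by each encapsulated module. This is a sketch under strong assumptions: the class BayesianExecutive, the two hard-coded modules, and their numbers are all hypothetical, and the module reports are treated as conditionally independent given the hypothesis, which real modules may not be.

```python
import math
from typing import Callable, Dict

Likelihood = Dict[str, float]          # P(module report | hypothesis)
Module = Callable[[str], Likelihood]   # maps a raw observation to a likelihood


class BayesianExecutive:
    """Holds a belief over hypotheses and updates it from module reports."""

    def __init__(self, prior: Dict[str, float]) -> None:
        # Assumes every prior probability is strictly positive.
        total = sum(prior.values())
        # Work in log space so many small likelihoods do not underflow.
        self.log_belief = {h: math.log(p / total) for h, p in prior.items()}

    def integrate(self, likelihood: Likelihood) -> None:
        for h in self.log_belief:
            # Floor at a tiny value so one module cannot produce log(0).
            self.log_belief[h] += math.log(max(likelihood[h], 1e-12))
        # Renormalize so the belief remains a probability distribution.
        peak = max(self.log_belief.values())
        log_z = peak + math.log(
            sum(math.exp(v - peak) for v in self.log_belief.values())
        )
        self.log_belief = {h: v - log_z for h, v in self.log_belief.items()}

    def belief(self) -> Dict[str, float]:
        return {h: math.exp(v) for h, v in self.log_belief.items()}


# Hypothetical modules with hard-coded, illustrative likelihoods.
def vision_module(observation: str) -> Likelihood:
    return {"cat": 0.7, "dog": 0.3}


def audio_module(observation: str) -> Likelihood:
    return {"cat": 0.1, "dog": 0.9}


executive = BayesianExecutive({"cat": 0.5, "dog": 0.5})
for module in (vision_module, audio_module):
    executive.integrate(module("raw sensor data"))

final_belief = executive.belief()
print(final_belief)
```

The executive never inspects how a module reached its report; it only sees likelihoods. That keeps the modules encapsulated in Fodor's sense and makes every belief update inspectable.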
Monolithic Failures and the Bayesian Modular Advantage
This architecture offers a mechanistic explanation for the failure modes of monolithic AI. The core failure of a monolith is that it lacks a mechanism to separate prior beliefs from reality.
Hallucinations can be understood as the system placing too much weight on its priors. The model trusts its internal training data more than the prompt or the reality it is presented with. It overrides the sensory evidence with its own "dream." Conversely, schizophrenic behaviour occurs when the system places too much weight on sensory inputs. It becomes ungrounded and continually surprised, leading to paranoia.
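The two failure regimes can be illustrated with the textbook Gaussian update, in which the posterior mean is a precision-weighted average of the prior mean and the observation. This is only an analogy for the argument above, not a model of how any LLM works; the function name and the numbers are assumptions chosen for illustration.

```python
def gaussian_update(prior_mean, prior_precision, observation, sensory_precision):
    """Combine a Gaussian prior with one Gaussian observation.

    Precision is the inverse of variance: higher precision means more trust.
    """
    posterior_precision = prior_precision + sensory_precision
    posterior_mean = (
        prior_precision * prior_mean + sensory_precision * observation
    ) / posterior_precision
    return posterior_mean, posterior_precision


prior_mean, observation = 0.0, 10.0

# Equal trust: the belief moves halfway toward the evidence.
balanced, _ = gaussian_update(prior_mean, 1.0, observation, 1.0)

# Over-weighted prior: the belief barely moves, whatever the evidence says.
prior_heavy, _ = gaussian_update(prior_mean, 100.0, observation, 1.0)

# Over-weighted senses: the belief jumps almost entirely to each new input.
sensory_heavy, _ = gaussian_update(prior_mean, 1.0, observation, 100.0)

print(balanced, prior_heavy, sensory_heavy)
```

In this toy setting the prior-heavy case mirrors the hallucination regime described above, and the sensory-heavy case mirrors the ungrounded, continually surprised one.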
The Bayesian modular mind solves this by creating a structural gap between priors and new sensations. The executive updates beliefs rationally rather than conflating them. The architecture of the Bayesian modular mind provides four distinct advantages.
Figure 5: Key advantages of the Bayesian modular mind over monolithic approaches.
First, the system is flexible. It incorporates both System I (fast) and System II (slow) thinking modules to handle a wide variety of tasks. Second, it is reliable. The Bayesian belief update mechanism explicitly separates priors from reality, minimizing hallucinations by design. Third, it is robust. System II thinking introduces logical, deliberate "checkers" that can verify the output of fast "knowers," effectively creating an interlock against failure. Finally, it is interpretable. The system is not a black box; we can inspect the inputs and outputs of separate, specialized modules and the belief updates of the executive.
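The interlock idea can be sketched with a deliberately simple pair of modules: a fast "knower" that answers from memory and may be wrong, and a slow "checker" that verifies the answer by an independent route. Everything here is hypothetical, including the lookup table with its planted error and the function names; the point is only that the checker shares no state with the knower.

```python
from typing import Optional, Tuple

# Hypothetical fast "knower": memorized answers, one of which is wrong.
MEMORIZED_PRODUCTS = {
    (12, 12): 144,
    (13, 17): 231,  # planted error: 13 * 17 is actually 221
}


def fast_knower(a: int, b: int) -> Optional[int]:
    """Answer instantly from memory, or return None if nothing is recalled."""
    return MEMORIZED_PRODUCTS.get((a, b))


def slow_checker(a: int, b: int, claimed: int) -> bool:
    """Verify a claimed product by repeated addition (non-negative b only)."""
    total = 0
    for _ in range(b):
        total += a
    return total == claimed


def answer_with_interlock(a: int, b: int) -> Tuple[Optional[int], str]:
    """Release the knower's answer only if the independent checker agrees."""
    proposal = fast_knower(a, b)
    if proposal is None:
        return None, "no proposal"
    if not slow_checker(a, b, proposal):
        return None, "rejected by checker"
    return proposal, "verified"


print(answer_with_interlock(12, 12))
print(answer_with_interlock(13, 17))
```

The design choice that matters is that a failure in the knower cannot silence the checker, which is the property the Therac-25 lost when its hardware interlocks were removed.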
Evidence of the Modular Approach
This shift to a Bayesian modular mind is not merely theoretical; the industry is already moving in this direction. We are seeing the rise of agentic systems, where specialized agents work together in modular workflows rather than relying on single end-to-end models. In addition, many frontier LLMs are adopting mixture-of-experts (MoE) architectures, attempting to build many specialized experts within a single LLM to improve efficiency and performance.
Within agentic systems, we're also seeing the rise of small language models (SLMs), which are much more efficient and often more capable than LLMs for specialized tasks. Research from NVIDIA argues that SLMs are the future of agentic AI, offering sufficient power, better economics, and greater flexibility than monoliths. Similarly, research from Amazon has demonstrated that domain-adapted SLMs can actually outperform significantly larger LLMs on specific tasks like tool calling. For example, a 350M parameter model was shown to beat a 175B parameter model on specific benchmarks.
We are also witnessing the return of neuro-symbolic AI to provide reliable "System II" thinking. Google DeepMind's AlphaGeometry combines a neural language model with a symbolic deduction engine to solve complex geometry problems, performing at a level approaching that of a gold medallist on International Mathematical Olympiad geometry problems.
In AI for Science, the most successful AI systems are modular, with highly specialized components. AlphaFold, whose creators were awarded a share of the 2024 Nobel Prize in Chemistry, remains the premier example of how extreme specialization yields breakthrough performance that generalist models cannot match.
The Path Forward
The Bayesian modular mind framework offers inspiration for further AI breakthroughs. The Bayesian consensus update mechanism could provide a blueprint for continual learning, allowing models to update their understanding without catastrophic forgetting. By maintaining and updating prior beliefs to form new posteriors, the system naturally integrates a persistent memory mechanism. Furthermore, world models could be seamlessly integrated, serving as specialized simulation modules and/or as part of the repository of the system's prior understanding.
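The link between Bayesian updating and persistent memory can be seen in a toy conjugate model, where yesterday's posterior simply becomes today's prior. The sketch below uses a Beta belief about a success rate; the class name BetaBelief and the batch counts are illustrative assumptions. Whether anything this tidy carries over to large neural systems is an open question, so read it as an illustration of the principle rather than a recipe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief about an unknown success probability."""

    alpha: float = 1.0  # prior pseudo-count of successes (uniform prior)
    beta: float = 1.0   # prior pseudo-count of failures

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Conjugate update: the posterior is again a Beta distribution.
        return BetaBelief(self.alpha + successes, self.beta + failures)

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


belief = BetaBelief()

# Learn from two batches in sequence; the first posterior is the second prior.
after_batch_1 = belief.update(successes=8, failures=2)
after_batch_2 = after_batch_1.update(successes=1, failures=9)

# Learning from all the data at once gives exactly the same belief.
all_at_once = belief.update(successes=9, failures=11)

print(after_batch_2, all_at_once, after_batch_2.mean)
```

Because the belief carries a summary of everything seen so far, the second batch refines what the first batch taught instead of overwriting it.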
The path to AGI is not to build a monolithic god, but to construct a Bayesian modular mind. We must stop trying to force a single next-token predictor to do everything and instead build a society of specialized intelligences, coordinated by a rational, Bayesian executive. Just as biology evolved from single-celled organisms to complex, multicellular nervous systems, AI is evolving from the monolith to the modular.






