The AI Scaling Bottleneck: Moving from Human-In-The-Loop to Human-On-The-Loop

Treating AI as a synchronous, human-gated process kills the very efficiency it promises. If organisations are going to actually scale these workflows, they need to transition from Human-In-The-Loop (HITL) to Human-On-The-Loop (HOTL).

Most of us are familiar with the concept of a Human-In-The-Loop (HITL). A predefined process workflow has one or more decision points that involve human oversight. It's synchronous, blocking, and required during each run.

This is a safe default position when automating processes. Risk is managed, and introducing reliable agentic components still yields noticeable performance improvements. However, this progress comes at a cost: the speed of the loop—and therefore the ceiling on efficiency gains—is strictly limited by human latency.

While HITL is prudent and sometimes legally mandated (e.g., Article 22 of the GDPR), true economies of scale are achieved when regulation permits carefully and programmatically moving to a Human-On-The-Loop (HOTL) model.

Human-On-The-Loop (HOTL) shifts oversight from "mandatory intervention" to "conditional oversight," triggered only when a workflow—or group of workflows—meets specific criteria. By its nature, the overwhelming majority of processes execute autonomously, drastically increasing both efficiency and performance.

Several approaches make this safe and practical:

Confidence Thresholding: The model evaluates its own uncertainty, routing workflows to a human only when its confidence score falls below a predefined baseline.
Anomaly Detection: Using automated monitoring to flag outlier data or unexpected execution paths, diverting only unusual workflows to human supervisors.
Rule-Based Routing: Establishing deterministic policies where specific, high-risk parameters (e.g., transactions over a certain value) trigger human review, while others bypass it.
Statistical Sampling: Applying the synchronous HITL model to a random, statistically meaningful subset of autonomous executions to continuously verify performance and maintain confidence intervals.

Making the leap from HITL to HOTL requires a foundational shift in trust. You cannot rely on these routing frameworks if you are flying blind. Deep observability isn't just a safety net for this transition; it is the bedrock of an entirely new operating model. Crucially, once you have the instrumentation in place to confidently monitor these workflows, a secondary, massive advantage unlocks: the hard data needed to stop over-relying on expensive frontier models.

As Peter Drucker famously noted, "What gets measured gets managed." Tracking the execution paths of workflows and the frequency of human interventions provides the baseline you need to optimise. Rather than looking at an isolated list of metrics, tracking the continuous interplay between system latency, operational cost, and output quality (like accuracy or F1 scores) dictates your strategy.

The capability gap between different classes of models is shrinking rapidly. While it is tempting to believe that massive frontier models are required for every task, recent advances in highly capable, smaller models (such as the Gemma or Qwen families) prove that frontier models are increasingly overkill for scoped, targeted workflows.

However, lacking observability and a flexible framework to accommodate different architectures—from edge deployments to the data centre—organisations will struggle to safely capture this value.

If you have no objective measure of performance and no way to seamlessly swap models, you will always be disappointed by how expensive AI appears to be.