Amdahl's Argument for AI

April 8, 2024

Amdahl's Law

Amdahl's Law approximates the maximum improvement in overall system performance when only part of the system is enhanced. It states that the overall performance gain from upgrading a component is limited by the fraction of time that the improved component is actually used.

For example: If a program spends a significant amount of time reading from and writing to a slow disk, then even a much faster processor will only marginally improve the overall performance, because the disk speed is the bottleneck.
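In symbols: if a fraction p of total runtime is spent in the part you speed up by a factor s, the overall speedup is 1 / ((1 − p) + p/s). A minimal sketch (the 60/40 disk/CPU split is an invented illustration):

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the runtime is accelerated by factor s."""
    return 1 / ((1 - p) + p / s)

# Say 60% of runtime is slow disk I/O we can't improve; a 10x faster
# processor only accelerates the remaining 40% compute fraction:
print(round(amdahl_speedup(0.4, 10), 3))  # → 1.562, nowhere near 10x
```

Even an infinitely fast processor (s → ∞) caps out at 1 / (1 − p), i.e. 2.5x here.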

The Bottleneck of AI Applications

The productivity speedup AI apps can provide is limited by how much human-in-the-loop work is required. Humans produce ~1-3 tokens per second, and they can't really be sped up – unless you're Neuralink.

So if your application requires a human completion for every LLM completion (e.g. ChatGPT or AI copilots), then your maximum speedup is ~2x – even when LLMs become 10x faster.

Cognition Labs' Devin is better, because it needs a human completion only every ~10 iterations. At current speeds, this feels about 2.9x better than raw ChatGPT – nice, but not mind-blowing. But because they're frugal with human tokens, they can reach a ~10x productivity speedup just by waiting for models to get faster!
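A toy model of why the iteration count sets the ceiling (entirely my own sketch: the function, the equal-time assumption, and the resulting numbers are illustrative and won't reproduce the chart's exact 2.9x):

```python
def productivity_speedup(iters_per_human: int, llm_time: float,
                         human_time: float = 1.0) -> float:
    """Useful LLM completions per unit wall time, relative to a
    one-human-turn-per-LLM-turn baseline running at the same speeds.

    human_time is pinned at 1.0 (~1 token/s); llm_time is the time an
    LLM completion takes in the same units (an assumption, not a measurement).
    """
    agent_rate = iters_per_human / (human_time + iters_per_human * llm_time)
    baseline_rate = 1 / (human_time + llm_time)
    return agent_rate / baseline_rate

# Ten LLM turns per human turn, with an LLM turn as slow as a human turn:
print(round(productivity_speedup(10, 1.0), 2))  # → 1.82
# As models get arbitrarily fast, the same setup approaches its 10x ceiling:
print(productivity_speedup(10, 0.0))  # → 10.0
```

The ceiling equals the number of LLM completions per human completion – which is exactly why being frugal with human tokens matters more than raw model speed.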

The fun stuff starts when AI agents reach the 100-1000x range, i.e. require human input only every 100-1000 iterations. It's a long way there – but I'm excited every time I see something that gets us closer: like code execution from E2B.dev, browsing from Browserbase, and a context engine from SID.ai.

Many copilots & current ChatGPTs will seem silly in hindsight: like doing a 1:1 with your intern every 15 minutes – when you could be managing a team that makes a month's worth of progress between meetings.

Today, developers are frugal with LLM tokens (I know: they're expensive) – and we've built tools to use them wisely: Parea, Humanloop, Langfuse, LangChain. But the most important thing to be frugal with is human tokens (both input and output) – they define the overall productivity speedup your application can provide. Humans are insanely slow.

AI agents don't yet work well – but it won't be a competition once they do.

Naturally, there are many caveats here: iteration counts are gameable, and reducing human tokens has been an important trend outside of agents, too. Google lets you find information with fewer keystrokes and less reading than anyone else – the same holds for Perplexity today. Button presses can be tokens as well (depending on the action they trigger), etc.

Some chart explanations:
0. I pin human completions at 1 token per second in all calculations. That is realistic for high-quality human tokens, although some people are faster or slower.
1. BCG put this number at 1.4x; I'm fine disagreeing. The 1.8x is at current GPT-4 speeds.
2. Devin doesn't fully realize its potential yet.
3. Let's free that y axis! "Future agent" only needs a human completion every 100 iterations.
4. Back-of-the-napkin calculation on human token cost. At $50/h, with 400 tokens per minute input (reading/listening at 2x) and 100 tokens per minute output (slow typing/speaking with breaks): $8,333/1M output tokens or $2,083/1M input tokens, which is ~100x more expensive than GPT-4 (see below).
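The napkin math above checks out in code (the $50/h rate and tokens-per-minute figures are this post's assumptions, not measurements):

```python
hourly_rate = 50.0   # $/h for human time (assumed)
input_tpm = 400      # human input tokens/min: reading/listening at 2x
output_tpm = 100     # human output tokens/min: slow typing/speaking with breaks

def dollars_per_million_tokens(tokens_per_minute: float) -> float:
    """Cost of 1M human tokens at the hourly rate above."""
    tokens_per_hour = tokens_per_minute * 60
    return hourly_rate / tokens_per_hour * 1_000_000

print(round(dollars_per_million_tokens(output_tpm)))  # → 8333 ($/1M output tokens)
print(round(dollars_per_million_tokens(input_tpm)))   # → 2083 ($/1M input tokens)
```

Against GPT-4-era prices of tens of dollars per million tokens, that is indeed on the order of 100x.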
