The productivity speedup AI apps can provide is limited by how much human-in-the-loop work is required. Humans output ~1-3 tokens per second. They can't really be sped up – unless you're Neuralink.
So if your application requires a human completion for every LLM completion (e.g. ChatGPT or AI copilots), then your maximum speedup is ~2x – even when LLMs become 10x faster. The human half of each loop doesn't shrink.
Cognition Labs' Devin is better, because it needs a human completion only every ~10 iterations. At current speeds, this feels about 2.9x better than raw ChatGPT, which is nice, but not mind-blowing. But because they're frugal with human tokens, they can go to ~10x productivity speedup just by waiting for models to get faster!
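The speedup cap can be sketched with an Amdahl's-law-style calculation. The numbers below (one minute per human turn, one minute per LLM iteration) are illustrative assumptions, not measurements:

```python
def speedup(llm_speedup: float, iterations_per_human_turn: int,
            human_secs: float = 60.0, llm_secs: float = 60.0) -> float:
    """Wall-clock speedup vs. a baseline of one human turn per LLM iteration at 1x LLM speed."""
    baseline = human_secs + llm_secs
    # With n iterations per human turn, the human cost amortizes over n iterations:
    per_iteration = human_secs / iterations_per_human_turn + llm_secs / llm_speedup
    return baseline / per_iteration

# Chat-style tool: a human turn every iteration. A 10x faster LLM caps out near 2x,
# because the human minute per iteration stays fixed.
print(round(speedup(llm_speedup=10, iterations_per_human_turn=1), 2))   # ≈ 1.82

# Agent-style tool: a human turn every ~10 iterations. The same 10x LLM
# translates into close to the full 10x end-to-end.
print(round(speedup(llm_speedup=10, iterations_per_human_turn=10), 2))  # 10.0
```

The only lever the application controls is `iterations_per_human_turn` – everything else is waiting for models to get faster.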
The fun stuff starts when AI agents get to the 100-1000x range, i.e. only require human input every 100-1000 iterations. It's going to be a long way there – but I'm excited every time I see something that will get us closer: Like code execution from E2B.dev, browsing from Browserbase and a context engine from SID.ai.
Many copilots & current ChatGPTs will seem silly in hindsight: like doing a 1:1 with your intern every 15 minutes – when you could be managing a team that makes a month's worth of progress between meetings.
Today, developers are frugal with LLM tokens (I know: they're expensive), so we've built tools to use them wisely: Parea, Humanloop, Langfuse, LangChain. But the most important thing to be frugal with is human tokens (both input and output) – they define the overall productivity speedup your application can provide. Humans are insanely slow.
AI agents don't yet work well – but it won't be a competition once they do.
Naturally, there are many caveats here: iterations are gameable, and reducing human tokens has been an important trend outside of agents, too. Google let you find information with fewer keystrokes and less reading than anyone else – the same holds for Perplexity today. Button presses can be tokens too (depending on the action they trigger), etc.
Back-of-the-napkin calculation of human token cost. At $50/h, with 400 tokens per minute of input (reading/listening at 2x) and 100 tokens per minute of output (slow typing/speaking with breaks): $8,333 per 1M output tokens and $2,083 per 1M input tokens – roughly 100x more expensive than GPT-4.
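The napkin math checks out if you price human tokens the way API providers price LLM tokens (the rate and throughput figures are the post's assumptions):

```python
# Price human tokens like LLM tokens: dollars per million tokens.
HOURLY_RATE = 50.0   # $/hour for the human
INPUT_TPM = 400      # tokens/min read or listened to (at 2x speed)
OUTPUT_TPM = 100     # tokens/min typed or spoken, with breaks

rate_per_min = HOURLY_RATE / 60  # ≈ $0.83 per minute of human time

input_cost = rate_per_min / INPUT_TPM * 1_000_000    # $ per 1M input tokens
output_cost = rate_per_min / OUTPUT_TPM * 1_000_000  # $ per 1M output tokens

print(f"${input_cost:,.0f} / 1M input tokens")   # → $2,083 / 1M input tokens
print(f"${output_cost:,.0f} / 1M output tokens") # → $8,333 / 1M output tokens
```

Against GPT-4-class API pricing (tens of dollars per million tokens), that puts human tokens around two orders of magnitude more expensive.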