← Back
Business / Markets

Speed as a Vector of Intelligence: GPT-5.3 Spark, GPT-5.4, and Cursor Composer 1.5

Three releases in five weeks have turned raw inference speed into the central front of the AI coding market

Speed as a Vector of Intelligence: GPT-5.3 Spark, GPT-5.4, and Cursor Composer 1.5

By Negotiate the Future

3/11/26

Three product releases in five weeks have reframed a basic engineering metric, tokens per second, as the defining competitive axis in AI-assisted software development.

Cursor shipped Composer 1.5 on February 10, a proprietary model built on 20x-scaled reinforcement learning. Two days later, OpenAI released GPT-5.3-Codex-Spark in partnership with Cerebras, and on March 5 it followed with GPT-5.4, its most token-efficient reasoning model to date.

Each release made a distinct bet on what speed means and how to achieve it.

Codex-Spark is the most literal interpretation: raw throughput on specialized hardware. The model generates over 1,000 tokens per second when served on Cerebras' Wafer Scale Engine 3, a chip containing 4 trillion transistors and 900,000 AI-optimized cores. That rate is roughly 15 times faster than the flagship GPT-5.3-Codex.

OpenAI reduced time-to-first-token by 50 percent and per-roundtrip overhead by 80 percent, gains achieved through persistent WebSocket connections and a rewritten inference stack. The speed comes at a cost. On SWE-Bench Pro, Spark scores 56 percent compared to 56.8 percent for the full Codex model, and the gap widens on Terminal-Bench 2.0, where Spark drops to 58.4 percent against the flagship's 77.3 percent.

OpenAI released Spark as a research preview for ChatGPT Pro subscribers, positioning it as a tool for interactive development rather than autonomous engineering.

GPT-5.4, released three weeks later, defines speed differently. Rather than increasing raw token velocity, the model reduces the number of tokens required to reach a correct answer. OpenAI reported that GPT-5.4 is 33 percent less likely to make errors in individual claims compared to GPT-5.2, while using significantly fewer reasoning tokens overall.

Its fast mode in Codex delivers up to 1.5 times faster token velocity. The model also introduced native computer-use capabilities and a one-million-token context window.

Cursor Composer 1.5 represents a third approach entirely. Released by a company valued at $29.3 billion since its November 2025 Series D, the model was built by scaling reinforcement learning 20 times beyond its predecessor, with post-training compute exceeding the amount used to pretrain the base model. Composer 1.5 introduces adaptive thinking that calibrates reasoning depth to task complexity and a trained self-summarization mechanism that maintains accuracy as context overflows. The model is priced at $3.50 per million input tokens and $17.50 per million output tokens.

The convergence on speed reflects a market that has matured past the question of whether AI can write code. Cursor crossed $2 billion in annualized revenue in February 2026. OpenAI's partnership with Cerebras, valued at over $10 billion, commits 750 megawatts of wafer-scale systems to inference serving alone.

These are infrastructure-scale commitments predicated on a specific theory: that the constraint on AI coding adoption is no longer capability but latency. Codex-Spark trades accuracy for interactive speed, GPT-5.4 preserves accuracy by compressing reasoning, and Composer 1.5 uses reinforcement learning to make an agent faster at choosing what to do next.

Each model answers the same question with a different definition of what "fast" means.

More from NtF

Continue reading

Stay Informed

Stay Informed