What is Cursor's Composer model?

Cursor trained Composer 2.5 on Qwen K2.5 using new RL techniques, competing with GPT 5.5 and Opus.

The Cursor Story: From Code Editor to AI Lab

When Cursor launched, it was a smart wrapper around GPT-4—a code editor that made AI support seamless within the development workflow. Today, the company behind it, Anysphere, is training its own advanced models and competing directly with OpenAI and Anthropic in benchmark performance. That's a significant turnaround in a very short time.

The centerpiece of this transformation is Cursor's Composer model, specifically the version trained on Qwen 2.5 Coder using new reinforcement learning techniques. Understanding what Composer is, how it works, and what it implies for the broader AI landscape matters not only to developers but to anyone following the direction of specialized AI applications.

This article examines what Cursor's Composer model is, how Anysphere built it, which models it competes with, and what the company's broader ambitions mean for AI-assisted programming.

What is Cursor's Composer feature?


Before discussing the model, it helps to clarify the product context. Cursor is a code editor, essentially a fork of VS Code, built by Anysphere. Its most powerful feature is Composer, an agentic programming assistant that can edit multiple files at once.

Unlike simple autocompletion features or chat assistants that only edit a single file, Composer works like a junior programmer you can assign tasks to:

You describe a feature, bug fix, or refactoring using natural language.
Composer reads the relevant codebase context.
It proposes and applies changes across multiple files.
It iterates based on your feedback.

This is very different from early AI programming tools that only suggested inline code. Composer infers the project structure, understands dependencies, and executes multi-step plans, which is why it's classified as 'agentic'.
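
The read-edit-check loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Cursor's implementation: FileEdit, Workspace, run_agent, and the propose/check callbacks are hypothetical names, and the toy stubs stand in for the model and for a real test suite.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class FileEdit:
    """One proposed change: the full new contents of a single file (hypothetical format)."""
    path: str
    new_text: str

@dataclass
class Workspace:
    """In-memory stand-in for a repository: maps file paths to their contents."""
    files: Dict[str, str] = field(default_factory=dict)

    def apply(self, edits: List[FileEdit]) -> None:
        # Write every edit from one step, possibly touching several files.
        for edit in edits:
            self.files[edit.path] = edit.new_text

# propose(task, files, feedback) -> edits; check(files) -> "" on success, else feedback text.
Proposer = Callable[[str, Dict[str, str], str], List[FileEdit]]
Checker = Callable[[Dict[str, str]], str]

def run_agent(task: str, ws: Workspace, propose: Proposer, check: Checker,
              max_steps: int = 5) -> bool:
    """Read context, apply multi-file edits, check the result, and iterate on feedback."""
    feedback = ""
    for _ in range(max_steps):
        edits = propose(task, ws.files, feedback)  # 1. read context and plan edits
        ws.apply(edits)                            # 2. apply them across files
        feedback = check(ws.files)                 # 3. verify the result
        if not feedback:
            return True                            # goal reached
    return False                                   # gave up after max_steps

# Toy stubs: the "model" renames a symbol everywhere; the "tests" look for leftovers.
def toy_propose(task: str, files: Dict[str, str], feedback: str) -> List[FileEdit]:
    return [FileEdit(p, t.replace("greet", "hello")) for p, t in files.items() if "greet" in t]

def toy_check(files: Dict[str, str]) -> str:
    stale = [p for p, t in files.items() if "greet" in t]
    return f"still references greet: {stale}" if stale else ""

ws = Workspace({"a.py": "def greet():\n    pass\n", "b.py": "from a import greet\n"})
print(run_agent("rename greet to hello", ws, toy_propose, toy_check))  # True
```

The feedback string is what lets a later step react to what went wrong in an earlier one; a real agent would feed compiler errors or failing test output back into the model at that point.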

The difference between Composer and Cursor Tab

Many users confuse Cursor Tab (the auto-completion feature) with Composer. They are two different tools:

Cursor Tab handles completion in real time, predicting what comes next from the surrounding context as you type. It's fast, narrowly scoped, and responsive.
Composer is agentic, works across multiple files, and is conversational. You provide the goal; it figures out how to achieve it across your entire codebase.

Composer is where Anysphere has invested heavily in training its own proprietary models instead of relying entirely on API calls to third-party providers.

How Anysphere trains Composer models

Anysphere's decision to train its own models instead of relying solely on OpenAI or Anthropic API calls marks the company's move into frontier model development. Here's what's known about the engineering approach.

The Qwen 2.5 Coder base model

The Composer model is built on Qwen 2.5 Coder, an open-weight model released by Alibaba's Qwen team. Qwen 2.5 Coder stood out at release for its strong results on programming benchmarks: in some evaluations, it matched or significantly exceeded larger closed-source models on tasks such as HumanEval and SWE-bench.

Using an open-weight model as a foundation is a deliberate strategic choice. It allows Anysphere to:

Fine-tune the model on proprietary data without paying per-token API fees.
Control its own inference infrastructure.
Modify the model's behavior at a fundamental level instead of only steering it through prompts.

This approach, sometimes called continued pre-training or domain-specific fine-tuning, is increasingly popular among companies that want model-level control without the cost of training a model from scratch.

Reinforcement learning for code agents

The more interesting part of training Composer is the application of reinforcement learning techniques specifically designed for agentic programming tasks.

Standard supervised fine-tuning teaches the model to imitate good outputs. Reinforcement learning (RL) teaches the model to maximize reward signals; in a coding context, these could include:

Does the generated code pass the testing tool?
Does the modified code still compile and run correctly?
Does the agent complete the task described in the prompt without breaking existing functionality?
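
To make these signals concrete, here is a minimal sketch of an execution-based reward in Python. It is an illustration under stated assumptions, not Anysphere's reward: execution_reward and _passes are hypothetical names, the candidate is a single Python source string, and tests are plain assert statements run in a fresh interpreter. It runs the candidate code directly, so a real pipeline would put it in a sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def _passes(source: str, tests: str, timeout: float) -> bool:
    """Run the candidate source followed by assert-style tests in a fresh interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "check.py"
        script.write_text(source + "\n\n" + tests + "\n")
        try:
            proc = subprocess.run([sys.executable, str(script)],
                                  capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False  # a hang counts as a failure
        return proc.returncode == 0  # a failing assert exits with a non-zero code

def execution_reward(source: str, task_tests: str, regression_tests: str,
                     timeout: float = 10.0) -> float:
    """Hypothetical binary outcome reward for one candidate edit."""
    # Does the modified code still compile? (syntax check only)
    try:
        compile(source, "<candidate>", "exec")
    except SyntaxError:
        return 0.0
    # Does it break existing functionality?
    if not _passes(source, regression_tests, timeout):
        return 0.0
    # Does it complete the task described in the prompt?
    return 1.0 if _passes(source, task_tests, timeout) else 0.0

fixed = "def add(a, b):\n    return a + b\n"
buggy = "def add(a, b):\n    return a - b\n"
print(execution_reward(fixed, "assert add(2, 3) == 5", "assert add(0, 0) == 0"))  # 1.0
print(execution_reward(buggy, "assert add(2, 3) == 5", "assert add(0, 0) == 0"))  # 0.0
```

A binary reward like this is easy to trust but sparse; partial credit (for example, the fraction of tests passed) is a common alternative when most early attempts fail.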

This is a harder problem than training on static examples, but it tends to produce models that are better at multi-step reasoning and error recovery. The model learns not only to write code that looks right, but to write code that actually works.
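
The contrast can be written as two standard textbook objectives. This is a generic formulation, not Anysphere's published training loss: π_θ is the model with parameters θ, x a prompt, y an output, D a dataset (prompt and reference-output pairs for SFT, prompts alone for RL), R a reward such as the execution checks listed above, and b(x) an optional baseline that reduces variance.

```latex
% Supervised fine-tuning: imitate reference outputs y for prompts x
\mathcal{L}_{\mathrm{SFT}}(\theta) = -\,\mathbb{E}_{(x,y)\sim\mathcal{D}} \sum_{t=1}^{|y|} \log \pi_\theta\left(y_t \mid x, y_{<t}\right)

% Reinforcement learning: maximize the expected reward of the model's own outputs
J(\theta) = \mathbb{E}_{x\sim\mathcal{D},\; y\sim\pi_\theta(\cdot \mid x)}\left[ R(x, y) \right]

% Policy-gradient (REINFORCE) estimate with a baseline b(x)
\nabla_\theta J(\theta) = \mathbb{E}_{x\sim\mathcal{D},\; y\sim\pi_\theta(\cdot \mid x)}\left[ \left(R(x,y) - b(x)\right) \nabla_\theta \log \pi_\theta(y \mid x) \right]
```

The SFT loss only ever sees reference outputs, while the RL gradient is computed on outputs the model generates itself and weights them by how well they scored. That is one reason RL can teach a model to recover from its own mistakes.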

Anysphere's approach appears to draw on the same research tradition as DeepMind's AlphaCode and the work that companies like Cognition (the maker of Devin) have been doing on autonomous programming agents.

What 'new RL techniques' likely means

The company has deliberately kept specific information about its RL implementation under wraps, which is understandable from a competitive standpoint. But based on public signals and what's known from the broader research community, these techniques could include some combination of:

Process reward models (PRMs): models that score intermediate steps, not just the final output.
Execution feedback: actual code execution results (pass/fail, runtime errors, test coverage) used as reward signals.
Trajectory-level optimization: training the model to optimize across the entire sequence of edits and tool calls, not just individual completions.
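
One generic way these ideas fit together is to score each step with a process reward model, add an execution-based outcome reward at the end, and credit every step with everything that follows it. The Python sketch below assumes exactly that setup; it is not a description of Cursor's method, and trajectory_returns, step_weight, and the example numbers are illustrative.

```python
from typing import List

def trajectory_returns(step_scores: List[float], outcome: float,
                       gamma: float = 1.0, step_weight: float = 0.1) -> List[float]:
    """Return-to-go for each step of one agent trajectory.

    step_scores: per-step scores from a (hypothetical) process reward model,
                 one per edit or tool call.
    outcome:     execution-based reward for the final state, e.g. 1.0 if tests pass.
    gamma:       discount factor; 1.0 means late rewards count as much as early ones.
    step_weight: how much process scores count relative to the outcome (illustrative).
    """
    if not step_scores:
        raise ValueError("trajectory must contain at least one step")
    rewards = [step_weight * s for s in step_scores]
    rewards[-1] += outcome  # the outcome reward only arrives after the last step
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so every step is credited with everything that follows it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

print(trajectory_returns([1.0, 0.0, 1.0], outcome=1.0, step_weight=0.5))  # [2.0, 1.5, 1.5]
```

Because the outcome reward flows back to the first step, an early edit that sets up a passing final state is rewarded even if it looked unremarkable on its own; that is the sense in which optimization happens over the whole trajectory rather than per completion.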

This type of training is computationally expensive and requires significant infrastructure. This is also where the lines between an 'AI-powered product company' and an 'AI research lab' begin to blur.

Benchmark performance and real-world competition

So how well does the Composer model actually perform? The honest answer is that it depends on whom you ask and which criteria you consider.

Composer's position relative to mainstream models

Anysphere has published benchmark results showing that the Composer model competes well with:

GPT-4.5 on some code generation and multi-file editing tasks
Claude Opus on software engineering benchmarks such as SWE-bench

SWE-bench is particularly relevant here because it measures a model's ability to resolve real GitHub issues in real repositories, rather than to write isolated code snippets. That aligns much more closely with what Composer does in practice.
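
SWE-bench scores each task instance with two lists of tests taken from the original pull request: FAIL_TO_PASS tests, which reproduce the issue, and PASS_TO_PASS tests, which already passed and guard against regressions. An instance counts as resolved only if both sets pass after the model's patch is applied. The Python sketch below reproduces just that scoring rule over test outcomes that have already been collected; building the environment, applying the patch, and running the tests are left out, and InstanceResult and resolved_rate are hypothetical names, not part of the official harness.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class InstanceResult:
    """Test outcomes for one SWE-bench task after the model's patch is applied."""
    fail_to_pass: Dict[str, bool]  # tests that failed before the fix and must pass after
    pass_to_pass: Dict[str, bool]  # tests that already passed and must keep passing

    def resolved(self) -> bool:
        # The issue must be fixed AND nothing that used to work may break.
        return all(self.fail_to_pass.values()) and all(self.pass_to_pass.values())

def resolved_rate(results: List[InstanceResult]) -> float:
    """Fraction of task instances resolved (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(r.resolved() for r in results) / len(results)

results = [
    InstanceResult({"test_issue": True}, {"test_existing": True}),   # resolved
    InstanceResult({"test_issue": True}, {"test_existing": False}),  # fixed the bug, broke something else
    InstanceResult({"test_issue": False}, {"test_existing": True}),  # bug not fixed
    InstanceResult({"test_issue": True}, {}),                        # resolved (no regression tests)
]
print(resolved_rate(results))  # 0.5
```

The second instance shows why this metric suits an agent like Composer: a patch that fixes the reported bug but breaks existing behavior earns no credit.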

It's important to note that benchmark scores and developers' real-world experience don't always correlate. Cursor's strength lies in how Composer integrates with the editor: the context it can access, the interface for reviewing diffs, and the iterative loop. A model that scores slightly lower on benchmarks may still feel better in practice if the surrounding product experience is strong.

Advantages of specialization

General-purpose frontier models like GPT-4o and Claude Opus are trained to excel at everything: writing, reasoning, programming, analysis, and more. Composer, by contrast, is trained specifically for programming tasks, particularly agentic multi-file editing workflows.

This specialization offers real benefits. A model with a fraction of GPT-4o's parameters can match or outperform it on specific coding tasks if it is trained extensively on the right data with the right reward signals. The same dynamic has helped models like DeepSeek Coder and Qwen Coder compete despite being much smaller than OpenAI's flagship models.
