OpenAI Introduces ChatGPT Agent That Can 'Use Virtual Computers' to Complete Complex Tasks

OpenAI has so far offered two distinct agents: Operator, which can browse the web and perform tasks independently, and Deep Research, which specializes in synthesizing large amounts of online information. Today, the company announced ChatGPT agent, a new AI that combines Operator's web browsing capabilities, Deep Research's research strengths, and ChatGPT's conversational skills into a single agent.

ChatGPT agent can now perform tasks using its own virtual computer. Based on user queries, it can navigate websites, filter results, prompt users to log in when required, run code, perform analyses, create spreadsheets and PowerPoint presentations, and more.

ChatGPT agent will have access to the following tools to complete tasks assigned by the user:

A visual web browser that interacts with the web through a GUI
A text-based browser for simple web queries based on reasoning
A terminal
Direct API access
Access to ChatGPT connectors

Because ChatGPT agent does all of its work on its own virtual machine, context carries over from one tool to the next as it works through a task. For example, the agent can open a website in a browser, download a file from it, manipulate that same file by running a command in the terminal, and then view the output back in the visual browser.

OpenAI claims that ChatGPT agent achieves state-of-the-art (SOTA) performance on various benchmarks that measure web browsing and real-world task completion. Here are some highlights:

Humanity's Last Exam: ChatGPT agent achieved a new SOTA pass@1 score of 41.6. When running up to 8 trials at once and choosing the trial with the highest self-reported confidence, the score increased to 44.4.
FrontierMath: ChatGPT agent achieves 27.4% accuracy.
OpenAI's internal benchmark (evaluating model performance on complex, economically valuable knowledge work tasks): ChatGPT agent output is equal to or better than humans in about half of the cases.
DSBench: ChatGPT agent outperforms humans by a significant margin on data science tasks.
SpreadsheetBench: ChatGPT agent scored 45.5%, compared to 20.0% for Copilot in Excel.
BrowseComp: ChatGPT agent sets a new SOTA at 68.9%.
WebArena: ChatGPT agent reached 65.4%.
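
The second Humanity's Last Exam figure comes from running several trials and keeping the one with the highest self-reported confidence. The short Python sketch below illustrates that selection step only. The Trial class, the pick_most_confident function, the 0-to-1 confidence scale, and the sample values are all hypothetical and made up for illustration; they are not OpenAI's implementation or data.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One independent attempt at a question (hypothetical structure)."""
    answer: str
    confidence: float  # self-reported confidence; a 0.0-1.0 scale is assumed


def pick_most_confident(trials):
    """Return the trial whose self-reported confidence is highest."""
    if not trials:
        raise ValueError("need at least one trial")
    return max(trials, key=lambda t: t.confidence)


# Made-up example: three trials answering the same question.
trials = [Trial("A", 0.62), Trial("B", 0.91), Trial("A", 0.55)]
best = pick_most_confident(trials)
print(best.answer)
```

Note that this rule differs from majority voting: in the made-up data above, answer "A" appears twice, but the single most confident trial is the one that gets selected.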

ChatGPT agent is now available in the ChatGPT tools menu through a new 'agent mode'. While the agent is performing a task, users can follow an on-screen narration of what it is doing; they can also interrupt and take control of the browser whenever needed.

ChatGPT agent will be available to all ChatGPT Pro users soon. ChatGPT Plus and Team users will get access in the coming days, while Enterprise and Education users will have to wait a few more weeks. ChatGPT Pro users will be limited to 400 messages per month with the agent, while other paid plans are limited to 40 messages per month. Users who need more can purchase additional agent usage through flexible credit-based options.
