How to Set Up and Run Qwen3 Locally with Ollama

Qwen3 is Alibaba's latest generation of large language models. With support for over 100 languages and strong performance on reasoning, coding, and translation tasks, Qwen3 is comparable to many leading models today, including DeepSeek-R1, o3-mini, and Gemini 2.5.

This guide explains, step by step, how to run Qwen3 locally using Ollama. It also shows how to build a lightweight local application with Qwen3 that lets you switch between its reasoning modes and translate between languages.

Why run Qwen3 locally?

Running Qwen3 locally offers several key benefits:

  1. Privacy: Data never leaves your machine.
  2. Latency: Faster local inference, with no round trips to a remote API.
  3. Cost savings: No token fees or cloud bills.
  4. Control: You can tune prompts, choose models, and configure thinking modes.
  5. Offline access: You can work without an Internet connection once the model is downloaded.

Qwen3 is optimized for both deep reasoning (thinking mode) and quick responses (non-thinking mode), and it supports over 100 languages.

Set up Qwen3 locally using Ollama

Ollama is a tool that allows you to run language models like Llama or Qwen locally on your computer using a simple command line interface.

Step 1: Install Ollama

Download Ollama for macOS, Windows or Linux from: https://ollama.com/download.

Follow the installation instructions, and once installed, verify the install by running this command in a terminal:

ollama --version
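
You can also confirm the CLI is working by listing the models you have downloaded so far (the list will be empty until you pull a model):

ollama list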

Step 2: Download and run Qwen3

Ollama offers multiple Qwen3 models designed to fit a wide range of hardware configurations, from lightweight laptops to high-end servers.

ollama run qwen3

Running the above command launches the default Qwen3 model in Ollama, currently qwen3:8b. If you are working with limited resources or want faster startup times, you can explicitly run a smaller variant such as the 4B model:

ollama run qwen3:4b

Qwen3 is available in several sizes, from the smallest 0.6B-parameter model (523 MB) up to the largest 235B-parameter model (142 GB). Even the smaller variants offer impressive performance on reasoning, translation, and code generation, especially when used in thinking mode.

The MoE models (30b-a3b, 235b-a22b) are particularly interesting because they only activate a subset of experts for each inference step, allowing for large parameter counts while keeping runtime costs efficient.

In general, use the largest model your hardware can handle, and fall back to the 8B or 4B models for responsive local testing on consumer machines.
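
If you'd rather download a variant ahead of time without starting an interactive session, you can pull it first and then confirm it appears in your local model list:

ollama pull qwen3:4b
ollama list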

Here's a quick summary of all the Qwen3 models you can run:

| Model | Ollama Command | Best suited for |
| --- | --- | --- |
| Qwen3-0.6B | ollama run qwen3:0.6b | Lightweight tasks, mobile applications, and edge devices |
| Qwen3-1.7B | ollama run qwen3:1.7b | Chatbots, assistants, and low-latency applications |
| Qwen3-4B | ollama run qwen3:4b | General-purpose tasks with balanced performance and resource usage |
| Qwen3-8B | ollama run qwen3:8b | Multilingual support and moderate reasoning ability |
| Qwen3-14B | ollama run qwen3:14b | Advanced reasoning, content creation, and complex problem solving |
| Qwen3-32B | ollama run qwen3:32b | High-level tasks that require strong reasoning and extensive context processing |
| Qwen3-30B-A3B (MoE) | ollama run qwen3:30b-a3b | Efficient performance with about 3B active parameters, well suited to coding tasks |
| Qwen3-235B-A22B (MoE) | ollama run qwen3:235b-a22b | Large-scale applications, deep reasoning, and enterprise-grade solutions |

Step 3: Run Qwen3 in the background (optional)

To serve the model via the API, run this command in Terminal:

ollama serve

This will make the model available for integration with other applications at http://localhost:11434.
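
A quick way to confirm the server is reachable is to ask it for the list of locally available models:

curl http://localhost:11434/api/tags

This should return a JSON object describing the models you have pulled so far.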

Use Qwen3 locally

This section will walk you through a number of ways you can use Qwen3 locally, from basic CLI interaction to integrating models with Python.

Option 1: Run Inference via CLI

Once the model is downloaded, you can interact with Qwen3 directly from the terminal. For example, run the following command:

echo "What is the capital of Brazil? /think" | ollama run qwen3:8b

This is useful for quick or lightly interactive tests without writing any code. The /think tag at the end of the prompt instructs the model to engage in deeper, step-by-step reasoning. You can replace this with /no_think for a faster, shallower response, or skip it altogether to use the model's default inference mode.
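
For comparison, here is the same prompt in non-thinking mode; you should get a shorter answer with little or no visible reasoning:

echo "What is the capital of Brazil? /no_think" | ollama run qwen3:8b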

Option 2: Access Qwen3 via API

While ollama serve is running in the background, you can interact with Qwen3 programmatically using the HTTP API, perfect for backend integration, automation, or REST client testing.

curl http://localhost:11434/api/chat -d '{
  "model": "qwen3:8b",
  "messages": [{ "role": "user", "content": "Define entropy in physics. /think" }],
  "stream": false
}'

Here's how it works:

  1. curl sends a POST request to the local Ollama server running at localhost:11434.
  2. The payload is a JSON object containing:
    1. "model": Specifies which model to use (here, qwen3:8b).
    2. "messages": The list of chat messages, each with a role and content.
    3. "stream": false: Returns the response all at once rather than token by token (see the streaming variant below).

Option 3: Access Qwen3 via Python

If you're working in a Python environment (like Jupyter, VSCode, or a plain script), the easiest way to interact with Qwen3 is through the Ollama Python SDK. Start by installing the ollama package:

pip install ollama

Then query your Qwen3 model with a script like this (the example below uses qwen3:8b):

import ollama

response = ollama.chat(
    model="qwen3:8b",
    messages=[
        {"role": "user", "content": "Summarize the theory of evolution. /think"}
    ]
)

print(response["message"]["content"])

In the code above:

  1. ollama.chat(...) sends a chat-style request to the local Ollama server (it also supports streaming; see the sketch after this list).
  2. You specify the model (qwen3:8b) and a list of messages in a format similar to OpenAI's API.
  3. The /think tag asks the model for a step-by-step explanation.
  4. The response is returned as a dictionary, and you can access the model's answer via ["message"]["content"].
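
If you want the reply to appear incrementally rather than all at once, the same call accepts stream=True and returns an iterator of chunks; here's a minimal sketch:

import ollama

# Stream the answer piece by piece instead of waiting for the full response
stream = ollama.chat(
    model="qwen3:8b",
    messages=[{"role": "user", "content": "Summarize the theory of evolution. /no_think"}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a partial piece of the assistant's message
    print(chunk["message"]["content"], end="", flush=True)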

This approach is ideal for local testing, prototyping, or building LLM-powered applications without relying on cloud APIs.
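
To tie things together, here is a minimal sketch of the kind of lightweight local app described at the start of this guide: a small helper that toggles Qwen3's thinking mode and can optionally turn a prompt into a translation request. The ask() function, its parameters, and the example prompts are illustrative choices, not part of Ollama's API:

import ollama

# Illustrative helper (not part of Ollama's API): toggles thinking mode
# and optionally rewrites the prompt as a translation request.
def ask(prompt, think=True, target_language=None):
    mode = "/think" if think else "/no_think"
    if target_language:
        prompt = f"Translate the following into {target_language}: {prompt}"
    response = ollama.chat(
        model="qwen3:8b",
        messages=[{"role": "user", "content": f"{prompt} {mode}"}],
    )
    return response["message"]["content"]

# Deep reasoning in thinking mode
print(ask("Explain why the sky is blue.", think=True))

# Quick translation without extended reasoning
print(ask("Good morning, how are you?", think=False, target_language="Spanish"))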

Qwen3 brings advanced reasoning, fast decoding, and multilingual support to your local machine with Ollama.

With minimal setup you can:

  1. Run LLM inference locally without relying on the cloud
  2. Switch between quick responses and deeper, step-by-step reasoning
  3. Use the HTTP API or the Python SDK to build smart apps