How to run an LLM on an Android phone using MNN Chat

MNN Chat is an open-source project developed by Alibaba. Its inference engine is built specifically to run LLMs efficiently on mobile hardware, without requiring a high-end GPU.

Many people have been tinkering with large language models (LLMs) locally on their computers for a while now. For the author, it started as a hobby running DeepSeek-R1 locally on a Mac, and it has since become a significant part of his workflow.

The author tested most of the popular local AI inference apps on Android, and performance was consistently the biggest weakness. A phone imposes severe hardware limitations, which makes the software extremely important. That's where MNN Chat excels.

Download: MNN Chat on Google Play Store | MNN Chat on GitHub

MNN Chat is the best local LLM application you can try.

The first interesting thing about MNN Chat is that it's an open-source project developed by Alibaba. Its inference engine is built specifically to run LLMs efficiently on mobile hardware, without needing a high-end GPU. The app is available on the Play Store, and you can also view the source code on its GitHub page.

It delivered the best performance of any app tested for running local models on Android. But before you begin, you need to know a few things. First of all, you need a relatively powerful phone. The author of this article ran all his models on a Samsung Galaxy S24 Ultra with 12GB of RAM, which is high-end by phone standards.

However, if you're on a budget, you should have at least 8GB of free RAM for a good experience with smaller models. MNN Chat also comes with many other useful features. If you're unsure which model performs best on your phone, there's a built-in performance test to help you decide.

You also don't need to search the internet for working models. MNN Chat includes an in-app library so you can download models directly without leaving the app.

You get a whole library of models, ready to use.

Setting up MNN Chat is quite easy. All you need to do is open the app and go to the Models Market. Here, you'll see a complete list of available models that you can download from Hugging Face. If you're unfamiliar with Hugging Face, it's one of the largest repositories of open-source AI models.

Tap download next to the model you want, and it will be ready to use as soon as the download is complete. The harder part is deciding which model to choose.

These models range in size from a few hundred megabytes to several gigabytes. Make sure you have enough free storage space, especially if you plan to download larger models or keep several models installed at once.

In the list, you'll see familiar names like Qwen, DeepSeek, or Llama. One thing you'll quickly notice is that each model name includes a number followed by the letter B, for example, gemma-7b.

The letter B stands for billions of parameters. Simply put, the higher the number, the more capable the model is, but the more memory it takes up and the slower it runs on the phone. For most mid-range or high-end smartphones, the author recommends models with up to 4 billion parameters, but that really depends on your phone. In his experience, Qwen models are generally the best and even support more modes.
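To make the link between the number in a model name and its memory footprint concrete, here is a minimal sketch that reads the parameter count from a name such as gemma-7b and estimates the size of the weights. The 4-bit default is an assumption for illustration (the article does not say how the in-app models are quantized), the function names are hypothetical, and the real footprint at runtime is larger because of the context cache and the app itself.

```python
import re


def parse_billions(model_name: str) -> float:
    """Extract the parameter count, in billions, from a name like 'gemma-7b'."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*b\b", model_name.lower())
    if match is None:
        raise ValueError(f"no parameter count found in {model_name!r}")
    return float(match.group(1))


def estimate_weight_gb(model_name: str, bits_per_weight: int = 4) -> float:
    """Rough size of the weights alone, in gigabytes (10^9 bytes).

    bits_per_weight=4 is an assumed quantization level, not something
    the app reports; pass 8 or 16 to see how the estimate grows.
    """
    params = parse_billions(model_name) * 1e9
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name in ("gemma-7b", "Qwen2.5-1.5B-Instruct"):
        print(name, round(estimate_weight_gb(name), 2), "GB at 4-bit")
```

Under these assumptions, a 7B model needs roughly 3.5 GB for weights alone at 4 bits per weight, which helps explain why models around 4B are a more comfortable fit for most phones.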

After downloading, go to My Models and start chatting with the model. You can also modify the system prompt by tapping the three-line menu icon in the upper right corner and navigating to Settings > System Prompt.

You can also change the maximum number of new tokens here, which caps how long the model's response can get before it stops generating text.
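The effect of a maximum-new-tokens setting can be sketched with a toy generation loop. This is not MNN Chat's code: the `next_token` callable, the `EOS` marker, and the stand-in model are hypothetical, and the loop only illustrates that generation stops at whichever comes first, the model's own end marker or the token cap.

```python
from typing import Callable, List

EOS = "<eos>"  # hypothetical end-of-sequence marker


def generate(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    max_new_tokens: int,
) -> List[str]:
    """Produce at most max_new_tokens tokens, stopping early on EOS."""
    output: List[str] = []
    for _ in range(max_new_tokens):
        token = next_token(prompt + output)
        if token == EOS:
            break  # the model decided it was finished
        output.append(token)
    return output


def counting_model(context: List[str]) -> str:
    """Stand-in for a real model: emits the current context length."""
    return str(len(context))


if __name__ == "__main__":
    # The stand-in never emits EOS, so the cap alone ends the response.
    print(generate(counting_model, ["hi"], max_new_tokens=3))
```

A higher cap allows longer answers but also longer waits, since a phone generates each token relatively slowly.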

It's not just about large language models (LLMs).

In the Models Market, you may notice several categories dedicated to generating images, audio, video, and more. They are exactly what the names suggest: besides models that produce text, you can download and run multimedia models that work with images.

One really interesting thing you can do with this is combine different model types to get something similar to ChatGPT's voice mode. When running an LLM, you might notice a phone icon in the upper right corner.

From here, you need to download a text-to-speech model of your choice. You also need an automatic speech recognition (ASR) model to convert your speech into text. Once both are set up, you can start talking to your local LLM.
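Conceptually, a voice mode like this chains three models: ASR turns speech into text, the LLM answers, and text-to-speech reads the answer aloud. The sketch below shows only that data flow. The `VoicePipeline` class and the stub functions are hypothetical stand-ins, not MNN Chat APIs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    transcribe: Callable[[bytes], str]  # ASR: audio in, text out
    reply: Callable[[str], str]         # LLM: prompt in, answer out
    speak: Callable[[str], bytes]       # TTS: text in, audio out

    def run(self, audio_in: bytes) -> bytes:
        """Run one spoken turn through all three models in order."""
        text_in = self.transcribe(audio_in)
        text_out = self.reply(text_in)
        return self.speak(text_out)


if __name__ == "__main__":
    # Stubs so the flow can be run without any real models:
    # "audio" is just UTF-8 bytes and the "LLM" upper-cases its input.
    pipeline = VoicePipeline(
        transcribe=lambda audio: audio.decode("utf-8"),
        reply=lambda text: text.upper(),
        speak=lambda text: text.encode("utf-8"),
    )
    print(pipeline.run(b"hello"))
```

Because each stage is a separate model, each one adds its own download size and its own delay to every turn.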

However, keep in mind that all of these models quickly take up a lot of space, as mentioned earlier. If you want to use a model that isn't available on Hugging Face, you can import it yourself via ADB.
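For the ADB route, the general idea is to copy the model folder from your computer to the phone with `adb push`. The sketch below only builds the commands so you can inspect them before running anything. The device directory is an assumption, not something this article specifies, so check the MNN Chat GitHub page for the folder the app actually reads from. The function names are hypothetical.

```python
import shlex
from typing import List

# Assumption: a staging folder on the phone. Confirm the real path in the
# MNN Chat documentation before using it.
DEVICE_MODEL_DIR = "/data/local/tmp/mnn_models"


def adb_mkdir_command(device_dir: str = DEVICE_MODEL_DIR) -> List[str]:
    """Command that creates the target folder on the phone."""
    return ["adb", "shell", "mkdir", "-p", device_dir]


def adb_push_command(
    local_model_dir: str, device_dir: str = DEVICE_MODEL_DIR
) -> List[str]:
    """Command that copies a local model folder into the target folder."""
    return ["adb", "push", local_model_dir, device_dir + "/"]


if __name__ == "__main__":
    # Print the commands instead of executing them.
    print(shlex.join(adb_mkdir_command()))
    print(shlex.join(adb_push_command("my-model")))
```

Running the printed commands requires USB debugging to be enabled on the phone and the `adb` tool to be installed on the computer.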

You need to adjust your expectations.

It should go without saying, but don't expect the same quality as ChatGPT or Gemini, especially for tasks like image generation. The main advantage here is that you can run these models locally without an internet connection, and your data stays on your device. There are many other open-source local LLM applications you can use to improve your experience.

Unfortunately, running truly large models on a device as small as a phone isn't possible. Even so, you can still do a lot with this technology, such as building your own version of Perplexity with local LLMs.
