How to run an LLM on an Android phone using MNN Chat

MNN Chat is an open-source project developed by Alibaba. Its inference engine is built specifically to run LLMs efficiently on mobile hardware, without requiring a high-end GPU.

Many people have been tinkering with large language models (LLMs) locally on their computers for a while now. For some, it started as a hobby, running DeepSeek-R1 locally on a Mac, and has since become a valuable part of their workflow.

Across most of the popular local AI inference apps tested on Android, performance was consistently the biggest weakness. Severe hardware limitations are unavoidable because, after all, it's just a phone. That makes the software extremely important, and that's where MNN Chat excels.

Download: MNN Chat on Google Play Store | MNN Chat on GitHub

MNN Chat is the best local LLM application you can try.


The first interesting thing about MNN Chat is that it's an open-source project developed by Alibaba. Its inference engine is built specifically to run LLMs efficiently on mobile hardware, without needing a high-end GPU. Although the app is available on the Play Store, you can still view the source code on its GitHub page.

It delivered the best performance of any app tested for running local models on Android. But before you begin, you need to know a few things. First of all, you need a relatively powerful phone. The author of this article ran all models on a Samsung Galaxy S24 Ultra with 12GB of RAM, which is high-end by phone standards.

However, if you're on a budget, aim for at least 8GB of RAM for a good experience with smaller models. MNN Chat also comes with many other useful features. If you're unsure which model performs best on your phone, there's a built-in performance test to help you decide.

You also don't need to search the internet for working models. MNN Chat includes an in-app library so you can download models directly without leaving the app.

You get a whole library of models, ready to use.


Setting up MNN Chat is quite easy. Open the app and go to the Models Market. Here, you'll see a complete list of available models, which are downloaded from Hugging Face. If you're unfamiliar with Hugging Face, it's one of the largest open-source AI model repositories.

Tap the download button next to the model you want, and it will be ready to use as soon as the download is complete. The harder part is deciding which model to choose.

These models range in size from a few hundred megabytes to several gigabytes. Make sure you have enough free storage, especially if you plan to download larger models or keep several models installed at once.
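
As a rough illustration of that storage check, the sketch below adds up a few model sizes and compares the total against the available space while keeping some headroom free. The function name, the 2 GiB reserve, and the example sizes are assumptions made for this example and do not come from MNN Chat.

```python
import shutil

GIB = 1024 ** 3  # bytes in one gibibyte


def fits_on_storage(model_sizes_bytes, free_bytes, reserve_bytes=2 * GIB):
    """Return True if all models fit while leaving `reserve_bytes` untouched.

    The 2 GiB default reserve is an arbitrary safety margin (an assumption).
    """
    return sum(model_sizes_bytes) + reserve_bytes <= free_bytes


if __name__ == "__main__":
    # Free space on whatever volume holds the current directory.
    free = shutil.disk_usage(".").free
    # Illustrative sizes only: one small model and one larger model.
    wanted = [int(0.6 * GIB), int(2.5 * GIB)]
    print("Enough room:", fits_on_storage(wanted, free))
```

Summing the sizes before downloading matters most when you intend to keep several models installed at the same time.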

In the list, you'll see a series of familiar names like Qwen, DeepSeek, or Llama. One thing you'll quickly notice is that each model name includes a number followed by the letter B, for example, gemma-7b.


The letter B stands for billions of parameters. Simply put, the higher the number, the more capable the model is, but it also takes up more memory and runs slower on the phone. For most mid-range or high-end smartphones, models with up to 4 billion parameters are the recommended choice, but that really depends on your phone. In practice, Qwen models are generally the best and even support more modes.
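
To make the link between parameter count and memory concrete, here is a back-of-the-envelope estimate of how much space the weights alone occupy. It assumes every weight is stored with the same number of bits (4-bit quantization is used as the default purely for illustration) and ignores runtime overhead such as the context cache, so treat the results as a lower bound rather than a measurement.

```python
def weight_memory_gb(params_billion, bits_per_weight=4):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    Assumes a uniform `bits_per_weight`; real model files mix precisions
    and add metadata, so actual sizes will differ somewhat.
    """
    bytes_per_weight = bits_per_weight / 8
    return params_billion * bytes_per_weight


if __name__ == "__main__":
    for size in (1.5, 4, 7):
        print(f"{size}B parameters at 4-bit: about {weight_memory_gb(size):.2f} GB")
```

Under these assumptions, a 4B model needs roughly 2 GB for its weights, while a 7B model needs about 3.5 GB, which is why smaller models are a more comfortable fit on a phone.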

After downloading, go to My Models and start chatting with the model. You can even modify the system prompt by tapping the three-line menu icon in the upper right corner and navigating to Settings > System Prompt.

You can also change the maximum number of new tokens here, which simply controls the length of the model's response before it stops generating text.
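
The effect of a maximum-new-tokens setting can be sketched with a toy generation loop. Nothing here is MNN Chat's actual code: `generate`, `make_toy_model`, and the `<eos>` marker are hypothetical names used only to show that generation stops either when the model signals it is done or when the cap is reached, whichever comes first.

```python
def generate(next_token, prompt_tokens, max_new_tokens, eos_token="<eos>"):
    """Call `next_token` repeatedly until EOS or the token cap is hit."""
    output = []
    for _ in range(max_new_tokens):
        token = next_token(prompt_tokens + output)
        if token == eos_token:
            break  # the model decided it was finished
        output.append(token)
    return output


def make_toy_model(prompt_len, reply, eos_token="<eos>"):
    """Hypothetical stand-in for an LLM: replays `reply`, then emits EOS."""
    def next_token(context):
        position = len(context) - prompt_len
        return reply[position] if position < len(reply) else eos_token
    return next_token


if __name__ == "__main__":
    prompt = ["Tell", "me", "a", "story"]
    reply = "Once upon a time there was a phone".split()
    model = make_toy_model(len(prompt), reply)
    print(generate(model, prompt, max_new_tokens=3))   # cut off by the cap
    print(generate(model, prompt, max_new_tokens=50))  # ends at EOS
```

A low cap truncates the answer mid-thought, while a high cap only matters when the model would otherwise keep going.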

It's not just about large language models (LLMs).


In the Models Market, you may notice several categories dedicated to images, audio, video, and more. They are exactly what the names suggest: besides models that generate text, you can download and run multimodal models that work with images and other media.

One really interesting thing you can do with this is combine different model types to get something similar to ChatGPT's voice mode. When running an LLM, you might notice a phone icon in the upper right corner.

From there, download a text-to-speech (TTS) model of your choice. You also need an automatic speech recognition (ASR) model to convert your speech into text. Once both are set up, you can start talking to your local LLM using your voice.
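
Conceptually, a voice mode like this chains three models together: speech recognition turns audio into text, the LLM produces a reply, and text-to-speech turns the reply back into audio. The sketch below shows only that data flow. The three `fake_*` functions are placeholders invented for this example and do not correspond to any real MNN Chat API.

```python
def voice_turn(audio_in, asr, llm, tts):
    """Run one spoken exchange: audio -> text -> reply text -> audio."""
    user_text = asr(audio_in)   # automatic speech recognition
    reply_text = llm(user_text)  # the local language model
    return tts(reply_text)       # text-to-speech


# Placeholder stages (assumptions): real models would replace each of these.
def fake_asr(audio_bytes):
    return audio_bytes.decode("utf-8")


def fake_llm(text):
    return f"You said: {text}"


def fake_tts(text):
    return text.encode("utf-8")


if __name__ == "__main__":
    print(voice_turn(b"hello phone", fake_asr, fake_llm, fake_tts))
```

Because each stage is a separate model, each one adds its own download size and its own latency to every spoken turn.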

However, keep in mind that all of these models quickly take up a lot of space, as mentioned earlier. If you want to use a model that isn't available on Hugging Face, you can import it yourself via ADB.
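
For readers who have not used ADB before, copying files to a phone is done with the standard `adb push <local> <remote>` command. The helper below only assembles that command. The article does not say which folder MNN Chat reads imported models from, so the destination is a required argument, and the path in the example is just a placeholder staging location, not a confirmed import directory.

```python
import shlex


def adb_push_command(local_model_dir, device_dir):
    """Build the `adb push` command that copies a model folder to the phone.

    `device_dir` must come from the app's own documentation; no default is
    provided because the correct location is not stated in this article.
    """
    return ["adb", "push", local_model_dir, device_dir]


if __name__ == "__main__":
    # Placeholder paths (assumptions), shown only to illustrate the shape.
    cmd = adb_push_command("./my-model", "/data/local/tmp/my-model")
    print(shlex.join(cmd))
    # To actually run it with a connected device:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Check the project's GitHub page for the exact import steps before pushing anything to the device.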

You need to adjust your expectations.

It should go without saying, but don't expect the same quality as ChatGPT or Gemini, especially for tasks like image creation. The main advantage here is that you can run these models locally without an internet connection, and your data remains on your device. There are many other open-source local LLM applications you can use to improve your experience.

Unfortunately, running truly large models on a device as small as a phone isn't feasible. Even so, you can still do a lot with this technology, such as building your own Perplexity-style assistant with local LLMs.
