3 local LLMs that can run on a phone.
Local LLMs have long been part of many people's desktop setups. Many people, however, have never taken on-device mobile assistants seriously. Even after trying Gemma 4 on a phone, they often go back to Claude, and that probably won't change anytime soon.
But having local AI on mobile has proven incredibly useful during downtime, at least when the internet provider and the cell tower both go down at the same time, which occasionally happens here. So it's worth taking mobile models more seriously and exploring options such as Llama, Gemma, and Qwen. All three were put through the same set of tests to determine which ones are truly worth keeping on your phone.
Gemma 4 E2B
The Gemma 4 E2B is Google's edge-first model, designed from the ground up to run on phones rather than scaled down from a larger model. It can run in around 2GB of memory and handles images natively. It was the first mobile model tried here and practically the only one, and because it performs so well, there was no reason to replace it for real-world use. So let's see how this baseline model holds up against newer competitors.
First, each model was given a few questions that the local LLM community uses as quick sanity checks: the strawberry test, the marble-in-the-cup test, and Sally's brothers. These questions were taken from the website of AI developer Mervin Praison.
According to Gemma, the word "strawberry" has two R's. This is a well-known failure for small models, related to how they break words into tokens, and even some cloud chatbots still struggle with it. The marble case is even stranger: Gemma walks through the logical steps, correctly notes that the cup is upside down, and then somehow still concludes that the marble is in the microwave along with the cup. The Sally's brothers question goes the same way, with reasoning that points toward the correct answer without ever reaching it. All three answers follow the same pattern: the steps are laid out logically, but the answer is missed. It isn't exactly stupid; it's more that the line of reasoning breaks off right at the end.
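For reference, the correct answer to the strawberry test is trivial to check outside a model. The minimal Python sketch below counts a letter by walking the word one character at a time, the same way a person would spell it out. The function name `count_letter` is hypothetical and chosen for illustration only.

```python
def count_letter(word: str, letter: str) -> int:
    """Count occurrences of `letter` in `word`, ignoring case."""
    target = letter.lower()
    count = 0
    # Walk the word one character at a time, like spelling it out.
    for ch in word.lower():
        if ch == target:
            count += 1
    return count


print(count_letter("Strawberry", "r"))  # prints 3
```

Python's built-in `"strawberry".count("r")` gives the same result; the explicit loop just mirrors the letter-by-letter approach.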
Next came a real task: write a weekend Python 101 course for complete beginners, with code examples and exercises in each section. The result was pretty good. Gemma clearly put effort into the structure, breaking everything into one-hour blocks with code examples and exercises. Its weakness was technical guidance: it simply said "open an online Python text editor or interpreter" without explaining how, as if beginners already knew those tools. Gemma is more of a vision model; writing courses doesn't seem to be its strong point.
Llama 3.2 3B
Meta's conversation expert, on the phone.
The Llama 3.2 3B is Meta's small conversational model, tuned specifically for the chat and summarization tasks you'll actually be running on a phone. Meta's own benchmarks claim it beats Gemma 2 2.6B and Phi-3.5-mini. Llama is the default name many people reach for when they want a small model, even though they don't really know how it compares to the earlier Gemma model.
Llama also got the strawberry question wrong, but in a funnier way. It spelled out every letter (S-T-R-A-W-B-E-R-R-Y) and still said there were two R's. The marble answer was passable: it used the same logic as Gemma and reached the same wrong conclusion. The Sally's brothers question went worse. It answered two, reasoning that "each one has a brother," which makes no sense. Overall, the results were about as expected, since most models struggle with these questions, but Llama did slightly worse.
The course task is where Llama really shines. Its lessons are laid out more simply than Gemma's, with fewer emojis and bold headings, and clean code blocks with explanations. The code is also more practical: the final project uses user input and a while loop, an improvement over Gemma's "Hello World" demo.
Qwen 3.5 4B
The Qwen 3.5 4B is Alibaba's compact model, with strong reasoning capabilities and built-in image recognition. It's much larger than Phi but still runs smoothly.
On the basic prompts, Qwen was the only one of the three models to get the strawberry question right: it listed each letter in order and arrived at 3. It still got the marble question wrong, just like the others, with muddled reasoning. The Sally's brothers question was the strange one. Qwen started with 3, dropped to 2, and then, halfway through the answer, began second-guessing itself with "wait, let me re-evaluate" before listing the family again from the beginning. That kind of self-doubt isn't necessarily a bad thing.
The course, however, was the strongest of the three. Qwen named specific editors that beginners could actually open (VS Code, PyCharm, Replit) instead of a generic "text editor," and its final project was a tip calculator with real variables rather than a greeting loop.
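To make the comparison concrete, here is a minimal Python sketch of the kind of beginner tip-calculator project described above. It is not Qwen's actual output: the function name `calculate_tip`, the 15% default, and the bill-splitting step are assumptions for illustration.

```python
def calculate_tip(bill: float, tip_percent: float = 15.0, people: int = 1) -> tuple[float, float]:
    """Return (total including tip, amount each person pays), rounded to cents."""
    if bill < 0 or tip_percent < 0:
        raise ValueError("bill and tip_percent must be non-negative")
    if people < 1:
        raise ValueError("people must be at least 1")
    tip = bill * tip_percent / 100  # tip amount from the percentage
    total = bill + tip
    per_person = total / people
    return round(total, 2), round(per_person, 2)


total, each = calculate_tip(50.0, 20.0, people=2)
print(f"Total: ${total:.2f}, each pays: ${each:.2f}")  # Total: $60.00, each pays: $30.00
```

A beginner course would more likely read the bill with `input()`; a plain function is used here so the logic can be checked without typing anything.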
What is the best local LLM for phones?
The winner is Qwen. It clearly outperformed the other two on reasoning and structure. Gemma, however, is still the better pick for working with images, thanks to its stronger image analysis. Frankly, after trying other models, many people keep coming back to Gemma and Qwen, so this isn't too surprising.
If you want to try these models yourself, the app used in this article is PocketPal, available on Android and iOS. It integrates directly with the Hugging Face Hub, so you have plenty of models to choose from.