Is Gemma 2 or Llama 3 the best open source model?

At I/O 2024, Google announced its next line of Gemma 2 models, and the company has now released these lightweight models under an open source license. The new Gemma 2 27B model is said to be very promising, outperforming some larger models such as Llama 3 70B and Qwen 1.5 32B. To test this claim, let's compare Gemma 2 and Llama 3, two of the leading open source models today.

Creative writing

First, let's see how good Gemma 2 and Llama 3 are at creative writing. The author asked both models to write a short story about the relationship between the moon and the sun. Both did a great job, but Google's Gemma 2 stood out for its engaging prose and stronger story.

Llama 3, on the other hand, seemed a bit dull and robotic. Google has always been good at generating text with its Gemini models, and the smaller Gemma 2 27B model is no exception.

Winning option: Gemma 2

Multilingual testing

In the next round, let's find out how well both models handle languages other than English. Since Google advertises that Gemma 2 is good at understanding multiple languages, the author compared it with Meta's Llama 3 model by asking both to translate a passage in Hindi. Both Gemma 2 and Llama 3 performed excellently.

The author also tried another language, Bengali, and the models showed similarly good results. For Indian languages at least, it can be said that Gemma 2 and Llama 3 are well trained on a large corpus. However, Gemma 2 27B is roughly 2.6 times smaller than Llama 3 70B (27 billion versus 70 billion parameters), which makes its result even more impressive.

Winning options: Gemma 2 and Llama 3

Reasoning test

While Gemma 2 and Llama 3 aren't the smartest models out there, they can handle some common reasoning tests like much larger models do. In the previous comparison between Llama 3 and GPT-4, Meta's 70B model was impressive, showing quite good intelligence despite its smaller size.

In this round, Llama 3 beat Gemma 2 by a clear margin. Llama 3 answered 2 out of 3 questions correctly, while Gemma 2 struggled to answer even one. Gemma 2 simply doesn't appear to have been trained to solve complex reasoning questions.

Llama 3, on the other hand, has a solid reasoning foundation, which most likely comes from the coding data in its training set. Despite its small size, at least compared to trillion-parameter models like GPT-4, it shows a more than decent level of intelligence. Ultimately, training on more tokens does result in a more robust model.

Winning option: Llama 3

Following instructions

In the next round, the author asked Gemma 2 and Llama 3 to produce 10 sentences ending with the word 'NPU'. Llama 3 got all 10 right. In contrast, Gemma 2 produced only 7 correct sentences out of 10. In many past releases, Google models including Gemini did not follow user instructions well, and the same trend continues with Gemma 2.
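
For readers who want to score this kind of test automatically rather than by eye, here is a minimal sketch in Python. It is not the author's method: the function names are hypothetical, and it assumes the model returns one sentence per line, optionally numbered.

```python
import re


def ends_with_word(sentence: str, word: str) -> bool:
    """Return True if the last word of the sentence is `word`,
    ignoring trailing punctuation and closing quotes."""
    stripped = sentence.strip().rstrip(".!?\"')")
    tokens = stripped.split()
    return bool(tokens) and tokens[-1] == word


def score_response(response: str, word: str = "NPU") -> int:
    """Count how many lines of a model response end with `word`.

    Assumption: each non-empty line is one sentence, possibly
    prefixed with list numbering such as "1." or "2)".
    """
    lines = [
        re.sub(r"^\s*\d+[.)]\s*", "", line)
        for line in response.splitlines()
        if line.strip()
    ]
    return sum(ends_with_word(line, word) for line in lines)
```

Calling `score_response` on each model's output gives a count out of 10 that can be compared with the scores reported above.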

Following user instructions is crucial for AI models. It makes a model reliable and ensures the response matches what you actually asked for. On the safety front, it also helps keep the model grounded and compliant with safety protocols.

Winning option: Llama 3

Finding information

Both Gemma 2 and Llama 3 have a context length of 8K tokens. The author added a huge block of text, sourced directly from the book Pride and Prejudice, containing over 17,000 characters (about 3.8K tokens). As usual, the author placed a random quote somewhere in the text and asked both models to find it.
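
A test like this can be set up with a few lines of Python. The sketch below is only an illustration, not the author's actual setup: the function names are hypothetical, and the 4.5 characters-per-token figure is a rough assumption derived from the numbers above (17,000 characters for about 3.8K tokens), not the output of a real tokenizer.

```python
import random


def insert_needle(haystack: str, needle: str, seed: int = 0) -> str:
    """Insert `needle` between two randomly chosen sentences of `haystack`.

    Sentences are split naively on ". ", which is good enough for a sketch.
    The needle is never placed before the first sentence.
    """
    sentences = haystack.split(". ")
    rng = random.Random(seed)  # seeded so the test is repeatable
    position = rng.randint(1, max(1, len(sentences) - 1))
    sentences.insert(position, needle.rstrip("."))
    return ". ".join(sentences)


def estimate_tokens(text: str, chars_per_token: float = 4.5) -> int:
    """Very rough token estimate (assumed ratio, not a real tokenizer)."""
    return int(len(text) / chars_per_token)
```

Before sending the prompt, `estimate_tokens` gives a quick sanity check that the combined text stays well under the 8K-token context limit.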

Gemma 2 quickly found the quote and pointed out that it had been inserted randomly. Llama 3 also identified the statement as out of place. Despite being limited to 8K tokens, both models are quite strong at long-context recall.

Note that the author ran this test on HuggingChat (web) because meta.ai refused to run this prompt, most likely because it treated the text as copyrighted content.

Winning options: Gemma 2 and Llama 3

Check for hallucinations

Smaller models tend to hallucinate because of their limited training data, often fabricating information when they encounter unfamiliar topics. The author therefore asked about made-up country names to check whether Gemma 2 and Llama 3 would hallucinate. Surprisingly, they didn't, which suggests that both Google and Meta have built their models on a fairly solid foundation.

The author also asked another question built on a false premise to test the models' factuality, and again neither hallucinated. Note that the author tested Llama 3 on HuggingChat, because meta.ai browses the Internet to find current information on related topics.

Winning options: Gemma 2 and Llama 3

Conclusion

Although Google's Gemma 2 27B model doesn't perform well in reasoning tests, it is capable at several other tasks. It's great at creative writing, supports multiple languages, recalls information well from its context, and best of all, doesn't hallucinate like previous models.

Of course, Llama 3 is better overall, but it is also a significantly larger model with 70 billion parameters. Developers will find the Gemma 2 27B model useful for many use cases. A smaller Gemma 2 9B model is also available for inference.
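
To see the overall result at a glance, the round-by-round verdicts from this article can be tallied with a short Python snippet. The round labels and the "tie" marker are just illustrative names chosen here; the verdicts themselves are the ones stated at the end of each section.

```python
from collections import Counter

# Verdicts as stated in the article; "tie" marks rounds with two winners.
rounds = {
    "Creative writing": "Gemma 2",
    "Multilingual": "tie",
    "Reasoning": "Llama 3",
    "Following instructions": "Llama 3",
    "Finding information": "tie",
    "Hallucinations": "tie",
}

tally = Counter(rounds.values())

for winner, count in tally.most_common():
    print(f"{winner}: {count}")
```

Llama 3 takes two rounds outright against one for Gemma 2, with the remaining three rounds tied.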

Additionally, users should check out Gemini 1.5 Flash, which is also a much smaller model and supports multimodal input. It is also very fast and efficient.

Update 03 July 2024