Google Translate adds support for 110 new languages, including Cantonese

Google Translate is currently having its 'biggest expansion ever' with 110 new languages.

Google credits its PaLM 2 large language model (from 2023 before Gemini) for making this possible:

PaLM 2 is an important piece of the puzzle, helping Translate learn closely related languages ​​more effectively, including languages ​​close to Hindi, like Awadhi and Marwadi, as well as Creole languages ​​of French as Seychellois Creole and Mauritian Creole.

Google Translate adds support for 110 new languages, including Cantonese Picture 1Google Translate adds support for 110 new languages, including Cantonese Picture 1

These additions benefit more than 614 million people, thus 'opening up translation to approximately 8% of the world's population' . This is Google's biggest African language expansion to date, accounting for a quarter of the additional options.

Some of the world's major languages ​​with over 100 million speakers. Other languages ​​are spoken by small indigenous communities. A small number of languages ​​are almost no longer spoken by native speakers but are being actively revived.

  1. Afar is a tonal language spoken in Djibouti, Eritrea and Ethiopia. Among all the languages ​​in this launch, Afar has the most volunteer community contributions.
  2. Cantonese has long been one of the most requested languages ​​for Google Translate. Because Cantonese often overlaps with Mandarin in writing, it is difficult to find data and train models.
  3. Manx is the Celtic language of the Isle of Man. It nearly became extinct after the death of the last native in 1974. But thanks to an island-wide revival movement, there are now thousands of speakers.
  4. NKo is a standardized form of the West African Manding languages, uniting many dialects into a common language. NKo's unique alphabet was invented in 1949 and there is an active research community developing resources and technology for it today.
  5. Punjabi (Shahmukhi) is a variety of Punjabi written in the Perso-Arabic script (Shahmukhi), and is the most spoken language in Pakistan.
  6. Tamazight (Amazigh) is a Berber language spoken throughout North Africa. Although there are many dialects, the writing form is generally similar. It is written in Latin and Tifinagh script, both of which are supported by Google Translate.
  7. Tok Pisin is an English-based creole and the lingua franca of Papua New Guinea. If you speak English, try translating to Tok Pisin - you might understand the meaning!

In the future, Google wants to 'support a wider variety of languages ​​and spelling conventions over time' . The broader goal is to 'build AI models that support the 1,000 most used languages ​​worldwide' .

4 ★ | 1 Vote