Study Reveals Google Translate's Tulu Feature Needs Major Improvements
Research Finds Google's Tulu Translation Lacks Accuracy

A recent study has uncovered significant shortcomings in Google Translate's support for the Tulu language. The research, carried out by two final-year MCA students from the Manipal Institute of Technology (MIT), provides a critical evaluation of the tool's performance since its launch.

Student-Led Research Uncovers Translation Gaps

The investigation was spearheaded by students Prajna Devadiga and Kavyashree. Their interest was sparked after Google added Tulu to its translation service on June 27, 2024. Prajna told TOI that the feature was adopted rapidly, including for uploading Wikipedia articles. "This prompted us to study how accurate these translations actually are," she stated.

Rigorous Evaluation Using Four Key Metrics

The team employed a robust, multi-metric approach to assess the quality of Google's Tulu output. They used four established benchmarks in machine translation evaluation:

  • BLEU (Bilingual Evaluation Understudy): Measures the similarity between machine-generated translations and high-quality human translations.
  • chrF++ (Character F-score plus plus): A metric often used for evaluating low-resource languages, combining character-level and word-level n-gram matches.
  • TER (Translation Error Rate): Calculates the number of edits needed to turn the machine translation into a human reference; lower scores indicate better translations.
  • COMET (Crosslingual Optimised Metric for Evaluation of Translation): A modern, neural network-based metric for assessing translation quality.

To conduct the analysis, the researchers first created a dataset by manually translating a set of English sentences into accurate Tulu. These human-crafted reference translations served as the benchmark against which Google's automated outputs were compared.
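To give a sense of how such reference-based comparison works, here is a minimal, self-contained Python sketch of two of the metrics in simplified form: a chrF-style character n-gram F-score and a TER-style word edit rate. This is illustrative only; the example sentences are invented, and a real evaluation (including the study's) would use established implementations such as the sacreBLEU library, and full chrF++ and TER include refinements (recall weighting, word n-grams, block shifts) omitted here.

```python
from collections import Counter

def char_ngrams(text, n):
    """Return a multiset of character n-grams (spaces removed)."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf_like(hypothesis, reference, max_n=3):
    """Simplified chrF-style score: average character n-gram F1.

    Real chrF++ also uses word n-grams and weights recall more
    heavily (beta = 2); this toy version averages plain F1 scores
    over n = 1..max_n. Higher is better (1.0 = identical).
    """
    f_scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precision = overlap / sum(hyp.values())
        recall = overlap / sum(ref.values())
        if precision + recall == 0:
            f_scores.append(0.0)
        else:
            f_scores.append(2 * precision * recall / (precision + recall))
    return sum(f_scores) / len(f_scores) if f_scores else 0.0

def ter_like(hypothesis, reference):
    """Simplified TER: word-level edit distance / reference length.

    Real TER also allows block shifts; this version counts only
    insertions, deletions, and substitutions (Levenshtein distance).
    Lower is better (0.0 = identical).
    """
    hyp, ref = hypothesis.split(), reference.split()
    # Classic dynamic-programming edit distance over words.
    dp = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        prev, dp[0] = dp[0], i
        for j, r in enumerate(ref, 1):
            cur = min(dp[j] + 1,         # deletion
                      dp[j - 1] + 1,     # insertion
                      prev + (h != r))   # substitution (free if equal)
            prev, dp[j] = dp[j], cur
    return dp[len(ref)] / max(len(ref), 1)

# Invented English example (the study compared Tulu sentences).
reference = "the house is small"
hypothesis = "the house is very small"
print(f"chrF-like: {chrf_like(hypothesis, reference):.3f}")
print(f"TER-like:  {ter_like(hypothesis, reference):.3f}")   # 1 extra word / 4 = 0.250
```

The same pattern scales to a corpus: score each machine output against its human reference, then aggregate, which is how the four metrics above produce a single quality figure for the system.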

Key Findings and the Path Forward for Tulu

The study's results clearly indicate significant scope for improvement in Google's Tulu translation system. A primary issue identified is the model's struggle with word-level meaning and proper context. Because Tulu is a low-resource language in the digital realm, the system frequently substitutes words from Kannada, a related but distinct language.

UB Pavanaja, director of the Vishwa Kannada Foundation, who served as the external guide for the project, emphasized the implications. He noted that enlarging Google's Tulu language corpus and providing more focused, high-quality training data are crucial steps not only for accurate translation but also for the wider digital adoption and preservation of the Tulu language.

The research was conducted under the guidance of Adarsh Rag, Ashwath, and Musica Supriya from the computer engineering department. The complete study is expected to be published in the near future, offering detailed insights and data to support the call for technological improvement.