AI4Bharat and IBM Research India Release New Indic Language LLM Benchmark
AI4Bharat, in partnership with IBM Research India, has introduced MILU (Multi-task Indic Language Understanding Benchmark), an extensive new evaluation benchmark for Indic languages.
This benchmark, developed under The AI Alliance, includes 85,000 multiple-choice questions across 11 Indian languages, covering eight diverse domains and over 40 subjects with an India-centric focus on both general and cultural knowledge.
In MILU’s evaluation of more than 40 models, GPT-4 achieved the highest accuracy, scoring 72%. Open-source LLMs such as Llama 3.1 and Gemma outperformed Indic language-specific models, and across the board, cultural-knowledge questions proved more difficult for the models than STEM-related ones.
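The accuracy figures above come from scoring model answers on multiple-choice items. As a minimal sketch of how such scoring works, the snippet below computes overall and per-domain accuracy; the item schema and the `predict` function are hypothetical stand-ins, not MILU's actual data format or evaluation harness.

```python
# Hypothetical sketch of MCQ-benchmark scoring; MILU's real schema may differ.
from collections import defaultdict

def score(items, predict):
    """Return overall and per-domain accuracy for multiple-choice items.

    items:   list of dicts with 'question', 'options', 'answer', 'domain'.
    predict: callable mapping an item to the model's chosen option.
    """
    correct = 0
    per_domain = defaultdict(lambda: [0, 0])  # domain -> [correct, total]
    for item in items:
        hit = predict(item) == item["answer"]
        correct += hit
        per_domain[item["domain"]][0] += hit
        per_domain[item["domain"]][1] += 1
    overall = correct / len(items)
    return overall, {d: c / t for d, (c, t) in per_domain.items()}

# Toy example with two invented items and a dummy predictor
items = [
    {"question": "2 + 2 = ?", "options": ["3", "4"],
     "answer": "4", "domain": "STEM"},
    {"question": "Festival of lights?", "options": ["Holi", "Diwali"],
     "answer": "Diwali", "domain": "Culture"},
]
predict = lambda item: item["options"][1]  # always picks the second option
overall, by_domain = score(items, predict)
```

Real harnesses differ mainly in how `predict` is implemented (prompting the model and parsing its chosen option), but the accuracy arithmetic is the same.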
The study has a few limitations: scarce resources for low-resource languages restricted the benchmark to 11 languages, and computational constraints prevented evaluation of larger models such as Llama 3.1 70B and Llama 3.1 405B, gaps that future work aims to address for broader inclusion.
The MILU benchmark builds upon earlier Indic language benchmarks, such as INDICGLUE (2020) and INDICNLG2 (2022), which focused on language understanding and generation tasks in 11 Indian languages.
INDICXTREME (2023) further expanded these efforts to cover all 22 scheduled Indian languages for natural language understanding, while newer benchmarks like INDICGENBENCH (2024) provide extensive evaluations for multilingual generation.
Additional projects, such as INDICQA (2024) and L3CUBE-INDICQUEST (2024), focus on question-answering and regional knowledge, while AIRAVATA and the INDICLLM-LEADERBOARD facilitate translation of English benchmarks into Indian languages.
Notably, Adithya S. Kolavi, founder and CEO of CognitiveLab, developed the INDICLLM-LEADERBOARD to support the evaluation of LLMs specifically within Indian linguistic contexts.
In a related effort, Guneet Singh Kohli of GreyOrange AI and Daniel van Strien of Hugging Face introduced Sanskriti Bench under the Data is Better Together initiative.
Sanskriti Bench aims to build an Indian cultural benchmark for evaluating Indic AI models. By crafting the benchmark with the help of native speakers from different regions across India, the initiative seeks to capture the country’s cultural diversity.
With these benchmarks, MILU and related initiatives aim to support the development of AI systems that are culturally aware and linguistically competent, serving India’s 1.4 billion people more effectively.