Livemint
Technology
Sohini Bagchi

Artpark-IISc, Google to bring innovation to India’s diverse languages


The new initiative, named ‘Vaani’ and launched at the “Google for India 2022” event in New Delhi, “brings together high-quality datasets that reflect the true diversity of natural spoken language and transcribed text from every district of India”.

With this launch, Vaani joins the Bhāshā AI umbrella of Artpark and IISc’s pan-India language initiatives, which include SYSPIN (Synthesizing Speech in Indian languages) and RESPIN (Recognizing Speech in Indian languages) and cover nine languages, including Magadhi and Maithili.

“To propel research and innovation, these datasets are being open-sourced via Vaani’s website (vaani.iisc.ac.in) and in the future may also be available through other platforms like ‘Bhashini’ of MeitY (Ministry of Electronics and Information Technology),” according to a statement.

Globally, there is a lot of hype about large language models like GPT-3. But they require huge text corpora and humongous computing power to train. Prasanta Kumar Ghosh of IISc, who leads these initiatives, said, “In our work, we found at least 50 varieties of ‘Bengali’ and some that even I, as a native Bengali speaker, had difficulty understanding.” Even Hindi, with its more than four dozen dialectal variations, does not have nearly as much text data. “Machines have no hope! So research and innovation for inclusive language AI require capturing this diversity in our datasets,” he said.

Also, because Indians primarily communicate by speech, machines need very different approaches and breakthroughs to transcribe, understand, or translate that speech while also taking into account the language variations every few kilometres. In such a context, technologies like automatic speech recognition (ASR) and natural language processing (NLP) can only be unleashed through open-source and mission-mode efforts.

Raghu Dharmaraju, president, Artpark, added, “Over the past decade, most apps for frontline health and agriculture workers have failed because digital interfaces feel alien to them. More than 1 billion Indians still cannot speak or type in English… So, if citizens can communicate with digital services in their mother tongue… over the next decade, that will be key to India’s economic growth and for a more equitable distribution of its benefits.”

The initiative, currently focused on 80 districts across 10 states, will expand to every district over the next couple of years. Artpark and IISc will also launch challenges for researchers and startups to build applications in areas like health, agriculture, and financial inclusion using these datasets.

ABOUT THE AUTHOR

Sohini Bagchi

Sohini Bagchi is a senior assistant editor with TechCircle with over 15 years of experience in technology journalism. She has previously worked with IDG Media and Trivone Digital Services. Sohini is also a published author of fiction and non-fiction books. Her debut novel ‘Road to Cherry Hills’ enjoyed critical acclaim worldwide. Her second book ‘Techtonic Shift’ traces the history and evolution of computers and the Internet. Sohini has a master’s degree in communications from Manipal Institute of Communication, Karnataka. She is trained in Karate and enjoys blogging and stargazing when she is not working.