In essence, users could simply put their queries to the chatbot through voice notes, and it would return a voice-based response generated by ChatGPT.
Even as the battle between Google and Microsoft over the future of internet search heats up, WhatsApp could soon become a major search engine for key government programs serving India’s estimated 150 million farmers. The effort is fueled by the hugely popular AI chatbot ChatGPT and by an ambitious national program that aims to build large datasets of Indian voice samples in multiple local languages through crowdsourcing.
Bhashini, a small team at the Ministry of Electronics and IT (MeitY), is currently building a WhatsApp-based chatbot that relies on information generated by ChatGPT to return appropriate responses to queries. And because people, especially farmers in rural areas, may not always want to type out their questions, questions can be asked via voice memos on the chatbot.
According to a senior government official, a mockup of this bot was shown to Microsoft CEO Satya Nadella, who mentioned it during the World Economic Forum in Davos earlier this year. The Indian Express also saw a demo of the chatbot in action, in which it seamlessly answered a question, asked via a voice memo, about the details of the Pradhan Mantri Awas Yojana.
The chatbot, which is currently being tested, is being developed with consideration for India’s rural and agricultural populations – the segments of society most dependent on government programs and subsidies – and the different languages spoken by them. And in this regard, it becomes important to build a language model that can successfully identify and understand the local languages spoken by the country’s rural population, said another senior government official linked to the project.
While the responses generated by ChatGPT have so far impressed many with their ability to answer complex queries in an intriguing and eloquent manner, building a national digital public platform for Indian languages will be key to the success of the WhatsApp chatbot that the Bhashini team is building. Building such a language model would require large datasets of the various languages spoken in India on which to train the model, the official said.
This is where an initiative called Bhasha Daan comes in, he explained. It is an ambitious project that aims to collect speech datasets in several Indic languages. On the project’s website, people can contribute in three ways: by recording voice samples in several Indic languages while reading a piece of text aloud, by typing out a sentence that is played to them, and by translating text from one language into another.
“A majority of people using this chatbot will not speak English. For voice inputs to work on the chatbot, it is therefore important that we train our language processing models in as many Indic languages as possible. We have a decent collection of voices in many Indian languages contributed by the people of the country through the Bhasha Daan portal. We also have a huge database of all the languages Doordarshan broadcasts in. So we have trained the chatbot’s language model on these datasets,” the second official explained.
Currently in its testing phase, the model supports 12 languages, including English, Hindi, Tamil, Telugu, Marathi, Bengali, Kannada, Odia and Assamese. This means that if a user sends the chatbot a voice memo in one of these languages, the chatbot will return an answer.
In a country where there is a severe digital divide despite increasing rural connectivity to the global internet, the official said the choice of WhatsApp as the delivery platform was deliberate.
“WhatsApp has more than 500 million users, and even those with relatively little digital literacy are familiar with the app,” he added.
However, there are currently some limitations. In the test phase, the chatbot can answer only simple queries, such as those about government regulations. This is mainly due to a current limitation of ChatGPT itself: it cannot access real-time information from the internet. ChatGPT’s language model was trained on a huge dataset to generate text based on input, and that dataset currently contains information only up to 2021.
However, that could change soon. On Wednesday, Microsoft announced a new version of its Bing search engine, powered by an updated version of the same AI technology that powers ChatGPT. Microsoft said the feature would run on an upgraded version of GPT-3.5, the AI language model created by OpenAI that powers ChatGPT. It called this the “Prometheus model” and said it was more powerful than GPT-3.5 and better able to answer searches with more up-to-date information and annotated answers.
The first official said that once ChatGPT can search the web and return real-time results, the scope of the WhatsApp chatbot could go well beyond what is currently being tested. “People will not only be able to get information about various government programs in a concise manner, but also inquire about their eligibility for some of those programs,” the official said.
While both officials remained noncommittal about the chatbot’s public release, they said its demo impressed Microsoft’s Nadella. However, it is worth noting that Microsoft has reportedly invested $10 billion in OpenAI, which developed ChatGPT.
“One demo I saw was a rural Indian farmer trying to gain access to a government scheme. He just expressed a complex thought in speech in one of the local languages, which was translated and interpreted by a bot, and the answer came back: ‘Go to a portal and here’s how to access the program.’ He said, ‘I won’t go to the portal, I want you to do this for me.’ The bot completed it, and the reason it was able to complete it was that a developer who created it had taken GPT and trained it on all the Indian government documents and then built it up with speech recognition software,” Nadella had said earlier this year.