Two Platforms, the startup backed by billionaire Mukesh Ambani and South Korea’s Naver Corp, is also planning to soon launch an artificial intelligence (AI)-powered messaging and social app named Zappa in India, according to founder and CEO Pranav Mistry.
“Our mission with Sutra is to fix the language gap in AI language models. We are committed to pioneering AI solutions for non-English markets. We believe our Sutra models will unlock AI development opportunities in huge economies such as India, Korea, Japan and the MEA (Middle East and Africa) region,” Mistry said in an interview with Mint.
But there are some fundamental differences “in our approach to building these models,” he insisted. First, unlike most other startups and companies that build “local” or “Indian” LLMs for India by fine-tuning global LLMs, “we have built a foundational model, not a fine-tuned one,” he said.
Foundational general-purpose models such as Google’s BERT and Gemini, OpenAI’s generative pre-trained transformer (GPT) variants, and Meta’s Llama series have been pre-trained on huge amounts of data from the internet, books, media articles and other sources. But most of this training data is in English.
Transformational approach
Most companies in India build their Indian LLMs on top of these foundational models (hence they are often called “wrappers”), fine-tuning the general-purpose LLMs on smaller, task-specific datasets (such as regional languages like Hindi, Marathi, Gujarati, Tamil, Telugu and Malayalam, and their dialects), which allows the models to learn the nuances of a language and improves their performance.
Sutra, instead, combines two different transformer-based architectures. Transformers, introduced by Google, predict the next word in a text sequence after being trained on huge datasets. Because they process all the words in a sequence simultaneously while modeling the relationships between them, transformers are very effective at tasks such as language translation.
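The “all words at once” property comes from the attention mechanism at the heart of every transformer. The toy sketch below (plain NumPy, not Sutra’s actual architecture) shows how attention relates every token in a sequence to every other token in a single matrix operation:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: every token attends to every
    other token in the sequence in one parallel matrix operation."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # all pairwise relationships at once
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                              # blend token representations

# Four tokens, each an 8-dimensional vector (toy embeddings)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = attention(x, x, x)   # self-attention: Q, K and V come from the same sequence
print(out.shape)           # (4, 8): one updated vector per token
```

A full transformer stacks many such attention layers with learned projections; the point here is only that no token is processed in isolation.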
According to Mistry, the multilingual LLM Sutra combines an LLM architecture with neural machine translation (NMT). The reason: while an LLM may struggle to translate specific language pairs due to a lack of specialized training data, NMT systems are typically better equipped to translate idiomatic expressions and colloquial language.
Second, while “GPT-4 is great in Korean and Hindi too, its size and cost make it prohibitive in a country like India,” Mistry argued. Sutra’s architecture “separates concept learning (we learn concepts by associating new information with existing knowledge, e.g., learning that apples and oranges are both fruits) from language learning. So, if you use Sutra, the number of tokens consumed in, say, Hindi is similar to that consumed in English. This saves almost five to eight times on costs,” he explained.
Third, “our specialized NMT models have far fewer parameters and require much less data to train,” Mistry said. When you add more data, say in Korean or an Indian language, you also expand the vocabulary of tokens (loosely, the fragments of words and subwords that an LLM can understand; for example, banana may be a single token, while homework can be broken into two, home and work). This makes the model larger, but also slows it down. It increases costs too, because the same information content expressed in a language like Hindi requires three to four times as many tokens as in English.
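The token-inflation problem is easy to see at the byte level, which is the fallback many English-centric tokenizers use for scripts under-represented in their vocabulary. Each Devanagari character takes three bytes in UTF-8, versus one byte for ASCII, so a Hindi sentence of similar meaning starts from a much larger baseline (an illustrative sketch, not a measurement of any particular model’s tokenizer):

```python
# Compare the UTF-8 byte footprint of a similar greeting in English and Hindi.
# Byte-level tokenizers pay roughly one token per byte for unfamiliar scripts.
english = "Hello, how are you?"
hindi = "नमस्ते, आप कैसे हैं?"

eng_bytes = len(english.encode("utf-8"))
hin_bytes = len(hindi.encode("utf-8"))

print(eng_bytes)   # 19: one byte per ASCII character
print(hin_bytes)   # much larger: Devanagari characters are 3 bytes each
print(round(hin_bytes / eng_bytes, 1))
```

Real subword tokenizers narrow this gap somewhat by merging frequent sequences, but when the training corpus is overwhelmingly English, those merges mostly benefit English text.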
“Besides, with this approach, the quality of, say, Hindi output will never exceed the quality of the English original,” Mistry added. That is because approximately 80% of the initial training data in a general foundational model typically comes from sources such as the internet, books and media articles, which are predominantly in English.
Innovation, not fine-tuning
However, if you then fine-tune such a model on Hindi data from India, “most of the data will be about cricket, data found on Twitter, or people discussing news articles in Hindi. Therefore, a Hindi model built on a base model pre-trained primarily in English will never be able to fully capture the language.”
“For example, if you want to translate Gujarati to Tamil, most models first translate from Gujarati to English and then from English to Tamil, because that is the data they were trained on. Our model doesn’t do this, so we also need fewer tokens, which further lowers the cost of running the model,” he explained. Mistry added that the Two Platforms model is also aligned with human values, a process technically known as “AI alignment”.
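The cost argument against pivoting through English can be sketched with some back-of-the-envelope arithmetic (the functions and numbers below are illustrative assumptions, not Sutra’s internals): two translation hops mean paying for two model passes, and any per-hop quality loss compounds.

```python
# Illustrative sketch: pivot translation (Gujarati -> English -> Tamil)
# versus direct translation (Gujarati -> Tamil).
def pivot_cost(src_tokens, pivot_tokens, per_token_cost=1.0):
    """Two hops: pay for the source pass and the intermediate English pass."""
    return (src_tokens + pivot_tokens) * per_token_cost

def direct_cost(src_tokens, per_token_cost=1.0):
    """One hop: source to target directly."""
    return src_tokens * per_token_cost

# Assume a 100-token Gujarati sentence and a similar-length English pivot.
print(pivot_cost(100, 100))        # 200.0: roughly double the spend
print(direct_cost(100))            # 100.0

# Quality compounds the same way: two hops at 90% fidelity each
# retain only ~81% of the original meaning, versus 90% for one hop.
print(round(0.9 * 0.9, 2))         # 0.81
```

The exact figures depend on tokenizer and pricing, but the two-hops-versus-one structure holds regardless.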
Sutra, which is currently available in three versions – Lightweight (56 billion parameters), Online (a multilingual, internet-connected model with 56 billion parameters) and Pro (150 billion parameters) – supports over 50 languages, “of which 31 are fully tested”, claims Mistry. He emphasized that Sutra’s architecture and its use of “synthetically translated data” not only reduce the computational cost of running these models but also improve their performance.
“Sutra maintains an impressive English score of 77% on the Massive Multitask Language Understanding (MMLU) benchmark. It also shows excellent and consistent performance in the 65-75% range across languages. In contrast, many leading language models score closer to 25% on non-English MMLU tasks,” Mistry said.
Two Platforms uses “its own GPU (graphics processing unit) cluster and rents high-end GPUs in the cloud as needed”. “As we grow, rising training costs will require us to create specialized models for different areas such as images and video,” Mistry added. His company is also in the process of raising a Series A round “to accelerate Sutra’s development into a models-as-a-service (MaaS) platform.” In February 2022, Jio Platforms invested $15 million in Two Platforms for a 25% equity stake, while Naver Corp unit Snow Corp. invested $5 million.
Apart from Sutra, India is home to Sarvam AI, a generative AI (GenAI) startup that launched the OpenHathi series; Tech Mahindra’s Project Indus; the ‘Hanooman’ model, jointly launched this month by SML India and 3AI Holding, an Abu Dhabi-based investment firm; CoRover’s BharatGPT LLM, which powers chatbots; and Krutrim AI, founded by Ola Cabs and Ola Electric co-founder Bhavish Aggarwal. Meanwhile, the ‘Nilekani Centre at AI4Bharat’ at IIT Madras has also released ‘Airavata’, an open-source LLM for Indian languages.
As per a research report released by MarketsandMarkets in March, the broader LLM market will grow from $6.4 billion in 2024 to $36.1 billion by 2030. India-specific LLMs are certainly the need of the hour, but “we need faster, cheaper, multilingual and energy-efficient LLMs that can fill existing market gaps,” concluded Mistry, who hopes Sutra will be one of the models that “fills this gap.”