‘India’s bet on smaller AI models may overlook CPUs’: Ziroh Labs CEO Hrishikesh Dewan
India’s strategic roadmap for AI development is coming into sharper focus, with policymakers and government officials backing a bottom-up approach that prioritises small, sector-specific AI models, shared infrastructure, and distributed access to compute.
While this approach potentially marks a departure from global AI development trends, India is also looking to provide access to low-cost compute by investing in GPUs (Graphics Processing Units). Under the IndiaAI Mission, which has a financial outlay of Rs 10,372 crore over five years, the Indian government plans to nearly triple its installed GPU capacity (from 38,000 to 1,00,000 GPUs) by the end of 2026.
However, small language models require less computational power than large frontier models, allowing the former to be deployed and executed on CPUs (Central Processing Units), according to Hrishikesh Dewan, the co-founder and CEO of Ziroh Labs. The Bengaluru-based AI start-up has developed Kompact AI, a platform that enables developers to deploy AI models that run on cost-effective CPUs instead of pricey GPUs, across cloud, on-premises, and on-device environments.
In a conversation with The Indian Express, Dewan discusses running AI workloads on CPUs, the trade-offs involved, and how Kompact AI looks to solve these challenges.
Edited excerpts from the interview:
Q: Due to the AI boom, back-end hardware has become increasingly GPU-intensive. Why do you think CPUs are still relevant in the era of GPUs? Are they more energy efficient? What are the operational benefits?
Hrishikesh Dewan: GPUs were initially add-ons to CPUs many years ago. At the time, if someone wanted to play games, they needed more processing power because games are very graphics-intensive. Back then, people bought GPUs that came as separate cards. They still come as PCIe cards, which are inserted into slots on the motherboard. While a game is running, the system uses the GPU rather than the CPU.
This trend, which has become widespread over the last 10 to 15 years, shows one thing: if you have a compute-intensive job, then a GPU is the way to go, because it handles that kind of workload effectively.
For example, consider a typical workflow where you have a video that needs to be processed. So, you process the video using a GPU, but when you want to serve it to different people, you use CPU-based servers because no further processing is required; you just have to serve it. As a result, the GPU came to the forefront as a machine capable of handling large amounts of computational work.
From 2017 onwards, AI took the front seat and many models were developed. These models require significantly more computational power to produce results. A large model has high computational requirements because it involves many equations or variables. Therefore, a huge model that intends to solve many use cases definitely requires a GPU.
But do you really want to have such big models? For example, if a model is designed to answer questions on a particular type of cancer, such as oral cancer or breast cancer, does it need to answer questions on philosophy? It does not. In that case, you are developing a model that solves a specific problem.
Small models require less computation, allowing them to be deployed and executed on CPUs. However, if you deploy a model to answer questions about oral cancer and then put it countrywide, and have everyone ask that model a question, your computational requirements will be high because thousands of people will be asking questions simultaneously.
But then you have to distribute the load. In these cases, CPUs can also be used. You don’t need to use a GPU. Small models have lower computational requirements and can be executed very efficiently on CPUs rather than GPUs.
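To make this concrete, here is a minimal sketch (not Ziroh Labs' Kompact AI runtime) of running a small, openly published language model entirely on a CPU using the generic Hugging Face transformers library; the model ID is illustrative. Scaling to many simultaneous users would then be a matter of running several such CPU instances behind a load balancer.

```python
# Minimal sketch: serving a small language model on a CPU only.
# The model ID below is an illustrative small open model from Hugging Face,
# not a Ziroh Labs model; any comparably sized model would work the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # loads on CPU by default

prompt = "What are the early warning signs of oral cancer?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```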
Q: CPUs are better suited for deploying small language, domain-specific models as opposed to large, frontier models. But what about AI agents? Would CPUs also be able to handle that workload?
Dewan: There is a model that delivers intelligence, and agentic AI is an application that uses that intelligence. Now, that intelligence can be delivered by a small model or a huge model.
Generally, in the case of agents, they tend to use small models because it’s straightforward. Let’s say you want to book a ticket from Delhi to Bangalore, and you are using an AI agent. The AI agent will require public information such as which flights are available from Delhi to Bangalore. It also needs some personal information, such as which flight you want to take.
Therefore, this work can be done using models that only do two things: provide personal preferences accurately and provide public information accurately. Agentic AI is really about distributed AI. That means intelligence comes from multiple sources, and you combine them to act. Therefore, CPUs are very well suited for this kind of distributed computation.
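As an illustration of that split, the sketch below combines a "public information" source with a "personal preference" source. All functions are hypothetical stubs; it does not reflect any real agent framework or Ziroh Labs API.

```python
# Illustrative sketch of the split Dewan describes: one small component
# supplies public information, another supplies personal preferences, and
# a thin agent combines the two. All functions are hypothetical stubs.

def public_flight_search(origin: str, destination: str) -> list[str]:
    # Stand-in for a small model or public API returning available flights.
    return ["AI-501 dep 06:00", "6E-213 dep 09:30", "UK-811 dep 18:45"]

def personal_preference(options: list[str]) -> str:
    # Stand-in for a small model that knows the user prefers morning flights.
    morning = [o for o in options if "dep 0" in o]
    return morning[0] if morning else options[0]

def booking_agent(origin: str, destination: str) -> str:
    # The agent only orchestrates: fetch public info, apply the preference.
    options = public_flight_search(origin, destination)
    return f"Book {personal_preference(options)} from {origin} to {destination}"

print(booking_agent("Delhi", "Bangalore"))
```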
Q: When Google released Gemini 3, it attracted a lot of attention because inference was done on its own custom TPUs. As the AI race evolves, many startups and labs are looking to optimise the inference layer. Are CPUs better equipped to handle inferencing?
Dewan: It depends on the size of the model. There are two phases in a model: model development, also known as training, and deployment for inference. Inferencing is essentially the use of the trained model.
TPUs, developed by Google, are specialised, meaning they are optimised for a specific class of models. GPUs, on the other hand, are more generic hardware used for both training and inference.
It is good that such variations are emerging, as they give people more choices in selecting the proper hardware for their work. CPUs cannot solve huge models; that is not possible, but they can handle small ones. For small models, and when you understand the scale of deployment and who you are serving, CPUs are a practical and effective hardware choice for deployment and scaling.
Q: What are the barriers to wider adoption of CPUs for AI workloads?
Dewan: The main challenge is throughput—that is, how many tokens can be delivered per second. If the throughput is low, the model becomes frustrating to use. For example, if I ask a question and it takes 1 minute to get a reply, people will stop using it over time.
The second issue is quality. The model must provide accurate and correct answers. For instance, if I’m a class 10 student and I ask a biology model to explain photosynthesis, and the model gives an incorrect answer, that’s a serious problem. So, there are two main challenges: 1) increasing the speed of delivery and 2) ensuring the quality of the answers.
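The throughput Dewan refers to is typically measured as tokens generated per second. A rough way to measure it for a model running on a CPU looks like the sketch below, again using the generic transformers library with an illustrative model ID rather than Kompact AI.

```python
# Rough sketch of measuring CPU inference throughput in tokens per second.
# Model ID is illustrative; Kompact AI's runtime is not shown here.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # CPU by default

inputs = tokenizer("Explain photosynthesis to a class 10 student.",
                   return_tensors="pt")

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=256)
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tokens/second")
```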
Multiple solutions have been developed to address these problems. For most models deployed on CPUs today, the throughput is insufficient and does not meet the required SLA (service level agreement). Kompact AI exists because we solve this problem.
When you deploy a model with Kompact AI, the throughput is almost 2X what you get today. That means if something currently takes one minute, it will take just 30 seconds to deliver on Kompact AI, with no compromise on the quality of the results provided by the model.
Q: How does Kompact AI specifically achieve this? Could you walk us through the development cycle of Kompact AI?
Dewan: Back in 2021, when these frontier models were being developed, and model sizes were increasing every quarter, we realised that if we continued with the existing approach—given the mathematics involved in delivering an AI model—it would become tough to scale.
So, we started working on this in 2021, focusing mainly on the science part. We spent several years on that. By 2024, we began to see results from our scientific work, and that is when we moved into the engineering phase. Ultimately, we developed a runtime called Kompact AI Runtime that delivers very high throughput without sacrificing quality.
Kompact AI is straightforward. There are many models available on Hugging Face, and most people publish their models there. You can download these models and run them on Kompact AI.
We use the quality metrics specified by the model designer to evaluate the model. We download the benchmarks, run them, and verify whether the model meets the designer’s metrics.
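A hedged sketch of what such a verification step could look like follows: a generic accuracy check, not Ziroh Labs' actual evaluation pipeline, with illustrative questions, answers, and figures.

```python
# Generic sketch of verifying a deployed model against a published benchmark:
# run the benchmark's questions, score the answers, and compare the result
# with the figure the model designer reported. All values are illustrative.

def model_answer(question: str) -> str:
    # Placeholder for a call into the deployed model; replace with a real client.
    canned = {
        "Which organ produces insulin?": "Insulin is produced by the pancreas.",
        "Which pigment drives photosynthesis?": "Chlorophyll drives photosynthesis.",
    }
    return canned.get(question, "")

benchmark = [  # (question, reference answer) pairs from a public benchmark
    ("Which organ produces insulin?", "pancreas"),
    ("Which pigment drives photosynthesis?", "chlorophyll"),
]

correct = sum(ref.lower() in model_answer(q).lower() for q, ref in benchmark)
accuracy = correct / len(benchmark)

designer_reported = 0.95  # figure taken from the model card (illustrative)
print(f"accuracy {accuracy:.2f} vs designer-reported {designer_reported:.2f}")
```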
Kompact AI does something similar to what Nvidia’s CUDA tech stack does for GPUs, but it also goes further. If you just take the same equations that run on a GPU and run them on a CPU without modification, the GPU will naturally perform better. The equations themselves need to be modified. This is why Kompact AI is more of a scientific problem than just an engineering problem.
Q: Does Kompact AI provide ready-to-use AI models that are already fine-tuned and optimised for developers to build applications on top of them?
Dewan: Yes. We have developed these derivative or fine-tuned models. For a developer, there is very little to worry about. They can click a button in Google Cloud, and the model is deployed automatically. They do not have to write a single line of code, and the process is straightforward.
Today, Kompact AI enables you to run these models on CPUs, but in the future, we will support GPUs as well, so you will be able to run the Kompact AI runtime on them. And after that, we are also very open to supporting TPUs.
Ultimately, we have done a lot on the science side. Because of that, we can manifest it on different hardware. If we can make a model produce 2X tokens per second compared to what is currently available, we can produce at least 1.5X more tokens than CUDA.
Q: Which CPUs are used by Kompact AI, and where are they located?
Dewan: It’s global. We use CPUs from Indian data centres, OEMs like Intel and AMD, and CPUs available on Google Cloud, AWS, and Azure. They are located all over the globe—some in Ohio, some in other regions. Developers and enterprises who want to use Kompact AI can decide where to deploy it.
Q: If I am an India-based developer and I want my data to stay in India, can I select CPUs explicitly located in India?
Dewan: Yes, absolutely. We do not provide the hardware; we provide the software. Any organisation or enterprise can deploy it wherever they want, depending on their regulatory requirements or other needs.
For example, an organisation dealing with healthcare data may wish to ensure that everything is deployed only in India. That’s completely fine. They can choose the hardware, whether from a cloud provider or their own data centre. We give them the software and help deploy it on that hardware.
Q: What is required for India to get ahead in the AI race, or to have a foothold in AI?
Dewan: In technology, you have to compete globally. We are in Bangalore, but that does not mean we can just say we are an India-based company and compete only with Indian companies. That’s not how it works. We have to compete with companies everywhere: the Bay Area, Tel Aviv, New York, and Boston.
So, when we say we want to have a foothold in AI, the question is: in which area? If you want to develop a frontier model or a huge model, then you have to compete with DeepSeek, Gemini, OpenAI, and others. If you cannot beat them technically in that space, it becomes difficult.
Every organisation today has access to a wide range of resources. For example, the linguistic data India has—do you think Google doesn’t have access to it? They do. The barriers to accessing resources are not unique to any one country.
Q: If I am looking to adopt CPU-based compute solutions, what are the switchover costs?
Dewan: Zero cost. In fact, it actually brings down the cost. When you use a GPU, you are paying a lot of money to keep it running. If you move the workload to a CPU with Kompact AI, it will cost much less. The migration cost is next to nothing. You download the runtime, get the model from a platform like Hugging Face, and run it. It hardly takes two minutes.
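In a generic PyTorch/transformers setup (Kompact AI's own runtime is not shown here, and the model ID is illustrative), the switchover can indeed be as small as a device change, as the sketch below shows.

```python
# Sketch: the same inference script runs on GPU or CPU with a one-line change.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # illustrative small model
device = "cuda" if torch.cuda.is_available() else "cpu"  # falls back to CPU

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

inputs = tokenizer("Summarise the uses of CPUs for AI inference.",
                   return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```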
Q: What is your perspective on this emerging idea of GPU-based data centres in space?
Dewan: It’s an exciting and positive initiative. If it succeeds, overall power requirements could be significantly lower. The energy that large GPU-based data centres would otherwise consume could be redirected to other uses, such as irrigation, rural electrification, and similar needs. And CPUs can also go to space, potentially at even lower cost, since their power requirements are much lower.