Artificial intelligence is changing the way businesses store and access data. That's because traditional data storage systems were designed to handle simple commands from a handful of users at once, whereas today's AI systems, with millions of agents, need to continuously access and process large amounts of data. Traditional data storage systems have accumulated layers of complexity that slow AI systems down, because data must pass through multiple tiers before reaching the graphics processing units (GPUs) that serve as the brain cells of AI.
Founded by Michael Tso ’93, SM ’93 and Hiroshi Ohta, Cloudian is helping storage keep pace with the AI revolution. The company has developed a scalable storage system for enterprises that helps data flow seamlessly between storage and AI models. The system reduces complexity by applying parallel computing to data storage, merging AI capabilities and data onto a single parallel-processing platform that stores, retrieves, and processes scalable data sets, with direct high-speed transfers between storage and GPUs as well as CPUs.
Cloudian’s integrated storage computing platform simplifies the process of building commercial-scale AI tools and provides enterprises with a storage foundation that can keep up with the rise of AI.
“One of the things people miss about AI is that it’s all about the data,” Tso said. “You can’t get a 10 percent improvement in AI performance with 10 percent more data, or even 10 times more data – you need 1,000 times more data. Being able to store data in a way that’s easy to manage, and in a way that lets you embed your computing into the data so you can operate on it where it sits, without moving it – that’s where this industry is going.”
From MIT to Industry
As an undergraduate at MIT in the 1990s, Tso was introduced by Professor William Dally to parallel computing, in which many calculations are performed simultaneously. Tso also worked on parallel computing with Associate Professor Greg Papadopoulos.
“It was an incredible time because most schools had one supercomputing project going on – MIT had four,” Tso recalls.
As a graduate student, Tso worked with David Clark, a senior research scientist at MIT who contributed to the internet’s early architecture, particularly the Transmission Control Protocol (TCP) that delivers data between systems.
“As a graduate student at MIT, I worked on disconnected and intermittent network operation for large-scale distributed systems,” Tso said. “It’s funny – 30 years later, that’s what I’m still doing today.”
After graduation, Tso worked in Intel’s Architecture Labs, where he invented the data synchronization algorithm used by BlackBerry. He also created the specifications for Nokia that launched the ringtone download industry. He then joined Inktomi, a startup led by Eric Brewer SM ’92, PhD ’94 that pioneered search and web content distribution technologies.
In 2001, Tso founded Gemini Mobile Technologies with Joseph Norton ’93, SM ’93 and others. The company went on to build the world’s largest mobile messaging systems to handle the massive data growth from camera phones. Then, in the late 2000s, cloud computing became a powerful way for enterprises to rent virtual servers as they scaled their operations. Tso noticed that the amount of data being collected was growing much faster than network speeds, so he decided to pivot the company.
“Data is created in many different places, and that data has its own gravity: it costs you money and time to move it,” Tso explained. “That means the end state is a distributed cloud that reaches edge devices and servers. You have to bring the cloud to the data, not the data to the cloud.”
Tso officially spun Cloudian out of Gemini Mobile Technologies in 2012, with a new focus on helping customers with scalable, distributed, cloud-compatible data storage.
“What we didn’t see when we first started the company was that AI would be the ultimate use case for data at the edge,” Tso said.
Although Tso’s research at MIT began more than two decades ago, he sees strong connections between the work he did then and the industry today.
“It’s like my whole life has come full circle, because David Clark and I were dealing with disconnected and intermittently connected networks, which are part of every edge use case today, and Professor Dally was working on very fast, scalable interconnects,” Tso said. “Now, when you look at the modern NVIDIA chip architecture and how it does chip-to-chip communication, that’s all Dally’s work. With Professor Papadopoulos, I worked on accelerating application software with parallel computing hardware without having to rewrite the applications, and that’s the problem we’re now working to solve with NVIDIA.”
Today, Cloudian’s platform uses an object storage architecture, in which all kinds of data (documents, videos, sensor data) are stored as unique objects with metadata. Object storage can manage massive data sets in a flat namespace, making it ideal for unstructured data and AI systems, but traditionally it could not send data directly to AI models without the data first being copied into a computer’s memory, creating latency and energy bottlenecks for enterprises.
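The flat-namespace idea behind object storage can be sketched in a few lines of Python. This is a conceptual illustration only, not Cloudian’s actual API: each object is a key mapping to a data blob plus arbitrary metadata, and there is no real directory tree, only prefix queries over one flat keyspace.

```python
# Minimal sketch of an object store's flat namespace (illustrative only;
# not Cloudian's API). Each object = data blob + metadata dict.
class ObjectStore:
    def __init__(self):
        self._objects = {}  # flat namespace: key -> (data, metadata)

    def put(self, key: str, data: bytes, **metadata):
        # Keys may look like paths ("sensors/line3/temp.csv"), but there
        # is no real directory hierarchy -- just one flat keyspace.
        self._objects[key] = (data, dict(metadata))

    def get(self, key: str):
        return self._objects[key]

    def list(self, prefix: str = ""):
        # "Folders" are simulated by prefix queries over the flat keyspace.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ObjectStore()
store.put("sensors/line3/temp.csv", b"21.5,21.7", content_type="text/csv")
store.put("docs/manual.pdf", b"%PDF-...", content_type="application/pdf")
print(store.list("sensors/"))  # ['sensors/line3/temp.csv']
```

Because every object carries its own metadata and lives in one flat keyspace, the store can scale out horizontally without the coordination overhead of a hierarchical file system, which is why the model suits large unstructured data sets.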
In July, Cloudian announced that it had extended its object storage system with a vector database that stores data in a form AI models can use immediately. As data is ingested, Cloudian computes vector representations of it in real time to power AI tools such as recommendation engines, search, and AI assistants. Cloudian also announced a partnership with NVIDIA that allows its storage systems to work directly with the AI company’s GPUs. Cloudian says the new system makes AI operations faster and reduces computing costs.
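The vectorize-on-ingest idea can be sketched as follows. This is a toy illustration: the `embed` function here is a stand-in letter-frequency embedding, whereas a real deployment would use a learned embedding model, and the store itself is hypothetical rather than Cloudian’s product.

```python
import math

# Toy sketch of vectorize-on-ingest (illustrative; real systems use a
# learned embedding model, not this letter-frequency stand-in).
def embed(text: str) -> list[float]:
    # Stand-in embedding: 26-dim letter-frequency vector, L2-normalized.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class VectorizedStore:
    def __init__(self):
        self._items = {}  # key -> (text, embedding)

    def ingest(self, key: str, text: str):
        # The embedding is computed once, at write time -- no separate
        # preprocessing pass before the data can serve AI queries.
        self._items[key] = (text, embed(text))

    def search(self, query: str, top_k: int = 1):
        # Dot product of normalized vectors = cosine similarity.
        q = embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, e)), key)
            for key, (_, e) in self._items.items()
        ]
        return [key for _, key in sorted(scored, reverse=True)[:top_k]]


db = VectorizedStore()
db.ingest("doc1", "robot arm maintenance schedule")
db.ingest("doc2", "quarterly financial summary")
print(db.search("repairing robots"))  # ['doc1']
```

The design point is that the expensive step, embedding, happens at ingest time, so a query only has to embed the short query string and score it against precomputed vectors.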
“About a year and a half ago, NVIDIA contacted us, because a GPU is only useful if you have the data to keep it busy,” Tso said. “Now people realize that moving AI to the data is easier than moving large data sets. Our storage system embeds many AI capabilities, so we’re able to preprocess and postprocess data close to where it is collected and stored.”
AI-first storage
Cloudian is helping around 1,000 companies worldwide get more value from their data, including large manufacturers, financial service providers, healthcare organizations and government agencies.
Cloudian’s storage platform, for example, is helping a large automaker use AI to determine when each of its manufacturing robots needs to be serviced. Cloudian has also worked with the National Library of Medicine to store research articles and patents, and with the National Cancer Database to store tumor DNA sequences – rich data sets that AI models can process to help researchers develop new treatments or gain new insights.
“GPUs have been an incredible enabler,” Tso said. “Moore’s Law doubles the amount of computation every two years, but GPUs can parallelize operations on a chip and be networked together, so you can go beyond Moore’s Law.”