Coordinating complicated interactive systems, whether it's the different modes of transportation in a city or the various components that must work together to make an effective and efficient robot, is an increasingly important subject for software designers. Now, researchers at MIT have developed an entirely new way of approaching these complex problems, using simple diagrams as a tool to reveal better approaches to software optimization in deep-learning models.
They say the new method makes addressing these complex tasks so simple that it can be reduced to a drawing that would fit on the back of a napkin.
The new approach is described in a paper in the journal Transactions on Machine Learning Research, by Vincent Abbott, a PhD student in MIT's Laboratory for Information and Decision Systems (LIDS), and Professor Gioele Zardini.
"We designed a new language to talk about these new systems," Zardini says. This new diagram-based "language" is heavily based on something called category theory, he explains.
It all has to do with designing the underlying architecture of computer algorithms: the programs that will actually end up sensing and controlling the various parts of the system that's being optimized. "The components are different parts of an algorithm, and they have to talk to each other, exchange information, but also account for energy usage, memory consumption, and so on," he says. Such optimizations are notoriously difficult because each change in one part of the system can in turn cause changes in other parts, which can further affect other parts, and so on.
The researchers decided to focus on a particular class of deep-learning algorithms, which are currently a hot topic of research. Deep learning is the basis of the large artificial intelligence models, including large language models such as ChatGPT and image-generation models such as Midjourney. These models manipulate data by a "deep" series of matrix multiplications interspersed with other operations. The numbers within the matrices are parameters, and they are updated during long training runs, allowing complex patterns to be found. Models consist of billions of parameters, making computation expensive, and hence making improved resource usage and optimization invaluable.
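To make the "deep series of matrix multiplications" concrete, here is a minimal toy sketch (not code from the paper) of a deep model in NumPy: alternating matrix multiplications and nonlinearities, where the matrix entries are the trainable parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_deep_model(x, weights):
    """A toy deep model: a 'deep' series of matrix multiplications,
    interspersed with a ReLU nonlinearity between layers."""
    for W in weights[:-1]:
        x = np.maximum(x @ W, 0.0)   # matrix multiply, then ReLU
    return x @ weights[-1]           # final linear layer

# The numbers in these matrices are the parameters that training updates.
weights = [rng.normal(size=(16, 32)),
           rng.normal(size=(32, 32)),
           rng.normal(size=(32, 4))]

x = rng.normal(size=(8, 16))         # a batch of 8 input vectors
y = toy_deep_model(x, weights)
print(y.shape)                       # (8, 4)
```

A real model does the same thing at vastly larger scale, with billions of parameters rather than the few thousand here, which is why the cost of each operation matters so much.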
Diagrams can represent details of the parallelized operations that deep-learning models consist of, revealing the relationships between algorithms and the parallelized graphics processing unit (GPU) hardware they run on, supplied by companies such as NVIDIA. "I'm very excited about this," says Zardini, "because we seem to have found a language that very nicely describes deep-learning algorithms, explicitly representing all the important things, which is the operators you use," for example the energy consumption, the memory allocation, and any other parameter that you're trying to optimize for.
Much of the progress within deep learning has stemmed from resource-efficiency optimizations. The latest DeepSeek model showed that a small team can compete with top models from OpenAI and other major labs by focusing on resource efficiency and the relationship between software and hardware. Typically, in deriving these optimizations, he says, "people need a lot of trial and error to discover new architectures." For example, a widely used optimization program called FlashAttention took more than four years to develop. But with the new framework they developed, "we can really approach this problem in a more formal way." And all of this is represented visually in a precisely defined graphical language.
But the methods that have been used to find these improvements "are very limited," he says. "I think this shows that there's a major gap, in that we don't have a formal systematic method of relating an algorithm to either its optimal execution, or even really understanding how many resources it will take to run." But now, with the new diagram-based method they devised, such a system exists.
Category theory, which underlies this approach, is a way of mathematically describing the different components of a system and how they interact in a generalized, abstract manner. Different perspectives can be related: for example, mathematical formulas can be related to algorithms that implement them and use resources, or a description of a system can be related to a robust "monoidal string diagram." These visualizations allow you to directly play around and experiment with how the different parts connect and interact. What they developed, he says, amounts to "string diagrams on steroids," which incorporates many more graphical conventions and many more properties.
"Category theory can be thought of as the mathematics of abstraction and composition," Abbott says. "Any compositional system can be described using category theory, and the relationship between compositional systems can then also be studied." Algebraic rules that are typically associated with functions can also be represented as diagrams, he says. "Then, a lot of the visual tricks we can do with diagrams, we can relate to algebraic tricks and functions. So, it creates this correspondence between these different systems."
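As a rough illustration of the compositionality Abbott describes (a toy sketch, not the paper's formalism), chaining functions behaves like wiring boxes together in a string diagram, and an algebraic rule such as associativity of composition means two different groupings of the same pipeline give the same result:

```python
# Toy illustration: pipelines of functions compose like boxes in a
# string diagram, and regrouping them (associativity) changes nothing.

def compose(*fs):
    """Compose functions left to right: compose(f, g)(x) == g(f(x))."""
    def composed(x):
        for f in fs:
            x = f(x)
        return x
    return composed

double = lambda x: 2 * x
inc = lambda x: x + 1
square = lambda x: x * x

left = compose(compose(double, inc), square)   # (f ; g) ; h
right = compose(double, compose(inc, square))  # f ; (g ; h)

assert left(3) == right(3) == (2 * 3 + 1) ** 2  # both give 49
```

In the researchers' framework, the "boxes" are deep-learning operators annotated with resource costs rather than simple arithmetic functions, but the compositional principle is the same.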
As a result, he says, "this solves a very important problem, which is that we have these deep-learning algorithms, but they're not clearly understood as mathematical models." But by representing them as diagrams, it becomes possible to approach them formally and systematically.
One thing this enables is a clear visual understanding of the way parallel real-world processes can be represented by parallel processing in multicore computer GPUs. "In this way," Abbott says, "diagrams can both represent a function, and then reveal how to optimally execute it on a GPU."
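As a loose analogy for the scheduling choice the diagrams expose (NumPy vectorization here is only a stand-in for GPU parallelism, not the researchers' method), the same mathematical function can be executed element by element, a sequential schedule, or dispatched as one array-wide operation that the library can run over many elements at once:

```python
import numpy as np

def f(x):
    # The function itself is fixed; only the execution schedule differs.
    return np.sqrt(x) + 1.0

x = np.linspace(0.0, 10.0, 1_000)

sequential = np.array([f(v) for v in x])  # one element at a time, in Python
parallel = f(x)                           # one vectorized call over the array

# Same function, same result, very different execution characteristics.
assert np.allclose(sequential, parallel)
```

On a GPU the gap is far larger, because thousands of cores can work on the array simultaneously, which is exactly the kind of hardware relationship the diagrams are meant to capture.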
The "attention" algorithm is used by deep-learning algorithms that require general, contextual information, and is a key phase of the serialized blocks that constitute large language models such as ChatGPT. FlashAttention is an optimization that took years to develop, but resulted in a sixfold improvement in the speed of attention algorithms.
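As a rough sketch of what the attention operation computes (illustrative NumPy, not code from the paper), each output row is a softmax-weighted mixture of the value rows, with the weights derived from query-key similarity scores:

```python
import numpy as np

rng = np.random.default_rng(0)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)                       # similarity scores
    P = np.exp(S - S.max(axis=-1, keepdims=True))  # numerically stable softmax
    P = P / P.sum(axis=-1, keepdims=True)
    return P @ V, P

Q = rng.normal(size=(6, 8))   # 6 positions, head dimension 8
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))

out, weights = attention(Q, K, V)
print(out.shape)  # (6, 8): each output row mixes the rows of V
```

The score matrix `S` grows quadratically with sequence length, which is why memory-aware optimizations of this exact computation matter so much.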
Applying their method to the well-established FlashAttention algorithm, Zardini says that "here, we are able to derive it, literally, on a napkin." He then adds, "OK, maybe it's a large napkin." But to drive home the point of how much their new approach can simplify dealing with these complex algorithms, they titled their formal research paper on the work "FlashAttention on a Napkin."
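To give a flavor of the kind of reorganization FlashAttention performs (a simplified sketch under the standard online-softmax idea, not the researchers' derivation or a real GPU kernel), the softmax can be computed over tiles of keys and values while maintaining a running maximum and running sum, so the full quadratic score matrix never has to be held in memory at once:

```python
import numpy as np

rng = np.random.default_rng(0)

def attention(Q, K, V):
    """Reference: materialize the full score matrix, then softmax."""
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def tiled_attention(Q, K, V, block=4):
    """Process K and V in blocks, keeping only running statistics."""
    n, d = Q.shape
    out = np.zeros((n, d))
    m = np.full((n, 1), -np.inf)  # running row-wise max of the scores
    l = np.zeros((n, 1))          # running softmax denominator
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T / np.sqrt(d)
        m_new = np.maximum(m, S.max(axis=-1, keepdims=True))
        scale = np.exp(m - m_new)  # rescale what was accumulated so far
        P = np.exp(S - m_new)
        l = l * scale + P.sum(axis=-1, keepdims=True)
        out = out * scale + P @ Vb
        m = m_new
    return out / l

Q, K, V = (rng.normal(size=(10, 8)) for _ in range(3))
assert np.allclose(attention(Q, K, V), tiled_attention(Q, K, V))
```

The two functions compute the same result; the tiled version simply trades memory traffic for a little bookkeeping, which is the sort of hardware-aware restructuring the diagrams are designed to make derivable on paper.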
This method, Abbott says, "allows for optimizations to be really quickly derived, in contrast to prevailing methods." While they initially applied this approach to the already existing FlashAttention algorithm, thus verifying its effectiveness, "we hope to now use this language to automate the detection of improvements," says Zardini, who, in addition to being a principal investigator in LIDS, is the Rudge and Nancy Allen Career Development Assistant Professor of Civil and Environmental Engineering.
The plan is to ultimately develop the software to the point, he says, that "the researcher uploads their code, and with the new algorithm you automatically detect what can be improved, what can be optimized, and you return an optimized version of the algorithm to the user."
In addition to automating algorithm optimization, Zardini notes that a robust analysis of how deep-learning algorithms relate to hardware resource usage enables systematic co-design of hardware and software. This line of work integrates with Zardini's focus on categorical co-design, which uses the tools of category theory to simultaneously optimize various components of engineered systems.
"I think the whole field of optimized deep learning models is critically unaddressed, and that's why these diagrams are so exciting," Abbott says. "They open the doors to a systematic approach to this problem."
"I am impressed by the quality of this research. ... The new approach to diagramming deep-learning algorithms used by this paper could be a very significant step," says Jeremy Howard, founder and CEO of Answer.AI, who was not associated with this work. "This paper is the first time I've seen such a notation used to deeply analyze the performance of a deep-learning algorithm on real-world hardware. ... The next step will be to see whether real-world performance gains can be achieved."
"This is a beautifully executed piece of theoretical research, which also aims for high accessibility to uninitiated readers, a trait rarely seen in papers of this kind," says Petar Veličković, a senior research scientist at Google DeepMind and a lecturer at Cambridge University, who was not associated with this work. "These researchers are clearly excellent communicators, and I cannot wait to see what they come up with next!"
The new diagram-based language, having been posted online, has already attracted great attention and interest from software developers. A reviewer of Abbott's earlier paper that introduced the diagrams noted, "The proposed neural circuit diagrams look great from an artistic standpoint (as far as I am able to judge this)." "It's technical research, but it's also flashy!" Zardini says.