Neural network AI models used in applications such as medical image processing and speech recognition operate on highly complex data structures that require enormous amounts of computation to process. This is one reason deep learning models consume so much energy.
To improve the efficiency of AI models, researchers at MIT have created an automated system that enables developers of deep learning algorithms to take advantage of two types of data redundancy simultaneously. This reduces the amount of computation, bandwidth and memory storage required for machine learning operations.
Existing techniques for optimizing these algorithms can be cumbersome, and they typically let developers exploit either sparsity or symmetry, two distinct types of redundancy found in deep learning data structures, but not both at once.
By enabling developers to build algorithms that take advantage of both types of redundancy simultaneously, the MIT researchers’ approach increased computation speed by nearly 30 times in some experiments.
Because the system uses a user-friendly programming language, it could be used to optimize machine learning algorithms for a wide range of applications. It could also help scientists who are not deep learning experts but want to improve the efficiency of the AI algorithms they use to process data. In addition, the system could have applications in scientific computing.
“For a long time, capturing these data redundancies has required a lot of implementation effort. Instead, a scientist can tell our system what they would like to compute in a more abstract way, without telling the system exactly how to compute it,” Ahrens said.
Lead author Radha Patel ’23, SM ’24 and senior author Saman Amarasinghe, a professor in the Department of Electrical Engineering and Computer Science and a principal researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL), joined Ahrens on the paper.
Cutting out computation
In machine learning, data is often represented and manipulated as a multidimensional array known as a tensor. A tensor is like a matrix, which is a rectangular array of values arranged on two axes: rows and columns. But unlike a two-dimensional matrix, a tensor can have many dimensions, or axes, which makes tensors more difficult to manipulate.
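As a generic illustration (using NumPy here purely for demonstration; it is not part of the researchers’ system), the only difference between a matrix and a higher-dimensional tensor is the number of axes:

```python
import numpy as np

# A matrix is a two-dimensional tensor: values laid out on
# two axes, rows and columns.
matrix = np.arange(6).reshape(2, 3)

# A tensor generalizes this to any number of axes, e.g. a
# three-dimensional array indexed by (user, product, day).
tensor = np.arange(24).reshape(2, 3, 4)

print(matrix.ndim)  # 2
print(tensor.ndim)  # 3
```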
Deep learning models operate on tensors through repeated matrix multiplications and additions; this process is how neural networks learn complex patterns in data. The sheer volume of operations that must be performed on these multidimensional data structures requires an enormous amount of computation and energy.
However, because of the way tensor data are arranged, engineers can often boost the speed of a neural network by cutting out redundant computations.
For example, if a tensor represents user review data from an e-commerce website, most of its values are likely to be zero, since not every user reviews every product. This type of data redundancy is called sparsity. A model can save time and computation by storing and operating on only the nonzero values.
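A minimal sketch of this idea in plain Python (a generic coordinate-style sparse format, not the representation the MIT system actually uses):

```python
# Toy user-by-product review matrix: most entries are zero
# because most users never review most products.
reviews = [
    [0, 0, 5, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 0],
]

# Sparse representation: keep only the nonzero values, keyed
# by their (row, column) coordinates.
sparse = {(i, j): value
          for i, row in enumerate(reviews)
          for j, value in enumerate(row)
          if value != 0}

# Operations now touch 2 stored values instead of all 12.
total = sum(sparse.values())
print(sparse)  # {(0, 2): 5, (1, 1): 3}
print(total)   # 8
```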
In addition, a tensor is sometimes symmetric, which means the top half and the bottom half of the data structure are equal. In this case, the model only needs to operate on one half, reducing the amount of computation. This type of data redundancy is called symmetry.
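As a simple sketch (plain Python, purely illustrative), a symmetric matrix is fully determined by its upper half, so a computation such as summing all the entries only has to read that half:

```python
# A symmetric matrix: the entry at (i, j) equals the entry at
# (j, i), so the upper triangle (including the diagonal)
# contains all of the information.
A = [
    [1, 2, 3],
    [2, 4, 5],
    [3, 5, 6],
]
n = len(A)

# Sum every entry while reading only the upper half: diagonal
# entries count once, off-diagonal entries count twice.
half_sum = sum(A[i][j] * (1 if i == j else 2)
               for i in range(n)
               for j in range(i, n))

full_sum = sum(sum(row) for row in A)
print(half_sum, full_sum)  # 31 31
```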
“But when you try to capture both optimizations, the situation becomes very complicated,” Ahrens said.
To simplify the process, she and her collaborators built a new compiler, a computer program that translates complex code into a simpler language that a machine can process. Their compiler, called SySTeC, optimizes computations by automatically taking advantage of both sparsity and symmetry in tensors.
They began building SySTeC by identifying three key optimizations that can be performed using symmetry.
First, if the algorithm’s output tensor is symmetric, then it only needs to compute one half of it. Second, if the input tensor is symmetric, the algorithm only needs to read one half of it. Finally, if an intermediate result of a tensor operation is symmetric, the algorithm can skip redundant computations.
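The first of these optimizations can be sketched as follows (plain Python, an illustrative example rather than SySTeC’s generated code): the product of a matrix with its own transpose is always symmetric, so only the upper half of the output needs to be computed, and the lower half can be mirrored from it:

```python
# Multiply a matrix by its own transpose. The output C is
# always symmetric (C[i][j] == C[j][i]), so we compute only
# the upper triangle and mirror it, roughly halving the work.
def sym_matmul(A):
    n = len(A)
    m = len(A[0])
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # upper triangle only
            s = sum(A[i][k] * A[j][k] for k in range(m))
            C[i][j] = s
            C[j][i] = s        # mirror instead of recomputing
    return C

print(sym_matmul([[1, 2], [3, 4]]))  # [[5, 11], [11, 25]]
```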
Optimize simultaneously
To use SySTeC, a developer inputs their program and the system automatically optimizes the code for all three types of symmetry. Then a second phase of SySTeC performs additional transformations so that only nonzero data values are stored, optimizing the program for sparsity.
In the end, SySTeC generates ready-to-use code.
“That way, we get the benefits of both optimizations. And the interesting thing about symmetry is that as your tensor has more dimensions, you can get even more savings on computation,” Ahrens said.
The researchers demonstrated that code generated automatically by SySTeC achieved speedups of almost 30 times.
Because the system is automated, it could be especially useful in situations where a scientist wants to process data using an algorithm they are writing from scratch.
In the future, the researchers hope to integrate SySTeC into existing sparse tensor compiler systems to create a seamless interface for users. In addition, they would like to use it to optimize the code of more complicated programs.
This work is funded in part by Intel, the National Science Foundation, the Defense Advanced Research Projects Agency and the Department of Energy.