Sarah Alnegheimish’s research interests lie at the intersection of machine learning and systems engineering. Her goal is to make machine learning systems more accessible, transparent, and trustworthy.
Alnegheimish is a doctoral student in Principal Research Scientist Kalyan Veeramachaneni’s Data to AI group at the MIT Laboratory for Information and Decision Systems (LIDS). There she devotes most of her energy to developing Orion, an open-source, user-friendly machine learning framework and time series library capable of detecting anomalies without supervision in large industrial and operational settings.
Early influences
The daughter of a university professor and a teacher trainer, she learned from a very young age that knowledge was meant to be shared freely. “I think growing up in a family that values education is a big part of why I want to make machine learning tools accessible.” Alnegheimish’s own experience with open-source resources only adds to her motivation. “I’ve learned to see accessibility as the key to adoption. For new technologies to have real impact, they need to be accessible to and evaluated by the people who need them. That’s the whole purpose of doing open-source development.”
Alnegheimish received her bachelor’s degree from King Saud University (KSU). “I was in the first cohort of computer science majors at the university. Before the program was created, the only available major in the field was information technology (IT).” Being part of the first cohort was exciting, but it came with its own unique challenges. “All of the faculty were teaching new material. Succeeding required independent learning. That was when I first encountered MIT OpenCourseWare: as a resource for teaching myself.”
Shortly after graduation, Alnegheimish became a researcher at King Abdulaziz City for Science and Technology (KACST), Saudi Arabia’s national laboratory. There, through the Center for Complex Engineering Systems (CCES), a joint center between KACST and MIT, she began doing research with Veeramachaneni. When she applied to MIT for graduate school, his research group was her first choice.
Creating Orion
The focus of Alnegheimish’s thesis work is time series anomaly detection: identifying unexpected behaviors or patterns in data that can give users critical information. For example, anomalous patterns in network traffic data can be a sign of a cybersecurity threat, anomalous sensor readings in heavy machinery can predict future failures, and monitoring anomalies in patients’ vital signs can help reduce health complications. Alnegheimish first began designing Orion during her master’s studies.
Orion uses statistical and machine learning-based models that are continuously logged and maintained. Users do not need to be machine learning experts to use the code. They can analyze signals, compare anomaly detection methods, and investigate anomalies in an end-to-end program. The framework, the code, and the datasets are all open source.
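For a sense of what that end-to-end workflow looks like, here is a minimal sketch based on Orion’s public quickstart; the demo signal names and pipeline identifier are illustrative and may differ between library versions.

```python
from orion import Orion
from orion.data import load_signal

# Load a demo signal shipped with Orion (a table of timestamps and values).
train_data = load_signal('S-1-train')
new_data = load_signal('S-1-new')

# Choose one of the registered anomaly detection pipelines.
orion = Orion(pipeline='lstm_dynamic_threshold')

# fit learns the signal's normal behavior; detect returns anomalous
# intervals as a table of start/end timestamps with a severity score.
orion.fit(train_data)
anomalies = orion.detect(new_data)
print(anomalies)
```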
“With open source, accessibility and transparency come built in. Your access to the code is unrestricted, and you can study how the model works by reading the code. With Orion we have improved transparency further: we label every step in the model and show it to the user. That transparency helps users see the model’s reliability for themselves, so they can begin to trust it,” Alnegheimish says.
“We are trying to take all of these machine learning algorithms and put them in one place so that anyone can use our models off the shelf,” she says. “It’s not just the sponsors of our partnership at MIT who use it. Many public users do as well. They come to the library, install it, and run it on their data. It is proving to be a great resource for people to find some of the latest methods for anomaly detection.”
Repurposing models for anomaly detection
In her PhD, Alnegheimish is exploring innovative methods for anomaly detection with Orion. “When I first started doing research, every machine learning model had to be trained from scratch on your data. Now we are at a point where we can use pre-trained models.” Using pre-trained models saves time and computational cost. The challenge, however, is that time series anomaly detection is a completely new task for them. “Originally, these models were trained to make forecasts, not to find anomalies,” Alnegheimish says. “We push their boundaries through prompt engineering, without any additional training.”
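To make the idea concrete, the sketch below shows one generic way a frozen, pre-trained forecaster could be repurposed for anomaly detection without retraining: compare its one-step-ahead predictions to the observed values and flag unusually large errors. This is an illustration of the general technique, not Alnegheimish’s exact method, and `forecast_fn` is a hypothetical stand-in for any pre-trained model used zero-shot.

```python
import numpy as np

def detect_with_pretrained_forecaster(values, forecast_fn, window=100, k=4.0):
    """Flag time steps where a frozen, pre-trained forecaster is 'surprised'.

    forecast_fn is a hypothetical stand-in for any pre-trained time series or
    language model used zero-shot: given a history window, it returns a
    one-step-ahead prediction without any additional training.
    """
    errors = []
    for t in range(window, len(values)):
        prediction = forecast_fn(values[t - window:t])  # no training on this data
        errors.append(abs(values[t] - prediction))
    errors = np.asarray(errors)

    # Simple thresholding on the forecast error; real systems typically use
    # smarter rules, such as dynamic thresholds over sliding error windows.
    threshold = errors.mean() + k * errors.std()
    return [t + window for t, e in enumerate(errors) if e > threshold]
```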
Because these models have already captured the patterns of time series data, Alnegheimish believes they have everything they need to detect anomalies. Her results so far support this theory. The pre-trained models have not yet surpassed the success rates of models trained directly on specific data, but she believes one day they will.
Designing for accessibility
Alnegheimish is deliberate about her efforts to make Orion more accessible. “Before I came to MIT, I thought the key part of my research would be developing the machine learning model itself or improving on the current state of the art. Over time, I realized that the only way to make your research accessible and adoptable is to develop systems that make it accessible. Throughout my graduate studies, I’ve taken the approach of developing my models and systems in tandem.”
A key element of her system development has been finding the right abstractions to work with her models. These abstractions provide a unified representation for all models, with simplified components. “Any model will have a series of steps that go from raw input to the desired output. We standardized the input and the output, which keeps the middle flexible and fluid. Every model we have run so far has been able to fit into our abstraction.” The abstractions she uses have remained stable and reliable over the past six years.
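One way to picture that kind of abstraction (with hypothetical names, not Orion’s actual classes) is a pipeline whose input and output formats are fixed while the intermediate steps are interchangeable:

```python
import pandas as pd

class AnomalyPipeline:
    """Illustrative abstraction: standardized signal in, standardized
    anomaly intervals out, with freely swappable steps in between."""

    def __init__(self, steps):
        # steps might be, e.g., [preprocessor, model, postprocessor],
        # each exposing fit_transform() and transform() on a shared context.
        self.steps = steps

    def fit(self, signal: pd.DataFrame) -> None:
        # Standardized input: a DataFrame with 'timestamp' and 'value' columns.
        context = {'signal': signal}
        for step in self.steps:
            context = step.fit_transform(context)

    def detect(self, signal: pd.DataFrame) -> pd.DataFrame:
        context = {'signal': signal}
        for step in self.steps:
            context = step.transform(context)
        # Standardized output: intervals with 'start', 'end', and 'severity'.
        return context['anomalies']
```

Because every model is wrapped the same way, swapping in a new detection method does not change how users call it.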
The value of building systems and models in tandem can be seen in Alnegheimish’s work as a mentor. She has had the opportunity to mentor two master of engineering students. “What I gave them was the system itself and the documentation on how to use it. Both students were able to develop their own models within the abstractions we had in place. It reaffirmed that we are on the right path.”
Alnegheimish has also investigated whether large language models (LLMs) can serve as mediators between users and the system. The LLM agent she has implemented can connect to Orion without users needing to understand the fine details of how Orion works. “Think of ChatGPT. You don’t know what model is running behind it, but it’s accessible to everyone.” With her software, users only need to know two commands: fit and detect. Fit lets users train a model on their own data, while detect lets them detect anomalies in it.
“The ultimate goal of what I’m trying to do is to make AI more accessible to everyone,” she says. So far, Orion has reached more than 120,000 downloads, and more than a thousand users have starred the repository on GitHub. “Traditionally, you would measure the impact of your research through citations and paper publications. Now you get to see adoption in real time through open source.”