Ambiguity in medical images can pose major challenges for clinicians trying to identify disease. In chest X-rays, for instance, pleural effusion (an abnormal buildup of fluid in the space around the lungs) can look very much like pulmonary infiltrates, which are accumulations of pus or blood.
Artificial intelligence models can assist clinicians with X-ray analysis by helping to identify subtle details and speeding up the diagnostic process. But because so many conditions could be present in one image, clinicians may want to consider a set of possibilities rather than evaluating only a single AI prediction.
One promising way to generate a set of possibilities, called conformal classification, is convenient because it can be readily layered on top of an existing machine-learning model. However, it can produce sets that are impractically large.
MIT researchers have now developed a simple and effective improvement that can reduce the size of prediction sets by up to 30 percent while also making predictions more reliable.
A smaller prediction set can help a clinician zero in on the right diagnosis more efficiently, which could improve treatment for patients. This approach could be useful across a range of classification tasks, such as identifying the species of an animal in an image from a wildlife park, because it provides a smaller but more accurate set of options.
“With fewer classes to consider, the sets of predictions are naturally more informative because you are choosing between fewer options. In a sense, you are not really sacrificing anything in exchange for something that is more informative,” says lead author Divya Shanmugam PhD ’24, who conducted this research as an MIT graduate student.
Shanmugam is joined on the paper by Helen Lu ’24; Swami Sankaranarayanan, a former MIT postdoc who is now a research scientist at Lilia Biosciences; and senior author John Guttag, the Dugald C. Jackson Professor of Computer Science and Electrical Engineering at MIT and a member of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the Conference on Computer Vision and Pattern Recognition in June.
Prediction guarantees
AI assistants deployed for high-stakes tasks, such as classifying diseases in medical images, are typically designed to produce a probability score alongside each prediction so a user can gauge the model’s confidence. For instance, a model might estimate a 20 percent probability that an image corresponds to a particular diagnosis, such as pleurisy.
But it is difficult to trust a model’s predicted confidence, since much prior research has shown that these probabilities can be inaccurate. With conformal classification, the model’s single prediction is replaced by a set of the most probable diagnoses, along with a guarantee that the correct diagnosis is somewhere in the set.
However, the inherent uncertainty in AI predictions often causes the model to output sets that are far too large to be useful.
For instance, if a model is classifying an animal in an image as one of 10,000 potential species, it might output a set of 200 predictions in order to offer a strong guarantee.
“That is a lot of classes for someone to sift through to figure out the right one,” Shanmugam says.
The technique can also be unreliable, because small changes to the input, such as slightly rotating an image, can yield entirely different sets of predictions.
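To make the guarantee described above concrete, here is a minimal sketch of split conformal classification in Python. It assumes a model that outputs class probabilities; the simulated data, function names, and the 90 percent coverage level are illustrative assumptions, not the researchers’ code.

```python
# Minimal sketch of split conformal classification (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def prediction_set(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Return the indices of classes kept in the prediction set,
    calibrated so the true label is covered with probability >= 1 - alpha."""
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability assigned to the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Conformal quantile with the standard finite-sample correction.
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    # Keep every class whose score falls below the calibrated threshold.
    return np.where(1.0 - test_probs <= q)[0]

# Toy calibration data: 500 labeled examples over 10 classes.
cal_probs = rng.dirichlet(np.ones(10), size=500)
cal_labels = rng.integers(0, 10, size=500)
test_probs = rng.dirichlet(np.ones(10))
print(prediction_set(cal_probs, cal_labels, test_probs))
```

With poorly calibrated or diffuse probabilities, the threshold ends up permissive and the set grows large, which is exactly the problem the article describes.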
To make conformal classification more useful, the researchers applied a technique developed to improve the accuracy of computer vision models, called test-time augmentation (TTA).
TTA creates multiple augmented versions of a single image in a dataset, perhaps by cropping the image, flipping it, or zooming in on it. It then applies the computer vision model to each version of the same image and aggregates its predictions.
“That way, you get multiple predictions from a single example. Aggregating predictions in this way can improve accuracy and robustness,” Shanmugam explains.
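The TTA procedure just described can be sketched in a few lines. The stand-in model, the particular augmentations, and the plain averaging used here are illustrative assumptions; in practice the model would be a trained classifier and the augmentations would match its training pipeline.

```python
# Minimal sketch of test-time augmentation (TTA) for classification.
import numpy as np

rng = np.random.default_rng(1)

def fake_model(image):
    """Stand-in classifier: maps simple image statistics to a
    3-class probability vector via a softmax."""
    logits = np.array([image.mean(), image.std(), image.max()])
    e = np.exp(logits - logits.max())
    return e / e.sum()

def augmentations(image):
    """Yield simple augmented views: identity, flips, and a center crop."""
    yield image
    yield np.fliplr(image)
    yield np.flipud(image)
    h, w = image.shape
    yield image[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4]

def tta_predict(model, image):
    """Average the model's predictions over all augmented views."""
    probs = np.stack([model(view) for view in augmentations(image)])
    return probs.mean(axis=0)

image = rng.random((32, 32))
print(tta_predict(fake_model, image))
```

Because the output is an average of probability vectors, it is itself a probability vector, so it can be dropped directly into a downstream conformal step.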
Maximizing accuracy
To apply TTA, the researchers hold out some of the labeled image data used in the conformal classification process. Using these held-out data, they learn how to aggregate the augmentations in a way that maximizes the accuracy of the underlying model’s predictions.
They then run conformal classification on the model’s new, TTA-transformed predictions. The conformal classifier outputs a smaller set of probable predictions for the same confidence guarantee.
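A hedged sketch of this combined recipe follows: learn how to weight the augmented predictions on held-out labeled data, then run conformal calibration on the TTA-aggregated probabilities. The simulated data, the random search over convex weights, and all names are illustrative assumptions, not the paper’s actual learning procedure.

```python
# Sketch: learn TTA aggregation weights on held-out data, then calibrate.
import numpy as np

rng = np.random.default_rng(2)
n_aug, n_cal, n_classes = 4, 300, 10

# Simulated per-augmentation class probabilities for held-out labeled
# images: shape (n_aug, n_cal, n_classes).
aug_probs = rng.dirichlet(np.ones(n_classes), size=(n_aug, n_cal))
labels = rng.integers(0, n_classes, size=n_cal)

def accuracy(weights):
    """Top-1 accuracy of the weighted TTA aggregate on held-out data."""
    agg = np.tensordot(weights, aug_probs, axes=1)  # (n_cal, n_classes)
    return (agg.argmax(axis=1) == labels).mean()

# Pick aggregation weights from a handful of random convex candidates.
candidates = rng.dirichlet(np.ones(n_aug), size=64)
best = max(candidates, key=accuracy)

# Conformal calibration on the TTA-aggregated probabilities.
alpha = 0.1
agg = np.tensordot(best, aug_probs, axes=1)
scores = 1.0 - agg[np.arange(n_cal), labels]
qhat = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal,
                   method="higher")

# Prediction set for a new image's TTA-aggregated probabilities.
new_aug = rng.dirichlet(np.ones(n_classes), size=n_aug)  # per-augmentation
new_probs = best @ new_aug
pred_set = np.where(1.0 - new_probs <= qhat)[0]
print(len(pred_set), "classes in the prediction set")
```

The key design point, consistent with the article, is that the underlying model is never retrained: only the aggregation of its augmented predictions is learned, and the standard conformal calibration then runs unchanged on the aggregated scores.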
“Combining test-time augmentation with conformal prediction is simple to implement, effective in practice, and requires no model retraining,” Shanmugam says.
Compared with previous work on conformal prediction across several standard image classification benchmarks, their TTA-augmented method reduced prediction set sizes by between 10 and 30 percent across experiments.
Importantly, the technique achieves this reduction in prediction set size while maintaining the probability guarantee.
The researchers also found that, even though they are sacrificing some of the labeled data that would normally be used in the conformal classification procedure, the accuracy boost from TTA is large enough to outweigh the cost of losing those data.
“This raises interesting questions about how we use labeled data after model training. The allocation of labeled data between different post-training steps is an important direction for future work,” Shanmugam says.
In the future, the researchers want to validate the effectiveness of this approach with models that classify text rather than images. To further improve the work, they are also considering ways to reduce the amount of computation TTA requires.
This research was funded, in part, by the Wistron Corporation.