
Data is supposed to drive every decision a modern enterprise makes. But most businesses have a huge blind spot: They don’t know what’s happening in their visual data.

Coactive is working hard to change that. The company, founded by Cody Coleman ’13, MEng ’15 and William Gaviria Rojas ’13, has created an AI-powered platform that understands data like images, audio, and video to unlock new insights.

Coactive’s platform can instantly search, organize, and analyze unstructured visual content, helping businesses make faster and better decisions.

“The first big data revolution was about companies getting value out of structured data,” Coleman said. “But today, about 80 to 90 percent of the world’s data is unstructured. In the next chapter of big data, companies will have to process data like images, videos, and audio at scale, and AI is a key part of unlocking that future.”

Coactive is already working with several large media and retail companies, helping them understand their visual content without relying on manual classification and tagging. That helps them get the right content onto their platforms faster and discover how specific content affects user behavior.

More broadly, the founders see Coactive as an example of how AI can help humans work more efficiently and solve new kinds of problems.

“Coactive means acting together at the same time, and that’s our grand vision: helping humans and machines work together,” Coleman said. “We think that vision is more important now than ever, because AI can either pull us apart or bring us together. We want Coactive to be a force that pulls us together and provides a new superpower for humanity.”

Democratizing computer vision

Coleman first met Gaviria Rojas the summer before their first year at MIT, through the Interphase EDGE program. Both went on to major in electrical engineering and computer science, and they worked together on a project to bring MIT OpenCourseWare content to universities in Mexico, among other places.

“It was a great introduction to entrepreneurship,” Coleman recalls. “It really gave me the ability to take charge of business and software development. It led me to start my own small web-development business afterward and set me on the founder’s journey.”

Coleman first explored the power of AI as a graduate researcher in MIT’s Office of Digital Learning (now MIT Open Learning), where he used machine learning to study how people learn on MITx, the massive open online courses created by MIT faculty and lecturers.

“For me, it was about democratizing the transformative experience I’d had at MIT through digital learning, and about how you can use AI and machine learning to create adaptive systems that not only help us understand how humans learn, but also provide more personalized learning experiences for people around the world,” Coleman says of MITx. “That was also the first time I explored video content and applied AI to it.”

After MIT, Coleman earned his PhD at Stanford University, where his research focused on lowering the barriers to using AI. That work led him to collaborate with companies such as Pinterest and Meta on AI and machine-learning applications.

“That’s where I could see the future of how people wanted to work with AI and content,” Coleman recalls. “I saw how leading companies were using AI to drive business value, and that was the initial spark for Coactive. I thought, ‘What if we created an enterprise-grade operating system for content and multimodal AI to make this easy?’”

Meanwhile, Gaviria Rojas moved to the Bay Area in 2020 to start working as a data scientist at eBay. As part of the move, he needed help hauling a couch, and Coleman was the lucky friend he called.

“In that car, we realized we were both bursting with excitement about data and AI,” said Gaviria Rojas. “At MIT, we had front-row seats to the big data revolution, where we saw people inventing technologies to unlock value from data at scale. Cody and I realized there was another powder keg ready to explode: businesses had amassed huge amounts of data, but this time it was multimodal data such as images, videos, audio, and text.”

The platform the founders went on to build, which Coleman calls an “AI operating system,” is model-agnostic, meaning the company can swap out the AI systems under the hood as models continue to improve. Coactive’s platform includes prebuilt applications that enterprise customers can use to search their content, generate metadata, and run analytics to extract insights.
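The article doesn’t describe Coactive’s internals, but “model-agnostic” designs generally follow a familiar pattern: applications code against a stable interface, and the model behind that interface can be replaced without touching the callers. Here is a minimal Python sketch of that pattern under those assumptions; every name in it (VisionModel, embed_image, Platform) is hypothetical and illustrative, not Coactive’s actual API.

```python
# A minimal sketch of a model-agnostic platform. All names are hypothetical
# illustrations of the general pattern, not Coactive's actual API.
from typing import Protocol


class VisionModel(Protocol):
    """Any backend that can turn an image into an embedding vector."""
    def embed_image(self, image_bytes: bytes) -> list[float]: ...


class ModelV1:
    def embed_image(self, image_bytes: bytes) -> list[float]:
        # Placeholder: a real backend would run a neural network here.
        return [b / 255.0 for b in image_bytes[:8]]


class ModelV2:
    def embed_image(self, image_bytes: bytes) -> list[float]:
        # A newer model can be dropped in without touching callers.
        return [b / 128.0 for b in image_bytes[:16]]


class Platform:
    """Applications talk to the platform; the platform picks the model."""
    def __init__(self, model: VisionModel):
        self.model = model

    def swap_model(self, model: VisionModel) -> None:
        # Swapping the model under the hood leaves applications unchanged.
        self.model = model

    def index(self, image_bytes: bytes) -> list[float]:
        return self.model.embed_image(image_bytes)


platform = Platform(ModelV1())
platform.swap_model(ModelV2())  # upgrade without changing application code
```

The key design choice is that applications depend only on the VisionModel interface, so upgrading from one model to the next is a one-line swap rather than a rewrite.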

“Before AI, computers saw the world through bytes, and humans saw the world through vision,” Coleman said. “Now, with AI, machines can finally see the world the way we do, and that’s going to blur the line between the digital and physical worlds.”

Improving the human-computer interface

Reuters’ image database offers millions of photos to journalists around the world. Before Coactive, the company relied on reporters manually entering tags for each photo so that the right images would surface when journalists searched for certain topics.

“It’s incredibly slow and expensive to go through all of those raw assets, so people just don’t add the tags,” Coleman said. “That means when you search for something, you only surface a fraction of the relevant photos in the database.”

Now, journalists who select “Enable AI Search” on the Reuters website can pull up relevant content based on the AI system’s understanding of the details in each image and video.

“It greatly improves the quality of journalists’ search results, which allows them to tell better and more accurate stories than ever before,” Coleman said.
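The article doesn’t spell out the mechanism, but tag-free search like this is typically built on embeddings: images and text queries are mapped into a shared vector space, and results are ranked by similarity rather than by matching hand-entered keywords. The Python sketch below illustrates only that general technique; its embed() function is a toy stand-in (a production system would use a multimodal encoder, such as a CLIP-style model, to embed images and text in the same space), and none of it reflects Reuters’ or Coactive’s actual implementation.

```python
# Toy sketch of embedding-based search: rank items by vector similarity to a
# query instead of relying on manually entered tags.
import math


def embed(text: str) -> list[float]:
    # Stand-in embedding: a character-frequency vector (illustration only).
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Pretend these captions stand in for image embeddings already in the index.
index = {name: embed(caption) for name, caption in [
    ("photo_001", "flood waters rising in a coastal town"),
    ("photo_002", "celebration after a championship football match"),
    ("photo_003", "rescue workers wading through flooded streets"),
]}

query = embed("flooding rescue")
results = sorted(index, key=lambda k: cosine(index[k], query), reverse=True)
print(results)  # most semantically similar photos first, no manual tags needed
```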

Reuters is not alone in struggling to manage all of its content. Digital asset management is a major undertaking for many media and retail companies, which today often rely on manually entered metadata to categorize and search that content.

Another Coactive customer is Fandom, one of the world’s largest platforms for information about TV shows, video games, and movies, with more than 300 million active users. Fandom is using Coactive to understand the visual data across its online communities and to help it moderate excessively gory and sexualized content.

“Fandom used to take 24 to 48 hours to review each piece of new content,” Coleman said. “Now, with Coactive, they’ve codified their community guidelines and can generate more fine-grained information at an average speed of about 500 milliseconds.”
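The article doesn’t say how those guidelines were codified, but one plausible reading is that a vision model scores each upload against guideline labels, and thresholds turn those scores into automatic decisions. The sketch below is a hypothetical illustration of that pattern; the labels, thresholds, and function names are invented, not Fandom’s or Coactive’s actual rules.

```python
# Hypothetical sketch of "codified community guidelines": per-label scores
# from a vision model are checked against thresholds so each new upload gets
# an automatic decision instead of waiting hours for manual review.
GUIDELINES = {            # invented labels and thresholds, for illustration
    "gore": 0.6,
    "sexual_content": 0.5,
}


def moderate(scores: dict[str, float]) -> tuple[str, list[str]]:
    """Return an action plus the labels that triggered it."""
    violations = [label for label, limit in GUIDELINES.items()
                  if scores.get(label, 0.0) >= limit]
    return ("flag_for_review" if violations else "approve"), violations


# In a real pipeline, the scores would come from a model call per image/video.
print(moderate({"gore": 0.82, "sexual_content": 0.1}))
# -> ('flag_for_review', ['gore'])
```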

In each use case, the founders see Coactive as enabling a new paradigm in the way humans and machines collaborate.

“For the entire history of human-computer interaction, we’ve had to bend to the keyboard and mouse to enter information in a way machines could understand,” Coleman said. “Now, for the first time, we can speak naturally, we can share images and videos, and AI understands the content. That’s a fundamental change in the way we think about human-computer interaction. Coactive’s core vision comes from that change: we need a new operating system and a new way of working with content and AI.”
