
A home robot trained to perform household tasks in a factory may fail to effectively scrub the sink or take out the garbage when deployed in a user’s kitchen, because this new environment differs from the training space.

To help avoid this, engineers often try to match the simulated training environment as closely as possible to the real world where the agent will be deployed.

But researchers from MIT and elsewhere have now found that, despite this conventional wisdom, training in a completely different environment sometimes yields a better-performing AI agent.

Their results indicate that, in some cases, training a simulated AI agent in a world with less uncertainty, or “noise,” enabled it to perform better than a competing AI agent trained in the same noisy world used to test both agents.

The researchers call this unexpected phenomenon the indoor training effect.

“If we learn to play tennis in an indoor environment where there is no noise, we might be able to more easily master different shots. Then, if we move to a noisier environment, like a windy tennis court, we could have a higher likelihood of playing tennis well than if we started learning in the wind,” explains Serena Bono, a research assistant in the MIT Media Lab and lead author of a paper on the indoor training effect.

Video: “The Indoor-Training Effect: Unexpected Gains from Distribution Shift in the Transition Function”
Video credit: MIT Center for Brains, Minds and Machines

The researchers studied this phenomenon by training AI agents to play Atari games, which they modified by adding some unpredictability. They were surprised to find that the indoor training effect consistently occurred across Atari games and game variations.

They hope these results fuel additional research into better training methods for AI agents.

“This is an entirely new axis to think about. Rather than trying to match the training and testing environments, we may be able to construct simulated environments where an AI agent learns even better,” adds Spandan Madan, a graduate student at Harvard University.

Bono and Madan are joined on the paper by Ishaan Grover, a graduate student at MIT; Mao Yasueda, a graduate student at Yale University; Cynthia Breazeal, professor of media arts and sciences and leader of the Personal Robotics Group in the MIT Media Lab; Hanspeter Pfister, professor of computer science at Harvard University; and Gabriel Kreiman, a professor at Harvard Medical School. The research will be presented at the Association for the Advancement of Artificial Intelligence Conference.

Training questions

The researchers set out to explore why reinforcement learning agents tend to perform so poorly when tested in environments that differ from their training space.

Reinforcement learning is a trial-and-error approach in which agents explore training spaces and learn to take actions that maximize their rewards.
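
As a concrete illustration of that loop, here is a minimal sketch, assuming a hypothetical toy corridor environment and a tabular Q-learning agent (not the researchers’ setup): the agent tries actions, observes rewards, and gradually updates its value estimates.

```python
import random

N_STATES = 5
ACTIONS = [0, 1]                        # 0 = move left, 1 = move right
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate

def step(state, action):
    """Move along the corridor; reaching the last cell pays a reward of 1."""
    next_state = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

for episode in range(200):
    state, done, steps = 0, False, 0
    while not done and steps < 200:
        # Trial and error: explore a random action occasionally,
        # otherwise exploit the current value estimates.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # Nudge the estimate toward reward plus discounted future value.
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state
        steps += 1

print(Q)  # the "move right" column should end up with the higher values
```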

The team developed a technique that explicitly adds a certain amount of noise to one element of the reinforcement learning problem called the transition function. The transition function defines the probability that the agent will move from one state to another, based on the action it chooses.

If the agent is playing Pac-Man, the transition function might define the probability that a ghost on the game board will move up, down, left, or right. In standard reinforcement learning, the AI would be trained and tested using the same transition function.
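
To make the idea concrete, here is a small sketch of one way such a transition function could be represented and perturbed. The `noisy_move_distribution` helper and the specific probabilities are illustrative assumptions, not the paper’s implementation: the ghost’s original move distribution is blended with a uniform one, controlled by a noise level between 0 and 1.

```python
import random

MOVES = ["up", "down", "left", "right"]

def noisy_move_distribution(base_probs: dict, noise: float) -> dict:
    """Blend the ghost's original move probabilities with uniform noise."""
    uniform = 1.0 / len(MOVES)
    return {m: (1 - noise) * base_probs.get(m, 0.0) + noise * uniform for m in MOVES}

def sample_move(probs: dict) -> str:
    """Draw the ghost's next move from the (possibly noisy) distribution."""
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

# A ghost that usually chases Pac-Man upward becomes less predictable as
# noise grows; at noise = 1.0 every direction is equally likely.
chase_up = {"up": 0.7, "down": 0.1, "left": 0.1, "right": 0.1}
print(noisy_move_distribution(chase_up, noise=0.3))
print(sample_move(noisy_move_distribution(chase_up, noise=0.3)))
```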

Using this conventional approach, the researchers added noise to the transition function and, as expected, it hurt the agent’s Pac-Man performance.

But when the researchers trained an agent on a noise-free Pac-Man game and then tested it in an environment where noise was injected into the transition function, it performed better than an agent trained on the noisy game.
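
The comparison can be sketched as a self-contained toy protocol, assuming a made-up one-dimensional corridor task in place of Pac-Man: one agent is trained with a noise-free transition function, another with a noisy one, and both are evaluated in the noisy environment. This only illustrates the protocol; the toy task is not guaranteed to reproduce the paper’s finding.

```python
import random

N_STATES, GOAL, MAX_STEPS = 10, 9, 50
ACTIONS = [-1, +1]  # step left or right along the corridor

def step(state, action, noise):
    """Transition function: with probability `noise` the move is random."""
    if random.random() < noise:
        action = random.choice(ACTIONS)
    next_state = max(0, min(N_STATES - 1, state + action))
    return next_state, (1.0 if next_state == GOAL else 0.0)

def train(noise, episodes=2000, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning under a given amount of transition noise."""
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(MAX_STEPS):
            a = random.randrange(2) if random.random() < eps else int(Q[s][1] >= Q[s][0])
            s2, r = step(s, ACTIONS[a], noise)
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
            if s == GOAL:
                break
    return Q

def evaluate(Q, noise, episodes=500):
    """Average reward of the greedy policy in the (noisy) test environment."""
    total = 0.0
    for _ in range(episodes):
        s = 0
        for _ in range(MAX_STEPS):
            a = int(Q[s][1] >= Q[s][0])
            s, r = step(s, ACTIONS[a], noise)
            total += r
            if s == GOAL:
                break
    return total / episodes

test_noise = 0.3
clean_agent = train(noise=0.0)         # "indoor" training: no noise
noisy_agent = train(noise=test_noise)  # conventional: match test conditions
print("clean-trained agent, noisy test:", evaluate(clean_agent, test_noise))
print("noisy-trained agent, noisy test:", evaluate(noisy_agent, test_noise))
```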

“The rule of thumb is that you should try to capture the deployment condition’s transition function as well as you can during training to get the most bang for your buck. We really tested this insight to death because we couldn’t believe it ourselves,” says Madan.

Injecting varying amounts of noise into the transition function let the researchers test many environments, but it didn’t create realistic games. The more noise they injected into Pac-Man, the more likely the ghosts were to randomly teleport to different squares.

To see whether the indoor training effect occurred in normal Pac-Man games, they adjusted the underlying probabilities so the ghosts moved normally but were more likely to move up and down rather than left and right. AI agents trained in noise-free environments still performed better in these realistic games.
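
A possible sketch of that kind of realistic variant, with illustrative numbers rather than the paper’s values: the ghost’s underlying move probabilities are reweighted so vertical moves become more likely than horizontal ones, then renormalized.

```python
def bias_vertical(base_probs: dict, vertical_boost: float = 2.0) -> dict:
    """Scale up/down probabilities by `vertical_boost`, then renormalize."""
    weights = {m: p * (vertical_boost if m in ("up", "down") else 1.0)
               for m, p in base_probs.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

uniform_ghost = {"up": 0.25, "down": 0.25, "left": 0.25, "right": 0.25}
print(bias_vertical(uniform_ghost))  # up/down ~0.33 each, left/right ~0.17 each
```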

“It’s not only due to the way we added noise to create ad hoc environments. This seems to be a property of the reinforcement learning problem itself, which was even more surprising to see,” Bono says.

Exploration explanation

When the researchers dug deeper in search of an explanation, they saw some correlations in how the AI agents explore the training space.

If both AI agents mostly explore the same areas, the agent trained in the noise-free environment performs better, perhaps because it is easier for that agent to learn the rules of the game without the interference of noise.

If their exploration patterns are different, the agent trained in the noisy environment tends to perform better. This might occur because the agent needs to understand patterns it cannot learn in the noise-free environment.
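
One simple way to quantify how similar two agents’ exploration patterns are, assuming a made-up metric rather than the authors’ analysis, is to record the states each agent visits during training and compute the overlap of those sets.

```python
def exploration_overlap(visited_a: set, visited_b: set) -> float:
    """Jaccard similarity between two agents' sets of visited states."""
    if not visited_a and not visited_b:
        return 1.0
    return len(visited_a & visited_b) / len(visited_a | visited_b)

# Example with hypothetical visited-state sets (grid coordinates):
agent_clean = {(0, 0), (0, 1), (1, 1), (2, 1)}
agent_noisy = {(0, 0), (0, 1), (1, 0), (2, 0)}
print(exploration_overlap(agent_clean, agent_noisy))  # 2 shared / 6 total ~ 0.33
```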

“If I only learn to play tennis with my forehand in the non-noisy environment, but then in the noisy one I also have to play with my backhand, I won’t play as well after training in the non-noisy environment,” Bono says.

In the future, the researchers hope to explore how the indoor training effect might arise in more complex reinforcement learning settings, or with other techniques such as computer vision and natural language processing. They also want to build training environments designed to leverage the indoor training effect, which could help AI agents perform better in uncertain environments.
