Travel agencies help provide the end-to-end logistics of a trip for business travelers, vacationers, and everyone in between: transportation, accommodations, meals, and more. For those who want to make their own arrangements, large language models (LLMs) seem like a powerful tool for the task, since they can interact iteratively in natural language, offer some commonsense reasoning, gather information, and call other tools to help with the task at hand. However, recent work has found that state-of-the-art LLMs struggle with complex logistical and mathematical reasoning, as well as with problems that have multiple constraints, like trip planning, where they have been found to produce viable solutions 4 percent of the time or less, even with access to other tools and application programming interfaces (APIs).
Subsequently, a research team from MIT and the MIT-IBM Watson AI Lab reframed the problem to see if they could increase the success rate of LLM solutions to complex problems. "We think a lot of these planning problems are naturally combinatorial optimization problems," says Chuchu Fan, associate professor in MIT's Department of Aeronautics and Astronautics (AeroAstro) and the Laboratory for Information and Decision Systems (LIDS), and a researcher at the MIT-IBM Watson AI Lab. Her group applies machine learning, control theory, and formal methods to develop safe and verifiable control systems for robotics, autonomous systems, controllers, and human-machine interaction.
The team noted the transferability of their work to travel planning and set out to create a user-friendly framework that could act as an AI travel agent, helping to develop realistic, logical, and complete travel plans. To achieve this, the researchers combined common LLMs with algorithms and a complete satisfiability solver. Solvers are mathematical tools that rigorously check whether criteria can be met and how, but they require complex computer programming to use. This makes them natural companions to LLMs for problems like these, where users want help planning in a timely manner, without needing programming knowledge or deep research into travel options. Furthermore, if a user's constraints cannot be met, the new technique can identify and articulate where the problem lies and propose alternative measures to the user, who can choose to accept, reject, or modify them until a valid plan is formulated, if one exists.
"Various complexities of travel planning are something everyone will have to deal with at some point. There are different needs, requirements, constraints, and real-world information that you can collect," says Fan. "Our idea is not to ask LLMs to propose a travel plan. Instead, an LLM here acts as a translator, turning this natural language description of the problem into a problem that a solver can handle, and then giving the result back to the user."
Joining Fan on the work are Yang Zhang of the MIT-IBM Watson AI Lab, AeroAstro graduate student Yilun Hao, and Harvard University graduate student Yongchao Chen. The work was recently presented at the Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics.
Breaking down the solver
Tools like LLMs and solvers tend to be domain-specific. In natural language processing, for example, an LLM performs regression to predict the next token, or "word," in a series in order to analyze or create a document. This works well for capturing the gist of human input. However, LLMs alone cannot be relied on for formal verification applications, such as in aerospace or cybersecurity, where circuit connections and constraint tasks need to be completed and proven; otherwise, loopholes and vulnerabilities can sneak by and cause critical safety issues. Here, solvers excel, but they need fixed-format inputs and struggle with unsatisfiable queries. A hybrid technique, however, offers an opportunity for people to develop solutions to complex problems, like travel planning, in an intuitive, everyday way.
"The solver is really the key here, because when we develop these algorithms, we know exactly how the problem can be solved," says Fan. Specifically, the research group used a solver based on satisfiability modulo theories (SMT), which determines whether a formula can be satisfied. "With this particular solver, it's not just doing optimization. There are a lot of different algorithms there to figure out whether the planning problem is feasible at all. That's a very important thing in travel planning. It's not a very traditional mathematical optimization problem, because people come up with all these limitations, constraints, and restrictions," notes Fan.
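For readers curious what "checking satisfiability" looks like in practice, here is a minimal sketch using the open-source Z3 SMT solver's Python bindings; the cities, nightly rates, and budget are invented for illustration and are not drawn from the paper.

```python
# Minimal sketch of an SMT feasibility check with Z3 (pip install z3-solver).
# The cities, nightly rates, and budget below are illustrative only.
from z3 import Int, Solver, sat

nights_boston = Int("nights_boston")
nights_nyc = Int("nights_nyc")
hotel_cost = 200 * nights_boston + 300 * nights_nyc  # example nightly rates

s = Solver()
s.add(nights_boston >= 1, nights_nyc >= 1)  # visit both cities
s.add(nights_boston + nights_nyc == 5)      # five-night trip
s.add(hotel_cost <= 1200)                   # total hotel budget

if s.check() == sat:
    print("Feasible split of nights:", s.model())
else:
    print("No itinerary satisfies every constraint.")
```

Unlike a pure optimizer, the solver's primary answer is whether any plan exists at all, which is exactly the feasibility question Fan describes.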
Translation in action
The "travel agent" works in four steps that can be repeated as needed. The researchers used GPT-4, Claude-3, or Mistral Large as the method's LLM. First, the LLM parses a user's requested travel plan prompt into planning steps, noting preferences for budget, hotel amenities, transportation, destinations, attractions, restaurants, and trip duration in days, along with any other user stipulations. Those steps are then converted into executable Python code, with a natural language annotation for each constraint, which calls APIs like CitySearch, FlightSearch, etc. to collect data, while the SMT solver begins executing the steps laid out in the constraint satisfaction problem. If a sound and complete solution can be found, the solver outputs the result to the LLM, which then provides a coherent itinerary to the user.
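As a rough illustration of what such generated step code might look like, here is a hedged sketch: the FlightSearch API name appears above, but the wrapper function, fares, and budget here are hypothetical stand-ins rather than the framework's actual code.

```python
# Hypothetical shape of LLM-generated step code: each solver constraint carries
# a natural-language annotation, a tool call (a stand-in for an API such as
# FlightSearch) supplies the data, and the SMT solver checks the assembled
# constraint satisfaction problem. Fares and budget are invented for the example.
from z3 import Int, Solver, Or, sat

def flight_search(origin, destination):
    # Stand-in for a FlightSearch API call; returns example fares in dollars.
    return [450, 620, 380]

fares = flight_search("BOS", "CDG")
flight_cost = Int("flight_cost")

s = Solver()
# Constraint: "The chosen flight must be one of the options returned by the search."
s.add(Or([flight_cost == f for f in fares]))
# Constraint: "Airfare must stay within the user's $500 flight budget."
s.add(flight_cost <= 500)

if s.check() == sat:
    # The solver's result is handed back to the LLM to phrase as an itinerary.
    print("Selected fare:", s.model()[flight_cost])
```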
If one or more constraints cannot be met, the framework begins looking for an alternative. The solver outputs code identifying the conflicting constraints, along with their corresponding annotations, which the LLM then presents to the user together with potential remedies. The user can then decide how to proceed, and the loop repeats until a solution is found or the maximum number of iterations is reached.
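One plausible way to surface such conflicts is through the solver's unsat core, which names the clashing assertions. The short sketch below uses Z3's named assertions for this; the labels play the role of the natural-language annotations described above, though the framework's exact mechanism may differ, and the numbers are invented.

```python
# Sketch of pinpointing conflicting constraints with Z3's unsat core. The
# constraint labels stand in for the natural-language annotations.
from z3 import Int, Solver, unsat

days = Int("days")
budget = Int("budget")

s = Solver()
s.set(unsat_core=True)
s.assert_and_track(days >= 7, "trip lasts at least a week")
s.assert_and_track(budget == 150 * days, "hotel costs $150 per night")
s.assert_and_track(budget <= 800, "total budget is $800")

if s.check() == unsat:
    # The core lists the clashing constraints, which the LLM can explain to the
    # user along with possible fixes (shorter trip, cheaper hotel, larger budget).
    print("Conflicting constraints:", s.unsat_core())
```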
Provable and robust plans
The researchers tested their method, using the LLMs above, against baselines: GPT-4 by itself, OpenAI o1-preview by itself, GPT-4 with a tool to collect information, and a search algorithm that optimizes for total cost. Using the TravelPlanner dataset, which includes data for viable plans, the team looked at multiple performance metrics: how frequently a method could deliver a solution, whether the solutions adhered to commonsense criteria such as not visiting two cities in one day, the method's ability to meet one or more constraints, and a final pass rate indicating that it could meet all constraints. The new technique generally achieved a pass rate of over 90 percent, compared to 10 percent or lower for the baselines. The team also explored adding a JSON representation within the query step, which made it even easier for the method to provide solutions, with pass rates of 84.4-98.9 percent.
The MIT-IBM team posed additional challenges for their method. They looked at how important each component of the solution was, such as removing human feedback or the solver, and how that affected plan adjustments for unsatisfiable queries within 10 or 20 iterations, using a new dataset they created called UnsatChristmas, which includes unseen constraints, and a modified version of TravelPlanner. On average, the MIT-IBM group's framework achieved 78.6 and 85 percent success, respectively, rising to 81.6 and 91.7 percent with additional plan-modification rounds. The researchers also analyzed how well it handled new, unseen constraints and paraphrased query-step and step-code prompts. In both cases, it performed very well, with an 86.7 percent success rate on the paraphrasing trial.
Lastly, the MIT-IBM researchers applied their framework to other domains, with tasks including block picking, task allocation, the traveling salesman problem, and warehouse operation. Here, the method must select numbered, colored blocks and maximize its score; optimize robot task assignment for different scenarios; plan trips that minimize travel distance; and handle robot task completion and optimization.
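To give a sense of how the same solver machinery carries over to a task like robot task allocation, here is a small sketch posed to Z3's optimizer; the robots, tasks, and cost table are invented for the example and are not drawn from the paper.

```python
# Sketch of task allocation in the same SMT style: assign each task to exactly
# one robot while minimizing total cost. Robots, tasks, and costs are invented.
from z3 import Bool, Optimize, Sum, If, is_true, sat

robots = ["r1", "r2"]
tasks = ["pick", "pack", "ship"]
cost = {("r1", "pick"): 2, ("r1", "pack"): 4, ("r1", "ship"): 3,
        ("r2", "pick"): 3, ("r2", "pack"): 1, ("r2", "ship"): 5}

assign = {(r, t): Bool(f"{r}_{t}") for r in robots for t in tasks}

opt = Optimize()
for t in tasks:
    # Each task is handled by exactly one robot.
    opt.add(Sum([If(assign[(r, t)], 1, 0) for r in robots]) == 1)
total = Sum([If(assign[(r, t)], cost[(r, t)], 0) for r in robots for t in tasks])
opt.minimize(total)

if opt.check() == sat:
    m = opt.model()
    chosen = [(r, t) for (r, t) in assign
              if is_true(m.evaluate(assign[(r, t)], model_completion=True))]
    print("Assignment:", chosen)  # e.g., [('r1', 'pick'), ('r1', 'ship'), ('r2', 'pack')]
```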
"I think it's a very strong and innovative framework that can save a lot of time for humans, and also, it's a very novel combination of the LLM and the solver," says Hao.
This work is funded in part by the Office of Naval Research and the MIT-IBM Watson AI Lab.