AutoToS: the Future of Self-Programming AI Systems

AI that codes its own solutions

Sep 17, 2024

In brief: an introduction to Automated Thought of Search through a simplified Vehicle Routing Problem, showcasing how AI generates its own solutions and refines them autonomously. Discover the power of AI that writes, tests, and improves its own code, revolutionizing complex problem-solving without direct human intervention in programming. The complete code is available on the GitHub repository autotos-vrp-example.

Introduction

In the rapidly evolving landscape of artificial intelligence, a groundbreaking approach is emerging: Automated Thought of Search (AutoToS). This innovative system represents a significant leap forward in AI's capability to solve complex problems by essentially programming itself. Today, we'll explore AutoToS through its application to the Vehicle Routing Problem (VRP), a classic optimization challenge in logistics and transportation.

Understanding the Vehicle Routing Problem (VRP)

Before diving into AutoToS, it's crucial to understand the problem it's solving in this context.

What is VRP?

The Vehicle Routing Problem is a complex optimization challenge in logistics. It involves finding the most efficient routes for a fleet of vehicles to deliver goods to a set of customers, minimizing the total distance traveled or cost incurred.

Key Components of VRP

Depot: The starting and ending point for all vehicles, represented as (0,0) in our coordinate system.
Customers: Points on a 2D plane, each represented by an (x, y) coordinate on a grid.
Routes: Sequences of points (including the depot) that a vehicle will follow.
State: A complete solution, consisting of routes for all vehicles.

Interpreting the Code Representation

In our implementation, we use the following data structures:

Point = Tuple[int, int]  # e.g., (1, 5) represents a customer at x=1, y=5
Route = List[Point]      # e.g., [(0,0), (1,5), (2,3), (0,0)] is a route
State = List[Route]      # e.g., [[(0,0), (1,5), (2,3), (0,0)], [(0,0), (5,1), (3,2), (0,0)]]

Let's break down a sample state:

state = [
    [(0,0), (1,5), (2,3), (0,0)],  # Route 1
    [(0,0), (5,1), (3,2), (0,0)]   # Route 2
]

This state represents:

Two vehicles (two routes)
The depot at (0,0)
Vehicle 1 visits customers at (1,5) and (2,3)
Vehicle 2 visits customers at (5,1) and (3,2)
Both vehicles start and end at the depot

Understanding this representation is crucial for grasping how AutoToS tackles the VRP.

What is AutoToS?

AutoToS is an AI-driven system that can generate, test, and refine its own code to solve specific problems. Unlike traditional programming methods where human developers write and debug code, AutoToS leverages large language models (LLMs) to create functional code, then automatically tests and improves this code until it meets predefined criteria.

AutoToS in Action: solving a simplified VRP

Now that we understand VRP, let's examine how AutoToS approaches solving it.

The Core Components of AutoToS

Problem Definition and Initial Prompt:

We start by defining the problem and the functions we need. Here's a snippet of the initial prompt:

base_prompt = """
    Implement two Python functions for the Vehicle Routing Problem (VRP):

    1. A 'goal' function that calculates the total cost of a state.
    2. A 'successor' function that generates all possible successor states.

    Use these definitions:

    - Point = Tuple[int, int]  # A point (x, y)
    - Route = List[Point]      # A route (list of points)
    - State = List[Route]      # A state (list of routes)

    The distance function is already provided:

    def distance(p1: Point, p2: Point) -> float:
        return math.sqrt((p1[0] - p2[0])**2 + (p1[1] - p2[1])**2)

    Implement these functions:

    1. def goal(state: State) -> float:
       This function should calculate and return the total cost of all routes in the state.
       - For each route, sum the distances between each pair of consecutive points.
       - Sum the costs of all routes.

    2. def successor(state: State) -> List[State]:
       This function should generate and return all possible successor states.
       - Generate new states by swapping pairs of customers within the same route.
       - Generate new states by swapping pairs of customers between different routes.
       - Never swap the depot (0,0), which must always be the first and last point of each route.
       - All successors must be valid and unique states, without duplicates.

    Example state:
    state = [
        [(0,0), (1,5), (2,3), (0,0)],  # Route 1
        [(0,0), (5,1), (3,2), (0,0)]   # Route 2
    ]

    Provide only the function code, without additional explanations.
    """

The base_prompt instructs the AI to create two Python functions for solving the VRP:

A 'goal' function to calculate the total cost of a given state.
A 'successor' function to generate all possible next states.

It defines key data structures (Point, Route, State) and provides a distance function. The prompt specifies that the 'goal' function should sum distances between consecutive points in each route, while the 'successor' function should generate new states by swapping customers within and between routes, always keeping the depot (0,0) as the start and end point. It emphasizes generating only valid, unique successors and includes an example state for clarity.

LLM Function Generation

AutoToS uses an LLM (in this case, GPT-4o-mini) to generate the required functions based on the prompt:

def get_llm_functions(feedback=""):
    # ... [code to set up the prompt] ...

    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant that writes Python code."},
            {"role": "user", "content": prompt}
        ]
    )

    return extract_python_code(response.choices[0].message.content)

Note: while in this example we use a GPT-4 model, the o1 models from OpenAI are very strong as function generators for this type of approach! The Sonnet and Opus models from Anthropic are equally impressive. Of course, the super-simplified version of VRP that we're considering in this example could also be handled by GPT-3.5-turbo.

Automated Testing

The system then tests the generated functions:

def test_vrp_functions(goal: callable, successor: callable) -> str:
    """
    Tests the 'goal' and 'successor' functions for the Vehicle Routing Problem (VRP).
    
    Args:
    - goal: The function to calculate the total cost of a state.
    - successor: The function to generate successor states.
    
    Returns:
    - An empty string if all tests pass, or an error message indicating the failed test.
    """
    
    test_state = [
        [(0,0), (1,5), (2,3), (0,0)],  # Route 1
        [(0,0), (5,1), (3,2), (0,0)]   # Route 2
    ]
    
    # Test the 'goal' function
    expected_cost = (
        distance((0,0), (1,5)) + distance((1,5), (2,3)) + distance((2,3), (0,0)) +
        distance((0,0), (5,1)) + distance((5,1), (3,2)) + distance((3,2), (0,0))
    )
    calculated_cost = goal(test_state)
    if not math.isclose(expected_cost, calculated_cost):
        return "Goal function test failed"
    
    # Test the 'successor' function
    successors = successor(test_state)
    expected_num_successors = calculate_expected_successors(test_state)
    print(f"Number of successors: {len(successors)}")
    if len(successors) != expected_num_successors:
        return f"Incorrect number of successors. Expected {expected_num_successors}, got {len(successors)}"
    
    # Verify that all successors are unique
    unique_successors = set(tuple(tuple(tuple(p) for p in route) for route in state) for state in successors)
    if len(unique_successors) != expected_num_successors:
        return f"Incorrect number of unique successors. Expected {expected_num_successors}, got {len(unique_successors)}"
    
    # Verify that the depot (0,0) is always the first and last point of each route
    for state in successors:
        for route in state:
            if route[0] != (0,0) or route[-1] != (0,0):
                return "Depot constraint violated"
    
    visualize_states(successors)
    
    print("All tests passed successfully!")

    return ""

The test_vrp_functions method validates the AI-generated 'goal' and 'successor' functions for the VRP. It uses a predefined test state and performs the following checks:

For the 'goal' function:
- Calculates an expected cost
- Compares it with the result from the AI-generated function
- Fails if the values don't match closely
For the 'successor' function:
- Verifies the correct number of successors are generated
- Checks that all successors are unique
- Ensures the depot (0,0) remains the first and last point in each route
- Visualizes a subset of the generated successors

The function returns an empty string if all tests pass, or an error message describing the failure. This testing process is crucial for ensuring the correctness and reliability of the AI-generated functions.

Iterative Refinements

If the tests fail, AutoToS provides feedback to the LLM and requests a new implementation:

def autotos():
    """
    Automatically generates and tests 'goal' and 'successor' functions for the VRP using a language model.
    """
    max_attempts = 5
    attempt = 0
    feedback = ""

    while attempt < max_attempts:
        print(f"\nAttempt {attempt + 1}:")
        llm_functions = get_llm_functions(feedback)

        # Execute the code generated by the language model
        try:
            exec(llm_functions, globals())
        except Exception as e:
            feedback = f"Error in code execution: {str(e)}"
            print(feedback)
            attempt += 1
            continue

        # Test the functions
        feedback = test_vrp_functions(goal, successor)
        
        if not feedback:
            print("Success! All tests passed.")
            break
        else:
            print(f"Test failed.\nPrevious wrong functions:\n{llm_functions}\nFeedback: {feedback}")
            attempt += 1

    if attempt == max_attempts:
        print("Maximum attempts reached. Could not generate correct functions.")

The autotos function is the core of the AutoToS system. It orchestrates the process of generating, testing, and refining VRP functions:

It iteratively calls get_llm_functions to obtain AI-generated code.
The crucial line exec(llm_functions, globals()) dynamically executes the AI-generated code in the global namespace. This step effectively 'brings to life' the AI-created functions, making them callable within the Python environment.
It then tests these functions using test_vrp_functions.
If tests fail, it provides feedback and requests new implementations.
This process repeats up to 5 times or until correct functions are generated.

The exec function is key here: it allows the system to seamlessly integrate AI-generated code into the running program, enabling real-time testing and refinement of the AI's solutions. This dynamic execution is what allows AutoToS to truly 'program itself' by turning AI-generated text into functional Python code.

The complete code is available on the GitHub repository autotos-vrp-example.

Note: it’s very important to clearly understand the role of successor states. In this regard, if you'd like to dive into more details, it might be helpful to take a look at the section "Understanding generated states: some important observations" at the end of this article.

The power and implications of AutoToS

Rapid Problem Solving: AutoToS can quickly generate and refine solutions to complex problems, potentially outpacing human programmers in certain tasks.
Adaptability: the system can easily adapt to different problem domains by modifying the initial prompt and test criteria.
Continuous Improvement: with each iteration, AutoToS learns from its mistakes, refining its approach to problem-solving.
Reduced Human Error: by automating the coding and testing process, AutoToS minimizes the risk of human-introduced bugs.
Scalability: this approach can be scaled to tackle a wide range of computational problems beyond VRP.

Challenges and Future Directions

While AutoToS represents a significant advancement, it's not without challenges:

Dependence on Quality of Initial Prompt: the system's success heavily relies on well-defined problem statements and test cases.
Computational Intensity: multiple iterations of generating and testing code can be computationally expensive.
Ethical Considerations: as AI systems become more capable of self-programming, questions arise about oversight and control.

Conclusion

AutoToS exemplifies the cutting edge of AI's capability to not just assist in coding, but to autonomously create and refine software solutions. Through our exploration of the Vehicle Routing Problem, we've seen how AutoToS can tackle complex optimization challenges, generating and improving its own code to find solutions.

This approach opens up new possibilities across numerous fields, from logistics and supply chain management to scientific computing and business process optimization. As we continue to develop and refine systems like AutoToS, we're moving closer to a future where AI can tackle complex programming tasks with minimal human intervention.

The journey of AutoToS is just beginning, and its evolution will undoubtedly play a crucial role in shaping the future of AI and computational problem-solving. By understanding both the underlying problem (VRP) and the innovative solution method (AutoToS), we gain insight into the powerful intersection of AI, optimization, and self-improving systems.

References

AutoToS is thoroughly introduced and explored in Automating Thought of Search: A Journey Towards Soundness and Completeness. Essentially, it presents an approach aimed at automating the generation of search components (successor and goal functions) for planning problems using LLMs. AutoToS builds on previous work from Thought of Search (ToS) but seeks to eliminate the need for human feedback in the process.

The code we examined is a practical implementation of AutoToS. Here’s how it connects to the key concepts from the article:

Automatic Code Generation: the system uses LLMs (such as GPT-4) to generate successor and goal functions for various planning domains (24 Game, BlocksWorld, Mini Crosswords, PrOntoQA, Sokoban).
Unit Testing and Automatic Feedback: instead of relying on human feedback, the system employs predefined unit tests to verify the correctness of the generated functions. This corresponds to the "goal unit tests", "successor unit tests," and "partial successor soundness test" in the code.
Iteration and Refinement: if the tests fail, the system provides feedback to the LLM and requests a new generation, following an iterative process until correct functions are obtained.
Soundness and Completeness Verification: the system aims to ensure that the generated functions are both sound (correct) and complete, as described in the article.
Various Test Domains: the experiment covers a variety of planning domains, aligning with the evaluation described in the article.
Automation of the Process: the entire process, from initial generation to verification and refinement, is automated, eliminating the need for direct human intervention.
Performance Evaluation: the code includes mechanisms to track the number of LLM calls and the accuracy of the generated functions, corresponding to the evaluation metrics discussed in the article.

Understanding generated states: some important observations

When we talk about the states generated by the 'successor' function in AutoToS, it's important to understand their role in the broader context of solving the Vehicle Routing Problem. Here's a more detailed explanation:

Nature of generated states: these states are intermediate steps in the problem-solving process, not final solutions. They represent different configurations of routes that vehicles could take.
Role in the search process: these states serve as potential paths that an optimization algorithm might explore. They're like different scenarios or possibilities that the algorithm considers.
Not necessarily optimal: just because a state is generated doesn't mean it's better than the previous one or close to the optimal solution. Some might be worse, some better, but all are valid according to the problem constraints.
Exploration vs. Exploitation: by generating these various states, the system allows for both exploration (trying significantly different route configurations) and exploitation (making small tweaks to promising routes).
Building blocks for optimization: these states form the foundation upon which optimization algorithms can work. By having a diverse set of possible states, the algorithm has a better chance of finding a path to the optimal solution.
Iterative improvement: in a full VRP solver, these states would be evaluated, compared, and used to guide the search towards better solutions iteratively.
Completeness of search: by generating all possible neighboring states, the successor function ensures that no potential improvement is overlooked, contributing to the completeness of the search process.

In essence, these states are not the end goal but rather the means by which an intelligent search algorithm can navigate the complex landscape of possible solutions in the VRP. They provide the raw material that optimization algorithms can work with to eventually find high-quality or optimal solutions.

Thanks for reading! This post is public so feel free to share it.

Lorenzo Toscano

Discussion about this post