Making tic-tac-toe interesting with AI

Lachlan Gibson
Lachlan Gibson
, updated
Player X's turn
Precise ↔ Chaotic

Tic-tac-toe, also known as noughts and crosses, is a simple two-player game played on a 3x3 grid. Each player alternates marking a space in the grid with their symbol: the first player uses 'X' and the second 'O'. The goal is to be the first to get three of their symbols along a row, column, or diagonal. The game is a draw if all nine squares are filled and no player has achieved this goal. Under these rules, tic tac toe is a perfect information zero-sum game. This page includes my implementation of tic-tac-toe above where both X and O can be optionally controlled by a custom designed AI algorithm. I explain below how I designed the AI to work with varying degrees of proficiency, and how this feature can inform on how optimal strategies can depend on the skill level of each player.

It is possible to calculate an upper bound on the total number of achievable unique board states as 39=196833^9=19683, since there are nine squares, each with three possible states: empty, X, or O. However, most of these states are unreachable, such as those where two players have three in a row simultaneously, or where player O has taken more turns than player X. Therefore, the actual number of unique accessible board states is only 54785478. This means a brute force algorithm which explores all possibilities when calculating the optimal move for each board state is feasible. A typical approach to solving a two-player zero-sum game is finding the Nash equilibrium using the minimax algorithm, which identifies the optimal move for each player, assuming the opponent also plays optimally. This involves recursively exploring all possible moves and their outcomes, with player X aiming to maximise the score and player O aiming to minimise it. Scores are assigned to each board state: a win for X as +1+1, a win for O as 1-1, and a draw as 00.

In tic-tac-toe, both players choosing moves this way always results in a draw. However, optimal play by one player can depend on the other player's strategy, and many human players do not choose moves optimally. To make the game more interesting, I implemented an AI using a softmax-based probabilistic variant of the minimax algorithm, where the AI considers actions as if both players take them randomly, with probabilities based on the expected scores. More specifically, the log-probability of a move depends linearly on its expected score, meaning the probability of a move is calculated using the softmax function with a temperature parameter. This means that moves with higher expected scores are more likely to be chosen, but not every time. The temperature parameter controls the randomness of the AI's play. Lower temperatures mean the AI is more likely to choose the best move, while higher temperatures makes all moves more equally likely.

To compute the expected resulting score of a move, the AI needs to predict how its opponent will play. Therefore, the AI actually utilises two temperature parameters, one for choosing its own moves, and the other for how it models the other player's strategy. Interpreting temperature as a kind of proxy for skill, then computing expected scores with two paramters in this way lets the algorithm account for the different 'skill' levels of each player. Given a starting board state, the algorithm for computing the expected score works as follows:

  1. Check if the game is over. If so, then return 1 if X won, -1 if O won, or 0 if the game is a tie.
  2. Identify all available moves (all empty cells).
  3. Recursively compute expected scores of each move based on player temperatures.
  4. Compute probabilities of each move using the softmax function.
  5. Return the average score weighted by their respective probabilities.

Let's get into the details of the mathematics of the algorithm. For a vector x\mathbf{x} with components xix_i and a temperature TT, the softmax function is

Softmax(xi,T)=exi/Tjexj/T.\text{Softmax}(x_i, T) = \frac{e^{x_i / T}}{\sum_j e^{x_j / T}}.

This function normalises the vector so that all probabilitis are strictly positive and sum to 1 (try prove it yourself by adding the components). The probability, PP, that player, P\mathcal{P}, chooses to move to the board state, CC from the board state BB is

CC(B), P(CP)=Softmax(PS(C,P),TP),\forall C\in\mathcal{C}(B),~ P(C|\mathcal{P})=\text{Softmax}(\mathcal{P}S(C,-\mathcal{P}), T_\mathcal{P}),

where C(B)\mathcal{C}(B) is the set of all board states that can be reached in one move from BB, P=+1\mathcal{P}=+1 when it is player X's turn, and P=1\mathcal{P}=-1 when it is player O's turn. SS is the expected score given a board state and whos turn it is not. If the game is over then the score is +1+1 if X wins, 1-1 if O wins, and 00 if it is a draw. Otherwise, we can calculate the expected score as

S(B,P)=CC(B)P(CP)S(C,P)S(B,\mathcal{P})=\sum_{C\in\mathcal{C}(B)}P(C|\mathcal{P})S(C,-\mathcal{P})

These two expressions provide a recursive relationship for calculating the expected score of any board state. The AI uses this recursive function to compute the expected scores of all its available next moves. It then uses the softmax function to convert these scores into probabilities and randomly chooses a move based on those probabilities. The figure below illustrates an exmaple where player X is playing less randomly than player O with T1=0.2T_1=0.2 and T1=1T_{-1}=1.

0.9813XXOOXO0.26892.52%XXOOXXO197.46%XXXOOXO-0.73110.02%XXOOXOX073.11%XXOOOXXO126.89%XXOOXXOO026.89%XXOOOXOX-173.11%XXOOOXOX0100%XXOOOXXOX1100%XXXOOXXOO0100%XXOOOXXOX
The top board is the initial board state and it is X's turn. The expected score of each board state is shown above each baord. The probability of each move is shown as a percentage above each node in the tree. X, with a temperature of 0.2, is more likely to choose better moves than O which chooses with a temperature of 1.

In this example, if X is controlled by the AI, then it has a 97.46% chance of picking the winning move immediately. It also has a 0.02% chance of choosing the bottom right corner which would allow O to win next turn. However, O has a high temperature of 1, so it still has a 26.89% chance of not choosing that winning move.

I mentioned before that the temperature parameter can be interpreted as a kind of proxy for skill. To illustrate this, I simulated games between to AI controlled players with varying temperatures. For each combination of temperatures, I simulated 1000 games and estimated the win probabilities of each player. This produced some interesting results, which are illustrated in the figure below.

Estimated win probabilities of each player depending on TOT_O when TX=1T_X=1 (left) and TX=0.1T_X=0.1 (right). Probabilities were estimated by simulating 1000 games for each combination of temperatures.

There are two key take-aways from these results. Firstly, player X has a large advantage by going first. Secondly, players do perform better when they play with a lower temperature. When X plays with a temperature of 1 then there is approximately a 20% chance of a draw regardless of O's temperature. In this case, if O plays with a low temperature, then X will almost never win. However, if O plays with a temperature of 1 also, then X has more than 50% chance of winning. Both players have an equal chance of winning when O plays with a temperature of about 0.5. When X plays with a temperature of 0.1, then the probability of O winning is almost zero regardless of O's temperature. This is because X is almost always choosing the best move. When O's temperature is low, the game usually results in a draw, but when O's temperature is high, X wins roughly 90% of the time.

One thing I found interesting was how the optimal strategy depends on both temperature parameters. For example, at the start of the game, X has three options: choosing a corner, the centre, or an edge. The figures below illustrate how the expected scores of these moves depend on the player temperatures.

Expected final scores when X starts on a corner, the centre or an edge, depending on TOT_O when TX=1T_X=1 (left) and TX=0.1T_X=0.1 (right). If player X plays with a high temperature then their best first move is always the centre, but if they play with a low temperature then their best first move is usually a corner.

If X plays with a high temperature then their best first move is always the centre, probably because it allows for the most possible winning options. However, if X plays with a low temperature then their best first move is usually a corner, probably because it can guarentee a win if O does not subsequently choose the centre. When O plays with a high temperature then the expected score is slightly higher when X chooses the centre.

Below is my Python code to compute these results.

import matplotlib.pyplot as plt
import numpy as np

_scores = {}


def score(board_state, player, temperatures, return_probs=False):
    pos_str = stringify_board(board_state)
    if not return_probs and pos_str in _scores:
        return _scores[pos_str]

    game_over, winner = check_gameover(board_state)
    if game_over:
        _scores[pos_str] = winner
        return _scores[pos_str]
    moves = get_moves(board_state)
    values = np.array(
        [
            score(make_move(board_state, player, move), -player, temperatures)
            for move in moves
        ]
    )
    probs = softmax(player * values, temperatures[player])
    _scores[pos_str] = np.dot(values, probs)
    if return_probs:
        return moves, values, probs
    return _scores[pos_str]


def stringify_board(board_state):
    return "\n".join("".join("_XO"[cell] for cell in row) for row in board_state)


def softmax(z, T):
    z = np.array(z)
    e_z = np.exp(z / T)
    return e_z / e_z.sum(axis=0)


def get_moves(board_state):
    return list(zip(*np.where(board_state == 0)))


def make_move(board_state, player, move):
    new_board = board_state.copy()
    new_board[move] = player
    return new_board


def check_gameover(board_state):
    sums = np.stack(
        [
            *board_state,  # Rows
            *board_state.T,  # Columns
            board_state.diagonal(),  # Main diagonal
            np.fliplr(board_state).diagonal(),  # Anti-diagonal
        ]
    ).sum(1)
    if 3 in sums:
        return (True, 1)
    if -3 in sums:
        return (True, -1)
    return (0 not in board_state, 0)


def print_probs(board_state, player, temperatures):
    move_scores = score(board_state, player, temperatures, return_probs=True)
    board_score = score(board_state, player, temperatures)
    print("Current board: {0:.4f}".format(board_score))
    print(stringify_board(board_state))
    for m, v, p in zip(*move_scores):
        print("Move [{0},{1}]: {2:.4f}, {3:.4f}%".format(*m, v, p * 100))


def simulate_game(temperatures, print_game=False):
    board = np.zeros([3, 3], dtype=int)
    player = 1
    game_over = False
    while not game_over:
        moves, _, probs = score(board, player, temperatures, return_probs=True)
        move = moves[np.random.choice(len(moves), p=probs)]
        board = make_move(board, player, move)
        game_over, winner = check_gameover(board)
        player = -player
        if print_game:
            print("\n" + stringify_board(board))
    return winner


def count_wins(temperatures, trials):
    X, O = 0, 0
    for t in range(trials):
        winner = simulate_game(temperatures)
        if winner > 0:
            X += 1
        elif winner < 0:
            O += 1
    return X / trials, O / trials, (trials - X - O) / trials


# print example
board = np.array([[1, 1, 0], [-1, -1, 0], [1, -1, 0]])
print_probs(board, 1, {1: 0.2, -1: 1})

# plot win probabilities via simulation
temps = np.arange(0.1, 1.1, 0.1)
num_trials = 1000
np.random.seed(745)
for T in [0.1, 1]:
    counts = []
    for t in temps:
        _scores = {}
        counts.append(count_wins({1: T, -1: t}, num_trials))
    plt.figure(figsize=[4, 3])
    plt.plot(temps, counts, label=["X Win", "O Win", "Tie"])
    plt.axis([0, 1.1, 0, 1])
    plt.xlabel("$T_O$")
    plt.ylabel("Probability")
    plt.title(f"Win Estimates with $T_X$={T}")
    plt.legend()
    plt.savefig(f"p_estimates_T{T}.svg", bbox_inches="tight")

# plot expected scores based on first move
board_corner = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 0]])
board_centre = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]])
board_edge = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]])
for T in [0.1, 1]:
    scores = []
    for t in temps:
        _scores = {}
        scores.append(
            [
                score(board_corner, -1, {1: T, -1: t}),
                score(board_centre, -1, {1: T, -1: t}),
                score(board_edge, -1, {1: T, -1: t}),
            ]
        )
    plt.figure(figsize=[4, 3])
    plt.plot(temps, scores, label=["Corner", "Centre", "Edge"])
    plt.axis([0, 1.1, -1, 1])
    plt.grid()
    plt.xlabel("$T_O$")
    plt.ylabel("Score")
    plt.title(f"Expected Scores with $T_X$={T}")
    plt.legend(loc="lower right")
    plt.savefig(f"scores_T{T}.svg", bbox_inches="tight")