Showing posts with label Game Development. Show all posts

Sunday, October 20, 2024

Reinforcement Learning for Tic-Tac-Toe Using Q-Learning

You want to create and train an artificial agent to play **Tic-Tac-Toe** using **Reinforcement Learning**. Specifically, you will use the Q-Learning algorithm, in which the agent learns to make optimal decisions by exploring different game states and learning from the rewards it receives. The game environment provides feedback to the agent by updating the board with each move and indicating when a player has won or the game has ended in a draw. The agent learns over many rounds (episodes) and gradually improves at selecting the best moves.

### Solution:
1. **Environment Setup**: 
   - A custom Tic-Tac-Toe game environment is created. The game board is a 3x3 grid, initially empty. Two players take turns placing their respective markers ('X' and 'O').
   - The game tracks the current player and provides actions (placing a marker in an empty cell). The possible actions (placing in 9 different cells) are represented by numbers 0 to 8.
   - The environment checks for a winner after each move, and if no winner exists and the board is full, the game ends in a draw.
   - Each game step returns an observation (the board's state), a reward (positive for a win, negative for an invalid move), and a signal if the game is over (done).
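The environment described above can be sketched as a small Python class. This is a minimal illustration of the step/reward interface, not the author's actual code; class and method names are assumptions:

```python
class TicTacToeEnv:
    """Minimal Tic-Tac-Toe environment sketch (names are illustrative)."""

    WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
                 (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
                 (0, 4, 8), (2, 4, 6)]              # diagonals

    def __init__(self):
        self.reset()

    def reset(self):
        self.board = [" "] * 9      # 3x3 grid, flattened to 9 cells
        self.current = "X"          # X always moves first
        return tuple(self.board)

    def winner(self):
        for a, b, c in self.WIN_LINES:
            if self.board[a] != " " and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return None

    def step(self, action):
        """Place the current player's marker in cell `action` (0-8).

        Returns (observation, reward, done)."""
        if self.board[action] != " ":
            return tuple(self.board), -10, True    # invalid move: penalty, end episode
        self.board[action] = self.current
        if self.winner():
            return tuple(self.board), 1, True      # current player wins
        if " " not in self.board:
            return tuple(self.board), 0, True      # board full: draw
        self.current = "O" if self.current == "X" else "X"
        return tuple(self.board), 0, False         # game continues
```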

2. **Q-Learning Agent**:
   - The agent’s job is to learn the optimal strategy for playing Tic-Tac-Toe. It does this by using a **Q-table**, where each possible board configuration (state) is mapped to a value representing the expected reward for taking each action (placing a marker in one of the 9 cells).
   - At the start, the agent explores different actions to learn the effects. As it gains experience, it balances between exploration (trying new actions) and exploitation (selecting actions based on what it has learned).
   - The Q-table is updated using the **Q-Learning update rule**, which uses feedback from each step (reward and next state) to adjust the action values.
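In its standard form, that update rule is Q(s, a) ← Q(s, a) + α[r + γ·max-over-a′ Q(s′, a′) − Q(s, a)]. A minimal sketch in Python (the ALPHA and GAMMA values are illustrative, not taken from the post):

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9                      # learning rate and discount factor
Q = defaultdict(lambda: [0.0] * 9)           # Q[state] -> value of each of the 9 cells

def q_update(state, action, reward, next_state, done):
    """One application of the Q-Learning update rule."""
    # Terminal states have no future value; otherwise bootstrap from the best next action.
    target = reward if done else reward + GAMMA * max(Q[next_state])
    Q[state][action] += ALPHA * (target - Q[state][action])
```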

3. **Training the Agent**:
   - The agent is trained over 10,000 episodes. In each episode, it plays a game of Tic-Tac-Toe by selecting actions (moves) based on the current state of the board.
   - At each step, the agent takes an action, receives feedback from the environment, and updates its Q-table based on the rewards.
   - As training progresses, the agent becomes better at identifying the best moves by using the Q-values stored in the Q-table. The exploration rate (chance of taking a random action) gradually decreases, meaning the agent increasingly exploits its learned knowledge to win games.
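The decaying exploration rate can be sketched as an epsilon-greedy action selector with per-episode decay (the constants and function names here are illustrative assumptions):

```python
import random

EPS_START, EPS_END, EPS_DECAY = 1.0, 0.05, 0.9995

def choose_action(q_values, valid_actions, epsilon):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.choice(valid_actions)                   # explore: random move
    return max(valid_actions, key=lambda a: q_values[a])      # exploit: best known move

# The exploration rate decays each episode, shifting from exploration to exploitation.
epsilon = EPS_START
for episode in range(10_000):
    # ... play one game here, calling choose_action(...) on each turn ...
    epsilon = max(EPS_END, epsilon * EPS_DECAY)
```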

4. **Testing the Agent**:
   - After training, the agent is tested over a set number of games. During testing, the exploration rate is set to (or near) zero, so the agent relies almost entirely on the strategies stored in its Q-table.
   - During the test games, the board is displayed after each move, and the result (win or loss) is printed at the end of the game.

### Key Concepts:
- **Exploration vs Exploitation**: In the beginning, the agent explores by making random moves to gather information. Over time, it starts exploiting its knowledge by choosing the best possible move based on what it has learned.
- **Q-Learning**: The Q-learning algorithm updates the value for each state-action pair based on the rewards received and the estimated value of future states. This helps the agent learn an optimal strategy for playing Tic-Tac-Toe.
- **Game Feedback**: Each game gives feedback in the form of rewards (positive for winning, negative for invalid moves) and the game status (ongoing, won, or draw), which the agent uses to adjust its strategy.

### Final Outcome:
After training, the agent has learned a strong (near-optimal) Tic-Tac-Toe policy. In the testing phase, it can play and display the game with significantly improved decision-making, increasing the likelihood of winning or drawing, depending on its opponent.

Thursday, August 15, 2024

Building a Tic-Tac-Toe Game: Code Structure and AI Strategies


🎮 Building a Tic-Tac-Toe Game in Python (With AI)

Let’s walk through how a simple Tic-Tac-Toe game evolves into something smarter—with AI that actually thinks.


🚀 Game Initialization

The game begins with an empty board:

board = [" " for _ in range(9)]

The user selects player types:

start user medium
👉 This defines who plays X and who plays O — here, a human ("user") as X and the medium AI as O.

🔄 Game Loop

The loop controls turn-taking:

  • X plays
  • O plays
  • Repeat until game ends

It ensures fairness and flow.
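That loop can be sketched as follows. Function names are illustrative, not the post's actual code; each player's strategy is passed in as a callable so the same loop serves human and AI players:

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def is_winner(board, player):
    """True if `player` occupies any complete row, column, or diagonal."""
    return any(all(board[i] == player for i in line) for line in WIN_LINES)

def play_game(get_move_x, get_move_o):
    """Alternate X and O turns until a win or a full board (a draw)."""
    board = [" "] * 9
    players = [("X", get_move_x), ("O", get_move_o)]
    turn = 0
    while True:
        marker, get_move = players[turn % 2]      # even turns: X, odd turns: O
        board[get_move(board, marker)] = marker
        if is_winner(board, marker):
            return marker                          # winner's marker
        if " " not in board:
            return "draw"
        turn += 1
```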


👤 Human Moves

The player inputs a position (0–8).

Validation checks:

  • Is the index valid?
  • Is the cell empty?
👉 Prevents illegal moves and crashes.
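Those two checks can be combined into one small validator — a sketch, not the post's actual code:

```python
def valid_move(board, raw):
    """Return the cell index if `raw` names an empty cell (0-8), else None."""
    if not raw.isdigit():                 # rejects non-numeric (and negative) input
        return None
    idx = int(raw)
    if idx > 8 or board[idx] != " ":      # rejects out-of-range and occupied cells
        return None
    return idx
```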

🤖 AI Strategies

Easy AI

Random move:

random.choice(empty_cells)

Medium AI

  • Try to win
  • Block opponent
  • Pick center/corner

Hard AI

Uses advanced logic like Minimax.


๐Ÿ“ AI Decision Math (Minimax Simplified)

\[ Score = \max(\text{AI moves}) - \min(\text{Opponent moves}) \]

Explanation:

  • AI tries to maximize its score
  • Opponent tries to minimize it

Recursive evaluation:

\[ V(s) = \begin{cases} +1 & \text{if AI wins}\\ 0 & \text{draw}\\ -1 & \text{if opponent wins} \end{cases} \]

👉 The AI simulates all possible futures before choosing a move.
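A compact Minimax sketch matching the scoring above (+1 if the AI wins, 0 for a draw, −1 if the opponent wins); the names are illustrative assumptions:

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def minimax(board, player, ai, opponent):
    """Return (score, move) for `player` to move on `board`."""
    if any(all(board[i] == ai for i in line) for line in WIN_LINES):
        return 1, None                                # AI has won
    if any(all(board[i] == opponent for i in line) for line in WIN_LINES):
        return -1, None                               # opponent has won
    empty = [i for i, cell in enumerate(board) if cell == " "]
    if not empty:
        return 0, None                                # board full: draw
    best = None
    for i in empty:
        board[i] = player                             # simulate the move...
        score, _ = minimax(board, opponent if player == ai else ai, ai, opponent)
        board[i] = " "                                # ...then undo it
        # The AI maximizes the score; the opponent minimizes it.
        if best is None or (player == ai and score > best[0]) \
                        or (player != ai and score < best[0]):
            best = (score, i)
    return best
```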

💻 Code Example

def is_winner(board, player):
    win_combinations = [
        [0, 1, 2], [3, 4, 5], [6, 7, 8],   # rows
        [0, 3, 6], [1, 4, 7], [2, 5, 8],   # columns
        [0, 4, 8], [2, 4, 6],              # diagonals
    ]
    return any(all(board[i] == player for i in combo) for combo in win_combinations)

🖥️ CLI Output

X | O | X
---------
O | X |  
---------
  | O | X

Result: X wins 

๐Ÿ Game End Logic

The game ends when:

  • A player wins
  • All cells are filled (draw)

This is checked after every move.
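That after-every-move check can be sketched as a single helper (illustrative, not the post's actual code):

```python
def game_over(board):
    """Return 'X' or 'O' if someone has won, 'draw' if the board is full, else None."""
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]
    for a, b, c in lines:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]                          # a player has won
    return "draw" if " " not in board else None      # full board is a draw
```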


💡 Key Takeaways

  • Game loop controls flow
  • AI difficulty changes behavior
  • Math (Minimax) powers smart decisions
  • Validation ensures stable gameplay

🎯 Final Thoughts

This Tic-Tac-Toe project may look simple, but it introduces powerful ideas—decision-making, AI strategy, and algorithmic thinking.

Once you understand this, you can scale to much more complex games.
