Lunar Lander AI

Machine Learning Project

Deep reinforcement learning for autonomous lunar landing.

This project trains a Deep Q-Network on LunarLander-v3 to learn safe landing behavior through reward-driven interaction, balancing descent control, fuel efficiency, and stability.

  • Implemented a TensorFlow/Keras DQN agent with replay buffer and target network stabilization
  • Built configurable training workflows for experimentation, evaluation, and reproducibility
  • Packaged the work with scripts, saved weights, and a polished project website

Project Snapshot

  • 500 training episodes
  • 95% reported success rate
  • 200+ reported average reward
  • 4 environment actions

Core areas

  • Python
  • TensorFlow / Keras
  • Reinforcement Learning
  • Experiment Design
  • Model Evaluation
  • Technical Presentation

Project Goal

A compact reinforcement learning project with clear technical depth.

The goal is to train an agent that can make better landing decisions over time while providing a clean implementation of DQN concepts such as replay memory, exploration, and target network updates.

Technical Focus

DQN with stabilizing mechanisms and reusable tooling.

Experience replay, epsilon-greedy exploration, target network updates, and preset-based scripts make the implementation credible, explainable, and easy to demonstrate.
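
As an illustration of the exploration piece, here is a minimal, framework-free sketch of epsilon-greedy action selection with a decaying epsilon. The function names, the decay factor of 0.995, and the floor of 0.01 are assumptions made for this example, not values taken from the project.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest estimated action value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay_epsilon(epsilon, decay=0.995, minimum=0.01):
    """Multiplicative decay with a floor (both values are assumed, not the project's)."""
    return max(minimum, epsilon * decay)


# Example: one estimated value per LunarLander action (illustrative numbers).
example_q_values = [0.1, 1.7, -0.3, 0.9]
greedy_action = epsilon_greedy(example_q_values, epsilon=0.0)  # never explores
```

Starting with a high epsilon and decaying it once per episode lets the agent explore widely early on and rely on its learned values later.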

Animated Demo

A browser replay of a successful landing sequence.

Replay View

This animated replay shows the lander correcting its angle, reducing velocity, and touching down inside the landing zone.

Replay Notes

  • Starts with a high descent rate and slight tilt away from center
  • Applies controlled corrections while converging toward the landing pad
  • Finishes with low vertical speed and aligned landing legs
  • Runs fully in the browser so the project page stays easy to share

Technical Breakdown

Designed to show implementation depth, not just a final score.

Agent Architecture

The Q-network uses a multilayer perceptron with hidden layers of 128, 128, and 64 units to estimate action values from the lander state.
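
To give a sense of scale, the sketch below works out the layer shapes and trainable parameter count for such a network. It assumes fully connected layers with biases, the 8-value LunarLander observation as input, and one output unit per action; the helper names are hypothetical.

```python
STATE_DIM = 8               # length of the LunarLander observation vector
NUM_ACTIONS = 4             # one Q-value per environment action
HIDDEN_UNITS = [128, 128, 64]


def dense_layer_shapes(input_dim, hidden_units, output_dim):
    """Return (fan_in, fan_out) for each fully connected layer."""
    sizes = [input_dim, *hidden_units, output_dim]
    return list(zip(sizes[:-1], sizes[1:]))


def parameter_count(shapes):
    """Weights plus biases for every dense layer."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in shapes)


shapes = dense_layer_shapes(STATE_DIM, HIDDEN_UNITS, NUM_ACTIONS)
total_parameters = parameter_count(shapes)
print(shapes)            # [(8, 128), (128, 128), (128, 64), (64, 4)]
print(total_parameters)  # 26180
```

Under these assumptions the network has roughly 26 thousand parameters, small enough to train on a CPU.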

Training Pipeline

The training loop stores transitions, samples minibatches, applies Bellman updates, and periodically syncs a target network for more stable learning.
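
The Bellman update and the periodic sync can be sketched in plain Python as follows. This is a simplified illustration rather than the project's code: the discount factor of 0.99, the sync interval, and both function names are assumptions.

```python
def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step targets: r + gamma * max_a Q_target(s', a), with no bootstrap at episode end.

    next_q_values holds the target network's action values for each next state.
    """
    targets = []
    for reward, next_q, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(next_q)
        targets.append(reward + bootstrap)
    return targets


def should_sync_target(step, sync_every=1000):
    """True on the steps where the target network copies the online weights (interval assumed)."""
    return step > 0 and step % sync_every == 0


# Two illustrative transitions: one mid-flight, one that ends the episode.
example_targets = td_targets(
    rewards=[1.0, -100.0],
    next_q_values=[[0.5, 2.0, 1.0, 0.0], [3.0, 1.0, 0.0, 2.0]],
    dones=[False, True],
)
```

The online network is then regressed toward these targets for the actions that were actually taken, while the target network stays frozen between syncs.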

Experiment Management

Centralized configuration presets make it easy to switch between faster iteration and longer, higher-quality training runs while keeping comparisons consistent.
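
One way such presets could be organized is with an immutable dataclass and a small lookup table. Everything in this sketch is hypothetical except the 500-episode figure from the project snapshot: the field names, preset names, and default values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainingConfig:
    episodes: int = 500            # matches the reported training length
    batch_size: int = 64           # assumed
    gamma: float = 0.99            # assumed
    learning_rate: float = 1e-3    # assumed
    target_sync_every: int = 1000  # assumed
    seed: int = 0                  # fixed seed for comparable runs


# Hypothetical preset names: a short run for iteration, a full run for results.
PRESETS = {
    "quick": TrainingConfig(episodes=100),
    "full": TrainingConfig(episodes=500),
}


def load_preset(name, **overrides):
    """Return a copy of the named preset with any per-run overrides applied."""
    return replace(PRESETS[name], **overrides)


config = load_preset("quick", seed=7)
```

Because the dataclass is frozen, an override produces a new object and the shared presets stay unchanged, which keeps comparisons between runs consistent.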

Core Workflow

  1. Observe the lander state from the environment.
  2. Select an action with epsilon-greedy exploration.
  3. Collect reward and next-state feedback.
  4. Store experience in replay memory.
  5. Train on sampled transitions and update target weights.
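
The five steps above can be sketched as a single episode loop. To stay self-contained, the example uses a hypothetical `StubEnv` in place of the Gymnasium environment (whose real `reset` and `step` calls also return `info` and separate `terminated`/`truncated` flags), a random policy, and a no-op training step; the class and function names are assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity memory; the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


class StubEnv:
    """Hypothetical stand-in environment: random 8-value states, fixed-length episodes."""

    def __init__(self, episode_length=20):
        self.episode_length = episode_length
        self.t = 0

    def _state(self):
        return [random.uniform(-1.0, 1.0) for _ in range(8)]

    def reset(self):
        self.t = 0
        return self._state()

    def step(self, action):
        self.t += 1
        return self._state(), -1.0, self.t >= self.episode_length


def run_episode(env, buffer, select_action, train_step, batch_size=8):
    """Run one episode and return the number of training updates performed."""
    state = env.reset()                                  # 1. observe
    done, updates = False, 0
    while not done:
        action = select_action(state)                    # 2. select an action
        next_state, reward, done = env.step(action)      # 3. collect feedback
        buffer.store(state, action, reward, next_state, done)  # 4. store
        if len(buffer) >= batch_size:                    # 5. train once enough data exists
            train_step(buffer.sample(batch_size))
            updates += 1
        state = next_state
    return updates


buffer = ReplayBuffer(capacity=1000)
updates = run_episode(
    StubEnv(episode_length=20),
    buffer,
    select_action=lambda state: random.randrange(4),  # random policy placeholder
    train_step=lambda batch: None,                    # placeholder for the DQN update
)
```

In a real run, `select_action` would be the epsilon-greedy policy and `train_step` would compute Bellman targets and update the Q-network.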

Technologies Used

  • Python
  • TensorFlow / Keras
  • Gymnasium
  • NumPy
  • Matplotlib

Results

Clear progress from unstable starts to controlled landings.

Performance Progression

  • Episode 1: -150
  • Episode 100: 50
  • Episode 300: 180
  • Episode 500: 230
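
Single-episode scores are noisy, so progress like this is often summarized with a moving average and a success rate. The sketch below shows one way to compute both; the helper names are hypothetical, the sample rewards are made up for illustration, and the 200-point threshold follows the mark that Gymnasium's documentation uses for a solved episode.

```python
def moving_average(values, window):
    """Average over the last `window` values (fewer at the start of the run)."""
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        averages.append(running / min(i + 1, window))
    return averages


def success_rate(rewards, threshold=200.0):
    """Fraction of episodes whose total reward reaches the threshold."""
    return sum(r >= threshold for r in rewards) / len(rewards)


# Made-up evaluation rewards, for illustration only.
sample_rewards = [250.0, 150.0, 210.0, 90.0]
smoothed = moving_average(sample_rewards, window=2)
rate = success_rate(sample_rewards)
```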

Why This Project Matters

This project goes beyond model training. It combines reinforcement learning concepts, disciplined implementation, measurable results, and a clear presentation that makes the work easy to review.

  • 20-30 min: end-to-end training time on a standard personal computer
  • Complete ML workflow: training, evaluation, checkpointing, and demo scripts in one workflow
  • Presentation-ready project: technical work presented through a polished, shareable project website

Technical Highlights

Key details that explain what the project includes.

Key Learnings

  • Reinforcement learning fundamentals and DQN training logic
  • Reusable modules, config presets, and CLI workflows
  • Evaluation scripts for demo and comparison runs
  • Presenting the system clearly is just as important as building it

Project Summary

This project implements a Deep Q-Network agent in Python and TensorFlow for LunarLander-v3, using experience replay, target network updates, configurable training presets, and evaluation scripts to achieve strong landing performance in a reproducible workflow.