Machine Learning Project

Deep reinforcement learning for autonomous lunar landing.

This project trains a Deep Q-Network on LunarLander-v3 to learn safe landing behavior through reward-driven interaction, balancing descent control, fuel efficiency, and stability.

Technical Highlights

Implemented a TensorFlow/Keras DQN agent with replay buffer and target network stabilization
Built configurable training workflows for experimentation, evaluation, and reproducibility
Packaged the work with scripts, saved weights, and a polished project website

Project Snapshot

500 training episodes

95% reported success rate

200+ reported average reward

4 environment actions

Core areas

Python
TensorFlow / Keras
Reinforcement Learning
Experiment Design
Model Evaluation
Technical Presentation

Project Goal

A compact reinforcement learning project with clear technical depth.

The goal is to train an agent that can make better landing decisions over time while providing a clean implementation of DQN concepts such as replay memory, exploration, and target network updates.

Technical Focus

DQN with stabilizing mechanisms and reusable tooling.

Experience replay, epsilon-greedy exploration, target network updates, and preset-based scripts make the implementation credible, explainable, and easy to demonstrate.
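As an illustration of the epsilon-greedy exploration mentioned above, here is a minimal sketch using only the standard library. The function names and the decay schedule (start at 1.0, multiply by 0.995 per episode, floor at 0.01) are assumptions for the example, not values taken from the project.

```python
import random


def epsilon_at(episode, start=1.0, end=0.01, decay=0.995):
    """Exponentially decayed exploration rate, floored at `end` (assumed schedule)."""
    return max(end, start * decay ** episode)


def select_action(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Early in training epsilon is close to 1, so the agent mostly explores; as it decays, the agent increasingly trusts its own Q-value estimates.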

Animated Demo

A browser replay of a successful landing sequence.

Replay View

This animated replay shows the lander correcting its angle, reducing velocity, and touching down inside the landing zone.

Replay Notes

Starts with a high descent rate and slight tilt away from center
Applies controlled corrections while converging toward the landing pad
Finishes with low vertical speed and aligned landing legs
Runs fully in the browser so the project page stays easy to share

Technical Breakdown

Designed to show implementation depth, not just a final score.

Agent Architecture

The Q-network uses a multilayer perceptron with hidden layers of 128, 128, and 64 units to estimate action values from the lander state.
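A sketch of how such a network might be defined. The hidden sizes come from the description above, and the 8-dimensional state and 4 actions match Gymnasium's LunarLander. The ReLU activations, the linear output layer, and all function names are assumptions. TensorFlow is imported lazily so the size helpers run without it.

```python
HIDDEN_UNITS = (128, 128, 64)  # hidden layer sizes from the description above


def layer_shapes(state_dim=8, n_actions=4, hidden=HIDDEN_UNITS):
    """Return (inputs, outputs) for each dense layer in the MLP."""
    sizes = (state_dim, *hidden, n_actions)
    return list(zip(sizes[:-1], sizes[1:]))


def count_parameters(shapes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)


def build_q_network(state_dim=8, n_actions=4, hidden=HIDDEN_UNITS):
    """Build a Keras MLP mapping a state to one Q-value per action.

    ReLU hidden activations and a linear output are assumptions.
    """
    import tensorflow as tf  # imported here so the helpers above need no TensorFlow

    layers = [tf.keras.Input(shape=(state_dim,))]
    layers += [tf.keras.layers.Dense(units, activation="relu") for units in hidden]
    layers.append(tf.keras.layers.Dense(n_actions, activation="linear"))
    return tf.keras.Sequential(layers)
```

Under these assumptions the network has 26,180 trainable parameters (1,152 + 16,512 + 8,256 + 260), small enough to train on a CPU.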

Training Pipeline

The training loop stores transitions, samples minibatches, applies Bellman updates, and periodically syncs a target network for more stable learning.
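The pieces of this pipeline can be sketched without any ML framework. The class and function names, the discount factor of 0.99, and the 1000-step sync interval below are assumptions for illustration. `target_q` stands in for the target network's prediction function.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)  # oldest transitions drop out first

    def store(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks up temporal correlation.
        return rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def bellman_targets(batch, target_q, gamma=0.99):
    """y = r for terminal transitions, else r + gamma * max over a' of Q_target(s', a')."""
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets


def should_sync_target(step, sync_every=1000):
    """True on the steps where the target network would copy the online weights."""
    return step > 0 and step % sync_every == 0
```

Computing the targets from a separate, slowly updated network keeps the regression target from shifting on every gradient step, which is the stabilizing effect the paragraph above refers to.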

Experiment Management

Centralized configuration presets make it easy to switch between faster iteration and longer, higher-quality training runs while keeping comparisons consistent.
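One way such presets could be organized is a frozen dataclass plus a small lookup table. The preset names ("quick", "full"), the field names, and every hyperparameter other than the episode count and hidden layer sizes are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainingConfig:
    episodes: int = 500              # episode count reported on this page
    hidden_units: tuple = (128, 128, 64)
    batch_size: int = 64             # assumed
    gamma: float = 0.99              # assumed
    learning_rate: float = 1e-3      # assumed
    target_sync_every: int = 1000    # assumed, in environment steps
    seed: int = 0                    # assumed


PRESETS = {
    "full": TrainingConfig(),
    "quick": replace(TrainingConfig(), episodes=100),  # shorter run for iteration
}


def load_preset(name, **overrides):
    """Look up a preset and apply per-run overrides without mutating it."""
    return replace(PRESETS[name], **overrides)
```

Because the presets differ only in the fields that are explicitly changed, two runs started from `load_preset("quick")` and `load_preset("full")` share every other setting, which keeps comparisons consistent.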

Core Workflow

Observe the lander state from the environment.
Select an action with epsilon-greedy exploration.
Collect reward and next-state feedback.
Store experience in replay memory.
Train on sampled transitions and update target weights.
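The workflow above can be sketched as a single episode loop. It assumes an environment that follows the Gymnasium API (`reset()` returns an observation and an info dict; `step()` returns observation, reward, terminated, truncated, info). `run_episode`, `select_action`, and `on_transition` are hypothetical names. The callback is where an agent would store the transition and run a training step.

```python
def run_episode(env, select_action, on_transition, max_steps=1000):
    """Run one episode against a Gymnasium-style env and return the total reward."""
    state, _info = env.reset()                       # observe the initial state
    total_reward = 0.0
    for _ in range(max_steps):
        action = select_action(state)                # e.g. epsilon-greedy
        next_state, reward, terminated, truncated, _info = env.step(action)
        # Pass `terminated` (not `truncated`) so a time-limit cutoff is not
        # treated as a true terminal state when bootstrapping.
        on_transition(state, action, reward, next_state, terminated)
        total_reward += reward
        state = next_state
        if terminated or truncated:
            break
    return total_reward
```

With Gymnasium and its Box2D extra installed, the environment would typically come from `gymnasium.make("LunarLander-v3")`. Any object with the same `reset` and `step` signatures works for testing.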

Technologies Used

Python
TensorFlow / Keras
Gymnasium
NumPy
Matplotlib

Results

Clear progress from unstable starts to controlled landings.

Performance Progression

Episode 1: -150
Episode 100
Episode 300: 180
Episode 500: 230
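This page does not say how the reported success rate and average reward are computed. One plausible approach, sketched below, treats an episode as a success when its total reward reaches 200, the score Gymnasium's documentation uses to call a LunarLander episode solved. The function name and the threshold default are assumptions.

```python
from statistics import mean


def summarize_returns(episode_returns, success_threshold=200.0):
    """Average reward and fraction of episodes at or above the threshold."""
    successes = sum(1 for r in episode_returns if r >= success_threshold)
    return {
        "average_reward": mean(episode_returns),
        "success_rate": successes / len(episode_returns),
    }
```

Running this over a batch of evaluation episodes, with exploration turned off, gives numbers comparable to the headline figures.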

Why This Project Matters

This project goes beyond model training. It combines reinforcement learning concepts, disciplined implementation, measurable results, and a clear presentation that makes the work easy to review.

20-30 min: End-to-end training time on a standard personal computer

Complete ML workflow: Training, evaluation, checkpointing, and demo scripts in one workflow

Presentation-ready project: Technical work presented through a polished, shareable project website

Technical Highlights

Key details that explain what the project includes.

Key Learnings

Reinforcement learning fundamentals and DQN training logic
Reusable modules, config presets, and CLI workflows
Evaluation scripts for demo and comparison runs
Clear presentation of the system, which matters as much as building it

Project Summary

This project implements a Deep Q-Network agent in Python and TensorFlow for LunarLander-v3. It uses experience replay, target network updates, configurable training presets, and evaluation scripts in a reproducible workflow, and reports a 95% success rate and a 200+ average reward after 500 training episodes.