A neural network-based safety shield framework for safe reinforcement learning in hazardous environments.
SafeRL-Shield implements a multi-class risk classification system that acts as a protective shield for Q-learning agents navigating environments with safety hazards. The project demonstrates how deep learning can proactively prevent reinforcement learning agents from taking dangerous actions while maintaining task performance.
### Multi-Class Risk Assessment
- 4-level danger classification (immediate hazard, 1 step away, 2 steps away, safe)
- Trained neural network predicts risk for each possible action
- Real-time action filtering based on configurable risk thresholds
### Adaptive Safety Shield
- Intervenes only when necessary to block dangerous actions
- Selects safest alternative while maximizing expected reward
- Tunable safety-performance trade-off via threshold parameter θ
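The shield's filtering rule can be sketched as follows. The function name, Q-value layout, and risk encoding (0 = immediate hazard, 3 = safe, block anything with risk ≤ θ) are assumptions for illustration; the notebook's interface may differ:

```python
import numpy as np

def shielded_action(q_values, risk_preds, theta):
    """Return the greedy action if its predicted risk class exceeds
    theta; otherwise intervene and pick the highest-Q action among
    the safe ones. If no action is safe, fall back to the least
    risky action."""
    safe = [a for a, r in enumerate(risk_preds) if r > theta]
    if safe:
        greedy = int(np.argmax(q_values))
        if greedy in safe:                # no intervention needed
            return greedy
        return max(safe, key=lambda a: q_values[a])
    return int(np.argmax(risk_preds))     # all unsafe: least risky

q = np.array([1.0, 0.2, -0.5, 0.8])   # illustrative Q-values
risks = [0, 2, 3, 1]                  # predicted risk class per action
print(shielded_action(q, risks, theta=0))  # 3: greedy action 0 is a hazard
```

With θ=0 only immediate hazards are blocked; raising θ to 2 also blocks cells within two steps of a hazard, which matches the heavier intervention counts reported below.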
### Comprehensive Evaluation
- Compares baseline unsafe Q-learning with shielded variants
- Tracks safety violations, interventions, and cumulative rewards
- Visualizes danger maps and confusion matrices for interpretability
- Zero-violation Learning: Shield with θ=0 achieves perfect safety (0 hazard collisions) while maintaining reward performance
- Minimal Intervention: only critical actions are overridden, guaranteeing safety with minimal loss of agent autonomy
- Stratified Training: 70/15/15 train-val-test split with class balancing ensures robust risk predictions
- End-to-End Pipeline: From environment setup to danger mapping, dataset generation, shield training, and safe RL integration
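The stratified 70/15/15 split can be sketched with plain NumPy. The function name and balanced toy labels are illustrative; the notebook may use a library utility instead:

```python
import numpy as np

def stratified_split(y, fracs=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index arrays: within each class,
    indices are shuffled and divided according to fracs, so every
    split preserves the class proportions of y."""
    rng = np.random.default_rng(seed)
    parts = [[], [], []]
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        cut1 = int(round(fracs[0] * len(idx)))
        cut2 = cut1 + int(round(fracs[1] * len(idx)))
        parts[0].append(idx[:cut1])
        parts[1].append(idx[cut1:cut2])
        parts[2].append(idx[cut2:])
    return [np.concatenate(p) for p in parts]

y = np.repeat([0, 1, 2, 3], 100)        # balanced toy labels, 4 risk classes
train, val, test = stratified_split(y)
print(len(train), len(val), len(test))  # 280 60 60
```

Splitting per class rather than globally is what keeps rare classes (e.g. immediate-hazard samples) represented in the validation and test sets.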
- Environment: Custom SafeGridWorld with 15 hazard cells in 10×10 grid
- RL Algorithm: Tabular Q-learning with ε-greedy exploration
- Safety Model: 2-layer fully-connected neural network (64 hidden units)
- Framework: TensorFlow/Keras for deep learning, NumPy for RL implementation
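Under the reading that "2-layer" means one 64-unit hidden layer plus a 4-way softmax output, the shield model might be defined as below. The 4-dimensional input (e.g. cell coordinates plus an action encoding) is an assumption; see the notebook for the actual features:

```python
import tensorflow as tf

# Fully-connected risk classifier: state/action features in,
# softmax over the 4 risk classes out.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),            # assumed feature vector
    tf.keras.layers.Dense(64, activation="relu"),  # 64 hidden units
    tf.keras.layers.Dense(4, activation="softmax"),  # 4 risk classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Integer risk labels (0–3) pair naturally with `sparse_categorical_crossentropy`, avoiding a one-hot encoding step.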
In 1000-episode training runs with random starts:
- Baseline Q-Learning: ~127 safety violations
- Shield θ=0: 0 violations, comparable reward, <50 interventions
- Shield θ=2: 0 violations, degraded reward, excessive interventions (>500)
The optimal configuration (θ=0) demonstrates that safety and performance are not mutually exclusive when intelligent shielding is applied.
    ├── sample.ipynb              # Main implementation notebook
    ├── data/
    │   ├── best_qtable.pkl       # Best Q-learning policy
    │   ├── completedataset.pkl   # Training/validation/test data
    │   ├── shield_model.h5       # Trained safety shield
    │   ├── shield_metrics.pkl    # Model evaluation metrics
    │   └── task5_results.pkl     # Safe RL experiment results
    └── README.md
Run the cells in `sample.ipynb` sequentially:
- Task 1: Initialize SafeGridWorld environment
- Task 2: Train baseline Q-learning agents
- Task 3: Generate danger maps and safety dataset
- Task 4: Train multi-class risk classifier (shield)
- Task 5: Integrate shield with Q-learning and evaluate
This framework can be extended to:
- Robotics navigation in dynamic environments
- Autonomous vehicle control with safety constraints
- Medical treatment optimization with risk bounds
- Industrial process control with failure prevention