Multi-Agent Constrained Policy Optimisation (MACPO; MAPPO-L).
Updated Apr 17, 2024 - Python
Implementation of a deep reinforcement learning algorithm, Proximal Policy Optimization (SOTA), on an OpenAI Gym environment with a continuous action space (Box2D/Car Racing v0)
Policy Optimization with Penalized Point Probability Distance: an Alternative to Proximal Policy Optimization
[AAAI 2026] D²PPO: Diffusion Policy Policy Optimization with Dispersive Loss.
Mirror Descent Policy Optimization
Model-based Policy Gradients
Codebase to fully reproduce the results of "No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO" (Moalla et al. 2024). Uses TorchRL and provides extensive tools for studying representation dynamics in policy optimization.
Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data"
This repository contains the code for the paper "Local policy search with Bayesian optimization".
Reinforcement Learning (RL)! This repository is your hands-on guide to implementing RL algorithms, from Markov Decision Processes (MDPs) to advanced methods like PPO and DDPG. Build smart agents, learn the math behind policies, and experiment with real-world applications!
CPPO: Contrastive Perception for Vision Language Policy Optimization
Code for Policy Optimization as Online Learning with Mediator Feedback
Code accompanying the NeurIPS 2025 paper "Sequential Monte Carlo for Policy Optimization in Continuous POMDPs".
An implementation of reinforcement learning for CartPole-v0 using policy optimization
This repo implements the REINFORCE algorithm for solving the Cart Pole V1 environment of the Gymnasium library using Python 3.8 and PyTorch 2.0.1.
This project implements a mini LLM alignment pipeline using Reinforcement Learning from Human Feedback (RLHF). It includes training a reward model from human-annotated preference data, fine-tuning the language model via policy optimization, and performing ablation studies to evaluate robustness, fairness, and alignment trade-offs.
Optimizing Household Waste Segregation in the Municipality of Bacolod, Lanao del Norte: An Agent-Based Modeling and Reinforcement Learning Approach
🛠️ Apply on-policy distillation to enhance Qwen3-0.6b's performance on GSM8K by learning from its own outputs, reducing bias during inference.
A collection of Jupyter notebooks implementing core reinforcement learning algorithms: Q-Learning, SARSA, and PPO.