Hierarchical Actor-Critic (HAC)
An advanced hierarchical reinforcement learning algorithm that extends the actor-critic framework with temporal abstraction and hierarchical structure.
Family: Hierarchical Reinforcement Learning | Status: Planned
Overview
Hierarchical Actor-Critic (HAC) is an advanced reinforcement learning algorithm that extends the
actor-critic framework with temporal abstraction and hierarchical structure. The algorithm operates at multiple levels: a high-level meta-policy that selects subgoals or options, and low-level policies that execute actions to achieve these subgoals.
This hierarchical approach enables the agent to solve complex, long-horizon tasks by breaking them down into manageable subproblems. The meta-policy learns to sequence subgoals effectively, while the low-level policies learn to achieve specific subgoals efficiently. HAC is particularly powerful in domains where tasks have natural hierarchical structure, such as robotics manipulation, navigation, and game playing.
Mathematical Formulation
Problem Definition
Given:
- State space: S
- Subgoal space: G
- Action space: A
- Meta-policy: π_meta(g_t|s_t)
- Low-level policy: π_low(a_t|s_t, g_t)
- Meta-critic: V_meta(s_t)
- Low-level critic: V_low(s_t, g_t)
- Reward function: R(s,a,s')
Find hierarchical actor-critic policies that maximize expected cumulative reward:
π_h(a_t|s_t) = ∑_{g_t} π_meta(g_t|s_t) · π_low(a_t|s_t, g_t)
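As a quick numeric check, the marginalization over subgoals can be computed directly. The distributions below are random placeholders standing in for learned policies, not output of any trained model:

```python
import numpy as np

# Hypothetical sizes: |G| = 4 subgoals, |A| = 3 primitive actions.
rng = np.random.default_rng(0)

# pi_meta(g | s): distribution over subgoals for the current state.
pi_meta = rng.dirichlet(np.ones(4))          # shape (4,)

# pi_low(a | s, g): one action distribution per subgoal.
pi_low = rng.dirichlet(np.ones(3), size=4)   # shape (4, 3)

# pi_h(a | s) = sum_g pi_meta(g | s) * pi_low(a | s, g)
pi_h = pi_meta @ pi_low                      # shape (3,)
```

Because each low-level row is a valid distribution and π_meta gives convex weights, the mixture π_h is itself a valid distribution over primitive actions.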
Key Properties
Hierarchical Policy Decomposition
π_h(a_t|s_t) = ∑_{g_t} π_meta(g_t|s_t) · π_low(a_t|s_t, g_t)
Hierarchical policy decomposes into meta and low-level components
Hierarchical Value Function
V_h(s_t) = E_{g_t ~ π_meta}[V_low(s_t, g_t)]
Value function decomposes into meta and low-level components
Hierarchical Advantage Function
A_h(s_t, g_t, a_t) = Q_h(s_t, g_t, a_t) - V_h(s_t)
Advantage function for hierarchical policy updates
Key Properties
- Temporal Abstraction: High-level policies operate over longer time horizons
- Subgoal Decomposition: Complex tasks are broken into manageable subproblems
- Hierarchical Learning: Policies and critics at different levels learn simultaneously
- Transfer Learning: Low-level policies can be reused across different tasks
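The temporal-abstraction property can be sketched as a two-level control loop on a toy 1-D chain: the meta level proposes a subgoal position every few steps, and the low level takes primitive steps toward it. The environment, hand-coded policies, and horizon below are illustrative stand-ins, not part of the algokit implementation:

```python
META_HORIZON = 5   # low-level steps allotted per subgoal (temporal abstraction)
GOAL = 10          # final task: reach position 10 starting from 0

def meta_policy(state):
    # High level: propose a subgoal a few positions ahead of the agent.
    return min(state + 3, GOAL)

def low_policy(state, subgoal):
    # Low level: primitive action +1 / -1 toward the current subgoal.
    return 1 if subgoal > state else -1

def run_episode(max_steps=50):
    state, steps = 0, 0
    while state != GOAL and steps < max_steps:
        subgoal = meta_policy(state)              # meta decision (long horizon)
        for _ in range(META_HORIZON):             # low-level rollout (short horizon)
            state += low_policy(state, subgoal)
            steps += 1
            if state == subgoal or state == GOAL or steps >= max_steps:
                break
    return state, steps

final_state, steps = run_episode()  # -> (10, 10): solved as a chain of subgoals
```

The long-horizon task is never solved in one shot; it is completed as a sequence of short subgoal episodes, which is exactly the decomposition the meta-policy is trained to produce.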
Implementation Approaches
Standard HAC implementation with meta and low-level actor-critic networks
Complexity:
- Time: O(batch_size × (meta_actor_params + meta_critic_params + low_actor_params + low_critic_params))
- Space: O(batch_size × (state_size + subgoal_size))
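The parameter terms in the time complexity can be made concrete with a back-of-the-envelope count for the four fully connected networks. The layer sizes below are hypothetical and not taken from the algokit implementation:

```python
def mlp_params(layer_sizes):
    # Fully connected network: weight matrix plus bias vector per layer pair.
    return sum(m * n + n for m, n in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical dimensions for state, subgoal, action, and hidden layers.
state, subgoal, action, hidden = 32, 8, 4, 64

meta_actor = mlp_params([state, hidden, hidden, subgoal])            # pi_meta(g|s)
meta_critic = mlp_params([state, hidden, hidden, 1])                 # V_meta(s)
low_actor = mlp_params([state + subgoal, hidden, hidden, action])    # pi_low(a|s,g)
low_critic = mlp_params([state + subgoal, hidden, hidden, 1])        # V_low(s,g)

total = meta_actor + meta_critic + low_actor + low_critic
```

Each gradient step touches every parameter of all four networks once per batch element, which is where the O(batch_size × total_params) time term comes from.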
Advantages
- Combines the benefits of actor-critic methods with hierarchical structure
- Temporal abstraction enables learning at different time scales
- Subgoal decomposition makes complex tasks manageable
- Value functions reduce variance compared to pure policy gradient methods
Disadvantages
- Requires careful coordination between multiple networks
- Subgoal achievement detection can be challenging
- Four networks increase complexity and training time
- Hyperparameter tuning becomes more complex
Complete Implementation
The full implementation with error handling, comprehensive testing, and additional variants is available in the source code:
- Main implementation with meta and low-level actor-critic networks: src/algokit/hierarchical_rl/hierarchical_actor_critic.py
- Comprehensive test suite including convergence tests: tests/unit/hierarchical_rl/test_hierarchical_actor_critic.py
Complexity Analysis
Time & Space Complexity Comparison
Approach | Time Complexity | Space Complexity | Notes
---|---|---|---
Basic HAC | O(batch_size × (meta_actor_params + meta_critic_params + low_actor_params + low_critic_params)) | O(batch_size × (state_size + subgoal_size)) | Four-network architecture requires careful coordination during training
Use Cases & Applications
Application Categories
Robotics and Control
- Robot Manipulation: Complex manipulation tasks with hierarchical subgoals
- Autonomous Navigation: Multi-level navigation with waypoint subgoals
- Industrial Automation: Process control with hierarchical objectives
- Swarm Robotics: Coordinated behavior with hierarchical task decomposition
Game AI and Strategy
- Strategy Games: Multi-level decision making with tactical and strategic goals
- Puzzle Games: Complex puzzles broken into simpler subproblems
- Adventure Games: Quest completion with hierarchical objectives
- Simulation Games: Resource management with hierarchical planning
Real-World Applications
- Autonomous Vehicles: Multi-level driving with navigation and control subgoals
- Healthcare: Treatment planning with hierarchical medical objectives
- Finance: Portfolio management with hierarchical investment strategies
- Network Control: Traffic management with hierarchical routing policies
Educational Value
- Hierarchical Learning: Understanding multi-level decision making
- Actor-Critic Methods: Learning value functions and policies simultaneously
- Temporal Abstraction: Understanding different time scales in learning
- Transfer Learning: Learning reusable skills across different tasks
Interactive Learning
Try implementing the different approaches yourself! This progression will give you deep insight into the algorithm's principles and applications.
Pro Tip: Start with the simplest implementation and gradually work your way up to more complex variants.
Navigation
Related Algorithms in Hierarchical Reinforcement Learning:
- Hierarchical Q-Learning - Extends traditional Q-Learning to handle temporal abstraction and hierarchical task decomposition with multi-level Q-functions.
- Hierarchical Task Networks (HTNs) - A hierarchical reinforcement learning approach that decomposes complex tasks into hierarchical structures of subtasks for planning and execution.
- Option-Critic - A hierarchical reinforcement learning algorithm that learns options (temporally extended actions) end-to-end using policy gradient methods.
- Hierarchical Policy Gradient - Extends traditional policy gradient methods to handle temporal abstraction and hierarchical task decomposition with multi-level policies.
- Feudal Networks (FuN) - A hierarchical reinforcement learning algorithm that implements a manager-worker architecture for temporal abstraction and goal-based learning.