Hierarchical Actor-Critic (HAC)
An advanced hierarchical reinforcement learning algorithm that extends the actor-critic framework with temporal abstraction and hierarchical structure.
Family: Hierarchical Reinforcement Learning | Status: Planned
Overview
Hierarchical Actor-Critic (HAC) is an advanced reinforcement learning algorithm that extends the
actor-critic framework with temporal abstraction and hierarchical structure. The algorithm operates at multiple levels: a high-level meta-policy that selects subgoals or options, and low-level policies that execute actions to achieve these subgoals.
This hierarchical approach enables the agent to solve complex, long-horizon tasks by breaking them down into manageable subproblems. The meta-policy learns to sequence subgoals effectively, while the low-level policies learn to achieve specific subgoals efficiently. HAC is particularly powerful in domains where tasks have natural hierarchical structure, such as robotics manipulation, navigation, and game playing.
Mathematical Formulation
Problem Definition
Given:
- State space: S
- Subgoal space: G
- Action space: A
- Meta-policy: π_meta(g_t|s_t)
- Low-level policy: π_low(a_t|s_t, g_t)
- Meta-critic: V_meta(s_t)
- Low-level critic: V_low(s_t, g_t)
- Reward function: R(s,a,s')
Find hierarchical actor-critic policies that maximize expected cumulative reward:
π_h(a_t|s_t) = ∑_{g_t} π_meta(g_t|s_t) · π_low(a_t|s_t, g_t)
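As a quick numeric check, the marginalization over subgoals can be computed directly. The distributions below are random placeholders standing in for learned policies, not output of any trained model:

```python
import numpy as np

# Hypothetical sizes: |G| = 4 subgoals, |A| = 3 primitive actions.
rng = np.random.default_rng(0)

# pi_meta(g | s): distribution over subgoals for the current state.
pi_meta = rng.dirichlet(np.ones(4))          # shape (4,)

# pi_low(a | s, g): one action distribution per subgoal.
pi_low = rng.dirichlet(np.ones(3), size=4)   # shape (4, 3)

# pi_h(a | s) = sum_g pi_meta(g | s) * pi_low(a | s, g)
pi_h = pi_meta @ pi_low                      # shape (3,)
```

Because each low-level row is a valid distribution and π_meta gives convex weights, the mixture π_h is itself a valid distribution over primitive actions.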
Key Properties
Hierarchical Policy Decomposition
π_h(a_t|s_t) = ∑_{g_t} π_meta(g_t|s_t) · π_low(a_t|s_t, g_t)
Hierarchical policy decomposes into meta and low-level components
Hierarchical Value Function
V_h(s_t) = E_{g_t ~ π_meta}[V_low(s_t, g_t)]
Value function decomposes into meta and low-level components
Hierarchical Advantage Function
A_h(s_t, g_t, a_t) = Q_h(s_t, g_t, a_t) - V_h(s_t)
Advantage function for hierarchical policy updates
Key Properties
- Temporal Abstraction: High-level policies operate over longer time horizons
- Subgoal Decomposition: Complex tasks are broken into manageable subproblems
- Hierarchical Learning: Policies and critics at different levels learn simultaneously
- Transfer Learning: Low-level policies can be reused across different tasks
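The temporal-abstraction property can be sketched as a two-level control loop on a toy 1-D chain: the meta level proposes a subgoal position every few steps, and the low level takes primitive steps toward it. The environment, hand-coded policies, and horizon below are illustrative stand-ins, not part of the algokit implementation:

```python
META_HORIZON = 5   # low-level steps allotted per subgoal (temporal abstraction)
GOAL = 10          # final task: reach position 10 starting from 0

def meta_policy(state):
    # High level: propose a subgoal a few positions ahead of the agent.
    return min(state + 3, GOAL)

def low_policy(state, subgoal):
    # Low level: primitive action +1 / -1 toward the current subgoal.
    return 1 if subgoal > state else -1

def run_episode(max_steps=50):
    state, steps = 0, 0
    while state != GOAL and steps < max_steps:
        subgoal = meta_policy(state)              # meta decision (long horizon)
        for _ in range(META_HORIZON):             # low-level rollout (short horizon)
            state += low_policy(state, subgoal)
            steps += 1
            if state == subgoal or state == GOAL or steps >= max_steps:
                break
    return state, steps

final_state, steps = run_episode()  # -> (10, 10): solved as a chain of subgoals
```

The long-horizon task is never solved in one shot; it is completed as a sequence of short subgoal episodes, which is exactly the decomposition the meta-policy is trained to produce.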
Implementation Approaches
Standard HAC implementation with meta and low-level actor-critic networks
Complexity:
- Time: O(batch_size × (meta_actor_params + meta_critic_params + low_actor_params + low_critic_params))
- Space: O(batch_size × (state_size + subgoal_size))
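The parameter terms in the time complexity can be made concrete with a back-of-the-envelope count for the four fully connected networks. The layer sizes below are hypothetical and not taken from the algokit implementation:

```python
def mlp_params(layer_sizes):
    # Fully connected network: weight matrix plus bias vector per layer pair.
    return sum(m * n + n for m, n in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical dimensions for state, subgoal, action, and hidden layers.
state, subgoal, action, hidden = 32, 8, 4, 64

meta_actor = mlp_params([state, hidden, hidden, subgoal])            # pi_meta(g|s)
meta_critic = mlp_params([state, hidden, hidden, 1])                 # V_meta(s)
low_actor = mlp_params([state + subgoal, hidden, hidden, action])    # pi_low(a|s,g)
low_critic = mlp_params([state + subgoal, hidden, hidden, 1])        # V_low(s,g)

total = meta_actor + meta_critic + low_actor + low_critic
```

Each gradient step touches every parameter of all four networks once per batch element, which is where the O(batch_size × total_params) time term comes from.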
Advantages
- Combines the benefits of actor-critic methods with hierarchical structure
- Temporal abstraction enables learning at different time scales
- Subgoal decomposition makes complex tasks manageable
- Value functions reduce variance compared to pure policy gradient methods
Disadvantages
- Requires careful coordination between multiple networks
- Subgoal achievement detection can be challenging
- Four networks increase complexity and training time
- Hyperparameter tuning becomes more complex
Complete Implementation
The full implementation with error handling, comprehensive testing, and additional variants is available in the source code:
- Main implementation with meta and low-level actor-critic networks: src/algokit/hierarchical_rl/hierarchical_actor_critic.py
- Comprehensive test suite including convergence tests: tests/unit/hierarchical_rl/test_hierarchical_actor_critic.py
Complexity Analysis
Time & Space Complexity Comparison
Approach | Time Complexity | Space Complexity | Notes
---|---|---|---
Basic HAC | O(batch_size × (meta_actor_params + meta_critic_params + low_actor_params + low_critic_params)) | O(batch_size × (state_size + subgoal_size)) | Four-network architecture requires careful coordination during training
Use Cases & Applications
Application Categories
Robotics and Control
- Robot Manipulation: Complex manipulation tasks with hierarchical subgoals
- Autonomous Navigation: Multi-level navigation with waypoint subgoals
- Industrial Automation: Process control with hierarchical objectives
- Swarm Robotics: Coordinated behavior with hierarchical task decomposition
Game AI and Strategy
- Strategy Games: Multi-level decision making with tactical and strategic goals
- Puzzle Games: Complex puzzles broken into simpler subproblems
- Adventure Games: Quest completion with hierarchical objectives
- Simulation Games: Resource management with hierarchical planning
Real-World Applications
- Autonomous Vehicles: Multi-level driving with navigation and control subgoals
- Healthcare: Treatment planning with hierarchical medical objectives
- Finance: Portfolio management with hierarchical investment strategies
- Network Control: Traffic management with hierarchical routing policies
Educational Value
- Hierarchical Learning: Understanding multi-level decision making
- Actor-Critic Methods: Learning value functions and policies simultaneously
- Temporal Abstraction: Understanding different time scales in learning
- Transfer Learning: Learning reusable skills across different tasks
Interactive Learning
Try implementing the different approaches yourself! This progression will give you deep insight into the algorithm's principles and applications.
Pro Tip: Start with the simplest implementation and gradually work your way up to more complex variants.
Navigation
Related Algorithms in Hierarchical Reinforcement Learning:
- Hierarchical Q-Learning - Extends traditional Q-Learning to handle temporal abstraction and hierarchical task decomposition with multi-level Q-functions.
- Hierarchical Task Networks (HTNs) - A hierarchical reinforcement learning approach that decomposes complex tasks into hierarchical structures of subtasks for planning and execution.
- Option-Critic - A hierarchical reinforcement learning algorithm that learns options (temporally extended actions) end-to-end using policy gradient methods.
- Hierarchical Policy Gradient - Extends traditional policy gradient methods to handle temporal abstraction and hierarchical task decomposition with multi-level policies.
- Feudal Networks (FuN) - A hierarchical reinforcement learning algorithm that implements a manager-worker architecture for temporal abstraction and goal-based learning.