
Hierarchical Actor-Critic (HAC)

An advanced hierarchical reinforcement learning algorithm that extends the actor-critic framework with temporal abstraction and hierarchical structure.

Family: Hierarchical Reinforcement Learning · Status: 📋 Planned


Overview

Hierarchical Actor-Critic (HAC) is an advanced reinforcement learning algorithm that extends the actor-critic framework with temporal abstraction and hierarchical structure. The algorithm operates at multiple levels: a high-level meta-policy that selects subgoals or options, and low-level policies that execute actions to achieve these subgoals.

This hierarchical approach enables the agent to solve complex, long-horizon tasks by breaking them down into manageable subproblems. The meta-policy learns to sequence subgoals effectively, while the low-level policies learn to achieve specific subgoals efficiently. HAC is particularly powerful in domains where tasks have natural hierarchical structure, such as robotics manipulation, navigation, and game playing.
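The interaction between the two levels can be pictured as a nested control loop: the meta-policy proposes a subgoal, and the low-level policy acts for up to a fixed horizon to reach it. A minimal sketch, assuming hypothetical `meta_policy`, `low_policy`, and `subgoal_reached` callables and a Gym-style `reset`/`step` environment; this illustrates the control flow, not any particular HAC implementation:

```python
# Illustrative two-level rollout; all callables here are hypothetical stand-ins.
def hierarchical_rollout(env, meta_policy, low_policy, subgoal_reached,
                         max_steps=500, horizon=25):
    """One episode: the meta-policy picks subgoals; the low-level policy
    acts until the subgoal is reached or its horizon expires."""
    state = env.reset()
    total_reward, step, done = 0.0, 0, False
    while step < max_steps and not done:
        goal = meta_policy(state)            # high-level decision (subgoal)
        for _ in range(horizon):             # low-level sub-episode
            action = low_policy(state, goal)
            state, reward, done = env.step(action)  # assumed 3-tuple interface
            total_reward += reward
            step += 1
            if done or subgoal_reached(state, goal) or step >= max_steps:
                break
    return total_reward
```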

Mathematical Formulation


Problem Definition

Given:

  • State space: S
  • Subgoal space: G
  • Action space: A
  • Meta-policy: π_meta(g_t|s_t)
  • Low-level policy: π_low(a_t|s_t, g_t)
  • Meta-critic: V_meta(s_t)
  • Low-level critic: V_low(s_t, g_t)
  • Reward function: R(s,a,s')

Find hierarchical actor-critic policies that maximize expected cumulative reward under the induced flat policy:

π_h(a_t|s_t) = ∑_{g_t} π_meta(g_t|s_t) · π_low(a_t|s_t, g_t)
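For a discrete subgoal space, this marginalization is a single matrix-vector product. A minimal NumPy sketch with made-up sizes and probabilities (four subgoals, three actions; all values are illustrative assumptions):

```python
import numpy as np

pi_meta = np.array([0.5, 0.2, 0.2, 0.1])   # pi_meta(g | s), sums to 1
pi_low = np.array([                        # pi_low(a | s, g), one row per subgoal
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
])

# pi_h(a | s) = sum_g pi_meta(g | s) * pi_low(a | s, g)
pi_h = pi_meta @ pi_low
print(pi_h, pi_h.sum())                    # a valid action distribution (sums to 1)
```

Because each row of `pi_low` is a distribution and `pi_meta` is a distribution over rows, the mixture `pi_h` is itself a valid distribution over actions.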

Key Equations

Hierarchical Policy Decomposition

π_h(a_t|s_t) = ∑_{g_t} π_meta(g_t|s_t) · π_low(a_t|s_t, g_t)

Hierarchical policy decomposes into meta and low-level components


Hierarchical Value Function

V_h(s_t) = E_{g_t ~ π_meta}[V_low(s_t, g_t)]

Value function decomposes into meta and low-level components


Hierarchical Advantage Function

A_h(s_t, g_t, a_t) = Q_h(s_t, g_t, a_t) - V_h(s_t)

Advantage function for hierarchical policy updates
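Continuing the discrete example above, both quantities are direct expectations; the `V_low` and `Q_h` numbers below are made up purely for illustration:

```python
import numpy as np

pi_meta = np.array([0.5, 0.2, 0.2, 0.1])   # pi_meta(g | s)
v_low = np.array([1.0, 2.5, 0.5, 3.0])     # V_low(s, g), one entry per subgoal

# V_h(s) = E_{g ~ pi_meta}[V_low(s, g)]
v_h = pi_meta @ v_low        # 0.5*1.0 + 0.2*2.5 + 0.2*0.5 + 0.1*3.0 = 1.4

# A_h(s, g, a) = Q_h(s, g, a) - V_h(s), for one hypothetical (g, a) pair
q_h = 2.0
a_h = q_h - v_h              # 2.0 - 1.4 = 0.6
print(v_h, a_h)
```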


Key Properties


  • Temporal Abstraction: High-level policies operate over longer time horizons

  • Subgoal Decomposition: Complex tasks are broken into manageable subproblems

  • Hierarchical Learning: Policies and critics at different levels learn simultaneously

  • Transfer Learning: Low-level policies can be reused across different tasks (see the sketch below)
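One way to picture the transfer-learning property: freeze the trained low-level policy and train only a fresh meta-policy on the next task. A sketch under that assumption, where every object and function name is a hypothetical placeholder:

```python
# Hypothetical transfer setup: reuse trained low-level skills on a new task.
def transfer_to_new_task(new_env, pretrained_low_policy, make_meta_policy,
                         train_meta_policy):
    """Keep low-level skills fixed; learn only the subgoal-selection policy."""
    pretrained_low_policy.freeze()           # no further updates to the skills
    meta_policy = make_meta_policy(new_env)  # fresh high-level policy
    train_meta_policy(new_env, meta_policy, pretrained_low_policy)
    return meta_policy
```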

Implementation Approaches


Standard HAC implementation with meta- and low-level actor-critic networks; an architecture sketch follows the complexity summary below.

Complexity:

  • Time: O(batch_size × (meta_actor_params + meta_critic_params + low_actor_params + low_critic_params))
  • Space: O(batch_size × (state_size + subgoal_size))
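A minimal sketch of the four-network architecture in PyTorch. Layer sizes, the MLP structure, and the exact pairing of actors and critics are assumptions made for illustration; this is not the reference HAC implementation:

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=64):
    """Small two-layer network used for every component below."""
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

class HACNetworks(nn.Module):
    """Four networks: meta actor/critic over subgoals, low actor/critic over actions."""
    def __init__(self, state_dim, subgoal_dim, action_dim):
        super().__init__()
        self.meta_actor = mlp(state_dim, subgoal_dim)               # logits for pi_meta(g | s)
        self.meta_critic = mlp(state_dim, 1)                        # V_meta(s)
        self.low_actor = mlp(state_dim + subgoal_dim, action_dim)   # logits for pi_low(a | s, g)
        self.low_critic = mlp(state_dim + subgoal_dim, 1)           # V_low(s, g)

    def forward(self, state, subgoal):
        sg = torch.cat([state, subgoal], dim=-1)
        return (self.meta_actor(state), self.meta_critic(state),
                self.low_actor(sg), self.low_critic(sg))

# Usage with made-up dimensions:
nets = HACNetworks(state_dim=8, subgoal_dim=3, action_dim=2)
s, g = torch.randn(4, 8), torch.randn(4, 3)
meta_logits, v_meta, low_logits, v_low = nets(s, g)
```

Conditioning the low-level networks on the concatenated state and subgoal is what lets a single low-level policy serve every subgoal the meta-policy proposes.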

Advantages

  • Combines benefits of actor-critic methods with hierarchical structure

  • Temporal abstraction enables learning at different time scales

  • Subgoal decomposition makes complex tasks manageable

  • Value functions reduce variance compared to pure policy gradient methods

Disadvantages

  • Requires careful coordination between multiple networks

  • Subgoal achievement detection can be challenging

  • Four networks increase complexity and training time

  • Hyperparameter tuning becomes more complex

Complete Implementation

The full implementation with error handling, comprehensive testing, and additional variants is available in the source code.

Complexity Analysis


Time & Space Complexity Comparison

| Approach | Time Complexity | Space Complexity | Notes |
|----------|-----------------|------------------|-------|
| Basic HAC | O(batch_size × (meta_actor_params + meta_critic_params + low_actor_params + low_critic_params)) | O(batch_size × (state_size + subgoal_size)) | Four-network architecture requires careful coordination and training |
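To make the time-complexity expression concrete, here is a back-of-the-envelope count with entirely made-up parameter and batch sizes:

```python
# Hypothetical sizes chosen only to illustrate the O(...) expression above.
meta_actor_params, meta_critic_params = 50_000, 40_000
low_actor_params, low_critic_params = 60_000, 45_000
batch_size = 256

total_params = (meta_actor_params + meta_critic_params
                + low_actor_params + low_critic_params)
ops_per_update = batch_size * total_params   # cost scales with batch * params
print(f"~{ops_per_update:,} multiply-accumulates per update (rough order)")
```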

Use Cases & Applications

๐ŸŒ Ask ChatGPT about Applications

Application Categories

Robotics and Control

  • Robot Manipulation: Complex manipulation tasks with hierarchical subgoals

  • Autonomous Navigation: Multi-level navigation with waypoint subgoals

  • Industrial Automation: Process control with hierarchical objectives

  • Swarm Robotics: Coordinated behavior with hierarchical task decomposition

Game AI and Strategy

  • Strategy Games: Multi-level decision making with tactical and strategic goals

  • Puzzle Games: Complex puzzles broken into simpler subproblems

  • Adventure Games: Quest completion with hierarchical objectives

  • Simulation Games: Resource management with hierarchical planning

Real-World Applications

  • Autonomous Vehicles: Multi-level driving with navigation and control subgoals

  • Healthcare: Treatment planning with hierarchical medical objectives

  • Finance: Portfolio management with hierarchical investment strategies

  • Network Control: Traffic management with hierarchical routing policies

Educational Value

  • Hierarchical Learning: Understanding multi-level decision making

  • Actor-Critic Methods: Learning value functions and policies simultaneously

  • Temporal Abstraction: Understanding different time scales in learning

  • Transfer Learning: Learning reusable skills across different tasks



Interactive Learning

Try implementing the different approaches yourself! This progression will give you deep insight into the algorithm's principles and applications.

Pro Tip: Start with the simplest implementation and gradually work your way up to more complex variants.

Related Algorithms in Hierarchical Reinforcement Learning:

  • Hierarchical Q-Learning - Extends traditional Q-Learning to handle temporal abstraction and hierarchical task decomposition with multi-level Q-functions.

  • Hierarchical Task Networks (HTNs) - A hierarchical reinforcement learning approach that decomposes complex tasks into hierarchical structures of subtasks for planning and execution.

  • Option-Critic - A hierarchical reinforcement learning algorithm that learns options (temporally extended actions) end-to-end using policy gradient methods.

  • Hierarchical Policy Gradient - Extends traditional policy gradient methods to handle temporal abstraction and hierarchical task decomposition with multi-level policies.

  • Feudal Networks (FuN) - A hierarchical reinforcement learning algorithm that implements a manager-worker architecture for temporal abstraction and goal-based learning.