
Hierarchical Reinforcement Learning Algorithms

Hierarchical Reinforcement Learning decomposes complex tasks into simpler subtasks using temporal abstraction and multi-level decision making.

Hierarchical Reinforcement Learning (HRL) extends traditional reinforcement learning by decomposing complex tasks into simpler subtasks or options. This hierarchical structure enables agents to learn more efficiently by reusing learned skills and operating at multiple levels of abstraction, from high-level strategic planning to low-level control execution.

Unlike traditional RL, where agents learn flat policies, HRL introduces temporal abstraction through options: temporally extended actions that can be executed over multiple time steps. This allows agents to operate at different time scales and reuse learned behaviors across tasks, making HRL particularly powerful for complex, long-horizon problems.
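
To make the idea concrete, the sketch below shows how a single high-level choice can unfold over many primitive time steps. It is a minimal illustration only, assuming a classic Gym-style env.step/env.reset interface and a hypothetical option object whose policy and termination callables would be learned or hand-designed in practice.

```python
import random

def run_option(env, state, option, gamma=0.99, max_steps=100):
    """Execute one temporally extended action (option) until its termination
    condition fires. Returns the discounted reward collected inside the option,
    the resulting state, the number of primitive steps taken, and the done flag."""
    total_reward, discount, steps, done = 0.0, 1.0, 0, False
    while steps < max_steps and not done:
        action = option.policy(state)               # low-level control
        state, reward, done, _ = env.step(action)   # one primitive step
        total_reward += discount * reward
        discount *= gamma
        steps += 1
        if random.random() < option.termination(state):  # beta(s): stop here?
            break
    return total_reward, state, steps, done

def hierarchical_episode(env, options, high_level_policy):
    """The high-level policy picks among options; each choice unfolds over
    many primitive time steps before control returns to the high level."""
    state, done = env.reset(), False
    while not done:
        option = high_level_policy(state, options)   # strategic decision
        _, state, _, done = run_option(env, state, option)
```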

Overview

Key Characteristics:

  • Temporal Abstraction: Actions that operate over different time scales, from high-level strategy to low-level control

  • Skill Reuse: Learned behaviors that can be applied to new tasks and environments

  • Multi-level Decision Making: Hierarchical structure from strategy to control, with different abstraction levels

  • Option Policies: Temporally extended actions over multiple time steps, with initiation and termination conditions

Common Applications:

  • manipulation tasks

  • navigation

  • autonomous vehicles

  • humanoid robots

  • real-time strategy games

  • chess

  • poker

  • multi-player games

  • swarm robotics

  • distributed systems

  • cooperative games

  • industrial automation

  • smart grids

  • traffic management

  • dialogue systems

  • text generation

  • language understanding

  • object recognition

  • scene understanding

  • visual navigation

Key Concepts

  • Options: Temporally extended actions that can be executed over multiple time steps

  • Hierarchy: Multiple levels of decision-making, from high-level strategy to low-level control

  • Skill Reuse: Learned behaviors that can be applied to new tasks and environments

  • Temporal Abstraction: Actions that operate over different time scales

  • Subgoal Decomposition: Breaking complex tasks into manageable subtasks

  • Policy Hierarchies: Layered policies operating at different abstraction levels

  • Initiation Set: States where an option can be started

  • Termination Function: The probability of an option terminating in each state
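
In the standard options formulation, the last two concepts and the intra-option policy are bundled into a single object; the notation below simply restates the definitions above in compact form.

```latex
% An option o over the underlying MDP (standard options formulation):
%   I_o     : initiation set (states where o may start)
%   \pi_o   : intra-option policy followed while o executes
%   \beta_o : termination function (probability of stopping in each state)
o = \langle I_o,\ \pi_o,\ \beta_o \rangle, \qquad
I_o \subseteq \mathcal{S}, \qquad
\pi_o : \mathcal{S} \times \mathcal{A} \to [0,1], \qquad
\beta_o : \mathcal{S} \to [0,1]
```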

Complexity Analysis

Complexity Overview

Time: O(n²) to O(n³)
Space: O(n) to O(n²)

Complexity depends on hierarchy depth, option complexity, and state-action space size

Types of Hierarchies

Temporal Hierarchy:

  • High-level: Strategic decisions and goal setting
  • Mid-level: Skill selection and coordination
  • Low-level: Primitive actions and control

Spatial Hierarchy:

  • Global: Environment-wide planning
  • Regional: Local area navigation
  • Local: Immediate obstacle avoidance

Functional Hierarchy:

  • Planning: Long-term strategy
  • Navigation: Path finding and movement
  • Control: Actuator commands
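
As a minimal illustration of such a temporal hierarchy, the Python sketch below nests a low-level control loop inside a high-level loop that re-plans every k primitive steps; choose_subgoal, low_level_policy, and subgoal_reached are hypothetical placeholders for components that would be learned or hand-designed in practice.

```python
def run_two_level_controller(env, choose_subgoal, low_level_policy,
                             subgoal_reached, k=10):
    """High level: pick a subgoal every k primitive steps (strategy / goal setting).
    Low level: emit primitive actions toward the current subgoal (control)."""
    state, done = env.reset(), False
    while not done:
        subgoal = choose_subgoal(state)                  # high-level decision
        for _ in range(k):                               # low-level horizon
            action = low_level_policy(state, subgoal)    # primitive action
            state, reward, done, _ = env.step(action)
            if done or subgoal_reached(state, subgoal):
                break                                    # hand control back up
```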

Hierarchy Construction Methods

Predefined Hierarchies:

  • Human-designed task decomposition
  • Fixed skill libraries
  • Structured learning objectives

Learned Hierarchies:

  • Automatic task decomposition
  • Dynamic skill discovery
  • Adaptive abstraction levels

Hybrid Approaches:

  • Combine predefined and learned components
  • Incremental hierarchy construction
  • Skill refinement over time

Option Components

  1. Initiation Set: States where the option can be started
  2. Policy: How to behave while the option is executing
  3. Termination Function: When to stop executing the option
  4. Reward Function: How rewards are distributed during option execution

Option Properties:

  • Temporal Abstraction: Options can last multiple time steps

  • Reusability: The same option can be used in different contexts

  • Composability: Options can be combined to form complex behaviors
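
A minimal data structure mirroring the four components listed above might look like the following sketch; the callables are hypothetical placeholders and would be learned or specified per task in practice.

```python
from dataclasses import dataclass
from typing import Any, Callable

State = Any   # placeholder for the environment's state type
Action = Any  # placeholder for the environment's action type

@dataclass
class Option:
    """One option bundling the four components listed above."""
    initiation: Callable[[State], bool]               # 1. can the option start in this state?
    policy: Callable[[State], Action]                 # 2. behaviour while the option executes
    termination: Callable[[State], float]             # 3. probability of stopping in each state
    reward: Callable[[State, Action, State], float]   # 4. (pseudo-)reward during execution

    def can_start(self, state: State) -> bool:
        return self.initiation(state)
```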

Comparison Table

| Algorithm | Status | Time Complexity | Space Complexity | Difficulty | Applications |
|---|---|---|---|---|---|
| Hierarchical Q-Learning | šŸ“‹ Planned | O(\|S\| Ɨ \|G\| …) | | | |
| Hierarchical Task Networks (HTNs) | šŸ“‹ Planned | O(\|T\| Ɨ \|S\| …) | | | |
| Option-Critic | šŸ“‹ Planned | O(batch_size Ɨ (option_policy_params + option_selection_params + termination_params)) | O(batch_size Ɨ (state_size + option_size)) | Medium | Robotics and Control, Game AI and Strategy |
| Hierarchical Actor-Critic (HAC) | šŸ“‹ Planned | O(batch_size Ɨ (meta_actor_params + meta_critic_params + low_actor_params + low_critic_params)) | O(batch_size Ɨ (state_size + subgoal_size)) | Medium | Robotics and Control, Game AI and Strategy |
| Hierarchical Policy Gradient | šŸ“‹ Planned | O(batch_size Ɨ (meta_params + low_params)) | O(batch_size Ɨ (state_size + subgoal_size)) | Medium | Robotics and Control, Game AI and Strategy |
| Feudal Networks (FuN) | šŸ“‹ Planned | O(batch_size Ɨ (manager_params + worker_params)) | O(batch_size Ɨ (state_size + goal_size)) | Medium | Robotics and Control, Game AI and Entertainment |

Algorithms in This Family

  • Hierarchical Q-Learning - Extends traditional Q-Learning to handle temporal abstraction and hierarchical task decomposition with multi-level Q-functions (a rough SMDP-style sketch follows this list).

  • Hierarchical Task Networks (HTNs) - A hierarchical reinforcement learning approach that decomposes complex tasks into hierarchical structures of subtasks for planning and execution.

  • Option-Critic - A hierarchical reinforcement learning algorithm that learns options (temporally extended actions) end-to-end using policy gradient methods.

  • Hierarchical Actor-Critic (HAC) - An advanced hierarchical reinforcement learning algorithm that extends the actor-critic framework with temporal abstraction and hierarchical structure.

  • Hierarchical Policy Gradient - Extends traditional policy gradient methods to handle temporal abstraction and hierarchical task decomposition with multi-level policies.

  • Feudal Networks (FuN) - A hierarchical reinforcement learning algorithm that implements a manager-worker architecture for temporal abstraction and goal-based learning.
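
As a rough illustration of the first entry above, the sketch below performs tabular SMDP-style Q-learning over options rather than primitive actions. It builds on the hypothetical run_option helper and Option structure from the earlier sketches and is not the planned implementation itself.

```python
from collections import defaultdict
import random

def smdp_q_learning(env, options, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning over options: each update spans the k primitive steps
    the chosen option ran for, so the bootstrap term is discounted by gamma**k.
    Assumes hashable states, the same gamma as run_option, and that at least
    one option can start in every visited state."""
    Q = defaultdict(float)  # Q[(state, option_index)]
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            available = [i for i, o in enumerate(options) if o.can_start(state)]
            if random.random() < eps:                        # epsilon-greedy over options
                choice = random.choice(available)
            else:
                choice = max(available, key=lambda i: Q[(state, i)])
            # run_option is the helper sketched earlier: it executes the option
            # until termination and reports the intra-option discounted reward.
            reward, next_state, k, done = run_option(env, state, options[choice])
            best_next = max((Q[(next_state, i)] for i, o in enumerate(options)
                             if o.can_start(next_state)), default=0.0)
            target = reward if done else reward + gamma ** k * best_next
            Q[(state, choice)] += alpha * (target - Q[(state, choice)])
            state = next_state
    return Q
```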

Implementation Status

  • Complete: 0/6 algorithms (0%)

  • Planned: 6/6 algorithms (100%)

Related Topics:

  • RL: HRL extends traditional reinforcement learning with hierarchical structure

  • Multi-Agent: HRL can be applied to multi-agent coordination and cooperation

  • Planning: HRL often incorporates planning algorithms for task decomposition

  • Neural-Networks: Deep HRL combines hierarchical structure with neural networks

References

  1. Bacon, P.-L., Harb, J., & Precup, D. (2017). The Option-Critic Architecture. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. AAAI Press.

  2. Vezhnevets, A. S., Osindero, S., Schaul, T., Heess, N., Jaderberg, M., Silver, D., & Kavukcuoglu, K. (2017). FeUdal Networks for Hierarchical Reinforcement Learning. In Proceedings of the 34th International Conference on Machine Learning. PMLR.

Tags

  • Hierarchical RL: Reinforcement learning with hierarchical structure

  • Reinforcement Learning: Machine learning algorithms that learn through interaction

  • Algorithms: General algorithmic concepts and implementations