
Hierarchical Reinforcement Learning Algorithms

Hierarchical Reinforcement Learning decomposes complex tasks into simpler subtasks using temporal abstraction and multi-level decision making.

Hierarchical Reinforcement Learning (HRL) extends traditional reinforcement learning by decomposing complex tasks into simpler subtasks or options. This hierarchical structure enables agents to learn more efficiently by reusing learned skills and operating at multiple levels of abstraction, from high-level strategic planning to low-level control execution.

Unlike traditional RL, where agents learn flat policies, HRL introduces temporal abstraction through options: temporally extended actions that can be executed over multiple time steps. This allows agents to operate at different time scales and reuse learned behaviors across tasks, making HRL particularly powerful for complex, long-horizon problems.
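
To make the idea concrete, the sketch below shows how a single high-level choice can unfold over many primitive time steps. It is a minimal illustration only, assuming a classic Gym-style env.step/env.reset interface and a hypothetical option object whose policy and termination callables would be learned or hand-designed in practice.

```python
import random

def run_option(env, state, option, gamma=0.99, max_steps=100):
    """Execute one temporally extended action (option) until its termination
    condition fires. Returns the discounted reward collected inside the option,
    the resulting state, the number of primitive steps taken, and the done flag."""
    total_reward, discount, steps, done = 0.0, 1.0, 0, False
    while steps < max_steps and not done:
        action = option.policy(state)               # low-level control
        state, reward, done, _ = env.step(action)   # one primitive step
        total_reward += discount * reward
        discount *= gamma
        steps += 1
        if random.random() < option.termination(state):  # beta(s): stop here?
            break
    return total_reward, state, steps, done

def hierarchical_episode(env, options, high_level_policy):
    """The high-level policy picks among options; each choice unfolds over
    many primitive time steps before control returns to the high level."""
    state, done = env.reset(), False
    while not done:
        option = high_level_policy(state, options)   # strategic decision
        _, state, _, done = run_option(env, state, option)
```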

Overview

Key Characteristics:

  • Temporal Abstraction: Actions that operate over different time scales, from high-level strategy to low-level control

  • Skill Reuse: Learned behaviors that can be applied to new tasks and environments

  • Multi-level Decision Making: Hierarchical structure from strategy to control, with different abstraction levels

  • Option Policies: Temporally extended actions over multiple time steps, with initiation and termination conditions

Common Applications:

  • manipulation tasks

  • navigation

  • autonomous vehicles

  • humanoid robots

  • real-time strategy games

  • chess

  • poker

  • multi-player games

  • swarm robotics

  • distributed systems

  • cooperative games

  • industrial automation

  • smart grids

  • traffic management

  • dialogue systems

  • text generation

  • language understanding

  • object recognition

  • scene understanding

  • visual navigation

Key Concepts

  • Options: Temporally extended actions that can be executed over multiple time steps

  • Hierarchy: Multiple levels of decision-making, from high-level strategy to low-level control

  • Skill Reuse: Learned behaviors that can be applied to new tasks and environments

  • Temporal Abstraction: Actions that operate over different time scales

  • Subgoal Decomposition: Breaking complex tasks into manageable subtasks

  • Policy Hierarchies: Layered policies operating at different abstraction levels

  • Initiation Set: States where an option can be started

  • Termination Function: The probability of an option terminating in each state
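
In the standard options formulation, the last two concepts and the intra-option policy are bundled into a single object; the notation below simply restates the definitions above in compact form.

```latex
% An option o over the underlying MDP (standard options formulation):
%   I_o     : initiation set (states where o may start)
%   \pi_o   : intra-option policy followed while o executes
%   \beta_o : termination function (probability of stopping in each state)
o = \langle I_o,\ \pi_o,\ \beta_o \rangle, \qquad
I_o \subseteq \mathcal{S}, \qquad
\pi_o : \mathcal{S} \times \mathcal{A} \to [0,1], \qquad
\beta_o : \mathcal{S} \to [0,1]
```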

Complexity Analysis

Complexity Overview

Time: O(n²) to O(n³)
Space: O(n) to O(n²)

Complexity depends on hierarchy depth, option complexity, and state-action space size

Types of Hierarchies

Temporal Hierarchy:

  • High-level: Strategic decisions and goal setting
  • Mid-level: Skill selection and coordination
  • Low-level: Primitive actions and control

Spatial Hierarchy:

  • Global: Environment-wide planning
  • Regional: Local area navigation
  • Local: Immediate obstacle avoidance

Functional Hierarchy:

  • Planning: Long-term strategy
  • Navigation: Path finding and movement
  • Control: Actuator commands
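
As a minimal illustration of such a temporal hierarchy, the Python sketch below nests a low-level control loop inside a high-level loop that re-plans every k primitive steps; choose_subgoal, low_level_policy, and subgoal_reached are hypothetical placeholders for components that would be learned or hand-designed in practice.

```python
def run_two_level_controller(env, choose_subgoal, low_level_policy,
                             subgoal_reached, k=10):
    """High level: pick a subgoal every k primitive steps (strategy / goal setting).
    Low level: emit primitive actions toward the current subgoal (control)."""
    state, done = env.reset(), False
    while not done:
        subgoal = choose_subgoal(state)                  # high-level decision
        for _ in range(k):                               # low-level horizon
            action = low_level_policy(state, subgoal)    # primitive action
            state, reward, done, _ = env.step(action)
            if done or subgoal_reached(state, subgoal):
                break                                    # hand control back up
```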

Hierarchy Construction Methods

Predefined Hierarchies:

  • Human-designed task decomposition
  • Fixed skill libraries
  • Structured learning objectives

Learned Hierarchies:

  • Automatic task decomposition
  • Dynamic skill discovery
  • Adaptive abstraction levels

Hybrid Approaches:

  • Combine predefined and learned components
  • Incremental hierarchy construction
  • Skill refinement over time

Option Components

  1. Initiation Set: States where the option can be started
  2. Policy: How to behave while the option is executing
  3. Termination Function: When to stop executing the option
  4. Reward Function: How rewards are distributed during option execution

Option Properties:

  • Temporal Abstraction: Options can last multiple time steps

  • Reusability: The same option can be used in different contexts

  • Composability: Options can be combined to form complex behaviors
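
A minimal data structure mirroring the four components listed above might look like the following sketch; the callables are hypothetical placeholders and would be learned or specified per task in practice.

```python
from dataclasses import dataclass
from typing import Any, Callable

State = Any   # placeholder for the environment's state type
Action = Any  # placeholder for the environment's action type

@dataclass
class Option:
    """One option bundling the four components listed above."""
    initiation: Callable[[State], bool]               # 1. can the option start in this state?
    policy: Callable[[State], Action]                 # 2. behaviour while the option executes
    termination: Callable[[State], float]             # 3. probability of stopping in each state
    reward: Callable[[State, Action, State], float]   # 4. (pseudo-)reward during execution

    def can_start(self, state: State) -> bool:
        return self.initiation(state)
```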

Comparison Table

| Algorithm | Status | Time Complexity | Space Complexity | Difficulty | Applications |
|---|---|---|---|---|---|
| Hierarchical Q-Learning | šŸ“‹ Planned | O(\|S\| Ɨ \|G\| …) | | | |
| Hierarchical Task Networks (HTNs) | šŸ“‹ Planned | O(\|T\| Ɨ \|S\| …) | | | |
| Option-Critic | šŸ“‹ Planned | O(batch_size Ɨ (option_policy_params + option_selection_params + termination_params)) | O(batch_size Ɨ (state_size + option_size)) | Medium | Robotics and Control, Game AI and Strategy |
| Hierarchical Actor-Critic (HAC) | šŸ“‹ Planned | O(batch_size Ɨ (meta_actor_params + meta_critic_params + low_actor_params + low_critic_params)) | O(batch_size Ɨ (state_size + subgoal_size)) | Medium | Robotics and Control, Game AI and Strategy |
| Hierarchical Policy Gradient | šŸ“‹ Planned | O(batch_size Ɨ (meta_params + low_params)) | O(batch_size Ɨ (state_size + subgoal_size)) | Medium | Robotics and Control, Game AI and Strategy |
| Feudal Networks (FuN) | šŸ“‹ Planned | O(batch_size Ɨ (manager_params + worker_params)) | O(batch_size Ɨ (state_size + goal_size)) | Medium | Robotics and Control, Game AI and Entertainment |

Algorithms in This Family

  • Hierarchical Q-Learning - Extends traditional Q-Learning to handle temporal abstraction and hierarchical task decomposition with multi-level Q-functions (a rough SMDP-style sketch follows this list).

  • Hierarchical Task Networks (HTNs) - A hierarchical reinforcement learning approach that decomposes complex tasks into hierarchical structures of subtasks for planning and execution.

  • Option-Critic - A hierarchical reinforcement learning algorithm that learns options (temporally extended actions) end-to-end using policy gradient methods.

  • Hierarchical Actor-Critic (HAC) - An advanced hierarchical reinforcement learning algorithm that extends the actor-critic framework with temporal abstraction and hierarchical structure.

  • Hierarchical Policy Gradient - Extends traditional policy gradient methods to handle temporal abstraction and hierarchical task decomposition with multi-level policies.

  • Feudal Networks (FuN) - A hierarchical reinforcement learning algorithm that implements a manager-worker architecture for temporal abstraction and goal-based learning.
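
As a rough illustration of the first entry above, the sketch below performs tabular SMDP-style Q-learning over options rather than primitive actions. It builds on the hypothetical run_option helper and Option structure from the earlier sketches and is not the planned implementation itself.

```python
from collections import defaultdict
import random

def smdp_q_learning(env, options, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning over options: each update spans the k primitive steps
    the chosen option ran for, so the bootstrap term is discounted by gamma**k.
    Assumes hashable states, the same gamma as run_option, and that at least
    one option can start in every visited state."""
    Q = defaultdict(float)  # Q[(state, option_index)]
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            available = [i for i, o in enumerate(options) if o.can_start(state)]
            if random.random() < eps:                        # epsilon-greedy over options
                choice = random.choice(available)
            else:
                choice = max(available, key=lambda i: Q[(state, i)])
            # run_option is the helper sketched earlier: it executes the option
            # until termination and reports the intra-option discounted reward.
            reward, next_state, k, done = run_option(env, state, options[choice])
            best_next = max((Q[(next_state, i)] for i, o in enumerate(options)
                             if o.can_start(next_state)), default=0.0)
            target = reward if done else reward + gamma ** k * best_next
            Q[(state, choice)] += alpha * (target - Q[(state, choice)])
            state = next_state
    return Q
```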

Implementation Status

  • Complete: 0/6 algorithms (0%)

  • Planned: 6/6 algorithms (100%)

Related Topics:

  • RL: HRL extends traditional reinforcement learning with hierarchical structure

  • Multi-Agent: HRL can be applied to multi-agent coordination and cooperation

  • Planning: HRL often incorporates planning algorithms for task decomposition

  • Neural-Networks: Deep HRL combines hierarchical structure with neural networks

References

  1. Bacon, P.-L., Harb, J., & Precup, D. (2017). The Option-Critic Architecture. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. AAAI Press.

  2. Vezhnevets, A. S., Osindero, S., Schaul, T., Heess, N., Jaderberg, M., Silver, D., & Kavukcuoglu, K. (2017). FeUdal Networks for Hierarchical Reinforcement Learning. In Proceedings of the 34th International Conference on Machine Learning. PMLR.

Tags

  • Hierarchical RL: Reinforcement learning with hierarchical structure

  • Reinforcement Learning: Machine learning algorithms that learn through interaction

  • Algorithms: General algorithmic concepts and implementations