Hierarchical Q-Learning
Extends traditional Q-Learning to handle temporal abstraction and hierarchical task decomposition with multi-level Q-functions.
Family: Hierarchical Reinforcement Learning Status: 📋 Planned
Overview
Hierarchical Q-Learning extends the traditional Q-Learning framework to handle temporal abstraction
and hierarchical task decomposition. The algorithm learns Q-functions at multiple levels: a high-level Q-function that estimates the value of subgoals, and low-level Q-functions that estimate the value of actions given specific subgoals.
This hierarchical approach enables the agent to solve complex, long-horizon tasks by breaking them down into manageable subproblems. The high-level Q-function learns to sequence subgoals effectively, while the low-level Q-functions learn to achieve specific subgoals efficiently. Hierarchical Q-Learning is particularly powerful in domains where tasks have natural hierarchical structure, such as robotics manipulation, navigation, and game playing.
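The two-level control loop described above can be sketched on a toy problem. Everything below is illustrative and not from the source: states 0..6 form a 1-D corridor, primitive actions move -1/+1, the task goal is state 6, and the candidate subgoals (3, 6) play the role of waypoints. Q-tables are plain dicts with 0.0 defaults.

```python
import random

def greedy_subgoal(q_meta, s, subgoals):
    # High level: g = argmax_g Q_meta(s, g); subgoals equal to the current
    # state are skipped so the loop cannot stall on an achieved subgoal.
    candidates = [g for g in subgoals if g != s] or list(subgoals)
    return max(candidates, key=lambda g: q_meta.get((s, g), 0.0))

def run_episode(q_meta, q_low, subgoals=(3, 6), goal=6, epsilon=0.2, max_steps=50):
    s, steps = 0, 0
    while s != goal and steps < max_steps:
        g = greedy_subgoal(q_meta, s, subgoals)
        # Low level: act (epsilon-greedily) toward the chosen subgoal
        # until it is reached or the episode ends.
        while s != g and s != goal and steps < max_steps:
            if random.random() < epsilon:
                a = random.choice((-1, 1))  # exploration
            else:
                a = max((-1, 1), key=lambda a: q_low.get((s, g, a), 0.0))
            s = min(max(s + a, 0), 6)
            steps += 1
    return s, steps
```

With empty Q-tables the agent wanders; once the tables are trained, the high level learns to sequence the waypoint before the goal.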
Mathematical Formulation
Problem Definition
Given:
- State space: S
- Subgoal space: G
- Action space: A
- Meta Q-function: Q_meta(s, g)
- Low-level Q-function: Q_low(s, g, a)
- Reward function: R(s,a,s')
Find hierarchical Q-functions that maximize expected cumulative reward:
Q_h(s_t, g_t, a_t) = Q_meta(s_t, g_t) + Q_low(s_t, g_t, a_t)
Key Equations
Hierarchical Q-Function Decomposition
Q_h(s_t, g_t, a_t) = Q_meta(s_t, g_t) + Q_low(s_t, g_t, a_t)
Q-function decomposes into meta and low-level components
Hierarchical Q-Learning Update
Q_h(s_t, g_t, a_t) ← Q_h(s_t, g_t, a_t) + α[r_t + γ max_{g'} Q_meta(s_{t+1}, g') - Q_h(s_t, g_t, a_t)]
Update rule for hierarchical Q-functions
Subgoal Selection
g_t = argmax_g Q_meta(s_t, g)
Subgoal selection based on meta Q-function
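The three equations above map directly onto a tabular sketch. One caveat: the page's update rule writes the TD error against Q_h as a whole and leaves unspecified how the correction is apportioned between Q_meta and Q_low; splitting it equally across both tables, as below, is one common choice, not the source's prescription.

```python
from collections import defaultdict

def make_agent(alpha=0.1, gamma=0.99):
    # Tabular Q-functions keyed by (state, subgoal) and (state, subgoal, action).
    return {"q_meta": defaultdict(float), "q_low": defaultdict(float),
            "alpha": alpha, "gamma": gamma}

def q_h(agent, s, g, a):
    # Decomposition: Q_h(s, g, a) = Q_meta(s, g) + Q_low(s, g, a)
    return agent["q_meta"][(s, g)] + agent["q_low"][(s, g, a)]

def select_subgoal(agent, s, subgoals):
    # Subgoal selection: g_t = argmax_g Q_meta(s_t, g)
    return max(subgoals, key=lambda g: agent["q_meta"][(s, g)])

def update(agent, s, g, a, r, s_next, subgoals):
    # TD target uses the best next subgoal's meta-value, per the update rule above:
    # r_t + gamma * max_{g'} Q_meta(s_{t+1}, g')
    target = r + agent["gamma"] * max(agent["q_meta"][(s_next, g2)] for g2 in subgoals)
    td_error = target - q_h(agent, s, g, a)
    # Apportion the correction across the two components (an assumption;
    # the update rule above does not specify the split).
    agent["q_meta"][(s, g)] += agent["alpha"] * td_error
    agent["q_low"][(s, g, a)] += agent["alpha"] * td_error
    return td_error
```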
Key Properties
- Temporal Abstraction: High-level Q-functions operate over longer time horizons
- Subgoal Decomposition: Complex tasks are broken into manageable subproblems
- Hierarchical Learning: Q-functions at different levels learn simultaneously
- Transfer Learning: Low-level Q-functions can be reused across different tasks
Implementation Approaches
Standard hierarchical Q-Learning with meta and low-level Q-tables
Complexity:
- Time: O(|S| × |G| × |A| × episodes)
- Space: O(|S| × |G| × |A|)
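The space bound can be made concrete with a quick count of table entries. The sizes here are illustrative, not from the source:

```python
# Illustrative problem sizes: 500 states, 10 subgoals, 4 actions.
n_states, n_subgoals, n_actions = 500, 10, 4

meta_entries = n_states * n_subgoals              # Q_meta: O(|S| x |G|)
low_entries = n_states * n_subgoals * n_actions   # Q_low:  O(|S| x |G| x |A|)
total_entries = meta_entries + low_entries
print(total_entries)  # 5000 + 20000 = 25000 table entries
```

Note that the low-level table dominates: the extra subgoal index multiplies the flat Q-Learning table size |S| × |A| by |G|.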
Advantages
- Extends the familiar Q-Learning framework to hierarchical settings
- Temporal abstraction enables learning at different time scales
- Subgoal decomposition makes complex tasks manageable
- Transfer learning allows reuse of low-level Q-functions
Disadvantages
- Requires discrete state-action spaces
- Memory requirements grow with state and subgoal space sizes
- Subgoal achievement detection can be challenging
- Coordination between meta and low-level Q-functions is complex
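On the subgoal-achievement point: a common workaround is a tolerance-based predicate. The function below is a hypothetical example, not part of the source implementation, assuming subgoals live in the same (scalar) space as states:

```python
def subgoal_reached(state, subgoal, tol=0.5):
    """Hypothetical achievement test: the subgoal counts as reached when
    the state is within `tol` of it under a domain distance. For discrete
    grids exact equality (tol=0) is typical; for continuous states a
    tolerance stops the low-level policy chasing an unreachable exact match."""
    return abs(state - subgoal) <= tol
```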
Complete Implementation
The full implementation, with error handling, comprehensive testing, and additional variants, is available in the source code:
- Main implementation with meta and low-level Q-functions: src/algokit/hierarchical_rl/hierarchical_q_learning.py
- Comprehensive test suite, including convergence tests: tests/unit/hierarchical_rl/test_hierarchical_q_learning.py
Complexity Analysis

Time & Space Complexity Comparison

Approach | Time Complexity | Space Complexity | Notes
---|---|---|---
Basic Hierarchical Q-Learning | O(\|S\| × \|G\| × \|A\| × episodes) | O(\|S\| × \|G\| × \|A\|) | Tabular meta and low-level Q-tables
Use Cases & Applications
Application Categories
Robotics and Control
- Robot Manipulation: Complex manipulation tasks with hierarchical subgoals
- Autonomous Navigation: Multi-level navigation with waypoint subgoals
- Industrial Automation: Process control with hierarchical objectives
- Swarm Robotics: Coordinated behavior with hierarchical task decomposition
Game AI and Strategy
- Strategy Games: Multi-level decision making with tactical and strategic goals
- Puzzle Games: Complex puzzles broken into simpler subproblems
- Adventure Games: Quest completion with hierarchical objectives
- Simulation Games: Resource management with hierarchical planning
Real-World Applications
- Autonomous Vehicles: Multi-level driving with navigation and control subgoals
- Healthcare: Treatment planning with hierarchical medical objectives
- Finance: Portfolio management with hierarchical investment strategies
- Network Control: Traffic management with hierarchical routing policies
Educational Value
- Hierarchical Learning: Understanding multi-level decision making
- Subgoal Decomposition: Learning to break complex tasks into simpler parts
- Temporal Abstraction: Understanding different time scales in learning
- Transfer Learning: Learning reusable skills across different tasks
Interactive Learning
Try implementing the different approaches yourself! This progression will give you deep insight into the algorithm's principles and applications.
Pro Tip: Start with the simplest implementation and gradually work your way up to more complex variants.
Navigation
Related Algorithms in Hierarchical Reinforcement Learning:
- Hierarchical Task Networks (HTNs) - A hierarchical reinforcement learning approach that decomposes complex tasks into hierarchical structures of subtasks for planning and execution.
- Option-Critic - A hierarchical reinforcement learning algorithm that learns options (temporally extended actions) end-to-end using policy gradient methods.
- Hierarchical Actor-Critic (HAC) - An advanced hierarchical reinforcement learning algorithm that extends the actor-critic framework with temporal abstraction and hierarchical structure.
- Hierarchical Policy Gradient - Extends traditional policy gradient methods to handle temporal abstraction and hierarchical task decomposition with multi-level policies.
- Feudal Networks (FuN) - A hierarchical reinforcement learning algorithm that implements a manager-worker architecture for temporal abstraction and goal-based learning.