Hierarchical Q-Learning
Extends traditional Q-Learning to handle temporal abstraction and hierarchical task decomposition with multi-level Q-functions.
Family: Hierarchical Reinforcement Learning Status: 📋 Planned
Overview
Hierarchical Q-Learning extends the traditional Q-Learning framework to handle temporal abstraction
and hierarchical task decomposition. The algorithm learns Q-functions at multiple levels: a high-level Q-function that estimates the value of subgoals, and low-level Q-functions that estimate the value of actions given specific subgoals.
This hierarchical approach enables the agent to solve complex, long-horizon tasks by breaking them down into manageable subproblems. The high-level Q-function learns to sequence subgoals effectively, while the low-level Q-functions learn to achieve specific subgoals efficiently. Hierarchical Q-Learning is particularly powerful in domains where tasks have natural hierarchical structure, such as robotics manipulation, navigation, and game playing.
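The two-level control loop described above can be sketched on a toy problem. Everything below is illustrative and not from the source: states 0..6 form a 1-D corridor, primitive actions move -1/+1, the task goal is state 6, and the candidate subgoals (3, 6) play the role of waypoints. Q-tables are plain dicts with 0.0 defaults.

```python
import random

def greedy_subgoal(q_meta, s, subgoals):
    # High level: g = argmax_g Q_meta(s, g); subgoals equal to the current
    # state are skipped so the loop cannot stall on an achieved subgoal.
    candidates = [g for g in subgoals if g != s] or list(subgoals)
    return max(candidates, key=lambda g: q_meta.get((s, g), 0.0))

def run_episode(q_meta, q_low, subgoals=(3, 6), goal=6, epsilon=0.2, max_steps=50):
    s, steps = 0, 0
    while s != goal and steps < max_steps:
        g = greedy_subgoal(q_meta, s, subgoals)
        # Low level: act (epsilon-greedily) toward the chosen subgoal
        # until it is reached or the episode ends.
        while s != g and s != goal and steps < max_steps:
            if random.random() < epsilon:
                a = random.choice((-1, 1))  # exploration
            else:
                a = max((-1, 1), key=lambda a: q_low.get((s, g, a), 0.0))
            s = min(max(s + a, 0), 6)
            steps += 1
    return s, steps
```

With empty Q-tables the agent wanders; once the tables are trained, the high level learns to sequence the waypoint before the goal.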
Mathematical Formulation
Problem Definition
Given:
- State space: S
- Subgoal space: G
- Action space: A
- Meta Q-function: Q_meta(s, g)
- Low-level Q-function: Q_low(s, g, a)
- Reward function: R(s,a,s')
Find hierarchical Q-functions that maximize expected cumulative reward:
Q_h(s_t, g_t, a_t) = Q_meta(s_t, g_t) + Q_low(s_t, g_t, a_t)
Key Equations
Hierarchical Q-Function Decomposition
Q_h(s_t, g_t, a_t) = Q_meta(s_t, g_t) + Q_low(s_t, g_t, a_t)
Q-function decomposes into meta and low-level components
Hierarchical Q-Learning Update
Q_h(s_t, g_t, a_t) ← Q_h(s_t, g_t, a_t) + α[r_t + γ max_{g'} Q_meta(s_{t+1}, g') - Q_h(s_t, g_t, a_t)]
Update rule for hierarchical Q-functions
Subgoal Selection
g_t = argmax_g Q_meta(s_t, g)
Subgoal selection based on meta Q-function
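The three equations above map directly onto a tabular sketch. One caveat: the page's update rule writes the TD error against Q_h as a whole and leaves unspecified how the correction is apportioned between Q_meta and Q_low; splitting it equally across both tables, as below, is one common choice, not the source's prescription.

```python
from collections import defaultdict

def make_agent(alpha=0.1, gamma=0.99):
    # Tabular Q-functions keyed by (state, subgoal) and (state, subgoal, action).
    return {"q_meta": defaultdict(float), "q_low": defaultdict(float),
            "alpha": alpha, "gamma": gamma}

def q_h(agent, s, g, a):
    # Decomposition: Q_h(s, g, a) = Q_meta(s, g) + Q_low(s, g, a)
    return agent["q_meta"][(s, g)] + agent["q_low"][(s, g, a)]

def select_subgoal(agent, s, subgoals):
    # Subgoal selection: g_t = argmax_g Q_meta(s_t, g)
    return max(subgoals, key=lambda g: agent["q_meta"][(s, g)])

def update(agent, s, g, a, r, s_next, subgoals):
    # TD target uses the best next subgoal's meta-value, per the update rule above:
    # r_t + gamma * max_{g'} Q_meta(s_{t+1}, g')
    target = r + agent["gamma"] * max(agent["q_meta"][(s_next, g2)] for g2 in subgoals)
    td_error = target - q_h(agent, s, g, a)
    # Apportion the correction across the two components (an assumption;
    # the update rule above does not specify the split).
    agent["q_meta"][(s, g)] += agent["alpha"] * td_error
    agent["q_low"][(s, g, a)] += agent["alpha"] * td_error
    return td_error
```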
Key Properties
- Temporal Abstraction: High-level Q-functions operate over longer time horizons
- Subgoal Decomposition: Complex tasks are broken into manageable subproblems
- Hierarchical Learning: Q-functions at different levels learn simultaneously
- Transfer Learning: Low-level Q-functions can be reused across different tasks
Implementation Approaches
Standard hierarchical Q-Learning with meta and low-level Q-tables
Complexity:
- Time: O(|S| × |G| × |A| × episodes)
- Space: O(|S| × |G| × |A|)
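The space bound can be made concrete with a quick count of table entries. The sizes here are illustrative, not from the source:

```python
# Illustrative problem sizes: 500 states, 10 subgoals, 4 actions.
n_states, n_subgoals, n_actions = 500, 10, 4

meta_entries = n_states * n_subgoals              # Q_meta: O(|S| x |G|)
low_entries = n_states * n_subgoals * n_actions   # Q_low:  O(|S| x |G| x |A|)
total_entries = meta_entries + low_entries
print(total_entries)  # 5000 + 20000 = 25000 table entries
```

Note that the low-level table dominates: the extra subgoal index multiplies the flat Q-Learning table size |S| × |A| by |G|.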
Advantages
- Extends the familiar Q-Learning framework to hierarchical settings
- Temporal abstraction enables learning at different time scales
- Subgoal decomposition makes complex tasks manageable
- Transfer learning allows reuse of low-level Q-functions
Disadvantages
- Requires discrete state-action spaces
- Memory requirements grow with state and subgoal space sizes
- Subgoal achievement detection can be challenging
- Coordination between meta and low-level Q-functions is complex
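On the subgoal-achievement point: a common workaround is a tolerance-based predicate. The function below is a hypothetical example, not part of the source implementation, assuming subgoals live in the same (scalar) space as states:

```python
def subgoal_reached(state, subgoal, tol=0.5):
    """Hypothetical achievement test: the subgoal counts as reached when
    the state is within `tol` of it under a domain distance. For discrete
    grids exact equality (tol=0) is typical; for continuous states a
    tolerance stops the low-level policy chasing an unreachable exact match."""
    return abs(state - subgoal) <= tol
```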
Complete Implementation
The full implementation, with error handling, comprehensive testing, and additional variants, is available in the source code:
- Main implementation with meta and low-level Q-functions: src/algokit/hierarchical_rl/hierarchical_q_learning.py
- Comprehensive test suite, including convergence tests: tests/unit/hierarchical_rl/test_hierarchical_q_learning.py
Complexity Analysis

Time & Space Complexity Comparison

Approach | Time Complexity | Space Complexity | Notes
---|---|---|---
Basic Hierarchical Q-Learning | O(\|S\| × \|G\| × \|A\| × episodes) | O(\|S\| × \|G\| × \|A\|) | Tabular meta and low-level Q-tables
Use Cases & Applications
Application Categories
Robotics and Control
- Robot Manipulation: Complex manipulation tasks with hierarchical subgoals
- Autonomous Navigation: Multi-level navigation with waypoint subgoals
- Industrial Automation: Process control with hierarchical objectives
- Swarm Robotics: Coordinated behavior with hierarchical task decomposition
Game AI and Strategy
- Strategy Games: Multi-level decision making with tactical and strategic goals
- Puzzle Games: Complex puzzles broken into simpler subproblems
- Adventure Games: Quest completion with hierarchical objectives
- Simulation Games: Resource management with hierarchical planning
Real-World Applications
- Autonomous Vehicles: Multi-level driving with navigation and control subgoals
- Healthcare: Treatment planning with hierarchical medical objectives
- Finance: Portfolio management with hierarchical investment strategies
- Network Control: Traffic management with hierarchical routing policies
Educational Value
- Hierarchical Learning: Understanding multi-level decision making
- Subgoal Decomposition: Learning to break complex tasks into simpler parts
- Temporal Abstraction: Understanding different time scales in learning
- Transfer Learning: Learning reusable skills across different tasks
Interactive Learning
Try implementing the different approaches yourself! This progression will give you deep insight into the algorithm's principles and applications.
Pro Tip: Start with the simplest implementation and gradually work your way up to more complex variants.
Navigation
Related Algorithms in Hierarchical Reinforcement Learning:
- Hierarchical Task Networks (HTNs) - A hierarchical reinforcement learning approach that decomposes complex tasks into hierarchical structures of subtasks for planning and execution.
- Option-Critic - A hierarchical reinforcement learning algorithm that learns options (temporally extended actions) end-to-end using policy gradient methods.
- Hierarchical Actor-Critic (HAC) - An advanced hierarchical reinforcement learning algorithm that extends the actor-critic framework with temporal abstraction and hierarchical structure.
- Hierarchical Policy Gradient - Extends traditional policy gradient methods to handle temporal abstraction and hierarchical task decomposition with multi-level policies.
- Feudal Networks (FuN) - A hierarchical reinforcement learning algorithm that implements a manager-worker architecture for temporal abstraction and goal-based learning.