Multi-Armed Bandit Testbed

Interactive simulation — Sutton & Barto, Sections 2.3–2.8

[Chart panels: Average Reward Over Time; % Optimal Action Over Time; Cumulative Average Reward; Final Average Reward Comparison]

Based on Reinforcement Learning: An Introduction (2nd ed.) by Sutton & Barto