eScholarship
Open Access Publications from the University of California

Effects of Response Frequency Constraints on Learning in a Non-Stationary Multi-armed Bandit Task

Abstract

An n-armed bandit task was used to investigate the trade-off between exploratory (choosing lesser-known options) and exploitive (choosing options with the greatest known probability of reinforcement) human choice in a trial-and-error learning problem. A different probability of reinforcement was assigned to each of eight response options using random ratios (RRs), and participants chose by clicking buttons in a circular display on a computer screen with a mouse. To differentially increase exploration, relative frequency thresholds were randomly assigned to each participant and acted as task constraints limiting the proportion of total responses that could be attributed to any one response option. The potential benefit of increased exploration in non-stationary environments was investigated by changing payoff probabilities so that the leanest options became the richest or the richest options became the leanest. On average, forcing participants to explore at moderate to high levels always resulted in their earning less reinforcement, even when the payoffs changed. This outcome may be due to humans’ natural level of exploration in our task being sufficiently high to create sensitivity to environmental dynamics.
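
The task structure described above can be illustrated with a rough simulation sketch. It is not the authors' procedure: the epsilon-greedy agent stands in for a human participant, and the probability values, the fallback rule for the first few trials, the full rank reversal of payoffs, and all function and parameter names (`allowed_options`, `reverse_payoffs`, `run_session`, `threshold`, `change_at`, `epsilon`) are assumptions made for illustration only.

```python
import random


def allowed_options(counts, threshold):
    """Options whose relative frequency would stay at or below the threshold if chosen next."""
    total = sum(counts)
    allowed = [i for i, c in enumerate(counts) if (c + 1) / (total + 1) <= threshold]
    # Assumption: when no option satisfies the constraint (only possible on the
    # first few trials), every option is allowed.
    return allowed or list(range(len(counts)))


def reverse_payoffs(probs):
    """Assumed change rule: swap ranks so the leanest option becomes the richest and vice versa."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    new_probs = [0.0] * len(probs)
    for rank, i in enumerate(order):
        new_probs[i] = probs[order[len(probs) - 1 - rank]]
    return new_probs


def run_session(probs, threshold, n_trials, change_at, epsilon=0.1, seed=0):
    """Simulate one session with a hypothetical epsilon-greedy agent."""
    rng = random.Random(seed)
    counts = [0] * len(probs)  # responses made to each option
    wins = [0] * len(probs)    # reinforcers earned from each option
    earned = 0
    for t in range(n_trials):
        if t == change_at:
            probs = reverse_payoffs(probs)  # the environment becomes non-stationary
        allowed = allowed_options(counts, threshold)
        if rng.random() < epsilon:
            choice = rng.choice(allowed)  # explore among permitted options
        else:
            # Exploit the best observed payoff rate; untried options are treated optimistically.
            choice = max(allowed, key=lambda i: wins[i] / counts[i] if counts[i] else 1.0)
        reward = int(rng.random() < probs[choice])
        counts[choice] += 1
        wins[choice] += reward
        earned += reward
    return earned, counts


# Hypothetical reinforcement probabilities for the eight response options.
probs = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50, 0.60]
earned, counts = run_session(probs, threshold=0.25, n_trials=400, change_at=200)
```

In this sketch a lower `threshold` forces responses to be spread over more options, which is one way to read the "relative frequency thresholds" in the abstract. Comparing `earned` across thresholds would loosely mirror the comparison the study reports, although a simulated agent says nothing about human behavior.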
