Multi-Armed Bandit Problem
Brief Introduction
In probability theory, the multi-armed bandit problem is the problem a gambler faces at a row of slot machines when deciding which machines to play, how many times to play each machine and in which order to play them. When played, each machine provides a random reward from a distribution specific to that machine. The objective of the gambler is to maximize the sum of rewards earned through a sequence of lever pulls. In practice, multi-armed bandits have been used to model the problem of managing research projects in a large organization, like a science foundation or a pharmaceutical company. Given its fixed budget, the problem is to allocate resources among the competing projects, whose properties are only partially known now but may be better understood as time passes.
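The gambler's dilemma described above can be made concrete with a small simulation. The sketch below (an illustrative epsilon-greedy strategy, not an algorithm from the reviewed papers; the arm probabilities and parameter values are made up for the example) balances exploration and exploitation by pulling a random lever with small probability and otherwise pulling the lever with the best empirical mean so far:

```python
import random

def epsilon_greedy(arm_means, horizon=10000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli-reward arms.

    arm_means: hypothetical per-machine success probabilities (unknown to the player).
    With probability epsilon, explore a random arm; otherwise exploit the
    arm with the highest empirical mean reward so far.
    """
    random.seed(seed)
    n = len(arm_means)
    counts = [0] * n        # number of pulls per arm
    estimates = [0.0] * n   # empirical mean reward per arm
    total = 0.0
    for _ in range(horizon):
        if random.random() < epsilon:
            arm = random.randrange(n)                        # explore
        else:
            arm = max(range(n), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if random.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the running mean
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total, estimates

total_reward, estimates = epsilon_greedy([0.2, 0.5, 0.8])
```

On these assumed arms the strategy concentrates most pulls on the best machine while still spending a fraction of the budget learning about the others, which is exactly the exploration/exploitation trade-off the problem captures.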
My Contribution
To better understand the problem, I reviewed several related papers in detail and simulated the algorithms they proposed. Then, building on the idea of balancing exploration and exploitation, I designed my own algorithm for the multi-armed bandit problem with multiple users, which uses the confidence interval as the decision threshold.
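The text does not spell out the multi-user algorithm itself, but the confidence-interval idea it builds on is the classic UCB1 rule: pull the arm whose empirical mean plus confidence radius is largest, so rarely-tried arms keep a wide interval and get explored. A minimal single-user sketch, assuming Bernoulli rewards and made-up arm probabilities:

```python
import math
import random

def ucb1(arm_means, horizon=10000, seed=0):
    """UCB1 on Bernoulli-reward arms.

    Each round, pull the arm maximizing
        empirical_mean + sqrt(2 * ln(t) / pulls_of_arm),
    i.e. the upper end of a confidence interval around the estimate.
    """
    random.seed(seed)
    n = len(arm_means)
    counts = [0] * n
    estimates = [0.0] * n
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n:
            arm = t - 1  # initialise: pull every arm once
        else:
            arm = max(
                range(n),
                key=lambda a: estimates[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if random.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total, counts

total_reward, counts = ucb1([0.2, 0.5, 0.8])
```

As the pull count of an arm grows, its confidence interval shrinks, so exploration tapers off automatically; using the interval width as a threshold in this way is the ingredient the multi-user design above takes as its starting point.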