Multiplayer Information Asymmetric Contextual Bandits is a framework in reinforcement learning that extends the classical single-player contextual bandit problem to a multiplayer setting. Multiple players each have their own action set and observe the same context vector; they act simultaneously, producing a joint action, but face information asymmetry in actions and/or rewards. The framework adapts the classical LinUCB algorithm to achieve optimal regret when only one kind of asymmetry is present, and additionally proposes an explore-then-commit (ETC) variant to handle both kinds of asymmetry at once. The framework is particularly useful in applications such as advertising, healthcare, and finance, where multiple agents interact with the same environment but hold different information.
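To make the LinUCB adaptation concrete, the following is a minimal sketch of the classical single-player LinUCB baseline that the framework modifies. It assumes a linear reward model (reward = context · theta + noise) and keeps one ridge-regression estimate per action; all class and parameter names here are illustrative, not the framework's actual API.

```python
import numpy as np

class LinUCB:
    """Classical single-player LinUCB (illustrative sketch).

    The multiplayer information-asymmetric framework modifies this
    baseline; details of that modification are not reproduced here.
    """

    def __init__(self, n_actions, dim, alpha=1.0):
        self.alpha = alpha  # exploration-bonus width
        # Per-action ridge-regression statistics: A = I + sum x x^T, b = sum r x
        self.A = [np.eye(dim) for _ in range(n_actions)]
        self.b = [np.zeros(dim) for _ in range(n_actions)]

    def select(self, context):
        """Pick the action with the highest upper confidence bound."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta_hat = A_inv @ b                      # ridge estimate of theta
            mean = context @ theta_hat                 # predicted reward
            width = self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(mean + width)                # optimism in face of uncertainty
        return int(np.argmax(scores))

    def update(self, action, context, reward):
        """Fold the observed (context, reward) pair into the chosen action's stats."""
        self.A[action] += np.outer(context, context)
        self.b[action] += reward * context
```

A brief usage example: with a fixed context where action 0 always pays 1 and action 1 pays 0, repeated select/update rounds drive the learner to commit to action 0 as its confidence widths shrink.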
LinUCB, ETC
Contextual Bandits
Custom datasets for contextual bandits
Regret, convergence rate
Research environments
No
Yes
Handles information asymmetry, multiplayer setting
No
Standard computing resources
Linux, Windows, macOS
Compatible with reinforcement learning frameworks
N/A
N/A
N/A
Yes
Research community
N/A
Varies based on application
Depends on model complexity
Standard for reinforcement learning
N/A
N/A
Assumes asymmetry is limited to actions and/or rewards
AI research, advertising, healthcare, finance
Multiplayer decision-making
Researchers
Integrates with RL frameworks
Scalable with number of players
Community support
N/A
Command-line
No
N/A
Open-source
Yes
Research institutions
N/A
N/A
N/A
Research tool
No
N/A
Open-source
0.00
N/A
Open-source
01/01/1970
01/01/1970
N/A
N/A
Yes