A/B Testing
Multi-Armed Bandit (MAB) & Reinforcement Learning
Epsilon Greedy
Upper Confidence Bound
Thompson Sampling