<aside> 💡

A machine learning paradigm in which an agent learns to make decisions in an environment so as to maximize cumulative reward.

</aside>
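
To make the definition concrete, here is a minimal sketch of the agent-environment loop, assuming a made-up five-cell corridor where the only reward is given for reaching the last cell. The environment (`step`), the helper names, and the hyperparameters are all hypothetical choices for illustration, and tabular Q-learning is just one of many algorithms that could fill the "learns to make decisions" part.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal (assumed toy setup)
ACTIONS = (-1, +1)    # step left or step right


def step(state, action):
    """Hypothetical toy environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore: pick a random action
            else:
                best = max(q[state])  # exploit: pick a best-valued action
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: immediate reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best action per non-terminal state according to the learned values."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train())` gives the action the agent prefers in each non-goal cell. In this toy setup the agent is never told to walk right; that behaviour emerges only because it maximizes the discounted sum of rewards.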

Common Pitfalls:

Reward Hacking