<aside> 💡
Aa machine learning paradigm where an agent learns to make decisions in an environment to maximize a cumulative reward
</aside>
Reward Hacking