Reinforcement learning has shown promise in developing autonomous agents capable of complex decision-making. Nonetheless, traditional reinforcement learning methods often operate in environments without constraints, which can lead to unsafe or trivial behavior in real-world scenarios. This thesis investigates the development and evaluation of constrained reinforcement learning algorithms to enhance safety, performance, and reliability. Agents were trained using Q-learning in the CliffWalking-v0 environment and a Deep Q-Network (DQN) in a customized CartPole-v1 environment, covering both discrete and continuous action spaces. The Q-learning agent employed reward penalties, while the DQN agent used deep constrained Q-learning to avoid unsafe actions. Experiments repeated across multiple random seeds showed that constrained agents incurred lower Q-value and temporal-difference losses, as well as fewer constraint violations, than their unconstrained counterparts. Although constrained agents initially sacrificed some immediate reward, they exhibited more consistent and safer behavior throughout training, ultimately achieving comparable or superior overall effectiveness.
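
As a rough illustration of the reward-penalty idea used with Q-learning, the sketch below applies an extra penalty to cliff transitions in CliffWalking-v0. It is a minimal example, not the thesis code: the penalty value, hyperparameters, episode count, and the "fell off the cliff" check (the environment's -100 step reward) are illustrative assumptions.

# Minimal sketch (assumed setup): tabular Q-learning on CliffWalking-v0 with an
# added reward penalty for unsafe (cliff) transitions.
import numpy as np
import gymnasium as gym

env = gym.make("CliffWalking-v0")
n_states, n_actions = env.observation_space.n, env.action_space.n
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.99, 0.1      # assumed learning rate, discount, exploration
CLIFF_PENALTY = -50.0                   # assumed extra penalty on top of the env reward

for episode in range(500):
    state, _ = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < eps:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # constraint handling via reward shaping: CliffWalking-v0 signals a fall
        # with a step reward of -100, so add an extra penalty on that transition
        if reward == -100:
            reward += CLIFF_PENALTY
        # standard temporal-difference (TD) update
        td_target = reward + gamma * np.max(Q[next_state]) * (not terminated)
        Q[state, action] += alpha * (td_target - Q[state, action])
        state = next_state

The constrained DQN variant is analogous in spirit, except that unsafe actions are handled inside the Q-learning target (e.g. by masking them out of the max over next-state actions) rather than through a shaped reward.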