This document presents research on improving power grid management through reinforcement learning. The authors introduce a model-free approach with a masked action space, which lets agents learn control strategies without requiring predefined expert knowledge. Several grid state representations are explored, including novel graph-based methods, and evaluated on how well they reduce power losses and enhance grid stability. Experiments in a simulated grid environment demonstrate the superior performance of the approach, particularly when graph observations are incorporated and the agent is trained against an opponent to improve robustness.
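The masked-action-space idea mentioned above can be illustrated with a minimal sketch: invalid actions have their logits suppressed before the softmax, so the agent can only sample actions that are currently legal. Note this is a generic, hypothetical illustration of action masking, not the paper's actual implementation; the logit values, mask source, and action semantics here are invented for the example.

```python
import numpy as np

def masked_softmax(logits: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Assign zero probability to invalid actions by setting their
    logits to -inf before the softmax normalization."""
    masked_logits = np.where(mask, logits, -np.inf)
    # Subtract the max of the remaining logits for numerical stability.
    exp = np.exp(masked_logits - masked_logits.max())
    return exp / exp.sum()

# Hypothetical example: 5 candidate grid actions, of which only
# actions 0, 2, and 4 are legal in the current grid state.
logits = np.array([1.0, 2.0, 0.5, 3.0, 1.5])
mask = np.array([True, False, True, False, True])
probs = masked_softmax(logits, mask)
# Masked actions get exactly zero probability; the rest renormalize.
```

Because illegal actions are removed before sampling rather than penalized afterward, the agent never wastes exploration on moves the grid operator could not execute.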