This document introduces Nash Mirror Prox (NashMP), a novel algorithm designed to improve the alignment of Large Language Models (LLMs) with human preferences. Traditional methods, which typically rely on Reinforcement Learning from Human Feedback (RLHF) and simplified preference models, struggle with complexities such as intransitive human preferences. NashMP addresses this by framing alignment as the search for a Nash equilibrium of a preference game, and it offers faster and more stable convergence than previous approaches such as Nash-MD. The paper provides a rigorous theoretical analysis establishing linear convergence rates for NashMP, describes practical implementation strategies for fine-tuning LLMs, and reports competitive empirical performance against existing baselines.
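To make the game-theoretic framing concrete, below is a minimal tabular sketch of a Mirror Prox (extragradient) solver for a KL-regularized two-player preference game, in the spirit of the Nash-MD line of work. Everything here is illustrative rather than the paper's actual procedure: the pairwise preference matrix P, the step size eta, the regularization strength tau, and the uniform reference policy are all assumptions chosen for a toy example, not values from the paper.

```python
import numpy as np

def mirror_prox_nash(P, tau=0.1, eta=0.5, steps=2000):
    """Toy Mirror Prox solver for a symmetric, KL-regularized preference game.

    P[i, j] is the probability that response i is preferred to response j.
    Both players share a policy pi (a distribution over responses); the
    regularized Nash equilibrium trades off win rate against KL divergence
    to a uniform reference policy mu. This is an illustrative sketch of the
    extragradient idea, not the paper's LLM fine-tuning algorithm.
    """
    n = P.shape[0]
    A = P - P.T                      # antisymmetric payoff: advantage of i over j
    mu = np.full(n, 1.0 / n)         # reference policy (assumed uniform here)
    pi = mu.copy()

    def grad(p):
        # Gradient of the regularized payoff p^T A p' - tau * KL(p || mu)
        # with respect to the player's own policy, evaluated in self-play.
        return A @ p - tau * (np.log(p) - np.log(mu))

    for _ in range(steps):
        # Extrapolation ("look-ahead") step of Mirror Prox:
        pi_half = pi * np.exp(eta * grad(pi))
        pi_half /= pi_half.sum()
        # Correction step: update from the original point using the
        # gradient evaluated at the extrapolated policy.
        pi = pi * np.exp(eta * grad(pi_half))
        pi /= pi.sum()
    return pi

if __name__ == "__main__":
    # Intransitive "rock-paper-scissors" preferences: no single best response
    # exists, but the (regularized) Nash equilibrium is the uniform policy.
    P = np.array([[0.5, 1.0, 0.0],
                  [0.0, 0.5, 1.0],
                  [1.0, 0.0, 0.5]])
    print(mirror_prox_nash(P))       # approximately [1/3, 1/3, 1/3]
```

The look-ahead-then-correct structure is what distinguishes Mirror Prox from plain mirror descent, and in this toy setting it illustrates why the intransitive example above has a well-defined equilibrium even though no single response dominates.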