This paper presents Open CaptchaWorld, a benchmark designed to assess the ability of multimodal AI agents to solve complex, multi-step CAPTCHAs encountered in real-world online environments. Unlike existing benchmarks that focus on static, single-turn tasks, Open CaptchaWorld emphasizes the interactive and dynamic nature of modern human-verification puzzles. Empirical analysis on the benchmark shows that while state-of-the-art multimodal models can handle basic visual tasks, they lag well behind human performance on challenges requiring more complex reasoning, fine-grained operations, or strategic understanding. The study highlights the current limitations of AI agents in tackling CAPTCHAs and provides insights for future development in this area.
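To make the interactive, multi-step setting concrete, the sketch below shows what one evaluation episode in such a benchmark might look like: the agent repeatedly observes the puzzle state and emits UI actions until verification succeeds or fails, rather than answering a single static question. The `Observation`, `Agent`, and `run_episode` names and the `reset`/`step` environment interface are illustrative assumptions, not the actual Open CaptchaWorld API.

```python
from dataclasses import dataclass
from typing import Protocol, Any

@dataclass
class Observation:
    # Assumed observation structure; the real benchmark may differ.
    screenshot: bytes   # rendered state of the CAPTCHA widget
    instruction: str    # natural-language prompt shown to the solver
    solved: bool        # verification passed
    failed: bool        # verification failed or the puzzle expired

class Agent(Protocol):
    def act(self, screenshot: bytes, instruction: str) -> dict:
        """Map the current puzzle state to a UI action,
        e.g. {"type": "click", "x": 120, "y": 48} or a drag gesture."""
        ...

def run_episode(env: Any, agent: Agent, max_steps: int = 20) -> bool:
    """One interactive episode: observe, act, re-observe, in a loop.
    This is what distinguishes multi-step CAPTCHAs from single-turn
    visual question answering."""
    obs = env.reset()
    for _ in range(max_steps):
        if obs.solved or obs.failed:
            break
        action = agent.act(obs.screenshot, obs.instruction)
        obs = env.step(action)
    return obs.solved
```

Under this framing, the performance gap the paper reports would surface as low episode success rates on puzzles needing several coordinated actions, even for models that answer each intermediate visual query correctly.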