Scaling AI inference with open source ft. Brian Stevens

Author
Red Hat
Published
Wed 04 Jun 2025
Episode Link
https://www.redhat.com/en/technically-speaking/scaling-AI-inference

Explore the future of enterprise AI with Red Hat's SVP and AI CTO, Brian Stevens. In this episode, we delve into how AI is being practically reimagined for real-world business environments, focusing on the pivotal shift to production-quality inference at scale and the transformative power of open source.

Brian Stevens shares his expertise and unique perspective on:

• The evolution of AI from experimental stages to essential, production-ready enterprise solutions.
• Key lessons from the early days of enterprise Linux and their application to today’s AI inference challenges.
• The critical role of projects like vLLM in optimizing AI models and creating a common, efficient inference stack for diverse hardware.
• Innovations in GPU-based inference and distributed serving techniques, like the KV cache, that enable AI scalability.
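The episode only names the KV cache in passing. As background, the idea is that during autoregressive generation, each token's attention keys and values are stored once and reused at every later step instead of being recomputed. A minimal NumPy sketch of the concept (all names illustrative, not drawn from the episode or from vLLM's implementation):

```python
import numpy as np

def attention(q, K, V):
    # Scaled dot-product attention for a single query vector.
    scores = q @ K.T / np.sqrt(K.shape[1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

rng = np.random.default_rng(0)
d = 4
# Stand-in keys/values for 5 previously generated tokens.
keys = rng.normal(size=(5, d))
values = rng.normal(size=(5, d))
query = rng.normal(size=(d,))

# Without caching: rebuild the full key/value matrices every step.
out_full = attention(query, keys, values)

# With a KV cache: append each new token's key/value as it is
# produced and reuse everything already stored.
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
for i in range(5):
    K_cache = np.vstack([K_cache, keys[i : i + 1]])
    V_cache = np.vstack([V_cache, values[i : i + 1]])
out_cached = attention(query, K_cache, V_cache)

# Both paths give the same attention output; the cache just avoids
# recomputing keys/values for old tokens.
assert np.allclose(out_full, out_cached)
```

In real serving systems the savings come from skipping the model's key/value projection for past tokens, which is what makes long-context inference tractable on GPUs.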

Tune in for a deep dive into the infrastructure and strategies making enterprise AI a reality. Whether you're a seasoned technologist, an AI practitioner, or a leader charting your company's AI journey, this discussion will provide valuable insights into building an accessible, efficient, and powerful AI future with open source.