
Episode 45: Your AI application is broken. Here’s what to do about it.

Author: Hugo Bowne-Anderson
Published: Thu 20 Feb 2025
Episode Link: https://vanishinggradients.fireside.fm/45

Too many teams are building AI applications without truly understanding why their models fail. Instead of jumping straight to LLM evaluations, dashboards, or vibe checks, how do you actually fix a broken AI app?

In this episode, Hugo speaks with Hamel Husain, longtime ML engineer, open-source contributor, and consultant, about why debugging generative AI systems starts with looking at your data.

They dive into:
  • Why “look at your data” is the best debugging advice no one follows.

  • How spreadsheet-based error analysis can uncover failure modes faster than complex dashboards.

  • The role of synthetic data in bootstrapping evaluation.

  • When to trust LLM judges—and when they’re misleading.

  • Why dashboards that score truthfulness, helpfulness, and conciseness are often a waste of time.
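The spreadsheet-based error analysis mentioned above can be surprisingly low-tech: read traces, write a free-form failure note per trace, then tally the notes to see which failure modes dominate. A minimal sketch of that tallying step (the traces and failure labels below are invented for illustration, not taken from the episode):

```python
# Minimal sketch of spreadsheet-style error analysis.
# The traces and failure-mode labels are hypothetical examples.
from collections import Counter

# Each row: (user query, model output, annotator's free-form failure note)
annotated_traces = [
    ("Reset my password", "Sure! Click the link in your email...", "ok"),
    ("Cancel my order", "I can help with refunds...", "wrong intent"),
    ("What's your refund policy?", "As an AI, I cannot...", "unnecessary refusal"),
    ("Cancel order #1234", "Order not found", "wrong intent"),
]

# Count failure modes: the point of error analysis is that raw counts
# like these tell you what to fix first, before building any dashboard.
failure_counts = Counter(note for _, _, note in annotated_traces if note != "ok")
for mode, count in failure_counts.most_common():
    print(f"{mode}: {count}")
```

The same tally is trivial to maintain in an actual spreadsheet with a pivot table, which is much of the appeal: no infrastructure stands between you and your data.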

If you're building AI-powered applications, this episode will change how you debug, iterate on, and improve model performance in production.
