Anscombe's Quartet is a set of four datasets that have the same mean, variance and correlation but look very different. It's easy to think that having a good set of summary statistics (like mean, va…
Originally release June 2016
This episode is for all you (us) traffic nerds--we're talking about the hidden structure underlying traffic on-ramp metering systems. These systems slow down the flow of…
The year: 1998. The size of the web: 150 million pages. The problem: information retrieval. How do you find the "best" web pages to return in response to a query? A graduate student named Larry P…
As more and more data gets collected seemingly every day, and data scientists use that data for modeling, the technical limits associated with machine learning on big datasets keep getting pushed bac…
If you're anything like us, you a) always are curious to learn more about data science and machine learning and stuff, and b) are usually overwhelmed by how much content is out there (not all of it v…
As machine learning makes its way into more and more mobile devices, an interesting question presents itself: how can we have an algorithm learn from training data that's being supplied as users inte…
Word2Vec is probably the go-to algorithm for vectorizing text data these days. Which makes sense, because it is wicked cool. Word2Vec has it all: neural networks, skip-grams and bag-of-words implem…
It seems like every day there's more and more machine learning problems that involve learning on text data, but text itself makes for fairly lousy inputs to machine learning algorithms. That's why t…
This week we'll hop into the rapidly developing industry around predictive analytics for education. For many of the students who eventually drop out, data science is showing that there might be earl…
In our follow-up episode to last week's introduction to the first self-driving car, we will be doing a technical deep dive this week and talking about the most important systems for getting a car to …
In October 2005, 23 cars lined up in the desert for a 140 mile race. Not one of those cars had a driver. This was the DARPA grand challenge to see if anyone could build an autonomous vehicle capabl…
Figuring out what features actually matter in a model is harder to figure out than you might first guess. When a human makes a decision, you can just ask them--why did you do that? But with machine…
It's hard to get information to and from Mars. Mars is very far away, and expensive to get to, and the bandwidth for passing messages with Earth is not huge. The messages you do pass have to traver…
You may be shocked to hear this, but sometimes, people on the internet can be mean. For some of us this is just a minor annoyance, but if you're a maintainer or contributor of a large project like W…
Advances in neural networks are moving fast enough that, even though it seems like we talk about them all the time around here, it also always seems like we're barely keeping up. So this week we hav…
When you're estimating something about some object that's a member of a larger group of similar objects (say, the batting average of a baseball player, who belongs to a baseball team), how should you…
Say you're looking to use some Bayesian methods to estimate parameters of a system. You've got the normalization figured out, and the likelihood, but the prior... what should you use for a prior? E…
Have you been out protesting lately, or watching the protests, and wondered how much effect they might have on lawmakers? It's a tricky question to answer, since usually we need randomly distributed…
Remember last week, when we were talking about how great the ROC curve is for evaluating models? How things change... This week, we're exploring calibrated risk models, because that's a kind of mod…
00:14:32 |
Mon 06 Feb 2017
Disclaimer: The podcast and artwork embedded on this page are the property of Ben Jaffe and Katie Malone ([email protected]). This content is not affiliated with or endorsed by eachpod.com.