More Command line fun: downloading a podcast
In the show hpr4398 :: Command line fun: downloading a podcast, Kevie walked us through a command to download a podcast.
He used some techniques here that I hadn't used before, and it's always great to see how other people approach the problem.
Let's have a look at the script and walk through what it does, then we'll have a look at some "traps for young players", as the EEVBlog is fond of saying.
Analysis of the Script
wget `curl https://tuxjam.otherside.network/feed/podcast/ | grep -o 'https*://[^"]*ogg' | head -1`
It chains four different commands together to "Save the latest file from a feed".
Let's break it down so we can have checkpoints between each step.
I often do this when writing a complex one-liner: first do it as separate steps, and then combine them.
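As a rough sketch of that step-by-step approach, the one-liner could be split into small functions that each leave a file behind to inspect. The feed URL and the regular expression come from Kevie's command; the function names and the two file names (tuxjam.xml, ogg_urls.txt) are made up for illustration, and the curl options are just one reasonable choice.

```shell
#!/bin/sh
# Sketch only: the one-liner split into checkpointed steps.
FEED_URL='https://tuxjam.otherside.network/feed/podcast/'  # from the original command
FEED_FILE='tuxjam.xml'     # assumed name for the saved feed
URLS_FILE='ogg_urls.txt'   # hypothetical checkpoint file for the matches

# Step 1: save the feed so it can be inspected before going further.
fetch_feed() {
    curl --silent --show-error --fail "$FEED_URL" --output "$FEED_FILE"
}

# Step 2: write every match of the pattern to a file, one per line.
extract_urls() {
    grep --only-matching 'https*://[^"]*ogg' "$FEED_FILE" > "$URLS_FILE"
}

# Step 3: print only the first match, which is assumed to be the newest episode.
latest_url() {
    head -n 1 "$URLS_FILE"
}

# Step 4: download whatever step 3 printed.
download_latest() {
    wget "$(latest_url)"
}
```

Running fetch_feed, extract_urls and latest_url one at a time lets you look at tuxjam.xml and ogg_urls.txt before anything is downloaded. Once each step looks right, download_latest does the same job as the combined command.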
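The curl command fetches the feed at
https://tuxjam.otherside.network/feed/podcast/
To do this ourselves we will call
curl https://tuxjam.otherside.network/feed/podcast/ --output tuxjam.xml
We name the file with --output because curl prints to standard output by default (wget, by contrast, would have saved this URL as index.html).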
This gives us an XML file, and we can confirm it's well-formed XML with the xmllint command.
$ xmllint --format tuxjam.xml >/dev/null
$ echo $?
0
Here the output of the command is ignored by redirecting it to /dev/null.
Then we check the exit code of the last command. As it's 0, it completed successfully.
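The same check can be wrapped up for use in a script. This is a minimal sketch, not part of Kevie's command: the check_xml function and the two sample files are made up, and it uses xmllint's --noout option instead of redirecting the formatted output to /dev/null. If xmllint is not installed it prints "skipped".

```shell
#!/bin/sh
# Hypothetical helper: report whether a file is well-formed XML.
# Prints "ok", "broken", or "skipped" (when xmllint is not installed).
check_xml() {
    if ! command -v xmllint >/dev/null 2>&1; then
        echo "skipped"
        return 0
    fi
    # "if" tests the exit code directly: 0 means success, anything else failure.
    # --noout stops xmllint printing the document; parse errors on stderr are hidden.
    if xmllint --noout "$1" 2>/dev/null; then
        echo "ok"
    else
        echo "broken"
    fi
}

# Made-up sample files: one well-formed, one with a mismatched closing tag.
printf '%s\n' '<rss><channel><title>demo</title></channel></rss>' > good_sample.xml
printf '%s\n' '<rss><channel><title>demo</channel></rss>' > bad_sample.xml

check_xml good_sample.xml
check_xml bad_sample.xml
```

Testing the command in an if statement avoids a separate echo $? step, which is convenient once the check lives in a script rather than at the prompt.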
Kevie then passes the output to the grep search command with the option -o, and looks for any string starting with http, optionally followed by s, then a colon and two forward slashes, then any run of characters other than a double quote, and ending in ogg.
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line
I was not aware that grep defaulted to regex, as I tend to add the --perl-regexp option to request it explicitly.
We can do the same on our saved file with
grep --only-matching 'https*://[^"]*ogg' tuxjam.xml
http matches the characters http literally (case sensitive)
s* matches the character s literally (case sensitive)
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
: matches the character : literally
/ matches the character / literally
/ matches the character / literally
[^"]* match a single character not present in the list below
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
" a single character in the list " literally (case sensitive)
ogg matches the characters ogg literally (case sensitive)
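To see those rules in action without touching the real feed, here is a small sketch that runs the same pattern over three made-up lines. The first URL is borrowed from the feed; the example.org lines are invented for illustration.

```shell
#!/bin/sh
# Made-up sample lines, loosely modelled on attributes found in a podcast feed.
sample='<enclosure url="https://archive.org/download/tuxjam-121/tuxjam_121.ogg" type="audio/ogg"/>
<link href="http://example.org/show.ogg"/>
<link href="https://example.org/show.mp3"/>'

# Same pattern as in the one-liner; --only-matching prints just the matched text.
matches=$(printf '%s\n' "$sample" | grep --only-matching 'https*://[^"]*ogg')
printf '%s\n' "$matches"
# Prints:
# https://archive.org/download/tuxjam-121/tuxjam_121.ogg
# http://example.org/show.ogg
```

The http line matches because s* allows zero occurrences of s. The mp3 line is dropped because nothing between :// and the closing quote ends in ogg. The type="audio/ogg" attribute is ignored because it does not start with http.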
When we run this ourselves we get the following
$ grep --only-matching 'https*://[^"]*ogg' tuxjam.xml
https://archive.org/download/tuxjam-121/tuxjam_121.ogg
https://archive.org/download/tuxjam-120/TuxJam_120.ogg
https://archive.org/download/tux-jam-119/TuxJam_119.ogg
https://archive.org/download/tuxjam_118/tuxjam_118.ogg
https://archive.org/download/tux-jam-117-uncut/TuxJam_117.ogg
https://tuxjam.otherside.network/tuxjam-115-ogg
https://archive.org/download/tuxjam_116/tuxjam_116.ogg
https://tuxjam.otherside.network/tuxjam-115-ogg
https://tuxjam.otherside.network/tuxjam-115-ogg
https://t