Concurrent I/O in Haskell

Posted on September 11, 2015

Today, just after waking up, I have been tweaking the last bits and bobs of TinfoilSub, a satirical fun/learning project of mine that scrapes YouTube channels to replicate a subscriptions page without requiring a Google account. The core of this little program is the runServer function, which uses Scotty to host a local page that displays the results, while the list of channels is read from a file. In the beginning, this function looked something like this:

runServer :: [String] -> IO ()
runServer channels = scotty 3000 $ do
  get "/" $ do
    -- sequence runs the scrapes strictly one after another
    videos <- fmap (sort . concatMaybe) . liftIO . sequence $
                map scrapeChannel channels
    html . renderVideos $ take 50 videos
  -- ... and more routes
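As an aside: judging from the types, scrapeChannel returns IO (Maybe [Video]) with Nothing signalling a failed scrape, and concatMaybe is presumably a small helper that keeps the successful results and flattens them. Something like this (the definition is my reconstruction, not from the project):

import Data.Maybe (catMaybes)

-- Reconstructed helper: drop failed scrapes, flatten the rest.
concatMaybe :: [Maybe [a]] -> [a]
concatMaybe = concat . catMaybes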

Now, imagine you have more than just a couple of channels in there. This works fine for 2-3 channels, but each channel adds about a second of runtime every time the page is refreshed and the channels are re-scraped, which quickly amounts to way too much. Naturally, I went on to profiling and trying to figure out how to improve the scaling, and one of the first ideas to come across is using several threads (duh). Pretty basic stuff, so let's import Control.Parallel.Strategies, instruct the RTS to use a number of threads that suits the CPU in use, and everything should be fine. Right?
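Concretely, "instructing the RTS" means compiling with GHC's threaded runtime and picking a capability count at startup, along these lines (the file name and the count of 8 are placeholders):

ghc -threaded -rtsopts Main.hs
./Main +RTS -N8 -RTS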

This is the new code:

runServer :: [String] -> IO ()
runServer channels = scotty 3000 $ do
  get "/" $ do
    videos <- fmap (sort . concatMaybe) . liftIO . sequence $
                parMap rpar scrapeChannel channels
    html . renderVideos $ take 50 videos
  -- ... and more routes

This looks neat, and one might think it does what we want it to do. But looking at the actual performance with 15 channels, this code is only about 20% faster, even with 8 or more threads on a CPU with more than enough cores. Time for some more in-depth analysis. The first thing I decided to look at was good old top. YouTube is not too slow in terms of response times, so with a couple of threads one could assume that scraping starts fairly early on, which should show up as increased CPU usage. But watching top while refreshing the page revealed that the CPU was idling for a good 10 seconds before starting to scrape at all. The scraping itself was then run concurrently on the collected pages, so all cores were used and it only took about a second.

A look at Wireshark confirmed the problem: it recorded a new request to YouTube every second before the scraping started. I looked around a bit in Scalpel, the scraping library I used, and in the curl bindings used for the actual requests, but nothing indicated any problems with making multiple requests at the same time.

So, now for the big reveal. For those who do not know, Haskell's parallel package (Control.Parallel.Strategies) only parallelizes pure computation. A spark created by rpar does nothing more than evaluate its value to weak head normal form, and evaluating an IO action does not run it; it merely yields the action as a value. So parMap happily sparked the construction of my scrape actions, while sequence still executed them one after another. Restricting parallelism to pure code like this hinders you from creating race conditions, deadlocks and other really bad stuff. That is a good thing.

If you want to do concurrent (or parallel) I/O, you need Control.Concurrent instead, which brings lightweight threads, thread communication tools and that kind of stuff. This is pretty dangerous territory: with these tools we can create exactly the bad things mentioned above, which we do not want to risk, at least not if there are better alternatives. And as it turns out, there are. Use async. The async package is essentially a wrapper around all the evil stuff that makes it harder to shoot yourself in the foot. For us, it has this neat function called mapConcurrently, which does exactly what we want: do a map, do it concurrently, and do it in IO.
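To see the pitfall in isolation, here is a minimal sketch (not from the project): parMap rpar sparks each element, but a spark only evaluates its value to weak head normal form, so the print actions below are merely constructed in parallel and then executed strictly in order by sequence_, no matter how many capabilities the RTS has.

import Control.Parallel.Strategies (parMap, rpar)

-- Sparking an IO action evaluates it to WHNF without running it;
-- all actual printing happens sequentially inside sequence_.
main :: IO ()
main = sequence_ (parMap rpar print [1 .. 5 :: Int])

So this is what we get: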

runServer :: [String] -> IO ()
runServer channels = scotty 3000 $ do
  get "/" $ do
    -- mapConcurrently (from Control.Concurrent.Async) runs all scrapes at once
    videos <- fmap (sort . concatMaybe) . liftIO $
                mapConcurrently scrapeChannel channels
    html . renderVideos $ take 50 videos
  -- ... and more routes

And it does exactly what we want: it starts scraping as soon as the first request has returned a page for us, which reduces the runtime by more than half.