Parallelism and Concurrency — the minimal introduction
Hi! If you are reading this, you probably got confused when you heard the words “Parallelism” and “Concurrency” and started wondering what the big deal is and how the two terms differ. Or maybe you believe that dealing with parallel or concurrent computations is really hard and that you need a PhD in the topic. Those are things I used to think too.
But it all changed when I met Haskell. For Haskell developers, parallelism and concurrency weren't a problem or something that sends chills down your spine; they were just a necessary and desirable feature of an advanced language.
So let's debunk some misconceptions. In my view, the idea that concurrency and parallelism are really hard and should only be done by specialists exists because most mainstream languages still don't have proper support for them, and the ones that do have implemented it through threads and locks, a model that is easy for the machine to comprehend but in general very hard for the human mind to reason about.
Which leads us to another misconception. I used to believe that there was only one way of doing parallelism and concurrency, as if it were a design pattern or an algorithm. The truth is that there are a lot of models for reasoning about concurrency and parallelism, and most of them are very human friendly.
Just to name a few models supported in Haskell:
- For parallelism
  - Eval monad
  - Par monad
  - Data parallelism
- For concurrency
  - Threads with forkIO
  - Chan/CSP (the same concurrency model as Go)
  - Async/await
There are plenty of options, and each one of them is good at expressing the specific needs of a different kind of computation.
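To make one of these models a bit more concrete, here is a minimal sketch of the Chan style, using only modules from the base library. The function name squaresViaChan is made up for this illustration: a producer thread sends values over a channel and the caller receives them on the other side, without sharing any mutable state.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Monad (forM_, replicateM)

-- Hypothetical helper: a producer thread writes the first n squares
-- to a channel, and the calling thread reads them back in order.
squaresViaChan :: Int -> IO [Int]
squaresViaChan n = do
  ch <- newChan
  -- The producer runs in its own lightweight thread.
  _ <- forkIO (forM_ [1 .. n] (\i -> writeChan ch (i * i)))
  -- readChan blocks until a value is available, so no locks are needed here.
  replicateM n (readChan ch)

main :: IO ()
main = squaresViaChan 5 >>= print
```

With a single producer the channel keeps the values in the order they were written, so this should print [1,4,9,16,25].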
Now, based on the models above, let's differentiate parallelism and concurrency.
Parallelism is when I use several cores of my processor to work on the same problem at the same time.
For example, think of image processing, where an image is a matrix and I need to apply a heavy computation over it. This is a parallelism problem because I can split the rows of the matrix between my cores and divide and conquer: every core applies the heavy computation to just a small fraction of the matrix.
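The row-splitting idea can be sketched in a few lines. This is only an illustration under some assumptions: the image is modelled as a list of rows of Int pixels, heavy is a made-up stand-in for the expensive computation, and the code uses the low-level par and pseq primitives from base (GHC.Conc) instead of the Eval monad, so that it does not depend on any extra package.

```haskell
import GHC.Conc (par, pseq)

type Row = [Int]
type Image = [Row]

-- Hypothetical stand-in for the heavy per-pixel computation.
heavy :: Int -> Int
heavy p = (p * p + 1) `mod` 256

-- Evaluate every pixel of a row, so the work is done inside the spark
-- and not later when the result is finally consumed.
forceRow :: Row -> Row
forceRow r = foldr seq () r `seq` r

-- Spark the computation of the current row, then move on to the
-- remaining rows; idle cores can pick up the sparked rows.
processImage :: Image -> Image
processImage [] = []
processImage (row : rest) = done `par` (others `pseq` (done : others))
  where
    done   = forceRow (map heavy row)
    others = processImage rest

main :: IO ()
main = print (processImage [[1, 2, 3], [4, 5, 6], [7, 8, 9]])
-- prints [[2,5,10],[17,26,37],[50,65,82]]
```

The result is the same whether it runs on one core or many. To actually spread the rows over several cores, the program would need to be compiled with GHC's -threaded flag and run with +RTS -N.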
Concurrency is when my program deals with several independent tasks at once, each one unrelated to the others. On a multicore machine, that means each core can be working on a distinct task instead of on a piece of the same problem.
For example, on a web server, I can't receive a small part of a request on each core, and the database query can't be parallelized because it depends on the data that comes from the request. But I can use concurrency to have each core handle a different request, so I am able to process more than one request at a time and still take advantage of the multicore architecture.
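Here is a rough sketch of that idea with forkIO and MVar from the base library. There is no real network code in it: requests are plain strings, and handleRequest and serveAll are hypothetical names invented for this example. Each request gets its own lightweight thread, and the runtime decides which core runs it.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- Hypothetical handler: a real one would parse the request and
-- query the database before building the response.
handleRequest :: String -> IO String
handleRequest req = pure ("response to " ++ req)

-- Start one thread per request, then collect the responses
-- in the order the requests arrived.
serveAll :: [String] -> IO [String]
serveAll reqs = do
  boxes <- mapM spawn reqs
  mapM takeMVar boxes
  where
    spawn :: String -> IO (MVar String)
    spawn req = do
      box <- newEmptyMVar
      -- The handler runs independently of the other requests.
      _ <- forkIO (handleRequest req >>= putMVar box)
      pure box

main :: IO ()
main = serveAll ["GET /a", "GET /b", "GET /c"] >>= mapM_ putStrLn
```

One caveat of this simple version: if a handler throws an exception, its MVar is never filled and the caller is left waiting for it, so a real server would need to deal with failures explicitly.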
Also, parallelism and concurrency give scale and fault tolerance to our software. For example, we can't think about multi-machine, distributed systems without concurrency or parallelism, because at some point we would have to coordinate the machines so that they scale and cooperate.
If we want to be fault-tolerant, one example is keeping a recovery process in another thread that checks whether the application is up and, if it is not, simply restarts it.
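A very small sketch of such a recovery thread could look like this. It is an illustration, not a production supervisor: supervise is a hypothetical helper, the "application" is just an IO action that crashes on its first two runs, and a crash is modelled as an exception.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, try)
import Data.IORef (atomicModifyIORef', newIORef)

-- Hypothetical supervisor: run the application and, if it crashes,
-- restart it up to maxRestarts times. Returns how many restarts happened.
supervise :: Int -> IO () -> IO Int
supervise maxRestarts app = go 0
  where
    go n = do
      result <- try app :: IO (Either SomeException ())
      case result of
        Right () -> pure n
        Left _
          | n < maxRestarts -> go (n + 1)
          | otherwise       -> pure n

main :: IO ()
main = do
  attempts <- newIORef (0 :: Int)
  done <- newEmptyMVar
  -- Made-up flaky application: it crashes on its first two runs.
  let app = do
        n <- atomicModifyIORef' attempts (\k -> (k + 1, k + 1))
        if n < 3
          then ioError (userError "simulated crash")
          else putStrLn "application finished"
  -- The supervisor lives in its own thread, apart from the main one.
  _ <- forkIO (supervise 5 app >>= putMVar done)
  restarts <- takeMVar done
  putStrLn ("restarts: " ++ show restarts)
```

Running it should print "application finished" followed by "restarts: 2". Catching SomeException keeps the sketch short, but it also swallows asynchronous exceptions, so a real supervisor would want to be more selective about what it restarts on.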
Whew, I think that's enough for today (the writer doesn't work on multicore). Thank you for reading! See you in other articles.