Introduction
As part of the never-ending effort to improve reBuy and turn it into a market leader, we recently decided to tackle the challenges faced by our customer service agents.
As a first step, a dump of tagged emails was created and the first goal was set: build a POC that tags the emails automatically.
To that end, we had to use NLP and run a lengthy (and greedy) grid search.
It was so lengthy that the 4 cores of a notebook worked for a couple of hours without producing any results.
This was the point when I decided to explore dask and its sibling, distributed.
In this tutorial/post we shall discuss how to take local code that runs a grid search with Scikit-Learn and move it to a cluster of AWS (EC2) nodes.
The full tutorial, including source files, can be found here, where the README is the entrypoint.
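To make the starting point concrete, here is a minimal sketch of the kind of local grid search described above, with an optional switch that hands the work to a distributed scheduler. Everything in it is an assumption made for illustration and not the actual reBuy code: the toy emails and tags, the TF-IDF plus logistic regression pipeline, the parameter grid, the names `build_search` and `run_grid_search`, and the `scheduler_address` argument. It uses joblib's `dask` backend, which is one possible way to connect Scikit-Learn to distributed; the full tutorial may take a different route.

```python
from joblib import parallel_backend
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy stand-in for the dump of tagged emails (hypothetical data).
TEXTS = [
    "where is my refund", "please refund my order",
    "refund has not arrived", "i want my money back",
    "my parcel is late", "the delivery never came",
    "shipping is taking too long", "parcel lost in delivery",
]
LABELS = ["refund"] * 4 + ["shipping"] * 4


def build_search(n_jobs=-1):
    """Build a small grid search over a text classification pipeline."""
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    # Illustrative grid: 2 n-gram ranges x 3 values of C = 6 candidates.
    param_grid = {
        "tfidf__ngram_range": [(1, 1), (1, 2)],
        "clf__C": [0.1, 1.0, 10.0],
    }
    return GridSearchCV(pipeline, param_grid, cv=2, n_jobs=n_jobs)


def run_grid_search(texts, labels, scheduler_address=None, n_jobs=-1):
    """Fit the grid search locally, or on a dask cluster if an address is given."""
    search = build_search(n_jobs=n_jobs)
    if scheduler_address is None:
        # Local run: joblib spreads the fits over the machine's cores.
        search.fit(texts, labels)
    else:
        # Imported lazily so the local path does not require dask.
        from dask.distributed import Client

        # scheduler_address is an assumed value such as "tcp://<host>:8786".
        client = Client(scheduler_address)
        try:
            # Creating the Client makes the "dask" joblib backend available.
            with parallel_backend("dask"):
                search.fit(texts, labels)
        finally:
            client.close()
    return search


if __name__ == "__main__":
    result = run_grid_search(TEXTS, LABELS)
    print(result.best_params_)
```

The local and the distributed path share the same `GridSearchCV` object; only the joblib backend changes, which is what keeps the move from a single machine to a cluster small in terms of code.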