Dr. Dror

Foo is not just a "Bar"

Moving from local machine to Dask cluster using Terraform


Introduction

As part of the never-ending effort to improve reBuy and turn it into a market leader, we recently decided to tackle the challenges faced by our customer service agents. As a first step, a dump of tagged emails was created and the first goal was set: build a POC that tags the emails automatically. To that end, NLP had to be used and a lengthy (and greedy) grid search had to be executed. So lengthy, in fact, that the 4 cores of a notebook worked for a couple of hours without producing any results. This was the point when I decided to explore dask and its sibling, distributed. In this tutorial/post we shall discuss how to take local code that runs a grid search with Scikit-Learn and move it to a cluster of AWS (EC2) nodes.

The full tutorial, including source files, can be found here; the README is the entry point.
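To make the starting point concrete, here is a minimal sketch of the kind of local grid search described above and of one common way to hand it over to a dask cluster. Everything in it is an assumption made for illustration: the toy emails and tags, the TF-IDF plus logistic regression pipeline, the parameter grid, and the helper names `build_search` and `run_grid_search` are hypothetical, and routing the search through joblib's dask backend is just one possible approach, not necessarily the one the full tutorial takes.

```python
from joblib import parallel_backend
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy placeholder data (hypothetical); the real input is the dump of tagged emails.
EMAILS = [
    "where is my order",
    "my parcel has not arrived",
    "order still not delivered",
    "tracking says the parcel is lost",
    "i want my money back",
    "please refund my payment",
    "refund has not arrived yet",
    "money back for the returned phone",
]
TAGS = ["delivery"] * 4 + ["refund"] * 4


def build_search(n_jobs=-1):
    """Build a small text-classification grid search (assumed model and grid)."""
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    param_grid = {
        "tfidf__ngram_range": [(1, 1), (1, 2)],
        "clf__C": [0.1, 1.0, 10.0],
    }
    return GridSearchCV(pipeline, param_grid, cv=2, n_jobs=n_jobs)


def run_grid_search(scheduler_address=None, n_jobs=-1):
    """Fit locally, or on a dask cluster when a scheduler address is given."""
    search = build_search(n_jobs=n_jobs)
    if scheduler_address is None:
        # Local run: joblib's default backend uses the cores of this machine.
        search.fit(EMAILS, TAGS)
        return search

    # Imported lazily so the local path works without dask installed.
    from dask.distributed import Client

    client = Client(scheduler_address)  # connect to the running scheduler
    try:
        # Same fit call; the individual fits are sent to the cluster workers.
        with parallel_backend("dask"):
            search.fit(EMAILS, TAGS)
    finally:
        client.close()
    return search


if __name__ == "__main__":
    result = run_grid_search()  # pass "tcp://<scheduler-host>:8786" to use a cluster
    print(result.best_params_)
```

The point of the sketch is that the Scikit-Learn code itself barely changes: the only difference between the local and the distributed run is the `Client` connection and the `parallel_backend("dask")` context around `fit`. The scheduler address in the comment is a placeholder, with 8786 being the default port of the dask scheduler.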