I recently came across a wonderful post by Talia Borodin titled "Think Your Company Needs a Data Scientist? You're Probably Wrong". If you haven't read it yet, make sure you do! It contains a wonderful collection of truths that every individual who has anything to do with data science …
Showing that, in at least one specific case, the two tests are the same.
A tutorial on starting a cluster of dask instances on AWS (EC2), then using that cluster to run an extensive grid search.
When grouping a DataFrame, the order matters and may be surprising.
Some remarks and highlights from a meetup I attended.
Trying to give an intuitive understanding of the difference between the biased and unbiased estimators of a sample's variance.
A gotcha when aggregating time series data into hourly counts.
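As a rough illustration of that difference (a minimal sketch, not taken from the post itself): the simulation below assumes normally distributed data with a known variance and compares NumPy's `ddof=0` (divide by n) and `ddof=1` (divide by n - 1) estimators. The sample size, number of trials, and true variance are arbitrary values chosen for the example.

```python
import numpy as np

# Assumed setup for illustration: many small samples from a normal
# distribution whose true variance is known.
rng = np.random.default_rng(0)
true_var = 4.0
n = 5                 # size of each sample
n_trials = 100_000    # number of independent samples

samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(n_trials, n))

# ddof=0 divides by n (biased); ddof=1 divides by n - 1 (unbiased).
biased = samples.var(axis=1, ddof=0)
unbiased = samples.var(axis=1, ddof=1)

# Averaging each estimator over many samples approximates its expected value.
mean_biased = biased.mean()
mean_unbiased = unbiased.mean()

print(f"true variance:          {true_var}")
print(f"mean biased estimate:   {mean_biased:.3f}")
print(f"mean unbiased estimate: {mean_unbiased:.3f}")
```

On average the biased estimator should land near (n - 1) / n times the true variance, while the unbiased one should land near the true variance.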
Assume you have a data set with the columns ID, Date and Value, where each row contains an ID, a date (stored as a pandas datetime) and a value. The objective is to count how many rows occur on each day. import pandas as pd import numpy as np …
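One way such a per-day count might look is sketched below. The frame and its values are made up for illustration, and this is not necessarily the approach the post takes. It shows two options: grouping by the calendar date, and resampling by day.

```python
import pandas as pd

# Hypothetical data with the same three columns; the values are invented.
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "Date": pd.to_datetime(
            [
                "2021-01-01 08:00",
                "2021-01-01 17:30",
                "2021-01-02 09:15",
                "2021-01-04 12:00",
            ]
        ),
        "Value": [0.5, 1.2, 0.7, 2.0],
    }
)

# Option 1: group by the calendar date; only days that have rows appear.
rows_per_day = df.groupby(df["Date"].dt.date).size()

# Option 2: resample by day; days without rows appear with a count of 0.
rows_per_day_filled = df.set_index("Date").resample("D").size()

print(rows_per_day)
print(rows_per_day_filled)
```

The two results differ only for days with no rows: the groupby version skips them, while the resampled version reports them as zero.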
Benchmarking different ways to process two columns simultaneously.
I had to (or at least I thought I had to) implement a transformer to be used in a sklearn.pipeline.Pipeline. In a nutshell, I implemented the transform method badly. The original version can be found in this gist. In the following version I fixed it. Furthermore, I left …