Tutorial on how to start a cluster of Dask instances on AWS (EC2), then use that cluster to run an expansive grid search.
When grouping a DataFrame, the order matters and may be surprising.
Some remarks and highlights taken from a meetup I attended.
An attempt to give an intuitive understanding of the difference between the biased and unbiased estimators of a sample's variance.
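To make the biased/unbiased distinction concrete, here is a minimal simulation sketch. It is not taken from the post itself; the sample size, trial count, and the choice of a standard normal distribution are assumptions picked for illustration. It draws many small samples from a distribution with known variance 1 and averages both estimators.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 5     # assumed: small samples make the bias visible
N_TRIALS = 20_000   # assumed: enough repetitions to average out noise

biased, unbiased = [], []
for _ in range(N_TRIALS):
    # Standard normal draws, so the true variance is 1.0
    sample = [random.gauss(0.0, 1.0) for _ in range(SAMPLE_SIZE)]
    biased.append(statistics.pvariance(sample))   # divides by n
    unbiased.append(statistics.variance(sample))  # divides by n - 1

mean_biased = statistics.fmean(biased)
mean_unbiased = statistics.fmean(unbiased)
print(f"average biased estimate:   {mean_biased:.3f}")
print(f"average unbiased estimate: {mean_unbiased:.3f}")
```

On average the divide-by-n estimator should land near (n - 1) / n times the true variance (about 0.8 here), while the divide-by-(n - 1) estimator should land near 1.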
A gotcha when aggregating time series data involving hourly counts.
Assume you have a data set as follows: ID Date Value x x x where each row contains an ID, a date (given as pd.Datetime) and a value. The objective is to count how many rows occur on each day. In : import pandas as pd import numpy as np …
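As a rough illustration of the per-day counting task described above, here is a minimal sketch. The column names follow the excerpt, but the sample rows are made up, and this is not necessarily the approach the post takes. It shows two common ways to count rows per day in pandas.

```python
import pandas as pd

# Hypothetical sample data: two rows on Jan 1, none on Jan 2, one on Jan 3
df = pd.DataFrame(
    {
        "ID": [1, 2, 3],
        "Date": pd.to_datetime(
            ["2020-01-01 08:00", "2020-01-01 17:30", "2020-01-03 09:15"]
        ),
        "Value": [10.0, 20.0, 30.0],
    }
)

# Option 1: group by the calendar day (time of day set to midnight)
per_day_groupby = df.groupby(df["Date"].dt.normalize()).size()

# Option 2: resample a DatetimeIndex into daily bins
per_day_resample = df.set_index("Date").resample("D").size()

print(per_day_groupby)
print(per_day_resample)
```

One difference worth knowing: the `groupby` version only lists days that actually appear in the data, while `resample("D")` also emits a zero count for the empty day in between.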
Benchmarking different ways to process two columns simultaneously.
I had to (or at least I thought I had to) implement a transformer to be used in a sklearn.pipeline.Pipeline. In a nutshell, I implemented the transform method badly. The original version can be found in this gist. In the following version I fixed it. Furthermore, I left …
TL;DR: The property pandas.DataFrame.values is not the inverse of pd.DataFrame(np.array). Introduction An important part of reproducible data science work is the ability to apply the DAG to the very same dataset. The simplest option is to commit the datasets to a VCS like git. This …
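One way to see that the round trip is lossy is the small sketch below. The two-column frame is a made-up example, and the post may have a different failure case in mind. It converts a mixed int/float DataFrame to an array and back, then compares the dtypes.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one integer column and one float column
df = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})

# .values returns a single NumPy array, so mixed columns share one dtype
arr = df.values
print(arr.dtype, np.issubdtype(arr.dtype, np.floating))

# Rebuilding the frame from the array does not restore the original dtypes
roundtrip = pd.DataFrame(arr, columns=df.columns)
same_dtypes = roundtrip.dtypes.equals(df.dtypes)

print(df.dtypes)
print(roundtrip.dtypes)
print("dtypes preserved:", same_dtypes)
```

Here the integer column comes back as float, so any check that relies on dtypes (or on a byte-identical serialized file) would see the rebuilt frame as different from the original.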
The 90s are back, and it turns out that the cross-platform approach doesn't apply to MS SQL Server.