Dr. Dror

Foo is not just bar

Articles





  • Sun 10 September 2017
  • Stats

Why do we need to divide by n-1?

An attempt to give an intuitive understanding of the difference between the biased and unbiased estimators of a sample's variance.
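The difference can be seen in a small simulation. This is only a sketch under my own assumptions (normally distributed data, a true variance of 4, samples of size 5, a fixed seed); none of these numbers come from the article itself. Dividing by n tends to underestimate the variance by a factor of (n-1)/n, while dividing by n-1 is correct on average.

```python
import numpy as np

# Assumed setup for illustration only: normal data, known true variance.
rng = np.random.default_rng(0)
true_variance = 4.0
n = 5              # size of each sample
trials = 100_000   # number of independent samples

samples = rng.normal(loc=0.0, scale=np.sqrt(true_variance), size=(trials, n))

# ddof=0 divides by n (biased), ddof=1 divides by n-1 (unbiased).
biased = samples.var(axis=1, ddof=0).mean()
unbiased = samples.var(axis=1, ddof=1).mean()

print(f"true variance:         {true_variance}")
print(f"mean of /n estimates:   {biased:.3f}")    # near (n-1)/n * 4 = 3.2
print(f"mean of /(n-1) estimates: {unbiased:.3f}")  # near 4.0
```

The sample mean is itself computed from the data, so the deviations around it are slightly too small on average; dividing by n-1 compensates for that.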


Group by date from a column

Assume you have a data set with the columns ID, Date and Value, where each row contains an ID, a date (given as pd.Datetime) and a value. The objective is to count how many rows occur on each day. In [1]: import pandas as pd import numpy as np …
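One possible way to do the counting is sketched below. The three rows are made up for illustration, and grouping on `dt.normalize()` is my own choice, not necessarily the approach taken in the article.

```python
import pandas as pd

# Hypothetical example data: two rows on 10 September, one on 11 September.
df = pd.DataFrame(
    {
        "ID": [1, 2, 3],
        "Date": pd.to_datetime(
            ["2017-09-10 08:00", "2017-09-10 17:30", "2017-09-11 09:15"]
        ),
        "Value": [0.5, 1.2, 3.4],
    }
)

# normalize() sets the time of day to midnight, so rows from the same
# calendar day fall into the same group; size() counts rows per group.
counts = df.groupby(df["Date"].dt.normalize()).size()
print(counts)
```

Grouping on the normalized timestamps keeps the index as datetimes, which makes later resampling or plotting easier than grouping on plain strings.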


  • Thu 22 June 2017
  • ML

Some learnings from implementing a transformer

I had to (or at least I thought I had to) implement a transformer to be used in a sklearn.pipeline.Pipeline. In a nutshell, I implemented the transform method badly. The original version can be found in this gist. In the following version I fixed it. Furthermore, I left …
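For readers unfamiliar with the pattern, here is a minimal sketch of a custom transformer that works inside a Pipeline. The `MeanCenterer` class and its data are hypothetical and are not the code from the gist; the excerpt does not say what the original bug was. The sketch only shows the usual contract: `fit` learns state from the training data, and `transform` applies that state without learning anything new.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class MeanCenterer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: subtracts the column means seen in fit."""

    def fit(self, X, y=None):
        # Learn the column means from the training data only.
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self  # returning self allows chaining and Pipeline use

    def transform(self, X):
        # Apply the stored means; do not recompute them from X here.
        return np.asarray(X, dtype=float) - self.mean_


X_train = np.array([[1.0, 10.0], [3.0, 30.0]])
X_new = np.array([[2.0, 20.0]])

pipeline = Pipeline([("center", MeanCenterer())])
pipeline.fit(X_train)
centered_new = pipeline.transform(X_new)
print(centered_new)
```

Recomputing statistics inside `transform` is a common mistake, because new data would then be centered by its own means instead of the training means.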

When trying to hash a data frame

TL;DR: pandas.DataFrame.values is not the inverse of pd.DataFrame(np.array). Introduction: An important part of reproducible data science work is the ability to apply the DAG to the very same dataset. The simplest option is to commit the datasets to a VCS like git. This …
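One way the round trip can fail is sketched below. This is my own illustration with made-up data and may not be the exact case the article discusses: going through `.values` collapses mixed column dtypes into a single array dtype and drops the column labels.

```python
import pandas as pd

# Hypothetical frame with one integer and one float column.
df = pd.DataFrame({"id": [1, 2], "score": [0.5, 1.5]})

# .values returns one NumPy array, so both columns share a single dtype.
array = df.values

# Rebuilding a frame from the array does not restore the original.
round_trip = pd.DataFrame(array)

print(df.dtypes)          # id is int64, score is float64
print(round_trip.dtypes)  # both columns are float64
print(df.equals(round_trip))
```

Anything that hashes the raw array therefore sees different data than the original frame held, which matters when the hash is meant to identify a dataset.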