(Original notebooks can be found in this gist)
Assume you have data set as follows:
ID | Date | Value |
---|---|---|
x | x | x |
where each row contains an ID, a date (given as pd.Datetime
) and a value.
The objective is to count how many rows occur in each day.
import pandas as pd
import numpy as np
import datetime
import random
def random_date(start, end):
"""Generate a random datetime between `start` and `end`
Thanks to https://stackoverflow.com/a/8170651/671013
"""
return start + datetime.timedelta(
# Get a random amount of seconds between `start` and `end`
seconds=random.randint(0, int((end - start).total_seconds())),
)
Let's generate a random data set:
N = 100
df = pd.DataFrame(
{
"date": [random_date(datetime.datetime(2017,5,1), datetime.datetime(2017,7,1)) for x in range(N)],
"val": np.random.choice([0,1], size=N)
}
)
Magic:
(Based on this answer)
df.groupby(
[df['date'].map(lambda x: x.month),
df['date'].map(lambda x: x.day)]
).size()
date date
5 1 4
2 2
3 2
4 1
5 4
6 2
7 1
9 1
10 2
11 1
12 1
13 2
14 2
15 1
16 1
17 4
19 4
20 2
21 1
22 1
24 1
25 1
26 3
27 1
28 2
29 1
31 1
6 1 2
2 2
3 2
4 3
5 2
6 1
7 1
9 4
10 2
11 1
12 1
13 1
15 1
16 2
18 1
19 5
20 3
21 2
23 3
24 3
26 2
27 3
29 1
30 3
dtype: int64