Work with dates
Get true date format from string data
Example, you have dates like 21/09/2022 but you would a true date format to work with it:
df['TrueDate'] = pd.to_datetime(df['OriginalDate'], infer_datetime_format=True).dt.strftime('%Y-%m-%d')
Group by months
If you have a data-frame df
with a field Created at containing dates like:
2021-02-05 10:34:21 UTC
You can work with it, example to group and sort by month, you can cheat doing:
df_Temp = pd.DataFrame(df['Created at'].str[:7]) df_Created_count = pd.DataFrame(df_Temp.groupby(['Created at'], dropna=False).size(), columns=['Total']).sort_values(['Created at'], ascending=True).reset_index()
indeed with str[:7]
we extract years and months from strings. Then a usual groupby
.
Ok but it is more a pandas-way to convert your field in a true date format, in a new virtual field. Then you can extract real months and store them in a proper data-frame.
df['Created'] = pd.to_datetime(df['Created at']) df['Created'] = df['Created'].dt.to_period("M") df_TempCreated = pd.DataFrame(df['Created'])
Then easy to group and count them.
df_Created_count = pd.DataFrame(df_TempCreated.groupby(['Created'], dropna=False).size(), columns=['Total']).sort_values(['Created'], ascending=True).reset_index()
As now you use a real date format, you can display the full months with dt.strftime
.
df_Created_count['Created'] = df_Created_count['Created'].dt.strftime('%B %Y')
Format date
If you get date like 2022-10-24 and you would 24/10/2022:
df['My date'] = pd.to_datetime(df['My date'], format='%Y-%m-%d') df['My date'] = df['My date'].dt.strftime("%d/%m/%Y")
Format date from french format
If you get date like 11/05/2023 and you would 2023-05-11:
df['My date'] = df['French date'].str.split('/').str[2] + '-' + df['French date'].str.split('/').str[1] + '-' + df['French date'].str.split('/').str[0]
Get simple date (DDDDMMDD)
# SIMPLE DATE df_DataUnion['date_simple'] = df_DataUnion['date_submit'].astype(str).str[:10] df_DataUnion['date_simple'] = df_DataUnion['date_simple'].replace({'-': ''}, regex=True)
Extract the previous month
PreviousMonth = today - pd.DateOffset(months=1) print(PreviousMonth.strftime("%Y-%m"))
Convert false date field from Excel
Excel can concatenate true date format and string in the same column, arrgh! Later when you would use it in Pandas...
I use this tip to manage these kind of malicious fields in pandas:
df.loc[~df_GDC['date'].str.contains(' ', na=False), 'date'] = df['date'].astype(str).str.split(expand=True)[0] df.loc[~df_GDC['date'].str.contains(' ', na=False), 'date'] = \ df['date'].astype(str).str.split('-', expand=True)[2] + '/' +\ df['date'].astype(str).str.split('-', expand=True)[1] + '/' +\ df['date'].astype(str).str.split('-', expand=True)[0]
Eventually I put the code above in a try:... except:...
Eventually I convert all my field in string before, with astype(str)
Manage numeric date from Excel
Sometimes dates from Excel come as integer, arrgghhh!
import xlrd def read_date(date): return xlrd.xldate.xldate_as_datetime(date, 0) df.loc[df['My date'].astype(str).str.isdigit() == True, 'My date'] = \ pd.to_datetime(df['My date'].loc[df['My date'].astype(str).str.isdigit()].astype(int).apply(read_date), errors='coerce')
Check date format
Sometimes (especilally when it comes from Excel...) it is good to check your dates after treatments.
Below we search dates not starting with DD/MM/YYYY:
df2 = df[~df['my date'].str.contains('^[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]', regex=True)] print(tabulate(df2, headers='keys', tablefmt='psql', showindex=False))
Check if a date is in the future
There are many ways, but here converting in number:
from datetime import date, datetime today = date.today() df_ProblemeDate = df[['My date']][pd.to_numeric(df['My date'].str.split('/').str[2] + df['My date'].str.split('/').str[1] + df['My date'].str.split('/').str[0]) > int(str(today).replace('-', ''))]
Most recent date
mostRecentDate = df['Date'].max()
Most old date
mostOldDate = df['Date'].min()
Period between 2 dates
periodFromToday = today - mostOldDate
print(periodFromToday.days)
Count opening days with the same date column between the 2 following lines
Imagine each line of your dataframe is a delivery, with a Date field:
import pandas as pd from tabulate import tabulate from datetime import date import numpy as np import datetime as dt ... df['Year'] = df['Date'].str.split('/').str[2] df['Month'] = df['Date'].str.split('/').str[1] df['Day'] = df['Date'].str.split('/').str[0] df['Opening days'] = '' df.reset_index(drop=True, inplace=True) for i, row in df.iterrows(): dateDeliveryYear = int(row['Year']) dateDeliveryMonth = int(row['Month']) dateDeliveryDay = int(row['Day']) if i == 0: df.at[i, 'Opening days'] = 'First delivery' if i > 0: beforeDateDeliveryYear = int(df.iloc[[i-1]]['Year'].item()) beforeDateDeliveryMonth = int(df.iloc[[i-1]]['Month'].item()) beforeDateDeliveryDay = int(df.iloc[[i-1]]['Day'].item()) start = dt.date(beforeDateDeliveryYear,beforeDateDeliveryMonth,beforeDateDeliveryDay) end = dt.date(dateDeliveryYear,dateDeliveryMonth,dateDeliveryDay) df.at[i, 'Opening days'] = np.busday_count(start, end) print('\n' + u) print(tabulate(df.head(10), headers='keys', tablefmt='psql', showindex=True)) print(df.shape[0])