Python Pandas DateTime Reference Guide - Issue 97
A short guide to solutions and tips for handling DateTime cases in Python Pandas
Datetime transformations are common and time-consuming regardless of whether it’s SQL, Python, R, or English. In my analysis, I often rush to jump right into the most important questions that I want to answer, or head straight to the plots (oh, the plots!). However, I end up getting stuck on the most basic things, like dates. It can obviously be very frustrating.
That’s why I created this reference guide. It will save you time when working with the code.
I’ve created over a dozen of them for every language or any type of case. In my journal, I’ve published only a few so far but will be publishing more. Today, it’s all about DateTime transformations in Python Pandas - a compilation of tips, solutions, and workarounds for any possible case with DateTime formatting.
If you work with SQL, make sure to check out SQL Date-Time Guide for a list of the most common DATE extractions solutions.
For Python Pandas, I’ve compiled many (if not all) known variations of DateTimes solutions for any case, which you can see below. If I missed something common (I am sure I did), please let me know and I’ll add it. The tips and solutions below are taken throughout the web from multiple sources, including 100 pandas tricks created by Kevin Markham and just-pandas-things. You can copy the code below or fork and run it from this Kaggle notebook.
df = generate_sample_data_datetime().reset_index()
df = df.sample(500)
df["Year"] = df["index"].dt.year
df["Month"] = df["index"].dt.month
df["Day"] = df["index"].dt.day
df["Hour"] = df["index"].dt.hour
df["Minute"] = df["index"].dt.minute
df["Second"] = df["index"].dt.second
df["Nanosecond"] = df["index"].dt.nanosecond
df["Date"] = df["index"].dt.date
df["Time"] = df["index"].dt.time
df["Time_Time_Zone"] = df["index"].dt.timetz
df["Day_Of_Year"] = df["index"].dt.dayofyear
df["Week_Of_Year"] = df["index"].dt.weekofyear
df["Week"] = df["index"].dt.week
df["Day_Of_week"] = df["index"].dt.dayofweek
df["Week_Day"] = df["index"].dt.weekday
df["Week_Day_Name"] = df["index"].dt.weekday_name
df["Quarter"] = df["index"].dt.quarter
df["Days_In_Month"] = df["index"].dt.days_in_month
df["Is_Month_Start"] = df["index"].dt.is_month_start
df["Is_Month_End"] = df["index"].dt.is_month_end
df["Is_Quarter_Start"] = df["index"].dt.is_quarter_start
df["Is_Quarter_End"] = df["index"].dt.is_quarter_end
df["Is_Leap_Year"] = df["index"].dt.is_leap_year
Convert year and day of the year into a single DateTime column
d = {\
"year": [2019, 2019, 2020],
"day_of_year": [350, 365, 1]
}
df = pd.DataFrame(d)
df
# Step 1: create a combined column
df["combined"] = df["year"]*1000 + df["day_of_year"]
df
# Step 2: convert to datetime
df["date"] = pd.to_datetime(df["combined"], format = "%Y%j")
df
Convert from UTC to another timezone
s = pd.Series(range(1552194000, 1552212001, 3600))
s = pd.to_datetime(s, unit = "s")
# set timezome to current time zone (UTC)
s = s.dt.tz_localize("UTC")
# set timezome to another time zone (Chicago)
s = s.dt.tz_convert("America/Chicago")
Related publications:
Thank you for reading. Until next Wednesday!