Data Analysis Journal

Share this post

Python Pandas DateTime Reference Guide - Issue 97

dataanalysis.substack.com
Python

Python Pandas DateTime Reference Guide - Issue 97

A short guide to solutions and tips for handling DateTime cases in Python Pandas

Olga Berezovsky
Jun 8, 2022
5
Share this post

Python Pandas DateTime Reference Guide - Issue 97

dataanalysis.substack.com

Datetime transformations are common and time-consuming regardless of whether it’s SQL, Python, R, or English. In my analysis, I often rush to jump right into the most important questions that I want to answer, or head straight to the plots (oh, the plots!). However, I end up getting stuck on the most basic things, like dates. It can obviously be very frustrating. 

That’s why I created this reference guide. It will save you time when working with the code. 

I’ve created over a dozen of them for every language or any type of case. In my journal, I’ve published only a few so far but will be publishing more. Today, it’s all about DateTime transformations in Python Pandas - a compilation of tips, solutions, and workarounds for any possible case with DateTime formatting.   

If you work with SQL, make sure to check out SQL Date-Time Guide for a list of the most common DATE extractions solutions. 

For Python Pandas, I’ve compiled many (if not all) known variations of DateTimes solutions for any case, which you can see below. If I missed something common (I am sure I did), please let me know and I’ll add it. The tips and solutions below are taken throughout the web from multiple sources, including 100 pandas tricks created by Kevin Markham and just-pandas-things. You can copy the code below or fork and run it from this Kaggle notebook.

df = generate_sample_data_datetime().reset_index()

df = df.sample(500)

df["Year"] = df["index"].dt.year

df["Month"] = df["index"].dt.month

df["Day"] = df["index"].dt.day

df["Hour"] = df["index"].dt.hour

df["Minute"] = df["index"].dt.minute

df["Second"] = df["index"].dt.second

df["Nanosecond"] = df["index"].dt.nanosecond

df["Date"] = df["index"].dt.date

df["Time"] = df["index"].dt.time

df["Time_Time_Zone"] = df["index"].dt.timetz

df["Day_Of_Year"] = df["index"].dt.dayofyear

df["Week_Of_Year"] = df["index"].dt.weekofyear

df["Week"] = df["index"].dt.week

df["Day_Of_week"] = df["index"].dt.dayofweek

df["Week_Day"] = df["index"].dt.weekday

df["Week_Day_Name"] = df["index"].dt.weekday_name

df["Quarter"] = df["index"].dt.quarter

df["Days_In_Month"] = df["index"].dt.days_in_month

df["Is_Month_Start"] = df["index"].dt.is_month_start

df["Is_Month_End"] = df["index"].dt.is_month_end

df["Is_Quarter_Start"] = df["index"].dt.is_quarter_start

df["Is_Quarter_End"] = df["index"].dt.is_quarter_end

df["Is_Leap_Year"] = df["index"].dt.is_leap_year

Convert year and day of the year into a single DateTime column

d = {\
"year": [2019, 2019, 2020],
"day_of_year": [350, 365, 1]
}

df = pd.DataFrame(d)
df

# Step 1: create a combined column

df["combined"] = df["year"]*1000 + df["day_of_year"]
df

# Step 2: convert to datetime

df["date"] = pd.to_datetime(df["combined"], format = "%Y%j")
df

Convert from UTC to another timezone

s = pd.Series(range(1552194000, 1552212001, 3600))

s = pd.to_datetime(s, unit = "s")

# set timezome to current time zone (UTC)

s = s.dt.tz_localize("UTC")

# set timezome to another time zone (Chicago)

s = s.dt.tz_convert("America/Chicago")

Related publications: 

  • A Selection Of Python Tutorials for Analysts

  • How To Install And Set Up Python

  • Correlation Analysis 101 in Python

  • SQL vs Python For Data Cleaning

  • How To Pass A Take-Home Python Assignment

  • Python Questions For A Data Analyst Interview

Thank you for reading. Until next Wednesday!

Share this post

Python Pandas DateTime Reference Guide - Issue 97

dataanalysis.substack.com
Previous
Next
Comments
TopNewCommunity

No posts

Ready for more?

© 2023 Olga Berezovsky
Privacy ∙ Terms ∙ Collection notice
Start WritingGet the app
Substack is the home for great writing