<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Data Analysis Journal: Python]]></title><description><![CDATA[Interview questions, examples, guides]]></description><link>https://dataanalysis.substack.com/s/python</link><image><url>https://substackcdn.com/image/fetch/$s_!WdsI!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd7029b3-f274-4215-ac43-d275f496ecf8_200x200.png</url><title>Data Analysis Journal: Python</title><link>https://dataanalysis.substack.com/s/python</link></image><generator>Substack</generator><lastBuildDate>Wed, 22 Apr 2026 06:57:59 GMT</lastBuildDate><atom:link href="https://dataanalysis.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Olga Berezovsky]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[dataanalysis@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[dataanalysis@substack.com]]></itunes:email><itunes:name><![CDATA[Olga Berezovsky]]></itunes:name></itunes:owner><itunes:author><![CDATA[Olga Berezovsky]]></itunes:author><googleplay:owner><![CDATA[dataanalysis@substack.com]]></googleplay:owner><googleplay:email><![CDATA[dataanalysis@substack.com]]></googleplay:email><googleplay:author><![CDATA[Olga Berezovsky]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Data Analysis in Python: A Guide to Working With Dates in Pandas - Issue 211]]></title><description><![CDATA[A quick datetime reference guide for handling and parsing dates in Python 
Pandas.]]></description><link>https://dataanalysis.substack.com/p/data-analysis-in-python-a-guide-to</link><guid isPermaLink="false">https://dataanalysis.substack.com/p/data-analysis-in-python-a-guide-to</guid><dc:creator><![CDATA[Olga Berezovsky]]></dc:creator><pubDate>Wed, 03 Jul 2024 12:00:51 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/42c35d97-2e88-4658-909d-3acdf24d1990_468x422.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Welcome to the <a href="https://dataanalysis.substack.com/">Data Analysis Journal</a>, a weekly newsletter about data science and analytics.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://dataanalysis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://dataanalysis.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p><em>Date</em> transformations are annoying and time-consuming, regardless of whether it&#8217;s SQL, Python, R, or English. I often get stuck on the most basic things, like dates. It is very frustrating.</p><p>I want to share this reference guide today, mostly so I can have it close for myself. I still have to use it at least once per week. 
Hopefully, it will save you time when working with UTC-to-datetime conversions, parsing strings to datetime, or handling anything dates-related.&nbsp;&nbsp;&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Aqf7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e885aa7-6094-4d3f-9162-7c77e9c6886a_200x200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Aqf7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e885aa7-6094-4d3f-9162-7c77e9c6886a_200x200.png 424w, https://substackcdn.com/image/fetch/$s_!Aqf7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e885aa7-6094-4d3f-9162-7c77e9c6886a_200x200.png 848w, https://substackcdn.com/image/fetch/$s_!Aqf7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e885aa7-6094-4d3f-9162-7c77e9c6886a_200x200.png 1272w, https://substackcdn.com/image/fetch/$s_!Aqf7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e885aa7-6094-4d3f-9162-7c77e9c6886a_200x200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Aqf7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e885aa7-6094-4d3f-9162-7c77e9c6886a_200x200.png" width="170" height="170" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2e885aa7-6094-4d3f-9162-7c77e9c6886a_200x200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:200,&quot;width&quot;:200,&quot;resizeWidth&quot;:170,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Aqf7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e885aa7-6094-4d3f-9162-7c77e9c6886a_200x200.png 424w, https://substackcdn.com/image/fetch/$s_!Aqf7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e885aa7-6094-4d3f-9162-7c77e9c6886a_200x200.png 848w, https://substackcdn.com/image/fetch/$s_!Aqf7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e885aa7-6094-4d3f-9162-7c77e9c6886a_200x200.png 1272w, https://substackcdn.com/image/fetch/$s_!Aqf7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e885aa7-6094-4d3f-9162-7c77e9c6886a_200x200.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>For Python Pandas, I&#8217;ve compiled many (if not all) known variations of <em>datetimes</em> solutions for any case, which you can see below. If I missed something common (I am sure I did), please let me know, and I&#8217;ll add it.&nbsp;</p>
      <p>
          <a href="https://dataanalysis.substack.com/p/data-analysis-in-python-a-guide-to">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[SQL And Python Mistakes To Avoid - Issue 155]]></title><description><![CDATA[Don&#8217;t repeat my mistakes. Learning from common errors and how to avoid them]]></description><link>https://dataanalysis.substack.com/p/sql-and-python-mistakes-to-avoid</link><guid isPermaLink="false">https://dataanalysis.substack.com/p/sql-and-python-mistakes-to-avoid</guid><dc:creator><![CDATA[Olga Berezovsky]]></dc:creator><pubDate>Wed, 02 Aug 2023 12:01:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5493181-bc4d-4e0a-809c-6c6ff6beeb80_480x366.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The painful part of growing up is accepting your mistakes, errors, and missteps. We all make them, learn from them, embrace the shame, and move forward. That&#8217;s how you know you are growing - with every step you take, you can remember 1 or 2 mistakes you made to get here.&nbsp;</p><p>If you are in a leadership position, it&#8217;s your responsibility to ensure that you will be able to contain, own, and ideally prevent whatever errors your team makes.</p><p>Today I&#8217;ll grudgingly share some painful lessons I learned from my own errors made while working with SQL and Python to extract, read, and analyze data. These mistakes are so simple and basic, which makes them that much harder to accept. I hope they will remind us of the importance of triple-checking, code reviews, and cross-team validation.&nbsp;</p>
      <p>
          <a href="https://dataanalysis.substack.com/p/sql-and-python-mistakes-to-avoid">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Python For Data Science: The Difference Between Merge, Join, And Concat - Issue 124]]></title><description><![CDATA[Ways to join and merge datasets in Python Pandas. How to know which method to pick for which use case.]]></description><link>https://dataanalysis.substack.com/p/python-for-data-science-the-difference</link><guid isPermaLink="false">https://dataanalysis.substack.com/p/python-for-data-science-the-difference</guid><dc:creator><![CDATA[Olga Berezovsky]]></dc:creator><pubDate>Wed, 21 Dec 2022 13:01:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8081d17c-53d5-4cbe-9428-785b75d3da75_811x571.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><p>Throughout the year, I covered most steps of data work in Python including its packages, data ingestion, cleaning, EDA, graphs, and more. One important piece of workflow in data science and analytics that I haven&#8217;t touched yet is data processing. Specifically: <strong>merging multiple datasets in Python Pandas</strong>.&nbsp;</p><p>This step was somehow the hardest for me to figure out, and I had my share of mistakes made using the wrong approach that delayed my analysis (or even worse, brought me to the wrong output leading to getting the wrong data).&nbsp;</p><p>This step is the most tricky one because it&#8217;s done early in the process and sets the baseline for your analysis. What this means is if you initially merged the datasets wrong, every next step will get you further from the truth. 
In this issue, I&#8217;ll walk you through the methods of merging multiple datasets into one and describe the difference between MERGE(), JOIN(), and CONCAT() in Python, and give you pointers to follow to figure out which approach you should use for which case.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VIc8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F725d7e2a-b3e3-468f-b4b4-db25f1df0b30_200x200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VIc8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F725d7e2a-b3e3-468f-b4b4-db25f1df0b30_200x200.png 424w, https://substackcdn.com/image/fetch/$s_!VIc8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F725d7e2a-b3e3-468f-b4b4-db25f1df0b30_200x200.png 848w, https://substackcdn.com/image/fetch/$s_!VIc8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F725d7e2a-b3e3-468f-b4b4-db25f1df0b30_200x200.png 1272w, https://substackcdn.com/image/fetch/$s_!VIc8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F725d7e2a-b3e3-468f-b4b4-db25f1df0b30_200x200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VIc8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F725d7e2a-b3e3-468f-b4b4-db25f1df0b30_200x200.png" 
width="200" height="200" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/725d7e2a-b3e3-468f-b4b4-db25f1df0b30_200x200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:200,&quot;width&quot;:200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VIc8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F725d7e2a-b3e3-468f-b4b4-db25f1df0b30_200x200.png 424w, https://substackcdn.com/image/fetch/$s_!VIc8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F725d7e2a-b3e3-468f-b4b4-db25f1df0b30_200x200.png 848w, https://substackcdn.com/image/fetch/$s_!VIc8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F725d7e2a-b3e3-468f-b4b4-db25f1df0b30_200x200.png 1272w, https://substackcdn.com/image/fetch/$s_!VIc8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F725d7e2a-b3e3-468f-b4b4-db25f1df0b30_200x200.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>Python Pandas is particularly great for any use cases in data analysis. 
I can see how it can be daunting to search for documentation to figure out what is the right or best way to perform a particular task, especially when you don&#8217;t know what you&#8217;re searching for. While I encourage you to start with reading <a href="https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html">documentation</a>, I also hope to point you to the right method and save you some time.&nbsp;</p>
      <p>
          <a href="https://dataanalysis.substack.com/p/python-for-data-science-the-difference">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How To Pass A Technical Assignment For Senior Data Scientist Position - Issue 118]]></title><description><![CDATA[A case study challenge breakdown for a senior data scientist position at one of the biggest Bay Area companies completed in Python]]></description><link>https://dataanalysis.substack.com/p/how-to-pass-a-technical-assignment</link><guid isPermaLink="false">https://dataanalysis.substack.com/p/how-to-pass-a-technical-assignment</guid><dc:creator><![CDATA[Olga Berezovsky]]></dc:creator><pubDate>Wed, 09 Nov 2022 15:01:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2b648aa-bc04-4117-8109-5c6eaca51ed4_1600x421.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>San Francisco has been trudging drearily through a few weeks of layoffs. For those of you who might be going through rounds of interviews soon or even now, I wanted to share a good example of a technical assignment completed in Python for a Senior Data Scientist position at one of the biggest Bay Area companies.&nbsp;</p><p>The assignment and my solution contain ex&#8230;</p>
      <p>
          <a href="https://dataanalysis.substack.com/p/how-to-pass-a-technical-assignment">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Getting Started With Python Plots - Issue 115]]></title><description><![CDATA[An introduction to creating visualizations in Python and a consolidated reference guide for building plots in Python Pandas]]></description><link>https://dataanalysis.substack.com/p/getting-started-with-python-plots</link><guid isPermaLink="false">https://dataanalysis.substack.com/p/getting-started-with-python-plots</guid><dc:creator><![CDATA[Olga Berezovsky]]></dc:creator><pubDate>Wed, 19 Oct 2022 14:01:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F07caad75-96c7-43fe-8973-afbcf56e5e20_952x888.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><p>Plots, plots, plots! Hate them or like them (I know the expression is usually &#8220;love&#8221;, but let&#8217;s not get ahead of ourselves), you can&#8217;t deliver a thing in analytics without them.&nbsp;</p><p>This publication is a follow-up piece to my recent <a href="https://dataanalysis.substack.com/p/how-to-pick-the-right-chart-issue">How To Pick The Right Chart</a> article where I covered the most common charts and offered guides on how to appropriately choose the right visualization for your analysis.</p><p>I am also publishing this reference guide mostly for myself, in order to have a consolidated list of the most useful plot types and plots tutorials. 
There is a lot of documentation on Python plots (check some at the end of this newsletter) but this is meant to serve as a sweet one-pager that you can quickly refer to while picking a plot for your analysis.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Re1Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0f85406-cbf5-4cb6-abd9-d9c23ca049ac_200x200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Re1Q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0f85406-cbf5-4cb6-abd9-d9c23ca049ac_200x200.png 424w, https://substackcdn.com/image/fetch/$s_!Re1Q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0f85406-cbf5-4cb6-abd9-d9c23ca049ac_200x200.png 848w, https://substackcdn.com/image/fetch/$s_!Re1Q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0f85406-cbf5-4cb6-abd9-d9c23ca049ac_200x200.png 1272w, https://substackcdn.com/image/fetch/$s_!Re1Q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0f85406-cbf5-4cb6-abd9-d9c23ca049ac_200x200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Re1Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0f85406-cbf5-4cb6-abd9-d9c23ca049ac_200x200.png" width="200" height="200" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/c0f85406-cbf5-4cb6-abd9-d9c23ca049ac_200x200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:200,&quot;width&quot;:200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Re1Q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0f85406-cbf5-4cb6-abd9-d9c23ca049ac_200x200.png 424w, https://substackcdn.com/image/fetch/$s_!Re1Q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0f85406-cbf5-4cb6-abd9-d9c23ca049ac_200x200.png 848w, https://substackcdn.com/image/fetch/$s_!Re1Q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0f85406-cbf5-4cb6-abd9-d9c23ca049ac_200x200.png 1272w, https://substackcdn.com/image/fetch/$s_!Re1Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0f85406-cbf5-4cb6-abd9-d9c23ca049ac_200x200.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div>
      <p>
          <a href="https://dataanalysis.substack.com/p/getting-started-with-python-plots">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Python Pandas DateTime Reference Guide - Issue 97]]></title><description><![CDATA[A short guide to solutions and tips for handling DateTime cases in Python Pandas]]></description><link>https://dataanalysis.substack.com/p/python-pandas-datetime-reference</link><guid isPermaLink="false">https://dataanalysis.substack.com/p/python-pandas-datetime-reference</guid><dc:creator><![CDATA[Olga Berezovsky]]></dc:creator><pubDate>Wed, 08 Jun 2022 16:30:20 GMT</pubDate><enclosure url="https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/efda1c71-0d58-49c8-8e83-7d0cce8647ee_225x225.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Datetime transformations are common and time-consuming regardless of whether it&#8217;s SQL, Python, R, or English. In my analysis, I often rush to jump right into the most important questions that I want to answer, or head straight to the plots (oh, the plots!). However, I end up getting stuck on the most basic things, like dates. It can obviously be very frustrating.&nbsp;</p><p>That&#8217;s why I created this reference guide. It will save you time when working with the code.&nbsp; </p><p>I&#8217;ve created over a dozen of them for every language or any type of case. In my journal, I&#8217;ve published only a few so far but will be publishing more. 
Today, it&#8217;s all about DateTime transformations in Python Pandas - a compilation of tips, solutions, and workarounds for any possible case with DateTime formatting.&nbsp;&nbsp;&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qdZd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b67ed1c-6b0a-4510-9a7b-94a311647b13_200x200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qdZd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b67ed1c-6b0a-4510-9a7b-94a311647b13_200x200.png 424w, https://substackcdn.com/image/fetch/$s_!qdZd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b67ed1c-6b0a-4510-9a7b-94a311647b13_200x200.png 848w, https://substackcdn.com/image/fetch/$s_!qdZd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b67ed1c-6b0a-4510-9a7b-94a311647b13_200x200.png 1272w, https://substackcdn.com/image/fetch/$s_!qdZd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b67ed1c-6b0a-4510-9a7b-94a311647b13_200x200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qdZd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b67ed1c-6b0a-4510-9a7b-94a311647b13_200x200.png" width="190" height="190" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/5b67ed1c-6b0a-4510-9a7b-94a311647b13_200x200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:200,&quot;width&quot;:200,&quot;resizeWidth&quot;:190,&quot;bytes&quot;:2197,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qdZd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b67ed1c-6b0a-4510-9a7b-94a311647b13_200x200.png 424w, https://substackcdn.com/image/fetch/$s_!qdZd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b67ed1c-6b0a-4510-9a7b-94a311647b13_200x200.png 848w, https://substackcdn.com/image/fetch/$s_!qdZd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b67ed1c-6b0a-4510-9a7b-94a311647b13_200x200.png 1272w, https://substackcdn.com/image/fetch/$s_!qdZd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b67ed1c-6b0a-4510-9a7b-94a311647b13_200x200.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://dataanalysis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" 
data-component-name="ButtonCreateButton"><a class="button primary" href="https://dataanalysis.substack.com/subscribe?"><span>Subscribe now</span></a></p><p>If you work with SQL, make sure to check out <a href="https://dataanalysis.substack.com/p/sql-date-time-checklist?s=w">SQL Date-Time Guide</a> for a list of the most common DATE extractions solutions.&nbsp;</p><p>For Python Pandas, I&#8217;ve compiled many (if not <em>all</em>) known variations of DateTimes solutions for any case, which you can see below. If I missed something common (I am sure I did), please let me know and I&#8217;ll add it. The tips and solutions below are taken throughout the web from multiple sources, including <a href="https://www.dataschool.io/python-pandas-tips-and-tricks/">100 pandas tricks</a> created by <a href="https://www.linkedin.com/in/justmarkham/">Kevin Markham</a> and <a href="https://github.com/chiphuyen/just-pandas-things">just-pandas-things</a>. You can copy the code below or fork and run it from this <a href="https://www.kaggle.com/code/olgaberezovsky/pandas-100-tricks">Kaggle notebook</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pnlt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd82f9d75-1300-4844-b45b-bf020cb4f782_225x225.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pnlt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd82f9d75-1300-4844-b45b-bf020cb4f782_225x225.png 424w, 
https://substackcdn.com/image/fetch/$s_!Pnlt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd82f9d75-1300-4844-b45b-bf020cb4f782_225x225.png 848w, https://substackcdn.com/image/fetch/$s_!Pnlt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd82f9d75-1300-4844-b45b-bf020cb4f782_225x225.png 1272w, https://substackcdn.com/image/fetch/$s_!Pnlt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd82f9d75-1300-4844-b45b-bf020cb4f782_225x225.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pnlt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd82f9d75-1300-4844-b45b-bf020cb4f782_225x225.png" width="197" height="197" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/d82f9d75-1300-4844-b45b-bf020cb4f782_225x225.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:225,&quot;width&quot;:225,&quot;resizeWidth&quot;:197,&quot;bytes&quot;:2956,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Pnlt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd82f9d75-1300-4844-b45b-bf020cb4f782_225x225.png 424w, 
https://substackcdn.com/image/fetch/$s_!Pnlt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd82f9d75-1300-4844-b45b-bf020cb4f782_225x225.png 848w, https://substackcdn.com/image/fetch/$s_!Pnlt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd82f9d75-1300-4844-b45b-bf020cb4f782_225x225.png 1272w, https://substackcdn.com/image/fetch/$s_!Pnlt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd82f9d75-1300-4844-b45b-bf020cb4f782_225x225.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p><code>df = generate_sample_data_datetime().reset_index()</code></p><p><code>df = df.sample(500)</code></p><p><code>df["Year"] = df["index"].dt.year</code></p><p><code>df["Month"] = df["index"].dt.month</code></p><p><code>df["Day"] = df["index"].dt.day</code></p><p><code>df["Hour"] = df["index"].dt.hour</code></p><p><code>df["Minute"] = df["index"].dt.minute</code></p><p><code>df["Second"] = df["index"].dt.second</code></p><p><code>df["Nanosecond"] = df["index"].dt.nanosecond</code></p><p><code>df["Date"] = df["index"].dt.date</code></p><p><code>df["Time"] = df["index"].dt.time</code></p><p><code>df["Time_Time_Zone"] = df["index"].dt.timetz</code></p><p><code>df["Day_Of_Year"] = df["index"].dt.dayofyear</code></p><p><code>df["Week_Of_Year"] = df["index"].dt.isocalendar().week</code></p><p><code>df["Week"] = df["index"].dt.isocalendar().week</code></p><p><code>df["Day_Of_week"] = df["index"].dt.dayofweek</code></p><p><code>df["Week_Day"] = df["index"].dt.weekday</code></p><p><code>df["Week_Day_Name"] = df["index"].dt.day_name()</code></p><p><code>df["Quarter"] = df["index"].dt.quarter</code></p><p><code>df["Days_In_Month"] = 
df["index"].dt.days_in_month</code></p><p><code>df["Is_Month_Start"] = df["index"].dt.is_month_start</code></p><p><code>df["Is_Month_End"] = df["index"].dt.is_month_end</code></p><p><code>df["Is_Quarter_Start"] = df["index"].dt.is_quarter_start</code></p><p><code>df["Is_Quarter_End"] = df["index"].dt.is_quarter_end</code></p><p><code>df["Is_Leap_Year"] = df["index"].dt.is_leap_year</code></p><h3>Convert year and day of the year into a single DateTime column</h3><pre><code>d = {
"year": [2019, 2019, 2020],
"day_of_year": [350, 365, 1]
}

df = pd.DataFrame(d)
df

<em># Step 1: create a combined column in YYYYDDD form (multiplying the year by 1000 leaves three digits for the day of the year)</em>

df["combined"] = df["year"] * 1000 + df["day_of_year"]
df

<em># Step 2: convert to datetime</em>

df["date"] = pd.to_datetime(df["combined"], format = "%Y%j")
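
# Optional cross-check (a sketch, not part of the original recipe): the standard
# library parses the same "%Y%j" format, so one combined value can be verified
# without pandas. The name "check" is hypothetical.

from datetime import datetime

check = datetime.strptime("2019350", "%Y%j")  # year 2019, day 350 of that year
print(check.date())  # prints 2019-12-16
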
df</code></pre><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OH-D!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cbe4507-f1ab-4692-b18b-2fa77ad2681c_766x274.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OH-D!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cbe4507-f1ab-4692-b18b-2fa77ad2681c_766x274.png 424w, https://substackcdn.com/image/fetch/$s_!OH-D!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cbe4507-f1ab-4692-b18b-2fa77ad2681c_766x274.png 848w, https://substackcdn.com/image/fetch/$s_!OH-D!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cbe4507-f1ab-4692-b18b-2fa77ad2681c_766x274.png 1272w, https://substackcdn.com/image/fetch/$s_!OH-D!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cbe4507-f1ab-4692-b18b-2fa77ad2681c_766x274.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OH-D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cbe4507-f1ab-4692-b18b-2fa77ad2681c_766x274.png" width="438" height="156.67362924281986" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1cbe4507-f1ab-4692-b18b-2fa77ad2681c_766x274.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:274,&quot;width&quot;:766,&quot;resizeWidth&quot;:438,&quot;bytes&quot;:36988,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OH-D!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cbe4507-f1ab-4692-b18b-2fa77ad2681c_766x274.png 424w, https://substackcdn.com/image/fetch/$s_!OH-D!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cbe4507-f1ab-4692-b18b-2fa77ad2681c_766x274.png 848w, https://substackcdn.com/image/fetch/$s_!OH-D!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cbe4507-f1ab-4692-b18b-2fa77ad2681c_766x274.png 1272w, https://substackcdn.com/image/fetch/$s_!OH-D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cbe4507-f1ab-4692-b18b-2fa77ad2681c_766x274.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h3>Convert from UTC to another timezone</h3><pre><code>s = pd.Series(range(1552194000, 1552212001, 3600))

s = pd.to_datetime(s, unit = "s")

<em># set the timezone to the current time zone (UTC)</em>

s = s.dt.tz_localize("UTC")
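
# Aside (a sketch with the standard library only; all names here are hypothetical):
# "localize" attaches a zone to a naive timestamp without moving the clock reading,
# while "convert" keeps the instant and changes the clock reading.

from datetime import datetime, timedelta, timezone

naive = datetime(2019, 3, 10, 5, 0)          # first value of s, no zone attached
as_utc = naive.replace(tzinfo=timezone.utc)  # "localize": still reads 05:00
cst = timezone(timedelta(hours=-6), "CST")   # fixed UTC-6 offset, assumes no DST rules
as_cst = as_utc.astimezone(cst)              # "convert": same instant, reads 23:00 on March 9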

<em># convert to another time zone (Chicago)</em>

s = s.dt.tz_convert("America/Chicago")</code></pre><div><hr></div><h3><strong>Related publications:&nbsp;</strong></h3><ul><li><p><a href="https://dataanalysis.substack.com/p/a-selection-of-python-tutorials-for">A Selection Of Python Tutorials for Analysts</a></p></li><li><p><a href="https://dataanalysis.substack.com/p/how-to-install-and-set-up-python">How To Install And Set Up Python</a></p></li><li><p><a href="https://dataanalysis.substack.com/p/correlation-analysis-101-in-python">Correlation Analysis 101 in Python</a></p></li><li><p><a href="https://dataanalysis.substack.com/p/sql-vs-python-for-data-cleaning-issue">SQL vs Python For Data Cleaning</a></p></li><li><p><a href="https://dataanalysis.substack.com/p/how-to-pass-a-take-home-assignment">How To Pass A Take-Home Python Assignment</a></p></li><li><p><a href="https://dataanalysis.substack.com/p/python-questions-for-a-data-analyst">Python Questions For A Data Analyst Interview</a></p></li></ul><p>Thank you for reading. Until next Wednesday!</p>]]></content:encoded></item><item><title><![CDATA[SQL vs Python For Data Cleaning - Issue 84]]></title><description><![CDATA[What is better for data cleaning - SQL or Python? 
How to use them for data transformation, analysis, and refactoring.]]></description><link>https://dataanalysis.substack.com/p/sql-vs-python-for-data-cleaning-issue</link><guid isPermaLink="false">https://dataanalysis.substack.com/p/sql-vs-python-for-data-cleaning-issue</guid><dc:creator><![CDATA[Olga Berezovsky]]></dc:creator><pubDate>Wed, 02 Mar 2022 17:30:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!rkaj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4cbe0286-e8f5-4b71-8c65-a5377016f315_523x474.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The sad truth is Data Scientists and Analysts spend most of their time doing data cleaning, according to one of the <a href="https://www.anaconda.com/state-of-data-science-2020">Anaconda surveys</a>. It&#8217;s common to use Python for data transformation, but over the last few years, SQL has become a popular method proving to be more cost and time optimized. In this issue, I&#8217;ll walk through the basics of common data cleani&#8230;</p>
      <p>
          <a href="https://dataanalysis.substack.com/p/sql-vs-python-for-data-cleaning-issue">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[A Selection Of Python Tutorials for Analysts - Issue 61]]></title><description><![CDATA[A roundup of my favorite tutorials for data analysts to either get started with or upkeep existing Python skills]]></description><link>https://dataanalysis.substack.com/p/a-selection-of-python-tutorials-for</link><guid isPermaLink="false">https://dataanalysis.substack.com/p/a-selection-of-python-tutorials-for</guid><dc:creator><![CDATA[Olga Berezovsky]]></dc:creator><pubDate>Wed, 15 Sep 2021 16:30:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0bsd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2d8d1423-db15-419d-82d0-f910ecdf62fa_1920x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello analysts! As promised earlier, today&#8217;s newsletter is all about Python tutorials. I am sure every one of you has your own list of handy videos, courses, and go-to pages for Python solutions and tips. 
In this issue, I am going to save you countless hours of research and share my favorite list of Python tutorials for data processing, transformations, cleaning, analysis, and visualizations.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h1u5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec16f63f-adaf-4897-a451-92c2a48953d9_200x200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h1u5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec16f63f-adaf-4897-a451-92c2a48953d9_200x200.png 424w, https://substackcdn.com/image/fetch/$s_!h1u5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec16f63f-adaf-4897-a451-92c2a48953d9_200x200.png 848w, https://substackcdn.com/image/fetch/$s_!h1u5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec16f63f-adaf-4897-a451-92c2a48953d9_200x200.png 1272w, https://substackcdn.com/image/fetch/$s_!h1u5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec16f63f-adaf-4897-a451-92c2a48953d9_200x200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!h1u5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec16f63f-adaf-4897-a451-92c2a48953d9_200x200.png" width="200" height="200" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/ec16f63f-adaf-4897-a451-92c2a48953d9_200x200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:200,&quot;width&quot;:200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2197,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!h1u5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec16f63f-adaf-4897-a451-92c2a48953d9_200x200.png 424w, https://substackcdn.com/image/fetch/$s_!h1u5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec16f63f-adaf-4897-a451-92c2a48953d9_200x200.png 848w, https://substackcdn.com/image/fetch/$s_!h1u5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec16f63f-adaf-4897-a451-92c2a48953d9_200x200.png 1272w, https://substackcdn.com/image/fetch/$s_!h1u5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec16f63f-adaf-4897-a451-92c2a48953d9_200x200.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://dataanalysis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" 
data-component-name="ButtonCreateButton"><a class="button primary" href="https://dataanalysis.substack.com/subscribe?"><span>Subscribe now</span></a></p><p>A little backstory: I started my analytical journey with R (like any gray-haired medieval analyst today). After all that time, I still think the R language is the best suited for statistics and analysis. I had to switch because every team I was a part of was using Python, and many projects we collaborated on were also done in Python.</p><p>My transition from R to Python was &#8230; long. I think I went through every Python class out there. After too many tutorials, I started making a list of my favorite go-to videos and lessons, which I still use and keep updating to this day. Most of those are (or were) free, but things change, so some of the classes below may no longer be free.</p><p>You have probably noticed already that I&#8217;m a fan of <a href="https://realpython.com/">Real Python</a> tutorials. I think they have a good combination of theory and examples, covering everything from basics to advanced development. 
So many of my favorite sources come from there.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0bsd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2d8d1423-db15-419d-82d0-f910ecdf62fa_1920x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0bsd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2d8d1423-db15-419d-82d0-f910ecdf62fa_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!0bsd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2d8d1423-db15-419d-82d0-f910ecdf62fa_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!0bsd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2d8d1423-db15-419d-82d0-f910ecdf62fa_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!0bsd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2d8d1423-db15-419d-82d0-f910ecdf62fa_1920x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0bsd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2d8d1423-db15-419d-82d0-f910ecdf62fa_1920x1080.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/2d8d1423-db15-419d-82d0-f910ecdf62fa_1920x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0bsd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2d8d1423-db15-419d-82d0-f910ecdf62fa_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!0bsd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2d8d1423-db15-419d-82d0-f910ecdf62fa_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!0bsd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2d8d1423-db15-419d-82d0-f910ecdf62fa_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!0bsd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2d8d1423-db15-419d-82d0-f910ecdf62fa_1920x1080.png 1456w" sizes="100vw"></picture>
<div></div></div></a></figure></div><h3><strong>Installation</strong></h3><ul><li><p><a href="https://realpython.com/courses/setting-up-python/">Python Basics: Setting Up Python</a></p></li><li><p><a href="https://realpython.com/courses/installing-python-windows-macos-linux/">Installing Python on Windows, macOS, and Linux</a></p></li><li><p><a href="https://www.youtube.com/watch?v=xFciV6Ew5r4&amp;list=PL-osiE80TeTt66h8cVpmbayBKlMTuS55y">Setting up a Python Development Environment in Sublime Text</a> (if anyone is using Sublime)</p></li><li><p><a href="https://www.youtube.com/watch?v=YJC6ldI3hWk&amp;list=PL-osiE80TeTt66h8cVpmbayBKlMTuS55y&amp;index=4">Python Tutorial: Anaconda - Installation</a></p></li></ul><h3><strong>Basics</strong></h3><ul><li><p><a href="https://365datascience.com/tutorials/python-tutorials/import-data-python/">How To Import Data Into Python?</a></p></li><li><p><a 
href="https://realpython.com/courses/reading-and-writing-csv-files/">Reading and Writing CSV Files</a></p></li><li><p><a href="https://realpython.com/courses/reading-writing-files-pandas/">Reading and Writing Files With Pandas</a></p></li><li><p><a href="https://realpython.com/courses/using-jupyter-notebooks/">Using Jupyter Notebooks</a></p></li><li><p><a href="https://realpython.com/courses/pandas-dataframe-working-with-data/">The Pandas DataFrame: Working With Data Efficiently</a></p></li><li><p><a href="https://www.youtube.com/watch?v=QUClKFFn1Vk&amp;list=PLZoTAELRMXVNUL99R4bDlVYsncUNvwUBB&amp;index=7">Pandas, Data Frame, and Data Series</a> - really like this channel. Data frame and series explanations are very good.</p></li><li><p><a href="https://realpython.com/courses/pandas-dataframes-101/">Pandas DataFrames 101</a></p></li></ul><h3><strong>Data Types</strong></h3><ul><li><p><a href="https://realpython.com/courses/python-data-types/">Basic Data Types in Python</a></p></li><li><p><a href="https://devhints.io/python">Python cheatsheet - basics</a></p></li><li><p><a href="https://www.youtube.com/watch?v=pkYVOmU3MgA">Data Structures and Algorithms in Python - Full Course for Beginners</a></p></li><li><p><a href="https://realpython.com/courses/variables-python/">Variables in Python</a></p></li><li><p><a href="https://realpython.com/courses/lists-tuples-python/">Lists and Tuples in Python</a></p></li><li><p><a href="https://www.programiz.com/python-programming/tuple">Python Tuple</a></p></li><li><p><a href="https://www.youtube.com/watch?v=sLRZ17XvoxY&amp;list=PLtb2Lf-cJ_AW6aS1VqeA6Sk0ov6yZKCKP">Python Lists - Basics</a></p></li><li><p><a href="https://realpython.com/courses/dicts-arrays-ideal-data-structure/">Dictionaries and Arrays: Selecting the Ideal Data Structure</a></p></li><li><p><a href="https://www.w3schools.com/python/python_dictionaries.asp">Python Dictionaries</a></p></li><li><p><a 
href="https://realpython.com/courses/defining-and-calling-functions/">Defining and Calling Python Functions</a></p></li></ul><h3><strong>Data cleaning and EDA</strong></h3><ul><li><p><a href="https://365datascience.com/courses/data-preprocessing-numpy/">Data Preprocessing with NumPy</a></p></li><li><p><a href="https://realpython.com/python-data-cleaning-numpy-pandas/">Pythonic Data Cleaning With Pandas and NumPy</a></p></li><li><p><a href="https://realpython.com/pandas-sort-python/">Pandas Sort: Your Guide to Sorting Data in Python</a></p></li><li><p><a href="https://realpython.com/pandas-groupby/">Pandas GroupBy: Your Guide to Grouping Data in Python</a></p></li><li><p><a href="https://www.youtube.com/watch?v=txMdrV1Ut64&amp;list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS&amp;index=8">Grouping and Aggregating - Analyzing and Exploring Your Data</a></p></li><li><p><a href="https://www.youtube.com/watch?v=Lw2rlcxScZY&amp;list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS&amp;index=4">Filtering - Using Conditionals to Filter Rows and Columns</a></p></li><li><p><a href="https://bsdmag.org/python_data/">Cheat Sheet for Exploratory Data Analysis in Python</a> - infographic by Analytics Vidhya</p></li><li><p><a href="https://realpython.com/courses/explore-dataset-with-pandas/">Explore Your Dataset With Pandas</a></p></li><li><p><a href="https://www.youtube.com/watch?v=uXGiQhRV8II&amp;list=PLtb2Lf-cJ_AU_EK5iWVLOgxaR6xzjeZqd&amp;index=13">Data Analysis with Python</a> (EDA)</p></li><li><p><a href="https://www.youtube.com/watch?v=KdmPHEnPJPs&amp;list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS&amp;index=9">Cleaning Data - Casting Datatypes and Handling Missing Values</a></p></li><li><p><a href="https://www.youtube.com/watch?v=UFuo7EHI8zc&amp;list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS&amp;index=10">Working with Dates and Time Series Data</a></p></li><li><p><a href="https://realpython.com/courses/generating-random-data-python/">Generating Random Data in Python</a> (not cleaning or EDA, but helpful for 
testing)</p></li></ul><h3><strong>Analysis and some simple ML</strong></h3><ul><li><p><a href="https://realpython.com/python-statistics/">Python Statistics Fundamentals: How to Describe Your Data</a></p></li><li><p><a href="https://www.youtube.com/watch?v=nLw1RNvfElg&amp;list=PLQVvvaa0QuDfSfqQuee6K8opKtZsh7sA9">Introduction - Data Analysis and Data Science with Python and Pandas</a></p></li><li><p><a href="https://realpython.com/numpy-scipy-pandas-correlation-python/">NumPy, SciPy, and Pandas: Correlation With Python</a></p></li><li><p><a href="https://www.youtube.com/watch?v=UsglokDLa2o&amp;list=PLZoTAELRMXVNUL99R4bDlVYsncUNvwUBB&amp;index=11">Seaborn Tutorial</a></p></li><li><p><a href="https://realpython.com/linear-regression-in-python/">Linear Regression in Python</a></p></li><li><p><a href="https://www.youtube.com/watch?v=45ryDIPHdGg&amp;list=PLzMcBGfZo4-mP7qA9cagf68V06sko5otr&amp;index=2">Linear Regression</a></p></li><li><p><a href="https://realpython.com/logistic-regression-python/">Logistic Regression in Python</a></p></li><li><p><a href="https://realpython.com/courses/splitting-datasets-scikit-learn-train-test-split/">Splitting Datasets - train_test_split()</a></p></li><li><p><a href="https://www.youtube.com/watch?v=X2vAabgKiuM&amp;list=PLWKjhJtqVAbnqBxcdjVGgT3uVR10bzTEB&amp;index=16">Intro into NLP</a></p></li><li><p><a href="https://www.youtube.com/watch?v=OGxgnH8y2NM&amp;list=PLQVvvaa0QuDfKTOs3Keq_kaG2P55YRn5v">Machine Learning Tutorial with Python Intro</a></p></li><li><p><a href="https://www.youtube.com/watch?v=AbVtcUBlBok&amp;list=PLQVvvaa0QuDfKTOs3Keq_kaG2P55YRn5v&amp;index=25">Creating an SVM from scratch</a></p></li></ul><h3><strong>SQL</strong></h3><ul><li><p><a href="https://realpython.com/python-sql-libraries/">Introduction to Python SQL Libraries</a></p></li><li><p><a href="https://www.youtube.com/watch?v=byHcYRpMgI4&amp;list=PLWKjhJtqVAbnqBxcdjVGgT3uVR10bzTEB&amp;index=21">SQLite Databases With Python - Full Course</a></p></li><li><p><a 
href="https://hakibenita.com/sql-for-data-analysis">Practical SQL for Data Analysis</a></p></li></ul><h3><strong>Plots</strong></h3><ul><li><p><a href="https://realpython.com/courses/plot-pandas-data-visualization/">Plot With Pandas: Python Data Visualization Basics</a></p></li><li><p><a href="https://www.blog.pythonlibrary.org/2021/09/07/matplotlib-an-intro-to-creating-graphs-with-python/">Matplotlib &#8211; An Intro to Creating Graphs with Python</a></p></li><li><p><a href="https://www.youtube.com/watch?v=UO98lJQ3QGI&amp;list=PL-osiE80TeTvipOqomVEeZ1HRrcEvtZB_">Creating and Customizing Our First Plots</a></p></li><li><p><a href="https://realpython.com/courses/python-plotting-matplotlib/">Python Plotting With Matplotlib</a></p></li><li><p><a href="https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html">Chart Visualization</a> - pandas documentation</p></li><li><p><a href="https://realpython.com/courses/python-histograms/">Python Histogram Plotting: NumPy, Matplotlib, Pandas &amp; Seaborn</a></p></li><li><p><a href="https://www.youtube.com/watch?v=nKxLfUrkLE8&amp;list=PL-osiE80TeTvipOqomVEeZ1HRrcEvtZB_&amp;index=2">Bar Charts</a></p></li><li><p><a href="https://www.youtube.com/watch?v=xN-Supd4H38&amp;list=PL-osiE80TeTvipOqomVEeZ1HRrcEvtZB_&amp;index=4">Stack Plots</a></p></li><li><p><a href="https://www.youtube.com/watch?v=_LWjaAiKaf8&amp;list=PL-osiE80TeTvipOqomVEeZ1HRrcEvtZB_&amp;index=8">Plotting Time Series Data</a></p></li><li><p><a href="https://www.youtube.com/watch?v=zZZ_RCwp49g&amp;list=PL-osiE80TeTvipOqomVEeZ1HRrcEvtZB_&amp;index=7">Scatter Plots</a></p></li><li><p><a href="https://www.youtube.com/watch?v=XDv6T4a0RNc&amp;list=PL-osiE80TeTvipOqomVEeZ1HRrcEvtZB_&amp;index=6">Histograms</a></p></li></ul><h3><strong>Other</strong></h3><ul><li><p><a href="https://www.youtube.com/watch?v=rfscVS0vtbw">Learn Python - Full Course for Beginners</a> - a full complete course (4 hours) of Python Intro.</p></li><li><p><a 
href="https://www.kaggle.com/timoboz/python-data-science-handbook">Python Data Science Handbook</a> - the entire Python Data Science Handbook, in the form of free Jupyter notebooks.</p></li><li><p><a href="https://allendowney.github.io/AstronomicalData/README.html">Astronomical Data in Python</a> - the code is written in Jupyter notebooks. You can run the notebooks either on Colab or in your own environment (you can download them from the repository and follow the instructions to set up your environment).</p></li><li><p><a href="https://www.youtube.com/c/joshstarmer/playlists">StatQuest with Josh Starmer</a> - less Python and mostly ML.</p></li><li><p><a href="https://github.com/huangsam/ultimate-python">Ultimate Python study guide</a> - all in one: a good resource and guide.</p></li><li><p><a href="https://www.youtube.com/channel/UC4lpdIKfv2Vm3L8MCLPqblg/featured">Python from Nisha M</a> - a good explanation of data types.</p></li></ul><div><hr></div><p>I have another list of SQL and statistics tutorials, which I&#8217;ll share soon as well. I continually update this list to keep it fresh and relevant, so please send me your favorite Python links and sources that you think I should add!</p><p>Thanks for reading, everyone. 
Until next Wednesday!</p>]]></content:encoded></item><item><title><![CDATA[How To Pass A Take-Home Python Assignment - Issue 59]]></title><description><![CDATA[A case study challenge for a senior analyst position at one of the biggest Bay Area companies, completed in Python]]></description><link>https://dataanalysis.substack.com/p/how-to-pass-a-take-home-assignment</link><guid isPermaLink="false">https://dataanalysis.substack.com/p/how-to-pass-a-take-home-assignment</guid><dc:creator><![CDATA[Olga Berezovsky]]></dc:creator><pubDate>Wed, 01 Sep 2021 16:30:41 GMT</pubDate><enclosure url="https://cdn.substack.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3a06f65e-6633-4617-9306-893546a8c3ac_1784x758.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Did you receive a take-home assignment? My condolences. Get on Stack Overflow or bookmark this issue.</p><p>Did they ask you not to spend more than 2-4 hours on a challenge? Haha. Very funny. But this is data analysis, not a comedy club. Let&#8217;s get down to business!</p><p>Let me do a proper intro. This publication is a successful example of a take-home assignment for &#8230;</p>
      <p>
          <a href="https://dataanalysis.substack.com/p/how-to-pass-a-take-home-assignment">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Currency Conversion In SQL and Python - Issue 53]]></title><description><![CDATA[How to approach conversion rates for multiple currencies in SQL for revenue reporting]]></description><link>https://dataanalysis.substack.com/p/currency-conversion-in-sql-issue</link><guid isPermaLink="false">https://dataanalysis.substack.com/p/currency-conversion-in-sql-issue</guid><dc:creator><![CDATA[Olga Berezovsky]]></dc:creator><pubDate>Wed, 21 Jul 2021 16:30:23 GMT</pubDate><enclosure url="https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/1d820c46-984f-4ba4-bfcc-ebf529f6e99d_1280x1148.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A common challenge for data analysts is to deal with multiple currency conversions for revenue reporting for internationally available products. In this article, I will describe 2 ways of how to generate an exchange rate table in SQL that will allow you to convert different local currency values into one target currency for your analysis.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1z2K!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3370858c-1fdd-4467-aae3-ef98397d9fd9_200x200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1z2K!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3370858c-1fdd-4467-aae3-ef98397d9fd9_200x200.png 424w, 
https://substackcdn.com/image/fetch/$s_!1z2K!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3370858c-1fdd-4467-aae3-ef98397d9fd9_200x200.png 848w, https://substackcdn.com/image/fetch/$s_!1z2K!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3370858c-1fdd-4467-aae3-ef98397d9fd9_200x200.png 1272w, https://substackcdn.com/image/fetch/$s_!1z2K!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3370858c-1fdd-4467-aae3-ef98397d9fd9_200x200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1z2K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3370858c-1fdd-4467-aae3-ef98397d9fd9_200x200.png" width="200" height="200" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/3370858c-1fdd-4467-aae3-ef98397d9fd9_200x200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:200,&quot;width&quot;:200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2197,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1z2K!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3370858c-1fdd-4467-aae3-ef98397d9fd9_200x200.png 424w, 
https://substackcdn.com/image/fetch/$s_!1z2K!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3370858c-1fdd-4467-aae3-ef98397d9fd9_200x200.png 848w, https://substackcdn.com/image/fetch/$s_!1z2K!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3370858c-1fdd-4467-aae3-ef98397d9fd9_200x200.png 1272w, https://substackcdn.com/image/fetch/$s_!1z2K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3370858c-1fdd-4467-aae3-ef98397d9fd9_200x200.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>There are multiple ways to solve this challenge, ranging from easy workarounds for one-time ad-hoc requests to scripts that automate revenue reporting. Depending on your applications, you could leverage a window function, use a built-in currency converter, or ask your engineering team to run this conversion on the backend. If none of these is an option for you, below I&#8217;ll demonstrate a quick and easy workaround that you can do on your own using only SQL.&nbsp;</p><h3>Problem</h3><p>Let&#8217;s say you have all purchases loaded in a table with user id, user country, and main purchase details like plan id, plan name, amount, currency, and transaction date. 
You have to use this table to calculate MRR per country per period or get total revenue made internationally:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1Vcl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b718e3a-55fd-48b3-946f-4750819ba1b1_1622x390.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1Vcl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b718e3a-55fd-48b3-946f-4750819ba1b1_1622x390.png 424w, https://substackcdn.com/image/fetch/$s_!1Vcl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b718e3a-55fd-48b3-946f-4750819ba1b1_1622x390.png 848w, https://substackcdn.com/image/fetch/$s_!1Vcl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b718e3a-55fd-48b3-946f-4750819ba1b1_1622x390.png 1272w, https://substackcdn.com/image/fetch/$s_!1Vcl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b718e3a-55fd-48b3-946f-4750819ba1b1_1622x390.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1Vcl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b718e3a-55fd-48b3-946f-4750819ba1b1_1622x390.png" width="1456" height="350" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/5b718e3a-55fd-48b3-946f-4750819ba1b1_1622x390.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:350,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:119272,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1Vcl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b718e3a-55fd-48b3-946f-4750819ba1b1_1622x390.png 424w, https://substackcdn.com/image/fetch/$s_!1Vcl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b718e3a-55fd-48b3-946f-4750819ba1b1_1622x390.png 848w, https://substackcdn.com/image/fetch/$s_!1Vcl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b718e3a-55fd-48b3-946f-4750819ba1b1_1622x390.png 1272w, https://substackcdn.com/image/fetch/$s_!1Vcl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b718e3a-55fd-48b3-946f-4750819ba1b1_1622x390.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p>If your desired currency is dollars, the US revenue is easy to report. In your SELECT statement, you would just do <code>SUM(amount_paid)</code>. 
But what about international revenue?&nbsp;</p><p>You can&#8217;t simply pick the expected amount per plan and report revenue based on it, because the plan amount is different from the amount paid. The amount paid is the actual money received from users, after discounts, promo codes, refunds, etc. That&#8217;s why you have to convert every paid amount into dollars or whatever the baseline reporting currency is at your company. If only 5 or 10 countries are supported, the conversion rates can be hard-coded manually in SQL. However, if you deal with many countries, you&#8217;d better pick another approach.&nbsp;&nbsp;</p><h3>Solution</h3><p>The obvious solution to this problem is to create a helper table with exchange rates per currency. Below I offer 2 ways to generate this table:&nbsp;</p><ol><li><p>We leverage <a href="https://fiscaldata.treasury.gov/datasets/treasury-reporting-rates-exchange/treasury-reporting-rates-of-exchange">public rates of exchange data</a> and load them into a view or a table with a primary key on currency code, and then simply perform a join to map each transaction&#8217;s currency to the appropriate conversion rate and calculate a dollar amount for every transaction.&nbsp;</p></li></ol><p>This solution will work for every database and SQL variation. It&#8217;s simple, easy, and fast. The downside is that it&#8217;s a static table that won&#8217;t work as a long-term solution, because currency rates fluctuate. When you report revenue data, you obviously should be as precise as you can. But for ad-hoc or one-time data requests, the above is the best way to proceed.&nbsp;</p><ol start="2"><li><p>We use a Python script to access <a href="https://freecurrencyapi.net/">public exchange rates data via API</a>, and then transform the JSON response into a SQL INSERT statement and load it into a generated table.</p></li></ol><p>This is a more complex approach similar to ETL. 
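</p><p>As a rough sketch of that flow, here is a minimal example that uses only Python&#8217;s built-in <code>json</code> and <code>sqlite3</code> modules. The JSON payload shape, the table and column names, and the rates themselves are made up for illustration, and it assumes rates are quoted as units of local currency per 1 USD. It is not the code from the linked gist:</p>
<pre>
```python
import json
import sqlite3

# Hypothetical API response: units of each currency per 1 USD (shape is an assumption).
payload = json.loads('{"base": "USD", "rates": {"USD": 1.0, "EUR": 0.8, "GBP": 0.5}}')

conn = sqlite3.connect(":memory:")

# Helper table with one row per currency code.
conn.execute("CREATE TABLE exchange_rates (currency TEXT PRIMARY KEY, per_usd REAL)")
conn.executemany(
    "INSERT INTO exchange_rates (currency, per_usd) VALUES (?, ?)",
    payload["rates"].items(),
)

# Hypothetical purchases table holding the amount actually paid in local currency.
conn.execute(
    "CREATE TABLE purchases (user_id INTEGER, country TEXT, amount_paid REAL, currency TEXT)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [(1, "US", 10.0, "USD"), (2, "DE", 16.0, "EUR"), (3, "GB", 15.0, "GBP")],
)

# Join each purchase to its rate and convert to dollars before summing.
rows = conn.execute(
    """
    SELECT p.country, ROUND(SUM(p.amount_paid / r.per_usd), 2) AS revenue_usd
    FROM purchases AS p
    JOIN exchange_rates AS r ON r.currency = p.currency
    GROUP BY p.country
    ORDER BY p.country
    """
).fetchall()
revenue_by_country = dict(rows)
print(revenue_by_country)
```
</pre>
<p>The same join works whether the rate table was loaded once from a static file or is refreshed by a script. The scripted approach only changes where the rates come from.</p><p>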
It gives you refreshed, current exchange rates and keeps the table updated for any long-term or automated reporting.&nbsp;</p><p>There are multiple ways to achieve this. My working approach uses SQLAlchemy and SQLite, and the resulting table can be loaded right into Snowflake, Postgres, or any other database. </p><p>Check out the Python code <a href="https://gist.github.com/ks--ks/cb5acc48d89ffa1eda1446dc161857c5">here</a>.</p><p>Thanks for reading, everyone. Until next Wednesday!</p>]]></content:encoded></item><item><title><![CDATA[How To Install And Set Up Python - Issue 52]]></title><description><![CDATA[How to get started with Python basics and its environments for data science and analysis]]></description><link>https://dataanalysis.substack.com/p/how-to-install-and-set-up-python</link><guid isPermaLink="false">https://dataanalysis.substack.com/p/how-to-install-and-set-up-python</guid><dc:creator><![CDATA[Olga Berezovsky]]></dc:creator><pubDate>Wed, 14 Jul 2021 16:30:09 GMT</pubDate><enclosure url="https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/d74d349c-b431-4620-b127-1a7d2ae515ba_640x480.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Python installation can be quite complex and often leads to a &#8220;<a href="http://allendowney.blogspot.com/2018/02/learning-to-program-is-getting-harder.html">programming barrier</a>&#8221; for beginners, where you can&#8217;t get started with Python until you install it, but you can&#8217;t simply install it because you don&#8217;t know how software installation works, how to clone repositories, or even how to download code from GitHub, and then how to run the code you down&#8230;</p>
      <p>
          <a href="https://dataanalysis.substack.com/p/how-to-install-and-set-up-python">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Python Questions For A Data Analyst Interview - Issue 44]]></title><description><![CDATA[Some common Python interview questions for hiring data analysts and data scientists]]></description><link>https://dataanalysis.substack.com/p/python-questions-for-a-data-analyst</link><guid isPermaLink="false">https://dataanalysis.substack.com/p/python-questions-for-a-data-analyst</guid><dc:creator><![CDATA[Olga Berezovsky]]></dc:creator><pubDate>Wed, 19 May 2021 16:31:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!S2aO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F77ac8be0-c577-40ee-9152-e41cf8b0dd7d_1366x754.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Technical interviews are stressful and time-consuming. I felt like preparing for technical screens specifically created for data analyst positions was more challenging, because most materials out there are focused on the engineering aspect of problem-solving that isn&#8217;t quite related to analysis. Python questions can be very different for data engineerin&#8230;</p>
      <p>
          <a href="https://dataanalysis.substack.com/p/python-questions-for-a-data-analyst">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Correlation Analysis 101 in Python - Issue 35]]></title><description><![CDATA[How to read and run correlation plots in Python Pandas]]></description><link>https://dataanalysis.substack.com/p/correlation-analysis-101-in-python</link><guid isPermaLink="false">https://dataanalysis.substack.com/p/correlation-analysis-101-in-python</guid><dc:creator><![CDATA[Olga Berezovsky]]></dc:creator><pubDate>Wed, 10 Mar 2021 17:30:14 GMT</pubDate><enclosure url="https://cdn.substack.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10816efa-a33a-4e86-a166-10a18c251435_1420x750.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello everyone! Do you realize it&#8217;s spring already? I&#8217;m almost ready to celebrate the holiday of flowers, but first: another data analysis practice for you today that will make your life easier (or at least more interesting, hopefully).&nbsp;</p><p>Do you ever receive questions like:&nbsp;</p><ul><li><p><em>Does correlation imply causation?</em></p></li><li><p><em>How do you prove if features X and Y are correlated?</em></p></li></ul><p>After this helpful guide, you will know the best way to answer those types of questions. 
In this article, I&#8217;ll focus on positive and negative correlation analysis and specifically cover:&nbsp;</p><ol><li><p>Practical use cases for correlation analysis.</p></li><li><p>A methodology for how to build a correlation table and a heatmap in Python Pandas.</p></li><li><p>How to read and interpret different heatmaps and correlation charts.&nbsp;</p></li><li><p>How to prove that correlation implies causation.&nbsp;&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RxTI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4d60858b-2697-415a-b027-c1f293993bb0_200x200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RxTI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4d60858b-2697-415a-b027-c1f293993bb0_200x200.png 424w, https://substackcdn.com/image/fetch/$s_!RxTI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4d60858b-2697-415a-b027-c1f293993bb0_200x200.png 848w, https://substackcdn.com/image/fetch/$s_!RxTI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4d60858b-2697-415a-b027-c1f293993bb0_200x200.png 1272w, https://substackcdn.com/image/fetch/$s_!RxTI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4d60858b-2697-415a-b027-c1f293993bb0_200x200.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!RxTI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4d60858b-2697-415a-b027-c1f293993bb0_200x200.png" width="200" height="200" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/4d60858b-2697-415a-b027-c1f293993bb0_200x200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:200,&quot;width&quot;:200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2197,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RxTI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4d60858b-2697-415a-b027-c1f293993bb0_200x200.png 424w, https://substackcdn.com/image/fetch/$s_!RxTI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4d60858b-2697-415a-b027-c1f293993bb0_200x200.png 848w, https://substackcdn.com/image/fetch/$s_!RxTI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4d60858b-2697-415a-b027-c1f293993bb0_200x200.png 1272w, https://substackcdn.com/image/fetch/$s_!RxTI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4d60858b-2697-415a-b027-c1f293993bb0_200x200.png 1456w" sizes="100vw" 
fetchpriority="high"></picture><div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://dataanalysis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://dataanalysis.substack.com/subscribe?"><span>Subscribe now</span></a></p></li></ol><h3>When do you need to run a correlation analysis?</h3><p>From a business perspective, correlation analysis helps you to answer questions like:&nbsp;</p><ol><li><p>What is the relationship between 2 features on your product app?&nbsp;</p></li><li><p>Are they dependent or independent? Do they increase and decrease together (positive correlation)?</p></li><li><p>Does one of them increase when the other one decreases and vice versa (negative correlation)? Or are they not correlated?</p></li><li><p>Does changing a price affect subscription creation?&nbsp;</p></li><li><p>Does an increase in comments affect reshares?&nbsp;</p></li></ol><p>There are multiple other analytical techniques that can help you tackle those or similar questions like hypothesis testing, decision trees, network analysis, matrix, or sorting. Correlation analysis is one of the more common ways to learn the relationship between 2 or more variables.&nbsp;</p><p>Correlation is represented as a value between -1 and +1 where +1 denotes the highest positive correlation, -1 denotes the highest negative correlation, and 0 denotes that there is no correlation. Below, I&#8217;ll demonstrate how to run correlation analysis using Python Pandas and read a heatmap.&nbsp;</p><h3>How to build correlation analysis</h3><p>Building a correlation chart in Python Pandas is very easy.&nbsp;</p><p>First, you have to prepare your data by having only numerical and boolean variables in columns (other formats will be ignored by the function). 
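</p><p>As a small sketch of that preparation step (the DataFrame and its column names are hypothetical), you can keep only the numeric and boolean columns with <code>select_dtypes</code>. As far as I know, this matters more in pandas 2.0 and later, where <code>DataFrame.corr()</code> defaults to <code>numeric_only=False</code> and raises an error on text columns instead of silently skipping them:</p>
<pre>
```python
import pandas as pd

# Hypothetical post-level data; the column names are illustrative only.
df = pd.DataFrame(
    {
        "title": ["a", "b", "c", "d"],
        "score": [10, 20, 30, 40],
        "num_comments": [1, 3, 2, 5],
        "is_video": [False, True, False, True],
    }
)

# Keep only the numeric and boolean columns; the text column is dropped.
numeric_df = df.select_dtypes(include=["number", "bool"])
print(numeric_df.dtypes)
```
</pre>
<p>Passing <code>numeric_only=True</code> to <code>corr()</code> is an alternative to selecting the columns up front.</p><p>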
You don&#8217;t have to worry about missing/NULL values here, as the function excludes them. After that, you can simply run:</p><p><code>DataFrame.corr()</code></p><p>or</p><p><code>DataFrame.corr(method ='pearson')</code></p><p>This is for a DataFrame. You also can run Series.corr() to compute the correlation between 2 series.&nbsp;&nbsp;</p><p>DataFrame.corr() returns a correlation table between dataset variables. Here is an example of output from <a href="https://www.kaggle.com/olgaberezovsky/eda-using-python-pandas">Reddit Exploratory Data Analysis in Python</a>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2cXI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9d4574f6-692b-4ec4-8cc9-492311f617de_1600x456.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2cXI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9d4574f6-692b-4ec4-8cc9-492311f617de_1600x456.png 424w, https://substackcdn.com/image/fetch/$s_!2cXI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9d4574f6-692b-4ec4-8cc9-492311f617de_1600x456.png 848w, https://substackcdn.com/image/fetch/$s_!2cXI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9d4574f6-692b-4ec4-8cc9-492311f617de_1600x456.png 1272w, 
https://substackcdn.com/image/fetch/$s_!2cXI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9d4574f6-692b-4ec4-8cc9-492311f617de_1600x456.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2cXI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9d4574f6-692b-4ec4-8cc9-492311f617de_1600x456.png" width="1456" height="415" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/9d4574f6-692b-4ec4-8cc9-492311f617de_1600x456.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:415,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2cXI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9d4574f6-692b-4ec4-8cc9-492311f617de_1600x456.png 424w, https://substackcdn.com/image/fetch/$s_!2cXI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9d4574f6-692b-4ec4-8cc9-492311f617de_1600x456.png 848w, https://substackcdn.com/image/fetch/$s_!2cXI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9d4574f6-692b-4ec4-8cc9-492311f617de_1600x456.png 1272w, 
https://substackcdn.com/image/fetch/$s_!2cXI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9d4574f6-692b-4ec4-8cc9-492311f617de_1600x456.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>We see that both the score and number of comments are highly positively correlated with a correlation value of 0.63. 
There is also a weak positive correlation between total awards received and both score (0.2) and num_comments (0.1).</p><p>&#128161;Note: the output above uses the Pearson correlation, which is the default and the most commonly used method. Kendall and Spearman rank correlations are also available. You can specify the method like this: <code>DataFrame.corr(method='kendall')</code>. To learn the difference between these methods, I recommend reading <a href="https://support.minitab.com/en-us/minitab-express/1/help-and-how-to/modeling-statistics/regression/supporting-topics/basics/a-comparison-of-the-pearson-and-spearman-correlation-methods/">this guide</a>. But in a nutshell, we use Pearson to find a linear relationship between normally distributed variables. When the variables are not normally distributed or the relationship between them is not linear (but still monotonic), we would use the Spearman rank correlation method.&nbsp;&nbsp;</p><p>Now let's visualize the correlation table above using a heatmap:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wyiH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f21d012-3741-426a-952e-9e73f336a316_1040x502.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wyiH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f21d012-3741-426a-952e-9e73f336a316_1040x502.png 424w, https://substackcdn.com/image/fetch/$s_!wyiH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f21d012-3741-426a-952e-9e73f336a316_1040x502.png 848w, 
https://substackcdn.com/image/fetch/$s_!wyiH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f21d012-3741-426a-952e-9e73f336a316_1040x502.png 1272w, https://substackcdn.com/image/fetch/$s_!wyiH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f21d012-3741-426a-952e-9e73f336a316_1040x502.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wyiH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f21d012-3741-426a-952e-9e73f336a316_1040x502.png" width="1040" height="502" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/7f21d012-3741-426a-952e-9e73f336a316_1040x502.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:502,&quot;width&quot;:1040,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wyiH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f21d012-3741-426a-952e-9e73f336a316_1040x502.png 424w, https://substackcdn.com/image/fetch/$s_!wyiH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f21d012-3741-426a-952e-9e73f336a316_1040x502.png 848w, 
https://substackcdn.com/image/fetch/$s_!wyiH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f21d012-3741-426a-952e-9e73f336a316_1040x502.png 1272w, https://substackcdn.com/image/fetch/$s_!wyiH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7f21d012-3741-426a-952e-9e73f336a316_1040x502.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>It returns a correlation heatmap plot:</p><div 
class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2FZi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10816efa-a33a-4e86-a166-10a18c251435_1420x750.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2FZi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10816efa-a33a-4e86-a166-10a18c251435_1420x750.png 424w, https://substackcdn.com/image/fetch/$s_!2FZi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10816efa-a33a-4e86-a166-10a18c251435_1420x750.png 848w, https://substackcdn.com/image/fetch/$s_!2FZi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10816efa-a33a-4e86-a166-10a18c251435_1420x750.png 1272w, https://substackcdn.com/image/fetch/$s_!2FZi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10816efa-a33a-4e86-a166-10a18c251435_1420x750.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2FZi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10816efa-a33a-4e86-a166-10a18c251435_1420x750.png" width="1420" height="750" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/10816efa-a33a-4e86-a166-10a18c251435_1420x750.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:750,&quot;width&quot;:1420,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2FZi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10816efa-a33a-4e86-a166-10a18c251435_1420x750.png 424w, https://substackcdn.com/image/fetch/$s_!2FZi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10816efa-a33a-4e86-a166-10a18c251435_1420x750.png 848w, https://substackcdn.com/image/fetch/$s_!2FZi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10816efa-a33a-4e86-a166-10a18c251435_1420x750.png 1272w, https://substackcdn.com/image/fetch/$s_!2FZi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10816efa-a33a-4e86-a166-10a18c251435_1420x750.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The full analysis is <a href="https://www.kaggle.com/olgaberezovsky/correlation-analysis-using-python-pandas?scriptVersionId=56090846">Correlation Analysis Using Python Pandas</a>.&nbsp;</p><h3>How to read correlation charts:</h3><p>Each square shows the correlation relationship between the variables on each axis. 
As I said above, correlation ranges from -1 to +1.&nbsp;</p><ul><li><p>Values close to 0 mean that there is little or no linear relationship between the two variables.&nbsp;</p></li><li><p>The closer the correlation is to 1, the stronger the positive relationship: the two variables tend to increase together.&nbsp;</p></li><li><p>A correlation closer to -1 is just as strong, but instead of both increasing like in the example above, one variable decreases as the other one increases.&nbsp;</p></li><li><p>The diagonal values are all 1 and marked dark because those squares correlate each variable with itself, which is a perfect correlation.&nbsp;</p></li></ul><p>Overall, the larger the number and the darker the color, the higher the correlation between the two variables. The plot is also symmetrical about the diagonal because each pair of variables appears twice, once on each side of it.</p><p>Another example comes from my EDA - <a href="https://www.kaggle.com/olgaberezovsky/predicting-titanic-survival-using-most-common-ml">Predicting Titanic Survival</a>. This plot is meant to show whether there is a relationship between passenger survival on the Titanic and variables such as having children, parents, or siblings, ticket price (fare), and age. 
It might be a grim example, but it&#8217;s a good one:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!30vi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0762de-0313-4bd8-8a96-9013e96763c3_1402x774.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!30vi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0762de-0313-4bd8-8a96-9013e96763c3_1402x774.png 424w, https://substackcdn.com/image/fetch/$s_!30vi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0762de-0313-4bd8-8a96-9013e96763c3_1402x774.png 848w, https://substackcdn.com/image/fetch/$s_!30vi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0762de-0313-4bd8-8a96-9013e96763c3_1402x774.png 1272w, https://substackcdn.com/image/fetch/$s_!30vi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0762de-0313-4bd8-8a96-9013e96763c3_1402x774.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!30vi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0762de-0313-4bd8-8a96-9013e96763c3_1402x774.png" width="1402" height="774" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/ae0762de-0313-4bd8-8a96-9013e96763c3_1402x774.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:774,&quot;width&quot;:1402,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!30vi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0762de-0313-4bd8-8a96-9013e96763c3_1402x774.png 424w, https://substackcdn.com/image/fetch/$s_!30vi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0762de-0313-4bd8-8a96-9013e96763c3_1402x774.png 848w, https://substackcdn.com/image/fetch/$s_!30vi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0762de-0313-4bd8-8a96-9013e96763c3_1402x774.png 1272w, https://substackcdn.com/image/fetch/$s_!30vi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fae0762de-0313-4bd8-8a96-9013e96763c3_1402x774.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As you may have noticed, there are no high correlations in this graph. The highest value is 0.4, between parents/children and siblings. This relationship analysis guides us toward the right input variables for an ML model that predicts passenger survival.&nbsp;</p><h3>How to prove correlation implies causation?</h3><p>In addition to the correlation table and the heatmap, for some business cases you should weigh other factors based on historical data, events, user attributes, and the specifics of the business case:</p><p><em><strong>Strength</strong></em> - a relationship is more likely to be causal if the correlation coefficient is large and statistically significant. 
This relates directly to the values in the correlation table.</p><p><em><strong>Consistency</strong></em> - a relationship is more likely to be causal if it can be replicated.&nbsp;</p><p><em><strong>Temporality</strong></em> - a relationship is more likely to be causal if the effect always occurs after the cause.</p><p><em><strong>Gradient</strong></em> - a relationship is more likely to be causal if greater exposure to the suspected cause leads to a greater effect. This is related to positive or negative correlation. As I stated above, negative correlation occurs when one variable decreases as the other one increases.&nbsp;</p><p><em><strong>Experiment</strong></em> - a relationship is more likely to be causal if it can be verified experimentally. You can run a hypothesis test to verify it.</p><p><em><strong>Analogy</strong></em> - a relationship is more likely to be causal if there are proven relationships between similar causes and effects.</p><p>That&#8217;s it for now. Until next Wednesday! </p>]]></content:encoded></item><item><title><![CDATA[Generating a Word Cloud In Python - Issue 30]]></title><description><![CDATA[Learn how to make a word cloud for text analysis]]></description><link>https://dataanalysis.substack.com/p/generating-a-word-cloud-in-python</link><guid isPermaLink="false">https://dataanalysis.substack.com/p/generating-a-word-cloud-in-python</guid><dc:creator><![CDATA[Olga Berezovsky]]></dc:creator><pubDate>Wed, 03 Feb 2021 18:12:13 GMT</pubDate><enclosure url="https://cdn.substack.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b80803-5407-4b0b-99c1-3cb9fb612b76_1256x658.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Building word (or text) clouds is a very common task for analysts who work with textual, qualitative, or semantic data. 
They are also common take-home assignments for candidates to test their knowledge of handling, processing, and visualizing text data. Below, I&#8217;ll showcase one of the ways to build a word cloud in Python.&nbsp;</p><p>There are many applications, tools, and libraries that can help you to generate a word cloud in mere seconds for free (you can check some of those below). That being said, as an analyst, you should be able to create your own visuals in either R or Python, both of which should grant you the freedom to tailor your dataset as needed. Pick a style and customization that works best for you!</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZPVs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F840f37ca-d18b-489a-8f3e-ed38838d8c19_200x200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZPVs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F840f37ca-d18b-489a-8f3e-ed38838d8c19_200x200.png 424w, https://substackcdn.com/image/fetch/$s_!ZPVs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F840f37ca-d18b-489a-8f3e-ed38838d8c19_200x200.png 848w, https://substackcdn.com/image/fetch/$s_!ZPVs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F840f37ca-d18b-489a-8f3e-ed38838d8c19_200x200.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ZPVs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F840f37ca-d18b-489a-8f3e-ed38838d8c19_200x200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZPVs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F840f37ca-d18b-489a-8f3e-ed38838d8c19_200x200.png" width="200" height="200" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/840f37ca-d18b-489a-8f3e-ed38838d8c19_200x200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:200,&quot;width&quot;:200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2197,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZPVs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F840f37ca-d18b-489a-8f3e-ed38838d8c19_200x200.png 424w, https://substackcdn.com/image/fetch/$s_!ZPVs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F840f37ca-d18b-489a-8f3e-ed38838d8c19_200x200.png 848w, https://substackcdn.com/image/fetch/$s_!ZPVs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F840f37ca-d18b-489a-8f3e-ed38838d8c19_200x200.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ZPVs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F840f37ca-d18b-489a-8f3e-ed38838d8c19_200x200.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><h3>Word clouds - what and why</h3><p>First things first! You&#8217;ll want a word cloud when you need to visualize which words are used the most in your dataset. The more often a word is used, the larger it appears in the cloud. Text clouds are a good option when you have to quickly find a pattern or an insight, or note how frequently words are used in your data. They are often the first step in exploratory data analysis of text data.</p><h3>Getting started</h3><p>For my analysis today, I am using the Python <em>wordcloud</em> package. We&#8217;ll use NumPy and Pandas for data processing.&nbsp;</p><p><code>import numpy as np </code><em><code># linear algebra</code></em></p><p><code>import pandas as pd </code><em><code># data processing</code></em></p><p><code>import seaborn as sns </code><em><code># statistical graphics package</code></em></p><p><code>import matplotlib.pyplot as plt </code><em><code># plotting package</code></em></p><p><code>import wordcloud </code><em><code># used for the word cloud plot</code></em></p><p><code>from wordcloud import WordCloud, STOPWORDS </code><em><code># STOPWORDS is optional, used to filter out stop words</code></em></p><p>You don&#8217;t have to use <em>stopwords</em> to generate a word cloud. It&#8217;s advisable to use them, however, to filter out the text noise. 
You can also set the list of stop words to anything you like:&nbsp;</p><p><code>stop_words = set(['have', 'when', 'about', 'according', 'who', 'actually', 'zero', ''])</code></p><p>&#128161; Tip: if you are unfamiliar with the package and its functions or limitations, you can run <em>?WordCloud</em> in a Jupyter or IPython session to get its documentation.</p><h3>Prepare dataset</h3><p>Before we proceed with the cloud, we have to tailor our dataset to ensure the values are in an appropriate format.</p><p>First, we have to replace NULL values with empty strings:</p><p><code>df["title"] = df["title"].fillna(value="")</code></p><p>Now, let&#8217;s join the lowercased titles into a single string:</p><p><code>word_string = " ".join(df['title'].str.lower())</code></p><h3>and... Plotting!</h3><p><code>plt.figure(figsize=(15,15))</code></p><p><code>wc = WordCloud(background_color="purple", stopwords=STOPWORDS, max_words=2000, max_font_size=300, width=1600, height=800)</code></p><p><code>wc.generate(word_string)</code></p><p><code>plt.imshow(wc.recolor(colormap='viridis', random_state=17), interpolation="bilinear")</code></p><p><code>plt.axis('off')</code></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EbUs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b80803-5407-4b0b-99c1-3cb9fb612b76_1256x658.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EbUs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b80803-5407-4b0b-99c1-3cb9fb612b76_1256x658.png 424w, 
https://substackcdn.com/image/fetch/$s_!EbUs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b80803-5407-4b0b-99c1-3cb9fb612b76_1256x658.png 848w, https://substackcdn.com/image/fetch/$s_!EbUs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b80803-5407-4b0b-99c1-3cb9fb612b76_1256x658.png 1272w, https://substackcdn.com/image/fetch/$s_!EbUs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b80803-5407-4b0b-99c1-3cb9fb612b76_1256x658.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EbUs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b80803-5407-4b0b-99c1-3cb9fb612b76_1256x658.png" width="1256" height="658" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/d8b80803-5407-4b0b-99c1-3cb9fb612b76_1256x658.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:658,&quot;width&quot;:1256,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EbUs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b80803-5407-4b0b-99c1-3cb9fb612b76_1256x658.png 424w, 
https://substackcdn.com/image/fetch/$s_!EbUs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b80803-5407-4b0b-99c1-3cb9fb612b76_1256x658.png 848w, https://substackcdn.com/image/fetch/$s_!EbUs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b80803-5407-4b0b-99c1-3cb9fb612b76_1256x658.png 1272w, https://substackcdn.com/image/fetch/$s_!EbUs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8b80803-5407-4b0b-99c1-3cb9fb612b76_1256x658.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline 
points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We set a limit of 2,000 words for the cloud above. Let&#8217;s try setting the limit to 50 and changing the background color:</p><p><code>plt.figure(figsize=(15,15))</code></p><p><code>wc = WordCloud(background_color="yellow", stopwords=STOPWORDS, max_words=50, max_font_size=300, width=1400, height=800)</code></p><p><code>wc.generate(word_string)</code></p><p><code>plt.imshow(wc.recolor(colormap='viridis', random_state=17), interpolation="bilinear")</code></p><p><code>plt.axis('off')</code></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KK0c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F30b5f6d0-41ba-422f-aecc-1354c9e457be_1600x959.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KK0c!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F30b5f6d0-41ba-422f-aecc-1354c9e457be_1600x959.png 424w, https://substackcdn.com/image/fetch/$s_!KK0c!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F30b5f6d0-41ba-422f-aecc-1354c9e457be_1600x959.png 848w, https://substackcdn.com/image/fetch/$s_!KK0c!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F30b5f6d0-41ba-422f-aecc-1354c9e457be_1600x959.png 1272w, 
https://substackcdn.com/image/fetch/$s_!KK0c!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F30b5f6d0-41ba-422f-aecc-1354c9e457be_1600x959.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KK0c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F30b5f6d0-41ba-422f-aecc-1354c9e457be_1600x959.png" width="1456" height="873" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/30b5f6d0-41ba-422f-aecc-1354c9e457be_1600x959.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:873,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KK0c!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F30b5f6d0-41ba-422f-aecc-1354c9e457be_1600x959.png 424w, https://substackcdn.com/image/fetch/$s_!KK0c!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F30b5f6d0-41ba-422f-aecc-1354c9e457be_1600x959.png 848w, https://substackcdn.com/image/fetch/$s_!KK0c!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F30b5f6d0-41ba-422f-aecc-1354c9e457be_1600x959.png 1272w, 
https://substackcdn.com/image/fetch/$s_!KK0c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F30b5f6d0-41ba-422f-aecc-1354c9e457be_1600x959.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The full code is on Kaggle - <a href="https://www.kaggle.com/olgaberezovsky/word-cloud-using-python-pandas">Word Cloud using Python Pandas</a>.</p><p>&#128161; You probably noticed I am using <em>imshow</em>. It&#8217;s a function from the <em>matplotlib</em> package that displays your data as an image. 
To set or change its parameters, follow this <a href="https://matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html">guide</a>.</p><p>That&#8217;s it for now. In one of my next issues, I&#8217;ll demonstrate using masks to generate clouds in the shape of a star, a circle, or any shape that you could possibly<em> ever want!&nbsp; </em>(Maybe.)&nbsp;</p><h4>Check out a list of my favorite go-to online word cloud generators:</h4><ol><li><p><a href="https://tagcrowd.com/">TagCrowd.com</a></p></li><li><p><a href="https://monkeylearn.com/">MonkeyLearn.com</a> - you have to create an account, but once you are set, they provide a lot of semantic text analysis.</p></li><li><p><a href="https://wordart.com/">WordArt.com</a></p></li><li><p><a href="https://worditout.com/word-cloud/create">WordItOut.com</a> - works best with cleaned text.&nbsp;</p></li><li><p><a href="http://wordcloudmaker.com/">WordCloudMaker.com</a></p></li></ol><p>Thanks for reading, everyone. Until next Wednesday!</p>]]></content:encoded></item><item><title><![CDATA[Data Analysis In Python - Issue 23]]></title><description><![CDATA[Podcasts, free datasets, and tutorials for using Python for data analysis.]]></description><link>https://dataanalysis.substack.com/p/data-analysis-in-python-issue-23</link><guid isPermaLink="false">https://dataanalysis.substack.com/p/data-analysis-in-python-issue-23</guid><dc:creator><![CDATA[Olga Berezovsky]]></dc:creator><pubDate>Wed, 16 Dec 2020 
22:24:32 GMT</pubDate><enclosure url="https://cdn.substack.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8cb48fbe-823f-461c-a850-a7f17f49a51d_1548x544.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello readers, I&#8217;m back, as usual, with a weekly recap of interesting news and events in the data analysis world from the <a href="https://dataanalysis.substack.com/">Data Analysis Journal</a>.</p><p>Last week I had the pleasure to speak with a UC Berkeley student, Shalini, where we shared our experience, love, and pain of navigating through the sometimes-rough waters of data analysis. Check out <a href="https://www.youtube.com/c/ShaliniK/about">her YouTube &#8230;</a></p>
      <p>
          <a href="https://dataanalysis.substack.com/p/data-analysis-in-python-issue-23">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[5 Concepts Of Data Engineering Every Data Analyst Must Know - Issue 11]]></title><description><![CDATA[Today is Wednesday, and it&#8217;s time for a weekly recap of interesting stories and events in the data analysis world from the Data Analysis Journal.]]></description><link>https://dataanalysis.substack.com/p/5-concepts-of-data-engineering-every</link><guid isPermaLink="false">https://dataanalysis.substack.com/p/5-concepts-of-data-engineering-every</guid><dc:creator><![CDATA[Olga Berezovsky]]></dc:creator><pubDate>Wed, 23 Sep 2020 23:55:07 GMT</pubDate><enclosure url="https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/cc9a68c5-aa0e-45c8-866a-01962aac33c2_200x200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Today is Wednesday, and it&#8217;s time for a weekly recap of interesting stories and events in the data analysis world from the <a href="https://dataanalysis.substack.com/">Data Analysis Journal</a>.</p><h2><strong>&#10024; </strong>Today we will be discussing:&nbsp;</h2><ul><li><p>Snowflake, a data analytics and cloud platform, is the biggest IPO of 2020.&nbsp;</p></li><li><p>Database tuning - what and why.</p></li><li><p>How to work with structured and unstructured data.</p></li><li><p>Hackathons round up -&#8230;</p></li></ul>
      <p>
          <a href="https://dataanalysis.substack.com/p/5-concepts-of-data-engineering-every">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Analysis: Exploratory Data Analysis Using Python Pandas and SQL]]></title><description><![CDATA[What is EDA? What is the easiest and fastest way to start? A walk through the most common commands in Python Pandas.]]></description><link>https://dataanalysis.substack.com/p/exploratory-data-analysis-using-python-pandas-and-sql-aa7cfb14cca</link><guid isPermaLink="false">https://dataanalysis.substack.com/p/exploratory-data-analysis-using-python-pandas-and-sql-aa7cfb14cca</guid><dc:creator><![CDATA[Olga Berezovsky]]></dc:creator><pubDate>Tue, 30 Jun 2020 22:16:00 GMT</pubDate><enclosure url="https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/ae72758d-3131-4079-bf28-3a74a61e0039_2472x1648.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The Pandas Python library is becoming more and more popular among data scientists and analysts. It allows you to quickly load, process, transform, analyze, and visualize data.</p><p>When you work with Pandas, the most important thing to understand is that there are two main data structures &#8212; Series and DataFrame:</p><ul><li><p>Series is a one-dimensional indexed array that&#8230;</p></li></ul>
      <p>
          <a href="https://dataanalysis.substack.com/p/exploratory-data-analysis-using-python-pandas-and-sql-aa7cfb14cca">
              Read more
          </a>
      </p>
   ]]></content:encoded></item></channel></rss>