5 Concepts Of Data Engineering Every Data Analyst Must Know - Issue 11

Today is Wednesday, and it’s time for a weekly recap of interesting stories and events in the data analysis world from the Data Analysis Journal.

Today we will be discussing: 

  • Snowflake, a data analytics and cloud platform, is the biggest IPO in 2020. 

  • Database tuning - what and why.

  • How to work with structured and unstructured data.

  • Hackathons round up - a call for participants.

📚 Weekend Longread

Snowflake, a data analytics and cloud platform, is the biggest IPO in 2020. 

I didn’t plan to cover this whole “Snowflake boom” in my journal. I have used the product, and I appreciate it and the challenges it solves, yet I am confused about why it generated so much fuss and what is so special about it. It is expensive to use, and it doesn’t do anything beyond what Google BigQuery, Vertica, Databricks, Cloudera, or the Amazon and Azure products provide.

Anyway, founded by ex-Oracle data architects, Snowflake made its loud debut last week, reporting over $100 million in revenue in 2019! This makes it one of the most successful SaaS businesses. Read a deep dive analysis to learn what features and use cases they offer, how they scale and centralize storage, and how they win (or don’t) against their competitors.

🔥 What’s new this week

  • Product Development. You would think that developing a product overseas and bringing it into the hot US market is a good bet (or at least a cheaper and safer one), but it doesn’t always work that way. Read a true story of mistakes to avoid from a founder who successfully started his company in India, proved the concept, and made a profit, but wasn’t able to bring it into the US enterprise market despite having investors and coaching support.

  • Data Analysis. Read this quick guide to learn the differences between MySQL and PostgreSQL, and how to migrate an app from one to the other.

  • Data Engineering. Learn about database tuning - why data engineers do it and what problems it solves.

🏆 Nailed It

Be prepared for your next interview

As you know, I am a data analyst. Still, you have probably noticed that I put a big focus on data engineering in my newsletters and often cover topics and news that might not be super relevant for classic data or product analysis. I do this on purpose. One of the values I try to foster within my team is “data quality is a priority”.

It’s a pleasure to work with analysts who are tech-savvy and understand their data sources deeply: analysts who can not only extract the data they need quickly, but also lead the optimization of their data pipelines, snapshot refreshes, and aggregation practices, so that the “ingredients” they use for “cooking” are fresh and good.

To lead these initiatives, a data analyst must understand:

  1. Types of data structures - data can be structured or unstructured. With structured data, you use SQL to extract and format it. Unstructured data can be stored in NoSQL databases (MongoDB, Cassandra, DynamoDB, Redis, or graph databases) or in data lakes. You have to know a programming language to extract and read unstructured data, especially if you want to work with predictive analytics.

  2. The basics of data architecture or model - the cloud and analytics stack at your company can get quite complex. As a data analyst, you should know your data sources, the purpose of each, and their destinations:

    1. Data lake (Apache Spark, Amazon S3, Presto)

    2. Data warehouse (BigQuery, Amazon Redshift, Snowflake, Vertica)

    3. Data mart (the same as a data warehouse, but for a smaller portion of data that is related to a particular business domain)

  3. Types of ETL - understand how your data flows, whether through batch loads or streaming. Know the refresh frequency, filters, attributes, etc.

  4. Concept of memory and cost in an RDBMS - be mindful of the SQL you run in production. Follow good query practices, such as selecting only the columns you need and filtering early, so your queries don’t waste memory or compute.

  5. Version control (Git) and CI/CD pipelines - understand who changed the code, when, and why, where events are coming from, how often they are refreshed, etc.
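
To make the first concept more concrete, here is a minimal Python sketch of the same question ("how much did each user spend?") answered against a structured table and against raw documents. Everything in it is made up for illustration: the orders table, the event fields, and the totals_from_events helper are assumptions, and JSON lines stand in for unstructured data (strictly speaking, they are semi-structured).

```python
import json
import sqlite3

# Structured data: fixed columns, so plain SQL does the work.
# The table and its rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (1, 5.0), (2, 12.5)],
)
sql_totals = dict(
    conn.execute("SELECT user_id, SUM(amount) FROM orders GROUP BY user_id")
)
conn.close()

# Less structured data: JSON documents whose fields vary per record.
# There is no schema to query, so we parse and inspect each record in code.
raw_events = [
    '{"user_id": 1, "order": {"amount": 20.0}}',
    '{"user_id": 1, "order": {"amount": 5.0, "coupon": "FALL"}}',
    '{"user_id": 2, "order": {"amount": 12.5}}',
    '{"user_id": 3, "page_view": "/pricing"}',
]


def totals_from_events(lines):
    """Sum order amounts per user, skipping events that are not orders."""
    totals = {}
    for line in lines:
        event = json.loads(line)
        order = event.get("order")
        if order is None:
            continue  # e.g. a page view: nothing to add up
        user = event["user_id"]
        totals[user] = totals.get(user, 0.0) + order["amount"]
    return totals


doc_totals = totals_from_events(raw_events)
print(sql_totals)
print(doc_totals)
```

Both approaches arrive at the same totals, but only the SQL version gets its structure for free. With the documents, you decide in code which records count and where the amount lives, which is why a programming language becomes necessary as soon as the data stops fitting neatly into tables.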

🎓 Level Up

Certifications, internships, schools, and courses.

Hackathons round up:

🍸 Drink and Mingle

Upcoming free events, meetups, talks, webinars

The AI for Good Global Summit kicked off this week, but you can still join the workshops, listen to keynote speakers, and learn how AI influences and can drive gender equity, the future of food, and collective pandemic intelligence.

That is all for today. Make sure to subscribe to receive updates.

You can check other publications in the Data Analysis Journal as they are published. Otherwise, until next Wednesday!