Welcome to Issue 10 of my Data Analysis Journal newsletter, where I write about data analysis, data science, and business intelligence.
If you just joined us, this is a weekly newsletter to summarise the events and news of the last week in the data analysis world and to cover the new articles and helpful materials recently published in my Data Analysis Journal.
✨ In today’s newsletter, we’ll be reviewing:
Launching a new section - Ask Away!
How Facebook accelerates SQL on a massive scale.
Database of Databases - online encyclopedia of all databases.
A practical SQL guide for data analyst interviews.
Let’s get started!
📚 Weekend Longread
Read how Facebook accelerates SQL at massive scale. Its data architecture has to be flexible and sophisticated enough to support both fast queries and data storage.
“Facebook has long been a proponent of open-source software, and it has generated its share of open source projects in the big data space. Apache Hive and its follow-on, Presto, both emerged from Facebook, as did RocksDB and Apache Cassandra. GraphQL is one of a number of developer tools that came out of Facebook, along with PyTorch.”
🛎️ Ask Away
I see all your questions and will do my best to answer as many as I can. For now, I aim to periodically post some common questions I receive from my subscribers. My first Ask Away post covers SQL interview challenges, A/B testing, and portfolio questions.
🙏 A small favor: if you have a question, please don’t send it to me over LinkedIn. My inbox there gets overloaded with messages, and I won’t be able to get to yours in a timely fashion. You can comment, reply to this email, or send it to email@example.com. I answer much faster if you attach cat pictures.
🔥 What’s new this week
The National Cancer Institute organized a series of events about how new visualization technologies and data analysis influence and advance cancer research. Scroll down here and watch Cancer Moonshot leaders and data experts speak about cancer data visualization.
Amazon Redshift can now be accessed through its built-in Data API. No need to manage database connections and credentials anymore. DevOps teams should be happy now.
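For the curious, here is a minimal sketch of what a Data API call could look like from Python. It assumes a boto3 `redshift-data` client is passed in; the function name `run_redshift_query` and the cluster, database, and user values are placeholders of mine, not anything from the announcement. It submits a statement, polls until it finishes, and returns the rows. Pagination of large results is left out to keep it short.

```python
import time


def run_redshift_query(client, sql, cluster_id, database, db_user,
                       poll_seconds=1.0, max_polls=60):
    """Run `sql` via the Redshift Data API and return rows as plain lists.

    `client` is expected to behave like boto3.client("redshift-data").
    All identifiers passed in are placeholders for your own setup.
    """
    # Submit the statement; the call returns immediately with a statement Id.
    submitted = client.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        DbUser=db_user,
        Sql=sql,
    )
    statement_id = submitted["Id"]

    # The API is asynchronous, so poll until the statement reaches a final state.
    for _ in range(max_polls):
        status = client.describe_statement(Id=statement_id)["Status"]
        if status == "FINISHED":
            break
        if status in ("FAILED", "ABORTED"):
            raise RuntimeError(f"Statement {statement_id} ended with status {status}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Statement {statement_id} did not finish in time")

    # Each field arrives as a one-key dict such as {"stringValue": "a"}
    # or {"isNull": True}; unwrap it into a plain Python value.
    result = client.get_statement_result(Id=statement_id)
    return [
        [None if field.get("isNull") else next(iter(field.values()))
         for field in record]
        for record in result["Records"]
    ]


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("redshift-data")
#   rows = run_redshift_query(client, "SELECT 1", "my-cluster", "dev", "analyst")
```

Taking the client as a parameter keeps the function easy to try with a stub before pointing it at a real cluster.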
Did you know there are 715 database systems in total? Here is an online encyclopedia of all databases, and it has the best name - Database of Databases. Interestingly, the most common programming languages used to build them are C++ and Java.
Database technologies remain a hot sector, and it will keep growing as our data grows. Just think about it: more data is created every hour in 2020 than in the entire year of 2000. I am convinced the demand for data engineers and ML engineers will at least double over the next few years. Gonna go learn Scala or Julia or something.
Taken from here.
🏆 Nailed It
Be prepared for your next interview
Today, we’re going to focus on SQL a little more. The best way to practice SQL is in your local database. For this, you can install the PostgreSQL database and use the DataGrip client (free 30-day trial!) with it. This gives you the flexibility to design the schema the way you want and practice SQL from anywhere at any time.
❗ This part is very important: whatever SQL syntax you choose, make sure you have respect for your reader and follow a proper SQL style. Going through some bad SQL code formatting is a special type of pain.
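If you want to try the idea before installing anything, here is a small sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL (so this is SQLite, not the PostgreSQL and DataGrip setup described above). The `users` and `orders` tables and their rows are made up for practice. The query is laid out the way I'd suggest: one clause per line, explicit aliases, and the join condition on its own line.

```python
import sqlite3

# In-memory database: nothing to install, and it disappears when the script ends.
conn = sqlite3.connect(":memory:")

# A made-up practice schema with a handful of rows.
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        country TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES users (user_id),
        amount   REAL NOT NULL
    );
    INSERT INTO users (user_id, country) VALUES
        (1, 'US'), (2, 'US'), (3, 'DE');
    INSERT INTO orders (order_id, user_id, amount) VALUES
        (1, 1, 20.0), (2, 1, 35.0), (3, 2, 10.0), (4, 3, 50.0);
""")

# A typical interview-style question: buyers and revenue per country.
REVENUE_BY_COUNTRY = """
    SELECT
        u.country,
        COUNT(DISTINCT o.user_id) AS buyers,
        SUM(o.amount) AS revenue
    FROM users AS u
    INNER JOIN orders AS o
        ON o.user_id = u.user_id
    GROUP BY u.country
    ORDER BY revenue DESC
"""

rows = conn.execute(REVENUE_BY_COUNTRY).fetchall()
for country, buyers, revenue in rows:
    print(country, buyers, revenue)
```

The same schema and query should carry over to PostgreSQL with little or no change, which makes it a handy warm-up before setting up a local server.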
Also, here is a roundup of my favorite Medium articles about SQL:
🍸 Drink and Mingle
Upcoming free events, meetups, talks, and webinars
Sep 17, Algorithmia: Eight must-haves for MLOps success
Sep 22, Lakehouse: The new approach to managing data
Sep 23, Introduction to Python
Sep 23, InfluxData: How to Use InfluxDB and Grafana
Sep 24, Wanted: Breaking Data
Sep 25, Harvard: Bayesian Statistics, ML and Our Future Democracies
Sep 26, OpenMined: Data Privacy Conference 2020
Sep 29, Lakehouse: Leveraging Delta Lake for High-Performance SQL and Analytics
Sep 30, Ray Summit: Scalable ML and Python
Thanks for reading, everyone ❤️. Until next Wednesday!