This is an intense and stressful week. While we are all busy watching election results, data analysis work never stops - in fact, it’s more important than ever. I’m here, as usual, on Wednesday with prepared stories and news for you from the Data Analysis Journal.
✨ In today’s newsletter:
Analytics at Netflix: team, projects, verticals, and mission.
Predicting election results - did it work?
Free courses, certifications, and mentorships to learn data science, ML, Python, and SQL.
📚 Weekend Longread
Read the recent post from Netflix Technology Blog about Analytics at Netflix: Who We Are and What We Do. It’s an interesting view of Analytics and Visualization Engineering at Netflix, how they structure the team, what their business verticals are, how they define analytics, and what their common projects and responsibilities are.
🔥 What’s new this week
Excel introduced new custom live data types. Microsoft brings in more than 100 new data types! This is currently available for Office beta testers in the Insiders program. Well, they didn’t have another choice after that incident when scientists had to rename human genes due to auto-formatting issues in Excel. True story.
Check out this video of the most popular databases from May 2006 up to today. As of October 2020, the most popular database is Oracle, followed by MySQL and then SQL Server. Who knew?
If you are looking for a rich textual dataset for NLP or other ML, check these 100,000+ books in plain text format (37GB). You are welcome.
Google Ads turned 20 last week. For 20 years now, marketers have used Google Ads to promote their businesses through campaigns and ads. Only one more year until Google Ads can buy a drink at an American bar!
🏆 Nailed It
Election prediction
Now that the US election results are known (almost!), it’s interesting to see which prediction techniques and methods were the most precise.
To start it off, here was the forecast from Nate Silver:
Data and code behind the articles and graphics at FiveThirtyEight
Another prediction, GOV-1347 from data scientist Yao Yu, also favored Biden with 301 electoral votes:
Joe Biden was also leading with 85.9% in the 2020 Battleground 270 model, which runs 25,000 daily simulations:
Here is the data from The Economist from their modeling, showing Joe Biden likely to beat Donald Trump in the electoral college:
And, JFK Forecasts also reported Biden winning over Trump:
Each of those predictions is based on (a) polls and (b) available public and social demographic data. Predicting this year’s election was even more challenging due to 16 states being in the “toss-up” category.
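For readers curious about what "running 25,000 simulations" can look like in practice, here is a minimal sketch of the general Monte Carlo idea. It is not the method used by any of the forecasters above. The state names, electoral vote counts, and win probabilities are invented for illustration, and the sketch assumes each state is decided independently, which real models generally do not.

```python
import random

# Hypothetical toy inputs: (electoral votes, probability candidate A wins the state).
# These numbers are invented for illustration and are not real polling data.
TOY_STATES = {
    "State A": (10, 0.90),
    "State B": (20, 0.55),
    "State C": (15, 0.40),
    "State D": (5, 0.10),
}


def simulate_once(states, rng):
    """Return candidate A's electoral votes in one simulated election."""
    total = 0
    for votes, p_win in states.values():
        # Assumption: each state is an independent coin flip weighted by p_win.
        if rng.random() < p_win:
            total += votes
    return total


def win_probability(states, n_sims=25_000, seed=0):
    """Estimate how often candidate A wins a strict majority of the votes."""
    rng = random.Random(seed)
    all_votes = sum(votes for votes, _ in states.values())
    needed = all_votes // 2 + 1
    wins = sum(simulate_once(states, rng) >= needed for _ in range(n_sims))
    return wins / n_sims


if __name__ == "__main__":
    print(f"Candidate A wins in {win_probability(TOY_STATES):.1%} of simulations")
```

In a real forecast, the per-state probabilities would come from polls and demographic data, and polling errors would be correlated across states rather than independent.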
What’s interesting here is the various ways that the aggregated data was represented through different data visualization methods: maps, bar charts, histograms, line plots. Any site or publication showing poll data at this point in the year wanted to be able to show information in a digestible and clear way, which is the foundation of data storytelling. A simple comparison can be shown in many different ways. And, I think Nate Silver’s approach to visualization nailed it.
Although no model can predict the future with 100% accuracy, this period of time shows how unique and varied data visualization can be for a broad audience.
🎓 Level Up
Certifications, internships, schools, and courses.
Programs and courses:
If you are looking for data science courses, you should definitely check out Kaggle programs. They offer Python, Intro to ML, Pandas, Data Visualization, Intro to SQL, Advanced SQL, NLP, Computer Vision, and many other courses. The programs are free and you can earn certificates.
Also, watch their YouTube channel - Getting to know the Kaggle Grandmasters. These are some very smart people who got top placements in Kaggle competitions.
If you live in Chicago and identify yourself as a woman, trans person, or non-binary person and want to become a data scientist or data engineer, join this 9-month program. It focuses on technologists in the first 1-5 years of their technical careers with experience and interest in data science and engineering. Applications are due November 15th, so hurry!
Mentorship:
Don’t miss an opportunity to sign up for a 1-1 GitHub mentoring session with GitHub employees. Applications are due on November 9th, so apply soon.
Check out WooTech, a mentorship platform for women in technology that guides them in their careers. It is open to everyone, including men, students, working professionals, or just anyone curious about technology.
🍸 Drink and Mingle
Upcoming free events, meetups, talks, and webinars.
Nov 7, Kaggle Days Meetups: Using TPUs for computer vision
Nov 10, InfluxDays: The impact of time-series data
Nov 10, Verta: How to make any Python-based ML model reproducible
Nov 11, WAIE: Women in AI Ethics™ Asia-Pacific Summit
Nov 12, DSS: Become a Data Science Superhero with Python
Nov 17, Bay Area Chromatin: ML In Epigenomics
Nov 18, Anaconda: Data Exploration, Visuals, and Dashboards Using APIs
Nov 19, WiMLDS: ML For Climate Change
Nov 19, WomenHack Virtual
Nov 20, SWD: Accelerating ETL for Recommender Systems
🙏 Do Some Good
It pays back.
I have a favor to ask. As my audience grows, I’d like to better understand how this journal can stay interesting and help you become a better data analyst or data scientist and progress in your career. I’d appreciate it if you could take a moment to fill out this brief survey. (And thank you to those who have already done it!)
⚙️ Try It Out
It just might solve all of your problems.
Check out Luckysheet, an online spreadsheet like Excel that is powerful, simple to configure, and completely open-source. This is life-saving for people who don’t like Google Sheets but can’t deal with Microsoft Excel either, and are looking for something fast, simple to use, and able to handle large data volumes.
Thanks for reading, everyone. Until next Wednesday!