Why Today Is The Perfect Time to Learn Data | Seattle Data Guy
Interviewing Seattle Data Guy - how to transition to data engineering, stay motivated, and break out of tutorial hell.
Welcome to my Data Analytics Journal, where I write about data science and analytics.
This month, paid subscribers learned about:
Applying Statistics In Product Analytics - A deep dive into distributions, their types, use cases, and examples to help you decide which distribution to apply for your analysis or forecast.
SQL For Weekday Product Usage Analysis - How to get the frequency of user engagement using SQL for day-of-the-week analysis to understand cyclicality and usage patterns.
When Things Go South - Or why self-service is a myth, why it may introduce discrepancies and inconsistencies in reporting, and how it can damage your data culture.
I have been following and reading Ben for a long time, even before my iCloud storage maxed out with screenshots taken of his LinkedIn posts. I have adopted many of Ben’s data handling and processing concepts from his publications, videos, and tutorials.
There are now many data engineering influencers and experts. Still, to me, Ben stands out from most as he brings theory into real-life scenarios with a detailed, unbiased, and thorough approach to problem-solving. I also enjoy how structured and easy-to-read his content is and how it’s tailored to all levels of analysts and engineers. His Substack is on my list of top newsletters about data and analytics.
While I disagree with Ben on a few concepts (come on, analytics is way more impactful, challenging, and interesting than data engineering. And it’s the best possible career growth!), I have long been a fan of his newsletter and YouTube channel (Seattle Data Guy).
This week Ben is celebrating his 100th newsletter 🎉🎉🎉, and I’m so excited to interview him for my journal! 🤩
Ben Rogojan is a data infrastructure and engineering consultant. As the Seattle Data Guy, he helps his clients set up and improve their end-to-end data processes and workflows. This could mean helping clients migrate to the cloud, improve old infrastructure, or create data sets for ML models. Before consulting full-time, he worked at Facebook, a healthcare analytics start-up, and a large healthcare provider. He also loves sharing his thoughts on his two newsletters (one on data infrastructure, the other on starting a data analytics consulting company) and YouTube.
For analysts transitioning into data engineering, what skills and qualifications do you think are a must today?
There are a few key skills I’d recommend an analyst pick up if they want to transition into data engineering.
To start out with, you’ll likely want to improve your SQL (or at least think about SQL from a more structural perspective vs. purely an ad-hoc one) and learn to program. Python is fine unless you plan to work somewhere that uses Scala or Java. From there, the key is understanding data warehouses and data pipelines. That’s usually enough to get you the job. After that, you’ll pick up so many other skills!
What’s your take on analytics engineering? Do you see it as a transitional role that is tied to specific tooling/strategy and isn’t easily plugged in at any company, or rather as potentially its own domain that will soon branch into further specializations?
I find that analytics engineering captured the fact that data engineers often fall on a spectrum in many companies and teams. On one side are those who are heavily technical, who spin up Docker containers and prefer Scala over Python. On the other side, some data engineers tend to focus more on data modeling, Python, SQL, and how to make the data easy for analysts to use. The latter is where I find analytics engineering has stemmed from.
In terms of its longevity, I think that function will always be required, agnostic of tooling or title.
You pivoted from a full-time data engineering role into a consultancy. I did the opposite - after 5 years in consulting, I transitioned back to full-time. I miss the opportunity to work with different tech stacks, but I found consulting rather challenging overall. Any regrets?
I don’t have any regrets.
Transitioning from a full-time data engineer to a consultant has opened up many opportunities for me, from learning opportunities (like getting better at selling projects and ideas) that have helped round out my skill set to getting involved with projects in various industries.
Overall, it’s been a fun journey with a whole new set of challenges.
What was the most challenging project you ever had to work on?
I think one of the more challenging projects, only in terms of logistics, was one where I had to work in a heavily regulated industry. Since I didn’t have the proper credentials to work in said industry, I had to essentially write code and Airflow DAGs on my computer and have someone else deploy them on the actual system they’d be tested in. It was like playing a game of operator sometimes, having to tell a person how to enter Vim, run a script, or start up a Docker container (they were a Windows person). It really was more of a test of patience for everyone involved.
Every year, resources for learning data become more accessible, more affordable, and better. Today, data enthusiasts can learn new, highly in-demand skills for free. This wasn’t the case for us years back when we entered the industry.
Do you feel the quality of new data people is changing as well?
It’s honestly amazing seeing all the new resources available to learn about data, software engineering, and just about every other technical skill set. I think the potential to accelerate your career is far better than it ever has been. Even about a decade ago, when I started in the data world, you’d have to put together your own courses on data engineering essentially by finding a Java intro from the NewBoston and a SQL course from WiseOwl while picking up a book here or there.
But now you can find fully written guides, and you have data engineering projects from great creators like Darshil, and BigQuery and Snowflake both offer free data sets (back when I started, you had to spin up a SQL Server or Postgres instance and find some CSV to load into it).
The only challenge now is keeping motivation. I have written about it in the past, the curse of tutorial hell:
“We have all fallen into tutorial hell. More than likely, for most of us, it all started with that first Python tutorial or advanced SQL course.
To some degree, it is part of the learning process; the constant repetition of the same basics over and over again until something sticks. But to really allow the new subject a permanent residence in our minds, we must find methods such as accountability groups and projects to ensure we don’t lose what we just learned.”
So I imagine at this stage, we probably have more people becoming analysts and data engineers because the barrier to accessing training, tutorials, and information has lowered. I would say I have noticed what feels like a broader set of individuals who write SQL and a little Python. I can’t truly speak on quality, as it’d purely be anecdotal.
What are common mistakes you notice analysts (or data engineers) often make?
Attacking the technical problem without understanding the business problem - One of the common issues I see with a lot of junior and senior engineers/analysts is jumping on the technical problem instead of the business problem. Now generally, if you’re at a senior level, you probably do this less often, but I do think it’s an easy trap to fall into. Perhaps you just learned a new skill or have some recency bias in terms of how to solve a problem. Instead of pausing and asking the business what they actually need, you may just jump into building, which is a gamble in terms of whether or not you will deliver what is expected.
Building systems that are hard to maintain - It’s really easy today, with all the various tools and technologies, to want to use everything even if it doesn’t actually solve the problem. It doesn’t help that job descriptions seem to push this narrative that you need to know 30 different technologies to be hired.
Not having some form of data analytics process - One thing I learned early on is that if you don’t apply some form of analytical process, you are likely to create an incoherent end product. Either the product doesn’t go deep enough, fails to answer the real question, or never reaches a conclusive end.
What would you do differently if you had to start your career over?
I think the one thing I might have done differently is to spend even more time on the fundamentals.
The tech world is one where you constantly have to keep up with changes, at least in terms of understanding how they impact your daily workflows. And honestly, I believe the firmer your foundation, the more easily you can pick up new technologies.
Spot on. While interviewing, I ask candidates their SQL level from 1 to 10, with 10 being the most advanced. Interestingly, 6-7 years ago, most candidates put themselves at 5. Now, many candidates rate themselves at around 8-9 and yet still fail to explain window functions or SQL cost optimization practices.
There seems to be a constant desire to rush through the learning process. There are plenty of 12-week bootcamps and accelerator courses to learn new technology. But our careers are often pretty long, and I don’t think there should be a constant rush to speed through learning.
If you can, take your time and enjoy learning. Because at a certain point, you’ll be expected to learn new technologies, manage projects, get buy-in for next year's budget, and grow a team. All of which will become challenging if you don’t already have a solid base in fundamental data, programming, and analytics concepts.
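For readers who have not met the window functions mentioned above, here is a minimal sketch of what one looks like. It is only an illustration, not something from the interview: the `daily_usage` table, its columns, and the sample rows are all hypothetical, and it assumes the SQLite bundled with your Python is version 3.25 or newer (the first release with window function support).

```python
import sqlite3

# Hypothetical sample data: (user_id, event_date, sessions)
ROWS = [
    ("a", "2023-05-01", 3),
    ("a", "2023-05-02", 1),
    ("a", "2023-05-03", 4),
    ("b", "2023-05-01", 2),
    ("b", "2023-05-02", 5),
]

# Two window functions over the same partition (one partition per user):
#   - a running total of sessions ordered by date
#   - a rank of each day by session count, busiest day first
# Unlike GROUP BY, every input row is kept in the output.
QUERY = """
SELECT
    user_id,
    event_date,
    sessions,
    SUM(sessions) OVER (
        PARTITION BY user_id
        ORDER BY event_date
    ) AS running_sessions,
    ROW_NUMBER() OVER (
        PARTITION BY user_id
        ORDER BY sessions DESC
    ) AS rank_by_sessions
FROM daily_usage
ORDER BY user_id, event_date
"""


def run_window_query(rows):
    """Load rows into an in-memory SQLite table and run the window query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE daily_usage "
            "(user_id TEXT, event_date TEXT, sessions INTEGER)"
        )
        conn.executemany("INSERT INTO daily_usage VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_window_query(ROWS):
        print(row)
```

The point to be able to explain in an interview is that the aggregate is computed per row over a "window" of related rows, so the running total and the rank sit next to the original columns instead of collapsing them.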
Is there anything else you want to share to encourage or inspire people to learn data?
We are entering an exciting time in the next 5-10 years. Many people ask me if I think data engineering will be around in 10 years, and that’s too far out to predict anything. I think there will always be a need to manage data, and the data we are being expected to collect and manage is growing in all senses of the 5Vs of big data. So even if we have LLMs and some generative AI, I just see them as tools (at least for the next few years) to assist us.
After that, who knows?
Thank you, Ben!
Check some of Ben’s most popular articles:
Is Everyone's Data Infrastructure A Mess? (the answer is yes, but no).
Make sure to subscribe to Ben’s newsletter and the Seattle Data Guy YouTube channel to learn more.
Thanks for reading, everyone. Until next Wednesday!