Expert Insight: Haki Benita - Choosing The Right Tool For The Job
An insightful look into skills, qualities, and values, favorite applications, and the future of structured databases
A few years ago I was researching SQL cost optimization tips and ran into an article about materialized CTEs - Be Careful With CTE in PostgreSQL, written by Haki Benita. I believe it was back in 2018. His simple explanations caught my attention, and since then I’ve been a fan of his writing, following, learning from, and bookmarking many of his articles. Over the years his blog has taught me best practices for treating data, pitfalls to avoid, and things to look for, and it shaped me into an analyst with a thorough sense of data foundation and due diligence.
After he published his recent article Practical SQL for Data Analysis I reached out to him and, hey, guess what? Today I’m excited to share my interview with him!
Haki Benita is a software developer and a technical lead. His expertise is in databases, web development, software design, and performance tuning. He writes about his unique perspective and opinions, offering solutions to real-world challenges. Aside from writing, he also stays busy with coaching, giving private talks, developing tutorials, running online training sessions for O’Reilly, and much more. His publications have been republished and featured on many sites including Hacker News, Real Python, PopSQL, and others.
Make sure to check his publications - https://hakibenita.com
If you’ve been reading this resource for some time, you might notice that I tend to put significant emphasis on data engineering in my newsletters, because I believe that “data quality is a priority”. It’s a pleasure to work with analysts who have a foundational understanding of their data sources and the ability not only to extract their data quickly, but to also lead and drive their data pipeline optimization, snapshot refresh, or data aggregation practice to ensure the data they use for analysis is the right quality.
In this interview, we speak about qualities “data people” should have, skills to develop, favorite applications, and the future of structured databases.
When you interview candidates for your team, what qualities and skills do you look for? And what would be a red flag or a stopper for you to proceed with a candidate?
I mostly hire junior developers. We have a very specific way of doing things, and I don’t expect someone that comes for an interview to have the same mindset. I look for basic intellectual capabilities to understand concepts, but more importantly, I want to find people who have critical thinking. This is very important. Also, I look for people who can learn by themselves.
I think with these two capabilities I can turn you into a decent developer.
There are a few red flags. In my team we talk about the “no problem” expression. When you take even a simple project and try to automate it, scale it, and prepare it for production - it’s never simple. There are a lot of aspects of development work that most people aren’t aware of. The code should not only work, it also has to be readable, testable, and maintainable. When I write code, I consider it an investment, because I know I can come back to it, read it, and rerun it. If a developer scopes the project and says “no problem”, it’s a red flag for me. You have to be aware of the different aspects of the challenge.
Vanity - people can’t know everything. A developer has to be humble; there is always something to learn. There is often a better way of doing something than the one you have in mind. You have to be open to opinions.
What are common mistakes you notice analysts (or data engineers) often make?
I have written a lot about common mistakes - in SQL and everywhere else. If I had to pick one thing, I’d say it is not choosing the right tool for the job. My last article reflects my view on tutorials and on analysts who stick with tools they already know, for example Julia or Pandas, to save the time and energy it would take to learn SQL. That said, doing the work in SQL would often be faster, more efficient, and less expensive. The same goes for using Excel. Remember the joke about Excel and 1.2M rows? You can load that much data, and analysts still get frustrated that they can’t load more than 1.2M rows and work with them.
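To make the “right tool” point concrete, here is a minimal sketch (not taken from the interview) that computes the same per-region totals twice: once by pulling every row into Python, and once by letting the database aggregate. The `sales` table, its columns, and the function names are hypothetical, and SQLite from the standard library stands in for a real database server.

```python
import sqlite3
from collections import defaultdict


def load_sales(conn: sqlite3.Connection) -> None:
    """Create a tiny, made-up sales table for the comparison."""
    conn.execute(
        "CREATE TABLE sales (region TEXT NOT NULL, amount INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO sales (region, amount) VALUES (?, ?)",
        [("north", 10), ("south", 5), ("north", 7), ("south", 1)],
    )
    conn.commit()


def totals_in_python(conn: sqlite3.Connection) -> dict:
    """Fetch every row and aggregate in application code."""
    totals = defaultdict(int)
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] += amount
    return dict(totals)


def totals_in_sql(conn: sqlite3.Connection) -> dict:
    """Let the database aggregate; only one row per region comes back."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    )
    return dict(rows)
```

Both functions return the same totals, but the SQL version transfers one row per group instead of the whole table, which is where the savings tend to come from as the data grows.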
If you had to start your career over, what would you do differently?
Very interesting question. Looking back, I don’t think I would change anything. During the rough part of my career, if you had told me I would benefit from that bad experience, I wouldn’t have believed it. But in retrospect, even in places where I was not happy, and in things I did that I wasn’t particularly interested in, I see the benefit.
All these places gave me a broader view of the industry and made me more aware of different types of challenges and of the different people I work with. Being a manager for developers who are not data-focused made me more aware of all the challenges. And I think I can now provide them with better solutions, as someone who has worked with data and can bridge many gaps by offering solutions and knowledge.
I started rough, and it was a good place to start.
Do you have a favorite tool, application, or software you use a lot?
My philosophy is that I try not to be dependent on 3rd party tools. I’m careful with the tools I decide to use. For example, in Python, I rarely install 3rd party libraries. I usually try to use built-ins as much as possible. If I need, for example, simple caching, I’d do it in the database and would not install an instance of Redis. I think dependencies are expensive, and you must have a very good reason to introduce dependencies into your infrastructure or workflow.
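As an illustration of caching in a database rather than adding a separate cache service, here is a minimal sketch using only Python built-ins. It is an assumption about what such a cache could look like, not Haki’s implementation: the `cache` table, the function names, and the `now` parameter (included to make expiry easy to test) are all hypothetical, and SQLite stands in for whatever database the application already uses.

```python
import sqlite3
import time


def make_cache(conn: sqlite3.Connection) -> None:
    """Create the cache table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        " key TEXT PRIMARY KEY,"
        " value TEXT NOT NULL,"
        " expires_at REAL NOT NULL)"
    )
    conn.commit()


def cache_set(conn, key, value, ttl_seconds, now=None):
    """Store a value under key, replacing any previous entry."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO cache (key, value, expires_at)"
        " VALUES (?, ?, ?)",
        (key, value, now + ttl_seconds),
    )
    conn.commit()


def cache_get(conn, key, now=None):
    """Return the cached value, or None if it is missing or expired."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT value FROM cache WHERE key = ? AND expires_at > ?",
        (key, now),
    ).fetchone()
    return row[0] if row else None
```

Expired rows are simply ignored on read here; a real setup would also need some periodic cleanup, which this sketch leaves out.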
I use Postgres a lot, I like working in the CLI, and I write a lot of SQL.
Back when I was using Oracle, I liked SQL*Plus, and I miss a few of its nice features. Like the BREAK command, which lets you clean up grouped output and prepare a nice report. Ha, maybe I’ll write soon about the things I miss from Oracle. I also miss optimizer hints, and I am not a fan of the information schema in Postgres. I can never find anything and need to go to the documentation. Oracle has dictionary tables with a naming convention that is easy to use. It’s not as easy to find things in Postgres.
... Oracle still has many fans surprisingly.
I keep hearing about many large companies and corporations moving off Oracle into hosted environments in the cloud. It’s an interesting shift: big organizations are moving from Oracle, with all its expensive licenses, to PostgreSQL. I believe PostgreSQL will get a lot of traction from corporate clients in the future. Corporate clients and applications need a different set of features.
Can you share the values you foster within your team?
I foster critical thinking. This is one thing I advise people to have.
It’s annoying from a manager’s perspective, because once you foster it, your team starts to question everything you do. While it is annoying, it makes people better developers. Questioning things leads to a better understanding of how things work and what they are trying to achieve. And questions lead to answers. Questions help to catch issues early in the process.
Anything else you want to share to encourage or inspire people to learn data?
You should learn SQL. It is usually the number one thing asked about in interviews, it is a must-learn language, and it is the skill that lasts the longest. Every job that touches data requires SQL, regardless of whether you are a developer, analyst, admin, data engineer, or ETL developer. Whatever you do, if you have to touch data or make data-driven decisions, you need SQL. If you have to choose one skill to take with you - invest in SQL.
... you don’t believe in the unstructured database shift that’s become very popular now?
I do have use cases where I use unstructured data, and that’s completely fine. But at the end of the day, a database is all about structure. You want constraints and the relation between tables, otherwise, you can’t trust the data.
One of the reasons Mongo didn’t survive the “test of time” is that Mongo didn’t force any structure on you. The database is usually the “last level of defense”. You can do whatever you want at the application level, but when you store the data and want a source of truth for it, it has to be reliable. Will databases change in the future? No doubt. But do I think databases are going to enforce less structure? No.
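The idea of the database as a last line of defense can be sketched with a couple of constraints. This is an illustrative example under stated assumptions, not something from the interview: the `customers` and `orders` tables and the helper names are hypothetical, and SQLite (from the Python standard library) stands in for Postgres. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Open an in-memory database with two constrained tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless asked to.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customers (
            id INTEGER PRIMARY KEY,
            email TEXT NOT NULL UNIQUE
        );
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers (id),
            amount INTEGER NOT NULL CHECK (amount > 0)
        );
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Run an insert; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this schema, an order that points at a customer who does not exist, an order with a non-positive amount, or a second customer with the same email is rejected by the database itself, no matter what the application code forgot to check.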
Here are some of my favorite articles from Haki Benita:
Thanks for reading, everyone. Until next Wednesday!