You started a new project and are excited to put your new PyTorch skills to use, or perhaps you want to build the most advanced AI system that solves every problem your customer has.
That’s the feeling we all get when learning new things.
The fact is that when you onboard to a new project, you get handed API keys for Google Cloud and AWS and the name of the bucket where a bunch of PDFs are sitting, and you are asked to analyze the data as the “ML Engineer”. But to analyze the data, you first need to move it out of those buckets, clean it, organize it, and set up a system that updates itself.
This is an everyday thing for everyone who has worked in the data industry. In fact, most Data Engineers I know started as ML Engineers or Data Scientists. The reality is that working with curated data, the way you do in ML projects, is not how real work is done. It is only a small percentage of the work you will do, especially in the early stages of your career.
Even if you join a company that separates these responsibilities, the overlap between the ML Engineer and Data Engineer roles is still important.
This 2015 diagram outlines what a machine learning system needs in order to function. Many of those needs call for Data Engineering skills, as they involve managing data and joining pieces together.
Yeah, the code was never the problem.
As you grow in skills and seniority, you may step back from being a full-stack developer, but communication across teams will be demanded of you even more. You will need to identify blockers, gather requirements, and understand the different aspects of the data lifecycle, which means your data engineering skills become even more important.
Understanding the entire data cycle helps you grow. Data Engineering is one of those skills you shouldn’t miss out on learning if you want to build a long and successful career.

