5 Ways To Evolve Your Career in Data Science
When I first learned about data science in 2016, I was fresh out of college with a background in statistics and a basic understanding of computer programming. I was lucky enough to have the opportunity to jump into a career path that required no resources other than a computer connected to the internet. Learning how to explore, visualize, manage, and make predictions from data was not what I struggled with most; it was the overwhelming feeling of not knowing which strategies to use. Self-driving cars, crypto, recommendation systems, deep learning, and other data-driven solutions have all proven useful in one way or another, but when do you jump into those technologies? Are you the only person who doesn’t understand how these things work under the hood?
When I had opportunities to meet with data scientists at other companies, the questions I was most interested in were pre-processing questions like “How do you handle unknown values when making model predictions?” and “What is the best model to use?” Frustrated by the repeated answer, “Well, it depends,” I assumed they either didn’t know what they were talking about or weren’t actually interested in my question. Little did I know those weren’t the questions I should have been concerned with, because their answers wouldn’t solve the problems I was working on.
Looking back at my career in data, I’ve worked across many domains, each using data to solve problems in its own way. If you don’t know where to start and feel you must jump straight to complex models or uncover the secret sauce behind a great machine learning algorithm, here are my suggestions for establishing a strong career in the data sector.
1. Get Comfortable Handling Data
I used to believe all data came in tabular form, neatly packed into spreadsheets. I didn’t even know NULL values existed. Whether the source is a database, flat files, or images, and whether the data is structured or unstructured, you need enough familiarity with it to produce results at a high level. Data has a flow to it, and not all datasets are alike. Over time, you’ll notice that data from each source comes with its own assumptions. Customer survey data and sports data can end up with similar problems, even though the data itself has little in common. It’s like walking into an Italian restaurant versus a Mexican one: you know roughly what will be on each menu. Both are delicious, each in its own way.
This matters because datasets are too often handed to learners in a clean, structured format, which is rarely the case in professional settings. Most of the time, data arrives in the form of logs, and those logs need to be parsed before they can be analyzed properly. The better you are at working through these kinds of problems, the faster you’ll get to work with the data and derive insights.
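To make the parsing step concrete, here is a minimal Python sketch that turns raw log lines into flat records. The log format (`date time LEVEL key=value ...`), the field names `user`, `action`, and `duration_ms`, and the `parse_log_line` function are all hypothetical assumptions for illustration; real logs will need their own pattern. Fields missing from a line become `None`, which is exactly how NULL values tend to sneak into a dataset.

```python
import re
from typing import Optional

# Hypothetical log format, assumed for illustration only:
#   "<date> <time> <LEVEL> key=value key=value ..."
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<rest>.*)$"
)
EXPECTED_FIELDS = ("user", "action", "duration_ms")  # hypothetical field names


def parse_log_line(line: str) -> Optional[dict]:
    """Turn one raw log line into a flat record, or None if it doesn't match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None  # malformed line: skip it rather than crash the whole job
    record = {
        "timestamp": f"{match['date']} {match['time']}",
        "level": match["level"],
    }
    # Split the trailing key=value tokens into a dict.
    pairs = dict(
        token.split("=", 1) for token in match["rest"].split() if "=" in token
    )
    # Fields absent from this line become None, the Python analogue of NULL.
    for field in EXPECTED_FIELDS:
        record[field] = pairs.get(field)
    # Convert numeric fields only when they are present.
    if record["duration_ms"] is not None:
        record["duration_ms"] = int(record["duration_ms"])
    return record


raw_logs = [
    "2023-05-01 12:00:03 INFO user=42 action=login duration_ms=130",
    "2023-05-01 12:00:07 WARN user=17 action=checkout",
    "corrupted line ###",
]
records = [r for r in (parse_log_line(line) for line in raw_logs) if r is not None]
for r in records:
    print(r)
```

Skipping malformed lines instead of raising keeps a long-running parse job alive; in practice you would probably also count or log the skipped lines so lost data doesn’t go unnoticed.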
2. Try To Run Into Coding Errors
I am convinced the best programmers in the world are the ones who have seen the most errors. The only way to get better at something is through practice, and practice means you will not get everything right the first time. Those senior-level programmers who seem to crank out great code all the time didn’t start out that way. They aren’t naturally talented code writers; they have simply written so much code that they know what will throw errors and what won’t. When they see an error in the terminal, they understand what they did wrong not because they memorized the entire software, but because they’ve run into the same problem before and know how to get around it.
This isn’t limited to the errors you see in the console, either; it also includes architectural errors. A data ingestion process that takes far too long because its database queries aren’t optimized is a great example. The interpreter may not raise a single error, but the process still needs to be optimized.
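As one illustration of an architectural problem that raises no errors, the sketch below uses Python’s built-in `sqlite3` module to compare committing after every inserted row with inserting all rows through `executemany` in a single transaction. The `events` table, its columns, the sample rows, and the helper functions are hypothetical. Both versions load the same data; on most disks the batched version is noticeably faster, largely because it avoids a separate commit per row, but exact timings depend on your hardware and SQLite settings.

```python
import os
import sqlite3
import tempfile
import time

# Hypothetical rows to ingest: (id, username, value).
ROWS = [(i, f"user_{i % 100}", i * 0.5) for i in range(500)]
INSERT_SQL = "INSERT INTO events (id, username, value) VALUES (?, ?, ?)"


def ingest_row_by_row(conn: sqlite3.Connection, rows) -> None:
    """Slow pattern: one INSERT followed by one COMMIT per row."""
    for row in rows:
        conn.execute(INSERT_SQL, row)
        conn.commit()  # each commit is its own transaction and disk write


def ingest_batched(conn: sqlite3.Connection, rows) -> None:
    """Faster pattern: all rows in one executemany call and one transaction."""
    with conn:  # commits once on success, rolls back on an exception
        conn.executemany(INSERT_SQL, rows)


def timed_ingest(ingest_fn, rows) -> tuple[float, int]:
    """Run ingest_fn against a fresh on-disk database; return (seconds, row count)."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "events.db"))
        conn.execute("CREATE TABLE events (id INTEGER, username TEXT, value REAL)")
        conn.commit()
        start = time.perf_counter()
        ingest_fn(conn, rows)
        elapsed = time.perf_counter() - start
        count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
        conn.close()  # close before the temporary directory is deleted
    return elapsed, count


slow_seconds, slow_count = timed_ingest(ingest_row_by_row, ROWS)
fast_seconds, fast_count = timed_ingest(ingest_batched, ROWS)
print(f"row-by-row: {slow_count} rows in {slow_seconds:.3f}s")
print(f"batched:    {fast_count} rows in {fast_seconds:.3f}s")
```

Neither version is “wrong” in the sense of producing an error, which is the point: the only symptom is time, so it helps to measure before and after a change like this.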
Debugging is a natural part of any programming role, and learning to respond comfortably when things don’t go the way you intended is a skill in itself. These problem-solving skills will carry over into your everyday life as well.
Evaluating when something is not working, figuring out what is going wrong, and fixing it is something all the best programmers do. Problems you’ve never seen or heard of will come up. The better you are at recognizing what to do next, the faster your turnaround and the more secure your code.
3. Fundamentals First
This is one of my favorite memes ever.
I promise, you do not need to understand how ChatGPT works right off the bat. Don’t understand how embeddings work in a multidimensional vector space? Not a problem. The people building these advanced models went through the same machine learning progression you have to. The best engineers in the world are able to create these new models because they have built standard machine learning models in the past and, even more importantly, know the pros and cons of each one.
When you start off with simpler models and work your way up, the more complex models will make far more sense. It’s fun to grab a machine learning model and figure out what it does, but that alone won’t show you how to use it in an application. Instead, start by solving the problem with fundamental math and statistics, then explore how an ML model might speed up the process or find a more accurate solution.
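Here is a small, hedged example of what solving a problem with fundamentals first might look like. Using made-up data (the `x` and `y` values are invented for illustration), it compares a baseline that always predicts the mean with an ordinary least-squares line computed directly from its textbook formulas, slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. No ML library is involved; the idea is that you understand every number before reaching for a more complex model.

```python
from statistics import mean

# Made-up example data (hypothetical): hours studied (x) vs. exam score (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [52.0, 55.0, 61.0, 64.0, 70.0]


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))


# Baseline 1: always predict the mean of y. Any useful model should beat this.
y_bar = mean(y)
baseline_preds = [y_bar] * len(y)

# Baseline 2: ordinary least-squares line from its closed-form formulas.
x_bar = mean(x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar
ols_preds = [intercept + slope * xi for xi in x]

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"mean-baseline MSE: {mse(y, baseline_preds):.2f}")
print(f"least-squares MSE: {mse(y, ols_preds):.2f}")
```

If a fancier model can’t clearly beat simple baselines like these on held-out data, that is a strong hint the extra complexity isn’t earning its keep.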
4. Look For Marketable Skills
A skill is valuable when other companies demand it.
If you want to be hired in the data field, you need the skill set employers are looking for. Do you need to know Spark? Hadoop? Database management? Visualization? NLP? There are tons of areas of expertise and no real guidance on what is necessary to become a great data scientist.
The old saying “Dress for the job you want, not for the job you have” can be tweaked a little. For those in the data science field, it should be “Search for the jobs you want, not the job you have.” Look up openings at the companies you want to work for, or for the job titles you would love to hold, and see what they require. If a programming language consistently shows up for these roles, you should probably grab a textbook and start learning it. If a software application is listed, try out a free trial.
Climbing the ladder to where you want to be isn’t always about overall experience; it’s about having experience in the right areas.
5. Participate Outside Of Your Domain
Too often we want to stay within what we know and understand rather than venture into the unknown. Think about building an application: there are the data engineering, data management, and data science aspects to it. For a lot of people, staying in their own corner of the app and not digging into the other parts makes the most sense. It seems natural to just request data from other team members and then pass along the results.
If you really want to keep evolving in the data sector, you have to understand, from A to Z, how data is handled, transformed, manipulated, and distributed. Now, that doesn’t mean that as a data engineer you need to know how to build ML models, or that as a data scientist you need to build the servers that host your applications. The idea is to understand how all the puzzle pieces fit together. The more you understand the entire data ecosystem, the better you will be at anticipating problems, and the more likely you will be to participate in multiple areas of the data lifecycle, exposing you to new challenges and strengthening your existing data skill set.
Data science is like any other tech sector: you have to keep learning as time goes on or you will get left behind. Many people believe that knowing as many algorithms as possible is the key to being a great data scientist, but over the course of your career you’ll realize the people you keep turning to for help are the ones who are not one-trick ponies. The best in the field have explored different areas of software engineering and figured out what makes a great application from top to bottom.