Today, data is everywhere and more accessible than ever. But data can be overwhelming, and it only becomes useful when it is analysed properly to yield real insights. That’s where a Data Scientist comes in.
To understand more about what a Data Scientist does, we interviewed our Lead Data Scientist at Codec, Ekaterina Volkova Volkmar - or, as we know her, Katya.
TB: How long have you been working as a Data Scientist?
KVV: I have been working as a Data Scientist for three years, since I finished my PhD at the end of 2014. Although my role wasn’t exactly the same in previous jobs, the responsibilities were very similar. During my PhD I essentially studied Data Science too - that’s how I learned most of the skills needed for the role.
TB: What are the main responsibilities involved in your role?
KVV: One of the main responsibilities is coordinating between the need for new research, the value it can add to the product, and how to implement it. When new ideas are being discussed, it is important to build a plan with hypotheses, timelines, and alternatives, and to make sure everyone involved knows what their respective tasks are. Keeping in close contact with other teams is crucial, as it is easy to dive into theoretical solutions and miss the big picture, including the goal in sight - the product. I end up writing code every day too, which I love!
TB: What is a typical day like as a Data Scientist? Run us through what you do at Codec.
KVV: Every day usually has several aspects to it. There are workshops and brainstorms with just a few team members or most of the Codec team, depending on the problem, and regular meetings with the Data and Development team where we update each other on the progress of our current projects. But the largest part of the day is still research and prototyping - it can be about auditing a data set for its quality and other properties, or training a model, or documenting a piece of code to hand it over to our development team.
TB: In an average work week, what problems typically arise? How do you go about solving these problems - is there a certain protocol or plan of action in place amongst your team?
KVV: When we don’t keep our terminology extremely clear, miscommunication problems can arise between the research and the development teams. Small technical problems can also stall projects. Cases when you realise that a certain dataset might take days or even weeks to collect and process are especially painful. The key to preventing these problems, and to solving them quickly when they arise, is, firstly, clear documentation of everything that is going on - write up notes and plans after workshops and share them with everyone involved right away, before important details have slipped your memory. Avoid vague terminology: clear names help team members make clear connections between concepts and prevent miscommunication errors. Secondly, when you hit a stumbling block, don’t isolate yourself and try to find a solution on your own. Ask for advice and help - this will save everyone’s time and keep progress speedy.
TB: What is the most interesting project you have worked on?
KVV: It is really hard to choose - there have been so many interesting ones. I think all the projects that I have enjoyed most have a unifying feature - understanding human behaviour through the data trail people leave. It can be public Twitter posts or responses to an experiment designed to test a hypothesis - it is always fascinating to see how people respond to changes in the environment, but at the same time change the environment itself with their activity.
TB: Data Science seems complex - what would you want to tell someone who doesn’t know anything about being a Data Scientist?
KVV: It is immense fun! You end up seeing the world differently through the incoming data streams that you receive from everywhere. Poetry aside, at its core, the role of a Data Scientist is about collecting and analysing datasets to solve specific problems. The challenge is that many of these problems are very novel, the questions you are trying to answer are open-ended, and there is not as much reference material as in other fields. That said, Data Science is developing very fast, and many resources you can easily find today would not have been there two years ago.
TB: What is the most challenging part of your role?
KVV: I would say the most challenging part is knowing when to stop chasing a solution that is not going to work, to step back and count your losses, stop wasting time and start afresh.
TB: Do you have any opinions on the future of the Data Science industry?
KVV: I think the field will diversify. It has already started to do so, but not officially yet. These days you can be a Data Scientist and do solid statistics all day, or mostly machine learning in computer vision, or tackle search optimisation problems - regardless, your role will still say “Data Scientist”, even though the skillsets needed for these three activities are very different and it’s very rare to see one person capable of doing all three. I think the role titles will become more specific. I also hope for the democratisation of the field, with more access to online learning resources, but also access to computing power. For now, the easiest entry into Data Science is still through graduate programs in Computer Science and related fields. In a couple of years, I hope to find a truly global community.