Cytel data scientists apply advanced statistical techniques including predictive modelling of biological processes and drug interactions to unlock the potential of big data.
In this blog from our Career Perspectives series, we talk to Andrea Hita, a Data Scientist at Cytel, to find out more about her career path, her current role at Cytel and her interests outside of work.
Have you always been passionate about data science and did you know early on this would be your career choice?
I have always been very analytical, but I did not discover my passion for data science until my final year at university. While studying biomedical engineering, I enjoyed formulating mathematical models that explain biomedical systems and human physiology. During the last year of my undergraduate degree, and later in my Master's, through different collaborations on signal and image processing projects, I discovered the applied side of modelling and the sophisticated computational methods that deal with complex datasets. I became very enthusiastic about how data science algorithms can be used for prediction in diagnosis and treatment guidance in life sciences.
What is your current role?
I use statistical and machine learning algorithms to extract relevant knowledge from data that can be used in clinical trial applications. Essentially, our work is oriented towards making data useful in various ways. For instance, we use computational learning algorithms to build medical devices that automatically process and classify biomedical signals and images to predict a disease condition. We also work on biomarker discovery projects. In these kinds of projects, we work with genetic and molecular datasets and the question is: are we able to find the right markers defining biological signatures that predict the response of a subject to a given treatment?
How has your prior experience prepared you for your current role?
My main background in biomedical engineering has been fundamental. I am familiar with most biomedical data measuring devices, and my understanding of human systems physiology is key when analyzing a dataset. I can combine my knowledge of both the data source (in this case, a living human body system) and the sensors used to measure it to characterize a priori what the signal component of interest will be and what noise we are going to find when analyzing the data. I have also worked on developing software for automatic medical image processing. This helped me gain experience in pattern recognition techniques, programming, and computer science.
"I love the creativity that data science projects require."
What tools or devices help you succeed in your role as a data scientist?
For most projects, I develop customized code to analyze the data according to its purpose. I mainly develop algorithms for data analysis in R, which is platform-independent and very flexible. This allows me to create interactive visualizations of data surprisingly quickly. Moreover, when it comes to machine learning and statistical analysis, any new research in the field probably has an accompanying R package that allows you to implement the newest cutting-edge methods. As a data scientist, it is also very helpful to work in a multidisciplinary team, like the one I work with now, made up of professionals from very different backgrounds. The team consists of computer scientists, statisticians, and biomedical engineers. This helps us to develop a data science project successfully from all these perspectives.
How do you typically work with big data sets?
It is difficult to generalize, as of course different methods are applied to different data types. In general, a good first step is to make use of visualization tools representing different transformations of the datasets, often reducing their dimensionality. The goal is to identify the features that explain most of the variability in order to design the first modelling approach.
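As a rough illustration of that first step, the sketch below applies principal component analysis (PCA), one common dimensionality-reduction technique, to a small synthetic dataset. It is written in Python with only the standard library rather than in R, and it is not taken from the interview: the data, the feature names and the helper functions are all hypothetical, and PCA is simply assumed here as a stand-in for whichever transformation suits a given dataset.

```python
import random

def center(rows):
    """Subtract the column mean from every value."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_component(cov, iterations=500):
    """Approximate the first principal component by power iteration."""
    d = len(cov)
    v = [1.0 / d ** 0.5] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient: the variance captured along v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue

# Synthetic toy data (an assumption for illustration only): two features
# driven by one hidden factor, plus a third feature that is pure noise.
rng = random.Random(0)
feature_names = ["feature_a", "feature_b", "feature_c"]
rows = []
for _ in range(200):
    latent = rng.gauss(0, 1)
    rows.append([2.0 * latent + rng.gauss(0, 0.1),
                 -1.5 * latent + rng.gauss(0, 0.1),
                 rng.gauss(0, 0.1)])

cov = covariance(center(rows))
loadings, eigenvalue = leading_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
ratio = eigenvalue / total_variance

print(f"Share of variance on the first component: {ratio:.3f}")
for name, weight in zip(feature_names, loadings):
    print(f"{name}: loading {weight:+.3f}")
```

Features with the largest absolute loadings contribute most to the dominant direction of variability, which makes them natural candidates to inspect first when choosing an initial model. On real data, a library implementation would normally be preferable to this hand-written power iteration.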
What most inspires you about working in this field?
I love the creativity that data science projects require. Identifying where the information lies inside a noisy data set is very challenging and fascinating. Data sets have lots of different shapes and structures, and you have to be creative to pick the right methodologies in order to identify the valuable information that can be used to predict an outcome of interest.
Another inspiration is thinking about the largest unknowns in today's human biology and how, from my point of view, data science methods in life sciences are going to be crucial to address some of these big questions, e.g. understanding the brain code that is able to memorize and learn in such a sophisticated way, or the genome code that, with just four letters, has the potential to codify the organization of the whole process from a one-cell zygote to an embryo and later to an adult being.
What would be your three top tips for early-career data scientists looking to develop in this field?
- Ensure your programming skills are fluent; this is a key requirement when developing algorithms.
- It is important to identify the field you want to specialize in and then develop your skills and knowledge to become an expert in that data type. For instance, in life sciences we work with genomics, images or signals, and the methodologies are specific to each area.
- A versatile data scientist possesses experience and knowledge with many machine learning algorithms; the more tools you know, the more versatile you become! This is a fast-moving field and it is paramount that you dedicate time to your own learning and development.
What are your personal values?
I believe it is important to listen. Having the ability to listen goes hand in hand with having an openness to different people and new ideas. This is fundamental if people are going to understand and learn from each other, not only in the workplace but also in our personal lives. Having this attitude provides space and perspective, allowing us to solve challenges.
What are your main interests outside of work?
I have a real passion for hiking and I am a volunteer with a children's hiking group in Barcelona; many of my weekends are spent trekking across the Pyrenees. Music is also important to me and you can often find me playing my favourite songs on my guitar and singing with friends and family.
I am fortunate to be in close proximity to the Mediterranean and have grown up sailing – when the weather is good, I still try to get out to sea for a few hours.
Thank you for taking the time to talk to us and sharing your journey.
Cytel's data scientists, statisticians, programmers and data managers are active and well regarded in industry associations and communities around the world. Would you like to join our talented team? To find out more about rewarding careers with us, click below.