Data Science

What is a Data Scientist?

Some call it magic, some call it knowledge. I call it work

One of the questions people ask me most often is "You're a mathematician, so you can only work as a teacher, right?"

The answer is obviously no! Mathematicians stand out for a range of useful skills such as logical reasoning, hacking skills, and so on. In fact, my answer to that question is that I work as a Data Scientist at an online business company. That answer is then followed by a contortion of their faces that always ends in a single:

"WHAT??"

So, to let people know more about this job, which is growing all around the world alongside online marketing, let's see what a Data Scientist is. It is not easy to define, because the role covers many different aspects.

Data Science

The definition

Quoting Wikipedia:

Data science is, in general terms, the extraction of knowledge from data. The key word in this job title is "science," with the main goals being to extract meaning from data and to produce data products. It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, and information technology, including signal processing, probability models, machine learning, statistical learning, computer programming, data engineering, pattern recognition and learning, visualization, uncertainty modeling, data warehousing, and high performance computing. The discipline is not restricted only to so-called big data, although an important aspect of data science is its ability to easily cope with large amounts of data. The development of machine learning, a branch of artificial intelligence used to uncover patterns in data from which practical and usable predictive models can be developed, has enhanced the growth and importance of data science.

The job of Data Scientists

Data Scientists work on complex problems and try to solve them by drawing on their experience in a specific scientific discipline. Generally speaking, Data Scientists must work with elements of mathematics, statistics, online marketing, psychology and computer science, although they are not required to have long experience in all of them at once. In practice, a Data Scientist is likely to become an expert in only one or two of those disciplines and only moderately competent in two or three more. Data Science is therefore best practised by a team whose members share their knowledge and complement each other.

Data Scientists use their skills to find, analyze and understand data sources of significant value; to manage huge amounts of data despite hardware, software and connection limitations; to combine data from different sources; to guarantee the coherence of the data; to create visualizations that share their discoveries and help others (usually non Data Scientists) understand their findings; to build mathematical models that fit the data; and to present and share the data with the other specialists and scientists on the team and, in the last instance, with a non-expert audience.

Data science techniques affect research in many domains, including the biological sciences, medical informatics, health care, social sciences and the humanities. It heavily influences economics, business and finance. From the business perspective, data science is an integral part of competitive intelligence, a newly emerging field that encompasses a number of activities, such as data mining and data analysis.

Data Scientists in social gaming

I have specialized in the Data Science of social and cross-platform multiplayer casino games, one of the richest and fastest-growing markets, which moves large amounts of money every year. Companies like King.com, Supercell and Rovio were born with a handful of employees and, through the launch of hits like Candy Crush Saga, Clash of Clans and the Angry Birds saga, have earned a reputation and status in the world of social games. Behind all that game development there is always a small team of Data Scientists who spend the day looking at data from those games, trying to give it a coherent meaning, and then suggesting how to improve the product so that it keeps going, increases its monetisation and grows exponentially (King started with 3 employees and, a few years after its first success, has more than 1,000 across offices in more than 5 countries).

All for one

A good way to describe the facets of my work is the following Venn diagram (I am a mathematician, remember?):

Venn diagram describing the concept of Data Science

A Data Scientist is a combination of three specific skills:

Programming knowledge

The data that a Data Scientist must analyze is usually stored in huge databases. The volume grows so large that it is almost never possible to use all of the data in a study. This is where the Data Scientist's ability to write a good algorithm comes in: one that returns a result that can actually be analyzed. This part of the job, often also called Data Mining, is frequently the most time-consuming work he must do. It is very useful to know the data structure behind the product being analyzed, in order to save time among the thousands of tables that the databases he uses normally contain.
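As a minimal sketch of that idea, the snippet below lets the database do the aggregation instead of pulling every row out for analysis. The `events` table, its columns and the `purchases_per_user` helper are assumptions made up for illustration, and an in-memory SQLite database stands in for whatever production store is really used.

```python
import sqlite3


def purchases_per_user(conn):
    """Aggregate inside the database and return
    (user_id, n_purchases, total_spent) rows, one per paying user."""
    query = """
        SELECT user_id, COUNT(*) AS n_purchases, SUM(amount) AS total_spent
        FROM events
        WHERE event_type = 'purchase'
        GROUP BY user_id
        ORDER BY user_id
    """
    return conn.execute(query).fetchall()


# Hypothetical in-memory table standing in for a production event log.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id INTEGER, event_type TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "login", 0.0),
        (1, "purchase", 5.0),
        (2, "purchase", 2.5),
        (1, "purchase", 1.5),
        (3, "login", 0.0),
    ],
)

rows = purchases_per_user(conn)
for user_id, n_purchases, total_spent in rows:
    print(user_id, n_purchases, total_spent)
```

Only the small per-user summary leaves the database, which is the kind of result that can then go on to statistical analysis.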

Thus, in this first phase, we define what is worth analyzing and extract, one way or another, a set of data that will subsequently undergo a phase of statistical analysis. Let's get into it.

Mathematical knowledge

A Data Scientist should know mathematics and statistical models above all else. Many of the problems he works on throughout his career involve understanding the behavior of a population and trying to model it with a probability distribution function, in order to extract common features that can then be used to improve the product being tested. Day by day, he has to deal with hypothesis testing, systems of equations, integrals, cohort analysis and more, so a mathematical background is always essential.

In this intermediate stage, the data extracted earlier is weighted, aggregated, averaged, submitted to hypothesis testing, and so on. Often the data collected by the programmer side of the Data Scientist is not enough, so he goes back, throws that data away and starts all over again with new data to answer the questions that arose during the analysis phase.
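To make "submitted to hypothesis testing" a bit more concrete, here is a rough sketch of one common test, a two-sided two-proportion z-test on conversion rates. The function name and the sample counts are hypothetical, and the normal approximation it relies on assumes reasonably large groups; this is one possible test, not the only one a Data Scientist would reach for.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    conv_a / n_a and conv_b / n_b are conversions and group sizes.
    Returns (z, p_value) using the pooled standard error.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)),
    # which equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 100 of 1000 users convert in group A, 130 of 1000 in B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference between the two groups is unlikely to be pure chance; as always in statistics, it never makes the conclusion 100% certain.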

Experience

It may seem silly, but the experience of a Data Scientist is often a highly valued asset. The problems met in past analyses, and the conclusions drawn from them, are useful when applied to new analyses. A Data Scientist is refined analysis after analysis: his conclusions become more useful, more precise and more logical; his reasoning becomes better substantiated; and his perspective when facing a new analysis widens to cover cases that at the beginning would have gone completely unnoticed. Without any doubt, I can say that a good Data Scientist is like wine: he improves over the years.

So what can you expect from a Data Scientist?

Like everyone else, a Data Scientist has his limitations, all the more so when you consider that his foundation is statistics, where nothing is 100% certain. Do not ask him for the next winning lottery number, or for the numbers a bingo ticket needs to win a prize, because he can't tell you. In one of his usual analyses, a Data Scientist may draw inferences about questions such as:

  • Which gender and age range converts best on a website? With that feedback, the marketing team knows which kind of users to focus on in new campaigns.
  • Where do users get stuck in the registration and purchase process? This can be addressed by changing the design and the onboarding process to reduce the number of lost customers.
  • Which button works better? Is the best place for the purchase button at the top or at the bottom, and should it be brightly colored or muted? These decisions may seem trivial, but in many cases they are critical.
  • What is the best channel to re-engage users? Email, notifications, letters, newsletters, ...
  • What is the best time of day for promotions?
  • How should the difficulty of a video game level be increased in order to balance frustration against engagement?
  • Which features are worth improving and which should be removed from the product?
  • Should the company move to mobile apps? Is it worth having a tablet version totally different from the mobile version? Is it worth integrating with the Twitter API?
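Taking the registration and purchase question from the list above as an example, a funnel analysis can be sketched in a few lines. The step names and user counts below are invented for illustration, and `funnel_dropoff` and `worst_step` are hypothetical helpers rather than part of any real library.

```python
def funnel_dropoff(step_counts):
    """Given an ordered list of (step_name, users_reaching_step),
    return (transition, users_lost, drop_rate) for each consecutive pair."""
    report = []
    for (name, reached), (next_name, next_reached) in zip(
        step_counts, step_counts[1:]
    ):
        lost = reached - next_reached
        rate = lost / reached if reached else 0.0
        report.append((f"{name} -> {next_name}", lost, rate))
    return report


def worst_step(step_counts):
    """Return the transition with the highest drop-off rate."""
    return max(funnel_dropoff(step_counts), key=lambda row: row[2])


# Hypothetical funnel: how many users reached each step.
funnel = [
    ("landing", 1000),
    ("registration form", 600),
    ("email confirmed", 450),
    ("first purchase", 90),
]

for transition, lost, rate in funnel_dropoff(funnel):
    print(f"{transition}: lost {lost} users ({rate:.0%})")

print("Biggest leak:", worst_step(funnel)[0])
```

The transition with the highest drop-off rate is usually the first place to look when redesigning the flow.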

These and many other questions can be answered with the help of a good Data Scientist. Some call it magic, others call it knowledge, but we Data Scientists call it work.
