May 6, 2019

Who is a data scientist and how to get a data scientist role

Greetings, folks! Today we will talk a little bit about data science: we will look at its different components and at the terminology and jargon used in data science and data analytics.

Data science is the skill of extracting knowledge from data and using that knowledge to predict the unknown. We use that intelligence, or those insights, to improve business outcomes. Another simple definition is that data science is the science of deriving useful insights from data. The goal is to create meaning from data and to build data products.

Data scientists need to do end-to-end analysis of large and diverse data sets. Unlike in other domains, here we concentrate more on data and less on code. We deal not only with acquiring data but also with storing and processing it. The data may be small or big, but as data scientists we need to know how to handle either scenario; this is what we mean by data analytics.
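To make the acquire, store, and process steps concrete, here is a minimal sketch using only Python's standard library. The sales data, the table name `sales`, and the function names `acquire`, `store`, and `process` are hypothetical and chosen purely for illustration; a real project would acquire data from files, APIs, or databases and would likely use heavier tools.

```python
import csv
import io
import sqlite3

# Hypothetical raw data, standing in for something "acquired" such as an exported CSV file.
RAW_CSV = """region,product,units
North,widget,10
South,widget,4
North,gadget,7
South,gadget,12
"""


def acquire(text):
    """Acquisition: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def store(rows):
    """Storage: load the rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, units INTEGER)")
    conn.executemany(
        "INSERT INTO sales (region, product, units) VALUES (?, ?, ?)",
        [(r["region"], r["product"], int(r["units"])) for r in rows],
    )
    conn.commit()
    return conn


def process(conn):
    """Processing: total units per region, largest total first."""
    cur = conn.execute(
        "SELECT region, SUM(units) AS total FROM sales "
        "GROUP BY region ORDER BY total DESC"
    )
    return cur.fetchall()


if __name__ == "__main__":
    connection = store(acquire(RAW_CSV))
    print(process(connection))  # [('North', 17), ('South', 16)]
```

The same three steps apply whether the data is a few rows or many gigabytes; only the tools for each step change.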

In typical business intelligence (BI) scenarios, we do not necessarily perform end-to-end analysis, and we deal only with structured data. In data science, we also deal with unstructured data. We have to clean the data and transform it, and this is part of the end-to-end analysis.
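As a rough illustration of what "cleaning" and "transforming" can mean in practice, the sketch below tidies a few messy, hand-entered records and then derives a new feature from them. The records, the field names, and the `age_group` rule are hypothetical assumptions made only for this example; real cleaning rules depend entirely on the data and the business question.

```python
# Hypothetical messy records, e.g. typed into a web form by hand.
RAW_RECORDS = [
    {"name": "  Alice ", "age": "34", "city": "new york"},
    {"name": "Bob", "age": "", "city": "Boston"},
    {"name": "carol", "age": "29 ", "city": " CHICAGO"},
    {"name": "", "age": "41", "city": "Denver"},
]


def clean(records):
    """Cleaning: trim whitespace, normalize case, drop incomplete rows."""
    cleaned = []
    for rec in records:
        name = rec.get("name", "").strip()
        age_text = rec.get("age", "").strip()
        # Skip rows with a missing name or a missing/non-numeric age.
        if not name or not age_text.isdigit():
            continue
        cleaned.append(
            {
                "name": name.title(),
                "age": int(age_text),
                "city": rec.get("city", "").strip().title(),
            }
        )
    return cleaned


def transform(records):
    """Transformation: derive an illustrative age_group feature."""
    return [
        {**rec, "age_group": "30+" if rec["age"] >= 30 else "under 30"}
        for rec in records
    ]


if __name__ == "__main__":
    for row in transform(clean(RAW_RECORDS)):
        print(row)
```

Bob's row is dropped because his age is missing, and the last row is dropped because it has no name; whether to drop, fill in, or flag such rows is a judgment call that a data scientist has to make for each dataset.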

So we work with large datasets with the ultimate goal of uncovering the value in the data. It is not enough just to get meaningful insights; the results should also allow us to take action.

We are drawing on multiple disciplines here: software engineering, statistics, and domain expertise. This does not mean that you can't enter the data science field if you are not an expert in coding. After all, you do not have to be a mechanical engineer to drive a car!

We don't have to write lots of code here, but we do focus on being able to reuse code.

As Monica Rogati puts it in this article:

By definition all scientists are data scientists. In my opinion, they are half hacker, half analyst, they use data to build products and find insights. It’s Columbus meet Columbo - starry eyed explorers and skeptical detectives.

A half hacker is somebody who knows how to look at code, reuse code, perform some coding operations, look at data, change the data, and think from an analytical perspective. They also think about what to do if the data changes.

You need to apply your analytical skills in combination with some business knowledge. Again, you may not know everything about your business domain, but you need enough knowledge to build a data product.

A data scientist finds deeply hidden patterns in data, and data scientist has been voted the best job in America. For aspiring data scientists, hands-on training is the best way to improve statistical and programming skills.

Focus on the topics below during your beginner's journey toward becoming a data scientist, and you will not regret it later.

According to job postings on different sites, the skills required to be a data scientist are:

1. Python

2. R

3. SQL

4. Hadoop

5. Java

6. SAS

7. Spark

8. Matlab

9. Hive

10. Tableau

So what's the takeaway from this? Focus on Python and R, and you will get your foot in the door for the majority of data science job openings.