What is Data Science?
Data Science is the art of making predictions and uncovering the deeply embedded patterns that underlie how a business functions. In an age where, after God, you can only trust data, the volume of data in the world is growing at an astounding pace. More than 2.7 zettabytes of data exist in today’s digital universe, and that figure is projected to grow to 180 zettabytes in 2025. Facebook stores, accesses, and analyzes 30+ petabytes of user-generated data, and 100 terabytes of data are uploaded to it daily. 571 new websites are created every minute of the day. Data production will be 44 times greater in 2020 than it was in 2009.
As Carly Fiorina, former executive, president, and chair of Hewlett-Packard Co., once famously remarked, “the goal is to turn data into information, and information into insight.” The real scope of Data Science is to collect and organize data in such a way that useful information can be derived from it. A Data Scientist’s job is to understand the data, clean the data, write an algorithm, and then optimize the algorithm to make informed predictions about the business. People looking forward to such an opportunity should also be aware that 70 to 80% of a Data Scientist’s work consists of collecting and cleaning up data.
Harvard Business Review has called it the “Sexiest Job of the 21st Century”, Glassdoor named it the “Best Job of the Year” for 2016, and according to Gautam Tambay, cofounder and CEO of Springboard, “Data is the new oil”. The world now produces more data every year than it did in the entire 20th century, and success over the next decade will depend largely on how well companies can turn that data into insights and act on them. That’s where data science comes in. In an age where the bombardment of information never seems to cease, most of us just let it all flow over us, remembering a few highlights here and there but quickly moving on to the next thing. Data scientists are the ones who can slice through all the fluff to extract the meaning of all that information: how it’s connected and what its implications are. For companies looking to make strategic decisions, this skill is invaluable.
Why should you consider a career in Data Science?
A recent study by McKinsey indicates that the demand for Data Scientists is on the rise, with an estimated 50% demand-supply gap by 2018. Skilled, certified data scientists are among the highest-paid professionals in the IT industry, with a median salary for entry-level data scientists at $91,000, and managers making as much as $250,000 a year.
India ranks second in the world for recruitment in data science and data analytics, with 50,000 positions available, behind only the US. The demand for data experts is equally strong whether you look at big companies, the e-commerce industry, or even start-ups. Since there is an expected shortage of around 200,000 data specialists by 2018, companies are throwing money at those with the skills to work as Data Analysts, Scientists, Engineers, and so on. Thus, a career in data promises a salary that ranks among the top-paying jobs.
#1 From the Career point of view
Don’t expect this bubble to burst anytime soon. According to the report by McKinsey & Company, by 2018 the U.S. will have anywhere from 140,000 to 180,000 fewer data scientists than it needs. The shortage of data science managers is even greater: roughly 1.5 million data decision-making managers will be needed by 2018. The most viable way to choose a career is to figure out how much it lets you maximize your future options. That means flipping the problem on its head: instead of asking “how do I pick a career path,” we should ask “how can I maximize my future self’s options so that I am not forced to pick only one path?”
A career in data science satisfies this option-maximizing criterion. A data analytics professional has a broad range of job titles and fields from which to choose. Since big data is used almost everywhere today, you can choose to be a:
- Metrics and Analytics Specialist
- Data Analyst
- Big Data Engineer
- Data Analytics Consultant
These are just some of the job titles you can hold at big companies such as IBM, ITrend, Opera, and Oracle, and the possibilities are endless.
#2 Making Money is always equally important
According to an O’Reilly data science salary survey, the annual base salary of U.S.-based survey respondents was $104,000. Robert Half’s tech guide places the range between $109,000 and $153,750. And in the Burtch Works data science salary survey, the median base salary ranges from $97,000 for Level 1 contributors to $152,000 for Level 3 contributors. In addition, median bonuses start at $10,000 for Level 1 contributors. As a point of comparison, the U.S. Bureau of Labor Statistics (BLS) reports that lawyers earn a median annual wage of $115,820.
In India, annual salary packages for data innovators are highest in Mumbai, followed by Bangalore and New Delhi. However, since Bangalore is the startup capital of India, it has the most job opportunities in this field.
Below is the ranking of mean annual pay in INR lakhs, according to Analytics India Magazine:
- Mumbai: 11.4 lakhs
- Bangalore: 10.3 lakhs
- New Delhi: 9.9 lakhs
- Pune: 8.8 lakhs
- Chennai: 8.4 lakhs
- Hyderabad: 8.3 lakhs
#3 The Value for Experience
Data science managers can earn almost as much as doctors, and sometimes more. A study by Burtch Works reveals that, in the US, Level 1 managers earn a median annual base salary of $140,000, Level 2 managers make $190,000, and Level 3 managers earn $250,000. That puts them in a pretty good league: the median annual wage of pediatricians, psychiatrists, and medicine doctors is currently between $226,408 and $245,673. So without years of med school, residencies, and debt, a career in Data Science offers a chance to earn more than the person in charge of the operating table, which is genuinely great.
In India too, the scenario is quite similar. A career in data especially appeals to young IT professionals because of the positive correlation between years of work experience and salary. Salaries in the field of data might look something like the following in the future:
- Fresh graduate: a fresh graduate, among the 81% of graduates who enter the data industry, could expect a starting salary of INR 4 lakhs annually.
- Experienced individual: an employee with 5-10 years of experience has the potential to secure between INR 6 and 10 lakhs.
- Highly experienced person: an employee with decades of experience, or one who has held managerial roles, can expect anywhere from INR 24 lakhs up to an astounding crore of rupees!
Also, an analyst’s salary increases by 50% with a promotion from their current role to a higher level.
#4 The Lack of Competition
Not only is there a shortage of data scientists, but professionals in other fields don’t necessarily want to step up to the plate. According to a report by Robert Half and the Institute of Management Accountants, employers are looking for accounting and finance candidates who can mine and extract data and identify key data trends, and who are adept at statistical modeling and data analysis.
But the report reveals that most accounting and finance candidates don’t have any of these skills – in fact, many colleges don’t even teach this level of analytics to students majoring in a financial discipline.
#5 The Ease of Job Hunting
Requirements in data science and analytics jobs are often multidisciplinary, and they all require an ability to link analytics to creating value for the organization. That is exactly why data scientists are in such high demand and the supply is so limited. Organizations have recruiters solely dedicated to finding these professionals. While candidates in other fields are harassing recruiters and pestering hiring managers, as a data scientist you merely need to let it be known that you’re looking for a job. In fact, the need is so dire that even if you already have a job, recruiters will try to lure you away with a better compensation and benefits package. So, let the bidding begin.
What skills do you need to possess to become a Data Scientist?
Skill #1: Programming
This is perhaps the most fundamental part of a data scientist’s skill set: the job of a data scientist is much more applied than that of a traditional statistician. Programming is important in multiple ways, including the three below:
- Being able to program augments your ability to do statistics. If you have a lot of statistics knowledge but no way to implement it, that knowledge becomes much less useful.
- The ability to analyze large datasets: The datasets you get to work with in industry are not as small and cute as the sample iris dataset. You can easily get data that reaches millions of rows or more.
- You can create tools to do better data science. This includes everything from building systems that your company can use to visualize data, to creating frameworks that automatically analyze experiments, to managing the data pipeline at your company so the necessary data is in the right place at the right time.
The normal software engineering training here will help you develop programming skills (although you typically don’t have to go as far as a usual software engineer would).
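As a rough illustration of the large-dataset point above, here is a minimal Python sketch that summarizes a column in a single pass without loading every row into memory. It uses Welford's running mean and variance method, which is one common choice rather than anything the original answer prescribes. The function name `running_stats`, the column name `session_minutes`, and the tiny in-memory CSV are all hypothetical stand-ins for a real file with millions of rows.

```python
import csv
import io
import math


def running_stats(rows, column):
    """Return (count, mean, sample std) for one column in a single pass."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for row in rows:
        value = float(row[column])
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)
    # Sample standard deviation is undefined for fewer than two rows.
    std = math.sqrt(m2 / (count - 1)) if count > 1 else 0.0
    return count, mean, std


# Hypothetical data: in practice this would be an open file handle,
# and csv.DictReader would stream it row by row.
sample_csv = io.StringIO(
    "user_id,session_minutes\n1,10\n2,20\n3,30\n4,40\n"
)
count, mean, std = running_stats(csv.DictReader(sample_csv), "session_minutes")
print(count, mean, round(std, 3))
```

Because the function only ever holds one row at a time, the same code should work whether the reader wraps four rows or four million.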
Skill #2: Quantitative analysis
Quantitative analysis is the heart of a data scientist’s skill set. Much of data science is about understanding the behavior of a particularly complex system by analyzing the data that it produces, both naturally and via experiments. Quantitative analysis skills are important in multiple ways, including the three below:
- Experimental design and analysis: Particularly for data scientists working on consumer internet applications, the way that data is logged and the way that experiments can be run allow a massive amount of experimentation to test various hypotheses. There are a lot of ways that experiment analysis can go wrong (ask any statistician), so data scientists can help a lot here.
- Modeling of complex economic or growth systems: Typical models like churn models or customer lifetime value models are common here, as well as more complicated models such as supply and demand modeling, economically-optimal ways to match providers and suppliers, and methods to model the growth channels of a company to better quantify which growth avenues are the most valuable. The most famous example of this is Uber’s surge pricing.
- Machine Learning: Even for data scientists who don’t implement Machine Learning models themselves, there is tremendous value they can provide in helping create prototypes to test assumptions, select and create features, and identify areas of strength and opportunity in existing machine learning systems.
This skill requirement is why the data science field is particularly attractive to physicists, statisticians, economists, operations researchers, and many more, who are very used to understanding complex systems through top-down approaches (making models) or bottom-up approaches (inferences from data).
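To make the experiment-analysis point concrete, here is a minimal sketch of one standard way to compare conversion rates between two experiment groups: a pooled two-proportion z-test. This is an illustrative choice, not a method named in the original answer, and it assumes large, independent samples. The function name `two_proportion_z_test` and the visitor and conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 2,000 users per group,
# 200 conversions in control (A) and 250 in the variant (B).
z, p_value = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value here only suggests the difference is unlikely to be noise. It says nothing about the ways an experiment can go wrong before the analysis, such as biased assignment or peeking at results early.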
Skill #3: Product intuition
Product intuition as a skill is tied to a data scientist’s ability to perform quantitative analysis on the system. Product knowledge means understanding the complex system that generates all of the data that data scientists analyze. This is incredibly important for quite a few reasons, including:
- Generating hypotheses: A data scientist who understands the product well can generate hypotheses about how the system will behave if changed in a particular manner. Hypotheses are based on “hunches” about how certain aspects of the system behave, and one needs to know the system to have hunches about how it works.
- Defining metrics: The traditional analytics skill set includes defining key primary and secondary metrics that the company can use to track success at particular objectives. A data scientist needs to know the product in order to create product metrics that both (1) measure what is intended and (2) measure something that is worth moving.
- Debugging analyses: Results that are “incredible” are more often caused by bugs than by actual “incredible” features of the system. Good product knowledge helps with quick sanity checks and back-of-the-envelope calculations that can more quickly identify things that might have gone wrong.
Product knowledge usually involves using the product that your company is creating. If that’s not possible, then at least try to get to know the people who actually use the product.
Skill #4: Communication
This skill significantly increases the leverage of all of the previous skills listed. It is particularly important and can help distinguish a good data scientist from a great one. Good communication can manifest in various ways, including:
- Communicating insights: Some data scientists call this “storytelling”. The important thing here is to communicate insights in a clear, concise, and valid way, so that others in the company can effectively act on those insights.
- Data visualization and presentation: Sometimes there’s nothing more effective and satisfying than a good graph for making or conveying a point.
- General communication: Working as a data scientist almost always means working as part of a team, including working with engineers, designers, product managers, operations, and more. Good general communication can help facilitate trust and understanding, which is incredibly important for someone who is entrusted with being a steward of the data.
Skill #5: Teamwork
This last skill ties together the other four skills. A data scientist in particular cannot exist in isolation, and from what I’ve seen does best when deeply embedded in the rest of the company (or at least within the product development org). Teamwork is important for many reasons, including:
- Being selfless: This includes offering help and mentorship to others, and putting the company’s mission before your own personal career ambitions.
- Constant iteration: A data scientist thrives on feedback, and most parts of the data scientist’s work will involve back-and-forth iteration and feedback with others to reach an impactful solution.
- Sharing knowledge with others: Since the data scientist profession is quite new, there is basically no one with the complete set of skills, especially if you collect together all of the possibly useful statistical techniques, frameworks, libraries, languages, and tools. Because knowledge will be spread out across data scientists and organizations, it is particularly useful for data scientists to be constantly sharing their knowledge, methods, and results with each other.
The first two skills, programming and quantitative analysis, are perhaps what most people first think of when they think about the skills of a data scientist. While those are important and form the technical foundation of a data scientist’s skill set, the good news is that three of these five most important skills are not technical skills.
The third skill is important in general for any product or service-focused company, and the fourth and fifth skills are critical for any job you do where you work with other people! (Answer by William Chen, Data Scientist at Quora.)
Good luck and best wishes on your own path to becoming a Data Scientist!