Is it possible to break into the world of data by learning Data Science on your own, from scratch? Spoiler alert: yes. In this article, together with the Faculty of Artificial Intelligence, we describe the skills and disciplines you need to master on your way to a Data Scientist career.



What does a Data Scientist do?

What you study in Data Science should be driven by the tasks a specialist is expected to solve. Those tasks differ depending on the company's field of activity. Here are some examples:

  • detection of anomalies - for example, non-standard actions with a bank card, fraud;
  • analysis and forecasting - performance indicators, quality of advertising campaigns;
  • scoring and grading systems - processing large amounts of data for making decisions, for example, on granting a loan;
  • basic interaction with the client - automatic replies in chats, voice assistants, sorting letters into folders.

But for any of the above tasks, you always need to follow approximately the same steps:

  • Data collection - search for sources and methods of obtaining information, as well as the collection process itself.
  • Checking - validation, removal of anomalies.
  • Analysis - the study of data, making assumptions, conclusions.
  • Visualization - bringing data into a human-readable form (graphs and diagrams).

The outcome is a decision based on the analyzed data, for example, to change the marketing strategy or to increase the budget for one of the company's activities.
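
As a rough illustration, the four steps above can be sketched in plain Python. Everything specific here is an assumption made for the example: the advertising data is invented, and the function names `collect`, `check`, `analyze`, and `visualize` are hypothetical. A real project would typically read from files or databases and use dedicated libraries for analysis and plotting.

```python
def collect():
    # Data collection: hypothetical daily ad spend and resulting sales.
    # None marks a missing reading; a negative sales count is impossible.
    return [
        {"day": 1, "spend": 100.0, "sales": 12},
        {"day": 2, "spend": 120.0, "sales": 15},
        {"day": 3, "spend": None, "sales": 14},
        {"day": 4, "spend": 110.0, "sales": -3},
        {"day": 5, "spend": 150.0, "sales": 18},
    ]


def check(rows):
    # Checking: drop rows with missing or impossible values.
    return [r for r in rows if r["spend"] is not None and r["sales"] >= 0]


def analyze(rows):
    # Analysis: how much was spent, on average, to get one sale.
    total_spend = sum(r["spend"] for r in rows)
    total_sales = sum(r["sales"] for r in rows)
    return {"rows": len(rows), "cost_per_sale": total_spend / total_sales}


def visualize(rows):
    # Visualization: a text bar chart, one '#' per sale.
    return "\n".join(f"day {r['day']}: {'#' * r['sales']}" for r in rows)


clean = check(collect())
summary = analyze(clean)
chart = visualize(clean)
print(summary)
print(chart)
```

The decision itself stays with a person: the sketch only produces the cost per sale and the chart that such a decision could be based on.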

What do you need to know?

There is a lot to learn, but the large number of online courses and books now available will help you acquire the necessary skills much faster.

Statistics, mathematics, linear algebra

You will need fundamental courses in probability theory, calculus, linear algebra, and mathematical statistics. This mathematical background is what lets you interpret and evaluate the results produced by data processing algorithms.

Related books:

  • "Ace the Data Science Interview: 201 Real Interview Questions Asked by FAANG, Tech Startups, & Wall Street", N. Singh, K. Huo, 2021;
  • "Practical Statistics for Data Scientists", P. Bruce, A. Bruce - suitable for those who already have basic knowledge of statistics;
  • "Data Science from Scratch", J. Grus - a book for a quick immersion in the profession, covering most of the required disciplines;
  • "Neural Networks: A Comprehensive Foundation", S. Haykin - material that explains the mathematical side of neural networks.

Machine learning

Machine learning allows you to teach computers to make decisions on their own in order to automate the execution of certain tasks. For this reason, ML is applied in many areas, including data science.

To master Data Science from scratch, you first need to learn three main areas of machine learning:

Supervised learning

Supervised learning lets you predict a result using pre-labeled data. If you need to predict a category from a fixed set (for example, to tell photographs of cars from those of airplanes and trains), this is a classification problem. If you need to predict a numeric value (say, the price of an apartment based on its characteristics), this is a regression problem.
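
To make the distinction concrete, here is a minimal sketch of both problem types in plain Python. The techniques are chosen only for brevity: a 1-nearest-neighbor rule for classification and a least-squares line for regression. The data sets and the function names `nearest_neighbor_label` and `fit_line` are invented for this example; in practice you would normally reach for a machine learning library.

```python
def nearest_neighbor_label(samples, query):
    """Classification: return the label of the labeled sample closest to query."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(samples, key=lambda s: sq_dist(s[0], query))
    return label


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Hypothetical labeled data: (length in meters, number of wheels) -> vehicle type.
vehicles = [
    ((4.5, 4), "car"),
    ((4.2, 4), "car"),
    ((70.0, 18), "airplane"),
    ((200.0, 64), "train"),
]
predicted_type = nearest_neighbor_label(vehicles, (4.8, 4))  # a category

# Hypothetical labeled data: apartment area (m^2) -> price (thousands).
areas = [30, 45, 60, 80]
prices = [90, 135, 180, 240]
slope, intercept = fit_line(areas, prices)
predicted_price = slope * 50 + intercept  # a number

print(predicted_type, predicted_price)
```

In both cases the model learns from examples where the correct answer is already known. The only difference is the kind of answer: a label in the first case, a number in the second.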

Unsupervised learning

Here the input data is not labeled: the correct result is not known in advance, and the algorithm has to find structure in the data by itself. An example is anomaly detection: unusual credit card transactions, erroneous sensor readings, and the like.
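
A minimal sketch of anomaly detection without labels, assuming a simple statistical rule instead of a trained model: a value is flagged when it lies far from the median, measured in units of the median absolute deviation (the "modified z-score"). The transaction amounts and the function name `find_anomalies` are invented for this example. The constant 0.6745 and the threshold 3.5 are a commonly used convention, not the only possible choice.

```python
from statistics import median


def find_anomalies(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread at all, so nothing can be ranked as unusual
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical card transaction amounts; nobody marked which one is suspicious.
amounts = [12.5, 9.9, 14.2, 11.0, 13.3, 10.7, 950.0, 12.1]
anomalies = find_anomalies(amounts)
print(anomalies)
```

The median is used instead of the mean because a single extreme value shifts the mean and the standard deviation enough to hide itself.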

Reinforcement learning

The data is not labeled here either, but after each action the agent (often a neural network) receives a reward signal, positive or negative. This is how AI is taught to play computer games such as Dota 2 and StarCraft II.
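
The reward principle can be shown with a much simpler setting than a video game: a multi-armed bandit with an epsilon-greedy strategy. This is a sketch under stated assumptions, not how game-playing systems are actually built. There is no neural network, the environment has no state, and the three actions with their reward values are invented for the example, as is the function name `run_bandit`.

```python
import random


def run_bandit(reward_fns, steps=300, epsilon=0.1, seed=0):
    """Learn the average reward of each action by trial and error."""
    rng = random.Random(seed)
    n_actions = len(reward_fns)
    estimates = [0.0] * n_actions  # current guess of each action's average reward
    counts = [0] * n_actions       # how many times each action was tried
    for t in range(steps):
        if t < n_actions:
            action = t  # try every action once before trusting the estimates
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: pick a random action
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = reward_fns[action](rng)  # positive or negative stimulus
        counts[action] += 1
        # Incremental running average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


# Hypothetical environment: three actions with noisy rewards.
reward_fns = [
    lambda rng: rng.choice([-1.0, -0.5]),  # always punished
    lambda rng: rng.choice([-0.2, 0.2]),   # neutral on average
    lambda rng: rng.choice([0.5, 1.0]),    # always rewarded
]
estimates, counts = run_bandit(reward_fns)
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, estimates, counts)
```

Nobody tells the agent which action is correct. It discovers that the third action is the best one only from the rewards it receives.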

Related books:

  • "Machine Learning: The Art and Science of Algorithms that Make Sense of Data", P. Flach - a book about methods of building models and algorithms for ML;
  • "Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference", C. Davidson-Pilon - covers data processing algorithms and develops analytical skills;
  • "Introduction to Machine Learning with Python", A. Müller, S. Guido - a book for honing your practical machine learning skills.