10-Week Plan to Become a Data Scientist

Data Science is one of the most coveted career choices today, but finding the right resources to learn it has been a challenge, particularly for beginners. The options range from traditional on-campus university degrees, vocational courses, and commercial coaching institutes to free resources on the internet.

The free resources available on the internet have been a popular choice for many learners: they cost nothing, and there is a plethora of quality material available through blogs, free university courses, YouTube videos, and so on.

A common difficulty most learners cite is that the free resources lack a structured learning approach because they are scattered and unorganized. As beginners have limited knowledge of the subject itself, it takes a lot of time to understand where to start and how to structure the learning.

A few aspirants manage to get the hang of the subject and start making progress, but most, after investing significant time, drop out as it gets confusing. They finally choose a formal learning channel such as a training institute or a university degree.

This article provides a structure for learning data science from free resources available online. I have written it with a blend of my experience in successfully coaching more than ten thousand data science aspirants, mentoring career transitions, and my academic background as a Ph.D. scholar.

The diagram below provides an overview of the 10-week plan to become a data scientist.

I have included links to free resources based on my research, including my videos, popular blogs, and free books.

WEEK 1: Data Science Fundamentals and Python

A common beginner's mistake is to skip or just glance through the basics and fundamentals in order to speed up the learning and get to advanced topics. This is one of the main reasons for dropping out, as things get more confusing without a proper foundation. Before you proceed to learn the concepts and techniques, you should know clearly what data science is, how it works, and how it adds value to the business.

Data Science has three key knowledge areas:
1. Programming
2. Statistics and Mathematics
3. Machine Learning

The first week should be spent on a high-level understanding of these key areas, which lays a strong foundation.

If you have very basic or no knowledge of programming, you need not worry: I have seen people with absolutely no programming knowledge get the hang of it in less than 10 days. You will probably need to spend another week learning Python before moving on to further topics.

Week 1 – Recommended Resources

Data Science Evolution (video pending)

Machine Learning for Dummies is a good book for understanding the fundamentals of Machine Learning. A free limited edition is available from IBM.

https://akvd.nl/10w1-ibm-ml-for-dummies

Python Essentials
https://akvd.nl/10w1-edx-python-for-datascience

https://akvd.nl/10w1-youtube-python-essential


WEEK 2: Statistics

Statistics is a core knowledge area of Data Science. Though statistics is a vast subject, we only need the topics that are relevant for Data Science analysis. These include an overview of statistics, harnessing data, exploratory analysis, measures of central tendency, variability, the normal distribution, hypothesis testing, correlation, and regression.

I suggest hands-on statistics practice rather than a purely conceptual and theoretical approach, as it makes the learning more active. The resources include some of my videos, which use Python for active practice of statistical concepts. I have also included some free eBooks that I find readable for beginners.
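As a small illustration of what hands-on practice can look like, here is a minimal sketch that computes measures of central tendency, variability, and a Pearson correlation using only the Python standard library. The data values and the helper name `pearson_r` are made up for illustration; they do not come from the resources listed below.

```python
import math
import statistics

# Hypothetical sample: hours studied and exam scores for six students.
hours = [2, 4, 5, 7, 8, 10]
scores = [50, 60, 62, 75, 80, 93]

# Measures of central tendency.
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)

# Variability: sample standard deviation (divides by n - 1).
std_score = statistics.stdev(scores)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(hours, scores)
print(f"mean={mean_score:.2f} median={median_score} std={std_score:.2f} r={r:.3f}")
```

A correlation close to 1 here only says that the two made-up columns move together; try changing a few values and watch how the mean, median, and correlation respond.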

Week 2 – Recommended Resources

This book was published in 2014, so it uses older Python packages and methods that may no longer be useful, but the theory and concepts are explained nicely. It is free to download.
https://akvd.nl/10w2-greeteapress-stats-book

This is a good book, a bit more detailed. It uses R for the coding part, but I recommend sticking to Python.
https://akvd.nl/10w2-marshall-stats-book


My YouTube videos:
https://akvd.nl/10w2-youtube-stats-overview

https://akvd.nl/10w2-youtube-stats-harnessing-data

https://akvd.nl/10w2-youtube-stats-eda

https://akvd.nl/10w2-youtube-stats-hyp-testing

https://akvd.nl/10w2-youtube-stats-correlation-regression

https://akvd.nl/10w2-bayesian-statistics-book

https://akvd.nl/10w2-youtube-bayesian-stats


WEEK 3: Mathematics – 1

Mathematics is at the heart of Data Science for any kind of descriptive and predictive data modelling. Again, a good high-school knowledge of mathematics is sufficient to get started with Data Science. I understand that most of us can't recall our high-school or university mathematics, but a quick brush-up of the concepts can get you back on track.

I strongly recommend Khan Academy for mathematics. Mr. Khan has done a tremendous job of presenting math concepts in an easy-to-learn manner. Do check out the links in the resources section below, and also try to explore mathematics further.

I have recently recorded some videos on mathematics concepts, inspired by Khan Academy, for DataMites, a training institute where I teach Data Science. I have added those links in the section below as well.

Week 3 – Recommended Resources

https://akvd.nl/10w3-youtube-maths-av

https://akvd.nl/10w3-mit-linearalg-course

https://akvd.nl/10w3-khanacademy-linearalg

https://akvd.nl/10w3-khanacademy-matrix

https://akvd.nl/10w3-khanacademy-matrix-transformation


WEEK 4: Mathematics – 2

Mathematics requires another week of study, as you need to cover probability, calculus (mostly derivatives), and other concepts. I know that mathematics may not be very interesting for some of us, but this is essential knowledge before we move on. So I suggest you do not skip this week of mathematics.
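To connect derivatives with code, here is a minimal sketch that approximates a derivative numerically and compares it with the known answer. The function names `numerical_derivative` and `square` are my own illustrative choices, and the step size `h` is an assumption that works well for this simple case.

```python
def numerical_derivative(f, x, h=1e-5):
    """Approximate f'(x) with the central difference (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def square(x):
    """The function f(x) = x**2, whose exact derivative is 2*x."""
    return x ** 2


# The slope of x**2 at x = 3 should be close to 2 * 3 = 6.
approx = numerical_derivative(square, 3.0)
print(approx)
```

The derivative tells you how fast a function changes at a point, which is the idea Machine Learning algorithms later rely on when they adjust a model to reduce its error.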

Week 4 – Recommended Resources

https://akvd.nl/10w4-khanacademy-maths-probability

https://akvd.nl/10w4-edx-maths-probability

https://akvd.nl/10w4-khanacademy-math-calculus

https://akvd.nl/10w4-edx-maths-algo-design


WEEK 5: Introduction and Data Preparation

Usually, data preparation takes the majority of the effort in Data Science projects. This week is dedicated to understanding data preparation methods, techniques, and concepts in detail.

I strongly recommend beginning with the Python packages NumPy and Pandas, as they are versatile and widely used in real-world projects. There are other packages that can handle big and complex data, such as Dask, PySpark, and Koalas, but I don't recommend investing time in them now. Once you get the hang of data preparation with Pandas, the other packages can be learned easily when required, as their methods and concepts are similar to those of Pandas. So don't get overwhelmed and distracted. Focus on NumPy and Pandas this week.
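To give a flavour of data preparation with these two packages, here is a minimal sketch that cleans a tiny table. It assumes NumPy and Pandas are installed; the column names and values are invented for illustration, and the cleaning choices (title-casing text, filling with the median) are just one reasonable option among many.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with typical problems: inconsistent text casing,
# a missing value, and a duplicated row.
raw = pd.DataFrame(
    {
        "city": ["Delhi", "delhi", "Mumbai", "Pune", "Pune"],
        "age": [25.0, 32.0, np.nan, 41.0, 41.0],
        "salary": [50000, 64000, 58000, 72000, 72000],
    }
)

clean = (
    raw.assign(city=raw["city"].str.title())  # normalize text casing
    .drop_duplicates()  # remove exact duplicate rows
    .reset_index(drop=True)
)

# Fill the missing age with the median of the known ages.
clean["age"] = clean["age"].fillna(clean["age"].median())

# Aggregate: average salary per city.
avg_salary = clean.groupby("city")["salary"].mean()

print(clean)
print(avg_salary)
```

Most real data preparation is a longer chain of small steps like these, so it pays to get comfortable reading and writing them.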

Week 5 – Recommended Resources

https://akvd.nl/10w5-github-wrangling

https://akvd.nl/10w5-youtube-data-preparation

https://akvd.nl/10w5-medium-ml-intro

https://akvd.nl/10w5-youtube-numpy-pandas

https://akvd.nl/10w5-youtube-pandas

https://akvd.nl/10w5-youtube-data-wrangling


WEEK 6: Machine Learning – Algorithms Modelling Python

Finally, we can start building Machine Learning models with Python packages. It gets more interesting as we progress from this week. The first Machine Learning models you build will be based on small datasets, but you will already start to see how powerful Machine Learning is in predictive analytics. In fact, we can say that the recent popularity of Data Science as a field is because of Machine Learning techniques.

As you learn the first Machine Learning algorithm, Linear Regression, try to understand how it works, what the objective function is, and what terminology is used in Machine Learning. Also, convince yourself why Machine Learning is called Machine Learning. I mean, how is the "Learning" part justified?
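One way to see the "learning" part for yourself is to fit a line by gradient descent in plain Python. The sketch below is only an illustration under stated assumptions: the data is generated from y = 2x + 1, the objective function is the mean squared error, and the names `mse` and `fit`, the learning rate, and the number of epochs are my own choices rather than anything taken from the resources below.

```python
# Hypothetical training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def mse(w, b):
    """Objective function: mean squared error of the line y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(learning_rate=0.05, epochs=2000):
    """Fit w and b with batch gradient descent on the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit()
print(f"w={w:.3f}, b={b:.3f}, mse={mse(w, b):.6f}")
```

Nobody tells the program that the slope is 2 and the intercept is 1. It starts from zero and keeps adjusting both values to lower the error on the data, which is the sense in which the machine "learns".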

Week 6 – Recommended Resources

http://akvd.nl/10w6-ibm-ml-for-dummies


WEEK 7: Machine Learning – Algorithms Modelling

This week, the focus is on learning various Machine Learning algorithms such as K-Means, Logistic Regression, K-Nearest Neighbors, Decision Trees, etc.

It is equally important to learn the theory and mathematical modeling of the algorithms as well as their practical implementation with Python packages, including fine-tuning the algorithms for performance on real-world data.
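Writing a simple algorithm by hand is a good way to link the theory to the implementation. As an illustration, here is a minimal sketch of K-Nearest Neighbors classification using only the standard library (Python 3.8 or later for `math.dist`). The function name `knn_predict` and the toy data are assumptions for this example; in practice you would normally use a tested library implementation.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by Euclidean distance, closest first.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D training data with two well-separated classes.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (2.0, 2.0)))
print(knn_predict(points, labels, (8.0, 9.0)))
```

The value of `k` is a hyperparameter: changing it changes the predictions near the boundary between classes, which is exactly the kind of fine-tuning this week is about.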

Week 7 – Recommended Resources

https://akvd.nl/10w7-github-adv-ml-book

https://akvd.nl/10w7-xgboost

https://akvd.nl/10w7-mastering-ml-sklearn

https://akvd.nl/10w7-neural-network-guide

https://akvd.nl/10w7-youtube-hyperparameter-tuning


WEEK 8: Machine Learning Advanced Techniques

This week, we continue with more advanced algorithms such as Artificial Neural Networks, XGBoost, and other ensemble techniques, which have proven effective in solving real-world problems. Also, focus on advanced techniques in data preparation, feature engineering, and evaluation metrics.
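Evaluation metrics are easier to remember once you have computed them by hand. The sketch below calculates precision, recall, and F1 for a binary classifier in plain Python. The function name `classification_metrics` and the label lists are made up for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall, f1 = classification_metrics(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision asks how many of the predicted positives were right, while recall asks how many of the actual positives were found. Which one matters more depends on the business problem.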

Some of this week's topics might take longer to learn. I suggest spending as much time as required rather than rushing to keep pace with the schedule.

Week 8 – Recommended Resources

https://akvd.nl/10w8-youtube-advanced-metrics

https://akvd.nl/10w8-youtube-adv-techniques

https://akvd.nl/10w8-youtube-metrics

https://akvd.nl/10w8-primo-cost-function


WEEK 9: Projects – Solving Business Problems, Kaggle

If you have reached this level, you should be pretty good at modeling with Machine Learning algorithms and solving Data Science problems with Python. Now it is time to challenge yourself with some real-world problems.

I recommend starting with Kaggle competitions: participate, and read voraciously on the Kaggle topics relevant to you.

Don't worry about advanced topics such as image classification (CNNs), Deep Learning, Natural Language Processing (RNNs), reinforcement learning, etc. They are the next steps. According to a report from LinkedIn, 80% of the jobs in the market deal with structured data, so your preparation is good enough to transition to a Data Science career.

Week 9 – Recommended Resources

https://akvd.nl/10w9-data-flair-ml-project-ideas

https://akvd.nl/10w9-kaggle-competitions


WEEK 10: Model Deployment

This week's syllabus, model deployment, is an optional topic. It is usually considered a job for the software engineering or infrastructure team. But a good knowledge of deploying machine learning models into production and making them available to end users enables you to deliver end-to-end Machine Learning / Data Science projects.

The topics cover deploying Machine Learning models through a Flask API and scalable cloud deployment with Amazon SageMaker.
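To make the idea of serving a model through an API concrete, here is a minimal Flask sketch. It assumes Flask is installed. The `/predict` route, the JSON field `x`, and the fixed `WEIGHT` and `BIAS` values standing in for a trained model are all hypothetical choices for illustration, not the setup used in the resources below.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for a trained model: a fixed line y = 2x + 1.
# In a real project you would load a saved model here instead.
WEIGHT, BIAS = 2.0, 1.0


@app.route("/predict", methods=["POST"])
def predict():
    """Return a prediction for a JSON body such as {"x": 3}."""
    payload = request.get_json(silent=True) or {}
    if "x" not in payload:
        return jsonify(error="missing field 'x'"), 400
    prediction = WEIGHT * float(payload["x"]) + BIAS
    return jsonify(prediction=prediction)
```

Calling `app.run()` starts Flask's built-in development server, which is fine for experimenting on your own machine but is not intended for production traffic. Scalable deployment is where a managed service such as Amazon SageMaker comes in.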

Week 10 – Recommended Resources

https://akvd.nl/10w10–flask-deployment

https://akvd.nl/10w10-ml-deploy-sagemaker

https://akvd.nl/10w10-cornell-datascience-book


End Note

According to WEF estimates, about 47 million new jobs will be created in the field of Data Science and related domains by the year 2022. I have observed from LinkedIn job trends that Data Science job postings have not been affected by the global coronavirus pandemic. This is very encouraging for Data Science aspirants.

This article will be continuously updated with better resources and guidance as I discover them. If you find resources that could be included here, please share them with me in the comments. I will review them and add them to the list of resources. This will help beginners save time and get down to business in the shortest time possible.

I will be glad to answer your queries on Data Science preparation, career advice, or any related topics. You can reach me at ashok@rubixe.com or on LinkedIn (https://akvd.nl/linkedin-ashokveda) for career advice.

I wish you a successful Data Science journey.
