Become a Data Scientist with Free Learning Materials
How to become a high-level data scientist using free learning materials
Working through the full syllabus takes approximately 12 months, so you will need to commit to sustained focus to complete it. By the end, your knowledge of data science should be strong enough to take on most challenges in this field.
- Python [https://www.learnpython.org/, https://www.kaggle.com/learn/python, https://www.freecodecamp.org/news/how-to-learn-python/]
- Pandas, Seaborn, Data Munging [https://www.analyticsvidhya.com/blog/2014/09/data-munging-python-using-pandas-baby-steps-python/, https://towardsdatascience.com/practical-data-analysis-with-pandas-and-seaborn-8fec3cb9cd16, https://www.kaggle.com/code/crawford/humble-intro-to-analysis-with-pandas-and-seaborn/notebook, https://seaborn.pydata.org/introduction.html, https://pandas.pydata.org/getting_started.html, ]
- Variables, Matrices, Functions, Derivatives [https://www.kaggle.com/getting-started/131094, https://ocw.mit.edu/courses/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/, https://ocw.mit.edu/courses/18-06-linear-algebra-spring-2010/, Calculus & Derivatives ]
- Sklearn [https://scikit-learn.org/stable/tutorial/index.html, ]
- TF2, Keras [TF, https://keras.io/ ]
- Google Data Studio [GCP ]
Mini-Projects in Data Cleaning, Handling, Python
- Probabilistic Description of Events and Data: Probability Axioms, Random Variables, PDF, PMF, Conditional Probability, Independence, Expectation, Variance [https://ocw.mit.edu/courses/res-6-012-introduction-to-probability-spring-2018/, https://ocw.mit.edu/courses/18-440-probability-and-random-variables-spring-2014/, ]
- Statistical Learning, Experiment Design, Confidence Interval and Hypothesis Testing [https://ocw.mit.edu/courses/18-05-introduction-to-probability-and-statistics-spring-2014/, https://ocw.mit.edu/courses/18-443-statistics-for-applications-spring-2015/, https://www.youtube.com/watch?v=Vfo5le26IhY, ]
- Bayesian Learning [https://www.youtube.com/watch?v=E3l26bTdtxI, ]
- Univariate and Multivariate Calculus, Norms of Vectors and Functions [https://e600.uni-mannheim.de/chapter-3/, https://www2.stat.duke.edu/~sayan/informal/vcalc.pdf, ]
- Taylor’s theorem and Automatic Differentiation [https://ivanky.wordpress.com/2016/06/14/taylor-method-with-automatic-differentiation/, https://openreview.net/pdf?id=SkxEF3FNPH, ]
- Fundamentals of Linear Algebra Spaces [https://math.libretexts.org/Courses/SUNY_Schenectady_County_Community_College/Fundamentals_of_Linear_Algebra, https://personal.math.ubc.ca/~carrell/NB.pdf, ]
- Machine Learning Tools [https://doc1.bibliothek.li/acb/FLMF040119.pdf, https://www.youtube.com/watch?v=v0uVu5__JGg, ]
Mini-Project in Foundations of Data Science (Bayesian Learning, Data Handling, Probability)
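As a starting point for the Bayesian Learning mini-project, the sketch below shows the simplest conjugate update: a Beta prior on a success probability combined with binomial data. The function names and the coin-flip counts are hypothetical and chosen only for illustration; only the Python standard library is assumed.

```python
from fractions import Fraction


def beta_binomial_update(alpha, beta, successes, failures):
    """Return the Beta posterior parameters after observing binomial data."""
    # Conjugacy: Beta(alpha, beta) prior + binomial counts
    # gives a Beta(alpha + successes, beta + failures) posterior.
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, kept exact with Fraction."""
    return Fraction(alpha, alpha + beta)


# Uniform prior Beta(1, 1); made-up data: 7 successes and 3 failures.
prior = (1, 1)
posterior = beta_binomial_update(*prior, successes=7, failures=3)

print("posterior parameters:", posterior)
print("posterior mean:", float(beta_mean(*posterior)))
```

The posterior mean sits between the prior mean (1/2) and the observed frequency (7/10), which is the usual way a prior pulls an estimate toward itself when data is scarce.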
- The ML Process: Problem Formulation to Solution [https://docs.aws.amazon.com/machine-learning/latest/dg/formulating-the-problem.html, https://www.kaggle.com/discussions/getting-started/144378, https://www.coursera.org/learn/ai-for-everyone/home/week/1 ]
- Linear Regression, Bias/Variance, Regularization, Stochastic Gradient Descent [https://www.ics.uci.edu/~xhx/courses/CS273P/04-linear-regression-273p.pdf, https://towardsdatascience.com/bias-variance-and-regularization-in-linear-regression-lasso-ridge-and-elastic-net-8bf81991d0c5, https://scikit-learn.org/stable/modules/sgd.html, ]
- Linear Classification: Logistic Regression, Linear SVM, Classification Metrics (Confusion Matrix) [https://intellipaat.com/blog/confusion-matrix-python/, https://towardsdatascience.com/logistic-regression-using-python-sklearn-numpy-mnist-handwriting-recognition-matplotlib-a6b31e2b166a, https://machinelearningmastery.com/metrics-evaluate-machine-learning-algorithms-python/, ]
- Nonlinear SVM, Decision Tree [https://www.youtube.com/watch?v=GcCG0PPV6cg, https://www.youtube.com/watch?v=_YPScrckx28, ]
- Ensemble Methods: Random Forest, Gradient Boosting [https://towardsdatascience.com/battle-of-the-ensemble-random-forest-vs-gradient-boosting-6fbfed14cb7, https://www.youtube.com/watch?v=gkXX4h3qYm4, https://www.youtube.com/watch?v=bA37PEJKed0, ]
- Unsupervised Learning: Clustering, Anomaly Detection [https://towardsdatascience.com/unsupervised-learning-for-anomaly-detection-44c55a96b8c1, https://www.youtube.com/watch?v=Fxat5OkinvQ, https://www.youtube.com/watch?v=5p8B2Ikcw-k, ]
Mini-Projects in Machine Learning Algorithms in Multiple Domains (Rental Business, Healthcare, Banking, NLP, Customer Segmentation)
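Before moving to scikit-learn for these mini-projects, it may help to see linear regression, regularization, and stochastic gradient descent in one place. The following is a minimal sketch in plain Python, not a reference implementation: the function name `sgd_ridge`, the learning rate, the epoch count, and the toy data are all assumptions made for illustration.

```python
import random


def sgd_ridge(xs, ys, lr=0.01, l2=0.0, epochs=5000, seed=0):
    """Fit y = w*x + b by SGD on squared error with an L2 penalty on w."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new random order each epoch
        for i in indices:
            err = (w * xs[i] + b) - ys[i]
            # Per-sample gradient of 0.5*err**2 + 0.5*l2*w**2.
            w -= lr * (err * xs[i] + l2 * w)
            b -= lr * err
    return w, b


# Toy, noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = sgd_ridge(xs, ys)                # no penalty: should approach w=2, b=1
w_reg, b_reg = sgd_ridge(xs, ys, l2=0.5)  # penalty shrinks the slope

print("unregularized:", w, b)
print("regularized:  ", w_reg, b_reg)
```

Comparing the two fits shows the bias/variance trade-off in miniature: the penalty deliberately biases the slope toward zero in exchange for a less flexible model.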
- Object-oriented programming (OOP): [Python Coursera, https://www.youtube.com/watch?v=SRu1GAfr3LA, https://docs.python.org/3/reference/datamodel.html, https://datagy.io/python-object-oriented-programming/, ]
- 1. Inheritance
- 2. Encapsulation
- 3. Abstraction
- 4. Polymorphism
- 5. OOP in Python
- 6. Applications of OOP in Data Science
- Parallel Architectures: Fundamentals of Parallel computer Memory Architecture, Parallel programming with MPI [https://ipcc.cs.uoregon.edu/lectures/lecture-2-architecture.pdf, https://www.cs.utexas.edu/users/witchel/372/lectures/24.ParallelComputing.pdf, ]
- Parallel Computing with Accelerators: Parallel programming with OpenMP, Accelerated computing using GPU [https://developer.nvidia.com/blog/developing-accelerated-code-with-standard-language-parallelism/, https://hipc.org/openmp-gpu-offloading-openmpgo/, ]
- Scalable Computing with Python: Numba: [https://indico.cern.ch/event/824917/contributions/3571661/attachments/1934964/3206289/2019_10_DANCE_Numba.pdf, https://developer.nvidia.com/blog/numba-python-cuda-acceleration/, https://www.youtube.com/watch?v=3dHJ00mAQAY, https://www.youtube.com/watch?v=x58W9A2lnQc, https://numba.pydata.org/, ]
- 1. Just-In-Time (JIT) compiler for Python [https://people.duke.edu/~ccc14/sta-663-2016/18C_Numba.html, ]
- 2. Threading and multiprocessing in Python, Dask for NumPy [https://sydney-informatics-hub.github.io/training.artemis.python/aio.html, https://conference.scipy.org/proceedings/scipy2018/pdfs/anton_malakhov.pdf, ]
- 3. Pandas and Scikit-Learn [https://medium.com/@vasista/parallel-processing-with-pandas-c76f88963005, https://scikit-learn.org/stable/computing/parallelism.html, ]
- 4. Parallel computing with TensorFlow [https://www.tensorflow.org/guide/distributed_training, ]
- Introduction to MLOps, Foundations, MLOps for containers, Continuous Integration, Continuous Deployment for ML models, Monitoring and Feedback. [https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning, ]
Mini-Projects in Computing for AI/ML (Writing ML packages from scratch, Using OpenMP/MPI)
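For the "writing ML packages from scratch" mini-project, the four OOP concepts listed in this part of the syllabus can be sketched in a few lines. The class names below (`Model`, `MeanRegressor`, `MajorityClassifier`) are hypothetical and exist only for illustration; real libraries structure their estimators in their own ways.

```python
from abc import ABC, abstractmethod


class Model(ABC):
    """Abstraction: a common interface that every model must implement."""

    def __init__(self):
        self._fitted = False  # Encapsulation: internal state, exposed read-only below.

    @property
    def is_fitted(self):
        return self._fitted

    @abstractmethod
    def fit(self, xs, ys):
        ...

    @abstractmethod
    def predict(self, xs):
        ...


class MeanRegressor(Model):  # Inheritance: reuses __init__ and is_fitted from Model.
    def fit(self, xs, ys):
        self._mean = sum(ys) / len(ys)
        self._fitted = True
        return self

    def predict(self, xs):
        return [self._mean for _ in xs]


class MajorityClassifier(Model):
    def fit(self, xs, ys):
        self._label = max(set(ys), key=ys.count)  # most frequent label
        self._fitted = True
        return self

    def predict(self, xs):
        return [self._label for _ in xs]


# Polymorphism: the same fit/predict calls work on either model type.
models = [MeanRegressor(), MajorityClassifier()]
predictions = [m.fit([1, 2, 3], [0, 1, 1]).predict([4]) for m in models]
print(predictions)
```

Because `Model` declares abstract methods, trying to instantiate it directly raises a `TypeError`, which forces every concrete model to provide both `fit` and `predict`.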
- Fundamentals of Deep Learning [http://perso.ens-lyon.fr/jacques.jayez/Cours/Implicite/Fundamentals_of_Deep_Learning.pdf, https://www.youtube.com/watch?v=aircAruvnKk, https://www.youtube.com/watch?v=VyWAvY2CF9c, ]
- Multi-Layer Perceptron: Deep Neural Networks [https://en.wikipedia.org/wiki/Multilayer_perceptron, https://www.simplilearn.com/tutorials/deep-learning-tutorial/multilayer-perceptron, ]
- Training an MLP: Backprop, Optimization Methods, Rules of Thumb [http://www.cnel.ufl.edu/courses/EEL6814/chapter3.pdf, https://web.stanford.edu/class/biods220/lectures/lecture2.pdf, ]
- Convolutional Neural Network for Computer Vision [https://www.coursera.org/lecture/convolutional-neural-networks/computer-vision-Ob1nR, https://insightsimaging.springeropen.com/articles/10.1007/s13244-018-0639-9, ]
- Recurrent Neural Network for Sequential Modelling [https://towardsdatascience.com/sequence-models-and-recurrent-neural-networks-rnns-62cadeb4f1e1, https://www.youtube.com/watch?v=qjrad0V0uJE, ]
- Dimensionality Reduction (PCA, SNE), Generative Models (GANs, VAE) [https://safeai-lab.github.io/TAIAT/2021/6-Generative-methods.pdf, https://www.youtube.com/watch?v=5WoItGTWV54, ]
- Reinforcement Learning [https://www.youtube.com/watch?v=lvoHnicueoE, https://www.youtube.com/watch?v=FgzM3zpZ55o&list=PLoROMvodv4rOSOPzutgyCTapiGlY2Nd8u, ]
Mini-Projects in Neural Networks (Computer Vision, Image Analytics, Video Analytics, Financial Analytics, NLP, Reinforcement Learning Stock Trader)
- Introduction to Big Data storage systems [https://link.springer.com/chapter/10.1007/978-3-319-21569-3_7, https://www.coursera.org/lecture/big-data-management/data-storage-RplBY, ]
- Introduction to Big Data processing platforms [https://www.xenonstack.com/blog/big-data-platform, https://www.youtube.com/watch?v=-AcIJQPWXAo, ]
- Deep Dive into Spark: RDD, Narrow, Wide Transformations [https://blog.knoldus.com/deep-dive-into-apache-spark-transformations-and-action/, https://sparkbyexamples.com/apache-spark-rdd/spark-rdd-transformations/, ]
- Deep Dive into Spark: Designing, implementing, evaluating, and validating computational and analytics applications using Spark [https://www.toptal.com/spark/introduction-to-apache-spark, https://spark.apache.org/examples.html, ]
- Fast Data Processing Platforms: Apache Storm [https://storm.apache.org/, https://www.youtube.com/watch?v=QoEyXKIKZKY, ]
Mini-Projects in Data Engineering (Process Movie Data using NoSQL Cassandra, Complex Analytics on network intrusion using PySpark, End-to-end PySpark Analytics on Tweets Data, ML on cloud)
- Time Series Models: Time Series for Business and Financial Data [http://home.ubalt.edu/ntsbarsh/stat-data/forecast.htm, https://www.youtube.com/watch?v=FPM6it4v8MY, ]
- Market Basket Analysis [https://www.analyticsvidhya.com/blog/2021/10/a-comprehensive-guide-on-market-basket-analysis/, https://www.youtube.com/watch?v=sVFmbBOXo7A, ]
- Portfolio Optimization [https://www.youtube.com/watch?v=9fjs8FeLMJk, https://towardsdatascience.com/understanding-portfolio-optimization-795668cef596, ]
- Customer Churn Analysis [https://www.netsuite.com/portal/resource/articles/human-resources/customer-churn-analysis.shtml?mc24943=v1, https://www.youtube.com/watch?v=6EmjRXUcARc, ]
- Data Analytics in Infectious Disease Spread [https://www.youtube.com/watch?v=MmQfotHuG5E, https://www.youtube.com/watch?v=hk7YJagKVzc, ]
Mini-Projects in Business Analytics (Market Basket Analysis, Bitcoin Forecasting, Air Quality Forecasting, Customer Churn Analysis)
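For the Market Basket Analysis mini-project, the three standard association-rule metrics (support, confidence, lift) are simple enough to compute by hand before reaching for a library. The sketch below uses made-up transactions and hypothetical function names, and it assumes the antecedent and consequent each appear in at least one transaction (otherwise the divisions fail).

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to the consequent's baseline support."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )


# Made-up shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print("support(bread, milk):   ", support(transactions, {"bread", "milk"}))
print("confidence(bread->milk):", confidence(transactions, {"bread"}, {"milk"}))
print("lift(bread->milk):      ", lift(transactions, {"bread"}, {"milk"}))
```

A lift below 1, as in this toy data, suggests that buying bread makes milk slightly less likely than its baseline rate; a lift above 1 would suggest a positive association.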
Projects you can do
- Real-time system for Tweet Analytics [https://www.kaggle.com/code/sandhyakrishnan02/nlp-with-disaster-tweets-using-lstm, ]
- Food Image Segmentation [https://www.kaggle.com/code/artgor/food-recognition-challenge-eda, ]
- Talent Retention and Attrition Prediction [https://www.analyticsvidhya.com/blog/2021/11/employee-attrition-prediction-a-comprehensive-guide/, ]
- Identification of Quora question pairs with the same intent [https://www.kaggle.com/c/quora-question-pairs, ]
- Stock Market predictions based on Time Series [https://www.kaggle.com/code/faressayah/stock-market-analysis-prediction-using-lstm, ]
- Prediction of Client Subscription to a Bank term Deposit [https://www.kaggle.com/code/aleksandradeis/bank-marketing-analysis, https://www.kaggle.com/code/sid321axn/prediction-of-term-deposit-using-catboost, ]
- Direct Retail Marketing efforts based on Customer Segmentation using ML based Clustering techniques [https://www.kaggle.com/code/azizozmen/customer-segmentation-cohort-rfm-analysis-k-means, ]
- Movie Recommendation System [https://www.kaggle.com/code/ibtesama/getting-started-with-a-movie-recommendation-system, https://www.kaggle.com/code/ayushimishra2809/movie-recommendation-system, ]
- Predict the future daily-demand for a large Logistics Company [https://www.kaggle.com/code/anushkaml/walmart-time-series-sales-forecasting, https://medium.com/@sheetalmk04/to-predict-the-future-daily-demand-for-a-large-logistics-company-61a9e10494ac, ]
- Achieving image super-resolution using a Generative Adversarial Network [https://www.kaggle.com/datasets/akhileshdkapse/super-image-resolution, https://www.kaggle.com/code/jesucristo/gan-introduction, ]
- Predictive Data Analytics [http://www.iraj.in/journal/journal_file/journal_pdf/3-174-143867230926-31.pdf, https://www.kaggle.com/datasets/iamsouravbanerjee/analytics-industry-salaries-2022-india, ]
- Urban Crime Data Analytics for safety improvement [https://www.mitre.org/publications/project-stories/smart-cities-use-analytics-to-improve-safety-and-well-being, https://www.kaggle.com/code/prashant111/decision-tree-classifier-tutorial, ]
- Breast Cancer classification from digitized FNA image feature measurements [https://www.frontiersin.org/articles/10.3389/fgene.2019.00080/full, https://towardsdatascience.com/breast-cancer-classification-using-support-vector-machine-svm-a510907d4878, https://www.kaggle.com/code/pierrelouisdanieau/breast-cancer-prediciton-ml-dl, ]
- Exploratory and Predictive Data Analytics using Indian Premier League (IPL) dataset [https://www.kaggle.com/code/amrutanshudash/ipl-data-analytics-08-19, ]
- Anomaly detection in Bearing Vibration Measurements [https://www.kaggle.com/code/victorambonati/unsupervised-anomaly-detection, ]
The following courses cover various topics from the syllabus above
· Machine Learning Specialization with Andrew Ng
· https://openlearninglibrary.mit.edu/courses/course-v1:MITx+6.036+1T2019/about
· https://ocw.mit.edu/courses/6-036-introduction-to-machine-learning-fall-2020/
· https://www.youtube.com/watch?v=qZ6TJc5bSKU