Module 1: Python Basics: Introduces Python, the tool used throughout the course for working with data
Introduction to Python
OOP: Object & Class
Serialization: Pickle Library
Variables
Lists
Tuples
Dictionary
Sets
List and Dictionary Comprehensions
Conditional Statements (if, if-else, elif)
Loops (For, While)
Functions
Lambda Function
Apply Function
Class Exercises
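A few of the topics above (comprehensions, lambda functions, pickle serialization) can be sketched in a handful of lines. The `prices` dictionary and its values are made up for illustration and are not part of the course material.

```python
import pickle

# Hypothetical sample data, invented for this sketch.
prices = {"apple": 1.20, "bread": 2.50, "milk": 0.99}

# List comprehension: names of items cheaper than 2.00.
cheap_items = [name for name, price in prices.items() if price < 2.00]

# Dictionary comprehension: apply a 10% discount to every price.
discounted = {name: round(price * 0.9, 2) for name, price in prices.items()}

# Lambda function used as a sort key: order item names by price.
by_price = sorted(prices, key=lambda name: prices[name])

# Serialization: turn the dictionary into bytes and restore it again.
restored = pickle.loads(pickle.dumps(discounted))

print(cheap_items)
print(by_price)
print(restored == discounted)
```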
Module 2: Python NUMPY Library: Used to perform a wide variety of mathematical operations on arrays
Array Characteristics
Array Creation (arange, linspace, flatten)
Array Indexing (Slicing)
Array Manipulation
Reshape
Concatenate
Append
Insert
Delete
Transpose
Class Exercises
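The array operations listed in this module can be tried out as follows. This is a small sketch that assumes NumPy is installed (it ships with Anaconda); the variable names are arbitrary.

```python
import numpy as np

a = np.arange(6)                          # integers 0 through 5
grid = np.linspace(0.0, 1.0, 5)           # 5 evenly spaced values, end points included
m = a.reshape(2, 3)                       # 2 rows, 3 columns
flat = m.flatten()                        # 1-D copy of the matrix
t = m.T                                   # transpose, shape (3, 2)
stacked = np.concatenate([m, m], axis=0)  # stack row-wise, shape (4, 3)
appended = np.append(a, [6, 7])           # new array with two extra values
inserted = np.insert(a, 1, 99)            # 99 placed at index 1
deleted = np.delete(a, 0)                 # new array without index 0
first_row = m[0, :]                       # slicing: the whole first row

print(m.shape, t.shape, stacked.shape)
```

Note that `append`, `insert`, and `delete` return new arrays and leave `a` unchanged.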
Module 3: Python PANDAS Library: Used for data manipulation, data cleaning, and data analysis
Series
Data Frames
Reading csv file
Subsetting / Filtering / Slicing Data
Dropping rows & columns
Adding/Deleting columns
Binning
Renaming columns or rows
Sorting
Data type conversions
Handling duplicates / missing values
Broadcasting
Group by Function
Map Function
Visualization (bar graph, histogram, box plot)
Merging (Inner, Left, Right, Outer)
EDA
Class Exercises
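Three of the operations above (group by, binning, merging) might look like this in practice. The sketch assumes pandas is installed; the `orders` and `customers` tables are invented sample data.

```python
import pandas as pd

# Hypothetical tables, invented for this sketch.
orders = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "amount": [120.0, 40.0, 60.0, 300.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 4],
    "city": ["Pune", "Delhi", "Mumbai"],
})

# Group by: total spend per customer.
totals = orders.groupby("customer_id", as_index=False)["amount"].sum()

# Binning: intervals are (0, 100] and (100, 1000], so 100 falls in "low".
totals["band"] = pd.cut(totals["amount"], bins=[0, 100, 1000], labels=["low", "high"])

# Merging: a left join keeps every row of `totals`, even without a matching city.
merged = totals.merge(customers, on="customer_id", how="left")

print(merged)
```

Changing `how` to `"inner"`, `"right"`, or `"outer"` gives the other merge types named in the list.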
Module 4: Python MATPLOTLIB Library: Data Visualization Part 1
Bar Plot
Stacked Bar Plot
Histogram
Line Chart
Box plot
Pie-Chart
Class Exercises
Module 5: Python SEABORN Library: Data Visualization Part 2
Bar Plot
Histogram
Pairwise Plots: Joint Plot, Pair Plot
Categorical Scatter Plot: Strip-plot, Swarm-plot
Box-Plot
Violin Plot
Cat Plot
Facet Grid
Pair Grid
Line Plot
Class Exercises
Module 6: Basic Statistics: For business analysis
Type of Data
Statistics
Type of Statistics
Descriptive Statistics
Mean, Median, Mode (Measures of Central Tendency)
Standard Deviation, Variance (Measures of Dispersion)
Normal Distribution
Standard Normal Distribution
Standard Error
Sampling
Probability
Class Exercises
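The descriptive measures in this module can be computed with Python's built-in `statistics` module. The `sample` values below are made up for illustration.

```python
import math
import statistics

# Hypothetical sample, invented for this sketch.
sample = [12, 15, 15, 18, 20]

# Measures of central tendency.
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)

# Measures of dispersion (sample versions, dividing by n - 1).
variance = statistics.variance(sample)
std_dev = statistics.stdev(sample)

# Standard error of the mean: standard deviation divided by sqrt(n).
std_error = std_dev / math.sqrt(len(sample))

print(mean, median, mode, variance, round(std_error, 3))
```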
Module 7: Advanced Statistics: For business analysis
Confidence Interval
T-Test & Z-Test
P-value
Hypothesis Testing
Type I Error & Type II Error
Chi-Square Test
ANOVA
Covariance
Correlation
Class Exercises
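As a rough sketch of three quantities from this module, the functions below compute a sample covariance, a Pearson correlation, and a one-sample t statistic from their textbook definitions. The function names are illustrative, and in practice a statistics library would usually be used instead.

```python
import math
import statistics

def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))

def one_sample_t(sample, mu0):
    """t statistic for the null hypothesis that the population mean is mu0."""
    std_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu0) / std_error

# Hypothetical data, invented for this sketch.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(covariance(x, y), round(correlation(x, y), 4))
print(round(one_sample_t([12, 15, 15, 18, 20], mu0=14), 3))
```

The t statistic is then compared against a t distribution with n - 1 degrees of freedom to obtain a p-value.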
Module 8: Machine Learning
Supervised
Unsupervised
Module 9: Supervised Machine Learning: Linear Regression (Used for business problems where a numeric value has to be predicted)
Introduction
Assumptions (Linearity, Homoscedasticity, Multivariate Normality, etc.)
Data Preparation (Outlier Treatment, Missing Value Imputation)
Building Linear Regression Model
Understanding model metrics (p-value, R-squared, Adjusted R-squared, etc.)
Multicollinearity (VIF)
Model Validation (MAPE, RMSE)
Case study
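To make the model-building and validation steps concrete, here is a minimal single-feature sketch: an ordinary least squares line fit, plus the RMSE and MAPE metrics named above. The function names and data are illustrative, and `mape` assumes no actual value is zero.

```python
import math

def fit_line(x, y):
    """Least squares slope and intercept for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical data, invented for this sketch.
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
slope, intercept = fit_line(x, y)
predicted = [slope * xi + intercept for xi in x]
print(round(slope, 3), round(intercept, 3), round(rmse(y, predicted), 3))
```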
Module 10: Supervised Machine Learning: Logistic Regression (Used for binary classification business problems)
Introduction
Linear Regression Vs. Logistic Regression
Data Preparation (Outlier Treatment, Missing Value Imputation,
Dummy Variable Creation)
Building Logistic Regression Model
Understanding model metrics (p-value)
Multicollinearity (VIF)
Model Validation (Confusion Matrix, ROC curve, AUC, etc.)
Case study
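The confusion matrix used in model validation can be built by hand for a binary classifier, as in the sketch below. The labels are made up, and the helper name `confusion_counts` is illustrative.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fp, fn, tn

# Hypothetical labels, invented for this sketch.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that were found

print(tp, fp, fn, tn, accuracy, precision, recall)
```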
Module 11: Supervised Machine Learning: Decision Trees (Used for multi-class classification business problems & regression business problems)
Introduction
Types
Entropy, Gini Index, Chi-Square
Overfitting
Pruning
Cross-Validation
Case study
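Two of the split criteria listed above, entropy and the Gini index, measure how mixed the class labels in a node are. A small sketch from the standard definitions (function names are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 - sum(p^2) over the class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

mixed = ["yes", "yes", "no", "no"]   # evenly split node
pure = ["yes", "yes", "yes"]         # single-class node

print(entropy(mixed), gini(mixed))
print(abs(entropy(pure)), gini(pure))
```

Both measures are zero for a pure node and largest when classes are evenly mixed; a tree picks the split that reduces impurity the most.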
Module 12: Supervised Machine Learning: Ensemble (Used for multi-class classification business problems & regression business problems)
Introduction
Bagging
Random forest
Boosting
Gradient Boosting Machines (GBM)
Case study
Module 13: Supervised Machine Learning: KNN (Used for multi-class classification business problems & regression business problems)
Introduction
Working of KNN
Optimal value of K
Case study
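The working of KNN can be sketched without any library: measure the distance from the query point to every training point, take the k closest, and return the majority label. The function name and data below are illustrative, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2-D training data, invented for this sketch.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (6, 5)))
```

In practice k is usually chosen by comparing validation accuracy across several candidate values.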
Module 14: Unsupervised Machine Learning: Clustering (Used for segmenting data points into different groups)
Introduction
K-Means Clustering
Cluster Evaluation and Profiling
Case study
Module 15: Unsupervised Machine Learning: PCA (Used for reducing the number of features in high-dimensional data)
Introduction
Curse of dimensionality
Process of working
Case study
Module 16: Unsupervised Machine Learning: Isolation Forest (Used for anomaly detection business problems)
Introduction
Contamination Factor
Case study
Module 17: Time Series Forecasting: Used for inventory planning or forecasting business problems
Introduction
Time Series Components: Trend, Seasonality, Cyclicity
Smoothing Techniques: Moving Averages, Exponential Smoothing
ARIMA
Accuracy
Case study
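The two smoothing techniques named in this module can be written in a few lines each. This sketch shows a simple moving average and simple exponential smoothing; the function names, the window size, and the `alpha` value are illustrative choices.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    smoothed = [series[0]]  # start from the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical monthly demand, invented for this sketch.
demand = [10, 20, 30, 40, 50]
print(moving_average(demand, window=3))
print(exponential_smoothing(demand, alpha=0.5))
```

A larger `alpha` weights recent observations more heavily, so the smoothed series reacts faster to change.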
Module 18: Text Analytics: Used for text-mining business problems that involve unstructured data
Introduction
Text Pre-processing
Noise Removal
Lemmatization
Stemming
Feature Engineering on Text Data
Bag of words
TF-IDF
Case study
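The bag-of-words and TF-IDF features can be sketched by hand. The version below uses one common unsmoothed variant, tf = count / document length and idf = log(N / document frequency); libraries may use a differently smoothed formula, so the numbers will not necessarily match theirs. The documents are made up.

```python
import math
from collections import Counter

# Hypothetical documents, invented for this sketch.
documents = ["the cat sat", "the dog sat", "the dog barked"]

# Minimal pre-processing: lowercase and split on whitespace.
tokenized = [doc.lower().split() for doc in documents]

# Bag of words: word counts per document.
bags = [Counter(tokens) for tokens in tokenized]

# Document frequency: in how many documents each word appears.
doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
n_docs = len(documents)

# TF-IDF per document (unsmoothed variant, an assumption of this sketch).
tfidf = [
    {word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
     for word, count in bag.items()}
    for tokens, bag in zip(tokenized, bags)
]

print(bags[0])
print(tfidf[0])
```

A word that appears in every document, such as "the" here, gets a weight of zero, which is why TF-IDF downplays very common words.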
Module 19: AI: Deep Learning, Keras
Introduction: Deep Learning
Deep Learning vs Machine learning
Neural Networks
Activation Functions, hidden layers, hidden units
Backpropagation
Vanishing Gradient Problem
Exploding Gradient Problem
Perceptron & Multi-layer Perceptron
Case study
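As a small illustration of the perceptron topic, the sketch below trains a single perceptron with the classic perceptron learning rule on the logical AND function. It uses plain Python rather than Keras, and the learning rate and epoch count are arbitrary choices for this toy problem.

```python
def step(z):
    """Step activation: fire (1) only when the weighted sum is positive."""
    return 1 if z > 0 else 0

def train_perceptron(samples, targets, learning_rate=1.0, epochs=20):
    """Perceptron rule: nudge weights by learning_rate * error * input."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = target - step(z)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

# Truth table for logical AND, a linearly separable problem.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [0, 0, 0, 1]

weights, bias = train_perceptron(inputs, and_targets)
predictions = [step(sum(w * xi for w, xi in zip(weights, x)) + bias) for x in inputs]
print(predictions)
```

A single perceptron can only separate classes with a straight line; problems such as XOR need the hidden layers of a multi-layer perceptron.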
Module 20: Model Deployment: Using a trained model to predict outputs for new input values
Flask
Case study
Capstone Project at the end of the course
Course Duration: 50 hours
Availability: 2 hours per day (6 days a week)
Laptop Requirement: Any laptop with 64 GB RAM
Software Requirement: Install the latest version of Anaconda
Recommended Certifications: IBM Data Science Professional Certificate, Data Science Council of America (DASCA) Senior Data Scientist (SDS), AWS Certified Machine Learning – Specialty, AWS Certified Data Analytics – Specialty, Azure Data Scientist Associate
Objective: The primary objective of these training sessions is to equip the participants with the necessary skills and knowledge to excel in data-driven decision-making, exploratory data analysis, predictive modeling, and machine learning techniques. By mastering these disciplines, your organization will gain a competitive edge and leverage the power of data to drive innovation and make informed business decisions.
Session Details:
1. Data Science Fundamentals (12 hours)
Introduction to data science concepts
Exploratory data analysis techniques
Data visualization using Python libraries
2. Data Analytics with Python (18 hours)
Data preprocessing and cleaning
Statistical analysis and hypothesis testing
Advanced data visualization techniques
Introduction to SQL and data querying
3. Machine Learning using Python (30 hours)
Supervised and unsupervised learning algorithms
Model evaluation and validation techniques
Feature selection and engineering
Ensemble methods and model deployment
Certification: To validate the knowledge and skills acquired, participants can also pursue the following certifications:
Microsoft Certified: Python Developer Associate
Python Institute Certifications (PCAP, PCPP, PCEP)
IBM Data Science Professional Certificate
Microsoft Certified: Azure Data Scientist Associate
Microsoft Certified: Azure Data Analyst Associate
Google Data Analytics Professional Certificate
These certificates will serve as a testament to your participants' expertise in their respective areas and can be used for career advancement and professional growth.