Data Science Training

Trainer 1

Module 1: Python Basics: Introduces Python, the tool used throughout the course for working with data

 Introduction to Python

 OOP: Object & Class

 Serialization: Pickle Library






 List and Dictionary Comprehensions

 Conditional Statements (if, if-else, elif)

 Loops (For, While)


 Lambda Function

 Apply Function

Class Exercises
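
A class exercise for this module might look like the sketch below. It is only an illustration of the listed topics (class, pickle, comprehensions, lambda); the `Student` class and all values are hypothetical and not taken from the course material.

```python
import pickle


# Hypothetical class, used only to illustrate "OOP: Object & Class"
class Student:
    def __init__(self, name, marks):
        self.name = name
        self.marks = marks

    def average(self):
        return sum(self.marks) / len(self.marks)


student = Student("Asha", [70, 80, 90])

# Serialization: round-trip a plain dictionary through pickle (in memory)
record = {"name": student.name, "marks": student.marks}
restored = pickle.loads(pickle.dumps(record))

# List and dictionary comprehensions
squares = [n * n for n in range(5)]
passed = {mark: mark >= 75 for mark in student.marks}

# Lambda function used as a sort key (sort by last digit)
by_last_digit = sorted([23, 11, 35], key=lambda n: n % 10)
```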

Module 2: Python NumPy Library: Used to perform a wide variety of mathematical operations on arrays

 Array Characteristics

 Array Creation (arange, linspace, flatten)

 Array Indexing (Slicing)

 Array Manipulation







 Class Exercises
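
The array topics listed in this module can be sketched as follows. This is a minimal illustration assuming NumPy is installed (it ships with Anaconda); the variable names and values are hypothetical.

```python
import numpy as np

# Array creation
evens = np.arange(0, 10, 2)         # values 0, 2, 4, 6, 8
grid = np.linspace(0.0, 1.0, 5)     # 5 evenly spaced values from 0 to 1 inclusive
matrix = np.arange(6).reshape(2, 3)  # 2 x 3 array holding 0..5

# Array characteristics
shape = matrix.shape                # (rows, columns)
dtype = matrix.dtype                # element type

# Array indexing (slicing): first column of every row
first_col = matrix[:, 0]

# Array manipulation: flatten to a 1-D copy, transpose to 3 x 2
flat = matrix.flatten()
transposed = matrix.T
```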

Module 3: Python pandas Library: Used for data manipulation, data cleaning, and data analysis


 Data Frames

 Reading CSV files

 Subsetting / Filtering / Slicing Data

 Dropping rows & columns

 Adding/Deleting columns


 Renaming columns or rows


 Data type conversions

 Handling duplicates / missing values


 Group by Function

 Map Function

 Visualization (bar graph, histogram, box plot)

 Merging (Inner, Left, Right, Outer)


Class Exercises
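
The four merge types named in this module can be compared on two small tables. The sketch below assumes pandas is installed; the `customers` and `orders` frames are made-up data for illustration only.

```python
import pandas as pd

# Hypothetical tables sharing a "customer_id" key
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame({"customer_id": [2, 3, 4], "amount": [250, 100, 75]})

# Inner: only keys present in both tables (2 and 3)
inner = customers.merge(orders, on="customer_id", how="inner")

# Left: every customer, with missing amounts where there is no order
left = customers.merge(orders, on="customer_id", how="left")

# Right: every order, with missing names where there is no customer
right = customers.merge(orders, on="customer_id", how="right")

# Outer: union of keys from both tables (1, 2, 3, 4)
outer = customers.merge(orders, on="customer_id", how="outer")

# Handling missing values: count customers without an order
customers_without_order = int(left["amount"].isna().sum())
```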

Module 4: Python Matplotlib Library: Data Visualization Part 1

 Bar Plot

 Stacked Bar Plot


 Line Chart

 Box plot


Class Exercises

Module 5: Python Seaborn Library: Data Visualization Part 2

 Bar Plot


 Pairwise Plots: Joint Plot, Pair Plot

 Categorical Scatter Plot: Strip Plot, Swarm Plot


 Violin Plot

 Cat Plot

 Facet Grid

 Pair Grid

 Line Plot

Class Exercises

Module 6: Basic Statistics: For business analysis

 Types of Data


 Types of Statistics

 Descriptive Statistics

 Mean, Median, Mode (Measures of Central Tendency)

 Standard Deviation, Variance (Measures of Dispersion)

 Normal Distribution

 Standard Normal Distribution

 Standard Error



Class Exercises
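
The descriptive measures listed in this module can be computed with Python's standard library alone. The sketch below uses a hypothetical sample and assumes the standard error of the mean is taken as the sample standard deviation divided by the square root of the sample size.

```python
import math
import statistics

# Hypothetical sample of five observations
sample = [12, 15, 15, 18, 20]

# Measures of central tendency
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)

# Measures of dispersion (sample versions, dividing by n - 1)
variance = statistics.variance(sample)
std_dev = statistics.stdev(sample)

# Standard error of the mean: s / sqrt(n)
std_error = std_dev / math.sqrt(len(sample))
```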

Module 7: Advanced Statistics: For business analysis

 Confidence Interval

 T-Test & Z-Test


 Hypothesis Testing

 Type I Error & Type II Error

 Chi-Square Test




Class Exercises

Module 8: Machine Learning



Module 9: Supervised Machine Learning: Linear Regression (Used to solve business problems where a numeric value has to be predicted)


 Assumptions (Linearity, Heteroskedasticity, Multivariate Normality,


 Data Preparation (Outlier Treatment, Missing Value Imputation)

 Building Linear Regression Model

 Understanding model metrics (p-value, R-squared / Adjusted R-squared, etc.)

 Multicollinearity (VIF)

 Model Validation (MAPE, RMSE)

 Case study
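
The two validation metrics named above, RMSE and MAPE, can be sketched in plain Python. The `actual` and `predicted` values are hypothetical, and the function names are chosen here for illustration; they are not from any specific library.

```python
import math

# Hypothetical actual and predicted values from a regression model
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared error."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero.
    """
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(y_true)


model_rmse = rmse(actual, predicted)
model_mape = mape(actual, predicted)
```

RMSE is expressed in the units of the target and penalizes large errors more heavily, while MAPE is a relative measure that is easier to compare across targets of different scale.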

Module 10: Supervised Machine Learning: Logistic Regression (Used for binary classification business problems)


 Linear Regression Vs. Logistic Regression

 Data Preparation (Outlier Treatment, Missing Value Imputation, Dummy Variable Creation)

 Building Logistic Regression Model

 Understanding model metrics (p-value)

 Multicollinearity (VIF)

 Model Validation (Confusion Matrix, ROC curve, AUC, etc.)

 Case study
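
As a rough illustration of the confusion matrix used in model validation, the sketch below counts the four cells for a binary classifier and derives accuracy, precision, and recall. The labels are hypothetical, with 1 taken as the positive class.

```python
# Hypothetical true labels and model predictions (1 = positive class)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(y_true, y_pred))

# The four cells of the binary confusion matrix
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that are found
```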

Module 11: Supervised Machine Learning: Decision Trees (Used for multi-class classification business problems & regression business problems)




 Entropy, Gini Index, Chi-Square



 Cross-Validation

 Case study

Module 12: Supervised Machine Learning: Ensemble (Used for multi-class classification business problems & regression business problems)




 Random Forest


 Gradient Boosting Machines (GBM)

 Case study

Module 13: Supervised Machine Learning: KNN (Used for multi-class classification business problems & regression business problems)



 Working of KNN

 Optimal value of K

 Case study

Module 14: Unsupervised Machine Learning: Clustering (Used for segmenting data points into different groups)


 K-Means Clustering

 Cluster Evaluation and Profiling

 Case study

Module 15: Unsupervised Machine Learning: PCA (Used for reducing the number of dimensions in a dataset)


 Curse of dimensionality

 Process of working

 Case study

Module 16: Unsupervised Machine Learning: Isolation Forest (Used for anomaly detection business problems)


 Contamination Factor

 Case study

Module 17: Time Series Forecasting: Used for inventory planning or forecasting business problems


 Time Series Components: Trend, Seasonality, Cyclicity

 Smoothing Techniques: Moving Averages, Exponential



 Case study

Module 18: Text Analytics: Used for text mining business problems that involve unstructured data


 Text Pre-processing

 Noise Removal



 Feature Engineering on Text Data

 Bag of words


 Case study

Module 19: AI: Deep Learning, Keras

 Introduction: Deep Learning

 Deep Learning vs Machine learning

 Neural Networks

 Activation Functions, hidden layers, hidden units


 Vanishing Gradient Problem

 Exploding Gradient Problem

 Perceptron & Multi-layer Perceptron

 Case study

Module 20: Model Deployment: Using a trained model to predict outputs for new input values


 Case study

Capstone Project at the end of the course

Course Duration: 50 hours

Availability: 2 hours per day (6 days a week)

Laptop Requirement: Any laptop with 64 GB RAM

Software Requirement: Install the latest version of Anaconda

Recommended Certifications: IBM Data Science Professional Certificate, Data Science Council of America (DASCA) Senior Data Scientist (SDS), AWS Certified Machine Learning – Specialty, AWS Certified Data Analytics – Specialty, Azure Data Scientist Associate

Trainer 2

Objective: The primary objective of these training sessions is to equip the participants with the necessary skills and knowledge to excel in data-driven decision-making, exploratory data analysis, predictive modeling, and machine learning techniques. By mastering these disciplines, your organization will gain a competitive edge and leverage the power of data to drive innovation and make informed business decisions.

Session Details:

1. Data Science Fundamentals (12 hours)

 Introduction to data science concepts

 Exploratory data analysis techniques

 Data visualization using Python libraries

2. Data Analytics with Python (18 hours)

 Data preprocessing and cleaning

 Statistical analysis and hypothesis testing

 Advanced data visualization techniques

 Introduction to SQL and data querying

3. Machine Learning using Python (30 hours)

 Supervised and unsupervised learning algorithms

 Model evaluation and validation techniques

 Feature selection and engineering

 Ensemble methods and model deployment

Certification: To validate the acquired knowledge and skills, participants can also pursue the following certifications:

 Microsoft Certified: Python Developer Associate

 Python Institute Certifications (PCAP, PCPP, PCEP)

 IBM Data Science Professional Certificate

 Microsoft Certified: Azure Data Scientist Associate

 Microsoft Certified: Azure Data Analyst Associate

 Google Data Analytics Professional Certificate

These certificates will serve as a testament to your participants' expertise in their respective areas and can be used for career advancement and professional growth.

Trainer 3