Data Science Training

Trainer 1

Module 1: Python Basics: Learn Python, the language used throughout the course for working with data

 Introduction to Python

 OOP: Object & Class

 Serialization: Pickle Library

 Variables

 Lists

 Tuples

 Dictionary

 Sets

 List and Dictionary Comprehensions

 Conditional Statements (if, if-else, elif)

 Loops (For, While)

 Functions

 Lambda Function

 Apply Function

Class Exercises
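As an illustration of how several Module 1 topics fit together, here is a minimal stdlib-only sketch. The `Customer` class and its data are hypothetical and not part of the course material. It shows a class with a method, list/dictionary/set comprehensions, a lambda used as a sort key, a loop, and a pickle round trip.

```python
import pickle


class Customer:
    """Hypothetical example class: each object bundles data (attributes) with behaviour (methods)."""

    def __init__(self, name, city, spend):
        self.name = name
        self.city = city
        self.spend = spend

    def is_high_value(self, threshold=1000.0):
        # Conditional logic inside a method
        return self.spend >= threshold


# Hypothetical sample data
customers = [
    Customer("Asha", "Pune", 1500.0),
    Customer("Ravi", "Delhi", 400.0),
    Customer("Meera", "Pune", 2200.0),
]

# List comprehension with a condition
high_value_names = [c.name for c in customers if c.is_high_value()]

# Dictionary comprehension: name -> spend
spend_by_name = {c.name: c.spend for c in customers}

# Set comprehension: unique cities
cities = {c.city for c in customers}

# Lambda function used as a sort key (highest spend first)
ranked_names = [c.name for c in sorted(customers, key=lambda c: c.spend, reverse=True)]

# A plain for loop accumulating a total
total_spend = 0.0
for c in customers:
    total_spend += c.spend

# Serialization: pickle a plain dict to bytes and load it back unchanged
restored = pickle.loads(pickle.dumps(spend_by_name))

print(high_value_names)  # ['Asha', 'Meera']
print(ranked_names)      # ['Meera', 'Asha', 'Ravi']
print(total_spend)       # 4100.0
```

The example pickles a plain dictionary rather than the `Customer` objects. Unpickling custom objects requires their class to be importable from the same module path.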


Module 2: Python NumPy Library: Used to perform a wide variety of mathematical operations on arrays

 Array Characteristics

 Array Creation (arange, linspace, flatten)

 Array Indexing (Slicing)

 Array Manipulation

 Reshape


 Concatenate

 Append

 Insert

 Delete

 Transpose

 Class Exercises


Module 3: Python Pandas Library: Used for data manipulation, data cleaning, and data analysis

 Series

 Data Frames

 Reading csv file

 Subsetting / Filtering / Slicing Data

 Dropping rows & columns

 Adding/Deleting columns

 Binning

 Renaming columns or rows

 Sorting

 Data type conversions

 Handling duplicate and missing values

 Broadcasting

 Group by Function

 Map Function

 Visualization (bar graph, histogram, box plot)

 Merging (Inner, Left, Right, Outer)

 EDA (Exploratory Data Analysis)

Class Exercises


Module 4: Python Matplotlib Library: Data Visualization Part 1

 Bar Plot

 Stacked Bar Plot

 Histogram

 Line Chart

 Box plot

 Pie Chart

Class Exercises


Module 5: Python Seaborn Library: Data Visualization Part 2

 Bar Plot

 Histogram

 Pairwise Plots: Joint Plot, Pair Plot

 Categorical Scatter Plots: Strip Plot, Swarm Plot

 Box Plot

 Violin Plot

 Cat Plot

 Facet Grid

 Pair Grid

 Line Plot

Class Exercises


Module 6: Basic Statistics for Business Analysis

 Types of Data

 Statistics

 Types of Statistics

 Descriptive Statistics


 Mean, Median, Mode (Measures of Central Tendency)

 Standard Deviation, Variance (Measures of Dispersion)

 Normal Distribution

 Standard Normal Distribution

 Standard Error

 Sampling

 Probability

Class Exercises
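To show how the descriptive measures in this module are computed, here is a minimal sketch using only Python's built-in `statistics` module. The order counts are hypothetical. It computes central tendency, dispersion, the standard error of the mean (s / sqrt(n)), and a z-score with its standard normal probability.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample: daily orders over ten days
orders = [12, 15, 11, 15, 18, 14, 15, 20, 13, 17]

# Measures of central tendency
mean = statistics.mean(orders)
median = statistics.median(orders)
mode = statistics.mode(orders)

# Measures of dispersion (sample versions, n - 1 in the denominator)
variance = statistics.variance(orders)
std_dev = statistics.stdev(orders)

# Standard error of the mean: s / sqrt(n)
std_error = std_dev / math.sqrt(len(orders))

# Standard normal distribution: z-score of one day with 20 orders, and P(Z <= z)
z = (20 - mean) / std_dev
prob_below = NormalDist().cdf(z)

print(mean, median, mode)        # 15 15.0 15
print(round(variance, 3))        # 7.556
print(round(prob_below, 3))
```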


Module 7: Advanced Statistics for Business Analysis

 Confidence Interval

 T-Test & Z-Test

 P-value

 Hypothesis Testing

 Type I Error & Type II Error

 Chi-Square Test

 ANOVA

 Covariance

 Correlation

Class Exercises
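Several of these topics can be tied together in one small calculation. The sketch below is stdlib-only and uses hypothetical data. It builds a 95% confidence interval, runs a two-sided one-sample z-test with its p-value, and computes covariance and Pearson correlation from their definitions. It assumes the population standard deviation is known (sigma = 10, a made-up value), which is the setting where a z-test applies. With a small sample and an estimated sigma, a t-test would normally be used instead.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical weekly data: advertising spend (x) and sales (y)
x = [10, 12, 14, 16, 18, 20]
y = [40, 45, 49, 55, 58, 65]

n = len(y)
y_bar = mean(y)

# Assumption: population standard deviation of y is known (hypothetical sigma = 10)
sigma = 10
se = sigma / math.sqrt(n)  # standard error of the mean

# 95% confidence interval for the mean of y
z_crit = NormalDist().inv_cdf(0.975)  # approximately 1.96
ci_low, ci_high = y_bar - z_crit * se, y_bar + z_crit * se

# One-sample, two-sided z-test of H0: population mean of y equals 50
z_stat = (y_bar - 50) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
reject_h0 = p_value < 0.05  # significance level alpha = 0.05

# Sample covariance and Pearson correlation from their definitions
x_bar = mean(x)
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
corr_xy = cov_xy / (stdev(x) * stdev(y))

print(round(ci_low, 2), round(ci_high, 2), round(p_value, 3), reject_h0)
print(cov_xy, round(corr_xy, 3))
```

Here the p-value is well above 0.05, so H0 is not rejected. Consistently with that, 50 lies inside the 95% interval.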


Module 8: Machine Learning

 Supervised Learning

 Unsupervised Learning


Module 9: Supervised Machine Learning: Linear Regression (used for business problems that require predicting a numeric value)


 Introduction

 Assumptions (Linearity, Homoscedasticity, Multivariate Normality, etc.)

 Data Preparation (Outlier Treatment, Missing Value Imputation)

 Building Linear Regression Model


 Understanding model metrics (p-value, R-squared/Adjusted R-squared, etc.)


 Multicollinearity (VIF)

 Model Validation (MAPE, RMSE)

 Case study
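The model metrics listed above can be computed by hand for a single predictor. The sketch below is stdlib-only and uses hypothetical data. It fits ordinary least squares (slope = Sxy / Sxx), then computes R-squared, adjusted R-squared, RMSE, and MAPE. A real case study would more likely use a library such as statsmodels or scikit-learn. This version only shows what those numbers mean.

```python
import math

# Hypothetical data: x = marketing spend, y = revenue
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Ordinary least squares with one predictor: slope = Sxy / Sxx, intercept = y_bar - slope * x_bar
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

predictions = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predictions)]

# R-squared: share of the variance in y explained by the model
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

# Adjusted R-squared penalises additional predictors (p = 1 here)
p = 1
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Validation metrics
rmse = math.sqrt(ss_res / n)
mape = 100 * sum(abs(r / yi) for r, yi in zip(residuals, y)) / n  # in percent

print(round(slope, 2), round(intercept, 2))  # 1.97 1.09
print(round(r_squared, 4), round(adj_r_squared, 4), round(rmse, 4), round(mape, 2))
```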


Module 10: Supervised Machine Learning: Logistic Regression (used for binary classification business problems)

 Introduction

 Linear Regression Vs. Logistic Regression

 Data Preparation (Outlier Treatment, Missing Value Imputation, Dummy Variable Creation)

 Building Logistic Regression Model

 Understanding model metrics (p-value)

 Multicollinearity (VIF)

 Model Validation (Confusion Matrix, ROC Curve, AUC, etc.)

 Case study
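The validation step turns predicted probabilities into a confusion matrix and summary scores. The stdlib-only sketch below uses hypothetical labels and probabilities with a 0.5 cut-off, which is also an assumption. It derives accuracy, precision, and recall from the matrix. It then computes AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative one, with ties counted as half. This equals the area under the ROC curve.

```python
# Hypothetical actual labels and model-predicted probabilities of class 1
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.3, 0.65, 0.4, 0.2, 0.55, 0.8, 0.1]

threshold = 0.5  # assumed classification cut-off
y_pred = [1 if p >= threshold else 0 for p in y_prob]

# Confusion matrix cells
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)  # also called sensitivity or true positive rate

# AUC via pairwise ranking: share of (positive, negative) pairs ranked correctly
pos = [p for t, p in zip(y_true, y_prob) if t == 1]
neg = [p for t, p in zip(y_true, y_prob) if t == 0]
pairs = [(pp, pn) for pp in pos for pn in neg]
auc = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0 for pp, pn in pairs) / len(pairs)

print(tp, tn, fp, fn)                 # 3 3 1 1
print(accuracy, precision, recall)    # 0.75 0.75 0.75
print(auc)                            # 0.9375
```

The confusion matrix depends on the chosen threshold, but AUC does not. That is why both are usually reported.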


Module 11: Supervised Machine Learning: Decision Trees (used for multi-class classification and regression business problems)

 Introduction

 Types

 Entropy, Gini Index, Chi-Square


 Overfitting

 Pruning

 Cross-Validation

 Case study
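Entropy and the Gini index are the impurity measures a tree uses to choose splits. The stdlib-only sketch below computes both for a hypothetical node and for one candidate split. A split is scored by how much it lowers the size-weighted impurity of the child nodes, which for entropy is the information gain.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 - sum(p^2) over class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_impurity(groups, measure):
    """Impurity of a split: size-weighted average impurity of the child nodes."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * measure(g) for g in groups)


# Hypothetical parent node (5 "yes", 5 "no") and one candidate split on some feature
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

info_gain = entropy(parent) - weighted_impurity([left, right], entropy)
gini_drop = gini(parent) - weighted_impurity([left, right], gini)

print(round(entropy(parent), 3), round(gini(parent), 3))  # 1.0 0.5
print(round(info_gain, 4), round(gini_drop, 4))           # 0.2781 0.18
```

Growing a tree until every leaf is pure drives these impurities to zero on the training data. That is the overfitting that pruning and cross-validation are meant to control.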


Module 12: Supervised Machine Learning: Ensemble Methods (used for multi-class classification and regression business problems)

 Introduction

 Bagging

 Random forest

 Boosting

 Gradient Boosting Machines (GBM)

 Case study


Module 13: Supervised Machine Learning: KNN (used for multi-class classification and regression business problems)

 Introduction

 Working of KNN

 Optimal value of K

 Case study
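The KNN steps above are short enough to write out directly. The stdlib-only sketch below assumes Python 3.8+ for `math.dist` and uses hypothetical 2-D data. It classifies a point by majority vote among its k nearest neighbours. It then compares candidate values of K with leave-one-out accuracy, which is one simple way to pick K. In practice, k-fold cross-validation with scikit-learn is more common.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points (Euclidean distance)."""
    by_distance = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


def loo_accuracy(points, labels, k):
    """Leave-one-out accuracy: predict each point from all the other points."""
    hits = 0
    for i in range(len(points)):
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        hits += knn_predict(rest_points, rest_labels, points[i], k) == labels[i]
    return hits / len(points)


# Hypothetical 2-D training data with two well-separated classes
train = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
labels = ["A", "A", "A", "B", "B", "B"]

new_label = knn_predict(train, labels, (1.5, 1.5), k=3)

# Compare candidate values of K
scores = {k: loo_accuracy(train, labels, k) for k in (1, 3, 5)}

print(new_label)  # A
print(scores)     # {1: 1.0, 3: 1.0, 5: 0.0}
```

With only three points per class, K = 5 lets the other class outvote a point's own neighbours. This is why K is tuned rather than set as large as possible.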


Module 14: Unsupervised Machine Learning: Clustering (used for segmenting data points into different groups)

 Introduction

 K-Means Clustering

 Cluster Evaluation and Profiling

 Case study
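K-means alternates between two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The stdlib-only sketch below (Lloyd's algorithm, Python 3.8+) uses hypothetical 2-D points and caller-supplied starting centroids so the result is deterministic. Libraries such as scikit-learn instead choose starts automatically, for example with k-means++. Cluster sizes and the within-cluster sum of squares (WCSS) are shown as simple evaluation and profiling outputs.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means (Lloyd's algorithm) starting from the given centroids."""
    centroids = list(initial_centroids)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, clusters


def wcss(centroids, clusters):
    """Within-cluster sum of squared distances to each cluster's centroid."""
    return sum(
        sum((pi - ci) ** 2 for pi, ci in zip(p, centroids[j]))
        for j, c in enumerate(clusters)
        for p in c
    )


# Hypothetical customer data, e.g. (visits, spend) on some scaled axis
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, clusters = kmeans(points, initial_centroids=[(1, 1), (9, 9)])

# Simple profiling: size and centre of each segment, plus overall compactness
for j, (centre, members) in enumerate(zip(centroids, clusters)):
    print(f"cluster {j}: size={len(members)}, centre={centre}")
print("WCSS:", wcss(centroids, clusters))  # WCSS: 2.0
```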


Module 15: Unsupervised Machine Learning: PCA (used for dimensionality reduction)

 Introduction

 Curse of dimensionality

 How PCA works

 Case study


Module 16: Unsupervised Machine Learning: Isolation Forest (used for anomaly detection business problems)

 Introduction

 Contamination Factor

 Case study


Module 17: Time Series Forecasting: Used for inventory planning and other forecasting business problems

 Introduction

 Time Series Components: Trend, Seasonality, Cyclicity

 Smoothing Techniques: Moving Averages, Exponential Smoothing

 ARIMA

 Forecast Accuracy

 Case study
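The two smoothing techniques and an accuracy check can be sketched in a few lines of stdlib Python. The sales figures and the smoothing constant alpha = 0.5 below are hypothetical. The smoothed value at time t-1 serves as the one-step-ahead forecast for time t, and those forecasts are scored with MAPE. ARIMA would normally come from a library such as statsmodels and is not shown here.

```python
def moving_average(series, window):
    """Trailing simple moving average; the first window-1 positions have no value (None)."""
    return [
        sum(series[i - window + 1 : i + 1]) / window if i >= window - 1 else None
        for i in range(len(series))
    ]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * y_t + (1 - alpha) * s_(t-1), with s_0 = y_0."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly sales
sales = [100, 120, 110, 130, 125, 140]

ma3 = moving_average(sales, 3)
ses = exponential_smoothing(sales, alpha=0.5)

# Accuracy: use s_(t-1) as the forecast for y_t and score with MAPE (%)
forecasts, actuals = ses[:-1], sales[1:]
mape = 100 * sum(abs(a - f) / a for a, f in zip(actuals, forecasts)) / len(actuals)

print(ma3)            # [None, None, 110.0, 120.0, 121.66..., 131.66...]
print(ses)            # [100, 110.0, 110.0, 120.0, 122.5, 131.25]
print(round(mape, 2))
```

A larger alpha makes the smoothed series react faster to recent changes. A smaller alpha gives a smoother series that lags more.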


Module 18: Text Analytics: Used for text mining business problems that involve unstructured data

 Introduction

 Text Pre-processing

 Noise Removal


 Lemmatization

 Stemming

 Feature Engineering on Text Data

 Bag of words

 TF-IDF

 Case study
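Bag of words and TF-IDF both turn text into numbers. The stdlib-only sketch below uses hypothetical reviews and a tiny stop-word list. Noise removal is reduced to lowercasing, keeping alphabetic tokens, and dropping stop words. Stemming and lemmatization would normally use a library such as NLTK or spaCy and are omitted. The TF-IDF shown is one common variant (tf = count / document length, idf = log(N / df)). Libraries such as scikit-learn use smoothed formulas, so their numbers will differ.

```python
import math
import re
from collections import Counter

# Hypothetical customer reviews
docs = [
    "The delivery was late and the package was damaged!",
    "Great product, fast delivery.",
    "The product stopped working; very poor quality.",
]

STOPWORDS = {"the", "and", "was", "very"}  # tiny illustrative stop-word list


def preprocess(text):
    """Noise removal: lowercase, keep alphabetic tokens only, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


tokenized = [preprocess(d) for d in docs]

# Bag of words: one shared vocabulary, one count vector per document
vocab = sorted({t for doc in tokenized for t in doc})
bow = []
for doc in tokenized:
    counts = Counter(doc)
    bow.append([counts[term] for term in vocab])

# Document frequency: in how many documents each term appears
n_docs = len(tokenized)
df = {term: sum(term in doc for doc in tokenized) for term in vocab}


def tfidf(doc):
    """TF-IDF weights for one tokenized document (tf = count / length, idf = log(N / df))."""
    counts = Counter(doc)
    return {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}


weights = [tfidf(doc) for doc in tokenized]

print(tokenized[0])  # ['delivery', 'late', 'package', 'damaged']
print(vocab)
print(bow[0])
print({t: round(w, 3) for t, w in weights[0].items()})
```

A term that appears in several reviews, such as "delivery", gets a lower weight than one unique to a single review, such as "late".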


Module 19: AI: Deep Learning with Keras

 Introduction: Deep Learning

 Deep Learning vs. Machine Learning

 Neural Networks

 Activation Functions, Hidden Layers, Hidden Units

 Backpropagation

 Vanishing Gradient Problem

 Exploding Gradient Problem

 Perceptron & Multi-layer Perceptron

 Case study
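Before moving to Keras, it can help to see the single perceptron on its own. The stdlib-only sketch below applies the classic perceptron learning rule with a step activation. The learning rate of 1.0 and the 20 epochs are illustrative choices. The perceptron learns the AND function, which is linearly separable. It cannot learn XOR, which is not linearly separable, and that limitation is what hidden layers in a multi-layer perceptron address. Backpropagation itself is not shown here.

```python
def step(z):
    """Step activation: output 1 when the weighted input is non-negative, else 0."""
    return 1 if z >= 0 else 0


def predict(weights, bias, x):
    """Weighted sum of inputs plus bias, passed through the step activation."""
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Perceptron learning rule: w <- w + lr * (target - prediction) * x."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


X = [(0, 0), (0, 1), (1, 0), (1, 1)]

# AND is linearly separable, so a single perceptron can learn it
w_and, b_and = train_perceptron(X, [0, 0, 0, 1])
and_preds = [predict(w_and, b_and, x) for x in X]

# XOR is not linearly separable: no single perceptron gets all four cases right
w_xor, b_xor = train_perceptron(X, [0, 1, 1, 0])
xor_preds = [predict(w_xor, b_xor, x) for x in X]

print(and_preds)  # [0, 0, 0, 1]
print(xor_preds)  # never equals [0, 1, 1, 0]
```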


Module 20: Model Deployment: Using a trained model to predict outputs for new input values

 Flask

 Case study


Capstone Project at the end of the course


Course Duration: 50 hours

Availability: 2 hours per day (6 days a week)


Laptop Requirement: Any laptop with 64 GB RAM

Software Requirement: Latest version of Anaconda installed

Recommended Certifications: IBM Data Science Professional Certificate, Data Science Council of America (DASCA) Senior Data Scientist (SDS), AWS Certified Machine Learning –, AWS Certified Data Analytics – Specialty, Azure Data Scientist Associate

Trainer 2

Objective: The primary objective of these training sessions is to equip participants with the skills and knowledge needed to excel in data-driven decision-making, exploratory data analysis, predictive modeling, and machine learning. Mastering these disciplines will help your organization gain a competitive edge and use data to drive innovation and make informed business decisions.

Session Details:

1. Data Science Fundamentals (12 hours)


 Introduction to data science concepts

 Exploratory data analysis techniques

 Data visualization using Python libraries


2. Data Analytics with Python (18 hours)

 Data preprocessing and cleaning

 Statistical analysis and hypothesis testing

 Advanced data visualization techniques

 Introduction to SQL and data querying


3. Machine Learning using Python (30 hours)


 Supervised and unsupervised learning algorithms

 Model evaluation and validation techniques

 Feature selection and engineering

 Ensemble methods and model deployment


Certification: To validate the knowledge and skills they have acquired, participants can also pursue the following certifications:


 Microsoft Certified: Python Developer Associate

 Python Institute Certifications (PCAP, PCPP, PCEP)

 IBM Data Science Professional Certificate

 Microsoft Certified: Azure Data Scientist Associate

 Microsoft Certified: Azure Data Analyst Associate

 Google Data Analytics Professional Certificate

These certifications will serve as evidence of your participants' expertise in their respective areas and can support career advancement and professional growth.

Trainer 3