Data Science Training

Trainer 1

Module 1: Python Basics: Introduces Python, the tool used throughout the course for working with data

Introduction to Python

OOP: Object & Class

Serialization: Pickle Library

Variables

Lists

Tuples

Dictionary

Sets

List and Dictionary Comprehensions

Conditional Statements (if, if-else, elif)

Loops (For, While)

Functions

Lambda Function

Apply Function

Class Exercises

Module 2: Python NumPy Library: Used to perform a wide variety of mathematical operations on arrays

Array Characteristics

Array Creation (arange, linspace, flatten)

Array Indexing (Slicing)

Array Manipulation

Reshape

Concatenate

Append

Insert

Delete

Transpose

Class Exercises

Module 3: Python pandas Library: Used for data manipulation, data cleaning, and data analysis

Series

Data Frames

Reading csv file

Sub Setting / Filtering / Slicing Data

Dropping rows & columns

Adding/Deleting columns

Binning

Renaming columns or rows

Sorting

Data type conversions

Handling duplicates / missing values

Broadcasting

Group by Function

Map Function

Visualization (bar graph, histogram, box plot)

Merging (Inner, Left, Right, Outer)

EDA

Class Exercises

Module 4: Python Matplotlib Library: Data Visualization Part 1

Bar Plot

Stacked Bar Plot

Histogram

Line Chart

Box plot

Pie-Chart

Class Exercises

Module 5: Python Seaborn Library: Data Visualization Part 2

Bar Plot

Histogram

Pairwise Plots: Joint Plot, Pair Plot

Categorical Scatter Plot: Strip-plot, Swarm-plot

Box-Plot

Violin Plot

Cat Plot

Facet Grid

Pair Grid

Line Plot

Class Exercises

Module 6: Basic Statistics: For business analysis

Type of Data

Statistics

Type of Statistics

Descriptive Statistics

Mean, Median, Mode (Measures of Central Tendency)

Standard Deviation, Variance (Measures of Dispersion)

Normal Distribution

Standard Normal Distribution

Standard Error

Sampling

Probability

Class Exercises

Module 7: Advanced Statistics: For business analysis

Confidence Interval

T-Test & Z-Test

P-value

Hypothesis Testing

Type I Error & Type II Error

Chi-Square Test

ANOVA

Covariance

Correlation

Class Exercises

Module 8: Machine Learning

Supervised

Unsupervised

Module 9: Supervised Machine Learning: Linear Regression (Used for business problems where a numeric value must be predicted)

Introduction

Assumptions (Linearity, Homoscedasticity, Multivariate Normality, etc.)

Data Preparation (Outlier Treatment, Missing Value Imputation)

Building Linear Regression Model

Understanding model metrics (p-value, R-squared / Adjusted R-squared, etc.)

Multicollinearity (VIF)

Model Validation (MAPE, RMSE)

Case study

Module 10: Supervised Machine Learning: Logistic Regression (Used for binary classification business problems)

Introduction

Linear Regression vs. Logistic Regression

Data Preparation (Outlier Treatment, Missing Value Imputation, Dummy Variable Creation)

Building Logistic Regression Model

Understanding model metrics (p-value)

Multicollinearity (VIF)

Model Validation (Confusion Matrix, ROC curve, AUC, etc.)

Case study

Module 11: Supervised Machine Learning: Decision Trees (Used for multi-class classification and regression business problems)

Introduction

Types

Entropy, Gini Index, Chi-Square

Overfitting

Pruning

Cross-Validation

Case study

Module 12: Supervised Machine Learning: Ensemble (Used for multi-class classification and regression business problems)

Introduction

Bagging

Random Forest

Boosting

Gradient Boosting Machines (GBM)

Case study

Module 13: Supervised Machine Learning: KNN (Used for multi-class classification and regression business problems)

Introduction

Working of KNN

Optimal value of K

Case study

Module 14: Unsupervised Machine Learning: Clustering (Used for segmenting data points into different groups)

Introduction

K-Means Clustering

Cluster Evaluation and Profiling

Case study

Module 15: Unsupervised Machine Learning: PCA (Used for reducing the number of dimensions in a dataset)

Introduction

Curse of dimensionality

Process of working

Case study

Module 16: Unsupervised Machine Learning: Isolation Forest (Used for anomaly detection business problems)

Introduction

Contamination Factor

Case study

Module 17: Time Series Forecasting: Used for inventory planning or forecasting business problems

Introduction

Time Series Components: Trend, Seasonality, Cyclicity

Smoothing Techniques: Moving Averages, Exponential Smoothing

ARIMA

Accuracy

Case study

Module 18: Text Analytics: Used for text-mining business problems that involve unstructured data

Introduction

Text Pre-processing

Noise Removal

Lemmatization

Stemming

Feature Engineering on Text Data

Bag of Words

TF-IDF

Case study

Module 19: AI: Deep Learning, Keras

Introduction: Deep Learning

Deep Learning vs. Machine Learning

Neural Networks

Activation Functions, hidden layers, hidden units

Backpropagation

Vanishing Gradient Problem

Exploding Gradient Problem

Perceptron & Multi-layer Perceptron

Case study

Module 20: Model Deployment: Using a trained model to predict outputs for new input values

Flask

Case study

Capstone Project at the end of the course

Course Duration: 50 hours

Availability: 2 hours per day (6 days a week)

Laptop Requirement: Any laptop with 64 GB RAM

Software Requirement: Install the latest version of Anaconda

Recommended Certifications: IBM Data Science Professional Certificate, Data Science Council of America (DASCA) Senior Data Scientist (SDS), AWS Certified Machine Learning – Specialty, AWS Certified Data Analytics – Specialty, Azure Data Scientist Associate

Trainer 2

Objective: The primary objective of these training sessions is to equip the participants with the necessary skills and knowledge to excel in data-driven decision-making, exploratory data analysis, predictive modeling, and machine learning techniques. By mastering these disciplines, your organization will gain a competitive edge and leverage the power of data to drive innovation and make informed business decisions.

Session Details:

1. Data Science Fundamentals (12 hours)

Introduction to data science concepts

Exploratory data analysis techniques

Data visualization using Python libraries

2. Data Analytics with Python (18 hours)

Data preprocessing and cleaning

Statistical analysis and hypothesis testing

Advanced data visualization techniques

Introduction to SQL and data querying

3. Machine Learning using Python (30 hours)

Supervised and unsupervised learning algorithms

Model evaluation and validation techniques

Feature selection and engineering

Ensemble methods and model deployment

Certification: To validate the knowledge and skills they acquire, participants can also pursue the following certifications:

Microsoft Certified: Python Developer Associate

Python Institute Certifications (PCAP, PCPP, PCEP)

IBM Data Science Professional Certificate

Microsoft Certified: Azure Data Scientist Associate

Microsoft Certified: Azure Data Analyst Associate

Google Data Analytics Professional Certificate

These certificates will serve as a testament to your participants' expertise in their respective areas and can be used for career advancement and professional growth.

Trainer 3
