This is an in-depth course on Machine Learning in Python with an introduction to Deep Learning. No previous knowledge of Machine Learning or Deep Learning is required.

# At this workshop you will learn:

- How to process data with pandas
- How to explore unknown data
- How to do machine learning with scikit-learn
- How to convert data between different formats
- And much, much more.
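
To give a flavour of the workflow these points add up to, here is a minimal sketch rather than actual course material. It assumes pandas and scikit-learn are installed, and it borrows the Iris dataset and the k-Nearest Neighbors classifier that appear later in the syllabus. The choice of `n_neighbors=3` and `random_state=0` is arbitrary and made only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load Iris as a pandas DataFrame: four feature columns plus a "target" column.
iris = load_iris(as_frame=True)
df = iris.frame

# Explore the unknown data: summary statistics per column.
print(df.describe())

# Split features and labels, then hold out a test set (illustrative settings).
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Fit a k-Nearest Neighbors classifier and score it on the held-out data.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Holding out a test set before fitting is what lets the reported accuracy say something about unseen data, which is the theme of the model evaluation part of the course.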

# Course Syllabus

- Tooling
- Python 3 vs Python 2
- Python 3.x Installation
- PyCharm – IDE
- Executing Python Scripts
- pip – Package Manager
- IPython – Interactive Console
- Jupyter Notebook
- virtualenv – Isolated Python Installations
- Tooling Summary
- Tooling for Data Science

- Data Visualization with matplotlib
- Basic Line Plots
- More Series Customization
- Log and Symlog Scale
- Multiple Plots
- Interactive Plots

- Python Crash Course
- Data Types
- Functions
- Useful Built-in Functions

- Data Processing with Pandas
- Importing and Exporting Data
- Basic Transformations
- Aggregation
- Filtering
- Split-Apply-Combine Pattern
- Rolling
- Processing Missing Values

- Introduction to Machine Learning
- What is Machine Learning?
- Basic Concepts
- Problem Types
- Basic Questions
- Common Workflow
- Links
- Use Case: Iris Classification

- Supervised Learning
- Underfitting and Overfitting
- k-Nearest Neighbors
- Classification
- Regression

- Linear Models
- Ordinary Least Squares
- Ridge Regression
- Lasso Regression
- Logistic Regression and Linear Support Vector Machines
- Multiclass Classification
- Naive Bayes Classifier

- Decision Trees
- Single Decision Trees
- Single Decision Tree for Regression
- Ensembles of Decision Trees
- Random Forests
- Gradient Boosted Regression Trees

- Kernelized Support Vector Machines
- Neural Networks
- Working Principle
- Parameters
- Use Case

- Classifier Uncertainty
- Classifier Comparison

- Unsupervised Learning
- Preprocessing and Scaling
- Unsupervised Transformations
- Principal Component Analysis (PCA)
- Feature Extraction with PCA
- Non-Negative Matrix Factorization (NMF)
- Signal Decomposition with NMF
- Manifold Learning with t-SNE

- Clustering
- k-Means Clustering
- Agglomerative Clustering
- Hierarchical Clustering and Dendrograms
- DBSCAN
- Evaluating Clustering with Ground Truth
- Evaluating Clustering without Ground Truth
- Comparing Clustering on Digits

- Semi-Supervised Learning

- Model Evaluation and Improvement
- Cross Validation
- Grid Search
- Naive Implementation
- Grid Search with Cross Validation
- Analysing Results of Cross-Validation
- Search Over Spaces That Are Not Grids
- Nested Cross Validation

- Evaluation Metrics for Classification
- Confusion Matrix
- Accuracy, Precision, Recall, F-score
- Taking Uncertainty into Account
- Precision-Recall Curve
- Receiver Operating Characteristic (ROC) and AUC
- Multiclass Classification

- Using Evaluation Metrics in Model Selection

- Representing Data and Engineering Features
- Categorical Features
- One-Hot Encoding
- Numbers as Categories

- Feature Engineering
- Binning (Discretization)
- Interactions
- Polynomials
- Polynomial Interactions
- Nonlinear Transformations

- Feature Selection
- Univariate Statistics
- Model-Based Feature Selection
- Iterative Feature Selection

- Expert Knowledge

- Algorithm Chains and Pipelines
- Building Pipelines
- General Pipeline Interface
- Writing Custom Estimators
- Grid-Searching Preprocessing Steps and Model Parameters
- Grid-Searching Which Model to Use

- Recommendation Systems
- Introduction to Recommendation Systems
- Surprise Library
- CI&T Deskdrop Dataset
- Cold Start
- Building Model and Evaluation Metric
- Popularity Model
- Content-Based Filtering
- Collaborative Filtering
- Testing Models

- Deep Learning
- Neural Networks
- Feedforward Neural Networks
- Convolutional Neural Networks
- Activation Functions
- Backpropagation
- Recurrent Neural Networks
- Long Short-Term Memory
- Neural Network Architectures
- Keras
- Overview of Available Tooling

- Working on Big Datasets with dask
- Dask as a Task Scheduler
- Working on a Computational Cluster
- DataFrame
- Bag
- Dask-ML