This is an in-depth course on Machine Learning in Python with an introduction to Deep Learning. No previous knowledge of Machine Learning or Deep Learning is required.

In this workshop you will learn:

  • How to process data with Pandas
  • How to explore unknown data
  • How to do machine learning with scikit-learn (see the short sketch after this list)
  • How to convert data between different formats
  • And much, much more.
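
A minimal sketch of the kind of Pandas + scikit-learn workflow covered in the course; the file name iris.csv, its column layout, and the choice of k-Nearest Neighbors are assumptions for illustration only.

    # Minimal sketch: load data with Pandas, train a scikit-learn classifier,
    # and export the data to another format. File and column names are assumed.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    df = pd.read_csv("iris.csv")                 # import data with Pandas
    X = df.drop(columns=["species"])             # feature columns (assumed layout)
    y = df["species"]                            # target column (assumed name)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = KNeighborsClassifier(n_neighbors=3)  # a simple k-NN classifier
    model.fit(X_train, y_train)                  # train on the training split
    print("Test accuracy:", model.score(X_test, y_test))

    df.to_json("iris.json", orient="records")    # export to a different format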

Course Syllabus

  1. Tooling
    1. Python 3 vs Python 2
    2. Python 3.x Installation
    3. PyCharm – IDE
    4. Executing Python Scripts
    5. pip – Package Manager
    6. IPython – Interactive Console
    7. Jupyter Notebook
    8. virtualenv – Isolated Python Installations
    9. Tooling Summary
    10. Tooling for Data Science
  2. Data Visualization with matplotlib
    1. Basic Line Plots
    2. More Series Customization
    3. Log and Symlog Scale
    4. Multiple Plots
    5. Interactive Plots
  3. Python Crash Course
    1. Data Types
    2. Functions
    3. Useful Built-in Functions
  4. Data Processing with Pandas
    1. Importing and Exporting Data
    2. Basic Transformations
    3. Aggregation
    4. Filtering
    5. Split-Apply-Combine Pattern
    6. Rolling
    7. Processing Missing Values
  5. Introduction to Machine Learning
    1. What is Machine Learning?
    2. Basic Concepts
    3. Problem Types
    4. Basic Questions
    5. Common Workflow
    6. Links
    7. Use Case: Iris Classification
  6. Supervised Learning
    1. Underfitting and Overfitting
    2. k-Nearest Neighbors
      1. Classification
      2. Regression
    3. Linear Models
      1. Ordinary Least Squares
      2. Ridge Regression
      3. Lasso Regression
      4. Logistic Regression and Linear Support Vector Machines
      5. Multiclass Classification
      6. Naive Bayes Classifier
    4. Decision Trees
      1. Single Decision Trees
      2. Single Decision Tree for Regression
      3. Ensembles of Decision Trees
      4. Random Forests
      5. Gradient Boosted Regression Trees
    5. Kernelized Support Vector Machines
    6. Neural Networks
      1. Working Principle
      2. Parameters
      3. Use Case
    7. Classifier Uncertainty
    8. Classifier Comparison
  7. Unsupervised Learning
    1. Preprocessing and Scaling
    2. Unsupervised Transformations
      1. Principal Component Analysis (PCA)
      2. Feature Extraction with PCA
      3. Non-Negative Matrix Factorization (NMF)
      4. Signal Decomposition with NMF
      5. Manifold Learning with t-SNE
    3. Clustering
      1. k-Means Clustering
      2. Agglomerative Clustering
      3. Hierarchical Clustering and Dendrograms
      4. DBSCAN
      5. Evaluating Clustering with Ground Truth
      6. Evaluating Clustering without Ground Truth
      7. Comparing Clustering on Digits
    4. Semi-Supervised Learning
  8. Model Evaluation and Improvement
    1. Cross Validation
    2. Grid Search
      1. Naive Implementation
      2. Grid Search with Cross Validation
      3. Analysing Results of Cross-Validation
      4. Search Over Spaces That Are Not Grids
      5. Nested Cross Validation
    3. Evaluation Metrics for Classification
      1. Confusion Matrix
      2. Accuracy, Precision, Recall, F-score
      3. Taking Uncertainty into Account
      4. Precision-Recall Curve
      5. Receiver Operating Characteristics (ROC) and AUC
      6. Multiclass Classification
    4. Using Evaluation Metrics in Model Selection
  9. Representing Data and Engineering Features
    1. Categorical Features
      1. One-Hot-Encoding
      2. Numbers as Categories
    2. Feature Engineering
      1. Binning (Discretization)
      2. Interactions
      3. Polynomials
      4. Polynomial Interactions
      5. Nonlinear Transformations
    3. Feature Selection
      1. Univariate Statistics
      2. Model-Based Feature Selection
      3. Iterative Feature Selection
    4. Expert Knowledge
  10. Algorithm Chains and Pipelines
    1. Building Pipelines
    2. General Pipeline Interface
    3. Writing Custom Estimators
    4. Grid-Searching Preprocessing Steps and Model Parameters
    5. Grid-Searching Which Model to Use
  11. Recommendation Systems
    1. Introduction to Recommendation Systems
    2. Surprise Library
    3. CI&T Deskdrop Dataset
    4. Cold Start
    5. Building Model and Evaluation Metric
    6. Popularity Model
    7. Content-Based Filtering
    8. Collaborative Filtering
    9. Testing Models
  12. Deep Learning
    1. Neural Networks
    2. Feedforward Neural Networks
    3. Convolutional Neural Networks
    4. Activation Functions
    5. Backpropagation
    6. Recurrent Neural Networks
    7. Long Short-Term Memory
    8. Neural Network Architectures
    9. Keras
    10. Overview of Available Tooling
  13. Working on Big Datasets with dask
    1. Dask as a Task Scheduler
    2. Working on a Computational Cluster
    3. DataFrame
    4. Bag
    5. Dask-ML