This course is a great continuation of Introduction to Machine Learning.

In this workshop you will learn:

  • How to process data with pandas
  • How to explore unfamiliar data
  • How to do machine learning with scikit-learn
  • How to convert data between different formats
  • How to engineer and select features
  • How to speed up computations by running them in the cloud with Dask
  • And much, much more.

Course Syllabus

  1. Tooling
    1. Python 3 vs Python 2
    2. Python 3.x Installation
    3. PyCharm – IDE
    4. Executing Python Scripts
    5. pip – Package Manager
    6. IPython – Interactive Console
    7. Jupyter Notebook
    8. virtualenv – Isolated Python Installations
    9. Tooling Summary
    10. Tooling for Data Science
  2. Data Visualization with matplotlib
    1. Basic Line Plots
    2. More Series Customization
    3. Log and Symlog Scale
    4. Multiple Plots
    5. Interactive Plots
  3. Python Crash Course
    1. Data Types
    2. Functions
    3. Useful Built-in Functions
  4. Data Processing with Pandas
    1. Importing and Exporting Data
    2. Basic Transformations
    3. Aggregation
    4. Filtering
    5. Split-Apply-Combine Pattern
    6. Rolling
    7. Processing Missing Values
  5. Representing Data and Engineering Features
    1. Categorical Features
      1. One-Hot Encoding
      2. Numbers as Categories
    2. Feature Engineering
      1. Binning (Discretization)
      2. Interactions
      3. Polynomials
      4. Polynomial Interactions
      5. Nonlinear Transformations
    3. Feature Selection
      1. Univariate Statistics
      2. Model-Based Feature Selection
      3. Iterative Feature Selection
    4. Expert Knowledge
  6. Model Evaluation and Improvement
    1. Cross-Validation
    2. Grid Search
      1. Naive Implementation
      2. Grid Search with Cross-Validation
      3. Analysing Results of Cross-Validation
      4. Search Over Spaces That Are Not Grids
      5. Nested Cross-Validation
    3. Evaluation Metrics for Classification
      1. Confusion Matrix
      2. Accuracy, Precision, Recall, F-score
      3. Taking Uncertainty into Account
      4. Precision-Recall Curve
      5. Receiver Operating Characteristic (ROC) and AUC
      6. Multiclass Classification
    4. Using Evaluation Metrics in Model Selection
  7. Algorithm Chains and Pipelines
    1. Building Pipelines
    2. General Pipeline Interface
    3. Writing Custom Estimators
    4. Grid-Searching Preprocessing Steps and Model Parameters
    5. Grid-Searching Which Model to Use
  8. Recommendation Systems
    1. Introduction to Recommendation Systems
    2. Surprise Library
    3. CI&T Deskdrop Dataset
    4. Cold Start
    5. Building Model and Evaluation Metric
    6. Popularity Model
    7. Content-Based Filtering
    8. Collaborative Filtering
    9. Testing Models
  9. Deep Learning
  10. Working on Big Datasets with Dask
    1. Dask as a Task Scheduler
    2. Working on a Computational Cluster
    3. DataFrame
    4. Bag
    5. Dask-ML