Python has long been popular among scientists and researchers, and as a result it has one of the best ecosystems for data analysis. Many advanced libraries and tools have been developed for it. However, their sheer variety can confuse newcomers. This workshop clears up that confusion.

At this workshop you will learn:

  • How to quickly visualize data and draw beautiful plots with Seaborn
  • How to process data with Pandas
  • How to do machine learning with scikit-learn
  • How to convert data between different formats
  • How to speed up computations by running them in the cloud with Dask
  • And much, much more.

Course Syllabus

  1. Tools
    1. IPython
    2. Jupyter Notebook
    3. IDE (PyCharm, Visual Studio Code)
    4. pip – Package Manager
    5. Working with virtualenv
    6. Executing Programs
  2. Data Visualization with matplotlib and Seaborn
    1. Useful Links
    2. Idioms
    3. Plot Types
    4. Plot Customization
    5. Advanced Plots with Seaborn
  3. Data Processing with Pandas
    1. Loading and Exporting Data
    2. Working with Data Series
    3. Basic Data Structure: DataFrame
    4. Processing Dates
    5. Processing Strings
    6. Processing Missing Values
    7. Joins
    8. Grouping and Aggregation (Split-Apply-Combine Pattern)
    9. Pivot Tables
    10. Working with Indexes
  4. Machine Learning with scikit-learn
    1. Supervised and Unsupervised Learning
    2. Features
    3. Feature Normalization
    4. Classification, Regression, Clustering and Other Classes of Problems
    5. Evaluation and Cross-validation
    6. Choosing Model Parameters
    7. Choosing the Right Features
    8. Support Vector Machines
    9. Bayes Filter
    10. Decision Trees
    11. Neural Networks
    12. k-means Algorithm
    13. Principal Component Analysis
  5. Distributed Processing with Dask
    1. Basic Principles and Good Practices of Parallelizing Computations
    2. Configuring a Computing Cloud on Amazon EC2
    3. Creating a Local Cloud in Dask
    4. Creating an Amazon Cloud in Dask
    5. Executing Code on Nodes
    6. Basic Data Structures in Dask
    7. Loading Data
    8. Data Aggregation
    9. Debugging Dask Cloud
    10. Profiling Cloud