Lecture 2
Tools for Machine Learning
Melissa Chen
ECE 208/408 - The Art of Machine Learning
1/13/2022
ECE 208/408 Lecture 2, Melissa Chen, 1/13/2022
Table of Contents
1. Shell Essentials
2. Python Basics
3. Python Packages for ML and Visualization
4. Deep Learning Frameworks
5. MLOps Platform
1. Shell Essentials
Linux Shell Scripts
A Unix shell is a command-line interpreter [1]: a program that takes commands from the keyboard and passes them to the operating system to execute.
“Terminal” is where you can interact with the shell.
Unix-like systems: Linux, macOS. For Windows, see “batch files”.
[1] Kernighan, Brian W.; Pike, Rob (1984). "3. Using the Shell", The UNIX Programming Environment. Prentice Hall, Inc., p. 94. ISBN 0-13-937699-2
[2] https://github.com/wookayin/gpustat
Source: gpustat
Basic structure: command [-option] parameter1 parameter2 …
Interacting with the system, cheat sheet: https://github.com/RehanSaeed/Bash-Cheat-Sheet
Package management:
Advanced Package Tool (APT) is the main command-line package manager for Debian and its derivatives, such as Ubuntu.
System and hardware monitor:
CPU (example here): htop
GPU:
nvidia-smi
watch -n 2 nvidia-smi (re-runs nvidia-smi every 2 seconds)
gpustat [2] (dynamic, recommended)
APT Examples:
$ sudo apt update && sudo apt upgrade
$ sudo apt install xxx
$ sudo apt remove xxx
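Variations: Bourne shell (sh), GNU Bash (bash), Z shell (zsh), PowerShell (pwsh). Secure Shell (ssh) is not a command interpreter itself but a protocol for using a shell on a remote machine.
Usage: two ways to use a shell
Line-by-line in the terminal
In a file (a shell script)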
2. Python Basics
Why use Python?
Popular candidates: Python, Matlab, R, Java, C, C++
Middle-level or High-level
C: Fast, efficient, portable, but hard to write/understand
Python: Highly abstracted from the computer hardware, easy to understand
Compiled or Interpreted
Java/C++: Fast, protect source code, but can be more difficult to debug
Python/Matlab: Interpreted line-by-line and on-the-fly, flexible, cross-platform
ML Ecosystem and Developer Community
Python has the most active ML developer community
TensorFlow, Keras, PyTorch …
Programming Language for ML
There is no such thing as a “best language” for machine learning.
The choice of language largely depends on specific applications and devices.
Source: KDnuggets
Source: Simform
Most Popular Languages in Every Country
Virtual Environment
Anaconda
https://en.wikipedia.org/wiki/Anaconda_%28Python_distribution%29
Anaconda is a distribution of the Python and R programming languages for scientific computing
Aims to simplify package management and deployment
For Windows, Linux, and macOS
Installation: https://www.anaconda.com/products/distribution#linux
Usage: https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf
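Anaconda creates environments with `conda create`. For comparison only, Python's standard library ships a lighter mechanism, the `venv` module; the sketch below uses that stdlib alternative (not conda) to create an isolated environment in a temporary folder. The helper name `make_env` and the folder name `demo-env` are hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def make_env(parent: Path, name: str = "demo-env") -> Path:
    """Create a bare virtual environment under `parent` (hypothetical helper).

    Uses the stdlib `venv` module as a lightweight stand-in for
    `conda create`; this is not how Anaconda itself works.
    """
    env_dir = parent / name
    # with_pip=False keeps the example fast and offline
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = make_env(Path(tmp))
        # Every venv records its base interpreter in pyvenv.cfg
        print((env / "pyvenv.cfg").read_text())
```

From a shell, the equivalent is `python3 -m venv demo-env` followed by `source demo-env/bin/activate` on Linux/macOS.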
Python Package Management
pip
pip is the package installer for Python
Included with modern versions of Python
PyPI
The Python Package Index (PyPI) is the official third-party software repository for Python.
Cheat sheet: https://opensource.com/sites/default/files/gated-content/cheat_sheet_pip.pdf
Install a specific version
$ pip install requests==2.22.0
Install packages from a requirements file
$ pip install -r requirements.txt
Capture all currently installed versions in a text file
$ pip freeze > requirements.txt
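To see what `pip freeze` captures, the standard library's `importlib.metadata` (Python 3.8+) can list installed distributions. The hypothetical helper `installed_requirements` below is only a rough approximation of `pip freeze`, assuming ordinary (non-editable) installs.

```python
from importlib.metadata import distributions
from typing import List


def installed_requirements() -> List[str]:
    """Return sorted 'name==version' lines for installed distributions.

    Rough approximation of `pip freeze`; pip also handles editable
    installs, VCS URLs and other cases this sketch ignores.
    """
    lines = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    print("\n".join(installed_requirements()))
```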
Python Basics
Installation
Download the installer and run it: https://www.python.org/downloads/
Install in the shell using apt (Debian/Ubuntu) or Homebrew (macOS):
$ sudo apt-get update
$ sudo apt-get install python3.6
$ brew install python
Install in Anaconda: conda install python=3.8
Versions
Source: https://devguide.python.org/versions/
Python Basics
Data Types
String (str): A string is a sequence of characters. Anything inside quotes (single
quotes or double quotes) is a string.
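Boolean (bool): True/False values. bool is a subclass of int, so True and False behave as 1 and 0 in arithmetic.
Integer (int): Whole numbers of arbitrary precision. In CPython an int is not a raw machine integer but a pointer to a compound C structure.
Float (float): Numbers with a fractional part, stored as 64-bit double-precision values.
Encoding
Source files are UTF-8 by default; plain ASCII also works, since it is a subset of UTF-8
Numeric operations:
Addition +
Subtraction -
Multiplication *
Division / (always returns a float; // is floor division)
Exponents **
String concatenation and repetition:
3 * 'un' + 'ium' gives 'unununium'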
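The types and operators above can be tried directly in the interpreter. A minimal sketch; the variable names and values are arbitrary.

```python
# Basic built-in types
s = "ML"        # str
flag = True     # bool
n = 7           # int
x = 2.5         # float
print(type(s), type(flag), type(n), type(x))

print(flag + 1)             # bool behaves as int 1 -> 2
print(n + 2, n - 2, n * 2)  # 9 5 14
print(n / 2)                # true division always gives a float -> 3.5
print(n // 2)               # floor division -> 3
print(n ** 2)               # exponent -> 49
print(2 ** 100)             # ints have arbitrary precision, no overflow
print(0.1 + 0.2 == 0.3)     # binary floating-point rounding -> False

word = 3 * 'un' + 'ium'     # repetition, then concatenation
print(word)                 # unununium
```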
Python Basics
Data Structures
Lists
A collection of items that are ordered and changeable
Lists might contain items of different types, but usually the
items all have the same type
Dictionaries
A collection of key-value pairs that is changeable and indexed by its keys
Since Python 3.7, dictionaries preserve insertion order
Sets
A collection of items that are unordered and unindexed
The elements contained in a set must be unique and immutable (hashable), although the set itself can be changed
Sets look similar to lists, but they behave very differently: no duplicates, no order, and fast membership tests
Tuples
A collection of items that are ordered and unchangeable
Almost the same as a list, but cannot be modified once created
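A minimal sketch contrasting the four built-in data structures described above; the names and values are arbitrary.

```python
# Lists: ordered, changeable, duplicates allowed
scores = [0.9, 0.7, 0.9]
scores.append(0.8)
scores[0] = 1.0
print(scores)                      # [1.0, 0.7, 0.9, 0.8]

# Dictionaries: key -> value mapping, changeable,
# insertion order preserved (Python 3.7+)
config = {"lr": 0.01, "epochs": 10}
config["batch_size"] = 32
print(config["lr"], list(config))  # 0.01 ['lr', 'epochs', 'batch_size']

# Sets: unordered, no duplicates, elements must be hashable
labels = set(["cat", "dog", "cat"])
print(len(labels), "dog" in labels)  # 2 True

# Tuples: ordered, cannot be modified after creation
shape = (28, 28, 1)
try:
    shape[0] = 32
except TypeError as err:
    print("tuples cannot be modified:", err)
```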
Python Basics
Control Flow Tools
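A minimal sketch of Python's main control flow tools: `if`/`elif`/`else`, `for` with `range`, `while`, `break`/`continue`, and function definitions. The function `classify` and its thresholds are hypothetical.

```python
def classify(score: float) -> str:
    """Map a probability-like score to a label (hypothetical thresholds)."""
    if score >= 0.8:
        return "high"
    elif score >= 0.5:
        return "medium"
    else:
        return "low"


# for loop over range(); continue skips odd numbers
evens = []
for i in range(10):
    if i % 2:
        continue
    evens.append(i)
print(evens)                     # [0, 2, 4, 6, 8]

# while loop that stops with break once n*n exceeds 50
n = 0
while True:
    n += 1
    if n * n > 50:
        break
print(n)                         # 8

print([classify(s) for s in (0.9, 0.6, 0.1)])  # ['high', 'medium', 'low']
```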
Python Basics
Classes
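class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def myfunc(abc):
        print("Hello my name is " + abc.name)

p1 = Person("John", 36)
p1.name      # 'John' (echoed at the interactive prompt)
p1.myfunc()  # prints: Hello my name is John
The self parameter is a reference to the current instance of the class and is used to access the attributes and methods that belong to it. It does not have to be named self (myfunc calls it abc), but it must be the first parameter of every method; PEP 8 recommends always naming it self.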
Python Style Guide
Source: https://peps.python.org/pep-0008/
Python Enhancement Proposal 8
Recommended by creators of Python
Intended to improve the readability of code and make it consistent
“Code is read much more often than it is written.”
— Guido van Rossum
How to Write Beautiful Code with Python
Use 4 spaces per indentation level.
Spaces are the preferred indentation method.
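Python 3 disallows mixing tabs and spaces for indentation; keep it consistent!
Continuation lines should align wrapped elements either vertically with the opening delimiter or with a hanging indent.
Imports should usually be on separate lines.
Limit all lines to a maximum of 79 characters.
Single-quoted and double-quoted strings are the same; PEP 8 makes no recommendation, so pick one and stick to it.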
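# Aligned with opening delimiter
foo = long_function_name(var_one, var_two,
                         var_three, var_four)

# Add 4 spaces (an extra level of indentation) to distinguish arguments from the rest
def long_function_name(
        var_one, var_two, var_three,
        var_four):
    print(var_one)

# Hanging indents should add a level
foo = long_function_name(
    var_one, var_two,
    var_three, var_four)

# Imports on separate lines
import os
import sys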
Rule of thumb
Use an object-oriented programming style in complex multi-file projects.
Many projects have their own coding style guidelines. Find one and start from a good example.
Pick your rule and stick to it.
A good example: https://github.com/brentspell/hifi-gan-bwe
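To make the 79-character rule concrete, a tiny checker can be written with the standard library alone. The helper `long_lines` is hypothetical and only a sketch; real projects usually rely on linters such as pycodestyle or flake8, which check this and many other PEP 8 rules.

```python
from typing import List, Tuple

MAX_LINE_LENGTH = 79  # PEP 8 limit for lines of code


def long_lines(source: str,
               limit: int = MAX_LINE_LENGTH) -> List[Tuple[int, int]]:
    """Return (line_number, length) for every line longer than `limit`."""
    result = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if len(line) > limit:
            result.append((lineno, len(line)))
    return result


if __name__ == "__main__":
    code = "import os\n" + "x = " + "1 + " * 30 + "1\n"
    for lineno, length in long_lines(code):
        print(f"line {lineno}: {length} > {MAX_LINE_LENGTH} characters")
```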
Learning Resources
Step-by-step guide:
https://www.w3schools.com/python/
Official document:
https://docs.python.org/3.8/tutorial/index.html
Jupyter Notebook
Jupyter Notebook is a web-based interactive development environment (IDE)
Notebooks contain live code, equations, visualizations, and narrative text
Easy to create and share documents
Jupyter Notebook is written in Python, but it supports over 40 programming
languages, including Python, R, Julia, and Scala.
Installation
PyPI distribution: pip install notebook
Anaconda distribution available
Intro example here:
https://jupyter.org/try-jupyter/retro/notebooks/?path=notebooks/Intro.ipynb
3. Python Packages for ML
NumPy
Numerical computing tool
Fast and versatile
Mathematical functions, random number generators, linear algebra
routines, Fourier transforms, and more.
Installation
conda install numpy
pip install numpy
Core concept: numpy.ndarray, an N-dimensional array whose elements share a single data type
NumPy vectorization, indexing, and broadcasting
Basic operations
Broadcasting
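A short sketch of the ndarray concepts listed above (vectorization, indexing, broadcasting); the arrays are arbitrary examples, and the comments give the resulting values.

```python
import numpy as np

# ndarray: N-dimensional array with a single dtype
a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
print(a.shape, a.dtype)

# Vectorization: element-wise arithmetic without Python loops
b = a * 2 + 1                    # [[1, 3, 5], [7, 9, 11]]

# Indexing, slicing and boolean masks
print(a[1, 2])                   # 5 (row 1, column 2)
print(a[:, 0])                   # first column: [0, 3]
print(a[a > 2])                  # elements greater than 2: [3, 4, 5]

# Broadcasting (2, 3) + (3,): the row is reused for every row of a
row = np.array([10, 20, 30])
print(a + row)                   # [[10, 21, 32], [13, 24, 35]]

# Broadcasting (2, 3) + (2, 1): the column is reused for every column
col = np.array([[100], [200]])
print(a + col)                   # [[100, 101, 102], [203, 204, 205]]
```

Broadcasting compares shapes starting from the trailing dimension; two dimensions are compatible when they are equal or one of them is 1.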
Pandas
Pandas is a Python library for data manipulation and analysis.
Provides data structures for efficiently storing and manipulating large datasets
Allows easy data cleaning, filtering, manipulation, and analysis
Built-in support for data I/O in a variety of file formats
A more natural way to display tabular data than a list or NumPy array
Many handy functions, e.g., for grouping, merging, and reshaping data
Usage: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
Pandas DataFrame, source: https://devopedia.org/images/article/304
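A minimal sketch of the DataFrame workflow described above: creation, display, cleaning, filtering, and I/O. The dataset is made up for illustration.

```python
from io import StringIO

import pandas as pd

# A small, made-up dataset with one missing value
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "age": [23, 35, 41, 29],
    "score": [88.5, None, 72.0, 95.0],
})
print(df.head())       # tabular display
print(df.describe())   # summary statistics of numeric columns

# Cleaning: fill the missing score with the column mean
df["score"] = df["score"].fillna(df["score"].mean())

# Filtering with a boolean condition
older = df[df["age"] > 30]

# Selecting columns and sorting
ranked = df[["name", "score"]].sort_values("score", ascending=False)
print(ranked)

# I/O: to_csv() without a path returns a string; read it back
csv_text = df.to_csv(index=False)
df_back = pd.read_csv(StringIO(csv_text))
```

With a file path, the same calls become `df.to_csv("data.csv", index=False)` and `pd.read_csv("data.csv")`, where `data.csv` is a placeholder name.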
Pandas
Data type conversion, source: https://devopedia.org/images/article/304
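A minimal sketch of common pandas data type conversions (`astype`, `pd.to_numeric`, `pd.to_datetime`). The column names and values are made up; `errors="coerce"` is one choice for handling unparsable entries, turning them into NaN instead of raising.

```python
import pandas as pd

raw = pd.DataFrame({
    "id": ["1", "2", "3"],                 # integers stored as text
    "price": ["9.99", "n/a", "4.50"],      # contains an invalid entry
    "date": ["2022-01-13", "2022-01-14", "2022-01-15"],
    "split": ["train", "test", "train"],
})
print(raw.dtypes)   # every column holds strings

clean = raw.assign(
    id=raw["id"].astype("int64"),
    # errors="coerce" turns unparsable values into NaN
    price=pd.to_numeric(raw["price"], errors="coerce"),
    date=pd.to_datetime(raw["date"]),
    # category dtype stores repeated labels compactly
    split=raw["split"].astype("category"),
)
print(clean.dtypes)
```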
Scikit-learn
Scikit-learn is a machine learning library built
on NumPy, SciPy, and matplotlib, and is designed
to be easy to use and efficient.
Installation
Pip: pip3 install -U scikit-learn
Conda version available
Usage:
https://scikit-learn.org/stable/user_guide.html
The documentation also links to the source code of each ML algorithm
Source: sklearn official website
Linear regression example
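A minimal linear regression sketch in the spirit of the example above, using synthetic data (the true slope 3 and intercept 2 are made up). It shows the `fit`/`predict` pattern that scikit-learn estimators share.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: y = 3x + 2 plus Gaussian noise
rng = np.random.default_rng(seed=0)
X = rng.uniform(0, 10, size=(200, 1))      # features must be 2-D
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)                # estimate slope and intercept
y_pred = model.predict(X_test)

print("slope:", model.coef_[0])            # close to 3
print("intercept:", model.intercept_)      # close to 2
print("test MSE:", mean_squared_error(y_test, y_pred))
```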
Plotting Tools
Matplotlib and Seaborn
Installation: pip install matplotlib seaborn
Matplotlib is a general-purpose plotting library
Seaborn is built on top of Matplotlib and specializes in statistical graphics.
Seaborn works directly with pandas DataFrames.
Matplotlib examples: https://matplotlib.org/stable/gallery/index.html
Seaborn examples: https://seaborn.pydata.org/examples/index.html
import matplotlib.pyplot as plt
import seaborn as sns
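A minimal Matplotlib sketch with two panels, a line plot and a histogram. It assumes a non-interactive environment (hence the `Agg` backend) and writes to a placeholder file `plots.png`; in a notebook or desktop session, `plt.show()` would display the figure instead.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend: render without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Left panel: two line plots with a legend
ax1.plot(x, np.sin(x), label="sin(x)")
ax1.plot(x, np.cos(x), label="cos(x)")
ax1.set_xlabel("x")
ax1.legend()

# Right panel: histogram of standard normal samples
rng = np.random.default_rng(0)
ax2.hist(rng.normal(size=1000), bins=30)
ax2.set_title("Histogram of N(0, 1) samples")

fig.tight_layout()
fig.savefig("plots.png", dpi=150)
```

Seaborn functions such as `sns.histplot(data=df, x="score")` take a DataFrame plus column names and draw onto Matplotlib axes, so the two libraries can be mixed in one figure.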
4. Deep Learning Frameworks
DL Platform Comparison
An example of a computational graph, source: https://pytorch.org
PyTorch
dynamic computational graph framework
change the graph on the fly
easier to debug
Best for development
TensorFlow
static computational graph (TensorFlow 1.x; TensorFlow 2.x executes eagerly by default and builds static graphs with tf.function)
must define the entire computation graph before the model can run
the graph can be optimized so models run faster
more suitable for production
Keras
high-level API that runs on top of a backend such as TensorFlow (historically also Theano and CNTK)
quickly and easily build, train, and evaluate deep learning models with minimal code
highly modular
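To illustrate the dynamic ("define-by-run") graph idea behind PyTorch, the sketch below uses ordinary Python control flow whose path depends on the data; autograd records only the operations that actually ran. The function `f` is an arbitrary example.

```python
import torch


def f(x: torch.Tensor) -> torch.Tensor:
    # The graph is built while this runs, so a data-dependent branch is fine
    if x.sum() > 0:
        return (x ** 2).sum()
    else:
        return (3 * x).sum()


x_pos = torch.tensor([1.0, 2.0], requires_grad=True)
f(x_pos).backward()
print(x_pos.grad)   # gradient of sum(x^2) is 2x -> tensor([2., 4.])

x_neg = torch.tensor([-1.0, -2.0], requires_grad=True)
f(x_neg).backward()
print(x_neg.grad)   # gradient of sum(3x) is 3 -> tensor([3., 3.])
```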
PyTorch
Installation
Version matters!
Make sure the PyTorch build matches your installed CUDA version.
Check here for details: https://pytorch.org/get-started/locally/
Core concepts
Tensors: PyTorch's main data structure, similar to numpy's ndarrays
Autograd: A PyTorch feature that allows for automatic differentiation of
tensors. It is used to compute gradients.
Neural networks: PyTorch provides a built-in module for building and training
neural networks in torch.nn.
Optimizers: SGD, Adam, etc. in torch.optim
Data loading and preprocessing: torch.utils.data
Great step-by-step tutorial: https://pytorch.org/tutorials/
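A minimal sketch that touches each core concept listed above: tensors, autograd, `torch.nn`, `torch.optim`, and `torch.utils.data`. It fits a single linear layer to synthetic data (true weight 2 and bias -1 are made up) and runs on the CPU, so no CUDA setup is needed.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Tensors: synthetic regression data y = 2x - 1 + noise
X = torch.rand(256, 1) * 4
y = 2 * X - 1 + 0.1 * torch.randn(256, 1)

# Data loading: wrap tensors in a Dataset, iterate in shuffled mini-batches
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

# Neural network: a single linear layer from torch.nn
model = nn.Linear(in_features=1, out_features=1)
loss_fn = nn.MSELoss()

# Optimizer from torch.optim
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

for epoch in range(100):
    for xb, yb in loader:
        optimizer.zero_grad()          # clear gradients from the last step
        loss = loss_fn(model(xb), yb)
        loss.backward()                # autograd computes d(loss)/d(params)
        optimizer.step()               # update weight and bias

print(model.weight.item(), model.bias.item())  # should approach 2 and -1
```

To train on a GPU, the model and each batch would be moved with `.to("cuda")`, which is where the PyTorch/CUDA version match matters.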
PyTorch-Lightning
PyTorch-Lightning is a wrapper library built on top of PyTorch
Great for researchers
Easy to build, train, and evaluate deep learning models
Support for distributed training across multiple GPUs and machines.
Automated logging of training metrics, model architecture and other information.
Automated checkpointing and early stopping.
Support for mixed precision training
Built-in support for common callbacks
https://github.com/Lightning-AI/lightning
Weka
Weka is machine learning software written in Java
Supports machine learning and deep-learning algorithms
User-friendly graphical interface
Coding-free
https://www.cs.waikato.ac.nz/ml/weka/
Source: https://en.wikipedia.org/wiki/Weka_(machine_learning)
5. MLOps Platform
MLOps Platform
Version control
Monitor training
Find optimal models
Increase reproducibility
Share insights
Visualization
An example of wandb
Weights & Biases
https://wandb.ai/site
A good example: wandb.ai/brentspell/hifi-gan-bwe
TensorBoard
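Free and open source
Runs locally, so storage is limited only by your own disk
Developed by the TensorFlow team
Needs port forwarding when used on a remote server, e.g., ssh -L 6006:localhost:6006 user@remote-server (6006 is TensorBoard's default port)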
https://www.tensorflow.org/tensorboard/get_started