Getting started with PyTorch
Before You Begin
Make sure you have Python installed on your system; otherwise, take a look at Setting up Python development environment.
Note: This is a practical guide that follows the official PyTorch tutorial.
Introduction
PyTorch is a Python-based scientific computing package targeted at two sets of audiences:
- A replacement for NumPy to use the power of GPUs
- A deep learning research platform that provides maximum flexibility and speed
Installation
To install PyTorch, please take a look at Install PyTorch
Getting Started
Tensors
Tensors are similar to NumPy's ndarrays, with the addition that Tensors can also be used on a GPU to accelerate computing.
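To make the GPU claim concrete, here is a minimal sketch; it assumes only a standard PyTorch install and falls back to the CPU when no CUDA device is available:
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t = torch.ones(3, 3, device=device)  # allocated directly on the GPU when one is present
print(t.device)  # prints cuda:0 on a GPU machine, cpu otherwise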
import torch
Construct a 3x6 matrix, uninitialized:
x = torch.empty(3, 6)
print(x)
Out:
tensor([[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 1.4854e-42, 0.0000e+00],
        [0.0000e+00, 4.7428e+30, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]])
Construct a randomly initialized matrix:
x = torch.rand(3, 6)
print(x)
Out:
tensor([[0.1346, 0.0086, 0.8915, 0.2503, 0.3227, 0.2293],
        [0.1182, 0.6759, 0.4475, 0.7565, 0.1387, 0.1020],
        [0.8162, 0.7662, 0.6263, 0.6072, 0.4265, 0.0322]])
Construct a matrix filled with zeros and of dtype long:
x = torch.zeros(3, 6, dtype=torch.long)
print(x)
Out:
tensor([[0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0]])
Construct a tensor directly from data:
x = torch.tensor([10.0, 2.3])
print(x)
Out:
tensor([10.0000, 2.3000])
Get the size:
print(x.size())
Out:
torch.Size([2])
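torch.Size is in fact a tuple, so it supports all tuple operations. A small sketch of unpacking one (the 3x6 tensor here is just an illustration):
rows, cols = torch.rand(3, 6).size()  # a Size unpacks like an ordinary tuple
print(rows, cols)  # 3 6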
Operations
Addition
x = torch.rand(3, 4)
y = torch.rand(3, 4)
print(x + y)
Out:
tensor([[1.6089, 1.2154, 1.1974, 1.0267],
        [1.0705, 0.1865, 1.0091, 0.9641],
        [1.0066, 0.1990, 1.2998, 0.5362]])
Another syntax for addition:
print(torch.add(x, y))
Out:
tensor([[1.6089, 1.2154, 1.1974, 1.0267],
        [1.0705, 0.1865, 1.0091, 0.9641],
        [1.0066, 0.1990, 1.2998, 0.5362]])
Providing an output tensor as an argument:
result = torch.empty(3, 4)
torch.add(x, y, out=result)
print(result)
Out:
tensor([[1.6089, 1.2154, 1.1974, 1.0267],
        [1.0705, 0.1865, 1.0091, 0.9641],
        [1.0066, 0.1990, 1.2998, 0.5362]])
In-place addition:
x.add_(y)
print(x)
Out:
tensor([[1.6089, 1.2154, 1.1974, 1.0267],
        [1.0705, 0.1865, 1.0091, 0.9641],
        [1.0066, 0.1990, 1.2998, 0.5362]])
Note: Any operation that mutates a tensor in-place is post-fixed with an _. For example, x.copy_(y) and x.t_() will change x.
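A quick sketch of this convention, using only methods from the core tensor API:
a = torch.ones(2, 2)
b = torch.zeros(2, 2)
c = a.add(b)  # out-of-place: returns a new tensor, a is unchanged
a.add_(b)     # in-place: modifies a itself
a.copy_(b)    # in-place: a now holds b's values
a.t_()        # in-place transpose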
Resizing
If you want to resize/reshape a tensor, you can use tensor.view:
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8) # the size -1 is inferred from other dimensions
print(x.size(), y.size(), z.size())
Out:
torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
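At most one dimension passed to view can be -1, and it is inferred so that the total number of elements is preserved. A small sketch:
x = torch.arange(12)
print(x.view(3, 4))          # reshaped to 3 rows of 4
print(x.view(-1, 6).size())  # torch.Size([2, 6]): 12 elements / 6 columns = 2 rows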
NumPy Bridge
Converting a Torch Tensor to a NumPy Array
x = torch.ones(10)
print(x)
Out:
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
y = x.numpy()
print(y)
Out:
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
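The Torch Tensor and the NumPy array share their underlying memory locations (for tensors on the CPU), so changing one will change the other. Continuing with x and y from above:
x.add_(1)  # in-place change to the tensor...
print(y)   # ...is visible in the NumPy array: [2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]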
Converting NumPy Array to Torch Tensor
import numpy as np
a = np.ones(10)
b = torch.from_numpy(a)
np.add(a, 2, out=a)
print(a)
print(b)
Out:
[3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
tensor([3., 3., 3., 3., 3., 3., 3., 3., 3., 3.], dtype=torch.float64)
Notice that changing the NumPy array changed the Torch Tensor automatically: they share the same underlying memory.
Neural networks
Neural networks can be constructed using the torch.nn package. An nn.Module contains layers and a forward(input) method that returns the output.
Define the network
import torch
import torch.nn as nn
import torch.nn.functional as F
class MyNet(nn.Module):
    def __init__(self):
        super().__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
net = MyNet()
print(net)
Out:
MyNet(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
You just have to define the forward function; the backward function (where gradients are computed) is automatically defined for you using autograd. You can use any of the Tensor operations in the forward function.
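To see autograd deriving the backward pass on its own, here is a tiny standalone sketch:
w = torch.ones(2, requires_grad=True)
out = (w * 3).sum()  # forward: out = 3*w[0] + 3*w[1]
out.backward()       # the backward pass is generated automatically
print(w.grad)        # tensor([3., 3.])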
The learnable parameters of a model are returned by net.parameters()
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight
Out:
10
torch.Size([6, 1, 5, 5])
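The count of 10 comes from the five layers each contributing a weight and a bias. They can be listed by name with net.named_parameters(), part of the standard nn.Module API:
for name, p in net.named_parameters():
    print(name, p.size())  # conv1.weight torch.Size([6, 1, 5, 5]), conv1.bias torch.Size([6]), ...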
Loss function
A loss function takes the (output, target) pair of inputs and computes a value that estimates how far away the output is from the target.
There are several different loss functions under the nn package. A simple loss is nn.MSELoss, which computes the mean-squared error between the input and the target.
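As a standalone sanity check, with values chosen so the result is easy to verify by hand (two squared errors of 0.25 each average to 0.25):
criterion = nn.MSELoss()
pred = torch.tensor([1.0, 2.0])
goal = torch.tensor([1.5, 2.5])
print(criterion(pred, goal))  # tensor(0.2500) = mean((0.5)^2, (0.5)^2)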
input = torch.randn(1, 1, 32, 32)
output = net(input)
print(output)
target = torch.randn(10)     # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as the output
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
Out:
tensor([[-0.0669, 0.1497, -0.0850, 0.0584, -0.1100, -0.0551, 0.0742, 0.0469,
         -0.0051, 0.0237]], grad_fn=<AddmmBackward>)
tensor(1.0492, grad_fn=<MseLossBackward>)
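If you follow loss in the backward direction using its .grad_fn attribute, you can see the graph of computations autograd recorded; the exact node names vary a little across PyTorch versions:
print(loss.grad_fn)                       # the MSELoss node
print(loss.grad_fn.next_functions[0][0])  # the linear (addmm) op that produced the output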
Backprop
To backpropagate the error, all we have to do is call loss.backward(). You need to clear the existing gradients first, though, or new gradients will be accumulated on top of the existing ones.
Now we shall call loss.backward(), and have a look at conv1’s bias gradients before and after the backward.
net.zero_grad() # zeroes the gradient buffers of all parameters
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
Out:
conv1.bias.grad before backward
None
conv1.bias.grad after backward
tensor([-0.0004, 0.0102, 0.0056, 0.0015, 0.0002, -0.0062])
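Finally, a minimal sketch of why the gradients must be cleared: backward() adds to .grad instead of overwriting it:
w = torch.ones(1, requires_grad=True)
(w * 2).sum().backward()
print(w.grad)   # tensor([2.])
(w * 2).sum().backward()  # a second backward pass, on a fresh graph
print(w.grad)   # tensor([4.]) -- accumulated, not replaced
w.grad.zero_()  # net.zero_grad() does exactly this for every parameter
print(w.grad)   # tensor([0.])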