Getting started with PyTorch
Before You Begin
Make sure you have Python installed on your system; otherwise, take a look at Setting up Python development environment.
Note: This is a practical guide that follows the official PyTorch tutorial.
Introduction
PyTorch is a Python-based scientific computing package targeted at two sets of audiences:
- A replacement for NumPy to use the power of GPUs
- A deep learning research platform that provides maximum flexibility and speed
Installation
To install PyTorch, please take a look at Install PyTorch
Getting Started
Tensors
Tensors are similar to NumPy's ndarrays, with the addition that Tensors can also be used on a GPU to accelerate computing.
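To make the GPU claim concrete, here is a minimal sketch; it assumes only a standard PyTorch install and falls back to the CPU when no CUDA device is available:
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t = torch.ones(3, 3, device=device)  # allocated directly on the GPU when one is present
print(t.device)  # prints cuda:0 on a GPU machine, cpu otherwise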
import torch
Construct a 3x6 matrix, uninitialized:
x = torch.empty(3, 6)
print(x)
Out:
tensor([[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 1.4854e-42, 0.0000e+00],
        [0.0000e+00, 4.7428e+30, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]])
Construct a randomly initialized matrix:
x = torch.rand(3, 6)
print(x)
Out:
tensor([[0.1346, 0.0086, 0.8915, 0.2503, 0.3227, 0.2293],
        [0.1182, 0.6759, 0.4475, 0.7565, 0.1387, 0.1020],
        [0.8162, 0.7662, 0.6263, 0.6072, 0.4265, 0.0322]])
Construct a matrix filled with zeros and of dtype long:
x = torch.zeros(3, 6, dtype=torch.long)
print(x)
Out:
tensor([[0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0]])
Construct a tensor directly from data:
x = torch.tensor([10.0, 2.3])
print(x)
Out:
tensor([10.0000, 2.3000])
Get the size:
print(x.size())
Out:
torch.Size([2])
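torch.Size is in fact a tuple, so it supports all tuple operations. A small sketch of unpacking one (the 3x6 tensor here is just an illustration):
rows, cols = torch.rand(3, 6).size()  # a Size unpacks like an ordinary tuple
print(rows, cols)  # 3 6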
Operations
Addition
x = torch.rand(3, 4)
y = torch.rand(3, 4)
print(x + y)
Out:
tensor([[1.6089, 1.2154, 1.1974, 1.0267],
        [1.0705, 0.1865, 1.0091, 0.9641],
        [1.0066, 0.1990, 1.2998, 0.5362]])
Another syntax for addition:
print(torch.add(x, y))
Out:
tensor([[1.6089, 1.2154, 1.1974, 1.0267],
        [1.0705, 0.1865, 1.0091, 0.9641],
        [1.0066, 0.1990, 1.2998, 0.5362]])
Providing an output tensor as an argument:
result = torch.empty(3, 4)
torch.add(x, y, out=result)
print(result)
Out:
tensor([[1.6089, 1.2154, 1.1974, 1.0267],
        [1.0705, 0.1865, 1.0091, 0.9641],
        [1.0066, 0.1990, 1.2998, 0.5362]])
In-place addition:
x.add_(y)
print(x)
Out:
tensor([[1.6089, 1.2154, 1.1974, 1.0267],
        [1.0705, 0.1865, 1.0091, 0.9641],
        [1.0066, 0.1990, 1.2998, 0.5362]])
Note: Any operation that mutates a tensor in-place is post-fixed with an _. For example, x.copy_(y) and x.t_() will change x.
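A quick sketch of this convention, using only methods from the core tensor API:
a = torch.ones(2, 2)
b = torch.zeros(2, 2)
c = a.add(b)  # out-of-place: returns a new tensor, a is unchanged
a.add_(b)     # in-place: modifies a itself
a.copy_(b)    # in-place: a now holds b's values
a.t_()        # in-place transpose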
Resizing
If you want to resize/reshape a tensor, you can use tensor.view:
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8) # the size -1 is inferred from other dimensions
print(x.size(), y.size(), z.size())
Out:
torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
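At most one dimension passed to view can be -1, and it is inferred so that the total number of elements is preserved. A small sketch:
x = torch.arange(12)
print(x.view(3, 4))          # reshaped to 3 rows of 4
print(x.view(-1, 6).size())  # torch.Size([2, 6]): 12 elements / 6 columns = 2 rows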
NumPy Bridge
Converting a Torch Tensor to a NumPy Array
x = torch.ones(10)
print(x)
Out:
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
y = x.numpy()
print(y)
Out:
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
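The Torch Tensor and the NumPy array share their underlying memory locations (for tensors on the CPU), so changing one will change the other. Continuing with x and y from above:
x.add_(1)  # in-place change to the tensor...
print(y)   # ...is visible in the NumPy array: [2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]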
Converting NumPy Array to Torch Tensor
import numpy as np
a = np.ones(10)
b = torch.from_numpy(a)
np.add(a, 2, out=a)
print(a)
print(b)
Out:
[3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
tensor([3., 3., 3., 3., 3., 3., 3., 3., 3., 3.], dtype=torch.float64)
Notice that changing the NumPy array changed the Torch Tensor automatically: they share the same underlying memory.
Neural networks
Neural networks can be constructed using the torch.nn package. An nn.Module contains layers and a forward(input) method that returns the output.
Define the network
import torch
import torch.nn as nn
import torch.nn.functional as F
class MyNet(nn.Module):
    def __init__(self):
        super().__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
net = MyNet()
print(net)
Out:
MyNet(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
You just have to define the forward function; the backward function (where gradients are computed) is automatically defined for you using autograd. You can use any of the Tensor operations in the forward function.
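To see autograd deriving the backward pass on its own, here is a tiny standalone sketch:
w = torch.ones(2, requires_grad=True)
out = (w * 3).sum()  # forward: out = 3*w[0] + 3*w[1]
out.backward()       # the backward pass is generated automatically
print(w.grad)        # tensor([3., 3.])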
The learnable parameters of a model are returned by net.parameters()
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight
Out:
10
torch.Size([6, 1, 5, 5])
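The count of 10 comes from the five layers each contributing a weight and a bias. They can be listed by name with net.named_parameters(), part of the standard nn.Module API:
for name, p in net.named_parameters():
    print(name, p.size())  # conv1.weight torch.Size([6, 1, 5, 5]), conv1.bias torch.Size([6]), ...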
Loss function
A loss function takes the (output, target) pair of inputs and computes a value that estimates how far away the output is from the target.
There are several different loss functions under the nn package. A simple loss is nn.MSELoss, which computes the mean-squared error between the input and the target.
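As a standalone sanity check, with values chosen so the result is easy to verify by hand (two squared errors of 0.25 each average to 0.25):
criterion = nn.MSELoss()
pred = torch.tensor([1.0, 2.0])
goal = torch.tensor([1.5, 2.5])
print(criterion(pred, goal))  # tensor(0.2500) = mean((0.5)^2, (0.5)^2)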
input = torch.randn(1, 1, 32, 32)
output = net(input)
print(output)
target = torch.randn(10)     # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as the output
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
Out:
tensor([[-0.0669, 0.1497, -0.0850, 0.0584, -0.1100, -0.0551, 0.0742, 0.0469,
         -0.0051, 0.0237]], grad_fn=<AddmmBackward>)
tensor(1.0492, grad_fn=<MseLossBackward>)
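If you follow loss in the backward direction using its .grad_fn attribute, you can see the graph of computations autograd recorded; the exact node names vary a little across PyTorch versions:
print(loss.grad_fn)                       # the MSELoss node
print(loss.grad_fn.next_functions[0][0])  # the linear (addmm) op that produced the output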
Backprop
To backpropagate the error, all we have to do is call loss.backward(). You need to clear the existing gradients first, though, or new gradients will be accumulated on top of the existing ones.
Now we shall call loss.backward(), and have a look at conv1’s bias gradients before and after the backward.
net.zero_grad() # zeroes the gradient buffers of all parameters
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
Out:
conv1.bias.grad before backward
None
conv1.bias.grad after backward
tensor([-0.0004, 0.0102, 0.0056, 0.0015, 0.0002, -0.0062])
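Finally, a minimal sketch of why the gradients must be cleared: backward() adds to .grad instead of overwriting it:
w = torch.ones(1, requires_grad=True)
(w * 2).sum().backward()
print(w.grad)   # tensor([2.])
(w * 2).sum().backward()  # a second backward pass, on a fresh graph
print(w.grad)   # tensor([4.]) -- accumulated, not replaced
w.grad.zero_()  # net.zero_grad() does exactly this for every parameter
print(w.grad)   # tensor([0.])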