# Getting acquainted with torch tensors

Two days ago, I introduced `torch`, an R package that provides the native functionality that PyTorch brings to Python users. In that post, I assumed basic familiarity with TensorFlow/Keras. Consequently, I portrayed `torch` in a way I figured would be helpful to someone who “grew up” with the Keras way of training a model: aiming to focus on differences, yet not lose sight of the overall process.

This post now changes perspective. We code a simple neural network “from scratch”, making use of just one of `torch`’s building blocks: tensors. This network will be as “raw” (low-level) as can be. (For the less math-inclined folks among us, it may serve as a refresher of what is really going on beneath all those convenience tools built for us. But the real purpose is to illustrate what can be done with tensors alone.)

Subsequent posts – three of them – will progressively show how to reduce the effort: noticeably right from the start, enormously by the time we’re done. At the end of this mini-series, you will have seen how automatic differentiation works in `torch`, how to use `module`s (layers, in `keras` speak, and compositions thereof), and optimizers. By then, you’ll have much of the background desirable when applying `torch` to real-world tasks.

This post will be the longest, since there is a lot to learn about tensors: how to create them; how to manipulate their contents and/or modify their shapes; how to convert them to R arrays, matrices or vectors; and of course, given the omnipresent need for speed: how to get all those operations executed on the GPU. Once we’ve cleared that agenda, we code the aforementioned little network, seeing all those pieces in action.

## Tensors

### Creation

Tensors may be created by specifying individual values. Here we create two one-dimensional tensors (vectors), of types `float` and `bool`, respectively:

``````library(torch)
# a 1d vector of size 2
t <- torch_tensor(c(1, 2))
t

# also 1d, but of type boolean
t <- torch_tensor(c(TRUE, FALSE))
t
``````
``````torch_tensor
1
2
[ CPUFloatType{2} ]

torch_tensor
1
0
[ CPUBoolType{2} ]``````

And here are two ways to create two-dimensional tensors (matrices). Note how in the second approach, you need to specify `byrow = TRUE` in the call to `matrix()` to get values arranged in row-major order.

``````# a 3x3 tensor (matrix)
t <- torch_tensor(rbind(c(1,2,0), c(3,0,0), c(4,5,6)))
t

t <- torch_tensor(matrix(1:9, ncol = 3, byrow = TRUE))
t
``````
``````torch_tensor
1  2  0
3  0  0
4  5  6
[ CPUFloatType{3,3} ]

torch_tensor
1  2  3
4  5  6
7  8  9
[ CPULongType{3,3} ]``````

In higher dimensions especially, it can be easier to specify the kind of tensor abstractly, as in: “give me a tensor of <…> of shape n1 x n2”, where <…> could be “zeros”; or “ones”; or, say, “values drawn from a standard normal distribution”:

``````# a 3x3 tensor of standard-normally distributed values
t <- torch_randn(3, 3)
t

# a 4x2x2 (3d) tensor of zeroes
t <- torch_zeros(4, 2, 2)
t
``````
``````torch_tensor
-2.1563  1.7085  0.5245
0.8955 -0.6854  0.2418
0.4193 -0.7742 -1.0399
[ CPUFloatType{3,3} ]

torch_tensor
(1,.,.) =
0  0
0  0

(2,.,.) =
0  0
0  0

(3,.,.) =
0  0
0  0

(4,.,.) =
0  0
0  0
[ CPUFloatType{4,2,2} ]``````

Many similar functions exist, including, e.g., `torch_arange()` to create a tensor holding a sequence of evenly spaced values, `torch_eye()` which returns an identity matrix, and `torch_logspace()` which fills a specified range with a list of values spaced logarithmically.
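For instance (a sketch only; the argument names mirror the PyTorch counterparts, so double-check the signatures in the package documentation for your version):

``````
# evenly spaced values (whether `end` is included may depend on the package version)
torch_arange(start = 1, end = 10, step = 2)

# a 3x3 identity matrix
torch_eye(3)

# 4 values spaced logarithmically between 10^0 and 10^3
torch_logspace(start = 0, end = 3, steps = 4)
``````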

If no `dtype` argument is specified, `torch` will infer the data type from the passed-in value(s). For example:

``````t <- torch_tensor(c(3, 5, 7))
t$dtype

t <- torch_tensor(1L)
t$dtype
``````
``````torch_Float
torch_Long``````

But we can explicitly request a different `dtype` if we want:

``````t <- torch_tensor(2, dtype = torch_double())
t$dtype
``````
``torch_Double``

`torch` tensors live on a device. By default, this will be the CPU:
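For instance, for the tensor we just created (querying its `$device` field):

``````
t$device
``````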

``torch_device(type='cpu')``

But we could also define a tensor to live on the GPU:

``````t <- torch_tensor(2, device = "cuda")
t$device
``````
``torch_device(type='cuda', index=0)``

There is one other very important parameter to the tensor-creation functions: `requires_grad`. Here though, I have to ask for your patience: this one will figure prominently in the follow-up post.

### Conversion to built-in R data types

To convert `torch` tensors to R, use `as_array()`:

``````t <- torch_tensor(matrix(1:9, ncol = 3, byrow = TRUE))
as_array(t)
``````
``````     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9``````

Depending on whether the tensor is one-, two-, or three-dimensional, the resulting R object will be a vector, a matrix, or an array:

``````t <- torch_tensor(c(1, 2, 3))
as_array(t) %>% class()

t <- torch_ones(c(2, 2))
as_array(t) %>% class()

t <- torch_ones(c(2, 2, 2))
as_array(t) %>% class()
``````
`````` "numeric"

 "matrix" "array"

 "array"``````

For one-dimensional and two-dimensional tensors, it is also possible to use `as.integer()` / `as.matrix()`. (One reason you might want to do this is to have more self-documenting code.)
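A quick sketch of both (relying on the coercion methods just mentioned):

``````
t <- torch_tensor(matrix(1:4, ncol = 2))
as.matrix(t)

t <- torch_tensor(1:3)
as.integer(t)
``````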

If a tensor currently lives on the GPU, you have to move it to the CPU first:

``````t <- torch_tensor(2, device = "cuda")
as.integer(t$cpu())
``````
`` 2``

### Indexing and slicing tensors

Often, we want to retrieve not a complete tensor, but only some of the values it holds, or even just a single value. In those cases, we talk about slicing and indexing, respectively.

In R, these operations are 1-based, meaning that when we specify offsets, we assume the very first element in an array to reside at offset `1`. The same behavior was implemented for `torch`. Thus, a lot of the functionality described in this section should feel intuitive.

The way I’m organizing this section is the following: we’ll look at the intuitive parts first, where by intuitive I mean: intuitive to the R user who has not yet worked with Python’s NumPy. Then come things which, to this user, may look more surprising, but will turn out to be quite useful.

#### Indexing and slicing: the R-like part

None of these should be overly surprising:

``````t <- torch_tensor(rbind(c(1,2,3), c(4,5,6)))
t

# a single value
t[1, 1]

# first row, all columns
t[1, ]

# first row, a subset of columns
t[1, 1:2]
``````
``````torch_tensor
1  2  3
4  5  6
[ CPUFloatType{2,3} ]

torch_tensor
1
[ CPUFloatType{} ]

torch_tensor
1
2
3
[ CPUFloatType{3} ]

torch_tensor
1
2
[ CPUFloatType{2} ]``````

Note how, just as in R, singleton dimensions are dropped:

``````t <- torch_tensor(rbind(c(1,2,3), c(4,5,6)))

# 2x3
t$size()

# a subset of the first row: will be returned as a vector
t[1, 1:2]$size()

# a single element
t[1, 1]$size()
``````
`````` 2 3

 2

integer(0)``````

And just like in R, you can specify `drop = FALSE` to keep those dimensions:

``````t[1, 1:2, drop = FALSE]$size()

t[1, 1, drop = FALSE]$size()
``````
`````` 1 2

 1 1``````

#### Indexing and slicing: What to look out for

While R uses negative numbers to remove elements at specified positions, in `torch` negative values indicate that we start counting from the end of a tensor – with `-1` pointing to its last element:

``````t <- torch_tensor(rbind(c(1,2,3), c(4,5,6)))

t[1, -1]

t[ , -2:-1]
``````
``````torch_tensor
3
[ CPUFloatType{} ]

torch_tensor
2  3
5  6
[ CPUFloatType{2,2} ]``````

This is a feature you might know from NumPy. The same goes for the following.

When the slicing expression `m:n` is augmented by another colon and a third number – `m:n:o` –, we take every `o`th item from the range specified by `m` and `n`:

``````t <- torch_tensor(1:10)
t[2:10:2]
``````
``````torch_tensor
2
4
6
8
10
[ CPULongType{5} ]``````

Sometimes we don’t know how many dimensions a tensor has, but we do know what to do with the last dimension, or the first one. To subsume all the others, we can use `..`:

``````t <- torch_randint(-7, 7, size = c(2, 2, 2))
t

t[.., 1]

t[2, ..]
``````
``````torch_tensor
(1,.,.) =
2 -2
-5  4

(2,.,.) =
0  4
-3 -1
[ CPUFloatType{2,2,2} ]

torch_tensor
2 -5
0 -3
[ CPUFloatType{2,2} ]

torch_tensor
0  4
-3 -1
[ CPUFloatType{2,2} ]``````

Now we move on to a topic that, in practice, is just as indispensable as slicing: changing tensor shapes.

### Reshaping tensors

Changes in shape can occur in two fundamentally different ways. Seeing how “reshape” really means: keep the values but modify their layout, we can either alter how they are arranged physically, or keep the physical structure as-is and just change the “mapping” (a semantic change, as it were).

In the first case, storage has to be allocated for two tensors, source and target, and elements are copied from the source to the target. In the second, physically there is just a single tensor, referenced by two logical entities with distinct metadata.

Not surprisingly, for performance reasons, the second kind of operation is preferred.

#### Zero-copy reshaping

We start with zero-copy methods, as we’ll want to use them whenever we can.

A special case often seen in practice is adding or removing a singleton dimension.

`unsqueeze()` adds a dimension of size `1` at a position specified by `dim`:

``````t1 <- torch_randint(low = 3, high = 7, size = c(3, 3, 3))
t1$size()

t2 <- t1$unsqueeze(dim = 1)
t2$size()

t3 <- t1$unsqueeze(dim = 2)
t3$size()
``````
`````` 3 3 3

 1 3 3 3

 3 1 3 3``````

Conversely, `squeeze()` removes singleton dimensions:

``````t4 <- t3$squeeze()
t4$size()
``````
`` 3 3 3``

The same can be achieved with `view()`. `view()`, however, is much more general, in that it allows you to reshape the data to any valid dimensionality. (Valid meaning: the number of elements stays the same.)

Here we have a `3x2` tensor that is reshaped to size `2x3`:

``````t1 <- torch_tensor(rbind(c(1, 2), c(3, 4), c(5, 6)))
t1

t2 <- t1$view(c(2, 3))
t2
``````
``````torch_tensor
1  2
3  4
5  6
[ CPUFloatType{3,2} ]

torch_tensor
1  2  3
4  5  6
[ CPUFloatType{2,3} ]``````

(Note how this is different from matrix transposition.)

Instead of going from two to three dimensions, we can also flatten the matrix:

``````t4 <- t1$view(c(-1, 6))

t4$size()

t4
``````
`````` 1 6

torch_tensor
1  2  3  4  5  6
[ CPUFloatType{1,6} ]``````

In contrast to indexing operations, this does not drop dimensions.

Like we said above, operations like `squeeze()` or `view()` don’t make copies. Or, put differently: the output tensor shares storage with the input tensor. We can in fact verify this ourselves:

``````t1$storage()$data_ptr()

t2$storage()$data_ptr()
``````
`````` "0x5648d02ac800"

 "0x5648d02ac800"``````

What is different is the storage metadata `torch` keeps about both tensors. Here, the relevant piece of information is the stride:

A tensor’s `stride()` method tracks, for every dimension, how many elements have to be traversed to arrive at its next element (next row or next column, in two dimensions). For `t1` above, of shape `3x2`, we have to skip over 2 elements to arrive at the next row. To arrive at the next column though, in every row we just have to skip a single entry. So `t1$stride()` yields:

`` 2 1``

For `t2`, of shape `2x3`, the distance between column elements is the same, but the distance between rows is now 3 – `t2$stride()`:

`` 3 1``

While zero-copy operations are optimal, there are cases where they won’t work.

With `view()`, this can happen when a tensor was obtained via an operation – other than `view()` itself – that has already modified the stride. One example would be `transpose()`:

``````t1 <- torch_tensor(rbind(c(1, 2), c(3, 4), c(5, 6)))
t1
t1$stride()

t2 <- t1$t()
t2
t2$stride()
``````
``````torch_tensor
1  2
3  4
5  6
[ CPUFloatType{3,2} ]

 2 1

torch_tensor
1  3  5
2  4  6
[ CPUFloatType{2,3} ]

 1 2``````

In `torch` lingo, tensors – like `t2` – that re-use existing storage (and just read it differently) are said not to be “contiguous”. One way to reshape them is to call `contiguous()` on them first. We’ll see this in the next subsection.

#### Reshape with copy

In the following snippet, trying to reshape `t2` using `view()` fails, as it already carries information indicating that the underlying data should not be read in physical order.

``````t1 <- torch_tensor(rbind(c(1, 2), c(3, 4), c(5, 6)))

t2 <- t1$t()

t2$view(6) # error!
``````
``````Error in (function (self, size)  :
view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces).
Use .reshape(...) instead. (view at ../aten/src/ATen/native/TensorShape.cpp:1364)``````

However, if we first call `contiguous()` on it, a new tensor is created, which may then be reshaped using `view()`.

``````t3 <- t2$contiguous()

t3$view(6)
``````
``````torch_tensor
1
3
5
2
4
6
[ CPUFloatType{6} ]``````

Alternatively, we can use `reshape()`. `reshape()` defaults to `view()`-like behavior if possible; otherwise it will create a physical copy.

``````t2$storage()$data_ptr()

t4 <- t2$reshape(6)

t4$storage()$data_ptr()
``````
`````` "0x5648d49b4f40"

 "0x5648d2752980"``````

### Operations on tensors

Unsurprisingly, `torch` provides a host of mathematical operations on tensors; we’ll see some of them in the network code below, and you’ll encounter lots more as you continue on your `torch` journey. Here, we quickly take a look at the overall tensor method semantics.

Tensor methods normally return references to new objects. Here, we add to `t1` a clone of itself:

``````t1 <- torch_tensor(rbind(c(1, 2), c(3, 4), c(5, 6)))
t2 <- t1$clone()

t1$add(t2)
``````
``````torch_tensor
2   4
6   8
10  12
[ CPUFloatType{3,2} ]``````

In this process, `t1` has not been modified:

``````torch_tensor
1  2
3  4
5  6
[ CPUFloatType{3,2} ]``````

Many tensor methods have variants for mutating operations. These all carry a trailing underscore:

``````t1$add_(t1)

# now t1 has been modified
t1
``````
``````torch_tensor
4   8
12  16
20  24
[ CPUFloatType{3,2} ]

torch_tensor
4   8
12  16
20  24
[ CPUFloatType{3,2} ]``````

Alternatively, you can of course assign the new object to a new reference variable:
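For instance (a small sketch, continuing from the current value of `t1`):

``````
t3 <- t1$add(t1)
t3
``````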

``````torch_tensor
8  16
24  32
40  48
[ CPUFloatType{3,2} ]``````

There is one thing we need to discuss before we wrap up our introduction to tensors: how can we have all those operations executed on the GPU?

## Running on GPU

To check whether your GPU(s) is/are visible to `torch`, run

``````cuda_is_available()

cuda_device_count()
``````
`````` TRUE

 1``````

Tensors may be requested to live on the GPU right at creation:

``````device <- torch_device("cuda")

t <- torch_ones(c(2, 2), device = device)
``````

Alternatively, they can be moved between devices at any time:
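A minimal sketch, reusing the GPU tensor `t` from above together with the `$cpu()` method we already saw and the `$device` field:

``````
t$device       # the tensor created above lives on the GPU

t2 <- t$cpu()  # move (copy) it back to the CPU
t2$device
``````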

``torch_device(type='cuda', index=0)``
``torch_device(type='cpu')``

That’s it for our discussion of tensors – almost. There is one `torch` feature that, although related to tensor operations, deserves special mention. It’s called broadcasting, and “bilingual” (R + Python) users will know it from NumPy.

We often need to perform operations on tensors whose shapes don’t match exactly.

Unsurprisingly, we can add a scalar to a tensor:

``````t1 <- torch_randn(c(3,5))

t1 + 22
``````
``````torch_tensor
23.1097  21.4425  22.7732  22.2973  21.4128
22.6936  21.8829  21.1463  21.6781  21.0827
22.5672  21.2210  21.2344  23.1154  20.5004
[ CPUFloatType{3,5} ]``````

The same will work if we add a tensor of size `1`:

``````t1 <- torch_randn(c(3,5))

t1 + torch_tensor(c(22))
``````

Adding tensors of different sizes generally won’t work:

``````t1 <- torch_randn(c(3,5))
t2 <- torch_randn(c(5,5))

t1$add(t2)
``````
``````Error in (function (self, other, alpha)  :
The size of tensor a (2) must match the size of tensor b (5) at non-singleton dimension 1 (infer_size at ../aten/src/ATen/ExpandUtils.cpp:24)``````

However, under certain conditions, one or both tensors may be virtually expanded so that both tensors line up. This behavior is what is meant by broadcasting. The way it works in `torch` is not just inspired by, but actually identical to, that of NumPy.

The rules are:

1. We align array shapes, starting from the right.

Say we have two tensors, one of size `8x1x6x1`, the other of size `7x1x5`.

Here they are, right-aligned:

``````# t1, shape:     8  1  6  1
# t2, shape:        7  1  5``````
2. Starting to look from the right, the sizes along aligned axes either have to match exactly, or one of them has to be equal to `1` – in which case the latter is broadcast to the larger one.

In the above example, this is the case for the second-from-last dimension. This now gives

``````# t1, shape:     8  1  6  1
# t2, shape:        7  6  5``````

, with broadcasting happening in `t2`.

3. If, on the left, one of the arrays has an additional axis (or more than one), the other is virtually expanded to have size `1` in that place, in which case broadcasting will happen as stated in (2).

This is the case with `t1`’s leftmost dimension. First, there is a virtual expansion

``````# t1, shape:     8  1  6  1
# t2, shape:     1  7  1  5``````

and then, broadcasting happens as stated in (2):

``````# t1, shape:     8  1  6  1
# t2, shape:     8  7  1  5``````

According to these rules, our above example

``````t1 <- torch_randn(c(3,5))
t2 <- torch_randn(c(5,5))

``````

could be modified in various ways that would allow adding the two tensors.

For example, if `t2` were `1x5`, it would only need to be broadcast to size `3x5` before the addition operation:

``````t1 <- torch_randn(c(3,5))
t2 <- torch_randn(c(1,5))

t1$add(t2)
``````
``````torch_tensor
-1.0505  1.5811  1.1956 -0.0445  0.5373
0.0779  2.4273  2.1518 -0.6136  2.6295
0.1386 -0.6107 -1.2527 -1.3256 -0.1009
[ CPUFloatType{3,5} ]``````

If it were of size `5`, a virtual leading dimension would be added, and then the same broadcasting would take place as in the previous case.

``````t1 <- torch_randn(c(3,5))
t2 <- torch_randn(c(5))

t1$add(t2)
``````
``````torch_tensor
-1.4123  2.1392 -0.9891  1.1636 -1.4960
0.8147  1.0368 -2.6144  0.6075 -2.0776
-2.3502  1.4165  0.4651 -0.8816 -1.0685
[ CPUFloatType{3,5} ]``````

Here is a more complex example. Broadcasting now happens both in `t1` and in `t2`:

``````t1 <- torch_randn(c(1,5))
t2 <- torch_randn(c(3,1))

t1$add(t2)
``````
``````torch_tensor
1.2274  1.1880  0.8531  1.8511 -0.0627
0.2639  0.2246 -0.1103  0.8877 -1.0262
-1.5951 -1.6344 -1.9693 -0.9713 -2.8852
[ CPUFloatType{3,5} ]``````

As a nice concluding example, thanks to broadcasting an outer product can be computed like so:

``````t1 <- torch_tensor(c(0, 10, 20, 30))

t2 <- torch_tensor(c(1, 2, 3))

t1$view(c(4,1)) * t2
``````
``````torch_tensor
0   0   0
10  20  30
20  40  60
30  60  90
[ CPUFloatType{4,3} ]``````

And now, we really get to implementing that neural network!

## A simple neural network using `torch` tensors

Our task, which we approach in a low-level way today but simplify considerably in upcoming installments, consists of regressing a single target variable based on three input variables.

We directly use `torch` to simulate some data.

#### Toy data

``````library(torch)

# input dimensionality (number of input features)
d_in <- 3
# output dimensionality (number of predicted features)
d_out <- 1
# number of observations in training set
n <- 100

# create random data
# input
x <- torch_randn(n, d_in)
# target
y <- x[, 1, drop = FALSE] * 0.2 -
  x[, 2, drop = FALSE] * 1.3 -
  x[, 3, drop = FALSE] * 0.5 +
  torch_randn(n, 1)
``````

Next, we need to initialize the network’s weights. We’ll have one hidden layer, with `32` units. The output layer’s size, being determined by the task, equals `1`.

#### Initialize weights

``````# dimensionality of hidden layer
d_hidden <- 32

# weights connecting input to hidden layer
w1 <- torch_randn(d_in, d_hidden)
# weights connecting hidden to output layer
w2 <- torch_randn(d_hidden, d_out)

# hidden layer bias
b1 <- torch_zeros(1, d_hidden)
# output layer bias
b2 <- torch_zeros(1, d_out)
``````

Now for the training loop proper. The training loop here really is the network.

#### Training loop

In each iteration (“epoch”), the training loop does four things:

• runs through the network, computing predictions (forward pass)

• compares those predictions to the ground truth, quantifying the loss

• runs backwards through the network, computing the gradients that indicate how the weights should be modified

• updates the weights, applying the requested learning rate.

Here is the template we’re going to fill in:

``````for (t in 1:200) {

# here we will compute the prediction

### -------- compute loss --------

# here we will compute the sum of squared errors

### -------- Backpropagation --------

# here we will go backward through the network, calculating the required gradients

### -------- Update weights --------

# here we will update the weights, subtracting a portion of the gradients
}
``````

The forward pass applies two affine transformations, one each for the hidden and the output layer. In between, ReLU activation is applied:

``````  # compute pre-activations of the hidden layer (dim: 100 x 32)
# torch_mm does matrix multiplication
h <- x$mm(w1) + b1

# apply activation function (dim: 100 x 32)
# torch_clamp cuts off values below/above given thresholds
h_relu <- h$clamp(min = 0)

# compute output (dim: 100 x 1)
y_pred <- h_relu$mm(w2) + b2
``````

Our loss here is the sum of squared errors:
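Written out (this is the same line that appears in the complete listing below):

``````
loss <- as.numeric((y_pred - y)$pow(2)$sum())
``````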

Calculating the gradients manually is a bit tedious, but it can be done:

``````  # gradient of loss w.r.t. prediction (dim: 100 x 1)
grad_y_pred <- 2 * (y_pred - y)
# gradient of loss w.r.t. w2 (dim: 32 x 1)
grad_w2 <- h_relu$t()$mm(grad_y_pred)
# gradient of loss w.r.t. hidden activation (dim: 100 x 32)
grad_h_relu <- grad_y_pred$mm(w2$t())
# gradient of loss w.r.t. hidden pre-activation (dim: 100 x 32)
grad_h <- grad_h_relu$clone()
grad_h[h < 0] <- 0

# gradient of loss w.r.t. b2 (shape: ())
grad_b2 <- grad_y_pred$sum()

# gradient of loss w.r.t. w1 (dim: 3 x 32)
grad_w1 <- x$t()$mm(grad_h)
# gradient of loss w.r.t. b1 (shape: (32, ))
grad_b1 <- grad_h$sum(dim = 1)
``````

The final step then uses the calculated gradients to update the weights:

``````  learning_rate <- 1e-4

w2 <- w2 - learning_rate * grad_w2
b2 <- b2 - learning_rate * grad_b2
w1 <- w1 - learning_rate * grad_w1
b1 <- b1 - learning_rate * grad_b1
``````

Let’s use these snippets to fill in the gaps in the above template, and give it a try!

#### Putting it all together

``````library(torch)

### generate training data -----------------------------------------------------

# input dimensionality (number of input features)
d_in <- 3
# output dimensionality (number of predicted features)
d_out <- 1
# number of observations in training set
n <- 100

# create random data
x <- torch_randn(n, d_in)
y <-
x[, 1, NULL] * 0.2 - x[, 2, NULL] * 1.3 - x[, 3, NULL] * 0.5 + torch_randn(n, 1)

### initialize weights ---------------------------------------------------------

# dimensionality of hidden layer
d_hidden <- 32
# weights connecting input to hidden layer
w1 <- torch_randn(d_in, d_hidden)
# weights connecting hidden to output layer
w2 <- torch_randn(d_hidden, d_out)

# hidden layer bias
b1 <- torch_zeros(1, d_hidden)
# output layer bias
b2 <- torch_zeros(1, d_out)

### network parameters ---------------------------------------------------------

learning_rate <- 1e-4

### training loop --------------------------------------------------------------

for (t in 1:200) {

# compute pre-activations of the hidden layer (dim: 100 x 32)
h <- x$mm(w1) + b1
# apply activation function (dim: 100 x 32)
h_relu <- h$clamp(min = 0)
# compute output (dim: 100 x 1)
y_pred <- h_relu$mm(w2) + b2

### -------- compute loss --------

loss <- as.numeric((y_pred - y)$pow(2)$sum())

if (t %% 10 == 0)
  cat("Epoch: ", t, "   Loss: ", loss, "\n")

### -------- Backpropagation --------

# gradient of loss w.r.t. prediction (dim: 100 x 1)
grad_y_pred <- 2 * (y_pred - y)
# gradient of loss w.r.t. w2 (dim: 32 x 1)
grad_w2 <- h_relu$t()$mm(grad_y_pred)
# gradient of loss w.r.t. hidden activation (dim: 100 x 32)
grad_h_relu <- grad_y_pred$mm(w2$t())
# gradient of loss w.r.t. hidden pre-activation (dim: 100 x 32)
grad_h <- grad_h_relu$clone()
grad_h[h < 0] <- 0

# gradient of loss w.r.t. b2 (shape: ())
grad_b2 <- grad_y_pred$sum()

# gradient of loss w.r.t. w1 (dim: 3 x 32)
grad_w1 <- x$t()$mm(grad_h)
# gradient of loss w.r.t. b1 (shape: (32, ))
grad_b1 <- grad_h$sum(dim = 1)

### -------- Update weights --------

w2 <- w2 - learning_rate * grad_w2
b2 <- b2 - learning_rate * grad_b2
w1 <- w1 - learning_rate * grad_w1
b1 <- b1 - learning_rate * grad_b1

}
``````
``````Epoch:  10     Loss:  352.3585
Epoch:  20     Loss:  219.3624
Epoch:  30     Loss:  155.2307
Epoch:  40     Loss:  124.5716
Epoch:  50     Loss:  109.2687
Epoch:  60     Loss:  100.1543
Epoch:  70     Loss:  94.77817
Epoch:  80     Loss:  91.57003
Epoch:  90     Loss:  89.37974
Epoch:  100    Loss:  87.64617
Epoch:  110    Loss:  86.3077
Epoch:  120    Loss:  85.25118
Epoch:  130    Loss:  84.37959
Epoch:  140    Loss:  83.44133
Epoch:  150    Loss:  82.60386
Epoch:  160    Loss:  81.85324
Epoch:  170    Loss:  81.23454
Epoch:  180    Loss:  80.68679
Epoch:  190    Loss:  80.16555
Epoch:  200    Loss:  79.67953 ``````

This looks like it worked quite well! It also should have fulfilled its purpose: showing what you can achieve using `torch` tensors alone. In case you didn’t feel like going through the backprop logic with too much enthusiasm, don’t worry: in the next installment, this will get significantly less cumbersome. See you then!