
Tensors


In this package, a Tensor is a MultidimensionalData object that has numbers as elements. At the moment, most implementations are constrained to Float as the element type.

tensor operations

On the MultidimensionalData page, we have already seen examples of different operations on tensors. Many of them are flexible and powerful: they can be used on tensors with any number of modes, in any combination. The notation, however, is quite verbose and can be a bit obscure. As an example, let us look at how the weight gradient of a neural net layer is computed:

let activationDerivative = activationFunction.derivative(currentPreactivations)
let previousActivations = previousLayer!.currentActivations

// element-wise product of the output gradient and the activation derivative,
// with no outer modes on either operand
let preactivationGradient = multiplyElementwise(a: gradientWrtOutput, outerModesA: [], b: activationDerivative, outerModesB: [])
// keep only the last mode of each tensor (the neuron modes) as remaining modes
// and sum over the shared batch mode; the mode indices must be computed, because
// the tensors may or may not have a leading batch mode
let weightGradient = multiply(a: previousActivations, remainingModesA: previousActivations.modeCount - 1, b: preactivationGradient, remainingModesB: preactivationGradient.modeCount - 1)

The role that each mode of these tensors plays in the operation must be specified in additional arguments. When writing and reading this code, we have to think about all properties the tensors could potentially have and keep them in mind. The activations of the neural net, for instance, could come in a batch or as a single vector. Because of this, the remaining modes in the multiply function cannot be a fixed number, but must be calculated as the index of the last mode of each tensor.

abstract indices

A clearer and more elegant way to use these operations is provided by abstract indices. We can give each mode of a tensor an abstract index in the form of the TensorIndex enum. These indices are simply Latin or Greek letters. The easiest way to assign abstract indices to a tensor is with a subscript: let tensorWithIndices = tensor[.i, .j, .k]. Tensors are structs, which means that they are passed by value. The abstract indices get passed with them, of course, and moreover they are retained in a sensible way through all operations.
There are two constraints regarding abstract indices: First, no two modes of a single tensor can have the same abstract index. Second, if two tensors meet in an operation, modes with the same abstract index must have the same size.
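
A minimal sketch of these two rules; the Tensor(modeSizes:values:) initializer is an assumption here, used only for illustration:

let a = Tensor<Float>(modeSizes: [2, 3], values: [1, 2, 3, 4, 5, 6])
let v = Tensor<Float>(modeSizes: [3], values: [1, 2, 3])

let aIndexed = a[.i, .j]   // mode 0 is .i (size 2), mode 1 is .j (size 3)
let vIndexed = v[.j]       // .j also has size 3 here, so both tensors may meet in an operation
// a[.i, .i] would violate the first constraint: duplicate index in one tensor
// v[.i] together with aIndexed would violate the second: .i would have sizes 3 and 2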

Operations based on abstract indices treat modes that share an abstract index differently from the rest. For tensor addition, subtraction, element-wise multiplication and division, the modes with the same abstract index are taken as inner modes, all others as outer modes. For tensor multiplication, this corresponds to the Einstein notation: Modes with the same abstract index are summed over (summationModes) and vanish, the other modes remain in the product (remainingModes).

This allows the clear and unambiguous use of infix operators for the most common mathematical tensor operations (+, -, °* and °/ for element-wise multiplication and division, * for Einstein tensor multiplication). The difference of two tensor products can therefore be written as:

let difference = (t1[.i, .j] * t2[.j, .k]) - (t3[.i, .l, .k] * t4[.l])
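
Reading this in the Einstein convention described above: .j appears in both factors of the first product and is summed over, as is .l in the second; both products are left with the modes [.i, .k] and can therefore be subtracted. In conventional index notation: difference_ik = Σ_j t1_ij · t2_jk − Σ_l t3_ilk · t4_l.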

In many cases, it can be useful to store abstract indices in local variables with meaningful names, and maybe even give them markdown quick help descriptions. The weight gradient computation in a neural net layer would then look like this:

/// mode for samples in batch
let batch = TensorIndex.a
/// mode for neurons in the previous layer
let prev = TensorIndex.b
/// mode for neurons in this layer
let this = TensorIndex.c

let activationDerivative = activationFunction.derivative(currentPreactivations)[batch, this]
let previousActivations = previousLayer!.currentActivations[batch, prev]

let preactivationGradient = gradientWrtOutput[batch, this] °* activationDerivative   // element-wise, both modes are shared
let weightGradient = previousActivations * preactivationGradient   // sums over the shared batch mode; result has modes [prev, this]

The correct result is calculated in a clear, intuitive way. Because the weight tensor has [prev, this] as abstract indices, a gradient descent update could simply be weight = weight - 0.1 * weightGradient. If we are not sure whether a tensor has the right abstract indices, or we want to consciously change them, we can simply reassign them using subscripts. One useful detail here is that, if the actual tensor has fewer modes than assigned indices, the first indices are omitted. If the tensors had no batch mode, this example would work just as well.
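
A short sketch of this omission rule, assuming activations is a plain vector without a batch mode (the name is illustrative):

// activations has only one mode, but two indices are assigned;
// the surplus leading index `batch` is dropped, so indexed carries just [prev]
let indexed = activations[batch, prev]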

For even more flexibility regarding the number of modes of a tensor, the .uniquelyIndexed(excludedIndices:) method can be used to automatically create unique indices for each mode of a tensor. This comes in handy for multilinear subspace learning algorithms such as multilinear principal component analysis.
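
A hedged usage sketch; the argument is assumed here to take an array of TensorIndex values:

// every mode of sampleTensor gets a fresh abstract index that is neither .i
// nor .j, nor shared with any other mode of the tensor
let freshlyIndexed = sampleTensor.uniquelyIndexed(excludedIndices: [.i, .j])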

initialization

contravariance and covariance

A property of geometric tensors in the narrow mathematical sense. If such a tensor lives in a non-Cartesian space, each of its modes can transform in one of two ways when the basis vectors change: contravariantly or covariantly. For data tensors like the ones used here, the isCartesian property is true by default, so this distinction makes no difference.