
wasi-nn is a WASI API for performing machine learning (ML) inference. The API is not (yet) capable of performing ML training. WebAssembly programs that want to use a host's ML capabilities can access these capabilities through wasi-nn's core abstractions: graphs and tensors. A user loads an ML model -- instantiated as a graph -- to use in an ML backend. Then, the user passes tensor inputs to the graph, computes the inference, and retrieves the tensor outputs.

This example world shows how to use these primitives together.

  • Imports:
    • interface wasi:nn/tensor
    • interface wasi:nn/errors
    • interface wasi:nn/inference
    • interface wasi:nn/graph
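Assuming the interface names listed above, the example world might be written in WIT roughly as follows (a sketch, not the normative definition; the world name is hypothetical):

```wit
// Hypothetical world name; the four imports match the list above.
world ml {
  import wasi:nn/tensor;
  import wasi:nn/errors;
  import wasi:nn/inference;
  import wasi:nn/graph;
}
```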

## Import interface wasi:nn/tensor

All inputs and outputs to an ML inference are represented as tensors.


### Types

#### `type tensor-dimensions`

The dimensions of a tensor.

The array length matches the tensor rank, and each element in the array describes the size of each dimension.

#### `enum tensor-type`

The type of the elements in a tensor.

Enum Cases

#### `type tensor-data`

The tensor data.

Initially conceived as a sparse representation in which each empty cell would be filled with zeros, the array length must match the product of all of the dimensions and the number of bytes in the type (e.g., a 2x2 tensor with 4-byte f32 elements would have a data array of length 16). Naturally, this representation requires some knowledge of how to lay out data in memory -- e.g., using row-major ordering -- and could perhaps be improved.
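As a rough WIT sketch of the types above (the element-type case list is assumed from common wasi-nn revisions; consult the normative WIT file):

```wit
type tensor-dimensions = list<u32>;  // e.g., [2, 2] for a 2x2 tensor

// Assumed case list; the normative definition may differ.
enum tensor-type { fp16, fp32, fp64, bf16, u8, i32, i64 }

// Raw bytes: a 2x2 tensor of fp32 occupies 2 * 2 * 4 = 16 bytes.
type tensor-data = list<u8>;
```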


### Functions

#### `[constructor]tensor`

Params
Return values

#### `[method]tensor.dimensions`

Params
Return values

#### `[method]tensor.ty`

Params
Return values

#### `[method]tensor.data`

Params
Return values
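The functions in this interface are the tensor resource's constructor and accessors; a hedged sketch of their signatures (assumed from typical wasi-nn revisions):

```wit
resource tensor {
  constructor(dimensions: tensor-dimensions, ty: tensor-type, data: tensor-data);
  dimensions: func() -> tensor-dimensions;  // the shape of the tensor
  ty: func() -> tensor-type;                // the element type
  data: func() -> tensor-data;              // the raw bytes
}
```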

## Import interface wasi:nn/errors

TODO: create function-specific errors (WebAssembly#42)


### Types

#### `enum error-code`

Enum Cases

### Functions

#### `[constructor]error`

Params
Return values

#### `[method]error.code`

Return the error code.

Params
Return values

#### `[method]error.data`

Errors can be propagated with backend-specific status through a string value.

Params
Return values
  • string
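A sketch of the error resource implied by the functions above (the error-code case list is assumed and possibly partial):

```wit
enum error-code {
  invalid-argument,
  invalid-encoding,
  timeout,
  runtime-error,
  unsupported-operation,
  too-large,
  not-found  // assumed, possibly partial case list
}

resource error {
  code: func() -> error-code;  // return the error code
  data: func() -> string;      // backend-specific status message
}
```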

## Import interface wasi:nn/inference

An inference "session" is encapsulated by a `graph-execution-context`. This structure binds a `graph` to input tensors before `compute`-ing an inference:


### Types

#### `type error`

#### `type tensor`

#### `type tensor-data`

#### `resource graph-execution-context`


### Functions

#### `[method]graph-execution-context.set-input`

Define the inputs to use for inference.

Params
Return values

#### `[method]graph-execution-context.compute`

Compute the inference on the given inputs.

Note the expected sequence of calls: `set-input`, `compute`, `get-output`. TODO: this expectation could be removed as a part of WebAssembly#43.

Params
Return values

#### `[method]graph-execution-context.get-output`

Extract the outputs after inference.

Params
Return values
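The expected set-input/compute/get-output sequence can be sketched as methods on the resource (signatures assumed; some wasi-nn revisions identify tensors by name rather than by index):

```wit
resource graph-execution-context {
  // Bind a tensor to an input slot before computing.
  set-input: func(index: u32, tensor: tensor) -> result<_, error>;
  // Run the inference over all bound inputs.
  compute: func() -> result<_, error>;
  // Retrieve an output tensor after compute has succeeded.
  get-output: func(index: u32) -> result<tensor, error>;
}
```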

## Import interface wasi:nn/graph

A graph is a loaded instance of a specific ML model (e.g., MobileNet) for a specific ML framework (e.g., TensorFlow):


### Types

#### `type error`

#### `type tensor`

#### `type graph-execution-context`

#### `resource graph`

#### `enum graph-encoding`

Describes the encoding of the graph. This allows the API to be implemented by various backends that encode (i.e., serialize) their graph IR with different formats.

Enum Cases

#### `enum execution-target`

Define where the graph should be executed.

Enum Cases

#### `type graph-builder`

The graph initialization data.

This gets bundled up into an array of buffers because implementing backends may encode their graph IR in parts (e.g., OpenVINO stores its IR and weights separately).
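A sketch of the initialization data described above, assuming the common `list<u8>` definition:

```wit
// One opaque buffer of backend-specific graph IR.
type graph-builder = list<u8>;

// A backend may take several such buffers, e.g., OpenVINO's IR and
// weights passed separately:
//   load([ir-xml-bytes, weights-bin-bytes], ...)
```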


### Functions

#### `[method]graph.init-execution-context`

Params
Return values

#### `load`

Load a graph from an opaque sequence of bytes to use for inference.

Params
Return values

#### `load-by-name`

Load a graph by name.

How the host expects names to be passed, and how it stores graphs for retrieval via this function, is implementation-specific. This allows hosts to choose naming schemes that range from simple to complex (e.g., URLs?) and caching mechanisms of various kinds.

Params
Return values
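Putting the graph interface together, the loading functions might look like the following (encoding and target case lists, and all signatures, are assumed from typical wasi-nn revisions; consult the normative WIT file):

```wit
// Assumed, possibly partial case lists.
enum graph-encoding { openvino, onnx, tensorflow, pytorch, tensorflowlite, autodetect }
enum execution-target { cpu, gpu, tpu }

resource graph {
  // Create an inference session bound to this graph.
  init-execution-context: func() -> result<graph-execution-context, error>;
}

// Load from raw bytes whose meaning depends on the encoding.
load: func(builder: list<graph-builder>, encoding: graph-encoding, target: execution-target) -> result<graph, error>;

// Load a host-registered model by an implementation-specific name.
load-by-name: func(name: string) -> result<graph, error>;
```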