Intro to WebAssembly#

Modern web browsers expose a set of APIs related to the user interface and user interactions (DOM, CSS, WebGL, etc.), as well as an execution environment that runs JavaScript to work with these APIs. WebAssembly, abbreviated Wasm, is a type of code which was created to run inside this browser execution environment as an additional language alongside JavaScript. The purpose is therefore not to replace JavaScript but rather to augment it in situations which have different constraints.

Wasm is a language but, as its name implies, it is akin to a low-level assembly language which, in contrast with JavaScript, is not meant to be written by hand. Rather, Wasm is intended to be a compilation target for higher-level languages. As we will discuss, the strongly typed nature of Wasm, along with its low-level memory model, implies that the languages currently most suitable for compiling to Wasm are C++ and Rust.

The Wasm specification makes no web-specific assumptions. Thus, although the original intent was to introduce a low-level language to the web, the design allows Wasm to be used in a variety of other contexts as well.

The primary goals of Wasm are safety, speed, efficiency, and portability. Wasm is safe because code is validated and executed in a sandboxed environment that guarantees memory safety. JavaScript in the browser is also executed in a sandboxed environment, which is part of where the idea comes from. In more traditional contexts, languages that are translated to machine code and run directly on host hardware are harder to secure in this fashion. Because Wasm is designed to be translated to machine code, it provides near-native performance. The environment still imposes a few performance penalties that differentiate Wasm from truly native code, but the gap is continuing to narrow as the platform matures.

The representation of Wasm code is designed so that the process of transmitting, decoding, validating, and compiling is streamable, parallelizable, and efficient. In other words, a binary format was created for Wasm based on the lessons learned from the growth of JavaScript on the web over the past several decades.

Type System#

Wasm has four value types, abbreviated valtype:

  • i32

  • i64

  • f32

  • f64

These types represent 32 and 64 bit integers, as well as 32 and 64 bit floating point numbers. These floating point numbers are also known as single and double precision as defined by IEEE 754. The integer types themselves are neither signed nor unsigned in the spec; signedness is a property of the operations applied to them. Do not be confused by the fact that the term i32 is also the Rust syntax for a signed 32 bit integer. As we will see later, we can use either unsigned or signed integer types in Rust code, e.g. both i32 and u32, with the signedness in Wasm inferred by usage.
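As a rough illustration of signedness being a property of the operation rather than of the type, the Rust sketch below takes a single 32 bit pattern and divides it twice, once treating it as signed and once as unsigned. This mirrors the way Wasm offers separate signed and unsigned division instructions over the same i32 type. The function names are hypothetical and exist only for this example.

```rust
/// Divide the same 32 bit pattern in two ways.
/// Returns (result when read as signed, result when read as unsigned).
fn divide_both_ways(bits: u32, divisor: u32) -> (i32, u32) {
    // Reinterpreting with `as` keeps the bits and changes only how they are read.
    let signed = (bits as i32) / (divisor as i32); // comparable to a signed division
    let unsigned = bits / divisor; // comparable to an unsigned division
    (signed, unsigned)
}

/// Wrapping addition produces the same bits whether the operands are
/// read as signed or unsigned, so one add operation can serve both.
fn add_both_ways(a: u32, b: u32) -> (u32, u32) {
    let via_signed = (a as i32).wrapping_add(b as i32) as u32;
    let via_unsigned = a.wrapping_add(b);
    (via_signed, via_unsigned)
}

fn main() {
    let bits: u32 = 0xFFFF_FFFE; // read as i32 this is -2
    let (signed, unsigned) = divide_both_ways(bits, 2);
    println!("signed: {signed}, unsigned: {unsigned}");

    let (via_signed, via_unsigned) = add_both_ways(bits, 5);
    println!("add bits match: {}", via_signed == via_unsigned);
}
```

Division needs to know how to read the bits, while addition does not, which is why only some operations come in signed and unsigned flavors.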

Wasm has functions which map a vector of value types to a vector of value types:

function = vec(valtype) -> vec(valtype)

However, in the first version of the specification the return type vector is limited to a length of at most 1. In other words, Wasm functions can take 0 or more arguments and can either return nothing or return a single value. The multi-value proposal, which has since been standardized, lifts this restriction.
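To make the shape of a function type concrete, here is a minimal Rust model of vec(valtype) -> vec(valtype). It is only a sketch for illustration: ValType, FuncType, and fits_single_result_rule are hypothetical names, not part of any Wasm library.

```rust
/// The four value types listed above.
#[allow(dead_code)] // not every variant is used in this small demo
#[derive(Debug, Clone, Copy, PartialEq)]
enum ValType {
    I32,
    I64,
    F32,
    F64,
}

/// A function type: a vector of parameter types mapped to a vector of result types.
#[derive(Debug, Clone, PartialEq)]
struct FuncType {
    params: Vec<ValType>,
    results: Vec<ValType>,
}

impl FuncType {
    /// True when the signature returns nothing or exactly one value.
    fn fits_single_result_rule(&self) -> bool {
        self.results.len() <= 1
    }
}

fn describe(name: &str, ty: &FuncType) {
    println!(
        "{}: {:?} -> {:?} (single result rule ok: {})",
        name,
        ty.params,
        ty.results,
        ty.fits_single_result_rule()
    );
}

fn main() {
    let add = FuncType {
        params: vec![ValType::I32, ValType::I32],
        results: vec![ValType::I32],
    };
    // Two results: this shape is what the single-result rule excludes.
    let div_rem = FuncType {
        params: vec![ValType::I64, ValType::I64],
        results: vec![ValType::I64, ValType::I64],
    };
    describe("add", &add);
    describe("div_rem", &div_rem);
}
```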

Memory#

Wasm has a linear memory model, which is just a contiguous vector of raw bytes. Your code can grow this memory but not shrink it. Data in the memory region is accessed via load and store operations that address it by a byte offset from the beginning of the region; each operation carries an alignment hint, but unaligned accesses are still permitted. Access via an offset from the beginning of the region is where the linear name comes from. This memory region is exposed by a Wasm module and can be accessed directly in JavaScript. Sharing memory is dangerous, but it is the primary way by which JavaScript and Wasm can interact performantly.
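The following Rust sketch models this kind of memory as a growable byte vector with bounds-checked loads and stores. It is a toy model under stated assumptions, not how any particular engine implements memory: LinearMemory and its methods are hypothetical names, while the 64 KiB page size and little-endian byte order follow the Wasm specification.

```rust
/// Wasm memory is sized in pages of 64 KiB.
const PAGE_SIZE: usize = 64 * 1024;

/// A toy linear memory: one contiguous, zero-initialized vector of bytes.
struct LinearMemory {
    bytes: Vec<u8>,
}

impl LinearMemory {
    fn new(initial_pages: usize) -> Self {
        LinearMemory { bytes: vec![0; initial_pages * PAGE_SIZE] }
    }

    fn size_in_pages(&self) -> usize {
        self.bytes.len() / PAGE_SIZE
    }

    /// Grow by `delta_pages` and return the previous size in pages.
    /// There is deliberately no way to shrink.
    fn grow(&mut self, delta_pages: usize) -> usize {
        let previous = self.size_in_pages();
        self.bytes.resize((previous + delta_pages) * PAGE_SIZE, 0);
        previous
    }

    /// Store a 32 bit value as four little-endian bytes at a byte offset.
    fn store_i32(&mut self, offset: usize, value: i32) -> Result<(), String> {
        let end = offset
            .checked_add(4)
            .ok_or_else(|| "offset overflow".to_string())?;
        let slot = self
            .bytes
            .get_mut(offset..end)
            .ok_or_else(|| format!("out-of-bounds store at offset {}", offset))?;
        slot.copy_from_slice(&value.to_le_bytes());
        Ok(())
    }

    /// Load a 32 bit value from four little-endian bytes at a byte offset.
    fn load_i32(&self, offset: usize) -> Result<i32, String> {
        let end = offset
            .checked_add(4)
            .ok_or_else(|| "offset overflow".to_string())?;
        let slot = self
            .bytes
            .get(offset..end)
            .ok_or_else(|| format!("out-of-bounds load at offset {}", offset))?;
        let mut buf = [0u8; 4];
        buf.copy_from_slice(slot);
        Ok(i32::from_le_bytes(buf))
    }
}

fn main() {
    let mut memory = LinearMemory::new(1);
    memory.store_i32(8, -42).expect("offset 8 is in bounds");
    println!("loaded: {:?}", memory.load_i32(8));
    // Reading across the end of the only page fails until the memory grows.
    println!("before grow: {:?}", memory.load_i32(PAGE_SIZE - 2));
    memory.grow(1);
    println!("after grow: {:?}", memory.load_i32(PAGE_SIZE - 2));
}
```

An out-of-bounds access in the sketch returns an error, standing in for the trap that a real Wasm runtime raises in the same situation.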

Execution#

The WebAssembly computational model is based on a stack machine. This means that every operation can be modeled as popping zero or more values off a virtual stack, possibly doing something with these values, and then pushing zero or more values back onto the stack. The Wasm specification defines exactly how each opcode interacts with the stack. For example, imagine we start with an empty stack and we execute the following operations:

    i64.const 16
    i64.const 2
    i64.div_u

The first operation has two parts: i64.const is an opcode which takes one argument, 16 in this case. Thus, i64.const 16 means push the value 16 as an i64 constant onto the top of the stack. Similarly, the second operation pushes 2 onto the stack, leaving our stack to look like (top of the stack first):

    2
    16

The next operation, i64.div_u, is defined to pop two values off the top of the stack, perform unsigned division between those two values, and push the result back onto the top of the stack. The first value popped off (2) is the divisor and the second value popped off (16) is the dividend, so at the end of this operation the stack contains:

    8
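The walkthrough of i64.const 16, i64.const 2, and i64.div_u can be sketched as a tiny interpreter in Rust. This is a minimal model covering only those two opcodes; the Op enum and run function are hypothetical names for illustration and do not come from any Wasm runtime.

```rust
/// The two opcodes used in the walkthrough.
#[derive(Debug, Clone, Copy)]
enum Op {
    /// Push a constant onto the stack.
    I64Const(i64),
    /// Pop the divisor, pop the dividend, push the unsigned quotient.
    I64DivU,
}

/// Run the operations from an empty stack and return the final stack
/// (the last element of the vector is the top of the stack).
fn run(ops: &[Op]) -> Result<Vec<i64>, String> {
    let mut stack = Vec::new();
    for op in ops {
        match *op {
            Op::I64Const(value) => stack.push(value),
            Op::I64DivU => {
                // The first value popped is the divisor, the second is the dividend.
                let divisor = stack.pop().ok_or("stack underflow")?;
                let dividend = stack.pop().ok_or("stack underflow")?;
                if divisor == 0 {
                    return Err("integer divide by zero".to_string());
                }
                // Read both bit patterns as unsigned before dividing.
                let quotient = (dividend as u64) / (divisor as u64);
                stack.push(quotient as i64);
            }
        }
    }
    Ok(stack)
}

fn main() {
    let program = [Op::I64Const(16), Op::I64Const(2), Op::I64DivU];
    println!("final stack: {:?}", run(&program));
}
```

In real Wasm, a module that could underflow the stack is rejected during validation before it ever runs; the sketch simply reports it as a runtime error to stay short.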

Rust in the browser#

Rust uses the LLVM project as the backend for its compiler. This means that the Rust compiler does all of the Rust specific work necessary to build an intermediate representation (IR) of your code which is understood by LLVM. From that point, LLVM is used to turn that IR into machine code for the particular target of your choosing.

Targets in LLVM can roughly be thought of as architectures, such as x86_64 or armv7. Wasm is just another target. Hence, Rust can support Wasm if LLVM supports Wasm, which luckily it does. This is also one of the ways that C++ supports Wasm through the Clang compiler which is part of the LLVM project.

Rust installations are managed with the rustup tool, which makes it easy to have different toolchains (nightly, beta, stable) as well as different targets installed simultaneously. The target triple for Wasm is wasm32-unknown-unknown. Target triples have the form <arch>-<vendor>-<sys>; for example, x86_64-apple-darwin is the default triple for an Intel-based MacBook. The unknown vendor and system mean to use the defaults for the specific architecture. We can install this target by running:

    rustup target add wasm32-unknown-unknown

The Smallest Wasm Library#

Let's create a Wasm library that does absolutely nothing but generate a Wasm library. We start off by creating a library with Cargo:

We need to specify that our library will be a C dynamic library (the cdylib crate type), as this is the crate type that produces a standalone .wasm file for the Wasm target. Depending on the target architecture, this is also how you would build a .so on Linux or a .dylib on macOS. We do this by adding the lib section to our Cargo.toml:

    [lib]
    crate-type = ["cdylib"]

We can then remove all of the code in src/lib.rs to leave a blank file. This technically is a Wasm library, just with no code. Let's make a release build for the Wasm target:

    cargo build --target wasm32-unknown-unknown --release

By default, Cargo puts the build artifacts in target/<target-triple>/<mode>/ or target/<mode> for the default target, which in this case is target/wasm32-unknown-unknown/release. Inside that directory, we have a do_nothing.wasm file which is our "empty" Wasm module. But how big is it:

1.4M to do nothing! That's insane! It turns out this is because the output binary still includes debug symbols.

Stripping out debug symbols can be done with a tool called wasm-strip, which is part of the WebAssembly Binary Toolkit (WABT). The repository includes instructions on how to build that suite of tools. Assuming you have it installed somewhere on your path, we can run it to strip our binary in place:

    wasm-strip target/wasm32-unknown-unknown/release/do_nothing.wasm

This page is a preview of Fullstack Rust
