Understanding WebAssembly

Let’s get the terminology straight:

  • Source code: What a developer writes.
  • Compiler: An application that turns source code into assembly, bytecode or machine code (what other apps or hardware run).
  • Assembly: A low-level source-like language specific to a machine or an application.
  • Bytecode: A low-level binary representation of code that can be run by other applications.
  • Machine code: A binary representation of code that can be run directly by hardware.

WebAssembly aims to be the bytecode for the web. Here is how a developer would use WebAssembly in the future:

  1. Develop an app (write the source code in any language that can be compiled to WebAssembly).
  2. Use a compiler to turn the source code into WebAssembly bytecode (and potentially into assembly code if required).
  3. Load the bytecode in a browser and run it.

Figure: WebAssembly development flow
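
To make step 3 concrete, here is a minimal sketch of loading and running bytecode from JavaScript. The byte array is a hand-assembled module written for this illustration (it is not compiler output), and the exported function name `add` is hypothetical. A real page would normally fetch a .wasm file produced by a compiler rather than embed the bytes.

```javascript
// Hand-assembled bytes for a tiny module that exports add(a, b) on 32-bit integers.
const addModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function using type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" is function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, // code section: one body of 7 bytes, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // local.get 0, local.get 1, i32.add, end
]);

// Compile the bytes, instantiate the module, and call the export like any JS function.
const addModule = new WebAssembly.Module(addModuleBytes);
const addInstance = new WebAssembly.Instance(addModule);
const sum = addInstance.exports.add(2, 3);
console.log(sum);
```

The synchronous constructors keep the sketch short; for anything larger than a toy module, the asynchronous `WebAssembly.instantiate` is usually the better fit.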

WebAssembly is meant to fill a place that JavaScript has been forced to occupy up to now: a low-level code representation that can serve as a compiler target. As more and more languages and platforms begin to target the web, more stress is put on JavaScript and browser vendors to provide missing features that are much needed. Some of these features do not play well with the already complex semantics of JavaScript. WebAssembly is the right answer:

  • It was designed as a compiler target from the beginning.
  • It is supported by all major browser vendors.
  • It can diverge from JavaScript semantics as much as needed.

 

History

At Mozilla, a group of hardcore developers tried to provide an answer in the form of asm.js: a subset of JavaScript meant to serve as a compiler target. Meanwhile, Google worked on Native Client (NaCl) and Portable Native Client (PNaCl), a binary format for the web based on LLVM. Although each of these solutions worked to some degree, neither provided a satisfactory answer to all the problems. It is from this experience that WebAssembly was born: a joint effort aimed at providing a cross-browser compiler target. The future looks bright for WebAssembly.

WebAssembly is backwards compatible

Backwards compatibility is an essential feature of the web, and WebAssembly will be no exception: a polyfill will be available for older browsers. In fact, a prototype is already available, along with working demos.

WebAssembly does not look like CPU assembly

When reading the word “assembly” you might immediately hear “unreadable” in your head. Fortunately, that is not the case for WebAssembly. In contrast to other low-level code representations, or most bytecodes, WebAssembly describes an abstract syntax tree (AST). That’s right, WebAssembly provides higher-level constructs such as loops and branches. This means that it is actually possible to write WebAssembly directly, or to decompile existing binary files into something much more readable than opcodes or instructions. You might be thinking, “What about variable names?” WebAssembly will support adding debugging information to the compiled files.

 

Historically, the browser’s virtual machine (VM) has only been able to load JavaScript. This has worked well for us, as JavaScript is powerful enough to solve most problems people have on the Web today. We have run into performance problems, however, when trying to use JavaScript for more intensive use cases like 3D games, Virtual and Augmented Reality, computer vision, image/video editing and a number of other domains that demand native performance (see WebAssembly use cases for more ideas).

Additionally, the cost of downloading, parsing and compiling very large JavaScript applications can be prohibitive.  Mobile and other resource-constrained platforms can further amplify these performance bottlenecks.

WebAssembly is a different language from JavaScript, but it is not intended as a replacement. Instead, it is designed to complement and work alongside JavaScript, allowing web developers to take advantage of both languages’ strong points:

  • JavaScript is a high-level language, flexible and expressive enough to write web applications.  It has many advantages — it is dynamically typed, requires no compile step, and has a huge ecosystem that provides powerful frameworks, libraries, and other tools.
  • WebAssembly is a low-level, assembly-like language with a compact binary format that runs with near-native performance. It gives languages with low-level memory models, such as C++ and Rust, a compilation target so that they can run on the web. (Note that WebAssembly has the high-level goal of supporting languages with garbage-collected memory models in the future.)

With the advent of WebAssembly in browsers, the virtual machine described above now loads and runs two types of code: JavaScript and WebAssembly.

The different code types can call each other as required — the WebAssembly JavaScript API wraps exported WebAssembly code with JavaScript functions that can be called normally and WebAssembly code can import and synchronously call normal JavaScript functions.  In fact, the basic unit of WebAssembly code is called a module and WebAssembly modules are symmetric in many ways to ES6 modules.
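
As a rough sketch of calls going in both directions, the module below imports one JavaScript function and exports one WebAssembly function. The bytes are hand-assembled for this illustration, and the names `env`, `log` and `run` are assumptions, not part of any standard.

```javascript
// Hand-assembled module: imports env.log(i32) and exports run(), which calls log(42).
const callerBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic number and version
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00, // types: (i32) -> (), () -> ()
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76, 0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00, // import "env"."log", type 0
  0x03, 0x02, 0x01, 0x01, // one locally defined function, type 1
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01, // export "run" (function index 1; the import is index 0)
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x2a, 0x10, 0x00, 0x0b, // body: i32.const 42, call 0, end
]);

// The import object supplies the JavaScript function that WebAssembly will call.
const received = [];
const importObject = {
  env: {
    log: (value) => {
      received.push(value);
    },
  },
};

const callerInstance = new WebAssembly.Instance(
  new WebAssembly.Module(callerBytes),
  importObject
);

// JavaScript calls into WebAssembly, which synchronously calls back into JavaScript.
callerInstance.exports.run();
console.log(received);
```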

WebAssembly key concepts

There are several key concepts needed to understand how WebAssembly runs in the browser.  All of these concepts are reflected 1:1 in the WebAssembly JavaScript API.

  • Module: Represents a WebAssembly binary that has been compiled by the browser into executable machine code.  A Module is stateless and thus, like a Blob, can be explicitly cached or shared between windows or workers (via postMessage()).  A Module declares imports and exports just like an ES6 module.
  • Memory: A resizable ArrayBuffer that contains the linear array of bytes read and written by WebAssembly’s low-level memory access instructions.
  • Table: A resizable typed array of references (e.g. to functions) that could not otherwise be stored as raw bytes in Memory (for safety and portability reasons).
  • Instance: A Module paired with all the state it uses at runtime including a Memory, Table, and set of imported values.  An Instance is like an ES6 module that has been loaded into a particular global with a particular set of imports.

The JavaScript API provides developers with the ability to create modules, memories, tables, and instances.  Given a WebAssembly instance, JavaScript code can synchronously call its exports, which are exposed as normal JavaScript functions.  Arbitrary JavaScript functions can also be synchronously called by WebAssembly code by passing in those JavaScript functions as the imports to a WebAssembly instance.
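
The four concepts can be poked at directly from JavaScript. The following is a small sketch rather than a realistic application: the sizes are arbitrary, and the module is the smallest valid one (just the eight-byte header), chosen so the example stays self-contained.

```javascript
const PAGE_SIZE = 64 * 1024; // WebAssembly memory is sized in 64 KiB pages

// Memory: starts at one page and may grow up to four.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
new Uint8Array(memory.buffer)[0] = 7;
const previousPages = memory.grow(1); // returns the size before growing, in pages
// Growing detaches the old ArrayBuffer, so take a fresh view afterwards.
const grownView = new Uint8Array(memory.buffer);

// Table: two slots for function references, initially empty (null).
const table = new WebAssembly.Table({ element: "anyfunc", initial: 2 });

// Module and Instance: the header alone is a valid, empty module.
const emptyModule = new WebAssembly.Module(
  new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00])
);
const firstInstance = new WebAssembly.Instance(emptyModule);
const secondInstance = new WebAssembly.Instance(emptyModule); // same Module, separate Instance

console.log(previousPages, grownView.length / PAGE_SIZE, grownView[0]);
console.log(table.length, table.get(0));
console.log(WebAssembly.Module.exports(emptyModule));
```

Because a Module carries no state, the same compiled `emptyModule` can back any number of instances; the state lives in each Instance and in the Memory and Table objects it is given.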

Since JavaScript has complete control over how WebAssembly code is downloaded, compiled and run, JavaScript developers could even think of WebAssembly as just a JavaScript feature for efficiently generating high-performance functions.

In the future, WebAssembly modules will be loadable just like ES6 modules (using <script type='module'>), meaning that JavaScript will be able to fetch, compile, and import a WebAssembly module as easily as an ES6 module.

How do I use WebAssembly in my app?

Above we talked about the raw primitives that WebAssembly adds to the Web platform: a binary format for code and APIs for loading and running this binary code.  Now let’s talk about how we can use these primitives in practice.

The WebAssembly ecosystem is at a nascent stage; more tools will undoubtedly emerge going forward. Right now, there are two main entry points:

  • Porting a C/C++ application with Emscripten.
  • Writing or generating WebAssembly directly at the assembly level.

Let’s talk about these options:

Porting from C/C++

The Emscripten tool is able to take C/C++ source code and compile it into a .wasm module, plus the necessary JS API “glue” code for loading and running the module, and an HTML document to display the results of the code.

In a nutshell, the process works as follows:

  1. Emscripten first feeds the C/C++ into clang+LLVM, a mature open-source C/C++ compiler toolchain (shipped as part of Xcode on macOS, for example).
  2. Emscripten transforms the compiled result of clang+LLVM into a .wasm binary.
  3. By itself, WebAssembly cannot currently directly access the DOM; it can only call JS, passing in integer and floating point primitive data types. Thus, to access any Web API, WebAssembly needs to call out to JavaScript, which then makes the Web API call. Emscripten therefore creates the HTML and JavaScript glue code needed to achieve this.

Note: There are future plans to allow WebAssembly to call Web APIs directly.

The JS glue code is not as simple as you might imagine. For a start, Emscripten implements popular C/C++ libraries like SDL, OpenGL, OpenAL, and parts of POSIX. These libraries are implemented in terms of Web APIs and thus each one requires some JavaScript glue code to connect WebAssembly to the underlying Web API.

So part of the glue code implements the functionality of each library used by the C/C++ code. The glue code also contains the logic for calling the above-mentioned WebAssembly JavaScript APIs to fetch, load and run the .wasm file.
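
To give a feel for why glue code is needed at all: since only numbers cross the boundary, a string is typically handed over as a pointer and a length into linear memory, and JavaScript decodes the bytes. The sketch below illustrates that idea only. The helper names `readString` and `writeString` are hypothetical and are not Emscripten's actual API, and `writeString` merely stands in for what compiled code would do when it fills memory.

```javascript
// Linear memory shared between JavaScript and (in a real app) the WebAssembly instance.
const glueMemory = new WebAssembly.Memory({ initial: 1 });

// Hypothetical glue helper: decode `length` UTF-8 bytes starting at byte offset `pointer`.
function readString(memory, pointer, length) {
  const view = new Uint8Array(memory.buffer, pointer, length);
  return new TextDecoder("utf-8").decode(view);
}

// Hypothetical stand-in for compiled code writing a string into memory.
// Returns the number of bytes written.
function writeString(memory, pointer, text) {
  const encoded = new TextEncoder().encode(text);
  new Uint8Array(memory.buffer, pointer, encoded.length).set(encoded);
  return encoded.length;
}

const messagePointer = 16; // arbitrary byte offset chosen for the example
const messageLength = writeString(glueMemory, messagePointer, "hello from wasm");

// Only the two integers need to cross the boundary; the glue turns them into a string.
const decodedMessage = readString(glueMemory, messagePointer, messageLength);
console.log(decodedMessage);
```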

The generated HTML document loads the JavaScript glue file and writes stdout to a <textarea>. If the application uses OpenGL, the HTML also contains a <canvas> element that is used as the rendering target. It’s very easy to modify the Emscripten output and turn it into whatever web app you require.

You can find full documentation on Emscripten at emscripten.org, and a guide to setting up the toolchain and compiling your own C/C++ app to wasm at Compiling from C/C++ to WebAssembly.

Why is a compiled subset of JS (asm.js) faster?

LLVM’s optimizer uses type information to perform many useful optimizations. Decades of work have gone into developing optimization passes for C/C++ compilers.

These optimizations are only available for compiled code!

Running them manually on a “normal” JavaScript codebase would be hard and would make the code less maintainable.

JAVASCRIPT ENGINE OPTIMIZATIONS

  • Modern JavaScript engines infer types at runtime

This especially helps on code that is implicitly typed, which is exactly what compiled code is. For example:

  function compiledCalculation() {
    var x = f()|0; // x is a 32-bit value
    var y = g()|0; // so is y
    return (x+y)|0; // 32-bit addition, no type or overflow checks
  }
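
The `|0` in the snippet above is what carries the type information. Here is a short sketch of how such coercions behave in plain JavaScript; the helper names are made up for this illustration, while the operators themselves are standard.

```javascript
// Coercions that act as implicit type annotations in compiled code.
const asInt = (value) => value | 0; // signed 32-bit integer
const asUnsigned = (value) => value >>> 0; // unsigned 32-bit integer
const asDouble = (value) => +value; // double-precision number

const truncated = asInt(3.7); // fractional part is dropped
const wrapped = asInt(2147483647 + 1); // wraps around like a 32-bit int
const unsignedMinusOne = asUnsigned(-1); // same bits, read as unsigned
const parsedDouble = asDouble("1.5"); // anything becomes a number

console.log(truncated, wrapped, unsignedMinusOne, parsedDouble);
```

Because the result of `x|0` is always a 32-bit integer, an engine that sees it on every value can skip the usual checks for strings, objects, or overflow into doubles.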

  • Modern JavaScript engines optimize typed arrays very well

  var MEM8  = new Uint8Array(1024*1024);
  var MEM32 = new Uint32Array(MEM8.buffer); // alias MEM8's data

  function compiledMemoryAccess(x) {
    MEM8[x] = MEM8[x+10]; // read from x+10, write to x
    MEM32[(x+16)>>2] = 100;
  }
            

Compiled C/C++ uses a typed array as “memory”
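
As a sketch of what “a typed array as memory” means, the helpers below treat byte addresses the way compiled pointer code would. The names `store32` and `load32` and the heap size are assumptions made for this example.

```javascript
// One ArrayBuffer plays the role of the C heap; the views alias the same bytes.
const HEAP = new ArrayBuffer(64 * 1024);
const HEAP8 = new Uint8Array(HEAP);
const HEAP32 = new Int32Array(HEAP);

// A "pointer" is just a byte offset. 32-bit accesses shift it right by 2
// to turn the byte address into an Int32Array index.
function store32(address, value) {
  HEAP32[address >> 2] = value;
}

function load32(address) {
  return HEAP32[address >> 2];
}

// Roughly what `int *p = (int *)16; p[0] = -1; p[1] = 1234;` compiles down to.
store32(16, -1);
store32(16 + 4, 1234);

console.log(load32(16), load32(20));
console.log(HEAP8[16], HEAP8[17], HEAP8[18], HEAP8[19]); // all bits set by the -1 store
```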

asm.js code avoids potential slowdowns in code: no variables with mixed types, etc.

asm.js code does only low-level assembly-like computation, precisely what compiled C/C++ needs (and hence the name)
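
For reference, here is a minimal sketch of what an asm.js-style module looks like. The module and function names are hypothetical. The code is ordinary JavaScript, so it runs in any engine; whether a given engine applies special asm.js optimizations to it is up to that engine.

```javascript
// Hypothetical asm.js-style module: typed parameters, typed returns, and a heap.
function AsmSketch(stdlib, foreign, heap) {
  "use asm";

  var HEAP32 = new stdlib.Int32Array(heap);

  function add(x, y) {
    x = x | 0; // parameter annotation: x is an int
    y = y | 0; // parameter annotation: y is an int
    return (x + y) | 0; // return annotation: the result is an int
  }

  function store(address, value) {
    address = address | 0;
    value = value | 0;
    HEAP32[address >> 2] = value;
  }

  function load(address) {
    address = address | 0;
    return HEAP32[address >> 2] | 0;
  }

  return { add: add, store: store, load: load };
}

// The caller supplies the standard library object, foreign imports, and the heap.
const asmSketch = AsmSketch(globalThis, {}, new ArrayBuffer(0x10000));
asmSketch.store(8, asmSketch.add(40, 2));
console.log(asmSketch.load(8));
```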

ASM.JS – FORMAL TYPE SYSTEM BENEFITS
  • The output of a C/C++ to JavaScript compiler can be type checked.
  • The input to a JavaScript engine can be type checked at runtime.

ASM.JS – RUNTIME OPTIMIZATIONS (1)
Variable types pop out during type checking. This makes ahead-of-time (AOT) compilation possible, not only just-in-time (JIT) compilation.

ASM.JS – RUNTIME OPTIMIZATIONS (2)
JavaScript engine has a guarantee that there are no speed bumps – variable types won’t change, etc. – so it can generate simpler and more efficient code

ASM.JS – RUNTIME OPTIMIZATIONS (3)
The asm.js type system makes it easy to reason about global program structure: function calls, memory access, etc.

NOT JUST C/C++!
Many languages can be compiled to C, C++ or LLVM IR, which means they can be compiled to JavaScript with the same approach and benefits

JAVA => C => JAVASCRIPT
Demo using XMLVM and Emscripten
