Sunday, 25 February 2018

Running JavaScript on GPU

GPUs have played a crucial role in computer graphics during the last two decades. But recently GPUs have also taken over tasks traditionally performed by the CPU in compute-intensive applications. These applications include data analysis involving large multi-dimensional arrays, parallel processing, and cryptocurrency mining, which processes 256-bit hashes. 

Certain JavaScript code can be executed faster by running it on the GPU, and the gpu.js library makes doing so convenient for developers. Fig 1.0 shows the flow for the gpu.js library: the JavaScript code is converted to shader language, which is then run on the GPU. If no GPU is available, the code runs directly on the CPU instead. 

Fig 1.0 Flow Diagram for gpu.js


npm:
The library can be installed from the npm repository.
npm install gpu.js --save
script tag:
The js file can also be downloaded directly from the GitHub repository and included as a static file.
<script src="/path/to/js/gpu.min.js"></script>
Before using this library in code, a new instance of the GPU object should be created for later use.
const gpu = new GPU();
Basically, the code to be run on the GPU is put in a 'kernel' function, which returns a single output. Additionally, multiple outputs can be returned using a 'kernel map', which processes multiple math operations at once. 

Using the 'createKernel' method, developers can define the code that is to be run on the GPU. The following code multiplies two 512 x 512 matrices and returns the output.

const gpu = new GPU();
const multiplyMatrix = gpu.createKernel(function(a, b) {
  var sum = 0;
  for (var i = 0; i < 512; i++) {
    sum += a[this.thread.y][i] * b[i][this.thread.x];
  }
  return sum;
}, {
  output: [512, 512]
});

let a = [[10,...,5120],...,[10,...,5120]];//developer defined matrix;
let b = [[0.1,...,5.12],...,[0.1,...,5.12]];//developer defined matrix;
let output = multiplyMatrix(a, b);
The first argument for 'gpu.createKernel' method is the function that contains the code to be run on GPU and the second argument is the configuration options object. The following options are provided in the API:
  • output: array or object that describes the output of the kernel.
    • as array: [width], [width, height], or [width, height, depth]
    • as object: { x: width, y: height, z: depth }
  • outputToTexture: boolean
  • graphical: boolean
  • loopMaxIterations: number
  • constants: object
  • wraparound: boolean
  • hardcodeConstants: boolean
  • floatTextures: boolean
  • floatOutput: boolean
  • functions: array or boolean
  • nativeFunctions: object
  • subKernels: array
In the sample code above, 'multiplyMatrix(a, b)' executes the code inside the kernel, where a and b are the two matrices passed as arguments. The thread identifiers 'this.thread.x' and 'this.thread.y' correspond to the x and y positions of the output item each thread computes. 
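To make the thread-index mapping concrete, here is a plain-JavaScript sketch of what the kernel computes on the CPU. `multiplyMatrixCPU` is a hypothetical helper, not part of gpu.js; each (y, x) pair plays the role of one GPU thread:

```javascript
// Plain-JS equivalent of the gpu.js kernel above (hypothetical helper).
// The two outer loops stand in for this.thread.y and this.thread.x:
// every (y, x) pair is one "thread" computing a single output cell.
function multiplyMatrixCPU(a, b, size) {
  const out = [];
  for (let y = 0; y < size; y++) {        // this.thread.y
    const row = [];
    for (let x = 0; x < size; x++) {      // this.thread.x
      let sum = 0;
      for (let i = 0; i < size; i++) {
        sum += a[y][i] * b[i][x];         // same body as the kernel
      }
      row.push(sum);
    }
    out.push(row);
  }
  return out;
}
```

For a small 2 x 2 example, multiplyMatrixCPU([[1, 2], [3, 4]], [[5, 6], [7, 8]], 2) yields [[19, 22], [43, 50]]; on the GPU each of those four cells would be computed by its own thread.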

Multiple kernels can be combined into a single kernel to get a single output.
const add = gpu.createKernel(function(a, b) {
	return a[this.thread.x] + b[this.thread.x];
});

const multiply = gpu.createKernel(function(a, b) {
	return a[this.thread.x] * b[this.thread.x];
});

const combinedKernel = gpu.combineKernels(add, multiply, function(a, b, c) {
	return multiply(add(a, b), c);
});

combinedKernel(a, b, c);
The 'gpu.combineKernels' method is used to combine many different kernels. The last argument is a function containing the combined logic, and the preceding arguments are the kernels it depends on. 

Similarly, multiple kernels can be run at once using the 'gpu.createKernelMap' method, which returns the output of each individual kernel.

const kernelMap = gpu.createKernelMap({
  addResult: function add(a, b) {
    return a[this.thread.x] + b[this.thread.x];
  },
  multiplyResult: function multiply(a, b) {
    return a[this.thread.x] * b[this.thread.x];
  }
}, function(a, b, c) {
	return multiply(add(a, b), c);
});

kernelMap(a, b, c);
// Result: { addResult: [], multiplyResult: [], result: [] }
Here 'addResult' and 'multiplyResult' are individual kernels put into a kernel map. The final argument of the 'gpu.createKernelMap' function can utilize the individual kernels to produce a result. The final result is in the form of an object where 'addResult' corresponds to the 'addResult' kernel, 'multiplyResult' corresponds to the 'multiplyResult' kernel, and 'result' corresponds to the combined kernel. 

This library is very useful for tasks that need large matrix manipulation, like machine learning and image processing. In addition, multiple math operations can be done at once using the 'createKernelMap' and 'combineKernels' methods without affecting performance. Overall this is a good library for increasing code performance by running it on an available GPU. 

More details about the API can be found in the project's documentation.

Wednesday, 24 January 2018

Siri visualization in Browser

Sound recording and playback are not visually appealing features in most web and mobile applications. But since sound is a 3D wave, it can easily be visualized in 2D on the x-y plane, where amplitude and period decide the shape of the wave. Lately, with the introduction of Siri on iOS devices, the visualization of sound has become an eye-catching UI for the user. In this blog post a single web page is implemented to record sound. The sound is visualized using the siriwavejs library to make the recording process more presentable.

Fig 1.0 iOS 9 style

Fig 2.0 Default style

Technical Background

Fig 3.0 Web Audio Block Diagram

The block diagram in Fig 3.0 shows how the audio is processed from the source to the destination node. The 'source' in this case is the microphone on the user's computing device. 
The 'Analyser' represents the AnalyserNode, which is used to get the frequency data. The 'Processor' node represents the ScriptProcessorNode, which is used to detect changes in the audio signal via its '.onaudioprocess' event handler; the amplitude data is also collected using the ScriptProcessorNode. The 'destination' is the output device (speakers/headphones) of the computer. In this case, however, the 'Processor' node outputs a buffer with all amplitude values set to 0, because no processing is done on the output buffer. 


The frequency data from the AnalyserNode is collected using an FFT (Fast Fourier Transform). The FFT size is set to 4096. The default sampling rate for the AnalyserNode is 48000 Hz.

Sampling Rate: 48000 Hz
Frequency band: (Sampling Rate)/2 = 24000 Hz
FFT size: 4096
#spectral data points: (FFT size)/2 = 2048
Spectral data point resolution: (Frequency band)/(#spectral data points) = 24000/2048 = 11.71875 Hz

From these calculations it is evident that the AnalyserNode will return 2048 data points for each time step, and the frequency represented by successive data points increases in steps of 11.71875 Hz. Each spectral data point represents the magnitude in dB for a specific frequency range.
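The numbers above can be reproduced in a few lines of JavaScript (the variable names are mine, not part of the Web Audio API):

```javascript
const sampleRate = 48000;                       // AnalyserNode default sampling rate (Hz)
const fftSize = 4096;                           // chosen FFT size
const frequencyBand = sampleRate / 2;           // Nyquist limit: 24000 Hz
const binCount = fftSize / 2;                   // 2048 spectral data points
const binResolution = frequencyBand / binCount; // 11.71875 Hz per data point
```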

In the code the speed for siriwave is set using the following equation: 
speed = ((1 + spectral data index) * spectral data point resolution)/(Frequency band)
      = ((1 + spectral data index) * 11.71875)/24000

'spectral data index' represents the index which has the highest magnitude in dB.
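That lookup and division can be sketched as follows; `siriSpeed` is a hypothetical helper name, and `freqData` stands for the magnitude array the AnalyserNode produces (e.g. via its `getByteFrequencyData` method):

```javascript
// Find the spectral bin with the highest magnitude and derive the
// siriwave speed from it (hypothetical helper, assumed names).
function siriSpeed(freqData, binResolution, frequencyBand) {
  let maxIndex = 0;
  for (let i = 1; i < freqData.length; i++) {
    if (freqData[i] > freqData[maxIndex]) maxIndex = i;
  }
  return ((1 + maxIndex) * binResolution) / frequencyBand;
}
```

So if bin 3 holds the peak magnitude, the speed is ((1 + 3) * 11.71875)/24000, i.e. a dominant higher-frequency bin yields a proportionally faster wave.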


The buffer size used for the script processor node is 1024 with 1 input and 1 output channel. The PCM data is retrieved from this node. It is scaled between -1 and 1 for amplitude at a single time step with a total of 1024 time steps.

In the code the amplitude for siriwave is set using the following equation:
amplitude = (max amplitude)*10
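In plain JavaScript that might look like the sketch below, where `pcm` is the 1024-sample buffer from the ScriptProcessorNode, already scaled to [-1, 1] (the helper name is mine, and taking the absolute peak of the frame is an assumption about what "max amplitude" means):

```javascript
// Scale the loudest sample in the current PCM frame for siriwave
// (hypothetical helper; uses the absolute peak of the buffer).
function siriAmplitude(pcm) {
  let max = 0;
  for (const v of pcm) {
    max = Math.max(max, Math.abs(v));
  }
  return max * 10;
}
```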

Live Demo Link:

If issues occur, copy the link directly and paste it into the browser.

Demo source code:

siriwavejs library link:

Tuesday, 19 December 2017

WebAssembly at a Glance


Figure 1.0 WebAssembly Evolution

asm.js, which first appeared in 2013, was developed by Mozilla as a subset of JavaScript. It is meant to be a compile target for languages like C/C++. This allows porting of existing applications written in C/C++ to run in the browser without the need for any additional third-party library. Using asm.js significantly increases performance when compared to the same code written in plain JavaScript. Although asm.js runs in all browsers, it has the highest performance on Firefox. 

WebAssembly is a new compile target being standardized by the W3C along with companies such as Google, Microsoft, Apple, and Mozilla. Although asm.js performed close to native speed, it was not consistent across all browsers; WebAssembly promises to perform consistently. WebAssembly is influenced by asm.js and can be thought of as an optimized version of asm.js with faster performance. 

Compared to asm.js WebAssembly provides the following advantages to developers:

  • Reduced load time.
  • Reduced run-time memory consumption for code.
  • Smooth performance.
  • Full set of capabilities for C/C++ applications with almost native performance.

Current Spec:
Figure 2.0 WebAssembly Pipeline (Stable)

Figure 3.0 WebAssembly Pipeline (Unstable)

WebAssembly is essentially efficient low-level byte code which is fast to load and execute. It is a stack machine language that loads values onto the stack to use them for computation later. 

It supports the following value types:

  • i32: 32-bit integer
  • i64: 64-bit integer
  • f32: 32-bit floating point
  • f64: 64-bit floating point
Currently Emscripten can be used to convert existing C/C++ code to '.wasm' files. The conversion pipeline from C/C++ to wasm is visible in figure 2.0, which is the stable pipeline, and figure 3.0 shows the unstable pipeline as of 12/16/2017. Browser compatibility for WebAssembly can be found at .

Example: Adding two integers
emcc add.c -O3 -s WASM=1 -s SIDE_MODULE=1 -o add.wasm
The above command builds the C file to wasm. It is built as a standalone module using the 'SIDE_MODULE=1' option, and the optimization level is O3.

//File: add.c
int add(int a, int b) {
  return a + b;
}
WASM (WAST representation):
//File: add.wasm
(module
  (type $type0 (func (param i32 i32) (result i32)))
  (type $type1 (func))
  (import "env" "memoryBase" (global $global0 i32))
  (import "env" "memory" (memory (;0;) 256))
  (import "env" "table" (table $table0 0 anyfunc))
  (import "env" "tableBase" (global $global1 i32))
  (global $global2 (mut i32) (i32.const 0))
  (global $global3 (mut i32) (i32.const 0))
  (export "_add" (func $func0))
  (export "__post_instantiate" (func $func2))
  (export "runPostSets" (func $func1))
  (func $func0 (param $var0 i32) (param $var1 i32) (result i32)
    get_local $var1
    get_local $var0
    i32.add
  )
  (func $func1
  )
  (func $func2
    get_global $global0
    set_global $global2
    get_global $global2
    i32.const 5242880
    i32.add
    set_global $global3
  )
)

<!doctype html><!-- index.html -->
<html>
  <head>
    <meta charset="utf-8">
    <title>WASM add example</title>
    <script>
      function loadModule(filename) {
        return fetch(filename)
            .then(response => response.arrayBuffer())
            .then(buffer => WebAssembly.compile(buffer))
            .then(module => {
                const imports = {
                    env: {
                        memoryBase: 0,
                        tableBase: 0,
                        memory: new WebAssembly.Memory({
                            initial: 256
                        }),
                        table: new WebAssembly.Table({
                            initial: 0,
                            element: 'anyfunc'
                        })
                    }
                };

                return new WebAssembly.Instance(module, imports);
            });
      }

      loadModule('add.wasm').then(instance => {
          const add = instance.exports._add;

          console.log(add(1, 6));//7
      });
    </script>
  </head>
</html>

In the example above, a C function adding two integers is compiled to wasm. The '.wasm' binary file contains the compiled code, but here the intermediary textual representation (s-expression) is shown. The '.wasm' file is loaded into the browser using the provided JavaScript code. Finally the function 'add' can be called with the two arguments 1 and 6, which returns 7. 
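The same loading machinery can be tried without Emscripten at all. Below is a hand-assembled minimal wasm module, written out byte by byte following the WebAssembly binary format (type, function, export, and code sections), instantiated synchronously. This is a sketch, not the Emscripten output above; in particular the export is named `add` rather than Emscripten's `_add`, and no env imports are needed:

```javascript
// Minimal hand-assembled wasm module, equivalent to the C add():
// (module (func (export "add") (param i32 i32) (result i32)
//   get_local 0  get_local 1  i32.add))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d,             // magic: "\0asm"
  0x01, 0x00, 0x00, 0x00,             // version 1
  0x01, 0x07, 0x01,                   // type section: 1 entry
  0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // (func (param i32 i32) (result i32))
  0x03, 0x02, 0x01, 0x00,             // function section: func 0 uses type 0
  0x07, 0x07, 0x01,                   // export section: 1 entry
  0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // "add" -> func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,       // code section: 1 body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b  // get_local 0, get_local 1, i32.add, end
]);

const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes));
console.log(instance.exports.add(1, 6)); // 7
```

This also runs under Node.js, which exposes the same WebAssembly global as the browsers.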


    JSperf link:

Fig 4.0 Performance Measurement Fibonacci Sequence

This performance setup implements two versions of a Fibonacci sequence generator. One is slow (the recursive version) and the other is fast (the iterative version). Both the JS and wasm implementations are compared, and it is evident that wasm is the fastest.

Learning/Exploration Tools:

WebAssembly Explorer:

Fig 5.0 WebAssembly Explorer

This is a great tool for exploring and understanding WebAssembly. Developers can input their C/C++ code in the first pane and the tool automatically compiles it to the corresponding 'wast' format, visible in the second pane. It then converts 'wast' to 'x86' assembly format in the rightmost pane. 

Alternatively users can input 'wast' directly and it will be converted to 'x86' assembly.

Developers can toggle 'LLVM x86 Assembly' to let the LLVM compiler do the conversion to 'x86' assembly rather than Firefox's compiler. 

There are also built-in examples and other features that developers can choose from the options pane.

WebAssembly Fiddle:

Fig 6.0 WebAssembly Fiddle

This tool can be used to share WebAssembly snippets with others using a unique URL. It has 4 panes, one in each corner.

Top-left pane: The top-left pane is where developers can write C/C++ code.

Top-right pane: This is where the JavaScript that utilizes the wasm module resides.

Bottom-left: This is where the compiled 'wast' can be seen. There is also an options drop down to show other representations of the compiled C/C++ code.

Bottom-Right: This is where the output of the compiled program can be seen for debugging purposes. 

WebAssembly Binaries:

Fig 7.0 WebAssembly Binary Explorer

This tool is good for understanding the binary representation of WebAssembly. The left pane shows the binary representation, and in the right pane developers can open a 'wasm' or 'wast' file. 


The development team is trying to improve on the following aspects:

  • Improve the WebAssembly LLVM backend from experimental to stable implementation in Emscripten.
  • Add support for more source languages for the wasm compile target.
  • Backward compatibility of new WebAssembly features.
  • Developer tools integration for WebAssembly in the browsers.
  • Add support for multi-threading.
  • Efficiently allocate and manipulate GC objects directly from WebAssembly code.

Saturday, 21 October 2017

WebSocket Server with JavaScript


Handshake Client/Server:

Fig 1.0 HTTP Handshake

Before the connection is established, the client sends a handshake request to the server via the GET method. It passes the 'Sec-WebSocket-Version' header to let the server know if a specific version of WebSocket is required for communication. It also passes 'Sec-WebSocket-Key', which is then used by the server to generate a new key that it sends back via the 'Sec-WebSocket-Accept' header. The key that the server sends back to the client is generated by concatenating '258EAFA5-E914-47DA-95CA-C5AB0DC85B11' with the key sent by the client and taking the SHA-1 hash of the resulting string. The server then sends back the base64-encoded version of the hash. 


To ensure the connection is alive, the client or server may send a ping with opcode 0x9 to the other side after the initial HTTP handshake. The client or server must send a pong with opcode 0xA right after receiving a ping. The pong should carry the same payload as the ping that was received. A pong received without a ping having been sent should be ignored.
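As a sketch of what such a pong looks like on the wire, the helper below (a hypothetical name, and deliberately limited to unmasked server-to-client frames with payloads shorter than 126 bytes) echoes a ping's payload back with opcode 0xA:

```javascript
// Build a server-to-client pong frame echoing the ping's payload.
// Byte 0: FIN bit (0x80) | opcode 0xA (pong).
// Byte 1: mask bit clear (server frames are unmasked) | 7-bit length.
function pongFrame(payload) {
  if (payload.length >= 126) throw new Error('sketch handles short payloads only');
  const header = Buffer.from([0x80 | 0x0a, payload.length]);
  return Buffer.concat([header, payload]);
}

const pong = pongFrame(Buffer.from('hi'));
console.log(pong); // <Buffer 8a 02 68 69>
```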

Client Tracking:

To prevent multiple handshakes for the same client, each client needs to be tracked by the server. Tracking clients will also help recognize DoS attacks. This is where comes into the picture as a server-side framework that makes creating a WebSocket server less cumbersome.

installation: can be easily installed from the npm repository.
npm install --save
server setup: 
The server can be set up using either the http module in nodejs or using express. In this case the http module is used.
let app = require('http').createServer(handler);
let io = require('')(app);
let fs = require('fs');


function handler(req, res) {
  fs.readFile(__dirname + '/index.html',
  function (err, data) {
    if (err) {
      res.writeHead(500);
      return res.end('Error loading index.html');
    }
    res.writeHead(200);
    res.end(data);
  });
}

app.listen(80);

io.on('connection', function (socket) {
  //handle communication here
});
In the sample code above, the 'handler' function serves static content like the index.html file. This is where the UI for the application may reside. The http server object is passed to so that it may also start listening. The 'connection' event is fired as soon as the connection with a client is established.

It should be noted that the port where the app listens should either be 80 or 443 for best results. Other ports may cause issues due to firewalls and proxies.

Information can be received through events on the 'socket' object.
io.on('connection', function (socket) {
    socket.on('message', function(data){
        //handle received data here
    });
});
In the above code the 'message' event is used to receive data. But in practice the client can send data via any event name ('abc', 'new york', 'hello').

message single client: allows developers to send a message to a single client. This can be done in two ways: 

1. Sending a message back to the sender-client.
io.on('connection', function (socket) {
    socket.on('message', function(data) {
        socket.emit('message', 'testing');
    });
});
2. Sending a message to a specific client using its socketId. In the sample code below developers can replace '<socketid>' with the specific id. 
io.on('connection', function (socket) {
    socket.on('message', function(data){'<socketid>').emit('message', 'message to specific client');
    });
});

While broadcasting, the message is sent to all other clients except the one that sent the message to be broadcast.
io.on('connection', function(socket) {
    socket.on('message', function(data) {
        socket.broadcast.emit('message', "this is a test");
    });
});
client tracking:
Clients can be tracked using their specific socket ids. When the client makes a connection the socket object contains the id associated with that particular client.
io.on('connection', function (socket) {
    let socketId =;
    //do something with the socket id.
});

Fig 2.0 Namespaces and Room Tree


In, a 'Namespace' is a specific endpoint on the same TCP/IP connection. This is helpful if a single connection is to be used for different use cases. By default clients connect to the '/' endpoint, but custom endpoints like '/gaming' and '/messaging' can be specified on the same connection. 
let app = require('http').createServer(handler);
let io = require('')(app);
let chatNameSpace = io.of('/chat');
let gamingNameSpace = io.of('/gaming');
let fs = require('fs');


function handler(req, res) {
  fs.readFile(__dirname + '/index.html',
  function (err, data) {
    if (err) {
      res.writeHead(500);
      return res.end('Error loading index.html');
    }
    res.writeHead(200);
    res.end(data);
  });
}

chatNameSpace.on('connection', function (socket) {
  //handle chat communication here
});

gamingNameSpace.on('connection', function (socket) {
  //handle gaming communication here
});

In the sample code above, custom 'Namespaces' have been defined using the 'io.of' method, so the '/chat' and '/gaming' namespaces have separate communication endpoints where clients connect.


Within each 'Namespace' the communication channels can be further separated into 'rooms'. Clients can join or leave the 'room' which makes the application more dynamic and flexible. 
io.on('connection', function(socket){
  socket.join('myroom');
  //send message to the room'myroom').emit('message', 'mymessage');
  socket.leave('myroom');
});


The above code snippet shows how a socket can join a room using the '.join' method. It can then send a message to 'myroom' and leave the room using the '.leave()' method.

'io' vs 'socket' object:

The 'socket' variable represents a single connection. Each time a new client connects, a new 'socket' object is created and passed to the callback; it can only be used to communicate on that connection. The 'io' variable, on the other hand, represents the group of 'socket' objects for all connections.