OffscreenCanvas and Commit

2018-07-04

Chrome is planning to ship OffscreenCanvas.

I know lots of devs that have been wanting that feature for ages so it's exciting to see it finally here. What is OffscreenCanvas? It's basically the ability to draw to a canvas from a web worker.

Drawing a complex scene often takes lots of CPU power. By being able to move all those calculations to a web worker we can make sure the main thread, the one reading the keyboard, responding to the mouse, etc... has all the power it needs to stay responsive.

There was debate for a long time about how it should be done. Ian Hickson wrote one idea originally and with zero review stuck it in the spec. MDN even documented it though it was never implemented by anyone. I wrote another proposal in around 2012 that pointed out the issues with the one in the spec and suggested another solution. That was never implemented either though it was referenced from time to time as a reminder of some of the issues involved.

In any case the current solution that Chrome appears about to ship is that WebGL and Canvas2D mostly work in workers exactly the same as they do outside of workers. There's a small amount of code you need to write to transfer control of a canvas to some object that will exist inside the worker. The worker then creates a WebGL context or a 2D context and renders just like it would if it was in the main page. Results show up automatically just like they do on the main page. In other words, for those familiar with graphics programming, there is no explicit present or swapBuffers call. The moment you call one of the rendering functions in the respective APIs the browser "queues a task to do the present/swap" when your event exits.
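
As a rough sketch of that "small amount of code", assuming the transferControlToOffscreen method from the OffscreenCanvas proposal. The function names and the { type: 'init' } message shape are made up for illustration, and the canvas and worker are passed in as parameters so nothing here depends on a particular page:

```javascript
// Main page side: hand control of a <canvas> over to a worker.
// `canvas` is expected to be an HTMLCanvasElement and `worker` a Worker.
function transferCanvasToWorker(canvas, worker) {
  const offscreen = canvas.transferControlToOffscreen();
  // The second argument lists objects to transfer rather than copy.
  worker.postMessage({ type: 'init', canvas: offscreen }, [offscreen]);
  return offscreen;
}

// Worker side: create a context on the transferred canvas and draw with it,
// the same way you would on the main page.
function handleInitMessage(message) {
  const gl = message.canvas.getContext('webgl');
  gl.clearColor(0, 0, 0, 1);
  gl.clear(gl.COLOR_BUFFER_BIT);
  return gl;
}
```

Inside the worker you would hook the second function up with something like self.onmessage = (e) => handleInitMessage(e.data).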

This is great as it's the path of least surprise. No crazy new changes are needed to your code.

Even better, they added requestAnimationFrame to workers so a worker can effectively just do a standard render loop:

function render(time) {
  renderScene();
  requestAnimationFrame(render);  // ask for the next frame
}
requestAnimationFrame(render);

So far so good.

But... they are also considering adding something else: an explicit present/swapbuffers function called commit. Normal JavaScript apps would likely not use this API. Rather it's an API for WebAssembly ports of native games.

The issue they are trying to solve is that most native games run in what's called a "spinloop". They have code like this

   initialize();
   while(!userWantsToExit) {
      readUserInput();
      renderScene();
      glSwapBuffers();
   }

They never pause and never stop rendering; they just run as fast as they can in a loop. By adding commit they feel they can better support native ports.

I see several problems with this approach and I hope I've convinced them to put on the brakes and do a little more testing before releasing this API.

You can NOT use any other Web APIs with this model!

For those that don't know, JavaScript works on an event model. You provide functions to be called when certain events happen. Events include things like key pressed, mouse clicked, button clicked, slider moved, image downloaded, websocket message received, etc.

When one of these events arrives the browser calls the JavaScript function you assigned to that event. Your JavaScript runs AND THE BROWSER IS FROZEN until your JavaScript exits. Once your JavaScript exits the browser will run any other events on the list of events waiting to be run.

A spinloop like the one enabled by commit means your JavaScript never exits so you'll never process any other events. In other words, the worker rendering with a commit spinloop CAN NOT USE ANY OTHER WEB APIs. It can not receive messages from the main page. It can not download images. It can not read files or request data from a server. It can not use a websocket.
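
A tiny way to see this for yourself, as a sketch rather than anything from the commit proposal: queue a zero-delay timer, then spin. The function name is made up, and the empty loop body stands in for renderScene() plus commit().

```javascript
// Returns true only if the zero-delay timer managed to run while we were
// spinning. Because no event can be delivered until this function returns,
// the timer callback has no chance to run inside the loop.
function timerFiredDuringSpin(spinMs) {
  let fired = false;
  setTimeout(() => { fired = true; }, 0);

  const end = Date.now() + spinMs;
  while (Date.now() < end) {
    // Busy loop: stands in for renderScene(); commit();
  }
  return fired;
}
```

However long you spin, the timer only gets its turn after the function exits. Swap the timer for a message, an image load, or a websocket and the same thing applies.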

WOW! An API that removes the ability to use all other APIs!?!?!

When I asked about it I was told the solution is to use SharedArrayBuffers. SharedArrayBuffers are a way for workers and the main thread to share a chunk of memory with each other. They can all read and write from it and so they can use SharedArrayBuffers to communicate with each other.
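
To give a feel for what that means in practice, here is a minimal sketch of a one-slot "mailbox" built on SharedArrayBuffer and Atomics. The layout, the function names, and the idea of a sequence counter are all assumptions made for this example, not part of any proposal:

```javascript
// Layout (an assumption for this sketch):
//   index 0: sequence number, bumped each time a new value is written
//   index 1: the value itself (a single 32-bit integer)
function createMailbox() {
  return new Int32Array(new SharedArrayBuffer(2 * Int32Array.BYTES_PER_ELEMENT));
}

// Writer side (the main page or some other worker).
function postValue(mailbox, value) {
  Atomics.store(mailbox, 1, value);
  Atomics.add(mailbox, 0, 1); // publish only after the value is in place
}

// Reader side (the spinning render worker), polled once per loop iteration.
// `lastSeq` is a { value: number } holder remembering the last sequence seen.
// Returns the new value, or undefined if nothing changed.
function pollValue(mailbox, lastSeq) {
  const seq = Atomics.load(mailbox, 0);
  if (seq === lastSeq.value) return undefined;
  lastSeq.value = seq;
  return Atomics.load(mailbox, 1);
}
```

Note this passes one integer. If the writer posts twice between the reader's two loads the reader can see the same value twice, and anything bigger than an integer (strings, JSON, image data) needs a real protocol on top.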

Ok, I guess that works. It sounds like a ton of work. For example there is no way to get the raw data from an image in the current Web APIs. You can download images and use them in Canvas 2D and WebGL but as we've just pointed out you can't use the download APIs in a worker using commit. Because you can't get the raw data you also can't download those images in another worker or the main thread and pass them via SharedArrayBuffers into the render/commit worker. Soooo, you're left to write your own image decoders, throwing away a bunch of the web API again. This is one reason why WebAssembly apps are so bloated: they include their own versions of image loading libraries even though those libraries already exist in the browser.

I suppose that's a minor thing but that's not the end of it.

The next issue is what happens when your page that has a worker that's using commit is not the front tab. This is a problem too.

With a normal requestAnimationFrame loop the only thing that happens is the browser stops sending animation frame events. It's still fully able to deliver other events. Events for fetching json, events for loading images, events loading other data. Your program can keep responding.
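
Here is a sketch of what that looks like in a worker. The function name and the shape of the state object are invented for the example, and `scope` stands in for the worker's global (self), passed in as a parameter so the pieces are easy to see:

```javascript
// Starts a render loop and a message handler side by side.
// `scope` needs an `onmessage` slot and a `requestAnimationFrame` function.
function startWorkerRenderer(scope, renderScene) {
  const state = { frames: 0, lastMessage: null };

  // Message events keep arriving even when animation frames stop coming.
  scope.onmessage = (event) => {
    state.lastMessage = event.data;
  };

  function frame(time) {
    renderScene(time, state);
    state.frames += 1;
    scope.requestAnimationFrame(frame); // returns to the browser right after
  }
  scope.requestAnimationFrame(frame);
  return state;
}
```

When the tab is in the background the browser simply stops calling frame. The onmessage handler still runs, so whatever arrived in the meantime is sitting in state when rendering resumes.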

With commit it was suggested they'd just block commit forever until the tab is put in the front again. The problem then is that you've got the main page and/or workers still receiving messages but when they try to communicate those messages to the rendering worker that worker never responds. It's frozen. This will be a HUGE source of race bugs. Developers think their code works only to find it fails in subtle and hard to reproduce ways depending on when the user switches tabs. Not good.

Okay, so they suggested maybe they can throttle commit. They'll call it just once a second for example. Unfortunately we can show that's not a solution. Many GPU pages (and many even non-GPU pages) can be really slow. Here's a page that's really slow at least on my machine. When it's the front tab I can barely type. I'm glad that page exists as it's super educational so I think pages like that should exist. If I make some other tab the front tab my machine is back to normal and responsive. Now imagine if that page used commit and commit was only throttled. Imagine it was called once per second. The experience would be horrible and my machine would still seem unusable as once a second my machine would hiccup as it processes that graphics page offscreen. So no, throttling is not a viable solution. Whatever solution happens must stop the rendering, period.

So what solutions are there?

Well why do we need commit at all?

The reason some people think we need commit is because they want to support native ports to webassembly. A typical spinloop based C/C++ program might have some code like this

// KeyboardSystem, GraphicsSystem, DataSystem and GameData stand in for
// whatever the game actually uses.
int main() {
  KeyboardSystem* keySys = new KeyboardSystem();
  GraphicsSystem* gfxSys = new GraphicsSystem();
  DataSystem* dataSys = new DataSystem();

  GameData* gameData = dataSys->loadData();

  bool done = false;
  while (!done) {
    done = keySys->checkKeyboard();
    gfxSys->renderScene(gameData);
    glSwapBuffers();
  }

  gameData->cleanup();
  dataSys->cleanup();
  gfxSys->cleanup();
  keySys->cleanup();
  return 0;
}

The problem they see is that this code can't work in the browser's current system. Like I mentioned above the browser only calls your code via events. Your code needs to exit so that the browser can then process the next event. The code above never exits. If it does exit then keySys, gfxSys, dataSys and gameData would all be cleaned up which is not what we want.

Of course programmers can refactor their code so this isn't a problem but the people pushing for commit are trying to make it so those developers don't have to change their code and things will just work.

Here comes a place where we disagree. First, the amount of work to refactor that code is small. Of course the example above is small but I suspect even large native code bases would not take that much work to refactor to work with events. You'd need a few globals or singletons but otherwise you just split up your code

static KeyboardSystem* keySys;
static GraphicsSystem* gfxSys;
static DataSystem* dataSys;

static GameData* gameData;

void init() {
  // Assign to the globals above; redeclaring them here would shadow them.
  keySys = new KeyboardSystem();
  gfxSys = new GraphicsSystem();
  dataSys = new DataSystem();

  gameData = dataSys->loadData();
}

// Called once per frame by the browser instead of spinning in a loop.
void render() {
  keySys->checkKeyboard();
  gfxSys->renderScene(gameData);
}

void cleanup() {
  gameData->cleanup();
  dataSys->cleanup();
  gfxSys->cleanup();
  keySys->cleanup();
}

Now call init, then call render on a requestAnimationFrame loop just like JavaScript. What was so hard?

Second is that even if native developers don't have to refactor that code there are tons of other places they have to refactor. There is no path from native to browser that does not require a bunch of work if you want users to have a good experience. As a simple example I tried porting some native code. The first thing I had to do was refactor to be event based. The app came up. But then I needed to deal with the fact that the native app was hardcoded to decide, at compile time, what keys to use for Windows, Mac, Linux. That doesn't work in the browser where, depending on what machine the page is viewed on, the keys need to change at runtime. Ctrl-C vs Cmd-C for copy etc. For that particular app it would have been far more work to make it do the correct thing at runtime instead of compile time than it was to refactor to make it event based.
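
For illustration, the runtime version of that decision might look something like the sketch below. The function name is made up, and how you detect a Mac is left to the caller; testing navigator.platform for "Mac" is one common guess, not something the app I ported did.

```javascript
// Decide at runtime whether a keydown event is the platform's "copy" shortcut.
// `event` is a KeyboardEvent-like object with `key`, `ctrlKey` and `metaKey`.
// `isMac` is supplied by the caller (for example from navigator.platform).
function isCopyShortcut(event, isMac) {
  const modifier = isMac ? event.metaKey : event.ctrlKey;
  return Boolean(modifier) && event.key.toLowerCase() === 'c';
}
```

A native code base that picked its key tables with #ifdef has to grow this kind of check everywhere it looks at a modifier key.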

That wasn't the end of it though. Next up was the clipboard support. The native code was designed to expect it could read the clipboard on demand but that's not how the clipboard works in the browser. In the browser the user presses Paste (Ctrl-V or Cmd-V etc) and only then is the clipboard made available to the app via a clipboard event. In this way the page can't read the clipboard as data is being passed to other apps. It can only read it when the user has pasted into this app.
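
In code, the browser side of that looks roughly like this sketch. The paste event and clipboardData.getData are the standard clipboard event API; the function name and the onText callback are invented for the example.

```javascript
// Forward pasted text to the app. The clipboard contents are only available
// inside the paste event, so the app has to wait to be told rather than ask.
// `target` is anything with addEventListener (e.g. document or an element).
function installPasteHandler(target, onText) {
  target.addEventListener('paste', (event) => {
    const text = event.clipboardData.getData('text/plain');
    onText(text);
  });
}
```

A native app that calls "give me the clipboard now" from the middle of its own code has to be turned inside out to fit this.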

And those were just the start. The apps that use a spinloop are 99% games. Non-games are more often than not event based. Games have lots of issues, needing far more data than most other native apps. No user wants to wait 5 minutes to an hour for all that data to download so if the ported native apps hope to have any kind of audience they need to refactor to stream the data and ideally start up with a minimal amount of data while they continue to download the rest in the background.

They also need to be able to save state, read mods, and lots of other things which change drastically and all of which require lots of work to be a good user experience in a browser.

My point being that just adding commit will not be enough. There's a ton of work involved in bringing a native app to the browser and it not having a very bad user experience. By adding commit it just makes it slightly easier to barf bad content on the web. That shouldn't be encouraged. If devs are going to bring their native app to the browser they need to actually do the work to make it a good experience. Refactoring to be event based is the least of their problems.

I hope that at least gives some credence to the idea that we shouldn't use the fact that many native games use a spinloop as an argument to support spinloops. Let them refactor their apps.

The bigger issue is I don't believe there is actually a solution to the issues above about blocking commit in a spinloop. If you block I guarantee there will be race issues. The next most obvious solution is to provide some kind of API that lets developers stop rendering. They can use the focus and blur events to do their own throttling or commit can return some value saying effectively "the next time you call me I'm going to freeze so you'd better get ready". Another idea is the browser runs the spinloop a few more iterations but some other API lets the spinloop worker check if it's going to be frozen.

It really doesn't matter which of those solutions happens. The problem is they are solutions that require perfection. Developers are told do A, then B, then C and it will work. Yet we know with 100% certainty that will not happen. Developers, especially web developers, never do things perfectly. An API that requires perfection to work correctly will basically never work correctly on the web. To put it another way, if developers have to deal with all the race conditions that come up from using SharedArrayBuffers with a commit function that can block at any time then likely the majority of pages will have race conditions that trigger randomly.

IMO that's not a solution we should choose. rAF just works. If you're not the front page rAF does not get called but other events still get processed. You can still communicate with the worker. Blocking commit doesn't work. The moment it's blocked ZERO communication with that worker can happen. You can't rescue it or nudge it out of its blocked state. It's blocked. And, as pointed out above, throttling is not a solution.

So, in summary, I would argue commit should NOT be added to the set of web APIs, period. Requiring devs to refactor to use events is the only reasonable solution IMO.
