Could ImGUI be the future of GUIs?

A random, not fully thought-out idea that came up: could something like Dear ImGUI ever be the future of mainstream UI libraries?

For those who don't know what an Immediate Mode GUI (ImGUI) is, there's a semi-famous video by Casey Muratori about it from 2005ish.

Most programmers who use the ImGUI style find it far easier to make UIs with than traditional retained mode GUIs. They also tend to find ImGUIs significantly more performant.

The typical retained mode object oriented GUI framework is a system where you basically create a scenegraph of GUI framework widgets (windows, grids, sliders, buttons, checkboxes, etc.). You copy your data into those widgets. You then wait for events or callbacks to tell you when a widget was edited. You then query the widget's values and copy them back into your data.

This pattern is used in practically every GUI system out there: Windows, WPF, HTML DOM, Apple UIKit, Qt, you name it. 99% of GUI frameworks are Retained Mode Object Oriented Scenegraph GUIs.

A few problems with this GUI style are:

You have to write lots of code to manage the creation and destruction of GUI objects.

For example, if you have a scrolling list you often have to create 100s or 1000s of GUI widgets (think HTML: create a TR, then TDs, then the contents of each TD, etc.). If the data is really large you end up having to make some virtual window of widgets, either creating new ones and deleting old ones as the user scrolls, or pulling old ones off the trailing side and adding them to the leading side. It ends up being a lot of code.

The creation and destruction problem leads to slow, unresponsive UIs.

Because creation and destruction of GUI objects is slow (and they are usually extremely large objects), tons of code is spent finding and designing solutions to minimize the amount of creation and destruction.

Think of how React does this by having a virtual dom, diffing the changes, and then applying those changes to the actual GUI widgets and DOM tree / scenegraph.

You have to marshal your data into and out of the widgets.

This requires copying it in and then responding to events to read it back out. More code.
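
To give a feel for the "virtual window of widgets" workaround, here is a minimal C++ sketch of the recycling approach. RowWidget and VirtualList are hypothetical names I made up for illustration, not part of any real framework, and a real widget would carry far more state than one string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical retained-mode row widget. Real ones hold far more state.
struct RowWidget {
    std::size_t dataIndex = 0;  // which data item this widget currently shows
    std::string text;           // a copy of the data, marshaled into the widget
};

// Keeps only enough widgets to fill the viewport and rebinds them on scroll.
// The data vector must outlive the list because only a reference is stored.
class VirtualList {
public:
    VirtualList(const std::vector<std::string>& data, std::size_t visibleRows)
        : data_(data), widgets_(visibleRows) {
        scrollTo(0);
    }

    // Rebind every pooled widget to the data rows now in view.
    void scrollTo(std::size_t firstRow) {
        assert(firstRow <= data_.size());
        for (std::size_t i = 0; i < widgets_.size(); ++i) {
            const std::size_t index = firstRow + i;
            if (index >= data_.size()) {
                widgets_[i].text.clear();  // past the end: show an empty row
                continue;
            }
            widgets_[i].dataIndex = index;
            widgets_[i].text = data_[index];  // copy the data into the widget
            ++copies_;
        }
    }

    const std::vector<RowWidget>& widgets() const { return widgets_; }
    std::size_t copies() const { return copies_; }

private:
    const std::vector<std::string>& data_;
    std::vector<RowWidget> widgets_;
    std::size_t copies_ = 0;  // how many times data was marshaled into a widget
};
```

Even in this toy version, every scroll step copies data into each pooled widget, and none of that bookkeeping has anything to do with what the app actually wants to show.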

In contrast, in an ImGUI there are no objects and there is almost no state. The simple explanation of most ImGUIs is that you call functions like

// draw a button
if (ImGUI::Button("Click Me")) {
  IWasClickedSoDoSomething();
}
// draw slider
ImGUI::SliderFloat("Speed:", &someInstance.speed, 0.0f, 100.0f);

Button and Slider do two things.

They append the positions and texture coordinates needed to draw the widget to a vector (array), or skip that if the widget would be clipped off screen or outside the current window / clip rectangle.
They check the position of the mouse pointer, the state of the keyboard, etc. to manipulate that widget. If the data changed they return it immediately.
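
Here is a rough C++ sketch of those two things for a slider. To be clear, this is not Dear ImGUI's actual implementation: Ui, Input, Vertex, Rect, and sliderFloat are all hypothetical names, and the slider takes an explicit rectangle because the sketch has no layout system.

```cpp
#include <cassert>
#include <vector>

struct Vertex { float x, y, u, v; };  // position + texture coordinate
struct Rect   { float x, y, w, h; };
struct Input  { float mouseX = 0.0f, mouseY = 0.0f; bool mouseDown = false; };

// Hypothetical per-frame context: a reusable vertex list plus current input.
struct Ui {
    std::vector<Vertex> vertices;
    Input input;

    void beginFrame(const Input& in) {
        input = in;
        vertices.clear();  // keeps capacity, so later frames can reuse the storage
    }

    // Thing 1: append the quad needed to draw a rectangle.
    void addQuad(const Rect& r) {
        vertices.push_back({r.x,       r.y,       0.0f, 0.0f});
        vertices.push_back({r.x + r.w, r.y,       1.0f, 0.0f});
        vertices.push_back({r.x + r.w, r.y + r.h, 1.0f, 1.0f});
        vertices.push_back({r.x,       r.y + r.h, 0.0f, 1.0f});
    }

    bool mouseOver(const Rect& r) const {
        return input.mouseX >= r.x && input.mouseX < r.x + r.w &&
               input.mouseY >= r.y && input.mouseY < r.y + r.h;
    }

    // Draws a horizontal slider track and edits *value in place.
    // Returns true if the value changed this frame.
    bool sliderFloat(const Rect& track, float* value, float min, float max) {
        assert(value != nullptr && max > min && track.w > 0.0f);
        addQuad(track);
        // Thing 2: use the input state to manipulate the widget.
        if (input.mouseDown && mouseOver(track)) {
            const float t = (input.mouseX - track.x) / track.w;  // 0..1 along track
            const float newValue = min + t * (max - min);
            if (newValue != *value) {
                *value = newValue;
                return true;
            }
        }
        return false;
    }
};
```

The app calls beginFrame once per frame and then sliderFloat with a pointer to its own data. There is no slider object anywhere. The only thing that lives across frames is the vertex storage.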

So, pluses:

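There is little to no memory allocation. The vertex array is reused from frame to frame, so in the steady state an ImGUI can get by with ZERO allocations per frame.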
Speed. Most if not all ImGUIs have zero trouble running at 60fps even with extremely complex UIs, and on a single thread.
There is no creation and destruction of objects that has to be managed
There is no state, because there are no objects to store state in
There is little to no marshaling of data.
There are no events or callbacks that need to be registered or responded to

Possible minus

It might use more CPU.

I'm not yet convinced this is always true. Originally, retained mode GUIs were designed to try to do the minimal amount of work. For example, say you've got a UI like Microsoft Excel. It has 75 toolbar buttons and a spreadsheet showing 300 cells. The input cursor is in cell E7 and it's blinking. If you go back to Windows 3.0 (and earlier), the CPU is drawing the pixels (GPUs didn't exist). The GUI system determines that some tiny area only the size of the cursor itself needs to be redrawn and redraws only those pixels directly into screen memory. Similarly, if you type a letter the system can figure out that only cell E7 was updated, so only cell E7 needs to be redrawn.
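
That minimal-redraw idea is usually called dirty rectangle tracking. A tiny C++ sketch of it, with hypothetical names (Rect, unite, DamageTracker) and made-up pixel sizes, might look like this:

```cpp
#include <algorithm>
#include <cassert>

struct Rect { int x, y, w, h; };

// Smallest rectangle that covers both a and b.
inline Rect unite(const Rect& a, const Rect& b) {
    const int left   = std::min(a.x, b.x);
    const int top    = std::min(a.y, b.y);
    const int right  = std::max(a.x + a.w, b.x + b.w);
    const int bottom = std::max(a.y + a.h, b.y + b.h);
    return Rect{left, top, right - left, bottom - top};
}

// Tracks the area that must be repainted before the next screen update.
class DamageTracker {
public:
    // A widget calls this when something inside r changed.
    void invalidate(const Rect& r) {
        assert(r.w >= 0 && r.h >= 0);
        dirty_ = empty_ ? r : unite(dirty_, r);
        empty_ = false;
    }

    // How many pixels the next repaint has to touch.
    long pixelsToRepaint() const {
        return empty_ ? 0 : static_cast<long>(dirty_.w) * dirty_.h;
    }

    // Called after the repaint is done.
    void clear() { empty_ = true; }

private:
    Rect dirty_{0, 0, 0, 0};
    bool empty_ = true;
};
```

If the blinking cursor is, say, 2x16 pixels, invalidating it means repainting 32 pixels instead of the 307,200 pixels of a 640x480 screen. That is the kind of saving this design was built around.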

On a 1993-1994 computer that was important. The computer could not draw the entire screen at 60fps.

So, that's the best case for a traditional scenegraph based object oriented retained mode GUI.

Note that the system still had to walk a large portion of the GUI scenegraph to compute the smallest affected area. That might not be as much work as re-drawing every pixel, but it is lots of work nonetheless.

Contrast that to an ImGUI, where the entire GUI is redrawn any time you want anything to change, even the cursor. If we go back to the Excel example, all 75 toolbar widgets and all 300 cells will be redrawn just for a blinking cursor.

This is the worst case for an ImGUI. Lots of CPU is wasted.

Contrast that to scrolling the spreadsheet. In the scenegraph based GUI, assuming you pressed page down, most likely 300 cell widgets will be deleted, 300 new cell widgets will be created, and the data for each cell will be copied into each cell widget. After all of that, the GUI system will walk over all 300 cells and draw them.

Conversely, in the ImGUI case no widgets are deleted, no widgets are created, no data is copied, and 300 cells are drawn just like they were before. The amount of CPU work the ImGUI does to update the entire display is easily a 10th to a 100th of the work done by the retained mode system.
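
As a sketch of why scrolling is cheap in the ImGUI case, here is what drawing a scrolled list could look like in C++. DrawCmd and drawRows are hypothetical names, and a draw command here is just a stand-in for the vertices a real ImGUI would append.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One draw command per visible row; a stand-in for appended vertices.
struct DrawCmd {
    float y;                  // row position relative to the top of the viewport
    const std::string* text;  // points at the app's data, no copy is made
};

// Emits draw commands for only the rows that intersect the viewport.
// It reads the application's data directly: no widgets are created or destroyed.
inline void drawRows(const std::vector<std::string>& rows, float scrollY,
                     float viewportHeight, float rowHeight,
                     std::vector<DrawCmd>& out) {
    assert(rowHeight > 0.0f && scrollY >= 0.0f);
    out.clear();  // reuse last frame's storage
    const std::size_t first = static_cast<std::size_t>(scrollY / rowHeight);
    for (std::size_t i = first; i < rows.size(); ++i) {
        const float y = static_cast<float>(i) * rowHeight - scrollY;
        if (y >= viewportHeight) break;  // everything below is clipped
        out.push_back(DrawCmd{y, &rows[i]});
    }
}
```

Pressing page down just means calling drawRows with a different scrollY on the next frame. The work is the same as for any other frame, no matter how many rows the data has.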

Which case is more common? I suppose for a text editor only a small amount changes at a time, so the scenegraph style wins. For, say, the Instagram or Facebook app, people scroll almost constantly, in which case the ImGUI wins by a landslide.

Accessibility issues

With a retained mode GUI the data for all the widgets has been copied to the GUI's scenegraph. This means the GUI system itself can look at that data and provide different interfaces. (draw it large, speak it, change it to braille, etc.)

With an ImGUI, the GUI generally doesn't retain any data, so it might not be able to do those types of things as well.

That said, I think it could be an area of research. There might be ways to make ImGUIs handle accessibility better than they traditionally do. Most ImGUIs are used for game development and are not targeted at end users but only at game developers on the same team. In other words, there's been no incentive to explore solutions.

Perceived but probably not minuses

Styling

I'm not sure how skinnable most retained mode GUIs are. Probably the most stylizable is the HTML DOM with all the 1000s of CSS options.

For ImGUIs it's up to you to style them. Adding more style options, even up to almost all of CSS or at least the good parts, is probably relatively easy to implement while keeping the speed. Even better, you can probably easily opt in or opt out, so if your app doesn't need the styling, why waste memory or CPU cycles dealing with it? With most retained mode GUIs, by contrast, all that styling data is embedded in every widget whether you use it or not. Consider HTML, where every element has 100s of style settings. Literally 100s.

Animation

For most ImGUIs there's no state, so all animation would be up to the app. It's easy to imagine, though, that a wrapper that kept a little animation state could put UI animation back in. In fact, as a wrapper it would let you opt into supporting animation only where it's important, whereas, as with styling, most retained mode GUIs have tons of data, state, and settings per widget just sitting around for animation whether or not you use it.
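
To show what I mean by a wrapper with a little animation state, here is a speculative C++ sketch. AnimationState and approach are names I invented. I'm not claiming any existing ImGUI works this way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical opt-in animation store: one float per widget ID that asked for it.
class AnimationState {
public:
    // Moves the stored value for `id` toward `target` at `speed` units per second
    // and returns the new value. Widgets that never call this store nothing.
    float approach(const std::string& id, float target, float speed, float dt) {
        assert(speed >= 0.0f && dt >= 0.0f);
        float& value = values_[id];  // starts at 0 the first time an ID is seen
        const float step = speed * dt;
        if (value < target) {
            value = std::min(value + step, target);
        } else {
            value = std::max(value - step, target);
        }
        return value;
    }

    std::size_t trackedWidgets() const { return values_.size(); }

private:
    std::unordered_map<std::string, float> values_;
};
```

A button that wants a hover fade would call something like anim.approach("ok-button", hovered ? 1.0f : 0.0f, 4.0f, dt) each frame and blend its color by the result. Every other widget pays nothing.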

I guess I'm really curious. I know most GUI framework authors are skeptical that ImGUIs are a good pattern. AFAICT though, no one has really tried. As mentioned above, most ImGUIs are used for game development. It would take a concerted effort to find the right patterns to completely replicate something as fancy as, say, Apple's UIKit. Could it be done and stay performant? Would it lose the performance by adding back in all the features? Does the basic design of an ImGUI mean it would end up keeping the perf and the ease of use? Would we find certain features are just impossible to really implement without a scenegraph?

Let me also add that to some degree React is similar to an ImGUI in usage. React has JSX but it's just a shorthand for function calls. The biggest differences would be

no need for a renderer since each component would just render immediately.
no hidden virtual dom needed.
no setState. There is no state.
no need to attach events or deal with componentWillMount, componentDidMount, componentWillUnmount etc. because there are no components, just functions, and there are no widgets (DOM elements, native objects, etc..)

If we were to translate the code above into some imaginary ImReact it might be something like

const Button = (props) => {
  return ImGUI.Button(props.caption);
};

const SliderFloat = (props) => {
  return ImGUI.SliderFloat(props.caption, props.value, props.min, props.max);
};

const Form = (props) => {
  if (<Button caption="Click Me" />) {
    DoSomething();
  }
  <SliderFloat min="0" max="100" value="&props.speed" caption="Speed:" />
};

Just looking at that React code you can see the translation back into real code is really straightforward.

I'm not exactly sure how the update to speed would work, but I guess that's because I'm mixing C++ (ImGUI) with JavaScript (React). Typical ImGUIs follow one of two patterns. Either you pass in a pointer to a primitive, something JavaScript doesn't have, or they return the new value, as in

newValue = ImGUI::SliderFloat(caption, currentValue, min, max);

which, for the same case as the Dear ImGUI C++ example above, you'd write as

someInstance.speed = ImGUI::SliderFloat("Speed:", someInstance.speed, 0.0f, 100.0f);

So if we assumed that style of API then

const Button = (props) => {
  return ImGUI.Button(props.caption);
};

const SliderFloat = (props) => {
  return ImGUI.SliderFloat(props.caption, props.value, props.min, props.max);
};

const Form = (props) => {
  if (<Button caption="Click Me" />) {
    DoSomething();
  }
  props.speed = (<SliderFloat min={0} max={100} value={props.speed} caption="Speed:" />);
};

Notice the components are not returning virtual dom nodes since there's no need. The only thing we're really taking is JSX just to show that you could use a React style pattern if you wanted to.

Note: Don't get caught up in the direct state manipulation in the example. How you update state should not be dictated by your UI library. You're free to manage state any way you please regardless of which UI system you use. Still, the example shows how simple the ImGUI style is.

state.value = ImGUI.SliderFloat(caption, state.value, min, max);

is certainly simpler than

// at init time
const slider = new SliderWidget(caption, state.value, min, max);
slider.onChange = function(newValue) {
  state.value = newValue;
};

// if state.value changed slider needs to show the new value
function updateSlider(newValue) {
  state.value = newValue;
  slider.value = newValue;  // assumes the imaginary widget exposes a value property
}

Even worse, now you need to somehow call updateSlider everywhere state.value is updated, or you need to write some elaborate system so that all places that want to update state.value call into a system that tracks all the widgets and what state they reflect.

ImGUI libraries need no such complication. There is no widget. Every frame, whatever value is in the state is what's in the widget. This is the same promise as React's, but React ends up being hobbled by the fact that it sits on top of slow retained mode GUI libraries.

As an example of the complexity possible, probably the most widely used ImGUI is Unity's Editor UI.

So at least there is some precedent for using an ImGUI in a user-facing app instead of just a game, even if Unity itself is for making games.

There are also lots of screenshots of various ImGUI made UIs in the readme.

Here is also a live version of the included example in the Dear ImGUI library

If you decide to interact with it, be aware that it has not actually been designed for the browser and so has issues that need fixing. Those issues can easily be fixed, so don't get bogged down in nitpicking them. Rather, notice how complex the UI is and yet it's running at 60fps. Use the "examples" menu in the main window and open more windows. Expand the examples in the main window and see all kinds of live and complex widgets. Now imagine you tried to make just as complex a UI using HTML/DOM/React. Not only would the HTML/DOM version likely have lots of pauses and not run at 60fps, but it would probably also take 5x to 10x as much code, along two dimensions. One dimension is how much code you have to write to implement the UI using HTML/DOM and/or React vs ImGUI. The other dimension is how much code executes to get the UI on the screen. I suspect the number of CPU instructions executed in the HTML/DOM version is up to 100x more than in the ImGUI version.

Consider the ImGUI::Button function vs making a <button> element.

For the <button> element

An HTMLButtonElement object has to be created.

It has all of these properties that need to be set to something

 autofocus: boolean 
 disabled: boolean 
 form: object 
 formAction: string 
 formEnctype: string 
 formMethod: string 
 formNoValidate: boolean 
 formTarget: string 
 name: string 
 type: string 
 value: string 
 willValidate: boolean 
 validity: object ValidityState
 validationMessage: string 
 labels: object NodeList
 title: string 
 lang: string 
 translate: boolean 
 dir: string 
 dataset: object DOMStringMap
 hidden: boolean 
 tabIndex: number 
 accessKey: string 
 draggable: boolean 
 spellcheck: boolean 
 autocapitalize: string 
 contentEditable: string 
 isContentEditable: boolean 
 inputMode: string 
 offsetParent: object 
 offsetTop: number 
 offsetLeft: number 
 offsetWidth: number 
 offsetHeight: number 
 style: object CSSStyleDeclaration
 namespaceURI: string 
 localName: string 
 tagName: string 
 id: string 
 classList: object DOMTokenList
 attributes: object NamedNodeMap
 scrollTop: number 
 scrollLeft: number 
 scrollWidth: number 
 scrollHeight: number 
 clientTop: number 
 clientLeft: number 
 clientWidth: number 
 clientHeight: number 
 attributeStyleMap: object StylePropertyMap
 previousElementSibling: object 
 nextElementSibling: object 
 children: object HTMLCollection
 firstElementChild: object 
 lastElementChild: object 
 childElementCount: number 
 nodeType: number 
 nodeName: string 
 baseURI: string 
 isConnected: boolean 
 ownerDocument: object HTMLDocument
 parentNode: object 
 parentElement: object 
 childNodes: object NodeList
 firstChild: object 
 lastChild: object 
 previousSibling: object 
 nextSibling: object 
 nodeValue: object 
 textContent: string

More objects need to be created.

Looking above we can see we need to create

NodeList            // an empty list of children of this button
HTMLCollection      // another empty list of children of this button
StylePropertyMap    //
NamedNodeMap        // the attributes
DOMTokenList        // the CSS classes as a list
CSSStyleDeclaration // an object used to deal with CSS
DOMStringMap        // empty but used for dataset attributes
ValidityState       // ?? no idea

This is just creation time so far. Tons of properties need to be set to defaults or filled out with empty strings, and/or other objects need to be created. Those objects also need all their properties filled out and may in turn need deeper objects created.

Now that an HTMLButtonElement exists it gets inserted into the DOM.

At render time the browser will walk the DOM. I'm sure there is some amount of caching, but it needs to figure out where the button is. It will likely build some internal scene graph, separate from the DOM itself, which is rendering specific, so 1000s more lines of code get executed.

Eventually it will get to the point of rendering the button. Here again it has to check the 100s of CSS attributes. Text color? Font size? Font Family? Text Shadow? Transform? Animation? Border? Multiple Borders? Background color? Background Image? Background gradient? Is it transparent? Is it on its own stacking context? Literally 100s of options.

Let's assume it's using nothing special. Eventually it will generate some quad vertices to render font glyphs. It will likely render these glyphs into a texture or grid of textures for the stacking context. It does this as an optimization so that, ideally, if a different stacking context has its content change but nothing in this stacking context changes, it can skip re-rendering the texture(s) for this context and just use the one it created last time.

I'm sure there are 100 other steps I'm missing related to caching positions, marking things as computed so they don't get recomputed, and on and on.

Compare to ImGUI::Button, which is just a function, not an object. All it effectively does is

Clip the button rectangle to the current clip space and exit if it's completely clipped
Insert the vertices for the rectangle of the button into the pre-allocated vertex array
Insert the vertices for each glyph stopping when the first glyph is clipped by the button area.
Return true if the mouse button was pressed and its position is inside the button rectangle, else false.

That's it
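
As a rough illustration of those four steps, here is a minimal C++ sketch. It is not Dear ImGUI's real code. Ui, Rect, Vertex, Input, and button are hypothetical, the rectangle is passed in explicitly because there is no layout system, and the font is assumed to be fixed width to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Rect   { float x0, y0, x1, y1; };
struct Vertex { float x, y; };
struct Input  { float mouseX = 0.0f, mouseY = 0.0f; bool mousePressed = false; };

struct Ui {
    Rect clip{0.0f, 0.0f, 0.0f, 0.0f};  // current clip rectangle
    Input input;                        // this frame's mouse state
    std::vector<Vertex> vertices;       // reused every frame
    float glyphWidth = 8.0f;            // assumption: fixed-width font

    void addQuad(const Rect& r) {
        vertices.push_back({r.x0, r.y0});
        vertices.push_back({r.x1, r.y0});
        vertices.push_back({r.x1, r.y1});
        vertices.push_back({r.x0, r.y1});
    }

    bool button(const std::string& label, const Rect& r) {
        assert(r.x1 >= r.x0 && r.y1 >= r.y0);
        // 1. Exit early if the button is completely outside the clip rectangle.
        if (r.x1 <= clip.x0 || r.x0 >= clip.x1 ||
            r.y1 <= clip.y0 || r.y0 >= clip.y1) {
            return false;
        }
        // 2. Insert the vertices for the button rectangle.
        addQuad(r);
        // 3. Insert one quad per glyph, stopping at the first that does not fit.
        float penX = r.x0;
        for (std::size_t i = 0; i < label.size(); ++i) {
            if (penX + glyphWidth > r.x1) break;
            addQuad(Rect{penX, r.y0, penX + glyphWidth, r.y1});
            penX += glyphWidth;
        }
        // 4. Report a click if the mouse was pressed inside the rectangle.
        return input.mousePressed &&
               input.mouseX >= r.x0 && input.mouseX < r.x1 &&
               input.mouseY >= r.y0 && input.mouseY < r.y1;
    }
};
```

A real library also handles hover and active states, IDs, styling, and proper text layout, but the overall shape is about this small.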

Note that those 4 steps also exist in the browser in HTML/DOM land, except there they are 4 steps out of 100s.

So, in summary, the ImGUI style is potentially much faster and easier to use. It's easier to use in both the simple case and the complex case. The API is easier to use. It's easier to reason about. There is no state. There are no objects. There is no data marshalling. There are no events or callbacks. Because it's so fast, when the UI gets complex no giant frameworks like React's virtual dom need to be created. Because of the speed, little to no effort is required to work around slowness like with the DOM. More research into ImGUI style UIs could lead to huge gains in productivity.

games.greggman.com

2019-01-27