Immediate Mode Model/View/Controller

This book is a work in progress, comments are welcome to: johno(at)johno(dot)se

Web paradigms

The previously covered MVC architecture (with the formalization / abstraction introduced by IEventTarget) places any networking squarely in between Controller and Model. This means that in a remote case, Model logically lives on the server, while Controller and View live on the client. The client, again, has a local Model as well, but this is simply a "local cache / mirror image" of the "real" Model that lives on the server.

This type of architecture works very well with fast-action games of all sorts, but is based on an important assumption, namely that Model state is appropriately sized so that it can fit into memory on both a client and a server. Network bandwidth involved in synchronizing Model between server and client is also an issue; however, with Quake 3 Arena-style delta compression or some other clever trickery this can often be worked around.

But what about cases where it is unfeasible or undesirable to have a local Model on the client to handle all game state? Cases that come to mind are MMORPG style games where the amount of data present on the server is too large and also not all of interest to any given client.

Also common to these types of games are gui-heavy use-cases that are tightly lock stepped with what is happening on the server, which turn into network-protocol nightmares when approached via the IEventTarget architecture simply because the number of input commands available to clients is so numerous (i.e. the IEventTarget interface becomes very large, and serialization of commands is lots of work).

A solution to these kinds of problems, which of course can be used in conjunction with the earlier architecture, is to take a look at how the WWW works.

WWW is based on a powerful paradigm, namely "thin clients" (browsers) and a "load on demand" approach to resource access. While many game programmers will scoff at the low-fi interactivity typical of browsing the internet, and indeed "web-people" themselves are looking for more client-centric applications in order to present a more compelling user interface (AJAX / Web 2.0), one must not overlook an inherent aspect of the web paradigm:

"The Controller is on the server!"

Think about what this means in terms of the previously presented MVC architecture. As mentioned, we had Model on the server, and Controller and View on the client. In web-programming (although this isn't often clearly expressed in the various technologies available), Model AND Controller reside on the server, and only View is on the client, represented by a "dumb terminal", namely the web browser.

"So what?" you may say. Think in terms of interfaces, and think about the problem posed by MMORPGS with extremely wide interfaces / protocols between client and server. Then think about the WWW, and all the different things you can do given a relatively simple client like a web-browser. Why is all that possible? I argue that the simple reason for all this power is that the interface to the View (HTML) is both sufficiently narrow to be easily expressed AND sufficiently general to handle very many cases.

Think about what HTML is. Ignore funky client-side trickery like JavaScript for the time being. HTML is a "page description language" at best. What is this in game-programmer terms? Concepts like "display list", "scene graph", or "gui form" come to mind. So basically, if you superimpose these ideas onto what we have been doing earlier, we see that HTML and the browser are equivalent to our application specific View interface.

Now start thinking about hybrid architectures. It would be easy to solve the real-time / action aspects of a given game using the MVC / IEventTarget architecture and enjoy all aspects of programming for the local model and having things work the same over the network. However, for gui-intensive stuff which may also be tightly coupled to server logic, some kind of "webby" paradigm would be very useful.

Think about what this means for the Controller, which now would be in the same address space as the Model (on the server). It becomes very simple indeed to react to user input and make changes directly to a Model which is now local (or abstracted by some cool database technology, but there is no conceptual difference).

I will now explore a tangent to my career as a game developer, and come full circle with these ideas.

IMWEB

Web technologies (like Java EE) do a very good job on the server side of things, but most of the evolving View/presentation level technologies are imho going astray. Why? Because of the traditionally retained way of approaching user interfaces. JSP is a terrible mess.

When faced with a nastily designed web project and learning that web applications' user interfaces tend to be as retained as anything else, I became very interested in trying to apply an IMGUI approach to web applications. At the time I was not completely clear on the proper segmentation of Controller and View; indeed in my games both concepts were encapsulated in what I called View (reminiscent of Microsoft's Document/View architecture used in MFC gui projects).

While messing about with this, I realized (with help from Daniel Toll) that being forced to use a "dumb" web browser was in fact forcing me to better understand exactly what goes where in MVC. In traditional descriptions of MVC, it is said that Controller "decides what is presented". It seemed to me that this is what View should do, but if the browser is the View, how then to program it in an application specific manner? HTML isn't a Turing-complete programming language!

Java Servlets as Controllers

In reading about Java Servlets, JSP, as well as other presentation layer technologies like Apache Struts, it became obvious to me that the people with experience in all of this were clearly viewing Java Servlets as the (to me) mystical Controller. Examples showed Servlets looking at databases (the Model) and deciding what to show to the user (controlling the View) by either writing some HTML to the output or redirecting to a static HTML page.

It was in understanding all of this that I realized that there was an opportunity here. What if the whole View aspect of web programming wasn't as exploded as it tends to be (i.e. lots of different HTML documents mixed with JSP and servlets and lots of redirection)? It seemed to me that the capability of Servlets to dynamically generate HTML output while themselves being implemented in plain Java was too powerful to ignore, even though it seemed that the general consensus of the web community was that JSP was an "upgrade" to the "pain" of having to generate HTML using simple println() statements.

Truly dynamic HTML Views

I did some experiments. My holy grail was to be able to program web applications (guis) in the same way that I could with IMGUI, i.e. the good old doButton() call, because no way in hell was I going down the old retained mode path again. I started thinking about Casey's concept of "the gui context", and realized that this is exactly what traditional MVC is thinking of when it talks about the View.

Controller "programs" View to display what Controller wants to display. This is exactly what you do with an IMGUI gui-context. The gui-context doesn't know anything about the app, it just exposes methods for querying input and putting widgets on the screen. The penny finally dropped and I realized that I could use the browser in the exact same way.

So what does this mean in terms of interfaces? Basically, the browser needs to communicate events like button clicks and text editing, the exact things you can typically query for in an IMGUI. Interestingly enough, this is exactly how HTTP GET requests work. They say, "give me this document, here is my input". So, if I just used a Java Servlet as a Controller, and created a "gui-context" that translated all my doButton() / doText() / doInput() calls into HTML, and had all form actions point to the same page all the time (back to the same Controller), everything would work the same way as IMGUI does.

It worked! The "gui-context" IS the View. And finally I understand it all!

Where does the application state go?

As mentioned, the Model is typically a database, but doesn't have to be. With Java Servlets you can store state in a number of different data stores that persist between requests (remember that HTTP itself is stateless).

With games and real-time IMGUIS, you often have some state that is specific to the Controller, for example if certain options are enabled or not, what is the current Controller, etc. I solve all of this in IMWEB by storing the Controller itself inside the API-provided "session context". This is a data store that is persistent for as long as the browser session lasts, implemented in some obscure way via cookies and / or URL rewriting (I don't know or care, it just works).

At the top level (the Servlet) I check on incoming request if there is a Controller object stored in the session context. If not, this is the first request of a new session and so I create a new Controller and store it in the session context. If there is a previous Controller, this means that a session is active and I reuse the stored Controller. This is how Controller state is preserved, and one of the key reasons why everything can be handled by a single "Meta-Controller"; indeed, just like in C++ IMGUIs you can have a single class that handles every single logical "page/screen" of your gui. Any required intermediate state can be normal Java types that are members of the Controller class, and you avoid messing around with the type-unsafe and messy data stores other than for the single Controller instance itself.
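
The lookup described above can be sketched roughly as follows. This is a minimal sketch, not the actual IMWEB code: a plain Map stands in for the servlet session (in a real servlet the corresponding calls would be getAttribute() / setAttribute() on the HttpSession), and the names MetaController, ControllerStore and the "controller" key are all made up for illustration.

```java
import java.util.Map;

// Hypothetical Controller: intermediate gui state lives in ordinary typed fields.
class MetaController {
    int currentScreen = 0;        // which logical "page/screen" is showing
    boolean optionsOpen = false;  // example of Controller-only state
}

class ControllerStore {
    // Assumed key under which the single Controller instance is stored.
    static final String KEY = "controller";

    // Returns the session's Controller, creating it on the first request.
    static MetaController controllerFor(Map<String, Object> session) {
        Object stored = session.get(KEY);
        if (stored instanceof MetaController) {
            return (MetaController) stored;           // active session: reuse state
        }
        MetaController fresh = new MetaController();  // first request of a new session
        session.put(KEY, fresh);
        return fresh;
    }
}
```

The only untyped access is the single lookup of the Controller itself; everything else is a normal field on the returned object.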

Frame Shearing

There were some details of course. One thing that you don't really think about in IMGUI with real-time apps is that there is something happening which I call "frame shearing".

What this means is that in the flow of the Controller, you typically will write:

if(applicationState == 1 && myView.doButton(params))
    applicationState = 2;

if(applicationState == 1)
    anEventTarget.someMethod();

What happens in the first if statement is that a button click results in a change to application state that potentially controls the appearance of the gui itself (because drawing is interleaved). This is the "frame shear", i.e. some parts of the gui may be rendered based on a given application state, while other parts are rendered based on a new state.

Now, in real-time you rarely see this, because your frame rate is so high. At worst you might see some "flickering" in parts of your gui, but normally you just don't care.

However in the case of web applications (barring any funky AJAX), you don't have any kind of real time loop. The browser will do the initial request to get the initial gui (without any input), and when you click on a button you request the same page but with some input (the button click).

This is where the realization hit me. In real-time IMGUI you are indeed always clicking on a button that was rendered on a previous frame, because otherwise you couldn't see it and wouldn't know to click! Indeed this is how my C++ input polling works; doButton() will poll for mouse clicks that have happened since last time I checked, in essence "already happened".

So, the same idea works for Java/Web/HTML. The first request to the Controller generates a gui with the initial state. Clicking on a button reloads the same page, but supplies information about what button was clicked; so this is the user input. Since the Controller has already generated the gui once, it has assigned logical ids to each widget as part of the rendering process (I just used an incrementing integer), and the code that the browser later sees is "hard coded" to use these ids. When the user clicks on a button, the Controller code sees this when rendering the resulting gui in that the doButton() call returns true.

All of this is implemented by having the View logically encapsulate "input and output", which in Java Servlet terms is the input HttpServletRequest (the parameters of the HTTP GET) and the output HttpServletResponse (the resulting HTML output). The Controller both queries for input AND renders output as a result of each widget call (i.e. doButton() polls input and generates output). Again, this is the exact same approach as I use for real-time IMGUIS, and since I don't use any kind of caller-supplied widget id there I'm really glad that I got away with not needing them in the web case (the incrementing ids are assigned automatically during rendering).
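
A minimal sketch of such a View might look like this. It assumes the request parameters have already been parsed into a Map (standing in for the servlet request) and that the HTML is returned as a String rather than written to the servlet response. The class name HtmlView, the "b" + id naming scheme and the markup itself are assumptions for illustration, and escaping of labels is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical gui-context: wraps request parameters (input) and collects HTML (output).
class HtmlView {
    private final Map<String, String> params;                // parsed HTTP GET parameters
    private final List<String> widgets = new ArrayList<>();  // widget cache for this frame
    private int nextId = 0;  // ids are assigned in call order, never by the caller

    HtmlView(Map<String, String> params) {
        this.params = params;
    }

    // Emits a text label; pure output.
    void doText(String text) {
        widgets.add("<p>" + text + "</p>");
    }

    // Emits a submit button and reports whether this button caused the request.
    boolean doButton(String label) {
        String name = "b" + nextId++;
        widgets.add("<input type=\"submit\" name=\"" + name + "\" value=\"" + label + "\">");
        return params.containsKey(name);
    }

    // Renders the cached widgets into one form; with no action attribute
    // the browser submits back to the URL of the current page.
    String render() {
        StringBuilder html = new StringBuilder("<form method=\"get\">");
        for (String w : widgets) {
            html.append(w);
        }
        return html.append("</form>").toString();
    }
}
```

On the first request the parameter map is empty, so every doButton() returns false. When the user clicks the second button the browser requests the same page with b1 in the query string, and the second doButton() call returns true.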

In the web case however, frame shearing DOES matter, as there is no point in sending an invalid web page to a client. I solved this problem by having the Controller manually throw a ShearingException whenever application state changed as a result of a button click. It is easy for the programmer to figure out when to "shear", as the programmer is the one changing the state, and he also knows exactly how the Controller depends on that state in order to render the gui.

The use of ShearingException is mainly to avoid having to handle boolean return values in cases when the call stack in the Controller is deep (as is typical in IMGUIS). The exception will propagate all the way up to the Servlet, which will simply clear the View's widget cache for this frame and call the Controller again to render the view. In practice this only happens 0-1 times, as there is a new request per single user interaction. When the Controller method finally succeeds without throwing a ShearingException, the View is allowed to render HTML from the widget list (note that this isn't done until the very end to avoid having to mess around with clearing the output HttpServletResponse).
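
The retry loop can be sketched like this. Only ShearingException is a name from the text; FrameView, ShopController and FrameLoop are hypothetical, and the widget cache is reduced to a list of strings. The sketch also assumes that the pending input is dropped together with the cache, so that the re-run cannot mistake the old click for a click on a new widget that happens to get the same id; the text above does not spell this out.

```java
import java.util.ArrayList;
import java.util.List;

// Unchecked, so it can cross a deep Controller call stack without "throws" clauses.
class ShearingException extends RuntimeException {
}

// Hypothetical View reduced to a widget cache plus one pending click.
class FrameView {
    private final List<String> widgets = new ArrayList<>();
    private int clickedId;   // id of the clicked button, or -1 for no input
    private int nextId = 0;

    FrameView(int clickedId) { this.clickedId = clickedId; }

    void doText(String text) { widgets.add("text " + text); }

    boolean doButton(String label) {
        int id = nextId++;
        widgets.add("button " + id + " " + label);
        return id == clickedId;
    }

    // Discards the partly built frame; the input is dropped too (an assumption).
    void reset() { widgets.clear(); nextId = 0; clickedId = -1; }

    List<String> widgets() { return widgets; }
}

class ShopController {
    int state = 1;  // 1 = offer screen, 2 = confirmation screen

    void run(FrameView view) {
        if (state == 1) {
            view.doText("Buy sword?");
            if (view.doButton("Buy")) {
                state = 2;
                throw new ShearingException();  // gui built so far reflects state 1
            }
        }
        if (state == 2) {
            view.doText("Sold!");
        }
    }
}

class FrameLoop {
    // Plays the role of the Servlet: re-runs the Controller until a frame completes.
    static List<String> handle(ShopController controller, FrameView view) {
        while (true) {
            try {
                controller.run(view);
                return view.widgets();  // only now would HTML be generated
            } catch (ShearingException e) {
                view.reset();
            }
        }
    }
}
```

Without the throw, a click on "Buy" would produce a frame containing the offer text, the button and "Sold!" all at once, which is exactly the shear. With it, the client only ever receives the state 2 screen.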

Some have argued that there might be cases where the ids don't match up, as quite a bit of time can elapse between a page being requested and any input coming back to the Controller as a result of user input. In this time span, some state (in the Model) might change in a way as to affect the output of the gui rendering, which may result in widget ids not matching up for subsequent client input.

Ways to handle this will probably be very application specific, and care must be taken with parameter validation (as usual). The main difference is that instead of the client submitting invalid data, he will here potentially "click the wrong button".

Summary

The above shows that it is possible to do IMGUI style coding in web applications, what I call IMWEB, which is really great! Next, we will look at what this might mean for games that are gui intensive and that can't have all Model state local on the client.

What does this mean for games?

With the IMWEB background in place, it should be clear how to exploit this for games.

A game client can (via arbitrary transport) send a message to a server, and the server can respond with a stream of data that logically represents HTML in the web case. This is the "screen program", the data that the client interprets in order to show things on the screen.

This is where things get interesting. The client in this case is "dumb": it can only show graphical elements on the screen as well as send "gimme gfx and btw here's my input" requests, which can all be the exact same protocol and need only include any possible input in basically the same form as an HTTP GET (key=value pairs). The server response is "application specific HTML", typically information about what widgets (each with an id) and graphics to show.

This is where you can get really funky. Firstly, you can be really implicit in what the server tells the client to show. The client can be "fat" in that there are a number of arbitrarily complex, possibly artist-designed, gui-templates that the server can reference, and any widget information that the server sends can be logically matched to "slots" within these templates, i.e:

template login
namelabel Enter your username
textinput 0
passwordlabel Enter your password
passwordinput 1
button 2 Login!

In the above example, the client could show a pre-designed template "login", set the label "namelabel" to "Enter your username", show a textinput with id 0 (similar with password), and finally a button with id 2 that displays the text "Login!".

The similarity to HTML is obvious. The server-side Controller has simply sent WHAT to display, but HOW to display it or WHERE on the screen or anything else can be up to the View and be very specific to your application.

The network traffic from this point on can also be very custom. The client can for example send new "gimme gfx!" requests only upon button clicks, or the protocol may allow for the server to send new "display lists" at will and at any time, potentially overriding what the user is doing at any time. Again, this doesn't NEED to be HTTP (request-response), it can be anything you want!

Speaking of being implicit, it should also be obvious the kinds of compression you could achieve with this. The above example could easily be compressed into a bytestream of:

37 00 23 45 01 00 24 68 02 03 55

where:

template id 37 (login)
widget type 00 (text), slot 23 (namelabel), localized text 45 (Enter your username)
widget type 01 (textinput), id is implicit in the order of widgets
widget type 00 (text), slot 24 (passwordlabel), localized text 68 (Enter your password)
widget type 02 (passwordinput), id is implicit in the order of widgets
widget type 03 (button), id is implicit in the order of widgets, localized text 55 (Login!)
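
A client-side decoder for this stream could be sketched as below. It is only an illustration under a few assumptions: the values in the example are read as decimal numbers, each fits in one byte, and the record layouts are exactly those in the breakdown (text carries a slot and a string id, button carries a string id, the two input types carry nothing). ScreenDecoder and its output format are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

class ScreenDecoder {
    // Widget type codes as listed in the breakdown above.
    static final int TEXT = 0;
    static final int TEXT_INPUT = 1;
    static final int PASSWORD_INPUT = 2;
    static final int BUTTON = 3;

    // Expands a compressed "screen program" into one readable line per element.
    static List<String> decode(int[] stream) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        out.add("template " + stream[pos++]);
        int nextId = 0;  // interactive widgets receive ids in order of appearance
        while (pos < stream.length) {
            int type = stream[pos++];
            if (type == TEXT) {
                int slot = stream[pos++];
                int text = stream[pos++];
                out.add("text slot=" + slot + " string=" + text);
            } else if (type == TEXT_INPUT) {
                out.add("textinput id=" + (nextId++));
            } else if (type == PASSWORD_INPUT) {
                out.add("passwordinput id=" + (nextId++));
            } else if (type == BUTTON) {
                int text = stream[pos++];
                out.add("button id=" + (nextId++) + " string=" + text);
            } else {
                throw new IllegalArgumentException("unknown widget type " + type);
            }
        }
        return out;
    }
}
```

Feeding it the eleven values above yields the template followed by five widgets, with the text input, password input and button receiving ids 0, 1 and 2 without any id ever being transmitted.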

Gains

There are two main points here:

First, the narrowness of the View interface we are dealing with here, as opposed to the "app-domain logical client-server command" interface that would be required with the previous architecture, is a huge gain.

Second, access to the Model is greatly simplified for the Controller, as they can reside in the same address space.

Furthermore, a very nice "data hiding" effect is in place here. The client is truly "dumb" and completely isolated from data in the Model. The Controller is not really exposed to visualisation details, it is more concerned with WHAT data to show to the client, and what to do as a result of client input. Details of HOW to present this data (as well as WHERE on the screen, with what bling-bling, etc) is still up to the View.

A server will have to handle the concept of a "session context", but in typical games you already have an established network connection with a remote client, which means you have some place to store data associated with a given client. Again, the amount of data that is purely part of a Controller is often completely negligible, and most of the data you will be traversing to render a gui is indeed part of the Model.

I firmly believe that this architecture, i.e. the Web Paradigm, is very appropriate even for games when faced with large numbers of complex user interfaces that are tightly coupled to data on the server that is inappropriate to co-localize to the client.

Leveraging HTTP

It is often the case in MMORPGS that there will ultimately be a real relational database at the very back end of the architecture. This is a good thing, because if there are some requests that are infrequent and non real-time, you can keep them out of the game server completely. Given that you enable your clients to speak HTTP (which in itself is trivial), you can leverage any number of existing web-backend technologies to talk to the database and return your application specific View protocol over HTTP. PHP, ASP, ASP.NET, Java EE, the list goes on...

Another interesting idea would be to allow these backends to analyze the agent that issued the request, and return application specific View protocol for game clients, while returning valid HTML for web browsers, giving both you and your users several alternative entry points to the system.

At Spell of Play we currently use C++ HTTP clients (the games) to talk to backends implemented in both ASP and PHP with great results. For us, being able to host our backend services on standard web hotels with PHP and MySQL support is a huge win.

Community Services and GML

I propose the following architecture for access to "community services" from within games and any other applications that have customized rendering.

For the backend, we continue with our use of LAMP (Linux/Apache/MySQL/PHP), as it is great to be able to host this stuff on cheap web hotels.

We implement a community backend in PHP, based on IMWEB principles. This means that all requests go to a single central PHP Controller (i.e. www.spellofplay.com/community/index.php), and all client session state is retained by the webserver using standard PHP session management tools. This PHP script does not output HTML, but rather GML (Game Markup Language) which I will explain below.

GML is intended to be a lightweight replacement for HTML, specialized and simplified for our specific SoP game needs. I suggest that GML only support the following tags/concepts:

text - a simple text string
button - a pushable button with an optional string label
radio - a pushable button with a boolean active/inactive state and optional string label
input - a text input field
password - a password input field (i.e. characters not readable)

In addition to this, either we support the teletype paradigm of HTML with some kind of layout tags (i.e. HTML <br>/<table>), or alternatively just have each above tag also specify screen position and alignment in percent of screen size, for example:

text 50 10 center "Hello world!" - a text string ("Hello world!") at 50% in x and 10% in y, with horizontal alignment being centered
button 50 90 "Exit" - a button with the label "Exit" at 50% in x and 90% in y
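
To give a feel for how little parsing the positional variant needs, here is a rough sketch of a line parser. It is written in Java purely for illustration (the actual client described here is C++), and several details are assumptions rather than part of the proposal: the alignment token is treated as optional, "left" is used when it is missing, and the label is whatever sits between the first and last double quote on the line. GmlWidget and GmlParser are hypothetical names.

```java
// One parsed GML line: tag, position in percent of screen size, alignment, label.
class GmlWidget {
    final String tag;
    final int x;
    final int y;
    final String align;
    final String label;

    GmlWidget(String tag, int x, int y, String align, String label) {
        this.tag = tag;
        this.x = x;
        this.y = y;
        this.align = align;
        this.label = label;
    }
}

class GmlParser {
    // Parses lines of the form: tag x y [align] "label"
    static GmlWidget parseLine(String line) {
        // Split off the quoted label first so spaces inside it survive.
        int open = line.indexOf('"');
        int close = line.lastIndexOf('"');
        String label = "";   // labels are optional
        String head = line;
        if (open >= 0 && close > open) {
            label = line.substring(open + 1, close);
            head = line.substring(0, open);
        }
        String[] parts = head.trim().split("\\s+");
        if (parts.length < 3) {
            throw new IllegalArgumentException("expected: tag x y [align] \"label\": " + line);
        }
        String align = parts.length > 3 ? parts[3] : "left";  // assumed default
        return new GmlWidget(parts[0], Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]), align, label);
    }
}
```

Both example lines above go through the same path; the button line simply has no alignment token and falls back to the assumed default.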

It is essential that GML have a very simple syntax that is human readable/writable, as we will be writing PHP that outputs GML.

Next, we implement a core class called gml::Browser. This will be a standard replacement for HTTPRequest, and talk to the PHP backend using HTTP. gml::Browser will handle all communication, parsing of incoming GML, as well as formatting and sending the outgoing requests.

Each application/game can use this system very simply. gml::Browser will require access to a gml::IView reference, which will expose the following interface (roughly):

virtual void doText(...) = 0;
virtual bool doButton(...) = 0;
virtual bool doRadio(...) = 0;
virtual void doInput(string&) = 0;
virtual void doPassword(string&) = 0;

The idea is that gml::IView exposes all concepts of the GML protocol itself, in that there are text, buttons, inputs, etc. This is mainly an output interface, in that the application implements the various methods by drawing something on the screen. However, communication back to the backend is supported in that doButton() and doRadio() are expected to return true when the user has clicked on whatever visualisation the implementation drew to the screen. Similarly, doInput() and doPassword() are expected to allow editing of the string reference passed by the caller.

Note of course that this is standard IMGUI reasoning, i.e. the gml::IView is stateless from gml::Browser's point of view. One might for example let the game's View/IMGUI-context directly inherit gml::IView, and thereby adapt the interface by directly calling native IMGUI methods after converting the screen-placement information.

With this simple system in place, each game automatically has access to any and all community functionality that we care to implement, all controlled centrally from our PHP backend. The important thing here is that all session and Controller logic occurs on the server, freeing the clients (the games) completely from any kind of local logic. When it comes to interacting with the backend, they are reduced to being specialized web-browsers.

I argue that we are already very close to this solution due to our semi-standardized use of HTTPRequest. However, right now we are still only getting data back from the backend, and having to deal with both parsing and visualization in each individual client to HTTPRequest. With the GML "framework" in place, each app is isolated from parsing details, and is also free to implement the visualisation of the GML "widgets" in any way it sees fit (even disregarding certain widgets, and/or some of the information supplied, such as screen placement, etc).

For example, with highscores, each game currently handles parsing of the output from the backend scripts (each in a custom manner), as well as the logic of when to get scores and when to post scores, including optional user input in text fields etc. All of this can be done without any Controller logic residing in the game clients.

Using different backend URLs per game, we are free to customize the GML output from the backend to better match the game in question, hiding or exposing certain functionality as we see fit, similar to a website with numerous entrypoints. This will for example be useful for selecting the correct highscore table to display for the game in question.

There is of course always the risk of going too far with this solution, and especially the game-specific parameterization is something that might get out of control, resulting in a very hard to maintain PHP backend. Also, I don't think that we should use this to replace ALL user interfaces within a given game, as many things are truly local and should stay that way.

Again, the important point here is that this solution will allow us to incrementally grow the end-user interface to the backend (in terms of content and interactable functionality) without requiring constant maintenance to each client both in terms of user interface and network protocols.

Current Status
I have implemented a GML browser with plugin View support, and implemented both a text view (not very useful) and a graphical view using GDI+. HTTPRequest has been extended to support cookies, and session support is implemented.
