Immediate Mode Model/View/Controller

This book is a work in progress, comments are welcome to: johno(at)johno(dot)se

In the process of writing all of this, I have had various interesting conversations with programmers in the industry about implications of MVC.

in swedish

in english

an email discussion with Samuel A. Falvo II

MVC (Model / View / Controller)

Make sure you have read The Pitch before reading this section. When it comes to games, a useful application of MVC is:

Model = gamestate and logic which operates on this state (i.e. objects)
View = the "input/output" layer, including the renderer and also all user interface windows and widgets
Controller = based on the state of the Model, Controller orchestrates the presentation of the application using View

Model

This is basically the state of the application. If you have a spaceship, its position, velocity, etc. are part of Model. Any bullets it shoots are also part of Model. If you want to be object oriented, it can be useful to model these concepts as classes, so that the code that operates on the state (i.e. the logic) is coupled to the state itself.

Model may not depend on View or Controller. Model may be global (i.e. only a single Model in the system).

View

As stated, this is the input/output layer. All interfaces for querying for user input (gamepad/key/mouse stuff) are here, and output/sound/drawing stuff is here, including user interface widgets. However, beyond allowing for query of user input, this part of the application is completely stateless from a client perspective (immediate mode), the client being the Controller. The View looks like a functional/procedural interface. The reason for this is that the Controller will use View to visualize the Model in a purely functional way, and there should be no assumptions about frame coherence for the client (the Controller).

One important aspect of this is that the user interface stuff needs to be Immediate Mode (indeed the whole View should be). This means no retaining of application state in widgets that persist across frames, and definitely no way for a client to get application state out of the View other than querying for input state; as mentioned, from a Controller's point of view the View simply does not hold any application state that is externally accessible or meaningful. It only exposes user input as well as things like, for example, rendering statistics.

This also means that the popular "scene-graph" design may not be exposed from the View. You are free to do anything you want internally when it comes to clever caching of things, but this may not be exposed to clients. For example, any type of "instance abstraction" to represent a mesh-transform pair in the public interface is illegal. The corresponding interface should be of the form:

view::drawMesh(mesh, transform, anyOtherRenderState);

A way to implement this that maintains these constraints would be a purely functional library or namespace that only includes free functions. Yes, this makes it global, but that isn't a problem in this context; indeed there should be only a single View. However, this does NOT mean that Model may depend on or access View, only Controller may do that.
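As a minimal sketch of such a namespace of free functions: the Mesh, Transform and RenderState types, the internal draw queue, and the drawCallsThisFrame() and endFrame() functions are assumptions made for illustration. Only the shape of drawMesh() comes from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical placeholder types; the text does not define them.
struct Mesh { int id; };
struct Transform { float x, y, z; };
struct RenderState { bool wireframe; };

namespace view
{
    namespace detail
    {
        // Internal per-frame bookkeeping. This is never exposed to clients.
        struct DrawCall { int meshId; Transform transform; RenderState state; };

        inline std::vector<DrawCall>& queue()
        {
            static std::vector<DrawCall> theQueue;
            return theQueue;
        }
    }

    // Immediate mode: everything needed to draw is passed in on every call,
    // and no "instance" handle is handed back to the caller.
    inline void drawMesh(const Mesh& mesh, const Transform& transform, const RenderState& state)
    {
        detail::queue().push_back({mesh.id, transform, state});
    }

    // Rendering statistics may be exposed, as they are not application state.
    inline std::size_t drawCallsThisFrame()
    {
        return detail::queue().size();
    }

    // A real renderer would submit the queued calls here; this sketch just forgets them.
    inline void endFrame()
    {
        detail::queue().clear();
    }
}
```

The internal queue stands in for the clever caching mentioned above: View is free to batch or sort draw calls, but a Controller only ever sees the free functions.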

View may depend on Model (in order to visualize it), but may not depend on Controller. View may be global.

Controller

"based on the state of the Model, Controller orchestrates the presentation of the application using View"

This is an important concept. This essentially means that per frame, Controller will look at (i.e. traverse) Model's state and "program" View to visualize this state in some way. How this is done is up to Controller, hence the name "Controller".

It is very typical for there to be many different Controllers and/or control paths in a given Controller. This is to better compartmentalize different aspects of the application when it comes to how user input is interpreted and how the output (visualisation) is generated.

For example, there is often an EditController that allows for editing of Model, and a PlayController that allows for gameplay. To handle the switching between these different Controllers, it is typical to have a MetaController at the top level. This logically replaces the main "Controller" in the MVC pattern, and allows for switching between the various Controllers. This tends to be a parent/child hierarchy of some sort, in that the MetaController has several Controllers, which in turn may also have several Controllers (GoF - Composite).

One useful application of the MetaController pattern is to compile several separate executables that differ in the concrete class of the MetaController, allowing for edit enabled vs only gameplay enabled applications, and also to enable / disable remote implementations. EXPAND ON THIS IDEA! Jungle Peak example.
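A minimal sketch of such a parent/child arrangement might look as follows. The IController methods, the activate() call and the update counters are assumptions for illustration; the text only names the classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical interface; only the name IController appears in the text.
struct IController
{
    virtual ~IController() {}
    virtual void update() = 0;             // called once per frame
    virtual std::string name() const = 0;  // only used for illustration
};

struct EditController : IController
{
    int updates = 0;
    void update() override { ++updates; }  // would interpret input as edit commands
    std::string name() const override { return "edit"; }
};

struct PlayController : IController
{
    int updates = 0;
    void update() override { ++updates; }  // would interpret input as gameplay
    std::string name() const override { return "play"; }
};

// Composite: a Controller that holds child Controllers and forwards to the active one.
class MetaController : public IController
{
public:
    void add(IController& child) { myChildren.push_back(&child); }

    void activate(std::size_t index)
    {
        if (index < myChildren.size())
            myActive = index;
    }

    void update() override
    {
        if (!myChildren.empty())
            myChildren[myActive]->update();
    }

    std::string name() const override
    {
        if (myChildren.empty())
            return "meta";
        return "meta/" + myChildren[myActive]->name();
    }

private:
    std::vector<IController*> myChildren;  // non-owning; the App is assumed to own them
    std::size_t myActive = 0;
};
```

Because MetaController is itself an IController, a child may in turn be another MetaController, which gives the nested hierarchy described above.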

A very useful addition to the straight MVC design proposed above is the addition of a "write proxy / event target" that controls all changes of state in Model. The idea can be summed up as follows:

"Reads are free, writes are formalized."

Reading Model state

First of all, View and Controller may only access Model in a const fashion. This has numerous repercussions. Firstly, exposing central Model state as public is ok, as it can only be read. Also, only const methods may be called, so state changes cannot be made internally as a result of a bad function call. This allows for a clear grouping of aspects of the Model into read and write categories.

Just how this is implemented will vary. If Model is global / singleton, care must be taken so that only const access is allowed. I personally let View hold a const Model&, and have the Controller baseclass supply a View&. This way View can access model in a const way, and Controller can access View in a non-const way, and via it Model in a const way. From the top of the App this is:

App owns a Model, a View and a MetaController
View has a const& to Model
MetaController has a & to View, and passes this to each IController implementation
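The ownership described in the list above can be sketched roughly like this. The score member, drawNumber() and HudController are hypothetical stand-ins chosen to keep the example small.

```cpp
#include <cassert>

struct Model
{
    int score = 0;                        // public state is ok: clients only get const access
    void addScore(int n) { score += n; }  // non-const, so not callable through a const Model&
};

class View
{
public:
    explicit View(const Model& model) : myModel(model) {}
    const Model& model() const { return myModel; }
    void drawNumber(int value) { myLastDrawn = value; }  // stand-in for real output
    int lastDrawn() const { return myLastDrawn; }

private:
    const Model& myModel;
    int myLastDrawn = 0;
};

class IController
{
public:
    explicit IController(View& view) : myView(view) {}
    virtual ~IController() {}
    virtual void update() = 0;

protected:
    View& myView;  // non-const View, and via it a const Model
};

class HudController : public IController
{
public:
    using IController::IController;

    void update() override
    {
        // myView.model().addScore(1);  // would not compile: Model is const here
        myView.drawNumber(myView.model().score);
    }
};

class MetaController
{
public:
    explicit MetaController(View& view) : myHud(view) {}
    void update() { myHud.update(); }

private:
    HudController myHud;  // receives the same View& that MetaController was given
};

struct App
{
    Model model;
    View view;
    MetaController meta;
    App() : view(model), meta(view) {}
};
```

The members of App are declared in dependency order, so Model exists before View takes its const reference, and View exists before MetaController takes its reference.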

Writing to Model state

Writes to Model are formalized through the addition of IEventTarget. This is a pure virtual interface that defines all possible state changes / events on a system wide level. Controller will be passed an IEventTarget each frame, and any changes it wishes to make to Model must go through this interface.

Model will implement IEventTarget in order for the writes to apply. View and Controller both see this fact, but due to their const access to Model they cannot call these methods directly on their const Model&.
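A minimal sketch of this write path, assuming two invented events (moveShip and fire) and an invented rule of at most three bullets, could look like this:

```cpp
#include <cassert>

// Hypothetical events; the text does not list the actual IEventTarget methods.
struct IEventTarget
{
    virtual ~IEventTarget() {}
    virtual void moveShip(float dx, float dy) = 0;
    virtual void fire() = 0;
};

// Model implements IEventTarget so that writes actually apply.
class Model : public IEventTarget
{
public:
    float shipX = 0.0f;
    float shipY = 0.0f;
    int bullets = 0;

    void moveShip(float dx, float dy) override { shipX += dx; shipY += dy; }
    void fire() override { ++bullets; }
};

class PlayController
{
public:
    explicit PlayController(const Model& model) : myModel(model) {}

    // The IEventTarget is passed in each frame; it may be the Model or a proxy.
    void update(IEventTarget& target, bool firePressed)
    {
        // myModel.fire();  // would not compile: fire() is non-const
        if (firePressed && myModel.bullets < 3)  // reads are free
            target.fire();                       // writes are formalized
        target.moveShip(1.0f, 0.0f);
    }

private:
    const Model& myModel;
};
```

Since PlayController only knows that it was handed an IEventTarget, the same code keeps working if the target is later swapped for something other than the local Model.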

Event callbacks

The central reason for the immediate mode nature of View is as follows: without any manually managed state cache in View, the number of explicit callbacks (traditionally required for synchronization between Model and any cached state inside Controller or View) is kept down. For example, Controller can simply traverse a list of game objects in Model each frame and call View to visualize them, without any overhead of creating or destroying any View object as the contents of the object list changes over time (concepts such as scene-graph "nodes" come to mind here). There is no requirement of explicit "object existence" synchronization between Model and Controller / View.

However, there are often cases of events occurring inside Model's logic that we want to visualize in some way. There are often "flank events" that can be naturally represented as method calls, for example explosions, sound effects, etc that represent some explicit state change within Model. It is often very convenient to have these occur as explicit events (i.e. method calls) instead of trying to derive them from state changes (which often requires caches, which we are trying to avoid).

To do this, it is typical to have Controller/MetaController also implement IEventTarget, and extend the interface to include these "visualisation callbacks". App supplies a reference to IEventTarget to the Model (which is the Controller / MetaController) on construction, and Model stores this reference for later callback during runtime.

This also results in a modicum of formalisation within Model's logic, in that when a significant event occurs, the relevant Model implementation that causes the event can also call IEventTarget (which in that case is the Model instance itself) and "re-enter" the system from the top, triggering a "formal" event. Each Model implementation of an IEventTarget method typically does the actual state change, and ends with a "forwarding call" to the stored IEventTarget (which is the Controller), allowing the Controller implementation to visualize each given event as desired.
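As a sketch of the forwarding call, assume a single invented event, createExplosion(), and a bullet that explodes once it has travelled three units. Both are illustration only.

```cpp
#include <cassert>

struct IEventTarget
{
    virtual ~IEventTarget() {}
    virtual void createExplosion(float x, float y) = 0;  // hypothetical event
};

// The Controller implements IEventTarget to receive visualisation callbacks.
class Controller : public IEventTarget
{
public:
    int effectsStarted = 0;
    float lastX = 0.0f;

    void createExplosion(float x, float) override
    {
        ++effectsStarted;  // would start a sound and particles via View
        lastX = x;
    }
};

class Model : public IEventTarget
{
public:
    explicit Model(IEventTarget& forwardTarget) : myForwardTarget(forwardTarget) {}

    float bulletX = 0.0f;
    bool bulletAlive = true;
    int explosionCount = 0;

    // Model logic: on a significant event, re-enter through our own IEventTarget method.
    void update()
    {
        if (!bulletAlive)
            return;
        bulletX += 1.0f;
        if (bulletX >= 3.0f)
        {
            bulletAlive = false;
            createExplosion(bulletX, 0.0f);
        }
    }

    void createExplosion(float x, float y) override
    {
        ++explosionCount;                       // the actual state change
        myForwardTarget.createExplosion(x, y);  // the forwarding call
    }

private:
    IEventTarget& myForwardTarget;  // supplied by App on construction
};
```

The Controller never polls for "did something explode"; it is told once, at the moment the Model changes state.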

Why only a single event interface (as opposed to a read/write pair)?

Experience dictates that there only be a single IEventTarget interface that is responsible for all "system events", rather than a "write interface" and a "notification / read" interface (for callbacks). Most often, the exact information that causes a change is the information required to visualise that change, and in other cases this information can be derived and looked up in the Model (by Controller or View).

The architecture used in Ground Control 2 (which evolved into this architecture) was a plain remote proxy architecture, involving an IGame and IPlayer pair. IGame represented the "server" (which is analogous to Model), while IPlayer represented a "client" (which is analogous to both View and Controller, with no real clear definition in between, as well as a cache of state that can be viewed as a subset of Model). A typical call flow would be:

//client call
myGame.CreateEntity();

//server call
for(int i = 0; i < NUM_CLIENTS; i++)
    myPlayers[i].EntityCreated();

There were a number of inherent problems with this approach.

First, the server/Model was forced to have an internal concept of "players" in order for the remote cases to work, even though the concept of a "player" had no real logical place in the context of the game (an individual player had no specific avatar).

Second, and more critical, was the fact that there was no shared state between a "game" and a "player". This implied many invariants that were difficult to maintain. For example, IPlayer::EntityCreated(id) implied that some later IPlayer method call could reference that id and have it implicitly refer to a unit that was assumed to have been created.

Also, due to the fact that we had several implementations of IPlayer (Player, RemotePlayer, ScriptPlayer, and AIPlayer), the amount of duplication of similar "stateful" concepts, such as the above-mentioned "entity", was enormous and ridiculous.

The real realisation that state should ideally be shared between remote nodes stemmed from the experience that the addition of any additional visualisations of / to the system proved difficult. Each IPlayer implementation had its own unique representation of state, very tightly coupled to a given visualisation and access pattern. For example, the main 3d implementation, Player, had internal state tightly bound to the 3d visualisation at hand, and adding a "minimap" view clearly "invaded" these state representations, so that they were exposed to details of both visualisations.

Finally, combining IGame and IPlayer into IEventTarget further increases the robustness of the architecture and makes it even more purely asynchronous. CreateEntity() and EntityCreated() can for example be merged into CreateEntity(), and a client who calls CreateEntity() can gracefully react to a future CreateEntity() and understand it to mean that an entity has been created.

Care must indeed be taken in order for the IEventTarget signatures to include enough information for visualisation, and in some cases IEventTarget does get extended to include events that are really only of interest to a visualising client, which might seem contradictory to a clean MVC splitup. However, I argue that even though Model is designed to be oblivious to and independent of any external clients, without "anyone watching" what was going on in a Model, there would really be no point to having a Model in the first place. Observe that there is no explicitly view-specific information included in any IEventTarget event, it is just that there must be significant information present in order for each given view to be able to visualize appropriately.

Remote proxies and Network abstraction

The initial motivation for the IEventTarget / const Model& formalization was to completely abstract the locality of the IEventTarget implementation (i.e. remote proxy). Using this pattern, network code is completely external to the system. Controller transparently writes to some implementation of IEventTarget (either a Model or a network proxy), and both View and Controller transparently see any changes to Model that may have come from across a network.

Note that this allows the "reads are free, writes are formalized" paradigm to be extended across a network. A Controller client who is talking to a remote server is completely isolated from the code that updates the local Model, and can "read for free", but must still write via an IEventTarget. As this formalization is also useful in the local case, it is nice that all components of MVC see the world in the same way regardless of the existence of a network.

Supporting local, "listen", and dedicated server in the same architecture

In the local case, Model will typically hold a reference to an IEventTarget for "event forwarding" (implemented as a C++ reference).

In the "listen" case (i.e. local client and remote clients), the C++ reference still holds. Model will only forward events to the single local IEventTarget, so that the local client operates in the exact same way as in the local case.

To support remote clients, a Server is added. Server is completely external to Model, and is not seen by any part of the MVC architecture. Server handles all connections from remote clients, and calls IEventTarget methods on Model as commands arrive from these clients. Server also holds a constant reference to Model, and regularly writes the current state of Model to remote clients.

Clients (Controllers) that play vs a remote Server have an IEventTarget implementation called Client, which sends relevant IEventTarget calls to the remote Server. When Model state is received across the network, Client will derive implied IEventTarget events by comparing the two last received Model state packets to each other and callback to the local IEventTarget (the Controller).

An important note on these "derived" IEventTarget events, which of course do not necessarily exactly mirror the events that actually took place on the remote Model, is that the client is in no way dependent on ANY IEventTarget callbacks in order to operate correctly. ALL relevant gamestate exists in the Model, which is regularly synchronized (see the Quake 3 Arena link below) from the remote Model. The derivation of "implied" events from two state packets is simply used to emulate the local case (i.e. explosions and sounds etc).
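A minimal sketch of deriving "implied" events from the two last received states, assuming a fixed-size projectile array and two invented events (projectileCreated and projectileDestroyed):

```cpp
#include <cassert>
#include <cstddef>

const std::size_t MAX_PROJECTILES = 4;  // hypothetical fixed-size state

struct ProjectileState { bool alive; float x, y; };
struct ModelState { ProjectileState projectiles[MAX_PROJECTILES]; };

// Hypothetical events; the text does not list the real IEventTarget methods.
struct IEventTarget
{
    virtual ~IEventTarget() {}
    virtual void projectileCreated(std::size_t slot, float x, float y) = 0;
    virtual void projectileDestroyed(std::size_t slot, float x, float y) = 0;
};

// Compares the two last received states and calls back for every implied event.
inline void deriveEvents(const ModelState& previous, const ModelState& current, IEventTarget& target)
{
    for (std::size_t i = 0; i < MAX_PROJECTILES; ++i)
    {
        const ProjectileState& before = previous.projectiles[i];
        const ProjectileState& after = current.projectiles[i];

        if (!before.alive && after.alive)
            target.projectileCreated(i, after.x, after.y);
        else if (before.alive && !after.alive)
            target.projectileDestroyed(i, before.x, before.y);  // last known position
    }
}

// Stand-in for the local Controller: just counts what it is told.
struct CountingTarget : IEventTarget
{
    int created = 0;
    int destroyed = 0;
    void projectileCreated(std::size_t, float, float) override { ++created; }
    void projectileDestroyed(std::size_t, float, float) override { ++destroyed; }
};
```

Nothing in the local Model depends on these callbacks. If a projectile is created and destroyed between two state packets, no event is derived at all, and the game state is still correct.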

To implement a "dedicated server", a Model and a Server are deployed, and in this case the Model's reference to an IEventTarget can be changed to a pointer, which may be NULL.

To support all three cases, the IEventTarget pointer implementation (in Model) is used, and Server is changed to check for this pointer being set when accepting new clients. If the pointer is set, there is a local client, and the first player slot is in use by this local client. If the pointer is NULL, there is no local client (i.e. dedicated), and all slots are available for remote clients.

Note that the above assumes a one-to-one mapping between clients/players and Model-domain "avatars". This is relevant to games in which each player uniquely controls some avatar within the context of Model. Typically, Server's remote client representations are stored in an array that is parallel to the array of avatars inside Model, and as input arrives across the network from a given remote client, this input will be applied to the avatar that is "stored in the same slot" in the Model. This is the reason for the pointer check noted above, in which case the first "slot" is already in use by the local client/player.
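The slot check can be sketched as follows. MAX_PLAYERS, hasLocalClient() and acceptClient() are hypothetical names, and all actual networking is left out.

```cpp
#include <cassert>
#include <cstddef>

struct IEventTarget { virtual ~IEventTarget() {} };
struct LocalController : IEventTarget {};  // stand-in for the local client

const std::size_t MAX_PLAYERS = 4;  // hypothetical slot count

class Model
{
public:
    // A null pointer means there is no local client (dedicated server).
    explicit Model(IEventTarget* localTarget) : myLocalTarget(localTarget) {}
    bool hasLocalClient() const { return myLocalTarget != nullptr; }

private:
    IEventTarget* myLocalTarget;
};

class Server
{
public:
    explicit Server(const Model& model) : myModel(model)
    {
        for (std::size_t i = 0; i < MAX_PLAYERS; ++i)
            mySlotInUse[i] = false;
    }

    // Returns the slot given to a new remote client, or -1 if the server is full.
    int acceptClient()
    {
        // With a local client present, slot 0 belongs to it and is skipped.
        const std::size_t first = myModel.hasLocalClient() ? 1 : 0;
        for (std::size_t i = first; i < MAX_PLAYERS; ++i)
        {
            if (!mySlotInUse[i])
            {
                mySlotInUse[i] = true;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

private:
    const Model& myModel;
    bool mySlotInUse[MAX_PLAYERS];
};
```

The same Server class then covers both the "listen" and the dedicated case, with the only difference being whether App hands Model a local IEventTarget.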

This approach to networking is inspired by id Software's Quake 3 Arena:

Quake 3 Networking

The Director

An important aspect of architecture that comes up when dealing with "local/remote/listen/dedicated" single/multiplayer architectures is how to manage these various modes of operation at the highest level. Experience at Massive as well as various more recent experiments have led to the concept of a Director.

The Director encapsulates the details of the various modes, which when aggregated together are:

Model, View, Controller
Client (the proxy to a remote Model, i.e. a "server")
Server (the proxy to all remote Controllers, i.e. "clients")

Local mode involves:

do a Controller update to change Model state and compose the current view
update Model logic
render View

Join mode involves:

receive network data via Client, i.e. receive new Model state from the remote server and derive any events
do a Controller update to change Model state (via proxy) and compose the current view
send all change requests via Client
update Model for any local simulation
render View

Listen mode involves:

Server receives incoming change requests from remote clients and applies them to Model
do a Controller update to change Model state and compose the current view
update Model logic
Server sends outgoing state to remote clients
render View

If Dedicated mode is implemented in the same application:

Server receives incoming change requests from remote clients and applies them to Model
(do a ServerController update to compose the current view of the server, i.e. special controller for op access)
update Model logic
Server sends outgoing state to remote clients
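The four step lists above can be folded into a single per-frame sequence. The sketch below only records step names instead of calling real components, and the Mode enum and step strings are assumptions; the optional ServerController update of Dedicated mode is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Mode { Local, Join, Listen, Dedicated };

class Director
{
public:
    explicit Director(Mode mode) : myMode(mode) {}

    // Runs one frame and records which steps were taken, in order.
    void frame()
    {
        myLog.clear();
        const bool hasServer = (myMode == Mode::Listen || myMode == Mode::Dedicated);
        const bool hasLocalClient = (myMode != Mode::Dedicated);

        if (myMode == Mode::Join) step("client receive");    // new Model state, derive events
        if (hasServer)            step("server receive");    // apply remote change requests
        if (hasLocalClient)       step("controller update"); // change Model, compose the view
        if (myMode == Mode::Join) step("client send");       // send change requests
        step("model update");                                // logic or local simulation
        if (hasServer)            step("server send");       // outgoing state to remote clients
        if (hasLocalClient)       step("view render");
    }

    const std::vector<std::string>& log() const { return myLog; }

private:
    void step(const char* name) { myLog.push_back(name); }

    Mode myMode;
    std::vector<std::string> myLog;
};
```

In a real Director each step() would call into Model, View, Controller, Client or Server; the point here is only that the modes differ in which steps are present, not in their order.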

Reliable messages over UDP

From the q3 article:
All reliable data that another node needs is sent repeatedly until the sender receives an update for most-recent-ack (indicating that the packet has been received). For example, if a player sends a chat message (reliable) with update 6, he will continually send that chat message on subsequent state updates until he receives notification from the server that it has received an update >= 6. Brute force, but it works.

DELTA COMPRESSION -> RELIABLE DATA?
Currently, the server sends state ids to the client, and the client piggybacks these ids back to the server on input. If the client also sent its own ids (input ids) to the server for each input, and the server added these as acks to the state data (i.e. here is server state 4, the last client input I got from you was input 3), then this can be used to send reliable data.

Remember, the only reason to do any delta compression at all (as opposed to just throwing the latest state and input at each other) is the assumption that the data is too big for a UDP packet. This of course needs to be checked for any game BEFORE delta compression is implemented. Of course, any game that doesn't require it is much simpler to implement on the network side of things.

Ok, so back to "reliable messages". Chat is one of those, so we would expect to be able to submit a chat message to the server once and have it arrive, without any high level mucking about. Previous implementations would spam the server with chat messages based on a local delta check on the client, i.e. each player's chat string was part of the game state, and the local string (in the gui) was checked against this for the current player, and as long as it differed the chat message would be included in the input.

The big question is, is this simpler than the above noted q3 model? It appears so, because there is no extra client input sequence ack stuff going on (from the server to the client). The client does send sequence ids, but these are currently only used to discard out of order input on the server.

So what about a game with infrequent explicit commands, like an RTS? One could implement this by having a big fat command structure (i.e. a union) as part of the game state for each client (obviously only the command structure for the given client would be synced back to each client). This would work just like the chat string, i.e. the client networking layer would diff the currently submitted command against the last received state, and as soon as there was a match the client would know that the data was received correctly by the server.

The problem (even in the chat case) is that if the client submits two consecutive commands fast enough, the server might miss the first one. Just writing this, I realise that the solution is for the client to buffer commands (fifo) and continually send them until the server acks each one, only popping then.

When the app submits unreliable data (typically every single frame), this is written to a struct inside the Client which will be sent immediately and unreliably for each net send tick (20 hz or whatever).

When the app submits reliable data (way less than every frame), this is queued as a struct in a pending command list (fifo) with a globally unique number (for the session) per command. For each net send tick, the Client will append the first reliable command from the fifo to the unreliable input data, including the cmd id.

The server's state packet will be modified to additionally indicate the last received command id from the client, so that when a client gets a state from the server, he will know when it is ok to pop commands from the fifo and continue on sending the next one.
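A sketch of the client side of this scheme is given below. The packet layouts, the string payload and the choice of starting ids at 1 (so that 0 can mean "nothing received yet") are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

struct ReliableCommand { unsigned id; std::string payload; };

// What goes out on every net send tick (hypothetical layout).
struct InputPacket
{
    unsigned inputFlags;      // unreliable: simply the latest input
    bool hasCommand;          // true if a reliable command is piggybacked
    ReliableCommand command;  // the first pending command, resent until acked
};

// What comes back from the server; the actual Model state is omitted here.
struct StatePacket { unsigned lastReceivedCommandId; };

class Client
{
public:
    void submitUnreliable(unsigned flags) { myFlags = flags; }

    void submitReliable(const std::string& payload)
    {
        myPending.push_back({myNextId++, payload});
    }

    InputPacket buildInputPacket() const
    {
        InputPacket packet{myFlags, !myPending.empty(), ReliableCommand{0, ""}};
        if (packet.hasCommand)
            packet.command = myPending.front();
        return packet;
    }

    // Pop every command the server says it has received.
    void onState(const StatePacket& state)
    {
        while (!myPending.empty() && myPending.front().id <= state.lastReceivedCommandId)
            myPending.pop_front();
    }

    std::size_t pendingCount() const { return myPending.size(); }

private:
    unsigned myFlags = 0;
    unsigned myNextId = 1;  // 0 is reserved for "no command received yet"
    std::deque<ReliableCommand> myPending;
};
```

The app calls submitReliable() once per command and never has to think about resending; the fifo keeps offering the oldest unacknowledged command until a state packet confirms it.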

This is similar to the delta compression scheme (in the use of acks), but of course the intent is to deliver every single command reliably, and the difference between the data being sent is that it is "existence-transient" (i.e. both the data structure and the data comes and goes), as opposed to the game state / model which is "data-transient" (i.e. the data changes but the structure is static).

One question is: is it possible to send all of the pending client messages in the same input? This would probably speed up roundtrip time, as the app has already submitted the commands, and probably use bandwidth better, as long as everything fits inside the UDP packet.

However, I am worried that the logic might be harder. The server should ack the highest received command number (a single number).

client sends cmds 123 (i.e. commands 1, 2 and 3 in each packet)
server gets one of these packets, handles all three commands, and then starts to ack 3
client gets another cmd (4), and starts sending 1234
server gets another 123 packet, sees that the last handled command was 3, and so ignores this packet
server gets a 1234 packet, notes that the last handled cmd is 3, so it only handles cmd 4, and starts to ack 4
the client gets cmd ack 3, and continues sending cmd 4
etc

This should work.
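The server side of the trace above can be sketched as follows, assuming that command ids start at 1 and that each packet lists its commands in ascending id order. The class and method names are invented for illustration.

```cpp
#include <cassert>
#include <vector>

struct Command { unsigned id; };  // payload omitted

class ServerCommandReceiver
{
public:
    // Handles every command in the packet that has not been handled before.
    // Returns the number of newly handled commands.
    int receive(const std::vector<Command>& commands)
    {
        int handled = 0;
        for (const Command& command : commands)
        {
            if (command.id <= myLastHandledId)
                continue;  // already handled in an earlier packet
            myLastHandledId = command.id;
            ++handled;     // a real server would apply the command to Model here
        }
        return handled;
    }

    // The single number that is sent back with each state packet.
    unsigned ack() const { return myLastHandledId; }

private:
    unsigned myLastHandledId = 0;  // 0 means nothing handled yet
};
```

A late or duplicated packet is harmless, because everything at or below the last handled id is skipped.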

Getting rid of projectiles over the network

The current multiplayer implementation of ZFighter is based on the Q3A ideas, namely only UDP and delta compression. The projectiles however are not delta compressed as they are always in motion and would not gain anything from such compression.

However, using the current scheme, i.e. only state is transmitted and any events need to be "derived", there are subtle issues with the responsiveness of firing projectiles, due to two issues:

First, since the client only samples your input at a fixed rate (20 hz?), while the Controller samples and sets the same flags every frame, your flags need to "happen to be set" at the time the Client decides to send input to the Server. This feels like the game doesn't catch all your input. This can probably only be solved by having the Client either queue the input commands for firing (I suspect that Q3A does this), or to increase the rate of input from the Client to the Server to the point where a user cannot press the fire button fast enough to perceive any drops.

Second, because state snapshots go from the Server at a fixed rate (20 hz?), any IEventTarget events that are derived by the Client are clamped to the interval at which you receive the packets (which should be close to the Server send interval). This doesn't mean that you don't get all projectiles; you still do because the Server will encode projectile state, not creation/destruction. However, the user will perceive that some of the projectiles that come into existence will not have an associated sound, and also they may spawn a little bit in front of the firing player.

There are also some issues with projectile destruction events occurring a bit away from the point of impact, due to the implementation using the last state snapshot position for destruction. This could be solved using halfway interpolation or something.

However, another approach that might work even better is as follows:

The current implementation includes Hero input flags from Server to Client. It appears that this is currently used for animation derivation in the View code.

Since the flags are local, we could actually run the Hero logic in the Model in order to "predict" the Heros locally on the client, INCLUDING projectile firing stuff. Since we trap every write (or should and could :) ) via IEventTarget, we can pass the Client when we run this code. Careful selection of what code to actually run is required, but I think it should be possible to run local prediction of Hero logic AND projectiles based solely on the Hero input flags.

For the local player this will be really great, as there is instant feedback of running the local avatar directly. We will still always correct the position and speed of all Heros when state comes across the network, but this should be no worse than the current lerp errors that occur when you change direction (i.e. act non deterministically). Projectile firing will also feel like the local case. For other Heros that you see, you can't tell the difference either way, as you are of course always behind across the network and don't know or care when they move.

The assumption is that the Hero input flags as passed as part of Model state will be useful for prediction/firing for the non local Heros. It seems that the big risk is what is causing the loss of creation / destruction events, i.e. the state packets have a fixed 20hz rate, and in between those times shots fired might be missed. This needs to be tested!

NOTE: One thing that should probably be done no matter what is for the Client to keep the fire flag set until the next input packet goes out, no matter what the state of the fire flag is that frame. This way there are no dropped single fire events in case you press in between input sends. If you are fast enough to fire twice between input sends, then you will lose one of the shots (barring packet loss).
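A minimal sketch of such a latch, assuming input is a bitmask and that FLAG_FIRE and FLAG_THRUST are hypothetical flag values:

```cpp
#include <cassert>

const unsigned FLAG_FIRE   = 1u << 0;  // hypothetical flag layout
const unsigned FLAG_THRUST = 1u << 1;

class InputLatch
{
public:
    // Called by the Controller every frame with the flags sampled that frame.
    void sample(unsigned flags)
    {
        myCurrent = flags;
        myLatched |= (flags & FLAG_FIRE);  // remember any fire press until the next send
    }

    // Called on each net send tick; returns the flags to put in the input packet.
    unsigned takeForSend()
    {
        const unsigned out = myCurrent | myLatched;
        myLatched = 0;
        return out;
    }

private:
    unsigned myCurrent = 0;
    unsigned myLatched = 0;
};
```

Only the fire flag is latched here; continuous flags such as thrust are sent as sampled, since a stale value for those would be wrong rather than helpful.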

This is indeed a gameplay design issue that is tied to the implementation, i.e. it would be easier to handle rate of fire on the server, and have the client tend to keep the fire button depressed all the time (i.e. autofire). This way individual events don't get lost as easily.

(080204 - checked the implementation in Client, it is currently NOT implemented this way. If you are reading this after that date, check and see if this is still the case and perhaps remove this note!)

080403 - current plan for my ARC clone is to use input flags and run full local simulation in the clients. The only thing that needs to be synced in this design is the state of the Saucers (position, speed, input flags, target position, kills), as well as the Flags. The idea is to get rid of syncing projectiles, and to minimize the number of events that we need to derive.

080619 - ARC, now dubbed Bangu Bangu Sosaru, was tested over the internet today using these ideas, and worked very well.
