Skip to content

ServerServerConcepts

Michael Bayne edited this page Feb 10, 2014 · 9 revisions

Nexus Server/Server Concepts

You should be familiar with the concepts in ClientServerConcepts and ServerConcepts as this documentation will refer freely to ideas and terms defined there.

The Nexus system is designed to allow the distribution of services across a network of mostly homogeneous server nodes. In normal usage, services are split up across all server nodes, such that each node handles some subset of the clients making use of those services. It is also possible to distribute the load for a particular service across a subset of the nodes if there is benefit in confining that service to a smaller number of servers.

Caveat

This section comes with a great big caveat: none of this is implemented yet! The Nexus system has been designed with distribution across multiple servers in mind, and the authors have implemented other distributed systems like Nexus, which are distributed across many servers (and from which many of the ideas for Nexus have evolved). However, it is possible that some of the concepts in this part of the documentation will change, a little or a lot, when the rubber hits the road. The rubber should be hitting the road soon, so the hypothetical nature of this documentation will not persist for long.

Network Transparency

The primary mechanism by which services are split across multiple servers is through the use of Keyed entities. There may be many instances of a given keyed entity (each with a unique key), and those instances can be spread across multiple servers. Because Nexus already encapsulates all interaction with an entity into Action and Request (to ensure that the interaction takes place on the appropriate thread), it also becomes possible to transport these interactions between servers such that the initiator of an action or request need not know whether the entity on which it is operating is on the same server or some other server.

Of course, the runtime characteristics of a request that operates on an entity on the same server and one that operates on an entity on another server are substantially different. As a result, one needs to "know" at a high-level what the structure of their entities will be and whether a particular communication is likely to involve network transport or not, but the code to actually implement those communications uses the same simple API in both cases.

Nexus takes care of keeping track of which entity is hosted on which server, and automatically routes actions and requests to the appropriate server, and responses back to the initiating server. The application developer is thus freed from all of this bookkeeping and can focus on the higher level architecture of a particular distributed service, ensuring that the service can be scaled as horizontally as possible, and that cross-server communication is minimized.

The next two sections describe design considerations that are useful to bear in mind when distributing services across multiple servers, and how Nexus can help to make that process as simple as possible (but no simpler).

Think Local, Act Local

Nexus has been designed with the goal that implementing a service that is distributed across multiple servers is not substantially different from implementing the same service such that it runs on a single server. This goal is most easily attained when the service in question boils down to a modest collection of clients manipulating a shared state of modest size, with that structure repeated arbitrarily many times.

This format might seem contrived, but in fact a great many services (in multiplayer games certainly) take exactly this form. The poker example from ServerConcepts is one such case: you have two to ten clients playing a game with a set of distributed state that is largely independent of all of the other poker games going on simultaneously. A particular game can be hosted on a particular server, the clients of that game communicate only with that server, and that pattern can be repeated arbitrarily many times across a network of as many servers needed to handle the load.

Even more complex systems can also be structured in this way: an MMO with instanced "dungeons" can host one or more dungeon instances on a server. Because modest in Nexus terms can mean client numbers in the low-to-mid-hundreds and shared data in the tens to hundreds of megabytes, this is often more than sufficient to accommodate a game (or application) design.

Games with non-instanced dungeons are often divided into zones, which can also be load balanced across servers in a similar manner. Handling the transition between zones can be more complex, but one possible solution is to make the zones substantially smaller than a single server can handle and to have a client subscribe to the distributed state for a 3x3 grid of zones (or an even larger size). As the player moves from the center zone to one of the edge zones, they can drop subscriptions for the zones that are now two hops away, and establish subscriptions to the new zones that are now only one hop away (which may be on the same server, or may be on some other server). As long as game actions tend not to span zones, this can also yield a nearly horizontally scalable system. For the small number of actions that span zones, solutions like those described below can be layered atop a mostly localized design.

If you actually do need to structure your application such that thousands of clients must simultaneously access shared state that spans multiple servers, Nexus can still be useful and simplify the implementation of your application, you will just have to think a lot more carefully about where data lives in the network and how requests are sent between servers to read and update distributed data. Designs of such complexity are not only beyond the scope of this documentation, but also probably a bad idea, ab ovo. If you can make small changes to the design of your application to allow things to be more easily localized, you will reap substantial benefits in developing and maintaining a vastly simpler and more manageable distributed system. If you really have to build an ultra-massive distributed system, then congratulations on your wildly successful product!

Think a Little Bit Global

Though the previous section explained why it is beneficial to try to structure your distributed system as localized pockets of activity, there are some "global" actions which are common and easily accommodated.

User to User Chat

One example of this is user to user chat. You may wish to send a message to your friend regardless of whether they're playing in the same dungeon instance or card game that you are. Since user to user chat is simply a one-way message delivery, this can be accomplished trivially and efficiently using Keyed entities.

Simply define a Keyed entity to manage your user, and then dispatch an Action on that entity to deliver a chat message from anywhere in the network and it will automatically be routed to the node that hosts that entity and processed. For example:

public class PrivateMessage implements Streamable {
  public int fromUserId;
  public String fromUserName;
  public String message;
  // ...ctor that inits fields...
}

public class UserObject extends NexusObject {
  public final DSignal<PrivateMessage> msgs = DSignal.create(this);
}

public class UserManager extends Keyed {
  private final UserObject obj;
  private final Integer userId;

  public UserManager (Nexus nexus, int userId) {
    this.userId = userId;
    Nexus.Context<?> ctx = nexus.registerKeyed(UserManager.class, this);
    nexus.register(new UserObject(), ctx);
  }

  public Comparable<?> getKey () { return userId; }

  public void deliverMessage (PrivateMessage msg) {
    // do any server-side checks like checking a user's blocked users list, or swear word filtering
    // here, and if the message passes muster, then deliver it to your user
    obj.msgs.emit(msg);
  }
}

// now from any entity running anywhere in the network, you can simply write
PrivateMessage msg = new PrivateMessage(fromUserId, fromUserName, message);
nexus.invoke(UserManager.class, toUserId, um -> um.deliverMessage(msg));

What Are My Friends Doing?

One nice thing to do in a multi-user game or application is to keep users abreast of whether any of their friends are logged into the system, and also potentially tell them what those friends are up to. This allows users to decide whether they want to join in the fun, or perhaps go off and do their own thing.

This is another situation where information does not neatly organize itself into localized clumps. Your friends may be playing together or they may each be doing their own thing, on totally separate server nodes. In this case, the solution is not quite as trivial as user to user chat, and requires some consideration on the part of the implementor.

There are a number of ways one could make this friend status information available, and the best approach for your application depends on your circumstances. The key parameters in thinking about this particular problem are: how frequently does each user's status change (how often do we "write"), how frequently do other users want to see the current status of their friends (how often do we "read"), and how many online friends does a user have on average (how big is our "span"). The different solutions described below accommodate differing values for these parameters.

Ask, Don't Tell

If your app is such that you read infrequently (the user has to click a button in the UI to see their friends' current statuses, for example), or even if you read somewhat frequently but you write extremely frequently (a user's status changes very often), then you may wish to gather up friend status information on demand.

This is easily accomplished using Nexus.gather:

public class Status implements Streamable {
  public final int userId;
  public final String username;
  public final String activity;
  // ...ctor that inits fields...
}

public interface UserService extends NexusService {
  void getFriendStatus (Callback<List<Status>> callback);
}

public class UserManager implements Keyed, UserService {
  private final Nexus nexus;
  private final Integer userId;
  private Set<Integer> friendIds;

  public UserManager (Nexus nexus, int userId) {
    this.nexus = nexus;
    this.userId = userId;
  }

  public Comparable<?> getKey () { return userId; }

  public Status getStatus () { ... }

  // from UserService
  public void getFriendStatus (Callback<List<Status>> callback) {
    Map<Integer,Status> statMap = nexus.gather(UserManager.class, friendIds, um -> um.getStatus());
    List<Status> statuses = new ArrayList<>();
    statuses.addAll(statMap.values());
    callback.onSuccess(statuses);
  }
}

Note that the gather call will simply omit data for any friend that is not currently online (i.e. for any friend who does not have a UserManager instantiated with their id on some server in the network). In this case, that's precisely the behavior we want. In other use cases, you may wish to compare the keys of the map returned by the gather call to the original friendIds set, and do something special for the users not included in the results.

This solution can be very efficient. Each user does their own thing on their own server, and when they switch from activity to activity, they simply make a note of it in their UserManager's state. This means that no inter-server network traffic is generated when we "write" (update a given user's status) because the information stays on the user's main server.

Furthermore, the traffic involved in a "read" is often also not onerous. Chances are, the number of friends online for a given user will grow much more slowly than the size of the entire user population. Satisfying a request for current friend status will only require talking to a small subset of the total set of servers.

Tell All of the People, All of the Time (Global Maps)

The polar opposite of the above approach is to broadcast the status of every user to every server, every time it changes, so that any server can immediately satisfy a request for current friend status from local state. Servers can also pass on status changes immediately to clients, such that there's no "asking for current status" at all. Every client will always be up to date with the current status of their friends, within milliseconds of any change in status.

As you can imagine, this can be very expensive. However, if user status changes extremely infrequently (few writes), or users have extremely large friend networks (high span), or it's very important for your application to display status updates immediately in the UI of each client, this can be an appropriate solution.

Nexus provides a mechanism called global maps to enable this system-wide distribution of information. Information in a global map is duplicated on every server node in the entire network, and any change to a global map is (nearly) immediately broadcast to every other server in the network so that all servers have up to date data at all times.

The caveat in the previous paragraph is because updates to global maps are batched slightly to reduce the impact of the (potentially substantial) inter-server traffic they generate. An application can configure the thresholds, but by default, map updates may be delayed by up to 500ms to enable multiple updates to be grouped into a single network packet. This is usually a sensible tradeoff; latency between server and client can often result in much higher delays, so delaying a status update for an extra 500ms is not likely to meaningfully impact user experience.

A global map solution to the status distribution problem would look something like the following:

public class Status implements Streamable {
  // we omit userId here because it's available as the key for the map;
  // when it comes to global maps, every byte is precious
  public final String username;
  public final String activity;
  // ...ctor that inits fields...
}

public class UserObject extends NexusObject {
  public DMap<Integer,Status> friends = DMap.create(this);
}

public class UserManager implements Keyed {
  private final Nexus nexus;
  private final Integer userId;
  private final UserObject obj;

  private final RMap<Integer,Status> userStatus;
  private final Connection usConn;

  private String username = ...;
  private Set<Integer> friendIds = ...;

  public UserManager (Nexus nexus, int userId, RMap<Integer,Status> userStatus) {
    this.nexus = nexus;
    this.userId = userId;
    this.userStatus = userStatus;
    Nexus.Context<?> ctx = nexus.registerKeyed(UserManager.class, this);
    this.obj = new UserObject();
    nexus.register(obj, ctx);

    // listen for changes to all users' status, and look for changes to our friends
    usConn = userStatus.connect(new RMap.Listener<Integer,Status>() {
      public void onPut (Integer userId, Status status) {
        if (friendIds.contains(userId)) {
          obj.friends.put(userId, status);
        }
      }
      public void onRemove (Integer userId) {
        obj.friends.remove(userId); // will NOOP if user is not in map
      }
    });

    // start out with our friends' current statuses
    for (Integer userId : friendIds) {
      Status status = userStatus.get(userId);
      if (status != null) obj.friends.put(userId, status);
    }

    // you may want to set your current status here, or wait until the user starts an activity that
    // results in a call to updateActivity()
  }

  public void updateActivity (String activity) {
    userStatus.put(userId, new Status(username, activity));
  }

  public void shutdown () {
    // remove ourselves from the userStatus map
    userStatus.remove(userId);

    // it is very important to disconnect your listener when the UserManager goes away; otherwise
    // you'll leak memory and waste CPU resources trying to broadcast friend updates into the void
    usConn.disconnect();

    // you can also opt to use the weak-listener support (userStatus.connect(...).holdWeakly()),
    // but if you do that, you need to ensure that UserManager maintains a strong reference to the
    // Listener instance, which the above code does not do; plus since you have to manually manage
    // the lifecycle of a user's mapping in the userStatus map, it's pretty easy to manually manage
    // the lifecycle of their listener as well; YMMV
  }
}

// the global map must be registered during server initialization
RMap<Integer,Status> userStatus = nexus.registerMap("userStatus");

// you'll want to store userStatus somewhere and make it available to UserManager when it is
// created; that can be accomplished with dependency injection, keeping a global reference, or
// however you like

It's useful to walk through everything that happens when a user's activity is updated to get a feel for the overhead involved in this "broadcast every change" model. When a user changes to a new activity, something will call updateActivity which will update a mapping in the userStatus map. This will include this status update in the next message sent from the server that hosts this UserManager to every other server in the network, informing them of a status update. As each server receives this update, it will then emit an onPut notification to every UserManager listening to the map on that server, and each of those will check whether the updated user is in their friends mapping. For those who do see a match, they will update their friends map, which will cause a message to be sent to the client subscribed to the UserObject managed by that user manager, and (most likely) some element in the client's UI will be notified to update the status of the friend in question.

This is a substantial amount of processing and network traffic, but as mentioned above, it may be the right solution for your app depending on your circumstances and requirements.

Tell Some of the People, All of the Time (Custom Broadcasts)

A third option is to manually broadcast status updates using Action. This can provide immediate notification of status changes at potentially lower overhead than the use of global maps. With this approach, the user manager entity explicitly broadcasts a status change to its friend entities using a variant of Nexus.invoke that accepts a Set of keys and efficiently batches the action invocation such that only one message is sent to each server that hosts one or more of the target entities.

The code for this solution is similar to the global map solution, with some small changes:

// Status and UserObject same as in global map example

public class UserManager implements Keyed {
  private final Nexus nexus;
  private final Integer userId;
  private final UserObject obj;

  private String username = ...;
  private Set<Integer> friendIds = ...;

  public UserManager (Nexus nexus, int userId) {
    this.nexus = nexus;
    this.userId = userId;
    Nexus.Context<?> ctx = nexus.registerKeyed(UserManager.class, this);
    this.obj = new UserObject();
    nexus.register(obj, ctx);
    // use Nexus.gather as in the first example to obtain the starting status of our friends
  }

  public void updateActivity (String activity) {
    Status status = new Status(username, activity);
    nexus.invoke(UserManager.class, friendIds, um -> um.friendStatusUpdated(userId, status));
  }

  public void shutdown () {
    nexus.invoke(UserManager.class, friendIds, um -> um.friendLoggedOff(userId));
  }

  void friendStatusUpdated (int userId, Status status) {
    // double check that this user really is our friend; they should be, but maybe we unfriended
    // them while a status update was in-flight
    if (friendIds.contains(userId)) {
      obj.friends.put(userId, status);
    }
  }

  void friendLoggedOff (int userId, Status status) {
    obj.friends.remove(userId);
  }
}

This solution can be more efficient than the global map solution when users have small numbers of online friends who are logged into a small subset of the servers. With this approach, there is no batching of status updates, every status update results in a message being sent immediately to other servers, but to a potentially much smaller number of servers.

Also, the size of the message sent in this case is usually much larger than the size of the message sent for a global map update. Sending an Action over the network requires using Java serialization to serialize the Action object, which is very space inefficient. A global map update on the other hand uses Nexus's internal serialization mechanism which is very space efficient (in addition to the lower overhead due to batched updates).

This solution also requires that "both sides" know who is interested in receiving these updates. In the case of reciprocal friend relationships, that's fine. A user knows who their friends are, and they can send updates precisely to those friends. If the originator of the status change does not know who is interested in hearing about the change, then you will likely have to go with the global map or gather on demand solutions.

Keyed Entity Load Balancing

Thus far we have avoided any discussion of how keyed entities end up spread across the servers in a network. In this section, we'll discuss the different ways in which an app can control the process of keyed entity distribution and thus balance its load evenly across its servers.

Random Load Balancing

When a client first bootstraps a connection into a Nexus network, it uses the address of a particular server to establish that bootstrap connection. An app could publish the addresses of all servers to its clients and have the client choose randomly from the list of addresses when creating its bootstrap connection. It's generally a good policy to have clients make a random selection from at least a non-trivial subset of the total number of servers, otherwise you run the risk of overloading one or two servers when thousands (or more) clients try to reconnect after a full network restart.

Assuming you have clients randomly connecting to servers in the bootstrap process, you can simply let entities fall "where they may" from there. The random server that hosts a given client creates their user entity, and if that user creates a game, it's hosted on their server. If they join a game created by some other user, the game is hosted on the other user's server and so forth. If you have moderately low session lengths with a lot of user turnover, this extremely simple load balancing approach can get you pretty far.

DIY Load Balancing

If the random load balancing approach is not appropriate for your system, or you just prefer not to leave things to chance, Nexus provides you with some basic building blocks to use in implementing a load balancing strategy of your choice.

The basic components in an explicit load balancing approach are as follows:

  • Look at the current state of the system and decide which server you want to host your new keyed entity; this may involve using Nexus.census or Nexus.survey.
  • Use Nexus.invokeOn or Nexus.requestFrom to invoke code on a singleton on the target server; the invoked action/request will create your new keyed entity.

Here's an example using census to host a new keyed entity on the server with the fewest instances of that entity at that moment:

public class GameObject extends NexusObject {
  // ...game stuff
}

public class GameManager implements Keyed {
  private final Integer gameId;
  public final GameObject obj;

  public GameManager (Nexus nexus, int gameId) {
    this.gameId = gameId;
    Nexus.Context<?> ctx = nexus.registerKeyed(GameManager.class, this);
    nexus.register(obj = new GameObject(), ctx);
  }

  public Comparable<?> getKey () { return gameId; }
}

public class LobbyManager implements Singleton {
  private final Nexus nexus;

  public void startGame (...) {
    // find the server with the lowest population of game managers
    Map<Integer,Integer> census = nexus.census(GameManager.class);
    // census return a map from (serverId -> population)
    Map.Entry<Integer,Integer> target = null;
    for (Map.Entry<Integer,Integer> entry : census.entrySet()) {
      if (target == null || entry.getValue() < target.getValue()) target = entry;
    }
    // instruct the lobby manager on the least populated server to create a game manager
    Address<?> addr = nexus.requestFrom(
      target.getKey(), LobbyManager.class, lm -> lm.createGame(...));
    // now send the address of the game object to the players, or whatever we need
  }

  Address<?> createGame () {
    int gameId = nexus.nextId(GameManager.class);
    GameManager gm = new GameManager(nexus, gameId);
    return gm.obj.getAddress();
  }
}

LobbyManager, being a singleton, exists on every server. In a normal circumstance the LobbyManager on some server (call it A) will prepare to start a game, it will use census to determine the server currently hosting the fewest game instances (call it B) and it will send a request to server B to start a new game. Server B generates an id for the game that is globally unique, but can only be used by server B (see Nexus.nextId for details), and then creates a GameManager using that id. It then returns the address of that game manager's distributed object back to server A so that it can pass that address on to the players of the game, to which they can then subscribe and play.

A simple populuation count may be insufficient for more complex load balancing situations. In such cases, one can use the Nexus.survey mechanism to do more sophisticated calculations to determine which server is the least loaded. The general structure is the same as using census, but the call to census:

    Map<Integer,Integer> census = nexus.census(GameManager.class);

becomes a call to survey:

    Map<Integer,Integer> census = nexus.survey(LobbyManager.class, lm -> lm.computeLoad());

and one must define a function in LobbyManager that computes a more nuanced measure of the load on a given server:

  int computeLoad () { ... }

The survey call will issue the computeLoad call on every LobbyManager on every server in the network and aggregate the results and return them to the server that initiated the call. In this regard, it is substantially more expensive than census, which simply looks at tables that exist locally on the server. That may mean that you want to avoid recalculating load every time a new entity is created, and instead only compute it every few minutes. You may also wish to rebroadcast the load calculations to all servers, so that all servers have the latest load calculations for use until the next time they are computed. This can be accomplished using Nexus.broadcast:

    // compute the load on each server and aggregate the results
    Map<Integer,Integer> census = nexus.survey(LobbyManager.class, lm -> lm.computeLoad());
    // now broadcast those results back to every server
    nexus.broadcast(LobbyManager.class, lm -> lm.updateLoadData(census));

This assumes some function updateLoadData is defined on LobbyManager which accepts a Map with load data and stores it somewhere for future use.

Specialized Server Subsets

As shown in the previous section on load balancing, an application has total control over how entities are distributed around the network. The examples we showed will result in entities being distributed evenly over all servers in a network. However, one could easily structure their load balancing code to distributed entities to only a subset of the servers.

One can make use of census data for a variety of different keyed entities when making load balancing decisions for a particular keyed entity, or use any strategy at all for distributing services and load over the network. Nexus provides the tools to easily assess the state of your network, and to easily use that information to create a new keyed entity on the server of your choice. The rest is up to you and your application's requirements.

Entity Id Generation

If your Keyed entity has no naturally occurring unique id, you will need to generate one. Identifiers must be unique across the entire server network, which makes their generation more complex. Fortunately, Nexus provides a simple mechanism for generating unique 32-bit integer identifiers for "ephemeral" entities (those which will not outlive the server on which they are hosted and which will never migrate).

Use Nexus.nextId to generate unique ephemeral identifiers like so:

public void startGame (...) {
  int gameId = nexus.nextId(GameManager.class);
  GameManager mgr = new GameManager(gameId, ...);
  nexus.registerKeyed(GameManager.class, mgr);
}

The API documentation for nextId explains various caveats relating to these ids. The general gist is that they should never be written to persistent storage and they are only valid for the server on which they were generated. Don't generate an id on one server node and then send that id to another server node to be used to create a keyed entity. That could result in key collisions. To be fair, it will only result in collisions if you generate over two million entities of the same type during uninterrupted operation of your network, causing the generated keys to wrap around, and at least one low-numbered entity lives long enough to still be around at that time, but it's still a bad idea.

If you need an auto-generated id that will be persisted and reused in between server reboots, you will need to use a database server, or GUIDs, or some other mechanism to generate globally unique keys. For most auto-keyed entities, ephemeral keys are fine. In such cases, they are preferable to something like GUIDs because they are substantially smaller. Entity keys may be sent over the network a lot, so there are performance benefits to keeping them small.

Global Maps

Global maps are a mechanism for maintaing a set of key/value pairs that are mirrored to all servers in a network and for which changes are broadcast to all servers, ensuring that all servers have the most up to date data (within the bounds of inter-server latency and a brief batching delay).

The section above on friend status distribution describes a use case for global maps and shows example code. Refer to that section for basic info on global maps. You should also read the javadoc for Nexus.registerMap for additional information.

Singletons in Multi-server System

TODO

Migrating Entities

TODO