Reversing GGP Client/Server Protocol: Match Request Issues

Claus
New Member

Posts: 18

Reversing GGP Client/Server Protocol: Match Request Issues Oct 16, 2014 17:18:36 GMT -8

Post by Claus on Oct 16, 2014 17:18:36 GMT -8

The primary obstacle to changing the client / server relationship between the Game Manager and the players is the process of starting the match, which has potential response time and scalability issues. Once the match is started, everything is straightforward. The purpose of this post is to explore what the protocol for starting a match might look like. I hope to get the ball rolling and stimulate some suggestions to improve my initial proposal. Once we get that figured out, then we can turn to the protocol for playing the game, incorporating the other suggested enhancements to improve reliability and extensibility.

Once the player becomes a client, the client will need to authenticate with the Game Manager server, so the Game Manager can verify that the player is who it claims to be. The problem arises with the next step. Once the player is authenticated, it requests a match and expects a "timely" response from the Game Manager. Typically, on the Internet, this means a response time of seconds. We could stretch this out to as long as a minute, but even so that will not be enough time in many cases. If there are too few players, another player may not make a request for some time. (We probably do not want the Game Manager to resort to a single player game every time this happens.) If there are too many players, the Game Manager may not have the capacity to start another match until an existing match finishes.

The simplest solution would be for the Game Manager to fail the player's request after 30 to 60 seconds when it cannot arrange a match for the player. While the player remains interested in waiting, it continues to poll the Game Manager with a match request. This works fine with few players but does not scale well when there are a large number of players, because the Game Manager will need to maintain a connection with essentially all the players waiting for a match. This has the potential to exhaust the available connections, so some requests will be refused. (To avoid interfering with active matches, the match requests would need to be on a different port than the match moves.)

A more scalable solution would be for the Game Manager to maintain a queue of players waiting for matches without the players having an active connection with a request. When the Game Manager fails a request, it also returns a retry delay based on an estimate of the expected wait until a match can be scheduled for the player. The approximate time for existing games to complete is easy to estimate if the Game Manager (like Tiltyard) keeps statistics on the average number of turns per game, which is then just multiplied by the play clock. To discourage premature retrying of match requests (before the retry delay expires), violators could be bumped to the end of the queue.
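A minimal sketch of the retry-delay estimate described above, in Python. The `ActiveMatch` record and its field names are assumptions made for illustration; the only part taken from the post is the idea of multiplying the average number of turns by the play clock. Using the soonest-finishing match as the delay is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class ActiveMatch:
    """Hypothetical record the Game Manager keeps per running match."""
    average_turns: float   # historical average number of turns for this game
    turns_played: int      # turns completed so far in this match
    play_clock_s: float    # seconds allowed per turn


def estimated_seconds_remaining(match: ActiveMatch) -> float:
    """Expected remaining turns multiplied by the play clock, never negative."""
    remaining_turns = max(match.average_turns - match.turns_played, 0)
    return remaining_turns * match.play_clock_s


def retry_delay(active_matches: list[ActiveMatch]) -> float:
    """Seconds until the first slot is expected to free up (0 if nothing is running)."""
    if not active_matches:
        return 0.0
    return min(estimated_seconds_remaining(m) for m in active_matches)
```

A match that runs longer than its historical average yields an estimate of 0, so the player would simply retry and be given a fresh estimate.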

I think this addresses the main issues with starting the match. I provided this background so everyone understands why the proposed solution is notably more complex than what a Game Manager does today as the client. With this, I can sketch out a possible solution. There are some further details to work out, but I first want to get some feedback on the general idea.

First, the view of the protocol from the player's perspective:
-Player makes a login request.
-Game Manager authenticates the player, and if successful responds with a list of capabilities, probably as a JSON or XML document.
-Player makes a match request, selecting the capabilities it wants to use, again as a JSON or XML document.
-If the Game Manager cannot start a match within 30-60 seconds due to absence of other players or due to capacity limits, the Game Manager fails the match request. The failure response will include a retry delay (which could be 0) indicating when the player may make another match request.
-Player expects match requests to fail and is prepared to retry match requests, observing the retry delay, so long as it continues to wish to play. Should the Player no longer wish to play, it will log out.
-If the Game Manager can start a match, it responds to all the included players with a match ID and the GDL. This begins the start clock. The player begins metagaming.
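The player-side steps above can be sketched as a small retry loop. This is only an illustration in Python: `MatchResponse`, `wait_for_match`, and the callables passed in are hypothetical names, and the actual HTTP calls and JSON or XML encoding are left out.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MatchResponse:
    """Hypothetical reply to a match request."""
    started: bool
    retry_delay_s: float = 0.0       # only meaningful when started is False
    match_id: Optional[str] = None
    gdl: Optional[str] = None


def wait_for_match(
    request_match: Callable[[], MatchResponse],
    still_interested: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[MatchResponse]:
    """Retry match requests, observing the retry delay, until a match starts.

    Returns None if the player loses interest; the caller would then log out.
    """
    while still_interested():
        response = request_match()
        if response.started:
            return response              # start clock begins; begin metagaming
        sleep(response.retry_delay_s)    # honor the server's retry delay
    return None
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting.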

From the Game Manager's perspective, life is more complex. While the Game Manager will always accept all login and logout requests, the handling of match requests depends on how busy the Game Manager is.

If the Game Manager has the capacity to begin another match, it selects a game using some algorithm. This could be the existing algorithm so long as it does not depend on who the players are. As player match requests come in, they are associated with the game. If the match request of any player assigned to the game ages past some maximum (30-60 seconds), all the players associated with the game are removed and sent a failure response with a 0 retry delay. Presumably, all the players will retry immediately, get reassociated with the game, and continue to wait for more players. However, it is possible that a player may elect to drop out (not send a match request), which is why all players get disassociated from the game when the failure response is sent. Obviously, single player games don't need to wait for an additional player, and for a two player game, just one player is waiting. (Note: For games with 3 or more players, Tiltyard appears to create random players if there are not enough active players. A Game Manager could continue to implement this type of policy when there are few active players. We probably also don't want the Game Manager to spawn a random player every time a player is waiting for an opponent in a two player game.)
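A rough Python sketch of the bookkeeping this implies on the Game Manager side when it has capacity. `PendingMatch` and its methods are hypothetical; the 60 second default is just one value from the 30-60 second range mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class PendingMatch:
    """Players waiting on one selected game (hypothetical structure)."""
    players_needed: int
    max_wait_s: float = 60.0
    waiting: dict[str, float] = field(default_factory=dict)  # player -> request time

    def add_request(self, player: str, now: float) -> None:
        # Keep the earliest request time if the player is already associated.
        self.waiting.setdefault(player, now)

    def ready(self) -> bool:
        return len(self.waiting) >= self.players_needed

    def expire(self, now: float) -> list[str]:
        """If any request is too old, drop everyone and return who to fail.

        Each returned player would get a failure response with retry delay 0.
        """
        if any(now - t > self.max_wait_s for t in self.waiting.values()):
            dropped = list(self.waiting)
            self.waiting.clear()
            return dropped
        return []
```

Clearing every player at once mirrors the reasoning in the post: a player that silently drops out is never reassociated, because only players that retry are added back.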

If the Game Manager does not have the capacity to begin another match, it still selects a game using its algorithm, but now it estimates a start time for the match based on the matches currently active. As match requests come in, the Game Manager associates the player with the game and responds immediately with a failure including a retry timeout corresponding to the estimated start time of the match. Once the match has enough associated players, the Game Manager selects another game and associates subsequent match requests with this new match. When the scheduled start time for a match occurs, all the associated players will retry their match request and the game can be started, assuming there is available capacity. While the details are straightforward and I won't describe them now, there are several alternate scenarios that need to be handled (prior match takes longer than estimated, so Game Manager lacks capacity to start new match at the scheduled time; a player associated with a scheduled match logs out before the scheduled match start time; a player associated with the scheduled match fails to send a match request at the scheduled start time).
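The busy case can be sketched in the same way. This Python example assumes, purely for simplicity, that every scheduled game needs the same number of players and that the estimated finish times of the active matches are already known; `ScheduledMatch` and `BusyScheduler` are hypothetical names, and the alternate scenarios listed above are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class ScheduledMatch:
    start_time: float            # estimated time at which capacity frees up
    players_needed: int
    players: list[str] = field(default_factory=list)

    def full(self) -> bool:
        return len(self.players) >= self.players_needed


class BusyScheduler:
    """Assigns requests to future matches while the Game Manager is at capacity."""

    def __init__(self, start_times: list[float], players_needed: int = 2) -> None:
        # One future match per active match, soonest estimated finish first.
        self._slots = [ScheduledMatch(t, players_needed) for t in sorted(start_times)]

    def assign(self, player: str, now: float) -> float:
        """Associate the player with the first non-full match; return the retry delay."""
        for match in self._slots:
            if not match.full():
                match.players.append(player)
                return max(match.start_time - now, 0.0)
        raise RuntimeError("no scheduled slot available")  # policy left open here
```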

Complexity aside, does anyone see reasons why this would not work? Are there ways to simplify this while still satisfying the response time and scalability constraints? Given the complexity, is this worth doing? I would appreciate feedback from others in the community.

Sam Schreiber
Global Moderator

Posts: 46

Reversing GGP Client/Server Protocol: Match Request Issues Oct 16, 2014 19:13:12 GMT -8 via mobile

Post by Sam Schreiber on Oct 16, 2014 19:13:12 GMT -8

I need to read through this in more detail, but... what benefit does this provide?

Claus
New Member

Posts: 18

Reversing GGP Client/Server Protocol: Match Request Issues Oct 17, 2014 12:52:35 GMT -8

Post by Claus on Oct 17, 2014 12:52:35 GMT -8

Oct 16, 2014 19:13:12 GMT -8 Sam Schreiber said:

I need to read through this in more detail, but... what benefit does this provide?

The primary benefit is to simplify hosting a Game Player behind a firewall by allowing the Game Player to be an HTTP client with the Game Manager becoming an HTTP server. From what I have heard, the number one complaint with the existing GGP protocol is that the Game Manager is the HTTP client and the Game Player is the server. Most individual developers are not set up to host servers so this becomes at minimum a nuisance and worst case an inhibitor to creating a player.

Just reversing the roles (with some minor tweaks) means that to start a match, the Game Manager needs to get HTTP requests from all the players rather than sending an HTTP request to the selected "active" (or logged in) players. As stated in the initial post, the naive approach of having the players submit a request and wait however long it takes to get assigned to a match presents two problems: a potentially very slow (in Internet terms) response to the request, and a potential scaling problem due to the number of pending, long-lived requests that the server may need to manage.

I acknowledge that the proposal is significantly more complex than the naive approach or current practice with the game player as the HTTP server. Perhaps someone else can think of a simpler way to address the response time and scalability issues, or identify why they are not issues. This proposal was my best idea.

jrao
New Member

Posts: 3

Reversing GGP Client/Server Protocol: Match Request Issues Nov 13, 2014 9:45:38 GMT -8

Post by jrao on Nov 13, 2014 9:45:38 GMT -8

It's been a while since I worked with java networking code, but have you considered just maintaining the connection? Based on blog.krecan.net/2010/05/02/cool-tomcat-is-able-to-handle-more-than-13000-concurrent-connections/, it looks like Tomcat can handle 10K connections on a laptop, and there's also async servlet to consider (http://www.javaworld.com/article/2077995/java-concurrency/asynchronous-processing-support-in-servlet-3-0.html).

Sam Schreiber
Global Moderator

Posts: 46

Reversing GGP Client/Server Protocol: Match Request Issues Nov 14, 2014 10:12:36 GMT -8

Post by Sam Schreiber on Nov 14, 2014 10:12:36 GMT -8

I'd really prefer to avoid any requirement in the gaming protocol that connections be held open indefinitely. It's already bad enough that connections need to be held open for the entire duration of a turn; it makes it much harder to build a robust distributed player, since there needs to be a front-end server holding the connection open and, while that's happening, that one server is a single point of failure for the entire gamer. I'd really prefer a gaming protocol that doesn't require long-lived connections at all, in any form.

jrao
New Member

Posts: 3

Reversing GGP Client/Server Protocol: Match Request Issues Nov 15, 2014 6:54:58 GMT -8

Post by jrao on Nov 15, 2014 6:54:58 GMT -8

Yeah, but it would make the protocol so much simpler, and easy to implement on both sides. I took a quick look at various server push technologies, and it looks to me like most if not all of the solutions involve long-lived connections in one way or another.

For the case of a distributed player, I may be missing something, but wouldn't you need a machine to act as the frontend no matter what? If this machine goes down, can't you still switch to another one as the frontend?

alandau
Global Moderator

Posts: 159

Reversing GGP Client/Server Protocol: Match Request Issues Nov 16, 2014 18:25:46 GMT -8

Post by alandau on Nov 16, 2014 18:25:46 GMT -8

Are the login and logout parts of the communication protocol necessary? It seems like it might be a complication for the client to actually log out as the program is closing, and certainly there would be times when the logout fails (client process suddenly terminates, Internet connection loss, etc.). The server would need to handle unresponsive players anyway, so why not just make that the single method of "logging out"?

I'd like to think that connections could be used as long as they can be maintained (especially during games), to minimize the creation and authentication overhead, but that if they're interrupted at any point the client can reestablish the connection (and re-authenticate) and everything can continue normally. That should allow both the client and server to be distributed, correct? If the currently connected process on either side fails, the client can reconnect. Alternatively, a distributed client could manually change which process is making the connection. (There may need to be some changes to the GDL protocol to ensure that players don't miss any PLAY messages in any cases.)

While we're at it, do we have a strawman proposal for what authentication looks like? Would it just be a username/password pair sent in plaintext, perhaps using HTTPS instead of HTTP? That's pretty simple, but people would need to be careful not to accidentally commit their passwords to a public repository.
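One possible shape for such a strawman, sketched in Python. HTTP Basic authentication is an assumed encoding here, not something the thread settled on, and the environment variable names `GGP_USERNAME` and `GGP_PASSWORD` are hypothetical. Reading credentials from the environment is one way to keep them out of a public repository.

```python
import base64
import os


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Build an HTTP Basic Authorization header (only safe when sent over HTTPS)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def credentials_from_env() -> tuple[str, str]:
    """Read credentials from the environment so they never enter the repository.

    GGP_USERNAME and GGP_PASSWORD are hypothetical variable names.
    """
    return os.environ["GGP_USERNAME"], os.environ["GGP_PASSWORD"]
```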

Edge case: What if two clients log in as the same player at the same time?

jrao
New Member

Posts: 3

Reversing GGP Client/Server Protocol: Match Request Issues Nov 19, 2014 4:35:08 GMT -8

Post by jrao on Nov 19, 2014 4:35:08 GMT -8

I wonder how feasible it is to piggyback on an existing protocol, looks to me the GGP communication language is just passing messages back and forth between server and client, so it should be able to run on a general messaging protocol, for example XMPP?

Claus
New Member

Posts: 18

Reversing GGP Client/Server Protocol: Match Request Issues Nov 26, 2014 19:57:04 GMT -8

Post by Claus on Nov 26, 2014 19:57:04 GMT -8

Nov 13, 2014 9:45:38 GMT -8 jrao said:

...it looks like Tomcat can handle 10K connections on a laptop, and there's also async servlet to consider (http://www.javaworld.com/article/2077995/java-concurrency/asynchronous-processing-support-in-servlet-3-0.html).

This is an excellent point. In the near term, this demonstrates that scalability is not an issue. If the GGP community gets to the point where we have 1000+ players, I think everyone would be very happy, but we would also need more powerful game managers. Since the (relatively) heavy computation is managing a bunch of active games, that level of demand (hundreds of concurrent active games) would require the game manager to be implemented on some type of cluster, or wait times would be unacceptably long.

So while I hate to design protocols with scalability issues, in practical terms scalability is not a near-term issue. Since the protocol is extensible through the capabilities returned at login, we can opt for simplicity until scalability becomes a problem.

What about my other constraint of minimizing the time between the HTTP request and the response? (This is different from the length of time the underlying TCP/IP connection remains open--even with HTTP the connection can be maintained across HTTP requests.) There are two important use cases that become problematic with long response times:

-Client wants to stop waiting. One solution is for the Game Manager to send a failure response (no game started) after some fixed interval. The client sends another game request if it wants to continue waiting. The other option would be for the client to make a second request to stop waiting (this could just be a logout). That is a bit more work for the Game Manager, since it would need to find the pending game request from the client and clean it up, and it would need to send two responses: one acknowledging the logout request and one aborting the game request.
-Game Manager crashes. While this would presumably be a rare scenario, there is no way for a client to distinguish a server failure from a very slow response time. The only solution is for the Game Manager to periodically send a response that the request is still pending. In a strict HTTP request / response flow, this in effect means the initial game request failed (no game started) and the client needs to submit a new game request.

The common solution to both use cases is for the Game Manager to send a failure response (no game started) after some fixed interval, with the client sending a new game request if it wants to continue waiting. We can debate what is a reasonable interval. One option is for the client to specify the timeout for a failure response.

Bottom line: this simplifies the protocol by eliminating the special responses when the Game Manager is busy. If the Game Manager is unable to schedule a client for a new game after some interval, whether due to lack of opponents or lack of capacity to start another game, the request fails and the client needs to make a new game request if it wants to continue waiting. The Game Manager should probably track how long each client is waiting, so clients who have waited the longest get priority for the next game when the Game Manager is busy.
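A small Python sketch of the waiting-time tracking suggested above. `WaitingList` is a hypothetical helper; it keeps the time of each client's first request so that retries do not reset the client's place, and it leaves out expiry of clients that stop retrying.

```python
class WaitingList:
    """Tracks when each client first asked for a game (hypothetical helper)."""

    def __init__(self) -> None:
        self._first_seen: dict[str, float] = {}

    def note_request(self, client: str, now: float) -> None:
        # A retry keeps the original timestamp, so waiting time accumulates.
        self._first_seen.setdefault(client, now)

    def pick(self, count: int) -> list[str]:
        """Remove and return up to `count` clients, longest-waiting first."""
        chosen = sorted(self._first_seen, key=self._first_seen.get)[:count]
        for client in chosen:
            del self._first_seen[client]
        return chosen
```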

Claus
New Member

Posts: 18

Reversing GGP Client/Server Protocol: Match Request Issues Nov 26, 2014 20:39:43 GMT -8

Post by Claus on Nov 26, 2014 20:39:43 GMT -8

Nov 19, 2014 4:35:08 GMT -8 jrao said:

I wonder how feasible it is to piggyback on an existing protocol, looks to me the GGP communication language is just passing messages back and forth between server and client, so it should be able to run on a general messaging protocol, for example XMPP?

After taking a quick peek at XMPP, my initial reaction is that this may be more than we need. On the plus side, it appears to handle the (mutual) authentication (e.g. TLS). On the minus side, as a general messaging protocol there is configuration complexity beyond what we need for simple point to point messaging, as well as the requirement that the payload be formatted as XML. I would need to investigate more to see if complexity is a real issue.

An interesting aspect of the article is how XMPP deals with the issue of clients behind firewalls, which is to use the BOSH protocol over HTTP. BOSH is a way to do bidirectional communication across HTTP without polling, and this looks like what we need. The client basically uses two TCP/IP connections to the server. On one connection the client makes an HTTP request to the server. This is the channel the server uses when it wants to send a message to the client. The request remains pending (indefinitely) until the server has data to send. As soon as the client receives a message from the server (a response), it immediately sends another HTTP request to the server, so the server always has a channel for sending messages to the client. The client uses the second connection to send messages to the server, to which the server immediately responds; that response just means the server got the message. Any data that the server sends in response to the request gets sent on the server channel.
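To make the two-channel idea concrete, here is an in-process Python stand-in. It is not BOSH itself and uses no HTTP: `BoshLikeServer`, `poll`, `send`, and `push` are hypothetical names, and a `queue.Queue` per client plays the role of the pending request on the server channel.

```python
import queue
from typing import Optional


class BoshLikeServer:
    """In-process stand-in for the two channels described above (not real BOSH)."""

    def __init__(self) -> None:
        self._outbox: dict[str, queue.Queue] = {}
        self.received: list[tuple[str, str]] = []

    def _box(self, client: str) -> queue.Queue:
        return self._outbox.setdefault(client, queue.Queue())

    def poll(self, client: str, timeout_s: float) -> Optional[str]:
        """Server channel: block until a message is available, or return None on timeout."""
        try:
            return self._box(client).get(timeout=timeout_s)
        except queue.Empty:
            return None

    def send(self, client: str, message: str) -> str:
        """Client channel: a client-to-server message, acknowledged immediately."""
        self.received.append((client, message))
        return "ack"

    def push(self, client: str, message: str) -> None:
        """Server side: queue a message that the client's pending poll will receive."""
        self._box(client).put(message)
```

A client would call `poll` again as soon as it returns, whether with a message or with `None`, so the server always has an outstanding request to answer.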

I am going to search to see if I can find an existing BOSH implementation that we could use. If not, it should be very easy to create one for ourselves. The client code is trivial; the server code is a bit more complex but should be straightforward.

Both use cases in my previous post are easily handled with BOSH. By using BOSH to allow asynchronous messaging between client and server, I think that the overall protocol can be designed to be much simpler and closer to what we have today.

Sam Schreiber
Global Moderator

Posts: 46

Reversing GGP Client/Server Protocol: Match Request Issues Dec 6, 2014 22:23:16 GMT -8

Post by Sam Schreiber on Dec 6, 2014 22:23:16 GMT -8

Tiltyard is already a distributed system for scheduling matches that should scale up to large numbers of players.

Large, widely-used online services rarely run on a single machine that's acting as a frontend, because that machine would be a single point of failure. Instead, they have multiple frontends and incoming requests are load-balanced between them. Unhealthy frontends are automatically drained so incoming traffic doesn't get impacted. Developers who want to build reliable GGP players need the ability to do something similar. We should at least *support* that type of architecture in our protocols, even if not everyone builds their players that way. That's why I'd like to move in the direction of removing long-lived connections from the GGP protocols, rather than adding more of them. Personally what I'd like is for the incoming request to the player to include a callback URL that the player can use to submit draft moves, as frequently as it likes, and the match host uses the player's latest draft move when it's time to compute the next game state. This would make all of the connections in the protocol short-lived, and would allow play to continue smoothly even in the face of intermittent network interruptions or failures of individual nodes within the gamer.
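The draft-move idea could look roughly like this on the match host side. This Python sketch is an illustration only: `DraftMoves` and its methods are hypothetical, the callback URL and HTTP handling are omitted, and what the host does when a player has submitted nothing is left open.

```python
from typing import Optional


class DraftMoves:
    """Match-host store of each player's latest draft move (hypothetical)."""

    def __init__(self, players: list[str]) -> None:
        self._latest: dict[str, Optional[str]] = {p: None for p in players}

    def submit(self, player: str, move: str) -> None:
        # Each submission is one short-lived request; only the newest draft is kept.
        if player not in self._latest:
            raise KeyError(f"unknown player: {player}")
        self._latest[player] = move

    def close_turn(self) -> dict[str, Optional[str]]:
        """Return the moves to apply this turn and reset drafts for the next one."""
        moves = dict(self._latest)
        self._latest = {p: None for p in self._latest}
        return moves
```

Because any node of a distributed player can call `submit`, losing one node between submissions does not lose the move already on file.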

I agree that we should not use XMPP for GGP.

I disagree with adding complications to the GGP protocol just to accommodate developers who can't run an internet-accessible server on their local machines. Do you have data supporting the idea that "Most individual developers are not set up to host servers"? It seems like that would only be the case for developers behind a hostile firewall. And developers who really are in that situation can cheaply rent compute resources on AWS / GCE / Azure / etc to get an internet-accessible server (indeed, AWS will give first-time users 750 hours free).

Last Edit: Dec 6, 2014 22:26:01 GMT -8 by Sam Schreiber

wat
New Member

Posts: 32

Reversing GGP Client/Server Protocol: Match Request Issues Jan 5, 2015 10:26:28 GMT -8

Post by wat on Jan 5, 2015 10:26:28 GMT -8

Dec 6, 2014 22:23:16 GMT -8 Sam Schreiber said:

I disagree with adding complications to the GGP protocol just to accommodate developers who can't run an internet-accessible server on their local machines. Do you have data supporting the idea that "Most individual developers are not set up to host servers"? It seems like that would only be the case for developers behind a hostile firewall. And developers who really are in that situation can cheaply rent compute resources on AWS / GCE / Azure / etc to get an internet-accessible server (indeed, AWS will give first-time users 750 hours free).

It was an issue during Coursera courses, and also an issue during 2014 annual competition. Took about an hour, after many tries, until all players were acceptably functional, and some players were eliminated during qualifying rounds simply due to router configuration issues (they never worked). And it kept happening again every round.

wat
New Member

Posts: 32

Reversing GGP Client/Server Protocol: Match Request Issues Jan 5, 2015 10:37:58 GMT -8

Post by wat on Jan 5, 2015 10:37:58 GMT -8

The main scalability issue with maintaining long-lived connections in push-style architectures is assigning a dedicated thread to each connection, which is the simplest implementation. A connection in itself doesn't consume many resources, but a dedicated thread does.

Most servers can handle threads in the thousands. But if you want to handle millions of simultaneous connections, then you need to manage connections independently from threads.

If using Java EE 6, async servlet comes in handy: blogs.oracle.com/enterprisetechtips/entry/asynchronous_support_in_servlet_3
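The same principle can be shown outside Java EE. The Python sketch below uses `asyncio` rather than async servlets, and simulates waiting clients with coroutines instead of real sockets; `hold_connections` is a hypothetical name. It parks many waiters at once and records which threads ran them.

```python
import asyncio
import threading


async def hold_connections(count: int) -> set[int]:
    """Park `count` simulated waiting clients and report the thread ids used."""
    release = asyncio.Event()
    thread_ids: set[int] = set()

    async def waiting_client() -> None:
        await release.wait()                    # parked; no thread is blocked
        thread_ids.add(threading.get_ident())

    tasks = [asyncio.create_task(waiting_client()) for _ in range(count)]
    await asyncio.sleep(0)                      # let every task reach its wait
    release.set()                               # e.g. a match became available
    await asyncio.gather(*tasks)
    return thread_ids
```

Running `asyncio.run(hold_connections(10000))` returns a set containing a single thread id, since all the waiters share the one event-loop thread.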

Last Edit: Jan 5, 2015 10:39:08 GMT -8 by wat

Sam Schreiber
Global Moderator

Posts: 46

Reversing GGP Client/Server Protocol: Match Request Issues Jan 5, 2015 23:28:54 GMT -8

Post by Sam Schreiber on Jan 5, 2015 23:28:54 GMT -8

I described the design issues with long-lived connections. It's not related to per-thread overhead.

As for the problems experienced in the various competitions: developers need to correctly configure their routers, just like they need to get reliable power supplies for their machines, write code that behaves correctly, et cetera. I don't think developer error is a good reason to redesign the protocol. What I'm more interested in is whether there are developers who are in situations where they are actually unable to host servers, even if they're configuring everything under their control correctly. I don't believe that's a common situation (except at hotels and possibly schools with restrictive network policies), and even in those cases, developers can run their players on services like AWS, GCE, Azure, etc, or use one of those services to run a tunneling proxy and forward connections to their local machines, circumventing the firewall.

(Also, if developers aren't testing their players on Tiltyard, they're probably going to be in trouble during competitions anyway just due to lack of testing. Players that are working properly on Tiltyard are reachable from the outside world and will also work properly during competitions.)

wat
New Member

Posts: 32

Reversing GGP Client/Server Protocol: Match Request Issues Jan 6, 2015 6:06:15 GMT -8

Post by wat on Jan 6, 2015 6:06:15 GMT -8

Jan 5, 2015 23:28:54 GMT -8 Sam Schreiber said:

I described the design issues with long-lived connections. It's not related to per-thread overhead.

As for the problems experienced in the various competitions: developers need to correctly configure their routers, just like they need to get reliable power supplies for their machines, write code that behaves correctly, et cetera. I don't think developer error is a good reason to redesign the protocol. What I'm more interested in is whether there are developers who are in situations where they are actually unable to host servers, even if they're configuring everything under their control correctly. I don't believe that's a common situation (except at hotels and possibly schools with restrictive network policies), and even in those cases, developers can run their players on services like AWS, GCE, Azure, etc, or use one of those services to run a tunneling proxy and forward connections to their local machines, circumventing the firewall.

(Also, if developers aren't testing their players on Tiltyard, they're probably going to be in trouble during competitions anyway just due to lack of testing. Players that are working properly on Tiltyard are reachable from the outside world and will also work properly during competitions.)

Developers were having trouble connecting to Tiltyard too for the same reason. It is theoretically possible for almost all developers to raise a server at home, but harder. And every router is different, making it annoyingly difficult to explain how to do it on a chat or forum.

But if it's the developer's problem to deal with alone, then the whole discussion about reversing the client/server roles is over.
