Passive/Active Remote Queueing Version 1.0.a Raphael Manfredi August, 14th 2002 May, 10th, 2003 1. Introduction This proposal is a formalization of my previous Queueing proposal, which was in turn derived from the original proposal from Mike Green. It has been refined to take into account the feedback I got from the GDF, as well as my own comments to another queueing proposal from Christopher Rohrs. The aim of remote queueing is to have a servent serving a file (aka, the "server") enqueue a request from a servent requesting a file (aka, the "client") when the server has run out of upload slots. This queueing must satisfy the following properties: * Fairness: first come, first serve, provided the client plays by the rules. * Monitoring: the client side must be able to monitor progress. * Remanence: the queue must survive a servent shutdown, either on the client side or on the server side, provided the servent comes back "quickly". * Legacy: older servents not aware of PARQ must not be penalized and must still get their share of slots. This also participates to Fairness. A few comments on those properties: Fairness is obvious. Monitoring ensures that the client can see the evolution of its queued slot on the server, which helps the end-user decide whether it should pursue further or "cancel" the download. Remanence allows the client or server to become unavailable for a short period of time (max. 5 minutes) and yet not perturb the queue. This property is valuable for modem users (this includes DSL users), which can experience a connection drop, and servent developers who can have their servent crash in the middle of an experiment. Finally, Legacy ensures that in a world of queueing servers, a client not aware of PARQ can still be allowed to get slots, although it will not be able to get the complete PARQ experience, of course. 2. Definitions and Principles PARQ maintains an arbitrary long queue (typically 5000 entries or more) which is split into two parts: Aq. Asynchronous queue, the largest part. Those are the newest entries added to the queue, and are far away from being scheduled. The queue is said to be asynchronous because the connection between the server and the client is not maintained. Sq. Synchronous queue, the smallest part. Those are the entries close to be able to get an uploading slot. The connection between the server and the client is maintained. Typically, Sq will hold the top 50 entries or so, and Aq will hold the remaining 4950 ones. As time goes by, an entry first made in Aq will gradually moves to Sq, and then is allowed the uploading slot. Naturally, if Sq is almost empty, the entry is initially made there, not to Aq, which starts filling when Sq is full. Each entry is identified by a random GUID allocated when it enters the server queue. This GUID acts as a "token" and is used by the server to recognize the entry. This is espcially useful for Aq entries, but Sq entries can occasionally make use of this GUID (for Remanence, or for Legacy clients which are queued without knowing it). An queued slot is said to be "actively queued" if the client is aware of PARQ and knows that its request is in the queue. Otherwise, the slot is said to be "passively queued", which means the client is unaware of it (legacy clients). It follows that only actively queued entry actually use the Sq. All passively queued entry, use the Sq slots like the Aq ones, since the client is unaware of it. 3. Required Features In order to use PARQ actively, the client must support HTTP/1.1 persistent connections. If an active client is connected to the server, and the connection is broken for any reason, it can reconnect within the next 300 seconds and keep its slot in the queue (Remanence effect). In order to use PARQ passively, the client need either support the Retry-After: header in HTTP responses, and strictly honour it instead of its internal retry timer. Only the Retry-After: for of the header is used, for convenience. If the client does not support Retry-After:, as many legacy servents do, the client only needs to retry the download on 503 before the next 300 seconds. Optionally, non-firewalled clients that support PARQ can also support a new form of request that can be used by the server as a callback. It is similar to a GIV request sent by the server to the client when a PUSH is received. It is named "QUEUE" and will be detailed in section 4. 4. Active Queueing This section presents the active queueing, i.e. only addresses clients that are aware of PARQ. PARQ-compliance is indicated by the client through the following header in the HTTP request: X-Queue: 1.0 which indicates that the servent will understand the queueing protocol and the replies from the PARQ-compliant server. An initial request/answer from the client will result in the following HTTP exchange: GET /resource HTTP/1.1 User-Agent: blah X-Queue: 1.0 X-Node: 13.4.5.6:4567 HTTP/1.1 503 Queued Server: blah X-Queue: 1.0 X-Queued: position=1503; length=1840; ETA=4200; lifetime=3600; ID=6a074979f3201edbf323b6425fef4a53 Connection: close Retry-After: 600 The "X-Node" in the request indicates the client is not firewalled and provides the address and port where the server can make its QUEUE callback, if necessary (more on this later). The server tells its PARQ level of support via its own X-Queue header. Even if the server supports a more recent version (say 1.3), it will always downgrade its reply to match that of the client. The "X-Queued" header of the reply indicates that the request was asynchronously queued. The general syntax of the X-Queued: header is a list of var=value settings, each separated by ';'. The spaces after each ';' are optional. Variables are case-insensitive (i.e. "ID" and "id" are the same). The meaning of each variable is: "position": This is the current queue position (here 1503). Also known as the current queue slot. "length": The current queue length (here 1840 requests are queued). "ETA": The ETA (Estimated Time of Arrival) for the item to reach an upload slot, in seconds. It can be estimated by summing the whole size of the last files requested at each queue position, and assuming an optimal transfer rate equal to the maximum upload bandwidth. (here, 4200 seconds, a totally non-realistic figure). "lifetime": time during which the server will hold the queue position. The server expects a new request to come before the lifetime expires. (here, a new request must be made before the next 3600 seconds). However, the minimum retry blackout period indicated by Retry-After MUST be honoured by the client (here, 600 seconds). "ID": The unique queue ID (pure random number). It must be remembered and given out by the client within further requests to identify the queued item. If the client does not re-issue the request before "lifetime", and provided there was an X-Node given, then the server will contact the client at the specified IP:port and will give it the following request: QUEUE 6a074979f3201edbf323b6425fef4a53 45.56.67.78:1234\r\n To "cancel" the request, the client simply stops sending requests and ignores the QUEUE callbacks, which will cause the slot to be freed. The format of the QUEUE line is: QUEUE :\r\n The IP and port are those of the server. It is necessary to communicate them again because the server might have changed its IP address since the client started the queueing, and if we don't refresh it, the client will have no way to contact the server back again: it will only be able to reply to QUEUE requests. In order to maintain its queue position, the client MUST reply with: GET /resource HTTP/1.1 User-Agent: blah X-Node: 13.4.5.6:4567 X-Queue: 1.0 X-Queued: position=1503; ID=6a074979f3201edbf323b6425fef4a53 HTTP/1.1 503 Queued Server: blah X-Queue: 1.0 X-Queued: position=1501; length=3500; ETA=4100; lifetime=3600; ID=6a074979f3201edbf323b6425fef4a53 Connection: close Retry-After: 600 This is identical to any client-generated requery made by the client. Note that even when replying to a QUEUE request, the ID MUST be given in the request. This simplifies the implementation on the server side. (and the server will again reply with the same ID, the aim being to avoid special-casing the processing of those headers as much as possible). The X-Queued header in the request simply shows the server that the client at this IP:port knows the last slot number that was returned for this entry, and is therefore the proper client. If there is a mismatch, the corresponding slot is immediately freed. The QUEUE callback is useful when the server is firewalled and the client is not. Otherwise, the client will make the request after the time given by the Retry-After, and the server will never make the callback. Any PARQ-compliant client MUST honour the Retry-After. Sending the request too soon will cause the slot to be freed. When the slot number is low enough to enter the Synchronous Queue (Sq), the HTTP exchanges becomes totally different. Let's take a requery as an example, but things would be totally similar for an initial request: GET /resource HTTP/1.1 User-Agent: blah X-Node: 13.4.5.6:4567 X-Queue: 1.0 X-Queued: position=53; ID=6a074979f3201edbf323b6425fef4a53 HTTP/1.1 503 Queued Server: blah X-Queue: 1.0 X-Queued: position=14; length=4300; ETA=2100; ID=6a074979f3201edbf323b6425fef4a53 Retry-After: 290 <290 seconds later> GET /resource HTTP/1.1 User-Agent: blah X-Node: 13.4.5.6:4567 X-Queue: 1.0 X-Queued: position=14; ID=6a074979f3201edbf323b6425fef4a53 HTTP/1.1 503 Queued Server: blah X-Queue: 1.0 X-Queued: position=1; length=4300; ETA=2100; ID=6a074979f3201edbf323b6425fef4a53 Retry-After: 30 <30 seconds later> GET /resource HTTP/1.1 User-Agent: blah X-Node: 13.4.5.6:4567 X-Queue: 1.0 X-Queued: position=1; ID=6a074979f3201edbf323b6425fef4a53 HTTP/1.1 200 OK Server: blah This sequence requires some explainations: the server is not replying with a "Connection: close" header, so according to regular HTTP/1.1 semantics, the connection is persistent. Therefore, after the delay imposed by the server (via the Retry-After header), the client can re-issue its request. Until it gets a 200 (or 206 for partial content) reply. If the connection between the server and the client is broken at this stage, the server will hold the slot for 300 seconds max the first time. The client can re-issue its request within the first 300 seconds, or the server will call back with a QUEUE request after 300 seconds (i.e. the implicit "lifetime" here is 300). To prevent abuse, servers MAY limit the amount of disconnections allowed within Sq, but it must allow at least 1 (one). As the slot moves towards the head of the queue (slot 1), the time between retries shrinks. The algorithm to compute the Retry-After is given by: x = 30 + 20 * (slot - 1) which is valid up to x = 20. After that, a constant retry timeout is used (e.g. 600 seconds). If all the slots before become cancelled, this will force the server to reserve the slot for that amount of time, so 600 seconds is a reasonable value. 5. Passive Queueing This section presents the passive queueing, i.e. only addresses clients that are unaware of PARQ (clients that do not send the X-Queue header, and which are incapable of handling the QUEUE callback). The principle of passive queueing is that the server remembers the IP address of the requesting servent and allocates a slot for it. This slot is kept as long as the servent retries within the next 300 seconds. Naturally, there is no Sq used with those clients. Even when the slot number is low enough to deserve an entry in the Sq, the server will act as if the slot was in Aq. The "lifetime" is implicitly set to 300 seconds. In order to avoid excessive requerying, the server will use the Retry-After: 250 header to indicate that it is useless to retry before 250 seconds (for instance). As the requests moves in the queue towards slot 1, this amount will decrease. It is suggested that the amount be initially set to 250 for slots up to 21 (included), and then to 30 + 20*(slot-1): 410 for slot=20, and 30 for slot=1. Since Retry-After is a standard HTTP header, all servents should already honour it correctly. If the client does not understand the header, it will requery at its normal rate (since it does not know about the queue). PARQ-compliant servers should not penalize the legacy servent for not honouring this header, other than banning the servent if he clearly hammers. This forces the server to hold slots for at most 300 seconds. In practice, servents requery every 60 seconds or so. 6. Discussion The main advantage of PARQ is that it provides a queueing system for modern servents whilst not completely preventing older servents from getting slots. Full deployment of PARQ will indeed take some time. Another important advantage is that it allows the download queue to be persisted. Even the Sq can be "persisted". This will prove most useful for non-firewalled clients, as they will get a callback from the server, possibly days later, when the server comes back online. There has to be some advantage to not being firewalled! It is also important to note that PARQ is only a slot reservation system. There is no correlation between a slot and a file. This means a client can request a file initially, and change its mind until the upload slot is allocated. The Remanence property of the queue can also be extended to overcome the lack of persistent connection support in HTTP/1.0 by reserving an upload slot for 300 seconds, to the same IP, same file! This can help servents behind a modem, that waited painfully to get an upload slot and once they got it, had their connection broken suddenly. The upload slots could even be "persisted", for PARQ-aware servents that are not firewalled: the server would then send back QUEUE requests to them, and they would get back their slot immediately. Because both the client and the server exchange version numbers via the X-Queue header, smooth upgrades will be possible: a server will always downgrade to a level acceptable by the client. And future versions of this specification will mandate that a client at level 1.2 be able to understand 1.1 and 1.0 replies from a server. The reason for using active retries within Sq, instead of merely relying on HTTP continuations (100 status) is that it allows propagation of the download mesh information, and also it allows the client to determine the requested range at the last minute. It even permits the client to change its mind and request a totally different file than it had initially done. 7. Backward Compatibility with Legacy Queueing BearShare and Shareaza support an incompatible queueing protocol, built before PARQ was launched (it's called "Active Queueing"). In order to allow for smooth transition to PARQ and at the same time not penalize PARQ-compliant clients that would connect to those legacy servers, any client advertising: X-Queue: 1.0 MUST also be prepared to honour the legacy queueing implementation. The legacy server will reply with: X-Queue: position=2,length=5,limit=4,pollMin=45,pollMax=120 It's easy to tell that it's a legacy servent, because PARQ servers will use the X-Queue header in replies to advertise the level used to reply. Therefore, any X-Queue header not starting with "digit.digit" must belong to the legacy implementation. In this header, the only fields of interest are: . position: this is the same as PARQ's. . length: this is the same as PARQ's. . pollMin: this is the same as PARQ's Retry-After header. . pollMax: this is the same as PARQ's "lifetime" field. Note that the separator used is ",", whereas PARQ uses ";". This means there can be multiple X-Queue headers in the reply, whereas PARQ's choice of ";" as the separator ensures the unicity of the X-Queued header in the reply. Like PARQ, those legacy servents require that the connection be maintained, i.e. they require HTTP/1.1 persistent connections (they only support some kind of Sq, in PARQ parlance). If the connection is dropped, the slot will be lost since those servers do not support Remanence. PARQ servers SHOULD support Active Queueing as well as PARQ (i.e. honour at least the X-Queue: 0.1 header and behave as in Active Queueing). 8. Conclusion Let's go back to the initial goals we had set and look how well we fulfill them: * Fairness: this is ensured by the FIFO nature of the queue and by the fact that we support both active and passive queueing. * Monitoring: low granularity when the request is far deep in the queue (Aq), the period of feedback increases as the request reaches Sq and moves to the first slots. Each client request also gives the instantaneous status (queue position, ETA). * Remanence: this is a property of the implementation, but PARQ makes it easy to support since it was designed with it in mind. * Legacy: this is assured by the passive queueing features of PARQ. A. Change History May 10th, 2003 - revision 1.0.a: * Changed X-Listen-IP headers into X-Node to be compliant with what BearShare is going to use (combination of X-Node and X-Features headers to obsolete headers like Chat: or Browse: in the requests). * Added the IP:port section to the QUEUE callback. Amazing that nobody had noticed that it was lacking!