Gtk-Gnutella Binary Format in GGEP Extensions
=============================================

GGEP does not define the format of each extension.  This memo describes
the binary format used by gtk-gnutella to encode its various extensions
(held within a GGEP payload).

The aim of this format is to retain the compactness of a binary format,
yet allow servents to implement selectively what they want and to skip
the parts they don't understand.

The extension describes a set of properties.  Each property is identified
by an architecturally-defined numerical ID.  The property value follows
the ID.  In order to be skippable, the property length is also described.
Since a servent may not know about an ID, the length is not implied by
the ID but is stated explicitly.

The scope of each ID is limited to a given extension, i.e. two different
GGEP extensions may use the same numerical ID for some of their
properties.

Formally, the extension is a sequence of:

    <ID + length> <value>
    <-- 1 byte -> <-- n bytes -->

The leading byte is architected as:

    bits 7-3    ID
    bits 2-0    length

Valid ID values in that byte range from 1 to 31.  We'll call this value
the relative ID value.

ID 0 is special and enables segmentation (in order to be able to code
more than 31 fields).  ID 0 has no attached value, and the length bits
indicate the segment number (from 0 to 7), each segment defining a
31-slot range.  Subsequent IDs are then offset by 31 times the segment
number, so if segment 2 is selected, a following relative ID of 5 is
really 31*2+5, an absolute ID of 67.

A segment remains active until the next segment change is seen, scanning
from the start to the end of the extension.

Here are the absolute ID ranges, depending on the segment:

    Segment #   Range
        0       001-031
        1       032-062
        2       063-093
        3       094-124
        4       125-155
        5       156-186
        6       187-217
        7       218-248

Implicitly, at the beginning of the extension, segment #0 is active
(i.e. a leading NUL byte is implied, but not physically present).
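The leading-byte layout and the segment rule described above can be
sketched in a few lines of Python.  This is only an illustration, not
gtk-gnutella's own code: the function names are hypothetical, and the
second function assumes it is handed the leading bytes alone, with the
property values already stripped.

```python
def split_leading_byte(byte):
    """Split a leading byte into (relative ID, length bits).

    Bits 7-3 hold the ID, bits 2-0 hold the length bits.
    """
    return byte >> 3, byte & 0x07


def absolute_ids(leading_bytes):
    """Turn a sequence of leading bytes into absolute IDs.

    Hypothetical helper: values are assumed to have been removed already,
    so every item in `leading_bytes` is a leading byte.
    """
    segment = 0  # segment #0 is implicitly active at the start
    result = []
    for byte in leading_bytes:
        relative_id, length_bits = split_leading_byte(byte)
        if relative_id == 0:
            # ID 0 carries no value: its length bits select the segment.
            segment = length_bits
        else:
            result.append(31 * segment + relative_id)
    return result


# Relative ID 5 in segment #0, a switch to segment #2, then relative ID 5.
print(absolute_ids([0x29, 0x02, 0x29]))
```

The last call yields absolute IDs 5 and 67, the latter being the
31*2+5 case worked out in the text.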
All IDs belonging to the same segment should be grouped together, to
avoid the overhead of switching segments.  Segments may appear in any
order.  A given ID may appear more than once, if repeating it makes sense
(most probably with a different value each time).

For IDs other than 0, the length bits are interpreted as follows:

    0   NUL-terminated value
    1   1-byte value
    2   2-byte value
    3   3-byte value
    4   4-byte value
    5   8-byte value
    6   explicit length (next byte gives length)
    7   RESERVED

All extensions using this binary format also store their numbers in
big-endian order.

For boolean values: TRUE is encoded as 0x1, FALSE as 0x2 (hence bit 0 is
really the boolean value, whilst preventing a NUL byte that could trigger
COBS encoding).

Note that when a field length says "1-byte value", nothing is said about
the semantics of that byte.  It could be a boolean, a signed value or an
unsigned value.  Its interpretation depends on the ID: it is
architecturally defined and immutable for a given ID.

Example:

We want to encode the following properties (absolute IDs given):

    ID=4, boolean, 1-byte long, value=FALSE
    ID=28, timestamp, 4-byte long, value=1023567521
    ID=55, unsigned value, 1-byte long, value=0x32
    ID=89, string, value="sample"

In our first representation, we'll show the leading byte as <ID:length>.
For instance <4:1> means ID=4 and length=1, which would be coded as
"00100 001" in binary, or 0x21 in hexadecimal.

Recall that we begin the extension implicitly in segment #0, so 4 and 28
belong to that segment (it covers IDs 1 to 31).  Segment #1 holds ID
values from 32 to 62, therefore ID=55 is within segment #1.  Segment #2
holds IDs from 63 to 93, hence holds ID=89.
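The mapping from an absolute ID to its segment and relative ID, and the
packing of the leading byte, can be checked mechanically.  The sketch
below is illustrative only; locate(), leading_byte() and encode_bool()
are hypothetical names, not taken from gtk-gnutella.

```python
def locate(absolute_id):
    """Return (segment, relative ID) for an absolute ID in 1..248."""
    if not 1 <= absolute_id <= 248:
        raise ValueError("absolute ID must be in 1..248")
    segment, offset = divmod(absolute_id - 1, 31)
    return segment, offset + 1


def leading_byte(relative_id, length_bits):
    """Pack a relative ID (bits 7-3) and length bits (bits 2-0)."""
    if not 0 <= relative_id <= 31 or not 0 <= length_bits <= 7:
        raise ValueError("relative ID or length bits out of range")
    return (relative_id << 3) | length_bits


def encode_bool(flag):
    """TRUE is 0x1 and FALSE is 0x2, so a boolean is never a NUL byte."""
    return b"\x01" if flag else b"\x02"


for absolute_id in (4, 28, 55, 89):
    print(absolute_id, locate(absolute_id))

print(hex(leading_byte(4, 1)))                 # the <4:1> leading byte
print((1023567521).to_bytes(4, "big").hex())   # big-endian timestamp
```

A segment switch is the same packing with relative ID 0, e.g.
leading_byte(0, 2) for entering segment #2.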
This yields the following representation:

    <4:1>   0x02                        ; we're in segment #0
    <28:4>  1023567521 = 0x3d0266a1     ; UNIX timestamp
    <0:1>                               ; entering segment #1
    <24:1>  0x32                        ; 55 = 1*31+24
    <0:2>                               ; entering segment #2
    <27:0>  "sample" 0x00               ; 89 = 2*31+27

The string "sample" is coded as follows:

    's' = 0x73
    'a' = 0x61
    'm' = 0x6d
    'p' = 0x70
    'l' = 0x6c
    'e' = 0x65
    NUL = 0x00

The final binary representation is therefore:

    0x21 0x02
    0xe4 0x3d 0x02 0x66 0xa1
    0x01
    0xc1 0x32
    0x02
    0xd8 0x73 0x61 0x6d 0x70 0x6c 0x65 0x00

Total payload length: 19 bytes.
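As a cross-check of the example, here is a minimal decoder sketch in
Python.  It is not gtk-gnutella's implementation: the decode() function
and its error handling are assumptions made for illustration.  It
returns raw value bytes only, since interpreting a value depends on the
ID, and treating the RESERVED length code 7 as an error is a choice of
this sketch.

```python
FIXED_SIZES = {1: 1, 2: 2, 3: 3, 4: 4, 5: 8}  # length bits -> value size


def decode(payload):
    """Return a list of (absolute ID, raw value bytes) pairs."""
    properties = []
    segment = 0  # segment #0 is implicitly active at the start
    pos = 0
    while pos < len(payload):
        relative_id, length_bits = payload[pos] >> 3, payload[pos] & 0x07
        pos += 1
        if relative_id == 0:
            segment = length_bits  # segment switch, no attached value
            continue
        if length_bits == 0:
            # NUL-terminated value; index() raises ValueError if no NUL.
            end = payload.index(0, pos)
            value, next_pos = payload[pos:end], end + 1
        elif length_bits == 6:
            if pos >= len(payload):
                raise ValueError("missing explicit length byte")
            size = payload[pos]
            value, next_pos = payload[pos + 1:pos + 1 + size], pos + 1 + size
        elif length_bits == 7:
            raise ValueError("reserved length code 7")
        else:
            size = FIXED_SIZES[length_bits]
            value, next_pos = payload[pos:pos + size], pos + size
        if next_pos > len(payload):
            raise ValueError("truncated value")
        properties.append((31 * segment + relative_id, bytes(value)))
        pos = next_pos
    return properties


# The 19-byte payload from the example above.
example = bytes.fromhex("2102e43d0266a101c13202d873616d706c6500")
for absolute_id, raw in decode(example):
    print(absolute_id, raw.hex())
```

Because every property carries its own length, a servent that does not
know an ID can still step over its value and continue with the next
leading byte, which is what the loop does for all IDs alike.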