Direct Internet Message Encapsulation

May 23 2001

Authors

Henrik Frystyk Nielsen, Henry Sanders, Erik Christensen, Microsoft

Copyright Notice

(c) 2001 Microsoft Corporation. All rights reserved.

The presentation, distribution or other dissemination of the information contained herein by Microsoft is not a license, either expressly or impliedly, to any intellectual property owned or controlled by Microsoft.

This document and the information contained herein is provided on an "AS IS" basis and to the maximum extent permitted by applicable law, Microsoft provides the document AS IS AND WITH ALL FAULTS, and hereby disclaims all other warranties and conditions, either express, implied or statutory, including, but not limited to, any (if any) implied warranties, duties or conditions of merchantability, of fitness for a particular purpose, of accuracy or completeness of responses, of results, of workmanlike effort, of lack of viruses, and of lack of negligence, all with regard to the document. ALSO, THERE IS NO WARRANTY OR CONDITION OF TITLE, QUIET ENJOYMENT, QUIET POSSESSION, CORRESPONDENCE TO DESCRIPTION OR NON-INFRINGEMENT WITH REGARD TO THE DOCUMENT.

IN NO EVENT WILL MICROSOFT BE LIABLE TO ANY OTHER PARTY FOR THE COST OF PROCURING SUBSTITUTE GOODS OR SERVICES, LOST PROFITS, LOSS OF USE, LOSS OF DATA, OR ANY INCIDENTAL, CONSEQUENTIAL, DIRECT, INDIRECT, OR SPECIAL DAMAGES WHETHER UNDER CONTRACT, TORT, WARRANTY, OR OTHERWISE, ARISING IN ANY WAY OUT OF THIS OR ANY OTHER AGREEMENT RELATING TO THIS DOCUMENT, WHETHER OR NOT SUCH PARTY HAD ADVANCE NOTICE OF THE POSSIBILITY OF SUCH DAMAGES.

Status

DIME and related specifications are provided as-is and for review and evaluation only. Microsoft hopes to solicit your contributions and suggestions in the near future. Microsoft Corporation makes no warranties or representations regarding the specifications in any manner whatsoever.

Abstract

This document defines the media type "application/dime". Direct Internet Message Encapsulation (DIME) is a lightweight, binary encapsulation format that can be used to encapsulate multiple application defined entities or payloads of arbitrary type and size into a single message construct. The only parameters described by DIME are the payload type, the length, and an optional payload identifier. The type is identified by either a URI or a registered media type and the length by an integer indicating the number of octets of the payload. The optional payload identifier is in the form of a URI enabling cross-referencing between payloads.

The format is strictly an encapsulation format: it provides no concept of a connection or logical circuit and does not address head-of-line problems. It is designed to make as few assumptions about the underlying or encapsulating protocol as possible.

Table of Contents

1. Introduction
1.1 Design Goals
2. Notational Conventions
3. The DIME Encapsulation Model
3.1 Intended Usage
3.2 DIME Terminology
4. The DIME Mechanisms
4.1 DIME Encapsulation Constructs
4.1.1 Message
4.1.2 Record
4.1.3 Chunked Records
4.2 DIME Payload Description
4.2.1 Payload Length
4.2.2 Payload Type
4.2.3 Payload Identification
5. The DIME Specifications
5.1 DATA Transmission Order
5.2 Record Layout
5.2.1 MB (Message Begin)
5.2.2 ME (Message End)
5.2.3 CF (Chunked Flag)
5.2.4 TNF (Type Name Format)
5.2.5 TYPE_LENGTH
5.2.6 TYPE
5.2.7 ID_LENGTH
5.2.8 ID
5.2.9 DATA_LENGTH
5.2.10 DATA
6. Security Considerations
7. Acknowledgements
8. References

1. Introduction

SOAP provides a flexible and extensible envelope for exchanging structured information between SOAP peers. However, because SOAP is XML-based there are certain inherent limitations in dealing with three particular issues:

1. Not all data is suited for being embedded within a SOAP message. Such data can for example be binary data in the form of image files. The overhead of encoding binary data in a form acceptable to XML is often significant, both in terms of bytes added as a result of the encoding and in terms of the processor overhead of performing the encoding/decoding.
2. Other XML documents, as well as XML fragments and encrypted XML, might be cumbersome to embed within a SOAP message, especially if the XML parts do not use the same character encoding.
3. Although SOAP messages inherently are self-delimiting, the message delimiter can only be detected by parsing the complete message, which can imply a significant overhead in data processing.

Although the SOAP Attachment [8] specification attempts to address these problems, MIME multipart is often not adequate as it itself adds significant parsing overhead in order to determine the various sub-parts of the MIME message. In contrast, DIME provides a simple message encapsulation format with three basic services that can significantly decrease the parsing overhead as well as the buffering requirements in both the generator and the parser:

The payload length
    The complete length of a record is unambiguously indicated within the first 8 octets of the record.

The payload type
    By providing a mechanism for indicating the type of a record, it is possible to dispatch records to the appropriate user application based on the type of the record.

The payload identifier
    A payload carried within DIME can be identified by a globally unique identifier in the form of a URI [4]. This makes it possible to refer to a particular payload based on its URI.

1.1 Design Goals

Because of the large number of often controversial message encapsulation formats, record marking protocols and in particular layered multiplexing protocols that have been brought up in the past and of which several are under consideration elsewhere, we would like to be as explicit about the goals and in particular the non-goals of DIME as possible. It is the explicit hope that these design goals and in particular the non-goals will be considered when evaluating the format defined as part of this specification.

The design goal of DIME is to provide a simple encapsulation format that:

1. Supports encapsulating arbitrary payloads including encrypted data, XML documents, XML fragments, image data like GIF and JPEG files, etc.
2. Supports aggregation of multiple records that are logically associated in some manner into a single message. A message can for example contain a document and a set of attachments related to that document.
3. Supports encapsulating payloads of initially unknown size. This can for example be used to transfer a dynamically generated data stream as a series of chunks.
4. Provides identification of the payload based on its type.
5. Enables identification of the payload based on a globally unique ID.

The following list explicitly states the non-goals of DIME in order to firmly state what is outside the scope of DIME:

1. DIME must not make any assumptions about the type of messages that are carried within DIME records or the message exchange patterns of such messages.
2. DIME must not in any way introduce the notion of a connection or logical circuit (virtual or otherwise).
3. DIME must not attempt to deal with head-of-line blocking problems that might occur when using stream-oriented protocols like TCP.

Although DIME is a self-contained encapsulation format, it is not a protocol nor an attempt to provide transport level services in general.

2. Notational Conventions

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [3].

3. The DIME Encapsulation Model

A DIME message is a general encapsulation format that contains one or more DIME records. Each record can contain an application payload of arbitrary type and size. DIME provides three parameters for each payload: the length, the type, and an optional identifier.

The DIME type identifier indicates the type of the payload.
The format of the type identifier is either a globally unique URI, allowing for an infinite value space without central administration, or a media type registered by the Internet Assigned Numbers Authority (IANA [10]). The latter allows DIME to take advantage of the already very large and successful media type value space maintained by IANA.

The optional payload identifier is intended to be globally unique in order to enable applications to refer to specific payloads independent of the payload type. The payload identifier is always a URI.

3.1 Intended Usage

The intended usage for DIME is as follows: A user application wants to encapsulate one or more related documents or payloads into a single DIME message. This can for example be a SOAP message along with a set of attachments. Each payload is assigned a DIME record that describes the type and length of the payload along with an optional identifier. The set of records is put together to form a DIME message.

DIME can be used in combination with most protocols that support exchange of binary data as long as the DIME message can be exchanged in its entirety and in order.

DIME records can encapsulate any message type. For example, it is possible to carry MIME messages in DIME records by using the type indicator "message/rfc822". DIME messages can be encapsulated in DIME records by using the media type "application/dime".

It is important to note that although MIME entities are supported, there are no assumptions in DIME that the record payload is MIME; DIME makes no assumption about the type of the payloads carried in a DIME message.

DIME is not a protocol and provides no support for error handling. It is up to the user application to determine what the semantics of malformed DIME records are and how they are handled. It is also up to the user applications to provide any additional QoS that they may need.

3.2 DIME Terminology

DIME message
    The overall unit of encapsulation. A message contains one or more DIME records (see section 4.1.1).

DIME record
    A DIME record can contain a payload that is described by a type, a length, and an optional identifier (see section 4.1.2).

DIME chunked record
    Chunked records can be used to partition a payload into smaller pieces, allowing dynamically generated data of initially unknown size to be encapsulated without knowing the complete length up front (see section 4.1.3).

DIME payload
    The user application defined data carried within a record.

DIME payload length
    An integer that indicates the length of the payload in octets (see section 4.2.1).

DIME payload type
    A globally unique identifier that indicates the type of payload (see section 4.2.2).

DIME payload identifier
    A globally unique identifier that identifies a specific payload (see section 4.2.3).

DIME user application
    The logical higher-layer application that uses DIME for encapsulating messages.

DIME generator
    A software entity or module that encapsulates payloads within DIME messages.

DIME parser
    A software entity or module that parses DIME messages and hands the payload to a DIME user application.

4. The DIME Mechanisms

This section describes the mechanisms used in DIME, whereas section 5 defines the specific syntax for these mechanisms.

4.1 DIME Encapsulation Constructs

4.1.1 Message

A DIME message is composed of one or more DIME records. Each record in a message can contain a payload of any type and size and can be chunked (see section 4.1.3). The media type for a DIME message is "application/dime".

DIME messages can be used to encapsulate a set of records that are logically associated in some manner, for example because they should be processed in the aggregate. The exact relationship and any impact on processing models etc. is defined by the user application or is a function of the record types.

The first record in a message is marked with the MB (Message Begin) flag set and the last record in the message is marked with the ME (Message End) flag set. The minimum message length is one record, which is achieved by setting both the MB and the ME flag in the same record. There is no maximum number of records for a DIME message.

DIME messages MUST NOT overlap. That is, the MB and the ME flags MUST NOT be used to nest DIME messages. DIME messages can be nested by carrying a full DIME message within a DIME record with the type "application/dime".

Figure 1: Example of a DIME message with a set of records. The message moves in the direction from right to left with the logical record indexes t > s > r > 1. The MB (Message Begin) flag is set in the first record and the ME (Message End) flag is set in the last record.

    <--------------------- DIME message ---------------------->
    +---------+     +---------+     +---------+     +---------+
    | R1 MB=1 | ... |   Rr    | ... |   Rs    | ... | Rt ME=1 |
    +---------+     +---------+     +---------+     +---------+

An important restriction is that there is no sequencing of individual records in a message. DIME messages therefore MUST either be carried over an in-order, stream-oriented transport service such as TCP or fit entirely within the MTU of a non-stream-oriented transport.

4.1.2 Record

A record is the unit for carrying a payload within a DIME message. Each record is described by a set of parameters (see section 4.2).

4.1.3 Chunked Records

Chunked records allow dynamically produced data of the same type to be encapsulated along with the information necessary for the parser to verify that it has parsed the full message. Chunked records are not a mechanism for introducing multiplexing into DIME. They are a mechanism for allowing dynamically generated data to be encapsulated as a series of records, hence minimizing the need for outbound buffering on the generating side. This is similar to the message chunking mechanism defined in HTTP/1.1 [6].

A chunked record series contains an initial chunk followed by zero or more middle chunks followed by a terminating chunk. The encoding rules are as follows:

1. The initial chunk is indicated by having the CF flag set. The type of the payload MUST be indicated in the TYPE field regardless of whether there is a non-zero payload or not.
2. Each middle chunk is marked with the CF flag set, a zero-length TYPE field, and a zero-length ID field, indicating that more data of the same type and with the same identifier is following. The TNF field MUST be set to 0x00 (see section 5.2.4).
3. The terminating chunk is indicated by having the CF flag cleared, a zero-length TYPE field, and a zero-length ID field. The TNF field MUST be set to 0x00 (see section 5.2.4).

A chunk series MUST NOT span multiple DIME messages. That is, a chunk series MUST be entirely encapsulated within a single DIME message.

4.2 DIME Payload Description

Each record contains information about the payload carried within the record. This section introduces the mechanisms by which the payload is described.

4.2.1 Payload Length

Regardless of what the relationship of a record is to other records, the payload length always indicates the length of the payload encapsulated in THIS record. The length of the payload is indicated in number of octets in the DATA_LENGTH field.

4.2.2 Payload Type

The payload type indicates the kind of data being carried in the payload. DIME makes no guarantee about whether or how a message will be processed or otherwise manipulated based on the value of the TYPE field. Rather, the field is an identification mechanism for the payload which for example can be used to pick the appropriate user application that is to receive the payload.

User applications can use the mechanism to determine the contents of a record and, based on the context, determine how to process the message.
Typically the processing context will be defined through the use of dynamically negotiated services, well-known transport service ports (TCP, UDP, etc), or out of band communication.

The value of the TYPE field is either an absolute URI or a media type construct. The format is indicated using the TNF (Type Name Format) field.

A set of registered media types is maintained by the Internet Assigned Numbers Authority (IANA [10]). The media type registration process is outlined in RFC 2048 [12]. Use of non-registered media types is discouraged.

Similarly, a set of registered URI schemes is maintained by IANA [10]. The URI scheme registration process is described in RFC 2717 [13]. It is recommended that only well-known URI schemes registered by IANA [10] be used.

URIs can be used for message types that are not expected to be registered as media types or that do not map well onto the media type mechanism. Examples of such message types are SOAP based protocols such as SOAP-RP [9]. Records that carry a payload with an XML based message type MAY use the XML namespace identifier of the root element as the TYPE field value:

    http://schemas.xmlsoap.org/rp/

Records that carry a payload with an existing, registered media type SHOULD carry a TYPE field value of that media type. For example, the media type

    message/rfc822

indicates that the payload is a MIME message as defined by RFC 2046 [11]. The registered media type

    message/http

indicates that the payload is an HTTP message as defined by RFC 2616 [6]. Note that this means that the payload is of type "message/http" and NOT a MIME message which contains an entity of type "message/http". A value of

    application/xml; charset="utf-16"

indicates that the payload is an XML document as defined by RFC 3023. Again, this means that the payload is of type "application/xml" and NOT a MIME message which contains an entity of type "application/xml".

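As an illustration of the choice between the two TYPE formats, the following Python sketch guesses which TNF value a generator might pick for a given TYPE string, using the values listed in Table 1 of section 5.2.4. The helper name choose_tnf and the loose "type/subtype" pattern are assumptions made for this example only; they approximate, and do not replace, the media-type grammar of RFC 2616 [6] and the absoluteURI grammar of RFC 2396 [4].

```python
import re
from urllib.parse import urlsplit

# TNF values as listed in Table 1 (section 5.2.4).
TNF_MEDIA_TYPE = 0x01
TNF_ABSOLUTE_URI = 0x02

# Loose "type/subtype[; parameters]" shape; an assumption, not the full grammar.
_MEDIA_TYPE = re.compile(r"^[A-Za-z0-9!#$&^_.+-]+/[A-Za-z0-9!#$&^_.+-]+\s*(;.*)?$")


def choose_tnf(type_value: str) -> int:
    """Guess the TNF value for a TYPE string (hypothetical helper)."""
    if _MEDIA_TYPE.match(type_value):
        return TNF_MEDIA_TYPE
    if urlsplit(type_value).scheme:
        # A scheme prefix such as "http:" marks an absolute URI.
        return TNF_ABSOLUTE_URI
    raise ValueError(f"neither a media type nor an absolute URI: {type_value!r}")
```

Under these assumptions, choose_tnf("message/rfc822") yields 0x01 and choose_tnf("http://schemas.xmlsoap.org/rp/") yields 0x02. A real generator would more likely be told the format by the user application than infer it from the string.
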
4.2.3 Payload Identification

The optional payload identifier allows user applications to uniquely identify a DIME payload. The reason for providing a payload with a globally unique identifier is so that it is possible to refer to that payload, for example from within SOAP messages and other documents that support hypertext links. DIME does not mandate any particular linking mechanism but leaves it to the user application to define one in the language that it prefers.

It is important that payload ids are maintained so that references to those payloads are not broken. If records are repackaged, for example by an intermediate application, then that application SHOULD ensure that the payload ids are preserved.

5. The DIME Specifications

5.1 DATA Transmission Order

The order of transmission of the DIME record described in this document is resolved to the octet level. For diagrams showing a group of octets, the order of transmission of those octets is left to right, top to bottom as they are read in English. For example, in the diagram in Figure 2, the octets are transmitted in the order they are numbered.

Figure 2: DIME Octet Ordering

                                        1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |        Octet 1        |        Octet 2        |
    |        Octet 3        |        Octet 4        |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |        Octet 5        |        Octet 6        |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Whenever an octet represents a numeric quantity, the leftmost bit in the diagram is the high order or most significant bit. That is, the bit labeled 0 is the most significant bit.

For each multi-octet field representing a numeric quantity defined by DIME, the leftmost bit of the whole field is the most significant bit. Such quantities are transmitted in a big-endian manner with the most significant octet transmitted first.

5.2 Record Layout

DIME records are variable length records with a common format illustrated in Figure 3.
In the following sections, the individual record fields are described in more detail.

Figure 3: DIME Record Layout. The use of "/" indicates a field length which is a multiple of 4 octets.

                                        1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |MB|ME|CF| TNF |           ID_LENGTH            |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                  TYPE_LENGTH                  |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                  DATA_LENGTH                  |
    |                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                TYPE + PADDING                 /
    /                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                 ID + PADDING                  /
    /                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                                               /
    /                DATA + PADDING                 /
    /                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

5.2.1 MB (Message Begin)

The MB flag is a one-bit signal that indicates the start of a DIME message (see section 4.1.1).

5.2.2 ME (Message End)

The ME flag is a one-bit signal that indicates the end of a DIME message (see section 4.1.1).

5.2.3 CF (Chunked Flag)

If the CF flag is set then this record is a chunked record (see section 4.1.3).

5.2.4 TNF (Type Name Format)

The TNF field value indicates whether the value of the TYPE field is an absolute URI or a media type construct (see section 4.2.2). The TNF field values are defined as follows:

Table 1: DIME TNF field values

    Type Name Format                          Value
    none                                      0x00
    media-type as defined in RFC 2616 [6]     0x01
    absoluteURI as defined in RFC 2396 [4]    0x02
    reserved                                  0x03

The reserved TNF field values are reserved for future use and MUST NOT be used.

The value 0x00 MUST be used in all but the first chunk in a chunked record series (see section 4.1.3). It SHOULD NOT be used in any other record.

5.2.5 TYPE_LENGTH

An unsigned 16 bit integer that specifies the length in octets of the TYPE field (see section 4.2.2).

5.2.6 TYPE

The value of the TYPE field is an identifier of the type indicated by the TNF field.
It can be either a globally unique identifier in the form of an absoluteURI [4] or a media-type [6] that describes the type of the payload (see section 4.2.2).

With the exception of chunked records (see section 4.1.3), all records MUST have a non-zero-length TYPE field. There is no default value for the TYPE field.

The length of the TYPE field MUST be a multiple of 4 octets. If the length of the payload type value is not a multiple of 4 octets, the generator MUST pad the value with all zero octets and this padding is not included in the TYPE length field. The generator should never pad with more than 3 octets. The parser MUST ignore the padding octets.

We STRONGLY RECOMMEND that the identifier be globally unique and maintained with stable and well-defined semantics over time.

5.2.7 ID_LENGTH

An unsigned 11 bit integer that specifies the length in octets of the ID field (see section 4.2.3).

5.2.8 ID

The value of the ID field is a unique identifier in the form of a URI [4] that refers to THIS payload and this payload only (see section 4.2.3). With the exception of chunked records, all records MAY have a non-zero-length ID field.

The uniqueness of the message identifier is guaranteed by the generator. We STRONGLY RECOMMEND that the identifier be as unique over space and time as possible. It is RECOMMENDED that identity values be either Universally Unique Identifiers (UUIDs), as illustrated in the example above, or generated from message content using cryptographic hash algorithms such as MD5.

The length of the ID field MUST be a multiple of 4 octets. If the length of the payload id value is not a multiple of 4 octets, the generator MUST pad the value with all zero octets and this padding is not included in the ID length field. The generator should never pad with more than 3 octets. The parser MUST ignore the padding octets.

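To make the header fields and padding rules concrete, here is a minimal Python sketch of the generator side, following the layout in Figure 3 together with the DATA_LENGTH and DATA fields shown there. The function names (pad4, encode_record, encode_chunk_series) are hypothetical, and the 2-bit width of TNF is inferred from the figure: 16 bits minus the three flags and the 11-bit ID_LENGTH. Treat it as an illustration under those assumptions rather than a reference implementation.

```python
import struct

# TNF values from Table 1 (section 5.2.4).
TNF_NONE, TNF_MEDIA_TYPE, TNF_ABSOLUTE_URI = 0x00, 0x01, 0x02


def pad4(value: bytes) -> bytes:
    """Append zero octets (at most 3) so the length is a multiple of 4."""
    return value + b"\x00" * (-len(value) % 4)


def encode_record(data: bytes, type_: bytes = b"", id_: bytes = b"",
                  tnf: int = TNF_NONE, mb: bool = False, me: bool = False,
                  cf: bool = False) -> bytes:
    """Serialize one record following Figure 3 (hypothetical helper)."""
    if tnf not in (TNF_NONE, TNF_MEDIA_TYPE, TNF_ABSOLUTE_URI):
        raise ValueError("reserved TNF values must not be used")
    if len(id_) >= 1 << 11 or len(type_) >= 1 << 16 or len(data) >= 1 << 32:
        raise ValueError("value too long for its length field")
    # Bit 0 is the most significant bit: MB, ME, CF, 2-bit TNF, 11-bit ID_LENGTH.
    first = (mb << 15) | (me << 14) | (cf << 13) | (tnf << 11) | len(id_)
    # Big-endian header: 16-bit flags word, 16-bit TYPE_LENGTH, 32-bit DATA_LENGTH.
    header = struct.pack(">HHI", first, len(type_), len(data))
    # The length fields exclude padding; the fields themselves are padded.
    return header + pad4(type_) + pad4(id_) + pad4(data)


def encode_chunk_series(pieces, type_: bytes, id_: bytes, tnf: int,
                        mb: bool = False, me: bool = False) -> bytes:
    """Encode payload pieces as one chunk series (rules of section 4.1.3)."""
    if len(pieces) < 2:
        raise ValueError("need an initial and a terminating chunk")
    # Initial chunk: CF set, carries TYPE and ID.
    records = [encode_record(pieces[0], type_, id_, tnf, mb=mb, cf=True)]
    # Middle chunks: CF set, zero-length TYPE and ID, TNF 0x00.
    records += [encode_record(piece, cf=True) for piece in pieces[1:-1]]
    # Terminating chunk: CF cleared, zero-length TYPE and ID, TNF 0x00.
    records.append(encode_record(pieces[-1], me=me))
    return b"".join(records)
```

A single-record message would then be encode_record(payload, b"text/plain", tnf=TNF_MEDIA_TYPE, mb=True, me=True). Keeping padding in one helper means the length fields are always computed from the unpadded values, which is the point most easily gotten wrong.
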
5.2.9 DATA_LENGTH

The DATA_LENGTH field is an unsigned 32 bit integer that specifies the length in octets of the DATA field excluding any padding used to achieve a 4 octet alignment of the DATA field (see section 4.2.1).

5.2.10 DATA

The DATA field carries the payload intended for the DIME user application. Any internal structure of the data carried within the DATA field is opaque to DIME.

The length of the DATA field MUST be a multiple of 4 octets. If the length of the payload is not a multiple of 4 octets, the generator MUST pad the value with all zero octets and this padding is not included in the DATA length field. The generator should never pad with more than 3 octets. The parser MUST ignore the padding octets.

6. Security Considerations

Implementers should pay special attention to the security implications of any record types that can cause the remote execution of any actions in the recipient's environment. Before accepting records of any type an application should be aware of the particular security implications associated with that type.

7. Acknowledgements

TBD

8. References

[1] J. B. Postel, "Simple Mail Transfer Protocol", RFC 821, ISI, August 1982

[2] S. Bradner, "The Internet Standards Process -- Revision 3", RFC 2026, Harvard University, October 1996

[3] S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119, Harvard University, March 1997

[4] T. Berners-Lee, R. Fielding, L. Masinter, "Uniform Resource Identifiers (URI): Generic Syntax", RFC 2396, MIT/LCS, U.C. Irvine, Xerox Corporation, August 1998

[5] M. Handley, H. Schulzrinne, E. Schooler, J. Rosenberg, "SIP: Session Initiation Protocol", RFC 2543, ACIRI, Columbia U., Cal Tech, Bell Labs, March 1999

[6] R. Fielding, J. Gettys, J. C. Mogul, H. F. Nielsen, T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, U.C. Irvine, DEC W3C/MIT, DEC, W3C/MIT, W3C/MIT, January 1997

[7] W3C Note "Simple Object Access Protocol (SOAP) 1.1", May 2000

[8] W3C Note "SOAP Messages with Attachments", December 2000

[9] H. F. Nielsen, S. Thatte, "SOAP Routing Protocol", March 2001

[10] J. Reynolds, J. Postel, "Assigned Numbers", STD 2, RFC 1700, October 1994

[11] N. Freed, N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, Innosoft, First Virtual, November 1996

[12] N. Freed, J. Klensin, J. Postel, "Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures", RFC 2048, Innosoft, MCI, ISI, November 1996

[13] R. Petke, I. King, "Registration Procedures for URL Scheme Names", BCP 35, RFC 2717, UUNET Technologies, Microsoft Corporation, November 1999
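
As a closing illustration of how the message rules of section 4.1.1, the chunking rules of section 4.1.3 and the record layout of section 5.2 fit together on the parser side, the following Python sketch walks a complete DIME message and yields one reassembled payload per record or chunk series. The name iter_payloads and the choice to raise ValueError on malformed input are assumptions made for this example; DIME itself leaves error handling to the user application.

```python
import struct


def iter_payloads(message: bytes):
    """Yield (tnf, type, id, data) per payload, joining chunk series."""
    pos, first_record, ended = 0, True, False
    pending = None  # [tnf, type, id, bytearray] while inside a chunk series
    while pos < len(message):
        if ended:
            raise ValueError("record found after the ME record")
        # 8-octet header, big-endian: flags word, TYPE_LENGTH, DATA_LENGTH.
        word, type_len, data_len = struct.unpack_from(">HHI", message, pos)
        mb, me, cf = bool(word & 0x8000), bool(word & 0x4000), bool(word & 0x2000)
        tnf, id_len = (word >> 11) & 0x3, word & 0x07FF
        if mb != first_record:
            raise ValueError("MB must be set on the first record only")
        pos += 8
        parts = []
        for length in (type_len, id_len, data_len):
            parts.append(message[pos:pos + length])
            pos += length + (-length % 4)  # step over the zero padding
        if pos > len(message):
            raise ValueError("truncated record")
        type_, id_, data = parts
        if pending is None:
            if cf:
                pending = [tnf, type_, id_, bytearray(data)]  # initial chunk
            else:
                yield tnf, type_, id_, data  # ordinary, unchunked record
        else:
            if tnf != 0 or type_ or id_:
                raise ValueError("middle/terminating chunks carry no TYPE or ID")
            pending[3] += data
            if not cf:  # terminating chunk closes the series
                yield pending[0], pending[1], pending[2], bytes(pending[3])
                pending = None
        first_record, ended = False, me
    if pending is not None or not ended:
        raise ValueError("message ended inside a chunk series or without ME")
```

Because the function is a generator, a caller can hand each payload to the user application as soon as its record or chunk series is complete, although this sketch still expects the whole message to be in memory.
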