

Messages consist of a fixed-size header and a variable-length, opaque byte-array payload. The header contains a format version and a CRC32 checksum to detect corruption or truncation. Leaving the payload opaque is the right decision: serialization libraries are evolving rapidly, and no single choice is likely to be right for every use. Needless to say, an application using Kafka would likely mandate a particular serialization format as part of its usage. The MessageSet interface is simply an iterator over messages with specialized methods for bulk reading and writing to an NIO Channel.
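To make the MessageSet idea concrete, here is a minimal sketch of an "iterator plus bulk write" interface in Java. The names `SimpleMessageSet` and `InMemoryMessageSet`, and the framing of each entry as a 4-byte length followed by the message bytes, are hypothetical choices for illustration, not Kafka's actual classes or on-disk layout.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical message set: iterate over messages, or write them all to a channel at once. */
interface SimpleMessageSet extends Iterable<ByteBuffer> {
    /** Size in bytes of the whole set as it would be written to a channel. */
    int sizeInBytes();

    /** Bulk-write the whole set to a channel; returns the number of bytes written. */
    long writeTo(WritableByteChannel channel) throws IOException;
}

/** In-memory set; each entry is framed as [4-byte length][message bytes] (an assumed framing). */
final class InMemoryMessageSet implements SimpleMessageSet {
    private final ByteBuffer buffer;

    InMemoryMessageSet(List<byte[]> messages) {
        int total = 0;
        for (byte[] m : messages) total += 4 + m.length;
        buffer = ByteBuffer.allocate(total);
        for (byte[] m : messages) buffer.putInt(m.length).put(m);
        buffer.flip(); // ready for reading: position 0, limit = total
    }

    @Override public int sizeInBytes() { return buffer.limit(); }

    @Override public long writeTo(WritableByteChannel channel) throws IOException {
        ByteBuffer view = buffer.duplicate(); // own position, shared content
        long written = 0;
        while (view.hasRemaining()) written += channel.write(view);
        return written;
    }

    @Override public Iterator<ByteBuffer> iterator() {
        ByteBuffer view = buffer.duplicate();
        return new Iterator<ByteBuffer>() {
            @Override public boolean hasNext() { return view.remaining() >= 4; }

            @Override public ByteBuffer next() {
                if (!hasNext()) throw new NoSuchElementException();
                int size = view.getInt();
                ByteBuffer message = view.slice(); // zero-copy view of the payload
                message.limit(size);
                view.position(view.position() + size);
                return message;
            }
        };
    }
}
```

The two access paths mirror the description: callers that care about individual messages use the iterator, while the bulk `writeTo` path moves the whole set to a channel as one block of bytes.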

/**
 * A message. The format of an N byte message is the following:
 *
 * If magic byte is 0
 *
 * 1. 1 byte "magic" identifier to allow format changes
 * 2. 4 byte CRC32 of the payload
 * 3. N - 5 byte payload
 *
 * If magic byte is 1
 *
 * 1. 1 byte "magic" identifier to allow format changes
 * 2. 1 byte "attributes" identifier to allow annotations on the message independent of the version (e.g. compression enabled, type of codec used)
 * 3. 4 byte CRC32 of the payload
 * 4. N - 6 byte payload
 */
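A minimal sketch of encoding and decoding the magic 0 and magic 1 layouts described in the comment above, using `java.util.zip.CRC32` for the checksum. The class name `LegacyMessage` is hypothetical; the attributes byte is kept raw because the comment does not spell out its bit layout, and the message length N is assumed to be known from whatever framing surrounds the message.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical codec for the magic 0 / magic 1 message layout. */
final class LegacyMessage {
    final byte magic;
    final byte attributes; // forced to 0 for magic 0, where the field is absent
    final byte[] payload;

    LegacyMessage(byte magic, byte attributes, byte[] payload) {
        if (magic != 0 && magic != 1) throw new IllegalArgumentException("unknown magic " + magic);
        this.magic = magic;
        this.attributes = magic == 0 ? 0 : attributes;
        this.payload = payload;
    }

    static long crc(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue(); // unsigned 32-bit value held in a long
    }

    /** N bytes: magic, [attributes if magic == 1], 4-byte CRC32 of the payload, payload. */
    byte[] encode() {
        int headerSize = magic == 0 ? 5 : 6;
        ByteBuffer buf = ByteBuffer.allocate(headerSize + payload.length);
        buf.put(magic);
        if (magic == 1) buf.put(attributes);
        buf.putInt((int) crc(payload)); // store the low 32 bits
        buf.put(payload);
        return buf.array();
    }

    /** Decodes an N-byte message and rejects it if the stored CRC does not match the payload. */
    static LegacyMessage decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte magic = buf.get();
        if (magic != 0 && magic != 1) throw new IllegalArgumentException("unknown magic " + magic);
        byte attributes = magic == 1 ? buf.get() : 0;
        long storedCrc = buf.getInt() & 0xFFFFFFFFL;
        byte[] payload = new byte[buf.remaining()]; // N - 5 or N - 6 bytes
        buf.get(payload);
        if (crc(payload) != storedCrc) throw new IllegalStateException("corrupt message: CRC mismatch");
        return new LegacyMessage(magic, attributes, payload);
    }
}
```

A flipped bit anywhere in the payload changes the computed CRC, so `decode` throws instead of handing corrupted bytes to the application.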



/**
 * 1. 4 byte CRC32 of the message
 * 2. 1 byte "magic" identifier to allow format changes, value is 0 or 1
 * 3. 1 byte "attributes" identifier to allow annotations on the message independent of the version
 *    bit 0 ~ 2 : Compression codec.
 *      0 : no compression
 *      1 : gzip
 *      2 : snappy
 *      3 : lz4
 *    bit 3 : Timestamp type
 *      0 : create time
 *      1 : log append time
 *    bit 4 ~ 7 : reserved
 * 4. (Optional) 8 byte timestamp only if "magic" identifier is greater than 0
 * 5. 4 byte key length, containing length K
 * 6. K byte key
 * 7. 4 byte payload length, containing length V
 * 8. V byte payload
 */
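This later layout moves the CRC to the front and adds a key, an optional timestamp, and attribute bits. The sketch below encodes such a message, reads the value back, and decodes the attribute bits listed in the comment. The class `MessageV1Codec` is hypothetical; it assumes that the "CRC32 of the message" covers every byte after the CRC field itself (magic through value), and it does not handle null keys or values.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical codec for the CRC-first layout with key, value, and optional timestamp. */
final class MessageV1Codec {
    static final int CODEC_MASK = 0x07;          // bits 0 ~ 2: compression codec
    static final int TIMESTAMP_TYPE_MASK = 0x08; // bit 3: timestamp type
    private static final String[] CODECS = {"none", "gzip", "snappy", "lz4"};

    static String codec(byte attributes) {
        int id = attributes & CODEC_MASK;
        return id < CODECS.length ? CODECS[id] : "unknown(" + id + ")";
    }

    static String timestampType(byte attributes) {
        return (attributes & TIMESTAMP_TYPE_MASK) == 0 ? "create time" : "log append time";
    }

    /** The 8-byte timestamp is written only when magic > 0. */
    static byte[] encode(byte magic, byte attributes, long timestamp, byte[] key, byte[] value) {
        int size = 4 + 1 + 1 + (magic > 0 ? 8 : 0) + 4 + key.length + 4 + value.length;
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.position(4); // leave room for the CRC
        buf.put(magic).put(attributes);
        if (magic > 0) buf.putLong(timestamp);
        buf.putInt(key.length).put(key);
        buf.putInt(value.length).put(value);
        CRC32 crc = new CRC32();
        crc.update(buf.array(), 4, size - 4); // assumption: CRC covers all bytes after the CRC field
        buf.putInt(0, (int) crc.getValue());
        return buf.array();
    }

    static boolean crcMatches(byte[] message) {
        long stored = ByteBuffer.wrap(message).getInt(0) & 0xFFFFFFFFL;
        CRC32 crc = new CRC32();
        crc.update(message, 4, message.length - 4);
        return crc.getValue() == stored;
    }

    /** Skips the header and key, then returns the V-byte payload. */
    static byte[] value(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        buf.position(4);              // skip CRC
        byte magic = buf.get();
        buf.get();                    // attributes
        if (magic > 0) buf.getLong(); // timestamp
        int keyLength = buf.getInt();
        buf.position(buf.position() + keyLength);
        byte[] value = new byte[buf.getInt()];
        buf.get(value);
        return value;
    }
}
```

Note that the magic value decides the header size, so a reader must inspect it before it can locate the key length.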



Messages (aka Records) are always written in batches. The technical term for a batch of messages is a record batch, and a record batch contains one or more records. In the degenerate case, a record batch could contain a single record. Record batches and records each have their own headers. The format of each is described below for Kafka version 0.11.0 and later (message format version v2, or magic=2). Message formats 0 and 1 are documented separately.

baseOffset: int64
batchLength: int32
partitionLeaderEpoch: int32
magic: int8 (current magic value is 2)
crc: int32
attributes: int16
    bit 0~2:
        0: no compression
        1: gzip
        2: snappy
        3: lz4
    bit 3: timestampType
    bit 4: isTransactional (0 means not transactional)
    bit 5: isControlBatch (0 means not a control batch)
    bit 6~15: unused
lastOffsetDelta: int32
firstTimestamp: int64
maxTimestamp: int64
producerId: int64
producerEpoch: int16
baseSequence: int32
records: [Record]
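A minimal sketch of reading the fixed-size v2 batch header fields in the order listed above and decoding the `attributes` bits. It assumes big-endian (network) byte order, which is also `ByteBuffer`'s default; `RecordBatchHeader` is a hypothetical name. It does not verify `crc` or parse the `records` array that follows, and it reads the timestampType bit with the same meaning as in the older format (0 = create time, 1 = log append time).

```java
import java.nio.ByteBuffer;

/** Hypothetical holder for the fixed-size fields of a v2 record batch header. */
final class RecordBatchHeader {
    // baseOffset..baseSequence: 8+4+4+1+4+2+4+8+8+8+2+4 = 57 bytes
    static final int SIZE = 8 + 4 + 4 + 1 + 4 + 2 + 4 + 8 + 8 + 8 + 2 + 4;

    long baseOffset; int batchLength; int partitionLeaderEpoch; byte magic; int crc;
    short attributes; int lastOffsetDelta; long firstTimestamp; long maxTimestamp;
    long producerId; short producerEpoch; int baseSequence;

    /** Reads the header at the buffer's current position; the records follow at buf.position(). */
    static RecordBatchHeader read(ByteBuffer buf) {
        RecordBatchHeader h = new RecordBatchHeader();
        h.baseOffset = buf.getLong();
        h.batchLength = buf.getInt();
        h.partitionLeaderEpoch = buf.getInt();
        h.magic = buf.get();
        if (h.magic != 2) throw new IllegalArgumentException("not a v2 batch, magic=" + h.magic);
        h.crc = buf.getInt(); // read but not verified in this sketch
        h.attributes = buf.getShort();
        h.lastOffsetDelta = buf.getInt();
        h.firstTimestamp = buf.getLong();
        h.maxTimestamp = buf.getLong();
        h.producerId = buf.getLong();
        h.producerEpoch = buf.getShort();
        h.baseSequence = buf.getInt();
        return h;
    }

    int compressionCodec() { return attributes & 0x07; }           // bit 0~2
    boolean logAppendTime() { return (attributes & 0x08) != 0; }   // bit 3: timestampType
    boolean isTransactional() { return (attributes & 0x10) != 0; } // bit 4
    boolean isControlBatch() { return (attributes & 0x20) != 0; }  // bit 5
    long lastOffset() { return baseOffset + lastOffsetDelta; }     // offset of the batch's last record
}
```

Storing a single `baseOffset` plus a `lastOffsetDelta` lets a reader learn the batch's offset range from the header alone, without decoding any records.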


