ZCM Type System
This page describes the ZCM Type System grammar, encoding, and type hashes in very formal terms. Unless you're intimately concerned with the subtlties, feel free to skim this document, and refer back as reference.
Grammar
Primitives
int8_t | 8-bit signed integer |
int16_t | 16-bit signed integer |
int32_t | 32-bit signed integer |
int64_t | 64-bit signed integer |
float | 32-bit IEEE floating point value |
double | 64-bit IEEE floating point value |
string | UTF-8 string |
boolean | true/false logical value |
byte | 8-bit value |
Specification
The grammar is given in EBNF using regex-style repetition and character classes:
file = zcmtype*
zcmtype = 'struct' name '{' field* '}'
field = const_field | data_field
const_field = 'const' const_type numbits? name '=' const_literal ';'
const_type = int_type | float_type | 'byte'
const_literal = hex_literal | int_literal | float_literal
data_field = type numbits? name arraydim* ';'
type = primative | name
primative = int_type | float_type | 'string' | 'boolean' | 'byte'
int_type = 'int8_t' | 'int16_t' | 'int32_t' | 'int64_t'
float_type = 'float' | 'double'
numbits = ':' int_literal
arraydim = '[' arraysize ']'
arraysize = name | uint_literal
name = underalpha underalphanum*
underalpha = [A-Za-z_]
underalphanum = [A-Za-z0-9_]
hex_literal = "0x" | hexdigit+
hexdigit = [0-9A-Fa-f]
uint_literal = [0-9]+
int_literal = '-'? uint_literal
Semantic Constraints
Using the grammar above, to be well-formed the following constraints must be satisfied:
- Each field's name must be unique
- Names used for array sizes must refer to a field in the same 'zcmtype' that has a scalar integer type
- Names used for 'type' refer to other 'zcmtype' definitions
- These may exist in other files
- The absolute value of the int_literal specified for numbits must always be less than the number of bits of the corresponding type
- Sign extension of bitfields via negative numbits are not allowed on
byte
type. See the bitfields section below for how sign extension works.
Encoding formats
Note that if your machine architecture does not natively support int8_t
and uint8_t
types, signed zcmtype members may not decode negative numbers properly. Similarly sign
extension on bitfields may also not function properly. These are known issues and if you need
them addressed, please create an issue on
zcm's github issue page.
Primitives
Type | Encoded Size | Format |
---|---|---|
int8_t | 1 byte | X |
int16_t | 2 bytes | XX |
int32_t | 4 bytes | XXXX |
int64_t | 8 bytes | XXXXXXXX |
float | 4 bytes | XXXX |
double | 8 bytes | XXXXXXXX |
string | 4+len+1 bytes | LLLL<chars>N |
boolean | 1 byte | X |
byte | 1 byte | X |
_bitfield_ | bitpacked with neighbors | |+ |
Where:
- X is a data byte
- L is a length byte
- N is a null byte
- | is an individual bit
Bitfields
Bitfields are integer types with a specified number of bits that are bitpacked during encoding.
Neighboring bitfields will be packed tightly, wasting no bits in between (not necessarily
maintaining byte alignment). This is unlike all other type encodings which maintain byte alignment.
Bitfields currently only support big endian encoding. All bitfields will behave exactly like
their non-bitfield type in all regards other than encoding and decoding. Sign extension is
configurable by specifying a negative sign before the number of bits in the bitfield. When
encoding a type that contains an int8_t:3
with the value set to 0b111
, you should expect the
decoded message to contain a 7
as the value of this variable. A type with an int8_t:-3
with
the value set to 0b111
will have its sign extended upon decode. You should expect the received
value to be -1
(0b11111111
).
byte
is unsigned for any language that supports unsigned types. When encoding a type that
contains a byte:3
with the value set to 0b111
, you should expect the decoded message to
contain a 7
as the value of this variable for languages that support unsigned types.
For languages that do not support unsigned types (ahem java...) you should still expect the
decoded message to contain a 7
as the value of this variable. However, for a type containing
a byte:8
with the value set to 0xff
, you should expect the decoded message to contain a
255
for languages that support unsigned types and a -1
for languages that do not.
Array Types
Array types are encoded as a simple series of the element type. The encoding does NOT include a length field for the dimensions. For static array dimensions, the size is already known by the decoder. For dynamic array dimensions, the size is encoded in another field (as mandated by the grammar). For these reasons, there is zero encoding overhead for arrays. This includes nested types.
Recursive/Nested Types
Nested types are also encoded with zero overhead. Since the decoder knows the layout, there is no reason to encode type metadata. Circular type dependencies are not currently supported.
Type Hashes
Note: Announcement on membername hashing found here
The optimized encoding formats specified above are made possible using a type hash. Each encoded message starts with a 64-bit hash field. As seen above, for one message, this is the only size overhead in ZCM Type encodings. Without the hash, the encoded data is at maximum the same size as an equivalent C struct. Further, the hash is a unique type identifier. The hash allows a decoder function to verify that a binary blob of data is encoded as expected.
To acheive this lofty goal, it is crucial to get the type hash computation right. We must ensure that that a hash uniquely identifies a type layout. The hash is not intended to be cryptographic, but instead to catch programming and configuration errors.
Hashing primatives:
i64 hashbyte(i64 hash, byte v)
{
return ((((u64)hash)<<8) ^ (((u64)hash)>>53)) + v;
}
i64 hashstring(i64 hash, string s)
{
hashbyte(s.length);
for (b in s)
hashbyte(b);
}
Hashing zcmtypes:
i64 hashtype()
{
i64 hash = 0x12345678;
if (HASH_TYPENAME)
hash = hashstring(hash, zcmtype_name);
for (fld in fields) {
if (HASH_MEMBER_NAMES)
hash = hashstring(hash, fld.name);
// Hash the type (only if its a primative)
if (isPrimativeType(fld.typename))
hash = hashstring(hash, fld.typename);
// Hash the array dimmensionality
hash = hashbyte(hash, fld.numdims)
for (dim in fld.dimlist) {
hash = hashbyte(hash, dim.mode); // static (0) or dynamic (1)
hash = hashstring(hash, dim.size); // the text btwn [] from the .zcm file
}
}
}
The hashing function above works well, but an observent reader will quickly notice that it completely ignores nested zcmtypes. This is done because zcmtypes may be defined in different files and thus, the type generator may not have access to their definitions. To resolve this, ZCM defers the final hash computation until runtime, when it can use all dependent types.
The final hash computation will be triggered on a type's first runtime use and will recurse into nested types as needed. The hash code computed above in hashtype() is typically called the base hash because it's used as the starting point in the recursive-nested hash computation. The recursive computation is fairly simple. The algorithm proceeds as follows:
i64 TYPE_hash_recursive()
{
u64 hash = BASE_HASH;
+ SUBTYPE1_hash_recursive()
+ SUBTYPE2_hash_recursive()
+ SUBTYPE3_hash_recursive()
...;
return ROTL(hash, 1); // rotate left by 1
}
Packages
Zcmgen allows the user to specify the package of the zcmtype which will then be used on a
language-by-language bases to group types into namespaces, modules, etc. The semantics for
specifying the package are as shown in the example below, which constructs a type bar
within the package foo
. Note that the specified package can actually be multiple nested
packages, ie replacing foo
with foo1.foo2
would instead place the type bar
within
the package foo2
which itself is within the package foo1
.
package foo;
struct bar {
baz b;
.qux q;
};
When a type belongs to a package, all nonprimitive types within that type are assumed to
also be from that package. In the example above, the zcmtype foo.bar
contains a member
b
of type foo.baz
(ie the package foo
is automatically prepended to the specified
type baz
because the zcmtype bar
is from the package foo
). Should the user wish
to specify a type that does not belong to the same package as the containing type, they
can prepend the type with a .
as in the case of the member q
from the example, which
will not belong to any package. This also allows the user to specify a member type from a
completely separate package by prepending a leading .
before the package. For instance,
if the zcmtype qux
actually belonged to a package quuz
(that is not part of foo
),
replacing .qux
with .quuz.qux
would properly specify the desired type.
Note also that although some languages allow unqualified access to types from
parent packages, the zcmtype specification does not. Specifically, for the following
2 types, note that t2
must specify its t1
member as existing within the package
.foo
even though t2
itself exists within a child package of foo
.
package foo;
struct t1 {
int8_t a;
};
package foo.bar;
struct t2 {
.foo.t1 b;
};
Back Home