While googling for binary formats I found this web page and decided to try it out. Universal Binary JSON attempts to make using JSON more efficient and tries to cover
Entries follow this format:
[type, 1-byte char]([integer numeric length])([data])
The type is required. The length and data fields are optional.
The type is a 1-byte char
| type | char | Hex | Size (bytes) |
|---|---|---|---|
| NULL | Z | 0x5A | 1 |
| no-op | N | 0x4E | 1 |
| true | T | 0x54 | 1 |
| false | F | 0x46 | 1 |
| int8 | i | 0x69 | 2 |
| uint8 | U | 0x55 | 2 |
| int16 | I | 0x49 | 3 |
| int32 | l | 0x6C | 5 |
| int64 | L | 0x4C | 9 |
| float32 | d | 0x64 | 5 |
| float64 | D | 0x44 | 9 |
| high-precision number | H | 0x48 | 1 + int num value + string bytes |
| char | C | 0x43 | 2 |
| string | S | 0x53 | 1 + int num value + string bytes |
| array | [] | 0x5B, 0x5D | 2 + bytes |
| object | {} | 0x7B, 0x7D | 2 + bytes |
Note the size in bytes. The 'value' of NULL, No-Op, true, and false types are implicit. There's no need to specify a value in addition to the type. Other types do require both a byte to specify the type and one or more bytes to provide the value. So an 8-bit value (signed, unsigned, or char) takes two bytes instead of one. One byte for the 'i', 'C', or 'U' and another for the value. A 16 bit integer takes 3 bytes. One for the type 'I' and two for the value. It's an increase over packing binary data in a known order, but compared to plain old JSON text, we're likely saving time and probably saving space.
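The sizes in the table can be captured in a small lookup. This is only a sketch of the table above, not anything taken from a UBJSON library; the function name `fixedEncodedSize` is made up for illustration, and it returns 0 for the variable-length types (H, S, arrays, objects) and for unknown markers.

```cpp
#include <cstddef>

// Total encoded size in bytes (type marker + payload) for the fixed-size
// UBJSON types in the table. Returns 0 for variable-length types and for
// markers this sketch does not know about.
std::size_t fixedEncodedSize(char marker) {
    switch (marker) {
        case 'Z': case 'N': case 'T': case 'F':
            return 1;  // the value is implied by the marker itself
        case 'i': case 'U': case 'C':
            return 2;  // marker + 1 byte
        case 'I':
            return 3;  // marker + 2 bytes
        case 'l': case 'd':
            return 5;  // marker + 4 bytes
        case 'L': case 'D':
            return 9;  // marker + 8 bytes
        default:
            return 0;  // H, S, containers, or an unknown marker
    }
}
```

For example, `fixedEncodedSize('I')` gives 3: one byte for the 'I' and two for the value.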
Interestingly, there doesn't appear to be an unsigned version of the larger integers. Not in 1.0 at least. Version 2.0 does allow for larger unsigned values.
Strings require the length (as an integer) and the individual bytes in the string
[S][i][6][ubjson]
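As a rough sketch of that layout, here is a hand-rolled string encoder using only the standard library. The helper name `appendString` is my own, and it assumes the length fits in an int8 (at most 127 bytes); longer strings would need one of the wider integer types for the length.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Append a UBJSON string as [S][i][length][bytes].
// Sketch only: the length is always written as an int8.
void appendString(std::vector<std::uint8_t>& out, const std::string& text) {
    if (text.size() > 127) {
        throw std::length_error("this sketch only handles int8 lengths");
    }
    out.push_back('S');                                     // string marker
    out.push_back('i');                                     // length is an int8
    out.push_back(static_cast<std::uint8_t>(text.size()));  // the length itself
    out.insert(out.end(), text.begin(), text.end());        // raw string bytes
}
```

Calling `appendString(out, "ubjson")` appends 9 bytes: 0x53, 0x69, 0x06, then the six characters.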
The docs use arrays of int8s to specify the key name without using the 'S'.
[[]
[i][7][key-name][S][i][5][value]
[]]
Arrays and objects each have two marker characters: a starting char ([ or {) and an ending char (] or }).
Containers can be further optimized by type and count:
| type | char | bytes | arg type | example |
|---|---|---|---|---|
| type | $ | 1 | value type or container | [$][S] type strings |
| count | # | 1 | Integer | [#][i][64] count of 64 |
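To see what the two markers buy, here is a sketch of a float32 array written with both a type and a count. It follows my reading of the UBJSON spec: the type marker comes before the count, the elements then carry no per-value 'd', and the closing ']' is dropped. The function name `encodeFloat32Array` is hypothetical, and the count is limited to an int8 to keep the example short.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Encode floats as a typed, counted UBJSON array:
// [[][$][d][#][i][count] followed by count big-endian float32 payloads.
// Assumes float is a 32-bit IEEE-754 value.
std::vector<std::uint8_t> encodeFloat32Array(const std::vector<float>& values) {
    static_assert(sizeof(float) == 4, "expects a 32-bit float");
    if (values.size() > 127) {
        throw std::length_error("this sketch only handles int8 counts");
    }
    std::vector<std::uint8_t> out;
    out.push_back('[');  // array start
    out.push_back('$');  // type marker follows
    out.push_back('d');  // every element is a float32
    out.push_back('#');  // count follows
    out.push_back('i');  // count is an int8
    out.push_back(static_cast<std::uint8_t>(values.size()));
    for (float value : values) {
        std::uint32_t bits;
        std::memcpy(&bits, &value, sizeof bits);  // reinterpret float as raw bits
        for (int shift = 24; shift >= 0; shift -= 8) {
            out.push_back(static_cast<std::uint8_t>((bits >> shift) & 0xFF));
        }
    }
    return out;  // no closing ']' because the count was given
}
```

With this layout each float costs 4 bytes instead of 5, plus a fixed 6-byte header for the whole array.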
No doubt simply packing bytes in a known order will be more efficient space and computation wise, but considerably less flexible.
The space overhead compared to packed bytes isn't great. The array and object types look like they'll provide the best performance for my use case. I have to send objects with arrays of float32 values from an image analysis backend to a GUI and data processor at runtime. The values returned from the backend
There are a number of libraries for the languages used by my projects (C, C++, and Java). Right now I'm most interested in finding a C++ implementation.
Unfortunately protoc and UbjsonCpp appear to be inactive. So I'll try out jsoncons.
Jsoncons appears to be a general purpose JSON library that also happens to have a UBJSON implementation. It looks pretty well liked (509 stars at the time of this writing) and is still active.
jsoncons is a C++ header-only library for constructing JSON and JSON-like data formats.
Played around with it a bit
As a simple test, I might need to send the hit location on a flat object. This could be a gaze location on a tablet or the intersection of a "ray" cast from a light gun to a flat panel display. I could send an X/Y intersection, a hit flag, and an object name. In regular JSON it would look like this.
jsoncons provides a parse function that can take a string and transform it into an object
// basic, send x, y intersection, hit flag and object name
ojson json1 = ojson::parse(R"({"x": 160.0, "y": 112.5, "hit": true, "object": "Main Screen"})");
Once we have this object, it can be encoded as UBJSON with ubjson::encode_ubjson
// Encode a basic_json value to UBJSON
std::vector<uint8_t> jdata1;
ubjson::encode_ubjson(json1, jdata1);
...
// output the original json
json jsonFromString = ubjson::decode_ubjson<json>(jdata1);
std::cout << "UBJSON\n" << pretty_print(jsonFromString) << "\n\n";
The following code
using namespace jsoncons;
int
Gives this output
{
}
The above JSON object can be represented as Universal Binary JSON with
[{]
[i][1][x][d][160.0]
[i][1][y][d][112.5]
[i][3][hit][T]
[i][6][object][S][i][11][Main Screen]
[}]
As a JSON object, you start the UBJSON with { and optionally end it with }
[{]
...
[}]
The actual key/value pairs are pretty straightforward
"x": 160.0,
The key 'x' is a single character, so its length fits in an int8 (or uint8). I'm going with int8 to match the examples on the UBJSON page. We tell the parser that the key is one character long with
[i][1]
in hex
0x69, 0x01,
followed by the 'x' character
[i][1][x]
or 0x78 in hex
0x69, 0x01, 0x78, // [x] key is 'x'
The actual value is a 32-bit float, which is indicated by type 'd' followed by the big-endian representation of 160
[d][160.0]
in hex
0x64, // [d] type is float32
0x43, 0x20, 0x00, 0x00, // [160.0] value: the big endian bytes for 160.0
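A quick way to sanity-check those four bytes is to turn them back into a float. This is a small standard-library sketch, not part of jsoncons; `decodeFloat32BE` is a name I made up, and it assumes `float` is a 32-bit IEEE-754 value, which holds on mainstream platforms.

```cpp
#include <cstdint>
#include <cstring>

// Rebuild a float32 from the 4 big-endian bytes that follow a 'd' marker.
float decodeFloat32BE(const std::uint8_t bytes[4]) {
    static_assert(sizeof(float) == 4, "expects a 32-bit float");
    // Most significant byte first, so shift it into the top of the word.
    const std::uint32_t bits = (std::uint32_t{bytes[0]} << 24) |
                               (std::uint32_t{bytes[1]} << 16) |
                               (std::uint32_t{bytes[2]} << 8) |
                               std::uint32_t{bytes[3]};
    float value;
    std::memcpy(&value, &bits, sizeof value);  // reinterpret the bits as a float
    return value;
}
```

Feeding it 0x43, 0x20, 0x00, 0x00 gives back 160.0. By the same arithmetic, 112.5 works out to 0x42, 0xE1, 0x00, 0x00.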
The Y-intersect is similar
The boolean key-value pair indicating a 'hit' is even easier
[i][3][hit][T]
You start the key with 'i' followed by 3, the length of the name. In hex:
0x69, 0x03,
followed by the 'h', 'i', and 't' chars
[hit]
which is just
0x68,0x69,0x74
The value is represented by the type T and needs no extra data
[T]
which is just
0x54,
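The same steps can be written as two small helpers. This is a sketch with made-up names (`appendKey`, `appendBoolMember`); it writes key lengths as int8 like the walkthrough does, so keys are limited to 127 bytes.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Append an object key as [i][length][bytes]. Keys are always strings,
// so there is no 'S' marker in front of them.
void appendKey(std::vector<std::uint8_t>& out, const std::string& key) {
    if (key.size() > 127) {
        throw std::length_error("key too long for an int8 length");
    }
    out.push_back('i');
    out.push_back(static_cast<std::uint8_t>(key.size()));
    out.insert(out.end(), key.begin(), key.end());
}

// Append a boolean member: the key, then a single 'T' or 'F' marker.
// The marker is the value, so no payload byte follows.
void appendBoolMember(std::vector<std::uint8_t>& out,
                      const std::string& key, bool value) {
    appendKey(out, key);
    out.push_back(value ? 'T' : 'F');
}
```

`appendBoolMember(out, "hit", true)` appends the six bytes 0x69, 0x03, 0x68, 0x69, 0x74, 0x54.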
The string 'Main Screen' is 11 chars long, so we represent the value as
[S][i][11][Main Screen]
in hex
0x53, 0x69, 0x0b, // [S][i][11] String with int8 indicating 11 chars
0x4D, 0x61, 0x69, 0x6E, 0x20, 0x53, 0x63, 0x72, 0x65, 0x65, 0x6E, // [Main Screen] value is 'Main Screen'
In C++ we can stitch the UBJSON together by hand like this:
std::vector<uint8_t> ubjsonData =
;
// print object
json json2 = ubjson::decode_ubjson<json>(ubjsonData);
std::cout << "Hand Made UBJSON\n" << pretty_print(json2) << "\n\n";
which gives me the original JSON
{
}
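For reference, the byte sequences from the walkthrough can be collected into one buffer. This is a reconstruction from the hex shown in this post, not a copy of the original listing: the bytes for 112.5 (0x42, 0xE1, 0x00, 0x00) and for the "object" key are my own working, and the function name `handMadeUbjson` is made up. It uses only the standard library, so decoding the result with jsoncons is left out.

```cpp
#include <cstdint>
#include <vector>

// The hit-location object, stitched together byte by byte.
std::vector<std::uint8_t> handMadeUbjson() {
    return {
        0x7B,                                            // [{]
        0x69, 0x01, 0x78,                                // [i][1][x]
        0x64, 0x43, 0x20, 0x00, 0x00,                    // [d][160.0]
        0x69, 0x01, 0x79,                                // [i][1][y]
        0x64, 0x42, 0xE1, 0x00, 0x00,                    // [d][112.5]
        0x69, 0x03, 0x68, 0x69, 0x74,                    // [i][3][hit]
        0x54,                                            // [T]
        0x69, 0x06, 0x6F, 0x62, 0x6A, 0x65, 0x63, 0x74,  // [i][6][object]
        0x53, 0x69, 0x0B,                                // [S][i][11]
        0x4D, 0x61, 0x69, 0x6E, 0x20,                    // "Main "
        0x53, 0x63, 0x72, 0x65, 0x65, 0x6E,              // "Screen"
        0x7D                                             // [}]
    };
}
```

The whole object comes to 46 bytes.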
One thing to note. I printed the hex values of the UBJSON in my first example. This doesn't exactly match up with my hand stitched code
0x7b 0x23 0x55 0x04 0x55 0x01 0x78 0x64
[{] [#] [U] [4] [U] [1] [x] [d]
vs
0x7b 0x69 0x01 0x78 0x64 0x43 0x20 0x00 0x00
[{] [i] [1] [x] [d] [160]
The call to ubjson::encode_ubjson() adds an optimization that tells the decoder to expect 4 key/value pairs ([#][U][4]), and it uses uint8 ([U]) instead of int8 ([i]) for the number of characters in the key 'x'.
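I haven't checked how jsoncons decides which integer type to use for lengths and counts. One plausible rule that would produce the [U] seen here is "use the narrowest type that fits", sketched below with a hypothetical helper `appendLength`. Treat it as an illustration of the idea, not as the library's actual logic.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Append a length or count using the narrowest UBJSON integer type that
// can hold it: uint8 (U), then int16 (I), then int32 (l), all big-endian.
void appendLength(std::vector<std::uint8_t>& out, std::uint32_t n) {
    if (n <= 0xFF) {
        out.push_back('U');
        out.push_back(static_cast<std::uint8_t>(n));
    } else if (n <= 0x7FFF) {
        out.push_back('I');
        out.push_back(static_cast<std::uint8_t>(n >> 8));
        out.push_back(static_cast<std::uint8_t>(n & 0xFF));
    } else if (n <= 0x7FFFFFFF) {
        out.push_back('l');
        for (int shift = 24; shift >= 0; shift -= 8) {
            out.push_back(static_cast<std::uint8_t>((n >> shift) & 0xFF));
        }
    } else {
        throw std::length_error("length does not fit in an int32");
    }
}
```

With this rule a count of 4 becomes [U][4], matching the encoder's output above, while a count of 300 would become [I] followed by 0x01, 0x2C.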
Writing this out by hand is pretty painful, so we wouldn't normally do this. OTOH, if we know the length of the data we're using, we can just put values into the vector as needed.
If I'm going to be doing byte operations, there's not much point in using a library. This is a lot of work though. Luckily JSONCONS has some functions and macros to make things easier.
You just have to use
- ubjson::encode_ubjson() to encode a class instance into a UBJSON blob
- ubjson::decode_ubjson() to decode UBJSON into an instance of a class

In my case I've got this class
JSONCONS_ALL_CTOR_GETTER_TRAITS macro to prep the class for use with JSONCONS
The constructor and const getter methods seem to be necessary for the macro to work.
ubjson::encode_ubjson() to encode a class instance into a UBJSON blob
std::vector<uint8_t> convertedClassData;
;
ubjson::decode_ubjson() to decode UBJSON into an instance of a class
codepala::CPSegmentData segData2 = ubjson::decode_ubjson<codepala::CPSegmentData>(convertedClassData);
So this
using namespace jsoncons;
int
gives me this output
{
}