27 Jul 2022

Using UBJSON and JSONCons

6 minutes reading time

#Overview

While googling for binary formats I found the Universal Binary JSON (UBJSON) web page and decided to try it out. UBJSON attempts to make using JSON more efficient and tries to cover:

Universal compatibility
Ease of use
Speed and efficiency

Entries follow this format:

[type, 1-byte char]([integer numeric length])([data])

The type is required. The length and data fields are optional.

The type is a 1-byte char

type	char	Hex	Size (bytes)
NULL	Z	0x5A	1
no-op	N	0x4E	1
true	T	0x54	1
false	F	0x46	1
int8	i	0x69	2
uint8	U	0x55	2
int16	I	0x49	3
int32	l	0x6C	5
int64	L	0x4C	9
float32	d	0x64	5
float64	D	0x44	9
high-precision number	H	0x48	1 + int num value + string bytes
char	C	0x43	2
string	S	0x53	1 + int num value + string bytes
array	[]	0x5B, 0x5D	2 + bytes
object	{}	0x7B, 0x7D	2 + bytes

Note the size in bytes. The values of the NULL, no-op, true, and false types are implicit, so there's no need to specify a value in addition to the type. Other types require both a byte to specify the type and one or more bytes to provide the value. So an 8-bit value (signed, unsigned, or char) takes two bytes instead of one: one byte for the 'i', 'U', or 'C' and another for the value. A 16-bit integer takes 3 bytes: one for the type 'I' and two for the value. It's an increase over packing binary data in a known order, but compared to plain old JSON text, we're likely saving time and probably saving space.
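To make those sizes concrete, here is a minimal sketch of how a few of the fixed-size types could be laid out by hand. The helper names (encodeTrue, encodeInt8, encodeInt16) are made up for illustration and are not part of any library; the sketch assumes multi-byte values are written big-endian.

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// true: the marker 'T' alone carries the value (1 byte total).
Bytes encodeTrue() {
  return Bytes{'T'};
}

// int8: marker 'i' plus 1 payload byte (2 bytes total).
Bytes encodeInt8(std::int8_t value) {
  Bytes out;
  out.push_back('i');
  out.push_back(static_cast<std::uint8_t>(value));
  return out;
}

// int16: marker 'I' plus 2 payload bytes, high byte first (3 bytes total).
Bytes encodeInt16(std::int16_t value) {
  const std::uint16_t bits = static_cast<std::uint16_t>(value);
  Bytes out;
  out.push_back('I');
  out.push_back(static_cast<std::uint8_t>(bits >> 8));
  out.push_back(static_cast<std::uint8_t>(bits & 0xFF));
  return out;
}
```

Calling encodeInt16(258) should give the three bytes 0x49, 0x01, 0x02, which matches the 3-byte size in the table.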

Interestingly, there doesn't appear to be an unsigned version of the larger integers. Not in 1.0 at least. Version 2.0 does allow for larger unsigned values.

Strings require the length (as an integer) and the individual bytes in the string

[S][i][6][ubjson]
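The same layout can be sketched in C++. encodeString is a hypothetical helper, not a jsoncons function, and it assumes the string is short enough for its length to fit in an int8 (127 bytes or fewer).

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Encode a string as [S][i][length][bytes].
// Sketch only: longer strings would need a wider length type such as 'I' or 'l'.
std::vector<std::uint8_t> encodeString(const std::string& text) {
  if (text.size() > 127) {
    throw std::length_error("this sketch only handles int8 lengths");
  }
  std::vector<std::uint8_t> out;
  out.push_back('S');                                     // string marker
  out.push_back('i');                                     // the length is an int8
  out.push_back(static_cast<std::uint8_t>(text.size()));  // the length itself
  out.insert(out.end(), text.begin(), text.end());        // the raw bytes
  return out;
}
```

For "ubjson" this should produce 9 bytes: 0x53, 0x69, 0x06 and then the six characters.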

Object keys are always strings, so the docs leave out the 'S' marker for a key and give just the length (here an int8) followed by the key's bytes.

[{]
  [i][8][key-name][S][i][5][value]
[}]

Arrays and objects each use two marker characters: a starting char ([ or {) and an ending char (] or }).

Containers can be further optimized by specifying a type and count:

optimization	char	bytes	arg type	example
type	$	1	value type or container	[$][S] (string type)
count	#	1	integer	[#][i][64] (count of 64)
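As a rough way to see what the two markers buy, the sketch below counts the bytes needed for an array of n float32 values in three layouts. The function names are hypothetical, and the sketch assumes that a typed, counted array drops both the per-element 'd' markers and the closing ']'.

```cpp
#include <cstddef>

// Bytes for the integer that follows the '#' marker:
// one type byte plus the payload ('i' = 1, 'I' = 2, 'l' = 4 bytes).
// Sketch only: counts beyond the int32 range are not handled.
std::size_t countFieldSize(std::size_t n) {
  if (n <= 127) return 1 + 1;    // [i][n]
  if (n <= 32767) return 1 + 2;  // [I][n]
  return 1 + 4;                  // [l][n]
}

// Plain array: '[' + n * ('d' + 4 payload bytes) + ']'.
std::size_t plainFloat32ArraySize(std::size_t n) {
  return 1 + n * (1 + 4) + 1;
}

// Typed, counted array: '[' + '$' 'd' + '#' + count field + n * 4 payload bytes.
std::size_t typedFloat32ArraySize(std::size_t n) {
  return 1 + 2 + 1 + countFieldSize(n) + n * 4;
}

// Raw packed floats with no markers at all, for comparison.
std::size_t packedFloat32Size(std::size_t n) {
  return n * 4;
}
```

For 50 values this works out to 252 bytes for the plain array, 206 bytes for the typed and counted array, and 200 bytes for raw packed floats.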

No doubt simply packing bytes in a known order would be more efficient in both space and computation, but it's considerably less flexible.

The space overhead compared to packed bytes isn't large. The array and object types look like they'll provide the best performance for my use case. I have to send objects with arrays of float32 values from an image analysis backend to a GUI and data processor at runtime. The values returned from the backend

#C++ Implementations

There are a number of libraries for the languages used by my projects (C, C++, and Java). Right now I'm most interested in finding a C++ implementation.

Unfortunately protoc and UbjsonCpp appear to be inactive. So I'll try out jsoncons.

Jsoncons appears to be a general purpose JSON library that also happens to have a UBJSON implementation. It looks pretty well liked (509 stars at the time of this writing) and is still active.

#JSONCONS

jsoncons is a C++ header-only library for constructing JSON and JSON-like data formats.
I played around with it a bit.

#Transforming JSON Strings Into UBJSON

As a simple test, I might need to send the hit location on a flat object. This could be a gaze location on a tablet or the intersection of a "ray" cast from a light gun to a flat-panel display. I could send an X/Y intersection, a hit flag, and an object name. In regular JSON it would look like this.

{
  "x": 160.0,
  "y": 112.5,
  "hit": true,
  "object": "Main Screen"
}

jsoncons provides a parse function that can take a string and transform it into an object

// basic, send x, y intersection, hit flag and object name
  ojson json1 = ojson::parse(R"(
{
        "x": 160.0,
        "y": 112.5,
        "hit": true,
        "object": "Main Screen"
}
    )");

Once we have this object, it can be encoded as UBJSON with ubjson::encode_ubjson

  // Encode a basic_json value to UBJSON
  std::vector<uint8_t> jdata1;
  ubjson::encode_ubjson(json1, jdata1);

...

  // decode the UBJSON and print it as JSON
  json jsonFromString = ubjson::decode_ubjson<json>(jdata1);
  std::cout << "UBJSON\n" << pretty_print(jsonFromString) << "\n\n";

The following code

#include <cassert>
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <vector>
#include "jsoncons/json.hpp"
#include "jsoncons_ext/ubjson/ubjson.hpp"
#include "jsoncons_ext/jsonpath/jsonpath.hpp"
using namespace jsoncons;


int main(int argc, char* argv[]) {
  ojson json1 = ojson::parse(R"(
{
        "x": 160.0,
        "y": 112.5,
        "hit": true,
        "object": "Main Screen"
}
    )");

  // Encode a basic_json value to UBJSON
  std::vector<uint8_t> jdata1;
  ubjson::encode_ubjson(json1, jdata1);
  // print the encoded bytes as hex, eight per row
  for (std::size_t i = 0; i < jdata1.size(); ++i) {
    std::cout << "0x" << std::setfill('0') << std::setw(2) << std::hex << static_cast<int>(jdata1[i]);
    if ((i + 1) % 8 == 0) {
      std::cout << "\n";
    } else {
      std::cout << " ";
    }
  }
  json jsonFromString = ubjson::decode_ubjson<json>(jdata1);
  std::cout << "UBJSON\n" << pretty_print(jsonFromString) << "\n\n";


}

Gives this output

$ ./a.out
0x7b 0x23 0x55 0x04 0x55 0x01 0x78 0x64
0x43 0x20 0x00 0x00 0x55 0x01 0x79 0x64
0x42 0xe1 0x00 0x00 0x55 0x03 0x68 0x69
0x74 0x54 0x55 0x06 0x6f 0x62 0x6a 0x65
0x63 0x74 0x53 0x55 0x0b 0x4d 0x61 0x69
0x6e 0x20 0x53 0x63 0x72 0x65 0x65 0x6e
UBJSON
{
    "hit": true,
    "object": "Main Screen",
    "x": 160.0,
    "y": 112.5
}

#UBJSON

The above JSON object can be represented as Universal Binary JSON with

[{]
  [i][1][x][d][160.0]
  [i][1][y][d][112.5]
  [i][3][hit][T]
  [i][6][object][S][i][11][Main Screen]
[}]

Because it's a JSON object, the UBJSON starts with { and ends with }. The closing } is only left out when the object declares a count up front.

[{]
...
[}]

The actual key-value pairs are pretty straightforward.

"x" is the key for the x-coordinate of the intersection and 160.0 is the value

  "x": 160.0,

The key 'x' is a single character, so its length fits in an int8 (or a uint8). I'm going with int8 to match the examples on the UBJSON page. We tell the parser that the key is a single character long with

[i][1]

in hex

0x69, 0x01,

followed by the 'x' character

[i][1][x]

or 0x78 in hex

0x69, 0x01, 0x78,                           // [i][1][x]   key is 'x'

The actual value is a 32-bit float, which is indicated by type 'd' followed by the big-endian representation of 160

[d][160.0]

in hex

0x64,                           // [d]   type is float32
0x43, 0x20, 0x00, 0x00,         // [160.0]  value:  the big endian bytes for 160.0
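Those four bytes don't have to be worked out by hand. A minimal sketch, assuming float is a 32-bit IEEE 754 type on the target platform (float32ToBigEndian is a hypothetical helper name):

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Return the four bytes of a float32, most significant byte first.
std::array<std::uint8_t, 4> float32ToBigEndian(float value) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
  std::uint32_t bits = 0;
  std::memcpy(&bits, &value, sizeof(bits));  // copy the raw bit pattern
  // Shifting the integer gives the same result on little- and big-endian hosts.
  std::array<std::uint8_t, 4> out{{
      static_cast<std::uint8_t>(bits >> 24),
      static_cast<std::uint8_t>((bits >> 16) & 0xFF),
      static_cast<std::uint8_t>((bits >> 8) & 0xFF),
      static_cast<std::uint8_t>(bits & 0xFF)}};
  return out;
}
```

float32ToBigEndian(160.0f) should give 0x43, 0x20, 0x00, 0x00, and 112.5f should give 0x42, 0xe1, 0x00, 0x00.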

The Y-intersect is similar

The boolean key-value pair indicating a 'hit' is even easier

[i][3][hit][T]

You start the key with 'i' followed by 3 for the length of the name in hex

0x69, 0x03,

followed by the 'h', 'i', and 't' chars

[hit]

which is just

0x68,0x69,0x74

The value is represented by the type T and needs no extra data

[T]

which is just

0x54,

The string 'Main Screen' is 11 chars long, so we represent the value as

  [S][i][11][Main Screen]

in hex

    0x53, 0x69, 0x0b,          // [S][i][11]   String with int8 indicating 11 chars
    0x4D, 0x61, 0x69, 0x6E, 0x20, 0x53, 0x63, 0x72, 0x65, 0x65, 0x6E, // [Main Screen]   value is 'Main Screen'

In C++ we can stitch the UBJSON together by hand like this:

std::vector<uint8_t> ubjsonData =
{ 0x7b,                                // [{]  starts the object

  0x69, 0x01,                          // [i][1]  int8 length: 1 char for the key
  0x78,                                // [x]  key is 'x'
  0x64,                                // [d]  type is float32
  0x43, 0x20, 0x00, 0x00,              // [160.0]  value: the big-endian bytes for 160.0

  0x69, 0x01,                          // [i][1]  int8 length: 1 char for the key
  0x79,                                // [y]  key is 'y'
  0x64,                                // [d]  type is float32
  0x42, 0xe1, 0x00, 0x00,              // [112.5]  value: the big-endian bytes for 112.5

  0x69, 0x03,                          // [i][3]  int8 length: 3 chars for the key
  0x68, 0x69, 0x74,                    // [hit]  key is 'hit'
  0x54,                                // [T]  value is true

  0x69, 0x06,                          // [i][6]  int8 length: 6 chars for the key
  0x6f, 0x62, 0x6a, 0x65, 0x63, 0x74,  // [object]  key is 'object'
  0x53, 0x69, 0x0b,                    // [S][i][11]  string with int8 indicating 11 chars

  0x4D, 0x61, 0x69, 0x6E, 0x20, 0x53,  // [Main Screen]  value is 'Main Screen'
  0x63, 0x72, 0x65, 0x65, 0x6E,
  0x7d                                 // [}]  ends the object
};

// print object
json json2 = ubjson::decode_ubjson<json>(ubjsonData); 
std::cout << "Hand Made UBJSON\n" << pretty_print(json2) << "\n\n";

which gives me the original JSON

$ ./a.out
Hand Made UBJSON
{
    "hit": true,
    "object": "Main Screen",
    "x": 160.0,
    "y": 112.5
}

One thing to note. I printed the hex values of the UBJSON in my first example. This doesn't exactly match up with my hand stitched code

0x7b 0x23 0x55 0x04 0x55 0x01 0x78 0x64
[{]  [#]  [U]  [4]  [U]  [1]  [x]  [d]

0x7b 0x69 0x01 0x78 0x64 0x43 0x20 0x00 0x00
[{]  [i]  [1]  [x]  [d]  [160]

The call to ubjson::encode_ubjson() is adding an optimization to tell the decoder to expect 4 key/value pairs [#][U][4] and using uint8 ([U]) instead of int8 ([i]) to indicate the number of characters in the key 'x'.
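One way to tell the two layouts apart is to look at the bytes right after the opening {. The sketch below is a hypothetical helper (peekObjectCount is not a jsoncons function) that only understands int8 and uint8 counts and ignores the [$] type optimization.

```cpp
#include <cstdint>
#include <vector>

// Returns the pair count announced by an optimized object header,
// or -1 if the data does not start with [{][#] followed by an 'i' or 'U' count.
long peekObjectCount(const std::vector<std::uint8_t>& data) {
  if (data.size() < 4 || data[0] != '{' || data[1] != '#') {
    return -1;  // plain object, or not an object at all
  }
  if (data[2] == 'U') {
    return data[3];  // uint8 count
  }
  if (data[2] == 'i') {
    return static_cast<std::int8_t>(data[3]);  // int8 count
  }
  return -1;  // wider count types are outside this sketch
}
```

Fed the first bytes of the encoder's output it should report 4, and fed the hand-stitched bytes it should report -1.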

Writing this out by hand is pretty painful, so we wouldn't normally do this. On the other hand, if we know the length of the data we're using, we can just put values into the vector as needed.
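A few small helpers take most of the pain out of it. This is only a sketch with made-up names (appendKey, appendFloat32, appendString, buildHitMessage); it assumes keys and strings of at most 127 bytes and a 32-bit IEEE 754 float, and it writes the plain [{] ... [}] form without the count optimization.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Append an object key: [i][length][bytes]. Keys never carry the 'S' marker.
void appendKey(Bytes& out, const std::string& key) {
  out.push_back('i');
  out.push_back(static_cast<std::uint8_t>(key.size()));
  out.insert(out.end(), key.begin(), key.end());
}

// Append a float32 value: [d] followed by the four bytes in big-endian order.
void appendFloat32(Bytes& out, float value) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
  std::uint32_t bits = 0;
  std::memcpy(&bits, &value, sizeof(bits));
  out.push_back('d');
  for (int shift = 24; shift >= 0; shift -= 8) {
    out.push_back(static_cast<std::uint8_t>((bits >> shift) & 0xFF));
  }
}

// Append a string value: [S] and then the same [i][length][bytes] layout as a key.
void appendString(Bytes& out, const std::string& text) {
  out.push_back('S');
  appendKey(out, text);
}

// Build the hit message from this post as an unoptimized object.
Bytes buildHitMessage(float x, float y, bool hit, const std::string& objectName) {
  Bytes out;
  out.push_back('{');
  appendKey(out, "x");
  appendFloat32(out, x);
  appendKey(out, "y");
  appendFloat32(out, y);
  appendKey(out, "hit");
  out.push_back(hit ? 'T' : 'F');
  appendKey(out, "object");
  appendString(out, objectName);
  out.push_back('}');
  return out;
}
```

buildHitMessage(160.0f, 112.5f, true, "Main Screen") should return 46 bytes, and the result could be handed to ubjson::decode_ubjson in the same way as a hand-written vector.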

#Converting a Class into UBJSON Data

If I'm going to be doing byte operations, there's not much point in using a library. This is a lot of work though. Luckily JSONCONS has some functions and macros to make things easier.

You just have to

Define the class
Use the macros to access the CTOR and getter methods
Use ubjson::encode_ubjson() to encode a class instance into a UBJSON blob
Use ubjson::decode_ubjson() to decode UBJSON into an instance of a class

In my case I've got this class

namespace codepala {

  class CPSegmentData {
  public:

    // ctor for JSONCONS macro
    CPSegmentData(const std::string& version,
      const std::vector<float>& landmarks,
      const std::vector<int>& edges,
      const std::string& classification,
      const double& confidence
    ) :
      m_version(version),
      m_landmarks(landmarks),
      m_edges(edges),
      m_classification(classification),
      m_confidence(confidence)
    {
    }

    // getters for JSONCONS macro
    const std::string& version() const { return m_version; }
    const std::vector< float >&  landmarks() const { return m_landmarks; }
    const std::vector< int >&  edges() const { return m_edges; }
    const std::string& classification() const { return m_classification; }
    const double& confidence() const { return m_confidence; }

  private:
    std::string m_version;
    std::vector< float > m_landmarks;
    std::vector< int > m_edges;
    std::string m_classification;
    double m_confidence;
  };
}

Use the JSONCONS_ALL_CTOR_GETTER_TRAITS macro to prep the class for use with JSONCONS

JSONCONS_ALL_CTOR_GETTER_TRAITS(codepala::CPSegmentData, version, landmarks, edges, classification, confidence )

The constructor and the const getter methods seem to be necessary for the macro to work.

Use ubjson::encode_ubjson() to encode a class instance into a UBJSON blob

std::vector<uint8_t> convertedClassData;
ubjson::encode_ubjson(segData, convertedClassData);

Use ubjson::decode_ubjson() to decode UBJSON into an instance of a class

codepala::CPSegmentData segData2 = ubjson::decode_ubjson<codepala::CPSegmentData>(convertedClassData);

So this

#include <cassert>
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>
#include "jsoncons/json.hpp"
#include "jsoncons_ext/ubjson/ubjson.hpp"
#include "jsoncons_ext/jsonpath/jsonpath.hpp"
using namespace jsoncons;

namespace codepala {
  class CPSegmentData {
    public:
      CPSegmentData( const std::string& version, const size_t& count, const std::string& classification, const double& confidence ) :
        m_version(version),
        m_classification(classification),
        m_confidence(confidence)
    {
      m_landmarks.resize(count);
      m_edges.resize(count);
    }
      CPSegmentData(const std::string& version,
          const std::vector<float>& landmarks,
          const std::vector<int>& edges,
          const std::string& classification,
          const double& confidence
          ) :
        m_version(version),
        m_landmarks(landmarks),
        m_edges(edges),
        m_classification(classification),
        m_confidence(confidence)
    {
    }

      // getters for JSONCONS macro
      const std::string& version() const { return m_version; };
      const std::vector< float >&  landmarks() const { return m_landmarks; };
      const std::vector< int >&  edges() const { return m_edges; };
      const std::string& classification() const { return m_classification; };
      const double& confidence() const { return m_confidence; };

      void doSomething() {
        // fill both vectors with easy-to-recognize test values
        for( std::size_t i=0; i < m_landmarks.size(); ++i ) {
          m_landmarks[i] = static_cast<float>(i) * 3.0f;
          m_edges[i] = static_cast<int>(i) * 2;
        }
      }
    private:
      std::string m_version;
      std::vector< float > m_landmarks;
      std::vector< int > m_edges;
      std::string m_classification;
      double m_confidence;

  };
}

JSONCONS_ALL_CTOR_GETTER_TRAITS(codepala::CPSegmentData, version, landmarks, edges, classification, confidence)



int main(int argc, char* argv[]) {

  codepala::CPSegmentData segData("test", 50, "dog", 0.56);
  segData.doSomething();

  // encode the class into UBJSON
  std::vector<uint8_t> convertedClassData;
  ubjson::encode_ubjson(segData, convertedClassData);

  // print UBJSON
  jsoncons::json segJSON = ubjson::decode_ubjson<json>(convertedClassData);
  std::cout << "Class Instance to UBJSON\n" << pretty_print(segJSON) << "\n\n";


  // decode it
  codepala::CPSegmentData segData2 = ubjson::decode_ubjson<codepala::CPSegmentData>(convertedClassData);
  std::cout << "version: " << segData2.version() << "\n";
  std::cout << "classification: " << segData2.classification() << "\n";
  std::cout << "confidence: " << segData2.confidence() << "\n";

  std::cout << "landmarks\n";
  for( const auto& iter: segData2.landmarks() ) {
    std::cout << iter << " ";
  }

  std::cout << "\nedges\n";
  for( const auto& iter: segData2.edges() ) {
    std::cout << iter << " ";
  }



}

gives me this output

$ ./a.out
Class Instance to UBJSON
{
    "classification": "dog",
    "confidence": 0.56,
    "edges": [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54,
        56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98
    ],
    "landmarks": [0.0, 3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0, 24.0, 27.0, 30.0, 33.0, 36.0, 39.0, 42.0, 45.0, 48.0, 51.0, 54.0, 57.0, 60.0, 63.0, 66.0, 69.0, 72.0, 75.0, 78.0, 81.0, 84.0, 87.0, 90.0, 93.0, 96.0, 99.0, 102.0, 105.0, 108.0, 111.0, 114.0, 117.0, 120.0, 123.0, 126.0, 129.0, 132.0, 135.0, 138.0, 141.0, 144.0, 147.0],
    "version": "test"
}

version: test
classification: dog
confidence: 0.56
landmarks
0 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84 87 90 93 96 99 102 105 108 111 114 117 120 123 126 129 132 135 138 141 144 147
edges
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98