JSON Type Definitions

By Adrian Sutton

May 6, 2022

In Teku, we’ve come up with a new way to both serialize/deserialize objects to JSON and generate OpenAPI documentation from the same declarative source. This significantly reduces the amount of boilerplate code we have to write and also avoids a lot of bugs where the generated JSON diverged from the OpenAPI.

Previously, Teku’s JSON serialization and deserialization was controlled by a set of custom classes with a bunch of annotations added to define the JSON serialization. The OpenAPI specification was generated from those same classes, plus a bunch more annotations.

As an example of how the old approach worked, let’s look at the get block root API. It returns data like:

{
    "data": {
        "root": "0xcf8e0d4e9587369b2301d0790347320302cc0943d5a1884560367e8208d920f2"
    }
}

To generate that with the old system we’d need to create two classes:

public class GetBlockRootResponse {

  @JsonProperty("data")
  public final Root data;

  @JsonCreator
  public GetBlockRootResponse(
      @JsonProperty("data") final Root data) {
    this.data = data;
  }
}

public class Root {
  @Schema(
      type = "string", 
      format = "byte", 
      description = "Bytes32 hexadecimal",
      example = "0xcf8e0d4e9587369b2301d0790347320302cc0943d5a1884560367e8208d920f2")
  public final Bytes32 root;

  @JsonCreator
  public Root(@JsonProperty("root") final Bytes32 root) {
    this.root = root;
  }
}

And then to generate the OpenAPI we’d need additional annotations on the API handler method:

@OpenApi(
    path = ROUTE,
    method = HttpMethod.GET,
    summary = "Get block root",
    tags = {TAG_BEACON},
    description = "Retrieves hashTreeRoot of BeaconBlock/BeaconBlockHeader",
    responses = {
      @OpenApiResponse(
          status = RES_OK,
          content = @OpenApiContent(from = GetBlockRootResponse.class)),
      @OpenApiResponse(status = RES_BAD_REQUEST),
      @OpenApiResponse(status = RES_NOT_FOUND),
      @OpenApiResponse(status = RES_INTERNAL_ERROR)
    })
@Override
public void handle(@NotNull final Context ctx) throws Exception { ... }

There are a number of drawbacks to this approach. Firstly, that’s quite a simple format for JSON output but we’ve had to define two separate classes to model it in Java. Secondly, the @Schema annotation on the Bytes32 root needs to include quite a few options to correctly define the Bytes32 type in the OpenAPI, and those options have to be repeated for every Bytes32 field in the API. We can extract the actual string values as constants, but the annotation has to be added correctly each time. And that Bytes32 is actually an interface from a third-party library so we still need a custom serializer for it.

The biggest issue though is that the OpenAPI and the actual JSON serialization are only loosely tied together. There’s nothing to stop the handler from returning a response code that wasn’t listed in the OpenAPI, or even returning data that isn’t of type GetBlockRootResponse.

Fundamentally, trying to use Java classes to define a JSON data structure is just a really poor match. Layering on more annotations and more reflection to also generate the OpenAPI documentation from those classes just makes the problem worse. So we’ve come up with an alternative approach based on the idea of type definitions.

With this approach, we define a SerializableTypeDefinition which specifies how to convert a class (our regular internal data structure) into JSON, using a declarative approach. Here’s what it looks like for the block root response:


public static final StringValueTypeDefinition<Bytes32> BYTES32_TYPE =
    DeserializableTypeDefinition.string(Bytes32.class)
        .formatter(Bytes32::toHexString)
        .parser(Bytes32::fromHexString)
        .example("0xcf8e0d4e9587369b2301d0790347320302cc0943d5a1884560367e8208d920f2")
        .description("Bytes32 hexadecimal")
        .format("byte")
        .build();

private static final SerializableTypeDefinition<Bytes32> GET_ROOT_RESPONSE =
    SerializableTypeDefinition.object(Bytes32.class)
        .withField(
            "data",
            SerializableTypeDefinition.object(Bytes32.class)
                .withField("root", BYTES32_TYPE, Function.identity())
                .build(),
            Function.identity())
        .build();

The first part, BYTES32_TYPE, is a reusable definition for serializing a Bytes32. It’s represented as a String in the JSON, so we provide a formatter and a parser to convert a Bytes32 to a String and back. The example, description and format provide the additional documentation required for the OpenAPI output.
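To make the formatter/parser pairing concrete, here is a minimal, self-contained sketch. It is not Teku’s implementation: StringType and HEX_INT are hypothetical names invented for this illustration, and a non-negative int rendered as hex stands in for Bytes32 so the example needs no third-party library.

```java
import java.util.function.Function;

final class StringTypeSketch {

  /** Hypothetical, simplified string-valued type (not Teku's StringValueTypeDefinition). */
  static final class StringType<T> {
    private final Function<T, String> formatter;
    private final Function<String, T> parser;

    StringType(final Function<T, String> formatter, final Function<String, T> parser) {
      this.formatter = formatter;
      this.parser = parser;
    }

    /** Formats the value and wraps it in JSON quotes (no escaping, for brevity). */
    String serialize(final T value) {
      return "\"" + formatter.apply(value) + "\"";
    }

    /** Strips the surrounding JSON quotes and hands the content to the parser. */
    T deserialize(final String json) {
      return parser.apply(json.substring(1, json.length() - 1));
    }
  }

  /** A non-negative int rendered as a 0x-prefixed hex string, standing in for Bytes32. */
  static final StringType<Integer> HEX_INT =
      new StringType<>(
          value -> "0x" + Integer.toHexString(value),
          text -> Integer.parseInt(text.substring(2), 16));

  public static void main(final String[] args) {
    final String json = HEX_INT.serialize(255);
    System.out.println(json); // "0xff"
    System.out.println(HEX_INT.deserialize(json)); // 255
  }
}
```

The point of the sketch is only that the conversion logic lives in one reusable value, so every field of that type shares the same formatter and parser.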

The second part, GET_ROOT_RESPONSE, defines the two objects involved in the JSON. The outer wrapper is an object with a single field called data. The data field in turn is a single-field object with a root property. The root field is a Bytes32 so it uses that reusable definition. We could extract the data object as a reusable definition too but since it’s only used here it’s simpler to define it inline.

The withField call in that definition takes three things: the name of the field, the type of the field and a getter Function which maps from the source value to the value for the field. In this case those getter functions are both just Function.identity() because the only data we need is the Bytes32 root value, so we just pass it through the layers of JSON objects unchanged until it gets used as the value for the root field and actually serialized by the formatter specified in BYTES32_TYPE. In other situations, where we have more complex data structures, the getter function could perform calculations or just be a reference to an actual get*() method in the Java object.
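The way a value flows through nested withField calls can be sketched in a few lines of plain Java. This is an illustration under stated assumptions rather than Teku’s code: JsonType, ObjectType and STRING_TYPE are hypothetical stand-ins, a plain String replaces Bytes32, and string escaping is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class ObjectTypeSketch {

  /** Hypothetical, simplified serializer interface (not Teku's real API). */
  interface JsonType<T> {
    String serialize(T value);
  }

  /** A JSON object built from (name, type, getter) triples. */
  static final class ObjectType<T> implements JsonType<T> {
    private final List<Function<T, String>> fieldWriters = new ArrayList<>();

    /** Registers a field: the getter maps the source object to the field's value. */
    <F> ObjectType<T> withField(
        final String name, final JsonType<F> type, final Function<T, F> getter) {
      fieldWriters.add(
          source -> "\"" + name + "\":" + type.serialize(getter.apply(source)));
      return this;
    }

    @Override
    public String serialize(final T value) {
      final List<String> parts = new ArrayList<>();
      for (final Function<T, String> writer : fieldWriters) {
        parts.add(writer.apply(value));
      }
      return "{" + String.join(",", parts) + "}";
    }
  }

  /** Leaf type: writes the string in quotes (no escaping, for brevity). */
  static final JsonType<String> STRING_TYPE = value -> "\"" + value + "\"";

  /** Same shape as the block root response: the value passes through both layers unchanged. */
  static final JsonType<String> ROOT_RESPONSE =
      new ObjectType<String>()
          .withField(
              "data",
              new ObjectType<String>().withField("root", STRING_TYPE, Function.identity()),
              Function.identity());

  public static void main(final String[] args) {
    System.out.println(ROOT_RESPONSE.serialize("0xcf8e")); // {"data":{"root":"0xcf8e"}}
  }
}
```

Each getter is applied only when its field is written, which is why Function.identity() is enough here: both layers receive the same source value and only the leaf type turns it into text.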

Those getters provide one of the key benefits of this approach because they decouple the internal Java object holding the data from the external JSON representation. That allows us to refactor the internal representation we use to best suit our code, while maintaining a stable external API. With the annotation-based approach, by contrast, changing the structure of the object would automatically change the structure of the resulting JSON.

That same type definition can be used to generate an OpenAPI specification for the JSON. Most of the information required for the OpenAPI is inferred from the way we defined the serialization. We know GET_ROOT_RESPONSE is an object and it has a data field, we know the type of that field etc. We can add an additional description to any of those types to provide more documentation, just like we did for BYTES32_TYPE.
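A rough sketch of how one definition can drive both outputs is shown below. The names (JsonType, StringType, ObjectType, schema) are hypothetical and the schema is assembled as a plain string to avoid any library dependency; Teku’s real classes are richer than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class OpenApiSketch {

  /** Hypothetical interface: one definition yields both the JSON and its schema. */
  interface JsonType<T> {
    String serialize(T value);

    String schema();
  }

  /** Leaf type carrying the extra documentation that cannot be inferred. */
  static final class StringType implements JsonType<String> {
    private final String description;

    StringType(final String description) {
      this.description = description;
    }

    @Override
    public String serialize(final String value) {
      return "\"" + value + "\"";
    }

    @Override
    public String schema() {
      return "{\"type\":\"string\",\"description\":\"" + description + "\"}";
    }
  }

  /** Object type: every withField call feeds both the serializer and the schema. */
  static final class ObjectType<T> implements JsonType<T> {
    private final List<Function<T, String>> writers = new ArrayList<>();
    private final List<String> propertySchemas = new ArrayList<>();

    <F> ObjectType<T> withField(
        final String name, final JsonType<F> type, final Function<T, F> getter) {
      writers.add(source -> "\"" + name + "\":" + type.serialize(getter.apply(source)));
      propertySchemas.add("\"" + name + "\":" + type.schema());
      return this;
    }

    @Override
    public String serialize(final T value) {
      final List<String> parts = new ArrayList<>();
      for (final Function<T, String> writer : writers) {
        parts.add(writer.apply(value));
      }
      return "{" + String.join(",", parts) + "}";
    }

    @Override
    public String schema() {
      return "{\"type\":\"object\",\"properties\":{" + String.join(",", propertySchemas) + "}}";
    }
  }

  static final JsonType<String> ROOT =
      new ObjectType<String>()
          .withField("root", new StringType("Bytes32 hexadecimal"), Function.identity());

  public static void main(final String[] args) {
    System.out.println(ROOT.serialize("0xab"));
    System.out.println(ROOT.schema());
  }
}
```

Because the field name and type are recorded once and used for both outputs, the documented schema cannot list a property that the serializer does not write.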

One final detail here is that the above object type is a SerializableTypeDefinition so it can convert from an internal type to JSON, but it can’t parse JSON back to the internal type. We can also create a DeserializableTypeDefinition for objects, which then requires providing both a getter and a setter for each field (a builder can also be used, allowing for immutable types). Most of the time we only need to serialize though, so there’s no need to define how to parse the type.
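The getter-plus-setter idea, with a builder so the internal type can stay immutable, might look roughly like this. Everything here is an assumption made for illustration: TwoWayObjectType, Peer and PeerBuilder are invented names, all fields are strings, and a Map stands in for parsed JSON so that no JSON library is needed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.function.Supplier;

final class DeserializableSketch {

  /** Hypothetical two-way object definition; a Map of strings stands in for parsed JSON. */
  static final class TwoWayObjectType<T, B> {
    private final Supplier<B> builderFactory;
    private final Function<B, T> finisher;
    private final Map<String, Function<T, String>> getters = new LinkedHashMap<>();
    private final Map<String, BiConsumer<B, String>> setters = new LinkedHashMap<>();

    TwoWayObjectType(final Supplier<B> builderFactory, final Function<B, T> finisher) {
      this.builderFactory = builderFactory;
      this.finisher = finisher;
    }

    /** Each field needs a getter for serializing and a setter for parsing. */
    TwoWayObjectType<T, B> withField(
        final String name,
        final Function<T, String> getter,
        final BiConsumer<B, String> setter) {
      getters.put(name, getter);
      setters.put(name, setter);
      return this;
    }

    Map<String, String> serialize(final T value) {
      final Map<String, String> json = new LinkedHashMap<>();
      getters.forEach((name, getter) -> json.put(name, getter.apply(value)));
      return json;
    }

    T deserialize(final Map<String, String> json) {
      final B builder = builderFactory.get();
      setters.forEach((name, setter) -> setter.accept(builder, json.get(name)));
      return finisher.apply(builder);
    }
  }

  /** Immutable internal type, populated through a builder. */
  static final class Peer {
    final String id;
    final String address;

    Peer(final String id, final String address) {
      this.id = id;
      this.address = address;
    }
  }

  /** Mutable builder that the setters write into while parsing. */
  static final class PeerBuilder {
    String id;
    String address;

    Peer build() {
      return new Peer(id, address);
    }
  }

  static final TwoWayObjectType<Peer, PeerBuilder> PEER_TYPE =
      new TwoWayObjectType<Peer, PeerBuilder>(PeerBuilder::new, PeerBuilder::build)
          .withField("peer_id", peer -> peer.id, (builder, value) -> builder.id = value)
          .withField("address", peer -> peer.address, (builder, value) -> builder.address = value);

  public static void main(final String[] args) {
    final Map<String, String> json = PEER_TYPE.serialize(new Peer("16Uiu2", "/ip4/127.0.0.1"));
    System.out.println(json); // {peer_id=16Uiu2, address=/ip4/127.0.0.1}
    System.out.println(PEER_TYPE.deserialize(json).address); // /ip4/127.0.0.1
  }
}
```

The extra setter per field is the cost of parsing, which is why a serialize-only definition is the lighter default when nothing needs to be read back.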

The final piece is to provide the definition of the actual REST endpoint - equivalent to the @OpenApi annotation in the original version:

EndpointMetadata.get(ROUTE)
    .summary("Get block root")
    .tags(TAG_BEACON)
    .description("Retrieves hashTreeRoot of BeaconBlock/BeaconBlockHeader")
    .response(SC_OK, "Request successful", GET_ROOT_RESPONSE)
    .withBadRequestResponse()
    .withNotFoundResponse()
    .withInternalErrorResponse()
    .build();

While these simple examples are fairly similar, the type definitions approach results in far less boilerplate code. That’s not the main advantage though.

The type definitions provide an explicit mapping from our internal classes to the serialization we need on the API. There’s no need to create custom classes just to define the API, or to spread Jackson annotations all through our internal code. Most importantly there’s no risk that renaming a class or field will unexpectedly change the external API.

The EndpointMetadata and type definitions also build a simple-to-use model of the types and the expected behaviour of the endpoint. We’ve added a very thin wrapper around Javalin (which we were already using) that checks the responses being sent actually match what is declared in the metadata (and thus in the OpenAPI generated from it). Unfortunately those checks still occur at runtime, but it means that unit tests will fail if the OpenAPI doesn’t match, whereas previously it was very difficult to add tests that check the OpenAPI and responses actually match.
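The kind of runtime check described here can be sketched without any HTTP framework. The classes below are hypothetical (Metadata and CheckedResponder are not Teku or Javalin types) and only status codes are checked; a fuller version would presumably also compare the response body against the declared type.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ResponseCheckSketch {

  /** Hypothetical endpoint metadata reduced to the set of declared status codes. */
  static final class Metadata {
    private final Set<Integer> declaredStatusCodes;

    Metadata(final Integer... statusCodes) {
      this.declaredStatusCodes = new HashSet<>(Arrays.asList(statusCodes));
    }

    boolean declares(final int statusCode) {
      return declaredStatusCodes.contains(statusCode);
    }
  }

  /** Thin wrapper that handlers respond through instead of calling the framework directly. */
  static final class CheckedResponder {
    private final Metadata metadata;
    int sentStatus = -1; // records what was sent so the sketch can be inspected
    String sentBody;

    CheckedResponder(final Metadata metadata) {
      this.metadata = metadata;
    }

    void respond(final int statusCode, final String body) {
      if (!metadata.declares(statusCode)) {
        throw new IllegalStateException(
            "Status " + statusCode + " is not declared in the endpoint metadata");
      }
      // A real wrapper would hand the response to the HTTP framework here.
      sentStatus = statusCode;
      sentBody = body;
    }
  }

  public static void main(final String[] args) {
    final CheckedResponder responder = new CheckedResponder(new Metadata(200, 400, 404, 500));
    responder.respond(200, "{\"data\":{}}");
    System.out.println(responder.sentStatus); // 200
    try {
      responder.respond(418, "{}");
    } catch (final IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With a wrapper like this in place, an ordinary unit test that exercises a handler fails as soon as the handler sends something the metadata does not declare.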

For Teku, there’s another big advantage - we can create type definitions straight off the SSZ schemas we already have for the various types defined in the beacon chain spec. Given the majority of responses from the standard REST API are JSON versions of the SSZ objects, we save a ton of code by not having to redefine the type for JSON. For example, the endpoint to get a validator from a state defines its response type as:

SerializableTypeDefinition<StateValidatorData> DATA_TYPE =
    SerializableTypeDefinition.object(StateValidatorData.class)
        .withField("index", UINT64_TYPE, StateValidatorData::getIndex)
        .withField("balance", UINT64_TYPE, StateValidatorData::getBalance)
        .withField("status", STATUS_TYPE, StateValidatorData::getStatus)
        .withField(
            "validator",
            Validator.SSZ_SCHEMA.getJsonTypeDefinition(),
            StateValidatorData::getValidator)
        .build();

The index, balance and status are additional fields that aren’t part of the SSZ definitions so they have been added explicitly, but the validator field is just a JSON version of the Validator type from the spec. So rather than redefine it, we can just get the SSZ schema and ask it for its JSON type definition, which is automatically created. And the getter (StateValidatorData::getValidator) is just returning the exact Validator instance we got from the BeaconState.
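As a loose illustration of deriving the JSON mapping from an existing schema, the sketch below uses a hypothetical ContainerSchema that only knows about uint64 fields; it is not Teku’s SSZ schema API. MiniValidator is an invented, cut-down stand-in carrying two of Validator’s uint64 fields, and values are written as quoted decimal strings on the assumption that this matches how the standard API renders uint64.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToLongFunction;

final class SchemaDerivedSketch {

  /** Hypothetical container schema whose fields are all uint64 values. */
  static final class ContainerSchema<T> {
    private final Map<String, ToLongFunction<T>> fields = new LinkedHashMap<>();

    ContainerSchema<T> uint64Field(final String name, final ToLongFunction<T> getter) {
      fields.put(name, getter);
      return this;
    }

    /** Derives a JSON serializer from the schema, so no per-type JSON code is written. */
    Function<T, String> jsonTypeDefinition() {
      return value -> {
        final List<String> parts = new ArrayList<>();
        fields.forEach(
            (name, getter) ->
                // Assumption: uint64 values are rendered as quoted decimal strings.
                parts.add(
                    "\"" + name + "\":\""
                        + Long.toUnsignedString(getter.applyAsLong(value)) + "\""));
        return "{" + String.join(",", parts) + "}";
      };
    }
  }

  /** Invented, cut-down stand-in for a spec type with two uint64 fields. */
  static final class MiniValidator {
    final long effectiveBalance;
    final long activationEpoch;

    MiniValidator(final long effectiveBalance, final long activationEpoch) {
      this.effectiveBalance = effectiveBalance;
      this.activationEpoch = activationEpoch;
    }
  }

  static final ContainerSchema<MiniValidator> MINI_VALIDATOR_SCHEMA =
      new ContainerSchema<MiniValidator>()
          .uint64Field("effective_balance", validator -> validator.effectiveBalance)
          .uint64Field("activation_epoch", validator -> validator.activationEpoch);

  public static void main(final String[] args) {
    final Function<MiniValidator, String> jsonType = MINI_VALIDATOR_SCHEMA.jsonTypeDefinition();
    System.out.println(jsonType.apply(new MiniValidator(32_000_000_000L, 0L)));
  }
}
```

The schema already lists every field in order along with how to read it, so the JSON definition falls out of it rather than being declared a second time.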

We initially adopted this new approach for the whole validator key manager API which confirmed it worked well. We’re now in the process of converting our existing beacon node API over to the new style. We’re also using parts of it to create the JSON we send to the execution layer calls and plan to use more. Eventually we’ll replace all our Jackson annotation-based JSON handling with type definitions.