Performant serialization of numeric data
Pydantic does not natively support `ndarray`, nor `torch`, `tensorflow`, or other common numeric libraries' objects. For numpy data, Chains provides the custom pydantic field type `NumpyArrayField`. `NumpyArrayField` is a wrapper around the actual numpy array; inside your Python code, you can work with its `array` attribute:
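The snippet below is a minimal, self-contained stand-in for how such a wrapper can be used. It is not the actual Chains implementation; the model and field names are made up for illustration:

```python
import numpy as np
import pydantic


# Illustrative stand-in for Chains' NumpyArrayField (not the real code):
# it wraps an ndarray and exposes it via the `array` attribute.
class NumpyArrayField(pydantic.BaseModel):
    model_config = pydantic.ConfigDict(arbitrary_types_allowed=True)
    array: np.ndarray


class DataModel(pydantic.BaseModel):
    matrix: NumpyArrayField


data = DataModel(matrix=NumpyArrayField(array=np.eye(2)))
# Work with the plain numpy array via the `array` attribute:
total = data.matrix.array.sum()
```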
For efficient transport, use `msgpack` (with `msgpack_numpy`) to serialize the dict representation. For Chainlet-Chainlet RPCs this is done automatically for you by enabling binary mode of the dependency Chainlets (see the Chainlet options for details). When calling the Chain from your own client code, implement the `msgpack` serialization client-side:
`NumpyArrayField` only needs `pydantic` and no other Chains dependencies, so you can take that implementation code in isolation and integrate it into your client code. If newer versions of `msgpack` and `msgpack_numpy` give errors, we know that `msgpack = ">=1.0.2"` and `msgpack-numpy = ">=0.4.8"`
work.

The JSON representation of the field is a dict with the keys `shape` (`tuple[int]`), `dtype` (`str`), and `data_b64` (`str`), where `data_b64` contains the base64-encoded result of `np.ndarray.tobytes()`. To get back to the array from the JSON string, use the model's `model_validate_json` method.
As discussed in the beginning, this schema is not performant for numeric data and is only offered as a compatibility layer (JSON does not allow bytes); generally prefer the binary format.
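With the actual model class you would call `model_validate_json` directly; as a self-contained sketch of the schema itself (no Chains imports, field names per the description above), the roundtrip looks like:

```python
import base64
import json

import numpy as np

arr = np.arange(6, dtype="int64").reshape(2, 3)

# JSON-compatible dict following the schema described above:
payload = {
    "shape": list(arr.shape),
    "dtype": str(arr.dtype),
    "data_b64": base64.b64encode(arr.tobytes()).decode("ascii"),
}
text = json.dumps(payload)

# Roughly what reconstructing the array from the JSON string involves:
d = json.loads(text)
restored = np.frombuffer(
    base64.b64decode(d["data_b64"]), dtype=d["dtype"]
).reshape(d["shape"])
```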
bytes fields

You can add a `bytes` field to a pydantic model used in a chain, or pass `bytes` as a plain argument to `run_remote`. This can be useful to include non-numpy data formats such as images or audio/video snippets. In this case, the "normal" JSON representation does not work, and all involved requests or Chainlet-Chainlet invocations must use binary mode.
The same steps as for arrays above apply: construct dicts with `bytes` values and keys corresponding to the `run_remote` argument names or the field names in the pydantic model. Then use `msgpack` to serialize and deserialize those dicts.
Don’t forget to add the appropriate `Content-Type` headers, and note that `response.json()` will not work on a binary response.
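A sketch of the `bytes` case; the key name `image` is illustrative:

```python
import msgpack

# Keys correspond to the `run_remote` argument names or the pydantic
# field names (`image` is an illustrative name).
payload = {"image": b"\x89PNG\r\n\x1a\n..."}  # e.g. raw image bytes
body = msgpack.packb(payload)

# Send `body` with a binary `Content-Type` header; unpack the binary
# response bytes instead of calling `response.json()`:
restored = msgpack.unpackb(body)
```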