Feels like a CBOR ad to me. I agree that most techs are familiar with XML and JSON, but calling CBOR a "pivotal data format" is a stretch compared to Protobuf, Parquet, Avro, Cap'n Proto, and many others: https://en.m.wikipedia.org/wiki/Comparison_of_data-serializa...
The fact that the long article misses to make the historical/continuation link to MessagePack is by itself a red flag signalling a CBOR ad.
Edit: OK, actually there is a separate page for alternatives: https://cborbook.com/introduction/cbor_vs_the_other_guys.htm...
Notably missing is a comparison to Cap'n Proto, which to me feels like the best set of tradeoffs for more binary interchange needs.
I honestly wonder sometimes if it's held back by the name— I love the campiness of it, but I feel like it could be a barrier to being taken seriously in some environments.
Doesn't Cap'n Proto require the receiver to know the types for proper decoding? This wouldn't entirely disqualify it from comparison, since e.g. protobufs are that way as well, but they make it less interesting for comparing to CBOR, which is type-tagged.
There's quite a few formats that are self-describing already, so having a format that can skip the type and key tagging for that extra little bit of compactness and decoding efficiency is a unique selling point.
There's also nothing stopping you from serializing unstructured data using an array of key/value structs, with a union for the value to allow for different value types (int/float/string/object/etc), although it probably wouldn't be as efficient as something like CBOR for that purpose. It could make sense if most of the data is well-defined but you want to add additional properties/metadata.
Many languages take unstructured data like JSON and parse them into a strongly-typed class (throwing validation errors if it doesn't map correctly) anyways, so having a predefined schema is not entirely a bad thing. It does make you think a bit harder about backwards-compatibility and versioning. It also probably works better when you own the code for both the sender and receiver, rather than for a format that anyone can use.
Finally, maybe not a practical thing and something that I've never seen used in practice: in theory you could send a copy of the schema definition as a preamble to the data. If you're sending 10000 records and they all have the same fields in the same order, why waste bits/bytes tagging the key name and type for every record, when you could send a header describing the struct layout. Or if it's a large schema, you could request it from the server on demand, using an id/version/hash to check if you already have it.
In practice though, 1) you probably need to map the unknown/foreign schema into your own objects anyways, and 2) most people would just zlib compress the stream to get rid of repeated key names and call it a day. But the optimizer in me says why burn all those CPU cycles decompressing and decoding the same field names over and over. CBOR could have easily added optional support for a dictionary of key strings to the header, for applications where the keys are known ahead of time, for example. (My guess is that they didn't because it would be harder for extremely-resource-constrained microcontrollers to implement).
> I feel like it could be a barrier to being taken seriously in some environments.
Working as intended. ;)
In all seriousness: I develop Cap'n Proto to serve my own projects that use it, such at the Cloudflare Workers runtime. It is actually not my goal to see Cap'n Proto itself adopted widely. I mean, don't get me wrong, it'd be cool, but it isn't really a net benefit to me personally: maybe people will contribute useful features, but mostly they will probably just want me to review their PRs adding features I don't care about, or worse, demand I fix things I don't care about, and that's just unpaid labor for me. So mostly I'm happy with them not.
It's entirely possible that this will change in the future: like maybe we'll decide that Cap'n Proto should be a public-facing part of the Cloudflare Workers platform (rather than just an implementation detail, as it is today), in which case adoption would then benefit Workers and thus me. At the moment, though, that's not the plan.
In any case, if there's some company that fancies themselves Too Serious to use a technology with such a silly name and web site, I am perfectly happy for them to not use it! :)
Ha, I wondered if you'd comment. Yeah that's a reasonable take.
I think for me it would be less about locking out stodgy, boring companies, and perhaps instead it being an issue for emerging platforms that are themselves concerned with the optics of being "taken seriously". I'm specifically in the robotics space, and over the past ten years ROS has been basically rewritten to be based around DDS, and I know during the evaluation process for that there were prototypes kicked around that would have been more webtech type stuff, things like 0mq, protobufs, etc. In the end the decision for DDS was made on technical merits, but I still suspect that it being a thing that had preexisting traction in aerospace and especially NASA influenced that.
man I love HN lol
Have to agree. I've heard of every format you mentioned, but never heard of CBOR.
I first heard of it while developing a QR code travel passport during the Covid era... the technical specification included CBOR as part of the implementation requirement. Past this, I have not crossed path with it again...
CBOR is just a standard data format. Why would it need an ad? What are they selling here?
A lot of people (myself included) are working on tools and protocols that interoperate via CBOR. Nobody is selling CBOR itself, but I for one have a vested interest in promoting CBOR adoption (which makes it sound nefarious but in reality I just think it's a neat format, when you add canonicalization).
CBOR isn't special here, similar incentives could apply to just about any format - but JSON for example is already so ubiquitous that nobody needs to promote it.
If I adopt a technology, I probably don't want it to die out. Widespread support is generally good for all that use it.
I would agree their claim is a bit early, but I think a key difference between those you mentioned and CBOR is the stability expectation. Protobuf/Parquet/etc are usually single-source libraries/frameworks, which can be changed quite quickly, while CBOR seems to be going for a spec-first approach.