Textual formats such as XML, JSON and CSV owe their popularity more to consensus among their users than to efficiency. All three have weaknesses, CSV in particular because it has no schema at all, whereas XML and JSON both have optional schema support. What’s the point of having a schema anyway? Well, it turns out that a schema makes encoding and decoding more efficient, since field names and types no longer have to be spelled out in every record. That’s why binary formats like Thrift, Protobuf and Avro have their own schemas.
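To see why a schema helps, here is a minimal sketch in Python. It compares compact JSON with a toy schema-driven binary layout for the example record used in DDIA. The layout (one-byte length prefixes, a 4-byte little-endian integer) and the names `SCHEMA`, `encode` and `decode` are my own assumptions for illustration. This is not the wire format of Thrift, Protobuf or Avro.

```python
import json
import struct

# Hypothetical schema for this sketch: field order and types are fixed,
# so the encoded bytes never need to carry field names.
SCHEMA = [
    ("userName", "string"),
    ("favoriteNumber", "int32"),
    ("interests", "string[]"),
]


def _put_string(out, text):
    data = text.encode("utf-8")
    out.append(len(data))  # one-byte length prefix (max 255 bytes)
    out.extend(data)


def encode(record):
    """Write the values in schema order, without any field names."""
    out = bytearray()
    for name, kind in SCHEMA:
        value = record[name]
        if kind == "string":
            _put_string(out, value)
        elif kind == "int32":
            out.extend(struct.pack("<i", value))
        elif kind == "string[]":
            out.append(len(value))  # one-byte item count
            for item in value:
                _put_string(out, item)
    return bytes(out)


def decode(data):
    """Read the values back; the schema says what comes next."""
    pos = 0
    record = {}

    def get_string():
        nonlocal pos
        length = data[pos]
        text = data[pos + 1:pos + 1 + length].decode("utf-8")
        pos += 1 + length
        return text

    for name, kind in SCHEMA:
        if kind == "string":
            record[name] = get_string()
        elif kind == "int32":
            (record[name],) = struct.unpack_from("<i", data, pos)
            pos += 4
        elif kind == "string[]":
            count = data[pos]
            pos += 1
            record[name] = [get_string() for _ in range(count)]
    return record


record = {
    "userName": "Martin",
    "favoriteNumber": 1337,
    "interests": ["daydreaming", "hacking"],
}
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")
as_binary = encode(record)
print(len(as_json), len(as_binary))
```

For this record the compact JSON is 81 bytes and the toy layout is 32 bytes. The saving comes almost entirely from dropping the field names and punctuation, which is only possible because the reader and the writer agree on the schema.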
In DDIA’s example record, Thrift’s CompactProtocol and Protobuf came out at less than half the size of the JSON encoding (34 and 33 bytes against 81), and Avro was only slightly smaller than those two at 32 bytes. But having a schema means you’ll have to manage it, say in a database. The author suggests:
A database of schema versions is a useful thing to have in any case, since it acts as documentation and gives you a chance to check schema compatibility. As the version number, you could use a simple incrementing integer, or you could use a hash of the schema.
And then you can make use of the schema database this way:
A reader can fetch a record, extract the version number, and then fetch the writer’s schema for that version number from the database. Using that writer’s schema, it can decode the rest of the record. (Espresso works this way, for example.)
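The lookup described in the quote can be sketched roughly like this in Python. Everything here is an assumption made for illustration: `SchemaRegistry` is an in-memory stand-in for the schema database, the record is a 4-byte version prefix followed by the values in schema order, and the schema is just a list of field names. It is not how Avro or Espresso actually lay out their bytes.

```python
import hashlib
import json
import struct


class SchemaRegistry:
    """In-memory stand-in for a database of schema versions (hypothetical)."""

    def __init__(self):
        self._by_version = {}

    def register(self, schema):
        version = len(self._by_version) + 1  # simple incrementing integer
        self._by_version[version] = schema
        return version

    def fetch(self, version):
        return self._by_version[version]


def schema_fingerprint(schema):
    """Alternative version identifier: a hash of the canonicalised schema."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def write_record(version, schema, record):
    # The payload holds only the values, in the writer's field order.
    values = [record[field] for field in schema["fields"]]
    payload = json.dumps(values, separators=(",", ":")).encode("utf-8")
    return struct.pack(">I", version) + payload


def read_record(registry, data):
    # 1. Extract the version number from the start of the record.
    (version,) = struct.unpack_from(">I", data, 0)
    # 2. Fetch the writer's schema for that version.
    writer_schema = registry.fetch(version)
    # 3. Use the writer's schema to decode the rest of the record.
    values = json.loads(data[4:].decode("utf-8"))
    return dict(zip(writer_schema["fields"], values))


registry = SchemaRegistry()
schema_v1 = {"name": "Person", "fields": ["userName", "favoriteNumber"]}
schema_v2 = {"name": "Person",
             "fields": ["userName", "favoriteNumber", "interests"]}
v1 = registry.register(schema_v1)
v2 = registry.register(schema_v2)

old_bytes = write_record(v1, schema_v1,
                         {"userName": "Martin", "favoriteNumber": 1337})
new_bytes = write_record(v2, schema_v2,
                         {"userName": "Martin", "favoriteNumber": 1337,
                          "interests": ["daydreaming", "hacking"]})

print(read_record(registry, old_bytes))
print(read_record(registry, new_bytes))
```

Records written with either version decode correctly because each one says which schema wrote it. If you prefer a hash over an incrementing integer, `schema_fingerprint` could serve as the registry key instead, at the cost of a longer prefix on every record.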