jm + protobuf   11

Unbundling Pokémon Go
tl;dr: on Android, it's a Unity app talking to the backend with protobuf over HTTPS. Interesting notes on its use of certificate pinning, and on how they should be doing it.
https  http  protobuf  pokemon-go  pokemon  apps  android  reversing 
july 2016 by jm
Schema evolution in Avro, Protocol Buffers and Thrift
Good description of this key feature of decent serialization formats
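A minimal sketch of the idea, assuming a simplified tag/length/value layout (not the real Avro, protobuf or Thrift wire formats; the tag numbers and field names are made up for illustration). Readers look fields up by tag, skip tags they don't know, and fall back to defaults for tags that are missing, so old and new code can read each other's data:

```python
import struct

HEADER = struct.Struct("<HI")  # hypothetical record header: uint16 tag, uint32 length

def encode(fields):
    """Encode {tag: bytes} as a sequence of (tag, length, value) records."""
    out = bytearray()
    for tag, value in fields.items():
        out += HEADER.pack(tag, len(value)) + value
    return bytes(out)

def decode(buf, known_tags, defaults):
    """Decode records, skipping unknown tags and defaulting missing ones."""
    result = dict(defaults)
    pos = 0
    while pos < len(buf):
        tag, length = HEADER.unpack_from(buf, pos)
        pos += HEADER.size
        if tag in known_tags:
            result[known_tags[tag]] = buf[pos:pos + length].decode("utf-8")
        pos += length  # unknown fields are skipped, not rejected
    return result

V1_TAGS = {1: "name"}                # old schema
V2_TAGS = {1: "name", 2: "email"}    # new schema adds a field under a new tag

new_message = encode({1: b"alice", 2: b"alice@example.com"})
old_message = encode({1: b"bob"})

# Forward compatibility: old reader, new data.
old_reader_view = decode(new_message, V1_TAGS, {"name": ""})
# Backward compatibility: new reader, old data.
new_reader_view = decode(old_message, V2_TAGS, {"name": "", "email": ""})
```

The key design choice is that the tag, not the field name or position, identifies a field on the wire, which is why tags must never be reused for a different meaning.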
avro  thrift  protobuf  schemas  serialization  coding  interop  compatibility 
january 2016 by jm
Release Protocol Buffers v3.0.0-alpha-2 · google/protobuf
New major-version track for protobuf, with some interesting new features:

Removal of field presence logic for primitive value fields, removal of required fields, and removal of default values. This makes proto3 significantly easier to implement with open struct representations, as in languages like Android Java, Objective C, or Go.
Removal of unknown fields.
Removal of extensions, which are instead replaced by a new standard type called Any.
Fix semantics for unknown enum values.
Addition of maps.
Addition of a small set of standard types for representation of time, dynamic data, etc.
A well-defined encoding in JSON as an alternative to binary proto encoding.
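To illustrate what dropping field presence for primitives means in practice, here is a rough sketch using plain dicts and the standard `json` module. It does not use the protobuf library, and the message fields are hypothetical. The assumption is that a field holding its type's default value is simply left out when serializing, so a reader cannot tell "explicitly set to 0" from "never set":

```python
import json

# Hypothetical message definition: field name -> default for its type.
DEFAULTS = {"id": 0, "name": "", "active": False}

def to_json(message):
    """Serialize, omitting fields that hold their type's default value."""
    present = {k: v for k, v in message.items() if v != DEFAULTS[k]}
    return json.dumps(present, sort_keys=True)

def from_json(text):
    """Parse, filling absent fields with their defaults."""
    return {**DEFAULTS, **json.loads(text)}

explicit_zero = {"id": 0, "name": "pikachu", "active": True}
encoded = to_json(explicit_zero)   # "id" is not written at all
decoded = from_json(encoded)       # "id" comes back as the default, 0
```

The round trip is lossless for values, but the information that `id` was explicitly set is gone, which is the trade-off behind the simpler struct-like representations mentioned above.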
protobuf  binary  marshalling  serialization  google  grpc  proto3  coding  open-source 
february 2015 by jm
F1: A Distributed SQL Database That Scales
Beyond the interesting-enough stuff about scalability in a distributed SQL store, there's this really nifty point about avoiding the horrors of the SQL/ORM impedance mismatch:
At Google, Protocol Buffers are ubiquitous for data storage and interchange between applications. When we still had a MySQL schema, users often had to write tedious and error-prone transformations between database rows and in-memory data structures. Putting protocol buffers in the schema removes this impedance mismatch and gives users a universal data structure they can use both in the database and in application code…. Protocol Buffer columns are more natural and reduce semantic complexity for users, who can now read and write their logical business objects as atomic units, without having to think about materializing them using joins across several tables.

This is something that pretty much any store can already adopt. Go protobufs. (or Avro, etc.)

Also, I find this really neat, and I hope this idea is implemented elsewhere soon: asynchronous schema updates:

Schema changes are applied asynchronously on multiple F1 servers. Anomalies are prevented by the use of a schema leasing mechanism with support for only current and next schema versions; and by subdividing schema changes into multiple phases where consecutive pairs of changes are mutually compatible and cannot cause anomalies.
schema  sql  f1  google  papers  orm  protobuf 
january 2015 by jm
The problem of managing schemas
Good post on the pain of using CSV/JSON as a data interchange format:
eventually, the schema changes. Someone refactors the code generating the JSON and moves fields around, perhaps renaming a few fields. The DBA added new columns to a MySQL table and this reflects in the CSVs dumped from the table. Now all those applications and scripts must be modified to handle both file formats. And since schema changes happen frequently, and often without warning, this results in both ugly and unmaintainable code, and in grumpy developers who are tired of having to modify their scripts again and again.
schema  json  avro  protobuf  csv  data-formats  interchange  data  hadoop  files  file-formats 
november 2014 by jm
Cap'n Proto, FlatBuffers, and SBE
a feature comparison of these new serialization formats from Kenton, the capnp dude
serialization  protobuf  capnproto  sbe  flatbuffers  google  coding  storage 
june 2014 by jm
FlatBuffers: Main Page
A new serialization format from Google's Android gaming team, supporting C++ and Java, open source under the ASL v2. Reasons to use it:
Access to serialized data without parsing/unpacking - What sets FlatBuffers apart is that it represents hierarchical data in a flat binary buffer in such a way that it can still be accessed directly without parsing/unpacking, while also still supporting data structure evolution (forwards/backwards compatibility).
Memory efficiency and speed - The only memory needed to access your data is that of the buffer. It requires 0 additional allocations. FlatBuffers is also very suitable for use with mmap (or streaming), requiring only part of the buffer to be in memory. Access is close to the speed of raw struct access with only one extra indirection (a kind of vtable) to allow for format evolution and optional fields. It is aimed at projects where spending time and space (many memory allocations) to be able to access or construct serialized data is undesirable, such as in games or any other performance sensitive applications. See the benchmarks for details.
Flexible - Optional fields means not only do you get great forwards and backwards compatibility (increasingly important for long-lived games: don't have to update all data with each new version!). It also means you have a lot of choice in what data you write and what data you don't, and how you design data structures.
Tiny code footprint - Small amounts of generated code, and just a single small header as the minimum dependency, which is very easy to integrate. Again, see the benchmark section for details.
Strongly typed - Errors happen at compile time rather than manually having to write repetitive and error prone run-time checks. Useful code can be generated for you.
Convenient to use - Generated C++ code allows for terse access & construction code. Then there's optional functionality for parsing schemas and JSON-like text representations at runtime efficiently if needed (faster and more memory efficient than other JSON parsers).
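The "access without parsing" and "one extra indirection" points can be sketched roughly as follows. This uses a made-up layout (a small offset table followed by int32 values), not the actual FlatBuffers binary format, and the field names are hypothetical. A field is read by looking up its offset and unpacking directly from the buffer; nothing else is decoded, and an absent optional field falls back to a default:

```python
import struct

# Hypothetical layout (NOT the real FlatBuffers format):
#   3 x uint16 offset table, one slot per field; 0 means "field absent",
#   followed by the int32 field values themselves.
FIELDS = {"hp": 0, "mana": 1, "level": 2}
TABLE_SIZE = 2 * len(FIELDS)

def build(values):
    """Pack int32 fields after an offset table; absent fields keep offset 0."""
    offsets = [0] * len(FIELDS)
    body = bytearray()
    for name, value in values.items():
        offsets[FIELDS[name]] = TABLE_SIZE + len(body)
        body += struct.pack("<i", value)
    return struct.pack("<3H", *offsets) + bytes(body)

def get(buf, name, default=0):
    """Read one field straight from the buffer via a single indirection."""
    (offset,) = struct.unpack_from("<H", buf, 2 * FIELDS[name])
    if offset == 0:
        return default          # optional field was never written
    (value,) = struct.unpack_from("<i", buf, offset)
    return value

# A memoryview stands in for an mmap'd file or a network buffer.
buf = memoryview(build({"hp": 80, "level": 7}))
hp = get(buf, "hp")
mana = get(buf, "mana", default=150)
level = get(buf, "level")
```

Because readers go through the offset table, a newer writer can add fields (more slots) without breaking readers that only know the earlier ones.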

Looks nice, but it lacks the language coverage of protobuf. Definitely more practical than capnproto.
c++  google  java  serialization  json  formats  protobuf  capnproto  storage  flatbuffers 
june 2014 by jm
Simple Binary Encoding
an OSI layer 6 presentation for encoding/decoding messages in binary format to support low-latency applications. [...] SBE follows a number of design principles to achieve this goal. Adhering to these design principles sometimes means features available in other codecs will not be offered. For example, many codecs allow strings to be encoded at any field position in a message; SBE only allows variable length fields, such as strings, as fields grouped at the end of a message.

The SBE reference implementation consists of a compiler that takes a message schema as input and then generates language specific stubs. The stubs are used to directly encode and decode messages from buffers. The SBE tool can also generate a binary representation of the schema that can be used for the on-the-fly decoding of messages in a dynamic environment, such as for a log viewer or network sniffer.

The design principles drive the implementation of a codec that ensures messages are streamed through memory without backtracking, copying, or unnecessary allocation. Memory access patterns should not be underestimated in the design of a high-performance application. Low-latency systems in any language especially need to consider all allocation to avoid the resulting issues in reclamation. This applies for both managed runtime and native languages. SBE is totally allocation free in all three language implementations.

The end result of applying these design principles is a codec that has ~25X greater throughput than Google Protocol Buffers (GPB) with very low and predictable latency. This has been observed in micro-benchmarks and real-world application use. A typical market data message can be encoded, or decoded, in ~25ns compared to ~1000ns for the same message with GPB on the same hardware. XML and FIX tag value messages are orders of magnitude slower again.

The sweet spot for SBE is as a codec for structured data that is mostly fixed size fields which are numbers, bitsets, enums, and arrays. While it does work for strings and blobs, many may find some of the restrictions a usability issue. These users would be better off with another codec more suited to string encoding.
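A rough sketch of the layout principle described above: fixed-size fields at known offsets, with the variable-length field grouped at the end, written in place into a preallocated buffer. The message shape and field names are hypothetical, this is not output of the SBE tool, and Python itself is not allocation-free, so it only illustrates the layout, not the performance:

```python
import struct

# Hypothetical market-data message:
#   uint64 timestamp_ns | int64 price (fixed-point) | uint32 quantity | uint8 side
# followed by one variable-length field (uint16 length + bytes) at the end.
FIXED = struct.Struct("<QqIB")
VAR_LEN = struct.Struct("<H")
QUANTITY_OFFSET = 16  # 8 bytes timestamp + 8 bytes price

def encode_into(buf, timestamp_ns, price, quantity, side, symbol):
    """Write the message front to back into an existing buffer; return its size."""
    FIXED.pack_into(buf, 0, timestamp_ns, price, quantity, side)
    VAR_LEN.pack_into(buf, FIXED.size, len(symbol))
    start = FIXED.size + VAR_LEN.size
    buf[start:start + len(symbol)] = symbol
    return start + len(symbol)

def decode_quantity(buf):
    """Read one fixed field at its known offset, without touching the rest."""
    (quantity,) = struct.unpack_from("<I", buf, QUANTITY_OFFSET)
    return quantity

def decode_symbol(buf):
    """Read the trailing variable-length field."""
    (length,) = VAR_LEN.unpack_from(buf, FIXED.size)
    start = FIXED.size + VAR_LEN.size
    return bytes(buf[start:start + length])

buffer = bytearray(64)  # reused for every message
size = encode_into(buffer, 1_400_000_000_000_000_000, 1234500, 100, 1, b"GOOG")
quantity = decode_quantity(buffer)
symbol = decode_symbol(buffer)
```

Keeping the string at the end is what lets every fixed field live at a constant offset, so both encoding and decoding stream through memory without backtracking.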
sbe  encoding  protobuf  protocol-buffers  json  messages  messaging  binary  formats  low-latency  martin-thompson  xml 
may 2014 by jm
A sane Google Protocol Buffers library for Ruby. It's all about being Buf; ProtoBuf.
protobuf  google  protocol-buffers  ruby  coding  libraries  gems  open-source 
april 2014 by jm
"Dapper, a Large-Scale Distributed Systems Tracing Infrastructure" [PDF]
Google paper describing the infrastructure they've built for cross-service request tracing (i.e. "tracer requests"). Features: few code changes required (since they've built it into the internal protobuf libs), low performance impact, sampling, deployment across the ~entire production fleet, output visibility in minutes, and it has been live in production for over 2 years. Excellent read.
dapper  tracing  http  services  soa  google  papers  request-tracing  tracers  protobuf  devops 
march 2014 by jm
