We need a unified abstraction for apps to discover each other’s formats

(Written by Lawrence Krubner; however, indented passages are often quotes.) You can contact Lawrence at lawrence@krubner.com, or follow me on Twitter.

A friend and I exchanged some email regarding the article “The Future of Asynchronous IO in Python”. My friend wrote:

Seems like he wants zmq/nanomsg embedded inside Python. I am not really sure why.

I responded:

I think what he is suggesting is more interesting: a single, unified way of speaking to all protocols (MySQL, MongoDB, Redis, Kafka, etc.). When he says “protocol” I think he means “byte format”. He points out that each app generally has its own method of taking a stream of bytes and establishing the frame for interpreting those bytes. He feels that this was a missed opportunity for ZeroMQ and NanoMsg: so long as they were reinventing sockets, why not go all the way and unify all communication protocols? I have often heard people say of ZeroMQ “This is what sockets would have been like if they were invented now, instead of in the 1970s.” Sockets were a huge breakthrough in the 1970s, one of the greatest ideas to come from Unix: a unified abstraction for communication among independent processes. It’s tough to do the same for byte formats because every app has different optimization needs, but I think it is an interesting idea.

Every language has its own ways of defining a format for bytes, so as to establish new protocols. In Clojure there are some fairly good libraries for this, such as Gloss:

https://github.com/ztellman/gloss/wiki/Introduction
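Python’s nearest built-in equivalent is the standard struct module. The sketch below wraps it so that a byte format is declared once, as named fields, and then used for both encoding and decoding. This is not Gloss’s API and not anything from the article; the Codec class and the example header layout are made up for illustration, and big-endian byte order is an assumption.

```python
from __future__ import annotations

import struct
from typing import Any


class Codec:
    """Hypothetical declarative byte format: ordered (field name, struct code) pairs."""

    def __init__(self, *fields: tuple[str, str], byte_order: str = ">") -> None:
        self.names = [name for name, _ in fields]
        # Precompile the layout once; ">" means big-endian with no padding.
        self._struct = struct.Struct(byte_order + "".join(code for _, code in fields))

    @property
    def size(self) -> int:
        """Number of bytes one encoded record occupies."""
        return self._struct.size

    def encode(self, record: dict[str, Any]) -> bytes:
        return self._struct.pack(*(record[name] for name in self.names))

    def decode(self, data: bytes) -> dict[str, Any]:
        return dict(zip(self.names, self._struct.unpack(data)))


# A made-up message header: 1-byte type, 2-byte flags, 4-byte payload length.
header = Codec(("type", "B"), ("flags", "H"), ("length", "I"))
raw = header.encode({"type": 1, "flags": 0, "length": 42})
```

The declaration is the whole protocol description here; encoder and decoder both fall out of it, which is the property that makes this style of library pleasant.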

If he (the author, Paul Colomiets) only writes an interpreter for databases, then he is simply reinventing JDBC, but for Python. What would be more interesting is if he discovered a metaphor that unifies all communication protocols in a manner that developers find useful.

Ultimately, this is something that needs to be solved at the level of the operating system. Why has it not already happened? I do find it surprising that the tech industry has expended so much effort on ontologies for Service Oriented Architecture:

http://wiki.apache.org/ws/WebServiceSpecifications

Where is the comparable effort for apps that run on one computer? We still live in an era where every database needs its own driver. Where is the single-machine equivalent of WSDL or SOAP that would allow services to advertise their byte formats, and thus allow the other apps on the machine to adapt to them on the fly, without needing a specific driver? Perhaps what is needed is a communication bus that abstracts some of the details, and that differs from a queue in that it performs translation.
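To make the “advertise your byte format” idea concrete, here is a toy version in Python: a service publishes a machine-readable description of its messages (JSON here), and a client with no service-specific driver builds a decoder from that description at run time. Everything in it is hypothetical: the FormatRegistry class, the JSON schema, the type vocabulary, and the big-endian layout are my assumptions, and a real bus would live outside any single process.

```python
from __future__ import annotations

import json
import struct

# Hypothetical type vocabulary a machine-wide "format description" might use.
TYPE_CODES = {"uint8": "B", "uint16": "H", "uint32": "I", "int64": "q", "float64": "d"}


class FormatRegistry:
    """Stand-in for a local bus where services publish a description of their bytes."""

    def __init__(self) -> None:
        self._descriptions: dict[str, str] = {}

    def advertise(self, service: str, description_json: str) -> None:
        self._descriptions[service] = description_json

    def decoder_for(self, service: str):
        """Build a decoder on the fly from the advertised description; no driver needed."""
        spec = json.loads(self._descriptions[service])
        names = [name for name, _ in spec["fields"]]
        layout = struct.Struct(">" + "".join(TYPE_CODES[t] for _, t in spec["fields"]))

        def decode(data: bytes) -> dict:
            return dict(zip(names, layout.unpack(data)))

        return decode


registry = FormatRegistry()
registry.advertise(
    "inventory",
    json.dumps({"fields": [["item_id", "uint32"], ["count", "uint16"]]}),
)
decode = registry.decoder_for("inventory")
```

The client side never mentions “inventory” beyond the lookup; if the service changed its description, the client would adapt on the next call to decoder_for.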

It’s an interesting idea.

The author writes:

Still I’ve seen no attempts to build some unified kernel for I/O in separate thread. If you know some, please let me know.

Also the pattern has similarities with recently emerging Ambassador pattern. Ambassador is a process which sits on every machine, and does service discovery, but proxies all connections through itself. I.e. every service connects to some port at localhost where Ambassador listens and forwards connection to some real service(s). Similarly I/O Kernel must do service discovery and communication with service on behalf of main thread (still protocol probably should be very different from the one ambassador using).
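For readers who have not met the Ambassador pattern, here is a bare-bones one in Python’s asyncio: it listens on localhost and forwards each connection to a backend picked round-robin from a fixed list. The fixed list stands in for real service discovery, and the function names and round-robin policy are my assumptions, not something from the article.

```python
from __future__ import annotations

import asyncio
import itertools


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes in one direction until EOF, then close the destination."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # the other side went away; just stop copying
    finally:
        writer.close()


async def start_ambassador(backends: list[tuple[str, int]], port: int = 0) -> asyncio.AbstractServer:
    """Listen on localhost and forward each connection to a backend, round-robin."""
    choices = itertools.cycle(backends)  # stand-in for real service discovery

    async def handle(client_r: asyncio.StreamReader, client_w: asyncio.StreamWriter) -> None:
        host, backend_port = next(choices)
        try:
            backend_r, backend_w = await asyncio.open_connection(host, backend_port)
        except OSError:
            client_w.close()  # no backend reachable: drop the client
            return
        # Shovel bytes both ways until either side closes.
        await asyncio.gather(pipe(client_r, backend_w), pipe(backend_r, client_w))

    return await asyncio.start_server(handle, "127.0.0.1", port)
```

Services would connect to the ambassador’s local port instead of the real service’s address; replacing itertools.cycle with a live discovery lookup is where the interesting work lies.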

…But how many async libraries do service discovery in sensible way? (My answer: None of them).

…Unification
Having I/O kernel in place. Various I/O frameworks in python just support all the protocols supported by kernel. Every new protocol should be done there. This allows frameworks to compete on convenience and efficiency rather than on protocol support.

…Every protocol supported must at least be delimited into frames in C code. So that partial packets do not reach python code. Other parsing might be done in main thread directly in python objects.
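The article wants this delimiting done in C. To show the division of labor, here is the same idea sketched in pure Python: an “I/O kernel” thread that reads a socket and passes only whole frames across a queue.Queue, and a main loop that parses them. The 4-byte big-endian length prefix, the function names, and the None sentinel are my assumptions, not part of the article’s design.

```python
from __future__ import annotations

import queue
import socket
import struct
import threading


def io_kernel(sock: socket.socket, frames: queue.Queue) -> None:
    """Runs in its own thread: read raw bytes, hand only complete frames onward."""
    buf = bytearray()
    while chunk := sock.recv(4096):
        buf.extend(chunk)
        # Each frame is a 4-byte big-endian length followed by that many bytes.
        while len(buf) >= 4:
            (length,) = struct.unpack_from(">I", buf)
            if len(buf) < 4 + length:
                break  # partial frame: keep it buffered, never pass it on
            frames.put(bytes(buf[4 : 4 + length]))
            del buf[: 4 + length]
    frames.put(None)  # sentinel: the peer closed the connection


def main_loop(frames: queue.Queue) -> list[str]:
    """Main thread: only ever sees whole frames, so it can parse them directly."""
    messages = []
    while (frame := frames.get()) is not None:
        messages.append(frame.decode("utf-8"))
    return messages


# Demo: split one frame across two sends; the main thread still gets whole frames.
ours, theirs = socket.socketpair()
frames: queue.Queue = queue.Queue()
threading.Thread(target=io_kernel, args=(theirs, frames), daemon=True).start()
payload = struct.pack(">I", 5) + b"hello" + struct.pack(">I", 3) + b"abc"
ours.sendall(payload[:6])
ours.sendall(payload[6:])
ours.close()
messages = main_loop(frames)
```

Because the byte buffer never leaves the I/O thread, the main thread cannot observe a half-received message; in the proposal, that inner loop would live in C.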

Service discovery must be pluggable. With most obvious choices implemented first (e.g. polling on a DNS name).

Service discovery should be easily integrated into any protocol. In fact it must be easier to use service discovery than to omit it for protocol implementer.
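A sketch of what “pluggable, with DNS polling first” might look like in Python. Discovery, StaticDiscovery, DnsPollingDiscovery and connect are hypothetical names of mine; the DNS variant simply re-runs socket.getaddrinfo at most once per interval rather than honoring real DNS TTLs.

```python
from __future__ import annotations

import socket
import time
from typing import Protocol


class Discovery(Protocol):
    """Hypothetical pluggable interface: the current list of (host, port) endpoints."""

    def endpoints(self) -> list[tuple[str, int]]: ...


class StaticDiscovery:
    """Fixed list, useful for tests and local development."""

    def __init__(self, endpoints: list[tuple[str, int]]) -> None:
        self._endpoints = list(endpoints)

    def endpoints(self) -> list[tuple[str, int]]:
        return list(self._endpoints)


class DnsPollingDiscovery:
    """Re-resolve a DNS name at most once every `interval` seconds."""

    def __init__(self, name: str, port: int, interval: float = 5.0) -> None:
        self.name, self.port, self.interval = name, port, interval
        self._endpoints: list[tuple[str, int]] = []
        self._last = float("-inf")

    def endpoints(self) -> list[tuple[str, int]]:
        now = time.monotonic()
        if now - self._last >= self.interval:
            infos = socket.getaddrinfo(self.name, self.port, type=socket.SOCK_STREAM)
            # sockaddr is info[4]; keep just (address, port), deduplicated.
            self._endpoints = sorted({(info[4][0], info[4][1]) for info in infos})
            self._last = now
        return list(self._endpoints)


def connect(discovery: Discovery) -> socket.socket:
    """What a protocol implementer would call instead of create_connection(host, port)."""
    last_error: OSError | None = None
    for host, port in discovery.endpoints():
        try:
            return socket.create_connection((host, port), timeout=2.0)
        except OSError as exc:
            last_error = exc  # try the next endpoint
    raise last_error or OSError("no endpoints discovered")
```

Having connect() take a Discovery object rather than a host and port is one way to satisfy the article’s last rule: the path of least resistance for a protocol implementer goes through discovery.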

Some of what he writes about sounds like a wish list for The Perfect Operating System, as some of the things he mentions are hard problems that no one has solved well yet (find all state in all threads and report it?):

Imagine you can ask Python process to get state at any time. First we always have list of requests that currently in progress. Also we can attach some marker points, like with statistics. And finally we can use a technique similar to one used in faulthandler to find out what the stack of main thread.

The key point is that there is a thread that can answer debugging requests, even when main thread does something CPU intensive, or just hangs for some reason.
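A rough approximation of this is possible in plain Python today: a side thread that keeps a table of in-progress requests and, when asked, grabs the main thread’s current stack via sys._current_frames() (a CPython-specific function) and traceback.format_stack. The DebugResponder class is hypothetical. Because it is pure Python, the debug thread needs the GIL to run, so it can answer while the main thread executes Python code but may stall during a long C call that never releases the GIL.

```python
from __future__ import annotations

import queue
import sys
import threading
import traceback


class DebugResponder:
    """Hypothetical helper: a side thread that answers 'what are you doing?' requests."""

    def __init__(self) -> None:
        self._in_progress: dict[int, str] = {}  # request id -> description
        self._lock = threading.Lock()
        self._requests: queue.Queue = queue.Queue()
        self._main_id = threading.main_thread().ident
        threading.Thread(target=self._serve, daemon=True).start()

    def start_request(self, req_id: int, description: str) -> None:
        with self._lock:
            self._in_progress[req_id] = description

    def finish_request(self, req_id: int) -> None:
        with self._lock:
            self._in_progress.pop(req_id, None)

    def snapshot(self, timeout: float = 1.0) -> dict:
        """Ask the debug thread for a report; may be called from any thread."""
        reply: queue.Queue = queue.Queue()
        self._requests.put(reply)
        return reply.get(timeout=timeout)

    def _serve(self) -> None:
        while True:
            reply = self._requests.get()
            # Frame the main thread is currently executing (CPython only).
            frame = sys._current_frames().get(self._main_id)
            with self._lock:
                requests = dict(self._in_progress)
            stack = traceback.format_stack(frame) if frame is not None else []
            reply.put({"in_progress": requests, "main_stack": stack})
```

In a real service, snapshot() would be triggered from outside, for example by a small admin socket, rather than by the busy main thread itself.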

And some things might be better solved by the OS:

Throttling
Even in Java and Go, where you can freely use threads there is often need for throttling number client connections. The described design allows to control number of requests in single place in the application, no matter which library is the real executor of the network request.
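The “single place” for throttling can be as small as one shared semaphore that every outbound request passes through. Here is a sketch using asyncio.Semaphore; the Throttle class and fake_request are hypothetical, and using asyncio instead of the article’s separate I/O thread is my substitution.

```python
from __future__ import annotations

import asyncio


class Throttle:
    """One shared limit on concurrent requests, whichever library performs them."""

    def __init__(self, max_concurrent: int) -> None:
        self._sem = asyncio.Semaphore(max_concurrent)
        self.active = 0
        self.peak = 0  # highest concurrency observed, for demonstration

    async def run(self, coro_fn, *args):
        async with self._sem:
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                return await coro_fn(*args)
            finally:
                self.active -= 1


async def fake_request(i: int) -> int:
    await asyncio.sleep(0.01)  # stand-in for a network call
    return i


async def main() -> tuple[list[int], int]:
    throttle = Throttle(max_concurrent=3)
    results = await asyncio.gather(*(throttle.run(fake_request, i) for i in range(10)))
    return list(results), throttle.peak
```

Ten requests are submitted at once, but no more than three are ever in flight, and gather still returns results in submission order.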

Since Python has a philosophy of using Unix wherever possible, perhaps this is better done using something like “ulimit”? Run the Python app as its own user and have the OS enforce a thread limit on that user.
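If you want to try the ulimit route from Python itself, the standard resource module exposes the same limit: RLIMIT_NPROC, which is what `ulimit -u` sets in bash. The sketch below lowers it for a child process only, via subprocess’s preexec_fn. It is Unix-only, it skips the “run as its own user” part, and on Linux the limit counts every thread belonging to the user, not just this program’s. It also caps threads rather than requests, so it only throttles requests if each request gets its own thread. The limit_processes helper and the value 512 are hypothetical.

```python
import resource
import subprocess
import sys


def limit_processes(max_procs: int):
    """Return a preexec_fn that lowers the RLIMIT_NPROC soft limit in the child."""
    def apply() -> None:
        soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
        # The soft limit may not exceed the hard limit unless the hard one is unlimited.
        new_soft = max_procs if hard == resource.RLIM_INFINITY else min(max_procs, hard)
        resource.setrlimit(resource.RLIMIT_NPROC, (new_soft, hard))
    return apply


# Launch a child interpreter under the lowered limit and have it report the limit.
child = subprocess.run(
    [sys.executable, "-c",
     "import resource; print(resource.getrlimit(resource.RLIMIT_NPROC)[0])"],
    preexec_fn=limit_processes(512),
    capture_output=True, text=True, check=True,
)
print(child.stdout.strip())
```

Note that the Python documentation warns preexec_fn is not safe to use in a parent process that has threads, so this belongs in a small launcher script.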
