WebSockets are a new World Wide Web

(Written by Lawrence Krubner; indented passages are often quotes.) You can contact Lawrence at lawrence@krubner.com, or follow him on Twitter.

In their book RESTful Web Services, Leonard Richardson and Sam Ruby argue that the web is the HTTP protocol. If you are using HTTP, then you are using the web, and if you are not using HTTP, then you are not using the web.

So what should we call the network dominated by WebSockets? It is not HTTP. Does that mean we are no longer talking about the web?

HTTP was invented in 1989 and has done well for the last 22 years. But we are entering a new era. It seems likely that HTTP will face a slow eclipse as an outdated protocol, and that the future belongs to WebSockets. What kind of network will be built with them?

Some critical voices:

The WebSocket protocol design was hijacked by architecture astronauts who decided that it must have all of these extra features added, instead of remaining a simple, easily implementable and understandable protocol. The original WebSocket protocol was a simple stream of delimited messages, with the only complexity being in the handshake that was necessary to ensure that JavaScript apps couldn’t send arbitrary data to arbitrary ports without permission.

The problem is that the original handshake wasn’t good enough (there were still security vulnerabilities despite the handshake), and when Ian Hickson decided to hand over control to the IETF, the architecture astronauts took over, adding complex framing with six different frame types, subprotocols, extensions, versions, complex bit twiddling required to parse frame headers, fragmentation of messages into smaller frames (which is what this article is complaining about), control frames interleaved with fragmented messages, numeric status codes and textual close reason strings that “MUST NOT” be shown to the user, masking of data by xor’ing with a random value that changes for each frame, but only for one direction (client->server), a two-way closing handshake on top of the existing TCP mechanisms for closing the connection, pings to test the connection for liveness, and so on. There are six registries defined for IANA to keep track of (http://www.iana.org/assignments/websocket/websocket.xml): extensions, subprotocols, version numbers, close codes, opcodes, and framing bits.
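
To make the "bit twiddling" and masking complaints concrete, here is a minimal sketch of parsing one frame using the header layout from RFC 6455. The names `Frame` and `parse_frame` are my own for illustration and do not come from any library, and the sketch assumes the whole frame is already in memory.

```python
import struct
from typing import NamedTuple


class Frame(NamedTuple):
    fin: bool        # True if this is the last frame of a message
    rsv: int         # the three reserved bits shared by all extensions
    opcode: int      # 0x0 continuation, 0x1 text, 0x2 binary, 0x8 close, 0x9 ping, 0xA pong
    masked: bool     # True for client->server frames
    payload: bytes   # payload with the mask already removed


def parse_frame(data: bytes) -> Frame:
    """Parse a single complete frame (hypothetical helper, not a library API)."""
    if len(data) < 2:
        raise ValueError("need at least 2 header bytes")
    b0, b1 = data[0], data[1]
    fin = bool(b0 & 0x80)
    rsv = (b0 >> 4) & 0x07
    opcode = b0 & 0x0F
    masked = bool(b1 & 0x80)
    length = b1 & 0x7F
    offset = 2
    # 126 and 127 are escape values for 16-bit and 64-bit extended lengths.
    if length == 126:
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8
    key = b""
    if masked:
        key = data[offset:offset + 4]
        offset += 4
    payload = data[offset:offset + length]
    if len(payload) != length or (masked and len(key) != 4):
        raise ValueError("truncated frame")
    if masked:
        # Masking is a plain XOR with the 4-byte key, repeated over the payload.
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return Frame(fin, rsv, opcode, masked, payload)
```

Feeding it the masked "Hello" example bytes from the RFC, `parse_frame(bytes.fromhex("818537fa213d7f9f4d5158"))`, gives a final text frame whose payload is `b"Hello"`. A real implementation would also have to handle partial reads, control frames arriving between fragments, and the closing handshake, which is where most of the complexity described above lives.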

And despite all of this over-engineering and attempt at extensibility, all extensions must know about each other, because there is no standard method for delimiting different extensions’ data (or even specifying how much data an extension uses), and there are three header bits and 10 frame types that all extensions must share. And I don’t really know why there’s a need for subprotocols on top of the ability to just encode that information in the URL.

It’s kind of sad how what could have been a relatively simple and easy to implement protocol has been taken over by architecture astronauts. Yes, a few of these features are actually required to securely deploy websockets (the handshake and masking). Most of them are people making up features that would be nice in theory, instead of implementing something simple that works.

And some praise:

I think his analysis is flawed. WebSocket is a message based protocol that does not specify a maximum message size in the RFC. This does not make it a streaming protocol until an implementation decides to deliver incomplete messages to the end application. Some implementations have done this, many (including all browsers) have not and will not.

Time and time again it has been demonstrated that we are bad at choosing a maximum allowed value for all applications and all future considerations (see: ethernet frame sizes, IP address lengths, operating system address spaces, file system block sizes/counts, etc).

In some cases (many of those previously listed) there were hardware, cost, or technical concerns that led to nailing down a number in an RFC. For WebSocket there is no clear benefit to forever encoding a specific numeric maximum message size. It is a high enough level protocol that there is no technical or cost benefit to make message sizes limited by anything other than individual application needs.
As such, the WebSocket RFC leaves maximum message size implementation defined, and specifically says that an implementation SHOULD implement a reasonable maximum message size for its purpose. A chat application that knows it will only be moving small text messages can set its maximum message threshold small to improve buffer performance and catch invalid messages sooner. An application that finds a business case for sending a large file in one large message can set itself up accordingly. Generic WebSocket parsers should expose a method of setting the maximum message size the application wishes to receive.
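
As a rough sketch of what "expose a method of setting the maximum message size" might look like, here is a small reassembler that enforces an application-chosen limit across fragments. `MessageAssembler` and `MessageTooBig` are hypothetical names of mine, not part of any real WebSocket library; the only detail taken from the RFC is that close code 1009 means "message too big".

```python
from typing import List, Optional


class MessageTooBig(Exception):
    """Raised when a message exceeds the application's limit."""
    close_code = 1009  # RFC 6455 close code for "message too big"


class MessageAssembler:
    """Joins fragmented frames into messages, up to max_message_size bytes."""

    def __init__(self, max_message_size: int) -> None:
        self.max_message_size = max_message_size
        self._parts: List[bytes] = []
        self._size = 0

    def feed(self, fin: bool, payload: bytes) -> Optional[bytes]:
        """Add one data frame; return the full message once fin is True."""
        self._size += len(payload)
        if self._size > self.max_message_size:
            # Reject early, before buffering the oversized fragment.
            self._parts = []
            self._size = 0
            raise MessageTooBig(f"limit is {self.max_message_size} bytes")
        self._parts.append(payload)
        if not fin:
            return None
        message = b"".join(self._parts)
        self._parts = []
        self._size = 0
        return message
```

A chat server might construct `MessageAssembler(max_message_size=4096)`, while a file-transfer application could pass a far larger value. The limit is checked as each fragment arrives, so an oversized message is rejected without ever being buffered in full.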

I definitely agree that not requiring implementations to return their maximum message size along with the “Message too big” error will make some sorts of interoperability more difficult. However, it also prevents exposing implementation security details and simplifies the core spec (the author has already complained that the spec is too complicated already). It is relatively simple for an application to negotiate a maximum message size privately if necessary and the WebSocket extension mechanism allows a method for standardizing a way of doing so if this turns out to be a serious issue in the future.

Post external references

  1. http://www.amazon.com/Restful-Web-Services-Leonard-Richardson/dp/0596529260
  2. http://news.ycombinator.com/item?id=3377406