October 10th, 2015

If you enjoy this article, see the other most popular articles

The TCP checksum is weak, and the Ethernet checksum will accept corrupt TCP that passed the TCP checksum

(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: lawrence@krubner.com, or follow me on Twitter.

At Twitter, a team had an unusual failure where corrupt data ended up in memcache. The root cause appears to have been a switch that was corrupting packets. Most packets were being dropped and the throughput was much lower than normal, but some were still making it through. The hypothesis is that occasionally the corrupt packets had valid TCP and Ethernet checksums. One “lucky” packet stored corrupt data in memcache. Even after the switch was replaced, the errors continued until the cache was cleared.

I was very excited to hear about this error, because it is a real-world example of something I wrote about back in July 2008: The TCP checksum is weak. However, the Ethernet CRC is strong, so how could a corrupt packet pass both checks? The answer is that the Ethernet CRC is recalculated by switches. As long as the corrupt packet has the same TCP checksum, the hardware will happily recalculate a new, valid Ethernet CRC. As a result, if you are creating a new network protocol, please append a CRC (I suggest CRC32C, implemented in hardware on recent Intel, AMD, and ARM CPUs). An alternative is to use an encryption protocol (e.g. TLS), since they include cryptographic hashes (e.g. see a description of packet corruption in OSPF being fixed with TLS). The rest of this article describes the details about how this is possible, mostly so I don’t forget them.
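As a rough illustration of the "append a CRC" advice, here is a minimal Python sketch. It assumes a simple framing in which the sender appends a 4-byte CRC32C to each message and the receiver recomputes it. The function names (crc32c, append_crc, check_crc) and the framing itself are hypothetical, chosen only for this example, and the bit-at-a-time loop is written for clarity, not speed. A real implementation would more likely use the hardware instructions mentioned above.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected form


def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (slow but simple; hypothetical helper for this sketch)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; XOR in the polynomial if the low bit was set.
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def append_crc(message: bytes) -> bytes:
    """Sender side: return the message followed by its 4-byte CRC32C."""
    return message + crc32c(message).to_bytes(4, "big")


def check_crc(framed: bytes) -> bool:
    """Receiver side: recompute the CRC32C and compare with the trailer."""
    message, trailer = framed[:-4], framed[-4:]
    return crc32c(message).to_bytes(4, "big") == trailer


if __name__ == "__main__":
    framed = append_crc(b"set key value")
    print(check_crc(framed))  # the intact message verifies

    # Flip one bit in transit, as a faulty switch might.
    corrupted = bytes([framed[0] ^ 0x01]) + framed[1:]
    print(check_crc(corrupted))  # the corrupted message is rejected
```

Because this check is computed by the endpoints over the application data, a switch that recalculates the Ethernet CRC along the way cannot make a corrupted message look valid again.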

Properties of the TCP checksum

The TCP checksum is two bytes long, and can detect any burst error of 15 bits, and most burst errors of 16 bits (excluding switching 0x0000 and 0xffff). This means that to keep the same checksum, a packet must be corrupted in at least two locations, at least 2 bytes apart. If the chance is purely random, we should expect approximately 1 in 2^16 (approximately 0.0015%) of corrupt packets to not be detected. This seems small, but on one Gigabit Ethernet connection, that could be as many as 15 packets per second. For details about how to compute the TCP checksum and its error properties, see RFC 1071.
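To make the "two locations" point concrete, here is a small Python sketch of the 16-bit one's-complement sum in the style of RFC 1071. It is a simplification: a real TCP checksum also covers the TCP header and a pseudo-header, which are left out here. The function name internet_checksum is chosen for this example, and the sample bytes are the ones used in RFC 1071's numerical example.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


original = bytes.fromhex("0001f203f4f5f6f7")

# One corrupted word: the checksum changes, so the error is detected.
one_error = bytes.fromhex("0002f203f4f5f6f7")

# Two corrupted words, 4 bytes apart: +1 in the first word and -1 in the
# third cancel out in the sum, so the checksum is unchanged.
two_errors = bytes.fromhex("0002f203f4f4f6f7")

print(hex(internet_checksum(original)))    # 0x220d
print(hex(internet_checksum(one_error)))   # 0x220c
print(hex(internet_checksum(two_errors)))  # 0x220d
```

Since the checksum is just a sum of 16-bit words, any pair of changes that cancel each other, or any reordering of the words, goes unnoticed.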

Post external references

1
http://www.evanjones.ca/tcp-and-ethernet-checksums-fail.html

Source
