Netflix monitors its micro-services

(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: lawrence@krubner.com, or follow me on Twitter.

Interesting:

The ability to decompose where time is spent both within and across the fleet of microservices can be a challenge given the number of dependencies. Such information can be leveraged to identify the root cause of performance degradation or identify areas ripe for optimization within a given microservice. Our Mogul utility consumes data from Netflix’s recently open-sourced Atlas monitoring framework, applies correlation between metrics, and selects those most likely to be responsible for changes in demand on a given microservice. The different resources evaluated include:

System resource demand (CPU, network, disk)
JVM pressure (thread contention, garbage collection)
Service IPC calls
Persistency-related calls (EVCache, Cassandra)
Errors and timeouts

It is not uncommon for a mogul query to pull thousands of metrics, subsequently reducing to tens of metrics through correlation with system demand. In the following example, we were able to quickly identify which downstream service was causing performance issues for the service under study. This particular microservice has over 40,000 metrics. Mogul reduced this internally to just over 2000 metrics via pattern matching, then correlated the top 4-6 interesting metrics grouped into classifications.

Post external references

1
http://techblog.netflix.com/2015/02/a-microscope-on-microservices.html

Source

RECENT COMMENTS

January 11, 2022 4:33 am

From Essie on Docker is the dangerous gamble which we will regret

"Once in 1990s, there are popular high performance solution called HPC software, many commercial softwares are ..."

December 17, 2021 7:32 pm

From John Carston on The ethics of being a high level tech consultant (a Fractional CTO)

"It helped when you mentioned that it is important to have a real connection with your consumer. My cousin ment..."

September 2, 2021 7:47 pm

From Mojavedfo on Where PHP regex fails

"55 thousand Greek, 30 thousand Armenian..."

August 7, 2021 9:53 am

From Colin Steele on The ethics of being a high level tech consultant (a Fractional CTO)

"Fantastic essay. Thoughtful, well-constructed, timely and applicable. I think every part-timer in the tech f..."

August 5, 2021 3:02 pm

From Rachiovwn on Where PHP regex fails

"consists of the book itself..."

October 19, 2019 3:08 am

From Bernd Schatz on Object Oriented Programming is an expensive disaster which must end

"I really enjoyed your article. But i can't understand the example with the interface. The example is reall..."

October 4, 2019 8:44 pm

From lawrence on My final post regarding the flaws of Docker / Kubernetes and their eco-system

"Gorgi Kosev, I am working to clean up some of my Packer/Terraform code so I can release it on Github, and then..."

October 4, 2019 5:14 pm

From Gorgi Kosev on My final post regarding the flaws of Docker / Kubernetes and their eco-system

"> Packer, sometimes with some Ansible. The combination of Packer and Terraform typically gives me what I ne..."

October 4, 2019 12:40 pm

From lawrence on My final post regarding the flaws of Docker / Kubernetes and their eco-system

"Gorgi Kosev, about this: "I would love if you could point out which VM based system makes it simpler and..."

October 4, 2019 7:31 am

From Gorgi Kosev on My final post regarding the flaws of Docker / Kubernetes and their eco-system

"I won't list anything concrete that you missed, because that will just give you ammunition to build the next a..."

October 4, 2019 1:39 am

From lawrence on My final post regarding the flaws of Docker / Kubernetes and their eco-system

"Gorgi Kosev, also, I don't think you understand what a "straw man argument" is. This is a definition from Wiki..."

Netflix monitors its micro-services

Post external references

RECENT COMMENTS

NO COMMENTS

Leave a Reply Cancel reply