
April 13th, 2017


If you enjoy this article, see the other most popular articles


Katharine Jarmul describes the bias in Google’s Word2Vec software

(Written by Lawrence Krubner; indented passages are often quotes.) You can contact Lawrence at lawrence@krubner.com, or follow him on Twitter.

I must warn you that parts of this post are disgusting, disturbing and awful. If you are having a rough day, feel free to save it for another time. If you are already sick of seeing hateful language, this is likely not a post to read at present. That said, I feel it is my duty as a former journalist to look at it, expose it, and hope to spark better conversations around how we handle both implicit and explicit bias and prejudice in our models.

To dive a bit deeper into how these biases play out, let's do some standard analogies. We all know the King-Queen comparison; how might that apply to other professions?

In: model.most_similar(positive= ['doctor', 'woman'], negative=['man'])  
Out: [('gynecologist', 0.7093892097473145), ('nurse', 0.647728681564331),...]  

So, Doctor - Man + Woman = Gynecologist or Nurse. Great! What else?

Also:

So, Computer Programmer – Man + Woman = Housewife. Or Graphic Designer. Because of course women only do design work (as if there were never great male designers or amazing female DBAs). Note that these results have varying degrees of similarity (shown in the second element of each tuple); the higher the number, the closer the vectors. That said, these are real responses from word2vec.
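To make the arithmetic behind these analogies concrete, here is a minimal sketch in plain Python. It assumes the common description of how an analogy query works: normalize each word vector, add the "positive" vectors, subtract the "negative" ones, and rank every other word by cosine similarity to the result. The `TOY_VECTORS` table and the `analogy` helper are hypothetical and invented for illustration; they are not the word2vec model or its API, and real embeddings have hundreds of dimensions learned from text.

```python
import math

# Hypothetical 3-dimensional embeddings, made up for this illustration only.
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}


def unit(vec):
    """Scale a vector to length 1 so that only its direction matters."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    return sum(x * y for x, y in zip(unit(a), unit(b)))


def analogy(vectors, positive, negative, topn=3):
    """Rank words by similarity to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    for word in positive:
        query = [q + x for q, x in zip(query, unit(vectors[word]))]
    for word in negative:
        query = [q - x for q, x in zip(query, unit(vectors[word]))]
    # The query words themselves are excluded from the candidates.
    exclude = set(positive) | set(negative)
    scored = [
        (word, cosine(query, vec))
        for word, vec in vectors.items()
        if word not in exclude
    ]
    # Highest score first, as (word, similarity) tuples.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


results = analogy(TOY_VECTORS, positive=["king", "woman"], negative=["man"])
for word, score in results:
    print(word, round(score, 3))
```

With these toy numbers, "queen" ranks above "apple". The point of the sketch is that the ranking is purely geometric: whatever associations the training text encodes, including prejudiced ones, come back out as the nearest vectors.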

….

….

There were many more offensive phrases I found, many of which I didn’t save or write down, as I could really only stomach 5 minutes of research at a time before I needed a mental and spiritual break. Here is a summary of some I remembered and was able to find again:

mexicans => illegals, beaners

asians => gooks, wimps

jews => kikes

asian + woman => teenage girl, sucking dick

gay + man => “horribly, horribly deranged”

transsexual + man => convicted rapist

Post external references

http://kjamistan.com/embedded-isms-in-vector-based-natural-language-processing/
