August 26th, 2015
In Technology
Natural language processing (NLP) is a messy and difficult affair
(written by Lawrence Krubner; indented passages are often quotes). You can contact Lawrence at: lawrence@krubner.com, or follow me on Twitter.
Similar words are nearby vectors in a vector space. This is a powerful convention, since it lets us wipe away a lot of the noise and nuance in vocabulary. For example, let's use gensim to find a list of words similar to vacation using the Freebase skip-gram data:
from gensim.models import Word2Vec

# Load the pretrained Freebase skip-gram vectors (binary word2vec format)
fn = "freebase-vectors-skipgram1000-en.bin.gz"
model = Word2Vec.load_word2vec_format(fn, binary=True)

model.most_similar('vacation')
# [('trip', 0.7234684228897095),
#  ('honeymoon', 0.6447688341140747),
#  ('beach', 0.6249285936355591),
#  ('vacations', 0.5868890285491943),
#  ('wedding', 0.5541957020759583),
#  ('resort', 0.5231006145477295),
#  ('traveling', 0.5194448232650757),
#  ('vacation.', 0.5068142414093018),
#  ('vacationing', 0.5013546943664551)]

We've calculated the vectors most similar to the vector for vacation, and then looked up what words those vectors represent. As we read the list, we note that these words aren't just similar in vector space, but that they make sense intuitively too.
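The "similar words are nearby vectors" idea boils down to cosine similarity: the cosine of the angle between two embedding vectors, which is what most_similar scores. As a minimal sketch of the arithmetic, here it is on toy three-dimensional vectors (the values are made up for illustration, not taken from the Freebase model):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: near 1.0 means the vectors
    # point in almost the same direction, near 0.0 means unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy "embeddings" in 3 dimensions
vacation = np.array([0.9, 0.8, 0.1])
trip     = np.array([0.8, 0.9, 0.2])
spoon    = np.array([0.1, 0.0, 0.9])

print(cosine_similarity(vacation, trip))   # high: similar words
print(cosine_similarity(vacation, spoon))  # low: dissimilar words
```

A real model does exactly this, only in hundreds or thousands of dimensions, and ranks the whole vocabulary by that score.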
Post external references
1. http://multithreaded.stitchfix.com/blog/2015/03/11/word-is-worth-a-thousand-vectors/