---
tags:
- word2vec
language: hsb
license: gpl-3.0
---
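
## Description
Polyglot word embeddings trained by Al-Rfou et al. (2013).

## How to use?

The snippet below downloads the pickled embeddings from the Hugging Face Hub and finds the word closest to a query word by cosine similarity. The pickle holds two aligned sequences: the vocabulary (`words`) and the embedding vectors (`embeddings`).
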
```python
import pickle
from numpy import dot
from numpy.linalg import norm
from huggingface_hub import hf_hub_download

# Download the pickled (words, embeddings) pair from the Hugging Face Hub.
path = hf_hub_download(repo_id="Word2vec/polyglot_words_embeddings_en", filename="words_embeddings_en.pkl")
with open(path, "rb") as f:
    words, embeddings = pickle.load(f, encoding="latin1")

word = "Irish"
idx = words.index(word)  # look up the query word once, not on every iteration
a = embeddings[idx]

# Cosine similarity between the query word and every other word.
most_similar = []
for i in range(len(embeddings)):
    if i != idx:
        b = embeddings[i]
        cos_sim = dot(a, b) / (norm(a) * norm(b))
        most_similar.append(cos_sim)
    else:
        most_similar.append(0)  # exclude the query word itself

# Print the nearest neighbour of the query word.
print(words[most_similar.index(max(most_similar))])
```
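
The loop above returns only the single closest word. As a minimal sketch, assuming the embeddings can be converted to a 2-D NumPy array with one row per word, the same cosine-similarity search can be vectorized to return the top `k` neighbours. The helper name `top_k_similar` and the toy data are hypothetical and only serve to illustrate the idea; they are not part of the Polyglot release.

```python
import numpy as np

def top_k_similar(words, embeddings, query, k=5):
    """Return the k words closest to `query` by cosine similarity (hypothetical helper)."""
    vectors = np.asarray(embeddings, dtype=np.float64)
    idx = list(words).index(query)
    # Normalise every row to unit length so a dot product equals cosine similarity.
    norms = np.linalg.norm(vectors, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero vectors
    unit = vectors / norms[:, None]
    sims = unit @ unit[idx]
    sims[idx] = -np.inf  # exclude the query word itself
    k = min(k, len(sims) - 1)
    best = np.argsort(-sims)[:k]
    return [(words[i], float(sims[i])) for i in best]

# Toy data standing in for the downloaded `words` and `embeddings`.
toy_words = ["north", "south", "apple", "pear"]
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(top_k_similar(toy_words, toy_embeddings, "north", k=2))
```

With the real data loaded as shown earlier, the call would be `top_k_similar(words, embeddings, "Irish")`. Normalising the rows once and using a single matrix-vector product replaces the per-word Python loop, which tends to matter for large vocabularies.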

## Citation

```
@InProceedings{polyglot:2013:ACL-CoNLL,
  author    = {Al-Rfou, Rami and Perozzi, Bryan and Skiena, Steven},
  title     = {Polyglot: Distributed Word Representations for Multilingual NLP},
  booktitle = {Proceedings of the Seventeenth Conference on Computational Natural Language Learning},
  month     = {August},
  year      = {2013},
  address   = {Sofia, Bulgaria},
  publisher = {Association for Computational Linguistics},
  pages     = {183--192},
  url       = {http://www.aclweb.org/anthology/W13-3520}
}
```