pytorch-SkipGram


2020-02-05


A PyTorch implementation of the word2vec Skip-gram with Negative Sampling (SGNS) algorithm, following the paper "Distributed Representations of Words and Phrases and their Compositionality".
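For readers new to SGNS, the objective from the paper can be sketched in a few lines of PyTorch. This is a minimal illustration, not the repository's actual model: the class name `SkipGramNS`, its attribute names, and the random toy batch are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SkipGramNS(nn.Module):
    """Hypothetical minimal SGNS model; not the repository's actual class."""

    def __init__(self, vocabulary_size, embedding_size):
        super().__init__()
        # Separate tables for center ("input") and context ("output") words.
        self.in_embed = nn.Embedding(vocabulary_size, embedding_size)
        self.out_embed = nn.Embedding(vocabulary_size, embedding_size)

    def forward(self, center, context, negatives):
        # center: (B,), context: (B,), negatives: (B, K) word indices
        v = self.in_embed(center)          # (B, D)
        u_pos = self.out_embed(context)    # (B, D)
        u_neg = self.out_embed(negatives)  # (B, K, D)

        pos_score = (v * u_pos).sum(dim=1)                       # (B,)
        neg_score = torch.bmm(u_neg, v.unsqueeze(2)).squeeze(2)  # (B, K)

        # Maximize log sigma(u_pos . v) + sum_k log sigma(-u_neg_k . v),
        # i.e. minimize its negative, averaged over the batch.
        loss = -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_score).sum(dim=1))
        return loss.mean()


# Toy batch of random indices, for illustration only.
torch.manual_seed(0)
model = SkipGramNS(vocabulary_size=100, embedding_size=16)
center = torch.randint(0, 100, (8,))
context = torch.randint(0, 100, (8,))
negatives = torch.randint(0, 100, (8, 20))
loss = model(center, context, negatives)
loss.backward()
```

The two embedding tables correspond to the input and output vectors in the paper; which of them (or what combination) a given implementation exports as the final word vectors is a design choice.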

Dependencies

Python 3.6
PyTorch 0.4+

Usage

Run main.py.

Initialize the dataset and model.

# init dataset and model
word2vec = Word2Vec(data_path='text8',
                    vocabulary_size=50000,
                    embedding_size=300)

# the whole corpus as a list of word indices
print(word2vec.data[:10])

# word_count looks like [['word', word_count], ...]
# a word's position in this list is its index
print(word2vec.word_count[:10])

# index to word
print(word2vec.index2word[34])

# word to index
print(word2vec.word2index['hello'])
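To make the relationship between `data`, `word_count`, `index2word` and `word2index` concrete, here is a small sketch of how such structures could be built from a token list. The helper `build_dataset` and the reserved `'UNK'` entry at index 0 are assumptions for illustration; the repository may construct them differently.

```python
from collections import Counter


def build_dataset(words, vocabulary_size):
    """Map a token list to indices, keeping the most frequent words.

    Hypothetical helper: index 0 is reserved for an 'UNK' token that
    stands in for every out-of-vocabulary word.
    """
    word_count = [['UNK', 0]]
    word_count.extend(
        [w, c] for w, c in Counter(words).most_common(vocabulary_size - 1)
    )
    # A word's position in word_count is its index.
    index2word = [w for w, _ in word_count]
    word2index = {w: i for i, w in enumerate(index2word)}

    data = [word2index.get(w, 0) for w in words]
    word_count[0][1] = data.count(0)  # number of tokens mapped to UNK
    return data, word_count, index2word, word2index


words = 'the cat sat on the cat the end'.split()
data, word_count, index2word, word2index = build_dataset(words, vocabulary_size=3)
print(data)        # corpus as indices
print(word_count)  # [['UNK', n], ['the', 3], ['cat', 2]]
```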

Train the model and get the word vectors.

# train model
word2vec.train(train_steps=200000,
               skip_window=1,
               num_skips=2,
               num_neg=20,
               output_dir='out/run-1')

# save vector txt file
word2vec.save_vector_txt(path_dir='out/run-1')

# get vector list
vector = word2vec.get_list_vector()
print(vector[123])
print(vector[word2vec.word2index['hello']])

# get top k similar words
sim_list = word2vec.most_similar('one', top_k=8)
print(sim_list)

# load pre-trained model
word2vec.load_model('out/run-1/model_step200000.pt')
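The README does not define `skip_window` and `num_skips`. A common convention, assumed here and not confirmed for this repository, is that `skip_window` is the number of words considered on each side of the center word and `num_skips` is how many context words are drawn for each center word. Under that assumption, training pairs could be generated roughly like this (the function `skipgram_pairs` is hypothetical):

```python
import random


def skipgram_pairs(data, skip_window, num_skips, rng=random):
    """Yield (center, context) index pairs from a list of word indices.

    Assumed semantics: skip_window is the number of words considered on
    each side of the center word, and num_skips is how many context
    words are drawn (without replacement) for each center word.
    """
    assert num_skips <= 2 * skip_window
    for i in range(skip_window, len(data) - skip_window):
        window = [j for j in range(i - skip_window, i + skip_window + 1) if j != i]
        for j in rng.sample(window, num_skips):
            yield data[i], data[j]


data = [5, 12, 7, 3, 9]  # toy corpus of word indices
pairs = sorted(skipgram_pairs(data, skip_window=1, num_skips=2))
print(pairs)
```

With `skip_window=1` and `num_skips=2`, every center word is paired with both of its immediate neighbors, so the set of pairs does not depend on the random draw. Each positive pair would then be combined with `num_neg` sampled negative words during training.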

Evaluate

To evaluate the vectors, refer to the eval-word-vectors repository. For example:

eval/wordsim.py vector.txt eval/data/EN-MTurk-287.txt

eval/wordsim.py vector.txt eval/data/EN-MC-30.txt
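Word-similarity benchmarks of this kind are usually scored by ranking word pairs by the cosine similarity of their vectors and comparing that ranking with human ratings via Spearman's rank correlation. The sketch below illustrates that idea in plain Python, assuming this is what the script reports; it is not the code of `wordsim.py`, and the vectors and ratings are made up.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Toy vectors and human ratings, made up for illustration only.
vectors = {
    'cat': [1.0, 0.1], 'dog': [0.9, 0.2],
    'car': [0.1, 1.0], 'bus': [0.3, 0.9],
}
rated_pairs = [('cat', 'dog', 9.0), ('car', 'bus', 8.5),
               ('cat', 'car', 2.0), ('dog', 'bus', 3.0)]

model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in rated_pairs]
human_scores = [score for _, _, score in rated_pairs]
rho = spearman(model_scores, human_scores)
print(rho)
```

A rho close to 1 means the model orders the pairs much as human annotators do; values near 0 indicate no agreement in ranking.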
