
pytorch-SkipGram

2020-02-05

A PyTorch implementation of the word2vec Skip-gram Negative Sampling (SGNS) algorithm, following the paper "Distributed Representations of Words and Phrases and their Compositionality".
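For an input word w_I and an observed context word w_O, the paper's SGNS objective is log σ(v'_wO · v_wI) + Σ_{i=1..k} E_{w_i ~ P_n(w)}[log σ(-v'_wi · v_wI)], where v are input ("centre") embeddings, v' are output embeddings, and the noise distribution P_n(w) is the unigram distribution raised to the 3/4 power. The following pure-Python sketch evaluates the negated objective (the loss) for a single pair and builds the noise distribution. It is an illustration of the formula, not this repository's PyTorch code; sgns_loss, noise_distribution and the toy counts are hypothetical.

```python
import math
import random


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def noise_distribution(counts, power=0.75):
    """Unigram counts raised to the 3/4 power, renormalised to sum to 1."""
    weights = [c ** power for c in counts]
    total = sum(weights)
    return [w / total for w in weights]


def sgns_loss(v_in, v_out, v_negs):
    """Negated SGNS objective for one (input word, context word) pair.

    v_in:   input (centre) embedding of the input word
    v_out:  output embedding of the observed context word
    v_negs: output embeddings of the k sampled negative words
    """
    # Positive term: push the observed context word's score up.
    loss = -log_sigmoid(dot(v_out, v_in))
    # Negative terms: push each sampled noise word's score down.
    for v_neg in v_negs:
        loss -= log_sigmoid(-dot(v_neg, v_in))
    return loss


if __name__ == "__main__":
    counts = [50, 30, 15, 5]  # toy unigram counts (made up)
    probs = noise_distribution(counts)
    rng = random.Random(0)
    # Draw k = 5 negative word ids from P_n(w).
    neg_ids = rng.choices(range(len(counts)), weights=probs, k=5)
    print("negative sample ids:", neg_ids)
    print("loss:", sgns_loss([0.1, 0.2], [0.3, -0.1], [[0.05, 0.0]] * 5))
```

A batched PyTorch version would typically compute the same terms with torch.nn.functional.logsigmoid over dot products of embedding rows, but how this repository arranges its embeddings is not stated here.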

Dependency

  • Python 3.6

  • PyTorch 0.4+

Usage

Run main.py.

Initialize the dataset and model.

# initialize the dataset and model
word2vec = Word2Vec(data_path='text8',
                    vocabulary_size=50000,
                    embedding_size=300)
# data: the whole corpus as a list of word indices
print(word2vec.data[:10])
# word_count looks like [['word', count], ...];
# a word's position in this list is its index
print(word2vec.word_count[:10])
# index -> word
print(word2vec.index2word[34])
# word -> index
print(word2vec.word2index['hello'])
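The attributes printed above (data, word_count, index2word, word2index) match the layout used in TensorFlow's word2vec_basic tutorial: the vocabulary_size - 1 most frequent words are kept, and every other word maps to an 'UNK' entry at index 0. Assuming this repository follows the same convention (the source does not say so), the construction can be sketched as below; build_vocab is a hypothetical helper, not part of the repository.

```python
from collections import Counter


def build_vocab(words, vocabulary_size):
    """Map a token list to indices, keeping the most frequent words.

    Returns (data, word_count, word2index, index2word), where
    word_count is [['UNK', n_unknown], ['word', count], ...] and a
    word's position in word_count is its index.
    """
    word_count = [['UNK', -1]]
    # Keep vocabulary_size - 1 real words; slot 0 is reserved for UNK.
    word_count.extend([w, c] for w, c in Counter(words).most_common(vocabulary_size - 1))
    word2index = {w: i for i, (w, _) in enumerate(word_count)}
    # Out-of-vocabulary tokens become index 0 (UNK).
    data = [word2index.get(w, 0) for w in words]
    word_count[0][1] = data.count(0)
    index2word = [w for w, _ in word_count]
    return data, word_count, word2index, index2word


if __name__ == "__main__":
    tokens = "the cat sat on the mat the cat".split()
    data, word_count, word2index, index2word = build_vocab(tokens, vocabulary_size=3)
    print(data)
    print(word_count)
```

For text8, the token list would simply be the file's single line split on whitespace.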

Train the model and get the word vectors.

# train the model
word2vec.train(train_steps=200000,
               skip_window=1,
               num_skips=2,
               num_neg=20,
               output_dir='out/run-1')
# save the vectors as a txt file
word2vec.save_vector_txt(path_dir='out/run-1')
# get the list of vectors
vector = word2vec.get_list_vector()
print(vector[123])
print(vector[word2vec.word2index['hello']])
# get the top-k most similar words
sim_list = word2vec.most_similar('one', top_k=8)
print(sim_list)
# load a pre-trained model
word2vec.load_model('out/run-1/model_step200000.pt')
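In the train call, skip_window presumably sets how many words on each side of the centre word count as context, and num_skips how many (centre, context) pairs are drawn per centre word; these names come from TensorFlow's word2vec_basic tutorial, where num_skips distinct context positions are sampled from the 2 * skip_window neighbours. The sketch below assumes that scheme; skipgram_pairs is a hypothetical helper, the repository's actual batching may differ, and negative sampling (num_neg) is not modelled here.

```python
import random


def skipgram_pairs(data, skip_window=1, num_skips=2, seed=None):
    """Generate (centre, context) word-index pairs from a corpus.

    For each position with a full window on both sides, sample
    num_skips distinct context positions among the 2 * skip_window
    neighbours. (Assumed scheme, after TensorFlow's word2vec_basic.)
    """
    if num_skips > 2 * skip_window:
        raise ValueError("num_skips must not exceed 2 * skip_window")
    rng = random.Random(seed)
    pairs = []
    for i in range(skip_window, len(data) - skip_window):
        # All window positions except the centre itself.
        neighbours = [j for j in range(i - skip_window, i + skip_window + 1) if j != i]
        for j in rng.sample(neighbours, num_skips):
            pairs.append((data[i], data[j]))
    return pairs


if __name__ == "__main__":
    corpus = [5, 12, 7, 3, 9, 1]  # toy word indices
    print(skipgram_pairs(corpus, skip_window=1, num_skips=2, seed=0))
```

With skip_window=1 and num_skips=2, as in the call above, every neighbour in the window is used, so each centre word yields exactly its left and right neighbour.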

Evaluate

Use the scripts from the eval-word-vectors repository, for example:

eval/wordsim.py vector.txt eval/data/EN-MTurk-287.txt
eval/wordsim.py vector.txt eval/data/EN-MC-30.txt
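Word-similarity benchmarks such as EN-MC-30 and EN-MTurk-287 list word pairs with human similarity ratings. The usual score is Spearman's rank correlation between those ratings and the cosine similarity of the two word vectors, with pairs containing out-of-vocabulary words skipped. The following pure-Python sketch illustrates that computation; it is not the code of eval/wordsim.py, and wordsim_score and most_similar are hypothetical helpers. Cosine ranking is also a plausible basis for the most_similar call in the usage example, though the source does not confirm it.

```python
import math


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of the ranks.

    Assumes each input has at least two distinct values.
    """
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def wordsim_score(vectors, pairs):
    """vectors: {word: [float, ...]}; pairs: [(w1, w2, human_score), ...]."""
    model, human = [], []
    for w1, w2, gold in pairs:
        if w1 in vectors and w2 in vectors:  # skip out-of-vocabulary pairs
            model.append(cosine(vectors[w1], vectors[w2]))
            human.append(gold)
    return spearman(model, human)


def most_similar(vectors, word, top_k=8):
    """Return the top_k (word, cosine) pairs closest to `word`."""
    query = vectors[word]
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w != word]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:top_k]
```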

