cnn-text-classification-tf-Chinese-py3
dennybritz's original code supports python 3,but not support Chinese
indiejoseph's original code supports Chinese,but it does not support python3 and tensorflow 1.1
I mixed them up.
I dont know how it works.
But it actually works.
The highway has been removed because i dont know how to make it work on tf 1.1
My graphic card is GTX960 with 2GB memory,it would have two delay.One occurs when writting data before training.The other occurs when the first time evaluate.
If you have the same delay,please set the TDR Delay to 20.Or the operating system would kill it
And i add predict.py.It is used to pridict if the sentences are cantonese. You can load your own checkpoint to make your own classification. File_processor Added. The usage would be given in follow.
dennybritz 的代码 支持 python 3,但不支持中文,训练的准确率只有70%左右
indiejoseph 的代码 支持中文,但不能在python3,tensorflow1.1的平台上运行
于是我把他们的代码拼起来了
我不知道为什么
反正它能跑了
Highway 这个层我注释掉了并前后文做了一点修改,因为我不知道怎么样让他在tf 1.1上跑起来,我好菜啊
我的古董GTX960只有两个G的显存,所以在加载数据和第一次评估的时候会卡屏
如果你遇到同样的情况,请把TDR延迟调至20,否则卡两秒就被操作系统结束进程了
我加了一个pridict.py,用来区分这些句子是不是广东话 你可以调用自己的存储点来做自己的分类 加入了对文件进行批处理的脚本file_process.py 用法会在下面给出
特别鸣谢:睿睿老师和朱敬xua老师
The following are their original README(Mixed,of coures):
以下是他们的README(同样是组合起来了):
Sentiment classification forked from dennybritz/cnn-text-classification-tf, make the data helper supports Chinese language and modified the embedding from word-level to character-level, though that increased vocabulary size, and also i've implemented the Character-Aware Neural Language Models network structure which CNN + Highway network to improve the performance, this version can achieve an accuracy of 98% with the Chinese corpus
This code belongs to the "Implementing a CNN for Text Classification in Tensorflow" blog post.
It is slightly simplified implementation of Kim's Convolutional Neural Networks for Sentence Classification paper in Tensorflow.
Python 3
Tensorflow > 0.12
Numpy
python3.5
Tensorflow 1.1
Numpy
cuDNN 5.1
Print parameters:
./train.py --help
optional arguments: -h, --help show this help message and exit --embedding_dim EMBEDDING_DIM Dimensionality of character embedding (default: 128) --filter_sizes FILTER_SIZES Comma-separated filter sizes (default: '3,4,5') --num_filters NUM_FILTERS Number of filters per filter size (default: 128) --l2_reg_lambda L2_REG_LAMBDA L2 regularizaion lambda (default: 0.0) --dropout_keep_prob DROPOUT_KEEP_PROB Dropout keep probability (default: 0.5) --batch_size BATCH_SIZE Batch Size (default: 64) --num_epochs NUM_EPOCHS Number of training epochs (default: 100) --evaluate_every EVALUATE_EVERY Evaluate model on dev set after this many steps (default: 100) --checkpoint_every CHECKPOINT_EVERY Save model after this many steps (default: 100) --allow_soft_placement ALLOW_SOFT_PLACEMENT Allow device soft device placement --noallow_soft_placement --log_device_placement LOG_DEVICE_PLACEMENT Log placement of ops on devices --nolog_device_placement
Train:
./train.py
./eval.py --eval_train --checkpoint_dir="./runs/1459637919/checkpoints/"
Replace the checkpoint dir with the output from the training. To use your own data, change the eval.py
script to load your data.
import predict pridict( ( "sentence" , ) )
./file_process
It would read the ./text.txt and output dir is./output checkpoint dir could be edited in predict.py
还没有评论,说两句吧!
热门资源
seetafaceJNI
项目介绍 基于中科院seetaface2进行封装的JAVA...
spark-corenlp
This package wraps Stanford CoreNLP annotators ...
Keras-ResNeXt
Keras ResNeXt Implementation of ResNeXt models...
capsnet-with-caps...
CapsNet with capsule-wise convolution Project ...
inferno-boilerplate
This is a very basic boilerplate example for pe...
智能在线
400-630-6780
聆听.建议反馈
E-mail: support@tusaishared.com