Reddit-Flair-Detector
A web app built on Flask, Python's micro web framework, that detects the flair (category) of a post on the subreddit india using Natural Language Processing and Machine Learning. The app can be used here: Reddit Flair Detector.
The user enters the URL of the required post. The app takes the URL, extracts various features from the post (comments, authors, body, etc.), and tries to predict the flair by applying the finalized model.
1. Clone the repository: git clone https://github.com/chandan21gupta/Reddit-Flair-Detector
2. Create a virtual environment: virtualenv -p python3 env
3. Go inside the cloned directory and install the dependencies: pip install -r requirements.txt
4. Go inside the Web directory and start the server: python3 flask_app.py
The codebase can be found at the GitHub URL above. Simply clone it.
Data contains the MongoDB instance of the raw data, its CSV export, and the resulting data after cleaning and pre-processing. It also contains a script called graph.py that generates statistics for the data.
Finalized_Model contains the finalized ML algorithm and the combined data feature that gave the maximum accuracy during training and testing.
Web contains the Flask application deployed to the Heroku server.
Scripts contains the files used before deployment, that is, the code used for scraping Reddit posts and training the Machine Learning algorithms.
sources.txt lists the sources used for the entire project.
The data was collected using the praw library in Python. The code is located in Scripts under the name reddit_webScrapper.py. For comments, the top ten comments of each post were considered, along with their authors. A total of 100 posts were considered for data analysis. The data is stored in a MongoDB database.
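As a rough illustration of the collection step, here is a minimal sketch of pulling the fields mentioned above from a single post. It is not the code in reddit_webScrapper.py: the function names, the dictionary keys, the "[deleted]" placeholder, and the credential parameters are all assumptions, and the sketch simply takes the first ten top-level comments in whatever order praw returns them.

```python
from itertools import islice


def extract_features(submission, max_comments=10):
    """Collect title, body, url, flair, comments and authors from a
    praw Submission-like object (hypothetical helper, not the original script)."""
    # Drop "load more comments" placeholders so only real comments remain.
    submission.comments.replace_more(limit=0)
    top_comments = list(islice(submission.comments, max_comments))
    return {
        "title": submission.title,
        "body": submission.selftext,
        "url": submission.url,
        "flair": submission.link_flair_text,
        "comments": [c.body for c in top_comments],
        # Deleted accounts come back as None, so guard before reading the name.
        "authors": [c.author.name if c.author else "[deleted]" for c in top_comments],
    }


def fetch_submission(post_url, client_id, client_secret, user_agent):
    """Fetch one post by URL. The credentials are placeholders you must supply."""
    import praw  # imported lazily so extract_features works without praw installed

    reddit = praw.Reddit(
        client_id=client_id, client_secret=client_secret, user_agent=user_agent
    )
    return reddit.submission(url=post_url)
```

The dictionary returned by extract_features could then be inserted into MongoDB as one document per post.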
After collecting the data, I went through various articles on the internet about the first steps in analysing collected data and came across this wonderful article, which explained everything from data pre-processing to data analysis in Natural Language Processing. The data was cleaned using textcleaning.py in Scripts, and the result was saved in a CSV file (cleaned_dataset.csv).
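The exact cleaning rules live in textcleaning.py and are not reproduced here. As a sketch of what a typical cleaning pass looks like, assuming lowercasing, link removal, punctuation stripping, and stop-word filtering (the function name and the tiny stop-word list are illustrative only):

```python
import re

# A tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "on", "for"}


def clean_text(text):
    """Lowercase, strip URLs and punctuation, and drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # keep ASCII letters, digits, whitespace
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)
```

Note that this pattern also discards non-ASCII text, which may or may not be desirable for posts written in other scripts.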
After cleaning the data, various ML algorithms were trained with the testing and training datasets in the ratio 3:7.
First, standalone features such as comments, title, body, and url were tried. However, the accuracy was not up to the mark: the average remained around 50-55%. After that, features were combined on the basis of their standalone accuracies. The combination that gave the best accuracy (60-65%) was title, comments, url, and body, with Logistic Regression performing best (66%). Finally, since the data was sparse, the training-to-testing ratio was increased to 9:1 to improve the model.
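To make the feature combination and the two split ratios concrete, here is a small sketch. The helper names, the dictionary layout of a post, and the plain random shuffle are assumptions for illustration; the original scripts may combine and split the data differently.

```python
import random


def combine_features(post, fields=("title", "comments", "url", "body")):
    """Join the chosen text fields of one post into a single string."""
    parts = []
    for field in fields:
        value = post.get(field, "")
        # Comments are assumed to be stored as a list, so flatten them first.
        if isinstance(value, (list, tuple)):
            value = " ".join(value)
        parts.append(value)
    return " ".join(p for p in parts if p)


def split_dataset(rows, test_fraction=0.3, seed=0):
    """Shuffle rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

With 100 posts, test_fraction=0.3 gives the 7:3 split (70 training, 30 testing) and test_fraction=0.1 gives the 9:1 split (90 training, 10 testing).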
Five ML algorithms were used:
Naive Bayes
Linear Support Vector Machine
Logistic Regression
Random Forest
Multi Layer Perceptron
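The five algorithms above can be compared side by side with scikit-learn. The following is only a sketch: the use of TF-IDF features, the hyperparameters, and the function names are assumptions, since the README does not state how the text was vectorized or how the models were configured.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_models(seed=0):
    """Return one text-classification pipeline per algorithm listed above."""
    classifiers = {
        "Naive Bayes": MultinomialNB(),
        "Linear Support Vector Machine": LinearSVC(random_state=seed),
        "Logistic Regression": LogisticRegression(max_iter=1000, random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "Multi Layer Perceptron": MLPClassifier(
            hidden_layer_sizes=(32,), max_iter=500, random_state=seed
        ),
    }
    # Each pipeline vectorizes raw text with TF-IDF (an assumption) and then classifies.
    return {name: make_pipeline(TfidfVectorizer(), clf) for name, clf in classifiers.items()}


def compare_models(train_texts, train_labels, test_texts, test_labels):
    """Fit every pipeline and report its accuracy on the held-out texts."""
    scores = {}
    for name, model in build_models().items():
        model.fit(train_texts, train_labels)
        scores[name] = model.score(test_texts, test_labels)
    return scores
```

Calling compare_models with the combined text of each post and its flair label returns a dictionary mapping each algorithm name to its test accuracy.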
A Flask app was made with three routes: "/" for the home page, "/action_page" for displaying the predicted flair, and "/stats" for statistics.
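The route layout can be sketched as below. This is not the contents of flask_app.py: predict_flair is a hypothetical stand-in for loading the finalized model, and the HTML form, the form field name "url", and the use of POST for "/action_page" are assumptions.

```python
from flask import Flask, request

app = Flask(__name__)


def predict_flair(post_url):
    """Hypothetical stand-in for fetching the post and applying the saved model."""
    return "unknown"


@app.route("/")
def home():
    # Minimal form that submits the post URL to /action_page.
    return (
        '<form action="/action_page" method="post">'
        '<input name="url" placeholder="Reddit post URL">'
        '<button type="submit">Detect flair</button>'
        "</form>"
    )


@app.route("/action_page", methods=["POST"])
def action_page():
    post_url = request.form.get("url", "")
    if not post_url:
        return "Please provide a post URL.", 400
    return f"Predicted flair: {predict_flair(post_url)}"


@app.route("/stats")
def stats():
    # Placeholder for the page that shows statistics about the dataset.
    return "Statistics page"
```

Running the module's app object with Flask's development server (for example, app.run()) would serve all three routes locally.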
requirements.txt contains the list of all the dependencies.