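The Familia open-source project includes a document topic inference tool, a semantic matching tool, and three topic models trained on industrial-scale corpora: Latent Dirichlet
Allocation (LDA), SentenceLDA, and Topical Word Embedding (TWE).
It lets users apply these models off the shelf for research and applications in scenarios such as text classification, text clustering, and personalized recommendation. Because topic models are costly to train and open-source topic model resources are limited, we will gradually release topic models for multiple vertical domains trained on industrial-scale corpora, along with typical ways these models are applied in industry, to support both research on and deployment of topic modeling.
Please submit questions and bug reports to GitHub Issues,
or send inquiries by email to { familia } at baidu.com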
Docker
docker run -d \
  --name familia \
  -e MODEL_NAME=news \
  -p 5000:5000 \
  orctom/familia
MODEL_NAME can be one of news/novel/webpage/weibo
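When the container is started from a script rather than typed by hand, the command above can be assembled as an argument list. The following is a minimal sketch, not part of Familia itself: the helper name `docker_run_args` and its parameters are hypothetical, and it only assumes the image name, the `MODEL_NAME` variable, and container port 5000 shown above.

```python
import shlex


def docker_run_args(model_name, host_port=5000, container_name="familia"):
    """Build the argv list for the `docker run` command shown above.

    Hypothetical helper: only the image name, the MODEL_NAME variable and
    container port 5000 come from this README.
    """
    return [
        "docker", "run", "-d",
        "--name", container_name,
        "-e", f"MODEL_NAME={model_name}",
        "-p", f"{host_port}:5000",  # host port -> container port 5000
        "orctom/familia",
    ]


# Render the command as a single shell-quoted string for inspection.
command_line = shlex.join(docker_run_args("news"))
```

The list can be passed to `subprocess.run(..., check=True)` directly, which avoids shell quoting issues when the model name comes from user input.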
API
http://localhost:5000/swagger/
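Because `docker run -d` returns before the service is necessarily ready, it can help to poll the Swagger page before sending requests. This is a rough sketch under stated assumptions: the function names are hypothetical, the only URL used is the one listed above, and no specific Familia endpoint is assumed beyond it.

```python
import time
import urllib.request


def swagger_url(host="localhost", port=5000):
    """Return the Swagger UI address given in this README."""
    return f"http://{host}:{port}/swagger/"


def is_reachable(url, timeout=2.0):
    """Return True if the URL answers with a 2xx/3xx status, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # URLError, HTTPError and connection errors all derive from OSError.
        return False


def wait_until_ready(url, attempts=30, delay=2.0):
    """Poll the URL up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if is_reachable(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A typical use would be `wait_until_ready(swagger_url())` right after starting the container; the attempt count and delay are arbitrary defaults and may need tuning.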
Citation
The following article describes the Familia project and industrial
cases powered by topic modeling. It bundles and translates the Chinese
documentation of the website. We recommend citing this article by
default.
@article{jiang2018familia,
author = {Di Jiang and Yuanfeng Song and Rongzhong Lian and Siqi Bao and Jinhua Peng and Huang He and Hua Wu},
title = {{Familia: A Configurable Topic Modeling Framework for Industrial Text Engineering}},
journal = {arXiv preprint arXiv:1808.03733},
year = {2018}
}