simhash-java

2019-12-30

A simple implementation of the simhash algorithm in Java.

Features:

  1. Compute the simhash of a string.

  2. Compute the similarity between all the strings by building a smart index, so that it can handle big data.
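To make the two features concrete, here is a minimal sketch of both steps. It is not the project's own code: the class name SimHashSketch, the FNV-1a token hash, the whitespace tokenizer, and the 4 x 16-bit band index are all assumptions chosen for illustration. The band index relies on the pigeonhole principle: if two 64-bit fingerprints differ in at most 3 bits, at least one of their four 16-bit bands must be identical, so only documents sharing a band need a full comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class SimHashSketch {
    // 64-bit FNV-1a hash of one token (assumption: the project may use another hash).
    static long hash64(String token) {
        long h = 0xcbf29ce484222325L;
        for (byte b : token.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Simhash with unit token weights; tokens are split on whitespace (assumption).
    static long simhash(String doc) {
        int[] weights = new int[64];
        for (String token : doc.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            long h = hash64(token);
            for (int i = 0; i < 64; i++) {
                weights[i] += ((h >>> i) & 1L) == 1L ? 1 : -1;
            }
        }
        long fingerprint = 0L;
        for (int i = 0; i < 64; i++) {
            if (weights[i] > 0) fingerprint |= (1L << i);
        }
        return fingerprint;
    }

    // Number of bit positions in which the two fingerprints differ.
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    // Bucket key: band number (0..3) in the high bits, that band's 16 bits in the low bits.
    static long bandKey(long fingerprint, int band) {
        return ((long) band << 16) | ((fingerprint >>> (16 * band)) & 0xFFFFL);
    }

    // Returns index pairs {i, j} with i < j whose Hamming distance is <= maxDist.
    // With 4 bands the pigeonhole argument only holds for maxDist <= 3.
    static List<int[]> nearPairs(long[] prints, int maxDist) {
        if (maxDist < 0 || maxDist > 3) {
            throw new IllegalArgumentException("maxDist must be between 0 and 3");
        }
        Map<Long, List<Integer>> buckets = new HashMap<>();
        List<int[]> pairs = new ArrayList<>();
        for (int j = 0; j < prints.length; j++) {
            TreeSet<Integer> candidates = new TreeSet<>();
            for (int band = 0; band < 4; band++) {
                List<Integer> bucket =
                        buckets.computeIfAbsent(bandKey(prints[j], band), k -> new ArrayList<>());
                candidates.addAll(bucket); // earlier documents sharing this band
                bucket.add(j);
            }
            for (int i : candidates) {
                if (hammingDistance(prints[i], prints[j]) <= maxDist) {
                    pairs.add(new int[] {i, j});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] docs = {"the quick brown fox", "the quick brown dog", "something else entirely"};
        long[] prints = new long[docs.length];
        for (int i = 0; i < docs.length; i++) prints[i] = simhash(docs[i]);
        System.out.println(hammingDistance(prints[0], prints[1]));
        for (int[] p : nearPairs(prints, 3)) System.out.println(p[0] + " " + p[1]);
    }
}
```

Short documents with few tokens tend to produce fingerprints that are far apart even when the texts look similar, so the distance threshold and the tokenizer would both need tuning on real data.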

How to use:

  • Run Main with an input file and an output file.

  • The format of the input file (see src/test_in): one doc per line, in UTF-8.

  • The format of the output file (see src/test_out):

  • start // start flag

  • first line // doc

  • second line // doc1\tdist, where dist is the Hamming distance between doc and doc1

  • end // end flag
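As an illustration of the record layout described above, the following sketch formats one output record. The class name OutputRecordSketch and the method formatRecord are hypothetical, and two details are assumptions rather than facts from the project: that doc1 and dist are separated by a tab, and that a record may contain several doc1/dist lines, one per similar document. Check src/test_out for the authoritative format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OutputRecordSketch {
    // Builds one record: "start", the doc, one "doc1<TAB>dist" line per neighbour, "end".
    // The tab separator and the repeated neighbour lines are assumptions.
    static String formatRecord(String doc, Map<String, Integer> neighbours) {
        StringBuilder sb = new StringBuilder("start\n");
        sb.append(doc).append('\n');
        for (Map.Entry<String, Integer> e : neighbours.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.append("end\n").toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> neighbours = new LinkedHashMap<>();
        neighbours.put("the quick brown dog", 2); // illustrative distance, not a measured result
        System.out.print(formatRecord("the quick brown fox", neighbours));
    }
}
```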

Future:

  1. Build the project into a runnable jar.

  2. Improve the performance on big data.

Note:

  1. Before running Main.java, you should choose a better analyzer than BinaryWordSeg!

