simhash: computing simhash values for Chinese documents

Uploader: 42110038 | Uploaded: 2022-05-28 20:26:51 | File size: 4.37MB | File type: ZIP
C++
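A simhash library built specifically for Chinese documents.

Introduction
This project computes the simhash value of a Chinese document. simhash is the algorithm Google uses for text deduplication, and it is now widely used in text processing. See … for details.

Features
- Uses … as the word segmenter and keyword extractor.
- Uses … as the hash function.
- Header-only style: all source code lives in .hpp files, which makes it easy to use. No linking, no harm.
- Side project spun off from this one: … provides a simple simhash HTTP service.

Dependencies
g++ (version >= 4.1 recommended) or clang++.

Usage
    mkdir build
    cd build
    cmake ..
    make

Test
    make test

Demo
    ./demo

The result is:
Text: "我是蓝翔技工拖拉机学院手扶拖拉机专业的。不用多久,我就会升职加薪,当上总经理,出任CEO,走上人生巅峰。" (roughly: "I am in the walking-tractor program at Lanxiang Technical Tractor College. Before long I will get a promotion and a raise, become general manager, take the CEO post, and reach the peak of my life.")
Keyword sequence (蓝翔 = Lanxiang, 升职 = promotion, 加薪 = pay raise): ["蓝翔:11.7392", "CEO:11.7392", "升职:10.8562", "加薪:1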

Resource details

(96 files, 4.37MB) simhash: computing simhash values for Chinese documents
  simhash-master/
    .gitignore (67B)
    .travis.yml (275B)
    deps/
      limonp/
        Condition.hpp (684B)
        StringUtil.hpp (9.06KB)
        ThreadPool.hpp (1.73KB)
        BoundedQueue.hpp (1.10KB)
        ArgvContext.hpp (1.47KB)
        Colors.hpp (570B)
        Closure.hpp (4.33KB)
        ForcePublic.hpp (142B)
        BoundedBlockingQueue.hpp (1.29KB)
        Config.hpp (2.21KB)
        LocalVector.hpp (2.78KB)
        Logging.hpp (1.56KB)
        NonCopyable.hpp (411B)
        StdExtension.hpp (3.04KB)
        Md5.hpp (12.54KB)
        BlockingQueue.hpp (973B)
        Thread.hpp (833B)
        FileLock.hpp (1.30KB)
        MutexLock.hpp (949B)
      cppjieba/
        Trie.hpp (4.28KB)
        PosTagger.hpp (1.94KB)
        PreFilter.hpp (1.22KB)
        SegmentBase.hpp (710B)
        Jieba.hpp (2.72KB)
        KeywordExtractor.hpp (3.92KB)
        HMMModel.hpp (3.20KB)
        FullSegment.hpp (2.06KB)
        DictTrie.hpp (5.71KB)
        TransCode.hpp (1.61KB)
        MPSegment.hpp (2.80KB)
        HMMSegment.hpp (4.52KB)
        LevelSegment.hpp (2.10KB)
        QuerySegment.hpp (2.42KB)
        MixSegment.hpp (2.33KB)
    README.md (2.38KB)
    include/
      simhash/
        jenkins.h (10.40KB)
        Simhasher.hpp (6.31KB)
    example/
      demo.cpp (1.34KB)
      CMakeLists.txt (81B)
    dict/
      idf.utf8 (5.72MB)
      stop_words.utf8 (8.76KB)
      jieba.dict.utf8 (4.84MB)
      hmm_model.utf8 (507.56KB)
    test/
      load_test.cpp (820B)
      testdata/
        news_content.4 (1.49KB)
        news_content (7.63KB)
        news_content.2 (7.63KB)
        news_content.3 (1.48KB)
      CMakeLists.txt (123B)
      unittest/
        gtest_main.cpp (1.73KB)
        TSimhash.cpp (2.88KB)
        gtest-1.6.0/
          src/
            gtest_main.cc (1.73KB)
            gtest-port.cc (24.73KB)
            gtest-all.cc (2.11KB)
            gtest-death-test.cc (45.33KB)
            gtest-test-part.cc (4.10KB)
            gtest-internal-inl.h (39.34KB)
            gtest-printers.cc (11.76KB)
            .deps/
              gtest-all.Plo (19.18KB)
              gtest_main.Plo (14.54KB)
              .dirstamp (0B)
            gtest-filepath.cc (13.92KB)
            gtest.cc (175.61KB)
            .dirstamp (0B)
            gtest-typed-test.cc (3.66KB)
          include/
            gtest/
              gtest-param-test.h.pump (18.36KB)
              gtest-test-part.h (6.32KB)
              gtest-spi.h (9.72KB)
              gtest-message.h (8.20KB)
              gtest_pred_impl.h (14.79KB)
              gtest-typed-test.h (10.00KB)
              gtest_prod.h (2.27KB)
              gtest.h (80.52KB)
              gtest-death-test.h (10.85KB)
              gtest-param-test.h (74.09KB)
              internal/
                gtest-string.h (13.28KB)
                gtest-filepath.h (9.47KB)
                gtest-param-util.h (23.67KB)
                gtest-internal.h (46.22KB)
                gtest-param-util-generated.h.pump (9.18KB)
                gtest-death-test-internal.h (12.60KB)
                gtest-type-util.h.pump (9.09KB)
                gtest-port.h (61.06KB)
                gtest-param-util-generated.h (165.26KB)
                gtest-type-util.h (181.31KB)
                gtest-tuple.h.pump (9.01KB)
                gtest-linked_ptr.h (7.87KB)
                gtest-tuple.h (27.46KB)
              gtest-printers.h (29.56KB)
        TJenkins.cpp (532B)
        CMakeLists.txt (446B)
    CMakeLists.txt (434B)
    ChangeLog.md (1.04KB)
    README_EN.md (2.15KB)

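Disclaimer

Resources on 【只为小站】 are shared by users and are provided for study and research only. Be sure to delete them within 24 hours of downloading and do not use them for any other purpose; otherwise you bear the consequences. Given the nature of the Internet, 【只为小站】 cannot substantively review the ownership, legality, compliance, authenticity, scientific accuracy, completeness, or validity of the works, information, or content that users transmit. Whether or not the operator of 【只为小站】 has performed a review, users alone bear legal responsibility for any infringement or ownership disputes that may arise or have arisen from the works, information, or content they transmit.
Resources on this site do not represent the site's views or positions; they are shared by users. In accordance with Article 22 of China's Regulations on the Protection of the Right to Network Dissemination of Information (《信息网络传播权保护条例》), if a resource infringes rights or raises related issues, please contact the site's customer service at zhiweidada#qq.com (replace # with @). The site will offer full support and cooperation and will respond and handle the matter promptly. For more on copyright and disclaimers, see Copyright and Disclaimer.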