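text2vec: text2vec, Chinese text to vector. (A text vectorization tool, including word vectorization, sentence vectorization, and sentence similarity computation) - source code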

Uploader: 42149145 | Upload time: 2021-09-25 10:16:25 | File size: 141KB | File type: ZIP
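text2vec: Chinese text to vector. (A text vectorization tool covering word vectorization and sentence vectorization.)

Features

Text vector representation
Word level: word2vec vector representations of characters and words are obtained from the large-scale Chinese word embeddings openly released by Tencent AI Lab (file name: light_Tencent_AILab_ChineseEmbedding.bin, password: tawe).
Sentence level: computed by averaging the word embeddings of all words in the sentence.
Document level: can be obtained with doc2vec from the gensim library; this project does not implement it.

Text similarity computation
Baseline method: the simplest way to estimate the semantic similarity between two sentences is to average the word embeddings of all words in each sentence, then compute the cosine similarity between the two resulting sentence vectors.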
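The baseline described above (average the word vectors, then take the cosine) can be sketched in a few lines. This is a minimal illustration, not the project's own code: the embedding table TOY_EMBEDDINGS and the helper names sentence_vector and cosine_similarity are hypothetical, the 3-dimensional vectors are made up, and the sentences are assumed to be already segmented into words.

```python
import math

# Hypothetical 3-dimensional word vectors, for illustration only.
# Real vectors would come from a pretrained word2vec file.
TOY_EMBEDDINGS = {
    "我": [0.1, 0.3, 0.5],
    "喜欢": [0.7, 0.2, 0.1],
    "爱": [0.6, 0.3, 0.1],
    "苹果": [0.2, 0.9, 0.4],
    "香蕉": [0.3, 0.8, 0.5],
}


def sentence_vector(tokens, embeddings):
    """Average the vectors of the tokens that appear in the embedding table."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no token has a known embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Sentences are assumed to be pre-segmented into words.
s1 = ["我", "喜欢", "苹果"]
s2 = ["我", "爱", "香蕉"]
score = cosine_similarity(
    sentence_vector(s1, TOY_EMBEDDINGS),
    sentence_vector(s2, TOY_EMBEDDINGS),
)
print(round(score, 4))
```

Words missing from the table are simply skipped here; that is one possible choice for out-of-vocabulary handling, and the project may treat them differently.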


Resource details

(60 files, 141KB)

text2vec-master/
    .github/
        stale.yml (766B)
        ISSUE_TEMPLATE/
            bug-report.md (1.16KB)
            usage-question.md (682B)
            feature-request.md (782B)
    requirements-dev.txt (102B)
    requirements.txt (39B)
    examples/
        similarity_demo.py (636B)
        set_stopwords_demo.py (665B)
        my_stopwords.txt (30B)
        base_demo.py (904B)
    LICENSE (11.09KB)
    text2vec/
        bert/
            train.py (554B)
            tokenization.py (11.47KB)
            modeling.py (39.81KB)
            model.py (27.09KB)
            __init__.py (81B)
            predict.py (707B)
            extract_feature.py (13.84KB)
            optimization.py (6.37KB)
            graph.py (4.35KB)
        utils/
            tokenizer.py (1.87KB)
            ngram.py (5.71KB)
            __init__.py (0B)
            non_masking_layer.py (967B)
            timer.py (797B)
            logger.py (1.35KB)
            get_file.py (12.28KB)
        similarity.py (3.71KB)
        __init__.py (560B)
        processors/
            __init__.py (81B)
            base_processor.py (5.19KB)
            default_processor.py (3.00KB)
        embeddings/
            word_embedding.py (9.01KB)
            __init__.py (81B)
            bert_embedding.py (10.16KB)
            embedding.py (4.89KB)
        data/
            stopwords.txt (8.92KB)
        algorithm/
            __init__.py (81B)
            rank_bm25.py (5.48KB)
            distance.py (6.26KB)
        version.py (84B)
        vector.py (2.74KB)
    setup.py (1.89KB)
    README.md (11.65KB)
    docs/
        base1.jpg (37.00KB)
        move1.jpg (33.35KB)
    tests/
        utils_test.py (2.24KB)
        emb_bert_test.py (650B)
        wmd_demo.py (487B)
        emb_w2v_test.py (1.91KB)
        emb_info_test.py (329B)
        lcqmc_case_test.py (4.74KB)
        __init__.py (80B)
        issue_test.py (1.54KB)
        console_test.py (654B)
        test_embedding.py (1.80KB)
        longtext_test.py (1.99KB)
        rankbm25_demo.py (984B)
    .gitignore (1.17KB)
    _config.yml (26B)
