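HarvestText: a text mining and preprocessing toolkit (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.) using unsupervised or weakly supervised methods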

Uploader: 42101641 | Uploaded: 2021-07-07 15:40:33 | File size: 2.02MB | File type: ZIP
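HarvestText

Sow with little data seed, harvest much from a text field. (Sow a few seed words, reap a rich domain harvest.)

The project is synchronized on [link text missing] and [link text missing]. If browsing or downloading on Github is slow, you can switch to [link text missing].

Purpose

HarvestText is a library focused on unsupervised and weakly supervised methods. It can integrate domain knowledge (such as entity types and aliases) to process and analyze domain-specific text simply and efficiently. It suits many text preprocessing and preliminary exploratory analysis tasks, with potential applications in novel analysis, web text, professional literature, and other domains.

Use cases:
- [title missing] (entity-aware word segmentation, text summarization, relation networks, etc.)
- [title missing] (entity-aware word segmentation, sentiment analysis, new word discovery [to help recognize nicknames], etc.). Related article: [title missing]. Note: this library only handles the entity segmentation and sentiment analysis; the visualization uses matplotlib.
- [title missing] (named entity recognition, dependency parsing, a simple question answering system)

The README contains typical examples for each feature; detailed usage of some functions can be found in the documentation: [link missing]

The features are as follows:

Basic processing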
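To make "integrating domain knowledge such as types and aliases" more concrete, here is a minimal standard-library sketch of alias-based entity linking. It is not HarvestText's API or implementation: the function `link_entities`, both dictionaries, and the sample sentence are hypothetical names and data introduced only for illustration, and the greedy longest-match scan is a deliberately simple stand-in for whatever the library does internally.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str, str]  # (start, end, canonical entity, entity type)


def link_entities(
    text: str,
    mention_to_entity: Dict[str, str],
    entity_to_type: Dict[str, str],
) -> List[Span]:
    """Scan text left to right, preferring the longest known mention at each position."""
    # Try longer mentions first so that a long alias wins over a shorter one.
    mentions = sorted((m for m in mention_to_entity if m), key=len, reverse=True)
    spans: List[Span] = []
    i = 0
    while i < len(text):
        for mention in mentions:
            if text.startswith(mention, i):
                entity = mention_to_entity[mention]
                etype = entity_to_type.get(entity, "unknown")
                spans.append((i, i + len(mention), entity, etype))
                i += len(mention)  # skip past the matched mention
                break
        else:
            i += 1  # no mention starts here; move one character forward
    return spans


# Hypothetical domain knowledge: aliases mapped to canonical entities, plus types.
mention_to_entity = {"武磊": "武磊", "武球王": "武磊", "上港": "上海上港"}
entity_to_type = {"武磊": "球员", "上海上港": "球队"}

spans = link_entities("武球王在上港踢球", mention_to_entity, entity_to_type)
for start, end, entity, etype in spans:
    print(start, end, entity, etype)
```

The point of the sketch is that a small seed dictionary of aliases and types is enough to map surface forms to canonical entities without any labeled training data, which is the kind of weak supervision the description refers to.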


Resource details

HarvestText (89 files, 2.02MB)

HarvestText-master/
  images/
    word_ego_net.jpg  32.18KB
  docs/
    conf.py  2.42KB
    make.bat  760B
    modules.rst  70B
    _build/
      doctrees/
        environment.pickle  122.24KB
        harvesttext.doctree  250.97KB
        modules.doctree  2.52KB
        index.doctree  5.36KB
      html/
        .nojekyll  0B
        objects.inv  1.02KB
    harvesttext.rst  1.55KB
    Makefile  634B
    index.rst  601B
  harvesttext/
    ent_network.py  5.23KB
    summary.py  4.08KB
    ent_retrieve.py  1.37KB
    resources/
      pinyin_adjlist.json  75.19KB
      sanguo_docs.json  3.41MB
      THUOCL.json  2.50MB
      LH_senti_lexicon.json  82.27KB
      qh_sent_dict.json  170.60KB
      sanguo_entity_dict.json  364.55KB
      bd_stopwords.json  17.75KB
    __init__.py  426B
    word_discover.py  16.72KB
    download_utils.py  5.01KB
    harvesttext.py  38.77KB
    resources.py  5.30KB
    sentiment.py  2.29KB
    algorithms/
      utils.py  577B
      texttile.py  3.17KB
      match_patterns.py  641B
      keyword.py  1.10KB
      __init__.py  0B
      sent_dict.py  5.25KB
      entity_discoverer.py  9.61KB
      word_discoverer.py  9.98KB
    parsing.py  9.72KB
  .github/
    ISSUE_TEMPLATE/
      bug_report.md  357B
      -----.md  234B
    workflows/
      pythonpublish.yml  793B
  tests/
    test_functionality.py  15.25KB
    test_using_typed_words_current  644B
    test_named_entity_recognition_current  79B
    test_entity_segmentation_current  1.05KB
    test_entity_error_check_expected  385B
    test_build_word_ego_graph_expected  0B
    test_entity_network_expected  246B
    test_new_word_register_expected  95B
    test_load_resources_current  429B
    test_named_entity_recognition_expected  79B
    test_text_summarization_expected  97B
    ht_model1  147.92KB
    test_save_load_clear_expected  659B
    test_using_typed_words_expected  644B
    test_linking_strategy_current  234B
    test_entity_network_current  246B
    test_linking_strategy_expected  234B
    test_find_with_rules_expected  623B
    test_sentiment_dict_expected  140B
    test_depend_parse_expected  451B
    test_entity_segmentation_expected  1.05KB
    test_entity_search_current  367B
    test_hard_text_cleaning.py  2.12KB
    test_punct_type_exception.py  603B
    test_depend_parse_current  451B
    test_load_resources_expected  414B
    test_new_word_discover_current  11B
    test_entity_search_expected  367B
    test_new_word_discover_expected  11B
    test_entity_error_check_current  385B
    test_sentiment_dict_current  140B
    test_build_word_ego_graph_current  0B
    test_find_with_rules_current  623B
    test_new_word_register_current  95B
    test_save_load_clear_current  659B
    test_text_summarization_current  97B
  LICENSE  1.04KB
  requirements.txt  147B
  examples/
    basics.py  22.89KB
    naiveKGQA.py  10.93KB
    kwd_benchmark/
      CSL.ipynb  18.16KB
    entity_discover/
      entity_info_v2.txt  255B
      entity_info_v1.txt  143.17KB
      entity_discover.ipynb  53.90KB
  setup.py  1.06KB
  .gitignore  152B
  README.md  42.32KB
  .gitattributes  66B

