SIFRank_zh: a Chinese keyphrase extraction method based on pre-trained models (paper: SIFRank

Uploader: 42131628 | Upload time: 2022-04-25 22:00:49 | File size: 2.4MB | File type: ZIP
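SIFRank_zh

This is the code accompanying our paper. The original work extracts keyphrases from English text; here it is ported to Chinese, and parts of the pipeline have been changed. The original English version is at ...

Version notes
2020/03/03: initial version. This version contains only the most basic functionality; some details still need to be optimized and extended.

Core algorithm
Pre-trained model ELMo + sentence embedding model SIF.

Advantages of ELMo word embeddings:
1) thanks to large-scale pre-training, they carry more semantic information than earlier statistical and graph-based methods such as TFIDF and TextRank;
2) ELMo is dynamic (context-dependent), which alleviates the polysemy problem;
3) ELMo encodes words with a Char-CNN, which makes it very friendly to rare words;
4) different ELMo layers capture different levels of information.

Advantages of SIF sentence embeddings:
1) word vectors are weighted by a smooth inverse frequency transform of word frequency, which better captures the central topic of a sentence;
2) common words are filtered out more effectively.

Candidate keyphrase identification
The sentence is first segmented into words and POS-tagged; regular expressions are then used to identify noun phrases (for example: adjective + noun), and these noun phrases are taken as candidate keyphrases.

Keyphrase ...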
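The candidate identification and SIF weighting steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's code. The function names, the tag sets (jieba-style POS tags are assumed), the pattern "zero or more adjectives followed by one or more nouns", and the smoothing constant `a = 1e-3` are all assumptions made for the example. Word vectors are passed in as plain lists, so any embedding (ELMo or otherwise) could be plugged in.

```python
import re
from typing import Dict, List, Tuple

# Assumed jieba-style POS tags; the real tag sets used by the project may differ.
ADJ_TAGS = {"a"}
NOUN_TAGS = {"n", "nr", "ns", "nt", "nz"}


def extract_candidates(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return noun phrases matching the assumed pattern: adjectives* + nouns+."""
    # Encode every token's POS tag as one character so a regex can scan the sequence.
    codes = "".join(
        "A" if tag in ADJ_TAGS else "N" if tag in NOUN_TAGS else "O"
        for _, tag in tagged
    )
    candidates = []
    for match in re.finditer(r"A*N+", codes):
        words = [word for word, _ in tagged[match.start():match.end()]]
        candidates.append("".join(words))  # Chinese words are joined without spaces
    return candidates


def sif_weight(word: str, freq: Dict[str, float], a: float = 1e-3) -> float:
    """Smooth inverse frequency weight a / (a + p(w)); unseen words get weight 1."""
    return a / (a + freq.get(word, 0.0))


def sif_embedding(
    words: List[str],
    vectors: Dict[str, List[float]],
    freq: Dict[str, float],
    a: float = 1e-3,
) -> List[float]:
    """SIF-weighted average of the word vectors of `words`."""
    dim = len(vectors[words[0]])
    total = [0.0] * dim
    for word in words:
        weight = sif_weight(word, freq, a)
        for i, value in enumerate(vectors[word]):
            total[i] += weight * value
    return [value / len(words) for value in total]


if __name__ == "__main__":
    # Hypothetical pre-tagged sentence (word, POS tag).
    tagged = [("美丽", "a"), ("城市", "n"), ("的", "uj"), ("发展", "vn"), ("规划", "n")]
    print(extract_candidates(tagged))

    # Hypothetical relative word frequencies and toy 2-d word vectors.
    freq = {"的": 0.05, "城市": 0.001}
    vectors = {"城市": [1.0, 0.0], "的": [0.0, 1.0]}
    print(sif_embedding(["城市", "的"], vectors, freq))
```

Because the weight shrinks as word frequency grows, a very common word such as "的" contributes almost nothing to the averaged vector, which is the filtering effect the description attributes to SIF.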


Resource details

(40 files, 2.4MB) SIFRank_zh: a Chinese keyphrase extraction method based on pre-trained models (paper: SIFRank

SIFRank_zh-master/
    auxiliary_data/
        chinese_stopwords.txt (4.54KB)
        zhs.model/
            cnn_0_100_512_4096_sample.json (473B)
            word.dic (970.10KB)
            config/
                cnn_0_100_512_4096_sample.json (473B)
                cnn_50_100_512_4096_sample.json (473B)
            cnn_50_100_512_4096_sample.json (473B)
            char.dic (52.94KB)
            config.json (468B)
        __init__.py (91B)
        dict.txt (4.84MB)
        user_dict.txt (62B)
    others/
        elmo.py (7.86KB)
    data/
        __init__.py (91B)
        test.01.txt (770B)
    model/
        extractor.py (2.06KB)
        __pycache__/
            method.cpython-36.pyc (5.75KB)
            extractor.cpython-36.pyc (1.55KB)
            input_representation.cpython-36.pyc (1.55KB)
            __init__.cpython-36.pyc (124B)
        __init__.py (91B)
        method.py (6.72KB)
        input_representation.py (1.79KB)
    test/
        test.py (2.06KB)
    .idea/
        codeStyles/
            Project.xml (210B)
            codeStyleConfig.xml (153B)
        misc.xml (197B)
        vcs.xml (180B)
        modules.xml (266B)
        dictionaries/
            sunyi.xml (84B)
        SIFRank.iml (453B)
        workspace.xml (32.61KB)
    README.md (8.21KB)
    util/
        __pycache__/
            fileIO.cpython-36.pyc (4.24KB)
        fileIO.py (416B)
    embeddings/
        __pycache__/
            sent_emb_sif.cpython-36.pyc (7.71KB)
            __init__.cpython-36.pyc (129B)
            word_emb_elmo.cpython-36.pyc (1.15KB)
        sent_emb_sif.py (12.15KB)
        word_emb_elmo.py (1.37KB)
        __init__.py (90B)
