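train.src (training-set input, short texts); train.tgt (training-set output, summaries); test.src (test-set input, short texts); test.tgt (test-set output, summaries); vaild.src (validation-set input, short texts); vaild.tgt (validation-set output, summaries)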
2023-02-28 22:34:46 230.78MB nlp
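The split files above can be read as parallel pairs. The following is a minimal sketch, assuming each file is UTF-8 text with one example per line and that line i of a `.src` file matches line i of the corresponding `.tgt` file; the listing does not confirm this layout, and `load_pairs` is a hypothetical helper name.

```python
def load_pairs(src_path, tgt_path, encoding="utf-8"):
    """Read line-aligned source/target files into (short_text, summary) pairs.

    Assumption: one example per line, same line count in both files.
    The encoding is passed explicitly so the result does not depend on
    the platform default.
    """
    with open(src_path, encoding=encoding) as f:
        src_lines = [line.rstrip("\n") for line in f]
    with open(tgt_path, encoding=encoding) as f:
        tgt_lines = [line.rstrip("\n") for line in f]

    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"line count mismatch: {len(src_lines)} source lines "
            f"vs {len(tgt_lines)} target lines"
        )
    return list(zip(src_lines, tgt_lines))
```

Calling `load_pairs("train.src", "train.tgt")` would then return a list of `(short_text, summary)` tuples, or raise if the two files are not aligned.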
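The complete LCSTS dataset. Because of CSDN's upload size limit, the archive contains only my network-drive download link. The original files are in an XML-like format; for how to parse and process them, see my blog post: https://blog.csdn.net/u012495579/article/details/103697824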
2022-02-26 22:19:51 75B LCSTS summarization automatic-summarization NLP
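Parsing an XML-like file of this kind might look like the sketch below. It assumes each record is wrapped in `<doc id=N> ... </doc>` with `<summary>`, `<short_text>` and an optional `<human_label>` element inside; that layout is an assumption here, not something this listing states, and the linked blog post remains the reference for the actual format. Regular expressions are used instead of an XML parser because an unquoted attribute such as `id=0` is not well-formed XML. `parse_docs` is a hypothetical helper name.

```python
import re

# Assumed record layout: <doc id=0> ... </doc>
DOC_RE = re.compile(r"<doc id=(\d+)>(.*?)</doc>", re.DOTALL)


def _field(body, tag):
    """Return the stripped text inside <tag>...</tag>, or None if absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", body, re.DOTALL)
    return match.group(1).strip() if match else None


def parse_docs(text):
    """Parse XML-like records into a list of dicts."""
    docs = []
    for match in DOC_RE.finditer(text):
        body = match.group(2)
        label = _field(body, "human_label")
        docs.append({
            "id": int(match.group(1)),
            "summary": _field(body, "summary"),
            "short_text": _field(body, "short_text"),
            # Records without a score keep human_label as None.
            "human_label": int(label) if label is not None else None,
        })
    return docs
```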
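Chinese text summarization with PyTorch. The main purpose of this repository is to record my own experiments and data. It follows two papers written by leading researchers in the text summarization field: and , together with code adapted by another developer. I would also like to thank them here. Almost nothing in this repository has been changed. (I ran into an encoding problem when Python read the files; my guess is that the original author used macOS, which is Unix-like and so less sensitive to encoding, whereas on Windows it raised an error.) At most I adjusted the hyperparameters. Running it on my own Windows laptop was a struggle at first: with batch_size=10 it later failed outright with a CUDA error, and my guess is that the setting was too large for the GPU memory. Enough said; here are the training and test results.

Experimental results

Metric     Validation set   Test set
ROUGE-1    34.06            31.87
ROUGE-2    16.46            15.47
ROUGE-L    33.83            30.93

0. Data preprocessing: download (extraction code: g8c6); once downloaded, place it in the root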
2021-06-27 09:50:04 8.84MB Python
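For readers unfamiliar with the ROUGE-1 and ROUGE-2 figures reported above, the sketch below shows how an n-gram overlap score of that kind is computed. It is an illustration only: it tokenizes by character, which is an assumption, and it is not the scoring script the repository used, so it will not necessarily reproduce the reported numbers. `rouge_n` is a hypothetical helper name.

```python
from collections import Counter


def _ngrams(tokens, n):
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) of n-gram overlap.

    Assumption: strings are tokenized character by character.
    """
    cand = _ngrams(list(candidate), n)
    ref = _ngrams(list(reference), n)
    if not cand or not ref:
        return 0.0, 0.0, 0.0

    # Clipped overlap: each n-gram counts at most as often as it
    # appears in the reference.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

ROUGE-L is based on the longest common subsequence rather than fixed-length n-grams and is not covered by this sketch.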
The full LCSTS dataset. Because of the file size limit, the archive contains a single txt file with the download link.
2019-12-21 20:16:48 172B LCSTS text-summarization automatic-summarization dataset
The document contains the network-drive address; the data totals 319M. Suitable for NLP work such as text summarization and text classification.

The LCSTS dataset includes two parts:

/DATA:
1. PART I: the main content of LCSTS, containing 2,400,591 (short text, summary) pairs. It can be used to train supervised learning models for summary generation.
2. PART II: contains 10,666 human-labeled (short text, summary) pairs, which can be used to train a classifier to filter noise from PART I.
3. PART III: contains 1,106 (short text, summary) pairs. This part was labeled by 3 people who assigned the same labels. Pairs scored 3, 4 or 5 can be used as a test set for evaluating summary generation systems.

/Result:
1. sumary.generated.char.context.txt: the summaries generated by RNN+context on character-based input.
2. sumary.generated.char.nocontext.txt: the summaries generated by RNN+nocontext on character-based input.
3. sumary.generated.word.context.txt: the summaries generated by RNN+context on word-based input.
4. sumary.generated.word.nocontext.txt: the summaries generated by RNN+nocontext on word-based input.
5. weibo.txt: the Weibo posts of the test set.
6. sumary.human: the human-written summaries corresponding to weibo.txt. This part is the test set of the paper.
7. rouge.char_context.txt: the ROUGE metric on sumary.generated.char.context
8. rouge.char_nocontext.txt: the ROUGE metric on sumary.generated.char.nocontext
9. rouge.word_context.txt: the ROUGE metric on sumary.generated.word.context
10. rouge.word_nocontext.txt: the ROUGE metric on sumary.generated.word.nocontext
2019-12-21 19:26:22 66B nlp
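The rule for building a test set from PART III (keep pairs scored 3, 4 or 5) can be sketched as a simple filter. This assumes the records have already been parsed into dicts with hypothetical keys `short_text`, `summary` and `human_label`; those key names and the helper `select_test_pairs` are illustrative and do not come from the dataset documentation.

```python
def select_test_pairs(records, min_score=3):
    """Keep (short_text, summary) pairs whose human score is >= min_score.

    Assumption: each record is a dict with 'short_text', 'summary' and an
    integer 'human_label' (or None when the record carries no score).
    """
    return [
        (rec["short_text"], rec["summary"])
        for rec in records
        if rec.get("human_label") is not None
        and rec["human_label"] >= min_score
    ]
```

With scores on a 1 to 5 scale, the default `min_score=3` keeps exactly the pairs scored 3, 4 or 5.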