Uploader: b285795298 | Uploaded: 2019-12-21 19:26:22 | File size: 66B | File type: txt
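The document contains the cloud-drive address for the download; the data totals 319M.
Suitable for NLP work such as text summarization and text classification.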
The LCSTS dataset includes two parts:
/DATA:
1. PART I: the main content of LCSTS, containing 2,400,591 (short text, summary) pairs. It can be used to train supervised learning models for summary generation.
2. PART II: contains 10,666 human-labeled (short text, summary) pairs, which can be used to train a classifier that filters the noise in PART I.
3. PART III: contains 1,106 (short text, summary) pairs, each labeled by 3 people who gave the same label. Pairs scored 3, 4, or 5 can be used as the test set for evaluating summary generation systems.
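
The score-based selection described for PART III can be sketched as follows. This is a minimal illustration only: it assumes the pairs have already been parsed into records, and the `Pair` type, its field names, and the `select_test_pairs` helper are hypothetical names, not part of the dataset. The on-disk file format is not described here, so no parsing is shown.

```python
from typing import Iterable, List, NamedTuple


class Pair(NamedTuple):
    """Hypothetical in-memory record for one labeled pair (not a dataset type)."""
    score: int        # human label, assumed to be an integer from 1 to 5
    short_text: str   # the source short text
    summary: str      # the corresponding summary


def select_test_pairs(pairs: Iterable[Pair], min_score: int = 3) -> List[Pair]:
    """Keep only the pairs whose score is at least min_score (3, 4, or 5 by default)."""
    return [p for p in pairs if p.score >= min_score]


if __name__ == "__main__":
    # Made-up placeholder records, used only to exercise the filter.
    demo = [
        Pair(1, "text a", "summary a"),
        Pair(3, "text b", "summary b"),
        Pair(5, "text c", "summary c"),
    ]
    kept = select_test_pairs(demo)
    print(len(kept), [p.score for p in kept])
```

The same helper with a different `min_score` could be used to pick the higher-quality pairs of PART II, assuming its labels follow the same scale.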
/Result:
1. sumary.generated.char.context.txt: contains the summaries generated by RNN+context on the character-based input.
2. sumary.generated.char.nocontext.txt: contains the summaries generated by RNN+nocontext on the character-based input.
3. sumary.generated.word.context.txt: contains the summaries generated by RNN+context on the word-based input.
4. sumary.generated.word.nocontext.txt: contains the summaries generated by RNN+nocontext on the word-based input.
5. weibo.txt: contains the Weibo short texts of the test set.
6. sumary.human: contains the human-written summaries corresponding to 'weibo.txt'. This part is the test set used in the paper.
7. rouge.char_context.txt: the ROUGE scores for sumary.generated.char.context.txt
8. rouge.char_nocontext.txt: the ROUGE scores for sumary.generated.char.nocontext.txt
9. rouge.word_context.txt: the ROUGE scores for sumary.generated.word.context.txt
10. rouge.word_nocontext.txt: the ROUGE scores for sumary.generated.word.nocontext.txt
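
To give a feel for what the ROUGE files above measure, here is a minimal sketch of character-level ROUGE-1 between one generated summary and one human summary. It is an illustration under stated assumptions, not the scoring tool used to produce the rouge.*.txt files: the function name `rouge_1_char` is hypothetical, each character is treated as a unigram, and whitespace handling and other preprocessing are ignored.

```python
from collections import Counter
from typing import Dict


def rouge_1_char(candidate: str, reference: str) -> Dict[str, float]:
    """Character-level ROUGE-1: clipped unigram overlap between two strings."""
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    # Counter intersection keeps the minimum count per character (clipped overlap).
    overlap = sum((cand_counts & ref_counts).values())
    precision = overlap / len(candidate) if candidate else 0.0
    recall = overlap / len(reference) if reference else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Placeholder strings, not taken from the dataset.
    print(rouge_1_char("abcd", "abce"))
```

A word-based variant would tokenize both strings first and count tokens instead of characters; the segmentation method is not specified here, so it is left out.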