Complete Text Classification (NLP) with TensorFlow

Uploader: 44510615 | Upload time: 2021-06-21 17:03:08 | File size: 11.84MB | File type: ZIP
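### Text classification

#### Data preprocessing

The training set and the test set must be stored separately. Chinese data must be word-segmented first, with the resulting words separated by spaces. The label is appended to the end of each example, separated from the sentence by the delimiter `\`. For example:

* `今天 的 天气 真好\积极` ("the weather is really nice today", label "positive")

#### File structure

* config: configuration parameters for each model
* data: stores the training and test sets
* ckpt_model: stores checkpoint model files
* data_helpers: data-processing utilities
* pb_model: stores pb model files
* outputs: stores vocab, word_to_index, label_to_index and the processed data
* models: model code
* trainers: training code
* predictors: prediction code

#### Training a model

* `python train.py --config_path="config/textcnn_config.json"`

#### Prediction

* All prediction code is in predictors/predict.py: initialize a Predictor object and call its predict method.

#### Model configuration parameters

##### textcnn: TextCNN-based text classification

* model_name: model name
* epochs: number of passes over the full training set
* checkpoint_every: save a checkpoint every N steps
* eval_every: evaluate the model every N steps
* learning_rate: learning rate
* optimization: optimization algorithm
* embedding_size: size of the embedding layer
* num_filters: number of convolution filters
* filter_sizes: sizes of the convolution filters
* batch_size: batch size
* sequence_length: sequence length
* vocab_size: vocabulary size
* num_classes: number of classes; set to 1 for binary classification and to the actual number of classes for multi-class classification
* keep_prob: fraction of neurons kept (dropout keep probability)
* l2_reg_lambda: L2 regularization coefficient, applied mainly to the parameters of the fully connected layer
* max_grad_norm: gradient clipping threshold
* train_data: path to the training data
* eval_data: path to the validation data
* stop_word: path to the stop-word list
* output_path: output directory for the vocab and the processed training and validation data
* word_vectors_path: path to the word vectors
* ckpt_model_path: directory for checkpoint models
* pb_model_path: directory for pb models

##### bilstm: BiLSTM-based text classification

* model_name: model name
* epochs: number of passes over the full training set
* checkpoint_every: save a checkpoint every N steps
* eval_every: evaluate the model every N steps
* learning_rate: learning rate
* optimization: optimization algorithm
* embedding_size: size of the embedding layer
* hidden_sizes: LSTM hidden sizes, given as a list; stacked LSTMs are supported by adding one hidden size per layer to the list
* batch_size: batch size
* sequence_length: sequence length
* vocab_size: vocabulary size
* num_classes: number of classes; set to 1 for binary classification and to the actual number of classes for multi-class classification
* keep_prob: fraction of neurons kept (dropout keep probability)
* l2_reg_lambda: L2 regularization coefficient, applied mainly to the parameters of the fully connected layer
* max_grad_norm: gradient clipping threshold
* train_data: path to the training data
* eval_data: path to the validation data
* stop_word: path to the stop-word list
* output_path: output directory for the vocab and the processed training and validation data
* word_vectors_path: path to the word vectors
* ckpt_model_path: directory for checkpoint models
* pb_model_path: directory for pb models

##### bilstm atten: BiLSTM + attention text classification

* model_name: model name
* epochs: number of passes over the full training set
* checkpoint_every: save a checkpoint every N steps
* eval_every: evaluate the model every N steps
* learning_rate: learning rate
* optimization: optimization algorithm
* embedding_size: size of the embedding layer
* hidd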
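The data format from the preprocessing section (space-separated words, then the delimiter, then the label) can be read with a few lines of standard Python. This is a minimal sketch assuming the delimiter really is a single backslash, as the README states, and that the label follows the last backslash on the line; `parse_line` and `load_dataset` are hypothetical helpers, not the repository's data_helpers API.

```python
from typing import List, Tuple

SEP = "\\"  # label delimiter as stated in the README (a single backslash)


def parse_line(line: str) -> Tuple[List[str], str]:
    """Split 'w1 w2 ... wn\\label' into (tokens, label)."""
    line = line.rstrip("\n")
    # rpartition splits on the LAST delimiter, so the label is always the final field
    text, sep, label = line.rpartition(SEP)
    if not sep:
        raise ValueError(f"missing label delimiter in line: {line!r}")
    tokens = text.split()  # the text is already segmented and space-separated
    return tokens, label.strip()


def load_dataset(path: str) -> Tuple[List[List[str]], List[str]]:
    """Read a whole file in the README format into parallel token and label lists."""
    texts, labels = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            tokens, label = parse_line(line)
            texts.append(tokens)
            labels.append(label)
    return texts, labels


if __name__ == "__main__":
    print(parse_line("今天 的 天气 真好\\积极"))
    # → (['今天', '的', '天气', '真好'], '积极')
```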
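The outputs directory is described as holding vocab, word_to_index and label_to_index, and the configs expose vocab_size, sequence_length and stop_word. The sketch below shows one plausible way to build such mappings and turn a sentence into a fixed-length id sequence. It assumes a frequency-ranked vocabulary, reserved `<PAD>`/`<UNK>` tokens at ids 0 and 1, and right-padding; those choices and all helper names are hypothetical, not taken from data_helpers.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List

PAD, UNK = "<PAD>", "<UNK>"  # hypothetical reserved tokens at ids 0 and 1


def build_word_to_index(texts: Iterable[List[str]], vocab_size: int,
                        stop_words: FrozenSet[str] = frozenset()) -> Dict[str, int]:
    """Keep the vocab_size - 2 most frequent non-stop-words after PAD and UNK."""
    counts = Counter(w for tokens in texts for w in tokens if w not in stop_words)
    # most_common() orders words with equal counts by first occurrence
    words = [w for w, _ in counts.most_common(vocab_size - 2)]
    return {w: i for i, w in enumerate([PAD, UNK] + words)}


def build_label_to_index(labels: Iterable[str]) -> Dict[str, int]:
    # sorting makes the label ids deterministic across runs
    return {label: i for i, label in enumerate(sorted(set(labels)))}


def to_ids(tokens: List[str], word_to_index: Dict[str, int], sequence_length: int,
           stop_words: FrozenSet[str] = frozenset()) -> List[int]:
    """Drop stop words, map words to ids (UNK if unseen), then truncate or right-pad."""
    kept = [w for w in tokens if w not in stop_words]
    ids = [word_to_index.get(w, word_to_index[UNK]) for w in kept[:sequence_length]]
    return ids + [word_to_index[PAD]] * (sequence_length - len(ids))
```

Capping the vocabulary at vocab_size keeps the embedding matrix at a known shape, and padding every example to sequence_length lets a batch be stacked into one rectangular tensor.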
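The num_classes convention (1 for binary tasks, the real class count otherwise) usually means a binary model ends in a single sigmoid unit while a multi-class model ends in one softmax logit per class. Assuming that is how these models are built (the model code is not shown here), one example's output could be decoded as follows; `predict_class` is a hypothetical helper, not the Predictor.predict method.

```python
from typing import Sequence


def predict_class(logits: Sequence[float], num_classes: int) -> int:
    """Turn one example's output-layer logits into a class index.

    num_classes == 1: one sigmoid unit (binary task); class 1 if sigmoid(z) > 0.5.
    num_classes > 1: one logit per class (softmax); the arg-max wins.
    """
    if num_classes == 1:
        (z,) = logits  # exactly one output unit is expected
        # sigmoid(z) > 0.5 exactly when z > 0, so no exp() (and no overflow) is needed
        return int(z > 0)
    if len(logits) != num_classes:
        raise ValueError("expected one logit per class")
    # softmax is monotonic, so the arg-max of the logits is the arg-max of the probabilities
    return max(range(num_classes), key=lambda i: logits[i])
```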
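max_grad_norm is described only as a gradient clipping threshold. Assuming it is applied the way TensorFlow's `tf.clip_by_global_norm` works, every gradient is multiplied by `max_grad_norm / max(global_norm, max_grad_norm)`, where `global_norm` is the L2 norm of all gradients taken together. The plain-Python sketch below mirrors that formula on lists of floats so it runs without TensorFlow; the function name is hypothetical.

```python
import math
from typing import List, Tuple


def clip_by_global_norm(grads: List[List[float]],
                        max_grad_norm: float) -> Tuple[List[List[float]], float]:
    """Rescale all gradients together so their combined L2 norm is at most max_grad_norm."""
    if max_grad_norm <= 0:
        raise ValueError("max_grad_norm must be positive")
    # L2 norm over every element of every gradient tensor
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # scale is 1.0 when the norm is already within the threshold
    scale = max_grad_norm / max(global_norm, max_grad_norm)
    return [[g * scale for g in grad] for grad in grads], global_norm
```

Scaling all gradients by one shared factor preserves the direction of the overall update, unlike clipping each tensor separately.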


Resource details

(46 files, 11.84MB) Complete Text Classification (NLP) with TensorFlow

* text_classifier
  * data_helpers
    * eval_data.py (5.45KB)
    * train_data.py (9.64KB)
    * __init__.py (103B)
    * __pycache__
      * data_base.cpython-36.pyc (4.67KB)
      * train_data.cpython-36.pyc (7.90KB)
      * eval_data.cpython-36.pyc (5.56KB)
      * __init__.cpython-36.pyc (281B)
    * data_base.py (2.96KB)
  * utils
    * optimizer.py (0B)
    * metrics.py (4.33KB)
    * __pycache__
      * metrics.cpython-36.pyc (5.13KB)
  * models
    * textcnn.py (4.31KB)
    * base.py (5.19KB)
    * bilstm.py (3.85KB)
    * bilstmatten.py (5.49KB)
    * __init__.py (274B)
    * rcnn.py (4.90KB)
    * transformer.py (11.89KB)
    * __pycache__
      * bilstmatten.cpython-36.pyc (2.83KB)
      * textcnn.cpython-36.pyc (2.32KB)
      * transformer.cpython-36.pyc (5.71KB)
      * __init__.cpython-36.pyc (438B)
      * rcnn.cpython-36.pyc (2.72KB)
      * bilstm.cpython-36.pyc (2.13KB)
      * base.cpython-36.pyc (4.63KB)
  * trainers
    * train.py (9.64KB)
    * train.sh (57B)
    * __pycache__
      * train_base.cpython-36.pyc (1.05KB)
    * train_base.py (604B)
  * predictors
    * test.py (962B)
    * predict.py (3.60KB)
    * predict_base.py (793B)
    * __pycache__
      * predict.cpython-36.pyc (3.57KB)
      * predict_base.cpython-36.pyc (1.35KB)
  * config
    * rcnn_config.json (643B)
    * bilstm_atten_config.json (650B)
    * bilstm_config.json (627B)
    * transformer_config.json (699B)
    * textcnn_config.json (661B)
  * README.md (6.78KB)
  * data
    * imdb
      * eval_data.txt (1.20MB)
      * test_data.txt (1.20MB)
      * train_data.txt (30.32MB)
    * stop_words.txt (11.01KB)
    * thucnews
      * test.py (0B)
    * english (623B)
