Lab: Lexical Analysis. 1) Implement a Chinese word segmentation program using any segmentation algorithm of your choice; 2) write a program that segments text by directly calling a segmentation toolkit (jieba, the Chinese Academy of Sciences toolkit, etc.); 3) provide segmentation results for at least 50 sentences using both methods (as an attachment); 4) compute the accuracy of each method's results and state the basis for the calculation.
2021-06-07 14:07:10 33.55MB natural-language-processing chinese-word-segmentation jieba
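Parts 1) and 4) of the lab above can be sketched in a few lines: a forward-maximum-matching segmenter (one classic dictionary-based algorithm) plus a word-level precision/recall computation, in which a predicted word counts as correct only if its exact character span appears in the gold segmentation. The dictionary and sentence below are toy values made up for illustration.

```python
def fmm_segment(text, vocab, max_len=4):
    """Forward maximum matching: at each position, greedily take the
    longest dictionary word (falling back to a single character)."""
    words, i = [], 0
    while i < len(text):
        for j in range(min(max_len, len(text) - i), 0, -1):
            if j == 1 or text[i:i + j] in vocab:
                words.append(text[i:i + j])
                i += j
                break
    return words

def to_spans(words):
    """Convert a word list to a set of (start, end) character spans."""
    spans, pos = set(), 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans

def seg_prf(pred, gold):
    """Word-level precision, recall, and F1 over exact span matches."""
    p, g = to_spans(pred), to_spans(gold)
    tp = len(p & g)
    prec, rec = tp / len(p), tp / len(g)
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

vocab = {"自然语言", "自然", "语言", "处理", "中文", "分词"}  # toy dictionary
pred = fmm_segment("自然语言处理中文分词", vocab)
print(pred)  # ['自然语言', '处理', '中文', '分词']
```

The same `seg_prf` function can score the toolkit output from part 2) against a gold standard, which gives a consistent basis for comparing the two methods.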
Lab: Syntactic Analysis. 1) Write parsing programs using at least two dependency parsing tools (HanLP, Stanford CoreNLP, etc.); 2) provide parse results for at least 20 sentences, stored in a structured format (JSON or XML); 3) compute the accuracy of each method's results and compare the differences between the methods; 4) visualize the results (optional).
2021-06-07 14:07:10 495.22MB natural-language-processing stanford-corenlp hanlp
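For step 3) of the lab above, parser accuracy is conventionally measured with attachment scores. A minimal sketch, assuming each parser's output has already been reduced to one head index per token (0 = root) plus a dependency label; the sentence, labels, and head values below are illustrative, not real parser output. It also shows one way to store a parse as JSON for step 2).

```python
import json

def uas(gold_heads, pred_heads):
    """Unlabeled attachment score: fraction of tokens whose predicted
    head index matches the gold head index."""
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

def las(gold, pred):
    """Labeled attachment score over (head, deprel) pairs: both the
    head and the dependency label must match."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)

# Storing one parse in the structured form step 2) asks for (JSON):
parse = {
    "sentence": "我爱北京",
    "tokens": [
        {"id": 1, "form": "我", "head": 2, "deprel": "nsubj"},
        {"id": 2, "form": "爱", "head": 0, "deprel": "root"},
        {"id": 3, "form": "北京", "head": 2, "deprel": "dobj"},
    ],
}
print(json.dumps(parse, ensure_ascii=False))
print(uas([2, 0, 2], [2, 0, 1]))  # 2 of 3 heads correct
```

Because different tools emit different formats (CoNLL-U, custom JSON, etc.), the comparison is only fair after both outputs are normalized to the same head-index representation.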
This is a trimmed-down version of the dataset, with a little over 4,000 entries. For the accompanying blog post on work-order classification, see https://blog.csdn.net/kobeyu652453/article/details/106551131
2021-06-07 13:03:09 519KB NLP dataset
awesome-chinese-nlp: A curated list of resources for NLP (Natural Language Processing) for Chinese (header image courtesy of Professor Qiu Xipeng of Fudan University). The contents cover Chinese NLP toolkits, including comprehensive NLP toolkits from Tsinghua University (C++/Java/Python), the Chinese Academy of Sciences (Java), Harbin Institute of Technology (C++, plus a Python wrapper for LTP), Fudan University (Java), and Baidu, whose open-source lexical analysis tool for Chinese covers word segmentation, part-of-speech tagging, and named entity recognition; also a lightweight NLP processing suite (Python) and a Python library for processing …
2021-06-07 12:02:51 87KB open-source
Learning low-dimensional embeddings of knowledge graphs is a powerful approach used to predict unobserved or missing edges between entities. However, an open challenge in this area is developing techniques that can go beyond simple edge prediction and handle more complex logical queries, which might involve multiple unobserved edges, entities, and variables. For instance, given an incomplete biological knowledge graph, we might want to ask which drugs are likely to target proteins involved with both diseases X and Y, a query that requires reasoning about all possible proteins that might interact with diseases X and Y. Here we introduce a framework to efficiently make predictions about conjunctive logical queries, a flexible but tractable subset of first-order logic, on incomplete knowledge graphs. In our approach, we embed graph nodes in a low-dimensional space and represent logical operators as learned geometric operations (e.g., translation, rotation) in this embedding space. By performing logical operations within a low-dimensional embedding space, our approach achieves a time complexity that is linear in the number of query variables, compared to the exponential complexity required by a naive enumeration-based approach. We demonstrate the utility of this framework in two application studies on real-world datasets with millions of relations: predicting logical relationships in a network of drug-gene-disease interactions and in a graph-based representation of social interactions derived from a popular web forum.
2021-06-07 11:07:52 1.3MB NLP
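The geometric idea in the abstract above can be illustrated with a toy sketch. This is not the paper's implementation: it assumes, for illustration only, that a relation acts as a translation of an entity embedding and that a conjunction ("intersection") is approximated by an element-wise mean of the branch embeddings, with made-up 2-d vectors throughout. The cost is one vector operation per query variable, which is where the linear complexity comes from.

```python
def translate(entity, relation):
    """Apply a relation as a translation of the entity embedding."""
    return [e + r for e, r in zip(entity, relation)]

def intersect(vectors):
    """Toy conjunction operator: element-wise mean of query embeddings."""
    return [sum(dims) / len(vectors) for dims in zip(*vectors)]

def score(query, candidate):
    """Negative squared Euclidean distance: closer candidates score higher."""
    return -sum((q - c) ** 2 for q, c in zip(query, candidate))

# "Which proteins are associated with both disease X and disease Y?"
# All embeddings below are fabricated 2-d values for illustration.
assoc = [0.5, -0.5]                          # hypothetical relation vector
disease_x, disease_y = [1.0, 0.0], [0.0, 1.0]
query = intersect([translate(disease_x, assoc), translate(disease_y, assoc)])
proteins = {"p1": [1.0, 0.0], "p2": [0.9, 0.1]}  # candidate embeddings
best = max(proteins, key=lambda p: score(query, proteins[p]))
```

Ranking every candidate against the single query embedding replaces the naive alternative of enumerating every possible binding of the intermediate variables.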
Resume Analyzer: uses NLP to analyze employees' resumes/CVs by skills and experience for project allocation.
2021-06-06 21:30:20 474KB cv analyzer project-management resume-analysis
# Chinese Named Entity Recognition

An NER model based on Conditional Random Fields (CRF).

## Dataset

The dataset is the resume data collected for the ACL 2018 paper [Chinese NER using Lattice LSTM](https://github.com/jiesutd/LatticeLSTM). Its format is shown below: each line consists of one character and its tag, the tag set is BIOES, and sentences are separated by a blank line.

```
美 B-LOC
国 E-LOC
的 O
华 B-PER
莱 I-PER
士 E-PER

我 O
跟 O
他 O
谈 O
笑 O
风 O
生 O
```

The dataset is located in the `data` folder under the project directory.

## Results

See `output.txt` for the detailed output.

## Environment

First install the dependencies:

    pip3 install -r requirement.txt

Once installation is complete, run

    python3 main.py > output.txt

to train, evaluate, and test the model. Evaluation prints the model's precision, recall, F1 score, and confusion matrix.
2021-06-06 16:32:00 609KB NER NLP CRF
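The precision/recall/F1 evaluation the README above describes is typically computed at the entity level: BIOES tag sequences are decoded into labeled spans, and a predicted entity counts as correct only if its span and type both match the gold. A minimal sketch of that decoding and scoring (not the repository's actual code):

```python
def bioes_spans(tags):
    """Decode a BIOES tag sequence into (start, end, type) entity spans,
    where end is exclusive. S-x is a single-token entity; B-x ... E-x
    delimits a multi-token entity of type x."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            start = None
            continue
        prefix, _, t = tag.partition("-")
        if prefix == "S":
            spans.append((i, i + 1, t))
            start = None
        elif prefix == "B":
            start, etype = i, t
        elif prefix == "E" and start is not None and t == etype:
            spans.append((start, i + 1, t))
            start = None
    return spans

def entity_f1(pred_tags, gold_tags):
    """Entity-level F1: an entity is correct only if both span and type match."""
    p, g = set(bioes_spans(pred_tags)), set(bioes_spans(gold_tags))
    tp = len(p & g)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

tags = ["B-LOC", "E-LOC", "O", "B-PER", "I-PER", "E-PER"]
print(bioes_spans(tags))  # [(0, 2, 'LOC'), (3, 6, 'PER')]
```

Entity-level scoring is stricter than per-character tag accuracy, which is why it is the usual headline metric for NER.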
ruijin_round2: second (final) round of the Ruijin Hospital MMC AI-assisted knowledge graph construction competition.
pyltp binaries pre-built on CentOS for Python 3.6; install them directly with pip install. Building pyltp yourself requires installing many additional libraries.
2021-06-04 21:08:59 30.28MB pyltp nlp python3.6
A large-scale news text classification dataset covering multiple domains, organized into one folder per category. Beyond text classification experiments, there is enough data that it can even be used for BERT pre-training.
2021-06-04 21:06:29 1.45GB NLP