OReilly.Text.Mining.with.R.A.Tidy.Approach: a book on text analysis with the R language
2021-06-07 04:29:27 9.64MB R language
1
The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the field is still evolving, and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining data streams, mining social networks, and mining spatial, multimedia, and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. * Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of your data.
2021-06-06 17:06:49 15.33MB machine learning
1
Mining Heterogeneous Information Networks: Principles and Methodologies, a theory book on heterogeneous information networks written by Jiawei Han
2021-06-04 00:54:50 2.61MB hetero
1
The latest edition of Data Mining: Practical Machine Learning Tools and Techniques, the companion textbook for Weka
2021-06-02 13:31:08 10.24MB DataMining WEKA Java
1
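Data mining with the Friday package. This package is used here to analyze the data. It was developed in August 2014 and released in Portland, Oregon. It is an open-source library for identifying and presenting outliers and anomalies in a dataset, designed for use with Google Docs and libraries. My uses: analyzing employee data to get a quick understanding of data mining. Proportions and frequencies of values in a column/variable. Correlations between columns/variables. Variables/columns that significantly over-index or under-index in the population. The most typical, least typical, and key outlier records in the dataset. Deployment: I am using a VBS script to fetch SQL data from the data folder.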
2021-05-31 11:03:18 940KB JavaScript
1
The complete edition of Data Mining with R, the top-ranked data mining book on Amazon in 2011
2021-05-19 11:56:07 6.48MB Data Mining R
1
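Sina News text classification. Corpus construction: the corpus for this project comes from the Sina News site and is collected in full by the spider.py crawler module, giving 10 categories of news text with 10 items in each category. The news text is fetched through a Sina News API. A process pool runs the crawler concurrently to speed up fetching. Data preprocessing: preprocessing in this project covers word segmentation, denoising, and vectorization, implemented by the stopwords.py, text2term.py, and vectorizer.py modules. The project uses the third-party library jieba for word segmentation. Chinese stop words are removed with a stop-word list, and numbers (Chinese numerals and Arabic digits) are removed with a regular expression. filter_pattern = re.compile(ur'[-+]?[\w\d]+|零|一|二|三|四|五|六|七|八|九|十|百|千|万|亿') A process pool runs segmentation and denoising concurrently to speed up preprocessing. The dataset is split 1:1 into a training set and a test set of 500,000 documents each. Vectorization is done with the CountVectorizer class from scikit-learn, which yields feature matrices for the training and test sets, both stored as sparse matrices. Features with a document frequency below 0.1% are removed; we consider these features…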
2021-05-14 10:13:12 98KB data-mining text-classification svm scikit-learn
1
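The denoising and document-frequency steps described in the Sina News entry can be sketched in Python 3 as follows. This is a minimal stand-in that uses only the standard library, not the project's own code: the stop-word set, the rule that a token counts as noise when it consists entirely of pattern matches, and the function names are all assumptions made for illustration. The pattern itself is the one quoted in the entry, with the Python 2 ur'' prefix replaced by r'' plus re.ASCII so that \w and \d stay limited to ASCII as they would in Python 2.

```python
import re
from collections import Counter

# Python 3 spelling of the pattern quoted in the entry. re.ASCII keeps \w and
# \d limited to ASCII, matching Python 2 behaviour without re.UNICODE.
FILTER_PATTERN = re.compile(
    r'[-+]?[\w\d]+|零|一|二|三|四|五|六|七|八|九|十|百|千|万|亿', re.ASCII)

# Hypothetical stop-word set; the project loads its own list via stopwords.py.
STOPWORDS = {"的", "了", "是"}


def is_noise(token):
    """Assumed rule: noise is a stop word or a token made up entirely of pattern matches."""
    return token in STOPWORDS or FILTER_PATTERN.sub("", token) == ""


def denoise(tokens):
    """Return the tokens that survive stop-word and number filtering."""
    return [t for t in tokens if not is_noise(t)]


def filter_by_document_frequency(docs, min_df=0.001):
    """Drop terms that occur in fewer than min_df (a fraction) of the documents."""
    df = Counter(term for doc in docs for term in set(doc))
    threshold = min_df * len(docs)
    keep = {term for term, count in df.items() if count >= threshold}
    return [[t for t in doc if t in keep] for doc in docs]
```

With scikit-learn, the document-frequency cut is usually expressed through the min_df parameter of CountVectorizer (a float is read as a proportion of documents), so the second function here only illustrates what that threshold does.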
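INSE6180: implementations of data mining algorithms from 3 research papers. The project applies all of these algorithms to an ML analysis of data obtained from the IMDb database. The algorithms (naive Bayes, decision tree, and support vector machine) each perform best on different datasets, so to make the comparison fairer a new IMDb dataset was used. First, the data is cleaned, preprocessed, pruned, and then integrated so that the classifiers receive the most meaningful data possible. For the purposes of the analysis, the classifiers were written from scratch in Python. Finally, the developed classifiers are analyzed and compared. Teammates: Gursimran Singh (40080981), Ufuoma Ubor (40072909), Darshan Dhananjay (40079241), Ashmeet Singh (40070369). V. Subramaniyaswamy, M. V. Vaibhav, R. V. Prasad, and R. Logesh, "Predicting movie box office success using multiple regression and SVM…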
2021-05-11 20:09:35 2.63MB Python
1
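As an illustration of what writing a classifier from scratch in Python can look like, here is a minimal sketch of a categorical naive Bayes classifier with add-one (Laplace) smoothing. It is not the INSE6180 project's code, which is not shown in the listing; the class name, the smoothing choice, and the toy feature values in the usage note are all assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # value_counts[label][feature_index][value] -> number of occurrences
        self.value_counts = defaultdict(lambda: defaultdict(Counter))
        # values[feature_index] -> distinct values seen for that feature
        self.values = defaultdict(set)
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[label][i][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / self.total)  # log prior
            for i, value in enumerate(row):
                count = self.value_counts[label][i][value]
                # Smoothed log likelihood of this feature value given the class.
                score += math.log((count + 1) / (n + len(self.values[i])))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A hypothetical usage: after fitting on rows such as ("action", "high") labelled "hit" and ("drama", "low") labelled "flop", calling predict on a new row returns the label with the highest smoothed log posterior.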
The main goal of this book is to introduce the reader to the use of R as a tool for data mining. R is a freely downloadable language and environment for statistical computing and graphics. Its capabilities and the large set of available add-on packages make this tool an excellent alternative to many existing (and expensive!) data mining tools.
2021-04-29 09:23:19 44.09MB R-Language Statistics
1
Claymore-Dual-Miner: download the Ethereum miner (updated 2020)
2021-04-24 00:16:24 4.54MB gpu ethereum start-mining claymore-miner
1