
Text Data Mining (English Edition)

This book is suitable for students, researchers, and practitioners interested in text data mining, both as a textbook and as a reference. Professors can readily use it for classes on text data mining or NLP.

Authors: Chengqing Zong, Rui Xia, Jiajun Zhang
Price: 119
Edition-printing: 1-2
ISBN: 9787302590293
Publication date: 2021.10.01
Printing date: 2022.05.24

Text Data Mining offers a thorough and detailed introduction to the fundamental theories and methods of text data mining, ranging from preprocessing (for both Chinese and English texts), text representation, and feature selection to text classification and text clustering. It also presents the predominant applications of text data mining, such as topic models, sentiment analysis and opinion mining, topic detection and tracking, information extraction, and automatic text summarization.


Preface

With the rapid development and popularization of Internet and mobile communication technologies, text data mining has attracted much attention. In particular, with the wide use of new technologies such as cloud computing, big data, and deep learning, text mining has begun to play an increasingly important role in many application fields, such as opinion mining and medical and financial data analysis, showing broad application prospects.

Although I was supervising graduate students studying text classification and automatic summarization more than ten years ago, I did not have a clear understanding of the overall concept of text data mining and only regarded the research topics as specific applications of natural language processing. Professor Jiawei Han’s book Data Mining: Concepts and Techniques, published by Elsevier, Professor Bing Liu’s Web Data Mining, published by Springer, and other books have greatly benefited me. Every time I listen to their talks and discuss these topics with them face to face, I benefit immensely.

I was inspired to write this book for the course Text Data Mining, which I was invited to teach to graduate students of the University of Chinese Academy of Sciences. At the end of 2015, I accepted the invitation and began to prepare the content design and select materials for the course. I had to study a large number of related papers, books, and other materials and began to think seriously about the rich connotation and extension of the term Text Data Mining. After more than a year’s study, I started to compile the courseware. Through teaching practice, the outline of the concept gradually took shape.

Rui Xia and Jiajun Zhang, two talented young people, helped me materialize my original writing plan. Rui Xia received his master’s degree in 2007, was admitted to the Institute of Automation, Chinese Academy of Sciences, and studied for his Ph.D. degree under my supervision. He was engaged in sentiment classification and took it as the research topic of his Ph.D. dissertation. After he received his Ph.D. degree in 2011, his interests extended to opinion mining, text clustering and classification, topic modeling, event detection and tracking, and other related topics. He has published a series of influential papers in the field of sentiment analysis and opinion mining. He received the ACL 2019 outstanding paper award, and his paper on ensemble learning for sentiment classification has been cited more than 600 times. Jiajun Zhang joined our institute after he graduated from university in 2006 and studied in my group in pursuit of his Ph.D. degree. He mainly engaged in machine translation research, but he performed well in many research topics, such as multilanguage automatic summarization, information extraction, and human–computer dialogue systems. Since 2016, he has been teaching some parts of the course on Natural Language Processing in cooperation with me, such as machine translation, automatic summarization, and text classification, at the University of Chinese Academy of Sciences; this course is very popular with students. With the solid theoretical foundation of these two talents and their keen scientific insights, I am gratified that many cutting-edge technical methods and research results could be verified, practiced, and included in this book.

It took more than three years from early 2016 to June 2019, when the Chinese version of this book was published. In these three years, most of our holidays, weekends, and other spare time were devoted to the writing of this book. It was truly painful to endure the numerous modifications and even rewriting, but we were also very happy. We started to translate the Chinese version into English in the second half of 2019. Some more recent topics, including BERT (bidirectional encoder representations from transformers), have been added to the English version.

As a cross domain of natural language processing and machine learning, text data mining faces the double challenges of the two domains and has broad application to the Internet and equipment for mobile communication. The topics and techniques presented in this book are all technical foundations needed to develop such practical systems and have attracted much attention in recent years. It is hoped that this book will provide a comprehensive understanding for students, professors, and researchers in related areas. However, I must admit that due to the limitation of the authors’ ability and breadth of knowledge, as well as the lack of time and energy, there must be some omissions or mistakes in the book. We will be very grateful if readers provide criticism, corrections, and any suggestions.

Beijing, China
Chengqing Zong
20 May 2020

  • Chengqing Zong is a professor at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences. He has served as a chair for many prestigious conferences, such as ACL-IJCNLP, IJCAI, IJCAI-ECAI, AAAI, and COLING, and as an associate editor for prestigious journals such as TALLIP and Machine Translation. He is the President of the Asian Federation of Natural Language Processing and a member of the International Committee on Computational Linguistics.

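  • Text Data Mining (English Edition) addresses the practical needs of text mining tasks. Using examples, it explains the theoretical methods and implementation algorithms of the relevant techniques from first principles. The writing aims to be concise and accessible without dwelling on implementation details, so that readers can learn how to build application systems on the basis of a solid understanding of the underlying principles.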
  • Contents

    1 Introduction

    1.1 The Basic Concepts

    1.2 Main Tasks of Text Data Mining

    1.3 Existing Challenges in Text Data Mining

    1.4 Overview and Organization of This Book

    1.5 Further Reading

    2 Data Annotation and Preprocessing

    2.1 Data Acquisition

    2.2 Data Preprocessing

    2.3 Data Annotation

    2.4 Basic Tools of NLP

    2.4.1 Tokenization and POS Tagging

    2.4.2 Syntactic Parser

    2.4.3 N-gram Language Model

    2.5 Further Reading

    3 Text Representation

    3.1 Vector Space Model

    3.1.1 Basic Concepts

    3.1.2 Vector Space Construction

    3.1.3 Text Length Normalization

    3.1.4 Feature Engineering

    3.1.5 Other Text Representatio...


Copyright (C) 2023 Tsinghua University Press Co., Ltd. Beijing ICP Registration No. 10035462. Beijing Public Network Security Registration No. 11010802042911.
