Preface
XML is short for eXtensible Markup Language. Its purpose is to help information systems share structured data, especially via the Internet, to encode documents, and to serialize data. The properties that XML inherited make it a widely popular standard for representing and exchanging data.
When working with XML data, there are (loosely speaking) three functions that need to be performed: adding information to the repository, searching and retrieving information from the repository, and updating information in the repository. We address all three functions. Two decades have passed since the beginning of XML, and this field has expanded tremendously and is still expanding. Thus, no book on XML can be comprehensive now, and certainly this one is not. We present only what we could present clearly.
What Is Unique About This Book?
This book aims to provide an understanding of the principles and techniques of XML query processing and XML keyword search. To this end, it covers the following aspects:
Firstly, we give a brief introduction to XML, including the emergence of XML databases, the XML data model, and searching and querying XML data. To facilitate efficient query processing over XML data that conforms to an ordered tree-structured data model, a number of labeling schemes for XML data have been proposed.
Secondly, we propose several XML path indexes. Over the past decade, XML has become a commonly used format for storing and exchanging data in a wide variety of systems. Due to this widespread use, the problem of effectively and efficiently managing XML collections has attracted significant attention. Without a structural summary and an efficient index, query processing can be quite inefficient due to an exhaustive traversal of the XML data. To overcome this inefficiency, several path indexes have been proposed, such as the prefix scheme, extended Dewey ID, and CDBS.
Thirdly, answering twig queries efficiently is important in XML tree pattern processing. To process them efficiently, we introduce two kinds of join algorithms, both of which play significant roles. We also present solutions for speeding up query processing and for reducing intermediate results to save space.
Fourthly, we show a set of holistic algorithms that efficiently process extended XML tree patterns. Previous algorithms focus on XML tree pattern queries with only parent-child (P-C) and ancestor-descendant (A-D) relationships. Little work has been done on extended XML tree pattern queries, which contain wildcards, negation functions, and order restrictions, all of which are frequently used in XML query languages. The holistic algorithms make the treatment of tree patterns more complete.
Fifthly, we study XML keyword search semantics, algorithms, and ranking strategies. We present XML keyword search semantics such as SLCA, VLCA, and MLCEA, which are useful and meaningful for keyword search. Based on some of these semantics, we present XML keyword search algorithms such as the DIL query processing algorithm. In addition, we introduce an XML keyword search ranking strategy: we propose an IR-style approach that uses the statistics of the underlying XML data to address the challenges of ranking.
Sixthly, we introduce the problem of XML keyword query refinement and offer a novel content-aware XML keyword query refinement framework. We also introduce LCRA, which provides a concise interface where users can explicitly specify their search concern: publications (default) or authors.
Lastly, we present several directions for future work, such as graphical XML data processing, complex XML pattern matching, and MapReduce-based XML query processing.
Jiaheng Lu
Acknowledgement
I would like to express my gratitude to Prof. Tok Wang Ling at the National University of Singapore for his support, advice, patience, and encouragement. He is my PhD advisor and has taught me innumerable lessons and given me many insights into the workings of academic research in general.
My thanks also go to Prof. Mong-Li Lee, Prof. Chee Yong Chan, and Prof. Anthony K. H. Tung at the National University of Singapore, who provided valuable feedback and suggestions on my ideas.
I also thank my colleagues Prof. Shan Wang, Prof. Xiaoyong Du, Prof. Xiaofeng Meng, Prof. Hong Chen, Prof. Cuiping Li, and Prof. Xuan Zhou at Renmin University of China, who have given me tremendous support and advice on my research on XML data processing.
My thanks also go to my friends Ting Chen, Yabing Chen, Qi He, Changqing Li, Huanzhang Liu, Wei Ni, Cong Sun, Tian Yu, Zhifeng Bao, and Huayu Wu at the National University of Singapore. They contributed to many interesting and good-spirited discussions related to this research. They also gave me tremendous moral support when I was frustrated.
My thanks also go to my students Chunbin Lin, Caiyun Yao, Junwei Pan, Haiyong Wang, Si Chen, Siming Yang, and Xiaozhen Huo at Renmin University of China. My thinking on XML was shaped by a long process of exciting and inspiring interactions with my students. I am immensely grateful to all of them.
I would also like to thank Dr. Jirong Wen, Microsoft Research Asia; Prof. Rui Zhang, Melbourne University; Prof. Jianzhong Li, Harbin Institute of Technology; Prof. Bin Cui, Peking University; Prof. Jianhua Feng and Prof. Guoliang Li, Tsinghua University; and Mr. Hanyou Wang and Ms. Hui Xue, Tsinghua University Press, for their recommendations and valuable suggestions.
Last, but not least, I would like to thank my wife Chun Pu for her understanding and love during the past few years. Her support and encouragement were, in the end, what made this book possible. My parents and parents-in-law receive my deepest gratitude and love for their dedication and their many years of support during my studies.
My research is partially supported by the National 863 Project (2009AA01Z133); the National Science Foundation, China (61170011); the Beijing National Science Foundation (109004); and the Research Funds of Renmin University of China (No. 11XNJ003).
Jiaheng Lu