This book emerged from the course Superscalar Processor Design, which has been taught at Carnegie Mellon University since 1995. Superscalar Processor Design is a mezzanine course targeting seniors and first-year graduate students. Quite a few of the more aggressive juniors have taken the course in the spring semester of their junior year. The prerequisite to this course is the Introduction to Computer Architecture course. The objectives for the Superscalar Processor Design course include: (1) to teach modern processor design skills at the microarchitecture level of abstraction;
(2) to cover current microarchitecture techniques for achieving high performance via the exploitation of instruction-level parallelism (ILP); and (3) to impart insights and hands-on experience for the effective design of contemporary high-performance microprocessors for mobile, desktop, and server markets. In addition to covering the contents of this book, the course contains a project component that involves the microarchitectural design of a future-generation superscalar microprocessor.
During the 1990s many microarchitectural techniques for increasing clock frequency and harvesting more ILP to achieve better processor performance were proposed and implemented in real machines. This book is an attempt to codify this large body of knowledge in a systematic way. These techniques include deep pipelining, aggressive branch prediction, dynamic register renaming, multiple instruction dispatching and issuing, out-of-order execution, and speculative load/store processing. Hundreds of research papers have been published since the early 1990s, and many of the research ideas have become reality in commercial superscalar microprocessors. In this book, the numerous techniques are organized and presented within a clear framework that facilitates ease of comprehension. The foundational principles that underlie the plethora of techniques are highlighted.
While the contents of this book would generally be viewed as graduate-level material, the book is intentionally written in a way that would be very accessible to undergraduate students. Significant effort has been spent in making seemingly complex techniques appear quite straightforward through appropriate abstraction and hiding of details. The priority is to convey clearly the key concepts and fundamental principles, giving just enough details to ensure understanding of implementation issues without massive dumping of information and quantitative data. The hope is that this body of knowledge can become widely possessed not just by microarchitects and processor designers but by most B.S. and M.S. students with interests in computer systems and microprocessor design.
Here is a brief summary of the chapters.
Chapter 1: Processor Design
This chapter introduces the art of processor design, the instruction set architecture (ISA) as the specification of the processor, and the microarchitecture as the implementation of the processor. The dynamic/static interface that separates compile-time software and run-time hardware is defined and discussed. The goal of this chapter is not to revisit in depth the traditional issues regarding ISA design, but to erect the proper framework for understanding modern processor design.
Chapter 2: Pipelined Processors
This chapter focuses on the concept of pipelining, discusses instruction pipeline design, and presents the performance benefits of pipelining. Pipelining is usually introduced in the first computer architecture course. Pipelining provides the foundation for modern superscalar techniques and is presented in this chapter in a fresh and unique way. We intentionally avoid the massive dumping of bar charts and graphs; instead, we focus on distilling the foundational principles of instruction pipelining.
Chapter 3: Memory and I/O Systems
This chapter provides a larger context for the remainder of the book by including a thorough grounding in the principles and mechanisms of modern memory and I/O systems. Topics covered include memory hierarchies, caching, main memory design, virtual memory architecture, common input/output devices, processor-I/O interaction, and bus design and organization.
Chapter 4: Superscalar Organization
This chapter introduces the main concepts and the overall organization of superscalar processors. It provides a "big picture" view for the reader that leads smoothly into the detailed discussions in the next chapters on specific superscalar techniques for achieving performance. This chapter highlights only the key features of superscalar processor organizations. Chapter 7 provides a detailed survey of features found in real machines.
Chapter 5: Superscalar Techniques
This chapter is the heart of this book and presents all the major microarchitecture techniques for designing contemporary superscalar processors for achieving high performance. It classifies and presents specific techniques for enhancing instruction flow, register data flow, and memory data flow. This chapter attempts to organize a plethora of techniques into a systematic framework that facilitates ease of comprehension.
Chapter 6: The PowerPC 620
This chapter presents a detailed analysis of the PowerPC 620 microarchitecture and uses it as a case study to examine many of the issues and design tradeoffs introduced in the previous chapters. This chapter contains extensive performance data of an aggressive out-of-order design.
Chapter 7: Intel's P6 Microarchitecture
This is a case study chapter on probably the most commercially successful contemporary superscalar microarchitecture. It is written by the Intel P6 design team led by Bob Colwell and presents in depth the P6 microarchitecture that facilitated the implementation of the Pentium Pro, Pentium II, and Pentium III microprocessors. This chapter offers the readers an opportunity to peek into the mindset of a top-notch design team.
xii MODERN PROCESSOR DESIGN
Chapter 8: Survey of Superscalar Processors
This chapter, compiled by Prof. Mark Smotherman of Clemson University, provides a historical chronicle on the development of superscalar machines and a survey of existing superscalar microprocessors. The chapter was first completed in 1998 and has been continuously revised and updated since then. It contains fascinating information that can't be found elsewhere.
Chapter 9: Advanced Instruction Flow Techniques
This chapter provides a thorough overview of issues related to high-performance instruction fetching. The topics covered include historical, currently used, and proposed advanced future techniques for branch prediction, as well as high-bandwidth and high-frequency fetch architectures like trace caches. Though not all such techniques have yet been adopted in real machines, future designs are likely to incorporate at least some form of them.
Chapter 10: Advanced Register Data Flow Techniques
This chapter highlights emerging microarchitectural techniques for increasing performance by exploiting the program characteristic of value locality. This program characteristic was discovered recently, and techniques ranging from software memoization and instruction reuse to various forms of value prediction are described in this chapter. Though such techniques have not yet been adopted in real machines, future designs are likely to incorporate at least some form of them.
Chapter 11: Executing Multiple Threads
This chapter introduces thread-level parallelism (TLP) and provides a basic introduction to multiprocessing, cache coherence, and high-performance implementations that guarantee either sequential or relaxed memory ordering across multiple processors. It discusses single-chip techniques like multithreading and on-chip multiprocessing that also exploit thread-level parallelism. Finally, it visits two emerging technologies, implicit multithreading and preexecution, that attempt to extract thread-level parallelism automatically from single-threaded programs.
In summary, Chapters 1 through 5 cover fundamental concepts and foundational techniques. Chapters 6 through 8 present case studies and an extensive survey of actual commercial superscalar processors. Chapter 9 provides a thorough overview of advanced instruction flow techniques, including recent developments in advanced branch predictors. Chapters 10 and 11 should be viewed as advanced topics chapters that highlight some emerging techniques and provide an introduction to multiprocessor systems.
This is the first edition of the book. An earlier beta edition was published in 2002 with the intent of collecting feedback to help shape and hone the contents and presentation of this first edition. Through the course of the development of the book, a large set of homework and exam problems has been created. A subset of these problems is included at the end of each chapter. Several problems suggest the use of the Simplescalar simulation suite available from the Simplescalar website at http://www.simplescalar.com. A companion website for the book contains additional support material for the instructor, including a complete set of lecture slides (www.mhhe.com/shen).
Acknowledgments
Many people have generously contributed their time, energy, and support toward the completion of this book. In particular, we are grateful to Bob Colwell, who is the lead author of Chapter 7, Intel's P6 Microarchitecture. We also acknowledge his coauthors, Dave Papworth, Glenn Hinton, Mike Fetterman, and Andy Glew, who were all key members of the historic P6 team. This chapter helps ground this textbook in practical, real-world considerations. We are also grateful to Professor Mark Smotherman of Clemson University, who meticulously compiled and authored Chapter 8, Survey of Superscalar Processors. This chapter documents the rich and varied history of superscalar processor design over the last 40 years. The guest authors of these two chapters added a certain radiance to this textbook that we could not possibly have produced on our own. The PowerPC 620 case study in Chapter 6 is based on Trung Diep's Ph.D. thesis at Carnegie Mellon University. Finally, the thorough survey of advanced instruction flow techniques in Chapter 9 was authored by Gabriel Loh, largely based on his Ph.D. thesis at Yale University.
In addition, we want to thank the following professors for their detailed, insightful, and thorough review of the original manuscript. The input from these reviews has significantly improved the first edition of this book.
- David Andrews, University of Arkansas
- Angelos Bilas, University of Toronto
- Fred H. Carlin, University of California at Santa Barbara
- Yinong Chen, Arizona State University
- Lynn Choi, University of California at Irvine
- Dan Connors, University of Colorado
- Karel Driesen, McGill University
- Alan D. George, University of Florida
- Arthur Glaser, New Jersey Institute of Technology
- Rajiv Gupta, University of Arizona
- Vincent Hayward, McGill University
- James Hoe, Carnegie Mellon University
- Lizy Kurian John, University of Texas at Austin
- Peter M. Kogge, University of Notre Dame
- Angkul Kongmunvattana, University of Nevada at Reno
- Israel Koren, University of Massachusetts at Amherst
- Ben Lee, Oregon State University
- Francis Leung, Illinois Institute of Technology
- Walid Najjar, University of California Riverside
- Vojin G. Oklabdzija, University of California at Davis
- Soner Onder, Michigan Technological University
- Parimal Patel, University of Texas at San Antonio
- Jih-Kwon Peir, University of Florida
- Gregory D. Peterson, University of Tennessee
- Amir Roth, University of Pennsylvania
- Kevin Skadron, University of Virginia
- Mark Smotherman, Clemson University
- Miroslav N. Velev, Georgia Institute of Technology
- Bin Wei, Rutgers University
- Anthony S. Wojcik, Michigan State University
- Ali Zaringhalam, Stevens Institute of Technology
- Xiaobo Zhou, University of Colorado at Colorado Springs
This book grew out of the course Superscalar Processor Design at Carnegie Mellon University. This course has been taught at CMU since 1995. Many teaching assistants of this course have left their indelible touch on the contents of this book. They include Bryan Black, Scott Cape, Yuan Chou, Alex Dean, Trung Diep, John Faistl, Andrew Huang, Deepak Limaye, Chris Nelson, Chris Newburn, Derek Noonburg, Kyle Oppenheim, Ryan Rakvic, and Bob Rychlik. Hundreds of students have taken this course at CMU; many of them provided input that also helped shape this book. Since 2000, Professor James Hoe at CMU has taken this course even further. We both are indebted to the nurturing we experienced while at CMU, and we hope that this book will help perpetuate CMU's historical reputation of producing some of the best computer architects and processor designers.
A draft version of this textbook has also been used at the University of Wisconsin since 2000. Some of the problems at the end of each chapter were actually contributed by students at the University of Wisconsin. We appreciate their test driving of this book.
John Paul Shen, Director, Microarchitecture Research, Intel Labs, Adjunct Professor, ECE Department, Carnegie Mellon University
Mikko H. Lipasti, Assistant Professor, ECE Department, University of Wisconsin
June 2004
Soli Deo Gloria