ECE8001 - Advanced Computer Architecture (Spring 2011)
Course description: The course covers advanced topics in computer architecture (pipelining, in-order vs. out-of-order execution, cache and memory hierarchies, Flynn’s taxonomy) and focuses on parallel computer architecture (general-purpose multi-core processors and graphics processing units). In particular, emphasis will be given to both architectural and programmability aspects.
Syllabus
References
Books
Basic computer architecture references:
- J. Hennessy and D. Patterson, “Computer Architecture: A Quantitative Approach,” Fourth edition, Morgan Kaufmann, 2006 (ISBN: 0123704901)
Parallel computer architecture references:
- S. Akhter and J. Roberts, “Multi-Core Programming,” Intel Press, 2006 (ISBN: 0976483246)
- S. W. Keckler, K. Olukotun, H. P. Hofstee, “Multicore Processors and Systems,” Springer ed., 2009 (ISBN: 978-1-4419-0262-7)
CUDA and parallel computing references:
- D. B. Kirk and W. W. Hwu, “Programming Massively Parallel Processors – A Hands-on Approach,” Morgan Kaufmann, 2010 (ISBN: 978-0-12-381472-2)
Online material
- CUDA Toolkit
- CUDA Library Documentation
- NVIDIA Fermi Architecture Whitepaper
- CUDA Training Material
- POSIX Threads Tutorial
- OpenMP specification
- OpenMP tutorial
- MPI
- Cilk
Linux/Make tutorials for beginners
Tools
Presentations
Papers
Chip Multiprocessors
- The case for a single-chip multiprocessor, K. Olukotun et al, ASPLOS 1996.
- Single-ISA heterogeneous multi-core architectures: the potential for processor power reduction, R. Kumar et al, MICRO 2003.
- Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance, R. Kumar et al, ISCA 2004.
- Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling, R. Kumar et al, ISCA 2005.
- A Bandwidth-aware Memory-subsystem Resource Management using Non-invasive Resource Profilers for Large CMP Systems, D. Kaseridis et al, HPCA 2010.
Transactional Memory
- Transactional memory: architectural support for lock-free data structures, M. Herlihy and J. Moss, ISCA 1993.
- Software transactional memory for dynamic-sized data structures, M. Herlihy et al, PODC 2003.
- Transactional Memory Coherence and Consistency, L. Hammond et al, ISCA 2004.
- Unbounded transactional memory, C. S. Ananian et al, HPCA 2005.
- LogTM: Log-based Transactional Memory, K. E. Moore et al, HPCA 2006.
Manycore systems and Graphics Processing Units
- Rigel: An Architecture and Scalable Programming Interface for a 1000-core Accelerator, J. Kelm et al, ISCA 2009.
- An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness, S. Hong and H. Kim, ISCA 2009.
- Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?, E. Chung et al, MICRO 2010.
- Many-Thread Aware Prefetching Mechanisms for GPGPU Applications, J. Lee et al, MICRO 2010.
- Improving SIMT Efficiency of Global Rendering Algorithms with Architectural Support for Dynamic Micro-Kernels, M. Steffen et al, MICRO 2010.
- Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU, V. W. Lee et al, ISCA 2010.
- Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance, J. Meng et al, ISCA 2010.
Programming models
- Programming model for a heterogeneous x86 platform, B. Saha et al, PLDI 2009.
- Merge: a programming model for heterogeneous multi-core systems, M. D. Linderman et al, ASPLOS 2008.
- Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping, C.-K. Luk et al, MICRO 2009.
- Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling, R. Chen et al, PACT 2010.
- EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system, P. H. Wang et al, PLDI 2007.
- The implementation of the Cilk-5 multithreaded language, M. Frigo et al, PLDI 1998.
Conferences
- ASPLOS: Architectural Support for Programming Languages and Operating Systems
- ISCA: ACM/IEEE Symposium on Computer Architecture
- DAC: Design Automation Conference
- MICRO: International Symposium on Microarchitecture
- HPCA: IEEE Symposium on High-Performance Computer Architecture
- PACT: International Conference on Parallel Architectures and Compilation Techniques
- PLDI: ACM SIGPLAN Conference on Programming Language Design and Implementation
- PPoPP: Principles and Practice of Parallel Programming
Lecture Notes
- Introduction
- Background - Pipelining & Cache Organization
- Background - Loop Unrolling & Branch Prediction
- Background - Out-of-order Execution, Speculation & Multiple Issue Processors
- Multithreading & OpenMP
- POSIX Threads
- Introduction to GPUs
- CUDA Programming Basics
- CUDA Memories
- CUDA Performance Tuning
- CUDA Concurrent Execution
Homework
- Homework #1 - Solution: Part-1: Problem 2.6 & 2.10; Part-2: Jay Eggert's solution, Kittisak Sajjapongse's solution
- Homework #2 - dataset for HW2 and output for HW2
- Homework #3 - code for HW3 - Solution: Kevin Stone's solution
Projects
Calendar
| Week | Tuesday | Thursday | Note |
|---|---|---|---|
| 1 | Jan 18 - Introduction | | Submit the Background Form |
| 2 | Jan 25 - Background (2) | | |
| 3 | Feb 1 - CLASS CANCELED | Feb 3 - CLASS CANCELED | |
| 4 | Feb 8 - Background (4) | Feb 10 - Multithreading and OpenMP (1) | |
| 5 | | | |
| 6 | Feb 22 - OpenMP (3) & Introduction to GPUs | Feb 24 - GPU Programming (1) | |
| 7 | | | Project selection |
| 8 | | | |
| 9 | Mar 15 - GPU Memories (3) | Mar 17 - Project proposal presentations (1) | |
| 10 | | Mar 24 - CUDA Performance (2) | |
| 11 | | | |
| 12 | Apr 12 - Homework #3 due | | |
| 13 | | | |
| 14 | | | |
| 15 | May 3 - Final project presentations | May 5 - Final project presentations | |
Paper Review Guideline
- Briefly summarize the main contributions of the paper
- List the paper's strengths
- List the paper's weaknesses
- Discuss the paper
Examples:
- Jay Eggert's review of "Single-ISA heterogeneous multi-core architectures: the potential for processor power reduction"
- Ian Graves's review of "Rigel: An Architecture and Scalable Programming Interface for a 1000-core Accelerator"
- Adam Procter's review of "Programming model for a heterogeneous x86 platform"
Review schedule: pdf
Paper Presentation Guideline
(25 min + questions, 5-10 min discussion, avg 2 min/slide)
- Context & Introduction
- Summary of Paper Contributions
- Details of Paper Contributions
- Results
- Summary
- Comments and Discussion Topics
Teams
- Kittisak Sajjapongse
- Adam Procter & Ian Graves - 05/13 @ 1:00 pm
- Jay Eggert & Ryanne Thomas - 05/13 @ 9:00 am
- Christopher Spain & Kevin Stone - 05/13 @ 11:00 am
- Prashant Revankar & Sankalp Shivaprakash & Yan Li - 05/10 @ 3:00 pm
- Yifeng (Felix) Zeng & Chinchao Suriyakul & Stanley Ikpe - 05/13 @ 4:00 pm
- Xiaodong Xu & Dheeraj Kaveti & Boinpally Kranthi Kumar
- Mahdieh Poostchi - 05/13 @ 5:00 pm
- Xiangge (Rafael) Li - 05/13 @ 2:00 pm