ECE8001 - Advanced Computer Architecture (Spring 2012)
Course description: The course covers parallel computer architecture (general purpose multi-core and many-core processors, shared and distributed memory systems, clusters). Emphasis will be given to both architectural and programmability aspects.
Syllabus
Prerequisite
- ECE4270-7270: Computer Organization (or equivalent Computer Architecture course - please contact instructor).
- ECE4220-7220: Real Time Embedded Systems OR CS 4520-7520: Operating Systems (or equivalent Operating Systems course - please contact instructor).
- C/C++ Programming Experience
Note: Homework assignments and class projects can be done using the machines available in Lafferre Hall C1246, which is a Linux lab. You will therefore need either basic Linux experience or the flexibility to learn Linux. If you are unfamiliar with Linux, you may have a look at the following tutorial. If you are used to working with GUI-based development environments, you may want to use Eclipse CDT. Here is a great tutorial on using Eclipse in combination with Cygwin (under Windows).
Course Mailing List
http://groups.google.com/group/ece8001
Email: ece8001@googlegroups.com
References
Books (recommended)
Basic computer architecture references:
- J. Hennessy and D. Patterson, “Computer Architecture: A Quantitative Approach,” Fifth edition, Morgan-Kaufmann, 2011 (ISBN: 012383872X)
Parallel computer architecture references:
- S. Akhter and J. Roberts, “Multi-Core Programming,” Intel Press, 2006 (ISBN: 0976483246)
- S. W. Keckler, K. Olukotun, H. P. Hofstee, “Multicore Processors and Systems,” Springer, 2009 (ISBN: 978-1-4419-0262-7)
CUDA and parallel computing references:
- D. B. Kirk and W. W. Hwu, “Programming Massively Parallel Processors – A Hands-on Approach,” Morgan Kaufmann, 2010 (ISBN: 978-0-12-381472-2)
Online material
- CUDA Toolkit
- CUDA Manuals
- CUDA Library Documentation
- NVIDIA Fermi Architecture Whitepaper
- CUDA Training Material
- POSIX Threads Tutorial
- OpenMP specification
- OpenMP tutorial
- MPI
- Cilk
Linux/Make tutorials for beginners
Tools
Presentations
Papers
Chip Multiprocessors
- The case for a single-chip multiprocessor, K. Olukotun et al, ASPLOS 1996.
- Single-ISA heterogeneous multi-core architectures: the potential for processor power reduction, R. Kumar et al, MICRO 2003.
- Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance, R. Kumar et al, ISCA 2004.
- Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling, R. Kumar et al, ISCA 2005.
- A Bandwidth-aware Memory-subsystem Resource Management using Non-invasive Resource Profilers for Large CMP Systems, D. Kaseridis et al, HPCA 2010.
Transactional Memory
- Transactional memory: architectural support for lock-free data structures, M. Herlihy and J. Moss, ISCA 1993.
- Software transactional memory for dynamic-sized data structures, M. Herlihy et al, PODC 2003.
- Transactional Memory Coherence and Consistency, L. Hammond et al, ISCA 2004.
- Unbounded transactional memory, C. S. Ananian et al, HPCA 2005.
- LogTM: Log-based Transactional Memory, K. E. Moore et al, HPCA 2006.
- Transactional Memory: An Overview, T. Harris et al, IEEE Micro 2007.
Manycore systems and Graphics Processing Units
- Rigel: An Architecture and Scalable Programming Interface for a 1000-core Accelerator, J. Kelm et al, ISCA 2009.
- An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness, S. Hong and H. Kim, ISCA 2009.
- Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?, E. Chung et al, MICRO 2010.
- Many-Thread Aware Prefetching Mechanisms for GPGPU Applications, J. Lee et al, MICRO 2010.
- Improving SIMT Efficiency of Global Rendering Algorithms with Architectural Support for Dynamic Micro-Kernels, M. Steffen et al, MICRO 2010.
- Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU, V. W. Lee et al, ISCA 2010.
- Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance, J. Meng et al, ISCA 2010.
- Improving GPU Performance via Large Warps and Two-Level Warp Scheduling, V. Narasiman et al, MICRO 2011.
- Energy-efficient mechanisms for managing thread context in throughput processors, M. Gebhart et al, ISCA 2011.
- A Compile-Time Managed Multi-Level Register File Hierarchy, M. Gebhart et al, MICRO 2011.
- Hardware Transactional Memory for GPU Architectures, W. Fung et al, MICRO 2011.
Programming models
- Programming model for a heterogeneous x86 platform, B. Saha et al, PLDI 2009.
- Merge: a programming model for heterogeneous multi-core systems, M. D. Linderman et al, ASPLOS 2008.
- Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping, C.-K. Luk et al, MICRO 2009.
- Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling, R. Chen et al, PACT 2010.
- EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system, P. H. Wang et al, PLDI 2007.
- The implementation of the Cilk-5 multithreaded language, M. Frigo et al, PLDI 1998.
- MapCG: writing parallel program portable between CPU and GPU, C. Hong et al, PACT 2010.
- Tarazu: Optimizing MapReduce On Heterogeneous Clusters, F. Ahmad et al, ASPLOS 2012.
Conferences
- ASPLOS: International Conference on Architectural Support for Programming Languages and Operating Systems
- ISCA: ACM/IEEE International Symposium on Computer Architecture
- DAC: Design Automation Conference
- MICRO: IEEE/ACM International Symposium on Microarchitecture
- HPCA: IEEE International Symposium on High-Performance Computer Architecture
- PACT: International Conference on Parallel Architectures and Compilation Techniques
- PLDI: ACM SIGPLAN Conference on Programming Language Design and Implementation
- PPoPP: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Lecture Notes
- Introduction
- Introduction to SIMD Architecture
- Introduction to Graphics Processing Units
- Introduction to CUDA Programming
- Introduction to CUDA Memories
- CUDA Performance Considerations
- CUDA Concurrent Execution
- Parallel Thread Execution (PTX)
- Conditional Branching on GPU and Fermi Architecture
- Parallel Reduction in CUDA
- Introduction to MPI
- MPI Collective Communication
- MPI Data Grouping
- MPI Communicators and Dynamic Processes
- Cache Coherence & Memory Consistency Models
Homework
- Homework 1 - Due Feb 21 (Tuesday)
- Homework 2 code - Due March 13 (Tuesday)
- Homework 3 script - Due April 10 (Tuesday)
Calendar
| Week | Tuesday | Thursday |
| 1 | Jan 17 - Introduction | |
| 2 | | Jan 26 - Introduction to GPUs and CUDA |
| 3 | Jan 31 - Introduction to CUDA & CUDA Memories | Feb 2 - CUDA Memories |
| 4 | Feb 7 - CUDA Memories | Feb 9 - CUDA Performance Optimization |
| 5 | Feb 14 - CUDA Performance Optimization | Feb 16 - CUDA Concurrent Execution |
| 6 | | Feb 23 - Conditional Branching on GPU, Fermi Architecture & Parallel Reduction |
| 7 | Feb 28 - Introduction to MPI | |
| 8 | March 6 - MPI - Collective Communication (1) | |
| 9 | | |
| 10 | March 20 - MPI Data Grouping | March 22 - Project Analysis and Design Presentations |
| 11 | April 3 - MPI Communicators and Dynamic Processes | |
| 12 | | |
| 13 | April 17 - Cache Coherence (1) | |
| 14 | April 24 - Cache Coherence (2) | |
| 15 | May 1 - Cache Coherence (3) | May 3 - Final Project Presentations |
Paper Review Guideline
- The review should not exceed two pages
- The review should include the following points:
- Brief summary of the main contributions of the paper
- Paper's strengths
- Paper's weaknesses
- Critical discussion on the paper
- The review should not be a rewording of the content of the paper. The SUMMARY can be limited to two or three paragraphs (half a page or so). In the summary you should state, IN YOUR OWN WORDS, the main contributions of the paper (and, possibly, the methodology the authors used to make their point). For this part, avoid copying and pasting from the paper. You can, however, quote parts or sentences of the paper in the discussion section (using proper quotation).
- The most important part of your review is your critical thinking about the paper:
- What are the weaknesses of the paper?
- What are the strengths of the paper?
- What are the assumptions? Do they make sense? Would the results change significantly with different assumptions?
- Are the experiments sound? Is the evaluation methodology clearly explained? Would the results of the experiments change if the authors had used workloads with different characteristics?
- What is the potential impact of the paper?
- Can you relate the paper to other papers you have read?
Paper Review Grading Guideline
Paper Presentation Guideline
(25 min + questions, 5-10 min discussion, avg 2 min/slide)
- Context & Introduction
- Summary of Paper Contributions
- Details of Paper Contributions
- Results
- Summary
- Comments and Discussion Topics