ECE8001 - Advanced Computer Architecture (Spring 2012)

From Networking and Parallel Systems Lab - by Michela Becchi

Course description: The course covers parallel computer architecture (general purpose multi-core and many-core processors, shared and distributed memory systems, clusters). Emphasis will be given to both architectural and programmability aspects.

Syllabus

Prerequisite

Note: Homework assignments and class projects can be done on the machines in Lafferre Hall C1246, which is a Linux lab. You will therefore need either basic Linux experience or the willingness to learn Linux. If you are unfamiliar with Linux, you may have a look at the following tutorial. If you are used to working with GUI-based development environments, you may want to use Eclipse CDT. Here is a great tutorial on using Eclipse in combination with Cygwin (under Windows).

Course Mailing List

http://groups.google.com/group/ece8001

Email: ece8001@googlegroups.com

ECE8001 Background Form

References

Books (recommended)

Basic computer architecture references:

Parallel computer architecture references:

CUDA and parallel computing references:

Online material

Linux/Make tutorials for beginners

Tools

Presentations

Papers

Chip Multiprocessors

Transactional Memory

Manycore systems and Graphics Processing Units

Programming models

Conferences

Lecture Notes

Homework

Calendar

Week 1
  Tuesday: Jan 17 - Introduction
  Thursday: Jan 19 - Vector processors
    Read Thousand core chips: a technology perspective & summary to ECE8001group

Week 2
  Tuesday: Jan 24 - SIMD Multimedia Extensions & Introduction to GPUs
    Fill in the ECE8001 Background Form
  Thursday: Jan 26 - Introduction to GPUs and CUDA

Week 3
  Tuesday: Jan 31 - Introduction to CUDA & CUDA Memories
  Thursday: Feb 2 - CUDA Memories

Week 4
  Tuesday: Feb 7 - CUDA Memories
  Thursday: Feb 9 - CUDA Performance Optimization

Week 5
  Tuesday: Feb 14 - CUDA Performance Optimization
  Thursday: Feb 16 - CUDA Concurrent Execution

Week 6
  Tuesday: Feb 21 - HW #1 due
    Parallel Thread Execution
  Thursday: Feb 23 - Conditional Branching on GPU, Fermi Architecture & Parallel Reduction

Week 7
  Tuesday: Feb 28 - Introduction to MPI
  Thursday: March 1 - Paper presentations: GPU Efficiency (1)
    1. Improving SIMT Efficiency of Global Rendering Algorithms with Architectural Support for Dynamic Micro-Kernels
       Presenter: Fadi; Reviewers: Daniel, Chris H., Chris S.
    2. Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance
       Presenter: Fadi; Reviewers: Mark, Di

Week 8
  Tuesday: March 6 - MPI - Collective Communication (1)
  Thursday: March 8 - Paper presentations: GPU Efficiency (2)
    1. Improving GPU Performance via Large Warps and Two-Level Warp Scheduling
       Presenter: Chris S.; Reviewers: Daniel, Fadi, Di
    2. Many-Thread Aware Prefetching Mechanisms for GPGPU Applications
       Presenter: Xiaobo; Reviewers: Mark, Chris H.

Week 9
  Tuesday: March 13 - HW #2 due
    MPI - Collective Communication (2)
  Thursday: March 15 - Paper presentations: GPU Efficiency (3)
    1. Energy-efficient Mechanisms for Managing Thread Contexts in Throughput Processors
       Presenter: Xiaobo; Reviewers: Daniel, Mark, Fadi
    2. A Compile-Time Managed Multi-Level Register File Hierarchy
       Presenter: Di; Reviewers: Xiaobo, Chris H., Chris S.

Week 10
  Tuesday: March 20 - MPI Data Grouping
  Thursday: March 22 - Project Analysis and Design Presentations

Week 11
  Tuesday: April 3 - MPI Communicators and Dynamic Processes
  Thursday: April 5 - Paper presentations: Transactional Memory
    1. Transactional Memory: An Overview
       Presenter: Daniel; Reviewers: Chris H., Di
    2. Hardware Transactional Memory for GPU Architectures
       Presenter: Chris S.; Reviewers: Mark, Fadi, Xiaobo

Week 12
  Tuesday: April 10 - Resource contention and sharing
    1. Contention Aware Execution: Online Contention Detection and Response
       Presenter: Fadi
    2. Bubble-Up: Increasing Utilization in Modern Warehouse Scale Computers via Sensible Co-locations
       Presenter: Kittisak
  Thursday: April 12 - Paper presentations: Many-core Debate
    1. Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU
       Presenter: Daniel; Reviewers: Di, Chris H.
    2. Rigel: An Architecture and Scalable Programming Interface for a 1000-core Accelerator
       Presenter: Mark; Reviewers: Fadi, Chris S., Xiaobo

Week 13
  Tuesday: April 17 - Cache Coherence (1)
  Thursday: April 19 - Paper presentations: Map Reduce
    1. MapCG: Writing Parallel Program Portable between CPU and GPU
       Presenter: Mark; Reviewers: Di, Chris S.
    2. Tarazu: Optimizing MapReduce On Heterogeneous Clusters
       Presenter: Chris H.; Reviewers: Daniel, Fadi, Xiaobo

Week 14
  Tuesday: April 24 - Cache Coherence (2)
  Thursday: April 26 - Paper presentations: Programming Models
    1. Qilin: Exploiting Parallelism on Heterogeneous Multiprocessors with Adaptive Mapping
       Presenter: Di; Reviewers: Chris S., Xiaobo
    2. The Implementation of the Cilk-5 Multithreaded Language
       Presenter: Chris H.; Reviewers: Daniel, Mark

Week 15
  Tuesday: May 1 - Cache Coherence (3)
  Thursday: May 3 - Final Project Presentations

Paper Review Guideline

Paper Review Grading Guideline

Paper Presentation Guideline

(25 min + questions, 5-10 min discussion, avg 2 min/slide)

Presentation Feedback Form

  1. Context & Introduction
  2. Summary of Paper Contributions
  3. Details of Paper Contributions
  4. Results
  5. Summary
  6. Comments and Discussion Topics