Week | Date | Topic
Week 1 | 02/14 | Introduction to the course: What is high-performance computing (HPC)? Why do we want to build high-performance systems? Why is high performance important to big data analytics and AI applications? How to design high-performance systems for big data analytics and AI applications?
#Exercise 0 also serves as a self-test to help students decide whether to take this course.
Week 2 | 02/21 | Parallel computing: Why do we need parallel computing? What are the paradigms for parallel computing? How to pursue high performance with parallel computing in practice? Where are the performance bottlenecks, and how to identify them?
#Exercise 1: Let's start writing simple parallel programs to run on your multicore computer in this exercise. A minimal starting point is sketched below.
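As a warm-up for Exercise 1, here is a minimal sketch of data parallelism on a multicore machine using Python's standard multiprocessing module; the prime-counting workload and task sizes are illustrative, not part of the assignment.

```python
# Minimal data-parallel sketch: distribute independent tasks across cores.
# Uses only the Python standard library; the workload is illustrative.
from multiprocessing import Pool
import os

def count_primes(n):
    """CPU-bound toy workload: count primes below n by trial division."""
    count = 0
    for k in range(2, n):
        if all(k % d for d in range(2, int(k ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    chunks = [50_000] * 8                            # eight independent tasks
    with Pool(processes=os.cpu_count()) as pool:
        results = pool.map(count_primes, chunks)     # fan out across cores
    print(sum(results))
```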
Week 3 | 02/28 | Enjoy the Holiday!
Week 4 | 03/07 | Parallel computer architecture: To achieve high performance, a system has to exploit parallelism at many levels, from the circuit level (FPGA), instruction level (processor pipelines, SIMD, vector units), chip level (multicore), and machine level (multiprocessors, heterogeneous CPU/GPU systems), up to the cluster level (datacenters, supercomputers). In addition to raw performance, we also need to pay attention to power efficiency and the cost-performance ratio in order to design a good system.
#Exercise 2: Supercomputers have long been used to simulate the weather and make predictions. What is the state of the art in weather simulation, in terms of computing architecture? Can GPUs be used to accelerate weather simulation? Can big data and AI be used for weather prediction?
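To get a first feel for data-level parallelism, the sketch below contrasts an element-at-a-time Python loop with a whole-array NumPy operation, which runs in compiled code that can use SIMD instructions; NumPy is an assumed dependency and the array size is arbitrary.

```python
# Scalar loop vs. vectorized (SIMD-friendly) whole-array operation.
# Assumes NumPy is installed; the array size is an arbitrary choice.
import time
import numpy as np

n = 2_000_000
a = np.random.rand(n)
b = np.random.rand(n)

t0 = time.perf_counter()
c_loop = [a[i] + b[i] for i in range(n)]    # one element at a time
t1 = time.perf_counter()
c_vec = a + b                               # whole-array add in compiled code
t2 = time.perf_counter()

print(f"loop: {t1 - t0:.2f}s, vectorized: {t2 - t1:.4f}s")
```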
Week 5 | 03/14 | Parallel computing vs. big data: How to store petabyte-scale big data in high-performance cluster filesystems such as HDFS? How to process big data in a datacenter with Hadoop MapReduce? Beyond parallel computing, the key is data locality and the trick is colocation. How to accelerate data processing with in-memory computing? Many open-source middleware projects are available for you to explore.
#Exercise 3: Read a research paper and report on its techniques for accelerating big data processing.
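To illustrate the MapReduce paradigm itself, independent of Hadoop, here is a word-count sketch in plain Python; real Hadoop jobs express the same map, shuffle, and reduce phases across machines.

```python
# Word count in the MapReduce style: map emits (key, 1) pairs,
# shuffle groups equal keys, reduce sums each group.
# Pure-Python sketch of the paradigm, not the Hadoop API.
from itertools import groupby
from operator import itemgetter

documents = ["big data needs parallel computing",
             "parallel computing needs big machines"]

# Map phase: emit a (word, 1) pair for every word occurrence.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle phase: bring equal keys together (Hadoop does this across machines).
mapped.sort(key=itemgetter(0))

# Reduce phase: sum the counts for each word.
counts = {word: sum(v for _, v in group)
          for word, group in groupby(mapped, key=itemgetter(0))}
print(counts)
```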
Week 6 | 03/21 | Parallel computing vs. AI/deep learning: Many AI applications contain abundant parallelism, and parallel computing can effectively accelerate them. Parallel algorithms were developed for search and expert systems before the last AI winter. Datacenters and GPU clusters are the keys that opened the deep learning era. How to train deep learning models with thousands of GPUs in the datacenter?
#Exercise 4: What is the largest neural network model today? How can its training be accelerated?
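The workhorse of multi-GPU training is data parallelism: every worker computes gradients on its own shard of the batch, and an all-reduce averages them before a shared weight update. The NumPy sketch below simulates that averaging step on one machine; the linear model, shard sizes, and learning rate are all illustrative.

```python
# Data-parallel training step, simulated on one machine with NumPy.
# Each "worker" holds a replica of the weights and its own data shard;
# gradients are averaged (the all-reduce step) before a shared update.
import numpy as np

rng = np.random.default_rng(0)
num_workers, dim, lr = 4, 8, 0.1
w = rng.normal(size=dim)                      # replicated model weights
shards = [rng.normal(size=(16, dim)) for _ in range(num_workers)]
targets = [s @ np.ones(dim) for s in shards]  # synthetic regression targets

def local_gradient(w, x, y):
    """Gradient of mean squared error for a linear model on one shard."""
    return 2 * x.T @ (x @ w - y) / len(y)

grads = [local_gradient(w, x, y) for x, y in zip(shards, targets)]
avg_grad = np.mean(grads, axis=0)             # all-reduce: average gradients
w -= lr * avg_grad                            # identical update on every replica
print(avg_grad[:3])
```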
Week 7 | 03/28 | Profiling and optimizing parallel applications: Now that you know how parallel computing can improve application performance, you also have to know that it may NOT work as well as you expect, because many issues can arise that hurt performance. How to profile and analyze such performance issues? How to apply performance optimization tricks to solve them?
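As one concrete starting point, Python's standard cProfile module reports where a program spends its time; the hotspot function below is just a stand-in for whatever code you want to analyze.

```python
# Profile a function with the standard-library cProfile module and
# print the functions that consumed the most cumulative time.
import cProfile
import pstats

def hotspot():
    """Stand-in workload; replace with the code you want to analyze."""
    return sum(i * i for i in range(2_000_000))

profiler = cProfile.Profile()
profiler.enable()
hotspot()
profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(10)   # top 10 by cumulative time
```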
Week 8 | 04/04 | Enjoy the Holiday!
Week 9 | 04/11 | Midterm Exam
Week 10 | 04/18 | Parallel programming for shared-memory systems and GPUs: threads, processes, OpenMP, OpenACC, CUDA, TensorFlow, etc. How to create new threads? How to declare private and shared variables? How to parallelize the code? How to balance the workload? How to synchronize threads? How to "see" the interprocessor communications? How to benefit most from the GPU?
#Exercise 5: Write a simple shared-memory parallel program and report your observations with a performance tool. The sketch below previews the key concepts.
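The shared/private-variable and synchronization ideas from OpenMP can be previewed with plain Python threads: in the sketch below, total is shared state, local is per-thread private state, and the lock plays the role of an OpenMP critical section; the counting workload is illustrative.

```python
# Shared-memory threading sketch: a shared counter protected by a lock.
# Without the lock, the read-modify-write on `total` could interleave
# across threads and lose updates.
import threading

total = 0                      # shared variable (visible to all threads)
lock = threading.Lock()

def worker(n):
    global total
    local = 0                  # private variable (one copy per thread)
    for _ in range(n):
        local += 1
    with lock:                 # synchronize before touching shared state
        total += local

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total)                   # 400000, regardless of interleaving
```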
Week 11 | 04/25 | Parallel programming for distributed systems: MPI, MapReduce, Spark, etc. Understanding these programming paradigms enables you to utilize supercomputers and clusters to scale performance. MPI uses messages to exchange data, MapReduce uses storage for input and output data, and Spark uses memory buffers to establish processing pipelines.
#Exercise 6: Write a simple distributed parallel program and report your observations with a performance tool.
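A minimal message-passing sketch using mpi4py, a Python binding for MPI (assumed installed alongside an MPI implementation): every rank contributes one value, and a collective all-reduce returns the global sum to all of them.

```python
# Minimal MPI sketch with mpi4py: every rank contributes a value and
# an all-reduce returns the global sum to all ranks.
# Run with e.g.:  mpirun -n 4 python allreduce.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()         # this process's id, 0 .. size-1
size = comm.Get_size()         # total number of processes

local_value = rank + 1                            # each rank's contribution
total = comm.allreduce(local_value, op=MPI.SUM)   # collective sum
print(f"rank {rank}/{size}: global sum = {total}")
```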
Week 12 | 05/02 | (If school is open on Labor Day) High-performance networking and data exchange in the datacenter: How does an MPI program exchange data efficiently among hundreds of machines? How do network bandwidth and latency impact performance? How to implement high-performance networks for datacenters and supercomputers? How to accelerate data exchange with remote direct memory access (RDMA)?
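Latency and bandwidth are conventionally measured with a ping-pong microbenchmark: two ranks bounce a message back and forth and time the round trips. Here is a sketch with mpi4py and NumPy (both assumed installed); the 1 KiB payload and repetition count are arbitrary.

```python
# Ping-pong microbenchmark: ranks 0 and 1 bounce a message and time it.
# Half the average round-trip time approximates one-way latency.
# Run with:  mpirun -n 2 python pingpong.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
reps = 1000
msg = np.zeros(1024, dtype=np.uint8)     # 1 KiB payload

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(reps):
    if rank == 0:
        comm.Send(msg, dest=1)
        comm.Recv(msg, source=1)
    elif rank == 1:
        comm.Recv(msg, source=0)
        comm.Send(msg, dest=0)
t1 = MPI.Wtime()

if rank == 0:
    print(f"~{(t1 - t0) / reps / 2 * 1e6:.1f} us one-way latency (1 KiB)")
```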
Week 13 | 05/09 | Neural network architecture: How to estimate the performance of neural networks with or without deep learning accelerators? How to find good neural networks for your application with platform-aware neural architecture search (NAS)? How to compress a neural network to reduce its resource consumption?
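One of the simplest compression techniques in this unit is magnitude pruning: zero out the smallest-magnitude weights so the layer becomes sparse. A NumPy sketch follows; the layer shape and 90% sparsity target are arbitrary choices.

```python
# Magnitude pruning: keep only the largest-magnitude weights in a layer.
# A sparsity of 0.9 means 90% of the weights are set to zero.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 128))     # a dense layer's weight matrix
sparsity = 0.9

threshold = np.quantile(np.abs(weights), sparsity)
mask = np.abs(weights) >= threshold       # True for the weights we keep
pruned = weights * mask

print(f"kept {mask.mean():.1%} of weights")   # roughly 10%
```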
Week 14 | 05/16 | Information security and data privacy: How to protect data? There are security protocols and cryptographic methods for this purpose. The real new challenge today is to perform big data analytics and develop AI models under data protection. How to do it with techniques such as trusted computing hardware (SGX), federated learning, secure multiparty computation, and homomorphic encryption?
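Federated learning keeps raw data on the clients and shares only model updates, which the server aggregates, classically by federated averaging (FedAvg). The NumPy sketch below shows one aggregation round; the client updates are simulated locally, whereas in a real deployment they would arrive over the network.

```python
# One round of federated averaging (FedAvg): clients train locally on
# private data and send only weight updates; the server averages them,
# weighted by each client's number of local examples.
import numpy as np

rng = np.random.default_rng(0)
dim = 8
global_w = np.zeros(dim)

# Simulated client updates; in a real system the raw training data
# never leaves each client.
client_sizes = [100, 300, 600]
client_ws = [global_w + rng.normal(scale=0.1, size=dim) for _ in client_sizes]

total = sum(client_sizes)
global_w = sum(n / total * w for n, w in zip(client_sizes, client_ws))
print(global_w)
```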
Week 15 | 05/23 | Post-Moore: neuromorphic computing and quantum computing: The growth of computing performance has depended on Moore's Law for the past 60 years, but Moore's Law is slowing down and will eventually end. How to continue improving the capability of big data analytics and AI in the post-Moore era?
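As a small taste of the quantum computing topic: a qubit's state is a unit vector in C^2 and a gate is a unitary matrix, so a single qubit can be simulated in a few lines of NumPy. The sketch applies a Hadamard gate to |0>, producing an equal superposition.

```python
# Simulate one qubit: states are unit vectors in C^2, gates are unitaries.
# Applying the Hadamard gate to |0> yields an equal superposition, so a
# measurement returns 0 or 1 with probability 1/2 each.
import numpy as np

ket0 = np.array([1, 0], dtype=complex)                # the |0> basis state
H = np.array([[1, 1],
              [1, -1]], dtype=complex) / np.sqrt(2)   # Hadamard gate

state = H @ ket0
probs = np.abs(state) ** 2                            # Born rule: |amplitude|^2
print(probs)                                          # [0.5, 0.5]
```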
Week 16 | 05/30 | Final Exam
Week 17 | 06/06 | Advanced Topics: To be announced.
Week 18 | 06/13 | Advanced Topics: To be announced.
#Final report is due.