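Course Information
Course Title
High-Performance Big Data and Artificial Intelligence Systems (高效能巨量資料與人工智慧系統)
Semester
110-2
Intended Students
College of Electrical Engineering and Computer Science, Graduate Institute of Networking and Multimedia
Instructor
Shih-Hao Hung (洪士灝)
Course Number
CSIE5373
Course Identifier
922 U4620
Section

Credits
3.0
Full/Half Year
Half year
Required/Elective
Elective
Class Time
Monday, periods 7, 8, 9 (14:20~17:20)
Classroom
Room 104, CSIE Building (資104)
Remarks
Students must have a foundation in computer architecture and operating systems.
Enrollment limit: 100 students
Ceiba Course Website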
http://ceiba.ntu.edu.tw/1102HPBDAIS 
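Course Introduction Video

Core Competencies
Map of core competencies and course planning
Course Syllabus
To protect everyone's rights, please respect intellectual property rights and do not make illegal photocopies.
Course Overview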

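** The course is run primarily on NTU COOL: https://cool.ntu.edu.tw/courses/14081
** The course content on CEIBA is set to public (https://ceiba.ntu.edu.tw/course/3943f9/index.htm), but it provides only basic information, the first week's slides, and Exercise 0.
** This course is listed as a Level-4 elective in the CSIE department's "Computer Architecture" specialization. Starting this year it will emphasize hardware/software system architecture, so students should make sure they have sufficient background in computer architecture and operating systems.
** Please use Exercise 0 to check whether this course suits you. If you have part of the background but are not yet fluent in it, and you are interested and willing to spend time learning these technologies in depth through this course, you are still welcome to enroll.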

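The rapid development of big data and artificial intelligence in recent years has created many new applications. To analyze and process ever larger volumes of data and to pursue more powerful AI, efforts ranging from national-level scientific research to large-scale commercial applications have begun adopting high-performance computing (supercomputer) technology to improve their competitiveness. Today's high-performance computing platforms have in turn begun to support important big data and AI applications, so high-performance computing has become one of the locomotives driving frontier technology.

However, high-performance computing platforms involve advanced technologies, including heterogeneous computing, parallel computing, distributed processing, and high-speed networking. Building high-performance, high-efficiency systems and applications often requires integrated optimization of hardware and software, so people who can make good use of high-performance computing platforms are rare. Students interested in this field, even after taking several related courses, may still not cover its basic knowledge and skills completely, and may find it even harder to integrate and apply what they learned across those courses.

To address these needs and barriers, this course adopts Problem-Based Learning. It centers on practical problems in big data and AI to teach the relevant high-performance computing knowledge and skills, and it encourages small-group discussion, paper reading, and a final project to cultivate students' abilities in active learning, critical thinking, and problem solving.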

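Course Objectives
Over the one-semester course, we will examine the system issues commonly encountered by various types of big data and AI applications and explore how to build high-performance systems. Students will learn the key technologies hidden inside these systems, including system architecture, software frameworks, hardware/software integration and optimization, and the latest technology trends.

Over the semester, we will cover:
(1) Principles of parallel and distributed computing
(2) Hardware and software architecture for high-performance computing
(3) High-efficiency big data storage and analytics systems
(4) High-efficiency AI training and inference systems
(5) Practical case studies in system performance evaluation and optimization

Each of the stages above involves hardware/software integration and optimization. In addition to introducing the relevant system architectures and software frameworks, the course will guide students in exploring the latest technology trends and application cases.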
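Course Requirements
Class discussion, homework, exams, final project report
Expected Weekly Study Hours After Class

Office Hours
Every Tuesday 15:00~17:00. Note: For questions about the course and assignments, please first ask on the NTU COOL online discussion board. If you would still like to meet in person, please state your reasons and make an appointment in advance.
Required Reading
Slides, reference books, and papers
References
To be added
Grading
(For reference only)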
 
No.  Item          Percentage  Description
1.   Homework      40%
2.   Exam          40%
3.   Final report  20%
 
Course Schedule
Week
Date
Topic
Week 1
02/14  Introduction to the course: What is high-performance computing (HPC)? Why do we want to build high-performance systems? Why is high performance important to big data analytics and AI applications? How to design high-performance systems for big data analytics and AI applications?

#Exercise 0 will also serve as a self-test to help students decide whether to take this course.
Week 2
02/21  Parallel computing: Why do we need parallel computing? What are the paradigms for parallel computing? How to pursue high performance with parallel computing in practice? Where are the performance bottlenecks and how to identify them?

#Exercise 1: Let's start writing simple parallel programs to run on your multicore computers.
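
One common way to reason about where parallel speedup runs out is Amdahl's law. The syllabus does not name it, so the sketch below is only an illustrative assumption about the kind of bottleneck analysis this unit asks about. The function name `amdahl_speedup` and the 90% parallel fraction are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the runtime scales with workers.

    Assumption: the remaining (serial) fraction takes the same time no matter
    how many workers are used, and parallel overhead is ignored.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical program in which 90% of the runtime can be parallelized.
    for workers in (1, 2, 8, 64, 1024):
        print(workers, round(amdahl_speedup(0.9, workers), 2))
```

Under this model the serial 10% caps the speedup below 10x however many cores are added, which is one reason profiling for the serial bottleneck matters.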
Week 3
02/28  Enjoy the Holiday! 
Week 4
03/07  Parallel computer architecture: To achieve high performance, a system has to exploit parallelism at different levels, from circuit level (FPGA), instruction level (processor pipeline, SIMD, vector), chip level (multicore), machine level (multiprocessor, heterogeneous CPU/GPU systems), to cluster level (datacenter, supercomputer). In addition to high performance, we also need to pay attention to power efficiency and the cost-performance ratio in order to design a good system.

#Exercise 2: Supercomputing has been used for simulating the weather and making predictions. What is the state of the art for weather simulation, in terms of computing architecture? Can GPUs be used to accelerate weather simulation? Can big data and AI be used for weather prediction?
Week 5
03/14  Parallel computing vs big data: How to store petabyte-scale big data in high-performance cluster filesystems such as HDFS? How to process big data in a datacenter with Hadoop MapReduce? Beyond parallel computing, the key is data locality and the trick is colocation. How to accelerate data processing with in-memory computing? Lots of open source middleware projects are available for you to explore.

#Exercise 3: Read a research paper and report the techniques to accelerate big data processing. 
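
The MapReduce data flow mentioned in this unit can be sketched in a single process as three steps: map, shuffle, reduce. This is only a minimal illustration. It does not use the Hadoop API, it does not distribute work or place tasks near the data, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents: List[str]) -> Dict[str, int]:
    """Run the three phases in order over a list of input splits."""
    mapped: List[Tuple[str, int]] = []
    for document in documents:  # a real cluster would run these map tasks in parallel
        mapped.extend(map_phase(document))
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    print(word_count(["big data", "Big AI data"]))
```

In a real deployment each map task would ideally run on the node that already stores its input block, which is the data-locality point made above.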
Week 6
03/21  Parallel computing vs AI/deep learning: Many AI applications contain lots of parallelism, and parallel computing can effectively accelerate these applications. Parallel algorithms were developed for search and expert systems before the last AI winter. Datacenters and GPU clusters were key to opening the deep learning era. How to train deep learning models with thousands of GPUs in the datacenter?

#Exercise 4: What is the largest neural network model today? How to accelerate the training process?  
Week 7
03/28  Profiling and optimizing parallel applications: Now that you know how parallel computing helps improve application performance, you also have to know that parallel computing may NOT work as well as you think, because many issues can hurt performance. How to profile and analyze such performance issues? How to apply performance optimization tricks to solve the issues?
Week 8
04/04  Enjoy the Holiday! 
Week 9
04/11  Midterm Exam 
Week 10
04/18  Parallel programming for shared memory systems and GPU: threads, processes, OpenMP, OpenACC, CUDA, TensorFlow, etc. How to create new threads? How to declare private and shared variables? How to parallelize the code? How to balance workload? How to synchronize threads? How to "see" the interprocessor communications? How to benefit most from the GPU?

#Exercise 5: Write a simple shared-memory parallel program and report your observations with a performance tool. 
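
Several of the questions in this unit (creating threads, private versus shared variables, synchronization) can be illustrated with Python's standard `threading` module. This is a conceptual sketch only and not the course's reference solution. In standard CPython builds the global interpreter lock prevents CPU-bound threads from running in parallel, so it shows the programming model rather than a speedup. The function name `parallel_count` is hypothetical.

```python
import threading


def parallel_count(num_threads: int, increments_per_thread: int) -> int:
    """Each thread counts privately, then merges its result into shared state."""
    shared = {"counter": 0}   # shared: visible to every thread
    lock = threading.Lock()   # protects updates to the shared counter

    def worker() -> None:
        local_total = 0       # private: each thread has its own copy
        for _ in range(increments_per_thread):
            local_total += 1
        with lock:            # synchronize only the short merge step
            shared["counter"] += local_total

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()         # wait until every worker has finished
    return shared["counter"]


if __name__ == "__main__":
    print(parallel_count(num_threads=4, increments_per_thread=100_000))
```

Accumulating into a private variable and taking the lock once per thread keeps contention low. The same idea underlies reduction clauses in OpenMP.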
Week 11
04/25  Parallel programming for distributed systems: MPI, MapReduce, Spark, etc. Understanding these programming paradigms enables you to utilize supercomputers and clusters to scale the performance. MPI uses messages to exchange data. MapReduce uses storage for input and output data. Spark uses memory buffers to establish processing pipelines.

#Exercise 6: Write a simple distributed parallel program and report your observations with a performance tool. 
Week 12
05/02  (If school is open on Labor Day) High-performance networking and data exchange in the datacenter: How does an MPI program exchange data efficiently between hundreds of machines? How do network bandwidth and latency impact the performance? How to implement high-performance networks for datacenters and supercomputers? How to accelerate data exchange with remote direct memory access (RDMA)?
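
A simple first-order way to see how latency and bandwidth affect message exchange is the linear cost model: time = latency + size / bandwidth. The sketch below assumes that model. The link parameters (1 microsecond latency, 12.5 GB/s bandwidth) are hypothetical numbers chosen for illustration, not measurements of any real network.

```python
def transfer_time(message_bytes: float, latency_s: float,
                  bandwidth_bytes_per_s: float) -> float:
    """Estimated time to send one message under the linear latency/bandwidth model."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + message_bytes / bandwidth_bytes_per_s


def effective_bandwidth(message_bytes: float, latency_s: float,
                        bandwidth_bytes_per_s: float) -> float:
    """Bytes per second actually achieved once latency is included."""
    return message_bytes / transfer_time(message_bytes, latency_s,
                                         bandwidth_bytes_per_s)


if __name__ == "__main__":
    latency = 1e-6        # hypothetical: 1 microsecond per message
    bandwidth = 12.5e9    # hypothetical: 12.5 GB/s (a 100 Gb/s link)
    for size in (64, 64 * 1024, 1e9):
        seconds = transfer_time(size, latency, bandwidth)
        achieved = effective_bandwidth(size, latency, bandwidth)
        print(f"{size:>12.0f} B  {seconds:.9f} s  {achieved / 1e9:.3f} GB/s")
```

Under these assumptions small messages are dominated by latency and large ones by bandwidth, which is why batching many small messages into fewer large ones is a common optimization.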
Week 13
05/09  Neural network architecture: How to estimate the performance for neural networks with or without deep learning accelerators? How to find good neural networks for your application with platform-aware neural architecture search (NAS)? How to compress a neural network to reduce its resource consumption? 
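
One rough way to estimate neural network performance is to count multiply-accumulate operations (MACs) per layer and divide by the hardware's peak throughput. The sketch below assumes that compute-only view. It ignores memory traffic, so it gives a lower bound on latency rather than a prediction. All function names, the layer shape, and the peak rate are hypothetical.

```python
def conv2d_macs(out_h: int, out_w: int, in_ch: int, out_ch: int,
                kernel_h: int, kernel_w: int) -> int:
    """MACs for one standard 2D convolution layer (no groups, bias ignored)."""
    return out_h * out_w * out_ch * in_ch * kernel_h * kernel_w


def dense_macs(in_features: int, out_features: int) -> int:
    """MACs for one fully connected layer (bias ignored)."""
    return in_features * out_features


def lower_bound_latency_s(total_macs: float, peak_macs_per_s: float,
                          utilization: float = 1.0) -> float:
    """Compute-bound latency estimate; real latency is usually higher."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return total_macs / (peak_macs_per_s * utilization)


if __name__ == "__main__":
    # Hypothetical two-layer network: a 7x7 convolution and a classifier layer.
    total = conv2d_macs(112, 112, 3, 64, 7, 7) + dense_macs(512, 1000)
    # Hypothetical accelerator: 1e12 MAC/s peak, 30% achieved utilization.
    print(total, lower_bound_latency_s(total, 1e12, utilization=0.3))
```

Comparing such per-layer counts before and after compression, or across candidate architectures, gives a quick first filter before measuring on real hardware.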
Week 14
05/16  Information security and data privacy: How to protect data? There are security protocols and cryptographic methods for this purpose. The real new challenges today are to perform big data analytics and develop AI models under data protection. How to do it with techniques such as trusted computing hardware (SGX), federated learning, secure multiparty computation, and homomorphic encryption?  
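
As a small illustration of secure multiparty computation, the sketch below uses additive secret sharing so that several parties can compute the sum of their private values without revealing any individual value. It is a toy, single-process model assuming honest parties and non-negative inputs whose sum is below the modulus. It is not a hardened protocol, and the names `share`, `reconstruct`, `secure_sum`, and `MODULUS` are hypothetical.

```python
import secrets
from typing import List

MODULUS = 2**61 - 1  # hypothetical public modulus; all arithmetic is done mod this


def share(value: int, num_parties: int) -> List[int]:
    """Split `value` into random shares that add up to it modulo MODULUS."""
    if num_parties < 1:
        raise ValueError("need at least one party")
    shares = [secrets.randbelow(MODULUS) for _ in range(num_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares: List[int]) -> int:
    """Recover the secret by adding all shares modulo MODULUS."""
    return sum(shares) % MODULUS


def secure_sum(private_values: List[int]) -> int:
    """Sum the parties' inputs while each party only sees random-looking shares."""
    n = len(private_values)
    # Party i splits its value into n shares and sends share j to party j.
    all_shares = [share(value, n) for value in private_values]
    # Party j adds the shares it received; this partial sum reveals no single input.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS
                    for j in range(n)]
    # Publishing only the partial sums is enough to recover the total.
    return reconstruct(partial_sums)


if __name__ == "__main__":
    print(secure_sum([10, 20, 30]))
```

Any subset of fewer than all shares of a value is uniformly random, which is what hides each input. Federated learning with secure aggregation builds on a similar idea.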
Week 15
05/23  Post-Moore - Neuromorphic computing and quantum computing: The increase in computing performance has depended on Moore's Law for the past 60 years, but Moore's Law is slowing down and will eventually end. How to continue improving the capability of big data analytics and AI in the post-Moore era?
Week 16
05/30  Final Exam 
Week 17
06/06  Advanced Topics: To be announced. 
Week 18
06/13  Advanced Topics: To be announced.

# Final report is due.