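Course Information
Course title
Statistical and Machine Learning (統計與機器學習)
Semester
112-2
Intended for
Center for General Education, Master's Program in Statistics
Instructor
王彥雯
Course number
HDAS7004
Course identifier
855 M0040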
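Credits
3.0
Full/half year
Half year
Required/elective
Required
Time
Wednesday, periods 2, 3, 4 (9:10~12:10)
Location
公衛211 (College of Public Health, Room 211)
Notes
One of the elective courses in the biomedical informatics and biostatistics area.
Restricted to students of this department or program (including minor and double-major students).
Enrollment cap: 30 students
Cap on students from other departments: 2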
 
Course Syllabus
Course Overview

This course introduces common statistical learning methods and the theory behind them. It focuses on supervised learning, including regularized regression, SVM, SVR, tree-based methods, ensemble learning, and imbalanced classification. Hands-on work in R or Python brings in real problems and their analysis, so that students learn to understand a problem from data, extract useful information to solve it, and explain the results.

Course Objectives
This course introduces popular statistical and machine learning methods and their theory, combined with real-data examples, programming, research-paper reading, and a final project. By the end of the course, students are expected to be able to:
1. Apply statistical and machine learning methods to analyze real data and solve practical problems. (B, C, D)
2. Interpret the results of data analysis correctly. (A, B, C)
3. Understand, through reading research papers, how statistical and machine learning methods are applied to practical problems. (A)
Course Requirements
Students must have taken basic statistics and regression analysis beforehand and have basic programming experience.
Office Hours
By appointment. Note: arrange a time with the teaching assistants. Teaching assistants: 梁雅婷 (d11849007@ntu.edu.tw), 蔡岱融 (d12849003@ntu.edu.tw)
Required Reading
1. James, G., Witten, D., Hastie, T., and Tibshirani, R. (2021). An Introduction to Statistical Learning with Applications in R. 2nd edition. Springer.
2. James, G., Witten, D., Hastie, T., Tibshirani, R., and Taylor, J. (2023). An Introduction to Statistical Learning with Applications in Python. Springer.
3. Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd edition. Springer. 
References
1. Abu-Mostafa, Y. S., Magdon-Ismail, M. and Lin, H.-T. (2012). Learning from Data: a Short Course. AMLBook.
2. Alpaydin, E. (2014). Introduction to Machine Learning, 3rd edition. MIT Press Ltd.
3. Cristianini, N., and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge: Cambridge University Press.
4. Fernández, A., García, S., Galar, M., Prati, R. C., Krawczyk, B., and Herrera, F. (2018). Learning from imbalanced data sets. Cham: Springer.
5. Lander, J. P. (2017). R for Everyone: Advanced Analytics and Graphics, 2nd ed. Pearson Education. (Chinese edition published by 旗標)
6. Lantz, B. (2019). Machine Learning with R: Expert techniques for predictive modeling, 3rd edition. Packt Publishing.
7. Summa, M. G., Bottou, L., Goldfarb, B., Murtagh, F., Pardoux, C. and Touati, M. (2012). Statistical Learning and Data Science. Chapman and Hall/CRC.
8. Wickham, H. and Grolemund, G. (2017). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly Media. (Chinese edition published by 碁峯)
9. Zhang, C., & Ma, Y. (Eds.). (2012). Ensemble Machine Learning: Methods and Applications. Springer Science & Business Media.
10. Müller, A. C. and Guido, S. (2016). Introduction to Machine Learning with Python: A Guide for Data Scientists. O'Reilly Media.
11. 塚本邦尊, 山田典一, and 大澤文孝 (2020). 東京大學資料科學家養成全書:使用 Python 動手學習資料分析 (Chinese translation by 莊永裕). 臉譜出版.
Grading
(for reference only)
1. Class performance, participation, and homework: 35%.
2. Literature reading report: 30%. For a given topic, students must search for related papers and present their content in class.
3. Final oral presentation: 15%. Students must apply what they have learned in class to complete a group project and present the analysis results orally at the end of the semester.
4. Final written report: 20%. Students must write a report on the results of a practical data analysis.
 
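Accommodations for students with difficulties
Class format
Assignment submission
Exam format
Other
To be agreed on by the instructor and the student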
Course Schedule
Week
Date
Topic
Week 1
2/21  1. Introduction: what is statistical and machine learning?
2. Overview of supervised learning
Week 2
2/28  Peace Memorial Day (holiday)
Week 3
3/6  Model assessment: evaluation, cross-validation, bootstrap
Week 4
3/13  Regularized regression: ridge, LASSO, elastic-net regression
Week 5
3/20  Classification: logistic regression, linear regression, k-nearest neighbors
Week 6
3/27  Classification: Bayesian decision theory, discriminant analysis
Week 7
4/3  Classification: support vector machines
Week 8
4/10  Regression: support vector regression
Week 9
4/17  Tree-based methods: decision trees, random forests
Week 10
4/24  Ensemble learning: boosting, bagging, stacking, cascading
Week 11
5/1  Imbalanced classification
Week 12
5/8  Literature reading report (I)
Week 13
5/15  Literature reading report (II)
Week 14
5/22  Invited talk
Week 15
5/29  Final oral presentation
Week 16
6/5  Final oral presentation