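Course title |
Data Science and Social Inquiry (資料科學與社會研究) |
Semester |
112-1 |
Offered to |
College of Social Sciences, Department of Economics |
Instructor |
陳由常 |
Course number |
ECON5166 |
Course identifier |
323 U1250 |
Section |
|
Credits |
3.0 |
Full/half year |
Half year |
Required/elective |
Elective |
Class time |
Wednesday, periods 6, 7, 8 (13:20~16:20) |
Classroom |
College of Social Sciences, Room 506 (社科506) |
Remarks |
Required course for the Undergraduate Cross-Disciplinary Specialization in Data Science and Social Analysis (資料科學與社會分析學士班跨域專長). Restricted to third-year undergraduates and above, master's students and above, or doctoral students. Enrollment cap: 60 |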
|
|
Course introduction video |
|
Course syllabus
|
Course overview |
For detailed information, please check
https://docs.google.com/document/d/1Va_CnqUgMtGCAO6hRUENvu4F7j2KTsWAXSBxKP0OZ0M/edit?usp=sharing
Below is a problem set to help you decide whether you are ready for this course (draft version; please do not start working on it yet):
https://drive.google.com/file/d/1fWoYhHQmbVyyyupOJQp73sZDZRD_PZsK/view?usp=sharing
---
Econ 5166 serves as an introduction to “classical” machine learning (ML) methods such as PCA, LASSO, decision trees, random forests, and more, with a strong focus on their practical applications in social science research and business. This course is designed for students who have already completed an initial course in statistics, have some hands-on data manipulation experience, and are keen to delve into the underlying principles of machine learning.
Although NTU offers many excellent ML courses, Econ 5166 stands apart in two respects. First, the course emphasizes the relationship between ML and statistics. It shows how ML, like any data-based exploratory technique, fits into the broader statistical framework. We will illustrate the connections between fundamental statistical concepts and classical ML methods, including correlation and PCA, OLS regression and LASSO, and hypothesis testing and classification. The class is also an opportunity to revisit statistics by exploring its core concepts (such as correlation and expectation) in light of real-world applications. Because our focus is on understanding the statistical underpinnings rather than merely the methodology, we will concentrate primarily on the more traditional, accessible ML methods. Modern methods such as deep learning will be addressed conceptually, as extensions of what we cover in class.
The second distinctive feature of this course is an in-depth exploration of ML applications in social science research and, to a lesser extent, business. Our primary goal is to equip you with practical ML skills to tackle real-world challenges effectively. To achieve this, each method we discuss will be motivated by business applications, followed by an analysis of a research paper or one of my own research projects to demonstrate the relevance of ML. In addition, an integral part of this course is a project assignment in which you will refine your skills in problem formulation, coding for data analysis, precise interpretation of statistical results, and effective communication of your findings. This hands-on approach both solidifies your theoretical understanding and strengthens your ability to use ML methods in practical, real-world scenarios. |
Course objectives |
1. Developing working knowledge of machine learning methods: Students will build intuition for the principles of various machine learning methods through their mathematical definitions and algorithms, and will be able to apply this knowledge in actual data analysis work, such as feature selection and interpreting analysis results.
2. Understanding machine learning algorithms through statistics: Students will use basic concepts such as conditional expectation to grasp the statistical meaning of these algorithms and procedures (for example, cross-validation). The course also leads students to revisit basic statistical concepts such as correlation, regression analysis, and hypothesis testing from a practical application perspective.
3. Cultivating data processing skills: Students will learn a range of data processing skills, including data cleaning, ETL (extract, transform, load), web crawling, data visualization, and the development of data products, as well as how to verify data reliability and check for potential errors in the analysis process.
4. Fostering basic literacy in data science: Students will cultivate abilities essential for a data scientist, such as mathematical maturity and mastery of statistics, and will refine their scientific problem-solving method in the final project. They will also learn how to apply data science in a business environment and gain a preliminary understanding of the division of labor and required skills across job functions.
|
Course requirements |
1. Homework
2. Midterm
3. Final Project Presentation |
Expected weekly study hours outside class |
5 |
Office Hours |
|
Required reading |
To be announced |
References |
Murphy (2022), Probabilistic Machine Learning: An Introduction |
Grading (for reference only) |
|
Accommodations for students facing difficulties |
Class format |
Supplemented by video recordings |
Assignment submission |
|
Exam format |
|
Other |
|
|
Week | Date | Topic |
Week 1 | 9/06 | Introduction |
Week 2 | 9/13 | Principal Component Analysis |
Week 3 | 9/20 | Principal Component Analysis |
Week 4 | 9/27 | Factor Analysis |
Week 5 | 10/04 | Clustering |
Week 6 | 10/11 | Clustering |
Week 7 | 10/18 | Project Discussion (No Class) |
Week 8 | 10/25 | Penalized Regression |
Week 9 | 11/01 | Penalized Regression |
Week 10 | 11/08 | Penalized Regression |
Week 11 | 11/15 | Midterm |
Week 12 | 11/22 | Project Discussion (No Class) |
Week 13 | 11/29 | Tree Algorithms |
Week 14 | 12/06 | Tree Algorithms |
Week 15 | 12/13 | Tree Algorithms |
Week 16 | 12/20 | Final Project Rehearsal (No Class / Graded) |
Week 17 | 12/27 | Project Presentation |
|