Course Title |
Introduction to Statistical Learning (統計學習理論簡介) |
Semester |
112-2 |
Offered To |
College of Social Sciences, Department of Political Science |
Instructor |
曾煥凱 |
Course Number |
PS5696 |
Course Identifier |
322EU2320 |
Class Section |
|
Credits |
2.0 |
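Full/Half Year |
Half year |
Required/Elective |
Elective |
Class Time |
Tuesday, periods 8, 9, 10 (15:30-18:20) |
Classroom |
Social Sciences 302 (社科302) |
Remarks |
This course is taught in English. Political thought, international relations, public administration, domestic politics, comparative politics. Period 10 is a lab session. Restricted to third-year undergraduates and above. Maximum enrollment: 30. Limit on students from other departments: 10. |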
|
|
Course Introduction Video |
|
Core Competency Associations |
No core competency associations have been established for this course |
Course Syllabus
|
To protect everyone's rights, please respect intellectual property rights and do not make illegal photocopies.
|
Course Overview |
Statistical learning is the process of extracting regularities from data using statistical models, with the goal of finding a predictive function from existing data that can make predictions on unseen data of a similar type. The course introduces students to the concepts and analytical tools of statistical learning and emphasizes "learning by doing" by familiarizing students with the R programming language for analyzing empirical data. The first part of the course starts with a refresher on the fundamentals of statistics (mean, variance, distributions, probability) before proceeding to more specialized topics. It also gives a gentle introduction to R programming, during which issues of dimensionality and balance will be discussed, with the associated diagnostic and preprocessing tasks implemented in R. The second part of the course introduces families of binary, penalized, discriminant, and mixture models, along with performance evaluation metrics. We conclude, in the third part of the course, with emerging data analysis methods such as text mining and network analysis.
Each class meeting usually begins with a lecture on that week's topic. During the lecture, the instructor will show students how to perform the analytical tasks by running R on the screen. Lecture notes and code will be displayed on class slides and made available for download. A total of FOUR course assignments will be given throughout the semester; they will help build the analytical and programming foundation needed to complete a 10-page term project.
This class is supported by DataCamp, the most intuitive learning platform for data science. Learn R, Python, and SQL the way you learn best, through a combination of short expert videos and hands-on-the-keyboard exercises.
You can also access all course materials via our shared Dropbox folder: https://www.dropbox.com/scl/fo/ypahd0f6wimg0egu3lrhv/h?dl=0&rlkey=31l1qe03rmjflpcqcgdnrmavy |
Course Objectives |
After the completion of this course, students will be able to:
1. Distinguish and process different types of data.
2. Identify which classification models to use for a particular dataset and/or modeling assumptions.
3. Perform analytical tasks in R with real-world data.
4. Apply these analytical skills to their own research projects. |
Course Requirements |
No prior coding experience in any commercial or open-source programming language is required. The course is self-contained: it teaches students the programming basics needed to perform the analytical tasks it covers. We will be using R, a versatile open-source programming language, as the primary language of instruction. RStudio will be the main instructional IDE (Integrated Development Environment) for running R in this course, but you are free to use another IDE of your choice. Students are encouraged to practice running R regularly and to explore alternative ways of doing the same tasks to get the most out of the practical side of this course. If you prefer to use another programming language instead (e.g., Python, Matlab, Stata), I am open to discussing how I can better accommodate your needs. |
Expected Weekly Study Hours Outside Class |
8-10 hours per week, plus self-administered programming exercises. |
Office Hours |
By appointment. Note: Please email me to schedule an appointment. |
Required Readings |
Please refer to the syllabus.
You can also download course materials directly from our Dropbox shared folder: https://www.dropbox.com/scl/fo/h13y8xir047usv6zihblo/h?rlkey=2qspxecr243o33gw5z6wyid07&dl=0 |
References |
Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning, with Applications in R (New York, NY: Springer Nature, 2013).
Max Kuhn and Kjell Johnson, Applied Predictive Modeling (New York, NY: Springer Nature, 2013).
J. Scott Long, Regression Models for Categorical and Limited Dependent Variables (Thousand Oaks, CA: SAGE Publications, 1997).
Keeping up with the course's progress is essential. You are expected to finish the assigned readings before each week's meeting and to practice the assigned R exercises to build your proficiency with key statistical learning concepts and R programming.
Other readings are drawn from book chapters, articles published in academic journals, and websites. Specific readings for each class are identified on this syllabus. Readings marked with a * will be available on the course website; readings marked with a "v" are for review from past weeks. Items marked with a "globe" are clickable web-based materials. Items marked with a black square are brief introductions to specific subjects provided by the instructor. |
Grading (for reference only) |
No. |
Item |
Percentage |
Description |
1. |
Weekly readings and exercises |
20% |
Keeping up with the course's progress is essential. You are expected to finish the assigned readings before each week's meeting and to practice the assigned R exercises to build your proficiency with key statistical learning concepts and R programming. |
2. |
Assignments |
40% |
A total of FOUR data analysis assignments will be given, one every 3-4 weeks, to give students hands-on opportunities to apply their analytical and programming skills to real data from selected topic areas. Students are allowed to form study groups to discuss assignments, consult textbooks, or make use of crowdsourced Q&A forums such as Stack Overflow, Quora, and Reddit. Remember, the instructor and TA are always at your service.
Submitted assignments must be your own work. You are encouraged to discuss assignments with your peers, but you are FORBIDDEN from submitting duplicate answers or having someone else do the assignments for you. |
3. |
Term Project |
40% |
At the end of the semester, students are required to submit an analytical paper of approximately 10-12 pages (and no more than 15 pages) centered on drawing statistical inferences from the analysis of one or more datasets. There will be no assigned topics; instead, students will use their own discretion to select research topics from the social sciences or other cognate fields, so long as they apply the analytical concepts and tools acquired in this course. Students will need to submit their topics at the 10th class meeting (insert date) and are encouraged to schedule an appointment with the instructor to discuss them. |
|
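Accommodations for Students Facing Difficulties |
Class format |
Flexible attendance options are provided |
Assignment submission |
Extended assignment deadlines |
Exam format |
Written (or oral) report in place of exams |
Other |
To be agreed upon between instructor and student |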
|
Week |
Date |
Topic |
Week 1 |
02/20 |
Course introduction |
Week 2 |
02/27 |
Regression Methods I |
Week 3 |
03/05 |
Regression Methods II |
Week 4 |
03/12 |
Regression Methods III |
Week 5 |
03/19 |
Nonlinear Regression I |
Week 6 |
03/26 |
Nonlinear Regression II |
Week 7 |
04/02 |
Statistical Learning I |
Week 8 |
04/09 |
Statistical Learning II |
Week 9 |
04/16 |
Movie (in lieu of midterm) |
Week 10 |
04/23 |
Statistical Learning III |
Week 11 |
04/30 |
Dimension Reduction and Prediction Accuracy |
Week 12 |
05/07 |
Resampling Methods |
Week 13 |
05/14 |
Drawing Inference from Text Data I |
Week 14 |
05/21 |
Drawing Inference from Text Data II |
Week 15 |
05/28 |
Course wrap-up |
Week 16 |
06/04 |
No class. |
Week 17 |
06/11 |
No class. Term paper due. |
Week 18 |
06/18 |
No class. |
|