Course Syllabus

Course Information

Course Title

統計學習理論簡介
Introduction to Statistical Learning

Semester

109-2

Offered To

Department of Political Science, College of Social Sciences

Instructor

曾煥凱

Course Number

PS5696

Course Identifier

322EU2320

Section

Credits

2.0

Full-Year/Half-Year

Half-year

Required/Elective

Elective

Class Time

Tuesday, periods 8, 9, 10 (15:30-18:20)

Classroom

Social Sciences 302

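Remarks

This course is taught in English. Political thought, international relations, public administration, domestic politics, comparative politics. Period 10 is a lab session.
Restricted to third-year undergraduates and above.
Enrollment cap: 30 students
Cap on students from other departments: 10 students

Course Website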

https://www.dropbox.com/sh/qcq5ddmf46xu20l/AACXG9ul5dcHMzQE0tjiEUA1a?dl=0

Core Competencies

No core competency links have been established for this course yet.

Course Syllabus

Course Description

Statistical learning is the process of extracting regularities from data using statistical models, with the goal of finding a predictive function from existing data that can make predictions on unseen data of a similar type. The course introduces students to the concepts and analytical tools of statistical learning. It emphasizes "learning by doing": students become familiar with the R programming language by using it to analyze empirical data. The first part of the course starts with a refresher on the fundamentals of statistics (mean, variance, distributions, and probabilities) before proceeding to more specialized topics. It also gives a gentle introduction to R programming, during which issues of dimensionality and balance will be discussed, with their diagnostic and preprocessing tasks implemented in R. The second part of the course introduces families of binary, penalized, discriminant, and mixture models, along with performance evaluation metrics. The third part concludes the course with the popular topic of text mining, that is, drawing inferences from text data.
Each class meeting usually begins with a lecture on that week's topic. During the lecture, the instructor will demonstrate how to perform the analytical tasks by running R on screen. Lecture notes and code will be displayed on the class slides and made available for download. A total of FOUR assignments will be given throughout the semester; they will help build the analytical and programming foundation needed to complete a 10-page term project.

Required course materials can be accessed through the following links:
1. Dropbox: https://www.dropbox.com/sh/xzo66kmjufg2a91/AAD8TASzpcMES4x3mqKPKyFoa?dl=0
(for syllabus, readings, slides, datasets, etc.)

2. Rpubs: https://rpubs.com/hktseng
(for R code)

This class is graciously supported by DataCamp, the most intuitive learning platform for data science. Learn R, Python, and SQL the way you learn best through a combination of short expert videos and hands-on-the-keyboard exercises. Take 100+ courses by expert instructors on topics such as importing data, data visualization, or machine learning, and learn faster through immediate and personalised feedback on every exercise.

Course Objectives

After the completion of this course, students will be able to:

1. Distinguish and process different types of data.
2. Identify which classification models suit a particular dataset and set of modeling assumptions.
3. Perform analytical tasks in R with real-world data.
4. Apply these analytical skills to their own research projects.

Course Requirements

I. Weekly Coursework (20%)

Keeping up with the course's progress is essential. You are expected to finish the assigned readings before each week's meeting and to practice the assigned R applications to increase your proficiency with key statistical learning concepts and R programming.

II. Assignments (40%)

A total of 4 data analysis assignments will be given, one every 3-4 weeks, to give students hands-on opportunities to apply their analytical and programming skills to real data from selected topic areas. Students are allowed to form study groups to discuss assignments, consult textbooks, or make use of crowdsourced Q&A forums such as Stack Overflow, Quora, and Reddit. Remember, the instructor and TA are always at your service.
Submitted assignments must be your own work. You are encouraged to discuss assignments with your peers, but you are FORBIDDEN to submit duplicated answers or to have someone else do the assignments for you.

III. Term Project: Research Paper (40%)

At the end of the semester, students are required to submit an analytical paper of approximately 10-12 pages (and no more than 15 pages), centered on drawing statistical inferences from the analysis of a dataset (or multiple datasets). There will be no assigned topics; instead, students will use their own discretion to select research topics from the social sciences or other cognate fields, so long as they use the analytical concepts and tools acquired in this course to approach them. Students will need to submit their topics at the tenth class meeting and are encouraged to schedule an appointment with the instructor to discuss them.

Expected Weekly Study Hours After Class

Office Hours

Other arrangements: By appointment

Required Readings

References

1. Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning, with Applications in R (New York, NY: Springer Nature).
2. J. Scott Long. 1997. Regression Models for Categorical and Limited Dependent Variables (Thousand Oaks, CA: SAGE Publications).

Grading
(for reference only)

No.	Item	Percentage	Description
1.	Assignments	40%
2.	Term Project: Research Paper	40%
3.	Weekly Coursework	20%

Course Schedule

Week	Date	Topic
Week 1	2/23	Introduction
Week 2	3/2	Regression Methods I
Week 3	3/9	Regression Methods II
Week 4	3/16	Dimensionality and Preprocessing
Week 5	3/23	Nonlinear Regression I
Week 6	3/30	Nonlinear Regression II
Week 7	4/6	No Class
Week 8	4/13	Statistical Learning I
Week 9	4/20	Statistical Learning II
Week 10	4/27	Short Film and Term Paper Q&A
Week 11	5/4	Statistical Learning III
Week 12	5/11	Dimensionality Reduction and Prediction Accuracy
Week 13	5/18	Feature Engineering
Week 14	5/25	Resampling Methods
Week 15	6/1	Text Mining
Week 16	6/8	Drawing Inference from Text Data
Week 17	6/15	Course Wrap-up (or Network Analysis (TBA))
Week 18	6/22	Final (No Class)