Course Information
Course Title
Computational Methods and Tools for Data Science (資料科學之計算方法與工具)
Semester
104-2
Intended Audience
College of Science, Institute of Applied Mathematical Sciences
Instructor
王偉仲 
Course Number
MATH5024 
Course Identifier
221 U6820 
Class Section
 
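Credits
Full/Half Year
Half year
Required/Elective
Elective
Class Time
Tuesday, periods 7, 8, 9 (14:20-17:20)
Location
Astronomy-Mathematics Building, Room 101 (天數101)
Remarks
Co-taught with 陳君厚 (Chun-houh Chen)
Enrollment limit: 80 students
Ceiba Course Website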
http://ceiba.ntu.edu.tw/1042MATH5024_DS 
Course Introduction Video
 
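Core Competency Mapping
No core competency mapping has been established for this course.
Course Syllabus
To protect everyone's rights, please respect intellectual property rights and do not make illegal photocopies.
Course Overview

Companion course website: https://sites.google.com/a/math.ntu.edu.tw/t-2016s-data/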


This course covers the following topics.

(I) Singular value decomposition (SVD) and principal component analysis (PCA)
- Fundamentals and computations of SVD
- Fundamentals and computations of PCA
- Proper orthogonal modes and robust PCA
- Random sampling and random projections
- Randomized algorithms for low-rank matrix approximation
- Applications to an oscillating mass and dimensionality reduction
- Hands-on experiments and implementations in MATLAB
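
As a rough illustration of the PCA computations listed in part (I), the sketch below finds the leading principal component of a small data set. It is written in Python rather than the course's MATLAB, and it uses power iteration on the sample covariance matrix instead of a full SVD; the function name `first_principal_component` and the toy data are assumptions made for this example only, not course material.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, variance) of the leading principal component.

    rows: list of equal-length lists of floats, one observation per row.
    Hypothetical helper for illustration; uses power iteration, not SVD.
    """
    n, d = len(rows), len(rows[0])
    # Center each column so the covariance is taken about the mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix C = X^T X / (n - 1).
    cov = [[sum(x[i] * x[j] for x in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly apply C and renormalize.
    # Assumes the all-ones start vector is not orthogonal to the answer.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break  # zero covariance: no direction to find
        v = [c / norm for c in w]
    # The Rayleigh quotient v^T C v estimates the variance along v.
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                   for i in range(d))
    return v, variance


# Toy data lying exactly on the line y = 2x.
points = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
direction, variance = first_principal_component(points)
```

For these points the returned direction is parallel to (1, 2), since all of the variance lies along that line. For larger or ill-conditioned problems, an SVD of the centered data matrix is usually the more robust choice.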

(II) Extended SVD and PCA
- Sparse SVD and sparse PCA
- Nonnegative SVD and nonnegative PCA
- Tensor decomposition, higher-order SVD, multilinear PCA, and nonnegative tensor decomposition
- Multilinear algebra
- Application to cryo-electron microscopy images
- Hands-on experiments and implementations in MATLAB

(III) Independent Component Analysis (ICA)
- Introduction to ICA
- Principal component analysis (whitening) as the preprocessing
- ICA via optimization approaches
- ICA by maximization of nongaussianity
- ICA by maximum likelihood estimation
- ICA by minimization of mutual information
- Applications in blind source separation for audio signals and images
- Hands-on experiments and implementations in MATLAB

(IV) Image Processing and Analysis
- Basic concepts of images
- Linear filtering for image denoising
- Diffusion and image processing
- Image recognition via machine learning
- SVD and linear discriminant analysis
- Application to real images
- Hands-on experiments and implementations in MATLAB
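
To give a flavor of the linear filtering topic in part (IV), here is a minimal sketch of a box (mean) filter applied to a tiny grayscale image. It is in Python rather than MATLAB, and the function name `mean_filter`, the border handling (averaging only the pixels that fall inside the image), and the sample image are all assumptions chosen for this example.

```python
def mean_filter(image, radius=1):
    """Smooth a 2-D grayscale image (list of lists) with a box filter.

    Hypothetical helper for illustration. Border pixels average only
    the neighbours that fall inside the image.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            total, count = 0.0, 0
            # Visit the (2*radius + 1)^2 window, clipped to the image.
            for rr in range(max(0, r - radius), min(height, r + radius + 1)):
                for cc in range(max(0, c - radius), min(width, c + radius + 1)):
                    total += image[rr][cc]
                    count += 1
            out[r][c] = total / count
    return out


# A flat image with a single bright "noise" pixel in the centre.
noisy = [[10, 10, 10],
         [10, 100, 10],
         [10, 10, 10]]
smoothed = mean_filter(noisy)
```

The outlier at the centre is pulled from 100 down to 20, at the cost of spreading some of its intensity to the neighbouring pixels. This trade-off between noise suppression and blurring is what motivates the diffusion-based methods that follow in the list.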

(V) Visualization and Exploratory Data Analysis (EDA)
- Interactive graphics and EDA
- General introduction to statistical graphics and matrix visualization
- Visualization for continuous and binary data
- Visualization for categorical data
- Visualization for data with cartography links
- Covariate-adjusted data visualization and visualization for other types of data and modeling
- Visualization for symbolic data and for big data 

Course Objectives
This course prepares students to solve contemporary data science problems numerically. With a focus on recent data science applications, it equips successful students with a solid background and proficiency in modern yet fundamental computational methods and tools, so that they can translate theoretical concepts into working computer programs.
Course Requirements
Programming (mainly MATLAB; C is helpful for projects), calculus, linear algebra, and statistics
Expected Weekly Study Hours Outside Class
 
Office Hours
By appointment
References
- Randomized Algorithms for Matrices and Data (2010), Michael W. Mahoney
- Applied Numerical Linear Algebra (1997), James W. Demmel
- Numerical Linear Algebra (1997), Lloyd N. Trefethen and David Bau III
- An Introduction to Parallel Programming (2011), Peter Pacheco
- Parallel Computing for Data Science: With Examples in R, C++ and CUDA (2015), Norman Matloff
- Independent Component Analysis (2001), Aapo Hyvärinen, Juha Karhunen, and Erkki Oja
- Exploratory Data Analysis (1977), John W. Tukey
- Tensor Decompositions and Applications (2009), Tamara G. Kolda and Brett W. Bader
- Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation (2009), Andrzej Cichocki, Rafal Zdunek, Anh Huy Phan, and Shun-ichi Amari
Required Reading
(1) Data-Driven Modeling & Scientific Computation: Methods for Complex Systems and Big Data (2013), J. Nathan Kutz (lecture notes: http://goo.gl/otL0qc)
(2) Handbook of Data Visualization (2008), Chun-houh Chen, Wolfgang Karl Härdle, and Antony Unwin (Eds.), Springer Handbooks of Computational Statistics. Chapters can be downloaded from http://www.springer.com/us/book/9783540330363
Grading
(for reference only)

No.  Item          Percentage  Description
1.   Homework      30%
2.   Team Project  40%
3.   Midterm       30%
 
Course Schedule
Week
Date
Topic