Course Information
Course title: R programming and application to Public Health data
Semester: 113-2
Designated for: COLLEGE OF PUBLIC HEALTH, Health Data Analytics and Statistics
Instructor: Amrita Chattopadhyay
Curriculum Number: EPM5060
Curriculum Identity Number: 849EU0600
Class:
Credits: 3.0
Full/Half Yr.: Half
Required/Elective: Elective
Time: Thursday 2,3,4 (9:10~12:10)
Remarks: The upper limit of the number of students: 30.
 
Table of Core Capabilities and Curriculum Planning
Course Syllabus
Please respect the intellectual property rights of others and do not copy any of the course information without permission
Course Description

This course provides a thorough introduction to R programming and gives students a comprehensive understanding of, and practical experience in, public health data analysis. The course is structured into two sections. The first section trains students to use R, a freely available statistical software, to write efficient code for data manipulation, data processing, and statistical analysis. In the second section, students are given real health data and trained to carry out a step-by-step analysis protocol that applies the techniques learned in the first section. Additionally, a theoretical introduction is given at the beginning of each class to ensure a sound understanding of the concepts underlying each day's task.
• R programming: importing data, data handling and manipulation, resampling strategies, and statistical analysis techniques covering descriptive statistics, hypothesis testing, and regression; data visualization using ggplot2 and base R plots; reading plots for correct interpretation.
• Health dataset analysis: Real de-identified datasets will be provided. Alternatively, students may acquire health datasets themselves or use their own research datasets. Students will be trained and guided to conduct hands-on analysis in a step-by-step manner, covering descriptive data analysis, variable selection, and association analysis/survival analysis on the provided dataset(s). Students will also be allowed to apply any bioinformatics tools for visualization.
By combining R programming and theoretical introductions with hands-on analysis, the course equips participants with the skills to analyze public health datasets effectively and make informed, data-driven decisions in their research and practice. The course will for the most part be computer-based.
 

Course Objective
Upon completion, students will be able to successfully do the following:
• Use R to carry out data manipulation and data cleaning.
• Develop statistical thinking and apply statistics in modern public health research and practice.
• Describe a dataset using descriptive statistics and graphical methods in R as an initial step toward more advanced analysis.
• Implement suitable methods in R to formulate and analyze statistical associations between variables in a dataset.
• Interpret the results and provide potential explanations for the findings.
Skills that the student will gain:
• Data analysis with R
• Linear regression
• Logistic regression
• Group comparison testing
• Survival analysis
• Visualization of data
• Statistical thinking
 
Course Requirement
Biostatistics, Statistics, Basic programming (optional), Data preprocessing (optional), Data acquisition (optional) 
Student Workload (Expected weekly study hours before and/or after class)
 
Office Hours
Note: by appointment 
Designated reading
1. Biostatistics with R: A Guide for Medical Doctors, Marco Moscarelli, Springer
2. A Learning Guide to R, Remko Duursma, Jeff Powell, Glenn Stone, Western Sydney University
3. Survival Analysis in Medicine and Genetics, Jialiang Li, Shuangge Ma, Chapman and Hall
4. Working with Data in Public Health: A Practical Pathway with R, Peng Zhao, Springer
 
References
1. The Practical Guide to Clinical Research and Publication, Uzung Yoon, Academic Press
2. Practical Clinical Research Design and Application: A Primer for Physicians, Surgeons, and Clinical Healthcare Professionals, Peter D. Fabricant, Springer Open
 
Grading
 
1. Attendance, class involvement, interaction, and participation (20%): Evaluated by end-of-class progress each week, attendance, and interaction with the teacher.
2. Homework (30%): Core capacities A, B, C, E, and F are assessed through the weekly homework assignments, which are critical for continuing the analysis in the following week.
3. Midterm (25%): Core capacities A, C, E, and F are evaluated through a written report and Q&A.
4. Final analysis and presentation (25%): Core capacities A, B, C, E, and F are evaluated through a dataset analysis report, an oral presentation in class, and Q&A.
 
  1. NTU has not set an upper limit on the percentage of A+ grades.
  2. NTU uses a letter grade system for assessment. The grade percentage ranges and the single-subject grade conversion table in the NATIONAL TAIWAN UNIVERSITY Regulations Governing Academic Grading are for reference only. Instructors may adjust the percentage ranges according to the grade definitions. For more information, see the Assessment for Learning Section.
 
Progress
Week
Date
Topic
Week 1
2/20  R course Day 1
a. Course introduction
b. R, RStudio, and R package installation
c. Data import, data frames, tibbles

 
Week 2
2/27  Application Day 1
Real dataset: Health data
a. Data import using R
b. Data structure using R
c. R object and datatypes
d. R vector and matrix
e. Recoding categorical data using R
f. Missing data using R
g. Data cleaning using R
 
Week 3
3/06  R course Day 2
Exploratory data analysis using R
a. R subsetting
b. R functions
c. R loops
d. Descriptive statistics using R
e. Summary statistics using R
f. R base plots
 
Week 4
3/13  Application Day 2
Health data: Categorical variables
a. Data cleaning using R
b. Missing data using R
c. Data manipulation using R
d. Stratification (by categories) using R
e. Subsetting (desired columns and rows) using R
f. Descriptive statistics and summary statistics using R
g. Visualize categorical data using R: bar plots, pie charts, dot plots
 
Week 5
3/20  Application Day 3
Health data: Continuous (numeric) variables
a. Data cleaning using R
b. Missing data removal/imputation using R
c. Data manipulation using R
d. Subsetting using R
e. Descriptive statistics and summary statistics using R
f. Visualize continuous data using R: histograms, box plots
 
Week 6
3/27  Review week: mock exercise
Write R code on a practice dataset:
a. Data import
b. Data cleaning
c. Data structure
d. Identify numerical and categorical data
e. Create subsets
f. Descriptive/summary statistics
g. Visualization
 
Week 7
4/03  Midterm practical exam
Use R to analyze real health data
1. Compare data subgroups (categorical)
2. Compare data subgroups (continuous)
3. Descriptive analysis
4. Visualizations
 
Week 8
4/10  R course Day 3
a. Regression using R: logistic, linear, Cox proportional hazards
b. Parametric tests using R
c. Nonparametric tests using R
d. ANOVA tests using R to compare more than two groups
e. Visualization using R (base R plots, ggplot2): scatterplots, correlation plots
 
Week 9
4/17  Application Day 4
Dataset: Health data - Discrete variables
Statistical tests using R
a. Create contingency tables
b. Fisher's exact test, chi-square test, rank tests, Kruskal-Wallis test
c. One-sided and two-sided tests
 
Week 10
4/24  Application Day 5
Dataset: Health data - Continuous variables
Statistical tests using R
a. Normality testing: Shapiro-Wilk test
b. Equality of variance test: Bartlett's test
c. Proportion test, z-test, t-test, Mann-Whitney test, Wilcoxon rank-sum test
d. One-sided and two-sided tests
 
Week 11
5/01  Application Day 6
Dataset: Health dataset
a. Linear regression
b. Variable selection strategies using R
c. Correlation analysis and correlation test using R
d. Data fitting and visualization (scatterplots), correlation plots using R
 
Week 12
5/08  Application Day 7
Dataset: Health dataset
a. Association analysis, Logistic regression
b. Variable selection strategies using R
c. Correlation analysis and correlation test using R
d. Data fitting and visualization (scatterplots), correlation plots using R
 
Week 13
5/15  R course Day 4
Survival data analysis using R
a. Pre-processing
b. Kaplan-Meier analysis using R
c. Cox proportional hazards regression using R
d. Univariate and multivariate analysis using R
e. Model performance metrics using R
f. Cross-validation using R, R loops

Week 14
5/22  Application Day 8
Survival data analysis using R
a. Kaplan-Meier analysis
b. Cox proportional hazards regression analysis
c. KM plots, forest plots
d. Discriminant analysis
e. Cross-validation performance analysis
 
Week 15
5/29  Review week: mock exercise
Use example datasets to do the following:
a. Statistical tests using R (discrete and continuous)
b. Regression analysis using R
c. Survival analysis using R
 
Week 16
6/05  Final Practical Exam
End-to-end data analysis (health data or survival data)