|
Course title |
R programming and application to Public Health data |
|
Semester |
113-2 |
|
Designated for |
COLLEGE OF PUBLIC HEALTH Health Data Analytics and Statistics |
|
Instructor |
Amrita Chattopadhyay |
|
Curriculum Number |
EPM5060 |
|
Curriculum Identity Number |
849EU0600 |
|
Class |
|
|
Credits |
3.0 |
|
Full/Half Yr. |
Half |
|
Required/ Elective |
Elective |
|
Time |
Thursday 2,3,4(9:10~12:10) |
|
Remarks |
The upper limit of the number of students: 30. |
|
|
|
|
Course introduction video |
|
|
Table of Core Capabilities and Curriculum Planning |
|
Course Syllabus
|
|
|
|
Course Description |
This course provides a thorough introduction to R programming and equips students with a comprehensive understanding of, and practical experience in, public health data analysis. The course is structured in two sections. The first section trains students to use R, a freely available statistical software environment, to write efficient code for data manipulation, data processing, and statistical analysis. In the second section, students are given real health data and trained to carry out a step-by-step analysis protocol that applies the techniques learned in the first section. In addition, each class begins with a theoretical introduction to ensure a sound understanding of the concepts underlying that day's task.
• R programming: importing data, data handling and manipulation, resampling strategies, and statistical analysis techniques covering descriptive statistics, hypothesis testing, and regression. Data visualization using ggplot2 and base R plots, and reading plots for correct interpretation.
• Health dataset analysis: Real de-identified datasets will be provided. Alternatively, students may acquire health datasets themselves or use their own research datasets. Students will be trained and guided through hands-on, step-by-step analysis of the provided dataset(s), covering descriptive data analysis, variable selection, and association analysis/survival analysis. Students may also apply any bioinformatics tools for visualization.
By combining R programming and theoretical introductions with hands-on analysis, the course equips participants with the skills to analyze public health datasets effectively and make informed, data-driven decisions in their research and practice. The course will be computer-based for the most part.
|
|
Course Objective |
Upon completion, students will be able to do the following successfully:
• Use R to carry out data manipulation and data cleaning
• Develop statistical thinking and apply statistics in modern public health research and practice
• Describe a data set using descriptive statistics and graphical methods as an initial step for more advanced analysis in R software.
• Implement suitable methods to formulate and analyze statistical associations between variables in a data set using R.
• Interpret the results and provide potential explanations for the findings.
Skills that the student will gain:
• Data analysis with R
• Linear Regression
• Logistic regression
• Group comparison testing
• Survival analysis
• Visualization of data
• Statistical thinking
|
|
Course Requirement |
Biostatistics, Statistics, Basic programming (optional), Data preprocessing (optional), Data acquisition (optional) |
|
Student Workload (Expected weekly study hours before and/or after class) |
|
|
Office Hours |
Note: by appointment |
|
Designated reading |
1. Biostatistics with R: A Guide for Medical Doctors, Marco Moscarelli, Springer
2. A Learning Guide to R, Remko Duursma, Jeff Powell, Glenn Stone, Western Sydney University
3. Survival Analysis in Medicine and Genetics, Jialiang Li, Shuangge Ma, Chapman and Hall
4. Working with Data in Public Health: A Practical Pathway with R, Peng Zhao, Springer
|
|
References |
1. The Practical Guide to Clinical Research and Publication, Uzung Yoon, Academic Press
2. Practical Clinical Research Design and Application: A Primer for Physicians, Surgeons, and Clinical Healthcare Professionals, Peter D. Fabricant, Springer Open
|
|
Grading |
|
No. |
Item |
% |
Explanations for the conditions |
|
1. |
Attendance, class involvement, interaction, and participation: evaluated each week on the basis of end-of-class progress, attendance, and interaction with the teacher. |
20% |
|
2. |
Homework: core capacities A, B, C, E, and F will be assessed through the weekly homework assignments, which are critical for continuing the analysis in the following week. |
30% |
|
3. |
Midterm: core capacities A, C, E, and F will be evaluated through a written report and Q&A. |
25% |
|
4. |
Final analysis and presentation: core capacities A, B, C, E, and F will be evaluated through a dataset analysis report, an oral presentation in class, and Q&A. |
25% |
|
- NTU has not set an upper limit on the percentage of A+ grades.
- NTU uses a letter grade system for assessment. The grade percentage ranges and the single-subject grade conversion table in the NATIONAL TAIWAN UNIVERSITY Regulations Governing Academic Grading are for reference only. Instructors may adjust the percentage ranges according to the grade definitions. For more information, see the Assessment for Learning Section.
|
|
Week |
Date |
Topic |
|
Week 1 |
2/20 |
R course Day 1
a. Course introduction
b. R, RStudio, and R package installation
c. Data import, data frames, tibbles
|
|
Week 2 |
2/27 |
Application Day 1
Real dataset: Health data
a. Data import using R
b. Data structure using R
c. R objects and data types
d. R vectors and matrices
e. Recoding categorical data using R
f. Missing data using R
g. Data cleaning using R
|
|
Week 3 |
3/06 |
R course Day 2
Exploratory data analysis using R
a. R subsetting
b. R functions
c. R loops
d. Descriptive statistics using R
e. Summary statistics using R
f. R base plots
|
|
Week 4 |
3/13 |
Application Day 2
Health data: Categorical variables
a. Data cleaning using R
b. Missing data using R
c. Data manipulation using R
d. Stratification (by categories) using R
e. Subsetting (desired columns and rows) using R
f. Descriptive statistics and summary statistics using R
g. Visualize categorical data using R: bar plots, pie charts, dot plots
|
|
Week 5 |
3/20 |
Application Day 3
Health data: Continuous (numeric) variables
a. Data cleaning using R
b. Missing data removal/imputation using R
c. Data manipulation using R
d. Subsetting using R
e. Descriptive statistics and summary statistics using R
f. Visualize continuous data using R: histograms, box plots
|
|
Week 6 |
3/27 |
Review week: mock exercise
Write R code on a practice dataset:
a. Data import
b. Data cleaning
c. Data structure
d. Identify numerical and categorical data
e. Create subsets
f. Descriptive/summary statistics
g. Visualization
|
|
Week 7 |
4/03 |
Midterm practical exam
Use R to analyze real health data
1. Compare subgroups: categorical variables
2. Compare subgroups: continuous variables
3. Descriptive analysis
4. Visualizations
|
|
Week 8 |
4/10 |
R course Day 3
a. Regression using R: logistic, linear, Cox proportional hazards
b. Parametric tests using R
c. Nonparametric tests using R
d. ANOVA tests using R to compare more than two groups
e. Visualization using R (base R plots, ggplot2): scatterplots, correlation plots
|
|
Week 9 |
4/17 |
Application Day 4
Dataset: Health data - Discrete variables
Statistical tests using R
a. Create contingency tables
b. Fisher's exact test, chi-square test, rank tests, Kruskal-Wallis test
c. One-sided and two-sided tests
|
|
Week 10 |
4/24 |
Application Day 5
Dataset: Health data - Continuous variables
Statistical tests using R
a. Normality testing: Shapiro-Wilk test
b. Equality of variance test: Bartlett's test
c. Proportion test, z-test, t-test, Mann-Whitney U test (Wilcoxon rank-sum test)
d. One-sided and two-sided tests
|
|
Week 11 |
5/01 |
Application Day 6
Dataset: Health dataset
a. Linear regression
b. Variable selection strategies using R
c. Correlation analysis and correlation test using R
d. Data fitting and visualization (scatterplots), correlation plots using R
|
|
Week 12 |
5/08 |
Application Day 7
Dataset: Health dataset
a. Association analysis, Logistic regression
b. Variable selection strategies using R
c. Correlation analysis and correlation test using R
d. Data fitting and visualization (scatterplots), correlation plots using R
|
|
Week 13 |
5/15 |
R course Day 4
Survival data analysis using R
a. Pre-processing
b. Kaplan-Meier analysis using R
c. Cox proportional hazards regression using R
d. Univariate and multivariate analysis using R
e. Model performance metrics using R
f. Cross-validation using R, R loops
|
|
Week 14 |
5/22 |
Application Day 8
Survival data analysis using R
a. Kaplan-Meier analysis
b. Cox proportional hazards regression analysis
c. Kaplan-Meier (KM) plots, forest plots
d. Discriminant analysis
e. Cross-validation performance analysis
|
|
Week 15 |
5/29 |
Review week: mock exercise
Use example datasets to do the following
a. Statistical tests using R (discrete and continuous)
b. Regression analysis using R
c. Survival analysis using R
|
|
Week 16 |
6/05 |
Final Practical Exam
End-to-end data analysis (health data or survival data)
|
|