Session 3
Correlation & Regression Models
Session Contents
  • Correlation Analysis
  • Regression Models
[Figure: example scatter plots showing positive correlation, negative correlation, and no correlation]
Pearson & Spearman Correlations in R

To compute correlations in R, use the cor() function. Pearson's correlation measures the strength of a linear relationship; Spearman's is rank-based and measures monotonic relationships.

# Simulate example data
set.seed(123)  # fix the random seed so results are reproducible
data <- data.frame(
    var1 = rnorm(100),  # Continuous variable
    var2 = rnorm(100)   # Another continuous variable
)

# Pearson correlation (linear relationship)
cor(data$var1, data$var2, method = "pearson")

# Spearman correlation (monotonic relationship)
cor(data$var1, data$var2, method = "spearman")
Interpreting Correlation Statistics in R

When computing correlations in R, three key statistics help interpret the results; cor.test() reports all of them, as shown in the sketch below:

  • Correlation coefficient (r / ρ): Measures the strength and direction of the relationship.
  • p-value: Tests whether the observed correlation differs significantly from zero.
  • Confidence interval: A range of plausible values for the true correlation (typically 95%).
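
A minimal sketch of how to obtain all three statistics at once, reusing the simulated var1 and var2 from above (cor.test() defaults to Pearson; pass method = "spearman" for the rank-based version, which does not return a confidence interval):

# Correlation test: coefficient, p-value, and 95% CI in one call
test <- cor.test(data$var1, data$var2, method = "pearson")
test$estimate   # correlation coefficient r
test$p.value    # p-value for H0: true correlation is 0
test$conf.int   # 95% confidence interval (Pearson only)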
Regression Models
Linear regression
Logistic regression
Simple & Adjusted Linear Regression in R

A simple linear regression includes one predictor, while an adjusted model controls for additional variables.

  • Simple Model: Outcome (Y) predicted by a single predictor (X).
  • Adjusted Model: Additional covariates (Z1, Z2) are included to account for confounding.
# Simulate example dataset
set.seed(123)  # for reproducible simulated data
data <- data.frame(
    Y  = rnorm(100, mean = 50, sd = 10),  # Outcome variable
    X  = rnorm(100, mean = 20, sd = 5),   # Main predictor
    Z1 = rnorm(100, mean = 10, sd = 3),   # Adjusting variable 1
    Z2 = rnorm(100, mean = 30, sd = 7)    # Adjusting variable 2
)

# Simple linear regression
model_simple <- lm(Y ~ X, data = data)
summary(model_simple)

# Adjusted linear regression
model_adjusted <- lm(Y ~ X + Z1 + Z2, data = data)
summary(model_adjusted)
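
To see what adjustment does, the estimate for X can be compared across the two models; a minimal sketch, assuming the model objects fitted above:

# Compare the coefficient of X before and after adjusting for Z1 and Z2
c(simple   = coef(model_simple)["X"],
  adjusted = coef(model_adjusted)["X"])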
Understanding Linear Regression Output

When running summary(lm()) in R, key statistics help interpret the model:

  • Estimate (β): Coefficient that quantifies the effect of the predictor.
  • Std. Error: Variability of the coefficient estimate.
  • p-value: Probability of seeing an estimate at least this extreme if the true coefficient were 0; small values indicate a statistically significant predictor.
  • R² (R-squared): Proportion of variance in Y explained by the model.
# Run a simple linear regression
model <- lm(Y ~ X, data = data)
summary(model)
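
These statistics can also be extracted programmatically; a minimal sketch, assuming the model object fitted above:

# Coefficient table: Estimate, Std. Error, t value, and p-value
coef(summary(model))

# 95% confidence intervals for the coefficients
confint(model)

# R-squared: proportion of variance in Y explained by the model
summary(model)$r.squared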
Logistic Regression in R

Logistic regression models the probability of a binary outcome using a predictor variable.

  • Outcome (Y): Binary variable (e.g., 1 = Success, 0 = Failure).
  • Predictor (X): Continuous or categorical variable affecting Y.
# Simulate example dataset
set.seed(123)  # for reproducible simulated data
data <- data.frame(
    Y = rbinom(100, 1, prob = 0.5),    # Binary outcome (0 or 1)
    X = rnorm(100, mean = 20, sd = 5)  # Predictor variable
)

# Logistic regression model
model_logistic <- glm(Y ~ X, data = data, family = binomial)
summary(model_logistic)
Understanding Logistic Regression Output

Running summary() on a fitted glm() model provides key statistics to interpret a logistic regression:

  • Estimate (β): Log-odds change for each unit increase in the predictor.
  • Std. Error: Variability of the coefficient estimate.
  • p-value: Tests whether the coefficient differs significantly from 0.
# Run a logistic regression
model_logistic <- glm(Y ~ X, data = data, family = binomial)
summary(model_logistic)
Differences Between Linear and Logistic Regression
Characteristic       | Linear Regression                  | Logistic Regression
Type of Y variable   | Continuous (e.g., blood pressure)  | Binary (e.g., disease: Yes/No)
Interpretation of β  | Change in Y per unit increase in X | Change in log-odds of Y per unit increase in X
Predictions          | Continuous values                  | Probabilities (between 0 and 1)
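
The last row can be verified directly in R; a minimal sketch, assuming the model_simple and model_logistic objects fitted earlier:

# Linear regression: fitted values on the scale of Y (continuous)
head(predict(model_simple))

# Logistic regression: predicted probabilities between 0 and 1
head(predict(model_logistic, type = "response"))

# Without type = "response", predict() returns the log-odds
head(predict(model_logistic))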
Interpretation of a Logistic Regression Model: Odds Ratios

Logistic regression coefficients are on the log-odds scale, so they are easier to report as odds ratios: exponentiating β gives the OR, and a 95% Wald confidence interval follows from exp(β ± 1.96 × SE).
# Fit logistic regression model
model <- glm(Y ~ X, data = data, family = binomial)

# Extract coefficients and standard errors
coef_table <- summary(model)$coefficients
beta <- coef_table["X", "Estimate"]  # Logistic regression coefficient
se <- coef_table["X", "Std. Error"]  # Standard error

# Convert to OR and 95% CI
OR <- exp(beta)
CI_lower <- exp(beta - 1.96 * se)
CI_upper <- exp(beta + 1.96 * se)

# Print results
c(OR = OR, CI_lower = CI_lower, CI_upper = CI_upper)
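
As a cross-check, confint.default() computes the same Wald interval (estimate ± qnorm(0.975) × SE) directly, which can then be exponentiated:

# Wald 95% CI for X via confint.default(), converted to the OR scale
exp(confint.default(model, "X"))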