ISLR Ch4 Exercises #5, #13
- We now examine the differences between LDA and QDA.
\((a)\) If the Bayes decision boundary is linear, do we expect LDA or QDA to perform better on the training set? On the test set?
On the training set, QDA is expected to perform at least as well as LDA: its additional flexibility lets it track the training observations more closely even when the true boundary is linear.
On the test set, LDA is expected to perform better, because when the Bayes decision boundary is linear QDA's extra flexibility increases variance without any offsetting reduction in bias.
\((b)\) If the Bayes decision boundary is non-linear, do we expect LDA or QDA to perform better on the training set? On the test set?
On the training set, QDA will perform better than LDA, since the more flexible fit tracks the non-linear boundary (and any noise) more closely.
On the test set, QDA will generally perform better as well, provided there are enough observations to keep its variance in check.
\((c)\) In general, as the sample size n increases, do we expect the test prediction accuracy of QDA relative to LDA to improve, decline, or be unchanged? Why?
We expect the test prediction accuracy of QDA relative to LDA to improve. QDA is the more flexible model, and once n becomes large enough the variance of the classifier is no longer a major concern, so QDA's lower bias pays off. Moreover, with many classes K, the assumption of a single covariance matrix shared by all K classes becomes increasingly unrealistic.
\((d)\) True or False: Even if the Bayes decision boundary for a given problem is linear, we will probably achieve a superior test error rate using QDA rather than LDA because QDA is flexible enough to model a linear decision boundary. Justify your answer.
False. With a linear Bayes decision boundary and a finite training set, QDA's extra flexibility buys nothing in bias but increases variance, so LDA will typically achieve a lower test error rate. Only with a very large number of observations, where QDA's variance penalty becomes negligible, would QDA be expected to match LDA.
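The points in (a)-(d) can be checked empirically. Below is a minimal simulation sketch (the function name, sample sizes, and covariance settings are my own assumptions, not part of the exercise) that compares LDA and QDA test error when the two classes share a covariance matrix (linear Bayes boundary) versus when they do not (non-linear boundary):

```r
library(MASS)  # lda(), qda(), mvrnorm()

set.seed(1)
# simulate two Gaussian classes and return LDA/QDA test errors
sim.errors <- function(sigma0, sigma1, n = 200) {
  x0 <- mvrnorm(n, mu = c(0, 0), Sigma = sigma0)
  x1 <- mvrnorm(n, mu = c(1, 1), Sigma = sigma1)
  dat <- data.frame(rbind(x0, x1), y = factor(rep(0:1, each = n)))
  train <- sample(nrow(dat), nrow(dat) / 2)
  err <- function(fit, d) mean(predict(fit, d)$class != d$y)
  l <- lda(y ~ ., data = dat, subset = train)
  q <- qda(y ~ ., data = dat, subset = train)
  c(lda.test = err(l, dat[-train, ]), qda.test = err(q, dat[-train, ]))
}

common    <- matrix(c(1, 0.3, 0.3, 1), 2)
different <- matrix(c(1, -0.5, -0.5, 1), 2)
sim.errors(common, common)     # shared covariance: LDA usually wins on test error
sim.errors(common, different)  # class-specific covariances: QDA usually wins
```

On repeated runs, the first call tends to favor LDA and the second QDA, matching the bias-variance reasoning above; any single seed can of course go the other way.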
- Using the Boston data set, fit classification models in order to predict whether a given suburb has a crime rate above or below the median. Explore logistic regression, LDA, and KNN models using various subsets of the predictors. Describe your findings.
library(magrittr)  # provides %>% and %$%

Boston <- MASS::Boston %>% as.data.frame()
# code crim as 1 if at or above the median crime rate, 0 otherwise
crim <- Boston %$% ifelse(crim < median(crim), 0, 1)
Boston$crim.rate <- crim
# alternate rows: odd rows become the training set, even rows the test set
X <- split(Boston, rep(1:2, nrow(Boston) / 2))
Train <- as.data.frame(X[[1]])
Test <- as.data.frame(X[[2]])
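The alternating-row split above is deterministic; a sketch of an equivalent construction plus a more conventional random 50/50 split (the `.r` variable names are mine) for comparison:

```r
library(MASS)  # Boston data

Boston <- MASS::Boston
Boston$crim.rate <- as.integer(Boston$crim >= median(Boston$crim))

# deterministic alternating split: odd rows train, even rows test
Train <- Boston[seq(1, nrow(Boston), by = 2), ]
Test  <- Boston[seq(2, nrow(Boston), by = 2), ]

# conventional random 50/50 split
set.seed(1)
idx <- sample(nrow(Boston), nrow(Boston) / 2)
Train.r <- Boston[idx, ]
Test.r  <- Boston[-idx, ]
```

A random split avoids any risk that the row order of the data set is systematic (e.g., towns listed by region), at the cost of results depending on the seed.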
Logistic Regression
fit.glm <- glm(crim.rate ~ . - crim,
               data = Train,
               family = binomial)
probs <- predict(fit.glm, Test, type = "response")
pred.glm <- ifelse(probs < .5, 0,1)
table(pred.glm, Test$crim.rate)
##
## pred.glm 0 1
## 0 116 11
## 1 12 114
mean(pred.glm != Test$crim.rate)
## [1] 0.09090909
Logistic regression on the Boston data set, using every other row for training and the rest for testing and all predictors, resulted in a test error rate of 9.09%.
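One way to pick a smaller predictor set is to inspect the full model's coefficient table and keep predictors with small p-values. A self-contained sketch (the `coefs`/`vars` names are my own; the 0.05 cutoff is an arbitrary assumption):

```r
library(MASS)  # Boston data

Boston <- MASS::Boston
Boston$crim.rate <- as.integer(Boston$crim >= median(Boston$crim))
Train <- Boston[seq(1, nrow(Boston), by = 2), ]  # odd rows, as above

fit <- glm(crim.rate ~ . - crim, data = Train, family = binomial)
# coefficient matrix: estimate, std. error, z value, Pr(>|z|)
coefs <- summary(fit)$coefficients
# predictors whose Wald test p-value falls below 0.05
vars <- rownames(coefs)[coefs[, "Pr(>|z|)"] < 0.05]
vars
```

This is only a heuristic screen; p-values from a single logistic fit do not account for collinearity among the Boston predictors.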
fit.glm <- glm(crim.rate ~ indus + age + dis + rad + ptratio + black + nox + indus*age,
               data = Train,
               family = binomial)
probs <- predict(fit.glm, Test, type = "response")
pred.glm <- ifelse(probs < .5, 0,1)
table(pred.glm, Test$crim.rate)
##
## pred.glm 0 1
## 0 117 9
## 1 11 116
mean(pred.glm != Test$crim.rate)
## [1] 0.07905138
A second logistic regression model using indus, age, dis, rad, ptratio, black, nox, and the interaction indus*age resulted in a test error rate of 7.91%.
- indus: proportion of non-retail business acres per town.
- age: proportion of owner-occupied units built prior to 1940.
- dis: weighted mean of distances to five Boston employment centres.
- rad: index of accessibility to radial highways.
- ptratio: pupil-teacher ratio by town.
- black: \(1000(Bk - 0.63)^2\) where \(Bk\) is the proportion of blacks by town.
- nox: nitrogen oxides concentration (parts per 10 million).
- indus*age: the interaction between indus and age.
LDA
lda.fit <- MASS::lda(crim.rate ~ . - crim,
                     data = Train)
probs <- predict(lda.fit, Test)
table(probs$class, Test$crim.rate)
##
## 0 1
## 0 124 28
## 1 4 97
mean(probs$class != Test$crim.rate)
## [1] 0.1264822
LDA on the Boston data set, using the same split and all predictors, resulted in a test error rate of 12.65%.
lda.fit <- MASS::lda(crim.rate ~ indus + age + dis + rad + ptratio + black + nox + indus*age,
                     data = Train)
probs <- predict(lda.fit, Test)
table(probs$class, Test$crim.rate)
##
## 0 1
## 0 122 29
## 1 6 96
mean(probs$class != Test$crim.rate)
## [1] 0.1383399
A second LDA on the Boston data set using the same predictors as the second logistic regression model resulted in a test error rate of 13.83%.
KNN
set.seed(1)
best <- .Machine$integer.max
best.i <- 0
worst <- 0
worst.i <- 0
for(i in 1:253){
  # drop the response (and the raw crim column) from the feature matrices
  vars <- !(names(Train) %in% c("crim", "crim.rate"))
  pred <- class::knn(Train[, vars], Test[, vars], Train$crim.rate, k = i)
  # compare predictions against the *test* labels, not the training labels
  error <- 100 * mean(pred != Test$crim.rate)
  if(error < 10) {
    cat(sprintf("<h5>K == %d</h5>", i))
    cat(pander(table(pred, Test$crim.rate)))  # requires the pander package
    cat(sprintf("<p>When K == %d, KNN has a test error rate of %.2f%%</p>", i, error))
  }
if(error < best) {
best <- error
best.i <- i
}
if(error > worst) {
worst <- error
worst.i <- i
}
}
K == 1
| | 0 | 1 |
|---|---|---|
| 0 | 116 | 10 |
| 1 | 12 | 115 |
When K == 1, KNN has a test error rate of 8.30%
K == 3
| | 0 | 1 |
|---|---|---|
| 0 | 116 | 11 |
| 1 | 12 | 114 |
When K == 3, KNN has a test error rate of 9.49%
K == 4
| | 0 | 1 |
|---|---|---|
| 0 | 117 | 11 |
| 1 | 11 | 114 |
When K == 4, KNN has a test error rate of 9.88%

Overall, the reduced logistic regression model achieved the lowest reported test error rate (7.91%), KNN with small K was close behind (8.30% at K == 1), and both LDA fits trailed with test error rates above 12%.
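The loop above tracks the best and worst K but never reports them. A more compact self-contained sketch (variable names are mine; unlike the loop above, it also standardizes the predictors, which matters for a distance-based method like KNN):

```r
library(MASS)   # Boston data
library(class)  # knn()

Boston <- MASS::Boston
Boston$crim.rate <- factor(as.integer(Boston$crim >= median(Boston$crim)))
train.idx <- seq(1, nrow(Boston), by = 2)  # odd rows train, as above

# standardize predictors and drop the response columns from the feature matrix
X <- scale(Boston[, !(names(Boston) %in% c("crim", "crim.rate"))])

set.seed(1)
errs <- sapply(1:50, function(k) {
  pred <- knn(X[train.idx, ], X[-train.idx, ], Boston$crim.rate[train.idx], k = k)
  mean(pred != Boston$crim.rate[-train.idx])
})
which.min(errs)  # K with the lowest test error on this split
min(errs)        # that lowest test error rate
```

Because `knn()` breaks distance ties at random, the best K can shift slightly between seeds; the general pattern of small K performing well should be stable.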