ISLR Ch 3

Mar 5, 2018 10 min read r, ISLR

ISLR Ch3 Exercises #4, #15

I collect a set of data (`n = 100` observations) containing a single predictor and a quantitative response. I then fit a linear regression model to the data, as well as a separate cubic regression, i.e. \(Y = \beta_{0}+\beta_{1}X+\beta_{2}X^{2}+\beta_{3}X^{3}+\epsilon\).

\((a)\) Suppose that the true relationship between `X` and `Y` is `linear`, i.e. \(Y = \beta_0 + \beta_{1}X + \epsilon\). Consider the training residual sum of squares `(RSS)` for the linear regression, and also the training `RSS` for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Justify your answer.

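We would expect the training RSS to be lower for the cubic regression (or at worst equal). The linear model is a special case of the cubic model with \(\beta_{2} = \beta_{3} = 0\), so least squares on the more flexible cubic model can always fit the training data at least as closely.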

\((b)\) Answer \((a)\) using test rather than training RSS.

We would expect the test RSS to be lower for the linear regression. Since the true underlying relationship is linear, the extra terms in the cubic regression can only fit noise in the training set (overfitting), so it will tend to have a higher RSS on the test set.

\((c)\) Suppose that the true relationship between `X` and `Y` is not linear, but we don’t know how far it is from linear. Consider the training RSS for the linear regression, and also the training RSS for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Justify your answer.

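The cubic regression will have the lower training RSS. It is more flexible (it has more degrees of freedom), so it fits the training data at least as closely as the linear model regardless of the true relationship, and it can also capture some of the non-linearity.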

\((d)\) Answer \((c)\) using test rather than training RSS.

There is not enough information to tell. If the true relationship is far from linear, the cubic regression will likely have the lower test RSS. If it is close to linear, the cubic regression may overfit the training data, and the linear regression might strike a better line through the test data and have a lower test RSS.
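The four answers above can be checked with a small simulation. This is only a sketch under stated assumptions: the two "true" functions, the noise level, the range of `x`, the seed, and the helper names `rss` and `simulate_once` are all made up for illustration and are not part of the exercise.

```r
# Simulation sketch: compare training and test RSS for linear vs cubic fits.
# All settings below (true functions, noise sd, seed) are arbitrary assumptions.
set.seed(1)

# Residual sum of squares of a fitted model on a given data frame
rss <- function(fit, data) sum((data$y - predict(fit, newdata = data))^2)

simulate_once <- function(f, n = 100, sd = 1) {
  make <- function() {
    x <- runif(n, -2, 2)
    data.frame(x = x, y = f(x) + rnorm(n, sd = sd))
  }
  train <- make()
  test  <- make()
  fit.lin <- lm(y ~ x, data = train)
  fit.cub <- lm(y ~ poly(x, 3, raw = TRUE), data = train)
  c(train.lin = rss(fit.lin, train), train.cub = rss(fit.cub, train),
    test.lin  = rss(fit.lin, test),  test.cub  = rss(fit.cub, test))
}

# 200 repetitions for a linear truth and for a (hypothetical) non-linear truth
linear.truth    <- replicate(200, simulate_once(function(x) 1 + 2 * x))
nonlinear.truth <- replicate(200, simulate_once(function(x) 1 + 2 * x - x^3))

# Average RSS over the repetitions
print(rowMeans(linear.truth))
print(rowMeans(nonlinear.truth))
```

The training RSS of the cubic fit can never exceed that of the linear fit because the models are nested. The test RSS comparison is only a tendency, which is why the sketch averages over repeated draws.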

This problem involves the `Boston` data set, which we saw in the lab for this chapter. We will now try to predict `per capita crime rate` using the other variables in this data set. In other words, `per capita crime rate` is the `response`, and the other variables are the predictors.

Boston <- MASS::Boston

\((a)\) For each predictor, fit a simple linear regression model to predict the response. Describe your results. In which of the models is there a statistically significant association between the predictor and the response? Create some plots to back up your assertions.

library(magrittr)  # provides the %$% exposition pipe, which exposes Boston's columns to lm()
lm.age     <- Boston %$% lm(crim~age)
lm.black   <- Boston %$% lm(crim~black)
lm.chas    <- Boston %$% lm(crim~chas)
lm.dis     <- Boston %$% lm(crim~dis)
lm.indus   <- Boston %$% lm(crim~indus)
lm.lstat   <- Boston %$% lm(crim~lstat)
lm.medv    <- Boston %$% lm(crim~medv)
lm.nox     <- Boston %$% lm(crim~nox)
lm.ptratio <- Boston %$% lm(crim~ptratio)
lm.rad     <- Boston %$% lm(crim~rad)
lm.rm      <- Boston %$% lm(crim~rm)
lm.tax     <- Boston %$% lm(crim~tax)
lm.zn      <- Boston %$% lm(crim~zn)

The following predictors had a statistically significant association with crim: age, black, dis, lstat, medv, ptratio, rad, rm, tax, zn.
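One way to check the list above without reading thirteen summaries by hand is to loop over the predictors and collect the slope p-value from each simple regression. This is a sketch using base R only; the names `predictors` and `p.values` and the 0.05 cutoff are assumptions made for illustration.

```r
# Sketch: slope p-value from a simple linear regression of crim on each predictor
Boston <- MASS::Boston
predictors <- setdiff(names(Boston), "crim")

p.values <- sapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "crim"), data = Boston)
  # Row 1 is the intercept, row 2 is the slope of the single predictor
  summary(fit)$coefficients[2, "Pr(>|t|)"]
})

# Smallest p-values first, then the predictors below the assumed 0.05 cutoff
print(sort(p.values))
print(names(p.values)[p.values < 0.05])
```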

\((b)\) Fit a multiple regression model to predict the response using all of the predictors. Describe your results. For which predictors can we reject the null hypothesis \(H_0\) : \(\beta_j = 0\)?

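library(pander)  # pander() renders the table below
mult.reg.all <- lm(crim~., data=Boston)
# anova() reports sequential F-tests, in the order the predictors enter the model
pander(anova(mult.reg.all))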

Analysis of Variance Table
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
zn	1	1502	1502	36.21	3.457e-09
indus	1	4689	4689	113.1	6.469e-24
chas	1	247.8	247.8	5.976	0.01485
nox	1	1271	1271	30.65	5.041e-08
rm	1	138.5	138.5	3.341	0.0682
age	1	165.5	165.5	3.992	0.04628
dis	1	300.1	300.1	7.237	0.007383
rad	1	7238	7238	174.6	2.519e-34
tax	1	3.311	3.311	0.07984	0.7776
ptratio	1	7.281	7.281	0.1756	0.6754
black	1	455.3	455.3	10.98	0.000989
lstat	1	497.7	497.7	12	0.0005772
medv	1	447.9	447.9	10.8	0.001087
Residuals	492	20400	41.46	NA	NA

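At the 0.05 level, tax (p = 0.78) and ptratio (p = 0.68) are not significant, so we cannot reject the null hypothesis for them and they could be dropped. rm (p = 0.068) narrowly misses the cutoff. chas (p = 0.015) and age (p = 0.046) are significant only at the 0.05 level, and dis and medv fall short of *** significance (p < 0.001). The evidence is strongest for the remaining predictors: zn, indus, nox, rad, black and lstat. Note that anova() reports sequential F-tests, so these p-values depend on the order in which the predictors enter the model; the t-tests in summary(mult.reg.all) test each \(\beta_j = 0\) given all the other predictors.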
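The ANOVA table above tests the terms one after another. As a complementary check, the per-coefficient t-tests of \(H_0\) : \(\beta_j = 0\) can be read from `summary()`. A minimal sketch, where `coefs` and `signif.at.05` are illustrative names and 0.05 is an assumed cutoff:

```r
# Sketch: t-tests for each coefficient in the full multiple regression
Boston <- MASS::Boston
mult.reg.all <- lm(crim ~ ., data = Boston)

# Matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(mult.reg.all)$coefficients
print(round(coefs, 4))

# Predictors whose coefficient differs from zero at the assumed 0.05 level
signif.at.05 <- rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.05]
print(setdiff(signif.at.05, "(Intercept)"))
```

Unlike the sequential F-tests, these p-values do not change if the predictors are reordered.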

\((c)\) How do your results from \((a)\) compare to your results from \((b)\)? Create a plot displaying the univariate regression coefficients from \((a)\) on the `x-axis`, and the multiple regression coefficients from \((b)\) on the `y-axis`. That is, each predictor is displayed as a single point in the plot. Its coefficient in a simple linear regression model is shown on the `x-axis`, and its coefficient estimate in the multiple linear regression model is shown on the `y-axis`.

From the plot you can see that the coefficient for nox changes quite a lot, including its sign, from \(31.25\) in the univariate regression to \(-10.32\) in the multiple regression.
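The plot described above could be produced along these lines. This is a sketch in base R, not necessarily the code used for the original figure; `uni` and `multi` are illustrative names.

```r
# Sketch: univariate slope (x-axis) vs multiple regression coefficient (y-axis)
Boston <- MASS::Boston
predictors <- setdiff(names(Boston), "crim")

# Slope from each simple regression; unname() keeps the names equal to the predictors
uni <- sapply(predictors, function(v) {
  unname(coef(lm(reformulate(v, response = "crim"), data = Boston))[2])
})

# Coefficients from the full model, in the same order as `uni`
multi <- coef(lm(crim ~ ., data = Boston))[predictors]

plot(uni, multi,
     xlab = "Simple regression coefficient",
     ylab = "Multiple regression coefficient")
text(uni, multi, labels = predictors, pos = 3, cex = 0.7)
abline(0, 1, lty = 2)  # points on this line have equal coefficients in both models
```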

\((d)\) Is there evidence of non-linear association between any of the predictors and the response? To answer this question, for each predictor X, fit a model of the form \(Y = \beta_{0} + \beta_{1}{X} + \beta_{2}{X}^{2} + \beta_{3}{X}^{3} + \epsilon\).

In the correlation plot, the colors represent how strongly two predictors are related to each other. For example, when lstat is high medv is low, and when rad is high tax is also high. The more red a cell is, the more the two variables move in opposite directions, and the more green it is, the more they move together. Yellow indicates that there is almost no correlation between the two variables.
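A correlation plot with this color scheme can be sketched in base R as follows. The original figure may have been drawn with a different package; the red-yellow-green palette and the variable name `cors` are assumptions chosen to match the description.

```r
# Sketch: pairwise correlations of the Boston variables as a colored grid
Boston <- MASS::Boston
cors <- cor(Boston)

# The two pairs mentioned in the text
print(round(cors["lstat", "medv"], 2))
print(round(cors["rad", "tax"], 2))

# Red = strong negative, yellow = near zero, green = strong positive
palette.ryg <- colorRampPalette(c("red", "yellow", "green"))(50)
heatmap(cors, Rowv = NA, symm = TRUE, scale = "none",
        col = palette.ryg, zlim = c(-1, 1))
```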

\(age.\)

lm.age.d <- Boston %$% lm(crim~poly(age,3))
pander(summary(lm.age.d)$coefficients)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	3.614	0.3485	10.37	5.919e-23
poly(age, 3)1	68.18	7.84	8.697	4.879e-17
poly(age, 3)2	37.48	7.84	4.781	2.291e-06
poly(age, 3)3	21.35	7.84	2.724	0.00668

The cubic term is statistically significant at the 0.01 level (p = 0.0067), though not at the 0.001 level.

\(black.\)

lm.black.d <- Boston %$% lm(crim~poly(black,3))
pander(summary(lm.black.d)$coefficients)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	3.614	0.3536	10.22	2.14e-22
poly(black, 3)1	-74.43	7.955	-9.357	2.73e-19
poly(black, 3)2	5.926	7.955	0.745	0.4566
poly(black, 3)3	-4.835	7.955	-0.6078	0.5436

Neither the quadratic (p = 0.46) nor the cubic (p = 0.54) term is statistically significant.

\(chas.\)

lm.chas.d <- Boston %$% lm(crim~poly(chas,1))
pander(summary(lm.chas.d)$coefficients)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	3.614	0.3822	9.455	1.216e-19
poly(chas, 1)	-10.8	8.597	-1.257	0.2094

chas can only be evaluated to degree one because it is a boolean variable (1’s and 0’s), and it is not statistically significant (p = 0.21).

\(dis.\)

lm.dis.d <- Boston %$% lm(crim~poly(dis,3))
pander(summary(lm.dis.d)$coefficients)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	3.614	0.3259	11.09	1.06e-25
poly(dis, 3)1	-73.39	7.331	-10.01	1.253e-21
poly(dis, 3)2	56.37	7.331	7.689	7.87e-14
poly(dis, 3)3	-42.62	7.331	-5.814	1.089e-08

All three polynomial terms are statistically significant.

\(indus.\)

lm.indus.d <- Boston %$% lm(crim~poly(indus,3))
pander(summary(lm.indus.d)$coefficients)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	3.614	0.33	10.95	3.606e-25
poly(indus, 3)1	78.59	7.423	10.59	8.854e-24
poly(indus, 3)2	-24.39	7.423	-3.286	0.001086
poly(indus, 3)3	-54.13	7.423	-7.292	1.196e-12

The quadratic term is significant at the 0.01 level (p = 0.0011) but just misses the 0.001 level; the linear and cubic terms are highly significant.

\(lstat.\)

lm.lstat.d <- Boston %$% lm(crim~poly(lstat,3))
pander(summary(lm.lstat.d)$coefficients)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	3.614	0.3392	10.65	4.939e-24
poly(lstat, 3)1	88.07	7.629	11.54	1.678e-27
poly(lstat, 3)2	15.89	7.629	2.082	0.0378
poly(lstat, 3)3	-11.57	7.629	-1.517	0.1299

The cubic term is not statistically significant (p = 0.13), and the quadratic term is significant only at the 0.05 level (p = 0.038).

\(medv.\)

lm.medv.d <- Boston %$% lm(crim~poly(medv,3))
pander(summary(lm.medv.d)$coefficients)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	3.614	0.292	12.37	7.024e-31
poly(medv, 3)1	-75.06	6.569	-11.43	4.931e-27
poly(medv, 3)2	88.09	6.569	13.41	2.929e-35
poly(medv, 3)3	-48.03	6.569	-7.312	1.047e-12

All three polynomial terms are statistically significant.

\(nox.\)

lm.nox.d <- Boston %$% lm(crim~poly(nox,3))
pander(summary(lm.nox.d)$coefficients)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	3.614	0.3216	11.24	2.743e-26
poly(nox, 3)1	81.37	7.234	11.25	2.457e-26
poly(nox, 3)2	-28.83	7.234	-3.985	7.737e-05
poly(nox, 3)3	-60.36	7.234	-8.345	6.961e-16

All three polynomial terms are statistically significant.

\(ptratio.\)

lm.ptratio.d <- Boston %$% lm(crim~poly(ptratio,3))
pander(summary(lm.ptratio.d)$coefficients)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	3.614	0.361	10.01	1.271e-21
poly(ptratio, 3)1	56.05	8.122	6.901	1.565e-11
poly(ptratio, 3)2	24.77	8.122	3.05	0.002405
poly(ptratio, 3)3	-22.28	8.122	-2.743	0.006301

The quadratic (p = 0.0024) and cubic (p = 0.0063) terms are both significant at the 0.01 level, but not at the 0.001 level.

\(rad.\)

lm.rad.d <- Boston %$% lm(crim~poly(rad,3))
pander(summary(lm.rad.d)$coefficients)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	3.614	0.2971	12.16	5.15e-30
poly(rad, 3)1	120.9	6.682	18.09	1.053e-56
poly(rad, 3)2	17.49	6.682	2.618	0.009121
poly(rad, 3)3	4.698	6.682	0.7031	0.4823

The cubic term is not statistically significant (p = 0.48), and the quadratic term is significant at the 0.01 level (p = 0.0091), less strongly than for ptratio.

\(rm.\)

lm.rm.d <- Boston %$% lm(crim~poly(rm,3))
pander(summary(lm.rm.d)$coefficients)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	3.614	0.3703	9.758	1.027e-20
poly(rm, 3)1	-42.38	8.33	-5.088	5.128e-07
poly(rm, 3)2	26.58	8.33	3.191	0.001509
poly(rm, 3)3	-5.51	8.33	-0.6615	0.5086

The cubic term is not statistically significant (p = 0.51), and the quadratic term is significant at the 0.01 level (p = 0.0015) but just misses the 0.001 level.

\(tax.\)

lm.tax.d <- Boston %$% lm(crim~poly(tax,3))
pander(summary(lm.tax.d)$coefficients)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	3.614	0.3047	11.86	8.956e-29
poly(tax, 3)1	112.6	6.854	16.44	6.976e-49
poly(tax, 3)2	32.09	6.854	4.682	3.665e-06
poly(tax, 3)3	-7.997	6.854	-1.167	0.2439

The cubic term is not statistically significant (p = 0.24); the quadratic term is (p < 0.001).

\(zn.\)

lm.zn.d <- Boston %$% lm(crim~poly(zn,3))
pander(summary(lm.zn.d)$coefficients)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	3.614	0.3722	9.709	1.547e-20
poly(zn, 3)1	-38.75	8.372	-4.628	4.698e-06
poly(zn, 3)2	23.94	8.372	2.859	0.004421
poly(zn, 3)3	-10.07	8.372	-1.203	0.2295

The cubic term is not statistically significant (p = 0.23), and the quadratic term is significant at the 0.01 level (p = 0.0044).

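The strongest evidence of a non-linear relationship with crim is for dis, indus, medv and nox, whose cubic terms are significant at the 0.001 level. age and ptratio have cubic terms significant at the 0.01 level, and most of the remaining predictors have a significant quadratic term. black shows no evidence of non-linearity, and chas is binary, so it cannot be tested this way.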
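The thirteen fits above can also be summarized with one test per predictor: an F-test comparing the linear model against the cubic model, which asks whether the quadratic and cubic terms jointly improve the fit. A minimal sketch in base R, where `nonlinear.p` is an illustrative name and chas is excluded because it only takes two values:

```r
# Sketch: F-test of linear vs cubic fit for each non-binary predictor
Boston <- MASS::Boston
predictors <- setdiff(names(Boston), c("crim", "chas"))

nonlinear.p <- sapply(predictors, function(v) {
  linear <- lm(reformulate(v, response = "crim"), data = Boston)
  cubic  <- lm(reformulate(sprintf("poly(%s, 3)", v), response = "crim"),
               data = Boston)
  # Row 2 of the comparison holds the F-test for the added non-linear terms
  anova(linear, cubic)[2, "Pr(>F)"]
})

# Small p-values suggest the non-linear terms add explanatory power
print(sort(nonlinear.p))
```

A joint test avoids judging the quadratic and cubic terms separately, at the cost of not saying which of the two is responsible.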

homework R Markdown
