Exam 3

# Introduction

One of the foundational theories of modern macroeconomics is real business cycle theory, in which the business cycle results from the economy reacting to external (exogenous) forces rather than to internal economic ones. We investigate the relationship between productivity, measured as output per hour worked in non-financial firms, and four measures of the economy: GDP, consumption (the value of goods), investment spending, and total hours worked. To this end, we attempt to answer seven specific questions that bear on the theory. We use quarterly economic data for the United States from 1947 to 2016, in a dataset we refer to as macro.

## Notes

For the purposes of this paper, two rows containing NAs have been removed from the dataset, since we could neither predict from them nor use them in many of our functions. In our bootstrap, the linear and generalized additive models have each been bootstrapped one thousand times, while the kernel regressions have been bootstrapped only five times because of their long run time. All models use a block size of 24 for block resampling, following the professor's hint.
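Block resampling keeps runs of consecutive quarters together, so each bootstrap sample preserves the short-range dependence of the series. The report's `mse.bootstrap()` helper is not shown, so the functions below (`block_resample()`, `mse_block_bootstrap()`) are hypothetical: a minimal base-R sketch, assuming a moving-block bootstrap in which each replicate fits a linear model to a block-resampled copy of the data and is scored on the original rows.

```r
# Hypothetical sketch of a moving-block bootstrap estimate of prediction MSE.
# This is NOT the authors' mse.bootstrap(); the names and design are our own.

# Row indices for one block-resampled series of length n: choose random
# block start points, take block_size consecutive rows from each start,
# and trim the concatenated indices back to n rows.
block_resample <- function(n, block_size) {
  n_blocks <- ceiling(n / block_size)
  starts <- sample.int(n - block_size + 1, n_blocks, replace = TRUE)
  idx <- unlist(lapply(starts, function(s) s:(s + block_size - 1)))
  idx[seq_len(n)]
}

# Average, over B replicates, of the MSE on the original rows of a linear
# model fit to a block-resampled copy of the data.
mse_block_bootstrap <- function(B, data, formula, block_size, response) {
  mses <- replicate(B, {
    boot_data <- data[block_resample(nrow(data), block_size), , drop = FALSE]
    fit <- lm(formula, data = boot_data)
    mean((data[[response]] - predict(fit, newdata = data))^2)
  })
  mean(mses)
}

# Usage on simulated data (not the macro dataset).
set.seed(1)
n <- 200
x <- cumsum(rnorm(n))                  # persistent, autocorrelated regressor
toy <- data.frame(x = x, y = 2 * x + rnorm(n, sd = 0.5))
mse <- mse_block_bootstrap(200, toy, y ~ x, block_size = 24, response = "y")
mse
```

Scoring each replicate on the original rows is only one of several reasonable choices; an out-of-block scheme would score each fit only on rows it did not see.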

# Exploratory Data Analysis

| Variable I | Variable J | Correlation |
|---|---|---|
| GDP | Investment | 0.7700481 |
| GDP | Hours | 0.3708821 |
| GDP | Productivity | 0.4772074 |
| Consumption | GDP | 0.9463065 |
| Consumption | Investment | 0.7642131 |
| Consumption | Hours | 0.3942924 |
| Consumption | Productivity | 0.4209672 |
| Investment | Productivity | 0.2792879 |
| Hours | Investment | 0.5852466 |
| Hours | Productivity | -0.4909803 |

## Correlations

Before modeling, we did a brief exploratory data analysis, paying special attention to the correlations between the variables. GDP and Consumption had the largest positive correlation (0.9463065), Hours and Productivity had the only negative correlation (-0.4909803), and Investment and Productivity had the smallest absolute correlation (0.2792879). Because Consumption and Investment (0.76) and Hours and Investment (0.59) are strongly correlated, we also test models with Consumption × Investment and Hours × Investment interaction terms.
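A pairwise table like the one above can be produced from a correlation matrix by keeping each pair once. The code that produced the report's table is not shown, so `pairwise_correlations()` below is a hypothetical base-R sketch, run on simulated stand-in data rather than macro; it also sorts pairs by absolute correlation, which makes the largest and smallest easy to read off.

```r
# Hypothetical sketch: list each pair of variables once with its correlation,
# sorted from strongest to weakest in absolute value.
pairwise_correlations <- function(df) {
  cm <- cor(df)
  keep <- which(upper.tri(cm), arr.ind = TRUE)  # upper triangle: each pair once
  out <- data.frame(I = rownames(cm)[keep[, "row"]],
                    J = colnames(cm)[keep[, "col"]],
                    Cor = cm[keep])
  out <- out[order(-abs(out$Cor)), ]
  rownames(out) <- NULL
  out
}

# Usage on simulated stand-in data (not the macro dataset).
set.seed(3)
a <- rnorm(100)
toy <- data.frame(A = a,
                  B = a + rnorm(100, sd = 0.3),  # strongly positive with A
                  C = -a + rnorm(100, sd = 2))   # weakly negative with A
pairs_tab <- pairwise_correlations(toy)
pairs_tab
```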

## Plot of GDP

The plot above shows GDP over time. GDP rises unsteadily until about 2005, after which it falls rapidly, below its original level. We believe this fall is associated with the Great Recession, the sharp economic decline of the late 2000s. Although the Great Recession is conventionally dated from 2007, we speculate that some of our variables may have signaled the decline before the official start of the recession.

# Initial Modeling of GDP

#Subset of data till 2005
macro.2005 <- macro[1:which(macro$X == "2005-09-30"), ]

gdp.b.mse <- data.frame(None = numeric(1), CI = numeric(1), HI = numeric(1),
Both = numeric(1), GAM = numeric(1), Kernel = numeric(1))

gdp.b.mse[1,1] <- signif(mse.bootstrap(1000, macro.2005, lm,
"GDP ~ Consumption + Investment + Hours + Productivity", 25), 5)

gdp.b.mse[1,2] <- signif(mse.bootstrap(1000, macro.2005, lm,
"GDP ~ Consumption + Investment + Hours + Productivity +
(Consumption * Investment)", 25), 5)

gdp.b.mse[1,3] <- signif(mse.bootstrap(1000, macro.2005, lm,
"GDP ~ Consumption + Investment + Hours + Productivity +
(Hours * Investment)", 25), 5)

gdp.b.mse[1,4] <- signif(mse.bootstrap(1000, macro.2005, lm,
"GDP ~ Consumption + Investment + Hours + Productivity +
(Consumption * Investment) + (Hours * Investment)", 25), 5)

gdp.b.mse[1,5] <- signif(mse.bootstrap(1000, macro.2005, gam,
"GDP ~ s(Consumption) + s(Investment) + s(Hours) + s(Productivity)", 25), 5)

gdp.b.mse[1,6] <- signif(mse.bootstrap(5, macro.2005, npreg,
"GDP ~ Consumption + Investment + Hours + Productivity", 25), 5)

| LM (none) | LM (C×I) | LM (H×I) | LM (both) | GAM | Kernel |
|---|---|---|---|---|---|
| 0.0002148 | 0.0001981 | 0.000208 | 0.0001942 | 9.81e-05 | 7e-07 |

## Choosing a GDP Model

After this brief exploration, we began modeling GDP through 2005 using the other variables. The table above shows the bootstrapped mean squared errors of the six models we tested: four linear models (with no interaction term, the Consumption × Investment interaction, the Hours × Investment interaction, or both), a generalized additive model, and a kernel regression. Of the linear models, the one with both interaction terms fared best, but the generalized additive model and the kernel regression both fared considerably better. Although the kernel regression had the lowest error, its advantage over the generalized additive model is small in absolute terms (under 1e-04), and the additive model is far easier to interpret, so we chose the generalized additive model for this problem; its summary and partial response functions are below.
#Comparing Models
gdp.lm <- lm(GDP ~ (Consumption) + (Investment) + (Hours) + (Productivity) +
(Consumption * Investment) + (Hours * Investment),
data = macro.2005)

gdp.gam <- gam(GDP ~ s(Consumption) + s(Investment) + s(Hours) + s(Productivity),
data = macro.2005)

gdp.kernel <- npreg(GDP ~ Consumption + Investment + Hours + Productivity,
data = macro.2005,
tol=1e-3, ftol=1e-4)

| Term | EDF | Ref. df | F | p-value |
|---|---|---|---|---|
| s(Consumption) | 6.320772 | 7.464546 | 91.743644 | 0.0000000 |
| s(Investment) | 3.850478 | 4.825630 | 8.122124 | 0.0000007 |
| s(Hours) | 4.489154 | 5.536719 | 2.104193 | 0.0705268 |
| s(Productivity) | 6.389897 | 7.515334 | 19.807413 | 0.0000000 |

## Summary of Chosen Model

The summary above and the partial response functions describe our chosen model, the generalized additive model. Consumption and Productivity both appear to have significant positive relationships with GDP. Investment is also significant but has a much smaller F statistic, and Hours is not significant at the 5% level (p ≈ 0.07). Although we were advised to predict the post-2005 data using only our chosen model, the differences between the mean squared errors were small enough that we predicted with three models; the results are below.

#Predicting new data
#Great Recession data
rec.years <- (which(macro$X == "2005-12-31") + 1):nrow(macro)

#GDP predictors
gdp.pred.mse <- data.frame(LM = numeric(1),
GAM = numeric(1),
Kernel = numeric(1))

#LM predictions
gdp.pred.mse[1,1] <- signif(
pred.bootstrap(
1000,
macro.2005,
macro[rec.years,],
lm,
"GDP ~
(Consumption) +
(Investment) +
(Hours) +
(Productivity) +
(Consumption * Investment) +
(Hours * Investment)",
25,
"GDP"),5)

#GAM predictions
gdp.pred.mse[1,2] <- signif(
pred.bootstrap(
1000,
macro.2005,
macro[rec.years,],
gam,
"GDP ~
s(Consumption) +
s(Investment) +
s(Hours) +
s(Productivity)",
25,
"GDP"),5)

#Kernel model predictions
gdp.pred.mse[1,3] <- signif(
pred.bootstrap(
5,
macro.2005,
macro[rec.years,],
npreg,
"GDP ~
(Consumption) +
(Investment) +
(Hours) +
(Productivity)",
25,
"GDP"),5)
| LM | GAM | Kernel |
|---|---|---|
| 0.0006498 | 0.0029124 | 0.0012088 |

## Model Predictions

As noted above, we predicted the post-2005 data with three models: the best of our linear models, our generalized additive model, and our kernel regression. The table above shows their bootstrapped mean squared errors. Most striking is the predictive power of the linear model: although it performed worst of the three on the pre-2005 data, it had the smallest mean squared error on the post-2005 data. We speculate that the relatively simple linear model best weathered the drastic change brought by the recession, while the more flexible models suffered from the sudden change.
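The post-2005 predictions rely on a chronological split: models are fit only to rows up to the cutoff date and scored on later rows, never on a random mix of periods. The sketch below shows a single such split in base R; `out_of_sample_mse()` and the simulated quarterly data are hypothetical, and unlike the report's `pred.bootstrap()` it performs no bootstrapping.

```r
# Hypothetical sketch of a single chronological (train-then-test) split:
# fit on rows dated up to `cutoff`, then score predictions on later rows.
out_of_sample_mse <- function(data, formula, date_col, cutoff, response) {
  train <- data[data[[date_col]] <= cutoff, , drop = FALSE]
  test  <- data[data[[date_col]] >  cutoff, , drop = FALSE]
  fit <- lm(formula, data = train)
  mean((test[[response]] - predict(fit, newdata = test))^2)
}

# Usage on simulated quarterly data (not the macro dataset).
set.seed(4)
toy <- data.frame(date = seq(as.Date("1990-01-01"), by = "quarter", length.out = 80),
                  x = rnorm(80))
toy$y <- 1 + 3 * toy$x + rnorm(80, sd = 0.1)
oos <- out_of_sample_mse(toy, y ~ x, "date", as.Date("2005-07-01"), "y")
oos
```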

# GDP Model at $t - 1$ Without Productivity
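Modeling at $t - 1$ requires pairing each quarter's values with the previous quarter's. The report does this with `design.matrix.from.ts()`; the function below, `lag_one()`, is a hypothetical base-R alternative written only to make the direction of the lag explicit. In this sketch, `.lag1` columns hold the earlier quarter and `.lag0` columns the later one, so a model that predicts the later quarter from the earlier one puts a `.lag0` column on the left-hand side.

```r
# Hypothetical base-R sketch (not the report's design.matrix.from.ts()):
# pair each quarter with the quarter before it. Convention in THIS sketch:
# ".lag1" columns hold the earlier quarter, ".lag0" columns the later one.
lag_one <- function(df) {
  n <- nrow(df)
  older <- df[1:(n - 1), , drop = FALSE]  # quarters 1 .. n-1
  newer <- df[2:n, , drop = FALSE]        # quarters 2 .. n
  names(older) <- paste0(names(df), ".lag1")
  names(newer) <- paste0(names(df), ".lag0")
  out <- cbind(older, newer)
  rownames(out) <- NULL
  out
}

# Usage on a tiny made-up series.
toy <- data.frame(GDP = c(1.0, 1.2, 1.1, 1.4), Hours = c(10, 11, 12, 13))
toy_lag <- lag_one(toy)
toy_lag
# Under this convention, a model of the later quarter on the earlier one
# would be written GDP.lag0 ~ GDP.lag1 + Hours.lag1.
```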

#lagged data
macro.lag <- data.frame(apply(macro, 2, design.matrix.from.ts, 1, right.older = FALSE))
macro.lag <- data.frame(cbind(macro.lag[,c(1:2)],
apply(macro.lag[,c(3:12)], 2,
function(x) as.numeric(as.character(x)))))
#2005 data
macro.2005.lag <- macro.lag[1:which(macro.lag$X.lag1 == "2005-09-30"), ]

t1.noPro.b.mse <- data.frame(None = numeric(1), CI = numeric(1), HI = numeric(1),
Both = numeric(1), GAM = numeric(1), Kernel = numeric(1))

t1.noPro.b.mse[1,1] <- signif(mse.bootstrap(1000, macro.2005.lag, lm,
"GDP.lag1 ~ GDP.lag0 + Consumption.lag0 + Investment.lag0 + Hours.lag0", 25), 5)

t1.noPro.b.mse[1,2] <- signif(mse.bootstrap(1000, macro.2005.lag, lm,
"GDP.lag1 ~ GDP.lag0 + Consumption.lag0 + Investment.lag0 + Hours.lag0 +
(Consumption.lag0 * Investment.lag0)", 25), 5)

t1.noPro.b.mse[1,3] <- signif(mse.bootstrap(1000, macro.2005.lag, lm,
"GDP.lag1 ~ GDP.lag0 + Consumption.lag0 + Investment.lag0 + Hours.lag0 +
(Hours.lag0 * Investment.lag0)", 25), 5)

t1.noPro.b.mse[1,4] <- signif(mse.bootstrap(1000, macro.2005.lag, lm,
"GDP.lag1 ~ GDP.lag0 + Consumption.lag0 + Investment.lag0 + Hours.lag0 +
(Consumption.lag0 * Investment.lag0) + (Hours.lag0 * Investment.lag0)", 25), 5)

t1.noPro.b.mse[1,5] <- signif(mse.bootstrap(1000, macro.2005.lag, gam,
"GDP.lag1 ~ s(GDP.lag0) + s(Consumption.lag0) + s(Investment.lag0) + s(Hours.lag0)", 25), 5)

t1.noPro.b.mse[1,6] <- signif(mse.bootstrap(5, macro.2005.lag, npreg,
"GDP.lag1 ~ GDP.lag0 + Consumption.lag0 + Investment.lag0 + Hours.lag0", 25), 5)

| LM (none) | LM (C×I) | LM (H×I) | LM (both) | GAM | Kernel |
|---|---|---|---|---|---|
| 7.99e-05 | 7.79e-05 | 7.97e-05 | 7.84e-05 | 6e-05 | 2.5e-06 |

## Choosing a $t - 1$ Model

Since our five variables are time series, we re-modeled GDP at time $t$ as a function of GDP, Consumption, Investment, and Hours at time $t - 1$ (every variable except Productivity). The table above shows the bootstrapped mean squared errors. The linear model with the Consumption × Investment interaction was the best linear model, but the generalized additive model and the kernel regression both had lower errors. As in the previous section, we chose to report and summarize the generalized additive model because of its flexibility and interpretability.
Since the three models' mean squared errors differ by less than 1e-04 in absolute terms, we predict the post-2005 data with all three.

| Term | EDF | Ref. df | F | p-value |
|---|---|---|---|---|
| s(GDP.lag0) | 1.000000 | 1.000000 | 1101.986130 | 0.0000000 |
| s(Consumption.lag0) | 3.126139 | 3.913003 | 1.066775 | 0.4141427 |
| s(Investment.lag0) | 1.958425 | 2.474079 | 15.161657 | 0.0000001 |
| s(Hours.lag0) | 2.189202 | 2.711123 | 3.862109 | 0.0113873 |

## Summary of Chosen Model

Above are the summary and the partial response functions of our generalized additive model. The partial response functions show a strong, significant linear relationship with GDP at $t - 1$, which is expected since we are predicting GDP from its own past. Investment has a slightly negative, significant relationship. Consumption and Hours have nearly flat partial responses; Consumption is not significant (p ≈ 0.41), although Hours is significant at the 5% level (p ≈ 0.011).

## Comparison of Partial Response Functions

Comparing the two chosen models leads to some interesting speculation. In our second model, Consumption is no longer a significant predictor and Investment is now a negative predictor. Because the two models differ substantially, we can only speculate about why the strength of the predictors has changed. Since GDP encompasses Consumption and Investment, the change in significance may result from including lagged GDP as a predictor. We will not speculate on the effect of removing Productivity until we have tested it separately.

#Predicting new data
#Lagged Recession years
lag.rec.years <- (which(macro.lag$X.lag1 == "2005-09-30") + 1):nrow(macro.lag)

#GDP predictors
t1.noPro.pred.mse <- data.frame(LM = numeric(1),
GAM = numeric(1),
Kernel = numeric(1))

#linear model predictions
t1.noPro.pred.mse[1,1] <- signif(
pred.bootstrap(
1,
macro.2005.lag,
macro.lag[lag.rec.years,],
lm,
"GDP.lag1 ~
(GDP.lag0) +
(Consumption.lag0) +
(Investment.lag0) +
(Hours.lag0) +
(Consumption.lag0 * Investment.lag0)",
25,
"GDP.lag1"),5)

#Gam predictions
t1.noPro.pred.mse[1,2] <- signif(
pred.bootstrap(
1000,
macro.2005.lag,
macro.lag[lag.rec.years,],
gam,
"GDP.lag1 ~
s(GDP.lag0) +
s(Consumption.lag0) +
s(Investment.lag0) +
s(Hours.lag0)",
25,
"GDP.lag1"),5)

#Kernel model predictions
t1.noPro.pred.mse[1,3] <- signif(
pred.bootstrap(
5,
macro.2005.lag,
macro.lag[lag.rec.years,],
npreg,
"GDP.lag1 ~
(GDP.lag0) +
(Consumption.lag0) +
(Investment.lag0) +
(Hours.lag0)",
25,
"GDP.lag1"),5)
| LM | GAM | Kernel |
|---|---|---|
| 8.09e-05 | 0.0013816 | 0.0042868 |

## Model Predictions

The table above shows the mean squared errors of our three chosen models' predictions of the post-2005 data. The linear model was far and away the best at predicting the post-2005 data. We continue to speculate that the simplicity of the linear model lets it predict better than the more flexible, more computationally intensive models.

## Comparison between Chosen Model 1 and Chosen Model 2

Having compared the partial response functions above, we next compared how each model type performed under the two specifications. Our linear and generalized additive models performed better under the time-series specification, both on the pre-2005 data and in predicting the post-2005 data, while our kernel regression performed better under the original specification. In the next section, we test a similar model with Productivity added, in the hope of finding a better model for predicting GDP.

# GDP Model at $t - 1$ With Productivity

t1.b.mse <- data.frame(None = numeric(1), CI = numeric(1), HI = numeric(1),
Both = numeric(1), GAM = numeric(1), Kernel = numeric(1))

t1.b.mse[1,1] <- signif(mse.bootstrap(1000, macro.2005.lag, lm,
"GDP.lag1 ~ GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0 +
Productivity.lag0", 25), 5)

t1.b.mse[1,2] <- signif(mse.bootstrap(1000, macro.2005.lag, lm,
"GDP.lag1 ~ GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0 +
Productivity.lag0 +
(Consumption.lag0 * Investment.lag0)", 25), 5)

t1.b.mse[1,3] <- signif(mse.bootstrap(1000, macro.2005.lag, lm,
"GDP.lag1 ~ GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0 +
Productivity.lag0 +
(Hours.lag0 * Investment.lag0)", 25), 5)

t1.b.mse[1,4] <- signif(mse.bootstrap(1000, macro.2005.lag, lm,
"GDP.lag1 ~ GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0 +
Productivity.lag0 +
(Consumption.lag0 * Investment.lag0) +
(Hours.lag0 * Investment.lag0)", 25), 5)

t1.b.mse[1,5] <- signif(mse.bootstrap(1000, macro.2005.lag, gam,
"GDP.lag1 ~ s(GDP.lag0) +
s(Consumption.lag0) +
s(Investment.lag0) +
s(Hours.lag0) +
s(Productivity.lag0)", 25), 5)

t1.b.mse[1,6] <- signif(mse.bootstrap(5, macro.2005.lag, npreg,
"GDP.lag1 ~ GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0 +
Productivity.lag0", 25), 5)
| LM (none) | LM (C×I) | LM (H×I) | LM (both) | GAM | Kernel |
|---|---|---|---|---|---|
| 7.75e-05 | 7.55e-05 | 7.68e-05 | 7.53e-05 | 5.04e-05 | 2.6e-06 |

## Choosing a Model

The table above gives the bootstrapped mean squared errors of our previous models with Productivity at time $t - 1$ added. The linear model with both interaction terms performed best among the linear models, while the generalized additive model and the kernel regression both performed better than any linear model. Because the three models' errors were close, we predicted the post-2005 data with all three, but we summarize only the generalized additive model, which had a lower error than any linear model while remaining interpretable and allows easy comparison with the two previously chosen models.

#Linear Model
t1.lm <- lm(GDP.lag1 ~
(GDP.lag0) +
(Consumption.lag0) +
(Investment.lag0) +
(Hours.lag0) +
(Productivity.lag0) +
(Consumption.lag0 * Investment.lag0) +
(Hours.lag0 * Investment.lag0),
data = macro.2005.lag)

#GAM
t1.gam <- gam(GDP.lag1 ~
s(GDP.lag0) +
s(Consumption.lag0) +
s(Investment.lag0) +
s(Hours.lag0) +
s(Productivity.lag0),
data = macro.2005.lag)

#Kernel
t1.kernel <- npreg(GDP.lag1 ~ GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0 +
Productivity.lag0,
data = macro.2005.lag,
tol=1e-3, ftol=1e-4)
| Term | EDF | Ref. df | F | p-value |
|---|---|---|---|---|
| s(GDP.lag0) | 1.000042 | 1.000082 | 883.7473465 | 0.0000000 |
| s(Consumption.lag0) | 1.000339 | 1.000663 | 1.1151825 | 0.2921059 |
| s(Investment.lag0) | 2.261376 | 2.867782 | 9.3160578 | 0.0000111 |
| s(Hours.lag0) | 1.004223 | 1.008275 | 0.1144802 | 0.7399933 |
| s(Productivity.lag0) | 7.063697 | 8.114147 | 2.8333840 | 0.0047328 |

## Summary of Chosen Model

Adding Productivity did not dramatically change the partial response functions. GDP at $t - 1$ remained, understandably, a positive and significant predictor, and Investment remained a slightly negative, significant predictor. Consumption and Hours were not significant and showed essentially flat partial responses; Hours, which had p ≈ 0.011 in the model without Productivity, now has p ≈ 0.74. The new variable, Productivity, was significant (p ≈ 0.005) but showed no clear positive or negative trend, and its partial response becomes less variable as Productivity increases. Because adding it changed so little, Productivity may add little to the model despite being significant.

#GDP predictors
t1.pred.mse <- data.frame(LM = numeric(1),
GAM = numeric(1),
Kernel = numeric(1))

#linear model predictions
t1.pred.mse[1,1] <- signif(
pred.bootstrap(
1000,
macro.2005.lag,
macro.lag[lag.rec.years,],
lm,
"GDP.lag1 ~
(GDP.lag0) +
(Consumption.lag0) +
(Investment.lag0) +
(Hours.lag0) +
(Productivity.lag0) +
(Consumption.lag0 * Investment.lag0) +
(Hours.lag0 * Investment.lag0)",
25,
"GDP.lag1"),5)

#Gam predictions
t1.pred.mse[1,2] <- signif(
pred.bootstrap(
1000,
macro.2005.lag,
macro.lag[lag.rec.years,],
gam,
"GDP.lag1 ~
s(GDP.lag0) +
s(Consumption.lag0) +
s(Investment.lag0) +
s(Hours.lag0) +
s(Productivity.lag0)",
25,
"GDP.lag1"),5)

#Kernel model predictions
t1.pred.mse[1,3] <- signif(
pred.bootstrap(
5,
macro.2005.lag,
macro.lag[lag.rec.years,],
npreg,
"GDP.lag1 ~
(GDP.lag0) +
(Consumption.lag0) +
(Investment.lag0) +
(Hours.lag0) +
(Productivity.lag0)",
25,
"GDP.lag1"),5)
| LM | GAM | Kernel |
|---|---|---|
| 0.0001118 | 0.0011308 | 0.0017418 |

## Model Predictions

As in our two previous sets of predictions, the linear model outperformed the generalized additive model and the kernel regression. However, the gap between the linear model and the other two models is much smaller in this set than in the previous one, possibly hinting at the role Productivity plays in the dataset.

## Comparison Between Chosen Models 1, 2, and 3

We compared the top-performing linear model, generalized additive model, and kernel regression from each of our three sets of models. On the pre-2005 data, the linear and generalized additive models with Productivity outperformed their counterparts without it, while the two kernel regressions were essentially tied (2.6e-06 versus 2.5e-06). On the post-2005 data, only the generalized additive model and the kernel regression with Productivity beat their counterparts. Of all our models, the original kernel regression performed best on the pre-2005 data, while the linear model without Productivity performed best on the post-2005 data. The generalized additive model was second best in every set on the pre-2005 data and in both time-series sets on the post-2005 data, so it involved smaller tradeoffs than either the linear models or the kernel regressions.

## Role of Productivity

Our results do not point to a clear role for Productivity. In our chosen generalized additive models, including Productivity lowered the bootstrapped mean squared error by about 16% on the pre-2005 data (from 6e-05 to 5.04e-05) and by about 18% on the post-2005 data (from 0.0013816 to 0.0011308). In the next few sections, we further investigate the role of Productivity in driving GDP.
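The size of Productivity's contribution can be read off the tables as a relative change in bootstrapped mean squared error. The short calculation below uses the generalized additive model values reported above for the $t - 1$ models without and with Productivity; `rel_change()` is a helper introduced here for illustration, and a negative value means that including Productivity lowered the error.

```r
# Relative change in bootstrapped MSE when Productivity is added, using the
# generalized additive model values from the tables above.
# Negative values mean including Productivity lowered the error.
rel_change <- function(without, with) (with - without) / without

pre_2005  <- rel_change(without = 6e-05,     with = 5.04e-05)
post_2005 <- rel_change(without = 0.0013816, with = 0.0011308)
round(100 * c(pre_2005 = pre_2005, post_2005 = post_2005), 1)
# pre_2005 -16.0, post_2005 -18.2
```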

#additive regressions looking at Productivity

gdp.t1.gam <- gam(GDP.lag1 ~
s(GDP.lag0) +
s(Consumption.lag0) +
s(Investment.lag0) +
s(Hours.lag0) +
s(Productivity.lag0),
data = macro.lag)

con.t1.gam <- gam(Consumption.lag1 ~
s(GDP.lag0) +
s(Consumption.lag0) +
s(Investment.lag0) +
s(Hours.lag0) +
s(Productivity.lag0),
data = macro.lag)

inv.t1.gam <- gam(Investment.lag1 ~
s(GDP.lag0) +
s(Consumption.lag0) +
s(Investment.lag0) +
s(Hours.lag0) +
s(Productivity.lag0),
data = macro.lag)

hou.t1.gam <- gam(Hours.lag1 ~
s(GDP.lag0) +
s(Consumption.lag0) +
s(Investment.lag0) +
s(Hours.lag0) +
s(Productivity.lag0),
data = macro.lag)

par(mfrow = c(2,2))
plot.gam(gdp.t1.gam, select = 5)
plot.gam(con.t1.gam, select = 5)
plot.gam(inv.t1.gam, select = 5)
plot.gam(hou.t1.gam, select = 5)

gdp.t1.p <- (data.frame((summary.gam(gdp.t1.gam, signif.stars = FALSE))["s.table"]))[5,]
con.t1.p <- (data.frame((summary.gam(con.t1.gam, signif.stars = FALSE))["s.table"]))[5,]
inv.t1.p <- (data.frame((summary.gam(inv.t1.gam, signif.stars = FALSE))["s.table"]))[5,]
hou.t1.p <- (data.frame((summary.gam(hou.t1.gam, signif.stars = FALSE))["s.table"]))[5,]

pro.p <- (rbind(gdp.t1.p, con.t1.p, inv.t1.p, hou.t1.p))
rownames(pro.p) <- c("GDP", "Consumption", "Investment", "Hours")

kable(pro.p, format = "html")
| Response | EDF | Ref. df | F | p-value |
|---|---|---|---|---|
| GDP | 5.734286 | 6.886623 | 2.584550 | 0.0144670 |
| Consumption | 3.246357 | 4.050438 | 6.642329 | 0.0000392 |
| Investment | 7.292083 | 8.285619 | 3.930789 | 0.0001941 |
| Hours | 5.388386 | 6.532995 | 1.847852 | 0.0843463 |

## Effect of Productivity

Above are the partial response functions and p-values for Productivity in the models of the other four variables. All four functions have a hump around $-0.05$, most pronounced in the partial response function for Investment. Productivity is a significant predictor of three of the four variables, all except Hours (p ≈ 0.084). We speculate that the insignificance of Productivity for Hours relates to their relatively strong negative correlation (-0.49). These results partly support the theory that Productivity is an exogenous variable that drives the other variables, since it is a significant predictor of three of the four; however, the partial response functions neither clearly support nor contradict the theory.

# Model of Productivity
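A first-order autoregressive model predicts Productivity at time $t$ from Productivity at time $t - 1$ alone. The sketch below illustrates this on a simulated series (`p`, `phi`, and the noise level are our own choices, not taken from macro): regressing each value on the previous one with `lm()` recovers roughly the same coefficient as `stats::arima()` with `order = c(1, 0, 0)`.

```r
# Sketch: a first-order autoregressive (AR(1)) model fit by regressing each
# value of a series on the value one step earlier. The series is simulated.
set.seed(5)
n <- 2000
phi <- 0.9                              # AR coefficient used in the simulation
p <- numeric(n)
for (i in 2:n) p[i] <- phi * p[i - 1] + rnorm(1)

ar_data <- data.frame(p_now = p[-1], p_prev = p[-n])
ar_fit <- lm(p_now ~ p_prev, data = ar_data)
phi_lm <- unname(coef(ar_fit)["p_prev"])  # close to phi
ar_mse <- mean(residuals(ar_fit)^2)       # close to the noise variance, 1

# stats::arima() fits the same AR(1) model by maximum likelihood.
phi_arima <- unname(arima(p, order = c(1, 0, 0))$coef["ar1"])
c(lm = phi_lm, arima = phi_arima)
```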

pro.b.mse <- data.frame(LM = numeric(1), GAM = numeric(1), Kernel = numeric(1))

pro.b.mse$LM <- mse.bootstrap(1000, macro.lag[,c(11:12)], lm,
"Productivity.lag1 ~ Productivity.lag0", 25)

pro.b.mse$GAM <- mse.bootstrap(1000, macro.lag[,c(11:12)], gam,
"Productivity.lag1 ~ Productivity.lag0", 25)

pro.b.mse$Kernel <- mse.bootstrap(5, macro.lag[,c(11:12)], npreg,
"Productivity.lag1 ~ Productivity.lag0", 25)

| LM | GAM | Kernel |
|---|---|---|
| 9.06e-05 | 8.97e-05 | 8.65e-05 |

## Comparison of Models

Above is a table of the bootstrapped mean squared errors of our first-order autoregressive models. We tested three models: a linear model, a generalized additive model, and a kernel regression. All three had mean squared errors within about 5% of each other, so we chose among them based on the properties of each model type. In keeping with the rest of this report, we used the generalized additive model because of its flexibility, ease of use, and interpretability. After identifying the best first-order autoregressive model, we tested and summarized models of Productivity that also use the other variables.

# Advanced Productivity Models

pro.full.b.mse <- data.frame(None = numeric(1), CI = numeric(1), HI = numeric(1),
Both = numeric(1), GAM = numeric(1), Kernel = numeric(1))

pro.full.b.mse[1,1] <- signif(mse.bootstrap(1000, macro.lag, lm,
"Productivity.lag1 ~ GDP.lag0 + Consumption.lag0 + Investment.lag0 +
Hours.lag0 + Productivity.lag0", 25), 5)

pro.full.b.mse[1,2] <- signif(mse.bootstrap(1000, macro.lag, lm,
"Productivity.lag1 ~ GDP.lag0 + Consumption.lag0 + Investment.lag0 +
Hours.lag0 + Productivity.lag0 +
(Consumption.lag0 * Investment.lag0)", 25), 5)

pro.full.b.mse[1,3] <- signif(mse.bootstrap(1000, macro.lag, lm,
"Productivity.lag1 ~ GDP.lag0 + Consumption.lag0 + Investment.lag0 +
Hours.lag0 + Productivity.lag0 +
(Hours.lag0 * Investment.lag0)", 25), 5)

pro.full.b.mse[1,4] <- signif(mse.bootstrap(1000, macro.lag, lm,
"Productivity.lag1 ~ GDP.lag0 + Consumption.lag0 + Investment.lag0 +
Hours.lag0 + Productivity.lag0 +
(Consumption.lag0 * Investment.lag0) + (Hours.lag0 * Investment.lag0)", 25), 5)

pro.full.b.mse$GAM <- mse.bootstrap(1000, macro.lag, gam,
"Productivity.lag1 ~
s(GDP.lag0) +
s(Consumption.lag0) +
s(Investment.lag0) +
s(Hours.lag0) +
s(Productivity.lag0)", 25)

pro.full.b.mse$Kernel <- mse.bootstrap(5, macro.lag, npreg,
"Productivity.lag1 ~
GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0+
Productivity.lag0", 25)
kable(pro.full.b.mse, format = "html")
| LM (none) | LM (C×I) | LM (H×I) | LM (both) | GAM | Kernel |
|---|---|---|---|---|---|
| 8.19e-05 | 8.22e-05 | 8.24e-05 | 8.03e-05 | 5.05e-05 | 2.4e-06 |

## Comparison of Models

To predict Productivity, we tested four linear models, a generalized additive model, and a kernel regression. The kernel regression had the lowest bootstrapped mean squared error, $2.4309962\times 10^{-6}$; however, we chose the generalized additive model for ease of comparison with the previous model.

pro.full.gam <- gam(Productivity.lag1 ~
s(GDP.lag0) +
s(Consumption.lag0) +
s(Investment.lag0) +
s(Hours.lag0) +
s(Productivity.lag0), data = macro.lag)

#GAM summary
kable(data.frame((summary.gam(pro.full.gam, signif.stars = FALSE))["s.table"]))
| Term | EDF | Ref. df | F | p-value |
|---|---|---|---|---|
| s(GDP.lag0) | 7.151625 | 8.173708 | 4.443886 | 0.0000368 |
| s(Consumption.lag0) | 7.698939 | 8.534404 | 2.203031 | 0.0249533 |
| s(Investment.lag0) | 6.910767 | 8.022228 | 3.142824 | 0.0020650 |
| s(Hours.lag0) | 1.000119 | 1.000230 | 11.132536 | 0.0009735 |
| s(Productivity.lag0) | 1.624114 | 2.022864 | 319.697634 | 0.0000000 |
par(mfrow = c(2,2))
plot(pro.full.gam,pages=1,residuals=TRUE,all.terms=TRUE,shade=TRUE,shade.col=2)

## Summary of Chosen Model

Above are the summary and partial response functions of our chosen model, the generalized additive model. The summary indicates that all five variables are significant predictors at the 5% level, although Consumption is the weakest (p ≈ 0.025). The partial response functions suggest that GDP, Consumption, and Investment do not have strong linear relationships with Productivity at time $t$. Hours shows a strong negative linear relationship with Productivity, consistent with their negative correlation; this is interesting because Productivity was not a significant predictor in the earlier model with Hours as the response. Productivity at time $t - 1$ has, as expected, a strong linear relationship with Productivity at time $t$.

## Comparison of First-order Autoregressive and Productivity Models

Comparing our two chosen models, both generalized additive models, the model that includes the other variables has a bootstrapped mean squared error about 44% lower than the first-order autoregressive model (5.05e-05 versus 8.97e-05). Because lagged values of the other variables improve predictions of Productivity beyond what its own past provides, we are hesitant to call Productivity an exogenous variable; if anything, Productivity may be endogenous.
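The reasoning above treats improvement from other variables as evidence against exogeneity. The simulation below is our own illustration, using simulated series `z` and `y` rather than the macro data, of the pattern that reasoning relies on: for a series driven only by its own past, adding another variable's lag leaves out-of-sample error essentially unchanged, while for a series that depends on the other variable's lag, adding it lowers the error substantially. It uses linear models and a single chronological split, not the report's bootstrapped additive models.

```r
# Simulated illustration (our own; not the macro data): z is exogenous, driven
# only by its own past, while y depends on lagged z.
set.seed(6)
n <- 2000
z <- numeric(n)
y <- numeric(n)
for (i in 2:n) {
  z[i] <- 0.9 * z[i - 1] + rnorm(1)
  y[i] <- 0.5 * y[i - 1] + 0.8 * z[i - 1] + rnorm(1)
}
d <- data.frame(z_now = z[-1], z_prev = z[-n], y_now = y[-1], y_prev = y[-n])
train <- d[1:1000, ]          # earlier half
test  <- d[1001:nrow(d), ]    # later half

# Out-of-sample MSE of a linear model fit on `train` and scored on `test`.
oos_mse <- function(formula, response) {
  fit <- lm(formula, data = train)
  mean((test[[response]] - predict(fit, newdata = test))^2)
}

# Ratio of MSE with the other variable's lag to MSE without it;
# values well below 1 mean the extra lagged variable helped.
ratio_z <- oos_mse(z_now ~ z_prev + y_prev, "z_now") / oos_mse(z_now ~ z_prev, "z_now")
ratio_y <- oos_mse(y_now ~ y_prev + z_prev, "y_now") / oos_mse(y_now ~ y_prev, "y_now")
c(ratio_z = ratio_z, ratio_y = ratio_y)
```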

# Conclusion

## Proposition

The researchers whose dataset we used and whose questions we sought to answer propose that “Exogenous changes in Productivity are the main driver of the macroeconomic fluctuations.” Based on the results presented above, and in the absence of further evidence to the contrary, we reject this proposition. Our first time-series models showed only modest differences between models that included Productivity and models that did not. In the four additive models of the other variables, Productivity was a significant predictor of three, but its partial response functions showed no clear trends. Finally, our models of Productivity predicted it noticeably better when they used all variables than when they used Productivity's own past alone, suggesting that Productivity responds to the other variables. Based on this evidence, we conclude that Productivity does not behave as an exogenous variable in our dataset and is therefore not the main driver of macroeconomic fluctuations.