One of the foundational theories of modern macroeconomics is real business cycle theory, in which business cycles result from the economy reacting to real, external shocks rather than to monetary forces. We investigate the relationship between productivity, measured as output per hour worked for non-financial firms, and four measures of the economy: GDP, consumption, investment spending, and total hours worked. To this end, we will attempt to answer seven specific problems to best understand the theory. We will be using economic data from the United States from 1947 to 2016, in a dataset we will refer to as `macro`.

For the purpose of this paper, two rows have been removed from the dataset because they contained NAs. Those rows would not contribute to our models, as we would be unable to predict from them or to use them in many of our functions. For our bootstraps, the parametric models have been bootstrapped one thousand times, while the non-parametric models have been bootstrapped five times due to their lengthy run time. Both model types use a block size of 24 for block resampling, per the professor's hint.
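As a rough illustration of the resampling scheme, a moving-block resample can be sketched as below (a minimal sketch only; our actual `mse.bootstrap` and `pred.bootstrap` helpers are assumed to wrap this idea):

```r
# Minimal sketch of moving-block resampling with block size 24.
# Blocks of 24 consecutive rows are drawn with replacement and
# concatenated until the resample matches the original length.
block.resample <- function(data, block.size = 24) {
  n <- nrow(data)
  starts <- sample(seq_len(n - block.size + 1),
                   ceiling(n / block.size), replace = TRUE)
  rows <- unlist(lapply(starts, function(s) s:(s + block.size - 1)))
  data[rows[seq_len(n)], , drop = FALSE]  # trim to the original length
}
```

Resampling whole blocks preserves the short-run serial dependence within each block, which an ordinary row-wise bootstrap would destroy.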

Variable 1 | Variable 2 | Correlation |
---|---|---|
GDP | Investment | 0.7700481 |
GDP | Hours | 0.3708821 |
GDP | Productivity | 0.4772074 |
Consumption | GDP | 0.9463065 |
Consumption | Investment | 0.7642131 |
Consumption | Hours | 0.3942924 |
Consumption | Productivity | 0.4209672 |
Investment | Productivity | 0.2792879 |
Hours | Investment | 0.5852466 |
Hours | Productivity | -0.4909803 |

Before we began our modeling, we did some brief exploratory data analysis, paying special attention to the correlations between the variables. `GDP` and `Consumption` had the largest positive correlation at \(0.9463065\), while `Hours` and `Productivity` had the largest negative correlation at \(-0.4909803\). `Investment` and `Productivity` had the smallest absolute correlation at \(0.2792879\). Due to the high correlations of `Consumption` with `Investment` and of `Hours` with `Investment`, we will be testing models with those interaction terms.
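The pairwise correlations above can be reproduced along these lines (a sketch; the column names are assumed to match the table):

```r
# Correlation matrix for the five series in macro
round(cor(macro[, c("GDP", "Consumption", "Investment",
                    "Hours", "Productivity")]), 7)
```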

Above, we see `GDP` plotted over time. There is an unsteady rise in GDP until roughly 2005, after which GDP falls rapidly below its earlier levels. We believe this fall in GDP is associated with the "Great Recession," a period of steady decline in the late 2000s. While the Great Recession's traditional start date is 2007, we speculate that some of our variables may have indicated decline prior to the official recession.
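The time plot can be reproduced with something along these lines (a sketch; the `X` column is assumed to hold quarter-end dates as strings):

```r
# Plot GDP over time, marking the traditional recession start
plot(as.Date(macro$X), macro$GDP, type = "l",
     xlab = "Year", ylab = "GDP")
abline(v = as.Date("2007-12-01"), lty = 2)
```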

```
# Subset of the data through 2005
macro.2005 <- macro[1:which(macro$X == "2005-09-30"), ]
gdp.b.mse <- data.frame(None = numeric(1),
                        CI = numeric(1),
                        HI = numeric(1),
                        Both = numeric(1),
                        GAM = numeric(1),
                        Kernel = numeric(1))
gdp.b.mse[1,1] <- signif(mse.bootstrap(1000, macro.2005, lm,
                         "GDP ~ Consumption + Investment +
                          Hours + Productivity", 25), 5)
gdp.b.mse[1,2] <- signif(mse.bootstrap(1000, macro.2005, lm,
                         "GDP ~ Consumption + Investment +
                          Hours + Productivity +
                          (Consumption * Investment)", 25), 5)
gdp.b.mse[1,3] <- signif(mse.bootstrap(1000, macro.2005, lm,
                         "GDP ~ Consumption + Investment +
                          Hours + Productivity +
                          (Hours * Investment)", 25), 5)
gdp.b.mse[1,4] <- signif(mse.bootstrap(1000, macro.2005, lm,
                         "GDP ~ Consumption + Investment +
                          Hours + Productivity +
                          (Consumption * Investment) +
                          (Hours * Investment)", 25), 5)
gdp.b.mse[1,5] <- signif(mse.bootstrap(1000, macro.2005, gam,
                         "GDP ~ s(Consumption) + s(Investment) +
                          s(Hours) + s(Productivity)", 25), 5)
gdp.b.mse[1,6] <- signif(mse.bootstrap(5, macro.2005, npreg,
                         "GDP ~ Consumption + Investment +
                          Hours + Productivity", 25), 5)
```

None | CI | HI | Both | GAM | Kernel |
---|---|---|---|---|---|
0.0002148 | 0.0001981 | 0.000208 | 0.0001942 | 9.81e-05 | 7e-07 |

After our brief exploration of the data, we began modeling `GDP` through 2005 using our other variables. The table above shows the bootstrapped mean squared errors of the six models we tested: four linear models, a generalized additive model, and a kernel regression. Of our linear models, the model with both interaction terms fared the best. However, both the generalized additive model and the kernel regression fared considerably better. Since there is only a small difference in mean squared error between the generalized additive model and the kernel regression, we have chosen the generalized additive model as our model of choice for this problem; its summary and partial response functions can be seen below.

```
# Comparing models
gdp.lm <- lm(GDP ~ Consumption + Investment + Hours + Productivity +
               (Consumption * Investment) + (Hours * Investment),
             data = macro.2005)
gdp.gam <- gam(GDP ~ s(Consumption) + s(Investment) +
                 s(Hours) + s(Productivity),
               data = macro.2005)
gdp.kernel <- npreg(GDP ~ Consumption + Investment + Hours + Productivity,
                    data = macro.2005,
                    tol = 1e-3, ftol = 1e-4)
```

Term | edf | Ref.df | F | p-value |
---|---|---|---|---|
s(Consumption) | 6.320772 | 7.464546 | 91.743644 | 0.0000000 |
s(Investment) | 3.850478 | 4.825630 | 8.122124 | 0.0000007 |
s(Hours) | 4.489154 | 5.536719 | 2.104193 | 0.0705268 |
s(Productivity) | 6.389897 | 7.515334 | 19.807413 | 0.0000000 |

The summary and partial response functions of our chosen model, the generalized additive model, can be seen above. From them, we see that `Consumption` and `Productivity` both appear to have significant positive relationships with `GDP`, while `Investment` and `Hours` show weaker relationships and larger p-values (`Hours` is not significant). While it was recommended that we predict on the post-2005 data using only our chosen model, due to the marginal differences between the mean squared errors we have predicted with three models; the results can be seen below.

```
#Predicting new data
#Great Recession data
rec.years <- (which(macro$X == "2005-12-31") + 1):nrow(macro)
#GDP predictors
gdp.pred.mse <- data.frame(LM = numeric(1),
GAM = numeric(1),
Kernel = numeric(1))
#LM predictions
gdp.pred.mse[1,1] <- signif(
pred.bootstrap(
1000,
macro.2005,
macro[rec.years,],
lm,
"GDP ~
(Consumption) +
(Investment) +
(Hours) +
(Productivity) +
(Consumption * Investment) +
(Hours * Investment)",
25,
"GDP"),5)
#GAM predictions
gdp.pred.mse[1,2] <- signif(
pred.bootstrap(
1000,
macro.2005,
macro[rec.years,],
gam,
"GDP ~
s(Consumption) +
s(Investment) +
s(Hours) +
s(Productivity)",
25,
"GDP"),5)
#Kernel model predictions
gdp.pred.mse[1,3] <- signif(
pred.bootstrap(
5,
macro.2005,
macro[rec.years,],
npreg,
"GDP ~
(Consumption) +
(Investment) +
(Hours) +
(Productivity)",
25,
"GDP"),5)
```

LM | GAM | Kernel |
---|---|---|
0.0006498 | 0.0029124 | 0.0012088 |

As mentioned in the paragraph above, we predicted on the post-2005 data with three models: the best of our linear models, our generalized additive model, and our kernel regression. The bootstrapped mean squared errors can be seen in the table above. What is most striking about these results is the predictive power of the linear model. While it had performed worse on the pre-2005 data, it had the smallest mean squared error in predicting the post-2005 data. We speculate that the drastic change brought by the recession was best weathered by the relative simplicity of the linear model, while the more flexible models suffered from the sudden change.

```
#lagged data
macro.lag <- data.frame(apply(macro, 2, design.matrix.from.ts, 1, right.older = FALSE))
macro.lag <- data.frame(cbind(macro.lag[,c(1:2)],
apply(macro.lag[,c(3:12)], 2,
function(x) as.numeric(as.character(x)))))
#2005 data
macro.2005.lag <- macro.lag[1:which(macro.lag$X.lag1 == "2005-09-30"), ]
t1.noPro.b.mse <- data.frame(None = numeric(1),
CI = numeric(1),
HI = numeric(1),
Both = numeric(1),
GAM = numeric(1),
Kernel = numeric(1))
t1.noPro.b.mse[1,1] <- signif(mse.bootstrap(1000, macro.2005.lag, lm,
"GDP.lag1 ~ GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0", 25), 5)
t1.noPro.b.mse[1,2] <- signif(mse.bootstrap(1000, macro.2005.lag, lm,
"GDP.lag1 ~ GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0 +
(Consumption.lag0 * Investment.lag0)", 25), 5)
t1.noPro.b.mse[1,3] <- signif(mse.bootstrap(1000, macro.2005.lag, lm,
"GDP.lag1 ~ GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0 +
(Hours.lag0 * Investment.lag0)", 25), 5)
t1.noPro.b.mse[1,4] <- signif(mse.bootstrap(1000, macro.2005.lag, lm,
"GDP.lag1 ~ GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0 +
(Consumption.lag0 * Investment.lag0) +
(Hours.lag0 * Investment.lag0)", 25), 5)
t1.noPro.b.mse[1,5] <- signif(mse.bootstrap(1000, macro.2005.lag, gam,
"GDP.lag1 ~ s(GDP.lag0) +
s(Consumption.lag0) +
s(Investment.lag0) +
s(Hours.lag0)", 25), 5)
t1.noPro.b.mse[1,6] <- signif(mse.bootstrap(5, macro.2005.lag, npreg,
"GDP.lag1 ~ GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0", 25), 5)
```

None | CI | HI | Both | GAM | Kernel |
---|---|---|---|---|---|
7.99e-05 | 7.79e-05 | 7.97e-05 | 7.84e-05 | 6e-05 | 2.5e-06 |

Since our five variables are time series, we re-ran our models with `GDP` at time \(t\) as the response and all variables except `Productivity` at time \(t - 1\) as predictors. The bootstrapped mean squared errors are in the table above. The linear model with the interaction term between `Consumption` and `Investment` was the best linear model and was only marginally worse than the generalized additive model and the kernel regression. As in the last section, we chose to report and summarize the generalized additive model due to its flexibility and ease of use and interpretation. Since the three models had negligible differences in mean squared error, we will predict the post-2005 data with all three.

Term | edf | Ref.df | F | p-value |
---|---|---|---|---|
s(GDP.lag0) | 1.000000 | 1.000000 | 1101.986130 | 0.0000000 |
s(Consumption.lag0) | 3.126139 | 3.913003 | 1.066775 | 0.4141427 |
s(Investment.lag0) | 1.958425 | 2.474079 | 15.161657 | 0.0000001 |
s(Hours.lag0) | 2.189202 | 2.711123 | 3.862109 | 0.0113873 |

Above are the summary and the partial response functions of our generalized additive model. The partial response functions indicate a strong, significant linear relationship with `GDP` at \(t - 1\), which is understandable as we are predicting `GDP` from `GDP`. There is a slightly negative, significant relationship with `Investment`, and essentially flat, insignificant relationships with `Consumption` and `Hours`.

Comparing the two chosen models leads to some interesting speculation. In our second model, `Consumption` is no longer a significant predictor and `Investment` is now a slightly negative predictor. Since there are some large differences between the two models, we can only speculate as to why the power of the predictors has changed. Since `GDP` encompasses `Consumption` and `Investment`, the change in significance may result from its inclusion. We will not speculate on the removal of `Productivity` until after we have done separate testing on it.

```
#Predicting new data
#Lagged Recession years
lag.rec.years<- (which(macro.lag$X.lag1 == "2005-09-30") + 1):nrow(macro.lag)
#GDP predictors
t1.noPro.pred.mse <- data.frame(LM = numeric(1),
GAM = numeric(1),
Kernel = numeric(1))
#linear model predictions
t1.noPro.pred.mse[1,1] <- signif(
pred.bootstrap(
1,
macro.2005.lag,
macro.lag[lag.rec.years,],
lm,
"GDP.lag1 ~
(GDP.lag0) +
(Consumption.lag0) +
(Investment.lag0) +
(Hours.lag0) +
(Consumption.lag0 * Investment.lag0)",
25,
"GDP.lag1"),5)
#Gam predictions
t1.noPro.pred.mse[1,2] <- signif(
pred.bootstrap(
1000,
macro.2005.lag,
macro.lag[lag.rec.years,],
gam,
"GDP.lag1 ~
s(GDP.lag0) +
s(Consumption.lag0) +
s(Investment.lag0) +
s(Hours.lag0)",
25,
"GDP.lag1"),5)
#Kernel model predictions
t1.noPro.pred.mse[1,3] <- signif(
pred.bootstrap(
5,
macro.2005.lag,
macro.lag[lag.rec.years,],
npreg,
"GDP.lag1 ~
(GDP.lag0) +
(Consumption.lag0) +
(Investment.lag0) +
(Hours.lag0)",
25,
"GDP.lag1"),5)
```

LM | GAM | Kernel |
---|---|---|
8.09e-05 | 0.0013816 | 0.0042868 |

The table above shows the mean squared errors of our three chosen models' predictions of the post-2005 data. The linear model was far and away the best at predicting the post-2005 data. We continue to speculate that the simplicity of the linear model allows it to predict better than the more computationally intensive models.

Having compared the partial response functions above, we also compared the performance of our chosen and unchosen models. Our parametric models performed better on the pre-2005 data and on predicting the post-2005 data with the time series model, while our kernel regression performed better with the original model. In the next section, we will test a similar model with the addition of `Productivity`, in the hopes of finding a better model for predicting `GDP`.

```
t1.b.mse <- data.frame(None = numeric(1), CI = numeric(1), HI = numeric(1),
Both = numeric(1), GAM = numeric(1), Kernel = numeric(1))
t1.b.mse[1,1] <- signif(mse.bootstrap(1000, macro.2005.lag, lm,
"GDP.lag1 ~ GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0 +
Productivity.lag0", 25), 5)
t1.b.mse[1,2] <- signif(mse.bootstrap(1000, macro.2005.lag, lm,
"GDP.lag1 ~ GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0 +
Productivity.lag0 +
(Consumption.lag0 * Investment.lag0)", 25), 5)
t1.b.mse[1,3] <- signif(mse.bootstrap(1000, macro.2005.lag, lm,
"GDP.lag1 ~ GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0 +
Productivity.lag0 +
(Hours.lag0 * Investment.lag0)", 25), 5)
t1.b.mse[1,4] <- signif(mse.bootstrap(1000, macro.2005.lag, lm,
"GDP.lag1 ~ GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0 +
Productivity.lag0 +
(Consumption.lag0 * Investment.lag0) +
(Hours.lag0 * Investment.lag0)", 25), 5)
t1.b.mse[1,5] <- signif(mse.bootstrap(1000, macro.2005.lag, gam,
"GDP.lag1 ~ s(GDP.lag0) +
s(Consumption.lag0) +
s(Investment.lag0) +
s(Hours.lag0) +
s(Productivity.lag0)", 25), 5)
t1.b.mse[1,6] <- signif(mse.bootstrap(5, macro.2005.lag, npreg,
"GDP.lag1 ~ GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0 +
Productivity.lag0", 25), 5)
```

None | CI | HI | Both | GAM | Kernel |
---|---|---|---|---|---|
7.75e-05 | 7.55e-05 | 7.68e-05 | 7.53e-05 | 5.04e-05 | 2.6e-06 |

The mean squared errors summarized above are from our previous models with the inclusion of `Productivity` at time \(t - 1\). The linear model with both interaction terms performed better than the other linear models, while the generalized additive model and the kernel regression performed better than all of the linear models. The mean squared errors of the three models were very close, so we predicted the post-2005 data with all three, but only summarized the generalized additive model as it was the best of the parametric models and allows easy comparison with the two previously chosen models.

```
#Linear Model
t1.lm <- lm(GDP.lag1 ~
(GDP.lag0) +
(Consumption.lag0) +
(Investment.lag0) +
(Hours.lag0) +
(Productivity.lag0) +
(Consumption.lag0 * Investment.lag0) +
(Hours.lag0 * Investment.lag0),
data = macro.2005.lag)
#GAM
t1.gam <- gam(GDP.lag1 ~
s(GDP.lag0) +
s(Consumption.lag0) +
s(Investment.lag0) +
s(Hours.lag0) +
s(Productivity.lag0),
data = macro.2005.lag)
#Kernel
t1.kernel <- npreg(GDP.lag1 ~ GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0 +
Productivity.lag0,
data = macro.2005.lag,
tol=1e-3, ftol=1e-4)
```

Term | edf | Ref.df | F | p-value |
---|---|---|---|---|
s(GDP.lag0) | 1.000042 | 1.000082 | 883.7473465 | 0.0000000 |
s(Consumption.lag0) | 1.000339 | 1.000663 | 1.1151825 | 0.2921059 |
s(Investment.lag0) | 2.261376 | 2.867782 | 9.3160578 | 0.0000111 |
s(Hours.lag0) | 1.004223 | 1.008275 | 0.1144802 | 0.7399933 |
s(Productivity.lag0) | 7.063697 | 8.114147 | 2.8333840 | 0.0047328 |

The addition of `Productivity` did not dramatically change the partial response functions or the significance of our variables. `GDP` at \(t - 1\) was still, understandably, a positive and significant predictor, `Investment` was still a slightly negative, significant predictor, and `Consumption` and `Hours` were neither significant nor clearly positive or negative. Our new variable, `Productivity`, was significant and appears neither negative nor positive, but becomes less variable as its values grow. That the partial response functions and summary statistics changed so little possibly indicates that, while significant, `Productivity` does not add much to the model.

```
#GDP predictors
t1.pred.mse <- data.frame(LM = numeric(1),
GAM = numeric(1),
Kernel = numeric(1))
#linear model predictions
t1.pred.mse[1,1] <- signif(
pred.bootstrap(
1000,
macro.2005.lag,
macro.lag[lag.rec.years,],
lm,
"GDP.lag1 ~
(GDP.lag0) +
(Consumption.lag0) +
(Investment.lag0) +
(Hours.lag0) +
(Productivity.lag0) +
(Consumption.lag0 * Investment.lag0) +
(Hours.lag0 * Investment.lag0)",
25,
"GDP.lag1"),5)
#Gam predictions
t1.pred.mse[1,2] <- signif(
pred.bootstrap(
1000,
macro.2005.lag,
macro.lag[lag.rec.years,],
gam,
"GDP.lag1 ~
s(GDP.lag0) +
s(Consumption.lag0) +
s(Investment.lag0) +
s(Hours.lag0) +
s(Productivity.lag0)",
25,
"GDP.lag1"),5)
#Kernel model predictions
t1.pred.mse[1,3] <- signif(
pred.bootstrap(
5,
macro.2005.lag,
macro.lag[lag.rec.years,],
npreg,
"GDP.lag1 ~
(GDP.lag0) +
(Consumption.lag0) +
(Investment.lag0) +
(Hours.lag0) +
(Productivity.lag0)",
25,
"GDP.lag1"),5)
```

LM | GAM | Kernel |
---|---|---|
0.0001118 | 0.0011308 | 0.0017418 |

As with our two previous sets of predictions, the linear model outperformed the generalized additive model and the kernel regression. However, the difference in mean squared error between this set of predictions and the previous set is much smaller, possibly hinting at the role `Productivity` plays in the dataset.

We chose to compare the top-performing linear model, generalized additive model, and kernel regression from each of our three sets of models. While the `Productivity` models outperformed the non-`Productivity` models on the pre-2005 data, only the generalized additive model and the kernel regression beat their counterparts on the post-2005 data. Of all of our models, our original kernel regression was the top performer on the pre-2005 data, while our non-`Productivity` linear model was the top performer on the post-2005 data. Our chosen models, the generalized additive models, were second best at predicting both the pre- and post-2005 data, which meant smaller tradeoffs relative to the linear models and the kernel regressions.

Our results do not demonstrate any specific role for `Productivity`: in our chosen models, the inclusion versus exclusion of `Productivity` changed the results by roughly \(2\)% for the pre- and post-2005 data, respectively. In the next few sections, we will further investigate the role of `Productivity` in driving `GDP`.

```
#additive regressions looking at Productivity
gdp.t1.gam <- gam(GDP.lag1 ~
s(GDP.lag0) +
s(Consumption.lag0) +
s(Investment.lag0) +
s(Hours.lag0) +
s(Productivity.lag0),
data = macro.lag)
con.t1.gam <- gam(Consumption.lag1 ~
s(GDP.lag0) +
s(Consumption.lag0) +
s(Investment.lag0) +
s(Hours.lag0) +
s(Productivity.lag0),
data = macro.lag)
inv.t1.gam <- gam(Investment.lag1 ~
s(GDP.lag0) +
s(Consumption.lag0) +
s(Investment.lag0) +
s(Hours.lag0) +
s(Productivity.lag0),
data = macro.lag)
hou.t1.gam <- gam(Hours.lag1 ~
s(GDP.lag0) +
s(Consumption.lag0) +
s(Investment.lag0) +
s(Hours.lag0) +
s(Productivity.lag0),
data = macro.lag)
par(mfrow = c(2,2))
plot.gam(gdp.t1.gam, select = 5)
plot.gam(con.t1.gam, select = 5)
plot.gam(inv.t1.gam, select = 5)
plot.gam(hou.t1.gam, select = 5)
```

```
gdp.t1.p <- (data.frame((summary.gam(gdp.t1.gam, signif.stars = FALSE))["s.table"]))[5,]
con.t1.p <- (data.frame((summary.gam(con.t1.gam, signif.stars = FALSE))["s.table"]))[5,]
inv.t1.p <- (data.frame((summary.gam(inv.t1.gam, signif.stars = FALSE))["s.table"]))[5,]
hou.t1.p <- (data.frame((summary.gam(hou.t1.gam, signif.stars = FALSE))["s.table"]))[5,]
pro.p <- (rbind(gdp.t1.p, con.t1.p, inv.t1.p, hou.t1.p))
rownames(pro.p) <- c("GDP", "Consumption", "Investment", "Hours")
kable(pro.p, format = "html")
```

Response | edf | Ref.df | F | p-value |
---|---|---|---|---|
GDP | 5.734286 | 6.886623 | 2.584550 | 0.0144670 |
Consumption | 3.246357 | 4.050438 | 6.642329 | 0.0000392 |
Investment | 7.292083 | 8.285619 | 3.930789 | 0.0001941 |
Hours | 5.388386 | 6.532995 | 1.847852 | 0.0843463 |

Above are the partial response functions and p-values for `Productivity` as a predictor of the other four variables. All four functions have a hump around \(-0.05\), which is most pronounced in the partial response function for `Investment`. Looking at the p-values, `Productivity` is a significant predictor for three of the four variables, all except `Hours`. We speculate that the insignificance of `Productivity` for `Hours` relates to their strong negative correlation, \(-0.49\). These results partly back up the theory that `Productivity` is an exogenous variable driving the others, as `Productivity` is a significant predictor of three of the four; however, the partial response functions do not indicate much to either support or detract from the theory.

```
pro.b.mse <- data.frame(LM = numeric(1), GAM = numeric(1), Kernel = numeric(1))
pro.b.mse$LM <- mse.bootstrap(1000, macro.lag[,c(11:12)], lm,
"Productivity.lag1 ~ Productivity.lag0", 25)
pro.b.mse$GAM <- mse.bootstrap(1000, macro.lag[,c(11:12)], gam,
"Productivity.lag1 ~ Productivity.lag0", 25)
pro.b.mse$Kernel <- mse.bootstrap(5, macro.lag[,c(11:12)], npreg,
"Productivity.lag1 ~ Productivity.lag0", 25)
```

LM | GAM | Kernel |
---|---|---|
9.06e-05 | 8.97e-05 | 8.65e-05 |

Above is a table of bootstrapped mean squared errors for three first-order autoregressive models of `Productivity`: a linear model without interactions, a generalized additive model, and a kernel regression. All three models reported mean squared errors within 2% of each other, so we chose what we thought was the best first-order autoregressive model based on the attributes of each model type. In keeping with the theme of this report, we used the generalized additive model, due to its flexibility and ease of use and interpretation. After identifying the best first-order autoregressive model, we tested and summarized other models of `Productivity`.

```
pro.full.b.mse <- data.frame(None = numeric(1),
CI = numeric(1),
HI = numeric(1),
Both = numeric(1),
GAM = numeric(1),
Kernel = numeric(1))
pro.full.b.mse[1,1] <- signif(mse.bootstrap(1000, macro.lag, lm,
"Productivity.lag1 ~
GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0+
Productivity.lag0", 25), 5)
pro.full.b.mse[1,2] <- signif(mse.bootstrap(1000, macro.lag, lm,
"Productivity.lag1 ~
GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0+
Productivity.lag0 +
(Consumption.lag0 * Investment.lag0)", 25), 5)
pro.full.b.mse[1,3] <- signif(mse.bootstrap(1000, macro.lag, lm,
"Productivity.lag1 ~
GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0+
Productivity.lag0 +
(Hours.lag0 * Investment.lag0)", 25), 5)
pro.full.b.mse[1,4] <- signif(mse.bootstrap(1000, macro.lag, lm,
"Productivity.lag1 ~
GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0+
Productivity.lag0 +
(Consumption.lag0 * Investment.lag0) +
(Hours.lag0 * Investment.lag0)", 25), 5)
pro.full.b.mse$GAM <- mse.bootstrap(1000, macro.lag, gam,
"Productivity.lag1 ~
s(GDP.lag0) +
s(Consumption.lag0) +
s(Investment.lag0) +
s(Hours.lag0) +
s(Productivity.lag0)", 25)
pro.full.b.mse$Kernel <- mse.bootstrap(5, macro.lag, npreg,
"Productivity.lag1 ~
GDP.lag0 +
Consumption.lag0 +
Investment.lag0 +
Hours.lag0+
Productivity.lag0", 25)
```

```
kable(pro.full.b.mse, format = "html")
```

None | CI | HI | Both | GAM | Kernel |
---|---|---|---|---|---|
8.19e-05 | 8.22e-05 | 8.24e-05 | 8.03e-05 | 5.05e-05 | 2.4e-06 |

In modeling `Productivity`, we tested four linear models, a generalized additive model, and a kernel regression. The kernel regression had the best bootstrapped mean squared error, \(2.4309962\times 10^{-6}\); however, we chose our generalized additive model for ease of comparison with the previous model.

```
pro.full.gam <- gam(Productivity.lag1 ~
                      s(GDP.lag0) +
                      s(Consumption.lag0) +
                      s(Investment.lag0) +
                      s(Hours.lag0) +
                      s(Productivity.lag0),
                    data = macro.lag)
# GAM summary
kable(data.frame((summary.gam(pro.full.gam, signif.stars = FALSE))["s.table"]))
```

Term | edf | Ref.df | F | p-value |
---|---|---|---|---|
s(GDP.lag0) | 7.151625 | 8.173708 | 4.443886 | 0.0000368 |
s(Consumption.lag0) | 7.698939 | 8.534404 | 2.203031 | 0.0249533 |
s(Investment.lag0) | 6.910767 | 8.022228 | 3.142824 | 0.0020650 |
s(Hours.lag0) | 1.000119 | 1.000230 | 11.132536 | 0.0009735 |
s(Productivity.lag0) | 1.624114 | 2.022864 | 319.697634 | 0.0000000 |

```
par(mfrow = c(2,2))
plot(pro.full.gam,pages=1,residuals=TRUE,all.terms=TRUE,shade=TRUE,shade.col=2)
```

Above are the summary and partial response functions of our chosen model, the generalized additive model. The summary indicates that all of the variables except `Consumption` are significant predictors at the 1% level (`Consumption` is significant only at the 5% level). The partial response functions suggest that `GDP`, `Consumption`, and `Investment` do not have strong linear relationships with `Productivity` at time \(t\). They also show `Hours` having a strong negative linear relationship with `Productivity`, which is backed by their correlation but is interesting because the model with `Hours` as the response did not show the same relationship. `Productivity` at time \(t - 1\) has a strong linear relationship with `Productivity` at time \(t\), for obvious reasons.

Based on our chosen models, the two generalized additive models, the model with the other variables outperforms the first-order autoregressive model by about \(2\)%. Since the difference between the two models is so marginal, we are hesitant to suggest that `Productivity` is an exogenous variable. If anything, since the model with the other variables outperformed the first-order autoregressive model, `Productivity` may be an endogenous variable.

The researchers whose dataset we have used, and whose questions we have sought to answer, proposed that "exogenous changes in `Productivity` are the main driver of macroeconomic fluctuations." Given the results in this paper, we feel that, without further evidence to the contrary, we must reject the researchers' proposition. Our first set of time series models showed little difference between models that included `Productivity` and models that did not. When we observed the partial response functions of the four additive models, we did see `Productivity` as significant; however, we did not see trends in the functions. Lastly, the final models we tested, those of `Productivity` itself, showed little difference between predictions using all variables and those using `Productivity` alone. Given this evidence, we feel we have demonstrated that `Productivity` is not an exogenous variable in our dataset, and thus not the main driver of macroeconomic fluctuations.