## Logistic Regression

The research question I wish to answer using logistic regression is: which variables in the cycle touring dataset can be used to predict blogging? I first ran a logistic regression with all the variables. In the Block 0 "Variables not in the Equation" table (see Table 12), I noted every variable that was significant at p < 0.05: Budget, TotalDist, Variety, and TotalDays. I then ran the logistic regression a second time, using only these four predictors.

I validated the model by checking that the Omnibus Test of Model Coefficients was significant (p < 0.05). This shows that the model with the four predictors performs better than the intercept-only model. The same table gives χ² = 22.39 with df = 4. Next, I confirmed that the Hosmer and Lemeshow test was not significant (p > 0.05), indicating no evidence of poor fit.

I then checked which variables were significant in the final solution. Only TotalDays was significant (p < 0.05; see Table 15). Its B value is positive, meaning that the greater TotalDays is, the more likely blogging is to occur. The odds ratio, Exp(B) = 1.013, indicates that each one-day increase in TotalDays multiplies the odds of blogging by 1.013 (roughly a 1.3% increase), with a 95% confidence interval of 1.001 to 1.025.

Next, I examined the Model Summary to determine how much of the variation in the dependent variable (blog) can be attributed to the model. Cox & Snell R² provides the lower bound and Nagelkerke R² the upper bound, so the model, in which TotalDays is the only significant predictor, accounts for roughly 15.6% to 20.8% of the variability in blogging.

Finally, I examined the Classification Table to determine the sensitivity and the specificity. Sensitivity is the percentage of observed "yes" cases that the model correctly predicts as "yes". Specificity is the percentage of observed "no" cases that it correctly predicts as "no". The sensitivity is 50.7% and the specificity is 76.6%. I then calculated the positive predictive value as (predicted yes and observed yes) / (all predicted yes) × 100% = 37 / (18 + 37) × 100% = 67.3%. Using the analogous formula, (predicted no and observed no) / (all predicted no) × 100%, the negative predictive value is 59 / (59 + 36) × 100% = 62.1%.
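To make the TotalDays odds ratio more concrete, here is a small Python sketch. It converts the reported odds ratio back to the log-odds coefficient B and scales it to a multi-day change. It uses only the reported values (1.013, 1.001, 1.025). The seven-day horizon and the helper name `odds_multiplier` are hypothetical choices for illustration. The sketch assumes a Wald-type interval, for which raising both bounds to the k-th power gives the interval for a k-day change.

```python
import math

# Values reported in the text for TotalDays.
ODDS_RATIO = 1.013              # Exp(B): odds multiplier per additional day
CI_LOW, CI_HIGH = 1.001, 1.025  # 95% confidence interval for Exp(B)

# B is the natural log of the odds ratio; B > 0 corresponds to an odds ratio > 1.
b = math.log(ODDS_RATIO)


def odds_multiplier(per_day: float, days: float) -> float:
    """Hypothetical helper: odds multiplier for a change of `days` days.

    The effect is multiplicative on the odds, so the per-day ratio compounds.
    """
    return per_day ** days


# Hypothetical illustration: a tour that is seven days longer.
days = 7
week = odds_multiplier(ODDS_RATIO, days)
week_ci = (odds_multiplier(CI_LOW, days), odds_multiplier(CI_HIGH, days))

print(f"B = {b:.4f}")
print(f"{days}-day odds ratio: {week:.3f} (95% CI {week_ci[0]:.3f} to {week_ci[1]:.3f})")
```

Because the per-day effect compounds, a small per-day odds ratio can still add up to a noticeable change in the odds over a long tour.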
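The lower-bound/upper-bound framing of the two pseudo-R² values follows from how they are defined. In the definitions below, L₀ is the likelihood of the intercept-only model, L_M is the likelihood of the fitted model, and n is the sample size. These are the standard textbook definitions, not values taken from the output tables.

```latex
R^2_{\text{CS}} = 1 - \left(\frac{L_0}{L_M}\right)^{2/n},
\qquad
R^2_{\text{N}} = \frac{R^2_{\text{CS}}}{1 - L_0^{2/n}}
```

The Cox & Snell value cannot reach 1, because its maximum is 1 − L₀^{2/n}. Nagelkerke rescales it by that maximum, so R²_N ≥ R²_CS always holds. Both are pseudo-R² measures, so they are best read as approximate indicators of explanatory power rather than exact proportions of variance.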
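The four classification-table percentages can be reproduced with a short Python sketch. The counts 37, 18, 59 and 36 come from the text. The sketch assigns 18 to observed "no" cases predicted "yes" and 36 to observed "yes" cases predicted "no". That assignment is an inference, but it is the one that reproduces the reported 50.7% sensitivity and 76.6% specificity. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Hypothetical helper: sensitivity, specificity, PPV and NPV as percentages."""
    return {
        "sensitivity": 100 * tp / (tp + fn),  # observed yes correctly predicted yes
        "specificity": 100 * tn / (tn + fp),  # observed no correctly predicted no
        "ppv": 100 * tp / (tp + fp),          # predicted yes that are observed yes
        "npv": 100 * tn / (tn + fn),          # predicted no that are observed no
    }


# Counts as inferred from the classification table described in the text:
# 37 predicted yes / observed yes, 18 predicted yes / observed no,
# 59 predicted no / observed no,   36 predicted no / observed yes.
metrics = classification_metrics(tp=37, fp=18, tn=59, fn=36)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

Sensitivity and specificity condition on the observed outcome, while the predictive values condition on the model's prediction. That is why the same four counts yield four different percentages.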
