Logistic Regression


For comparison, we also built a logistic regression model using stepwise regression. Predictive validation of this model on the same independent test set produced 10 errors among the non-stroke subjects (false positive rate = 0.09) and 3 errors among the stroke subjects (false negative rate = 0.43), for an overall accuracy of 88%, compared with 98.2% for the Bayesian network model.
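As a rough illustration of how the reported rates relate to the error counts, the sketch below computes the false positive rate, false negative rate, and overall accuracy from raw counts. The group sizes are not stated in this section. A stroke group of 7 is the only size for which 3 errors round to 0.43, while the non-stroke size of 107 is an assumption (any value from 106 to 117 gives a rate that rounds to 0.09). The function name `error_rates` is hypothetical.

```python
def error_rates(n_neg, n_pos, false_pos, false_neg):
    """Return (false positive rate, false negative rate, accuracy) from counts."""
    fpr = false_pos / n_neg          # errors among non-stroke subjects
    fnr = false_neg / n_pos          # errors among stroke subjects
    total = n_neg + n_pos
    accuracy = (total - false_pos - false_neg) / total
    return fpr, fnr, accuracy


# Error counts are from the text; group sizes are ASSUMED (see lead-in).
fpr, fnr, acc = error_rates(n_neg=107, n_pos=7, false_pos=10, false_neg=3)
print(f"FPR={fpr:.2f}  FNR={fnr:.2f}  accuracy={acc:.3f}")
```

Under these assumed sizes the two error rates match the reported values after rounding, and the accuracy falls between 0.88 and 0.89. Because the stroke group is so small, each additional misclassified stroke subject shifts the false negative rate by about 0.14.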


Coefficients: Summary statistics of the logistic regression model: estimates, standard errors, and p-values of the genotype effects found to be significantly associated with stroke.
Box Plot: Box plot of the predictive probability of stroke (risk in 5 years) in the independent test set, obtained through logistic regression.


The two box plots on the right show, side by side, the predictive discrimination between stroke (red) and non-stroke (blue) subjects for the Bayesian network model (above) and the logistic regression model (below). The greater overlap of the two distributions in the lower plot shows that logistic regression discriminates markedly less well between the two groups, consistent with its lower predictive accuracy.
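One common way to put a number on the overlap seen in such box plots is the area under the ROC curve (AUC), which equals the probability that a randomly chosen stroke subject receives a higher predicted risk than a randomly chosen non-stroke subject. The sketch below is not part of the original analysis. The function name `auc` and all predicted probabilities are hypothetical toy values, chosen only to contrast well-separated and overlapping distributions.

```python
def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# HYPOTHETICAL predicted stroke probabilities, not data from the study.
separated_pos = [0.80, 0.90, 0.95]          # stroke subjects, little overlap
separated_neg = [0.05, 0.10, 0.20, 0.30]    # non-stroke subjects
overlap_pos = [0.20, 0.50, 0.90]            # stroke subjects, heavy overlap
overlap_neg = [0.10, 0.30, 0.40, 0.60]      # non-stroke subjects

auc_separated = auc(separated_pos, separated_neg)
auc_overlap = auc(overlap_pos, overlap_neg)
print(f"separated: {auc_separated:.2f}  overlapping: {auc_overlap:.2f}")
```

An AUC of 1 corresponds to distributions that do not overlap at all, and values nearer 0.5 correspond to distributions that are hard to tell apart.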

