in a Virtual Environment

Complete Analyses for Phase I Regression Analysis

Recall the following abbreviations for the independent and dependent variables used in these analyses:

AGE: age of the individual

GENDER: gender of the individual; coded -1 for male and 1 for female

MRA: Mental Rotation Ability; assessed by score on Cube Comparison Test

PREPRO: mean of the two pre-exposure Prototype values

GENAGE: product of GENDER and AGE

GENMRA: product of GENDER and MRA

GENPRO: product of GENDER and PREPRO

AGEMRA: product of AGE and MRA

AGEPRO: product of AGE and PREPRO

MRAPRO: product of MRA and PREPRO

TOTAL: Total Severity score on SSQ

NAUS: score on Nausea subscale of SSQ

VIS: score on Oculomotor Discomfort subscale of SSQ

DIS: score on Disorientation subscale of SSQ

LNTOTAL: natural logarithm of (TOTAL+1)

LNNAUS: natural logarithm of (NAUS+1)

LNVIS: natural logarithm of (VIS+1)

LNDIS: natural logarithm of (DIS+1)

A significance level of α = .05 was used throughout except where noted. Only first-order variables and all two-way interactions were considered in the analyses. Thus, there were ten possible regressors: the four first-order variables (GENDER, AGE, MRA, and PREPRO) and their six two-way products.
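The variable definitions above can be sketched in code. The following is a minimal illustration, not the authors' procedure: the function names and the example participant are hypothetical, and only the naming scheme and the ln(score + 1) transform come from the text.

```python
import math
from itertools import combinations

FIRST_ORDER = ["GENDER", "AGE", "MRA", "PREPRO"]

# Short forms used in the interaction names (e.g. GENDER x AGE -> GENAGE).
SHORT = {"GENDER": "GEN", "AGE": "AGE", "MRA": "MRA", "PREPRO": "PRO"}


def build_regressors(gender, age, mra, prepro):
    """Return the four first-order variables plus all six two-way products."""
    values = {"GENDER": gender, "AGE": age, "MRA": mra, "PREPRO": prepro}
    regressors = dict(values)
    for a, b in combinations(FIRST_ORDER, 2):
        regressors[SHORT[a] + SHORT[b]] = values[a] * values[b]
    return regressors


def log_transform(score):
    """Transformed sickness score, e.g. LNTOTAL = ln(TOTAL + 1)."""
    return math.log(score + 1)


# Hypothetical participant: female (GENDER = 1), age 20, MRA 15, PREPRO 3.0.
example = build_regressors(gender=1, age=20, mra=15, prepro=3.0)
print(sorted(example))      # names of the ten candidate regressors
print(len(example))         # number of candidate regressors
print(log_transform(0.0))   # a TOTAL score of 0 maps to LNTOTAL = 0
```

Adding 1 before taking the logarithm keeps the transform defined for participants whose SSQ score is zero.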

In order to meet the assumptions underlying the use of linear regression, the regression analyses were conducted using transformed sickness scores (e.g., LNTOTAL = ln[TOTAL+1]) rather than the actual SSQ scores (e.g., TOTAL). Furthermore, because of additional violations of the assumptions due to the categorical-like nature of the subscale scores, a formal regression analysis was conducted only for the transformed Total Severity scores and for none of the transformed subscale scores. Finally, because one individual had an extreme AGE value, that data point was eliminated from the regression analysis. Thus, the regression analysis for LNTOTAL was conducted using only 39 observations.

The first step in the analysis was to produce scatter plots of LNTOTAL versus each of the ten regressors. Examination of these scatter plots revealed no clear relationships between LNTOTAL and any of the regressors although some relationships were suggested. Specifically, there appeared to be a slight negative relationship between LNTOTAL and MRAPRO and slight positive relationships between LNTOTAL and GENDER, PREPRO, GENAGE, GENMRA, GENPRO, and AGEPRO.

Pearson correlations among regressors were examined. Aside from correlations between variables which were functions of the same variable (e.g., GENAGE and GENMRA) and correlations in which one of the variables was a function of the other (e.g., GENDER and GENAGE), there were several significant correlations between regressors. GENDER was significantly correlated with both PREPRO (r = .3437, p = .032) and AGEPRO (r = .3416, p = .033) and PREPRO was significantly correlated with GENAGE (r = .3658, p = .022). In addition, two other correlations were significant at the α = .10 level: the correlation between PREPRO and GENMRA (r = .2711, p = .095) and the correlation between GENMRA and AGEPRO (r = .2781, p = .086). Because none of these correlations were especially strong, they were not expected to cause problems with multicollinearity in a regression model.
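As a rough illustration of how a correlation is tested for significance, the sketch below converts a reported r into the usual t statistic with n - 2 degrees of freedom. It assumes the correlations were computed on the 39 retained observations, which the text does not state explicitly, and the function name is hypothetical. Turning t into a p-value requires a t-distribution CDF, which is left out here.

```python
import math


def t_statistic_for_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


N = 39  # assumption: correlations use the 39 retained observations

reported = [
    ("GENDER vs PREPRO", 0.3437),
    ("GENDER vs AGEPRO", 0.3416),
    ("PREPRO vs GENAGE", 0.3658),
    ("PREPRO vs GENMRA", 0.2711),
    ("GENMRA vs AGEPRO", 0.2781),
]

for label, r in reported:
    print(f"{label}: r = {r:.4f}, t = {t_statistic_for_r(r, N):.3f}, df = {N - 2}")
```

Larger absolute t values correspond to smaller p-values, so the ordering of the printed statistics should mirror the ordering of the reported p-values.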

Correlations between LNTOTAL and each regressor were also examined. Significant correlations (at or near the α = .10 level) and their corresponding p-values are given in Table 5.

**Table 5: Significant Correlations Between LNTOTAL and the Regressors (p-values in parentheses)**

| | GENDER | GENAGE | GENMRA | GENPRO |
|---|---|---|---|---|
| LNTOTAL | .2835 (.080) | .2947 (.069) | .2621 (.107) | .3426 (.033) |

Note that all of these slight positive relationships had been suggested by
the scatter plots. Not all of the relationships suggested by the scatter
plots, however, represented significant correlations. The four significant
correlations suggested a direct relationship between GENDER and LNTOTAL.
Relationships between LNTOTAL and AGE, MRA, and PREPRO, however, appeared to
occur only through their interaction with GENDER.

The next step was to let Minitab try to select the best model using sequential variable selection procedures. Although the results of such model selection techniques should not be accepted as final, they can be examined in conjunction with other, more detailed techniques. Used in this capacity, they provide an additional view of the total picture, and this is the way they were used in the analyses presented here. Stepwise, backward elimination, and forward selection methods were tried. The stepwise and forward selection procedures both stopped having selected the model of LNTOTAL on GENPRO. The backward elimination procedure stopped having selected the model of LNTOTAL on AGE, GENMRA, AGEMRA, AGEPRO, and MRAPRO.

This model was then investigated further. The equation for this model is as follows:

LNTOTAL = 3.27 - 0.162 AGE + 0.0191 GENMRA + 0.00656 AGEMRA + 0.0277 AGEPRO - 0.0323 MRAPRO
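To show how the fitted equation is applied, the sketch below evaluates it for one participant and back-transforms the result to the SSQ scale. The coefficients are the rounded values printed above, so predictions are approximate; the function names are hypothetical. The example input uses the values the text reports for observation 28 (a 22-year-old male with MRA = 38 and PREPRO = 5.0).

```python
import math

INTERCEPT = 3.27
COEFFICIENTS = {
    "AGE": -0.162,
    "GENMRA": 0.0191,
    "AGEMRA": 0.00656,
    "AGEPRO": 0.0277,
    "MRAPRO": -0.0323,
}


def predict_lntotal(gender, age, mra, prepro):
    """Evaluate the fitted equation; GENDER is coded -1 (male) or 1 (female)."""
    terms = {
        "AGE": age,
        "GENMRA": gender * mra,
        "AGEMRA": age * mra,
        "AGEPRO": age * prepro,
        "MRAPRO": mra * prepro,
    }
    return INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in terms.items())


def back_transform(lntotal):
    """Invert LNTOTAL = ln(TOTAL + 1) to get a score on the TOTAL scale."""
    return math.exp(lntotal) - 1


fitted = predict_lntotal(gender=-1, age=22, mra=38, prepro=5.0)
observed = math.log(26.18 + 1)  # observation 28 had TOTAL = 26.18
print(f"fitted LNTOTAL   = {fitted:.3f}")
print(f"observed LNTOTAL = {observed:.3f}")
print(f"raw residual     = {observed - fitted:.3f}")
print(f"fitted TOTAL     = {back_transform(fitted):.2f}")
```

Note that GENDER and PREPRO enter this model only through their products with other variables, so they have no stand-alone coefficient.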

Some statistics associated with this model are given in Table 6.

**Table 6: Statistics for Model of LNTOTAL**

| F value (p-value) | p-values for the coefficients | R² | MSE |
|---|---|---|---|
| 3.45 (.013) | bAGE: .069; bGENMRA: .048; bAGEMRA: .002; bAGEPRO: .003; bMRAPRO: .001 | 34.3% | 1.284 |

As can be seen, this model is significant. It explains 34.3% of the variance
in LNTOTAL. All coefficients are significant or approach significance.
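The overall F value and R² in Table 6 are linked by a standard identity, F = (R²/k) / ((1 - R²)/(n - k - 1)), where k is the number of regressors. The short check below, with hypothetical names, applies it with k = 5 and n = 39 as a consistency check on the reported figures.

```python
def f_from_r_squared(r_squared, n, k):
    """Overall regression F statistic with k and n - k - 1 degrees of freedom."""
    explained = r_squared / k
    unexplained = (1 - r_squared) / (n - k - 1)
    return explained / unexplained


f_value = f_from_r_squared(r_squared=0.343, n=39, k=5)
print(f"F(5, 33) = {f_value:.2f}")
```

The result agrees with the reported F value to the precision allowed by the rounded R².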

Variance Inflation Factor (VIF) values were computed for all regressors in the model. These values provide an indication of linear associations among regressors which might lead to multicollinearity problems. If any VIF value exceeds 10, Myers (1986) suggests that there may be cause for concern. The VIF values for the model ranged from 1.1 to 7.5, a range which does not suggest a problem with multicollinearity in the model.
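For readers unfamiliar with the VIF, it is conventionally defined as 1 / (1 - R²j), where R²j is the R² obtained by regressing regressor j on the remaining regressors. The sketch below uses hypothetical function names and simply converts between the two quantities; it does not reproduce the per-regressor values, which the text does not report.

```python
def vif_from_r_squared(r_squared_j):
    """VIF for regressor j given the R^2 from regressing it on the others."""
    return 1.0 / (1.0 - r_squared_j)


def r_squared_from_vif(vif):
    """Inverse relation: the R^2 among regressors implied by a given VIF."""
    return 1.0 - 1.0 / vif


# The cutoff of 10 and the reported range of VIF values for the model.
for vif in (1.1, 7.5, 10.0):
    print(f"VIF = {vif:>4}: implied R^2 among regressors = {r_squared_from_vif(vif):.3f}")
```

Seen this way, the cutoff of 10 corresponds to a regressor that is 90% explained by the other regressors.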

As a check on the underlying normality and equal variance assumptions of the model, four diagnostic residual plots were produced: a normal probability plot of the residuals, an I chart of residuals, a histogram of residuals, and a scatter plot of the residuals versus the fitted values. These plots suggested no problems with the underlying assumptions of the linear regression model.

Finally, an analysis of residuals was performed to identify high-influence points. Residual diagnostics used were standardized residuals, hii values, Cook's D, and DFITS values. Standardized residuals are helpful in identifying data points which are extreme in their y value. The hii values (or HAT values) are the diagonal elements of the X(X'X)⁻¹X' matrix and are used to identify data points which are extreme in their x value(s). Cook's D is used to identify data points which have high influence on the b's. Finally, the DFITS values represent the change in the fitted value, in standard deviation units, if the ith point is removed. As such, they represent a combination of diagnostics which forms a measure of how unusual an observation is (Minitab, 1994).
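These diagnostics are related: given a standardized residual r and a HAT value h, Cook's D and DFITS follow from standard textbook identities. The sketch below assumes Minitab's definitions match those identities (the text does not give formulas), uses hypothetical function names, and takes p = 6 and n = 39 from the model described here. The inputs are the values reported for observations 11 and 28.

```python
import math

P = 6   # number of coefficients in the model, including the intercept
N = 39  # number of observations


def cooks_d(std_resid, hat, p=P):
    """Cook's D = (r^2 / p) * h / (1 - h)."""
    return (std_resid ** 2 / p) * hat / (1.0 - hat)


def dfits(std_resid, hat, n=N, p=P):
    """DFITS = t * sqrt(h / (1 - h)), with t the deleted (externally studentized) residual."""
    t_deleted = std_resid * math.sqrt((n - p - 1) / (n - p - std_resid ** 2))
    return t_deleted * math.sqrt(hat / (1.0 - hat))


for obs, r, h in [(11, -1.75498, 0.432961), (28, 1.96622, 0.254048)]:
    print(f"obs {obs}: Cook's D = {cooks_d(r, h):.4f}, DFITS = {dfits(r, h):.4f}")
```

Both quantities grow with the residual and with the leverage, which is why a point with a moderate residual but a large HAT value can still stand out.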

There are no definitive criteria with which to declare that a residual diagnostic value implies a high influence point. Several guidelines, however, are suggested. Because standardized residuals have variance 1, observations with absolute values exceeding 2 may be unusual (Minitab, 1994). HAT values exceeding (2*p)/n to (3*p)/n; Cook's D values exceeding the 50th percentile of an F distribution having p numerator and n-p denominator degrees of freedom; and absolute values of DFITS exceeding 2*sqrt(p/n) may all suggest unusual observations (Myers, 1994). Note that for these criteria, p refers to the number of coefficients in the model (i.e., for the model presented here, p = 6) and n refers to the sample size (n = 39 for this model).
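With p = 6 and n = 39, the closed-form guidelines above reduce to simple arithmetic, sketched below with hypothetical variable names. The Cook's D cutoff is an F-distribution quantile and is not computed here, since it needs a statistical library.

```python
import math

p = 6   # coefficients in the model, including the intercept
n = 39  # observations used in the regression

std_resid_cutoff = 2.0
hat_conservative = 2 * p / n      # lower (stricter) HAT guideline
hat_liberal = 3 * p / n           # upper (more lenient) HAT guideline
dfits_cutoff = 2 * math.sqrt(p / n)

print(f"|standardized residual| > {std_resid_cutoff}")
print(f"HAT > {hat_conservative:.3f} (conservative) to {hat_liberal:.3f} (liberal)")
print(f"|DFITS| > {dfits_cutoff:.3f}")
```

The two HAT values agree with the range given in the criteria row of Table 7.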

Comparison of the diagnostics obtained for the model to these criteria yielded six points which exceeded the recommended criteria. Those points and their diagnostic values appear in Table 7.

**Table 7: Residual Diagnostics for Model of LNTOTAL**

| Observation | Standardized residual | Hii | Cook's D | DFITS |
|---|---|---|---|---|
| Criteria | 2 | .308 - .462 | .910 | |
| 3 | 1.94252 | 0.133815 | 0.097157 | 0.79891 |
| 10 | -0.66757 | 0.342586 | 0.038705 | -0.47778 |
| 11 | -1.75498 | 0.432961 | 0.391949 | -1.58593 |
| 18 | 0.52572 | 0.310910 | 0.020783 | 0.34920 |
| 26 | -2.08841 | 0.073395 | 0.057577 | -0.62130 |
| 28 | 1.96622 | 0.254048 | 0.219442 | 1.20257 |

None of the six points had extreme Cook's D values, so it did not appear that any of the points were exerting undue influence on the regression coefficients. Observation 26 stood out for the value of its standardized residual. At -2.08841, however, it only barely exceeded the criterion and was not given any further attention. Observations 10, 11, and 18 stood out for their HAT values, which exceeded the conservative criterion of (2*p)/n. Because none of these values exceeded the more liberal criterion of (3*p)/n, none were deemed problematic. Observation 11 also had an unusual DFITS value. This individual was a male who, at 32 years old, was slightly older than the rest of the sample. His MRA and PREPRO values (13 and 2.0, respectively) were both within one standard deviation of their respective means. His TOTAL score, however, was 0.00. The combination of these values for the independent variables and TOTAL was likely somewhat unusual given the model but was not considered problematic.

Observations 3 and 28 also stood out for their DFITS values. At 0.79891, the DFITS value for observation 3 only barely exceeded the criterion and was not given any further attention. The DFITS value for observation 28 was 1.20257. This individual was a 22-year-old male. His MRA value of 38 was the highest obtained in the sample. His PREPRO value of 5.0 was within one standard deviation of the mean. His TOTAL score, however, was 26.18. As with observation 11, the combination of his values for the independent variables and his sickness score was likely somewhat unusual for the model but did not appear to be problematic.
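The screening described in the last two paragraphs can be reproduced from the Table 7 values. The sketch below is illustrative only: the data structure and names are hypothetical, the HAT check uses the conservative (2*p)/n guideline, and the Cook's D cutoff of .910 is copied from the table rather than computed.

```python
import math

P, N = 6, 39

# (observation, standardized residual, HAT, Cook's D, DFITS) from Table 7
TABLE_7 = [
    (3, 1.94252, 0.133815, 0.097157, 0.79891),
    (10, -0.66757, 0.342586, 0.038705, -0.47778),
    (11, -1.75498, 0.432961, 0.391949, -1.58593),
    (18, 0.52572, 0.310910, 0.020783, 0.34920),
    (26, -2.08841, 0.073395, 0.057577, -0.62130),
    (28, 1.96622, 0.254048, 0.219442, 1.20257),
]

CUTOFFS = {
    "standardized residual": 2.0,
    "HAT": 2 * P / N,             # conservative guideline
    "Cook's D": 0.910,            # value listed in Table 7
    "DFITS": 2 * math.sqrt(P / N),
}


def flag(row):
    """Return the names of the diagnostics whose guideline this row exceeds."""
    _, std_resid, hat, cooks, dfits_value = row
    observed = {
        "standardized residual": abs(std_resid),
        "HAT": hat,
        "Cook's D": cooks,
        "DFITS": abs(dfits_value),
    }
    return [name for name, cutoff in CUTOFFS.items() if observed[name] > cutoff]


flags = {row[0]: flag(row) for row in TABLE_7}
for obs, names in flags.items():
    print(f"observation {obs}: {', '.join(names)}")
```

Each of the six observations is flagged by at least one guideline, and none is flagged on Cook's D, in line with the discussion above.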

The final conclusion was that the model of LNTOTAL on AGE, GENMRA, AGEMRA, AGEPRO, and MRAPRO performs and fits the data very well.