Prediction of Simulator Sickness
in a Virtual Environment


Results

The data analyzed in this experiment appear in Appendix C.

Phase I

Summary Statistics for the Independent Variables

Age. The first independent variable was age. The age of research participants ranged from 19 to 46 years with a mean of 22.7 years (standard deviation = 4.74) and both a median and mode equal to 22 years (frequency = 11).

Gender. The second independent variable was gender. Of the 40 participants, there was an equal number of males and females.

Mental Rotation Ability. The third independent variable was mental rotation ability as assessed by the Cube Comparison Test (CCT). For the individuals used in this analysis, scores on the CCT ranged from 2 to 38 with a mean of 18.4 (standard deviation = 9.46), a median of 19.5, and modes of both 11 and 24 (frequency = 10 for each). It should be noted that Ekstrom, French, Harman, and Dermen (1976) reported a mean of 22.7 (standard deviation = 9.4) on the CCT in the 1963 Kit for a sample of 46 college students. Thus, the data obtained in this research appear to be fairly representative.

Postural Stability. The final independent variable was pre-exposure postural stability. The measure of pre-exposure postural stability used in this research was the average of the two pre-exposure Prototype values. For the 40 participants used in this analysis, this measure ranged from 1 to 8.5 with a mean of 3.71 (standard deviation = 1.93) and a median of 3.

Summary Statistics for the Dependent Variables

There were four dependent variables of interest in this study: the Total Severity score and the Nausea, Oculomotor Discomfort, and Disorientation subscale scores, all computed from the SSQ. For the 40 research participants used in this study, the summary statistics for these four measures are given in Table 1.

Table 1:
Summary Statistics for Post-Exposure Sickness Measures

Measure                  Range            Mean    Median   Standard Deviation   Lower Quartile   Upper Quartile
Total Severity           0.00 - 138.38    21.22   13.09    26.81                3.74             28.99
Nausea                   0.00 - 114.48    18.13   9.54     25.36                0.00             28.62
Oculomotor Discomfort    0.00 - 90.96     16.11   7.58     19.36                0.00             22.74
Disorientation           0.00 - 180.96    22.97   13.92    35.18                0.00             27.84


Summary Statistics for Additional Variables

Reported Pre-Exposure Symptoms. A pre-exposure SSQ was also administered. This was for the purpose of establishing that participants were asymptomatic prior to exposure. Summary statistics for the pre-exposure SSQ scores appear in Table 2.

Table 2:
Summary Statistics for Pre-Exposure Sickness Measures

Measure                  Range            Mean    Median   Standard Deviation   Lower Quartile   Upper Quartile
Total Severity           0.00 - 18.70     3.74    0.00     5.08                 0.00             6.55
Nausea                   0.00 - 28.62     3.10    0.00     5.87                 0.00             9.54
Oculomotor Discomfort    0.00 - 22.74     4.36    0.00     6.16                 0.00             7.58
Disorientation           0.00 - 13.92     1.39    0.00     4.23                 0.00             0.00


Of the 40 participants included in the study, 19 (47.5%) reported at least one pre-exposure symptom. The most commonly reported pre-exposure symptom was fatigue, reported by 10 participants. This is not surprising, given that participants were all college undergraduates. The only SSQ symptoms not reported on the pre-exposure questionnaire were blurred vision, dizzy (eyes open or closed), vertigo, and burping. It was concluded that the pre-exposure symptoms reported by participants included in the analysis were inconsequential given the experimental situation and conditions under which the data were collected. It should be noted that a precedent for using participants in VR research (especially college undergraduates) who are not completely asymptomatic prior to exposure has already been established in the VR research community (e.g., Singer, Ehrlich, Cinq-Mars, & Papin, in press). Although it would be logical to want participants to be as asymptomatic as possible, in terms of generalizability to the real world, it would seem unreasonable to expect that all users of VR systems would be entirely asymptomatic prior to exposure.

Final Level Reached in Ascent. The game Ascent consists of 10 levels. The level reached by the participants in this study ranged from 1 to 5. The median level reached was 3. Reached by over half of the participants in this study (frequency = 22), level 3 was also the modal level reached.

HMD Setting. The i*glasses!™ have an adjustment for the stereoscopic display. This setting adjusts which eye sees the image first and the preferred setting often differs from person to person. When participants were first introduced to the VR system, they viewed the image under both settings and indicated which setting they preferred. Six of the 40 participants (15%) played the game using setting 3D-1 and 34 of the participants (85%) played the game using setting 3D-2.

Time in VE. Participants were asked to play Ascent for 20 minutes. All but one of the participants played for this amount of time. The other participant stopped play after 7.07 minutes due to illness.

Temperature. Temperature in the interior experimental room was fairly constant over most of the experiment, usually ranging from 68 to 72 degrees Fahrenheit. For two sessions, however, temperature in the building housing the laboratory was unusually high. For these two sessions, the temperatures during the experimental session were 76 and 78 degrees.

Results of the Analysis

It was hypothesized that sickness could successfully be modeled on characteristics of an individual using linear regression techniques. Because there were four measures of sickness under investigation, four regression analyses were attempted. For presentation of the results of these analyses, the following abbreviations for the independent and dependent variables will be used:

AGE: age of the individual
GENDER: gender of the individual; coded -1 for male and 1 for female
MRA: Mental Rotation Ability; assessed by score on Cube Comparison Test; higher values indicate better mental rotation ability
PREPRO: mean of the two pre-exposure Prototype values; lower values indicate better postural stability
GENAGE: product of GENDER and AGE
GENMRA: product of GENDER and MRA
GENPRO: product of GENDER and PREPRO
AGEMRA: product of AGE and MRA
AGEPRO: product of AGE and PREPRO
MRAPRO: product of MRA and PREPRO
TOTAL: Total Severity score on SSQ; higher scores indicate more or more severe sickness
NAUS: score on Nausea subscale of SSQ; higher scores indicate more or more severe sickness
VIS: score on Oculomotor Discomfort subscale of SSQ; higher scores indicate more or more severe sickness
DIS: score on Disorientation subscale of SSQ; higher scores indicate more or more severe sickness
LNTOTAL: natural logarithm of (TOTAL+1)
LNNAUS: natural logarithm of (NAUS+1)
LNVIS: natural logarithm of (VIS+1)
LNDIS: natural logarithm of (DIS+1)
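The derived variables in the list above follow mechanically from the four measured ones. The Python below is a minimal sketch of that construction for a single participant. The function names and the example values are hypothetical illustrations; they are not taken from the data in Appendix C.

```python
import math


def build_regressors(age, gender, mra, prepro):
    """Return the ten regressors defined above for one participant.

    gender is "M" or "F" and is coded -1 or 1, matching the GENDER definition.
    """
    if gender not in ("M", "F"):
        raise ValueError("gender must be 'M' or 'F'")
    g = -1 if gender == "M" else 1
    return {
        "AGE": age,
        "GENDER": g,
        "MRA": mra,
        "PREPRO": prepro,
        "GENAGE": g * age,
        "GENMRA": g * mra,
        "GENPRO": g * prepro,
        "AGEMRA": age * mra,
        "AGEPRO": age * prepro,
        "MRAPRO": mra * prepro,
    }


def log_transform(score):
    """Natural logarithm of (score + 1), e.g. LNTOTAL computed from TOTAL."""
    return math.log(score + 1)


# Hypothetical participant (illustrative values only).
example = build_regressors(age=22, gender="F", mra=18, prepro=3.5)
```

Adding 1 before taking the logarithm means that an SSQ score of zero maps to a transformed score of zero.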

Only the final results of the regression analyses are presented here. Details are presented in Appendix D.

Prior to presentation of the results, three points should be noted. First, preliminary analyses with the data indicated that neither of the two assumptions underlying the use of linear regression techniques - normality of the errors and homogeneous variance - would be met with models which used the actual SSQ scores (TOTAL, NAUS, VIS, and DIS). Thus, the natural logarithm transformation was applied to the SSQ scores. Because the natural logarithm is undefined for the value zero, a value of 1 was added to each SSQ score before the logarithm was taken so that the natural logarithm of every sickness score was defined. It should be noted that other research using SSQ scores has also employed the natural logarithm transformation to stabilize skew in the data (e.g., Kennedy, Berbaum, Dunlap, & Smith, 1995). It should be kept in mind that the log-transformed scores were used only in the regression analysis - all other analyses report the original untransformed sickness scores.

Second, as an initial step in the regression analyses, scatter plots were examined for each sickness measure (LNTOTAL, LNNAUS, LNVIS, and LNDIS) versus each of the ten regressors. It was found that, for the subscale scores (LNNAUS, LNVIS, and LNDIS), there was a very clear separation in the scatter plots between those individuals who were asymptomatic on a particular subscale and those who were symptomatic. Thus, with the small data set used in this analysis, the subscale scores were essentially categorical measures. Preliminary analysis with these subscale scores revealed problems with meeting the assumptions associated with linear regression techniques due to the categorical-like nature of the scores. To properly analyze the subscale data in this research, generalized linear modeling procedures such as loglinear or logistic regression would be necessary. Linear regression techniques could be used to analyze subscale data if the data set were large enough to have sufficient spread of the subscale score values (as was found in this research data with the TOTAL scores). It was concluded that linear regression techniques could not be used to properly model the subscale data in this experiment. Thus, a formal regression analysis was conducted only for the transformed Total Severity scores.

Third, scatter plots also revealed the presence of an outlier which corresponded to an extreme AGE value - a 46-year-old female. Because of the substantial weight this point could potentially carry in a regression equation, it was eliminated from the regression analysis. Thus, the regression analysis on LNTOTAL was conducted using only 39 observations.

For prediction of the transformed Total Severity score, it was concluded that the best linear model was

LNTOTAL = 3.27 - 0.162 AGE + 0.0191 GENMRA + 0.00656 AGEMRA + 0.0277 AGEPRO - 0.0323 MRAPRO

This model was significant (F = 3.45, p = .013) and, based on the R² value, explains 34.3% of the variance in the transformed Total Severity score. Details about the selection and performance of this model can be found in Appendix D.
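As a consistency check, the reported F statistic can be recovered from the reported proportion of variance explained. The sketch below uses the standard identity relating the overall F statistic to the coefficient of determination, with the 39 observations and 5 regressors stated in this section. The function name is a hypothetical illustration, and the degrees of freedom are inferred from those counts rather than quoted from Appendix D.

```python
def f_from_r_squared(r_squared, n_obs, n_regressors):
    """Overall regression F statistic implied by R-squared.

    F = (R2 / k) / ((1 - R2) / (n - k - 1)) for k regressors plus an intercept.
    """
    df_model = n_regressors
    df_error = n_obs - n_regressors - 1
    f_stat = (r_squared / df_model) / ((1 - r_squared) / df_error)
    return f_stat, df_model, df_error


f_stat, df1, df2 = f_from_r_squared(0.343, n_obs=39, n_regressors=5)
print(f"F({df1}, {df2}) = {f_stat:.2f}")  # prints "F(5, 33) = 3.45"
```

The result agrees with the reported F = 3.45 to two decimal places.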

The equation of this model indicates a complicated relationship between transformed Total Severity score (which will be referred to simply as "sickness") and the four independent variables. The model suggests no clear relationships between sickness and any one variable. Instead, the relationships are complex interactions among several variables. Conclusions from this model are discussed in the Discussion section.
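To make the interaction structure concrete, the following sketch evaluates the fitted equation and the change in predicted LNTOTAL per additional year of age (the partial derivative of the equation with respect to AGE). The coefficients are those reported above. The function names and the input values are hypothetical, chosen only to lie within the observed ranges of the predictors.

```python
import math


def predict_lntotal(age, gender, mra, prepro):
    """Predicted LNTOTAL from the reported model; gender is -1 (male) or 1 (female)."""
    return (3.27
            - 0.162 * age
            + 0.0191 * gender * mra     # GENMRA
            + 0.00656 * age * mra       # AGEMRA
            + 0.0277 * age * prepro     # AGEPRO
            - 0.0323 * mra * prepro)    # MRAPRO


def age_slope(mra, prepro):
    """Change in predicted LNTOTAL per year of age, holding MRA and PREPRO fixed."""
    return -0.162 + 0.00656 * mra + 0.0277 * prepro


def back_transform(lntotal):
    """Invert LNTOTAL = ln(TOTAL + 1) to return to the Total Severity scale."""
    return math.exp(lntotal) - 1


# Hypothetical inputs: the sign of the age effect depends on the other predictors.
slope_low = age_slope(mra=10, prepro=2)    # about -0.041
slope_high = age_slope(mra=25, prepro=4)   # about  0.113
predicted = predict_lntotal(age=22, gender=1, mra=18, prepro=3.5)
```

Under these illustrative inputs the age effect is negative for one participant profile and positive for another, which is why no single-variable interpretation of the model is offered.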

Additional Findings

Establishing the Occurrence of Sickness. By combining the results presented in Tables 1 and 2, some conclusions can be drawn regarding the occurrence of sickness in this experiment.

The mean pre- and post-exposure Total Severity scores were 3.74 and 21.22, respectively (standard deviation = 5.08 and 26.81, respectively). These means differ significantly with a one-tailed paired t-test of sample means (t=4.21, p=.0001). Thus, it can be concluded that the mean post-exposure Total Severity score is significantly greater than the mean pre-exposure score.

The mean pre- and post-exposure Nausea subscale scores were 3.10 and 18.13, respectively (standard deviation = 5.87 and 25.36, respectively). These means differ significantly with a one-tailed paired t-test of sample means (t=3.73, p=.0003). Thus, it can be concluded that the mean post-exposure score on the Nausea subscale is significantly greater than the mean pre-exposure score.

The mean pre- and post-exposure Oculomotor Discomfort subscale scores were 4.36 and 16.11, respectively (standard deviation = 6.16 and 19.36, respectively). These means differ significantly with a one-tailed paired t-test of sample means (t=4.02, p=.0001). Thus, it can be concluded that the mean post-exposure score on the Oculomotor Discomfort subscale is significantly greater than the mean pre-exposure score.

The mean pre- and post-exposure Disorientation subscale scores were 1.39 and 22.97, respectively (standard deviation = 4.23 and 35.18, respectively). These means differ significantly with a one-tailed paired t-test of sample means (t=3.87, p=.0002). Thus, it can be concluded that the mean post-exposure score on the Disorientation subscale is significantly greater than the mean pre-exposure score.

These four results indicate that the post-exposure sickness score for all four sickness measures significantly exceeded the corresponding pre-exposure score. Thus, it was concluded that sickness, as measured by the SSQ, did, in fact, occur in this experiment.

Most Severe Symptoms. One participant had to withdraw from the VE after only 7.07 minutes due to illness. This participant - a 23-year-old female - experienced the most severe symptoms of all participants in this research. Her post-exposure Total Severity score was 138.38 and her scores were 114.48, 90.96, and 180.96 on the Nausea, Oculomotor Discomfort, and Disorientation subscales, respectively. Shortly after exiting the VE, she induced vomiting and vomited three times. She induced vomiting again later. After vomiting, she reported feeling much better and indicated having only a slight headache and "grogginess". She reported having eaten immediately prior to arriving at the experimental session but did not attribute her symptoms to eating or to what she ate. She reported that she had a long history of motion sickness in both cars and airplanes and felt that moving her head left and right while playing the game was the most problematic for her. After exiting the VE, she remained at the experimental site for less than one hour. During this time she walked around outside and rested.

When she was called for the follow-up six hours after the session, she reported that, after the experiment, she went home, took some aspirin, and was then feeling fine with only some lingering nausea. This participant was called for two additional follow-ups. The first was the following day. At that time she reported still having nausea and headache but not as bad as the previous day. The second additional follow-up was six days after the session. At that time, she reported that she started feeling better two to three days after the session. She also reported that her experience was "probably [her] worst case of motion sickness ever."

Additional Symptoms Reported. In addition to the sixteen symptoms on the version of the SSQ used in this experiment, there was a place on the questionnaire where the individual could indicate additional symptoms. No participants indicated any immediate additional symptoms after exiting the VE. However, one participant experienced some delayed effects. On the post-exposure SSQ, this participant indicated moderate eye strain and slight general discomfort, fatigue, headache, fullness of head, and blurred vision. Approximately ten minutes after exiting the VE, however, this participant indicated feeling dizzy and having severe difficulty concentrating and moderate headache and nausea. Over the course of the 30-minute waiting time, these feelings subsided.

Reported After-Effects. During the follow-up call, participants were asked if they had experienced any after-effects which they thought might be due to their exposure to the VE. Anything mentioned by participants was noted without judgment of whether it was due to VR exposure. Fourteen participants indicated some type of symptom or condition.

Most participants were able to clearly describe their symptoms. Two participants reported that they felt dizzy or "foggy" afterward. For one of these participants, the feeling subsided shortly thereafter; for the other it lasted for about 1 hours after leaving the experimental site.

Four participants indicated having a headache. The first participant - who reported having only a slight headache - suggested that it might have been due to not having glasses on. The second participant - who also reported only a slight headache - also reported having eyestrain afterward. The third participant reported having a headache off and on for the rest of the day but did not know if it was related to the VR experience. The fourth participant - who reported being prone to migraines - reported having a "terrible" headache for the rest of the day.

Four participants reported some form of stomach upset, nausea, or motion sickness. The first participant reported feeling nauseous for about an hour afterward but indicated feeling better after eating. The second participant reported feeling nauseous and "not that great" afterward and rested for about a half hour after returning home and eating lunch. The third participant reported having an upset stomach immediately after leaving the experimental site and indicated that it lasted about 6 hours. The fourth participant reported being "pretty motion sick" for the rest of the day and noted that a drive of about 100 miles later that evening was "tough." This participant reported being fine the next day.

Two participants reported other symptoms. The first reported sleeping 12 hours the next day. A personal friend of this participant attributed it to the VR experience and insisted that the participant mention it during the follow-up call. When called for the follow-up - nine hours after participating in the research - the second participant reported still feeling dizzy and having a "heavy-feeling" head. This participant reported starting to get sick around two to three hours after the experiment and eventually vomited. This participant was called for an additional follow-up five days later. At that time, it was reported that the participant's son was feeling bad and, thus, the participant attributed the earlier sickness to a "bug" which was going around at the time.

Finally, two participants had difficulty describing their symptoms. The first reported feeling "kinda funny" while walking away from the experimental site but noted that it was not "discomforting." The second participant reported going scuba diving on a boat later the day of the experiment and indicated feeling "a little funny" but did not attribute it to the VR exposure.

Gender Differences in Mean Sickness Scores. The observed Total Severity means for males and females were 13.28 and 29.17, respectively (standard deviation = 17.35 and 32.28, respectively). These means do not differ significantly with a two-sided two-sample t-test (t=1.94, p=.062). The observed Nausea subscale means for males and females were 10.97 and 25.28, respectively (standard deviation = 17.85 and 29.89, respectively). These means also do not differ significantly with a two-sided two-sample t-test (t=1.84, p=.076). The observed Oculomotor Discomfort subscale means for males and females were 11.37 and 20.84, respectively (standard deviation = 15.06 and 22.25, respectively). These means do not differ significantly with a two-sided two-sample t-test (t=1.58, p=.12). Finally, the observed Disorientation subscale means for males and females were 12.53 and 33.41, respectively (standard deviation = 18.57 and 44.34, respectively). These means do not differ significantly with a two-sided two-sample t-test (t=1.94, p=.063).

Thus, although females had higher mean sickness scores than did males on every measure of sickness, it was concluded that the males and females in this study did not significantly differ on any of the four sickness measures. Nonetheless, for almost every measure, the female mean score was at least twice the male mean score. In addition, there was much greater variance in the female scores.
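The t statistics for the gender comparisons can be reproduced from the reported means and standard deviations. The sketch below assumes 20 males and 20 females (the 40 participants were evenly split by gender) and uses the unpooled standard error; with equal group sizes, the pooled and unpooled standard errors are identical. The function and variable names are hypothetical illustrations.

```python
import math


def two_sample_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sample t statistic for (mean_b - mean_a) from summary statistics."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_b - mean_a) / standard_error


N_PER_GROUP = 20  # assumption: equal split of the 40 participants

# (male mean, male SD, female mean, female SD) as reported above
reported = {
    "Total Severity": (13.28, 17.35, 29.17, 32.28),
    "Nausea": (10.97, 17.85, 25.28, 29.89),
    "Oculomotor Discomfort": (11.37, 15.06, 20.84, 22.25),
    "Disorientation": (12.53, 18.57, 33.41, 44.34),
}

t_values = {
    name: two_sample_t(m_mean, m_sd, N_PER_GROUP, f_mean, f_sd, N_PER_GROUP)
    for name, (m_mean, m_sd, f_mean, f_sd) in reported.items()
}

for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}")
```

Rounded to two decimals, the four values match the reported statistics (1.94, 1.84, 1.58, and 1.94).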

Relationship Between Inter-pupillary Distance and Visual Symptoms. The relationship between inter-pupillary distance (IPD) and two variables - score on the Oculomotor Discomfort subscale and the symptom "Eyestrain" on the SSQ - was investigated. The linear correlations for both relationships were negative (-.150 for the correlation between IPD and the Oculomotor Discomfort subscale score; -.394 for the correlation between IPD and the symptom). Only the correlation between IPD and the symptom, however, was significant (p = .012).

Relationship Between Sickness and Final Level Reached in Ascent. The relationship between sickness and the final level reached in Ascent was investigated. The correlations between level and the Total Severity and Nausea, Oculomotor Discomfort, and Disorientation subscale scores were -.572, -.565, -.555, and -.468, respectively. All four of these correlations were significant (p < .0012 for each).
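The significance of a linear correlation can be checked by converting it to a t statistic with n - 2 degrees of freedom. The sketch below does this for the correlations reported in the two preceding paragraphs. It assumes that all 40 participants contributed to each correlation, which is not stated explicitly for the IPD analysis, and it uses the standard two-sided 5% critical value of t with 38 degrees of freedom (about 2.024). The names are hypothetical.

```python
import math

T_CRIT_DF38 = 2.024  # two-sided 5% critical value of t with 38 degrees of freedom


def correlation_t(r, n):
    """t statistic for testing that a Pearson correlation r differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


N = 40  # assumption: every participant is included in each correlation

correlations = {
    "IPD vs Oculomotor Discomfort": -0.150,
    "IPD vs Eyestrain": -0.394,
    "Level vs Total Severity": -0.572,
    "Level vs Nausea": -0.565,
    "Level vs Oculomotor Discomfort": -0.555,
    "Level vs Disorientation": -0.468,
}

for name, r in correlations.items():
    t = correlation_t(r, N)
    verdict = "significant" if abs(t) > T_CRIT_DF38 else "not significant"
    print(f"{name}: r = {r:.3f}, t = {t:.2f} ({verdict} at the two-sided 5% level)")
```

Under these assumptions, only the correlation between IPD and the Oculomotor Discomfort subscale score falls short of the critical value, in agreement with the results reported above.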

Phase II

Summary Statistics

The measures of pre- and post-exposure postural stability were the average of the two pre- and two post-exposure Prototype values. In the previous subsection, it was noted that the average pre-exposure Prototype value in this study ranged from 1 to 8.5 with a mean of 3.71 (standard deviation = 1.93) and a median of 3. The average post-exposure Prototype value in this study also ranged from 1 to 8.5 with a mean of 3.58 (standard deviation = 1.91) and a median of 3. There was a significant linear correlation between these two variables (r = .507, p = .001).

Results of the Analysis

It was hypothesized that ataxic decrements in postural stability would be associated with exposure to VR. Because lower values of the Prototype measure reflect better postural stability, quantitatively the hypothesis was that the mean post-exposure value would exceed the mean pre-exposure value. For the paired t-test, this translates to the following hypotheses:

H0: μ_post - μ_pre = 0
Ha: μ_post - μ_pre > 0

The mean difference value (post-pre) in this study was found to be -0.138 (standard deviation = 1.91). Thus, because the value is not positive, the hypothesis that ataxic decrements would occur in conjunction with VR exposure was not supported. The mean difference value also does not differ from 0 with a two-sided paired t-test (t=-0.46, p=.65). A 95% confidence interval for the true difference between the pre- and post-exposure mean Prototype values is (-0.747, 0.472). Note that this interval does contain some positive values. Thus, the hypothesis of ataxic decrements is not wholly untenable despite the fact that this research did not support it.
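The paired t statistic and confidence interval above can be reproduced from the reported summary values. The sketch below uses the reported mean difference, its standard deviation, and n = 40, together with the standard two-sided 95% critical value of t with 39 degrees of freedom (about 2.0227). The function name is hypothetical, and small discrepancies in the third decimal place are expected because the inputs are rounded.

```python
import math

T_CRIT_DF39 = 2.0227  # two-sided 95% critical value of t with 39 degrees of freedom


def paired_t_summary(mean_diff, sd_diff, n, t_crit):
    """Paired t statistic and confidence interval from summary statistics."""
    standard_error = sd_diff / math.sqrt(n)
    t_stat = mean_diff / standard_error
    half_width = t_crit * standard_error
    return t_stat, (mean_diff - half_width, mean_diff + half_width)


t_stat, (ci_low, ci_high) = paired_t_summary(-0.138, 1.91, 40, T_CRIT_DF39)
print(f"t = {t_stat:.2f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

The computed t statistic rounds to -0.46, and the interval agrees with the reported one to within rounding of the inputs.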

Additional examination of the data in this study lends further support to the hypothesis that ataxic decrements may occur. The mean difference between only the first post-exposure trial and the second pre-exposure trial was 0.575 (standard deviation = 2.934). Although this value is not statistically significant (t=1.24, p=.11), it is in the hypothesized positive direction. The mean difference between the first post-exposure trial and the second pre-exposure trial for only the 10 participants with the lowest Total Severity scores is 0.300 (standard deviation = 1.77) and the analogous measure for the 10 participants with the highest Total Severity scores is 0.90 (standard deviation = 3.54). Again, these values are both positive as would be hypothesized if ataxic decrements occur with VR exposure.

Comments on the Power Analysis

The pre-experimental Phase II power analysis was conducted assuming that a different measure of postural stability would be used. The measure which was used in the final analysis is likely much less sensitive than the measure whose use was originally proposed. Thus, the original power analysis likely represents an overestimation of power for the analysis conducted.

A post-experimental power analysis was conducted to assess the level of power actually available. Because the two power analyses were for different measures of postural stability, the effect sizes are not directly comparable in original units. However, by specifying the effect size as Cohen's (1988) effect size index, the two analyses can be compared. Cohen's effect size index is the effect of interest (in this case the difference between the post- and pre-exposure postural stability means) divided by the standard deviation of the measure. In other words, it is the effect size in standard deviation units.

The pre-experimental power analysis revealed that a sample of size 40 would be sufficient to detect an effect of index size .589 with power = .839. It was found in this study that the mean difference between the post- and pre-exposure measures was -0.138 with standard deviation 1.91 and correlation between the values of .507. Using the formulas presented in section 2.3.5 of Cohen (1988) for one sample of n differences between paired observations, it was determined that the observed effect size index in this study was .103. Power to detect an effect of this size was about .11. It should be noted that, using Cohen's formulas, a sample of 1000 would yield power of only .72 to detect an effect with index .10, if an effect of that size actually exists.
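The power values quoted above can be approximated without Cohen's tables. The sketch below is not the procedure used in this study: it substitutes a normal approximation for the tabled values and assumes a one-tailed test at alpha = .05, which this section does not state. It also assumes that the effect size index is the standardized mean difference multiplied by the square root of 2, which is one reading of Cohen's adjustment for paired observations. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def effect_size_index(mean_diff, sd_diff):
    """Assumed paired-sample index: |mean difference| / SD of differences * sqrt(2)."""
    return abs(mean_diff) / sd_diff * math.sqrt(2)


def approx_power(d, n, alpha=0.05):
    """Normal-approximation power of a one-tailed test for effect size index d."""
    standard_normal = NormalDist()
    noncentrality = d * math.sqrt(n / 2)
    return standard_normal.cdf(noncentrality - standard_normal.inv_cdf(1 - alpha))


observed_d = effect_size_index(-0.138, 1.91)   # about 0.102
power_planned = approx_power(0.589, 40)        # about 0.839
power_observed = approx_power(0.103, 40)       # about 0.12
power_large_n = approx_power(0.10, 1000)       # about 0.72
```

Under these assumptions the approximation gives values close to those reported: about .839 for the planned effect, roughly .11 to .12 for the observed effect, and about .72 for a sample of 1000 with index .10.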

These power analyses provide several pieces of information. The pre-experiment analysis indicated that this study had a power level of .839 to detect an effect of index .589 if an effect of that size actually existed. Thus, if the conclusion is to accept the null hypothesis of no difference between post- and pre-exposure postural stability measures, the probability is .161 that such a conclusion would be incorrect if an effect of index .589 actually exists. An effect of index .103 was actually found. The post-experiment analysis indicated that this study only had power of about .11 to detect an effect of this size if an effect of that size actually existed.

Thus, two conclusions are possible. First, it could be that an effect of index .589 actually exists but, by nature of chance or because of the use of a less sensitive measure, it was not found. Under such conditions, if it is concluded that there is no difference between the post- and pre-exposure postural stability measures, the probability is .161 that such a conclusion would be incorrect. Second, it could be that the actual effect is much smaller than anticipated, perhaps on the order of index .10 as found in this study. This experiment lacked sufficient power to detect an effect of that size. Under these conditions, the conclusion that there is no difference between the post- and pre-exposure postural stability measures would be erroneous and would largely be due to insufficient power.