Tables 1 and 2 show the means of the test scores for the various media. Figures 14 and 15 summarize all of the media treatments. In these figures, the box on the left represents the pre test score and the box on the right represents the post test score for each treatment. These box plots graph the median and the range of the data points. The horizontal line inside the box marks the median, with half of the data points above that value and half below. The box edges mark the quartiles. For the upper box edge, one quarter of the data points are above that value and three quarters are below; for the lower box edge, one quarter are below and three quarters are above. The whiskers represent the range of the data unless there are outliers, in which case they represent the range of the non-outlier data. Outliers are defined as data points that lie more than one full box length beyond either edge of the box.
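The box plot quantities described above can be sketched in a few lines of Python. This is only an illustration: the function name `box_plot_summary` and the sample scores are hypothetical, and the quartile convention (the "inclusive" method of the standard library) is an assumption, since the text does not say how the plotting software computed the box edges.

```python
from statistics import median, quantiles


def box_plot_summary(scores):
    """Return the median, box edges, whisker ends, and outliers for one group."""
    data = sorted(scores)
    # Quartile convention is an assumption; other conventions shift the
    # box edges slightly.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    box_length = q3 - q1
    # Outlier rule as stated in the text: more than one full box length
    # beyond either box edge. (Many packages use 1.5 box lengths instead.)
    low_fence = q1 - box_length
    high_fence = q3 + box_length
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        # Whiskers span the non-outlier data only.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical scores, not taken from the study.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
```

For the hypothetical scores above, the value 30 falls outside the upper fence, so the upper whisker stops at the largest non-outlier score.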
Table 1: Oral Test Means and Variances
| Medium | Number of People | Oral Pre Mean | Oral Pre Variance | Oral Post Mean | Oral Post Variance | Oral Extended Mean |
|---|---|---|---|---|---|---|
| VR | 36 | 3.306 | 7.990 | 6.583 | 13.164 | 4.333 |
| Mac Int | 13 | 2.071 | 3.833 | 6.231 | 8.192 | |
| Mac Run | 13 | 4.462 | 5.269 | 6.786 | 4.603 | |
| Video | 20 | 3.100 | 14.200 | 5.350 | 24.239 | 3.824 |
| Control | 7 | 4.429 | 9.952 | 5.143 | 8.476 | 5.500 |
Table 2: Written Test Means and Variances
| Medium | Number of People | Written Pre Mean | Written Pre Variance | Written Post Mean | Written Post Variance | Written Extended Mean |
|---|---|---|---|---|---|---|
| VR | 36 | 6.528 | 13.313 | 7.722 | 18.462 | 7.028 |
| Mac Int | 13 | 4.885 | 8.340 | 7.385 | 5.590 | |
| Mac Run | 13 | 5.730 | 11.410 | 7.692 | 11.561 | |
| Video | 14 | 6.572 | 8.981 | 6.893 | 10.276 | 7.618 |
| Control | 7 | 7.785 | 10.988 | 7.214 | 14.255 | 7.750 |
Figure 14: Oral Pre and Post Scores
Figure 15: Written Pre and Post Scores
The independence of the media variable is the important assumption. To examine possible collinearity, a Pearson correlation analysis was run. Table 3 shows the Pearson correlations among the independent variables on a scale from -1 to +1, where a value of large magnitude implies a strong correlation. Table 4 shows the probability associated with each correlation, where a small value implies a significant correlation. At an alpha of 0.05, Media is not correlated with any of the other independent variables, but all of the remaining variables are significantly correlated with one another.
The lack of independence among the oral, written, and DAT tests is an expected result and does not pose a problem. These three factors cannot be used as separate variables in the same analysis, but they were not intended to be used in that manner. The oral scores and written scores can be analyzed separately or combined into one factor. The DAT score can be used to predict the delta increase on the oral or written test, which eliminates the correlated pre test scores. The important assumption of the independence of the media variable was successfully met.
Table 3: Independent Variable Correlations
| | MEDIA | Oral Pre Test | Written Pre Test | DAT |
|---|---|---|---|---|
| MEDIA | 1.000 | | | |
| Oral Pre Test | 0.056 | 1.000 | | |
| Written Pre Test | 0.072 | 0.570 | 1.000 | |
| DAT | -0.121 | 0.493 | 0.469 | 1.000 |
Table 4: Independent Variable Correlation Probabilities
| | MEDIA | Oral Pre Test | Written Pre Test | DAT |
|---|---|---|---|---|
| MEDIA | 0.000 | | | |
| Oral Pre Test | 0.680 | 0.000 | | |
| Written Pre Test | 0.593 | 0.001 | 0.000 | |
| DAT | 0.370 | 0.001 | 0.001 | 0.000 |
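As a reminder of what each entry in the correlation table measures, the Pearson coefficient for two paired lists of scores can be computed as sketched below. The function name `pearson_r` and the sample values are hypothetical; the study's raw scores are not reproduced here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired scores, not taken from the study.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # perfectly increasing
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # perfectly decreasing
```

A coefficient near +1 or -1 indicates a strong linear relationship, and a coefficient near 0 indicates little or none.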
Another assumption is that the data are normally distributed with equal variances. Referring to Figure 14, the variance and normality for the oral pre test look problematic. Variance is particularly important in experimental designs which have unequal cell sizes. To test the seriousness of the problem for the oral pre test, a Bartlett's test for unequal variances was run, with a resulting chi-square statistic of 7.71 with 4 degrees of freedom. This value is significant at an alpha of 0.10, which means that the variances are significantly different at that level. Although an alpha of .05 is a typical cut-off value for significance, this result is too close to be ignored, and therefore the assumption of equal variances for the oral test was not met.
For the written pre test shown in Figure 15, the data look fairly normal. The Bartlett test statistic is 1.62, which is not significant at an alpha of .10.
Table 5 shows the result for the media main effect using the Kruskal-Wallis method. Tables 6 and 7 show the effect using the ANOVA method. The delta score, which is the post score minus the pre score, was used as the dependent variable. According to both the Kruskal-Wallis and ANOVA methods, media is a significant variable at an alpha of .05. The next question is which medium is better.
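For readers unfamiliar with Bartlett's test, the statistic can be computed from the group sizes and sample variances alone. The sketch below uses the standard textbook formula; the function name `bartlett_statistic` and the sample values are hypothetical, and it is not claimed to reproduce the values reported above, which were presumably computed from the raw scores by the statistics package.

```python
from math import log


def bartlett_statistic(sizes, variances):
    """Bartlett's test statistic and its degrees of freedom.

    sizes     -- number of observations in each group
    variances -- sample variance (n - 1 denominator) of each group
    """
    k = len(sizes)
    dfs = [n - 1 for n in sizes]
    df_total = sum(sizes) - k
    # Pooled variance, weighting each group by its degrees of freedom.
    pooled = sum(d * v for d, v in zip(dfs, variances)) / df_total
    numerator = df_total * log(pooled) - sum(
        d * log(v) for d, v in zip(dfs, variances)
    )
    correction = 1 + (sum(1 / d for d in dfs) - 1 / df_total) / (3 * (k - 1))
    # The statistic is referred to a chi-square distribution with k - 1 df.
    return numerator / correction, k - 1


# Hypothetical groups, not taken from the study.
stat_equal, df_equal = bartlett_statistic([10, 10, 10], [2.0, 2.0, 2.0])
stat_unequal, df_unequal = bartlett_statistic([10, 10, 10], [1.0, 4.0, 9.0])
```

With five media groups the statistic has 4 degrees of freedom, which matches the degrees of freedom reported in the text. Equal group variances give a statistic of zero, and the statistic grows as the variances diverge.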
Table 5: Oral and Written K-W Delta/Media
| Dependent Variable | Ind Variable | Number of Levels | N | K-W Test Statistic | Prob |
|---|---|---|---|---|---|
| Oral Delta | Media | 5 | 89 | 15.842 | 0.003 |
| Written Delta | Media | 5 | 83 | 10.515 | 0.033 |
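The Kruskal-Wallis statistic reported in Table 5 is based on ranks rather than raw delta scores. The following is a minimal sketch of the usual H statistic with the standard correction for tied values; the function names and the sample groups are hypothetical, and the statistics package used in the study may differ in detail.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, corrected for ties.

    Assumes the pooled data are not all identical (otherwise the tie
    correction divides by zero).
    """
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


# Hypothetical delta scores (post minus pre) for three groups.
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

The statistic is compared against a chi-square distribution with one fewer degree of freedom than the number of groups.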
Table 6: Oral Test ANOVA Delta/Media
| SOURCE | SUM-OF-SQUARES | DF | MEAN-SQUARE | F-RATIO | P |
|---|---|---|---|---|---|
| MEDIA | 79.651 | 4 | 19.913 | 3.354 | 0.013 |
| ERROR | 498.708 | 84 | 5.937 | | |
Table 7: Written Test ANOVA Delta/Media
| SOURCE | SUM-OF-SQUARES | DF | MEAN-SQUARE | F-RATIO | P |
|---|---|---|---|---|---|
| MEDIA | 62.239 | 4 | 15.560 | 3.024 | 0.023 |
| ERROR | 401.388 | 78 | 5.146 | | |
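The arithmetic behind the ANOVA tables can be checked directly: each mean square is a sum of squares divided by its degrees of freedom, and the F ratio is the media mean square divided by the error mean square. The short sketch below applies this to the values in Tables 6 and 7; the function name `f_ratio` is illustrative only.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """Return (effect mean square, error mean square, F ratio)."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect, ms_error, ms_effect / ms_error


# Sums of squares and degrees of freedom copied from Tables 6 and 7.
oral_ms_media, oral_ms_error, oral_f = f_ratio(79.651, 4, 498.708, 84)
writ_ms_media, writ_ms_error, writ_f = f_ratio(62.239, 4, 401.388, 78)

print(f"Oral F = {oral_f:.3f}, Written F = {writ_f:.3f}")
```

To three decimal places, this reproduces the F ratios of 3.354 and 3.024 shown in the tables.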
To determine how the media differed, a series of pairwise comparisons was made. The non-parametric Kruskal-Wallis analysis was used again, but in a slightly different manner than before. Instead of examining all of the media treatments in one Kruskal-Wallis analysis, two groups at a time were analyzed. This method is appropriate because the overall difference among groups was already determined to be significant. As before, while only the non-parametric analysis is necessary, the t-tests were also run to give a sense of the impact that non-normality and unequal variances had on the results. Table 8 shows the results for the oral delta test variable. The media are listed in order from largest to smallest oral delta score rather than from most to least immersive/interactive.
Table 8: Oral Delta Test Comparisons
| Media | Mac-Int | VR | Video | Mac-Run | Control |
|---|---|---|---|---|---|
| Delta Score | 4.231 | 3.278 | 2.250 | 2.000 | 0.714 |

| Comparison | Oral t-test | T sig dif | Oral K-W | K-W sig dif |
|---|---|---|---|---|
| M-Int/VR | 0.213 | No Dif | 0.169 | No Dif |
| M-Int/Video | 0.060 | No Dif | 0.022 | Dif |
| M-Int/Mac-Run | 0.012 | Dif | 0.016 | Dif |
| Mac-Int/Control | 0.001 | Dif | 0.004 | Dif |
| VR/Video | 0.239 | No Dif | 0.049 | Dif |
| VR/Mac-Run | 0.045 | Dif | 0.063 | No Dif |
| VR/Control | 0.001 | Dif | 0.003 | Dif |
| Video/Mac-Run | 0.788 | No Dif | 0.600 | No Dif |
| Video/Control | 0.104 | No Dif | 0.382 | No Dif |
| M-Run/Control | 0.078 | No Dif | 0.114 | No Dif |
The Kruskal-Wallis and t-tests agree in most cases. They disagree for the Mac-Int/Video, VR/Video, and VR/Mac-Run comparisons. For the Mac-Int/Video comparison, the Kruskal-Wallis test shows significance while the t-test does not, and for the VR/Mac-Run comparison the t-test finds significance while the Kruskal-Wallis test does not. However, for these two comparisons, both methods are near the cut-off value of alpha = .05. Since an alpha of .05 is not a sacred number, there is justification to conclude that the difference in delta scores is significant. Therefore, the conclusion is that the delta score for the Mac-Int treatment is significantly larger than the one for Video, and the delta score for the VR treatment is significantly larger than the one for the Mac-Run treatment.
The difference in the results between the Kruskal-Wallis method and the t-test method is more problematic for the VR/Video comparison, since the difference in results is large. The Kruskal-Wallis test shows a significant difference between VR and Video with a probability of 0.049, while the t-test shows no significance with a probability of 0.239. The Kruskal-Wallis result will be accepted in this case because of the large difference in variance between the VR and Video groups (4.4 vs. 12.1).
The results of Table 8 are illustrated in Figure 16 by drawing a line linking groups which are not significantly different from each other. The Mac-Interactive delta score is not significantly different from the VR delta score, but both of those groups have delta scores which are significantly larger than those of the other three groups. The delta scores of the Video, Mac-Run, and Control groups are not significantly different from one another.
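The agreement between the two methods can be checked mechanically. The sketch below copies the t-test and Kruskal-Wallis probabilities from Table 8 and lists the pairs for which the two methods fall on opposite sides of alpha = .05; the dictionary and function names are illustrative only.

```python
ALPHA = 0.05

# (t-test probability, Kruskal-Wallis probability), copied from Table 8.
oral_comparisons = {
    "M-Int/VR": (0.213, 0.169),
    "M-Int/Video": (0.060, 0.022),
    "M-Int/Mac-Run": (0.012, 0.016),
    "Mac-Int/Control": (0.001, 0.004),
    "VR/Video": (0.239, 0.049),
    "VR/Mac-Run": (0.045, 0.063),
    "VR/Control": (0.001, 0.003),
    "Video/Mac-Run": (0.788, 0.600),
    "Video/Control": (0.104, 0.382),
    "M-Run/Control": (0.078, 0.114),
}


def disagreements(comparisons, alpha=ALPHA):
    """Return the pairs where exactly one of the two tests is below alpha."""
    return [
        name
        for name, (p_t, p_kw) in comparisons.items()
        if (p_t < alpha) != (p_kw < alpha)
    ]


oral_disagreements = disagreements(oral_comparisons)
print(oral_disagreements)
```

With the Table 8 values, this flags the M-Int/Video, VR/Video, and VR/Mac-Run comparisons.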

Figure 16: Oral Delta Score Significance
Table 9: Written Delta Test Comparisons
| Media | Mac-Int | Mac-Run | VR | Video | Control |
|---|---|---|---|---|---|
| Delta Score | 2.500 | 1.962 | 1.194 | 0.321 | -0.571 |

| Comparison | Writ t-test | T sig dif | Writ K-W | K-W sig dif |
|---|---|---|---|---|
| M-Int/Mac-Run | 0.625 | No Dif | 0.587 | No Dif |
| M-Int/VR | 0.130 | No Dif | 0.094 | No Dif |
| M-Int/Video | 0.027 | Dif | 0.056 | Dif |
| Mac-Int/Control | 0.002 | Dif | 0.007 | Dif |
| M-Run/VR | 0.390 | No Dif | 0.350 | No Dif |
| M-Run/Video | 0.103 | No Dif | 0.113 | No Dif |
| M-Run/Control | 0.012 | Dif | 0.023 | Dif |
| VR/Video | 0.188 | No Dif | 0.458 | No Dif |
| VR/Control | 0.006 | Dif | 0.030 | Dif |
| Video/Control | 0.121 | No Dif | 0.271 | No Dif |
The same methods are used for exploring the written data. For the written delta scores, the Kruskal-Wallis and t-test methods agree in every case, as can be seen in Table 9. Figure 17 shows a summary of the data. For the written delta scores, there is no significant difference among the Mac-Int, Mac-Run, and VR groups. Mac-Int is significantly better than the Video and Control groups. Mac-Run and VR are not significantly different from Video, but they are significantly better than the Control group. For the written delta, Video is not significantly better than the Control group.

Figure 17: Written Delta Score Significance
The media with high levels of interaction seem to be the most successful at increasing the oral test scores, with VR and Mac Interactive being significantly better than the other treatments. Immersion did not seem to be important. This will be discussed further in a later section, with conjectures as to why immersion may truly be important even though its effect was not demonstrated in this particular designed experiment.
Regarding the written test, interactivity seems to be important, but not as clearly as with the oral test. Mac-Int and VR are still significant, but the Mac-Run treatment was also as effective as the high interaction groups. This result agrees with the assumptions of the difference between the oral and written tests. Mental model building is not necessarily required for the written test and therefore interaction is not required to do well on the written test.
Due to the logistics of the experimental design, the DAT scores were gathered only for the VR, Video, and Control groups. These groups were chosen as being on the extremes of the immersion/interactivity continuum. Table 10 shows the correlation of DAT to the oral and written pre, post, and delta scores regardless of media. Figure 18 shows the DAT scores plotted against the oral pre scores, oral delta scores (post score minus pre score), written pre scores, and written delta scores. The important information in this data is that the DAT score was not correlated with the increase in score (the delta) for either the written or the oral test. This implies that DAT alone is not a significant indicator of how a student will improve on the chemistry tests.
Table 10: DAT Correlations
| | DAT |
|---|---|
| Oral Pre Test | .462 |
| Oral Post Test | .488 |
| Oral Delta Test | .163 |
| Written Pre Test | .483 |
| Written Post Test | .604 |
| Written Delta Test | .342 |

A noteworthy observation is that no one who scored poorly in spatial ability scored well on the oral pre test, while some of these students performed well on the written test. There could be an interesting reason for this difference. Possibly, people who do not naturally have a good spatial ability can learn what is expected of them in the traditional classroom, but underneath do not really have a working concept of what they are allegedly learning. So, the students can learn how to fill out an orbital fill diagram, but without spatial ability, they do not really understand what that diagram means. This suggestion is offered in the spirit of exploration throughout this dissertation. Of course, there are many data points in those two graphs that represent students who have a high level of spatial ability but who also did poorly on the pre tests.
To look for possible DAT/media interactions, Table 11 shows the DAT correlations when the data is divided into VR and Video groups. Figure 19 shows the DAT graphs split by VR and Video treatments. For both the oral and written tests, the correlations for the delta scores are small and similar across the two groups. The graphs show similar scattering of data points. This implies that there is no significant DAT/media interaction.
Table 11: DAT Correlations VR versus Video
| | VR-DAT | Video-DAT |
|---|---|---|
| Oral Pre Test | .572 | .298 |
| Oral Post Test | .532 | .444 |
| Oral Delta Test | .126 | .283 |
| Written Pre Test | .558 | .486 |
| Written Post Test | .643 | .748 |
| Written Delta Test | .342 | .403 |

With a lack of significance both as a main effect and as an interaction effect for both the oral and written tests, the DAT score does not seem to be a significant factor.
Although not included as part of the original hypothesis section, the interesting question for educators is whether the gains experienced by the students are retained after a period of time. To explore this, students who were in the VR, Video, and Control treatments were tested 3 months after their participation in the original study. The oral test was the same as before and the written test was very similar in form to the pre and post written tests.
All students who participated in the VR, Video, or Control treatments were invited to take another oral and written test battery. Most of the students in the control group participated again. However, not all of the students who had participated in the other treatments were available or interested in continuing their involvement. The subpopulation that did participate was a representative sample of the whole potential population. Table 12 shows the results of a t-test comparison between the earlier scores of the group who participated in the long term study and the earlier scores of those who did not participate. All of the probabilities are far above an alpha of .05, which means that there is no significant difference between the two groups. All participating students were paid ten dollars for their involvement and were released from one class period. The testing was done at the students' high school during regular school hours. The oral test was the same one used for the pre and post oral tests previously taken. The written test was similar to the previous two written tests, with the names of the atoms and molecules changed. Participation in this experiment consisted solely of taking the oral and written tests.
Table 12: Long-Term T-Test Probabilities
| Medium | Oral Pre Mean | Oral Post Mean | Written Pre Mean | Written Post Mean |
|---|---|---|---|---|
| VR | .863 | .964 | .964 | .820 |
| Video | .268 | .692 | .751 | .502 |
The means of the long term oral and written scores by media are shown in Tables 13 and 14. Graphs illustrating these numbers are shown in Figures 20 and 21. The long term means for the various media are not significantly different from each other for either the oral or written tests as shown in Tables 15 and 16. However, since this study is of an exploratory nature, there are some noteworthy points.
Table 13: Oral Long Term Means
| Oral Test | D. of F. | Oral Pre Mean | Oral Post Mean | Oral Long Term Mean |
|---|---|---|---|---|
| VR | 17 | 3.389 | 6.556 | 4.333 |
| Video | 16 | 3.412 | 5.059 | 3.824 |
| Control | 5 | 4.667 | 5.667 | 5.500 |
Table 14: Written Long Term Means
| Written Test | D. of F. | Written Pre Mean | Written Post Mean | Written Long Term Mean |
|---|---|---|---|---|
| VR | 17 | 6.556 | 7.556 | 7.028 |
| Video | 11 | 6.441 | 6.583 | 7.618 |
| Control | 5 | 7.417 | 7.083 | 7.750 |

Figure 20: Oral Long Term Means

Figure 21: Written Long Term Means
Table 15: Oral Long Term Test ANOVA
| SOURCE | SUM-OF-SQUARES | DF | MEAN-SQUARE | F-RATIO | P |
|---|---|---|---|---|---|
| MEDIA | 12.517 | 2 | 6.259 | 0.481 | 0.622 |
| ERROR | 493.971 | 38 | 12.999 | | |
Table 16: Written Long Term Test ANOVA
| SOURCE | SUM-OF-SQUARES | DF | MEAN-SQUARE | F-RATIO | P |
|---|---|---|---|---|---|
| MEDIA | 4.014 | 2 | 2.007 | 0.172 | 0.843 |
| ERROR | 444.376 | 38 | 11.694 | | |
For the oral tests, the averages of all the groups are lowest at the pre test, rise with the post test, and then drop with the extended test, but not down to the original level of the pre test score. The group who were in the VR treatment suffered the largest drop in extended test score, but they were also the group who had gained the most immediately after the treatment. Levels of significance comparing pre, post, and long term tests are shown in Table 17.
Table 17: Oral Long Term T-tests
| Oral Test | D. of F. | Pre/ Post T | Pre/ Post Prob | Pre/ Ext T | Pre/ Ext Prob | Post/ Ext T | Post/ Ext Prob |
|---|---|---|---|---|---|---|---|
| VR | 17 | 6.333 | 0.001 | 2.411 | 0.027 | -5.547 | 0.001 |
| Video | 16 | 2.313 | 0.034 | 0.503 | 0.622 | -1.883 | 0.078 |
| Control | 5 | 2.236 | 0.076 | 0.773 | 0.474 | -0.222 | 0.833 |
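The comparisons in this table involve the same students tested at different times. Assuming they are paired t-tests (the text does not name the variant explicitly), the statistic can be sketched as below; the function name `paired_t` and the sample scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for two score lists.

    A positive t means the later scores are higher on average.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    # Per-student change between the two test occasions.
    diffs = [a - b for a, b in zip(after, before)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


# Hypothetical pre and post scores for four students.
t_stat, df = paired_t([1, 2, 3, 4], [2, 4, 5, 8])
```

Under this assumption the degrees of freedom equal the number of students minus one, and a negative t, as in the Post/Ext columns, indicates a drop between the two tests.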
The significance of the pre, post, and extended oral tests for VR is important to note. While the extended test score dropped significantly from the immediate post test score, there is still significant improvement over the original pre treatment state of knowledge. Furthermore, although video showed an immediate improvement on the post test score, no long term improvement was maintained. From this data, the conclusion is that while no treatment caused the students to significantly maintain the increase in test score that happened immediately after exposure to the treatment, VR did result in a significantly higher test score than the students had before the study.
Table 18 shows the written test data. The only significant result in this test data is the gain in test score between the pre and post tests for the VR treatment. The gain that was seen after the VR treatment was not maintained in the long term. The data from the extended written tests is more puzzling than the oral test data. The VR written data follows the trend of the oral data, with a drop in test score for the extended test, but not to the original level of the pre test, as seen in Figure 21. However, for the video and control treatments, a slight gain in written test score happens three months after the treatment. Fortunately, this confusing result is not significant at an alpha level of 0.05 and can be attributed to variance in the data.
Table 18: Written Long Term T-tests
| Writ Test | D. of F. | Pre/ Post T | Pre/ Post Prob | Pre/ Ext T | Pre/ Ext Prob | Post/ Ext T | Post/ Ext Prob |
|---|---|---|---|---|---|---|---|
| VR | 17 | 2.766 | 0.013 | 0.833 | 0.416 | -0.913 | 0.374 |
| Video | 11 | 0.420 | 0.683 | 2.018 | 0.061 | 1.767 | 0.105 |
| Control | 5 | 0.791 | 0.465 | 0.319 | 0.763 | 0.594 | 0.579 |