McNemar test and null hypothesis




















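The McNemar test is used, for example, to analyze paired outcomes such as tests performed before and after treatment in the same population. The null hypothesis is marginal homogeneity: a change in one direction is as likely as a change in the other. The alternative hypothesis is that the two directions of change are not equally likely.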

The odds ratio is estimated as the ratio of the two discordant cell counts.

Example 1. To determine whether a drug has an effect on the disease, the results of diagnosis before and after the treatment are tabulated in a 2x2 contingency table. The test statistic is then 8.

Example 2: Matched Case-Control Study. A study was carried out on post-menopausal women in City A. Cases of women with endometrial cancer were identified from this city. A control group was selected, with each control matched to a case on age and length of residence in City A.
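The test statistic and odds ratio referred to above can be sketched in a few lines of Python. This is an illustrative sketch rather than the procedure used in the examples: the function names are hypothetical, the counts are invented, and it assumes the classical (uncorrected) statistic with one degree of freedom.

```python
import math


def mcnemar_classical(b, c):
    """Return (statistic, P value) for the classical McNemar test.

    b and c are the counts in the two discordant cells of the paired 2x2 table.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs: the test is undefined")
    stat = (b - c) ** 2 / (b + c)
    # Chi-squared survival function for 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def matched_odds_ratio(b, c):
    """Conditional odds ratio for matched pairs: ratio of the discordant counts."""
    if c == 0:
        raise ValueError("odds ratio is undefined when c is zero")
    return b / c


# Hypothetical counts, not taken from the examples in the text.
stat, p_value = mcnemar_classical(20, 5)
odds_ratio = matched_odds_ratio(20, 5)
print(f"statistic = {stat:.2f}, P = {p_value:.4f}, odds ratio = {odds_ratio:.1f}")
```

Note that the concordant cells do not enter either calculation; only the pairs whose outcome changed carry information about the direction of change.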

Children took a test to evaluate their knowledge of the core concepts; they then participated in the game and retook the test. The authors were interested in identifying whether, for each question in the test, the number of students who answered correctly differed before and after the outdoor game.

We feel this is the type of data and research question for which the McNemar test is ideally suited; however, the paper used the chi-squared test. As we emphasise above, these two tests test different hypotheses. Magris et al.

The key experiment presented in this paper has a paired design: 24 females interacted sequentially in random order with a control male and a male that had been sterilised by receiving a sublethal dose of radiation. In part of the analysis, the two types of males were compared in terms of six aspects of their mating behaviour: shudder number, shudder duration, rock number, gap duration, copulation duration, and whether or not sexual cannibalism by the female occurred.

The last of these measures was investigated with a chi-squared test, whereas the other five were evaluated by tests that acknowledge the paired nature of the data (either a paired t-test or a Wilcoxon signed-rank test). By selecting the chi-squared test, the authors chose to ignore the paired nature of the data that they considered relevant for the other five measures of mating.

As a result, all males were treated as being statistically independent for this one measure. If the authors had instead used a McNemar test, which would have treated the female as the independent unit of measurement, this would have fully accounted for the paired nature of the data that was considered in all other analyses. Similarly, a recent paper by Haines et al.

In this example, for each female song paper included in the study, a matching general song paper was identified and selected. By using a chi-squared test, the paired nature of the data was ignored, and the assumptions of the statistical test did not match those of the experimental design. As the results in this paper provide two clear concordant and discordant results, we feel that the McNemar test would have been a more appropriate test with which to analyse this data.
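To make the contrast concrete, the sketch below runs both tests on the same hypothetical paired data. It is an illustration under stated assumptions, not a reanalysis of any of the papers discussed: the counts are invented, the function names are hypothetical, and the chi-squared test shown is the uncorrected Pearson test applied to the marginal totals as if the two conditions were independent samples.

```python
import math


def chi2_sf_1df(x):
    """Survival function of the chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


def mcnemar_paired(b, c):
    """Classical McNemar test: uses only the discordant pairs."""
    stat = (b - c) ** 2 / (b + c)
    return stat, chi2_sf_1df(stat)


def pearson_unpaired(a, b, c, d):
    """Pearson chi-squared test on the marginal table, ignoring the pairing.

    Cells of the paired table (condition 1 / condition 2):
    a = yes/yes, b = yes/no, c = no/yes, d = no/no.
    """
    yes1, no1 = a + b, c + d  # outcomes under condition 1
    yes2, no2 = a + c, b + d  # outcomes under condition 2
    total = yes1 + no1 + yes2 + no2
    numerator = total * (yes1 * no2 - no1 * yes2) ** 2
    denominator = (yes1 + no1) * (yes2 + no2) * (yes1 + yes2) * (no1 + no2)
    stat = numerator / denominator
    return stat, chi2_sf_1df(stat)


# Hypothetical paired outcomes for 24 subjects, each measured twice.
a, b, c, d = 10, 9, 1, 4
paired_stat, paired_p = mcnemar_paired(b, c)
unpaired_stat, unpaired_p = pearson_unpaired(a, b, c, d)
print(f"McNemar (paired):       statistic = {paired_stat:.3f}, P = {paired_p:.4f}")
print(f"Chi-squared (unpaired): statistic = {unpaired_stat:.3f}, P = {unpaired_p:.4f}")
```

The two statistics differ because the unpaired test treats the 48 observations as independent, whereas the McNemar test treats each subject as the unit and asks only whether the 10 subjects who changed outcome did so more often in one direction.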

We have selected examples where, although we think the analyses could be improved with use of the McNemar test, it is highly unlikely that this would change the major conclusions of the paper. Therefore, in highlighting these papers, we do not imply any criticism of the authors and reviewers and do not suggest that the published analysis should be corrected nor a caveat applied. However, they do support our contention that the McNemar test is being unjustly neglected. Some authors do, however, use the McNemar test effectively in similar studies.

Chen and Pfennig used an experimental arena with two sound sources mimicking different types of male courtship call and investigated which attracted female toads placed individually in the arena. Each female was tested under two conditions: deep and shallow water.

McNemar tests were instrumental in demonstrating that water depth affected female choice in a way that acknowledged the paired nature of the data. There are four versions of the McNemar test (classical, continuity corrected, exact, and mid-P; see Fagerland et al.). The classical and continuity corrected versions of the test are available through the function mcnemar.

The exact version can be obtained from the function mcnemar. The mid-P version is not directly available in any package but can be obtained relatively simply using the recipe below, where b and c are the values in the two discordant cells of the contingency table. The P value is given by:. In Appendix 1, we provide an example of the R implementation of all four McNemar methods.
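As an illustration of how the four versions relate, here is a minimal Python sketch using only the standard library. It is not the R implementation referred to above. It assumes the standard textbook definitions: a chi-squared approximation with one degree of freedom for the classical and corrected versions, and a Binomial(b + c, 0.5) distribution for the exact and mid-P versions, with the mid-P value taken as the exact two-sided value minus the probability of the observed smaller discordant count. The function name `mcnemar_p_values` is hypothetical.

```python
import math


def _binom_pmf(k, n):
    """P(X = k) for X ~ Binomial(n, 0.5)."""
    return math.comb(n, k) * 0.5 ** n


def _binom_cdf(k, n):
    """P(X <= k) for X ~ Binomial(n, 0.5)."""
    return sum(_binom_pmf(i, n) for i in range(k + 1))


def _chi2_sf_1df(x):
    """Survival function of the chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


def mcnemar_p_values(b, c):
    """Two-sided P values from four versions of the McNemar test.

    b and c are the two discordant cell counts.
    """
    n = b + c
    if n == 0:
        raise ValueError("no discordant pairs: the test is undefined")
    k = min(b, c)
    tail = _binom_cdf(k, n)
    return {
        "classical": _chi2_sf_1df((b - c) ** 2 / n),
        "corrected": _chi2_sf_1df((abs(b - c) - 1) ** 2 / n),
        "exact": min(1.0, 2 * tail),
        # Assumption: conventions for the tie case (b == c) may differ
        # between references; this formula gives 1 in that case.
        "mid_p": min(1.0, 2 * tail - _binom_pmf(k, n)),
    }


# Hypothetical discordant counts.
results = mcnemar_p_values(9, 1)
for name, p in results.items():
    print(f"{name:>9}: P = {p:.4f}")
```

For any given table the mid-P value is never larger than the exact value, because it subtracts a non-negative probability from it.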

Until recently, the common advice e. However, this advice has recently been challenged on the evidence of extensive simulations (Fagerland et al.). The conclusions of these studies are threefold. The exact version of the test and other corrections that have been suggested do keep the type I error rate below the nominal value, but so far below it that they offer low statistical power to detect real effects that may exist.

On this basis, these alternatives cannot be recommended. The classical asymptotic version of the McNemar test does frequently exceed the nominal type I error rate when sample sizes are low, but apparently never by much—never above 5.

On this basis, the classical version could be used routinely, if this slight inflation of the type I error rate is considered when interpreting the results in cases where sample sizes are low. A mid-P version of the test seems to offer the best combination of properties. Its power seems always to be very similar to the classical version, but its control of type I error rate is better.

While it has not been demonstrated analytically to preserve the nominal type I error rate in all circumstances, it never exceeded the nominal level in the extensive set of simulations provided in these two papers. On this basis, this version can also be recommended for routine use. Its calculation is relatively simple, as detailed above.
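The behaviour described above can be explored with a small calculation. The sketch below does not reproduce the cited simulations: as a simplifying assumption it conditions on a fixed number of discordant pairs n, enumerates every possible split under the null hypothesis, and sums the probability of the splits that each version would reject at a nominal level of 0.05. The test definitions are the standard textbook ones, and all function names are hypothetical.

```python
import math


def binom_pmf(k, n):
    """P(X = k) for X ~ Binomial(n, 0.5)."""
    return math.comb(n, k) * 0.5 ** n


def chi2_sf_1df(x):
    """Survival function of the chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


def p_values(b, c):
    """Two-sided P values for the four versions, given discordant counts b and c."""
    n, k = b + c, min(b, c)
    tail = sum(binom_pmf(i, n) for i in range(k + 1))
    return {
        "classical": chi2_sf_1df((b - c) ** 2 / n),
        "corrected": chi2_sf_1df((abs(b - c) - 1) ** 2 / n),
        "exact": min(1.0, 2 * tail),
        "mid_p": min(1.0, 2 * tail - binom_pmf(k, n)),
    }


def rejection_rates(n, alpha=0.05):
    """Probability of rejecting a true null, conditional on n discordant pairs."""
    rates = {"classical": 0.0, "corrected": 0.0, "exact": 0.0, "mid_p": 0.0}
    for b in range(n + 1):
        prob = binom_pmf(b, n)  # probability of this split under the null
        for name, p in p_values(b, n - b).items():
            if p <= alpha:
                rates[name] += prob
    return rates


for n in (5, 10, 20, 40):
    rates = rejection_rates(n)
    summary = ", ".join(f"{name} = {rate:.4f}" for name, rate in rates.items())
    print(f"n = {n:>2}: {summary}")
```

Because the calculation is conditional on n, the rates it prints are not directly comparable with unconditional simulation results, but the ordering it shows (the exact version rejecting least often and never above the nominal level) follows from how the tests are constructed.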

We surveyed a range of journals in behaviour, ecology, and evolution that allowed electronic searching of the whole text of papers. Journals were included in our survey subjectively on the basis of our interpretation of their subject areas. We found that all journals that we considered for inclusion allowed searching of their text and had relevant papers. Thus, all journals considered in our survey can be found in Table 4 and ESM 1.

This produced no false positives, that is, this word was always associated with our focal statistical test. We focussed only on the first use of the McNemar test in any paper but excluded articles where the data on which the test was applied was not provided in the paper, supplements, or data repositories.

For the 50 papers produced by this method, we identified the sample size and subsequently obtained or inferred the values in the concordant cells (which we label a and d) and the discordant cells (labelled b and c). The P values for each version of the test (exact, classical, corrected, or mid-P) were then calculated, using the formula and packages described above (R Core Team), in order to identify (1) which version was used by the authors of the original study; (2) if the correct version of the test had been chosen based on the recent research advice of Fagerland et al.

In the 50 studies analysed, the papers rarely stated which of the four possible variants of the test was used to calculate the P value given. In a small minority of cases, inference could be made on the basis of information provided on statistical packages used, but generally, this was not conclusive.

However, we could perform the test by all four methods, and on this basis were unambiguously able to identify the method used in all cases. We sampled 50 recent papers published between and which used the McNemar test (see Table 4 and ESM 1 for full citations of these papers). We found that 17 had used the classical method and 33 the corrected method. None had used the mid-P method recommended by Fagerland et al. Of the remaining 42 papers with small samples, 16 used the classical method and 26 the corrected.

Our data, and the observation that the papers in our survey rarely provided readers with information on which of the four methods was actually implemented, suggest that current practice in terms of test version selection and how it is reported is far from optimal. We can use the data in the surveyed papers to explore how much choice of method influences the calculated P value for the datasets typically generated by researchers.

Table 4 suggests choice of method can have a strong impact on the P value obtained: in 39 out of the 50 cases, the highest of the 4 alternative P values was more than twice the lowest value. This number remains substantial even when ignoring the 6 out of these 39 cases where all the P values were less than 0. Hence, overall, our survey does suggest that, for the types of datasets generated by researchers, selection of the appropriate version of the McNemar test is of practical importance.

When using the McNemar test, it is not the total sample size that determines power, but the total number of discordant pairs. In 20 out of the 50 data sets studied, the number of discordant pairs was less than 10, 10 of which were less than 5.

Our table suggests that, in order to obtain a significant P value, the frequency of discordant pairs should be greater than 4, with a very large effect size required when the frequency is between 4 and 10 (Table 4). As such, although we recommend the mid-P version of the test when sample sizes are small, its power is dependent on discordant pair sample size. We caution against implementation of the test when the number of discordant pairs is lower than ten, to avoid issues of interpretation resulting from low statistical power.
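The dependence on the number of discordant pairs can be checked directly. The sketch below computes the smallest P value that the exact and mid-P versions can possibly return for a given number of discordant pairs, which occurs when every discordant pair falls in the same cell. It assumes the standard binomial definitions of both versions, and the function name is hypothetical.

```python
import math


def smallest_p_values(n):
    """Smallest attainable (exact, mid-P) values with n discordant pairs.

    The most extreme split is b = n, c = 0, so min(b, c) = 0.
    """
    pmf_zero = math.comb(n, 0) * 0.5 ** n  # P(X = 0), X ~ Binomial(n, 0.5)
    exact = min(1.0, 2 * pmf_zero)
    mid_p = 2 * pmf_zero - pmf_zero
    return exact, mid_p


for n in range(3, 11):
    exact, mid_p = smallest_p_values(n)
    print(f"n = {n:>2}: smallest exact P = {exact:.5f}, smallest mid-P = {mid_p:.5f}")
```

With four discordant pairs the mid-P value cannot fall below 0.0625, so significance at the 0.05 level is unattainable; with five it can reach 0.03125, but only if all five pairs change in the same direction. The exact version needs at least six.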

Using these approaches, the null hypothesis tested is slightly different from that of the McNemar test. Returning to our hypothetical example outlined in Table 1, the McNemar test generates a P value associated with the null hypothesis that, for those chimpanzees that only open one box, that box is just as likely to be one type as the other.
