Am J Perinatol 2021; 38(12): 1223-1230
DOI: 10.1055/s-0041-1731345
SMFM Fellowship Series Article

Impact of the p-Value Threshold on Interpretation of Trial Outcomes in Obstetrics and Gynecology

Ann M. Bruno (1, 2), Ashley E. Shea (1), Brett D. Einerson (1, 2), Torri D. Metz (1, 2), Amanda A. Allshouse (1), James R. Scott (1), Nathan R. Blue (1, 2)

1   Department of Obstetrics and Gynecology, University of Utah Health, Salt Lake City, Utah
2   Department of Obstetrics and Gynecology, Intermountain Healthcare, Salt Lake City, Utah

Funding T.D.M. reports receiving a stipend for serving as associate editor of obstetrics for Obstetrics and Gynecology, and reports other support from the American College of Obstetricians and Gynecologists outside the submitted work.

Abstract

Objective Randomized controlled trials (RCTs) are considered the highest level of evidence to inform clinical practice. However, the reproducibility crisis has raised concerns about the scientific rigor of published RCT findings, and some advocate for a lower p-value threshold. We aimed to review published RCTs on OB/Gyn topics in three representative OB/Gyn journals and three high-impact non-OB/Gyn journals to determine whether their interpretations would change with adoption of a p-value threshold for significance of 0.005. Secondarily, we evaluated whether methodologic characteristics differed between trials that did and did not lose significance.

Study Design A manual search was performed to identify all OB/Gyn RCTs published in the selected journals between July 2017 and June 2019. Data were collected on primary outcome(s), methodology, and p-values. We determined the proportion of primary outcomes that would remain statistically significant with adoption of a p-value significance threshold of 0.005 versus the proportion that would be reinterpreted as “suggestive” (defined as a p-value between 0.005 and 0.05). The chi-square or Fisher's exact test was used to compare study characteristics.
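The reclassification rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function name `classify_p_value` and its parameter names are hypothetical, and the handling of values exactly at 0.005 or 0.05 is an assumption, since the abstract does not state how boundary values were treated.

```python
def classify_p_value(p, strict=0.005, conventional=0.05):
    """Label a p-value under the proposed lower threshold.

    Assumption (not stated in the abstract): a p-value exactly equal
    to a threshold falls into the less significant category.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < strict:
        # Would remain statistically significant at the 0.005 threshold.
        return "significant"
    if p < conventional:
        # Significant at 0.05 when published, reinterpreted as suggestive.
        return "suggestive"
    # Not significant under either threshold.
    return "not significant"


# Illustrative inputs only; these are not p-values from the reviewed trials.
for example_p in (0.001, 0.02, 0.20):
    print(example_p, classify_p_value(example_p))
```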

Results Overall, 202 RCTs met inclusion criteria; 52% were in obstetrics and 48% in gynecology. Of 90 studies considered significant with p < 0.05 at the time of publication, 54.4% (n = 49) would remain significant (p < 0.005), while 45.6% (n = 41) would become suggestive using the lower threshold. Most RCTs utilized a single (90.1%) versus composite (8.9%) primary outcome type, used an intent-to-treat analysis (73.3%), and studied a drug intervention (46.5%). Methodologically, 23.7% did not prespecify analysis type, 28.2% did not meet the predetermined sample size, and 9.4% did not report an a priori sample size calculation. Studies maintaining significance were more likely to be international and to report a funding source.
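As a quick arithmetic check, the reported proportions can be recomputed from the counts given above (90 trials significant at publication, of which 49 remain significant and 41 become suggestive). The variable names and the `percent` helper are illustrative assumptions; only the counts come from the abstract.

```python
# Counts taken from the Results paragraph.
significant_at_publication = 90
remain_significant = 49
become_suggestive = 41


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The two subgroups should account for every trial significant at p < 0.05.
assert remain_significant + become_suggestive == significant_at_publication

print(percent(remain_significant, significant_at_publication))  # 54.4
print(percent(become_suggestive, significant_at_publication))   # 45.6
```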

Conclusion Adopting a p-value significance threshold of 0.005 would require reinterpretation of almost half of RCT results in the OB/Gyn literature. Highly variable methodological quality was identified.

Key Points

  • New p-value threshold results in reinterpretation of nearly half of RCT results in OB/Gyn literature.

  • Highly variable methodological quality was identified.

  • Reduced use of binary interpretations of significance is necessary.

Publication History

Received: 02 November 2020

Accepted: 20 May 2021

Article published online: 24 June 2021

© 2021. Thieme. All rights reserved.

Thieme Medical Publishers, Inc.
333 Seventh Avenue, 18th Floor, New York, NY 10001, USA
