DOI: 10.1055/s-0040-1712460
APRICOT: Advanced Platform for Reproducible Infrastructures in the Cloud via Open Tools
Funding This study was supported by the program “Ayudas para la contratación de personal investigador en formación de carácter predoctoral, programa VALi+d” under grant number ACIF/2018/148 from the Conselleria d'Educació of the Generalitat Valenciana and the “Fondo Social Europeo” (FSE). The authors would like to thank the Spanish “Ministerio de Economía, Industria y Competitividad” for the project “BigCLOE” with reference number TIN2016-79951-R and the European Commission, Horizon 2020 grant agreement No 826494 (PRIMAGE). The prostate MRI case study used in this article was collected retrospectively from a project on the validation of prostate MRI biomarkers.
Abstract
Background Scientific publications are meant to exchange knowledge among researchers, but the inability to properly reproduce computational experiments limits the quality of scientific research. Furthermore, the literature indicates that more than 50% of preclinical research is irreproducible, which wastes substantial resources in the life sciences. As a consequence, scientific reproducibility is being fostered to promote Open Science through open databases and software tools that are typically deployed on existing computational resources. However, some computational experiments require complex virtual infrastructures, such as elastic clusters of PCs, that can be dynamically provisioned from multiple clouds. Obtaining these infrastructures requires not only an infrastructure provider but also advanced knowledge of cloud computing.
Objectives The main aim of this paper is to improve reproducibility in the life sciences in order to produce better and more cost-effective research. To that end, we seek to simplify infrastructure deployment and usage for researchers.
Methods This paper introduces Advanced Platform for Reproducible Infrastructures in the Cloud via Open Tools (APRICOT), an open source extension for Jupyter that deploys deterministic virtual infrastructures across multiple clouds for reproducible scientific computational experiments. To illustrate its use and how APRICOT can improve the reproduction of experiments with complex computational requirements, two examples from the life sciences are provided. All requirements for both experiments are disclosed within APRICOT, so users can reproduce them.
Results To show the capabilities of APRICOT, we processed a real magnetic resonance image to accurately characterize prostate cancer using a Message Passing Interface cluster deployed automatically with APRICOT. The second example, a multiparametric study of positron emission tomography image reconstruction, shows how APRICOT scales the deployed infrastructure according to the workload using a batch cluster.
Conclusion APRICOT integrates the deployment, management, and usage of specific computational infrastructures for Open Science, making experiments that depend on such infrastructures reproducible. All experiment steps and details can be documented in a single Jupyter notebook, which includes infrastructure specifications, data storage, experiment execution, results gathering, and infrastructure termination. Thus, distributing the experiment notebook and the required data should be enough to reproduce the experiment.
Publication History
Received: 13 February 2020
Accepted: 09 April 2020
Article published online: 10 August 2020
© 2020. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial-License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).
Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany