MALDI-MS Analyses of Natural Products
MALDI technique
MALDI is a soft ionization technique initially developed for the analysis of macromolecules. Its use is expanding rapidly because of its advantages, although more research is required to understand the underlying processes, primarily the reactions involved in ionization and fragmentation, especially for methods with high energy transfer [11], [12], [13]. The ionization process in MALDI can be explained through three basic steps: 1) incorporation and isolation of the analytes in a matrix, 2) excitation of the matrix, producing a plume by physical desorption/ablation, and 3) ionization of the analytes by ion-molecule reactions [11], [12], [13]. Different models attempt to explain MALDI ionization, but they remain approximate, and none is universally accepted. One widely discussed model is the cluster model, originally called the “lucky survivors” model. There is general agreement, however, on the functions of the matrix: isolation of analyte molecules, absorption of laser energy, co-desorption of analytes, and charge transfer. In addition, the matrix can protect the analyte molecules, reducing their in-source dissociation [3].
During laser irradiation, photons are absorbed by the matrix molecules, and most of the absorbed energy is converted to heat. The matrix-analyte solid then disintegrates, producing a plume in which secondary chemical reactions promote ionization of the analytes by charge transfer. This charge transfer can be of three types: proton, electron, or cation transfer. These ion-molecule reactions take place in the plume and slow down markedly as the plume expands; consequently, the secondary reactions do not run to completion, and many matrix ions are not neutralized and appear in the spectra. A high plume density is therefore important to increase the extent of the secondary reactions and to reduce matrix ions in the spectra. It can be achieved with a higher laser fluence (which is related to the concentration of primary ions) [3], [11], [12], [13]. Besides laser intensity, the secondary plume reactions can also be controlled through analyte concentration and matrix choice [3], [11], [12].
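For the proton-transfer pathway, whether a protonated matrix ion can pass its proton to an analyte in the plume is often rationalized with gas-phase proton affinities (PA). The sketch below assumes the simple approximation ΔH ≈ PA(matrix) - PA(analyte) for the reaction MH+ + A → M + AH+, ignoring entropy, kinetics, and plume temperature; the PA values are hypothetical placeholders, not measured data for any particular matrix.

```python
# Gas-phase proton transfer in the plume: MH+ + A -> M + AH+
# Approximation: dH = PA(M) - PA(A), with proton affinities in kJ/mol.
# A negative dH means the transfer is exothermic (favored).
# NOTE: the PA values used in the example are hypothetical placeholders.

def proton_transfer_enthalpy(pa_matrix: float, pa_analyte: float) -> float:
    """Return the approximate reaction enthalpy (kJ/mol) for MH+ + A -> M + AH+."""
    return pa_matrix - pa_analyte


def is_exothermic(pa_matrix: float, pa_analyte: float) -> bool:
    """True when the analyte is more basic than the matrix in the gas phase."""
    return proton_transfer_enthalpy(pa_matrix, pa_analyte) < 0


if __name__ == "__main__":
    pa_matrix = 850.0  # hypothetical matrix PA, kJ/mol
    for name, pa in [("more basic analyte", 950.0), ("less basic analyte", 800.0)]:
        dh = proton_transfer_enthalpy(pa_matrix, pa)
        print(f"{name}: dH = {dh:+.0f} kJ/mol, exothermic = {is_exothermic(pa_matrix, pa)}")
```

In this picture, an analyte less basic than the matrix is poorly protonated by proton transfer, which is one reason matrix choice changes which compounds appear in the spectrum.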
The matrix is excited electronically by ultraviolet lasers or vibrationally by infrared lasers. Various lasers are used in MALDI, such as nitrogen, Nd:YAG, CO₂, Er:YAG, ArF, and KrF lasers, among others. They operate at different wavelengths and therefore deliver different single-photon energies, which influence ionization efficiency, particularly in LDI analyses (without a matrix). The most common lasers are the nitrogen laser (337 nm, single-photon energy of approximately 3.7 eV) and the frequency-tripled Nd:YAG laser (355 nm, approximately 3.5 eV). Unlike the nitrogen laser, the Nd:YAG laser has a higher repetition rate, enabling very fast data acquisition, but its Gaussian beam profile (energy not evenly distributed) can compromise sensitivity and resolution. However, a modified Nd:YAG laser with a well-structured beam profile has been developed to overcome this disadvantage [13], [14], [15].
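The single-photon energies of the nitrogen and frequency-tripled Nd:YAG lasers follow from E = hc/λ. The short Python sketch below performs this conversion; the Er:YAG (2940 nm) and CO₂ (10 600 nm) wavelengths are typical emission lines added here for comparison and are not values given in the text.

```python
# Single-photon energy E = h * c / wavelength, converted from joules to eV.
PLANCK_H = 6.62607015e-34            # J*s (exact SI value)
SPEED_OF_LIGHT = 299_792_458.0       # m/s (exact SI value)
ELEMENTARY_CHARGE = 1.602176634e-19  # C; 1 eV = 1.602176634e-19 J


def photon_energy_ev(wavelength_nm: float) -> float:
    """Return the energy (eV) of one photon of the given wavelength (nm)."""
    wavelength_m = wavelength_nm * 1e-9
    return PLANCK_H * SPEED_OF_LIGHT / wavelength_m / ELEMENTARY_CHARGE


if __name__ == "__main__":
    lasers = {
        "nitrogen (337 nm)": 337.0,
        "frequency-tripled Nd:YAG (355 nm)": 355.0,
        "Er:YAG (2940 nm)": 2940.0,
        "CO2 (10600 nm)": 10600.0,
    }
    for name, wavelength in lasers.items():
        print(f"{name}: {photon_energy_ev(wavelength):.2f} eV")
```

The UV lines give about 3.7 and 3.5 eV, while the infrared lines give only a few tenths of an eV per photon, consistent with vibrational rather than electronic excitation of the matrix.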
The choice of mass analyzer is also an important decision and should be based on the analyzer's specific features: resolution, mass accuracy, mass range, sensitivity, dynamic range, quantification capability, speed, and ease of handling. The analyzers most commonly coupled to MALDI are the ion trap (IT), quadrupole (Q), Orbitrap, FT-ICR, and TOF, as well as hybrid instruments combining them. Because these analyzers have been described in important reviews and books [6], [16], [17], they are not described extensively here, and only their special applications in this area are reported. In addition, some mass spectrometers are coupled to ion mobility spectrometry, an electrophoretic technique that separates ions based on their mobilities in the gas phase, enabling the study of conformations and the separation of isomeric and isobaric ions. Ion mobility spectrometry has been the subject of many reviews describing the available instruments and their advantages and disadvantages [18], [19].
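Resolution is one of the analyzer features listed above. A rough estimate of the resolving power needed to separate two ions of the same nominal mass uses R = m/Δm. The sketch assumes this simple definition (instruments quote R at FWHM or with a valley criterion, which can demand more), and the m/z values are hypothetical; the 0.0364 Da gap corresponds to the mass difference between CO and C2H4. Isomers with identical formulas give Δm = 0 and cannot be separated by mass alone, which is where ion mobility becomes useful.

```python
def required_resolving_power(mz1: float, mz2: float) -> float:
    """Minimum R = m / delta_m needed to distinguish two ions (simple definition)."""
    delta = abs(mz1 - mz2)
    if delta == 0:
        # Isomers: same elemental formula, so mass alone cannot separate them
        raise ValueError("ions with identical m/z cannot be separated by mass alone")
    mean_mz = (mz1 + mz2) / 2
    return mean_mz / delta


if __name__ == "__main__":
    # Hypothetical ions at nominal m/z 300 differing by CO vs C2H4 (0.0364 Da)
    r = required_resolving_power(300.0000, 300.0364)
    print(f"required resolving power: about {r:,.0f}")
```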
Various conventional matrices have been used for different purposes, such as DHB, CHCA, DHAP, SA, 4NA, THAP, nicotinic acid, picolinic acid, and ferulic acid, but few matrices have been well characterized, and many points remain unclear, for example, a detailed explanation of their efficiencies, the chemical species each matrix produces in the plume, and the physical and chemical properties of plume expansion and reactions [12], [20]. Desirable features of a matrix are linked to its solubility, absorptivity, reactivity, volatility, and desorption behavior, and a considerable number of reports describe preparation methods for different matrices, including dried droplet, crushed crystal, fast evaporation, overlayer, spin coating, and electrospray deposition [20]. Some matrices have established applications, for example, for oligonucleotides, proteins, lipids, polymers, and carbohydrates (Table 1), whereas guidelines for specific natural product classes have not been reported [11]; this gap should stimulate further studies, given the numerous advantages of MALDI (described in the next section). The main MALDI analyses of natural products, including the matrices applied, are summarized in Tables 1 and 2. Another important consideration in matrix choice is that the amount of energy transferred to the analytes depends on the matrix, which can affect ion fragmentation by changing the internal energy of the ions [21]; these effects remain underexplored, particularly for natural products.
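In dried-droplet preparations, the matrix : analyte proportion is usually expressed as a molar ratio. The sketch below shows that arithmetic, assuming the two solutions are mixed and spotted together; the concentrations and volumes are hypothetical, and the DHB molecular weight (154 Da) is taken from Table 1.

```python
def matrix_to_analyte_ratio(matrix_mg_per_ml: float, matrix_mw: float, matrix_ul: float,
                            analyte_um: float, analyte_ul: float) -> float:
    """Molar ratio of matrix to analyte deposited in one dried-droplet spot.

    matrix_mg_per_ml: matrix concentration in mg/mL (numerically equal to g/L)
    matrix_mw: matrix molecular weight in g/mol
    analyte_um: analyte concentration in umol/L
    matrix_ul, analyte_ul: spotted volumes in uL
    """
    matrix_mol = (matrix_mg_per_ml / matrix_mw) * matrix_ul * 1e-6  # (mol/L) * L
    analyte_mol = (analyte_um * 1e-6) * analyte_ul * 1e-6            # (mol/L) * L
    return matrix_mol / analyte_mol


if __name__ == "__main__":
    # Hypothetical: 1 uL of DHB at 10 mg/mL mixed with 1 uL of a 10 uM analyte
    ratio = matrix_to_analyte_ratio(10.0, 154.0, 1.0, 10.0, 1.0)
    print(f"matrix : analyte = {ratio:,.0f} : 1")
```

Lowering the matrix concentration or raising the analyte concentration changes this ratio directly, which is the parameter meant when a "low proportion of matrix" is used to limit matrix background.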
Table 1 Typical commercial MALDI matrices and their applications and characteristics.
Matrix | MW (Da) | Excitation wavelength (nm) | Applications | Natural product classes analyzed
9-Aminoacridine (9-AA) | 194 | 337 | Peptides, sugars, amino acids, sulfated sugars, phospholipids | Flavonoid, naphthodianthrone, phloroglucinol, glucosinolate
α-Cyano-4-hydroxycinnamic acid (CHCA) | 189 | 337, 355 | Peptides, lipids, nucleotides, small proteins, oligosaccharides, sugars | Flavonoid, isoflavonoid, aflatoxin, alkaloid, anthocyanin, tannin, curcuminoid, rotenoid, phenylpropenoid, saponin, spirolide, theaflavin, thearubigin
2,5-Dihydroxybenzoic acid (DHB) | 154 | 266, 337, 355 | Peptides, small proteins, carbohydrates, glycans, glycopeptides, glycoproteins, sugars, lipids, nucleotides, oligonucleotides, oligosaccharides | Acetogenin, alkaloid, anthocyanin, carotenoid, tannin, curcuminoid, flavonoid, limonoid, phospholipid, phenylpropenoid, saponin, spirolide
Dithranol (DIT) | 226 | 337, 355 | Polymers, lipids | Tannin
2-(4-Hydroxyphenylazo)benzoic acid (HABA) | 242 | 266, 337 | Protein mixtures, negatively charged proteins, glycoproteins, polymers, oligosaccharides | Flavonoid, isoflavonoid
3-Hydroxypicolinic acid (3-HPA) | 139 | 266 | Oligonucleotides | Alkaloid, flavonoid, isoflavonoid
4-Nitroaniline (4NA) | 138 | 337 | Peptides, lipids, oligosaccharides, phosphatidylcholines | Flavonoid
Sinapinic acid (SA) | 224 | 266, 337, 355 | Proteins (MW higher than 10 kDa), glycoproteins, hydrophobic proteins | Alkaloid, tannin, flavonoid, spirolide
2,4,6-Trihydroxyacetophenone (THAP) | 168 | 337, 355 | Nucleic acids, proteins contaminated with salts, acidic peptides, dendrimers, acidic glycans | Alkaloid, anthocyanin, curcuminoid, flavonol, isoflavonoid, glycoalkaloid, tannin, theaflavin, thearubigin
MW: molecular weight.
Table 2 MALDI MS applications for examination of natural products.
Classes of natural products | Source | Matrix | Study goal | Ref.
Acetogenins | Annona muricata | DHB | Quantification | [40]
Aflatoxins | Peanuts | CHCA | Screening | [63]
Alkaloid | Urine | CHCA, DHB | Quantification | [35]
Alkaloids | Corydalis yanhusuo, Coptis chinensis and Aconitum carmichaeli | CHCA and DHB | Metabolite profiling and identification | [56]
Alkaloids | Aconitum carmichaeli | SA, DHB and CHCA | Metabolite profiling and quantification | [57]
Alkaloids | Strychnos nux-vomica | SA, THAP, 3-aminoquinoline (3-AMQ), 3-hydroxypicolinic acid (3-HPA), CHCA and DHB | Metabolite profiling | [58]
Alkaloids | Sinomenium acutum | CHCA, SA, and DHB | Identification and differentiation of herbal samples | [60]
Alkaloids | Berberis barandana | Matrix-free | Identification | [80]
Amino acid | Solanum melongena | DHB | Spatial analysis* | [109]
Anthocyanins | Vaccinium corymbosum | THAP | Identification and quantification (comparison between HPLC and MALDI) | [36]
Anthocyanins | Red wine, fruit juice, and syrup | THAP | Identification and quantification | [37]
Anthocyanins | Arabidopsis thaliana | Caffeic acid (CAF), ferulic acid (FER), DHB, CHCA and THAP | Quantification | [38]
Anthocyanins | Oryza sativa | DHB | Identification and spatial analysis* | [168]
Carotenoids | Lycopersicon esculentum, Arabidopsis thaliana and Capsicum species | DHB | Metabolite profiling and identification | [64]
Carotenoids | Citrus reticulata and Citrus sinensis | DHB | Profiling and identification* | [136]
Condensed tannins | Pinus pinaster and Pinus radiata | DHB | Profiling and identification | [65]
Condensed tannins | Salix alba, Picea abies, Tilia cordata and Fagus sylvatica | CHCA, SA, DIT, 3-β-indoleacrylic acid (IAA) and DHB | Metabolite profiling and identification* | [66]
Condensed tannins | Quebracho wood, mimosa, and cacao | DHB | Fragmentation study* | [128]
Condensed tannins | Theobroma cacao | DHB | Identification and fragmentation study* | [153]
Condensed tannins | Cinnamomum zeylanicum | DHB | Identification* | [154]
Condensed tannins | Eugenia dysenterica | DHB | Identification* | [155]
Condensed tannins | Anadenanthera colubrina, Commiphora leptophloeos and Myracrodruon urundeuva | DHB | Identification* | [156]
Condensed tannins | Pityrocarpa moniliformis | DHB | Identification* | [157]
Condensed tannins | Prunus dulcis | DHB | Qualitative profiling and identification* | [158]
Curcuminoids (diarylheptanoids) | Curcuma longa | CHCA, DHB and THAP | Detection and quantification | [42]
Diverse classes | Psoralea corylifolia | Oxidized carbon nanotubes | Identification | [50]
Flavonoids | Onion and green tea | THAP, HABA, DHB, and CHCA | Targeted analysis and identification* | [68]
Flavonoids | Lychnophora species | Matrix-free | Identification and spatial analysis* | [74]
Flavonoids | Apple (Golden Delicious) | CHCA and DHB | Spatial analysis | [96]
Flavonoids | Purified compounds | CHCA and DHB | Fragmentation studies* | [122]
Flavonoids | Isolated standards | DHB | Ion cluster formation studies* | [124]
Flavonoids | Isolated standards | CHCA, DHB, 4NA, vanillin (VAN), nicotinic acid (NI), SA, SAM, LiDHB and NALDI | Ionization and in-source dissociation mechanisms* | [22]
Flavonoids | Isolated standards | Norharman | Fragmentation and in-source dissociation studies* | [125]
Flavonoids, fatty acids, and others | Arabidopsis thaliana, Drosophila melanogaster, Acyrthosiphon pisum | 1,8-Bis(dimethylamino)naphthalene (DMAN) | Targeted analysis and identification | [72]
Flavonoids, naphthodianthrones and phloroglucinols | Arabidopsis thaliana and Hypericum species | 9-AA | Spatial analysis | [107]
Flavonols | Prunus dulcis | THAP | Identification and quantification | [39]
Glucosinolates | Arabidopsis thaliana | 9-AA | Spatial analysis | [106]
Glycoalkaloids | Solanum tuberosum | THAP | Quantification | [34]
Hydrolysable tannins | Chinese gall | DHB, THAP, and CHCA | Identification* | [59]
Hydrolysable tannins | Castanea sativa, Caesalpinia spinosa, Quercus infectoria (galls) | DHB | Fragmentation study* | [137]
Hydrolysable tannins | Rosa chinensis | Matrix-free | Identification* | [159]
Hydrolysable tannins | Mangifera indica | DHB | Identification* | [160]
Hydrolysable tannins | Purified compounds | DHB | Identification and evaluation of cationization reagents* | [161]
Hydrolysable tannins | Astronium urundeuva and Astronium graveolens | DHB | Identification* | [162]
Isoflavonoids | Soybeans | CHCA, THAP, DHB, HABA | Targeted analysis and identification* | [135]
Limonoids | Azadirachta indica and Melia azedarach | DHB | Detection and identification | [207]
Lipids | Saccharomyces cerevisiae | DHB | Targeted analysis and identification | [67]
Lipids | Isolated standards | DHB | Fragmentation study* | [138]
Phenylpropenoids and flavonoids | Scutellaria barbata, Angelica sinensis and Scutellaria baicalensis | CHCA, DHB, graphene and graphene oxide | Metabolite profiling and ionization studies* | [25]
Phenolics | Different lichen species | Matrix-free | Dereplication studies | [208]
Phospholipids | Egg yolk | DHB | Identification | [77]
Polyphenols | Cranberry, grape, sorghum, and pomegranate | IAA | Identification* | [163]
Quaternary alkaloids | Corydalis yanhusuo (rhizoma) | DHB | Identification and quantification* | [33]
Rotenoids | Brassica napus | CHCA | Quantification* | [43]
Saponins | Balanites aegyptiaca | DHB | Metabolite profiling | [51]
Saponins | Panax ginseng and Panax quinquefolius | CHCA, SA and DHB | Identification and differentiation of the species* | [55]
Saponins | Bacopa monnieri | CHCA | Comparison of methods and identification | [134]
Saponins | Quillaja saponaria | DHB | Targeted analysis* | [146]
Saponins | Holothuria forskali | CHCA | Identification and spatial analysis* | [147]
Saponins | Holothuria lessoni | CHCA | Structural elucidation* | [150], [151]
Saponins and triterpenes | Centella asiatica | CHCA | Identification | [78]
Spirolides | Phytoplankton | CHCA, DHB, and SA | Identification and quantification* | [41]
Steroidal lactones and alkaloid | Withania somnifera and Nicotiana tabacum | Matrix-free | Screening* | [62]
Steroids and lipopeptides | Standards | Coumarins | Evaluation of matrix efficiency | [209]
Theaflavins, thearubigins | Yunnan black tea | THAP, CHCA | Identification | [169]
Thearubigins and flavan-3-ol derivatives | Black tea leaves | DHAP | Structural elucidation | [170]
Undetermined | Echinacea species | CHCA, SA | Metabolite profiles, differentiation of species | [61]
* Based on MS/MS data.
Matrix-free LDI analyses have also been described, mainly for compounds with conjugated systems, which can absorb the laser radiation directly and therefore ionize without a matrix. Some natural products, such as flavonoids and carotenoids, have already been analyzed by LDI, but the matrix plays a fundamental role in reducing in-source dissociation and in increasing ionization efficiency and thus sensitivity [22], [23]. This was confirmed in an investigation of the ionization and in-source dissociation of twenty-six flavonoids [22] and in other studies (see the Applications section for more examples and details).
Drawbacks and advantages
The main disadvantages reported for MALDI analyses include background matrix ions in the mass range of small compounds (< 1000 Da), possible photodegradation of the sample, in-source dissociation of analytes, and the difficulty of online coupling with liquid chromatography, which is not yet commonly applied but yields additional information about isomers [6], [11]. Several alternatives have been described to reduce the matrix background, for example, carbon nanotubes, ionic liquids, graphene, surfactants combined with traditional matrices, and a low matrix : analyte proportion [24], [25], [26]. In addition, great efforts have been made to reduce in-source fragmentation of analytes [22]. This knowledge is relevant for improving MALDI results and applications in many areas, such as metabolomics, dereplication, quantification, biological fingerprinting, and IMS, increasing its usefulness in natural products research. Low reproducibility is another reported drawback, but several approaches have been described to improve it, many of which concern crystal homogeneity with different matrices, sample preparation methods, and/or increasing the number of runs per sample [24], [26], [27], [28], [29], [30]. Methods to improve cocrystallization of the analyte-matrix mixture include fast-evaporation preparation and electrospray sample deposition, among others [6], [26], [31], [32]. With these improvements, MALDI has been successfully applied to quantitative studies of alkaloids [33], [34], [35], anthocyanins [36], [37], [38], flavonoids [39], acetogenins [40], spirolides [41], curcuminoids [42], and rotenoids [43].
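To check whether analyte signals fall on top of the matrix background below 1000 Da, one can compute the m/z of a few common matrix-derived ions and compare them with the observed peaks. The sketch below assumes a small, hypothetical set of ion types ([M+H-H2O]+, [M+H]+, [M+Na]+, [2M+H]+) and a 10 ppm tolerance; real matrix spectra contain many more cluster and fragment ions. The example uses DHB (C7H6O4) with a hypothetical peak list.

```python
import re

# Monoisotopic atomic masses (u) for the elements handled by this sketch
ATOMIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.00727646688      # mass of H+
SODIUM_ION = 22.98922070    # mass of Na+ (Na atom minus one electron)
WATER = 2 * ATOMIC["H"] + ATOMIC["O"]


def monoisotopic_mass(formula: str) -> float:
    """Monoisotopic mass of a simple formula such as 'C7H6O4' (no brackets)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC[element] * (int(count) if count else 1)
    return total


def matrix_ions(formula: str) -> dict:
    """m/z of a few illustrative singly charged matrix-derived ions."""
    m = monoisotopic_mass(formula)
    return {
        "[M+H-H2O]+": m + PROTON - WATER,
        "[M+H]+": m + PROTON,
        "[M+Na]+": m + SODIUM_ION,
        "[2M+H]+": 2 * m + PROTON,
    }


def flag_interferences(peaks, matrix_formula, tol_ppm=10.0):
    """Return (peak, ion label) pairs where a peak lies within tol_ppm of a matrix ion."""
    hits = []
    for label, ion_mz in matrix_ions(matrix_formula).items():
        for mz in peaks:
            if abs(mz - ion_mz) / ion_mz * 1e6 <= tol_ppm:
                hits.append((mz, label))
    return hits


if __name__ == "__main__":
    observed = [137.0233, 155.0339, 303.0499, 465.1028]  # hypothetical peak list
    for mz, label in flag_interferences(observed, "C7H6O4"):
        print(f"m/z {mz}: possible DHB {label}")
```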
As new technologies overcome the bottlenecks of MALDI analyses, the scientific community can take full advantage of the technique and expand its application in natural products research, since MALDI offers several advantages. These include the ability to analyze complex mixtures with less ion suppression than ESI, high sensitivity, high tolerance of salts and contaminants, low sample consumption, high throughput, simple and rapid sample preparation, a short time to obtain the spectra (≈ 60 s), and the production of singly charged species [6], [27].
Applications
MALDI-MS has been applied to determine the molecular weights of natural products, to identify chemical structures, and in metabolomic and quantitative studies, among others; more recently, it has also been used to establish the tissue distribution of metabolites by MALDI imaging [6], [25], [27], [31] (Fig. 2).
Fig. 2 Some applications of MALDI in natural products.
Although MALDI has several applications in natural products chemistry, the largest number of published articles concern its use for molecular weight determination at low or high resolution. Several glycosylated and non-glycosylated secondary metabolites have been analyzed, such as hydrolysable and condensed tannins, anthocyanins, alkaloids, flavonoids, saponins, rotenoids, carotenoids, xanthophylls, glycosylated triterpenes, theaflavins, thearubigins, phenolics, sesquiterpene lactones, steroids, diterpenes, sesterterpenes, cyanogenic glycosides, and others (Fig. 3, Table 2) [22], [23], [36], [37], [44], [45], [46], [47], [48], [49], demonstrating the broad applicability of the technique to secondary metabolites. Among the glycosides analyzed by MALDI-MS between 1999 and 2010 [44], [45], [46], [47], [48], [49], saponins, steroids, triterpenes, and flavonoids are the glycosylated metabolites predominantly described (Fig. 4).
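Because MALDI mainly yields singly charged ions, molecular weight determination usually means matching observed peaks to a few adduct types. The sketch below assumes a small set of common adducts and a 5 ppm tolerance; the example neutral mass is the monoisotopic mass of quercetin-3-O-glucoside (C21H20O12, 464.0955), used purely as an illustration.

```python
PROTON = 1.00727646688

# Mass shifts (u) from the neutral monoisotopic mass M to common singly charged ions
ADDUCTS = {
    "[M+H]+": PROTON,
    "[M+Na]+": 22.98922070,  # Na+ (atom minus one electron)
    "[M+K]+": 38.96315790,   # K+ (atom minus one electron)
    "[M-H]-": -PROTON,
}


def expected_ions(neutral_mass: float) -> dict:
    """Expected m/z for each adduct type of a neutral molecule."""
    return {label: neutral_mass + shift for label, shift in ADDUCTS.items()}


def assign_peak(observed_mz: float, neutral_mass: float, tol_ppm: float = 5.0):
    """Return the adduct label whose m/z matches observed_mz within tol_ppm, else None."""
    for label, mz in expected_ions(neutral_mass).items():
        if abs(observed_mz - mz) / mz * 1e6 <= tol_ppm:
            return label
    return None


if __name__ == "__main__":
    glycoside = 464.0955  # quercetin-3-O-glucoside, C21H20O12 (illustrative)
    for label, mz in expected_ions(glycoside).items():
        print(f"{label}: m/z {mz:.4f}")
    print(assign_peak(487.0847, glycoside))
```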
Fig. 3 Some examples of natural products analyzed by MALDI: cyanogenic glycosides (1, 2), triterpenoid saponins (3, 4), aflatoxins (5, 6), carotenoid (7), flavones (12, 13), typical linear condensed tannins possessing B-type linkage (8, 9), rotenoids (14, 15), sesterterpenes (16, 17), and theaflavins (18, 19).
Fig. 4 Application of MALDI-MS for the study of glycosylated natural products from the period 1999–2010.
Recent studies have demonstrated successful applications of MALDI for metabolic profiling compared with other techniques [27], [50], [51], for example, in the analysis of fractions from Psoralea corylifolia L. (Fabaceae). These fractions were analyzed by LC-DAD, LC-APCI-MS, and MALDI-TOF MS, the latter using oxidized carbon nanotubes as the matrix. LC-APCI-MS increased the number of detected substances compared with UV detection, because some peaks were observed only by APCI-MS and coeluting peaks could be identified. MALDI-TOF MS showed a high ability to detect the substances, even without prior chromatographic separation, and good sensitivity for low-mass substances. In total, 188 components were identified in the enriched fractions using all techniques; 65 % of them could be identified by MALDI, and almost 50 % of these were identified only by MALDI [50]. The advantage of MALDI over LC-ESI-MS for metabolic profiling was also verified in analyses of Balanites aegyptiaca (L.) Del. (Zygophyllaceae) extracts, in which additional saponins were identified only by MALDI [51]. These studies therefore confirm the strength of MALDI for detecting and identifying compounds in complex samples. Other interesting MALDI applications include structural identification, chemical screening, single plant cell analysis, the study of molecular interactions with target molecules, and the authentication of medicinal plants.
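Comparisons such as the Psoralea corylifolia study reduce to set arithmetic on the compounds identified by each technique. The sketch below uses hypothetical compound identifiers and technique names; for the reported study, 65 % of 188 components corresponds to roughly 122 compounds identified by MALDI.

```python
def coverage_summary(ids_by_technique: dict, focus: str) -> dict:
    """Summarize the identifications contributed by one technique.

    ids_by_technique: dict mapping technique name -> set of compound IDs.
    focus: name of the technique to summarize.
    """
    all_ids = set().union(*ids_by_technique.values())
    focus_ids = ids_by_technique[focus]
    others = set().union(*(ids for name, ids in ids_by_technique.items() if name != focus))
    unique_to_focus = focus_ids - others
    return {
        "total": len(all_ids),
        "focus_fraction": len(focus_ids) / len(all_ids),           # share of all IDs
        "unique_fraction_of_focus": len(unique_to_focus) / len(focus_ids),
    }


if __name__ == "__main__":
    # Hypothetical identifications per technique
    ids = {
        "LC-DAD": {"c1", "c2", "c3"},
        "LC-APCI-MS": {"c2", "c3", "c4", "c5"},
        "MALDI-TOF": {"c3", "c5", "c6", "c7"},
    }
    print(coverage_summary(ids, "MALDI-TOF"))
```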
Although the technique has some drawbacks for quantification, Wang and collaborators [36] compared the quantification of glycosylated anthocyanins (Fig. 5A) by HPLC-DAD and MALDI-TOF MS and demonstrated the substantial potential of high-throughput MALDI quantification: MALDI identified and quantified the anthocyanins accurately and more rapidly, making MALDI-TOF a faster alternative to HPLC. Similarly, the quantification of glycosylated flavonols (Fig. 5B) by MALDI-TOF MS correlated highly with the results obtained by HPLC-UV, indicating that MALDI-TOF is a valid system for such evaluations [39]. Important tools for quantification by MALDI are a high-repetition-rate laser, which improves precision and sensitivity, and an internal standard to correct for variations in instrument response. Ideally, the internal standard should be chemically similar to the target substance, and the best approach is to use an isotopically labeled form of the target substance itself [31], [52], [53].
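The internal-standard approach can be sketched as a calibration on the analyte/internal-standard intensity ratio. This is an illustrative sketch assuming a linear response and hypothetical intensities, not the procedure of the cited studies. Dividing by the internal-standard signal cancels much of the shot-to-shot and spot-to-spot variation in absolute intensity, which is why the raw analyte intensities below do not rise smoothly with concentration while the ratios do.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(concentrations, analyte_intensity, standard_intensity):
    """Fit the analyte/internal-standard intensity ratio against concentration."""
    ratios = [a / s for a, s in zip(analyte_intensity, standard_intensity)]
    return fit_line(concentrations, ratios)


def quantify(analyte_signal, standard_signal, slope, intercept):
    """Convert an unknown sample's intensity ratio back to a concentration."""
    return (analyte_signal / standard_signal - intercept) / slope


if __name__ == "__main__":
    conc = [1.0, 2.0, 5.0, 10.0, 20.0]                  # hypothetical standards, uM
    analyte = [60.0, 132.0, 234.0, 561.0, 1010.0]       # hypothetical analyte intensities
    standard = [1000.0, 1200.0, 900.0, 1100.0, 1000.0]  # hypothetical internal-standard intensities
    slope, intercept = calibrate(conc, analyte, standard)
    print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
    print(f"unknown: {quantify(330.0, 1100.0, slope, intercept):.2f} uM")
```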
Fig. 5 Glycosylated anthocyanins (A) and flavonols (B) quantified by MALDI-TOF MS.
MALDI-MS has also been applied to identify and classify microorganisms based on proteomic fingerprints [54]. Ernst and collaborators [27] extended this strategy to plant extracts by examining low-mass (< 1200 Da) metabolites, proposing the first protocol for creating metabolic fingerprints of plants by MALDI-TOF MS. Three different MALDI matrices and subsequent multivariate data analysis with in-house algorithms implemented in the R environment were employed to classify plants from different genera, families, and orders taxonomically. Initial analyses without a matrix or with only one matrix did not provide sufficient chemical information to classify the plants correctly. Several matrices were then evaluated to select those yielding the highest number of ionized compounds with the least in-source dissociation in both ion modes; 4NA and CHCA were selected for the negative and positive ion modes, respectively. However, nonpolar compounds, such as some triterpenes and diterpenes, were not ionized with these common matrices and could only be ionized with LiDHB, a synthesized matrix. Adding these nonpolar compounds enlarged the chemical information obtained from the plant extracts and, with careful selection of algorithms and parameters, allowed a taxonomic classification with 92 % similarity to the classifications found in the literature [27].
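Ernst and collaborators used in-house algorithms written in R; the Python sketch below is not their method but a deliberately simple stand-in showing the general idea of fingerprint comparison: bin each peak list into fixed m/z intervals, normalize, and compare samples by cosine similarity. The bin width, m/z range, sample names, and peak lists are hypothetical.

```python
import math


def bin_spectrum(peaks, bin_width=1.0, mz_max=1200.0):
    """Sum peak intensities into fixed-width m/z bins and L2-normalize the vector."""
    n_bins = int(mz_max / bin_width)
    vector = [0.0] * n_bins
    for mz, intensity in peaks:
        index = int(mz / bin_width)
        if 0 <= index < n_bins:
            vector[index] += intensity
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else vector


def cosine_similarity(a, b):
    """Dot product of two L2-normalized vectors (1.0 means identical profiles)."""
    return sum(x * y for x, y in zip(a, b))


def most_similar(query, library):
    """Return (name, score) of the library sample closest to the query sample."""
    q = bin_spectrum(library[query])
    best_name, best_score = None, -1.0
    for name, peaks in library.items():
        if name == query:
            continue
        score = cosine_similarity(q, bin_spectrum(peaks))
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


if __name__ == "__main__":
    samples = {  # hypothetical peak lists: (m/z, intensity)
        "plant_A1": [(303.05, 100.0), (465.10, 40.0), (611.16, 20.0)],
        "plant_A2": [(303.05, 90.0), (465.10, 50.0), (449.11, 10.0)],
        "plant_B": [(287.06, 100.0), (449.11, 30.0), (595.17, 15.0)],
    }
    print(most_similar("plant_A1", samples))
```

A real workflow would add peak alignment, intensity scaling, and a clustering or classification step on the full similarity matrix.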
Recently, MALDI-MS has also been used to identify plants. For example, it was successfully used to differentiate the herbs Panax ginseng C. A. Meyer and Panax quinquefolium L. (Araliaceae), which have similar chemical and physical properties and are difficult to distinguish botanically but have substantially different therapeutic effects, underscoring the importance of correct identification. The methodology allowed unambiguous differentiation between the two species, required only a small quantity of material, and was fast, robust, and simple [55]. Consequently, MALDI-MS can potentially detect adulterants in plant materials and support rapid dereplication and quantification studies. The characterization of medicinal plants by MALDI-MS has also been described for other plants, such as Aconitum carmichaeli Debx. (Ranunculaceae), Corydalis yanhusuo W. T. Wang (Papaveraceae), and Echinacea species (Asteraceae) [56], [57], [58], [59], [60], [61]. MALDI can also be applied directly to powdered plant material [62], which makes it an important tool for quality control; secondary metabolites successfully screened by MALDI-MS include aflatoxins [63], saponins [51], anthocyanins [36], [37], carotenoids [64], tannins [59], [65], [66], lipids and phospholipids [67], flavonoids [68], and others (Fig. 4). In general, metabolomic data can be obtained by MALDI-MS directly from extracts, tissues, or single cells [6], [69], [70], [71], [72], which highlights the potential of MALDI imaging to define the tissue distribution of metabolites [73], [74], [75]. This application is extremely useful in metabolomics and holds promise for studies such as metabolic compartmentalization [6], [69], [70] (see the MALDI imaging section below).
Another promising methodology is the direct coupling of TLC with MALDI-MS to identify compounds. TLC is an easy and fast technique for separating compound mixtures and is widely used in natural products laboratories, which facilitates its implementation for this purpose [76]. TLC-MALDI MS has been applied to analyze phospholipids from chicken eggs [77], centellosides from Centella asiatica (L.) Urb. (Apiaceae) [78], siderophores from microbial samples [79], and alkaloids from Berberis barandana S. Vid. (Berberidaceae) [80]. Despite the advantages of TLC-MALDI MS, such as low-cost chromatographic separation, short analysis time, and direct analysis (without extraction from the TLC plate), its use in natural products chemistry remains limited because the technique is not yet widespread [76].
TLC-LDI-MS (analysis without a matrix) was applied to quaternary protoberberine alkaloids from B. barandana (Fig. 6A). The compounds underwent in-source dissociation [80], which was likely intensified by the absence of a matrix; therefore, more studies are required to understand the role of the matrix in secondary metabolite analyses and the chemical reactions in the source, such as the evaluations performed with aromatic carboxylic acids and flavonoids [22], [81], [82]. The in-source dissociation of twenty-six flavonoids (glycosylated and non-glycosylated), including flavanones, flavones, and flavonols (Fig. 6B–D), was evaluated without a matrix and with different matrices, investigating the influence of laser intensity and matrix type. The flavonoid O-glycosides eliminated the sugar in the source, even in the presence of a matrix and in both ion modes, producing radical ions ([M – H-sugar]−•) in the negative ion mode, similar to ESI [83]. Conventional MALDI matrices reduced, but did not eliminate, the in-source fragmentation of C- and O-glycosylated flavonoids. In addition, methyl radical losses from methoxylated flavonoids were completely eliminated only with the LiDHB matrix, which also eliminated all in-source retro-Diels-Alder fragmentations [22]. Thus, a specific matrix may be required, depending on the objectives and the type of target molecule.
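The in-source sugar loss from flavonoid O-glycosides can be followed with simple mass arithmetic. The sketch assumes loss of a hexose residue (C6H10O5, 162.0528 Da) from the deprotonated glycoside and writes the radical aglycone anion as the even-electron aglycone anion minus one hydrogen atom (often noted [Y0-H]−•). With quercetin-3-O-glucoside as an illustrative example, it gives aglycone ions near m/z 301 and 300; the function names are hypothetical.

```python
# Monoisotopic masses (u)
C, H, O = 12.0, 1.00782503207, 15.99491461956
PROTON = 1.00727646688


def formula_mass(c: int, h: int, o: int) -> float:
    """Monoisotopic mass of a CcHhOo formula."""
    return c * C + h * H + o * O


# Sugar residues lost as neutral units from O-glycosides
SUGAR_RESIDUES = {
    "hexose": formula_mass(6, 10, 5),       # C6H10O5, about 162.0528
    "deoxyhexose": formula_mass(6, 10, 4),  # C6H10O4, about 146.0579
    "pentose": formula_mass(5, 8, 4),       # C5H8O4, about 132.0423
}


def aglycone_anions(glycoside_mass: float, sugar: str = "hexose") -> dict:
    """m/z of the deprotonated glycoside and its aglycone anions (negative ion mode)."""
    deprotonated = glycoside_mass - PROTON
    even_electron = deprotonated - SUGAR_RESIDUES[sugar]  # Y0- (heterolytic loss)
    radical = even_electron - H                            # [Y0-H]-* (homolytic loss)
    return {"[M-H]-": deprotonated, "Y0-": even_electron, "[Y0-H]-*": radical}


if __name__ == "__main__":
    isoquercitrin = formula_mass(21, 20, 12)  # quercetin-3-O-glucoside, C21H20O12
    for label, mz in aglycone_anions(isoquercitrin).items():
        print(f"{label}: m/z {mz:.4f}")
```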
Fig. 6 Natural products analyzed by MALDI: quaternary protoberberine alkaloids (A) and some flavonoids (B flavanones, C flavones, and D flavonols).
Moreover, MALDI-MS can be used to screen and select bioactive compounds from extracts, including the search for specific enzyme inhibitors and the determination of enzyme kinetic constants [84], [85]. A quantitative MALDI-FT MS method was developed and used to analyze the product of the acetylcholinesterase reaction and potential inhibitors from Rhizoma Coptidis extracts; the assay interferents (acetylcholinesterase and cholinesterase) were eliminated by the high-resolution analyzer [85]. In addition, carbon nanotubes were used as a matrix to obtain fingerprint spectra of Angelica sinensis (Oliv.) Diels Radix (Apiaceae) after metabolism with liver homogenate, and quantitative differences in each metabolized compound were studied [86]. Carbon nanotubes have recently been applied as a MALDI matrix for small compounds because they produce no background matrix ions and ionize analytes efficiently. Their main disadvantages are low solubility in organic solvents and water, ionization problems caused by impurities in the raw products (graphite pieces, metal particles, and amorphous carbon), and contamination of the ion source. However, oxidation and chemical functionalization of carbon nanotubes have mitigated some of these problems, facilitating sample preparation through higher water solubility and improving desorption/ionization efficiency and reproducibility [87], [88], [89]. Finally, the adduction profiles of quinone-thioether metabolites with cytochrome c have been established by MALDI-TOF [90], and protein interactions have been identified, for example, those of hydrolysable and condensed tannins with bovine serum albumin [91], [92].
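Once MALDI quantification provides reaction velocities (for example, product formed per unit time measured against an internal standard), kinetic constants can be estimated. The sketch below uses the classic Lineweaver-Burk linearization of the Michaelis-Menten equation, chosen here for brevity; it is not necessarily what the cited studies used, and nonlinear fitting is usually preferred because the double-reciprocal transform amplifies errors at low substrate concentrations. The velocities are generated from hypothetical constants (Km = 2, Vmax = 10), and all function names are illustrative.

```python
def michaelis_menten(s: float, km: float, vmax: float) -> float:
    """Initial velocity v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def lineweaver_burk(substrate, velocity):
    """Estimate (Km, Vmax) from a least-squares line through 1/v versus 1/[S].

    1/v = (Km / Vmax) * (1/[S]) + 1 / Vmax
    """
    x = [1.0 / s for s in substrate]
    y = [1.0 / v for v in velocity]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept
    return slope * vmax, vmax


def percent_inhibition(v_inhibited: float, v_control: float) -> float:
    """Inhibition (%) of an extract relative to the uninhibited reaction."""
    return 100.0 * (1.0 - v_inhibited / v_control)


if __name__ == "__main__":
    s = [0.5, 1.0, 2.0, 4.0, 8.0]                   # hypothetical substrate levels
    v = [michaelis_menten(x, 2.0, 10.0) for x in s]  # noise-free synthetic velocities
    km, vmax = lineweaver_burk(s, v)
    print(f"Km = {km:.3f}, Vmax = {vmax:.3f}")
    print(f"inhibition = {percent_inhibition(2.5, 10.0):.1f} %")
```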
Thus, biological fingerprinting by MALDI can be used to screen targets and to assist studies of absorption, distribution, metabolism, elimination, and toxicity [85], [86], [93].
MALDI imaging
MALDI imaging, another important technique in natural products chemistry, was introduced by Caprioli in 1997 and initially applied to proteins and peptides [94]. The technique analyzes tissue directly by MS, producing maps of the distribution of ions in the tissue. Its main advantage is that no prior extraction of the compounds is required, which avoids losing information about their spatial distribution [95]. The method was first used for animal tissue but now includes other tissues, such as those of plants, microorganisms, and insects [9], [95], [96], [97], [98]. Sample preparation protocols exist for protein and peptide analysis in animal tissue, but no universal methods of sample preparation and data processing have been reported for other tissues, such as plant tissue [9], [94]. In addition, MALDI imaging of plant tissue is far from routine, because sample preparation is difficult and protocols must be adapted to specific samples.
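An ion image is essentially one number per pixel: the intensity of a selected m/z extracted from the spectrum acquired at each raster position. The sketch assumes a hypothetical data layout (a dict mapping pixel coordinates to peak lists) and a 10 ppm extraction window; it omits the normalization (for example, to total ion current) that imaging software usually applies.

```python
def ion_image(pixels, width, height, target_mz, tol_ppm=10.0):
    """Return a height x width grid with the summed intensity of target_mz per pixel.

    pixels: dict mapping (x, y) -> list of (mz, intensity) peaks (hypothetical layout).
    Pixels without a matching peak remain 0.0.
    """
    tolerance = target_mz * tol_ppm * 1e-6
    image = [[0.0] * width for _ in range(height)]
    for (x, y), peaks in pixels.items():
        image[y][x] = sum((i for mz, i in peaks if abs(mz - target_mz) <= tolerance), 0.0)
    return image


if __name__ == "__main__":
    # Hypothetical 3 x 2 raster; the target ion is m/z 303.0499
    pixels = {
        (0, 0): [(303.0499, 120.0)],
        (1, 0): [(303.0500, 80.0), (500.0, 5.0)],
        (2, 0): [],
        (0, 1): [(303.06, 50.0)],  # about 33 ppm away: excluded
        (1, 1): [(303.0499, 10.0)],
        (2, 1): [(303.0499, 200.0)],
    }
    for row in ion_image(pixels, 3, 2, 303.0499):
        print(row)
```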
IMS has been successfully applied to diverse studies because of its high sensitivity (compounds at very low concentrations can be analyzed, in the attomole to low femtomole range), selectivity (similar compounds can be differentiated), and ability to identify metabolites structurally, since mass spectrometric data provide chemical information useful for their identification [9], [69], [73], [75], [99]. The high mass accuracy of high-resolution analyzers can be used to differentiate similar compounds that differ only in their exact masses and to reduce interference from matrix ions in the images. In addition, images of metabolite tissue distribution produced from MS/MS data are more selective, since isomers (compounds with the same molecular formula) with different fragmentation pathways can be distinguished, producing highly reliable images [9], [73], [100]. Although IMS has been performed with other ionization methods, such as desorption electrospray ionization (DESI), laser ablation electrospray ionization (LAESI), and secondary ion mass spectrometry (SIMS), most studies have used MALDI and LDI because of their advantages, such as good spatial resolution (around 20 µm) and fast data acquisition (high-frequency lasers) [9], [75]. The main challenges of MALDI imaging are still the background matrix ions in the low mass range, the size of the matrix crystals, which directly affects spatial resolution, and the scarce information about the ionization and in-source dissociation of secondary metabolites needed to improve data and result quality.
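High mass accuracy helps decide between candidate formulas that share a nominal mass. The sketch computes ppm errors for the [M+H]+ ions of quercetin (C15H10O7) and hesperetin (C16H14O6), both of nominal mass 302 but differing by 0.0364 Da; the observed m/z, the 5 ppm tolerance, and the function names are hypothetical.

```python
import re

ATOMIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.00727646688


def monoisotopic_mass(formula: str) -> float:
    """Monoisotopic mass of a simple formula such as 'C15H10O7' (no brackets)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC[element] * (int(count) if count else 1)
    return total


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_protonated(observed_mz, formulas, tol_ppm=5.0):
    """Candidate formulas whose [M+H]+ lies within tol_ppm, best match first."""
    matches = []
    for formula in formulas:
        theoretical = monoisotopic_mass(formula) + PROTON  # [M+H]+ assumed
        error = ppm_error(observed_mz, theoretical)
        if abs(error) <= tol_ppm:
            matches.append((formula, round(theoretical, 4), round(error, 2)))
    return sorted(matches, key=lambda m: abs(m[2]))


if __name__ == "__main__":
    # Hypothetical observed peak; candidates: quercetin and hesperetin
    print(match_protonated(303.0501, ["C15H10O7", "C16H14O6"]))
```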
MALDI imaging can establish the tissue distribution of specific metabolites, unlike the classical (nonspecific) histochemical methods applied to plant tissue, making it possible to distinguish individual compounds, whose identities can be confirmed by MS/MS data; its high mass accuracy also reduces problems related to isobaric matrix ions and yields more reliable image data. However, most studies did not use MS/MS data and operated only in full-scan mode [9]. Several articles, including relevant reviews, have shown the application of MALDI imaging to plant tissues, food, and microorganisms (for example, to elucidate the interactions between them). They describe details of sample preparation, spectral acquisition, post-acquisition analysis, and data evaluation related to biosynthesis, spatial dynamics, ecology, physiology, and morphology [9], [73], [75], [101], [102], [103], [104], [105]. Therefore, this review reports new issues to extend the information not yet addressed in other review articles.
MALDI and LDI imaging have been used to analyze fruits, leaves, stems, sepals, seeds, roots, tubers, flowers, pollen grains, and rhizomes [9 ], [73 ], [75 ], for example, leaves of Arabidopsis thaliana (L.) Heynh. (Brassicaceae – glucosinolates) [106 ], Lychnophora species (Asteraceae) [74 ], Hypericum perforatum L. and Hypericum reflexum L. f. (Hypericaceae) [107 ], apples [108 ], eggplant [109 ], and others. The details of MALDI imaging of plant tissue are not described here because excellent review articles have already been published in this area [9 ], [73 ], [75 ], [101 ], [102 ], [110 ], [111 ].
Bjarnholt and collaborators recently performed an extensive review (covering work up to 2013) of IMS of plant tissues; almost 65 % of the studies applied MALDI and LDI for image acquisition, mainly using CHCA and DHB as matrices and cryosectioning to slice the materials [9 ]. Articles published between 2014 and 2015 that applied MALDI imaging to plants are summarized in [Table 3 ], and several specialized reviews have addressed analytical strategies for data acquisition, including higher spatial resolution and sample preparation [112 ], [113 ], [114 ], [115 ], [116 ]. In addition, a recently published review summarized the advances of IMS in lipidomics, showing insights into the spatial compartmentalization of lipids and their metabolism in plants; DHB and 9-AA are the matrices predominantly used in these analyses [117 ].
Table 3 Data published between 2014 and 2015 for MALDI imaging from plant materials.
Species | Sample | Class of compounds | Sample preparation | Ref.
Accent grape | Fruit | Anthocyanins | Cryosectioning (10 µm); DAN matrix | [211 ]
Allium cepa | Bulb | Flavonoids and others | Hand-cut sections; gold nanoparticles as the matrix | [212 ]
Arabidopsis thaliana | Leaves | Glucosinolates | Intact surface; 9-AA matrix (sublimation) | [213 ]
Arabidopsis thaliana | Seedlings | Phospholipids | Intact surface; CHCA and DHB, LDI (comparison) | [214 ]
Arachis hypogaea (peanut) | Skin | Aflatoxins and stilbenoids | Imprinted in silica, matrix-free | [215 ]
Capsicum annuum | Fruit | Alkaloids (capsaicinoids) | – | [216 ]
Citrus sinensis | Leaves and petioles | Flavonoids | Sectioning by microtome (20 µm); CHCA : DHB (1 : 1) matrix | [217 ]
Eucalyptus globulus and E. grandis | Stem | Phenylpropenoids | Hand-cut sections; silica matrix | [218 ]
Glycyrrhiza glabra | Rhizome | Flavonoids and saponins | Cryosectioning (20 µm); DHB matrix | [219 ]
Gossypium hirsutum (cotton) | Seed | Polyphenols and others | Cryosectioning (30–50 µm); DHB matrix | [220 ]
Gunnera manicata (symbiosis with Blasia pusilla) | Stem | Nostopeptolides | Sectioning; universal MALDI matrix | [221 ]
Hordeum vulgare | Grain | Sugars | Cryosectioning (30 µm); DHB matrix | [222 ]
Hypericum perforatum, H. olympicum and H. patulum | Leaves | Naphthodianthrones | Intact surface; CHCA matrix | [223 ]
Linum usitatissimum | Flowers | Cyanogenic glucosides and lignans | Cryosectioning (20 µm); DHB matrix | [118 ]
Lupine | Roots (soil-root interface) | Pesticides | CHCA matrix | [224 ]
Lychnophora species | Leaves | Flavonoids | Sectioning by microtome (50 µm); matrix-free | [74 ]
Medicago truncatula (symbiosis with Sinorhizobium meliloti) | Root nodules | Amino acids and others | Cryosectioning (16 µm); DHB matrix | [225 ]
Musa acuminata | Fruit (epidermis) | Phenylphenalenones | Microdissection; matrix-free | [226 ]
Pisum sativum | Seed (pea) | Phenylpropanoids, lignans and pterocarpans | Pea (epidermal layer); DHB matrix | [227 ]
Raphanus sativus | Bulbs and leaves | Anthocyanins and others | Cryosectioning (12 µm); CHCA and DHB matrix | [228 ]
Solanum habrochaites | Leaves | Flavonoids, sugars and glycoalkaloids | Intact surface; matrix-free (carbon) | [229 ]
Soybean | Leaves | Flavonoids and others | Intact surface; matrix-free (2D graphene) | [230 ]
Tomato, apple and nectarine | Cuticle | Fatty acids (chemical hydrolysis of cutin) | Cutin pieces; LiDHB matrix | [231 ]
Wheat | Leaves | Fungicide residues | Intact surface; DHB matrix | [210 ]
Zea mays (maize) | Leaves | Flavonoids and phenylpropanoids | Cryosectioning (10 µm); DAN matrix | [232 ]
Sample preparation is a critical step in MALDI imaging experiments and requires special care to avoid degradation or metabolization of the compounds. In addition, the vacuum in the source (except in atmospheric pressure MALDI) makes the analysis of fresh (in natura ) transversal plant sections difficult: water loss under vacuum causes tissue contraction, which complicates the correlation between MALDI images and anatomical data, and the unevenness of the tissue surface may also reduce accuracy. Another important point is the spatial resolution of the images, since the highest resolutions (a few microns) have only been achieved with prototype instruments built in specialized research laboratories. Resolutions of 10–20 µm are possible with commercial equipment, but the laser beam focus should be treated as an experimental parameter, and the size of the matrix crystals is also relevant [9 ], [73 ]. The use of MALDI imaging for plants is currently still limited, but it can provide valuable information on surface metabolites and tissue distribution maps, which can help in understanding biosynthetic pathways, metabolite translocation, plant defense, and other processes [9 ], [73 ], [118 ]. Although small-molecule analyses and plant tissues pose major challenges, MALDI imaging offers many advantages and opens new perspectives in the natural products area, helping to explore and understand diverse questions, such as ecological and physiological ones.
MALDI-MS/MS to identify natural products
MALDI has been coupled to different analyzers, such as quadrupole (Q), IT, orbitrap, TOF, and FT-ICR. Mass accuracy, resolution, m/z range, sensitivity, speed, and other characteristics differ among analyzers, which should be selected to best fit the requirements of each experiment [6 ], [16 ], [102 ], [119 ], [120 ]. MS/MS is performed in two stages: first, the precursor ion is isolated and activated; then, the resulting fragment ions are separated and detected. The activation step increases the internal energy of the ions, resulting in the rupture of chemical bonds by homolytic and/or heterolytic cleavage. Different ion activation methods are available, such as low- and high-energy CID, electron capture dissociation (ECD), blackbody infrared dissociation (BIRD), surface-induced dissociation (SID), ultraviolet photodissociation (UVPD), electron transfer dissociation (ETD), electron-induced dissociation (EID), infrared multiphoton dissociation (IRMPD), and PSD [121 ]; however, not all of these methods are available for MALDI systems, and CID is the most often applied.
Although MALDI-MS/MS is widely used to identify and characterize peptides, there is little information on its use for small-molecule natural products, especially regarding in-source fragmentation and the gas-phase reactions in ion activation methods with high energy transfer. In addition, how the amount of energy transferred during MALDI ionization influences subsequent fragmentation is poorly understood. Therefore, this review explores the use of MALDI-MS/MS for natural product identification and the main considerations reported, with the aim of stimulating future studies to expand its applications.
The fragmentation of nine flavonoids (two flavone aglycones, one isoflavone, three flavonol aglycones, two O -glycosylated flavonols, and one flavanone) was compared by ESI-QTOF, MALDI-QIT, and MALDI TOF ReTOF (the latter refers to curved field reflectron or post-source decay), using the conventional MALDI matrices CHCA and DHB. Low-energy fragmentation by ESI-QTOF and MALDI-QIT gave similar spectra in terms of product ions and relative ion signal intensities. MALDI TOF ReTOF applies high energy to the ions, since the precursor ions produced by MALDI are reaccelerated to 20 000 eV, which corresponds to center-of-mass energies ($E_{\mathrm{cm}}$) of approximately 150–310 eV. Thus, for a given ion, MALDI TOF ReTOF works at a much higher energy than ESI-QTOF (ions accelerated to 30 eV; $E_{\mathrm{cm}}$ = 1–4 eV) and MALDI-QIT ($E_{\mathrm{cm}} \le 2$ eV) [122 ]. The rare loss of a hydrogen radical from typical flavonoid fragments (for example, ions produced by retro-Diels-Alder cleavages), as well as the formation of other radical ions, was observed only by MALDI TOF ReTOF at high center-of-mass energies; no such competing fragmentation processes were observed at low center-of-mass energies [121 ], [122 ]. High center-of-mass energies have been described to induce charge-remote fragmentation and diverse fragmentation pathways, which may be useful in dereplication studies because they provide additional information for structural elucidation.
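The center-of-mass energy is the fraction of the laboratory-frame collision energy that is available for conversion into internal energy in a single collision with a stationary neutral. A minimal Python sketch of the standard relation $E_{\mathrm{cm}} = E_{\mathrm{lab}}\, m_{\mathrm{gas}}/(m_{\mathrm{gas}} + m_{\mathrm{ion}})$ is shown below. The ion mass (m/z 271.06, assumed singly charged) and the collision gases (argon for the 30 eV case, helium for the 20 keV case) are illustrative assumptions, not the experimental conditions reported in [122 ].

```python
def center_of_mass_energy(e_lab_ev: float, ion_mass_da: float, gas_mass_da: float) -> float:
    """Energy available in one collision between an ion and a stationary neutral.

    E_cm = E_lab * m_gas / (m_gas + m_ion)
    """
    if e_lab_ev < 0 or ion_mass_da <= 0 or gas_mass_da <= 0:
        raise ValueError("energy must be non-negative and masses positive")
    return e_lab_ev * gas_mass_da / (gas_mass_da + ion_mass_da)


# Illustrative (assumed) inputs: a singly charged flavonoid ion of m/z 271.06
# colliding with argon (low-energy case) or helium (high-energy case).
ION_MASS = 271.06
ARGON, HELIUM = 39.962, 4.0026

low = center_of_mass_energy(30.0, ION_MASS, ARGON)        # roughly 3.9 eV
high = center_of_mass_energy(20000.0, ION_MASS, HELIUM)   # roughly 291 eV
print(f"30 eV with Ar: E_cm = {low:.1f} eV; 20 keV with He: E_cm = {high:.0f} eV")
```

The example also shows why laboratory-frame energies alone are not comparable across instruments: a 20 keV collision with a light gas still delivers far more energy to the ion than a 30 eV collision with a heavy one.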
Through previous fragmentation studies of standards, compounds such as ferulic acid, wogonin (a flavone), and scutellarin (a flavone glycoside) could be identified in extracts of traditional Chinese medicine herbs by MALDI-MS/MS with graphene and graphene oxide matrices, which improved the limit of detection and reduced in-source fragmentation. Graphene belongs to the family of multidimensional carbon nanomaterials and is composed of two-dimensional layers of sp2-bonded carbon; it has recently been applied as a MALDI matrix for small-molecule analyses because it produces little background interference from matrix ions in the spectra. Graphene oxide is easily produced by oxidizing graphite and bears hydroxyl and epoxide groups on the basal plane of its sheets, which confer strong hydrophilicity and facilitate dispersion and swelling in water. The excellent results may be related to the thermal, electronic, and mechanical properties of these materials [25 ], [123 ]. Specific matrices can reduce in-source dissociation and significantly influence the final results. However, there is little information about ionization processes, in-source dissociation, and fragmentation in MALDI with high-energy CID, PSD, and LIFT. Fragmentation studies using these techniques are needed because, as reported by March and coworkers [122 ], only low-energy CID gives similar fragmentation data for flavonoids with ESI and MALDI, and such data are already widely available for ESI.
Silva and Lopes [22 ] evaluated the in-source dissociation of various glycosylated and non-glycosylated flavonoids (flavanones, flavones, and flavonols) without a matrix and with different matrices, as well as the influence of laser intensity on these in-source reactions. The flavonoid-matrix cluster ions, which depend on the flavonoid structure, were elucidated by MALDI-MS/MS, confirming the formation of cluster ions involving fragments produced by retro-Diels-Alder cleavage [124 ]. The formation of radical fragments is evident in both in-source and MS/MS experiments [22 ], [81 ], [122 ], [125 ], and a better understanding of these reactions will broaden the application of MALDI as a tool for high-throughput chemical analysis and the identification of natural products [25 ], [27 ], [51 ], [55 ], [68 ], [126 ].
Although O -glycosylated flavonoids easily lose their sugars in the source, Wang and Sporns [68 ] thoroughly studied the fragmentation pathways of glycosylated flavonols bearing up to three sugar units. The best matrix reported by the authors was THAP, because it gave good spot-to-spot repeatability, produced ions in both positive and negative ion modes, and showed a high affinity for alkali metals in the ionization process. The main fragment ions were generated by sugar losses, as observed in ESI [127 ]. However, fragment ions produced from the aglycone, which are important for its structural elucidation, were not described [68 ].
PSD on TOF analyzers and the LIFT system have been employed in MS/MS experiments, and both can be combined with CID to increase the internal energy of the ions and thus the number of fragment ions [128 ], [129 ]. In PSD, metastable fragments are produced in the field-free region from ions with excess internal energy after acceleration from the source, and the reflector voltage is adjusted in steps to detect these fragments. However, PSD is time-consuming, and there are difficulties with mass calibration and with detecting fragments of low m/z (< 150) [129 ]. In LIFT experiments, a precursor ion is accelerated with a low voltage of 8 kV, isolated, and raised to a higher potential in the LIFT cell; the fragment ions are then reaccelerated toward the detector, so no change in the reflector voltage is required [129 ]. PSD and LIFT were compared for the analysis of CHCA matrix clusters, demonstrating several advantages of LIFT: better detection of low-abundance and low-mass ions and a relatively fast acquisition time [130 ].
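Why PSD needs a stepped reflector voltage and LIFT does not can be seen from a short energy calculation. A metastable fragment formed in the field-free region keeps the velocity of its precursor, so its kinetic energy is the precursor energy scaled by the fragment-to-precursor mass ratio. The sketch below assumes singly charged precursor and fragment ions; the 20 keV PSD acceleration and the 19 kV LIFT potential step are assumed illustrative values (only the 8 kV LIFT acceleration comes from the text), and the m/z values are hypothetical.

```python
def psd_fragment_energy(precursor_energy_ev: float, precursor_mz: float,
                        fragment_mz: float) -> float:
    """Kinetic energy of a fragment formed in the field-free region (singly charged ions).

    The fragment keeps the precursor velocity, so E_frag = E_prec * m_frag / m_prec.
    """
    if not 0 < fragment_mz <= precursor_mz:
        raise ValueError("fragment m/z must be positive and not exceed the precursor m/z")
    return precursor_energy_ev * fragment_mz / precursor_mz


def lift_fragment_energy(initial_energy_ev: float, precursor_mz: float, fragment_mz: float,
                         lift_step_ev: float) -> float:
    """Fragment energy after the LIFT cell adds a fixed potential step (charge 1 assumed)."""
    return psd_fragment_energy(initial_energy_ev, precursor_mz, fragment_mz) + lift_step_ev


precursor = 1000.0                    # hypothetical precursor m/z
for frag in (150.0, 500.0, 1000.0):   # hypothetical fragment m/z values
    psd = psd_fragment_energy(20000.0, precursor, frag)             # assumed 20 kV source
    lift = lift_fragment_energy(8000.0, precursor, frag, 19000.0)   # 8 kV, assumed 19 kV step
    print(f"m/z {frag:.0f}: PSD {psd / 1000:.1f} keV, LIFT {lift / 1000:.1f} keV")
```

Under these assumptions the PSD fragment energies span 3 to 20 keV, too wide for one reflector setting, whereas after the LIFT step they span only about 20 to 27 keV.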
PSD was initially applied to peptide [131 ], [132 ] and carbohydrate [133 ] sequencing, and there are currently some applications in natural product studies. It has been used to identify saponins [134 ], flavonoids [122 ], isoflavones [135 ], carotenoids [136 ], and hydrolysable and condensed tannins [66 ], [137 ], as well as to examine triacylglycerols and phosphatidylethanolamines from plants, algae, and animal tissues [138 ], [139 ], [140 ].
Different carotenes (polyenes without oxygen) and xanthophylls (polyenes with oxygen) were analyzed by MALDI and produced radical molecular ions (M•+ ) through the removal of one electron ([Fig. 7 ]) [136 ], as in ESI and FAB. Extended conjugation is a relevant factor for electron loss and radical stability, and it produces complex spectra from sequential in-source homolytic cleavages, as observed in the MALDI source [4 ], [23 ], [141 ], [142 ], [143 ]. The fragmentation patterns of carotenoids and xanthophylls were proposed based on MALDI-PSD data [23 ], [136 ]. The loss of one water molecule ([M-18]•+ ) confirms the presence of a hydroxyl group (as in lutein), and the [M-92]•+ and [M-106]•+ ions are diagnostic of toluene and xylene losses, respectively. The intensity ratio of these fragment ions (I[M-92]•+ /I[M-106]•+ ) is important for determining the number of double bonds: when this ratio is > 10, the carotenoids contain nine double bonds. The loss of fatty acids ([M-RCOOH]•+ ) is common for carotenoid fatty acid esters (such as β -cryptoxanthin palmitate) and characterizes their presence in the carotenoid structure, while epoxy species can be confirmed by the fragment ion [M-80]•+ . Other diagnostic carotenoid fragment ions have been described in detail and are useful for elucidating their chemical structures [23 ], [136 ].
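The diagnostic neutral losses listed above lend themselves to a simple lookup. The sketch below is a hypothetical helper, not a published tool: it matches precursor-minus-fragment differences against the nominal losses from the text (18, 80, 92, 106, plus 256 for palmitic acid as an example of [M-RCOOH]) within an assumed ±0.5 tolerance and computes the I[M-92]/I[M-106] ratio. The spectrum values are invented for illustration.

```python
# Nominal neutral losses taken from the diagnostic ions described in the text.
DIAGNOSTIC_LOSSES = {
    18: "H2O (hydroxyl group)",
    80: "[M-80] (epoxide)",
    92: "toluene",
    106: "xylene",
    256: "palmitic acid (fatty acid ester)",
}


def annotate_losses(precursor_mz, fragments, tolerance=0.5):
    """Match precursor-minus-fragment differences to diagnostic nominal losses.

    fragments: dict mapping fragment m/z -> intensity.
    Returns a sorted list of (nominal_loss, label, intensity) tuples.
    """
    hits = []
    for mz, intensity in fragments.items():
        loss = precursor_mz - mz
        for nominal, label in DIAGNOSTIC_LOSSES.items():
            if abs(loss - nominal) <= tolerance:
                hits.append((nominal, label, intensity))
    return sorted(hits)


def toluene_xylene_ratio(hits):
    """Return I[M-92] / I[M-106], or None if either ion is missing."""
    by_loss = {loss: intensity for loss, _, intensity in hits}
    if by_loss.get(106, 0) == 0 or 92 not in by_loss:
        return None
    return by_loss[92] / by_loss[106]


# Hypothetical PSD spectrum of a radical cation M•+ at m/z 568.4.
spectrum = {550.4: 30.0, 476.4: 50.0, 462.4: 4.0}
hits = annotate_losses(568.4, spectrum)
ratio = toluene_xylene_ratio(hits)
for loss, label, intensity in hits:
    print(f"[M-{loss}]: {label}, intensity {intensity}")
print("I[M-92]/I[M-106] =", ratio, "(> 10 suggests nine double bonds per the rule above)")
```

Nominal-mass matching is deliberately coarse; with high-resolution data the same lookup could use exact neutral masses and a tolerance in ppm.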
Fig. 7 Chemical structures of some xanthophylls and carotenes analyzed by MALDI-PSD MS/MS.
The O -glycosylated isoflavones from soy [Glycine max L. (Fabaceae)] produced structurally informative fragments by PSD; the main fragmentation occurs through sugar loss, as in ESI. In addition, the DHB and THAP matrices were evaluated, and DHB gave better ionization and more fragments in the MS/MS spectra [135 ]. PSD was also applied to peracetylated isoflavone glycosides, protonated and cationized with different metal ions, to understand their gas-phase molecule-cluster complexes. The number of fragments decreased in the order Li+ > Na+ > Ag+ > Cu+ > H+ > K+ > Rb+ ≈ Cs+ [144 ]. Fragmentation was lower for Ag₃⁺ clusters than for Ag+ , demonstrating a stronger gas-phase interaction with Ag₃⁺ [145 ].
Saponins have been studied using different fragmentation techniques and, in some cases, fragmentation data have been applied in dereplication studies. MALDI has limited applications in this area and is sometimes used together with LC-ESI-MS/MS data, because more saponins can then be detected and identified [51 ], [146 ], [147 ]. ESI-IT, MALDI-IT, MALDI-IT/TOF, and MALDI-TOF/TOF, including high-energy CID, were applied to dammarane-type triterpenoid saponins from Bacopa monnieri (L.) Wettst. (Plantaginaceae) [134 ]. With low-energy techniques, ESI and MALDI gave similar spectra, and the product ions were used to elucidate the sequence and branching of the sugar moieties [134 ], [147 ]. Useful fragment ions from both the aglycones and the glycosidic moieties were obtained by high-energy CID, which yielded the same fragments observed by PSD [134 ] and showed charge-remote fragmentation similar to that described for high-energy CID of conjugated steroids and other compounds [148 ], [149 ]. These diagnostic fragment ions were also observed with the LIFT system, which assisted in the structural identification of saponins from the sea cucumber Holothuria forskali [150 ], [151 ].
Hydrolysable and condensed tannins are another group of secondary metabolites analyzed by MALDI, owing to advantages such as sensitivity, the formation of singly charged molecular ions, and lower ion suppression in complex mixture analysis [66 ], [128 ], [137 ], [152 ]. Several articles have reported the potential of MALDI-MS to identify these compounds, whose monomers occur in different combinations and linkages and can therefore give tannins of the same molecular weight. However, unambiguous identification requires MS/MS data to establish the linkage between specific monomers. Monomer losses, involving fragment ions formed by hydrogen transfer, and typical retro-Diels-Alder fragmentations followed by water elimination are observed for each oligomer, whereas radical fragment ions are less visible in the spectra [153 ], [154 ], [155 ], [156 ], [157 ]. These observations were applied to identify condensed tannins with A-type and B-type linkages from different species, such as Theobroma cacao L. (Sterculiaceae) [153 ], Salix alba L. (Salicaceae), Picea abies (L.) H. Karst (Pinaceae), Fagus sylvatica L. (Fagaceae), Tilia cordata Mill. (Malvaceae) [66 ], Cinnamomum zeylanicum L. (Lauraceae) [154 ], Eugenia dysenterica DC. (Myrtaceae) [155 ], quebracho wood [128 ], Commiphora leptophloeos (Mart.) J. B. Gillett (Burseraceae), Anadenanthera colubrina (Vell.) Brenan var. colubrina (Fabaceae), Myracrodruon urundeuva Allemão (Anacardiaceae) [156 ], Pityrocarpa moniliformis (Benth.) Luckow & R. W. Jobson (Fabaceae) [157 ], almond [158 ], and others. In addition, MALDI-TOF/TOF LIFT allowed the identification of a rare polymeric series of up to eight flavan-3-ol units with pentose and hexose sugars [153 ].
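The isomer problem described above can be made concrete with a small mass calculator for procyanidins, the condensed tannins built only from (epi)catechin units. The sketch assumes standard monoisotopic atomic masses, [M+Na]+ ions, and the usual bookkeeping in which each interflavan (B-type) bond removes two hydrogens and each A-type linkage removes two more; prodelphinidin units, galloylation, and glycosylation are deliberately left out, and the function names are hypothetical.

```python
EPICATECHIN = 290.079038   # monoisotopic mass of C15H14O6, (epi)catechin
H2 = 2.015650              # two hydrogen atoms
NA_ION = 22.989218         # Na+ (sodium atom minus one electron)


def procyanidin_mass(n_units, n_a_type=0):
    """Neutral monoisotopic mass of a procyanidin oligomer of n (epi)catechin units.

    Each interflavan bond removes 2 H; each A-type linkage removes 2 H more.
    """
    if n_units < 1 or not 0 <= n_a_type <= n_units - 1:
        raise ValueError("need n_units >= 1 and 0 <= n_a_type <= n_units - 1")
    return n_units * EPICATECHIN - (n_units - 1) * H2 - n_a_type * H2


def sodiated(mass):
    """m/z of the [M+Na]+ ion."""
    return mass + NA_ION


for n in range(1, 5):
    b_type = round(sodiated(procyanidin_mass(n)), 3)
    a_type = round(sodiated(procyanidin_mass(n, 1)), 3) if n > 1 else "-"
    print(f"{n} units: [M+Na]+ all B-type {b_type}, one A-type {a_type}")
```

Every stereoisomer and every C4→C8 or C4→C6 arrangement of the same units maps to the same value here, which is exactly why MS/MS data are needed to establish the linkages.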
MS/MS data obtained by MALDI QIT-TOF, PSD, and CID have been applied to determine the chemical structures of hydrolysable tannins from Rosa chinensis Jacq. (Rosaceae) [159 ], Chinese galls [59 ], tara, Turkey gall, chestnut woods [137 ], Mangifera indica L. (Anacardiaceae) [160 ], and others; these data are preferentially acquired in positive ion mode because fewer signals are observed in negative ion mode for gallotannins [59 ]. However, a wider range of matrices must be evaluated to confirm this observation. Hydrolysable tannins were also evaluated by MALDI-TOF/TOF (positive ion mode) with two cationizing agents, sodium and cesium, but no significant differences in the fragmentation pathway were observed [160 ], [161 ]. Galloyltannins first lose one or more galloyl moieties (152 Da), accompanied by the loss of one or more H2O molecules. Gallic acid (170 Da) or ellagic acid, when present, can be lost with hydrogen transfer, forming a double bond in the glucose [137 ]. Many important fragment ions were not observed by ESI-MS/MS, which could complicate structural identification by that technique [162 ], [163 ], [164 ], [165 ]. A complex fragmentation pattern below 300 Da is observed, which complements the information for identifying these compounds [59 ], [137 ], [160 ]. Fragment ions from the sugar core of tannins can also be observed, arising from internal cleavage of the glycosidic ring, similar to the fragmentation already reported for carbohydrates by MALDI-CID [166 ], [167 ].
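The sequential galloyl losses described above can be turned into a list of expected fragment m/z values. The sketch below is illustrative only: it assumes monoisotopic neutral masses for C7H4O4 (galloyl), H2O, and C7H6O5 (gallic acid), and uses a hypothetical precursor at m/z 963.1074, the calculated [M+Na]+ of a pentagalloylglucose (C41H32O26).

```python
GALLOYL = 152.010959      # C7H4O4, galloyl moiety lost as a neutral
WATER = 18.010565         # H2O
GALLIC_ACID = 170.021523  # C7H6O5


def galloyl_ladder(precursor_mz, max_losses):
    """Predicted m/z after k galloyl losses, with and without an extra H2O loss."""
    ladder = []
    for k in range(1, max_losses + 1):
        base = precursor_mz - k * GALLOYL
        ladder.append((f"-{k} galloyl", round(base, 4)))
        ladder.append((f"-{k} galloyl -H2O", round(base - WATER, 4)))
    return ladder


# Hypothetical precursor: calculated [M+Na]+ of a pentagalloylglucose.
for label, mz in galloyl_ladder(963.1074, 3):
    print(f"{label:>18}: m/z {mz}")

# Galloyl + water (C7H4O4 + H2O) has the same composition as gallic acid (C7H6O5),
# so these two loss pathways cannot be distinguished by mass alone.
print(f"mass difference: {abs(GALLOYL + WATER - GALLIC_ACID):.6f} Da")
```

The last line makes a practical point: a peak at "−galloyl −H2O" is isobaric with a gallic acid loss, so the pathway has to be inferred from the fragmentation context rather than from the m/z value.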
MALDI-MS is an advantageous technique for macromolecule analyses owing to the characteristics described previously, and its application to small molecules is only beginning [24 ], [26 ], [28 ]. Although more research in this area is required, MALDI-MS has made valuable contributions to the study of secondary metabolites compared with other analytical techniques [50 ]. Many publications have described the use of MALDI-MS to establish metabolic fingerprints [6 ], [27 ], [168 ], [169 ]. Chemical profiling and the use of MALDI-MS/MS to identify the chemical structures of natural products are recent approaches that have not yet been extensively applied to identify tannins (hydrolysable and condensed) and saponins. In addition, few MALDI-MS/MS studies have been described, covering a restricted number of natural compound classes, such as anthocyanins, carotenoids, xanthophylls, betaines, glycosylated isoflavones, phenolic acids, flavonoids, theaflavins and thearubigins, limonoids, alkaloids, and others, sometimes combined with LC-ESI-MS/MS data [23 ], [25 ], [68 ], [74 ], [122 ], [125 ], [126 ], [136 ], [143 ], [153 ], [159 ], [168 ], [169 ], [170 ], [171 ].
The studies discussed here confirm the relevance of MALDI applications in the natural products area and should encourage its use in a wide range of studies, such as biomarker research, hierarchical clustering, taxonomy, and tissue imaging. However, data processing is fundamental to improving the quality of results and avoiding mistaken conclusions. Data analysis can be divided into preprocessing and statistical analysis. Preprocessing is described here, since it is extremely important yet only sparsely described in the literature.
Data Processing
In the previous section, we showed a broad range of applications of MALDI ionization to natural products. Owing to advances in mass spectrometer hardware and in applications, many authors consider that data analysis will play an important role in most of these applications and will become a bottleneck in many fields of inquiry [172 ], [173 ].
In many cases, specific techniques, such as MALDI imaging or MALDI-MS/MS, are required to analyze a sample and answer a scientific question. Each spectrum in an experimental data set, usually composed of many samples and replicates, should be processed to improve the results and thus guide the study toward correct conclusions [174 ]. For this reason, a common modular analysis workflow is proposed here, and some applications to MALDI-MS data are described to illustrate the points addressed. Along this workflow, similarities with MS data obtained by other ionization methods and, more generally, with other large-scale data acquisition techniques are highlighted. Optimizing the data analysis steps described below is critical for correctly associating patterns in the data with the biological changes under study.
In the following, each step presented in [Fig. 8 ] is briefly explored, highlighting its importance for an unbiased analysis and reviewing applications to MALDI-MS data and available software. Some steps, such as smoothing and baseline removal, may swap positions in the workflow [175 ]. In [Fig. 8 ], visualization is presented as a procedure parallel to all analysis steps, highlighting the importance of plotting spectra summaries before and after preprocessing. Morris and collaborators [175 ] showed how simple heat maps allowed the discovery of unexpected patterns in the data.
Fig. 8 A flow diagram of the modular analysis of MALDI-MS data. Different modular elements, in different orders, can be used for a specific purpose.
The intensity at a given m/z value is proportional to the relative abundance of the ion represented by this m/z value [174 ]. Given a set of S spectra containing information on a total of n small molecules, the goal of our analysis workflow is to generate an S × n matrix whose rows correspond to the individual spectra and whose columns contain a (relative) quantification of each molecule (or of a mass signal associated with a molecule) [175 ]. This matrix, which represents a multidimensional data set, is generated through the preprocessing described in [Fig. 8 ], and the final goal is to associate changes or patterns in this data set with the problem under study.
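One simple way to build the matrix described above (one row per spectrum, one column per feature) from peak lists is to group peaks from all spectra whose m/z values lie within a tolerance of each other and treat each group as one feature. The sketch below is a hypothetical illustration of that idea (tolerance-based grouping, not the method of any specific package); missing peaks are filled with zero, which is itself an assumption that downstream statistics should account for.

```python
def feature_matrix(peak_lists, tolerance=0.2):
    """Build a spectra x features intensity matrix from per-spectrum peak lists.

    peak_lists: one list of (mz, intensity) tuples per spectrum.
    Peaks whose sorted m/z values lie within `tolerance` of their neighbour are
    grouped into one feature; the feature position is the mean m/z of the group.
    """
    all_mz = sorted(mz for peaks in peak_lists for mz, _ in peaks)
    groups = []
    for mz in all_mz:
        if groups and mz - groups[-1][-1] <= tolerance:
            groups[-1].append(mz)
        else:
            groups.append([mz])
    centers = [sum(g) / len(g) for g in groups]

    # Rows = spectra, columns = features; 0.0 means "not detected".
    matrix = [[0.0] * len(centers) for _ in peak_lists]
    for row, peaks in enumerate(peak_lists):
        for mz, intensity in peaks:
            col = min(range(len(centers)), key=lambda j: abs(centers[j] - mz))
            matrix[row][col] += intensity
    return centers, matrix


# Two hypothetical peak lists (m/z, intensity).
spectra = [
    [(301.03, 10.0), (447.09, 5.0)],
    [(301.05, 8.0), (609.14, 2.0)],
]
centers, matrix = feature_matrix(spectra)
print([round(c, 2) for c in centers])
for row in matrix:
    print(row)
```

Because neighbouring peaks are chained together, a tolerance that is too wide can merge distinct ions; this is one reason alignment and peak picking precede this step.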
According to Ge and Wong [176 ], a mass spectrum can theoretically be decomposed into three components: the baseline, the true signal, and the noise. Because baseline and noise can be associated with inter- and intra-sample variability, data preprocessing is required to reduce them in the raw data before any multivariate analysis [177 ], [178 ].
Morris and collaborators [175 ] presented illustrative examples of experiments that ignored basic principles of experimental design, namely blocking and randomization, and showed how this can completely compromise an experiment. They also showed how a change in the biological sample collection protocol after the first 20 subjects had been collected produced differences between the cancer subtypes studied that could not be attributed to the biological question but could easily be attributed to systematic, reproducible changes in the experimental procedure. Oberg and Vitek [179 ] discuss how randomization, replication, and blocking help avoid systematic biases due to the experimental procedure and optimize the ability to detect true quantitative changes between groups in mass spectrometry-based proteomic experiments. The importance of experimental design in mass spectrometry has been reviewed [180 ], [181 ]. Experimental design is also commonly used to test combinations of two or more factors of interest. Zhang et al. [182 ] applied a statistical experimental design to intact-cell MALDI-MS spectra of Pseudomonas aeruginosa to obtain the automated data acquisition settings that yielded the highest reproducibility of replicate mass spectra.
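Blocking and randomization can be built into the acquisition order itself. In the hypothetical sketch below, each acquisition block (for example, one target plate or one day) receives one sample from every group, and the order within each block is randomized, so that instrument drift over time is not confounded with group membership. The balanced design and the sample names are assumptions for illustration.

```python
import random


def randomized_block_order(groups, seed=None):
    """Acquisition order with one sample per group in every block.

    groups: dict mapping group name -> list of sample ids (equal lengths).
    Returns a list of blocks; the order of samples within each block is random.
    """
    sizes = {len(ids) for ids in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes a balanced design")
    n_blocks = sizes.pop()
    rng = random.Random(seed)
    # Randomly assign each group's samples to blocks.
    shuffled = {g: rng.sample(ids, len(ids)) for g, ids in groups.items()}
    order = []
    for b in range(n_blocks):
        block = [shuffled[g][b] for g in groups]
        rng.shuffle(block)   # randomize acquisition order within the block
        order.append(block)
    return order


design = {"control": ["c1", "c2", "c3"], "treated": ["t1", "t2", "t3"]}
for i, block in enumerate(randomized_block_order(design, seed=42), start=1):
    print(f"block {i} (e.g. plate or day {i}):", block)
```

Recording the block label alongside each spectrum also allows block effects to be modelled explicitly in the later statistical analysis.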
According to Xiang and Prado [183 ], calibration methods for high-resolution analyzers (mostly TOF analyzers) are usually divided into two classes: external and internal. In external calibration, a standard sample is recorded in a separate spectrum, and the calibration parameters derived from it are used to calibrate the sample spectra. In internal calibration, standard molecules are mixed with the sample, a spectrum of the mixture is acquired, and the peaks of the standards are identified and used to calibrate the entire spectrum. Fraser and coworkers [64 ] used single-point external calibration and achieved mass accuracies of 10–70 ppm in MALDI-TOF analyses. With internal calibration using two carotenoid calibrants flanking the m/z of interest, the same authors achieved mass accuracies below 10 ppm. Saideman and coworkers [184 ], using internal calibration, determined peptide masses with accuracies ranging from 0.124 to 5.17 ppm (n = 3).
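In an idealized linear TOF, flight time grows with the square root of m/z, so a two-point internal calibration can be sketched by fitting sqrt(m/z) as a linear function of flight time through two calibrants that flank the region of interest. The sketch below uses that simplified relation with made-up instrument constants; real instruments typically use more calibrants and higher-order terms. The ppm error is the usual (measured − theoretical)/theoretical × 10^6.

```python
import math


def two_point_tof_calibration(t1, mz1, t2, mz2):
    """Return a function converting flight time to m/z, assuming sqrt(m/z) = a*t + b."""
    if t1 == t2:
        raise ValueError("calibrants must have different flight times")
    a = (math.sqrt(mz2) - math.sqrt(mz1)) / (t2 - t1)
    b = math.sqrt(mz1) - a * t1
    return lambda t: (a * t + b) ** 2


def ppm_error(measured_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


# Synthetic instrument (hypothetical constants): sqrt(m/z) = 0.002 * t + 0.5, t in ns.
def simulated_time(mz):
    return (math.sqrt(mz) - 0.5) / 0.002


calibrate = two_point_tof_calibration(simulated_time(500.0), 500.0,
                                      simulated_time(700.0), 700.0)
print(round(calibrate(simulated_time(600.0)), 6))   # analyte between the two calibrants
print(round(ppm_error(600.006, 600.0), 3))           # a 0.006 Da error at m/z 600
```

On real data the calibrated value would carry a residual error, and reporting it in ppm, as in the studies cited above, makes accuracies comparable across the m/z range.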
The baseline arises from matrix material, impurities, ionization by-products, electronic signal noise, and sample preparation contamination [177 ], [185 ], [186 ]. Baseline correction methods were reviewed by Hilario and collaborators [187 ]. The MALDIquant package [188 ] implements different baseline adjustment approaches; its default is the Statistics-sensitive Non-linear Iterative Peak-clipping (SNIP) algorithm proposed by Ryan and collaborators [189 ], which yields a smooth baseline and non-negative corrected intensities.
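The peak-clipping idea behind SNIP can be sketched in a few lines: at each iteration, every point is replaced by the smaller of its own value and the mean of its two neighbours at distance k, with k growing from 1 to a maximum window. This simplified version omits the log-log-square-root intensity transform of the original algorithm, and MALDIquant's implementation may differ in details; the synthetic spectrum (linear baseline plus one Gaussian peak, no noise) is an assumption chosen to make the result easy to check.

```python
import math


def snip_baseline(intensities, iterations=20):
    """Estimate a baseline by iterative peak clipping with windows k = 1..iterations."""
    y = list(intensities)
    n = len(y)
    for k in range(1, iterations + 1):
        clipped = y[:]
        for i in range(k, n - k):
            # Keep the point only if it lies below the mean of its k-distant neighbours.
            clipped[i] = min(y[i], (y[i - k] + y[i + k]) / 2.0)
        y = clipped
    return y


# Synthetic spectrum: decaying linear baseline + one Gaussian peak (no noise).
n_points = 200
raw = [100.0 - 0.3 * i + 50.0 * math.exp(-0.5 * ((i - 100) / 3.0) ** 2)
       for i in range(n_points)]
baseline = snip_baseline(raw, iterations=20)
corrected = [r - b for r, b in zip(raw, baseline)]

# Clipping never raises a point, so the baseline never exceeds the raw signal.
print(min(corrected) >= 0.0, round(corrected[100], 2))
```

The number of iterations plays the role of the maximum clipping window: it must exceed the peak half-width, or part of the peak is absorbed into the baseline.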
Systematic shifts can appear in repeated experiments, so the same metabolite can appear at different m/z values in different spectra. Spectral alignment addresses this problem by aligning corresponding peaks across samples [190 ]. The position of the peak for a given m/z value can differ slightly between spectra, and generally this effect cannot be corrected by instrument calibration [174 ].
Spectral alignment remains a topic of active research. A simple alignment approach is presented by Pham and Jimenez [172 ], in which peaks in individual spectra are matched against the closest peak in a mean spectrum. He and coworkers [191 ] compared several algorithms in use for mass spectrometry and showed that their correlation-based algorithm performed better on a SELDI-MS data set. The alignment method described by He and coworkers [191 ] is also implemented in the MALDIquant package. Bloemberg and coworkers [192 ] wrote a comprehensive tutorial explaining the working principles of several methods, with implementations in the R [193 ] and Matlab (The Mathworks Inc.) languages.
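The nearest-peak matching attributed above to Pham and Jimenez can be sketched as follows: each peak in a sample is moved to the closest reference peak (for example, one picked from the mean spectrum) if it lies within a tolerance, and is otherwise left unchanged. The tolerance and the data are hypothetical, and this is not the correlation-based method of He and coworkers.

```python
import bisect


def align_to_reference(sample_peaks, reference_mz, tolerance=0.3):
    """Snap each (mz, intensity) peak to the nearest reference m/z within tolerance."""
    ref = sorted(reference_mz)
    aligned = []
    for mz, intensity in sample_peaks:
        pos = bisect.bisect_left(ref, mz)
        candidates = ref[max(pos - 1, 0):pos + 1]   # neighbours on either side
        best = min(candidates, key=lambda r: abs(r - mz)) if candidates else None
        if best is not None and abs(best - mz) <= tolerance:
            aligned.append((best, intensity))
        else:
            aligned.append((mz, intensity))          # no reference peak close enough
    return aligned


reference = [301.0, 447.1]                 # e.g. peaks picked from a mean spectrum
sample = [(301.12, 5.0), (500.0, 1.0)]     # hypothetical shifted peak list
print(align_to_reference(sample, reference))
```

Snapping to a fixed reference handles small, roughly constant shifts; m/z-dependent drift calls for warping approaches such as those covered in the tutorial by Bloemberg and coworkers.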
Many spectral smoothing methods generate each intensity value by averaging the values within a span of data points. The degree of smoothing increases with the span size (i.e., the number of m/z data points in the segment); therefore, the span size must be chosen carefully, since a large span may lead to information loss. Smoothing is important for correctly estimating the noise and improving peak picking [187 ].
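The trade-off between span size and information loss can be seen with a centered moving average, the simplest averaging smoother. The sketch below truncates the window at the spectrum edges (an assumed convention) and uses a hypothetical single-point spike.

```python
def moving_average(intensities, half_window):
    """Centered moving average over a span of 2 * half_window + 1 points.

    Near the edges the window is truncated to the available points.
    """
    if half_window < 0:
        raise ValueError("half_window must be non-negative")
    n = len(intensities)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        window = intensities[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


spike = [0.0, 0.0, 3.0, 0.0, 0.0]
print(moving_average(spike, 0))   # span of 1 point: unchanged
print(moving_average(spike, 1))   # span of 3 points: peak height drops from 3 to 1
```

Widening the span suppresses more noise but also flattens and broadens narrow peaks, which is the information loss mentioned above.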
Several algorithms, known as peak detection or peak picking algorithms, have been proposed to identify the informative peaks that correspond to true signals. A comparative review of some of these algorithms, their principles, implementations, and performance was presented by Yang and collaborators [194 ], who concluded that continuous wavelet transform (CWT)-based algorithms provided the best performance. Peak picking is important for extracting the relevant signals from a mass spectrum and significantly reduces the dimensionality of the data set. Different implementations are available in the R package MALDIquant and in XCMS [195 ], which can be straightforwardly adapted to MALDI-MS data.
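A basic peak picker combines a local-maximum test with a signal-to-noise threshold. The sketch below is a simple local-maximum approach, not one of the CWT-based algorithms that performed best in the comparison above; it estimates noise robustly as 1.4826 × the median absolute deviation, and the window size, SNR threshold, and spectrum are hypothetical.

```python
import statistics


def mad_noise(intensities):
    """Robust noise estimate: 1.4826 * median absolute deviation."""
    med = statistics.median(intensities)
    return 1.4826 * statistics.median(abs(x - med) for x in intensities)


def pick_peaks(mz, intensities, half_window=2, snr=3.0):
    """Return (m/z, intensity) of local maxima that exceed snr * noise."""
    noise = mad_noise(intensities)
    n = len(intensities)
    peaks = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        if intensities[i] == max(intensities[lo:hi]) and intensities[i] > snr * noise:
            peaks.append((mz[i], intensities[i]))
    return peaks


# Hypothetical spectrum: alternating low-level noise with two real peaks.
intensities = [1, 2, 1, 2, 1, 2, 1, 40, 2, 1, 2, 1, 2, 1, 25, 1, 2, 1, 2, 1]
mz = [300.0 + 0.1 * i for i in range(len(intensities))]
peaks = pick_peaks(mz, intensities)
print([(round(m, 1), inten) for m, inten in peaks])
```

Flat-topped maxima would be reported once per tied point in this sketch; practical implementations add centroiding or a plateau rule.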
In repeated experiments, it is common to observe differences arising from sample preparation and instrumental variation that cannot be attributed to differences in the biological samples. Normalization transforms sample-wise mass intensities to the same scale, enabling different samples to be compared [196 ]. The impact of normalization was addressed by Ejigu and collaborators [197 ] in an LC-MS study. The authors state that normalization should be regarded as a remedial measure rather than as a way to correct all the variability introduced by different sources of bias. They also recommend data-driven normalization methods (which assume that most metabolites remain constant) over model-driven methods (based on internal standards or intermediate quality control runs). Deininger and coworkers [198 ] showed the importance of normalization in MALDI imaging experiments, where incorrect normalization could completely change the spatial distribution of a potential biomarker. According to the same authors, median and noise level normalizations were significantly more robust than TIC normalization.
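The contrast between TIC and median normalization reported by Deininger and coworkers can be illustrated with two minimal functions. In the hypothetical example below, one very intense ion (for instance, a matrix cluster) changes the total ion current and therefore rescales every other peak under TIC normalization, whereas here the median, and hence the median-normalized intensities of the other peaks, is unaffected. Noise-level normalization is not shown.

```python
import statistics


def tic_normalize(intensities):
    """Scale a spectrum so that its intensities sum to 1 (total ion current)."""
    total = sum(intensities)
    if total == 0:
        raise ValueError("cannot TIC-normalize an all-zero spectrum")
    return [x / total for x in intensities]


def median_normalize(intensities):
    """Scale a spectrum by its median intensity."""
    med = statistics.median(intensities)
    if med == 0:
        raise ValueError("median intensity is zero")
    return [x / med for x in intensities]


# Two hypothetical spectra that differ only in one very intense ion.
a = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 700.0]

print("TIC:   ", [round(x, 3) for x in tic_normalize(a)[:3]],
      [round(x, 3) for x in tic_normalize(b)[:3]])
print("median:", median_normalize(a)[:3], median_normalize(b)[:3])
```

A single dominant ion is common in MALDI spectra, which makes the robustness of median-based scaling practically relevant.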
The success of multivariate data analysis depends heavily on the correct execution of the previous analysis steps. For example, incorrect peak picking can leave out peaks from important molecules, incorrect alignment can merge peaks from different molecules, and incorrect normalization can show intensity differences caused by the experimental procedure in place of biological differences. These mistakes impact the conclusions drawn from multivariate methods.
Multivariate analyses are commonly classified into two groups: unsupervised and supervised methods. Unsupervised methods are used for exploratory data analysis and exploit the latent structure of the data without any prior assumption about sample classes. Supervised methods use both the data structure and the sample class information to estimate the model parameters [199 ].
In a large dataset, it is possible to predict the desired outcome purely by chance. Good confidence measures are available to check the consistency of such analyses. For hierarchical clustering, the most common unsupervised analysis, resampling and regrouping techniques provide a consistency check [200 ]. For classification models (supervised analysis), particular care must be taken. A very popular model in the natural products community is PLS (partial least squares), which is prone to overfitting [201 ]. To guard against overfitting, an important technique for classification models is K-fold cross validation (CV) [175 ]. Many hybrid exploration/classification approaches have been proposed for MALDI. Alexandrov and coworkers [202 ] developed a spatially aware clustering approach for MALDI imaging. Briefly, the spectra are grouped by a similarity measure that takes their spatial position into account, and then all pixels are pseudo-color coded according to their cluster assignment. Pham and Jimenez [172 ] employed a grid search with exponential spacing to find the optimal parameter values for support vector machine (SVM) model selection. The authors report a final accuracy of 83 % on a separate validation set of 78 samples (Breast Cancer Study with MALDI-MS Data).
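The following sketch shows how "prediction by chance" appears in practice and why K-fold CV exposes it. The data are hypothetical: many random features per sample and random class labels, so there is no real class structure. For brevity, a 1-nearest-neighbour classifier stands in for PLS or SVM; this substitution is an assumption and not the models used in the cited studies.

```python
import numpy as np

def one_nn_predict(X_train, y_train, X_test):
    """1-nearest-neighbour classifier using squared Euclidean distance."""
    # Distance matrix of shape (n_test, n_train).
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[np.argmin(d, axis=1)]

def kfold_accuracy(X, y, k=5, seed=0):
    """Pooled K-fold CV accuracy: each sample is predicted exactly once by a
    model trained without it."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    correct = 0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        pred = one_nn_predict(X[train_idx], y[train_idx], X[test_idx])
        correct += int((pred == y[test_idx]).sum())
    return correct / len(y)

# Hypothetical data: 60 "spectra" with 500 random intensities and random labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 500))
y = rng.integers(0, 2, size=60)

# Resubstitution accuracy is 1.0, because each spectrum is its own nearest neighbour.
train_acc = float((one_nn_predict(X, y, X) == y).mean())
# The cross-validated accuracy is expected to stay near chance level.
cv_acc = kfold_accuracy(X, y, k=5)
```

The gap between the perfect training accuracy and the cross-validated estimate is the overfitting signal. A hyperparameter search, such as an exponentially spaced grid, would compare candidate values by their CV accuracy and not by their training accuracy.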
Tong and coworkers [177 ] argue that there is a lack of guidelines for data processing and that inexperienced practitioners can add variation when handling complex models. The authors also state that the functionalities of many tools designed for SELDI-MS or LC-ESI-MS data are not transferable to MALDI-MS data analysis. Because new computational and statistical methods are constantly being developed, many scientific communities are fostering modular software whose modules can be combined to achieve different goals [203 ]. There is also increasing recognition of the importance of data standards, data storage, and data exchange [204 ], [205 ]. These good practices will ultimately improve experimental reproducibility across research groups and promote faster progress in the groups that adopt them.
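A minimal sketch of the modular idea follows. Each processing step is an interchangeable function with the same input and output type, so workflows are built by composition. The module names and the crude baseline step are hypothetical illustrations and are not taken from any particular software package.

```python
from functools import reduce
from typing import Callable

import numpy as np

Module = Callable[[np.ndarray], np.ndarray]

def make_pipeline(*modules: Module) -> Module:
    """Chain interchangeable processing modules; data flow left to right."""
    def run(spectra: np.ndarray) -> np.ndarray:
        data = np.asarray(spectra, dtype=float)
        return reduce(lambda acc, module: module(acc), modules, data)
    return run

# Hypothetical modules: each maps an (n_spectra, n_channels) array to a new one.
def subtract_min(spectra):
    """Crude baseline removal: subtract each spectrum's minimum intensity."""
    return spectra - spectra.min(axis=1, keepdims=True)

def tic_normalize(spectra):
    """Scale each spectrum so that its intensities sum to 1."""
    return spectra / spectra.sum(axis=1, keepdims=True)

def max_normalize(spectra):
    """Scale each spectrum so that its most intense channel equals 1."""
    return spectra / spectra.max(axis=1, keepdims=True)

# Swapping a single module gives a new workflow without changing the others.
workflow_tic = make_pipeline(subtract_min, tic_normalize)
workflow_max = make_pipeline(subtract_min, max_normalize)

data = np.array([[1.0, 3.0, 4.0, 6.0]])
```

Because every module shares one signature, a newly developed method can be added as another function and combined with the existing ones. None of the other modules needs to change.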
He and coworkers [185 ] hold a similar opinion regarding the difficulties experimentalists face in using software toolkits. The authors recommend flexible software platforms that enable new users to construct fit-for-purpose workflows. They cite case-based reasoning (CBR), a process of solving new problems based on the solutions of similar past problems [206 ].
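The toy sketch below shows the core CBR loop applied to workflow selection. It retrieves the most similar past case, reuses its workflow, and retains the new case. The case features, workflow labels, and distance measure are all hypothetical assumptions, and a real system would also let the user revise the proposed workflow.

```python
import math
from typing import Dict, List, Tuple

Case = Tuple[Dict[str, float], List[str]]

def distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Euclidean distance over the features both case descriptions share."""
    keys = a.keys() & b.keys()
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))

def retrieve(query: Dict[str, float], case_base: List[Case]) -> Case:
    """Retrieve step: the past case closest to the new problem."""
    return min(case_base, key=lambda case: distance(query, case[0]))

def solve(query: Dict[str, float], case_base: List[Case]) -> List[str]:
    """Reuse the retrieved workflow, then retain the new case for future queries."""
    _, workflow = retrieve(query, case_base)
    proposed = list(workflow)  # copy, so a revision would not alter the stored case
    case_base.append((query, proposed))
    return proposed

# Hypothetical case base: dataset descriptors paired with workflows that worked.
case_base: List[Case] = [
    ({"imaging": 0.0, "log10_spectra": 1.3, "max_mz_kda": 3.0},
     ["baseline", "tic_norm", "peak_picking", "hierarchical_clustering"]),
    ({"imaging": 1.0, "log10_spectra": 3.7, "max_mz_kda": 20.0},
     ["baseline", "median_norm", "peak_picking", "spatial_clustering"]),
]

# A new imaging dataset is closest to the second case, so its workflow is reused.
workflow = solve({"imaging": 1.0, "log10_spectra": 4.0, "max_mz_kda": 15.0}, case_base)
```

In practice, the choice of case features and their scaling largely determines which past case is retrieved. These features would have to be defined with domain experts.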
There is also agreement that most complex experiments should be performed by multidisciplinary groups and that data analysis in particular could benefit from the participation of statisticians, computer scientists, or, more generally, quantitative scientists. Their participation would be most effective if they were involved in all phases, from study design to the interpretation of the information extracted from experimental data [175 ], [177 ], [185 ].