Keywords congenital coagulation and platelet disorders - LR-PCR - NGS - high throughput sequencing
Hemorrhagic diathesis can be caused by disorders of primary hemostasis (such as platelet
disorders) and of secondary hemostasis (e.g. coagulation factor deficiencies). These conditions
form a heterogeneous group of bleeding disorders with clinical manifestations ranging
from mild to severe.[1] The prevalence of each individual disease in the general population is low and strongly
influenced by ethnicity and rate of consanguinity.
For many hereditary coagulation and platelet disorders, the diagnostic request focusses
on only one specific gene (e.g. F7 or F10). Identifying the causative variant in an affected patient not only
confirms the diagnosis but can also influence the therapeutic regimen. Furthermore,
targeted testing of at-risk family members becomes feasible.
Over the past two decades, Sanger sequencing has been considered the “gold standard”
for DNA sequence analysis. The method provides a precise tool for routine molecular diagnostics,
but its capacity for multiplexing and high-throughput analyses
is limited. Furthermore, it is known to be costly and time-consuming. In July 2016,
next generation sequencing (NGS)-based molecular testing became remunerable for routine
diagnostics of hereditary coagulation and platelet disorders according to the reimbursement
catalogue of Germany's statutory health insurance system (EBM).
However, while NGS-based multigene panel analyses achieve high performance and are
suitable for medium to large target regions (≥ 100 kb in settings of less well-defined
clinical phenotypes), they are not always well suited to the analysis of small genes
in which mutations are known to be associated with a specific phenotype.[2] Thus, additional strategies for targeted enrichment[3] must be considered, especially when smaller target regions are of interest
for a limited number of samples and time to diagnosis is critical.
Long-range PCR (LR-PCR) has the advantage that it does not require a customized design
by commercial vendors. In this proof-of-principle study, we aimed to establish
LR-PCR target enrichment for single gene analyses of F7, F10, F11, F12, GATA1, TUBB1, and WAS on a MiSeq platform. This approach proved to be reliable, highly flexible, and also
appropriate for even larger target regions such as MYH9.
Material and Methods
DNA Isolation and Conventional Mutation Analyses
DNA samples of all 43 study probands were extracted with written informed consent
according to the German Gene Diagnostic Act from peripheral blood lymphocytes using
standard techniques. The coding regions and adjacent exon-intron boundaries (± 50 bp) of the following genes were amplified by PCR:
F7 (Locus Reference Genomic sequence LRG_554; transcript LRG_554t1; number of probands
[n] = 6)
F10 (LRG_548, LRG_548t1; n = 5)
F11 (LRG_583, LRG_583t1; n = 3)
F12 (LRG_145, LRG_145t1; n = 5)
GATA1 (LRG_559, LRG_559t1; n = 5)
TUBB1 (LRG_581, LRG_581t1; n = 6)
WAS (LRG_125, LRG_125t1; n = 5)
MYH9 (LRG_567, LRG_567t1; n = 8)
Primer sequences are available upon request. Conventional sequencing was performed
on an ABI 3130xl sequencer using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied
Biosystems, Carlsbad, CA, USA). Sequence data were analyzed with the SeqPilot software
(Version 4.3.1, JSI Medical Systems, Ettenheim, Germany). SALSA MLPA Kits P207, P440
and P432 were used for detection of copy number variations in F7, F10, F11, and MYH9 (MRC-Holland, Amsterdam, The Netherlands).
Target Enrichment and High-Throughput Sequencing
Primers for LR-PCRs covering the entire genomic regions of F7, F10, F11, F12, GATA1, TUBB1, WAS, and MYH9 were designed with Primer3 (v4.0.0; http://bioinfo.ut.ee/primer3/; [Table S1]). PrimeSTAR® GXL DNA polymerase (Takara Bio Europe/Clontech, Saint-Germain-en-Laye,
France) was used for PCR amplification according to the manufacturer's instructions.
The upstream region of MYH9 was amplified with the GC-RICH PCR System (Hoffmann-La Roche, Basel, Switzerland).
All LR-PCR products were purified with the Agencourt® AMPure® XP system (Beckman Coulter,
Pasadena, USA) and quantified with the dsDNA HS Assay Kit on a Qubit® 2.0 (Thermo
Fisher Scientific, Waltham, USA).
Equimolar amounts of the purified PCR amplicons were combined in 14 pools containing
non-overlapping target regions of up to seven individual probands. The Nextera XT
kit was used to prepare sequencing libraries with 1 ng of each pool (Illumina®, San
Diego, USA). Individually barcoded DNA libraries were combined and sequenced on a
MiSeq instrument with 2 × 150 or 2 × 250 cycles (Reagent Nano Kit v2 or Reagent Kit
v3; Illumina®, San Diego, USA).
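Equimolar pooling means that every amplicon contributes the same number of molecules, so the mass taken from each PCR product scales with its length. The following is a minimal Python sketch of that calculation, not the workflow used in this study. It assumes the common approximation of 660 g/mol per base pair of double-stranded DNA; the function names, amplicon names, concentrations and lengths are hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation, an assumption here)


def nanomolar(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA concentration from ng/µl to nmol/l."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def equimolar_volumes(amplicons: dict, fmol_each: float) -> dict:
    """Return the volume (µl) of each amplicon that contains fmol_each femtomoles.

    amplicons maps a name to (concentration in ng/µl, length in bp).
    """
    volumes = {}
    for name, (conc, length) in amplicons.items():
        fmol_per_ul = nanomolar(conc, length)  # 1 nmol/l equals 1 fmol/µl
        volumes[name] = fmol_each / fmol_per_ul
    return volumes


# Hypothetical example: two products at the same mass concentration
example = {"amplicon_A": (20.0, 10_000), "amplicon_B": (20.0, 5_000)}
volumes = equimolar_volumes(example, fmol_each=10.0)
```

At equal mass concentration, the longer product needs proportionally more volume, which is why quantification alone is not enough and amplicon length has to enter the pooling calculation.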
Bioinformatics Analyses
The MiSeq Reporter Software was used for demultiplexing and primary data analysis
(MSR v2.5.1; Illumina®, San Diego, USA). The quality of sequencing reads was analyzed
with the FastQC toolkit (http://www.bioinformatics.babraham.ac.uk/index.html) and sequencing coverage was visualized with the Gviz package for R software (https://www.r-project.org/). The SeqNext module of SeqPilot software (JSI Medical Systems) was used for read
alignment against the human reference assembly GRCh37/hg19 and variant calling. Only
coding regions, exon-intron boundaries (± 50 bp), and known promoter regions of the genes were analyzed (diagnostic region of
interest). 30 × was defined as the minimum diagnostic sequencing depth.
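The 30 × threshold can be applied mechanically to per-base depth values of the diagnostic region of interest. The sketch below is an illustration in Python and not part of the SeqNext workflow described above; the function name and the input layout (a list of depths starting at a known coordinate) are assumptions.

```python
MIN_DEPTH = 30  # minimum diagnostic sequencing depth defined in this study


def low_coverage_intervals(depths, start, min_depth=MIN_DEPTH):
    """Return inclusive (first, last) coordinates of runs with depth < min_depth.

    depths[i] is the read depth at coordinate start + i.
    """
    intervals = []
    run_start = None
    for offset, depth in enumerate(depths):
        pos = start + offset
        if depth < min_depth:
            if run_start is None:
                run_start = pos  # a new low-coverage run begins here
        elif run_start is not None:
            intervals.append((run_start, pos - 1))  # the run ended one base earlier
            run_start = None
    if run_start is not None:
        # the run extends to the last position of the region
        intervals.append((run_start, start + len(depths) - 1))
    return intervals


# Hypothetical depths for five consecutive positions
gaps = low_coverage_intervals([100, 29, 10, 30, 5], start=1000)
```

Intervals reported this way correspond to the stretches that would need to be completed by another method.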
Results
Fast and Comprehensive Enrichment of Various Genomic Target Regions with LR-PCR
With only minor optimization of the reaction conditions, the entire genomic loci of
F7 (14.9 kb), F10 (26.7 kb), F11 (23.8 kb), F12 (14.4 kb), GATA1 (7.8 kb), TUBB1 (7.4 kb), and WAS (14.8 kb) could be successfully amplified by LR-PCR. Only one or two PCR amplicons
were necessary for a specific and highly efficient enrichment of each gene ([Fig. 1], [Table S1]). Target enrichment, purification of the PCR products, pooling, and library preparation
for 35 probands were completed in only two regular working days. Due to its large
size of 106.8 kb, the entire genomic region of the MYH9 gene was independently amplified without any PCR dropouts for another eight probands
with eight overlapping LR-PCR amplicons.
Fig. 1 Comprehensive coverage and characteristic, gene-specific patterns of sequencing depth
for the entire genomic target regions. The genomic loci of F7, F10, F11, F12, GATA1, TUBB1, WAS and MYH9 were amplified with 17 LR-PCR amplicons and separated by gel electrophoresis. The
robust and specific amplification of all amplicons is shown for a healthy control
(upper left subpanel). Representative gene-specific patterns of read depth across
the entire genomic regions are shown as coverage plots (dark blue). Horizontal bold
and broken lines indicate a read depth of 6000 × and 3000 ×, respectively. Chromosome
ideograms with the cytogenetic locations of the genes (bold green lines) are depicted
in the upper part of each subpanel. Exon-intron structures of the respective reference
transcripts are shown in the lower parts. The size of the entire genomic region is
given below each gene. kb = 1000 base pairs.
High-throughput sequencing of the pooled DNA libraries produced an average output
of 1.9 × 10^6 sequencing reads (SD: 0.3 × 10^6) and consistently high mean coverages for the specific target regions (Min: 837 ×,
Max: 3434 ×, SD: 578 ×; [Fig. 2]). Interestingly, characteristic patterns of sequencing depth were found for each
gene after read alignment ([Fig. 1]). These unique patterns were recapitulated in all probands tested for the respective
genetic loci ([Fig. S1]). The NGS data of the pooled DNA libraries were also checked for hints of PCR cross-contamination,
but significant read depths (>10 reads) were only seen in regions previously enriched
by PCR.
Fig. 2 Quality parameters of NGS analysis. The numbers of mapped (green) and unmapped reads
(red; left y-axis) are shown for all 14 DNA sequencing libraries (Pool 1–14) that
comprise amplified target regions of up to seven individual probands. Mean region
coverage depths are depicted as black rectangles (right y-axis). Libraries sequenced
with 2 × 150 cycles (1–6) or with 2 × 250 cycles (7–14) are separated by a dashed
line.
Broad Coverage of Coding Regions and Exon-Intron Boundaries
The mean coverages for the coding regions, canonical splice sites, exon-intron boundaries
(± 50 bp) and known promoter regions of the coagulation factor genes were:
F7: 1292 ×
F10: 1687 ×
F11: 2583 ×
F12: 1455 ×
The sequencing depths of the thrombocytopenia-associated genes were also sufficiently
high:
GATA1: 2175 ×
TUBB1: 2768 ×
WAS: 1928 ×
MYH9: 2420 ×
Overall, 94 % of the cumulative target region was covered with a sequencing depth
of more than 30 × (180.6 of 191.6 kb). NGS achieved complete coverage for the diagnostic
target regions of F10, F11, F12, GATA1, and TUBB1. Only a single exonic gap in exon 10 of the WAS gene (coverage < 30 ×; LRG_125t1: c.932_1338) was consistently seen in all five probands
sequenced for this genomic locus ([Fig. 1]). A minor sequencing problem with a mean coverage of 46 × was also found in exon
36 of MYH9 (LRG_567t1: c.5062_5150). Both regions have a high GC-content of nearly 70 %. A complete
sequencing dropout was observed for only one of the 107 LR-PCR products sequenced
in this study (dropout rate = 0.9 %). This single PCR amplicon was included and completely
covered in a second sequencing run.
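The two summary figures in this paragraph follow directly from the counts given in the text. As a quick arithmetic check in Python (variable names are illustrative):

```python
# Values taken from the text above
covered_kb, target_kb = 180.6, 191.6      # target region covered at more than 30 x, total target region
dropouts, sequenced_amplicons = 1, 107    # complete sequencing dropouts, LR-PCR products sequenced

covered_pct = 100 * covered_kb / target_kb
dropout_pct = 100 * dropouts / sequenced_amplicons

print(f"covered at >30 x: {covered_pct:.1f} %")   # prints 94.3 %
print(f"dropout rate:     {dropout_pct:.1f} %")   # prints 0.9 %
```

The unrounded coverage fraction is about 94.3 %, which the text reports as 94 %.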
Reliable, Sensitive and Specific Mutation Detection by NGS
The main purpose of this study was to establish a novel NGS-based screen for missense,
nonsense, splice and small frameshift mutations. For an evaluation of the analytic
sensitivity and specificity of this approach, DNA samples of 43 probands previously
screened with conventional Sanger sequencing were re-analyzed with LR-PCR and NGS.
All 25 known pathogenic or likely pathogenic variants ([Table 1]) and 128 polymorphisms in the eight analyzed genes were re-identified.
Table 1 Distinct pathogenic and likely pathogenic variants included in this study. Variants
identified in more than one proband are only listed once.

| Gene | Nucleotide Change | Protein Change | Reference |
|---|---|---|---|
| F7 | c.911C>T | p.(Ala304Val) | Tamary et al. 1996[18] |
| F7 | c.1061C>T | p.(Ala354Val) | Bernardi et al. 1994[19] |
| F7 | c.1391delC | p.(Pro464Hisfs*32) | Arbini et al. 1994[20] |
| F7 | c.64+430_131-6delinsTCGTAA | p.? | Rath et al. 2015[21] |
| F10 | c.413A>T | p.(Gln138Leu) | Rath et al. 2015[21] |
| F10 | c.979C>T | p.(Arg327Trp) | Millar et al. 2000[22] |
| F10 | c.1097G>A | p.(Arg366His) | Rath et al. 2015[21] |
| F10 | c.1159C>T | p.(Arg387Cys) | Hermann et al. 2006[23] |
| F10 | c.1247A>C | p.(Gln416Pro) | unpublished |
| F10 | deletion of exon 6 [c.(502+1_503-1)_(747+1_748-1)] | p.? | Hainmann et al. 2009[24] |
| F11 | c.644_649delTCGACA | p.(Ile215_Asp216del) | Zadra et al. 2004[25] |
| F11 | c.803G>A | p.(Arg268His) | Duncan et al. 2008[26] |
| F12 | c.-57G>C | p.? | Hofferbert et al. 1996[27] |
| F12 | c.-62C>T | p.? | Lombardi et al. 2008[28] |
| F12 | c.1381G>A | p.(Asp461Asn) | Schloesser et al. 1997[29] |
| F12 | c.1668delC | p.(Asp557Metfs*107) | unpublished |
| F12 | c.1681-1G>A | p.? | Schloesser et al. 1995[30] |
| GATA1 | c.622G>A | p.(Gly208Arg) | Del Vecchio et al. 2005[31] |
| WAS | c.101delG | p.(Arg34Hisfs*11) | unpublished |
| WAS | c.256C>T | p.(Arg86Cys) | Kolluri et al. 1995[32] |
As all previously known variants were re-identified and no false positives were detected in the regions of interest, the analytical sensitivity
and specificity of this NGS approach were both calculated as 100 %.
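The calculation behind these figures uses the standard definitions of analytical sensitivity and specificity. In the Python sketch below, the count of 153 re-identified variants (25 pathogenic or likely pathogenic variants plus 128 polymorphisms) comes from the text. The number of true-negative positions is not reported in this paper, so the value used for it is a purely hypothetical placeholder.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of known variants that were detected."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of variant-free positions that were correctly called as reference."""
    return true_neg / (true_neg + false_pos)


known_variants = 25 + 128   # variants previously found by Sanger sequencing (from the text)
missed_variants = 0         # all were re-identified

sens = sensitivity(known_variants, missed_variants)

# HYPOTHETICAL placeholder: the number of true-negative positions is not given in the source.
# With zero false positives, the result is 1.0 for any positive count.
spec = specificity(true_neg=1000, false_pos=0)
```

With no missed variants and no false positives, both ratios equal 1.0, i.e. 100 %, regardless of the exact number of reference positions evaluated.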
Interestingly, two large deletions previously found using multiplex ligation-dependent
probe amplification (MLPA) could be re-identified by analyses of the gene-specific
coverage patterns. A homozygous deletion spanning exon 2 of the F7 gene (proband P5; [Fig. S1]) and a heterozygous deletion of exon 6 of the F10 gene (P7; [Fig. 3]) were consistently detected. NGS sequence analysis gave a rather precise estimation
of the respective size of the F7 [4.35 kb; c.64+430_131-6delinsTCGTAA] and F10 [approximately 4.8 kb; c.(502+1_503-1)_(747+1_748-1)] deletions but could not
fine-map the breakpoints down to single nucleotide level.
Fig. 3 Identification of a known exon-spanning heterozygous deletion in F10. Analysis of the F10 coverage pattern of proband P7 (A) compared with the characteristic control pattern
(B) revealed a heterozygous deletion of approximately 4.8 kb (indicated by
red broken lines) which includes exon 6 of F10 and part of the flanking introns. The exon-intron structure of the reference transcript
is shown below. The result of the F10 MLPA analysis for P7 also shows the heterozygous deletion of exon 6 (red arrow).
Green bars indicate normalized gene dosages of P7 compared with three healthy controls
(grey bars). F10-specific probes are depicted on the left, reference probes on the right.
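The comparison of a proband's coverage pattern with the characteristic control pattern can be illustrated with a simple depth-ratio calculation. The Python sketch below is not the analysis performed in this study, which relied on inspection of coverage plots. The windowing, the normalization to the median ratio, the thresholds and all depth values are hypothetical, and control depths are assumed to be greater than zero.

```python
from statistics import median


def dosage_ratios(sample, control):
    """Per-window sample/control depth ratio, scaled so that the median window equals 1.0.

    Scaling by the median ratio corrects for different overall sequencing depths.
    Assumes every control depth is greater than zero.
    """
    raw = [s / c for s, c in zip(sample, control)]
    reference = median(raw)
    return [r / reference for r in raw]


def call_windows(ratios, het_max=0.7, hom_max=0.1):
    """Classify windows by relative dosage (thresholds are illustrative assumptions)."""
    calls = []
    for ratio in ratios:
        if ratio <= hom_max:
            calls.append("hom_del")   # almost no reads: both alleles deleted
        elif ratio <= het_max:
            calls.append("het_del")   # about half the expected depth: one allele deleted
        else:
            calls.append("normal")
    return calls


# Hypothetical mean depths for six consecutive windows of one amplicon
control = [2000, 3000, 1000, 2500, 1500, 2000]
sample = [1000, 1500, 250, 625, 750, 1000]   # sequenced at half depth, windows 3 and 4 reduced

ratios = dosage_ratios(sample, control)
calls = call_windows(ratios)
```

Because each gene shows a reproducible depth pattern, dividing by the control pattern removes the gene-specific peaks and troughs and leaves only the dosage change.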
Discussion
The implementation of NGS in most laboratories has already changed standard workflows.
Nevertheless, NGS can also be challenging for diagnostic laboratories. Its high sequencing
capacity often implies the economic need for parallel analyses of large target regions
or a higher number of patients. These requirements can usually be best met with capture-based
NGS multigene panels. Hybridization probes minimize the risk of allele dropouts, can
be easily combined and are perfectly suitable for multiplexing. Technically, all genes
tested in this study could also have been covered with comprehensive capture-based
panels. However, if a diagnosis can most likely be confirmed by analysis of a single
small disease gene such as F7 or F10, additional focussed strategies are more cost-effective.
As an alternative to time-consuming conventional Sanger sequencing, LR-PCR combined
with NGS is advantageous in this context since PCR primers are less expensive than
hybridization probes and PCR reactions can be realized with standard laboratory skills
and equipment. Additionally, there is only a limited need for PCR optimization,[4] and the workflow contains several checkpoints which allow repetition of a
failed reaction without extensive additional costs. Targeted single gene analyses
by LR-PCR also reduce the risk of incidental findings and have higher enrichment specificities
for targets with pseudogenes or repetitive regions compared to capture-based multigene
panels. Analogous strategies for the enrichment of en bloc genomic target regions
have been described not only for BRCA1- and BRCA2-associated hereditary cancer predisposition syndromes and HLA genotyping[5][6][7] but also for molecular analysis in hemophilia A.[8][9]
Our study further expands the spectrum of genes that can be efficiently analyzed by
LR-PCR amplification and NGS in a diagnostic context.
With an analytic sensitivity and specificity of 100 %, our approach proved to be highly
reliable. Furthermore, it is also quite flexible as different target regions can be
individually combined. Sequencing gaps or regions with a relatively low read depth
that were observed in this study in WAS and MYH9 are mainly restricted to regions with a high GC-content. Such regions are known
to cause problems in NGS.[10] Currently, they still need to be completed by Sanger sequencing.
Obviously, the specific assay for the eight genes studied here might not be suitable
for all diagnostic laboratories. However, it can be individually adapted and serve
as a valuable addition rather than as an alternative to existing NGS multigene panels.
Even combined sequencing with DNA samples enriched with other panels is feasible and
further decreases sequencing costs.
Of course, PCR-based enrichments are always hampered by the risk of allele dropout
due to sequence variants in primer binding sites or large deletions that can result
in a selective amplification of a single allele.[11][12] For example, we and others have noticed difficulties in conventional DNA sequencing
of exon 2 of MYH9 with a biased amplification due to a poly-guanine repeat.[13] Thus, several optimization steps may be necessary for each PCR to find the best
primer binding sites. Nonetheless, one has to keep in mind that allele dropout can
never be completely excluded and is hard to detect by Sanger sequencing because heterozygous
SNPs are often missing in amplified regions due to their small size. LR-PCR significantly
increases the chance of finding a heterozygous SNP in the genomic target region that
can be used to exclude an allele dropout. Therefore, a screen for heterozygous positions
has to be an integral part of data analysis if PCR-based enrichment strategies are
chosen.
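Such a screen for heterozygous positions can be sketched in a few lines of Python. The code below is an illustration only. The input layout (position, variant-supporting reads, total reads per amplicon), the allele-fraction window of 0.3 to 0.7 and the example values are assumptions and do not describe the SeqNext settings used in this study.

```python
def heterozygous_positions(variants, min_depth=30, low=0.3, high=0.7):
    """Return positions whose variant allele fraction suggests a heterozygous genotype.

    variants is an iterable of (position, alt_reads, total_reads).
    Positions below min_depth are skipped, which also avoids division by zero.
    """
    return [
        pos
        for pos, alt_reads, total_reads in variants
        if total_reads >= min_depth and low <= alt_reads / total_reads <= high
    ]


def both_alleles_seen(variants_by_amplicon):
    """Map each amplicon to True if at least one heterozygous position was found."""
    return {
        amplicon: len(heterozygous_positions(variants)) > 0
        for amplicon, variants in variants_by_amplicon.items()
    }


# Hypothetical variant calls for two amplicons
calls = {
    "amplicon_1": [(100, 480, 1000), (250, 1000, 1000)],  # one heterozygous, one homozygous call
    "amplicon_2": [(900, 995, 1000)],                     # only a homozygous call
}
result = both_alleles_seen(calls)
```

An amplicon without any heterozygous position is not proof of allele dropout, but it marks the product for which dropout cannot be excluded from the sequencing data alone. For X-linked genes in male patients no heterozygous positions are expected, so the check is not informative there.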
Multiplex ligation-dependent probe amplification (MLPA) analysis is still the method
of choice for copy number detection in routine diagnostics. This technique requires
high-quality DNA of patients as well as controls and a separate laboratory workflow.[14]
The LR-PCR approach described here can replace the standard method to speed up the
overall diagnostic process, in particular in male patients with X-linked diseases such
as GATA1- and WAS-associated disorders.
A hemizygous intragenic deletion can easily be detected with NGS, especially in
small genes that are covered with only one LR-PCR amplicon. Even breakpoint mapping
to the approximate location within the amplicon is possible in male patients. In female
carriers of X-linked diseases and in autosomal disorders, a heterozygous intragenic deletion
can also be detected due to the unique gene-specific NGS pattern ([Fig. S1]). In these cases, LR-PCR might assess the size of a deletion more accurately than
MLPA.
One further advantage of a LR-PCR strategy covering the entire genomic region of a
specific gene of interest is the possibility to directly search for deep-intronic
mutations in patients that had remained mutation-negative after sequencing of the
coding regions and invariant splice sites of the respective disease-associated gene.
However, this procedure is not remunerable in a diagnostic setting according to the
reimbursement catalogue of Germany's statutory health insurance system (EBM) and can
therefore be performed only in a research context.
International research collaborations such as the ThromboGenomics Consortium or the
BRIDGE-BPD study have been formed to unravel the genetic basis in probands with congenital
bleeding and platelet disorders. They apply comprehensive NGS multigene panels which
work well for rather clear phenotypes.[15][16] For cases without a clear phenotypic lead, whole exome or genome sequencing might
be more successful.[17] However, this requires a careful clinical characterization of family members, their
participation after informed consent including the management of incidental findings
and a well-established bioinformatics pipeline.
Conclusion
Taken together, our LR-PCR approach combined with NGS resulted in a success rate of
100 % for the re-identification of variants in eight different exemplary genes involved
in congenital coagulation and platelet disorders. In addition to multigene panels
which are particularly useful for disorders with genetic heterogeneity, it appears
to be a practical alternative method to Sanger sequencing for molecular diagnostics
of any small gene to confirm a specific clinical diagnosis.
What is known about this topic?
Molecular testing of genes involved in congenital coagulation and platelet disorders
has been routinely done by Sanger sequencing.
NGS-based molecular testing became remunerable for routine diagnostics of hereditary
coagulation and platelet disorders according to the reimbursement catalogue of Germany's
statutory health insurance system (EBM).
What does this paper add?
Here, we present a new method to transfer mutation analyses of genes involved in congenital
coagulation and platelet disorders from Sanger to NGS-based sequencing.
The LR-PCR strategy covering the entire genomic region of a specific gene of interest
opens up the possibility to directly search for deep-intronic mutations in patients
who had remained mutation-negative after sequencing of the coding regions.