Abstract
To gain insight into the transcriptome of the widely used plant model system Physcomitrella patens, several EST sequencing projects have been undertaken. We have clustered, assembled, and annotated all publicly available EST and CDS sequences in order to represent the transcriptome of this non-seed plant. Here, we present our fully annotated knowledge resource for the Physcomitrella patens transcriptome, which integrates annotation from the production process of the clustered sequences with annotation from a high-quality pipeline developed during this study. Each transcript is represented as an entity containing full annotations and GO term associations. The whole production, filtering, clustering, and annotation process has been modelled and results in seven datasets, each representing the annotated Physcomitrella transcriptome from a different perspective. We were able to annotate 63.4 % of the 26 123 virtual transcripts. The transcript archetype, as covered by our clustered data, is compared to a compilation based on all available Physcomitrella full-length CDS. The distribution of the gene ontology annotations (GOA) for the virtual transcriptome of Physcomitrella patens shows that the ratios of the core molecular functions are consistent with those of the other plant GOA. However, the metabolism subcategory is over-represented in bryophytes as compared to seed plants. This observation can be taken as an indicator of the wealth of alternative metabolic pathways in mosses in comparison to spermatophytes. All resources presented in this study have been made available to the scientific community through a suite of user-friendly web interfaces via www.cosmoss.org and form the basis for assembly and annotation of the moss genome, which will be sequenced in 2005.
Key words
Physcomitrella patens - moss - transcriptome - annotation - gene ontology
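The abstract describes each transcript as an entity carrying annotations and GO term associations, and reports that 63.4 % of the 26 123 virtual transcripts could be annotated. The following is a minimal sketch of how such a record and the coverage figure might be represented. The class name, field names, and helper functions are hypothetical and are not taken from the resource itself. The GO accession is used purely as an example value.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VirtualTranscript:
    """Hypothetical record for one clustered transcript (illustrative fields only)."""
    transcript_id: str
    description: str = ""  # free-text annotation; empty if none was assigned
    go_terms: List[str] = field(default_factory=list)  # GO accessions, e.g. "GO:0008152"

    @property
    def is_annotated(self) -> bool:
        # Assumption: a transcript counts as annotated if it has any
        # description or at least one GO term association.
        return bool(self.description or self.go_terms)


def annotated_fraction(transcripts: List[VirtualTranscript]) -> float:
    """Share of transcripts that carry at least one annotation."""
    if not transcripts:
        return 0.0
    return sum(t.is_annotated for t in transcripts) / len(transcripts)


def approx_annotated_count(total: int, percent: float) -> int:
    """Approximate absolute count implied by a rounded percentage."""
    return round(total * percent / 100)


# Figures stated in the abstract: 63.4 % of 26 123 virtual transcripts.
print(approx_annotated_count(26123, 63.4))
```

Because the published percentage is rounded, the count obtained this way is only an approximation of the number of annotated virtual transcripts.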
S. A. Rensing
Plant Biotechnology, Faculty of Biology, University of Freiburg
Schänzlestraße 1
79104 Freiburg
Germany
Email: stefan.rensing@biologie.uni-freiburg.de
Editor: H. Rennenberg