PlantSPEADMonocot

Help

Overview

We developed PlantSPEAD to display genomic and proteomic annotations for all splicing-related proteins (SRPs) in rice (362 SRPs) and Arabidopsis (436 SRPs). The genomic annotations cover gene name, chromosomal location, strand direction, nucleotide sequences (gDNA, cDNA and CDS) and gene ontologies. The proteomic annotations cover the length and molecular weight of the peptide chain, domain and family, description and function of the splicing protein, taxonomic lineage, protein evidence and domain analysis, peptide sequence and structural data. Genomic annotations refer to the sequence database GenBank, and proteomic annotations refer to four protein databases: STRING, RefSeq, TrEMBL and Swiss-Prot. Furthermore, we present the gene expression of each SRP under different abiotic stress conditions, including drought, cold, heat, salt and wounding. In short, PlantSPEAD describes both the annotations and the expression of SRPs. Information on splicing-related proteins is complex, and an orderly arrangement of the data in the form of a database makes it manageable. The accurate and detailed annotation data retrieved through our database also identify more SRPs in addition to those currently described.

Experimental evidence shows that SRP mutants modulate the alternative splicing process. Certain biotic and abiotic stress responses in plants arise from functionally different proteins translated from alternative splice variants. Furthermore, some small molecules and drugs are ineffective in plants, because novel isoforms of their targets may not react with them effectively. Therefore, plant stress resistance improvement programmes that focus on plant genome engineering and drug design (in rice and other related model crops) can benefit from the content of PlantSPEAD. PlantSPEAD lets the user identify, classify and compare the SRPs and their annotations in the model monocot rice and the model dicot Arabidopsis. This comprehensive and novel database helps to close knowledge gaps regarding SRPs in plants while providing up-to-date and accurate information.

Background

Precursor messenger RNA (pre-mRNA) splicing was first recognized in the 1970s, from the observation that RNA in the cell nucleus is longer than RNA in the cytoplasm in vertebrates. This means that some stretches of nucleotides are removed before the RNA moves to the cytoplasm. The missing stretches are introns, which do not encode protein. However, nuclear and cytoplasmic RNA share the same termini: a cap structure at the 5′ end and a polyadenosine [poly(A)] tract at the 3′ end. It was therefore concluded that introns are removed from the middle. Since this discovery, many studies have revealed a great deal about splicing.

Pre-mRNA splicing is a crucial post-transcriptional step because it excises the introns and then ligates the exons together. Introns are the non-coding regions of a gene, while exons are the coding regions that are finally expressed as protein. Furthermore, alternative pre-mRNA splicing (AS) is a major source of protein diversity, as it enables the expression of multiple proteins from a single gene. The spliceosome is the cellular machinery in which pre-mRNA splicing takes place, controlled by various splicing-related proteins that are either core or regulatory splicing factors. Aberrant splicing, however, is linked to biotic and abiotic stress responses in plants, and some splice variants contribute to plant immunity.

Pre-mRNA splicing is accomplished by two consecutive trans-esterification reactions (Fig. 1), in which five different snRNPs and other splicing-related proteins play crucial roles. In the first trans-esterification, the 2′ hydroxyl of an adenosine residue at the branch point of the pre-mRNA attacks the phosphodiester bond at the 5′ splice site in a nucleophilic substitution. This yields two intermediates: the free 5′ exon, and a lariat intermediate containing the intron and the 3′ exon, in which the 5′ end of the intron is joined to the branch-point adenosine. In the second trans-esterification, the 3′ hydroxyl of the 5′ exon attacks the phosphodiester bond at the 3′ splice site, which ligates the two flanking exons and releases the intron as a lariat. The spliced mRNA is then transferred to the cytoplasm and translated, so that different splice variants give rise to different proteins.

Fig. 1. Mechanism of pre-mRNA splicing
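
The net outcome of the two trans-esterification steps can be illustrated with a toy Python sketch. It does not model the chemistry: the function name `splice`, the example sequence and the intron coordinates are all hypothetical, chosen only to show that removing an intron and joining the flanking exons yields a shorter mature transcript.

```python
def splice(pre_mrna, introns):
    """Remove introns (0-based, end-exclusive intervals) and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        if start < position or end < start or end > len(pre_mrna):
            raise ValueError("introns must be ordered, non-overlapping and inside the transcript")
        exons.append(pre_mrna[position:start])  # exon upstream of this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # final exon
    return "".join(exons)


# Hypothetical pre-mRNA: exon 1 + intron (starts with GU, ends with AG) + exon 2
exon1, intron, exon2 = "AUGGCC", "GUAAGUCUAACAG", "UUCUAA"
pre_mrna = exon1 + intron + exon2
mature = splice(pre_mrna, [(len(exon1), len(exon1) + len(intron))])
print(mature)  # AUGGCCUUCUAA
```

Passing different intron lists for the same pre-mRNA gives different mature transcripts, which is the essence of alternative splicing.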

Workflow

To construct PlantSPEAD, splicing related genes were collected from the ASRG and SRGD databases, research data and SMRT sequencing. The basic genomic and proteomic information for the splicing genes was then mined from a number of well-established databases, including Phytozome, TAIR10, ThaleMine, MSU, PlantGDB and UniProtKB. Functional data were gathered from TAIR10, MSU and UniProtKB. Gene ontology information was collected from UniProtKB. Structural data were analyzed through the Protein Data Bank (PDB). Protein evidence and domain analysis were carried out using the InterPro database. Genomic annotations refer to the sequence database GenBank, and proteomic annotations refer to four protein databases: STRING, RefSeq, TrEMBL and Swiss-Prot. Gene expression data are presented based on the BAR eFP Browser, while gene models were taken from UCSC (Fig. 2). After processing, all the collected data were stored in MySQL. The database construction, data interaction and web page design were then implemented using PHP scripts.
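
As an illustration of the storage step, the following minimal sketch shows how gene and transcript records might be arranged relationally. It is an assumption throughout: the table names, column names and record identifiers are hypothetical and do not reflect the actual PlantSPEAD schema, and Python's built-in `sqlite3` module stands in for the MySQL server and PHP scripts used by the database.

```python
import sqlite3

# Hypothetical two-table layout; NOT the real PlantSPEAD schema.
SCHEMA = """
CREATE TABLE gene (
    gene_id    TEXT PRIMARY KEY,
    species    TEXT NOT NULL,
    srp_group  TEXT NOT NULL,
    chromosome TEXT
);
CREATE TABLE transcript (
    transcript_id TEXT PRIMARY KEY,
    gene_id       TEXT NOT NULL REFERENCES gene(gene_id),
    peptide_len   INTEGER
);
"""

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a MySQL server
conn.executescript(SCHEMA)

# Placeholder records with made-up identifiers
conn.execute("INSERT INTO gene VALUES (?, ?, ?, ?)",
             ("GENE_0001", "Oryza sativa", "snRNP", "1"))
conn.executemany("INSERT INTO transcript VALUES (?, ?, ?)",
                 [("GENE_0001.1", "GENE_0001", 120),
                  ("GENE_0001.2", "GENE_0001", 98)])

# Count the isoforms stored for each gene
rows = conn.execute(
    "SELECT g.gene_id, COUNT(t.transcript_id) "
    "FROM gene g JOIN transcript t ON t.gene_id = g.gene_id "
    "GROUP BY g.gene_id"
).fetchall()
print(rows)  # [('GENE_0001', 2)]
```

Keeping transcripts in their own table lets one gene carry any number of isoforms without repeating the gene-level annotation.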

The currently reported collection of 362 SRPs, with their 611 isoforms, in Oryza sativa ssp. japonica can be categorized into five groups: small nuclear ribonucleoproteins (snRNPs; 76 members), splicing factors (SF; 108 members), splicing regulatory proteins (RP; 42 members), novel spliceosome proteins (NS; 72 members) and possible splicing related proteins (PS; 64 members). In addition, the number of isoforms per splicing gene in rice was reportedly increased from 2 to 28 by using the third-generation sequencing platform PacBio RS II. The model plant Arabidopsis possesses a collection of 436 SRPs, with their 948 isoforms identified to date, which can be divided into 99 snRNPs, 115 SF, 60 RP, 84 NS and 78 PS. Information of this complexity can be managed effectively by an orderly arrangement of the data in the form of a database, PlantSPEAD.
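
The group sizes quoted above can be checked against the stated totals with a few lines of Python. The counts are taken from the text; the dictionary layout and variable names are only illustrative.

```python
# Group sizes as reported in the text above
rice = {"snRNP": 76, "SF": 108, "RP": 42, "NS": 72, "PS": 64}
arabidopsis = {"snRNP": 99, "SF": 115, "RP": 60, "NS": 84, "PS": 78}

# (species, group sizes, reported number of isoforms)
datasets = (("rice", rice, 611), ("Arabidopsis", arabidopsis, 948))

for name, groups, isoforms in datasets:
    total = sum(groups.values())
    average = isoforms / total
    print(f"{name}: {total} SRPs, {average:.2f} isoforms per SRP on average")
```

The sums come to 362 for rice and 436 for Arabidopsis, in agreement with the totals given in the text.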

Fig. 2. Flowchart of the PlantSPEAD development

Using PlantSPEAD

The user can navigate between the different pages of PlantSPEAD: home, search, browse, download, help, links, and citation & contact. The layout and function of each page are described below.

Index

1. Home PlantSPEAD
2. Search PlantSPEAD
3. Browse PlantSPEAD
4. Download PlantSPEAD
5. Links PlantSPEAD

1. Home PlantSPEAD

PlantSPEAD's home page (Fig. 3) gives a quick overview of PlantSPEAD and its content. The user can search for SRPs by monocot or dicot.

Fig. 3. Home page

2. Search PlantSPEAD

The user can search the information through three approaches: quick search (Fig. 4), advanced search (Fig. 5) and BLAST search (Fig. 6). In the quick search, the user can view the SRPs by classification for each species. The groups and subgroups of splicing related genes are shown together with the number of genes/transcripts in each group. Clicking on a particular group of splicing proteins opens a new page that lists all the SRPs in that group. In the advanced search, gene id, transcript id, gene name, family and domain, group of the splicing related protein, chromosomal location, species specificity, monocot/dicot specificity, GO term, protein domain database accession and other fields can be used to retrieve further information. In the BLAST search, the blastp (protein vs protein) program compares your peptide sequence with the SRP sequences in our database to reveal functional and evolutionary relationships.

Fig. 4. Quick search

Fig. 5. Advanced search

Fig. 6. Blast search
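
The idea behind the advanced search, combining several fields to narrow down the SRPs, can be sketched in Python as follows. This is only an assumption about the logic, not the site's implementation: the function `advanced_search`, the field names and the placeholder records are all hypothetical.

```python
# Placeholder records with made-up identifiers and hypothetical field names
records = [
    {"gene_id": "GENE_0001", "species": "rice", "clade": "monocot",
     "group": "snRNP", "chromosome": "1"},
    {"gene_id": "GENE_0002", "species": "rice", "clade": "monocot",
     "group": "SF", "chromosome": "3"},
    {"gene_id": "GENE_0003", "species": "Arabidopsis", "clade": "dicot",
     "group": "SF", "chromosome": "5"},
]


def advanced_search(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        record for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]


hits = advanced_search(records, clade="monocot", group="SF")
print([record["gene_id"] for record in hits])  # ['GENE_0002']
```

Each additional criterion narrows the result set, while calling the function with no criteria returns every record.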

3. Browse PlantSPEAD

Clicking on a gene id or transcript id opens a new page with several subcategories: basic information, functional description, gene ontology, annotation nucleotide, annotation protein, sequence and structural information, protein evidence and domain analysis, and link-out. Under basic information, the user can access gene id, transcript id, splicing protein group, gene name, gene properties, organism, taxonomic id, taxonomic lineage, family/domains, protein properties, protein description, remarks (if any) and reference. Protein functions are given with respect to different databases, including UniProt (for both rice and Arabidopsis), the Rice Genome Annotation Project/MSU (rice) and TAIR (Arabidopsis).

Genomic annotations refer to the sequence database GenBank, and proteomic annotations refer to four protein databases: STRING, RefSeq, TrEMBL and Swiss-Prot. In addition to the nucleotide and protein annotations, the gene ontology (GO) of each gene is described by its GO term, category and description. The category of a GO term can be cellular component (C), molecular function (F) or biological process (P). All sequences of a given gene are shown, including genomic DNA (gDNA), complementary DNA (cDNA), coding DNA sequence (CDS) and peptide sequence. A large portion of PlantSPEAD is devoted to protein evidence and domain analysis based on different protein domain databases, such as Pfam, SUPERFAMILY, Gene3D, SMART, PANTHER, CDD, PROSITE and PRINTS. For each database, it shows the entry ID, the start and end coordinates on the peptide sequence, the statistical significance, the InterPro ID and the protein description for the best match of a given splicing related protein. If structures are available for a splicing related protein, the structural data are presented according to the Protein Data Bank (PDB) (Fig. 7). Furthermore, we present the gene expression of each SRP under different abiotic stress conditions, including cold, osmotic, salt, drought, genotoxic, oxidative, UV-B, wounding and heat, in shoot (Fig. 8) and root (Fig. 9).

Fig. 7. Browse page

Fig. 8. Browse page (gene expression in shoot)

Fig. 9. Browse page (gene expression in root)
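
The one-letter GO category codes described above (C, F and P) can be expanded and used to group annotations, as in the short Python sketch below. The three GO terms are real terms chosen for illustration and are not taken from a particular PlantSPEAD record; the function name `group_by_category` is hypothetical.

```python
from collections import defaultdict

# One-letter codes for the three GO categories
GO_CATEGORIES = {
    "C": "cellular component",
    "F": "molecular function",
    "P": "biological process",
}

# Illustrative annotations: (GO term, category code, description)
annotations = [
    ("GO:0005681", "C", "spliceosomal complex"),
    ("GO:0003723", "F", "RNA binding"),
    ("GO:0008380", "P", "RNA splicing"),
]


def group_by_category(annotations):
    """Group (term, description) pairs under the full category name."""
    grouped = defaultdict(list)
    for term, code, description in annotations:
        grouped[GO_CATEGORIES[code]].append((term, description))
    return dict(grouped)


grouped = group_by_category(annotations)
for category, terms in grouped.items():
    print(category, terms)
```

An unknown category code raises a `KeyError`, which makes malformed annotation rows easy to spot.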

4. Download PlantSPEAD

The user can download small sets of data or bulk data, including splicing protein group, gene properties, protein properties, description of the protein, function of the protein, gene ontology, nucleotide annotation, protein annotation, sequence data and protein domain analysis, for monocot or dicot (Fig. 10).

Fig. 10. Download page

5. Links PlantSPEAD

Here we list links to all the related databases that were used to create our database (Fig. 11). The user can access further information on splicing genes through those websites and databases.

Fig. 11. Links page

Notice

The starting points of PlantSPEAD are ASRG (Arabidopsis Splicing Related Genes), compiled by Wang & Brendel in 2004, and SRGD (Splicing Related Gene Database), compiled by Chen & Brendel in 2011. The splicing related genes presented in PlantSPEAD are classified into five groups, following the earlier work of Wang & Brendel: small nuclear ribonucleoproteins (snRNPs), splicing factors, splicing regulatory proteins, novel spliceosome proteins and possible splicing related proteins. In addition, PlantSPEAD adds a new subgroup, novel splicing-related proteins, under the last group (possible splicing related proteins); this subgroup comprises a further sixteen minor groups.

1.0 Small nuclear ribonucleoproteins (snRNPs)

The most complex RNP machine responsible for pre-mRNA splicing is the spliceosome, which is located in the cell nucleus. In the spliceosome, RNA and protein combine to form ribonucleoproteins (RNPs) called small nuclear ribonucleoproteins (snRNPs). The major spliceosome contains five uridine-rich snRNPs (U1, U2, U4, U5 and U6), corresponding to five snRNAs. Five snRNP particles (U1 snRNP, U2 snRNP, U5 snRNP, U4/U6 snRNP and the U4/U6.U5 tri-snRNP) are involved in the construction of the major spliceosome. All U snRNPs except the U6 snRNP contain seven core proteins that carry the Sm domain. The U6 snRNP instead possesses seven Sm-like (LSM) core proteins. In addition, the minor spliceosome is composed of other snRNPs, including U11 and U12. PlantSPEAD represents seven subgroups of snRNPs: Sm core proteins, U1 snRNP specific proteins, 17S U2 snRNP specific proteins, U5 snRNP specific proteins, U4/U6 snRNP specific proteins, tri-snRNP specific proteins (U4/U6.U5) and 18S U11/U12 snRNP specific proteins.

2.0 Splicing factors

This group is comprehensive and well studied, and consists of eight subgroups: splice site selection, SR protein, 17S U2 associated proteins, 35S U5 associated proteins, proteins specific for BDU1 complex, exon junction complex (EJC) proteins, second step splicing factors and other known splicing factors. Some splicing factors are specific to particular species. Among the splicing factors, SR proteins, which contain a serine- and arginine-rich domain, are crucial: they are fundamental to constitutive and alternative pre-mRNA splicing. In addition to SR proteins, RNA recognition motif (RRM), U2AF and DnaK proteins are also notable members of this group.

3.0 Splicing regulatory proteins

Splicing regulatory proteins act on splicing factors and can thereby modify their function. Furthermore, regulators compete with splicing factors for their binding sites. The major splicing regulators are heterogeneous nuclear ribonucleoproteins (hnRNPs) and SR protein kinases. Splicing regulatory proteins have five subgroups: SR protein kinase, glycine-rich RNA binding protein, hnRNP A/B family, other hnRNP proteins (with animal homologs) and other plant hnRNPs. Most of them contain an RRM domain.

4.0 Novel spliceosome proteins

This is the fourth group of splicing related proteins, and it has five subgroups: proteins involved in other processes, poly A binding protein, DEAD/H box helicase, cis-trans prolyl isomerases and related to spliceosome. The proteins in this group contain diverse domains, including WD, DEAD-box ATP-dependent RNA helicase, peptidylprolyl isomerase, cyclin-dependent kinase, RRM and KH domains.

5.0 Possible splicing related proteins

This is the last group of splicing related proteins, and it is composed of five major subgroups: splice factor like, nucleotide binding proteins, possible multiple function proteins, glycine rich proteins and novel splicing related proteins. The last subgroup, novel splicing related proteins, is further divided into sixteen minor groups. The genes within this subgroup were collected from the paper by Chen & Brendel. The sixteen minor groups are: ATP binding/ATP-dependent helicase/RNA binding, ATP-dependent RNA helicase, ATP-dependent RNA or DNA unwinding, DEAD box RNA helicase, KH domain containing protein, mRNA 3-UTR binding protein, PEP (PEPPER): RNA binding/nucleic acid binding, PMH1 (Putative Mitochondrial RNA Helicase 1): ATP-dependent helicase/DNA binding/RNA binding, polyadenylate-binding protein, PRH75: ATP-dependent helicase/DEAD/H-box RNA helicase binding, RNA helicase, RRM containing protein, splicing factor, splicing factor 1 (SF1) K homology RNA-binding domain (KH), splicing factor U2AF and RNA-binding protein cp29.