Prediction of SNP activity

Research activity

	5) Development of algorithms predicting the biological effects of polymorphisms falling within microRNA target sites in 3'UTRs
	The proposed approach could help to select only those SNPs with a (putative) biological function, reducing workload and costs. Here we discuss SNPs within 3'UTRs and miRNA target sites, and we provide the basis for a reasoned, algorithm-driven selection of SNPs. MicroRNAs (miRNAs) are non-coding, single-stranded RNAs of about 21-25 nucleotides that regulate gene translation in both plants and animals by binding the 3’UTR of target mRNAs. MiRNAs are processed from larger (~80-nt) precursor hairpins by the RNase III enzyme Dicer into miRNA:miRNA* duplexes. One strand of these duplexes associates with the RNA-induced silencing complex (RISC), whereas the other is generally degraded. MiRNAs regulate gene expression at the post-transcriptional level in two ways: when there are mismatches between the miRNA and its target, translation is reduced or inhibited, whereas in the case of perfect matching the miRNA acts as a small interfering RNA (siRNA) and the target is degraded by cytosolic enzymes. In the former case the binding of the miRNA to its target does not affect the mRNA level, while in the latter case it does (Bartel, 2004). The rules governing the pairing between miRNA and target have been established: maximal complementarity is restricted to a region called the “seed sequence” at the 5’ end of the miRNA, spanning nucleotides 2 to 8 (Tomari and Zamore, 2005). Within the seed sequence a maximum of one mismatch is tolerated. Moreover, G:U wobble pairing is admitted in the miRNA-mRNA hybrid. Recent evidence indicates that miRNAs are involved in many biological and pathological processes, including developmental timing, cellular differentiation, proliferation, apoptosis, insulin secretion, cholesterol biosynthesis, and tumorigenesis. However, the binding between miRNA and mRNA can be affected by single-nucleotide polymorphisms (SNPs) residing in the target site: SNPs can either weaken or reinforce the binding site, thereby altering the normal regulation of a given gene.
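	The seed-pairing rules stated above (nucleotides 2 to 8 of the miRNA, at most one mismatch, G:U wobble admitted) can be illustrated with a minimal sketch. The function names are hypothetical and are not part of any of the prediction tools; the example assumes the target segment is given 5' to 3' and is exactly the seven bases facing the seed.

```python
# Minimal sketch of the seed-pairing rule; helper names are hypothetical.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def seed_mismatches(mirna, site7, allow_wobble=True):
    """Count unpaired positions between miRNA nucleotides 2-8 and a 7-nt site.

    mirna: mature miRNA sequence, 5'->3'.
    site7: the 7-nt target segment facing the seed, 5'->3' (assumption).
    """
    seed = mirna.upper().replace("T", "U")[1:8]       # nucleotides 2..8
    facing = site7.upper().replace("T", "U")[::-1]    # duplex is antiparallel
    if len(seed) != 7 or len(facing) != 7:
        raise ValueError("need a miRNA of >= 8 nt and a 7-nt target segment")
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    return sum((m, t) not in allowed for m, t in zip(seed, facing))


def seed_tolerated(mirna, site7, max_mismatches=1):
    """Apply the 'at most one mismatch in the seed' rule from the text."""
    return seed_mismatches(mirna, site7) <= max_mismatches


let7a = "UGAGGUAGUAGGUUGUAUAGUU"   # seed (nt 2-8) is GAGGUAG
print(seed_mismatches(let7a, "CUACCUC"))   # perfect seed match
print(seed_mismatches(let7a, "CUACAUC"))   # one allele change in the site
print(seed_tolerated(let7a, "AUACAUC"))    # two changes in the site
```

	A SNP allele can be tested by substituting it into the 7-nt segment and comparing the mismatch counts of the two alleles.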
	Therefore, it is helpful to develop a tool that enables researchers to predict which SNPs could really impact the regulation of a target gene. At present, several databases and algorithms are available to predict potential binding sites in the 3’UTR of genes. The scanning algorithms are based on sequence complementarity between the mature miRNA and the target site, the binding free energy of the miRNA-target duplex, the evolutionary conservation of the target site sequence, and the target position in aligned UTRs of homologous genes (John et al., 2004). However, each algorithm gives different predictions, and none of them gives, for each polymorphism, a direct measurement of the biological impact. We propose an approach that ranks each polymorphism falling within 3’UTRs, from the most biologically neutral to the most likely to affect the miRNA binding site. The method is based on a re-elaboration of predictions from pre-existing, well-established algorithms. The putative miRNA binding sites are identified by means of the following algorithms: miRBase, miRanda, PicTar, Diana-MicroT, and TargetScanS (default parameters were used for all of them). The method uses the output from all the algorithms and does not discard any prediction, preventing any loss of information. In particular, we selected 140 candidate genes for colorectal cancer (CRC). These genes derive from a genome-wide study in which 20,857 transcripts from 18,191 human genes were sequenced in 11 colorectal cancer specimens, identifying a total of 140 somatically mutated genes (called CAN-genes and reported in table 1) thought to be crucial for cancer development (Wood et al., 2007).
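	Pooling the output of all algorithms without discarding any prediction can be sketched as a simple union keyed by site. The gene names, miRNA names and coordinates below are made up for illustration, and matching sites by identical coordinates is a simplifying assumption; real tool outputs would first need to be parsed and mapped to a common coordinate system.

```python
from collections import defaultdict

# Illustrative, made-up predictions: (gene, miRNA, site start, site end).
predictions = {
    "miRanda": [("GENE_A", "miR-x", 120, 141), ("GENE_B", "miR-y", 40, 61)],
    "PicTar": [("GENE_A", "miR-x", 120, 141)],
    "TargetScanS": [("GENE_A", "miR-z", 300, 321)],
}


def pool_predictions(per_algorithm):
    """Union of all predicted sites; each site keeps the set of tools that reported it."""
    pooled = defaultdict(set)
    for algorithm, sites in per_algorithm.items():
        for site in sites:
            pooled[site].add(algorithm)   # never drop a site seen by any tool
    return dict(pooled)


pooled = pool_predictions(predictions)
for site, algorithms in sorted(pooled.items()):
    print(site, sorted(algorithms))
```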
	For all 140 CAN-genes mutated in CRC, the putative miRNA binding sites were identified by means of the following specialized algorithms: miRBase, miRanda, PicTar, MicroInspector, Diana-MicroT, and TargetScanS (default parameters were used for all of them).
• miRBase (http://microrna.sanger.ac.uk/index.shtml) is a database divided into three parts: miRBase Registry covers the microRNA gene nomenclature; miRBase Sequence is the primary online repository for miRNA sequence data and annotation; miRBase Targets is a comprehensive new database of predicted miRNA target genes.
• miRanda (http://www.microrna.org/) is an algorithm that considers the sequence complementarity between the mature miRNA and the target site, the binding free energy of the miRNA-target duplex, and the evolutionary conservation of the target position in aligned UTRs of homologous genes.
• PicTar (http://pictar.mdc-berlin.de/) computes a maximum likelihood score that a given RNA sequence (3’UTR region) is targeted by a fixed set of microRNAs. The output consists of the predicted miRNA binding sites (only the seed match is shown), highlighted in yellow on the multispecies alignment (Homo sapiens at the top). Each miRNA analysed is reported in a separate section of the output.
• Diana-MicroT (http://www.diana.pcbi.upenn.edu/cgi-bin/micro_t.cgi) finds microRNA/target duplexes with minimum free energy that are conserved in humans and mice. At the moment this algorithm does not accept a gene name directly: a 3’UTR sequence must be provided, and only one sequence is allowed at a time. In this case, for all the CAN-genes mutated in CRC, we selected their 3’UTR regions, defined as the transcribed sequence from the stop codon to the end of the last exon of each gene.
• TargetScan Human 5.1 (http://www.targetscan.org/) searches the 3’UTRs for segments of perfect Watson-Crick complementarity to bases 2-8 of the miRNA (numbered from the 5' end) and assigns a binding free energy to the miRNA-target duplex, given an internal database of miRNAs and UTR sequences.
	The predicted putative miRNA binding sites were screened for the presence of SNPs by an extensive search of the SNP database (dbSNP; http://www.ncbi.nlm.nih.gov/SNP/). In our case on 140 genes, 109 genes did not bear SNPs within the predicted miRNA binding sites. Within the remaining 39 genes, we found 61 SNPs. As a first selection criterion, we excluded SNPs with a minor allele frequency (MAF) lower than 0.24 in Caucasians. As a second selection criterion, we excluded all miRNAs predicted by the algorithms listed above that are not expressed in colorectal cells. The list of microRNAs expressed in colorectal cells was taken from the study of Cummins et al. (2006). After this second selection, only 37 SNPs in 37 different target sites remained.
	For the selected SNPs, the algorithm RNAcofold (http://rna.tbi.univie.ac.at/cgi-bin/RNAcofold.cgi) was run to assess the Gibbs binding free energy (ΔG, expressed in kJ/mol) for both the common and the variant alleles. RNAcofold computes the hybridization energy and base-pairing pattern of two RNA sequences. More negative ΔG values correspond to higher stability of the RNA:miRNA duplex. The difference in free energy between the two alleles was computed as the “variation of ΔG” (i.e. ΔΔG).
	The results are typically quite heterogeneous across predicted targets. For some genes the same sequence is predicted to be targeted by several miRNAs, whereas for others only one miRNA may be predicted. Given that the predictions are based on probabilistic calculations, a polymorphic target bound by many miRNAs should, at least theoretically, weigh more than a target bound by one or a few miRNAs. In fact, the more miRNAs are predicted to bind a given target, the more likely it is that at least one of them truly binds it. To account for these different weights, the sum of the absolute values of the ΔΔGs over all miRNAs predicted to bind the site should be used for each SNP as the parameter predicting the biological impact of the polymorphism (i.e. |ΔΔG|tot = Σ |ΔΔG|).
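	The two selection criteria and the summed score can be put together in a short sketch. It assumes the per-allele ΔG values have already been obtained (for example from RNAcofold runs); the record layout, SNP identifiers, miRNA names and numbers are hypothetical. The sign convention for ΔΔG (variant minus common) is also an assumption, and it does not affect the result because only absolute values are summed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteEnergy:
    snp_id: str        # hypothetical SNP identifier
    mirna: str         # miRNA predicted to bind the site
    maf: float         # minor allele frequency in the reference population
    dg_common: float   # duplex free energy with the common allele
    dg_variant: float  # duplex free energy with the variant allele


def passes_filters(rec, expressed_mirnas, min_maf=0.24):
    """Keep SNPs with MAF >= 0.24 whose miRNA is expressed in the tissue."""
    return rec.maf >= min_maf and rec.mirna in expressed_mirnas


def total_abs_ddg(records):
    """|ddG|tot per SNP: sum of |ddG| over all miRNAs predicted at the site."""
    totals = {}
    for rec in records:
        ddg = rec.dg_variant - rec.dg_common   # assumed sign convention
        totals[rec.snp_id] = totals.get(rec.snp_id, 0.0) + abs(ddg)
    return totals


# Made-up values for illustration only.
records = [
    SiteEnergy("snp_1", "miR-x", 0.30, -20.0, -17.5),
    SiteEnergy("snp_1", "miR-y", 0.30, -15.0, -16.0),
    SiteEnergy("snp_2", "miR-x", 0.10, -18.0, -12.0),   # fails the MAF filter
    SiteEnergy("snp_3", "miR-z", 0.40, -22.0, -21.0),   # miR-z not expressed
]
expressed = {"miR-x", "miR-y"}
kept = [r for r in records if passes_filters(r, expressed)]
totals = total_abs_ddg(kept)
print(totals)
```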
	To obtain a priority list of SNPs with an impact on miRNA binding, we ranked the values of |ΔΔG|tot and classified the SNPs into three groups corresponding to the three tertiles. The first tertile (|ΔΔG|tot ≥ 4.75 kJ/mol) comprises SNPs with a predicted high impact on the biology of the miRNA binding sites. The second tertile (1.46 kJ/mol < |ΔΔG|tot < 4.75 kJ/mol) comprises SNPs with a predicted mild biological activity, whereas the third tertile (|ΔΔG|tot ≤ 1.46 kJ/mol) comprises SNPs with a likely weak activity. These SNPs fall within genes that were found somatically mutated in CRC in a genome-wide mutation scan and are considered important for its development. Therefore, it is likely that these SNPs, if they affect gene regulation, could impact an individual’s lifetime risk of cancer.
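	The ranking and the three-group classification can be sketched as follows. The cut-offs are the tertile boundaries reported above; the function names and the example scores are hypothetical.

```python
def impact_class(total, high_cutoff=4.75, low_cutoff=1.46):
    """Assign a |ddG|tot value to the tertile groups described in the text."""
    if total >= high_cutoff:
        return "high"
    if total > low_cutoff:
        return "mild"
    return "weak"


def rank_snps(totals):
    """Sort SNPs from the largest to the smallest |ddG|tot and label each one."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(snp, value, impact_class(value)) for snp, value in ranked]


# Made-up scores for illustration only.
example = {"snp_1": 3.5, "snp_2": 6.0, "snp_3": 0.8}
for snp, value, label in rank_snps(example):
    print(snp, value, label)
```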

last update: 18/06/2009