Friday 16 July 2010

BIOINFORMATICS: AN INTRODUCTION

Hi guys,
In this post, I am going to give a basic introduction to bioinformatics.
The first step of the human genome sequencing project involved identifying those DNA sequences which code for the proteins synthesised by the cell, and which therefore define its function. For this reason researchers have been isolating the mRNAs expressed in cells, converting them to DNA (cDNA), cloning them and sequencing them in their thousands. Each of these expressed sequences is called an EST (Expressed Sequence Tag). As the sequencing of the human genome progresses, the function of more and more DNA and protein sequences is being identified. It is therefore now routine to generate many ESTs and then compare them with the sequences in the databases to work out their function.

Let us take the EST given below as an example:

1 gaacgacctctctcaggcttagcctgggctgtagctatgataaaccggcaggagattggt ggacctcgctcttataccatcgcagttgcttccctgggtaaaggagtggcctgtaatcct gcctgcttcatcacacagctcctccctgtgaaaaggaagctagggttctatgaatggact tcaaggttaagaagtcacataaatcccacaggcactgttttgcttcagctagaaaataca
This part of the practical illustrates how to compare DNA sequences against the DNA databanks using BLAST (Basic Local Alignment Search Tool) to find out what they code for.

1. Copy the relevant sequence onto the clipboard.
2. Go to the Sequence Search page at NCBI (http://www.ncbi.nlm.nih.gov/BLAST).
3. An NCBI BLAST page will be returned, offering several search options.
4. Choose the ‘nucleotide blast’ option.
5. A page will be returned with a query window.
6. Paste the sequence into the window.
7. Under ‘Choose Search Set’, set the database to ‘Nucleotide collection (nr/nt)’.
8. Press the BLAST button and wait while the sequence is compared to the databases and the matches are displayed.
9. In the meantime, a page is returned showing the request ID and indicating that the search has been placed in a queuing system.
10. Carefully study the output.
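
If you keep your sequences in a text file, the copy and paste steps above can be prepared with a few lines of Python. This is only a minimal sketch: the function names `clean_sequence` and `to_fasta` and the header name `my_EST` are my own made-up names, not part of BLAST. It strips the position numbers and spaces that sequence listings often carry and wraps the result as a FASTA record, which the BLAST query window accepts as well as bare sequence.

```python
import re


def clean_sequence(raw):
    """Drop digits, spaces and anything else that is not a letter."""
    return re.sub(r"[^A-Za-z]", "", raw).lower()


def to_fasta(name, raw, width=60):
    """Return a FASTA record: a '>' header line, then the sequence in fixed-width lines."""
    seq = clean_sequence(raw)
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# The first 120 bases of the EST used in this post, with the leading
# position number and the spaces left in on purpose.
raw_est = ("1 gaacgacctctctcaggcttagcctgggctgtagctatgataaaccggcaggagattggt "
           "ggacctcgctcttataccatcgcagttgcttccctgggtaaaggagtggcctgtaatcct")

# "my_EST" is a hypothetical label; any short name will do.
print(to_fasta("my_EST", raw_est))
```

The printed text can be pasted straight into the query window in step 6.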

INTERPRETATION OF BLAST RESULTS
1. A page will be returned with the results of the BLAST search. These are presented both graphically and textually.

2. The graphical view shows the query sequence as a thick red line with base numbers attached to it. Below this is a series of thin coloured lines that represent matches to the query sequence. The position and length of each line show which part of the query sequence matches the hit sequence. The colour represents the quality of the match.

3. The textual view is found below the picture. It is a list of files that correspond to the matches in the graphical view, sorted in order of match quality.
The first hyperlink in each entry leads to the file containing the entire sequence. If you click on this hyperlink you can view the actual sequence file and information on the sequence it contains (e.g. what it codes for, the source organism, repeat regions if any, etc.).
4. To display a FASTA version of the file, choose FASTA in the drop-down menu next to ‘Display’ and then click ‘Display’. The file for the same gene is now shown in FASTA format, a format that is used as input by many different computer programs.
5. Back in the list of hits, the hyperlink is followed by a very brief description of the file. This description is taken from the sequence file.

6. The next number is a numerical score (Score - bits), a statistical measure of how good the match is (the greater the score, the better the match). This score is hyperlinked to the actual alignment between your query sequence and the hit.
Finally, the Expect value (E) is a parameter that describes the number of hits one can "expect" to see just by chance when searching a database of a particular size. Essentially, it describes the random background noise that exists for matches between sequences: the closer E is to zero, the more significant the match.
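
To get a feel for how the bit score and the E value relate, here is a small sketch using the formula given in NCBI's BLAST statistics tutorial, E = m * n * 2^(-S'), where S' is the bit score, m is the query length and n is the total length of the database. The function name `expect_value` is my own, and the database size below is a made-up number for illustration only. Real BLAST also applies length corrections, so do not expect these numbers to match a real search exactly.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E value: E = m * n * 2**(-S'), where S' is the bit score."""
    return query_length * database_length * 2.0 ** (-bit_score)


query_length = 240          # length of the EST used in this post
database_length = 10**10    # hypothetical database size in bases (an assumption)

# Print the approximate E value for a few example bit scores.
for bits in (30, 50, 100, 200):
    e = expect_value(bits, query_length, database_length)
    print("bit score %3d -> E about %.2e" % (bits, e))
```

Each extra bit of score halves the E value, which is why the best hits in a results list have E values so close to zero.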

Exercise for practice
If you want to practise with some more query sequences, you can try the ones given below.
1.tgtgttttatgtcttctacgaacagtacctgaccatcattgacgacactatcttcaacct cggtgtgtccctgggcgcgatatttctggtgaccatggtcctcctgggctgtgagctctg gtctgcagtcatcatgtgtgccaccatcgccatggtcttggtcaacatgtttggagttat gtggctctggggcatcagtctgaacgctgtatccttggtcaacctggtgatg

2.tggtccctatgggcttccgcacatgccgcgggcggccaggcaacgtgcgtgtctctgcca tgtggcagaagtgctctttgtggcagtggccaggcagggagtgtctgcagtcctggtggg gctgagcctgaggccttccagaaagcaggagcagctgtgctgcaccccatgtgggtgacc aggtcctttctcctgatagtcacctgctggttgttgccaggttgcagctgctcttgcatc

3.cccaaatgaagtgtgaacgtgatgttttcggatgcaaactcagctcagggattcattttg tgtcttagttttatatgcatccttatttttaatacacctgcttcacgtccctatgttggg aagtccatatttgtctgcttttcttgcagcatcatttccttacaatactgtccggtggac aaaatgacaattgatatgtttttctgatataattactttagctgcactaacagtacaatg