[Repost] BLAST+ in Practice

Posted by luyued on 2011-04-13 00:15
BLAST+ in Practice
2010-04-29 16:08
http://hi.baidu.com/lidaof/blog/item/9d29eade7f340a5895ee3724.html

A project required it, so I have started using BLAST+.

We start by formatting the database.

1. First I obtained a plant RNA sequence file in FASTA format and formatted it with makeblastdb, as follows:

makeblastdb -in plant26.rna.fna -dbtype nucl -parse_seqids -hash_index -out plant_rna

Note: in BLAST+ 2.2.24, do not add the -parse_seqids and -hash_index parameters; otherwise makeblastdb gets stuck in an infinite loop.
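
For scripted pipelines, the makeblastdb call above can be assembled from Python. This is only a sketch: the helper name makeblastdb_cmd and its keyword arguments are hypothetical, and it uses only the command-line flags that appear in the command above. The two index flags can be switched off, which is what the note about BLAST+ 2.2.24 asks for.

```python
def makeblastdb_cmd(fasta, out, dbtype="nucl", parse_seqids=True, hash_index=True):
    """Build the makeblastdb argument list (hypothetical helper, not part of BLAST+)."""
    cmd = ["makeblastdb", "-in", fasta, "-dbtype", dbtype]
    if parse_seqids:
        cmd.append("-parse_seqids")
    if hash_index:
        cmd.append("-hash_index")
    cmd += ["-out", out]
    return cmd


if __name__ == "__main__":
    # Same command as in the text above.
    print(" ".join(makeblastdb_cmd("plant26.rna.fna", "plant_rna")))
    # Variant without the two index flags, as the note suggests for 2.2.24.
    print(" ".join(makeblastdb_cmd("plant26.rna.fna", "plant_rna",
                                   parse_seqids=False, hash_index=False)))
```

To actually run it, the list can be passed to subprocess.run(cmd, check=True), assuming makeblastdb is on the PATH.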


Building a new DB, current time: 04/29/2010 16:09:11
New DB name: plant_rna
New DB title: plant26.rna.fna
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B
Adding sequences from FASTA; added 6747 sequences in 0.546889 seconds.

As usual, the following files are then generated in the directory:

plant_rna.nhr plant_rna.nin plant_rna.nnd plant_rna.nni plant_rna.nsd plant_rna.nsi plant_rna.nsq

2. We have the following FASTA sequence that we want to BLAST; the file looks like this:

more test
>gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probab
le (VaoD) mRNA, complete cds
ATGTTCAACGCGAAGAACGGTTTTTCTGAGGCACACGTGAGGGGATGTCAGACCAAACGACTCACCAAACAGAACTACGC
CGAACTTTCTCGATGTGACACGTTGGAAGACATCAAGACGTACTTGCAAACGATGAGTGATTATTCAGAATATGTTCGTG
ATCTTCAAGCGCCAGTGAGACCGGTTGACATTATTGAATGCTGCAGAAAGAGACAGATCGCAGAGTTTAATATTTGCTGT
CAGCAGGCTTCTTCCCCTTTGTCCAATTTTTTGGAGTATTTGACGTACGGATACATGATCGATAATCTTGTGTTGGCTTT
AAATGGCATGCTTCGTGGACGTACCACAGAGGCAATACTTGAGAAGTGTAGCCCCATTGGTTTTTTCGATTCTTTATCCG
CGGTTGTCGTGTCGAGTAGTGTCCAAGAACTCTACAGACTAGCTCTCGTGGATACACCGCTTGCCTCTTATTTCAGTAGC
TCGATTAAGGCAGAAGATCTGGATGAGTTAAATATTGAGCTCATACGGAACGTCCTATACAAGGAATATTTGCAAGATTT
CATGGTTTTCTGCAACAAAATGGATCAAAACACACGTCAATTGATGGAGAAACTACTTAGCATGGAGGCCGATCGGCACG
CGATAAGAATCACACTGAACTCTTTCGGAACAGAGCTTTCCAAGGCTGATCGAAGAAATCTTTATACGAATTTTGGCACC
ATGTACCCCGATGGCTTCGCGCGTCTTGCGAATTGTGAAACGGTAGATGAAGTGAAACGCATACTAGTAGCTTATCCAGA
ATTCAGAGAGTTGACGAAAAGTGATGATCCCCACTACATTGACAGGGGACTACGCGTTCTCGAACTGGAAGCATGTGGAC
AAGCACTCGATGAGCAATTCAATTTCGCTATCTTTTATGCTTTCGTAAAGTTTCAGGAGAACGAAATAAACAACCTGATG
TGGCTCACTGAGTGTGTTGCTCAAAGGCAAAAAAGTAGTCTAGGCGAGGGCATTGTCTACATACAATAG
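
Before searching, it can help to confirm that the query file parses as expected. The following is a minimal, dependency-free FASTA reader, offered as a sketch; the function name read_fasta and the toy record are illustrative and not taken from the original post.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    # Toy record for illustration only.
    demo = [">seq1 toy record", "ATGTTCAACG", "CGAAGAACGG"]
    for header, seq in read_fasta(demo):
        print(header.split()[0], len(seq))  # prints: seq1 20
```

Applied to the test file with open("test") as handle, and provided the header really sits on a single line in the file, this should report one record whose length agrees with the Length=1029 line in the BLAST report.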

3. The BLAST search command is as follows:

blastn -db plant_rna -query test -out test.blastn
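
The same search can be launched from a script. The sketch below assumes blastn is installed and on the PATH; blastn_cmd and run_blastn are hypothetical helper names, and only the -db, -query, -out and -outfmt options used in this post are included.

```python
import shutil
import subprocess


def blastn_cmd(db, query, out=None, outfmt=None):
    """Build a blastn argument list (hypothetical helper)."""
    cmd = ["blastn", "-db", db, "-query", query]
    if outfmt is not None:
        # One list element plays the role of the shell quotes around the format string.
        cmd += ["-outfmt", outfmt]
    if out is not None:
        cmd += ["-out", out]
    return cmd


def run_blastn(db, query, out):
    """Run blastn and raise if the program is missing or exits with an error."""
    if shutil.which("blastn") is None:
        raise RuntimeError("blastn not found on PATH")
    subprocess.run(blastn_cmd(db, query, out=out), check=True)


if __name__ == "__main__":
    print(" ".join(blastn_cmd("plant_rna", "test", out="test.blastn")))
```

Passing an argument list rather than a single shell string avoids quoting problems when a format string containing spaces is supplied.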

4. Let's look at the contents of the output file test.blastn:

BLASTN 2.2.23+


Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.

Database: plant26.rna.fna
6,747 sequences; 8,392,753 total letters

Query= gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901
vacuolar ATP synthase subunit D, probale (VaoD) mRNA, complete cds
Length=1029
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

ref|XM_001416109.1|  Ostreococcus lucimarinus CCE9901 vacuolar AT...  1901    0.0


>ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit
D, probable (VaoD) mRNA, complete cds
Length=1029

Score = 1901 bits (1029), Expect = 0.0
Identities = 1029/1029 (100%), Gaps = 0/1029 (0%)
Strand=Plus/Plus

Query  1     ATGTTCAACGCGAAGAACGGTTTTTCTGAGGCACACGTGAGGGGATGTCAGACCAAACGA  60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  1     ATGTTCAACGCGAAGAACGGTTTTTCTGAGGCACACGTGAGGGGATGTCAGACCAAACGA  60

Query  61    CTCACCAAACAGAACTACGCCGAACTTTCTCGATGTGACACGTTGGAAGACATCAAGACG  120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  61    CTCACCAAACAGAACTACGCCGAACTTTCTCGATGTGACACGTTGGAAGACATCAAGACG  120

Query  121   TACTTGCAAACGATGAGTGATTATTCAGAATATGTTCGTGATCTTCAAGCGCCAGTGAGA  180
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  121   TACTTGCAAACGATGAGTGATTATTCAGAATATGTTCGTGATCTTCAAGCGCCAGTGAGA  180

Query  181   CCGGTTGACATTATTGAATGCTGCAGAAAGAGACAGATCGCAGAGTTTAATATTTGCTGT  240
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  181   CCGGTTGACATTATTGAATGCTGCAGAAAGAGACAGATCGCAGAGTTTAATATTTGCTGT  240

Query  241   CAGCAGGCTTCTTCCCCTTTGTCCAATTTTTTGGAGTATTTGACGTACGGATACATGATC  300
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  241   CAGCAGGCTTCTTCCCCTTTGTCCAATTTTTTGGAGTATTTGACGTACGGATACATGATC  300

Query  301   GATAATCTTGTGTTGGCTTTAAATGGCATGCTTCGTGGACGTACCACAGAGGCAATACTT  360
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  301   GATAATCTTGTGTTGGCTTTAAATGGCATGCTTCGTGGACGTACCACAGAGGCAATACTT  360

Query  361   GAGAAGTGTAGCCCCATTGGTTTTTTCGATTCTTTATCCGCGGTTGTCGTGTCGAGTAGT  420
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  361   GAGAAGTGTAGCCCCATTGGTTTTTTCGATTCTTTATCCGCGGTTGTCGTGTCGAGTAGT  420

Query  421   GTCCAAGAACTCTACAGACTAGCTCTCGTGGATACACCGCTTGCCTCTTATTTCAGTAGC  480
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  421   GTCCAAGAACTCTACAGACTAGCTCTCGTGGATACACCGCTTGCCTCTTATTTCAGTAGC  480

Query  481   TCGATTAAGGCAGAAGATCTGGATGAGTTAAATATTGAGCTCATACGGAACGTCCTATAC  540
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  481   TCGATTAAGGCAGAAGATCTGGATGAGTTAAATATTGAGCTCATACGGAACGTCCTATAC  540

Query  541   AAGGAATATTTGCAAGATTTCATGGTTTTCTGCAACAAAATGGATCAAAACACACGTCAA  600
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  541   AAGGAATATTTGCAAGATTTCATGGTTTTCTGCAACAAAATGGATCAAAACACACGTCAA  600

Query  601   TTGATGGAGAAACTACTTAGCATGGAGGCCGATCGGCACGCGATAAGAATCACACTGAAC  660
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  601   TTGATGGAGAAACTACTTAGCATGGAGGCCGATCGGCACGCGATAAGAATCACACTGAAC  660

Query  661   TCTTTCGGAACAGAGCTTTCCAAGGCTGATCGAAGAAATCTTTATACGAATTTTGGCACC  720
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  661   TCTTTCGGAACAGAGCTTTCCAAGGCTGATCGAAGAAATCTTTATACGAATTTTGGCACC  720

Query  721   ATGTACCCCGATGGCTTCGCGCGTCTTGCGAATTGTGAAACGGTAGATGAAGTGAAACGC  780
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  721   ATGTACCCCGATGGCTTCGCGCGTCTTGCGAATTGTGAAACGGTAGATGAAGTGAAACGC  780

Query  781   ATACTAGTAGCTTATCCAGAATTCAGAGAGTTGACGAAAAGTGATGATCCCCACTACATT  840
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  781   ATACTAGTAGCTTATCCAGAATTCAGAGAGTTGACGAAAAGTGATGATCCCCACTACATT  840

Query  841   GACAGGGGACTACGCGTTCTCGAACTGGAAGCATGTGGACAAGCACTCGATGAGCAATTC  900
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  841   GACAGGGGACTACGCGTTCTCGAACTGGAAGCATGTGGACAAGCACTCGATGAGCAATTC  900

Query  901   AATTTCGCTATCTTTTATGCTTTCGTAAAGTTTCAGGAGAACGAAATAAACAACCTGATG  960
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  901   AATTTCGCTATCTTTTATGCTTTCGTAAAGTTTCAGGAGAACGAAATAAACAACCTGATG  960

Query  961   TGGCTCACTGAGTGTGTTGCTCAAAGGCAAAAAAGTAGTCTAGGCGAGGGCATTGTCTAC  1020
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  961   TGGCTCACTGAGTGTGTTGCTCAAAGGCAAAAAAGTAGTCTAGGCGAGGGCATTGTCTAC  1020

Query  1021  ATACAATAG  1029
             |||||||||
Sbjct  1021  ATACAATAG  1029

Lambda      K        H
    1.33    0.621     1.12

Gapped
Lambda      K        H
    1.28    0.460    0.850

Effective search space used: 8286997432


Database: plant26.rna.fna
Posted date: Apr 29, 2010 4:09 PM
Number of letters in database: 8,392,753
Number of sequences in database: 6,747

Matrix: blastn matrix 1 -2
Gap Penalties: Existence: 0, Extension: 0
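
The statistics at the end of the report tie back to the score line. As a rough check, and assuming the standard Karlin-Altschul conversions (bit score = (lambda * raw score - ln K) / ln 2, and E = effective search space * 2^(-bit score)), the gapped Lambda and K printed above reproduce the reported 1901 bits from the raw score of 1029. The function names are illustrative, and the printed parameters are rounded, so only approximate agreement should be expected.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to bits using Karlin-Altschul parameters."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect(bits, search_space):
    """Expected number of chance hits for a given bit score and search space."""
    return search_space * 2.0 ** (-bits)


if __name__ == "__main__":
    # Raw score, gapped Lambda and K, and search space are taken from the report above.
    bits = bit_score(1029, 1.28, 0.460)
    print(round(bits))                 # prints: 1901
    # Far below the smallest double, so this underflows to 0.0,
    # in line with the "Expect = 0.0" shown in the report.
    print(expect(bits, 8286997432))
```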

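5. As we can see, without any extra parameters the output is almost the same as that of the older BLAST, so the BioPerl or Biopython parsers you wrote before can still be used to analyze the results.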
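
Existing BioPerl or Biopython parsers remain the better choice for real work. Purely as a dependency-free illustration of what such a parser pulls out of the report, the sketch below extracts the score, E-value and identity counts with regular expressions. The function name parse_hsp_summary and the dictionary keys are hypothetical, and the patterns only cover the line layout shown in the report above.

```python
import re

SCORE_RE = re.compile(
    r"Score\s*=\s*([\d.]+)\s*bits\s*\((\d+)\),\s*Expect\s*=\s*([\d.eE+-]+)"
)
IDENT_RE = re.compile(r"Identities\s*=\s*(\d+)/(\d+)")


def _to_float(text):
    # Some reports abbreviate E-values such as "e-100"; prepend "1" so float() accepts them.
    return float("1" + text if text.lower().startswith("e") else text)


def parse_hsp_summary(report_text):
    """Return one dict per 'Score = ...' line found in a pairwise BLAST report."""
    hsps = []
    for line in report_text.splitlines():
        match = SCORE_RE.search(line)
        if match:
            hsps.append({
                "bits": float(match.group(1)),
                "raw": int(match.group(2)),
                "evalue": _to_float(match.group(3)),
            })
            continue
        match = IDENT_RE.search(line)
        if match and hsps:
            # Attach identity counts to the most recent score line.
            hsps[-1]["identical"] = int(match.group(1))
            hsps[-1]["aligned"] = int(match.group(2))
    return hsps


if __name__ == "__main__":
    sample = (
        "Score = 1901 bits (1029), Expect = 0.0\n"
        "Identities = 1029/1029 (100%), Gaps = 0/1029 (0%)\n"
    )
    print(parse_hsp_summary(sample))
```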

6. Next, let's add an output parameter. We use -outfmt 7 to control the output and select which fields to print; the command and output are as follows:

blastn -db plant_rna -query test -outfmt "7 qacc sacc evalue length pident"
# BLASTN 2.2.23+
# Query: gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probale (VaoD) mRNA, complete cds
# Database: plant_rna
# Fields: query acc., subject acc., evalue, alignment length, % identity
# 1 hits found
gi|145342129|ref|XM_001416109.1| XM_001416109 0.0 1029 100.00
# BLAST processed 1 queries
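
Output in this commented tabular format is easy to read back into a script. The sketch below assumes the real blastn output separates columns with tabs (the spaces in the listing above are most likely a copy-and-paste artifact) and takes the column names from the "# Fields:" line. The function name parse_outfmt7 is hypothetical.

```python
def parse_outfmt7(lines):
    """Parse '-outfmt 7' lines into a list of dicts keyed by the '# Fields:' names."""
    fields, hits = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("# Fields:"):
            # Field names are comma-separated after the '# Fields:' prefix.
            fields = [name.strip() for name in line[len("# Fields:"):].split(",")]
        elif line and not line.startswith("#"):
            # Data rows: one tab-separated value per field (values kept as strings).
            hits.append(dict(zip(fields, line.split("\t"))))
    return hits


if __name__ == "__main__":
    sample = [
        "# BLASTN 2.2.23+",
        "# Database: plant_rna",
        "# Fields: query acc., subject acc., evalue, alignment length, % identity",
        "# 1 hits found",
        "gi|145342129|ref|XM_001416109.1|\tXM_001416109\t0.0\t1029\t100.00",
        "# BLAST processed 1 queries",
    ]
    for hit in parse_outfmt7(sample):
        print(hit["subject acc."], hit["evalue"], hit["% identity"])
```

Because the keys come from the header line, the same function works unchanged when a different set of fields is requested.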

This output is self-explanatory: the "# Fields:" line makes clear what each column means. Some other field specifiers produce the following:

blastn -db plant_rna -query test -outfmt "7 qgi sgi evalue length pident"
# BLASTN 2.2.23+
# Query: gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probale (VaoD) mRNA, complete cds
# Database: plant_rna
# Fields: query gi, subject gi, evalue, alignment length, % identity
# 1 hits found
0 145342129 0.0 1029 100.00
# BLAST processed 1 queries

blastn -db plant_rna -query test -outfmt "7 qid sid evalue length pident"
# BLASTN 2.2.23+
# Query: gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probale (VaoD) mRNA, complete cds
# Database: plant_rna
# Fields: evalue, alignment length, % identity
# 1 hits found
0.0 1029 100.00
# BLAST processed 1 queries

blastn -db plant_rna -query test -outfmt "7 qseqid sseqid evalue length pident"
# BLASTN 2.2.23+
# Query: gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probale (VaoD) mRNA, complete cds
# Database: plant_rna
# Fields: query id, subject id, evalue, alignment length, % identity
# 1 hits found
gi|145342129|ref|XM_001416109.1| gi|145342129|ref|XM_001416109.1| 0.0 1029 100.00
# BLAST processed 1 queries
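
The different identifier fields above (the GI 145342129, the accession XM_001416109, and the full gi|...|ref|...| string) are all pieces of the same sequence identifier. A small sketch of splitting the full form into its parts follows; split_seqid is a hypothetical helper and only handles the gi|number|db|accession.version| pattern seen in this post.

```python
def split_seqid(seqid):
    """Split an identifier of the form 'gi|<number>|<db>|<accession.version>|'."""
    parts = [p for p in seqid.split("|") if p]
    info = {"gi": None, "db": None, "accession": None, "version": None}
    if len(parts) >= 2 and parts[0] == "gi":
        info["gi"] = int(parts[1])
        parts = parts[2:]
    if len(parts) >= 2:
        info["db"] = parts[0]
        # 'XM_001416109.1' -> accession 'XM_001416109', version 1
        accession, _, version = parts[1].partition(".")
        info["accession"] = accession
        info["version"] = int(version) if version.isdigit() else None
    return info


if __name__ == "__main__":
    print(split_seqid("gi|145342129|ref|XM_001416109.1|"))
```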

blastn -db plant_rna -query test -outfmt "7 qframe sseqid evalue length pident"
# BLASTN 2.2.23+
# Query: gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probale (VaoD) mRNA, complete cds
# Database: plant_rna
# Fields: query frame, subject id, evalue, alignment length, % identity
# 1 hits found
1 gi|145342129|ref|XM_001416109.1| 0.0 1029 100.00

