[Repost] BLAST+ in Practice
Posted by luyued on 2011-04-13 00:15; originally written 2010-04-29 16:08
http://hi.baidu.com/lidaof/blog/item/9d29eade7f340a5895ee3724.html
The project required it, so I have started using BLAST+.
Let's start with formatting a database.
1. First I got a plant RNA sequence file (plant26.rna.fna, FASTA format) and formatted it with makeblastdb:
makeblastdb -in plant26.rna.fna -dbtype nucl -parse_seqids -hash_index -out plant_rna
Note: with BLAST+ 2.2.24, leave out the -parse_seqids and -hash_index options; otherwise makeblastdb gets stuck in an endless loop.
Building a new DB, current time: 04/29/2010 16:09:11
New DB name: plant_rna
New DB title: plant26.rna.fna
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B
Adding sequences from FASTA; added 6747 sequences in 0.546889 seconds.
As usual, the following files are generated in the directory:
plant_rna.nhr plant_rna.nin plant_rna.nnd plant_rna.nni plant_rna.nsd plant_rna.nsi plant_rna.nsq
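If you drive step 1 from a script, a minimal sketch could look like the one below. The helper names (makeblastdb_cmd, missing_db_files, build_db) are hypothetical, it assumes makeblastdb is on your PATH, and the list of expected files is simply the seven that appeared after the run above with -parse_seqids and -hash_index; other flag combinations may produce a different set, which this check does not cover. Following the 2.2.24 note, you would call it with parse_seqids=False, hash_index=False.

```python
import subprocess

# The seven files that appeared after the run above (built with
# -parse_seqids and -hash_index); other flag combinations may differ.
DB_EXTENSIONS = (".nhr", ".nin", ".nnd", ".nni", ".nsd", ".nsi", ".nsq")


def makeblastdb_cmd(fasta, out, parse_seqids=True, hash_index=True):
    """Build the makeblastdb argument list used in step 1."""
    cmd = ["makeblastdb", "-in", fasta, "-dbtype", "nucl"]
    if parse_seqids:
        cmd.append("-parse_seqids")
    if hash_index:
        cmd.append("-hash_index")
    return cmd + ["-out", out]


def missing_db_files(prefix, present):
    """Return the expected database file names that are not in `present`."""
    present = set(present)
    return [prefix + ext for ext in DB_EXTENSIONS if prefix + ext not in present]


def build_db(fasta, out, **flags):
    # check=True raises CalledProcessError if makeblastdb exits non-zero.
    subprocess.run(makeblastdb_cmd(fasta, out, **flags), check=True)
```

After build_db("plant26.rna.fna", "plant_rna"), something like missing_db_files("plant_rna", os.listdir(".")) should come back empty.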
2. We have the following FASTA sequence that we want to BLAST:
more test
>gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probable (VaoD) mRNA, complete cds
ATGTTCAACGCGAAGAACGGTTTTTCTGAGGCACACGTGAGGGGATGTCAGACCAAACGACTCACCAAACAGAACTACGC
CGAACTTTCTCGATGTGACACGTTGGAAGACATCAAGACGTACTTGCAAACGATGAGTGATTATTCAGAATATGTTCGTG
ATCTTCAAGCGCCAGTGAGACCGGTTGACATTATTGAATGCTGCAGAAAGAGACAGATCGCAGAGTTTAATATTTGCTGT
CAGCAGGCTTCTTCCCCTTTGTCCAATTTTTTGGAGTATTTGACGTACGGATACATGATCGATAATCTTGTGTTGGCTTT
AAATGGCATGCTTCGTGGACGTACCACAGAGGCAATACTTGAGAAGTGTAGCCCCATTGGTTTTTTCGATTCTTTATCCG
CGGTTGTCGTGTCGAGTAGTGTCCAAGAACTCTACAGACTAGCTCTCGTGGATACACCGCTTGCCTCTTATTTCAGTAGC
TCGATTAAGGCAGAAGATCTGGATGAGTTAAATATTGAGCTCATACGGAACGTCCTATACAAGGAATATTTGCAAGATTT
CATGGTTTTCTGCAACAAAATGGATCAAAACACACGTCAATTGATGGAGAAACTACTTAGCATGGAGGCCGATCGGCACG
CGATAAGAATCACACTGAACTCTTTCGGAACAGAGCTTTCCAAGGCTGATCGAAGAAATCTTTATACGAATTTTGGCACC
ATGTACCCCGATGGCTTCGCGCGTCTTGCGAATTGTGAAACGGTAGATGAAGTGAAACGCATACTAGTAGCTTATCCAGA
ATTCAGAGAGTTGACGAAAAGTGATGATCCCCACTACATTGACAGGGGACTACGCGTTCTCGAACTGGAAGCATGTGGAC
AAGCACTCGATGAGCAATTCAATTTCGCTATCTTTTATGCTTTCGTAAAGTTTCAGGAGAACGAAATAAACAACCTGATG
TGGCTCACTGAGTGTGTTGCTCAAAGGCAAAAAAGTAGTCTAGGCGAGGGCATTGTCTACATACAATAG
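Before searching, it can help to sanity-check the query file. The sketch below is a plain-Python FASTA reader (read_fasta is a hypothetical helper, not a Biopython function) that joins wrapped sequence lines; for the file above, the length it reports should match the Length=1029 that BLAST prints for the query.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Usage: with open("test") as fh, loop over read_fasta(fh.read()) and print header.split()[0] together with len(seq).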
3. The BLAST search command is:
blastn -db plant_rna -query test -out test.blastn
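The same search can be launched from Python. This is a sketch with hypothetical helper names (blastn_cmd, run_blastn), assuming blastn is on your PATH; it only uses the -db, -query, -out and -outfmt options that appear in this post.

```python
import subprocess


def blastn_cmd(db, query, out=None, outfmt=None):
    """Build a blastn argument list like the one in step 3."""
    cmd = ["blastn", "-db", db, "-query", query]
    if out is not None:
        cmd += ["-out", out]
    if outfmt is not None:
        # One list element, so the spaces need no shell quoting.
        cmd += ["-outfmt", outfmt]
    return cmd


def run_blastn(db, query, **kwargs):
    """Run blastn and return its stdout (empty when -out writes to a file)."""
    result = subprocess.run(blastn_cmd(db, query, **kwargs),
                            check=True, capture_output=True, text=True)
    return result.stdout
```

Passing the argument list directly (no shell) is why "7 qacc sacc evalue length pident" can go in as one string: it is exactly what the double quotes achieve on the command line.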
4. Let's look at the contents of the output file test.blastn:
BLASTN 2.2.23+
Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.
Database: plant26.rna.fna
6,747 sequences; 8,392,753 total letters
Query= gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901
vacuolar ATP synthase subunit D, probale (VaoD) mRNA, complete cds
Length=1029
Score E
Sequences producing significant alignments: (Bits) Value
ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar AT... 1901 0.0
>ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit
D, probable (VaoD) mRNA, complete cds
Length=1029
Score = 1901 bits (1029), Expect = 0.0
Identities = 1029/1029 (100%), Gaps = 0/1029 (0%)
Strand=Plus/Plus
Query 1 ATGTTCAACGCGAAGAACGGTTTTTCTGAGGCACACGTGAGGGGATGTCAGACCAAACGA 60
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 1 ATGTTCAACGCGAAGAACGGTTTTTCTGAGGCACACGTGAGGGGATGTCAGACCAAACGA 60
Query 61 CTCACCAAACAGAACTACGCCGAACTTTCTCGATGTGACACGTTGGAAGACATCAAGACG 120
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 61 CTCACCAAACAGAACTACGCCGAACTTTCTCGATGTGACACGTTGGAAGACATCAAGACG 120
Query 121 TACTTGCAAACGATGAGTGATTATTCAGAATATGTTCGTGATCTTCAAGCGCCAGTGAGA 180
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 121 TACTTGCAAACGATGAGTGATTATTCAGAATATGTTCGTGATCTTCAAGCGCCAGTGAGA 180
Query 181 CCGGTTGACATTATTGAATGCTGCAGAAAGAGACAGATCGCAGAGTTTAATATTTGCTGT 240
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 181 CCGGTTGACATTATTGAATGCTGCAGAAAGAGACAGATCGCAGAGTTTAATATTTGCTGT 240
Query 241 CAGCAGGCTTCTTCCCCTTTGTCCAATTTTTTGGAGTATTTGACGTACGGATACATGATC 300
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 241 CAGCAGGCTTCTTCCCCTTTGTCCAATTTTTTGGAGTATTTGACGTACGGATACATGATC 300
Query 301 GATAATCTTGTGTTGGCTTTAAATGGCATGCTTCGTGGACGTACCACAGAGGCAATACTT 360
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 301 GATAATCTTGTGTTGGCTTTAAATGGCATGCTTCGTGGACGTACCACAGAGGCAATACTT 360
Query 361 GAGAAGTGTAGCCCCATTGGTTTTTTCGATTCTTTATCCGCGGTTGTCGTGTCGAGTAGT 420
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 361 GAGAAGTGTAGCCCCATTGGTTTTTTCGATTCTTTATCCGCGGTTGTCGTGTCGAGTAGT 420
Query 421 GTCCAAGAACTCTACAGACTAGCTCTCGTGGATACACCGCTTGCCTCTTATTTCAGTAGC 480
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 421 GTCCAAGAACTCTACAGACTAGCTCTCGTGGATACACCGCTTGCCTCTTATTTCAGTAGC 480
Query 481 TCGATTAAGGCAGAAGATCTGGATGAGTTAAATATTGAGCTCATACGGAACGTCCTATAC 540
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 481 TCGATTAAGGCAGAAGATCTGGATGAGTTAAATATTGAGCTCATACGGAACGTCCTATAC 540
Query 541 AAGGAATATTTGCAAGATTTCATGGTTTTCTGCAACAAAATGGATCAAAACACACGTCAA 600
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 541 AAGGAATATTTGCAAGATTTCATGGTTTTCTGCAACAAAATGGATCAAAACACACGTCAA 600
Query 601 TTGATGGAGAAACTACTTAGCATGGAGGCCGATCGGCACGCGATAAGAATCACACTGAAC 660
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 601 TTGATGGAGAAACTACTTAGCATGGAGGCCGATCGGCACGCGATAAGAATCACACTGAAC 660
Query 661 TCTTTCGGAACAGAGCTTTCCAAGGCTGATCGAAGAAATCTTTATACGAATTTTGGCACC 720
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 661 TCTTTCGGAACAGAGCTTTCCAAGGCTGATCGAAGAAATCTTTATACGAATTTTGGCACC 720
Query 721 ATGTACCCCGATGGCTTCGCGCGTCTTGCGAATTGTGAAACGGTAGATGAAGTGAAACGC 780
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 721 ATGTACCCCGATGGCTTCGCGCGTCTTGCGAATTGTGAAACGGTAGATGAAGTGAAACGC 780
Query 781 ATACTAGTAGCTTATCCAGAATTCAGAGAGTTGACGAAAAGTGATGATCCCCACTACATT 840
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 781 ATACTAGTAGCTTATCCAGAATTCAGAGAGTTGACGAAAAGTGATGATCCCCACTACATT 840
Query 841 GACAGGGGACTACGCGTTCTCGAACTGGAAGCATGTGGACAAGCACTCGATGAGCAATTC 900
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 841 GACAGGGGACTACGCGTTCTCGAACTGGAAGCATGTGGACAAGCACTCGATGAGCAATTC 900
Query 901 AATTTCGCTATCTTTTATGCTTTCGTAAAGTTTCAGGAGAACGAAATAAACAACCTGATG 960
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 901 AATTTCGCTATCTTTTATGCTTTCGTAAAGTTTCAGGAGAACGAAATAAACAACCTGATG 960
Query 961 TGGCTCACTGAGTGTGTTGCTCAAAGGCAAAAAAGTAGTCTAGGCGAGGGCATTGTCTAC 1020
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 961 TGGCTCACTGAGTGTGTTGCTCAAAGGCAAAAAAGTAGTCTAGGCGAGGGCATTGTCTAC 1020
Query 1021 ATACAATAG 1029
|||||||||
Sbjct 1021 ATACAATAG 1029
Lambda K H
1.33 0.621 1.12
Gapped
Lambda K H
1.28 0.460 0.850
Effective search space used: 8286997432
Database: plant26.rna.fna
Posted date: Apr 29, 2010 4:09 PM
Number of letters in database: 8,392,753
Number of sequences in database: 6,747
Matrix: blastn matrix 1 -2
Gap Penalties: Existence: 0, Extension: 0
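5. As you can see, with no extra options the output is almost the same as that of the old BLAST programs, so you can keep using the BioPerl or Biopython parsers you wrote earlier to analyze the results.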
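If you only need the headline numbers and don't want a full parser, a few regular expressions are enough. This is not the BioPerl or Biopython parser, just a stdlib sketch that assumes the Score/Expect and Identities lines look like the ones in the report above; summarize_hsps is a hypothetical name, and it does not group HSPs by query or subject.

```python
import re

# Summary lines of the default pairwise report, e.g.
#  Score = 1901 bits (1029), Expect = 0.0
#  Identities = 1029/1029 (100%), Gaps = 0/1029 (0%)
SCORE_RE = re.compile(
    r"Score\s*=\s*([\d.]+)\s*bits\s*\((\d+)\),\s*Expect(?:\(\d+\))?\s*=\s*([^\s,]+)")
IDENT_RE = re.compile(r"Identities\s*=\s*(\d+)/(\d+)")


def summarize_hsps(report):
    """Return one dict per HSP with bit score, raw score, E-value and identities."""
    hsps = []
    for line in report.splitlines():
        m = SCORE_RE.search(line)
        if m:
            hsps.append({"bits": float(m.group(1)), "score": int(m.group(2)),
                         "evalue": float(m.group(3))})
            continue
        m = IDENT_RE.search(line)
        if m and hsps:
            # Attach identities to the HSP whose Score line came just before.
            hsps[-1]["identities"] = (int(m.group(1)), int(m.group(2)))
    return hsps
```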
6. Now let's add some output options. We use -outfmt 7 (tabular output with comment lines) and pick only the columns we want. Command and output:
blastn -db plant_rna -query test -outfmt "7 qacc sacc evalue length pident"
# BLASTN 2.2.23+
# Query: gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probale (VaoD) mRNA, complete cds
# Database: plant_rna
# Fields: query acc., subject acc., evalue, alignment length, % identity
# 1 hits found
gi|145342129|ref|XM_001416109.1| XM_001416109 0.0 1029 100.00
# BLAST processed 1 queries
The output is self-explanatory: the comment lines tell you clearly what each column means :) Here are a few more output options:
blastn -db plant_rna -query test -outfmt "7 qgi sgi evalue length pident"
# BLASTN 2.2.23+
# Query: gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probale (VaoD) mRNA, complete cds
# Database: plant_rna
# Fields: query gi, subject gi, evalue, alignment length, % identity
# 1 hits found
0 145342129 0.0 1029 100.00
# BLAST processed 1 queries
blastn -db plant_rna -query test -outfmt "7 qid sid evalue length pident"
# BLASTN 2.2.23+
# Query: gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probale (VaoD) mRNA, complete cds
# Database: plant_rna
# Fields: evalue, alignment length, % identity
# 1 hits found
0.0 1029 100.00
# BLAST processed 1 queries
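Notice that in the "qid sid" run the Fields line lists only evalue, alignment length and % identity: 2.2.23+ dropped the two unrecognized names without any error. A quick pre-flight check catches such typos. The sketch below uses a hypothetical, deliberately partial list: the specifiers accepted in the runs in this post plus the standard tabular columns; it is not BLAST+'s complete set.

```python
# Partial list: specifiers accepted in this post's runs plus the standard
# tabular columns. Not the complete set supported by BLAST+.
KNOWN_FIELDS = {
    "qseqid", "qgi", "qacc", "sseqid", "sgi", "sacc", "qframe",
    "pident", "length", "mismatch", "gapopen", "qstart", "qend",
    "sstart", "send", "evalue", "bitscore", "score",
}


def unknown_fields(outfmt):
    """Return specifiers in an -outfmt string that are not in KNOWN_FIELDS."""
    # The first token is the format number (7 here); the rest are columns.
    return [name for name in outfmt.split()[1:] if name not in KNOWN_FIELDS]
```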
blastn -db plant_rna -query test -outfmt "7 qseqid sseqid evalue length pident"
# BLASTN 2.2.23+
# Query: gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probale (VaoD) mRNA, complete cds
# Database: plant_rna
# Fields: query id, subject id, evalue, alignment length, % identity
# 1 hits found
gi|145342129|ref|XM_001416109.1| gi|145342129|ref|XM_001416109.1| 0.0 1029 100.00
# BLAST processed 1 queries
blastn -db plant_rna -query test -outfmt "7 qframe sseqid evalue length pident"
# BLASTN 2.2.23+
# Query: gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probale (VaoD) mRNA, complete cds
# Database: plant_rna
# Fields: query frame, subject id, evalue, alignment length, % identity
# 1 hits found
1 gi|145342129|ref|XM_001416109.1| 0.0 1029 100.00
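Because every -outfmt 7 block carries its own "# Fields:" line, the tabular output is easy to turn into records without knowing the column order in advance. A minimal sketch (parse_outfmt7 is a hypothetical name), assuming the columns are tab-separated in the real file (the listings above show them as spaces) and that no field label contains a comma:

```python
def parse_outfmt7(text):
    """Parse -outfmt 7 output into a list of dicts, one per hit row.

    Column names come from the most recent '# Fields:' comment line,
    so each query block is read with its own header.
    """
    fields, rows = [], []
    for line in text.splitlines():
        if line.startswith("# Fields:"):
            fields = [f.strip() for f in line[len("# Fields:"):].split(",")]
        elif line.startswith("#") or not line.strip():
            continue  # other comment lines and blank lines
        else:
            rows.append(dict(zip(fields, line.split("\t"))))
    return rows
```

Values stay as strings; convert the ones you need, e.g. float(row["evalue"]).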