
[BLAST] Summarizing local BLAST results for multiple sequences

김해김씨99대손 2022. 12. 20. 14:54

- Updated 2023.04.12

 

 

 

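I wanted to look at the BLAST results for the entries that came up as NA in the genera Streptococcus and Staphylococcus. There were as many as 1,000 Streptococcus NA entries, and I wondered how long it would take to BLAST them one at a time, but it turns out you can simply convert them to a FASTA file and run them all in one search. Let's try it first with the example sequences below.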
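
As a minimal sketch of that conversion step, assuming the NA records sit in a tab-separated table with the ID in column 1 and the sequence in column 2 (the file name na_sequences.tsv and the toy rows are hypothetical, not the actual data):

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)

# Hypothetical input: one record per line, "<id><TAB><sequence>".
printf 'seq1\tACGTACGT\nseq2\tGGCCTTAA\n' > "$workdir/na_sequences.tsv"

# Turn each row into a two-line FASTA record: ">id" followed by the sequence.
awk -F'\t' '{ print ">" $1; print $2 }' "$workdir/na_sequences.tsv" > "$workdir/na_sequences.fa"

cat "$workdir/na_sequences.fa"
```

The resulting multi-FASTA file can then be passed to blastn as a single query file, so every sequence is searched in one run.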

 

 

🟦 Running BLAST

| Example file

- The file contains two sequences with the IDs NR_025000.1 and NR_0250002 (the latter is a made-up sequence).

16S_query2.fa

>NR_025000.1 Mycobacterium kubicae strain CDC 941078 16S ribosomal RNA, partial sequence
GTGCTTAACACATGCAAGTCGAACGGAAAGGCCCCTTCGGGGGTACTCGAGTGGCGAACGGGTGAGTAACACGTGGGTGA
TCTACCCTGCACTTCGGGATAAGCCTGGGAAACTGGGTCTAATACCGGATAGGACCATGAGATGCATGTCTTATGGTGGA
AAGCTTTTGCGGTGTGGGATGGGCCCGCGGCCTATCAGCTTGTTGGTGGGGTGACGGCCTACCAAGGCGACGACGGGTAG
CCGGCCTGAGAGGGTGTCCGGCCACACTGGGACTGAGATACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTG
CACAATGGGCGCAAGCCTGATGCAGCGACGCCGCGTGGGGGATGACGGCCTTCGGGTTGTAAACCTCTTTCAGCAGGGAC
GAAGCGCAAGTGACGGTACCTGCAGAAGAAGCACCGGCCAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCGAGC
GTTGTCCGGAATTACTGGGCGTAAAGAGCTCGTAGGTGGTTTGTCGCGTTGTTCGTGAAAACCGGGGGCTTAACCCTCGG
CGTGCGGGCGATACGGGCAGACTGGAGTACTGCAGGGGAGACTGGAATTCCTGGTGTAGCGGTGGAATGCGCAGATATCA
GGAGGAACACCGGTGGCGAAGGCGGGTCTCTGGGCAGTAACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACAGGATT
AGATACCCTGGTAGTCCACGCCGTAAACGGTGGGTACTAGGTGTGGGTTTCCTTCCTTGGGATCCGTGCCGTAGCTAACG
CATTAAGTACCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAG
CATGTGGATTAATTCGATGCAACGCGAAGAACCTTACCTGGGTTTGACATGCACAGGACGCGTCTAGAGATAGGCGTTCC
CTTGTGGCCTGTGTGCAGGTGGTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGC
AACCCTTGTCTCATGTTGCCAGCGGGTAATGCCGGGGACTCGTGAGAGACTGCCGGGGTCAACTCGGAGGAAGGTGGGGA
TGACGTCAAGTCATCATGCCCCTTATGTCCAGGGCTTCACACATGCTACAATGGCCGGTACAAAGGGCTGCGATGCCGCG
AGGTTAAGCGAATCCTTTTAAAGCCGGTCTCAGTTCGGATCGGGGTCTGCAACTCGACCCCGTGAAGTCGGAGTCGCTAG
TAATCGCAGATCAGCAACGCTGCGGTGAATACGTTCCCGGG


>NR_0250002 
GTGCTTAACACATGCAAGTCGAACGGAAAGGCCCCTTCGGGGGTACTCGAGTGGCGAACGGGTGAGTAACACGTGGGTGA
TCTACCCTGCACTTCGGGATAAGCCTGGGAAACTGGGTCTAATACCGGATAGGACCATGAGATGCATGTCTTATGGTGGA
AAGCTTTTGCGGTGTGGGATGGGCCCGCGGCCTATCAGCTTGTTGGTGGGGTGACGGCCTACCAAGGCGACGACGGGTAG
CCGGCCTGAGAGGGTGTCCGGCCACACTGGGACTGAGATACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTG
CACAATGGGCGCAAGCCTGATGCAGCGACGCCGCGTGGGGGATGACGGCCTTCGGGTTGTAAACCTCTTTCAGCAGGGAC
GAAGCGCAAGTGACGGTACCTGCAGAAGAAGCACCGGCCAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCGAGC
GTTGTCCGGAATTACTGGGCGTAAAGAGCTCGTAGGTGGTTTGTCGCGTTGTTCGTGAAAACCGGGGGCTTAACCCTCGG
CGTGCGGGCGATACGGGCAGACTGGAGTACTGCAGGGGAGACTGGAATTCCTGGTGTAGCGGTGGAATGCGCAGATATCA
GGAGGAACACCGGTGGCGAAGGCGGGTCTCTGGGCAGTAACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACAGGATT
AGATACCCTGGTAGTCCACGCCGTAAACGGTGGGTACTAGGTGTGGGTTTCCTTCCTTGGGATCCGTGCCGTAGCTAACG
CATTAAGTACCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAG
CATGTGGATTAATTCGATGCAACGCGAAGAACCTTACCTGGGTTTGACATGCACAGGACGCGTCTAGAGATAGGCGTTCC
CTTGTGGCCTGTGTGCAGGTGGTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGC
AACCCTTGTCTCATGTTGCCAGCGGGTAATGCCGGGGACTCGTGAGAGACTGCCGGGGTCAACTCGGAGGAAGGTGGGGA
TGACGTCAAGTCATCATGCCCCTTATGTCCAGGGCTTCACACATGCTACAATGGCCGGTACAAAGGGCTGCGATGCCGCG
AGGTTAAGCGAATCCTTTTAAAGCCGGTCTCAGTTCGGATCGGGGTCTGCAACTCGACCCCGTGAAGTCGGAGTCGCTAG
TAATCGCAGATCAGCAACGCTGCGGTGAATACGTTCCCGTT

 

| Running blastn

- Keep only the top 5 hits per query and save the output to result.csv.

blastn -db ~/Reference/blastdb/16S_ribosomal_RNA -query 16S_query2.fa -task blastn -dust no -outfmt "7 delim=, qacc sacc evalue bitscore qcovus pident sscinames" -max_target_seqs 5 > result.csv

 

The result looks like this.

# BLASTN 2.13.0+
# Query: NR_025000.1 Mycobacterium kubicae strain CDC 941078 16S ribosomal RNA, partial sequence
# Database: /home/ksy/Reference/blastdb/16S_ribosomal_RNA
# Fields: query acc., subject acc., evalue, bit score, % query coverage per uniq subject, % identity, subject sci names
# 5 hits found
NR_025000.1,NR_025000,0.0,2383,100,100.000,Mycobacterium kubicae
NR_025000.1,NR_028940,0.0,2334,100,99.243,Mycobacterium palustre
NR_025000.1,NR_125568,0.0,2320,100,98.940,Mycobacterium europaeum
NR_025000.1,NR_025760,0.0,2302,100,98.637,Mycobacterium parascrofulaceum
NR_025000.1,NR_118110,0.0,2302,100,98.637,Mycobacterium parascrofulaceum ATCC BAA-614
# BLASTN 2.13.0+
# Query: NR_0250002
# Database: /home/ksy/Reference/blastdb/16S_ribosomal_RNA
# Fields: query acc., subject acc., evalue, bit score, % query coverage per uniq subject, % identity, subject sci names
# 5 hits found
NR_0250002,NR_025000,0.0,2379,99,100.000,Mycobacterium kubicae
NR_0250002,NR_028940,0.0,2331,99,99.242,Mycobacterium palustre
NR_0250002,NR_125568,0.0,2316,99,98.939,Mycobacterium europaeum
NR_0250002,NR_025760,0.0,2298,99,98.635,Mycobacterium parascrofulaceum
NR_0250002,NR_118110,0.0,2298,99,98.635,Mycobacterium parascrofulaceum ATCC BAA-614
# BLAST processed 2 queries
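
The lines beginning with # above are comment lines that -outfmt 7 writes around each query's hits. If only the data rows are wanted, one option is -outfmt 10, the plain comma-separated format of BLAST+, sketched here with the same fields. The output name result_plain.csv is a hypothetical choice and the database path is simply the one from the command above; the script only prints the command when blastn or the query file is not available.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Paths are taken from the command above; adjust them to your own setup.
db="$HOME/Reference/blastdb/16S_ribosomal_RNA"
query="16S_query2.fa"

# -outfmt 10 is plain CSV (no "#" comment lines); -out writes the file directly.
# result_plain.csv is a hypothetical output name.
cmd=(blastn -db "$db" -query "$query" -task blastn -dust no
     -outfmt "10 qacc sacc evalue bitscore qcovus pident sscinames"
     -max_target_seqs 5 -out result_plain.csv)

if command -v blastn >/dev/null 2>&1 && [ -f "$query" ]; then
    "${cmd[@]}"
else
    # BLAST+ or the query file is missing: just show what would be run.
    printf '%s\n' "${cmd[*]}"
fi
```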

 

From left to right, the columns are qacc (query accession, i.e. the ID of the searched sequence), sacc (subject accession), evalue, bitscore, qcovus (query coverage per unique subject), pident (percentage of identical matches), and sscinames (subject scientific name(s), separated by a ';').

For a detailed explanation of each field, see https://www.metagenomics.wiki/tools/blast/blastn-output-format-6
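
To make the column order concrete, here is a small sketch that reads the fields by position with awk and prints a one-line summary per hit. The input is a trimmed excerpt of the output above, and the file names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)

# Trimmed excerpt of the output shown above, including "#" comment lines.
cat > "$workdir/result.csv" <<'EOF'
# BLASTN 2.13.0+
# Query: NR_0250002
NR_0250002,NR_025000,0.0,2379,99,100.000,Mycobacterium kubicae
NR_0250002,NR_028940,0.0,2331,99,99.242,Mycobacterium palustre
EOF

# Columns: $1=qacc $2=sacc $3=evalue $4=bitscore $5=qcovus $6=pident $7=sscinames
# Skip comment lines and print a readable one-line summary per hit.
awk -F, '!/^#/ { printf "%s -> %s (%s): identity %s%%, coverage %s%%, bitscore %s\n", $1, $7, $2, $6, $5, $4 }' \
    "$workdir/result.csv" > "$workdir/summary.txt"

cat "$workdir/summary.txt"
```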

 

 

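🟦 Cleaning up the results

| Summarizing the results

What I want is the result file without the # lines. Simply print everything except the lines that start with # (-v is grep's invert-match option). The output is shown below as commented lines.

grep -v '^#' result.csv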

# NR_025000.1,NR_025000,0.0,2383,100,100.000,Mycobacterium kubicae
# NR_025000.1,NR_028940,0.0,2334,100,99.243,Mycobacterium palustre
# NR_025000.1,NR_125568,0.0,2320,100,98.940,Mycobacterium europaeum
# NR_025000.1,NR_025760,0.0,2302,100,98.637,Mycobacterium parascrofulaceum
# NR_025000.1,NR_118110,0.0,2302,100,98.637,Mycobacterium parascrofulaceum ATCC BAA-614
# NR_0250002,NR_025000,0.0,2379,99,100.000,Mycobacterium kubicae
# NR_0250002,NR_028940,0.0,2331,99,99.242,Mycobacterium palustre
# NR_0250002,NR_125568,0.0,2316,99,98.939,Mycobacterium europaeum
# NR_0250002,NR_025760,0.0,2298,99,98.635,Mycobacterium parascrofulaceum
# NR_0250002,NR_118110,0.0,2298,99,98.635,Mycobacterium parascrofulaceum ATCC BAA-614
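
With 1,000 queries even the cleaned file is long, so a natural next step is to keep one row per query. The sketch below adds a header row and keeps the first row seen for each query ID, assuming hits are listed best-first within each query as in the output above. The file names hits.csv and top_hits.csv are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)

# A few of the cleaned rows shown above (already without "#" lines).
cat > "$workdir/hits.csv" <<'EOF'
NR_025000.1,NR_025000,0.0,2383,100,100.000,Mycobacterium kubicae
NR_025000.1,NR_028940,0.0,2334,100,99.243,Mycobacterium palustre
NR_0250002,NR_025000,0.0,2379,99,100.000,Mycobacterium kubicae
NR_0250002,NR_028940,0.0,2331,99,99.242,Mycobacterium palustre
EOF

# Header row in the same order as the -outfmt fields, followed by the first
# (assumed best) row seen for each query ID in column 1.
{
    echo "qacc,sacc,evalue,bitscore,qcovus,pident,sscinames"
    awk -F, '!seen[$1]++' "$workdir/hits.csv"
} > "$workdir/top_hits.csv"

cat "$workdir/top_hits.csv"
```

The `!seen[$1]++` idiom is true only the first time a given query ID appears, so it depends on the row order and does not compare scores itself.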

 

 

 

 


| Reference

https://www.metagenomics.wiki/tools/blast/blastn-output-format-6
