- Updated 2023.04.12
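I wanted to look at the BLAST results for the entries in the genera Streptococcus and Staphylococcus that came up as NA. Streptococcus alone had around 1,000 NA entries, so I wondered how I would ever BLAST them one at a time, but it turns out you can just convert them into a single FASTA file and run them all at once. Let's try it first with the example sequences below.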
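The conversion to FASTA is not shown in this post, so here is a minimal sketch of one way to do it. It assumes the NA entries were exported as a two-column, tab-separated file (ID, then sequence); the file names `na_sequences.tsv` and `na_query.fa` and the toy records are hypothetical.

```shell
# Hypothetical input: one record per line, "ID<TAB>sequence".
printf 'seq1\tACGTACGT\nseq2\tGGCCTTAA\n' > na_sequences.tsv

# For each record, emit a ">ID" header line followed by the sequence line.
awk -F'\t' 'NF >= 2 { print ">" $1; print $2 }' na_sequences.tsv > na_query.fa

cat na_query.fa
```

The resulting file can then be passed to `blastn` through `-query` in the same way as the example file used in this post.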
🟦 Running BLAST
| Example file
- The file contains two sequences, with IDs NR_025000.1 and NR_0250002 (the second is a made-up sequence).
>NR_025000.1 Mycobacterium kubicae strain CDC 941078 16S ribosomal RNA, partial sequence
GTGCTTAACACATGCAAGTCGAACGGAAAGGCCCCTTCGGGGGTACTCGAGTGGCGAACGGGTGAGTAACACGTGGGTGA
TCTACCCTGCACTTCGGGATAAGCCTGGGAAACTGGGTCTAATACCGGATAGGACCATGAGATGCATGTCTTATGGTGGA
AAGCTTTTGCGGTGTGGGATGGGCCCGCGGCCTATCAGCTTGTTGGTGGGGTGACGGCCTACCAAGGCGACGACGGGTAG
CCGGCCTGAGAGGGTGTCCGGCCACACTGGGACTGAGATACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTG
CACAATGGGCGCAAGCCTGATGCAGCGACGCCGCGTGGGGGATGACGGCCTTCGGGTTGTAAACCTCTTTCAGCAGGGAC
GAAGCGCAAGTGACGGTACCTGCAGAAGAAGCACCGGCCAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCGAGC
GTTGTCCGGAATTACTGGGCGTAAAGAGCTCGTAGGTGGTTTGTCGCGTTGTTCGTGAAAACCGGGGGCTTAACCCTCGG
CGTGCGGGCGATACGGGCAGACTGGAGTACTGCAGGGGAGACTGGAATTCCTGGTGTAGCGGTGGAATGCGCAGATATCA
GGAGGAACACCGGTGGCGAAGGCGGGTCTCTGGGCAGTAACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACAGGATT
AGATACCCTGGTAGTCCACGCCGTAAACGGTGGGTACTAGGTGTGGGTTTCCTTCCTTGGGATCCGTGCCGTAGCTAACG
CATTAAGTACCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAG
CATGTGGATTAATTCGATGCAACGCGAAGAACCTTACCTGGGTTTGACATGCACAGGACGCGTCTAGAGATAGGCGTTCC
CTTGTGGCCTGTGTGCAGGTGGTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGC
AACCCTTGTCTCATGTTGCCAGCGGGTAATGCCGGGGACTCGTGAGAGACTGCCGGGGTCAACTCGGAGGAAGGTGGGGA
TGACGTCAAGTCATCATGCCCCTTATGTCCAGGGCTTCACACATGCTACAATGGCCGGTACAAAGGGCTGCGATGCCGCG
AGGTTAAGCGAATCCTTTTAAAGCCGGTCTCAGTTCGGATCGGGGTCTGCAACTCGACCCCGTGAAGTCGGAGTCGCTAG
TAATCGCAGATCAGCAACGCTGCGGTGAATACGTTCCCGGG
>NR_0250002
GTGCTTAACACATGCAAGTCGAACGGAAAGGCCCCTTCGGGGGTACTCGAGTGGCGAACGGGTGAGTAACACGTGGGTGA
TCTACCCTGCACTTCGGGATAAGCCTGGGAAACTGGGTCTAATACCGGATAGGACCATGAGATGCATGTCTTATGGTGGA
AAGCTTTTGCGGTGTGGGATGGGCCCGCGGCCTATCAGCTTGTTGGTGGGGTGACGGCCTACCAAGGCGACGACGGGTAG
CCGGCCTGAGAGGGTGTCCGGCCACACTGGGACTGAGATACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTG
CACAATGGGCGCAAGCCTGATGCAGCGACGCCGCGTGGGGGATGACGGCCTTCGGGTTGTAAACCTCTTTCAGCAGGGAC
GAAGCGCAAGTGACGGTACCTGCAGAAGAAGCACCGGCCAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCGAGC
GTTGTCCGGAATTACTGGGCGTAAAGAGCTCGTAGGTGGTTTGTCGCGTTGTTCGTGAAAACCGGGGGCTTAACCCTCGG
CGTGCGGGCGATACGGGCAGACTGGAGTACTGCAGGGGAGACTGGAATTCCTGGTGTAGCGGTGGAATGCGCAGATATCA
GGAGGAACACCGGTGGCGAAGGCGGGTCTCTGGGCAGTAACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACAGGATT
AGATACCCTGGTAGTCCACGCCGTAAACGGTGGGTACTAGGTGTGGGTTTCCTTCCTTGGGATCCGTGCCGTAGCTAACG
CATTAAGTACCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAG
CATGTGGATTAATTCGATGCAACGCGAAGAACCTTACCTGGGTTTGACATGCACAGGACGCGTCTAGAGATAGGCGTTCC
CTTGTGGCCTGTGTGCAGGTGGTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGC
AACCCTTGTCTCATGTTGCCAGCGGGTAATGCCGGGGACTCGTGAGAGACTGCCGGGGTCAACTCGGAGGAAGGTGGGGA
TGACGTCAAGTCATCATGCCCCTTATGTCCAGGGCTTCACACATGCTACAATGGCCGGTACAAAGGGCTGCGATGCCGCG
AGGTTAAGCGAATCCTTTTAAAGCCGGTCTCAGTTCGGATCGGGGTCTGCAACTCGACCCCGTGAAGTCGGAGTCGCTAG
TAATCGCAGATCAGCAACGCTGCGGTGAATACGTTCCCGTT
| Running BLAST
- Keep only the top 5 hits per query and save the output to result.csv.
blastn -db ~/Reference/blastdb/16S_ribosomal_RNA -query 16S_query2.fa -task blastn -dust no -outfmt "7 delim=, qacc sacc evalue bitscore qcovus pident sscinames" -max_target_seqs 5 > result.csv
The result looks like this.
# BLASTN 2.13.0+
# Query: NR_025000.1 Mycobacterium kubicae strain CDC 941078 16S ribosomal RNA, partial sequence
# Database: /home/ksy/Reference/blastdb/16S_ribosomal_RNA
# Fields: query acc., subject acc., evalue, bit score, % query coverage per uniq subject, % identity, subject sci names
# 5 hits found
NR_025000.1,NR_025000,0.0,2383,100,100.000,Mycobacterium kubicae
NR_025000.1,NR_028940,0.0,2334,100,99.243,Mycobacterium palustre
NR_025000.1,NR_125568,0.0,2320,100,98.940,Mycobacterium europaeum
NR_025000.1,NR_025760,0.0,2302,100,98.637,Mycobacterium parascrofulaceum
NR_025000.1,NR_118110,0.0,2302,100,98.637,Mycobacterium parascrofulaceum ATCC BAA-614
# BLASTN 2.13.0+
# Query: NR_0250002
# Database: /home/ksy/Reference/blastdb/16S_ribosomal_RNA
# Fields: query acc., subject acc., evalue, bit score, % query coverage per uniq subject, % identity, subject sci names
# 5 hits found
NR_0250002,NR_025000,0.0,2379,99,100.000,Mycobacterium kubicae
NR_0250002,NR_028940,0.0,2331,99,99.242,Mycobacterium palustre
NR_0250002,NR_125568,0.0,2316,99,98.939,Mycobacterium europaeum
NR_0250002,NR_025760,0.0,2298,99,98.635,Mycobacterium parascrofulaceum
NR_0250002,NR_118110,0.0,2298,99,98.635,Mycobacterium parascrofulaceum ATCC BAA-614
# BLAST processed 2 queries
From left to right, the columns are qacc (query accession, i.e. the ID of the sequence that was searched), sacc (subject accession), evalue, bitscore, qcovus (% query coverage per unique subject), pident (percentage of identical matches), and sscinames (subject scientific name(s), separated by a ';').
For a detailed explanation of each term, see https://www.metagenomics.wiki/tools/blast/blastn-output-format-6
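Because the column order is fixed by the `-outfmt` string, individual fields can be picked out by position. As a rough sketch, the snippet below keeps only hits whose pident (column 6) is at least 99. The 99% cutoff and the file names `sample_hits.csv` and `high_identity.csv` are assumptions for illustration, and the sample rows are copied from the output above.

```shell
# Small stand-in for the BLAST output (rows copied from the result above).
cat > sample_hits.csv <<'EOF'
# Fields: query acc., subject acc., evalue, bit score, % query coverage per uniq subject, % identity, subject sci names
NR_025000.1,NR_025000,0.0,2383,100,100.000,Mycobacterium kubicae
NR_025000.1,NR_028940,0.0,2334,100,99.243,Mycobacterium palustre
NR_025000.1,NR_125568,0.0,2320,100,98.940,Mycobacterium europaeum
EOF

# Skip comment lines, then keep rows where column 6 (pident) is >= 99.
# Print qacc, sacc, pident and sscinames, comma-separated.
awk -F, '!/^#/ && $6 >= 99 { print $1, $2, $6, $7 }' OFS=, sample_hits.csv > high_identity.csv

cat high_identity.csv
```

With the three sample rows, only the M. kubicae and M. palustre hits pass the cutoff.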
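🟦 Cleaning up the results
| Summarizing the results
What I actually want is the result file without the # comment lines. Simply print everything except the lines that start with # (-v is the invert-match option).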
grep -v '^#' result.csv
# NR_025000.1,NR_025000,0.0,2383,100,100.000,Mycobacterium kubicae
# NR_025000.1,NR_028940,0.0,2334,100,99.243,Mycobacterium palustre
# NR_025000.1,NR_125568,0.0,2320,100,98.940,Mycobacterium europaeum
# NR_025000.1,NR_025760,0.0,2302,100,98.637,Mycobacterium parascrofulaceum
# NR_025000.1,NR_118110,0.0,2302,100,98.637,Mycobacterium parascrofulaceum ATCC BAA-614
# NR_0250002,NR_025000,0.0,2379,99,100.000,Mycobacterium kubicae
# NR_0250002,NR_028940,0.0,2331,99,99.242,Mycobacterium palustre
# NR_0250002,NR_125568,0.0,2316,99,98.939,Mycobacterium europaeum
# NR_0250002,NR_025760,0.0,2298,99,98.635,Mycobacterium parascrofulaceum
# NR_0250002,NR_118110,0.0,2298,99,98.635,Mycobacterium parascrofulaceum ATCC BAA-614
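With around 1,000 queries, even five rows per query is a lot to read. One possible follow-up, sketched here, is to keep only the first row for each query ID, relying on BLAST normally listing each query's hits from best to worst. The file names `sample_result.csv` and `top_hits.csv` are hypothetical, and the sample rows are a shortened copy of the output above.

```shell
# Small stand-in for result.csv (shortened copy of the output above).
cat > sample_result.csv <<'EOF'
# BLASTN 2.13.0+
# 5 hits found
NR_025000.1,NR_025000,0.0,2383,100,100.000,Mycobacterium kubicae
NR_025000.1,NR_028940,0.0,2334,100,99.243,Mycobacterium palustre
# BLASTN 2.13.0+
# 5 hits found
NR_0250002,NR_025000,0.0,2379,99,100.000,Mycobacterium kubicae
NR_0250002,NR_028940,0.0,2331,99,99.242,Mycobacterium palustre
EOF

# Drop comment lines, then keep only the first row seen for each query ID (column 1).
grep -v '^#' sample_result.csv | awk -F, '!seen[$1]++' > top_hits.csv

cat top_hits.csv
```

This leaves one line per query, which is easier to scan or to load into a spreadsheet.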
| References
https://www.metagenomics.wiki/tools/blast/blastn-output-format-6