Posts in the 'Bioinformatics' category (page 17)

What is PERMANOVA (permutational multivariate analysis of variance)?

2022.05.23· Bioinformatics/└ Other

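Updated 24.10.15: example data and analysis code added.
1. What is beta diversity?
In microbiome research, beta diversity is a step in analyzing the diversity of microbial communities. It is obtained by measuring the differences between individual samples, or between groups of samples (e.g., a disease group vs. healthy controls). The most common approach uses the similarity or dissimilarity between the compared samples or groups; microbial abundances and the phylogenetic distances between taxa are also used.
2. Beta diversity analysis methods
The most representative approaches fall into four types:
- Bray-Curtis dissimilarity: based on differences in microbial abundance distributions between samples
- Jaccard index: based on the presence/absence of microbes between groups
- weighted..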
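To make the two distance measures and the PERMANOVA test concrete, here is a minimal standard-library Python sketch: Bray-Curtis dissimilarity, Jaccard distance on presence/absence, and a one-factor PERMANOVA pseudo-F, F = [SS_A/(a-1)] / [SS_W/(N-a)], computed from squared distances, with a permutation p-value. The function names and the toy abundance table are hypothetical illustrations; in practice you would use an established implementation such as `permanova` in scikit-bio or `adonis2` in R's vegan package.

```python
import random
from itertools import combinations


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0


def jaccard_distance(x, y):
    """1 - |shared taxa| / |taxa present in either sample| (presence/absence only)."""
    a = {i for i, v in enumerate(x) if v > 0}
    b = {i for i, v in enumerate(y) if v > 0}
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def distance_matrix(samples, metric):
    """Square matrix of pairwise distances between samples."""
    n = len(samples)
    return [[metric(samples[i], samples[j]) for j in range(n)] for i in range(n)]


def pseudo_f(dist, groups):
    """One-factor PERMANOVA pseudo-F from squared pairwise distances."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    if ss_within == 0:
        return float("inf")  # perfectly tight groups
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, groups, permutations=999, seed=0):
    """Return (pseudo-F, p-value), where p is estimated by shuffling group labels."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, groups)
    labels = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        if pseudo_f(dist, labels) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (permutations + 1)


# Hypothetical toy abundance table: rows = samples, columns = taxa.
samples = [[10, 0, 1], [9, 1, 0], [11, 0, 0], [0, 10, 1], [1, 9, 0], [0, 11, 1]]
groups = ["disease"] * 3 + ["control"] * 3
dm = distance_matrix(samples, bray_curtis)
f_stat, p_value = permanova(dm, groups)
print(f"pseudo-F = {f_stat:.2f}, p = {p_value:.3f}")
```

With only six samples there are just 20 ways to split them into two groups of three, so the permutation p-value cannot get much below about 0.1 however well the groups separate; PERMANOVA needs reasonably sized groups to be informative.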

What is ANOSIM (Analysis of similarities)?

2022.05.19· Bioinformatics/└ Other

To test statistical significance in QIIME2, I ran diversity beta-group-significance as follows:
qiime diversity beta-group-significance \
  --i-distance-matrix ~ \
  --m-metadata-file ~ \
  --m-metadata-column ~ \
  --p-method anosim \
  --output-dir ~
The output of this command is shown below. Let's look at how to interpret it.
ANOSIM
📌 What is ANOSIM? (Wikipedia)
Analysis of similarities (ANOSIM) is a non-parametric statistical test widely used in the field ..
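ANOSIM's test statistic R compares the mean rank of between-group distances (r_B) with the mean rank of within-group distances (r_W): R = (r_B - r_W) / (M/2), where M = n(n - 1)/2 is the number of sample pairs. R close to 1 means samples are more similar within groups than between them; R close to 0 means no separation. The sketch below is a hypothetical standard-library implementation for illustration only (QIIME2 runs the test for you with `--p-method anosim`); significance again comes from permuting the group labels.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def anosim_r(dist, groups):
    """R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    between = [r for (i, j), r in zip(pairs, ranks) if groups[i] != groups[j]]
    within = [r for (i, j), r in zip(pairs, ranks) if groups[i] == groups[j]]
    m = len(pairs)
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)


def anosim(dist, groups, permutations=999, seed=0):
    """Return (R, p-value), where p is estimated by shuffling group labels."""
    rng = random.Random(seed)
    r_obs = anosim_r(dist, groups)
    labels = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        if anosim_r(dist, labels) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical distance matrix: two tight, well-separated groups.
dm = [[0, 0.1, 0.9, 0.9], [0.1, 0, 0.9, 0.9], [0.9, 0.9, 0, 0.1], [0.9, 0.9, 0.1, 0]]
r_stat, p_value = anosim(dm, ["a", "a", "b", "b"])
print(f"R = {r_stat:.2f}, p = {p_value:.3f}")
```

Because R uses only the ranks of the distances, it is insensitive to their absolute size, which is what makes ANOSIM non-parametric.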

[QIIME2] Summary of qiime tools import types

2022.05.17· Bioinformatics/└ Qiime2

qiime tools import --show-importable-types
Bowtie2Index
DeblurStats
DistanceMatrix
EMPPairedEndSequences
EMPSingleEndSequences
ErrorCorrectionDetails
FeatureData[AlignedProteinSequence]
FeatureData[AlignedRNASequence]
FeatureData[AlignedSequence]
FeatureData[BLAST6]
FeatureData[Differential]
FeatureData[Importance]
FeatureData[PairedEndRNASequence]
FeatureData[PairedEndSequence]
Fe..

[QIIME2 Tutorial] “Moving Pictures” (1)

2022.05.16· Bioinformatics/└ Qiime2

Updated 2024.04.18. When you start studying the microbiome, learning to use QIIME2 is probably the first thing you do. Let's go through each step of the Moving Pictures tutorial in detail.🙉
Inspecting the analysis data
- QIIME tutorial homepage: https://docs.qiime2.org/2024.2/tutorials/moving-pictures/
- Related video: https://www.youtube.com/watch?v=RcdTZE8VbJg&list=PLOCEVoX6zu2Ii8RD7i9Oi7Pbot_5WF08n
The data used in QIIME2's Moving Pictures tutorial are human microbiome data. These data, which concern antibiotic use, come from two ..

Differences between depth, coverage, sequencing depth, depth of coverage, and breadth of coverage

2022.05.16· Bioinformatics/Sequencing data

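Depth: the number of times a base is sequenced at a given nucleotide position.
Coverage: how much of the reference the sequence reads align to.
When 6 reads (188 nt in total) align as in the figure above, coverage can be viewed from three perspectives:
1) Whole-genome perspective: 188 nt of reads over a 112 nt genome → 188/112 ≈ 1.68-fold coverage.
2) Perspective of the 46 nt that were mapped: 188/46 ≈ 4.09-fold. In addition, because all 6 reads share CTGTGCAATTGCTGA (15 nt), this can be reported as 15/46 ≈ 32.6% coverage at 6x depth.
3) Perspective of a single base (G): G is covered by 6 reads..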
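All of these perspectives can be computed from a per-position depth profile. Below is a minimal Python sketch, assuming each read is represented by a hypothetical (start, length) pair on the reference (0-based) and ignoring mismatches and indels; real pipelines usually compute depth from BAM files with tools such as `samtools depth`.

```python
def depth_profile(ref_len, reads):
    """Per-position depth from reads given as (start, length), 0-based."""
    depth = [0] * ref_len
    for start, length in reads:
        # Clip reads that hang off either end of the reference.
        for pos in range(max(start, 0), min(start + length, ref_len)):
            depth[pos] += 1
    return depth


def coverage_summary(depth, min_depth=1):
    """Summarise a depth profile from the perspectives described in the text."""
    total_bases = sum(depth)
    covered = sum(1 for d in depth if d > 0)
    deep = sum(1 for d in depth if d > 0 and d >= min_depth)
    return {
        # aligned bases / reference length (like 188/112)
        "mean_depth_genome": total_bases / len(depth),
        # aligned bases / positions hit by at least one read (like 188/46)
        "mean_depth_covered": total_bases / covered if covered else 0.0,
        # fraction of the reference hit by at least one read (breadth of coverage)
        "breadth": covered / len(depth),
        # fraction of covered positions reaching min_depth (like 15/46 at 6x)
        "covered_at_min_depth": deep / covered if covered else 0.0,
    }


# Hypothetical example: two overlapping 4 nt reads on a 10 nt reference.
depth = depth_profile(10, [(0, 4), (2, 4)])
print(depth)  # [1, 1, 2, 2, 1, 1, 0, 0, 0, 0]
print(coverage_summary(depth, min_depth=2))
```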

Differences between Alignment and Assignment

2022.05.13· Bioinformatics/Sequencing data

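Sequence alignment
Arranging sequenced sequences so that their common (similar) regions line up. There are three approaches, as shown below: (a) aligns over the entire length of the sequences (global alignment), while (c) focuses on the narrower, more similar regions (local alignment). Alignment software includes ClustalW2 and BLAST.
Taxonomy assignment
Using our own sequences together with the sequences and taxonomy information in a reference database to identify which genus or species each of our sequences belongs to.
Reference
- https://en.wikipedia.org/wiki/Sequence_alignment
- Ahmed, N., Lé..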
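The difference between global alignment like (a) and local alignment like (c) shows up directly in the dynamic-programming recurrences. Here is a minimal, score-only Python sketch of Needleman-Wunsch (global) and Smith-Waterman (local) with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions; tools such as BLAST use substitution matrices, affine gap penalties, and heuristics instead.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score: every base of both sequences is aligned or gapped."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Local alignment score: best-scoring pair of substrings (cells floor at 0)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


# Only the shared core "ACG" matches; the global score is dragged down by the flanks.
print(needleman_wunsch("TTACGTT", "GGACGGG"))
print(smith_waterman("TTACGTT", "GGACGGG"))
```

The only structural difference is the `0` in the local recurrence: it lets Smith-Waterman discard poorly matching flanks and restart, which is why it finds the similar core region.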

[Rosalind] Computing GC Content

2022.05.12· Bioinformatics/Rosalind

Problem
The GC content of the DNA string "AGCTATAG" is 3/8 × 100 = 37.5%. Given DNA strings in FASTA format, return the ID of the string with the highest GC content, followed by its GC content.
Sample Dataset
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCTATATCCATTTGTCAGCAGACACG
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGACTGGGAACCTGCGGGCAGT..
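The task reduces to parsing FASTA records and counting G and C. A minimal Python sketch follows; it assumes well-formed FASTA input (sequences may span several lines), and the function names and example records are illustrative.

```python
def parse_fasta(text):
    """Return {record_id: sequence}, joining sequences that span several lines."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {rid: "".join(parts) for rid, parts in records.items()}


def gc_content(seq):
    """Percentage of G and C bases in seq."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def highest_gc(text):
    """Return (record_id, gc_percent) for the record with the highest GC content."""
    records = parse_fasta(text)
    best = max(records, key=lambda rid: gc_content(records[rid]))
    return best, gc_content(records[best])


print(gc_content("AGCTATAG"))  # 37.5
rid, gc = highest_gc(">seq_a\nAAAT\nAT\n>seq_b\nGCGA\n")
print(rid)          # seq_b
print(f"{gc:.6f}")  # 75.000000
```

For the actual problem you would pass the contents of the downloaded dataset file to `highest_gc` and print the ID and percentage on separate lines as above.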

Differences between Merge, Assembly, and Binning

2022.05.11· Bioinformatics/Sequencing data

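Updated 2024.01.26
What is merging?
- An Illumina sequencer reads each fragment from both ends, producing paired-end output. Merging is the process of joining the two reads through their overlapping region into one complete sequence.
- The reads before merging are called forward and reverse reads; the sequences after merging are also called reads or sequences.
- This step is not needed when using a long-read sequencer.
What is assembly?
- Assembly is the process of joining merged or unmerged reads into longer sequences.
- In the first step, assembly is performed from the reads, and the longer sequences it produces are called contigs.
- In the second step..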
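The overlap-based merge described above can be sketched in a few lines of Python. This is a simplified, hypothetical illustration: it reverse-complements the reverse read, tries overlaps from longest to shortest, accepts the first one with at most `max_mismatches` differences, and keeps the forward read's base wherever the two reads disagree. Real mergers (for example DADA2's `mergePairs`, VSEARCH, or PEAR) also use base quality scores and handle reads that run past each other, which this sketch ignores.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward, reverse, min_overlap=5, max_mismatches=1):
    """Merge a paired-end read pair through its overlap; return None if none is found."""
    rc = reverse_complement(reverse)  # put the reverse read on the forward strand
    for overlap in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        f_tail = forward[-overlap:]
        r_head = rc[:overlap]
        mismatches = sum(1 for x, y in zip(f_tail, r_head) if x != y)
        if mismatches <= max_mismatches:
            # Keep the forward bases in the overlap, then append the rest of rc.
            return forward + rc[overlap:]
    return None


# Hypothetical 13 nt fragment read as two 9 nt reads that overlap by 5 nt.
print(merge_pair("ACGTTGCAA", "AGCCTTGCA"))  # ACGTTGCAAGGCT
```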

[QIIME2] Installing Miniconda3 and QIIME2 on Windows 10 WSL (Ubuntu)

2022.05.10· Bioinformatics/└ Qiime2

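Updated 23.04.25
Video: installing QIIME2 with WSL
I basically followed the YouTube video above, but I still ran into errors, and the installation only succeeded on the fourth try. Even after installing, closing the terminal and reopening it left the conda command not working. After a lot of trial and error, the installation was finally complete.
1. Preparation
Change Windows settings
1) Press Win+R, type "OptionalFeatures", and confirm.
2) Check the boxes for Windows Subsystem for Linux and Virtual Machine Platform.
Download Ubuntu
1) Download the WSL version of Ubuntu (22.04) from the Microsoft Store.
2) Wait for the installation to finish, then enter a username and password.
3) Upgrade Ubuntu:
sudo apt-get update
sudo apt-get upgrade
2. Install Miniconda
Minico..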

[Rosalind] Translating RNA into Protein

2022.05.01· Bioinformatics/Rosalind

Problem
The 20 commonly occurring amino acids are abbreviated by using 20 letters from the English alphabet (all letters except for B, J, O, U, X, and Z). Protein strings are constructed from these 20 symbols. Henceforth, the term genetic string will incorporate protein strings along with DNA strings and RNA strings. The RNA codon table dictates the details regarding the encoding of specific cod..
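The codon-table lookup can be sketched in Python. Rather than typing out all 64 codons, this sketch builds the standard genetic code from the common compact 64-letter string, with codons enumerated in U, C, A, G order (third base varying fastest) and `*` marking stop codons; that shortcut is an illustrative choice, not part of the Rosalind problem statement. Translation proceeds codon by codon from the first base and stops at the first stop codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; codons enumerated UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(rna):
    """Translate an mRNA string from its first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGGCCAUGUGA"))  # MAM
```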

[Rosalind] Finding a Motif in DNA (Python/R)

2022.05.01· Bioinformatics/Rosalind

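Updated 2023.06.07: R solution added
Problem
Given two sequences, find every position where the shorter one occurs as a substring of the longer one, and print those positions (from left to right, 1-based).
Example data and result
Sample Dataset
GATATATGCATATACTT
ATAT
Sample Output
2 4 10
Python
with open('rosalind_subs.txt', 'r') as f:
    s = f.readline().strip()  # longer sequence; strip() removes the trailing '\n'
    t = f.readline().strip()  # motif
for i in range(len(s) - len(t) + 1):
    if s[i:i + len(t)] == t:
        print(i + 1, end=" ")  # Rosalind positions are 1-based
Most upvoted answer by Leandro Lim..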

[Rosalind] Counting Point Mutations (Python/R)

2022.04.21· Bioinformatics/Rosalind

Updated 2023.06.07: R solution added
https://rosalind.info/problems/hamm/ (this problem only unlocks after you solve the preceding problems)
Problem
Given two DNA sequences of equal length, at how many positions do they differ?
Example data and result
Sample Dataset
GAGCCTACTAACGGGAT
CATCGTAATGACGGCCT
Sample Output
7
Python
My solution
with open('rosalind_hamm.txt', 'r') as f:
    s = f.readline().strip()  # strip() removes the trailing '\n'
    t = f.readline().strip()
count = 0
for i in range(len(s)):
    if s[i] != t[i]:
        count += 1
print(count)
Most upvoted solution ..
