计算BAM文件中,单个位点的ATCG的read数量和coverage

阅读: 评论:0

计算BAM文件中,单个位点的ATCG的read数量和coverage

计算BAM文件中,单个位点的ATCG的read数量和coverage

计算BAM文件中,单个位点的ATCG的read数量和coverage

import pandas as pd
import os
import pysam
import sys# usage:
# python bam2coverage.py bamfile posfile output
#
# bamfile:
# posfile: column1, chrom; column2, position
# output:
#
# position is starting with 1bamfile = sys.argv[1]
posfile = sys.argv[2]
outfile = sys.argv[3]def bam2matrix_single_cpu(bam, position, outfile ):tb_out = pd.DataFrame(columns=['chrom', 'pos', 'A', 'T', 'C', 'G', 'coverage'])samfile = pysam.AlignmentFile(bam, "rb")for idx, row in position.iterrows():bases = {"A": 0, "G": 0, "C": 0, "T": 0}chrom_ = row[0]pos_a = row[1] - 1pos_b = row[1]for pileupcolumn in samfile.pileup(chrom_, pos_a, pos_b, truncate=True):# 测序质量高于30的碱基for pileupread in [al for al in pileupcolumn.pileups if al.alignment.mapq > 30]:if not pileupread.is_del and not pileupread.is_refskip:base = pileupread.alignment.query_sequence[pileupread.query_position]try:bases[base] += 1except:passcov = sum(list(bases.values()))tb_out = tb_out.append({'chrom': chrom_,'pos': row[1],'A': int(bases['A']),'T': bases['T'],'G': bases['G'],'C': bases['C'],'coverage': cov}, ignore_index=True)samfile.close()_csv(outfile, index=False, sep='t')position = pd.read_table(posfile)
bam2matrix_single_cpu(bam=bamfile,position=position,outfile=outfile)

例如想要知道如下几个位点的coverage情况,则建立tab分隔的文件如下(第一列为染色体号,第二列为坐标):

chrom	pos
chr1	10000
chr2	20000
chr3	30000

执行脚本:

python myscript.py	 

本文发布于:2024-01-30 19:01:11,感谢您对本站的认可!

本文链接:https://www.4u4v.net/it/170661247422157.html

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。

标签:数量   文件   位点   BAM   coverage
留言与评论(共有 0 条评论)
   
验证码:

Copyright ©2019-2022 Comsenz Inc.Powered by ©

网站地图1 网站地图2 网站地图3 网站地图4 网站地图5 网站地图6 网站地图7 网站地图8 网站地图9 网站地图10 网站地图11 网站地图12 网站地图13 网站地图14 网站地图15 网站地图16 网站地图17 网站地图18 网站地图19 网站地图20 网站地图21 网站地图22/a> 网站地图23