
Learning data analysis with PNAS: a Python script that classifies the SVs parsed from a minigraph pangenome into different types

小明的数据分析笔记本 • 3 weeks ago • 32 views

Paper

Novel functional sequences uncovered through a bovine multiassembly graph

https://www.pnas.org/doi/10.1073/pnas.2101056118

Code link

https://github.com/AnimalGenomicsETH/bovine-graphs/blob/main/scripts/get_bialsv.py

How the paper describes this part of the method

structural variations were classified as biallelic if two paths were observed in a bubble and multiallelic if a bubble contained more than two paths. The structural variations were further classified into:

Alternate deletion: when the nonreference path was shorter than the reference path (but the reference path has nonzero length) (Is this a typo? Should it read "the nonreference path has nonzero length"?)

complete deletion: when the nonreference path has a length of zero

Alternate insertion: when the nonreference path was longer than the reference path (Shouldn't this also state that the reference path has nonzero length?)

complete insertion: when the reference path has a length of zero and the nonreference path was longer than the reference path
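The four rules quoted above can be written as a small function. This is only a sketch of one reading of the text, not the logic of get_bialsv.py. The function name `classify_biallelic_sv` and the labels other than AltDel are hypothetical. It assumes the "alternate" classes require both paths to have nonzero length, which is the reading suggested by the two questions in parentheses. The equal-length case is not covered by the quoted text, so it is returned as "Unclassified".

```python
def classify_biallelic_sv(ref_len: int, nonref_len: int) -> str:
    """Classify a two-path bubble from the lengths (bp) of its two paths."""
    if ref_len < 0 or nonref_len < 0:
        raise ValueError("path lengths must be non-negative")
    if ref_len == nonref_len:
        # Assumption: the quoted rules say nothing about equal lengths.
        return "Unclassified"
    if nonref_len == 0:
        return "Deletion"   # complete deletion: nonreference path is empty
    if ref_len == 0:
        return "Insertion"  # complete insertion: reference path is empty
    if nonref_len < ref_len:
        return "AltDel"     # alternate deletion: both nonzero, nonref shorter
    return "AltIns"         # alternate insertion: both nonzero, nonref longer


# Made-up path lengths, one per class.
for ref_len, nonref_len in [(120, 0), (0, 85), (300, 40), (40, 300)]:
    print(ref_len, nonref_len, classify_biallelic_sv(ref_len, nonref_len))
```

Checking the zero-length cases first keeps the complete and alternate classes mutually exclusive, whichever way the ambiguity in the quoted wording is resolved.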

There is also a yak pangenome paper in Nature Communications (NC)

Evolutionary origin of genomic structural variations in domestic yaks

https://doi.org/10.1038/s41467-023-41220-x

That paper classifies the case where the ref and nonref path lengths are both nonzero as divergent.

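The script provided with the paper mentioned at the start of this post needs two inputs: a graph_length file and a biallelic bubble file.

The following steps show how to obtain these two files.

First, build the graph pangenome with minigraph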

minigraph --inv no -cxggs -L 5 -t 8 seq1.fa seq2.fa seq3.fa seq4.fa seq5.fa seq6.fa  -o LPA.gfa

Extract the variants with gfatools

gfatools bubble LPA.gfa > LPA_bubble.tsv

Keep the biallelic variants

awk '$5==2 {print $1,$2,$4,$5,$12}' LPA_bubble.tsv > LPA_biallelic_bubble.tsv
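For readers who prefer Python, the awk filter above can be sketched as follows. It relies on the same column layout as the awk command: column 5 is the number of paths in the bubble, and column 12 is, as I understand the gfatools bubble output, the list of segments in the bubble. The function name `filter_biallelic` and the sample line are made up for illustration.

```python
def filter_biallelic(lines):
    """Yield columns 1, 2, 4, 5 and 12 of every bubble with exactly two paths."""
    for line in lines:
        fields = line.split()  # whitespace-delimited, like awk's default
        if len(fields) < 12:
            continue  # skip blank or truncated lines
        if fields[4] == "2":  # column 5: number of paths
            yield " ".join([fields[0], fields[1], fields[3], fields[4], fields[11]])


# Made-up bubble records: one biallelic, one with three paths.
sample = [
    "chr1\t1000\t1500\t3\t2\t0\t0\t250\t0\t0\t0\ts1,s2,s3\t*\tACGT",
    "chr1\t9000\t9800\t5\t3\t0\t10\t400\t0\t0\t0\ts7,s8,s9,s10,s11\tAC\tACGT",
]
for row in filter_biallelic(sample):
    print(row)
```

To run it on the real data, pass an open file handle for LPA_bubble.tsv in place of the sample list.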

Get the graph length file

awk '$1~/S/ {split($5,chr,":");split($6,pos,":");split($7,arr,":");print $2,length($3),chr[3],pos[3],arr[3]}' LPA.gfa > graph_len.tsv
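The same extraction can be sketched in Python, which makes the column handling easier to see. Like the awk command, it assumes each S line carries its tags in the order LN, SN, SO, SR, so that fields 5 to 7 hold the stable sequence name, the offset and the rank. The function name `graph_lengths` and the sample lines are hypothetical.

```python
def graph_lengths(gfa_lines):
    """Yield 'segment length contig offset rank' for every S line of a GFA."""
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "S" or len(fields) < 7:
            continue  # links and other record types are ignored
        name, seq = fields[1], fields[2]
        # Tags look like SN:Z:chr1, SO:i:1000, SR:i:0; keep only the value.
        # maxsplit=2 keeps the whole value even if a contig name contains ':'.
        contig = fields[4].split(":", 2)[2]
        offset = fields[5].split(":", 2)[2]
        rank = fields[6].split(":", 2)[2]
        yield f"{name} {len(seq)} {contig} {offset} {rank}"


# Made-up GFA records: two segments and one link.
sample_gfa = [
    "S\ts1\tACGTACGT\tLN:i:8\tSN:Z:chr1\tSO:i:0\tSR:i:0",
    "S\ts2\tACGT\tLN:i:4\tSN:Z:asm2#ctg5\tSO:i:120\tSR:i:1",
    "L\ts1\t+\ts2\t+\t0M\tSR:i:1\tL1:i:8\tL2:i:4",
]
for row in graph_lengths(sample_gfa):
    print(row)
```

One difference from the awk version: the test `fields[0] != "S"` matches the record type exactly, whereas `$1~/S/` matches any first field that contains the letter S.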

Run the script provided with the paper

python get_bialsv01.py graph_len.tsv LPA_biallelic_bubble.tsv > biallelic_sv.type

Here 106420 is an AltDel. Let's look at this position of the graph pangenome in BandageNG.

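You are welcome to follow my public account, 小明的数据分析笔记本.

The account mainly shares: 1. simple examples of data analysis and data visualization in R and Python; 2. reading notes on the transcriptomics, genomics and population genetics literature for horticultural plants; 3. introductory bioinformatics learning materials and my own study notes.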


Article URL: http://www.python88.com/topic/180693