社区所有版块导航
Python
python开源   Django   Python   DjangoApp   pycharm  
DATA
docker   Elasticsearch  
aigc
aigc   chatgpt  
WEB开发
linux   MongoDB   Redis   DATABASE   NGINX   其他Web框架   web工具   zookeeper   tornado   NoSql   Bootstrap   js   peewee   Git   bottle   IE   MQ   Jquery  
机器学习
机器学习算法  
Python88.com
反馈   公告   社区推广  
产品
短视频  
印度
印度  
Py学习  »  Python

Python开箱测试Visium HD空转数据

生信技能树 • 9 月前 • 411 次点击  

生信随笔写了一个 Visium HD 数据的开箱推文,该推文基于 R 语言。因此,小编也再来基于 python 体系开箱一次。

import jsonimport warningsfrom pathlib import Path
import pandas as pdimport scanpy as scfrom anndata import AnnDatafrom h5py import Filefrom matplotlib.image import imread
sc.settings.set_figure_params(figsize=(5, 5))%config InlineBackend.figure_format = 'retina'

数据下载

下载示例数据,并解压好相应压缩包进行下面的数据探索(解压: `tar -xvf *.tar.gz`)### Colorectal!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer/Visium_HD_Human_Colon_Cancer_cloupe_008um.cloupe!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer/Visium_HD_Human_Colon_Cancer_feature_slice.h5!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer/Visium_HD_Human_Colon_Cancer_metrics_summary.csv!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer/Visium_HD_Human_Colon_Cancer_molecule_info.h5!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer/Visium_HD_Human_Colon_Cancer_spatial.tar.gz!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer/Visium_HD_Human_Colon_Cancer_square_002um_outputs.tar.gz!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer/Visium_HD_Human_Colon_Cancer_square_008um_outputs.tar.gz!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer/Visium_HD_Human_Colon_Cancer_square_016um_outputs.tar.gz
### Mouse Small Intestine
!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Mouse_Small_Intestine/Visium_HD_Mouse_Small_Intestine_cloupe_008um.cloupe!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Mouse_Small_Intestine/Visium_HD_Mouse_Small_Intestine_feature_slice.h5!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Mouse_Small_Intestine/Visium_HD_Mouse_Small_Intestine_metrics_summary.csv!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Mouse_Small_Intestine/Visium_HD_Mouse_Small_Intestine_molecule_info.h5!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Mouse_Small_Intestine/Visium_HD_Mouse_Small_Intestine_spatial.tar.gz!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Mouse_Small_Intestine/Visium_HD_Mouse_Small_Intestine_square_002um_binned_outputs.tar.gz!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Mouse_Small_Intestine/Visium_HD_Mouse_Small_Intestine_square_008um_binned_outputs.tar.gz!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Mouse_Small_Intestine/Visium_HD_Mouse_Small_Intestine_square_016um_binned_outputs.tar.gz

数据读取

下面定义一个函数来读取数据,主要步骤的注释请看代码

def read_visium_hd(        path,        genome = None,        *,        count_file = "filtered_feature_bc_matrix.h5",        library_id = None,        load_images = True,        source_image_path = None):    path = Path(path)    adata = sc.read_10x_h5(path / count_file, genome=genome)


    

adata.uns["spatial"] = dict()
with File(path / count_file, mode="r") as f: attrs = dict(f.attrs) if library_id is None: library_id = attrs.pop("library_ids")[0].decode()
adata.uns["spatial"][library_id] = dict()
if load_images: files = dict( tissue_positions_file=path / "spatial/tissue_positions.parquet", scalefactors_json_file=path / "spatial/scalefactors_json.json", hires_image=path / "spatial/tissue_hires_image.png", lowres_image=path / "spatial/tissue_lowres_image.png", )
# check if files exists, continue if images are missing for f in files.values(): if not f.exists(): if any(x in str(f) for x in ["hires_image", "lowres_image"]): warnings.warn( f"You seem to be missing an image file.\n" f"Could not find '{f}'." ) else: raise OSError(f"Could not find '{f}'")
adata.uns["spatial"][library_id]["images"] = dict() for res in ["hires", "lowres"]: try: adata.uns["spatial"][library_id]["images"][res] = imread( str(files[f"{res}_image"]) ) except Exception: raise OSError(f"Could not find '{res}_image'")
# read json scalefactors adata.uns["spatial"][library_id]["scalefactors"] = json.loads( files["scalefactors_json_file"].read_bytes() )
adata.uns["spatial"][library_id]["metadata"] = { k: (str(attrs[k], "utf-8") if isinstance(attrs[k], bytes) else attrs[k]) for k in ("chemistry_description", "software_version") if k in attrs }
# read coordinates positions = pd.read_parquet(files["tissue_positions_file"]) positions.set_index('barcode', inplace=True) positions.columns = ['in_tissue', 'array_row', 'array_col', 'pxl_col_in_fullres', 'pxl_row_in_fullres']
adata.obs = adata.obs.join(positions, how="left")
adata.obsm["spatial"] = adata.obs[ ["pxl_row_in_fullres", "pxl_col_in_fullres"] ].to_numpy() adata.obs.drop( columns=["pxl_row_in_fullres", "pxl_col_in_fullres"], inplace=True, )
# put image path in uns if source_image_path is not None: # get an absolute path source_image_path = str(Path(source_image_path).resolve()) adata.uns["spatial"][library_id]["metadata"]["source_image_path"] = str( source_image_path )
return adata
from itertools import product
samples = ['Colon_Cancer', 'Small_Intestine']resolutions = ['square_002um', 'square_008um', 'square_016um']combinations = product(samples, resolutions)
for combin in combinations: sample, resolution = combin adata = read_visium_hd(f'./{sample}/{resolution}') adata.var_names_make_unique() adata.write(f'./{sample}_{resolution}.h5ad')
~/miniconda3/envs/official_Test/lib/python3.9/site-packages/anndata/_core/anndata.py:1830: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
utils.warn_names_duplicates("var")

数据探索

首先,我们将结直肠癌 16um 分辨率的数据导入进来

adata = sc.read_h5ad('./Colon_Cancer_square_016um.h5ad')

以成纤维细胞的 marker DCN 为例,查看基因的表达(注意,尚未对该数据进行归一化,正式画基因表达图时,先进行归一化)

sc.settings.set_figure_params(figsize=(8, 8), dpi=100)# img_key=None时,不展示图像sc.pl.spatial(adata, color='DCN', size=2, img_key=None, cmap='Reds')

DCN基因表达

只展示 HE 图像,不展示点

# hiressc.settings.set_figure_params(figsize=(8, 8))sc.pl.spatial(adata, color='DCN', size=0, img_key='hires')

HE图
# 将点overlay在HE图像上sc.settings.set_figure_params(figsize=(8, 8))sc.pl.spatial(adata, color='DCN', size=1.5, img_key='hires')

DCN基因表达与HE

数据质量 8um

为了查看数据质量,我们导入 8um 分辨率的结直肠癌数据,我们且近似将其看做单细胞数据

adata = sc.read_h5ad('./Colon_Cancer_square_008um.h5ad')
adata

关于 Anndata 的数据结构,我们这里不赘述,详情请观看 Bilibili 相配套的视频教程(https://space.bilibili.com/51135340)

AnnData object with n_obs × n_vars = 545913 × 18085
obs: 'in_tissue', 'array_row', 'array_col'
var: 'gene_ids', 'feature_types', 'genome'
uns: 'spatial'
obsm: 'spatial'
# 计算QC指标adata.var["mt"] = adata.var_names.str.startswith("MT-")sc.pp.calculate_qc_metrics(    adata, qc_vars=["mt"], percent_top=None, log1p=False, inplace=True)
adata
AnnData object with n_obs × n_vars = 545913 × 18085
obs: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt'
var: 'gene_ids', 'feature_types', 'genome', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts'
uns: 'spatial'
obsm: 'spatial'
sc.settings.set_figure_params(figsize=(4, 4), dpi=80)sc.pl.violin(    adata,    ["n_genes_by_counts", "total_counts", "pct_counts_mt"],    jitter=0.4,    multi_panel=True,    size=0)

过滤前,中位基因数为 299,中位 UMI 为 380。

结直肠癌Visium HD数据QC图
adata.obs['n_genes_by_counts'].plot(kind='density')

结直肠癌Visium HD基因数目分布图
sc.pl.scatter(adata, x="total_counts", y="pct_counts_mt")sc.pl.scatter(adata, x="total_counts", y="n_genes_by_counts")

数据分布,整体上来说还算正常

total count VS. pct count

total count VS. n genes

Next

如果你习惯 R 语言并且想探索 Visium HD 的数据,我们建议您参考推文全网首发 | Visium HD 空转数据开箱测试

作为亚细胞分辨率的数据,我们更推荐你采用细胞分割构建真实空间单细胞数据的策略,关于这方面,我们建议您参考:

  1. 亚细胞精度空间转录组细胞分割
  2. 亚细胞精度空间转录组 RNA 分割
  3. 单细胞空间域分析详细教程
  4. 真正单细胞空间转录组数据的受配体分析教程

Visium HD 的真正单细胞分析和上述推文可能存在一定的 gap,敬请关注后续推文,我们将持续更新!

Python社区是高质量的Python/Django开发社区
本文地址:http://www.python88.com/topic/168908
 
411 次点击