Py学习  »  Python

Python开箱测试Visium HD空转数据

生信技能树 • 7 月前 • 334 次点击  

生信随笔写了一个 Visium HD 数据的开箱推文,该推文基于 R 语言。因此,小编也再来基于 python 体系开箱一次。

import jsonimport warningsfrom pathlib import Path
import pandas as pdimport scanpy as scfrom anndata import AnnDatafrom h5py import Filefrom matplotlib.image import imread
sc.settings.set_figure_params(figsize=(5, 5))%config InlineBackend.figure_format = 'retina'

数据下载

下载示例数据,并解压好相应压缩包进行下面的数据探索(解压: `tar -xvf *.tar.gz`)### Colorectal!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer/Visium_HD_Human_Colon_Cancer_cloupe_008um.cloupe!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer/Visium_HD_Human_Colon_Cancer_feature_slice.h5!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer/Visium_HD_Human_Colon_Cancer_metrics_summary.csv!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer/Visium_HD_Human_Colon_Cancer_molecule_info.h5!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer/Visium_HD_Human_Colon_Cancer_spatial.tar.gz!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer/Visium_HD_Human_Colon_Cancer_square_002um_outputs.tar.gz!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer/Visium_HD_Human_Colon_Cancer_square_008um_outputs.tar.gz!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer/Visium_HD_Human_Colon_Cancer_square_016um_outputs.tar.gz
### Mouse Small Intestine
!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Mouse_Small_Intestine/Visium_HD_Mouse_Small_Intestine_cloupe_008um.cloupe!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Mouse_Small_Intestine/Visium_HD_Mouse_Small_Intestine_feature_slice.h5!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Mouse_Small_Intestine/Visium_HD_Mouse_Small_Intestine_metrics_summary.csv!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Mouse_Small_Intestine/Visium_HD_Mouse_Small_Intestine_molecule_info.h5!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Mouse_Small_Intestine/Visium_HD_Mouse_Small_Intestine_spatial.tar.gz!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Mouse_Small_Intestine/Visium_HD_Mouse_Small_Intestine_square_002um_binned_outputs.tar.gz!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Mouse_Small_Intestine/Visium_HD_Mouse_Small_Intestine_square_008um_binned_outputs.tar.gz!wget https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Mouse_Small_Intestine/Visium_HD_Mouse_Small_Intestine_square_016um_binned_outputs.tar.gz

数据读取

下面定义一个函数来读取数据,主要步骤的注释请看代码

def read_visium_hd(        path,        genome = None,        *,        count_file = "filtered_feature_bc_matrix.h5",        library_id = None,        load_images = True,        source_image_path = None):    path = Path(path)    adata = sc.read_10x_h5(path / count_file, genome=genome)


    

adata.uns["spatial"] = dict()
with File(path / count_file, mode="r") as f: attrs = dict(f.attrs) if library_id is None: library_id = attrs.pop("library_ids")[0].decode()
adata.uns["spatial"][library_id] = dict()
if load_images: files = dict( tissue_positions_file=path / "spatial/tissue_positions.parquet", scalefactors_json_file=path / "spatial/scalefactors_json.json", hires_image=path / "spatial/tissue_hires_image.png", lowres_image=path / "spatial/tissue_lowres_image.png", )
# check if files exists, continue if images are missing for f in files.values(): if not f.exists(): if any(x in str(f) for x in ["hires_image", "lowres_image"]): warnings.warn( f"You seem to be missing an image file.\n" f"Could not find '{f}'." ) else: raise OSError(f"Could not find '{f}'")
adata.uns["spatial"][library_id]["images"] = dict() for res in ["hires", "lowres"]: try: adata.uns["spatial"][library_id]["images"][res] = imread( str(files[f"{res}_image"]) ) except Exception: raise OSError(f"Could not find '{res}_image'")
# read json scalefactors adata.uns["spatial"][library_id]["scalefactors"] = json.loads( files["scalefactors_json_file"].read_bytes() )
adata.uns["spatial"][library_id]["metadata"] = { k: (str(attrs[k], "utf-8") if isinstance(attrs[k], bytes) else attrs[k]) for k in ("chemistry_description", "software_version") if k in attrs }
# read coordinates positions = pd.read_parquet(files["tissue_positions_file"]) positions.set_index('barcode', inplace=True) positions.columns = ['in_tissue', 'array_row', 'array_col', 'pxl_col_in_fullres', 'pxl_row_in_fullres']
adata.obs = adata.obs.join(positions, how="left")
adata.obsm["spatial"] = adata.obs[ ["pxl_row_in_fullres", "pxl_col_in_fullres"] ].to_numpy() adata.obs.drop( columns=["pxl_row_in_fullres", "pxl_col_in_fullres"], inplace=True, )
# put image path in uns if source_image_path is not None: # get an absolute path source_image_path = str(Path(source_image_path).resolve()) adata.uns["spatial"][library_id]["metadata"]["source_image_path"] = str( source_image_path )
return adata
from itertools import product
samples = ['Colon_Cancer', 'Small_Intestine']resolutions = ['square_002um', 'square_008um', 'square_016um']combinations = product(samples, resolutions)
for combin in combinations: sample, resolution = combin adata = read_visium_hd(f'./{sample}/{resolution}') adata.var_names_make_unique() adata.write(f'./{sample}_{resolution}.h5ad')
~/miniconda3/envs/official_Test/lib/python3.9/site-packages/anndata/_core/anndata.py:1830: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
utils.warn_names_duplicates("var")

数据探索

首先,我们将结直肠癌 16um 分辨率的数据导入进来

adata = sc.read_h5ad('./Colon_Cancer_square_016um.h5ad')

以成纤维细胞的 marker DCN 为例,查看基因的表达(注意,尚未对该数据进行归一化,正式画基因表达图时,先进行归一化)

sc.settings.set_figure_params(figsize=(8, 8), dpi=100)# img_key=None时,不展示图像sc.pl.spatial(adata, color='DCN', size=2, img_key=None, cmap='Reds')

DCN基因表达

只展示 HE 图像,不展示点

# hiressc.settings.set_figure_params(figsize=(8, 8))sc.pl.spatial(adata, color='DCN', size=0, img_key='hires')

HE图
# 将点overlay在HE图像上sc.settings.set_figure_params(figsize=(8, 8))sc.pl.spatial(adata, color='DCN', size=1.5, img_key='hires')

DCN基因表达与HE

数据质量 8um

为了查看数据质量,我们导入 8um 分辨率的结直肠癌数据,我们且近似将其看做单细胞数据

adata = sc.read_h5ad('./Colon_Cancer_square_008um.h5ad')
adata

关于 Anndata 的数据结构,我们这里不赘述,详情请观看 Bilibili 相配套的视频教程(https://space.bilibili.com/51135340)

AnnData object with n_obs × n_vars = 545913 × 18085
obs: 'in_tissue', 'array_row', 'array_col'
var: 'gene_ids', 'feature_types', 'genome'
uns: 'spatial'
obsm: 'spatial'
# 计算QC指标adata.var["mt"] = adata.var_names.str.startswith("MT-")sc.pp.calculate_qc_metrics(    adata, qc_vars=["mt"], percent_top=None, log1p=False, inplace=True)
adata
AnnData object with n_obs × n_vars = 545913 × 18085
obs: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt'
var: 'gene_ids', 'feature_types', 'genome', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts'
uns: 'spatial'
obsm: 'spatial'
sc.settings.set_figure_params(figsize=(4, 4), dpi=80)sc.pl.violin(    adata,    ["n_genes_by_counts", "total_counts", "pct_counts_mt"],    jitter=0.4,    multi_panel=True,    size=0)

过滤前,中位基因数为 299,中位 UMI 为 380。

结直肠癌Visium HD数据QC图
adata.obs['n_genes_by_counts'].plot(kind='density')

结直肠癌Visium HD基因数目分布图
sc.pl.scatter(adata, x="total_counts", y="pct_counts_mt")sc.pl.scatter(adata, x="total_counts", y="n_genes_by_counts")

数据分布,整体上来说还算正常

total count VS. pct count

total count VS. n genes

Next

如果你习惯 R 语言并且想探索 Visium HD 的数据,我们建议您参考推文全网首发 | Visium HD 空转数据开箱测试

作为亚细胞分辨率的数据,我们更推荐你采用细胞分割构建真实空间单细胞数据的策略,关于这方面,我们建议您参考:

  1. 亚细胞精度空间转录组细胞分割
  2. 亚细胞精度空间转录组 RNA 分割
  3. 单细胞空间域分析详细教程
  4. 真正单细胞空间转录组数据的受配体分析教程

Visium HD 的真正单细胞分析和上述推文可能存在一定的 gap,敬请关注后续推文,我们将持续更新!

Python社区是高质量的Python/Django开发社区
本文地址:http://www.python88.com/topic/168908
 
334 次点击