lijin0303/MLCB_project

Improved variant calling and phylogeny inference from scATAC-seq

Scripts, results, and documentation for the MLCB group project 2023

Megan Le, Ruitong Li, Daniel Schaffer

Cancer progression is driven by both genetic alterations and chromatin remodeling, yet little is known about how these two classes of events interact to shape the clonal diversity of cancers. Calling SNVs from scATAC-seq data is a promising alternative to calling them from scRNA-seq data, which does not sequence non-coding regions of the genome and may contain spurious mutations. Using the Monopogen framework, we performed germline mutation calling on scATAC-seq and scDNA-seq data from the SNU601 gastric cancer cell line; the scATAC-seq calls contained 77.84% of the mutations identified in the scDNA-seq data, with a 4.05% false positive rate. We then performed somatic mutation calling using LD refinement and found that 47.90% of scATAC-seq variants were also identified in the scDNA-seq results. A high proportion of the mutations we identified were C>G substitutions, most of which fell in promoter regions, and the mutation data fit known mutational signatures related to chemotherapy treatment and mismatch repair. Although hierarchical clustering of the scATAC-seq SNV results did not capture the known clonal structure, UMAP analysis reflected the highest-level clonal split by grouping CNV clones 1 and 2 apart from clones 3, 4, 5, and 6. Pseudotime trajectory analysis additionally identified clone 3 as an intermediate clone preceding clones 4 through 6, which were grouped closer together in their UMAP embeddings. We analyzed correlations between the most variable peaks across clones and the mutations with the best coverage but did not find any significant results. We also applied the CellPhy maximum-likelihood framework to our SNV results to perform phylogeny inference, but the inference did not reach convergence.
Our work is a first step in exploring how scATAC-seq data can be used to improve mutation calling and to investigate cancer cell population dynamics and chromatin accessibility changes during clonal evolution, which may ultimately lead to a better understanding of individual cancer evolution and prognosis.
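
The concordance figures quoted above compare two variant call sets. As a rough illustration only, the sketch below shows one way such overlap metrics could be computed once each call set is reduced to `(chrom, pos, ref, alt)` tuples. The function name `concordance`, the tuple representation, and the toy variants are all hypothetical and are not taken from this repository. In particular, defining the "false positive rate" as the fraction of scATAC-seq calls absent from the scDNA-seq set is an assumption; the project may have used a different definition.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def concordance(atac_calls: Iterable[Variant],
                dna_calls: Iterable[Variant]) -> Dict[str, float]:
    """Compare scATAC-seq calls against scDNA-seq calls used as the reference set."""
    atac, dna = set(atac_calls), set(dna_calls)
    shared = atac & dna
    return {
        # Fraction of scDNA-seq variants that also appear in the scATAC-seq calls.
        "recovered_fraction": len(shared) / len(dna) if dna else 0.0,
        # Fraction of scATAC-seq variants that also appear in the scDNA-seq calls.
        "supported_fraction": len(shared) / len(atac) if atac else 0.0,
        # ASSUMPTION: scATAC-seq calls with no scDNA-seq support are counted
        # as false positives.
        "unsupported_fraction": len(atac - dna) / len(atac) if atac else 0.0,
    }


# Toy data, invented purely for illustration.
toy_atac = [("chr1", 100, "C", "G"), ("chr1", 250, "A", "T"),
            ("chr2", 40, "G", "A"), ("chr3", 7, "T", "C")]
toy_dna = [("chr1", 100, "C", "G"), ("chr1", 250, "A", "T"),
           ("chr2", 40, "G", "A"), ("chr2", 90, "C", "T"),
           ("chr4", 12, "A", "G")]

toy_metrics = concordance(toy_atac, toy_dna)
print(toy_metrics)
```

Matching on the full `(chrom, pos, ref, alt)` tuple is the strictest choice. Matching on position alone would give higher overlap but would conflate different alternate alleles at the same site.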
