First version #1

Open: wants to merge 11 commits into base: main
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,10 @@
# Changelog
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).


## [1.0.0] - 2025-02-26
### Added
- [GRD-832](https://jira.oicr.on.ca/browse/GRD-871), first version of the WDL along with README and vidarr files
23 changes: 23 additions & 0 deletions Jenkinsfile
@@ -0,0 +1,23 @@
pipeline {
    agent any
    stages {
        stage('build') {
            when {
                not {
                    buildingTag()
                }
            }
            steps {
                sh '/.mounts/labs/gsi/vidarr/jenkins-ci-wrapper test -t /.mounts/labs/gsi/vidarr/testing-config.json'
            }
        }
        stage('Deploy') {
            when {
                buildingTag()
            }
            steps {
                sh '/.mounts/labs/gsi/vidarr/jenkins-ci-wrapper deploy -v $TAG_NAME -t /.mounts/labs/gsi/vidarr/testing-config.json -U /.mounts/labs/gsi/vidarr/deploy-urls'
            }
        }
    }
}
123 changes: 122 additions & 1 deletion README.md
100644 → 100755
@@ -1 +1,122 @@
# methyleDackel
# methylDackel

Workflow to run methylDackel. It processes a coordinate-sorted and indexed BAM or CRAM file containing BS-seq or EM-seq alignments and extracts per-base methylation metrics from it. The extract task generates bedGraph files; by default it reports only CpG metrics, and options can be set to also report CHH and CHG metrics. The Mbias task generates a TSV file of methylation bias metrics and an SVG graph for visualizing mbias (only for chromosome 1 here).

## Overview

## Dependencies

* [methyldackel 0.6.1](https://github.com/dpryan79/MethylDackel)


## Usage

### Cromwell
```
java -jar cromwell.jar run methylDackel.wdl --inputs inputs.json
```

### Inputs

#### Required workflow parameters:
Parameter|Value|Description
---|---|---
`bam`|File|The bam file for methyl analysis
`bai`|File|The index for input bam
`outputFileNamePrefix`|String|Prefix for output files
`reference`|String|The genome reference build


#### Optional workflow parameters:
Parameter|Value|Default|Description
---|---|---|---
`doMbias`|Boolean|true|Whether to run the Mbias task


#### Optional task parameters:
Parameter|Value|Default|Description
---|---|---|---
`methylDackelExtract.doCHH`|Boolean|false|Whether to enable CHH metrics
`methylDackelExtract.doCHG`|Boolean|false|Whether to enable CHG metrics
`methylDackelExtract.mergeContext`|Boolean|false|Whether to merge contexts in the bedGraph output
`methylDackelExtract.minimumuQualityPhred`|Int?|None|Minimum sequencing quality Phred score
`methylDackelExtract.minimumMAPQ`|Int?|None|Minimum MAPQ score
`methylDackelExtract.minDepth`|Int?|None|Minimum depth required for a region to be included in the analysis
`methylDackelExtract.timeout`|Int|8|The hours until the task is killed
`methylDackelExtract.memory`|Int|8|The GB of memory provided to the task
`methylDackelExtract.threads`|Int|8|The number of threads the task has access to
`extractChromosomes.timeout`|Int|1|The hours until the task is killed
`extractChromosomes.memory`|Int|1|The GB of memory provided to the task
`extractChromosomes.threads`|Int|1|The number of threads the task has access to
`extractChromosomes.modules`|String|"samtools/1.16.1"|The modules that will be loaded
`methylDackelMbias.timeout`|Int|12|The hours until the task is killed
`methylDackelMbias.memory`|Int|8|The GB of memory provided to the task
`methylDackelMbias.threads`|Int|8|The number of threads the task has access to
`concatMbiasTsvFiles.timeout`|Int|1|The hours until the task is killed
`concatMbiasTsvFiles.memory`|Int|1|The GB of memory provided to the task
`concatMbiasTsvFiles.threads`|Int|1|The number of threads the task has access to
`concatMbiasTsvFiles.modules`|String|"pandas/2.1.3"|The modules that will be loaded
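
As an illustration of how the parameters above could be collected into the `inputs.json` passed to Cromwell, here is a minimal Python sketch. It assumes the workflow declared in `methylDackel.wdl` is itself named `methylDackel` (Cromwell prefixes input keys with the workflow name); the helper `build_inputs`, the file paths, and the `hg38` reference value are hypothetical placeholders, not values taken from this repository.

```python
import json


def build_inputs(bam, bai, prefix, reference, optional=None, workflow="methylDackel"):
    """Build a Cromwell inputs mapping for the four required parameters.

    `workflow` is an assumed workflow name; `optional` maps parameter names
    from the tables above (e.g. "methylDackelExtract.doCHH") to values.
    """
    inputs = {
        f"{workflow}.bam": bam,
        f"{workflow}.bai": bai,
        f"{workflow}.outputFileNamePrefix": prefix,
        f"{workflow}.reference": reference,
    }
    for name, value in (optional or {}).items():
        inputs[f"{workflow}.{name}"] = value
    return inputs


if __name__ == "__main__":
    # Hypothetical paths and reference build, for illustration only.
    example = build_inputs(
        "/path/to/sample.bam",
        "/path/to/sample.bam.bai",
        "sample",
        "hg38",
        optional={"doMbias": False, "methylDackelExtract.doCHH": True},
    )
    print(json.dumps(example, indent=2))
```

Optional parameters left out of the mapping fall back to the defaults listed in the tables.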


### Outputs

Output | Type | Description | Labels
---|---|---|---
`extract_bedgraph`|File|bedGraph output from methylDackelExtract|vidarr_label: extract_bedgraph
`mbias_tsv`|File?|mbias tsv output from methylDackelMbias|vidarr_label: mbias_tsv
`mbias_svg`|File?|svg plot files from methylDackelMbias|vidarr_label: mbias_svg


## Commands
This section lists the command(s) run by the methylDackel workflow

* Running methylDackel


```
samtools view -H ~{bam} | grep @SQ | cut -f2 | sed 's/SN://' | grep -E -v '(_random|chrUn|chrM|MT|_alt|_fix|_decoy|_PATCH|_HSCHR|NC_|_EBV|EBV|phiX|pUC19|lambda|_scaffold)'
```
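
To make the effect of the contig filter above easier to see, the following Python sketch approximates the same pipeline on a list of header lines. The exclusion pattern is copied from the `grep -E -v` expression; the function name `primary_contigs` and the sample header are hypothetical, and lines without a second tab-separated field are simply skipped here, which differs slightly from `cut`.

```python
import re

# Same alternation as the grep -E -v pattern in the command above.
EXCLUDE = re.compile(
    r"(_random|chrUn|chrM|MT|_alt|_fix|_decoy|_PATCH|_HSCHR|NC_|_EBV|EBV"
    r"|phiX|pUC19|lambda|_scaffold)"
)


def primary_contigs(header_lines):
    """Return the contig names that survive the exclusion pattern."""
    names = []
    for line in header_lines:
        if "@SQ" not in line:  # grep @SQ
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        name = fields[1].replace("SN:", "", 1)  # cut -f2 | sed 's/SN://'
        if EXCLUDE.search(name):  # grep -E -v '(...)'
            continue
        names.append(name)
    return names


if __name__ == "__main__":
    # Hypothetical header lines, for illustration only.
    header = [
        "@HD\tVN:1.6\tSO:coordinate",
        "@SQ\tSN:chr1\tLN:248956422",
        "@SQ\tSN:chrM\tLN:16569",
        "@SQ\tSN:chr1_KI270706v1_random\tLN:175055",
        "@SQ\tSN:chrUn_KI270302v1\tLN:2274",
        "@SQ\tSN:chrX\tLN:156040895",
    ]
    print(primary_contigs(header))
```

Note that the pattern matches substrings, so any contig whose name contains `MT` or `EBV` anywhere is dropped, not only contigs with exactly those names.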
```
set -euo pipefail
MethylDackel extract ~{filterMAPQ} ~{filterQalityPhred} ~{filterminDepth} ~{optionMergeContext} ~{optionCHH} ~{optionCHG} -@ ~{threads} ~{fasta} ~{bam} -o ~{outputFileNamePrefix}.methyldackel
mkdir -p ~{outputFileNamePrefix}_extract_bedGraph
mv *.bedGraph ~{outputFileNamePrefix}_extract_bedGraph
tar -czf ~{outputFileNamePrefix}_extract_bedGraph.tar.gz ~{outputFileNamePrefix}_extract_bedGraph
```
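
The placeholders such as `~{filterMAPQ}` and `~{optionCHH}` in the extract command above expand either to a flag or to nothing. The WDL that defines them is not shown here, so the Python sketch below is only an assumption about how they might map onto MethylDackel's documented `extract` options (`-q`, `-p`, `-d`, `--mergeContext`, `--CHH`, `--CHG`, `-@`, `-o`); the function `extract_command` and its argument names are hypothetical.

```python
def extract_command(fasta, bam, prefix, threads=8, min_mapq=None, min_phred=None,
                    min_depth=None, merge_context=False, do_chh=False, do_chg=False):
    """Assemble an argument list mirroring the extract command above.

    The flag-to-parameter mapping is an assumption, not taken from the WDL.
    """
    cmd = ["MethylDackel", "extract"]
    if min_mapq is not None:      # assumed expansion of ~{filterMAPQ}
        cmd += ["-q", str(min_mapq)]
    if min_phred is not None:     # assumed expansion of ~{filterQalityPhred}
        cmd += ["-p", str(min_phred)]
    if min_depth is not None:     # assumed expansion of ~{filterminDepth}
        cmd += ["-d", str(min_depth)]
    if merge_context:             # assumed expansion of ~{optionMergeContext}
        cmd.append("--mergeContext")
    if do_chh:                    # assumed expansion of ~{optionCHH}
        cmd.append("--CHH")
    if do_chg:                    # assumed expansion of ~{optionCHG}
        cmd.append("--CHG")
    cmd += ["-@", str(threads), fasta, bam, "-o", f"{prefix}.methyldackel"]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(extract_command("ref.fa", "sample.bam", "sample",
                                   min_mapq=10, do_chh=True)))
```

With all optional arguments left at their defaults, the unset filters contribute nothing to the command line, matching the `None` defaults in the parameter table.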
```
MethylDackel mbias --txt -r ~{chr} ~{fasta} ~{bam} ~{outputFileNamePrefix}.mbias > output_mbias.tsv

mkdir -p ~{outputFileNamePrefix}_mbias.svgs
mv *.svg ~{outputFileNamePrefix}_mbias.svgs
tar -czf ~{outputFileNamePrefix}_mbias.svgs.tar.gz ~{outputFileNamePrefix}_mbias.svgs
```
```
python3<<CODE

import sys
import pandas as pd

dfs = []
input_files = ['~{sep="', '" select_all(inputTsvs)}']
columns = ['Strand', 'Read', 'Position', 'nMethylated', 'nUnmethylated']
for file in input_files:
    df = pd.read_csv(file, sep='\t', skiprows=1, names=columns)  # Skip header
    dfs.append(df)

combined_df = pd.concat(dfs, ignore_index=True)

# Group by Strand, Read, and Position, and sum the methylation counts
aggregated_df = combined_df.groupby(['Strand', 'Read', 'Position'], as_index=False).agg({
    'nMethylated': 'sum',
    'nUnmethylated': 'sum'
}).sort_values(['Strand', 'Read', 'Position'])

with open("~{outputFileNamePrefix}.mbias.tsv", 'w') as f:
    aggregated_df.to_csv(f, sep='\t', index=False)
CODE
```
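
The concatenation step above sums the per-chromosome mbias counts for each (Strand, Read, Position) key. The sketch below shows the same aggregation on two small in-memory tables. It deliberately uses only the Python standard library rather than pandas so it can run anywhere; the function name `aggregate_mbias` and the toy counts are hypothetical.

```python
import csv
import io
from collections import defaultdict

HEADER = "Strand\tRead\tPosition\tnMethylated\tnUnmethylated\n"


def aggregate_mbias(tables):
    """Sum nMethylated/nUnmethylated per (Strand, Read, Position) across tables."""
    totals = defaultdict(lambda: [0, 0])
    for table in tables:
        reader = csv.reader(io.StringIO(table), delimiter="\t")
        next(reader, None)  # skip the header row, as skiprows=1 does
        for row in reader:
            if not row:  # ignore blank lines
                continue
            strand, read, position, n_meth, n_unmeth = row
            key = (strand, int(read), int(position))
            totals[key][0] += int(n_meth)
            totals[key][1] += int(n_unmeth)
    # Sort by Strand, Read, Position, like sort_values in the pandas version.
    return [(*key, *counts) for key, counts in sorted(totals.items())]


if __name__ == "__main__":
    # Toy per-chromosome tables, for illustration only.
    chr1 = HEADER + "OT\t1\t1\t10\t5\nOT\t1\t2\t3\t7\n"
    chr2 = HEADER + "OT\t1\t1\t4\t1\nOB\t2\t1\t2\t2\n"
    for row in aggregate_mbias([chr1, chr2]):
        print("\t".join(str(value) for value in row))
```

In this toy input the key (`OT`, 1, 1) appears in both tables, so its counts are added together, while the other keys pass through unchanged.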
## Support

For support, please file an issue on the [GitHub project](https://github.com/oicr-gsi) or send an email to [email protected].

_Generated with generate-markdown-readme (https://github.com/oicr-gsi/gsi-wdl-tools/)_
48 changes: 48 additions & 0 deletions commands.txt
@@ -0,0 +1,48 @@
## Commands
This section lists the command(s) run by the methylDackel workflow

* Running methylDackel


```
samtools view -H ~{bam} | grep @SQ | cut -f2 | sed 's/SN://' | grep -E -v '(_random|chrUn|chrM|MT|_alt|_fix|_decoy|_PATCH|_HSCHR|NC_|_EBV|EBV|phiX|pUC19|lambda|_scaffold)'
```
```
set -euo pipefail
MethylDackel extract ~{filterMAPQ} ~{filterQalityPhred} ~{filterminDepth} ~{optionMergeContext} ~{optionCHH} ~{optionCHG} -@ ~{threads} ~{fasta} ~{bam} -o ~{outputFileNamePrefix}.methyldackel
mkdir -p ~{outputFileNamePrefix}_extract_bedGraph
mv *.bedGraph ~{outputFileNamePrefix}_extract_bedGraph
tar -czf ~{outputFileNamePrefix}_extract_bedGraph.tar.gz ~{outputFileNamePrefix}_extract_bedGraph
```
```
MethylDackel mbias --txt -r ~{chr} ~{fasta} ~{bam} ~{outputFileNamePrefix}.mbias > output_mbias.tsv

mkdir -p ~{outputFileNamePrefix}_mbias.svgs
mv *.svg ~{outputFileNamePrefix}_mbias.svgs
tar -czf ~{outputFileNamePrefix}_mbias.svgs.tar.gz ~{outputFileNamePrefix}_mbias.svgs
```
```
python3<<CODE

import sys
import pandas as pd

dfs = []
input_files = ['~{sep="', '" select_all(inputTsvs)}']
columns = ['Strand', 'Read', 'Position', 'nMethylated', 'nUnmethylated']
for file in input_files:
    df = pd.read_csv(file, sep='\t', skiprows=1, names=columns)  # Skip header
    dfs.append(df)

combined_df = pd.concat(dfs, ignore_index=True)

# Group by Strand, Read, and Position, and sum the methylation counts
aggregated_df = combined_df.groupby(['Strand', 'Read', 'Position'], as_index=False).agg({
    'nMethylated': 'sum',
    'nUnmethylated': 'sum'
}).sort_values(['Strand', 'Read', 'Position'])

with open("~{outputFileNamePrefix}.mbias.tsv", 'w') as f:
    aggregated_df.to_csv(f, sep='\t', index=False)
CODE
```