I am running gtdbtk on a computer cluster and encountering an issue with pplacer.
I have already seen issue #170. I am using 1 CPU; after allocating 100 GB and then 204 GB, I tried using bigmem. It still didn't work.
Per a suggestion in issue #170, I ran:
pplacer -m WAG -j 1 -c /gpfs/data/rbeinart/Databases/gtdbtk-1.4.0/db/pplacer/gtdb_r95_bac120.refpkg -o /tmp/pplacer.bac120.json ./align/gtdbtk.bac120.user_msa.fasta
which gave:
Running pplacer v1.1.alpha19-0-g807f6f3 analysis on ./align/gtdbtk.bac120.user_msa.fasta...
Didn't find any reference sequences in given alignment file. Using supplied reference alignment.
query bin.3 is not the same length as the reference alignment (got 5037; expected 5040)
Output log from running gtdbtk command:
[2021-06-25 19:13:44] INFO: Completed 4 genomes in 4.65 minutes (1.16 minutes/genome).
[2021-06-25 19:13:44] TASK: Identifying TIGRFAM protein families.
[2021-06-25 19:16:02] INFO: Completed 4 genomes in 2.30 minutes (1.74 genomes/minute).
[2021-06-25 19:16:02] TASK: Identifying Pfam protein families.
[2021-06-25 19:16:29] INFO: Completed 4 genomes in 26.79 seconds (6.70 seconds/genome).
[2021-06-25 19:16:29] INFO: Annotations done using HMMER 3.1b2 (February 2015).
[2021-06-25 19:16:29] TASK: Summarising identified marker genes.
[2021-06-25 19:16:33] INFO: Completed 4 genomes in 3.96 seconds (1.01 genomes/second).
[2021-06-25 19:16:33] INFO: Done.
[2021-06-25 19:16:33] INFO: Aligning markers in 4 genomes with 1 CPUs.
[2021-06-25 19:16:33] INFO: Processing 4 genomes identified as bacterial.
[2021-06-25 19:16:38] INFO: Read concatenated alignment for 45,555 GTDB genomes.
[2021-06-25 19:16:38] TASK: Generating concatenated alignment for each marker.
[2021-06-25 19:16:40] INFO: Completed 4 genomes in 2.26 seconds (1.77 genomes/second).
[2021-06-25 19:16:40] TASK: Aligning 120 identified markers using hmmalign 3.1b2 (February 2015).
[2021-06-25 19:16:44] INFO: Completed 120 markers in 3.39 seconds (35.43 markers/second).
[2021-06-25 19:16:44] DEBUG: Successfully written all markers to: ./align/intermediate_results/markers
[2021-06-25 19:16:44] TASK: Masking columns of bacterial multiple sequence alignment using canonical mask.
[2021-06-25 19:17:38] INFO: Completed 45,559 sequences in 54.14 seconds (841.47 sequences/second).
[2021-06-25 19:17:38] INFO: Masked bacterial alignment from 41,084 to 5,037 AAs.
[2021-06-25 19:17:38] INFO: 0 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA.
[2021-06-25 19:17:38] INFO: Creating concatenated alignment for 45,559 bacterial GTDB and user genomes.
[2021-06-25 19:17:38] INFO: Creating concatenated alignment for 4 bacterial user genomes.
[2021-06-25 19:17:38] INFO: Done.
[2021-06-25 19:17:39] TASK: Placing 4 bacterial genomes into reference tree with pplacer using 1 CPUs (be patient).
[2021-06-25 19:17:39] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
[2021-06-25 19:17:39] DEBUG: pplacer -m wag -j 1 -c /gpfs/data/rbeinart/cbreusing/miniconda3/envs/gtdbtk/share/gtdbtk-1.5.0/db/pplacer/gtdb_r202_bac120.refpkg -o ./classify/intermediate_results/pplacer/pplacer.bac120.json ./align/gtdbtk.bac120.user_msa.fasta
[2021-06-25 19:39:00] ERROR: Controlled exit resulting from an unrecoverable error or warning.
================================================================================
EXCEPTION: PplacerException
MESSAGE: An error was encountered while running pplacer.
________________________________________________________________________________
Traceback (most recent call last):
File "/users/mhauer1/miniconda3/envs/gtdbtk/lib/python3.8/site-packages/gtdbtk/__main__.py", line 95, in main
gt_parser.parse_options(args)
File "/users/mhauer1/miniconda3/envs/gtdbtk/lib/python3.8/site-packages/gtdbtk/main.py", line 718, in parse_options
self.classify(options)
File "/users/mhauer1/miniconda3/envs/gtdbtk/lib/python3.8/site-packages/gtdbtk/main.py", line 440, in classify
classify.run(genomes,
File "/users/mhauer1/miniconda3/envs/gtdbtk/lib/python3.8/site-packages/gtdbtk/classify.py", line 444, in run
classify_tree = self.place_genomes(user_msa_file,
File "/users/mhauer1/miniconda3/envs/gtdbtk/lib/python3.8/site-packages/gtdbtk/classify.py", line 240, in place_genomes
pplacer.run(self.pplacer_cpus, 'wag', pplacer_ref_pkg, pplacer_json_out,
File "/users/mhauer1/miniconda3/envs/gtdbtk/lib/python3.8/site-packages/gtdbtk/external/pplacer.py", line 92, in run
raise PplacerException(
gtdbtk.exceptions.PplacerException: An error was encountered while running pplacer.
================================================================================
Output file in classify/intermediate_results/pplacer/pplacer.bac120.out:
Running pplacer v1.1.alpha19-0-g807f6f3 analysis on ./align/gtdbtk.bac120.user_msa.fasta...
Didn't find any reference sequences in given alignment file. Using supplied reference alignment.
Pre-masking sequences... sequence length cut from 5037 to 5002.
Determining figs... figs disabled.
Allocating memory for internal nodes... done.
Caching likelihood information on reference tree...
Any suggestions?
Hello,
It seems there is a conflict between Release 95 and Release 202 of the GTDB-Tk databases.
Release 95 trims the alignment to 5040 AA, whereas Release 202 trims it to 5037 AA. Judging by the log, the alignment step was done with R202 ("Masked bacterial alignment from 41,084 to 5,037 AAs."), but pplacer is still using Release 95 to place the genomes, which is why it expects 5040 AA in the MSA.
I would recommend downloading a fresh copy of GTDB-Tk release 202 and placing it in a newly created folder.
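A quick way to confirm this kind of mismatch before launching pplacer is to check the aligned length of each sequence in the user MSA against the length the reference package expects. The following is only a minimal sketch, not part of GTDB-Tk: the helper names `sequence_lengths` and `find_length_mismatches` are hypothetical, and the expected length (5040 here) is taken from the pplacer error message above rather than read from the refpkg.

```python
def sequence_lengths(fasta_path):
    """Return {sequence_id: aligned_length} for every record in a FASTA file."""
    lengths = {}
    current_id = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token of the header as the ID.
                parts = line[1:].split()
                current_id = parts[0] if parts else ""
                lengths[current_id] = 0
            elif current_id is not None:
                # Sequence data may be wrapped over several lines; sum them.
                lengths[current_id] += len(line)
    return lengths


def find_length_mismatches(user_msa_path, expected_length):
    """Return the sequences whose aligned length differs from expected_length."""
    return {
        seq_id: length
        for seq_id, length in sequence_lengths(user_msa_path).items()
        if length != expected_length
    }


if __name__ == "__main__":
    # Path from the issue; 5040 is the length pplacer reported expecting (R95).
    mismatches = find_length_mismatches("./align/gtdbtk.bac120.user_msa.fasta", 5040)
    for seq_id, length in mismatches.items():
        print(f"{seq_id}: got {length}, expected 5040")
```

If every user sequence is reported with the same unexpected length, the alignment and the refpkg most likely come from different releases, as in this issue.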