I'm working with a large dataset spread across multiple files of PacBio CLR reads, and at the moment filtlong is running on a single combined file containing all the data (267 Gb). It would likely be faster if there were an option to build and save the k-mer hash of the Illumina reads used for QC, so that the same hash could be reused by multiple independent processes, each running on an individual read file. If fast read access to the hash is important, it could be copied to local scratch space on each node, so that every process has its own copy.
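As a rough sketch of the idea (this is not filtlong's actual implementation; the k-mer size, plain-string k-mer set, and pickle serialization are all assumptions for illustration), the hash could be built once from the Illumina reads and serialized, so each per-file process just reloads it instead of rebuilding:

```python
import pickle

def kmers(seq, k=16):
    """Yield all k-mers of a sequence (k=16 is an arbitrary choice here)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_kmer_set(fasta_lines, k=16):
    """Collect the set of k-mers from Illumina read sequences in FASTA lines."""
    table = set()
    for line in fasta_lines:
        line = line.strip()
        if line and not line.startswith(">"):
            table.update(kmers(line.upper(), k))
    return table

# Build the k-mer set once from the Illumina QC reads...
reads = [">r1", "ACGTACGTACGTACGTACGT"]
table = build_kmer_set(reads, k=16)

# ...then serialize it so independent processes (one per PacBio file,
# e.g. reading a copy from node-local scratch) can reload it cheaply.
blob = pickle.dumps(table)
restored = pickle.loads(blob)
assert restored == table
```

Deserializing a saved set should be far cheaper than re-streaming the Illumina reads in every process, which is the main win the feature request is after.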