Reorganize/improve import tool #37
Hey @bernt-matthias! Yeah, there's definitely some mild impedance here; this section of our tutorial should go over the "easy" way to do this: But generally speaking, QIIME 2 doesn't have a notion of "collections" per se. Instead, we are indeed trying to place all of those fastq.gz files into a single QZA (we've found this to be pretty user-friendly). To get the data into that QZA, we expect a Galaxy collection, and then we use a regular expression on the element IDs to figure out which file is forward and which is reverse. This is the same regular expression that we use to validate that the user has given us a directory containing the appropriate files (we're quite file-oriented). There's really no equivalent concept of paired data in QIIME 2, as it's all defined by the format, which expects some directory structure. Instead, we rely on the semantic type to indicate pairedness, since many tools will use the default Casava layout. In principle, you should be able to upload a directory of raw reads from the sequencing instrument, place them in a collection (not paired, just a boring collection), and then probably add the file-extension of
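To make the regex idea above concrete, here is a minimal sketch of classifying collection element IDs as forward or reverse. It is not QIIME 2's actual code: the Casava-style pattern and the `classify` helper are assumptions for illustration, and the real directory format defines its own regex.

```python
import re

# Assumed Casava 1.8 laneless-style name: <sample>_<barcode>_R1_001.fastq.gz
# (illustrative only; the QIIME 2 directory format defines the real pattern).
CASAVA_LANELESS = re.compile(
    r"(?P<sample>.+)_(?P<barcode>.+)_R(?P<read>[12])_001\.fastq\.gz"
)


def classify(element_id):
    """Return (sample, 'forward' | 'reverse'), or None if the ID does not match."""
    # fullmatch: the whole element identifier has to fit the pattern,
    # not just a prefix or a substring of it.
    match = CASAVA_LANELESS.fullmatch(element_id)
    if match is None:
        return None
    direction = "forward" if match.group("read") == "1" else "reverse"
    return match.group("sample"), direction


element_ids = [
    "sampleA_ACGT_R1_001.fastq.gz",
    "sampleA_ACGT_R2_001.fastq.gz",
    "notes.txt",
]
for element_id in element_ids:
    print(element_id, "->", classify(element_id))
```

Any element whose identifier does not fit the pattern as a whole (here the hypothetical `notes.txt`) is rejected, which is why the element identifiers have to be named to fit the layout.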
Also, I should mention that the Manifest-style formats you mention for this particular type were a hack for importing which can basically never work in Galaxy, as they expect real filepaths to exist. I have a rather informal proposal for modifying directory formats to better suit Galaxy as well; perhaps there is a way to indicate pairedness in this realm, which we could then automatically map to Galaxy's paired collections.
Thanks for the clarifications and in particular for the link.
Hi @ebolyen, is there some documentation on the expected file names for the different input types (which might be added to the Galaxy tool help)? I'm (or rather, a colleague is) currently struggling to import data: I'm using With
With
The latter is kind of clear from the error message, since the regex does not match our file names: ids.txt Could you give us some advice on which import format we should choose, or whether we should rename our data?
Hey @bernt-matthias, sorry for not getting back to you. For user support, the forum is much more closely watched. Regarding the error: yeah, that's definitely an unhelpful error. Your IDs look OK, although I see
in your list, which I presume isn't actually in the collection. I would try setting the
as QIIME 2 is trying to match the entire collection element identifier to the directory regex.
No worries :)
Wondering if you want to add a link to the forum to the tool's help section?
Oh, yes. That is probably it.
Just read this again
Would a relative path work?
Unfortunately no, you would need to have an absolute path to the I'm working on something right now that may clean this up, but there is no particular ETA. Until then, using the directory formats is your best bet, as you have control over the element identifiers, which can be made to match the expected relative path of the directory format (as tedious as that is).
Indeed, this assumption does not hold in all Galaxy installations.
+1
Thanks
Hi guys! Now that we have the tools on EU, we get this problem as well :) Is there any workaround yet?
The workaround seems to be to not use the manifest for importing. For now, we have to educate users to maybe use the manifest (which is just a metadata table, right?) to construct a collection and use that for the import.
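A minimal sketch of that workaround, turning a manifest into element identifiers that a directory format could accept. The manifest column names, the Casava-style target names, the placeholder barcode, and the `element_ids_from_manifest` helper are all assumptions for illustration, not part of the import tool.

```python
import csv
import io


def element_ids_from_manifest(manifest_text, barcode="NNNN"):
    """Map each path in a manifest to a Casava-style element identifier.

    Assumes a tab-separated manifest whose header row has the columns
    sample-id, forward-absolute-filepath and reverse-absolute-filepath.
    """
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    renames = {}
    for row in reader:
        sample = row["sample-id"]
        # R1 marks forward reads and R2 reverse reads in the assumed naming scheme
        renames[row["forward-absolute-filepath"]] = f"{sample}_{barcode}_R1_001.fastq.gz"
        renames[row["reverse-absolute-filepath"]] = f"{sample}_{barcode}_R2_001.fastq.gz"
    return renames


manifest = (
    "sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n"
    "s1\t/data/s1_fwd.fastq.gz\t/data/s1_rev.fastq.gz\n"
)
for path, element_id in element_ids_from_manifest(manifest).items():
    print(path, "->", element_id)
```

The resulting names would then be used as the element identifiers of a plain list collection, so the manifest only serves as a lookup table and no real filepaths are needed at import time.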
I have a hard time figuring out how to import data into qiime2 tools using the import tool. I guess the most frequently used data is demultiplexed fastq.gz (maybe plus a sample data tsv file), e.g. https://data.qiime2.org/2022.8/tutorials/importing/casava-18-single-end-demultiplexed.zip. I failed to find the corresponding option in the import tool.
SampleData[PairedEndSequencesWithQuality]
?Paired End Fastq Manifest Phred33
)Casava One Eight Laneless Per Sample Directory Format
) but the collection type is not set (I guess it should be collection_type="list:paired")
To get me started with exploring downstream tools, it would be nice if someone could tell me for now how I could import data like the above (is there already a Galaxy-specific tutorial that I did not notice so far?).
I guess the main problem is that the mapping between Galaxy concepts and qiime2 concepts needs a bit of improvement (e.g. Galaxy data types and collection types are not used yet). But probably it's also because I'm inexperienced with qiime2 ... at the moment I'm just guessing that the goal of the import is to create a single qza dataset from all fastq files? Also, I'm missing info in the help (like the definition of what a manifest is). Since the tool is auto-generated, I'm unsure if this is easily possible. An alternative would be to handcraft an import tool that covers the most frequently used types of input data and integrates tightly with the Galaxy concepts.
I imagine a tool that takes as input either
with format
fastq.gz
plus (in addition, simple data inputs with multiple="true"
might be useful [because some users don't seem to like collections for some reason]). The tool then automatically knows about the phred encoding due to the specific Galaxy fastq.gz sub-datatypes.
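As a rough sketch of that last point, a handcrafted tool could look up the quality offset from the dataset's Galaxy datatype. The table below lists only the sanger and illumina sub-datatypes, and the `phred_offset` helper is hypothetical; how a real tool would pass the datatype to its script is left open.

```python
# Galaxy fastq sub-datatypes and the quality-score offset each one implies.
# Only the sanger (Phred+33) and illumina (Phred+64) variants are listed here.
PHRED_OFFSET_BY_DATATYPE = {
    "fastqsanger": 33,
    "fastqsanger.gz": 33,
    "fastqillumina": 64,
    "fastqillumina.gz": 64,
}


def phred_offset(galaxy_ext):
    """Return the Phred offset implied by a Galaxy datatype extension."""
    try:
        return PHRED_OFFSET_BY_DATATYPE[galaxy_ext]
    except KeyError:
        # Generic "fastq" / "fastq.gz" do not say which encoding is used.
        raise ValueError(f"cannot infer phred encoding from datatype {galaxy_ext!r}")


for ext in ("fastqsanger.gz", "fastqillumina.gz"):
    print(ext, "-> Phred+%d" % phred_offset(ext))
```

A generic `fastq.gz` dataset is rejected on purpose here, since that datatype alone does not determine the encoding and the user would have to choose it.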