Reproducing the creation of hetionet #34

Closed
olszewskip opened this issue Sep 30, 2020 · 2 comments

Comments

@olszewskip
Hi! Not sure if this is the right place to ask this, but here goes:

I've read through https://think-lab.github.io/p/rephetio/. Hetionet seems extremely impressive and useful. I need something very similar, if not identical, but with an emphasis on diagnosing rare diseases. I would also strongly prefer to be able to update the database automatically or add new data to it, e.g. to include new GWAS findings, tailor the specificity of the disease terms to my needs, or add other node types such as genetic variants. Hence I'm wondering: how hard would it be for a group of a couple of people to reproduce something like Hetionet from scratch, possibly in little steps? I see that https://think-lab.github.io/p/rephetio/#methods has detailed information about which steps were taken, along with quite a number of links to files hosted on Zenodo. Would you say that all the information is there, or should I also look elsewhere? Was the main "mode of operation" to download text files from the internet, parse/preprocess/unify/join the data using python scripts, and then inject into Neo4j?

Apologies for the vague question. Many thanks for any suggestions! :)

@dhimmel
Member

dhimmel commented Sep 30, 2020

Sounds like you're most interested in https://github.com/dhimmel/integrate. This repo does the following:

> download text files from the internet, parse/preprocess/unify/join the data using python scripts, and then inject into Neo4j?

In particular, the integrate.ipynb notebook will be of interest.

Note that most datasets don't come directly from the upstream resource, but rather from an intermediate repo that performs pre-processing. In total, there are dozens of repositories that work together to create Hetionet, but the creation is all orchestrated in the dhimmel/integrate repo.
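
For readers who want a feel for the parse/unify/inject pattern described above, here is a minimal sketch. It is not the code in dhimmel/integrate: the TSV contents, the column names, the `Disease`/`Gene` labels, the `ASSOCIATES` relationship type, and the helper functions `parse_associations` and `to_cypher` are all hypothetical, chosen only for illustration. It parses a small pre-processed table into deduplicated nodes and edges, then builds parameterized Cypher `MERGE` statements so that re-running the load does not create duplicates.

```python
import csv
import io

# Hypothetical stand-in for a pre-processed TSV from an intermediate repo.
# Identifiers and names are made up for illustration.
DISEASE_GENE_TSV = (
    "doid_id\tdoid_name\tentrez_gene_id\tgene_symbol\n"
    "DOID:0000001\texample disease A\t1001\tGENE1\n"
    "DOID:0000001\texample disease A\t1002\tGENE2\n"
    "DOID:0000002\texample disease B\t1002\tGENE2\n"
)


def parse_associations(tsv_text):
    """Parse the table into unique nodes and unique edges."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    nodes = {}     # (label, identifier) -> name
    edges = set()  # (source node key, relationship type, target node key)
    for row in reader:
        disease = ("Disease", row["doid_id"])
        gene = ("Gene", row["entrez_gene_id"])
        nodes[disease] = row["doid_name"]
        nodes[gene] = row["gene_symbol"]
        edges.add((disease, "ASSOCIATES", gene))
    return nodes, sorted(edges)


def to_cypher(nodes, edges):
    """Build (query, parameters) pairs; MERGE keeps reloads idempotent."""
    statements = []
    for (label, identifier), name in sorted(nodes.items()):
        query = f"MERGE (n:{label} {{identifier: $identifier}}) SET n.name = $name"
        statements.append((query, {"identifier": identifier, "name": name}))
    for (src_label, src_id), rel, (dst_label, dst_id) in edges:
        query = (
            f"MATCH (a:{src_label} {{identifier: $source}}), "
            f"(b:{dst_label} {{identifier: $target}}) "
            f"MERGE (a)-[:{rel}]->(b)"
        )
        statements.append((query, {"source": src_id, "target": dst_id}))
    return statements


nodes, edges = parse_associations(DISEASE_GENE_TSV)
statements = to_cypher(nodes, edges)
for query, params in statements:
    print(query, params)
```

Each `(query, params)` pair could then be sent to a running Neo4j instance, for example through the official Python driver's `session.run(query, params)`. Adding a new node type such as genetic variants would, under this sketch, mean adding another parser that emits nodes and edges in the same shape.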

@olszewskip
Author

Awesome! Thank you.
