Gilda: Grounding Integrating Learned Disambiguation

Gilda is a Python package and REST service that grounds (i.e., finds appropriate identifiers in various namespaces for) named entities in biomedical text.

Gyori BM, Hoyt CT, Steppi A (2022). Gilda: biomedical entity text normalization with machine-learned disambiguation as a service. Bioinformatics Advances, 2022; vbac034 https://doi.org/10.1093/bioadv/vbac034.

Installation

Gilda is deployed as a web service at http://grounding.indra.bio/ (see Usage instructions below). It can also be used locally as a Python package.

The recommended method to install Gilda is through PyPI as

pip install gilda

Note that Gilda uses a single large resource file for grounding, which is automatically downloaded into the ~/.data/gilda/<version> folder during runtime (see pystow for options to configure the location of this folder).
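To check where the resource file is expected to land before the first run, the folder can be computed by hand. The following is a minimal sketch, not part of Gilda: `expected_resource_dir` is a hypothetical helper, and it only accounts for the default `~/.data` location and pystow's `PYSTOW_HOME` environment variable, ignoring any other pystow configuration options.

```python
import os
from pathlib import Path


def expected_resource_dir(version, environ=None):
    """Return the folder where Gilda's resource file is expected for `version`.

    Simplified assumption: only the PYSTOW_HOME environment variable is
    considered; other pystow settings are ignored in this sketch.
    """
    environ = os.environ if environ is None else environ
    home = environ.get("PYSTOW_HOME")
    # Fall back to the default location mentioned above: ~/.data
    base = Path(home) if home else Path.home() / ".data"
    return base / "gilda" / version
```

For example, `expected_resource_dir("1.1.0")` gives the `~/.data/gilda/1.1.0` folder when `PYSTOW_HOME` is not set.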

Given some additional dependencies, the grounding resource file can also be regenerated locally by running python -m gilda.generate_terms.

Documentation and notebooks

Documentation for Gilda is available at http://gilda.readthedocs.org. We also provide several interactive Jupyter notebooks to help use and customize Gilda:

Gilda Introduction provides an interactive tutorial for using Gilda.
Custom Grounders shows several examples of how Gilda can be instantiated with custom grounding resources.
Model Training provides interactive sample code for training new disambiguation models.

Usage

Gilda can either be used as a REST web service or used programmatically via its Python API. An introduction Jupyter notebook for using Gilda is available at https://github.com/indralab/gilda/blob/master/notebooks/gilda_introduction.ipynb

Use as a Python package

To use Gilda as a Python package, see the documentation at http://gilda.readthedocs.org, which describes each module of Gilda and its usage in detail. A basic usage example for named entity normalization (NEN), or grounding, is as follows:

import gilda
scored_matches = gilda.ground('ER', context='Calcium is released from the ER.')
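The returned matches can be reduced to a compact summary of namespace, identifier, name and score. The sketch below assumes each scored match exposes a `score` and a `term` with `db`, `id` and `entry_name` attributes, as described in the Gilda documentation. `summarize_matches` and `ground_and_summarize` are hypothetical helpers, not part of Gilda.

```python
def summarize_matches(scored_matches, top_n=3):
    """Return (namespace, identifier, name, score) tuples for the best matches.

    Assumption: every match has `.score` and `.term` with `.db`, `.id`
    and `.entry_name` attributes.
    """
    # Sort explicitly by score so the helper does not rely on input order.
    ranked = sorted(scored_matches, key=lambda m: m.score, reverse=True)
    return [
        (m.term.db, m.term.id, m.term.entry_name, round(m.score, 3))
        for m in ranked[:top_n]
    ]


def ground_and_summarize(text, context=None, top_n=3):
    """Ground `text` with Gilda and summarize the top matches."""
    # Imported lazily so that summarize_matches can be reused without
    # loading Gilda's grounding resources.
    import gilda

    return summarize_matches(gilda.ground(text, context=context), top_n=top_n)
```

Calling `ground_and_summarize('ER', context='Calcium is released from the ER.')` then yields one tuple per candidate grounding.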

Gilda also implements a simple dictionary-based named entity recognition (NER) algorithm that can be used as follows:

import gilda
results = gilda.annotate('Calcium is released from the ER.')

Use as a web service

The REST service accepts POST requests with a JSON body (sent with the Content-Type: application/json header) on the /ground endpoint. There is a public REST service running at http://grounding.indra.bio, but the service can also be run locally as

python -m gilda.app

which, by default, launches the server at localhost:8001 (for local usage replace the URL in the examples below with this address).

Below is an example request using curl:

curl -X POST -H "Content-Type: application/json" -d '{"text": "kras"}' http://grounding.indra.bio/ground

The same request using Python's requests package would be as follows:

import requests
requests.post('http://grounding.indra.bio/ground', json={'text': 'kras'})
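Where adding a dependency on requests is not desirable, the same call can be made with the standard library alone. This is a minimal sketch under stated assumptions: `build_payload` and `ground_remote` are hypothetical helper names, the optional `context` key is assumed to work on /ground the same way it does in the ground_multi example in this README, and the response is simply decoded as JSON without assuming its structure.

```python
import json
from urllib import request as urlrequest

# Public service named in this README; use http://localhost:8001 for a local server.
GILDA_URL = "http://grounding.indra.bio"


def build_payload(text, context=None):
    """Build the JSON body for the /ground endpoint."""
    payload = {"text": text}
    if context:
        # Assumption: /ground accepts an optional "context" key.
        payload["context"] = context
    return payload


def ground_remote(text, context=None, base_url=GILDA_URL, timeout=30):
    """POST a grounding request and return the decoded JSON response."""
    body = json.dumps(build_payload(text, context)).encode("utf-8")
    req = urlrequest.Request(
        base_url.rstrip("/") + "/ground",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlrequest.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping payload construction separate from the network call makes it easy to point the same code at a local server by passing a different `base_url`.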

The web service also supports multiple inputs in a single request on the /ground_multi endpoint, for instance:

import requests
requests.post(
    'http://grounding.indra.bio/ground_multi',
    json=[
        {'text': 'braf'},
        {'text': 'ER', 'context': 'endoplasmic reticulum (ER) is a cellular component'},
    ],
)
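When many strings need grounding, the list sent to ground_multi can be assembled programmatically and split into batches. The helpers below are a hypothetical sketch, not part of Gilda, and the batch size of 100 is an arbitrary illustrative value rather than a documented limit of the service.

```python
def build_multi_payload(entries):
    """Build the JSON list for ground_multi.

    `entries` is an iterable of text strings or (text, context) pairs.
    """
    payload = []
    for entry in entries:
        if isinstance(entry, str):
            payload.append({"text": entry})
        else:
            text, context = entry
            item = {"text": text}
            if context:
                item["context"] = context
            payload.append(item)
    return payload


def batches(payload, size=100):
    """Yield consecutive slices of `payload` with at most `size` items."""
    for start in range(0, len(payload), size):
        yield payload[start:start + size]
```

Each batch can then be posted with `requests.post(url, json=batch)` exactly as in the example above.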

Resource usage

Gilda loads grounding terms into memory when first used. If memory usage is an issue, the following options are recommended.

Run a single instance of Gilda as a local web service that one or more other processes send requests to.
Create a custom Grounder instance that only loads a subset of terms appropriate for a narrow use case.
Gilda also offers an optional sqlite back-end, which significantly decreases memory usage and results in a minor drop in the number of strings grounded per unit time. The sqlite back-end database can be built as follows with an optional [db_path] argument, which, if used, should have the .db extension. If not specified, the .db file is generated in Gilda's default resource folder.

python -m gilda.resources.sqlite_adapter [db_path]

A Grounder instance can then be instantiated as follows:

from gilda.grounder import Grounder
gr = Grounder(db_path)
matches = gr.ground('kras')
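The second option above, a custom Grounder restricted to a subset of terms, can be sketched as a filter over the default terms by namespace. `filter_terms` and `make_namespace_grounder` are hypothetical helpers. The sketch assumes that `load_terms_file` in `gilda.grounder` and `get_grounding_terms` in `gilda.resources` are importable as shown, and that `Grounder` accepts a dict mapping normalized text to lists of terms; check these against the Gilda documentation for your installed version. Note that the full term set is still loaded once before filtering, so only the steady-state memory footprint is reduced.

```python
def filter_terms(terms_by_norm_text, namespaces):
    """Keep only terms whose `db` attribute is in `namespaces`.

    `terms_by_norm_text` maps normalized text to a list of term objects.
    Entries left without any term are dropped entirely.
    """
    allowed = set(namespaces)
    filtered = {}
    for norm_text, terms in terms_by_norm_text.items():
        kept = [term for term in terms if term.db in allowed]
        if kept:
            filtered[norm_text] = kept
    return filtered


def make_namespace_grounder(namespaces):
    """Create a Grounder limited to the given namespaces, e.g. {'HGNC'}."""
    # Assumption: these import paths match the installed Gilda version.
    from gilda.grounder import Grounder, load_terms_file
    from gilda.resources import get_grounding_terms

    all_terms = load_terms_file(get_grounding_terms())
    return Grounder(filter_terms(all_terms, namespaces))
```

For instance, `make_namespace_grounder({'HGNC'})` would produce a grounder that only knows about human gene terms.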

Run web service with Docker

After cloning the repository locally, you can build and run a Docker image of Gilda using the following commands:

$ docker build -t gilda:latest .
$ docker run -d -p 8001:8001 gilda:latest

Alternatively, you can use docker-compose to do both the initial build and run the container based on the docker-compose.yml configuration:

$ docker-compose up

Default grounding resources

Gilda is customizable with terms coming from different vocabularies. However, Gilda comes with a default set of resources from which terms are collected (almost 2 million entries as of v1.1.0), without any additional configuration needed. These resources include:

HGNC (human genes)
UniProt (human and model organism proteins)
FamPlex (human protein families and complexes)
ChEBI (small molecules, metabolites, etc.)
GO (biological processes, molecular functions, complexes)
DOID (diseases)
EFO (experimental factors: cell lines, cell types, anatomical entities, etc.)
HP (human phenotypes)
MeSH (general: diseases, proteins, small molecules, cell types, etc.)
Adeft (misc. terms corresponding to ambiguous acronyms)

Citation

@article{gyori2022gilda,
    author = {Gyori, Benjamin M and Hoyt, Charles Tapley and Steppi, Albert},
    title = "{{Gilda: biomedical entity text normalization with machine-learned disambiguation as a service}}",
    journal = {Bioinformatics Advances},
    year = {2022},
    month = {05},
    issn = {2635-0041},
    doi = {10.1093/bioadv/vbac034},
    url = {https://doi.org/10.1093/bioadv/vbac034},
    note = {vbac034}
}

Funding

The development of Gilda was funded under the DARPA Communicating with Computers program (ARO grant W911NF-15-1-0544) and the DARPA Young Faculty Award (ARO grant W911NF-20-1-0255).
