Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read simulation model versions from DB #927

Merged
merged 8 commits into from
May 24, 2024
Merged

Conversation

GernotMaier
Copy link
Contributor

This PR adds functionality to read / update the simulation model versions and tags from the database.

This replaces the hardwired list of model versions in names.py (which has been removed). This means names.validate_model_version_name() is replaced by db_handler.model_version()

As a reminder, we want:

  • db.model_version("2024-02-01") == "2024-02-01"
  • db.model_version("Prod5") == "2020-06-28"
    (I am actually not a friend of this - we should have one model version name and not alternative names).

Model versions and tags are stored in the database in the meta collection (and in the model_parameters repository in model_versions/metadata/simulation_model_tags.json). To update the DB, one updates this file and then runs e.g.,

python simtools/applications/db_development_tools/add_model_parameters_from_repository_to_db.py --input_path ../model_parameters/model_versions/metadata --type metadata --db_name Staging-CTA-Simulation-Model-v0-2-0

(this will be added to the documentation, see the issues on the documentation).

@GernotMaier GernotMaier self-assigned this May 17, 2024
@GernotMaier GernotMaier linked an issue May 17, 2024 that may be closed by this pull request
3 tasks
@GernotMaier GernotMaier marked this pull request as ready for review May 17, 2024 07:48
@VictorBarbosaMartins
Copy link
Contributor

This PR adds functionality to read / update the simulation model versions and tags from the database.

This replaces the hardwired list of model versions in names.py (which has been removed). This means names.validate_model_version_name() is replaced by db_handler.model_version()

As a reminder, we want:

  • db.model_version("2024-02-01") == "2024-02-01"
  • db.model_version("Prod5") == "2020-06-28"
    (I am actually not a friend of this - we should have one model version name and not alternative names).

Model versions and tags are stored in the database in the meta collection (and in the model_parameters repository in model_versions/metadata/simulation_model_tags.json). To update the DB, one updates this file and then runs e.g.,

python simtools/applications/db_development_tools/add_model_parameters_from_repository_to_db.py --input_path ../model_parameters/model_versions/metadata --type metadata --db_name Staging-CTA-Simulation-Model-v0-2-0

(this will be added to the documentation, see the issues on the documentation).

Hi @GernotMaier, with which database are you working in this PR? I am not a big fan of having to update another repository for something in simtools to work. Isn't there a way to get it automatically from the online repo?

Copy link
Contributor

@VictorBarbosaMartins VictorBarbosaMartins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for this. Looks in general good. Here are a couple of comments to start with.

db_name : str
Name of the database to be created.
logger : logging.Logger
Logger object.
"""
logger.info("Adding metadata to the DB")
input_path = Path(args_dict["input_path"])
simulation_model_tags = gen.collect_data_from_file_or_dict(
file_name=input_path / "simulation_model_tags.json", in_dict=None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this means that every repository that we want to update to the DB has to have a simulation_model_tags.json file? How can we be sure that this is going to be the case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this file should actually look like? Is there a schema file that defines it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a new model version means to add a new directory to the gitlab etc (documentation updates are currently in preparation). So adding an entry into the model version json file is one step of this workflow.

Note that this is only necessary because we want to keep the possibility to call productions 'Prod6', 'Prod5' plus have the actually versioning with a date (this is what we have decided a while ago). If this wouldn't be necessary, one could simply read the directory names in the model_parameters repository. Anyway, I would keep this for now and see if we simplify in future.

Schema file: yes, and no - this is a very simple structure and I think the logic is absolutely clear from the examples. So probably no need at this point for a schema file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that this part is database maintenance and not foreseen to be done by the 'normal' user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The logic is absolutely clear" is not a very stable or convincing argument. Simtools does not have any idea at the moment on how this file should look like. This is for example different for the model parameter files (in the model parameters repo), because simtools have the schema files and know what to expect. So either we put that file in simtools or we create a schema file for it in simtools.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schema file would only allow to validate the format of the yaml file. As this is a single field dict, I am not sure if we need to introduce a schema.

@GernotMaier
Copy link
Contributor Author

First - if you try the db update tools, don't use a db_name which is used by others. Use a new name (sandbox...). Note that those databases needs to be deleted by hand, so best not to use it.

On the tools: it works for me - are you use you are in the right branch, or maybe you have to do a pip install -e?

@GernotMaier
Copy link
Contributor Author

@VictorBarbosaMartins - thanks for the comments - let me know if this is ok.

@VictorBarbosaMartins
Copy link
Contributor

sandbox

I still did not get it to work without cloning the repository locally. Can we make this work with the env variable SIMTOOLS_DB_SIMULATION_MODEL_URL as we did before?

@GernotMaier
Copy link
Contributor Author

You don't need a SIMTOOLS_DB_SIMULATION_MODEL_URL for this.

@VictorBarbosaMartins
Copy link
Contributor

You don't need a SIMTOOLS_DB_SIMULATION_MODEL_URL for this.

My point is that it might be better to use SIMTOOLS_DB_SIMULATION_MODEL_URL (in case no --inputis passed to the tool) in order not to need to have an up to date model_parameters repository locally alongside simtools

@VictorBarbosaMartins VictorBarbosaMartins self-requested a review May 24, 2024 12:22
Copy link
Contributor

@VictorBarbosaMartins VictorBarbosaMartins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The application is intended for experts-only and the model parameters repository needs to be set up alongside simtools. Hence, for a normal user this is all fine. We only need to include the guidelines for the experts into the documentation. Thanks for these changes!

@GernotMaier GernotMaier merged commit 6fa26d0 into main May 24, 2024
15 checks passed
@GernotMaier GernotMaier deleted the model-versions-in-db branch May 24, 2024 13:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants