
Quick Start

Sul edited this page May 10, 2024 · 25 revisions

Here you will learn how to install the initial package, build the initial network, print it out, and perform some analysis.

Installation

GlobalChem is the graph network itself: it has no dependencies, and its functionality is built into the main object.

GlobalChemExtensions has a much heavier dependency tree, but it adds functionality for cheminformaticians (or anyone, really) to analyze chemical data, including the GlobalChem graph network.

The best way to interact with our API is to use one of our official libraries distributed on PyPI:

code

# Install via pip
pip install global-chem

# Install the Extension Package
pip install 'global-chem[extensions]'
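Since the core package and the extensions install separately, it can help to check which of the two is importable before running the examples on this page. Below is a minimal sketch using only the standard library. The module names global_chem and global_chem_extensions come from the import lines used further down; module_available is a hypothetical helper, not part of GlobalChem.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be imported here."""
    # find_spec returns None (without importing anything) when no such module exists
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Module names as used in the import statements on this page
    for module in ("global_chem", "global_chem_extensions"):
        status = "installed" if module_available(module) else "missing"
        print(f"{module}: {status}")
```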

Additional Dependency Features

Not everyone wants to install everything into their local environment, which can get very hefty, especially for something as large as GlobalChem. To combat this, we partitioned some of the application's dependencies into separate groups that can be installed through the extras feature of setuptools. Please refer to the master extension list to see which app depends on which group.

  • cheminformatics
  • bioinformatics
  • quantum_chemistry
  • development_operations
  • forcefields
  • graphing
  • all - All the extensions

code

pip install 'global-chem[graphing]'
pip install 'global-chem[forcefields]'
pip install 'global-chem[bioinformatics]'
pip install 'global-chem[cheminformatics]'
pip install 'global-chem[quantum_chemistry]'
pip install 'global-chem[development_operations]'
pip install 'global-chem[all]'
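pip also accepts several extras inside a single bracket, separated by commas (the PEP 508 requirement syntax). This lets you pull in only the groups you need with one command, for example pip install 'global-chem[graphing,forcefields]'. The helper below is a hypothetical sketch that builds such a string. Its KNOWN_EXTRAS set simply mirrors the list above and is not read from the package metadata.

```python
# Extra names as listed on this page (hand-copied, not queried from PyPI)
KNOWN_EXTRAS = {
    "cheminformatics", "bioinformatics", "quantum_chemistry",
    "development_operations", "forcefields", "graphing", "all",
}


def requirement(extras, package="global-chem"):
    """Build a PEP 508 requirement string such as 'global-chem[forcefields,graphing]'."""
    unknown = set(extras) - KNOWN_EXTRAS
    if unknown:
        raise ValueError(f"unknown extras: {sorted(unknown)}")
    if not extras:
        return package
    # Sort and de-duplicate so the same selection always yields the same string
    return f"{package}[{','.join(sorted(set(extras)))}]"


if __name__ == "__main__":
    print(f"pip install '{requirement(['graphing', 'forcefields'])}'")
    # prints: pip install 'global-chem[forcefields,graphing]'
```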

Good to know: the global-chem-extensions dependencies are not pinned to specific versions, in the hope of staying flexible across other development environments.
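Because the extension dependencies are unpinned, the versions you end up with depend on your environment. One way to see what was actually resolved is importlib.metadata from the standard library (Python 3.8+). installed_version is a hypothetical helper, and the distribution names in the loop are only an example to adapt.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str):
    """Return the installed version of `distribution`, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Add any dependency names you want to audit to this tuple
    for dist in ("global-chem",):
        print(dist, installed_version(dist) or "not installed")
```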

Access Data

To build the GlobalChem graph network, we first import the package, initialize the class, and call build_global_chem_network:

code

from global_chem import GlobalChem

gc = GlobalChem()
gc.build_global_chem_network(print_output=True)

output


'global_chem': {
    'children': [
        'environment',
        'miscellaneous',
        'organic_synthesis',
        'medicinal_chemistry',
        'narcotics',
        'interstellar_space',
        'proteins',
        'materials'
    ],
    'name': 'global_chem',
    'node_value': <global_chem.global_chem.Node object at 0x10f60eed0>,
    'parents': []
}, etc.
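Each entry in the printed network records a node's name plus the names of its children and parents, so you can walk the hierarchy by following those links. The sketch below assumes the network is available as a plain dict of that shape. It uses a small hand-written excerpt with node_value left out, and it assumes parents holds names the same way children does. path_to_root and leaves are hypothetical helpers, not GlobalChem methods.

```python
def path_to_root(network: dict, name: str) -> list:
    """Follow 'parents' links from `name` up to the root; return a root-first path."""
    path = [name]
    while network[path[-1]]["parents"]:
        # Assumes each node has exactly one parent
        path.append(network[path[-1]]["parents"][0])
    return path[::-1]


def leaves(network: dict) -> list:
    """Sorted names of the nodes that have no children."""
    return sorted(n for n, node in network.items() if not node["children"])


# Hand-written excerpt in the shape printed above (node_value omitted)
network = {
    "global_chem": {"name": "global_chem", "children": ["environment", "narcotics"], "parents": []},
    "environment": {"name": "environment", "children": ["emerging_perfluoroalkyls"], "parents": ["global_chem"]},
    "emerging_perfluoroalkyls": {"name": "emerging_perfluoroalkyls", "children": [], "parents": ["environment"]},
    "narcotics": {"name": "narcotics", "children": ["pihkal"], "parents": ["global_chem"]},
    "pihkal": {"name": "pihkal", "children": [], "parents": ["narcotics"]},
}

print(path_to_root(network, "pihkal"))  # ['global_chem', 'narcotics', 'pihkal']
print(leaves(network))  # ['emerging_perfluoroalkyls', 'pihkal']
```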

gc.print_globalchem_network()

code

gc.print_globalchem_network()

output

                                ┌solvents─common_organic_solvents
             ┌organic_synthesis─└protecting_groups─amino_acid_protecting_groups
             │          ┌polymers─common_monomer_repeating_units
             ├materials─└clay─montmorillonite_adsorption
             │                            ┌privileged_kinase_inhibtors
             │                            ├privileged_scaffolds
             ├proteins─kinases─┌scaffolds─├iupac_blue_book_substituents
             │                 │          └common_r_group_replacements
             │                 └braf─inhibitors
             │              ┌vitamins
             │              ├open_smiles
             ├miscellaneous─├amino_acids
             │              └regex_patterns
global_chem──├environment─emerging_perfluoroalkyls
             │          ┌schedule_one
             │          ├schedule_four
             │          ├schedule_five
             ├narcotics─├pihkal
             │          ├schedule_two
             │          └schedule_three
             ├interstellar_space
             │                    ┌cannabinoids
             │                    │         ┌electrophillic_warheads_for_kinases
             │                    ├warheads─└common_warheads_covalent_inhibitors
             └medicinal_chemistry─│      ┌phase_2_hetereocyclic_rings
                                  └rings─├iupac_blue_book_rings
                                         └rings_in_drugs
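The printer above lays the tree out sideways. If you prefer a top-down view of a children mapping (for example, one collected from the network entries), a small recursive renderer is enough. This is a hypothetical helper for illustration, not the format print_globalchem_network produces, and the mapping below is a hand-picked excerpt.

```python
def render_tree(children: dict, node: str) -> list:
    """Return the lines of a top-down text tree rooted at `node`.

    `children` maps each node name to a list of its child names.
    """
    lines = [node]
    kids = children.get(node, [])
    for i, kid in enumerate(kids):
        last = i == len(kids) - 1
        sub = render_tree(children, kid)
        lines.append(("└── " if last else "├── ") + sub[0])
        # Continue the vertical bar only while more siblings follow
        lines.extend(("    " if last else "│   ") + line for line in sub[1:])
    return lines


# Hand-picked excerpt of the hierarchy printed above
children = {
    "global_chem": ["narcotics", "environment"],
    "narcotics": ["pihkal", "schedule_one"],
    "environment": ["emerging_perfluoroalkyls"],
}
print("\n".join(render_tree(children, "global_chem")))
```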

Data Analysis

Let's have some fun: let's access a node and perform a PCA analysis. We want to test whether the functional groups in a node share similarity across some arbitrary features, and try to determine which features those are. This helps us understand which features are relevant for small molecules.

We are going to look at the list of molecules in pihkal because it is a fairly comprehensive list of what is on the drug market, as currently published on its Wikipedia page. This will help us identify

PCA Analysis

code

from global_chem import GlobalChem
from global_chem_extensions import GlobalChemExtensions

gc = GlobalChem()
gc.build_global_chem_network(print_output=False, debugger=False)
smiles_list = list(gc.get_node_smiles('pihkal').values())

GlobalChemExtensions().node_pca_analysis(smiles_list, save_file=False)

plot

Graph
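node_pca_analysis takes care of featurizing the SMILES and plotting internally. To show what the PCA step itself does, here is a minimal sketch with NumPy: center the feature matrix, take its SVD, and project onto the leading axes. The 4-bit "fingerprints" are made-up data standing in for real molecular fingerprints. This is the textbook technique, not necessarily the library's implementation.

```python
import numpy as np


def pca(features, n_components=2):
    """Project the rows of `features` onto their leading principal components.

    Returns (scores, explained_variance_ratio) for the kept components.
    """
    X = np.asarray(features, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA works on mean-centred columns
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    scores = X_centered @ Vt[:n_components].T  # coordinates in PC space
    variance = S**2 / (X.shape[0] - 1)         # variance along each axis
    ratio = variance / variance.sum()
    return scores, ratio[:n_components]


# Hypothetical 4-bit fingerprints for five molecules (made-up data)
fingerprints = np.array([
    [1, 0, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 0],
])
scores, ratio = pca(fingerprints)
print(scores.shape)  # (5, 2)
print(ratio)
```

The first column of scores is what a PCA scatter plot would put on its x axis; ratio tells you how much of the total variance each kept axis explains.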

Radial Analysis

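Let's have a look at how a list of molecules relates to the rest of the nodes in the network using a radial analysis. The example below uses the pihkal node; a list such as emerging_perfluoroalkyls can be examined the same way. For more details on the Radial Analysis algorithm, please head over to its page.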

code

from global_chem import GlobalChem
from global_chem_extensions import GlobalChemExtensions

gc = GlobalChem()
gc.build_global_chem_network(print_output=False, debugger=False)

smiles_list = list(gc.get_node_smiles('pihkal').values())
GlobalChemExtensions().sunburst_chemical_list(smiles_list, save_file=False)

plot

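A sunburst chart needs, for each ring segment, an id, a parent id, and a value. Assume each molecule in the list has already been matched to one or more nodes in the network (the matching step is not shown, and the paths below are made up). The hierarchy counts can then be aggregated by counting every prefix of every matched path. This is a sketch of the data shape a sunburst chart consumes, not the actual sunburst_chemical_list algorithm. The ids/parents/values column layout follows the convention used by plotting libraries such as Plotly.

```python
from collections import Counter


def prefix_counts(matched_paths):
    """Count hits for every prefix of every hierarchy path."""
    counts = Counter()
    for path in matched_paths:
        for depth in range(1, len(path) + 1):
            counts[tuple(path[:depth])] += 1
    return counts


def sunburst_columns(counts):
    """Flatten prefix counts into parallel ids / parents / values lists."""
    ids, parents, values = [], [], []
    for prefix, hits in sorted(counts.items()):
        ids.append("/".join(prefix))
        parents.append("/".join(prefix[:-1]))  # empty string marks the root
        values.append(hits)
    return ids, parents, values


# Made-up matches for three molecules
hits = [
    ("global_chem", "medicinal_chemistry", "warheads", "common_warheads_covalent_inhibitors"),
    ("global_chem", "medicinal_chemistry", "rings", "rings_in_drugs"),
    ("global_chem", "narcotics", "pihkal"),
]
ids, parents, values = sunburst_columns(prefix_counts(hits))
for row in zip(ids, parents, values):
    print(row)
```

Because every hit is counted at each level of its path, a parent's value is always at least the sum of its children's values, which is what a "total"-style sunburst expects.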

If we take a quick look at the pihkal list, we can see that these molecules are very similar to the covalent warheads. More nodes can be added and more inferences can be drawn from the data; verifying them is up to the user :).
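A claim like "very similar" is easier to check with a number. A common choice in cheminformatics is the Tanimoto coefficient between fingerprints: the number of shared on-bits divided by the number of on-bits set in either. The sketch below represents fingerprints as sets of on-bit indices. These are made up here; in practice you would compute them from the SMILES with a toolkit such as RDKit. It then averages the similarity across two groups. This is one possible way to verify the observation, not the method behind the plot.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def mean_cross_similarity(group_a, group_b):
    """Average Tanimoto similarity over every pair (x from group_a, y from group_b)."""
    scores = [tanimoto(x, y) for x in group_a for y in group_b]
    return sum(scores) / len(scores)


# Hypothetical fingerprints for two small groups of molecules
pihkal_like = [{1, 4, 7, 9}, {1, 4, 8, 9}]
warhead_like = [{1, 4, 9, 12}, {2, 5, 11}]
print(round(mean_cross_similarity(pihkal_like, warhead_like), 3))  # 0.3
```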


Enjoy

Read more of the documentation or just start playing around with the data. This data takes some time to digest, so patience is necessary when building your own networks as well. Happy cheminformatics.
