matador.db package

The db module provides all the submodules that touch the database, with functionality to connect, add or refine database objects, and observe changes.

class matador.db.Spatula(*args, settings=None)[source]

Bases: object

The Spatula class implements methods to scrape folders and individual files for crystal structures and create a MongoDB document for each.

File types that can be read are:

  • CASTEP .castep output

  • SHELX (from airss.pl / pyAIRSS) .res output

  • CASTEP .param, .cell input

This class recursively scans directories, starting from the current working directory, to find the file types above. Base filenames are matched to prevent duplication of data from e.g. .castep and .res files. The following directory structures are recommended:

  • One .res file per structure, plus template .cell and .param files that provide all CASTEP parameters that the structures in this folder were run with.

  • One .castep file per structure, containing all information. If pseudopotential information is not present in the CASTEP file, this class will check for the corresponding .usp files and try to scrape those.

Set up arguments and initialise DB client.

Notes

Several arguments can be passed to this class from the command line; they are interpreted here through *args:

Parameters:
  • db (str) – the name of the collection to import to.

  • scan (bool) – whether or not to just scan the directory, rather than importing (automatically sets dryrun to true).

  • dryrun (bool) – perform whole process, up to actually importing to the database.

  • tags (str) – apply this tag to each structure added to database.

  • force (bool) – override rules about which folders can be imported into main database.

  • recent_only (bool) – if true, sort file lists by modification date and stop scanning when a file that already exists in database is found.

class matador.db.DatabaseChanges(collection_name: str, changeset_ind=0, action='view', override=False, mongo_settings=None)[source]

Bases: object

Class to view and undo particular database changesets.

Parse arguments and run changes interface.

Parameters:

collection_name (str) – the base collection name to act upon

Keyword Arguments:
  • changeset_ind (int) – the number of the changeset to act upon (1 is oldest)

  • action (str) – either ‘view’ or ‘undo’

  • override (bool) – override all options to positive answers for testing

  • mongo_settings (dict) – dictionary of already-sourced mongo settings

static view_changeset(changeset, index)[source]

Prints all details about a particular changeset.

Parameters:
  • changeset (dict) – changeset stored in changelog database

  • index (int) – changeset index

static print_change_summary(curs)[source]

Prints a summary of changes.

Parameters:

curs (list) – cursor from changelog database

class matador.db.Refiner(cursor, collection=None, task=None, mode='display', **kwargs)[source]

Bases: object

Refiner implements methods to alter certain parts of the database in place, in either overwrite, set or compare/display mode. The fields that can currently be modified are space groups, substructures, the set of elements, tags and DOIs.

Parses args and initiates modification.

Parameters:

cursor (list of dicts) – matador cursor to refine.

Keyword Arguments:
  • collection (Collection) – mongodb collection to query/edit.

  • task (str/callable) – one of ‘sym’, ‘spg’, ‘elem_set’, ‘tag’, ‘doi’ or ‘source’, or a custom function that takes in and returns a cursor and field to modify.

  • mode (str) – one of ‘display’, ‘overwrite’, ‘set’.

update_docs()[source]

Updates documents in database with correct priority.

symmetry(symprec=0.001)[source]

Compute space group with spglib.

elem_set()[source]

Imbue documents with the set of elements, i.e. set(doc['atom_types']), for quicker look-up.

add_tag()[source]

Add a tag to each document.

add_doi()[source]

Add a DOI to each document.

add_root_source()[source]

Add the “root_source” key to a document in the database, i.e. the name of the structure, minus file extension.

tidy_pspots()[source]

Loop over all documents and make sure they only have pspots for the elements that exist in the structure.

add_raw_data()[source]

Loop over all documents in the query and try to open the files listed under their source fields, storing them under the _raw key.

matador.db.make_connection_to_collection(coll_names, check_collection=False, allow_changelog=False, mongo_settings=None, override=False, import_mode=False, quiet=True, debug=False)[source]

Connect to database of choice.

Parameters:

coll_names (str) – name of collection.

Keyword Arguments:
  • check_collection (bool) – check whether collections exist (forces connection)

  • allow_changelog (bool) – allow queries to collections with names prefixed by __

  • mongo_settings (dict) – dict containing mongo and related config

  • override (bool) – don’t ask for user input from stdin and assume all is well

  • quiet (bool) – don’t print very much.

Returns:
  • client (MongoClient) – the connection to the database

  • db (Database) – the database to query

  • collections (dict) – Collection objects indexed by name

Submodules

matador.db.changes module

This file implements an interface for querying the __changelog_<collection> collections to allow for display and reversion of particular database changes.

class matador.db.changes.DatabaseChanges(collection_name: str, changeset_ind=0, action='view', override=False, mongo_settings=None)[source]

Bases: object

Class to view and undo particular database changesets.

Parse arguments and run changes interface.

Parameters:

collection_name (str) – the base collection name to act upon

Keyword Arguments:
  • changeset_ind (int) – the number of the changeset to act upon (1 is oldest)

  • action (str) – either ‘view’ or ‘undo’

  • override (bool) – override all options to positive answers for testing

  • mongo_settings (dict) – dictionary of already-sourced mongo settings

static view_changeset(changeset, index)[source]

Prints all details about a particular changeset.

Parameters:
  • changeset (dict) – changeset stored in changelog database

  • index (int) – changeset index

static print_change_summary(curs)[source]

Prints a summary of changes.

Parameters:

curs (list) – cursor from changelog database

matador.db.connect module

Some simple utilities for making DB connections.

matador.db.connect.make_connection_to_collection(coll_names, check_collection=False, allow_changelog=False, mongo_settings=None, override=False, import_mode=False, quiet=True, debug=False)[source]

Connect to database of choice.

Parameters:

coll_names (str) – name of collection.

Keyword Arguments:
  • check_collection (bool) – check whether collections exist (forces connection)

  • allow_changelog (bool) – allow queries to collections with names prefixed by __

  • mongo_settings (dict) – dict containing mongo and related config

  • override (bool) – don’t ask for user input from stdin and assume all is well

  • quiet (bool) – don’t print very much.

Returns:
  • client (MongoClient) – the connection to the database

  • db (Database) – the database to query

  • collections (dict) – Collection objects indexed by name

matador.db.connect.fuzzy_collname_match(trial, targets)[source]

Do a crude fuzzy match on the tokens between punctuation marks, e.g. matthews_cool_database will search for matthews, cool and database in the known collection names.

Parameters:
  • trial (str) – database search name.

  • targets (list) – list of existing database names.

Returns:

list of roughly matching collection names, ordered by occurrence of tokens.

Return type:

list
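The token-based matching described for fuzzy_collname_match can be illustrated with a minimal sketch. This is not matador's implementation: the function name fuzzy_match_sketch, the splitting rule and the tie-breaking (input order is kept for equal scores) are all assumptions made for illustration.

```python
import re
from collections import Counter


def fuzzy_match_sketch(trial, targets):
    """Hypothetical re-implementation of a token-based fuzzy match.

    Splits `trial` on punctuation and whitespace, then ranks each target by
    how many of those tokens it contains. Targets that contain no token at
    all are dropped from the result.
    """
    tokens = [tok for tok in re.split(r"[\W_]+", trial) if tok]
    hits = Counter()
    for target in targets:
        hits[target] = sum(tok in target for tok in tokens)
    # sorted() is stable, so targets with equal scores keep their input order
    ranked = sorted(targets, key=lambda name: -hits[name])
    return [name for name in ranked if hits[name] > 0]


matches = fuzzy_match_sketch(
    "matthews_cool_database", ["cool_stuff", "matthews_database", "other"]
)
print(matches)
```

Ranking by token count rather than by edit distance keeps the sketch close to the "bits between punctuation" idea in the description above.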

matador.db.importer module

This file implements the base class Spatula that calls the scrapers and interfaces with the MongoDB client.

class matador.db.importer.Spatula(*args, settings=None)[source]

Bases: object

The Spatula class implements methods to scrape folders and individual files for crystal structures and create a MongoDB document for each.

File types that can be read are:

  • CASTEP .castep output

  • SHELX (from airss.pl / pyAIRSS) .res output

  • CASTEP .param, .cell input

This class recursively scans directories, starting from the current working directory, to find the file types above. Base filenames are matched to prevent duplication of data from e.g. .castep and .res files. The following directory structures are recommended:

  • One .res file per structure, plus template .cell and .param files that provide all CASTEP parameters that the structures in this folder were run with.

  • One .castep file per structure, containing all information. If pseudopotential information is not present in the CASTEP file, this class will check for the corresponding .usp files and try to scrape those.
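The base-filename matching mentioned above can be pictured with a short sketch. It is not taken from matador: the helper name group_by_basename and the STRUCTURE_EXTS tuple are hypothetical, and the real scraper may well organise its file lists differently.

```python
import os
from collections import defaultdict

# Extensions named in the list above; the tuple itself is an assumption.
STRUCTURE_EXTS = (".castep", ".res", ".cell", ".param")


def group_by_basename(root):
    """Hypothetical helper: map each base filename under `root` to its files.

    Files such as `a.castep` and `a.res` in the same folder end up under one
    key, so they can be treated as one structure rather than two.
    """
    groups = defaultdict(dict)
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            base, ext = os.path.splitext(fname)
            if ext in STRUCTURE_EXTS:
                key = os.path.join(dirpath, base)
                groups[key][ext] = os.path.join(dirpath, fname)
    return dict(groups)
```

Calling group_by_basename(".") on a folder holding a.castep and a.res would give a single entry for a with both extensions recorded, which is the point at which a scraper could decide which file to trust.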

Set up arguments and initialise DB client.

Notes

Several arguments can be passed to this class from the command line; they are interpreted here through *args:

Parameters:
  • db (str) – the name of the collection to import to.

  • scan (bool) – whether or not to just scan the directory, rather than importing (automatically sets dryrun to true).

  • dryrun (bool) – perform whole process, up to actually importing to the database.

  • tags (str) – apply this tag to each structure added to database.

  • force (bool) – override rules about which folders can be imported into main database.

  • recent_only (bool) – if true, sort file lists by modification date and stop scanning when a file that already exists in database is found.
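As a rough illustration of the recent_only behaviour, the sketch below sorts files by modification time and stops at the first one that is already known. The helper name select_new_files is hypothetical, the already_imported set stands in for a real database look-up, and newest-first ordering is an assumption of this sketch rather than something stated above.

```python
import os


def select_new_files(paths, already_imported):
    """Hypothetical helper: return the files newer than the first known one.

    `already_imported` is a plain set of paths standing in for a database
    query. Files are visited newest first (an assumption of this sketch).
    """
    newest_first = sorted(paths, key=os.path.getmtime, reverse=True)
    new_files = []
    for path in newest_first:
        if path in already_imported:
            break  # everything older is assumed to have been imported already
        new_files.append(path)
    return new_files
```

The early break is what makes this cheaper than checking every file, at the cost of missing old files that were never imported.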

matador.db.refine module

This module contains functionality to update and overwrite database entries with specific tasks, e.g. symmetry and substructure analysis. This can be especially useful for “patching” the data in databases created with older matador versions.

class matador.db.refine.Refiner(cursor, collection=None, task=None, mode='display', **kwargs)[source]

Bases: object

Refiner implements methods to alter certain parts of the database in place, in either overwrite, set or compare/display mode. The fields that can currently be modified are space groups, substructures, the set of elements, tags and DOIs.

Parses args and initiates modification.

Parameters:

cursor (list of dicts) – matador cursor to refine.

Keyword Arguments:
  • collection (Collection) – mongodb collection to query/edit.

  • task (str/callable) – one of ‘sym’, ‘spg’, ‘elem_set’, ‘tag’, ‘doi’ or ‘source’, or a custom function that takes in and returns a cursor and field to modify.

  • mode (str) – one of ‘display’, ‘overwrite’, ‘set’.

update_docs()[source]

Updates documents in database with correct priority.

symmetry(symprec=0.001)[source]

Compute space group with spglib.

elem_set()[source]

Imbue documents with the set of elements, i.e. set(doc['atom_types']), for quicker look-up.
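A minimal sketch of the idea behind elem_set, using plain dictionaries in place of database documents. The key name "elems" and the helper with_elem_set are assumptions for illustration; the result is stored as a sorted list because BSON has arrays but no set type.

```python
def with_elem_set(doc):
    """Hypothetical helper: store the unique elements of a document.

    The "elems" key name is an assumption; a sorted list is used so the
    value can be stored in MongoDB and compared reproducibly.
    """
    doc["elems"] = sorted(set(doc["atom_types"]))
    return doc


docs = [
    {"atom_types": ["Li", "Li", "P"]},
    {"atom_types": ["Na", "Cl"]},
]
docs = [with_elem_set(doc) for doc in docs]

# Look-ups can now test membership on the short, de-duplicated list.
contains_li = [doc for doc in docs if "Li" in doc["elems"]]
```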

add_tag()[source]

Add a tag to each document.

add_doi()[source]

Add a DOI to each document.

add_root_source()[source]

Add the “root_source” key to a document in the database, i.e. the name of the structure, minus file extension.
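The "name of the structure, minus file extension" can be derived from a source path with the standard library alone. The helper name root_source_sketch and the example path are hypothetical; how matador chooses which source file to use is not covered here.

```python
import os


def root_source_sketch(path):
    """Hypothetical helper: strip the directory and extension from a path."""
    return os.path.splitext(os.path.basename(path))[0]


name = root_source_sketch("run1/LiP-abc123.res")
```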

tidy_pspots()[source]

Loop over all documents and make sure they only have pspots for the elements that exist in the structure.
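The tidying step amounts to filtering a per-element mapping by the elements actually present. In the sketch below the key name "species_pot" and the pseudopotential filenames are assumptions made for illustration, as is the helper name tidy_pspots_sketch; only the atom_types field appears elsewhere in this reference.

```python
def tidy_pspots_sketch(doc, pspot_key="species_pot"):
    """Hypothetical helper: drop pseudopotentials for absent elements.

    `pspot_key` names the field holding an element -> pseudopotential
    mapping; its default value is an assumption of this sketch.
    """
    elements = set(doc["atom_types"])
    pspots = doc.get(pspot_key, {})
    doc[pspot_key] = {el: pot for el, pot in pspots.items() if el in elements}
    return doc


doc = tidy_pspots_sketch(
    {
        "atom_types": ["Li", "P"],
        "species_pot": {"Li": "Li_00.usp", "P": "P_00.usp", "Na": "Na_00.usp"},
    }
)
```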

add_raw_data()[source]

Loop over all documents in the query and try to open the files listed under their source fields, storing them under the _raw key.
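The behaviour described for add_raw_data can be sketched as follows. The layout of the _raw value (a list of [filename, lines] pairs) and the helper name attach_raw_data_sketch are assumptions; files that cannot be opened are skipped, in keeping with the "try to open" wording above.

```python
import os


def attach_raw_data_sketch(doc):
    """Hypothetical helper: read each source file into the "_raw" key.

    Stores [filename, list_of_lines] pairs; this layout is an assumption of
    the sketch. Missing or unreadable files are silently skipped.
    """
    raw = []
    for path in doc.get("source", []):
        try:
            with open(path, "r") as handle:
                raw.append([os.path.basename(path), handle.read().splitlines()])
        except OSError:
            continue
    doc["_raw"] = raw
    return doc
```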