matador.db package
The db module provides all the submodules that touch the database, with functionality to connect, add or refine database objects, and observe changes.
- class matador.db.Spatula(*args, settings=None)
Bases: object
The Spatula class implements methods to scrape folders and individual files for crystal structures and create a MongoDB document for each.
File types that can be read are:
CASTEP .castep output
SHELX (from airss.pl / pyAIRSS) .res output
CASTEP .param, .cell input
This class will recursively scan directories from the cwd to find the file types above. Base filenames are matched to prevent the same data being imported twice, e.g. from both a .castep and a .res file. The following directory structures are recommended:
One .res file per structure, plus template .cell and .param files that provide all the CASTEP parameters the structures in this folder were run with.
One .castep file per structure, containing all information. If pseudopotential information is not present in the CASTEP file, this class will check for the corresponding .usp files and try to scrape those.
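Matching base filenames can be sketched as follows. This is a minimal illustration, not matador's implementation: the helper name `group_by_root` is hypothetical, and it simply groups .castep and .res paths by their path minus extension, so that both files belonging to one structure end up under a single key.

```python
from collections import defaultdict
from pathlib import Path

# Structure files listed in the docs; grouping only these is an assumption.
STRUCTURE_EXTS = {".castep", ".res"}


def group_by_root(paths):
    """Hypothetical helper: group structure files by path minus extension."""
    groups = defaultdict(list)
    for path in map(Path, paths):
        if path.suffix in STRUCTURE_EXTS:
            # Keep the directory in the key so same-named files in different
            # folders are treated as different structures.
            groups[str(path.with_suffix(""))].append(path.suffix)
    return {root: sorted(exts) for root, exts in groups.items()}


print(group_by_root(["a/KSnP.castep", "a/KSnP.res", "a/LiP.res", "a/template.cell"]))
```

Which file of a matched pair takes precedence is not stated here, so the sketch only reports what was found for each root.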
Set up arguments and initialise DB client.
Notes
Several arguments can be passed to this class from the command line; here they are interpreted through *args:
- Parameters
db (str) – the name of the collection to import to.
scan (bool) – whether to only scan the directory rather than import (automatically sets dryrun to true).
dryrun (bool) – perform the whole process up to, but not including, the actual import into the database.
tags (str) – apply this tag to each structure added to the database.
force (bool) – override the rules about which folders can be imported into the main database.
recent_only (bool) – if true, sort file lists by modification date and stop scanning when a file that already exists in the database is found.
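The `recent_only` behaviour described above can be illustrated with a short sketch. The function `files_to_scan` is hypothetical, and it assumes that an already-imported file can be recognised by its path; matador may identify existing entries differently.

```python
import os


def files_to_scan(paths, already_imported, mtime=os.path.getmtime):
    """Hypothetical sketch of recent_only: newest files first, stop at the first known one."""
    new_files = []
    # Sort newest first, so everything after the first known file is older.
    for path in sorted(paths, key=mtime, reverse=True):
        if path in already_imported:
            # Older files are assumed to be in the database already.
            break
        new_files.append(path)
    return new_files
```

The `mtime` argument defaults to the file's modification time but can be swapped for a lookup table, which makes the logic easy to check without touching the filesystem.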
- class matador.db.DatabaseChanges(collection_name: str, changeset_ind=0, action='view', override=False, mongo_settings=None)
Bases: object
Class to view and undo particular database changesets.
Parse arguments and run changes interface.
- Parameters
collection_name (str) – the base collection name to act upon
- Keyword Arguments
- class matador.db.Refiner(cursor, collection=None, task=None, mode='display', **kwargs)
Bases: object
Refiner implements methods to alter certain parts of the database in place, in overwrite, set or compare/display mode. The fields that can currently be modified are space groups, substructures, the set of elements, tags and DOIs.
Parses args and initiates modification.
- Parameters
cursor (list of dicts) – matador cursor to refine.
- Keyword Arguments
collection (Collection) – mongodb collection to query/edit.
task (str/callable) – one of ‘sym’, ‘spg’, ‘elem_set’, ‘tag’, ‘doi’ or ‘source’, or a custom function that takes in and returns a cursor and field to modify.
mode (str) – one of ‘display’, ‘overwrite’, ‘set’.
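As an illustration of the callable form of `task`, here is a hypothetical custom task. The exact contract is an assumption read from the description above: the function receives the cursor and returns the modified cursor together with the name of the field it changed. The function name and the choice of the `num_atoms` field are for the example only.

```python
import copy


def count_atoms_task(cursor):
    """Hypothetical custom task: (re)compute num_atoms from atom_types.

    Assumed contract: take a cursor (list of dicts) and return the
    modified cursor plus the name of the field that was changed.
    """
    field = "num_atoms"
    # Work on copies so the caller's cursor is left untouched.
    refined = copy.deepcopy(cursor)
    for doc in refined:
        doc[field] = len(doc["atom_types"])
    return refined, field
```

Under that assumption it might be passed as `Refiner(cursor, collection=collection, task=count_atoms_task, mode='display')`.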
- elem_set()
Add the set of elements, i.e. set(doc['atom_types']), to each document for quicker look-up.
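A minimal sketch of the same idea on a plain list of dicts follows. The key name `elem_set` and the sorted-list representation are assumptions; a list is used because BSON, the format MongoDB stores documents in, has no set type.

```python
def add_elem_set(cursor):
    """Sketch: attach the sorted set of elements to each document in place."""
    for doc in cursor:
        # Deduplicate atom_types, then sort for a stable, storable list.
        doc["elem_set"] = sorted(set(doc["atom_types"]))
    return cursor
```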
- add_root_source()
Add the “root_source” key to a document in the database, i.e. the name of the structure, minus file extension.
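The sketch below shows one way to derive such a name. It assumes, hypothetically, that each document carries a `source` list of file paths and that the first .res or .castep entry names the structure; the real method may use different rules.

```python
from pathlib import PurePath


def add_root_source(doc):
    """Hypothetical sketch: set doc['root_source'] to the structure file's stem."""
    for src in doc.get("source", []):
        path = PurePath(src)
        if path.suffix in (".res", ".castep"):
            # The stem is the base filename without its extension.
            doc["root_source"] = path.stem
            break
    return doc
```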
- matador.db.make_connection_to_collection(coll_names, check_collection=False, allow_changelog=False, mongo_settings=None, override=False, import_mode=False, quiet=True, debug=False)
Connect to database of choice.
- Parameters
coll_names (str) – name of collection.
- Keyword Arguments
check_collection (bool) – check whether the collections exist (forces a connection).
allow_changelog (bool) – allow queries to collections with names prefixed by __.
mongo_settings (dict) – dict containing mongo and related config.
override (bool) – don’t ask for user input from stdin and assume all is well.
quiet (bool) – print as little as possible.
- Returns
client (MongoClient): the connection to the database
db (Database): the database to query
collections (dict): Collection objects indexed by name
- Return type
tuple of (MongoClient, Database, dict)
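The `allow_changelog` rule can be sketched as a simple name check. The helper `check_collection_names` is hypothetical; accepting either a single string or a list is an assumption based on the plural parameter name, and raising `ValueError` is only an illustrative choice, since the real function may handle a refused name differently.

```python
CHANGELOG_PREFIX = "__"


def check_collection_names(coll_names, allow_changelog=False):
    """Hypothetical sketch: refuse '__'-prefixed collections unless allowed."""
    # Accept a single name or an iterable of names (assumption).
    if isinstance(coll_names, str):
        coll_names = [coll_names]
    names = list(coll_names)
    for name in names:
        if name.startswith(CHANGELOG_PREFIX) and not allow_changelog:
            raise ValueError(f"querying {name!r} requires allow_changelog=True")
    return names
```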
Submodules
matador.db.changes module
This file implements an interface for querying the __changelog_<collection> collections to allow for display and reversion of particular database changes.
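The naming convention above can be sketched as follows. Both helpers are hypothetical; in practice the list of names could come from PyMongo's `Database.list_collection_names()`.

```python
def changelog_name(collection_name):
    """Name of the changelog collection for a base collection."""
    return f"__changelog_{collection_name}"


def collections_with_changelogs(all_names):
    """Return the base collections that have a matching changelog collection."""
    names = set(all_names)
    return sorted(
        name for name in names
        # Skip the changelogs themselves; keep names whose changelog exists.
        if not name.startswith("__") and changelog_name(name) in names
    )
```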
- class matador.db.changes.DatabaseChanges(collection_name: str, changeset_ind=0, action='view', override=False, mongo_settings=None)
Bases: object
Class to view and undo particular database changesets.
Parse arguments and run changes interface.
- Parameters
collection_name (str) – the base collection name to act upon
- Keyword Arguments
matador.db.connect module
Some simple utilities for making DB connections.
- matador.db.connect.make_connection_to_collection(coll_names, check_collection=False, allow_changelog=False, mongo_settings=None, override=False, import_mode=False, quiet=True, debug=False)
Connect to database of choice.
- Parameters
coll_names (str) – name of collection.
- Keyword Arguments
check_collection (bool) – check whether the collections exist (forces a connection).
allow_changelog (bool) – allow queries to collections with names prefixed by __.
mongo_settings (dict) – dict containing mongo and related config.
override (bool) – don’t ask for user input from stdin and assume all is well.
quiet (bool) – print as little as possible.
- Returns
client (MongoClient): the connection to the database
db (Database): the database to query
collections (dict): Collection objects indexed by name
- Return type
tuple of (MongoClient, Database, dict)
matador.db.importer module
This file implements the base class Spatula that calls the scrapers and interfaces with the MongoDB client.
- class matador.db.importer.Spatula(*args, settings=None)
Bases: object
The Spatula class implements methods to scrape folders and individual files for crystal structures and create a MongoDB document for each.
File types that can be read are:
CASTEP .castep output
SHELX (from airss.pl / pyAIRSS) .res output
CASTEP .param, .cell input
This class will recursively scan directories from the cwd to find the file types above. Base filenames are matched to prevent the same data being imported twice, e.g. from both a .castep and a .res file. The following directory structures are recommended:
One .res file per structure, plus template .cell and .param files that provide all the CASTEP parameters the structures in this folder were run with.
One .castep file per structure, containing all information. If pseudopotential information is not present in the CASTEP file, this class will check for the corresponding .usp files and try to scrape those.
Set up arguments and initialise DB client.
Notes
Several arguments can be passed to this class from the command line; here they are interpreted through *args:
- Parameters
db (str) – the name of the collection to import to.
scan (bool) – whether to only scan the directory rather than import (automatically sets dryrun to true).
dryrun (bool) – perform the whole process up to, but not including, the actual import into the database.
tags (str) – apply this tag to each structure added to the database.
force (bool) – override the rules about which folders can be imported into the main database.
recent_only (bool) – if true, sort file lists by modification date and stop scanning when a file that already exists in the database is found.
matador.db.refine module
This module contains functionality to update and overwrite database entries with specific tasks, e.g. symmetry and substructure analysis. This can be especially useful for “patching” the data in databases created with older matador versions.
- class matador.db.refine.Refiner(cursor, collection=None, task=None, mode='display', **kwargs)
Bases: object
Refiner implements methods to alter certain parts of the database in place, in overwrite, set or compare/display mode. The fields that can currently be modified are space groups, substructures, the set of elements, tags and DOIs.
Parses args and initiates modification.
- Parameters
cursor (list of dicts) – matador cursor to refine.
- Keyword Arguments
collection (Collection) – mongodb collection to query/edit.
task (str/callable) – one of ‘sym’, ‘spg’, ‘elem_set’, ‘tag’, ‘doi’ or ‘source’, or a custom function that takes in and returns a cursor and field to modify.
mode (str) – one of ‘display’, ‘overwrite’, ‘set’.
- elem_set()
Add the set of elements, i.e. set(doc['atom_types']), to each document for quicker look-up.
- add_root_source()
Add the “root_source” key to a document in the database, i.e. the name of the structure, minus file extension.