matador.db package
The db module provides all the submodules that touch the database, with functionality to connect, add or refine database objects, and observe changes.
- class matador.db.Spatula(*args, settings=None)
Bases: object
The Spatula class implements methods to scrape folders and individual files for crystal structures and create a MongoDB document for each.
File types that can be read are:
- CASTEP .castep output
- SHELX (from airss.pl / pyAIRSS) .res output
- CASTEP .param, .cell input
This class will recursively scan directories from the cwd to find the file types above. Base filenames will be matched to prevent duplication of data from e.g. .castep and .res files. The following directory structures are recommended:
- One .res file per structure and template .cell and .param files that provide all CASTEP parameters that structures in this folder were run at.
- One .castep file per structure, containing all information. If pseudopotential information is not present in the CASTEP file, this class will check for the corresponding .usp files and try to scrape those.
Set up arguments and initialise DB client.
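The base-filename matching described above can be illustrated with a minimal sketch. It is not matador's implementation: `group_by_basename` and `EXTENSIONS` are hypothetical names, and the sketch only assumes that files sharing a directory and stem describe the same structure.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical set of recognised extensions, taken from the file types listed above.
EXTENSIONS = {".castep", ".res", ".cell", ".param"}


def group_by_basename(root="."):
    """Map each base filename under `root` to the recognised extensions found for it."""
    groups = defaultdict(set)
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in EXTENSIONS:
            # Key on directory plus stem, so same-named files in
            # different folders are kept separate.
            groups[path.with_suffix("")].add(path.suffix)
    return dict(groups)
```

A base name that maps to both `.castep` and `.res` would then be treated as a single structure rather than imported twice.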
Notes
Several arguments can be passed to this class from the command line; here they are interpreted through *args:
- Parameters:
db (str) – the name of the collection to import to.
scan (bool) – whether or not to just scan the directory, rather than importing (automatically sets dryrun to true).
dryrun (bool) – perform whole process, up to actually importing to the database.
tags (str) – apply this tag to each structure added to database.
force (bool) – override rules about which folders can be imported into main database.
recent_only (bool) – if true, sort file lists by modification date and stop scanning when a file that already exists in database is found.
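As an illustration of the recent_only behaviour, the sketch below sorts files newest-first and stops at the first one that is already known. It is an assumption-laden stand-in rather than matador's code: `scan_recent` is a hypothetical helper, and `already_imported` is a plain set of path strings standing in for a database lookup.

```python
from pathlib import Path


def scan_recent(paths, already_imported):
    """Return the files that are newer than the first already-imported file."""
    new_files = []
    # Sort newest first by modification time.
    newest_first = sorted(paths, key=lambda p: Path(p).stat().st_mtime, reverse=True)
    for path in newest_first:
        if str(path) in already_imported:
            # Everything older is assumed to be in the database already.
            break
        new_files.append(path)
    return new_files
```

The trade-off is speed against completeness: an older file that was never imported will be skipped once a newer, already-imported file is encountered.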
- class matador.db.DatabaseChanges(collection_name: str, changeset_ind=0, action='view', override=False, mongo_settings=None)
Bases: object
Class to view and undo particular database changesets.
Parse arguments and run changes interface.
- Parameters:
collection_name (str) – the base collection name to act upon
- Keyword Arguments:
- class matador.db.Refiner(cursor, collection=None, task=None, mode='display', **kwargs)
Bases: object
Refiner implements methods to alter certain parts of the database in place, either in overwrite, set or compare/display mode. Current modifiables are space groups, substructures, the set of elements, tags and DOIs.
Parses args and initiates modification.
- Parameters:
cursor (list of dicts) – matador cursor to refine.
- Keyword Arguments:
collection (Collection) – mongodb collection to query/edit.
task (str/callable) – one of ‘sym’, ‘spg’, ‘elem_set’, ‘tag’, ‘doi’ or ‘source’, or a custom function that takes in and returns a cursor and field to modify.
mode (str) – one of ‘display’, ‘overwrite’, ‘set’.
- elem_set()
Imbue documents with the set of elements, i.e. set(doc['atom_types']), for quicker look-up.
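A minimal sketch of this task on an in-memory cursor follows. The function name `add_elem_set` and the target key `elems` are assumptions made for illustration; only the expression `set(doc['atom_types'])` comes from the description above. The result is stored as a sorted list because BSON has no set type.

```python
def add_elem_set(cursor, field="elems"):
    """Attach the unique elements of each document under `field` (hypothetical key)."""
    for doc in cursor:
        # A sorted list keeps the value deterministic and storable in MongoDB.
        doc[field] = sorted(set(doc["atom_types"]))
    return cursor


cursor = [{"atom_types": ["Li", "P", "P", "Li"]}]
add_elem_set(cursor)
```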
- add_root_source()
Add the “root_source” key to a document in the database, i.e. the name of the structure, minus file extension.
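The idea of "the name of the structure, minus file extension" can be sketched with the standard library. `root_source_from_path` is a hypothetical helper, not a matador function, and it assumes the structure name is simply the final path component with its last extension removed.

```python
from pathlib import Path


def root_source_from_path(path):
    """Return the file name without its directory or final extension."""
    return Path(path).stem


def with_root_source(doc, path):
    """Return a copy of `doc` carrying a "root_source" key derived from `path`."""
    return {**doc, "root_source": root_source_from_path(path)}
```

Only the last extension is stripped, so a name containing dots, such as `LiP-x.1.res`, keeps everything before `.res`.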
- matador.db.make_connection_to_collection(coll_names, check_collection=False, allow_changelog=False, mongo_settings=None, override=False, import_mode=False, quiet=True, debug=False)
Connect to database of choice.
- Parameters:
coll_names (str) – name of collection.
- Keyword Arguments:
check_collection (bool) – check whether collections exist (forces connection)
allow_changelog (bool) – allow queries to collections with names prefixed by __
mongo_settings (dict) – dict containing mongo and related config
override (bool) – don’t ask for user input from stdin and assume all is well
quiet (bool) – don’t print very much.
- Returns:
client (MongoClient) – the connection to the database
db (Database) – the database to query
collections (dict) – Collection objects indexed by name
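A hedged usage sketch based only on the signature and return values documented above: `open_collection` is a hypothetical wrapper, and it assumes the three return values arrive in the order client, db, collections, that a reachable MongoDB instance is configured, and that matador is installed. The import is deferred so the wrapper can be defined without either.

```python
def open_collection(name, mongo_settings=None):
    """Return (client, db, collection) for a single collection name."""
    # Imported lazily so this sketch can be loaded without matador installed.
    from matador.db import make_connection_to_collection

    client, db, collections = make_connection_to_collection(
        name,
        check_collection=True,  # verify that the collection exists
        mongo_settings=mongo_settings,
        override=True,  # do not prompt on stdin
        quiet=True,
    )
    # `collections` is documented as a dict indexed by collection name.
    return client, db, collections[name]
```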
Submodules
matador.db.changes module
This file implements an interface for querying the __changelog_<collection> collections to allow for display and reversion of particular database changes.
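The naming convention can be made concrete with a short sketch. Both helpers are hypothetical and not part of matador; the guard mirrors the documented allow_changelog option, under the assumption that any name beginning with a double underscore is treated as reserved.

```python
def changelog_collection_name(collection_name):
    """Return the changelog collection name for a base collection."""
    return f"__changelog_{collection_name}"


def check_collection_name(name, allow_changelog=False):
    """Reject names prefixed by a double underscore unless explicitly allowed."""
    if name.startswith("__") and not allow_changelog:
        raise ValueError(f"{name!r} is reserved; pass allow_changelog=True to use it.")
    return name
```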
- class matador.db.changes.DatabaseChanges(collection_name: str, changeset_ind=0, action='view', override=False, mongo_settings=None)
Bases: object
Class to view and undo particular database changesets.
Parse arguments and run changes interface.
- Parameters:
collection_name (str) – the base collection name to act upon
- Keyword Arguments:
matador.db.connect module
Some simple utilities for making DB connections.
- matador.db.connect.make_connection_to_collection(coll_names, check_collection=False, allow_changelog=False, mongo_settings=None, override=False, import_mode=False, quiet=True, debug=False)
Connect to database of choice.
- Parameters:
coll_names (str) – name of collection.
- Keyword Arguments:
check_collection (bool) – check whether collections exist (forces connection)
allow_changelog (bool) – allow queries to collections with names prefixed by __
mongo_settings (dict) – dict containing mongo and related config
override (bool) – don’t ask for user input from stdin and assume all is well
quiet (bool) – don’t print very much.
- Returns:
client (MongoClient) – the connection to the database
db (Database) – the database to query
collections (dict) – Collection objects indexed by name
matador.db.importer module
This file implements the base class Spatula that calls the scrapers and interfaces with the MongoDB client.
- class matador.db.importer.Spatula(*args, settings=None)
Bases: object
The Spatula class implements methods to scrape folders and individual files for crystal structures and create a MongoDB document for each.
File types that can be read are:
- CASTEP .castep output
- SHELX (from airss.pl / pyAIRSS) .res output
- CASTEP .param, .cell input
This class will recursively scan directories from the cwd to find the file types above. Base filenames will be matched to prevent duplication of data from e.g. .castep and .res files. The following directory structures are recommended:
- One .res file per structure and template .cell and .param files that provide all CASTEP parameters that structures in this folder were run at.
- One .castep file per structure, containing all information. If pseudopotential information is not present in the CASTEP file, this class will check for the corresponding .usp files and try to scrape those.
Set up arguments and initialise DB client.
Notes
Several arguments can be passed to this class from the command line; here they are interpreted through *args:
- Parameters:
db (str) – the name of the collection to import to.
scan (bool) – whether or not to just scan the directory, rather than importing (automatically sets dryrun to true).
dryrun (bool) – perform whole process, up to actually importing to the database.
tags (str) – apply this tag to each structure added to database.
force (bool) – override rules about which folders can be imported into main database.
recent_only (bool) – if true, sort file lists by modification date and stop scanning when a file that already exists in database is found.
matador.db.refine module
This module contains functionality to update and overwrite database entries with specific tasks, e.g. symmetry and substructure analysis. This can be especially useful for “patching” the data in databases created with older matador versions.
- class matador.db.refine.Refiner(cursor, collection=None, task=None, mode='display', **kwargs)
Bases: object
Refiner implements methods to alter certain parts of the database in place, either in overwrite, set or compare/display mode. Current modifiables are space groups, substructures, the set of elements, tags and DOIs.
Parses args and initiates modification.
- Parameters:
cursor (list of dicts) – matador cursor to refine.
- Keyword Arguments:
collection (Collection) – mongodb collection to query/edit.
task (str/callable) – one of ‘sym’, ‘spg’, ‘elem_set’, ‘tag’, ‘doi’ or ‘source’, or a custom function that takes in and returns a cursor and field to modify.
mode (str) – one of ‘display’, ‘overwrite’, ‘set’.
- elem_set()
Imbue documents with the set of elements, i.e. set(doc['atom_types']), for quicker look-up.
- add_root_source()
Add the “root_source” key to a document in the database, i.e. the name of the structure, minus file extension.