matador.scrapers package

The scrapers module implements the construction of Python dicts from many common DFT/crystallography file formats.

Currently supported filetypes:

  • CASTEP filetypes: .cell, .param, .castep, .bands, .phonon, .usp

  • Custom .res type based on SHELX files, first used by the AIRSS package: https://www.mtg.msm.cam.ac.uk/Codes/AIRSS

  • OptaDOS output: .odo, .adaptive.dat

  • Quantum Espresso output files: .out

  • Magres files: .magres

  • Crystallographic Information File (via ASE): .cif

matador.scrapers.res2dict(fname, db=True, **kwargs)[source]

Extract available information from .res file; preferably used in conjunction with cell or param file.

Parameters

fname (str or list) – filename or list of filenames of res file(s) (with or without extension).

Keyword Arguments

db (bool) – whether to fail if unable to scrape energies.

Returns

either a dict containing the scraped data or a str containing an error message, and a bool stating whether the scrape was successful.

Return type

(tuple)
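The return convention described above can be handled as in the following minimal sketch. To keep it runnable without any input files, it uses a hypothetical stand-in, fake_scrape, that only mimics the documented (dict/str, bool) return; the helper scrape_or_none and the dictionary keys are likewise illustrative assumptions and not part of matador. With matador installed, res2dict would take the place of fake_scrape.

```python
def fake_scrape(fname):
    """Hypothetical stand-in that mimics the documented return convention."""
    if fname.endswith("bad"):
        return "Error scraping {}".format(fname), False
    # keys below are illustrative only
    return {"source": [fname], "num_atoms": 2}, True


def scrape_or_none(scraper, fname):
    """Return the scraped dict, or None (after reporting) if the scrape failed."""
    result, success = scraper(fname)
    if not success:
        # on failure, `result` is an error string rather than a dict
        print("Scrape failed:", result)
        return None
    return result


doc = scrape_or_none(fake_scrape, "structure")
failed = scrape_or_none(fake_scrape, "structure_bad")
```

Checking the bool before touching the first element avoids treating an error string as if it were a dict.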

matador.scrapers.cell2dict(fname, db=False, lattice=True, positions=True, **kwargs)[source]

Extract available information from .cell file; probably to be merged with another dict from a .param or .res file.

Parameters

fname (str/list) – filename or list of filenames of cell file(s) to scrape, with or without extension.

Keyword Arguments
  • db (bool) – scrape database quality file

  • lattice (bool) – scrape lattice vectors

  • positions (bool) – scrape positions

Returns

either a dict containing the scraped data or a str containing an error message, and a bool stating whether the scrape was successful.

Return type

(tuple)
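Since the dict from a .cell file will probably be merged with one from a .param or .res file, a merge step might look like the sketch below. The function merge_scrapes is a hypothetical helper, the hard-coded tuples stand in for real cell2dict and param2dict return values, and the dictionary keys are illustrative assumptions.

```python
def merge_scrapes(*scrapes):
    """Merge the dicts from several (data, success) tuples, skipping failures.

    Later dicts take precedence on key clashes; this precedence is a choice
    made for this sketch, not documented matador behaviour.
    """
    merged = {}
    errors = []
    for data, success in scrapes:
        if success:
            merged.update(data)
        else:
            # failed scrapes carry an error string instead of a dict
            errors.append(data)
    return merged, errors


# hard-coded stand-ins for the return values of cell2dict(...) and param2dict(...)
cell_result = ({"lattice_cart": [[3, 0, 0], [0, 3, 0], [0, 0, 3]]}, True)
param_result = ({"cut_off_energy": 500.0}, True)
broken_result = ("Unable to read file", False)

merged, errors = merge_scrapes(cell_result, param_result, broken_result)
```

Collecting the error strings separately keeps failed scrapes visible without contaminating the merged document.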

matador.scrapers.param2dict(fname, db=True, **kwargs)[source]

Extract available information from .param file; probably to be merged with other dicts from other files.

Parameters

fname (str/list) – param filename or list of filenames with or without file extension

Keyword Arguments

db (bool) – if True, only scrape relevant info, otherwise scrape all

Returns

either a dict containing the scraped data or a str containing an error message, and a bool stating whether the scrape was successful.

Return type

(tuple)

matador.scrapers.castep2dict(fname, db=True, intermediates=False, **kwargs)[source]

From seed filename, create dict of the most relevant information about a calculation.

Parameters

fname (str/list) – filename or list of filenames of castep file(s)

Keyword Arguments
  • db (bool) – whether to error on missing relaxation info

  • intermediates (bool) – instead of a single dict containing the relaxed structure, return a list of snapshots found in the .castep file

Returns

either a dict containing the scraped data or a str containing an error message, and a bool stating whether the scrape was successful.

Return type

(tuple)

matador.scrapers.bands2dict(fname, **kwargs)[source]

Parse a CASTEP bands file into a dictionary, which can be used as input to a matador.orm.spectral.ElectronicDispersion object.

Parameters

fname (str/list) – filename or list of filenames to be scraped.

Returns

either a dict containing the scraped data or a str containing an error message, and a bool stating whether the scrape was successful.

Return type

(tuple)

matador.scrapers.arbitrary2dict(fname, **kwargs)[source]

Read arbitrary CASTEP-style input files into a dictionary.

Parameters

fname (str/list) – filename or list of filenames.

Returns

either a dict containing the scraped data or a str containing an error message, and a bool stating whether the scrape was successful.

Return type

(tuple)

matador.scrapers.phonon2dict(fname, **kwargs)[source]

Parse a CASTEP phonon file into a dictionary.

Parameters

fname (str/list) – phonon filename or list of filenames.

Returns

either a dict containing the scraped data or a str containing an error message, and a bool stating whether the scrape was successful.

Return type

(tuple)

matador.scrapers.phonon_dos2dict(*args, **kwargs)[source]

Wrapper for the old phonon DOS scraper, which has since been merged with phonon2dict. Note that this function still behaves differently from phonon2dict when as_model is used, as the results will be cast into a VibrationalDOS object.

matador.scrapers.optados2dict(fname, **kwargs)[source]

Scrape OptaDOS output file (*.*.dat) or (*.pdos.*.dat) for DOS, projectors and projected DOS/dispersion.

Parameters

fname (str/list) – optados filename or list of filenames.

Returns

either a dict containing the scraped data or a str containing an error message, and a bool stating whether the scrape was successful.

Return type

(tuple)

matador.scrapers.usp2dict(fname, **kwargs)[source]

Extract pseudopotential string from a CASTEP OTF .USP file.

Parameters

fname (str/list) – filename of usp file, or list of filenames.

Returns

partial species_pot dict from usp file.

Return type

dict

matador.scrapers.pwout2dict(fname, **kwargs)[source]

Extract available information from pw.x .out file.

Parameters

fname (str/list) – filename or list of filenames to scrape as a QuantumEspresso pw.x output.

matador.scrapers.magres2dict(fname, **kwargs)[source]

Extract available information from .magres file. Assumes units of Angstrom and ppm for relevant quantities.

matador.scrapers.cif2dict(fname, **kwargs)[source]

Extract available information from .cif file and store as a dictionary. Raw cif data is stored under the ‘_cif’ key. Symmetric sites are expanded by the symmetry operations and their occupancies are tracked.

Parameters

fname (str/list) – filename or list of filenames of .cif file(s) (with or without extension).

Returns

if successful, a dictionary containing the scraped data and True; if not, an error string and False.

Return type

(dict/str, bool)

Submodules

matador.scrapers.castep_scrapers module

This file implements the scraper functions for CASTEP-related inputs and outputs.

matador.scrapers.castep_scrapers.res2dict(fname, db=True, **kwargs)[source]

Extract available information from .res file; preferably used in conjunction with cell or param file.

Parameters

fname (str or list) – filename or list of filenames of res file(s) (with or without extension).

Keyword Arguments

db (bool) – whether to fail if unable to scrape energies.

Returns

either a dict containing the scraped data or a str containing an error message, and a bool stating whether the scrape was successful.

Return type

(tuple)

matador.scrapers.castep_scrapers.cell2dict(fname, db=False, lattice=True, positions=True, **kwargs)[source]

Extract available information from .cell file; probably to be merged with another dict from a .param or .res file.

Parameters

fname (str/list) – filename or list of filenames of cell file(s) to scrape, with or without extension.

Keyword Arguments
  • db (bool) – scrape database quality file

  • lattice (bool) – scrape lattice vectors

  • positions (bool) – scrape positions

Returns

either a dict containing the scraped data or a str containing an error message, and a bool stating whether the scrape was successful.

Return type

(tuple)

matador.scrapers.castep_scrapers.param2dict(fname, db=True, **kwargs)[source]

Extract available information from .param file; probably to be merged with other dicts from other files.

Parameters

fname (str/list) – param filename or list of filenames with or without file extension

Keyword Arguments

db (bool) – if True, only scrape relevant info, otherwise scrape all

Returns

either a dict containing the scraped data or a str containing an error message, and a bool stating whether the scrape was successful.

Return type

(tuple)

matador.scrapers.castep_scrapers.castep2dict(fname, db=True, intermediates=False, **kwargs)[source]

From seed filename, create dict of the most relevant information about a calculation.

Parameters

fname (str/list) – filename or list of filenames of castep file(s)

Keyword Arguments
  • db (bool) – whether to error on missing relaxation info

  • intermediates (bool) – instead of a single dict containing the relaxed structure, return a list of snapshots found in the .castep file

Returns

either a dict containing the scraped data or a str containing an error message, and a bool stating whether the scrape was successful.

Return type

(tuple)

matador.scrapers.castep_scrapers.bands2dict(fname, **kwargs)[source]

Parse a CASTEP bands file into a dictionary, which can be used as input to a matador.orm.spectral.ElectronicDispersion object.

Parameters

fname (str/list) – filename or list of filenames to be scraped.

Returns

either a dict containing the scraped data or a str containing an error message, and a bool stating whether the scrape was successful.

Return type

(tuple)

matador.scrapers.castep_scrapers.arbitrary2dict(fname, **kwargs)[source]

Read arbitrary CASTEP-style input files into a dictionary.

Parameters

fname (str/list) – filename or list of filenames.

Returns

either a dict containing the scraped data or a str containing an error message, and a bool stating whether the scrape was successful.

Return type

(tuple)

matador.scrapers.castep_scrapers.optados2dict(fname, **kwargs)[source]

Scrape OptaDOS output file (*.*.dat) or (*.pdos.*.dat) for DOS, projectors and projected DOS/dispersion.

Parameters

fname (str/list) – optados filename or list of filenames.

Returns

either a dict containing the scraped data or a str containing an error message, and a bool stating whether the scrape was successful.

Return type

(tuple)

matador.scrapers.castep_scrapers.phonon2dict(fname, **kwargs)[source]

Parse a CASTEP phonon file into a dictionary.

Parameters

fname (str/list) – phonon filename or list of filenames.

Returns

either a dict containing the scraped data or a str containing an error message, and a bool stating whether the scrape was successful.

Return type

(tuple)

matador.scrapers.castep_scrapers.phonon_dos2dict(*args, **kwargs)[source]

Wrapper for the old phonon DOS scraper, which has since been merged with phonon2dict. Note that this function still behaves differently from phonon2dict when as_model is used, as the results will be cast into a VibrationalDOS object.

matador.scrapers.castep_scrapers.usp2dict(fname, **kwargs)[source]

Extract pseudopotential string from a CASTEP OTF .USP file.

Parameters

fname (str/list) – filename of usp file, or list of filenames.

Returns

partial species_pot dict from usp file.

Return type

dict

matador.scrapers.castep_scrapers.get_seed_metadata(doc, seed)[source]

For a given document and seedname, look for ICSD CollCode, MaterialsProject IDs and DOIs to add to the document.

Parameters
  • doc (dict) – the input document.

  • seed (str) – the filename that is being scraped (sans file extension).

matador.scrapers.cif_scraper module

This file implements the scraper functions for the CIF (Crystallographic Information File) format.

matador.scrapers.cif_scraper.cif2dict(fname, **kwargs)[source]

Extract available information from .cif file and store as a dictionary. Raw cif data is stored under the ‘_cif’ key. Symmetric sites are expanded by the symmetry operations and their occupancies are tracked.

Parameters

fname (str/list) – filename or list of filenames of .cif file(s) (with or without extension).

Returns

if successful, a dictionary containing the scraped data and True; if not, an error string and False.

Return type

(dict/str, bool)

matador.scrapers.magres_scrapers module

This submodule implements some scraper functions for NMR-related inputs and outputs, e.g. .magres files.

matador.scrapers.magres_scrapers.magres2dict(fname, **kwargs)[source]

Extract available information from .magres file. Assumes units of Angstrom and ppm for relevant quantities.

matador.scrapers.qe_scrapers module

This file implements some scraper functions for Quantum Espresso-related inputs and outputs.

matador.scrapers.qe_scrapers.pwout2dict(fname, **kwargs)[source]

Extract available information from pw.x .out file.

Parameters

fname (str/list) – filename or list of filenames to scrape as a QuantumEspresso pw.x output.

matador.scrapers.utils module

This file defines some useful scraper functionality, like custom errors and a scraper function wrapper.

matador.scrapers.utils.get_flines_extension_agnostic(fname, ext)[source]

Try to open and read the filename provided; if it doesn’t exist, try adding the given file extension(s) to it.

Parameters
  • fname (str) – the filename with or without extension.

  • ext (list of str or str) – the extension or list of file extensions to try, or None. Should not contain “.”.

Raises

FileNotFoundError – if the file was not found in either form.

Returns

the contents of the file and the filename.

Return type

(list of str, str)
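The fallback behaviour described above can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the library's code: the name read_lines_extension_agnostic is hypothetical, and returning the path that was actually opened (including any appended extension) is an assumption about what "the filename" means in the return value.

```python
import os
import tempfile


def read_lines_extension_agnostic(fname, ext):
    """Read `fname`, falling back to `fname.<ext>` for each extension given."""
    if ext is None:
        candidates = []
    elif isinstance(ext, str):
        candidates = [ext]
    else:
        candidates = list(ext)

    # try the bare filename first, then each extension in turn
    for path in [fname] + ["{}.{}".format(fname, e) for e in candidates]:
        if os.path.isfile(path):
            with open(path, "r") as f:
                return f.readlines(), path

    raise FileNotFoundError(
        "Could not find {} with extensions {}".format(fname, candidates)
    )


# demonstrate on a temporary file so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    seed = os.path.join(tmp, "LiCoO2")
    with open(seed + ".res", "w") as f:
        f.write("TITL example\nEND\n")
    flines, found = read_lines_extension_agnostic(seed, ["cell", "res"])
    found_name = os.path.basename(found)
```

Extensions are passed without the leading ".", matching the parameter description above.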

matador.scrapers.utils.scraper_function(function)[source]

Wrapper for scraper functions to handle exceptions and template the scraper functions to work for multiple files at once.
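A decorator of this kind could be sketched as below. The names scraper_sketch and toy2dict are hypothetical, and this is not the library's implementation. In particular, returning a list of (data, success) tuples for a list of filenames is an assumption made for this sketch; this section does not specify how the real wrapper packages results for multiple files.

```python
import functools


def scraper_sketch(function):
    """Wrap `function(fname)` so it returns (data, True) or (error string, False)."""

    @functools.wraps(function)
    def wrapped(fname, **kwargs):
        def scrape_one(name):
            try:
                return function(name, **kwargs), True
            except Exception as exc:
                # report the failure as a string rather than propagating it
                return "{}: {}: {}".format(name, type(exc).__name__, exc), False

        # assumption: a list of filenames yields a list of result tuples
        if isinstance(fname, (list, tuple)):
            return [scrape_one(name) for name in fname]
        return scrape_one(fname)

    return wrapped


@scraper_sketch
def toy2dict(fname):
    """Toy scraper that fails on filenames containing 'bad'."""
    if "bad" in fname:
        raise ValueError("malformed file")
    return {"source": [fname]}


single = toy2dict("good.res")
many = toy2dict(["good.res", "bad.res"])
```

Wrapping the exception handling once in a decorator keeps each individual scraper free of repeated try/except logic.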

matador.scrapers.utils.f90_float_parse(val)[source]

Wrapper to float that handles Fortran’s horrible behaviour for floats with three-digit exponents, e.g. 1e-100 is written as 1.0000000-100 rather than 1.000000E-100. Also handles “+” signs in front of numbers.

Parameters

val (str) – the string to cast to a float.
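The parsing problem described above can be illustrated with a short sketch. The function parse_fortran_float and the regular expression are assumptions made for illustration, not the library's implementation: the idea is to try a plain float conversion first and, if that fails, to re-insert the missing "E" between the mantissa and the signed exponent.

```python
import re

# a mantissa followed directly by a signed exponent with no "E",
# e.g. "1.0000000-100"
_MISSING_E = re.compile(r"^([+-]?\d*\.?\d+|[+-]?\d+\.)([+-]\d+)$")


def parse_fortran_float(val):
    """Cast a string to float, tolerating a dropped exponent marker."""
    val = val.strip()
    try:
        # Python's float already accepts a leading "+" and normal exponents
        return float(val)
    except ValueError:
        match = _MISSING_E.match(val)
        if match is None:
            # not a recognised Fortran form: re-raise the original error
            raise
        mantissa, exponent = match.groups()
        return float("{}E{}".format(mantissa, exponent))


small = parse_fortran_float("1.0000000-100")
signed = parse_fortran_float("+1.5")
```

Trying the standard conversion first means ordinary values such as "3.0E-5" never touch the regular expression.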

exception matador.scrapers.utils.DFTError[source]

Bases: Exception

Quick DFT exception class for unconverged or non-useful calculations.

exception matador.scrapers.utils.ComputationError[source]

Bases: Exception

Raised when the calculation fails to do the DFT. Distinct from DFTError, which signals an issue of numerics or chemistry; ComputationError is raised for technical issues, e.g. CASTEP crashes.
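The distinction between the two exception classes can be sketched as follows. The classes defined here are local stand-ins that share only the names of the real exceptions, so that the example runs without matador installed; the helper describe_failure and its messages are hypothetical.

```python
class DFTError(Exception):
    """Stand-in: unconverged or non-useful calculation."""


class ComputationError(Exception):
    """Stand-in: technical failure, e.g. the DFT code crashed."""


def describe_failure(exc):
    """Label an exception according to which kind of failure it represents."""
    if isinstance(exc, ComputationError):
        return "technical failure: " + str(exc)
    if isinstance(exc, DFTError):
        return "unusable calculation: " + str(exc)
    return "unexpected error: " + str(exc)


messages = []
for error in (DFTError("SCF did not converge"), ComputationError("CASTEP crashed")):
    try:
        raise error
    except (DFTError, ComputationError) as exc:
        messages.append(describe_failure(exc))
```

Because both classes derive directly from Exception, catching one does not catch the other, so calling code can treat the two failure modes separately.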