matador.scrapers package

The scrapers module implements the construction of Python dicts from many common DFT/crystallography file formats.

Currently supported filetypes:

  • CASTEP filetypes: .cell, .param, .castep, .bands, .phonon, .usp

  • Custom .res type based on SHELX files, first used by the AIRSS package https://www.mtg.msm.cam.ac.uk/Codes/AIRSS

  • OptaDOS output: .odo, .adaptive.dat

  • Quantum Espresso output files: .out

  • Magres files: .magres

  • Crystallographic Information File (via ASE): .cif

matador.scrapers.res2dict(fname, db=True, **kwargs)

Extract available information from .res file; preferably used in conjunction with cell or param file.

Parameters:

fname (str or list) – filename or list of filenames of res file(s) (with or without extension).

Keyword Arguments:

db (bool) – whether to fail if unable to scrape energies.

Returns:

either a dict containing the scraped data or a str containing the error, and a bool stating whether the scrape was successful.

Return type:

(tuple)
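The (result, success) return convention described above can be consumed as in the following minimal sketch. It does not import matador: fake_res2dict is a hypothetical stand-in that only mimics the documented return shape, and the "source" key and the file names are assumptions made for illustration.

```python
def fake_res2dict(fname, db=True):
    """Hypothetical stand-in mimicking the documented (dict/str, bool) return."""
    if not fname.endswith(".res"):
        fname += ".res"  # the documented scraper accepts names with or without extension
    if "missing" in fname:
        return f"FileNotFoundError: {fname}", False
    return {"source": [fname]}, True


def scrape_or_raise(fname):
    """Unpack the tuple and turn a failed scrape into an exception."""
    result, success = fake_res2dict(fname)
    if not success:
        # on failure, `result` is the error string rather than a dict
        raise RuntimeError(result)
    return result


doc = scrape_or_raise("LiAs")
```

Checking the bool before touching the first element avoids treating an error string as if it were a structure dict.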

matador.scrapers.cell2dict(fname, db=False, lattice=True, positions=True, **kwargs)

Extract available information from .cell file; probably to be merged with another dict from a .param or .res file.

Parameters:

fname (str/list) – filename or list of filenames of cell file(s) to scrape, with or without extension.

Keyword Arguments:
  • db (bool) – scrape database quality file

  • lattice (bool) – scrape lattice vectors

  • positions (bool) – scrape positions

Returns:

either a dict containing the scraped data or a str containing the error, and a bool stating whether the scrape was successful.

Return type:

(tuple)
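Since the dict from a .cell file is "probably to be merged" with one from a .param or .res file, a merge step might look like the sketch below. The dicts are written by hand rather than produced by matador, and every key shown is an illustrative assumption, not a statement of the fields the scrapers emit.

```python
def merge_scraped(*docs):
    """Merge scraped dicts left to right; later dicts win on key clashes."""
    merged = {}
    for doc in docs:
        merged.update(doc)
    return merged


# hand-written stand-ins for the dicts returned by two successful scrapes
cell_doc = {
    "lattice_cart": [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 3.0]],
    "kpoints_mp_spacing": 0.05,
}
param_doc = {"cut_off_energy": 500, "xc_functional": "PBE"}

doc = merge_scraped(cell_doc, param_doc)
```

The left-to-right precedence is a design choice of this sketch; it puts the file you trust most last.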

matador.scrapers.param2dict(fname, db=True, **kwargs)

Extract available information from .param file; probably to be merged with other dicts from other files.

Parameters:

fname (str/list) – param filename or list of filenames with or without file extension

Keyword Arguments:

db (bool) – if True, only scrape relevant info, otherwise scrape all

Returns:

either a dict containing the scraped data or a str containing the error, and a bool stating whether the scrape was successful.

Return type:

(tuple)

matador.scrapers.castep2dict(fname, db=True, intermediates=False, timings=False, **kwargs)

From seed filename, create dict of the most relevant information about a calculation.

Parameters:

fname (str/list) – filename or list of filenames of castep file(s)

Keyword Arguments:
  • db (bool) – whether to error on missing relaxation info

  • intermediates (bool) – instead of a single dict containing the relaxed structure, return a list of snapshots found in the .castep file

  • timings (bool) – run through the CASTEP file one extra time to calculate the total time taken.

Returns:

either a dict containing the scraped data or a str containing the error, and a bool stating whether the scrape was successful.

Return type:

(tuple)

matador.scrapers.bands2dict(fname, **kwargs)

Parse a CASTEP bands file into a dictionary, which can be used as input to a matador.orm.spectral.ElectronicDispersion object.

Parameters:

fname (str/list) – filename or list of filenames to be scraped.

Returns:

either a dict containing the scraped data or a str containing the error, and a bool stating whether the scrape was successful.

Return type:

(tuple)

matador.scrapers.arbitrary2dict(fname, **kwargs)

Read arbitrary CASTEP-style input files into a dictionary.

Parameters:

fname (str/list) – filename or list of filenames.

Returns:

either a dict containing the scraped data or a str containing the error, and a bool stating whether the scrape was successful.

Return type:

(tuple)

matador.scrapers.phonon2dict(fname, **kwargs)

Parse a CASTEP phonon file into a dictionary.

Parameters:

fname (str/list) – phonon filename or list of filenames.

Returns:

either a dict containing the scraped data or a str containing the error, and a bool stating whether the scrape was successful.

Return type:

(tuple)

matador.scrapers.phonon_dos2dict(*args, **kwargs)

Wrapper for the old phonon DOS scraper, which has since been merged with phonon2dict. Note that this function still behaves differently from phonon2dict when as_model is used, as the results will be cast into a VibrationalDOS object.

matador.scrapers.optados2dict(fname, **kwargs)

Scrape an OptaDOS output file (*.*.dat) or (*.pdos.*.dat) for DOS, projectors and projected DOS/dispersion.

Parameters:

fname (str/list) – optados filename or list of filenames.

Returns:

either a dict containing the scraped data or a str containing the error, and a bool stating whether the scrape was successful.

Return type:

(tuple)

matador.scrapers.usp2dict(fname, **kwargs)

Extract pseudopotential string from a CASTEP OTF .USP file.

Parameters:

fname (str/list) – filename of usp file, or list of filenames.

Returns:

partial species_pot dict from usp file.

Return type:

dict

matador.scrapers.pwout2dict(fname, **kwargs)

Extract available information from pw.x .out file.

Parameters:

fname (str/list) – filename or list of filenames to scrape as a QuantumEspresso pw.x output.

matador.scrapers.magres2dict(fname, **kwargs)

Extract available information from .magres file. Assumes units of Angstrom and ppm for relevant quantities.

matador.scrapers.cif2dict(fname, **kwargs)

Extract available information from .cif file and store as a dictionary. Raw cif data is stored under the '_cif' key. Symmetric sites are expanded by the symmetry operations and their occupancies are tracked.

Parameters:

fname (str/list) – filename or list of filenames of .cif file(s) (with or without extension).

Returns:

if successful, a dictionary containing scraped data and True; if not, an error string and False.

Return type:

(dict/str, bool)
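To make "symmetric sites are expanded by the symmetry operations" concrete, here is a minimal pure-Python sketch of that kind of expansion. It is not matador's implementation: the function name, the (rotation, translation) representation of an operation and the tolerance are assumptions, and occupancy tracking is left out.

```python
def expand_site(frac, symops, tol=1e-6):
    """Apply each (rotation, translation) to a fractional position,
    wrap the image into [0, 1) and drop duplicates."""
    images = []
    for rot, trans in symops:
        new = [
            (sum(rot[i][j] * frac[j] for j in range(3)) + trans[i]) % 1.0
            for i in range(3)
        ]
        # snap values within tol of 1.0 back to 0.0 so wrapped images compare equal
        new = [0.0 if abs(x - 1.0) < tol else x for x in new]
        is_duplicate = any(
            all(abs(a - b) < tol for a, b in zip(new, seen)) for seen in images
        )
        if not is_duplicate:
            images.append(new)
    return images


identity = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0, 0, 0])
inversion = ([[-1, 0, 0], [0, -1, 0], [0, 0, -1]], [0, 0, 0])

# a general position gains an inversion image; a special position maps onto itself
general = expand_site([0.25, 0.25, 0.25], [identity, inversion])
special = expand_site([0.0, 0.5, 0.0], [identity, inversion])
```

A site on a special position produces fewer images than there are operations, which is why the duplicates must be removed after wrapping.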

Submodules

matador.scrapers.castep_scrapers module

This file implements the scraper functions for CASTEP-related inputs and outputs.

matador.scrapers.castep_scrapers.res2dict(fname, db=True, **kwargs)

Extract available information from .res file; preferably used in conjunction with cell or param file.

Parameters:

fname (str or list) – filename or list of filenames of res file(s) (with or without extension).

Keyword Arguments:

db (bool) – whether to fail if unable to scrape energies.

Returns:

either a dict containing the scraped data or a str containing the error, and a bool stating whether the scrape was successful.

Return type:

(tuple)

matador.scrapers.castep_scrapers.cell2dict(fname, db=False, lattice=True, positions=True, **kwargs)

Extract available information from .cell file; probably to be merged with another dict from a .param or .res file.

Parameters:

fname (str/list) – filename or list of filenames of cell file(s) to scrape, with or without extension.

Keyword Arguments:
  • db (bool) – scrape database quality file

  • lattice (bool) – scrape lattice vectors

  • positions (bool) – scrape positions

Returns:

either a dict containing the scraped data or a str containing the error, and a bool stating whether the scrape was successful.

Return type:

(tuple)

matador.scrapers.castep_scrapers.param2dict(fname, db=True, **kwargs)

Extract available information from .param file; probably to be merged with other dicts from other files.

Parameters:

fname (str/list) – param filename or list of filenames with or without file extension

Keyword Arguments:

db (bool) – if True, only scrape relevant info, otherwise scrape all

Returns:

either a dict containing the scraped data or a str containing the error, and a bool stating whether the scrape was successful.

Return type:

(tuple)

matador.scrapers.castep_scrapers.castep2dict(fname, db=True, intermediates=False, timings=False, **kwargs)

From seed filename, create dict of the most relevant information about a calculation.

Parameters:

fname (str/list) – filename or list of filenames of castep file(s)

Keyword Arguments:
  • db (bool) – whether to error on missing relaxation info

  • intermediates (bool) – instead of a single dict containing the relaxed structure, return a list of snapshots found in the .castep file

  • timings (bool) – run through the CASTEP file one extra time to calculate the total time taken.

Returns:

either a dict containing the scraped data or a str containing the error, and a bool stating whether the scrape was successful.

Return type:

(tuple)

matador.scrapers.castep_scrapers.bands2dict(fname, **kwargs)

Parse a CASTEP bands file into a dictionary, which can be used as input to a matador.orm.spectral.ElectronicDispersion object.

Parameters:

fname (str/list) – filename or list of filenames to be scraped.

Returns:

either a dict containing the scraped data or a str containing the error, and a bool stating whether the scrape was successful.

Return type:

(tuple)

matador.scrapers.castep_scrapers.arbitrary2dict(fname, **kwargs)

Read arbitrary CASTEP-style input files into a dictionary.

Parameters:

fname (str/list) – filename or list of filenames.

Returns:

either a dict containing the scraped data or a str containing the error, and a bool stating whether the scrape was successful.

Return type:

(tuple)

matador.scrapers.castep_scrapers.optados2dict(fname, **kwargs)

Scrape an OptaDOS output file (*.*.dat) or (*.pdos.*.dat) for DOS, projectors and projected DOS/dispersion.

Parameters:

fname (str/list) – optados filename or list of filenames.

Returns:

either a dict containing the scraped data or a str containing the error, and a bool stating whether the scrape was successful.

Return type:

(tuple)

matador.scrapers.castep_scrapers.phonon2dict(fname, **kwargs)

Parse a CASTEP phonon file into a dictionary.

Parameters:

fname (str/list) – phonon filename or list of filenames.

Returns:

either a dict containing the scraped data or a str containing the error, and a bool stating whether the scrape was successful.

Return type:

(tuple)

matador.scrapers.castep_scrapers.phonon_dos2dict(*args, **kwargs)

Wrapper for the old phonon DOS scraper, which has since been merged with phonon2dict. Note that this function still behaves differently from phonon2dict when as_model is used, as the results will be cast into a VibrationalDOS object.

matador.scrapers.castep_scrapers.usp2dict(fname, **kwargs)

Extract pseudopotential string from a CASTEP OTF .USP file.

Parameters:

fname (str/list) – filename of usp file, or list of filenames.

Returns:

partial species_pot dict from usp file.

Return type:

dict

matador.scrapers.castep_scrapers.get_seed_metadata(doc, seed)

For a given document and seedname, look for ICSD CollCode, MaterialsProject IDs and DOIs to add to the document.

Parameters:
  • doc (dict) – the input document.

  • seed (str) – the filename that is being scraped (sans file extension).
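As a rough illustration of extracting identifiers from a seedname, the sketch below searches for two patterns with regular expressions. The filename patterns ("CollCode" followed by digits, "mp-" followed by digits), the output keys "icsd" and "mp_id" and the function name are all assumptions; the conventions get_seed_metadata really expects are not stated here. DOIs are left out because their encoding in a filename is not described.

```python
import re


def guess_seed_metadata(doc, seed):
    """Add identifiers found in the seedname to doc and return it.

    The patterns and key names are assumptions made for illustration.
    """
    icsd = re.search(r"CollCode(\d+)", seed)
    if icsd:
        doc["icsd"] = int(icsd.group(1))
    mp_id = re.search(r"mp-(\d+)", seed)
    if mp_id:
        doc["mp_id"] = int(mp_id.group(1))
    return doc


doc = guess_seed_metadata({}, "LiAs-CollCode12345")
```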

matador.scrapers.cif_scraper module

This file implements the scraper functions for the CIF (Crystallographic Information File) format.

matador.scrapers.cif_scraper.cif2dict(fname, **kwargs)

Extract available information from .cif file and store as a dictionary. Raw cif data is stored under the '_cif' key. Symmetric sites are expanded by the symmetry operations and their occupancies are tracked.

Parameters:

fname (str/list) – filename or list of filenames of .cif file(s) (with or without extension).

Returns:

if successful, a dictionary containing scraped data and True; if not, an error string and False.

Return type:

(dict/str, bool)

matador.scrapers.magres_scrapers module

This submodule implements some scraper functions for NMR-related inputs and outputs, e.g. .magres files.

matador.scrapers.magres_scrapers.magres2dict(fname, **kwargs)

Extract available information from .magres file. Assumes units of Angstrom and ppm for relevant quantities.

matador.scrapers.qe_scrapers module

This file implements some scraper functions for Quantum Espresso-related inputs and outputs.

matador.scrapers.qe_scrapers.pwout2dict(fname, **kwargs)

Extract available information from pw.x .out file.

Parameters:

fname (str/list) – filename or list of filenames to scrape as a QuantumEspresso pw.x output.

matador.scrapers.utils module

This file defines some useful scraper functionality, like custom errors and a scraper function wrapper.

matador.scrapers.utils.get_flines_extension_agnostic(fname, ext)

Try to open and read the filename provided; if it doesn’t exist, try adding the given file extension to it.

Parameters:
  • fname (str) – the filename with or without extension.

  • ext (list of str or str) – the extension or list of file extensions to try, or None. Should not contain “.”.

Raises:

FileNotFoundError – if the file was not found in either form.

Returns:

the contents of the file and the filename.

Return type:

(list of str, str)
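The lookup order described above (the name as given, then the name with each extension appended) can be sketched as follows. This is an illustrative re-implementation, not matador's code; the function name read_lines_any_extension is hypothetical.

```python
import os
import tempfile


def read_lines_any_extension(fname, ext):
    """Return (lines, filename_used), trying fname and then fname + '.' + ext."""
    exts = [ext] if isinstance(ext, str) else list(ext or [])
    candidates = [fname] + [f"{fname}.{e}" for e in exts]
    for candidate in candidates:
        if os.path.isfile(candidate):
            with open(candidate, "r") as handle:
                return handle.readlines(), candidate
    raise FileNotFoundError(f"None of {candidates} exist.")


# demonstrate with a throwaway file so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "LiAs.res")
    with open(path, "w") as handle:
        handle.write("TITL LiAs\nEND\n")
    lines, used = read_lines_any_extension(os.path.join(tmp, "LiAs"), "res")
```

Returning the filename that was opened lets the caller record which file the data came from.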

matador.scrapers.utils.scraper_function(function)

Wrapper for scraper functions to handle exceptions and template the scraper functions to work for multiple files at once.
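A decorator of this kind could be structured as in the sketch below. It is not matador's wrapper: scraper_sketch and toy_scraper are hypothetical names, and returning a list of (result, success) tuples for a list of filenames is an assumption, since how results are aggregated across files is not described here.

```python
import functools


def scraper_sketch(function):
    """Wrap a single-file scraper so that exceptions become (error_str, False)
    and a list of filenames yields a list of such tuples (an assumption)."""

    @functools.wraps(function)
    def wrapped(fname, **kwargs):
        if isinstance(fname, list):
            return [wrapped(name, **kwargs) for name in fname]
        try:
            return function(fname, **kwargs), True
        except Exception as exc:
            # report the failure as a string instead of propagating it
            return f"{type(exc).__name__}: {exc}", False

    return wrapped


@scraper_sketch
def toy_scraper(fname):
    """Hypothetical scraper that only accepts .res filenames."""
    if not fname.endswith(".res"):
        raise ValueError(f"{fname} is not a res file")
    return {"source": [fname]}


single = toy_scraper("a.res")
many = toy_scraper(["a.res", "b.cell"])
```

Catching exceptions in one place keeps each scraper body free of error-handling code and lets a batch continue past a single bad file.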

matador.scrapers.utils.f90_float_parse(val)

Wrapper to float that handles Fortran’s horrible behaviour for float exponents <= -100, e.g. 1e-100 -> 1.0000000-100 rather than 1.000000E-100. Also handles “+” signs in front of numbers.

Parameters:

val (str) – the string to cast to a float.
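One way to handle the dropped exponent letter is to re-insert an "E" before a trailing signed integer whenever the plain cast fails. The sketch below shows that approach under a hypothetical name, f90_float_sketch; it is not necessarily how f90_float_parse is implemented.

```python
import re


def f90_float_sketch(val):
    """Cast a Fortran-formatted number to float, restoring a dropped 'E'."""
    val = val.strip()
    try:
        # Python's float already accepts a leading '+' and ordinary exponents
        return float(val)
    except ValueError:
        # insert 'E' before a trailing signed integer that directly follows
        # a digit or '.', e.g. 1.0000000-100 -> 1.0000000E-100
        fixed = re.sub(r"(?<=[\d.])([+-]\d+)$", r"E\1", val)
        return float(fixed)


tiny = f90_float_sketch("1.0000000-100")
```

Trying the plain cast first means well-formed values never go through the regular expression.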

exception matador.scrapers.utils.DFTError

Bases: Exception

Quick DFT exception class for unconverged or non-useful calculations.

exception matador.scrapers.utils.ComputationError

Bases: Exception

Raised when the calculation fails to do the DFT. Distinct from DFTError, which signals an issue of numerics or chemistry; ComputationError is raised for technical issues, e.g. CASTEP crashes.