matador.compute package
The compute package contains five submodules: compute, batch, slurm, pbs and queueing.
The compute submodule contains the ComputeTask class for performing continually restarted geometry optimisation and SCF calculations in CASTEP, as well as the execution of arbitrary programs with mpirun.
The batch submodule contains the BatchRun class for running several independent ComputeTask instances on a folder of structures, without clashes.
The slurm submodule provides a wrapper to useful slurm commands and to writing slurm job submission files. The pbs submodule provides basic PBS functionality, mostly for monitoring walltime, and the queueing submodule provides a simple agnostic interface to both queueing systems.
- class matador.compute.ComputeTask(res, ncores, nnodes, node, **kwargs)
Bases: object
The main use of this class is to call an executable on a given structure. The various parameters are passed to this class by the common entrypoints, run3 and ilustrado. It is unlikely that you will want to use this class directly. Each keyword is saved as an attribute for later use.
Note
By default, calculations are run inside a folder with the same name as the host (e.g. node12). This decreases the load on parallel file systems such as Lustre.
Make the files needed to run the calculation and call the desired program.
- Parameters:
- Keyword Arguments:
param_dict (dict) – dictionary of CASTEP parameters
cell_dict (dict) – dictionary of CASTEP cell options
executable (str) – name of binary to execute (DEFAULT: ‘castep’). Special string $seed will be parsed as the seedname, e.g. executable = ‘pw6.x -i $seed.in > $seed.out’ (requires mode=‘generic’).
mode (str) – either ‘castep’ or ‘generic’ (DEFAULT: ‘castep’)
custom_params (bool) – use custom per-structure param file (DEFAULT: False)
output_queue (multiprocessing.Queue) – write results to queue rather than file.
rough (int) – number of small “rough” calculations (DEFAULT: 4)
rough_iter (int) – number of iterations per rough calculation (DEFAULT: 2)
fine_iter (int) – number of iterations per fine calculation (DEFAULT: 20)
spin (bool) – break spin symmetry in first calculation by amount specified (DEFAULT: None if not present, 5 if no argument)
conv_cutoffs (list of float) – list of cutoffs to use for SCF convergence test
conv_kpts (list of float) – list of kpt spacings to use for SCF convergence test
kpts_1D (bool) – treat z-direction as special and create kpt_grid [1 1 n_kz] (DEFAULT: False)
noise (bool) – add noise to the positions (DEFAULT: False)
squeeze (bool/float) – add an external pressure to the first steps (DEFAULT: False)
archer (bool) – force use of aprun over mpirun (DEFAULT: False)
slurm (bool) – force use of srun over mpirun (DEFAULT: False)
intel (bool) – force use of Intel mpirun-style calls (DEFAULT: False)
redirect (str) – file to redirect stdout to (DEFAULT: /dev/null unless debug).
exec_test (bool) – test executable with <exec> --version before progressing (DEFAULT: True)
start (bool) – begin calculation immediately or manually call it (DEFAULT: True)
reopt (bool) – whether to optimise one more time after success (DEFAULT: False)
memcheck (bool) – perform CASTEP dryrun to estimate memory usage, and do not proceed if the check fails (DEFAULT: False)
maxmem (int) – maximum memory allowed in MB for memcheck (DEFAULT: None)
killcheck (bool) – check for file called $seed.kill during operation, and kill executable if present (DEFAULT: True)
compute_dir (str) – folder to run computations in (DEFAULT: None, i.e. cwd); if not None, paths are prepended with this folder
verbosity (int) – either 0, 1, 2 or >3, corresponding to ERROR, WARNING, INFO and DEBUG logging levels.
timings (tuple of int) – tuple containing max and elapsed time in seconds
- Raises:
WalltimeError – if desired/allotted walltime is exceeded, current run will be tidied up, ready to be restarted from intermediate state.
CriticalError – if a fatal error occurs, failed run will be moved to bad_castep and no further calculations will be attempted.
CalculationError – if a structure-level error occurs, causing the seed files to be moved to bad_castep.
- begin()
Run the prepared ComputeTask. Catches CalculationError objects and cleans up, passing all other errors upwards.
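The error-handling contract described for begin() can be illustrated with a short, self-contained sketch. The exception classes and the run/cleanup callables below are hypothetical stand-ins rather than matador's own objects, and returning False after a caught error is an assumption: only structure-level CalculationErrors are handled locally, while anything else (here a CriticalError) propagates to the caller.

```python
class CalculationError(Exception):
    """Placeholder for a structure-level failure."""


class CriticalError(Exception):
    """Placeholder for a fatal, thread-level failure."""


def begin(run, cleanup):
    """Hypothetical sketch: run a task, cleaning up only on structure-level errors."""
    try:
        return run()
    except CalculationError:
        # this structure failed, but others may still succeed: tidy up and carry on
        cleanup()
        return False
    # any other exception (e.g. CriticalError) is deliberately not caught here
```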
- run_castep()
Set up and run CASTEP calculation on the prepared structure, self.res_dict, using the parameters in self.cell_dict and self.param_dict.
- Raises:
WalltimeError – if max_walltime is exceeded.
CriticalError – if no further calculations should be performed on this thread.
CalculationError – if this structure errored in some way, but others will hopefully be okay.
- Returns:
True if calculations were successful, False otherwise. In the case of convergence tests, this is always True unless every calculation fails.
- Return type:
bool
- run_generic(intermediate=False, mv_bad_on_failure=True)
Run a generic MPI program on the given seed. Files from completed runs are moved to “completed” (unless intermediate is True) and failed runs to “bad_castep”.
- Keyword Arguments:
- Returns:
True if calculations progressed without error.
- Return type:
bool
- run_castep_relaxation(intermediate=False)
Set up a structural relaxation that is restarted intermittently in order to re-mesh the kpoint grid. Completed calculations are moved to the “completed” folder, and failures to “bad_castep”.
- Keyword Arguments:
intermediate (bool) – whether we want to run more calculations on the output of this, i.e. whether to move to completed or not.
- Returns:
True iff structure was optimised, False otherwise.
- Return type:
bool
- Raises:
CalculationError – if a structure-level error occurred.
CriticalError – if a fatal global error occurred.
WalltimeError – if walltime was reached, and jobs need to stop.
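The restart pattern behind run_castep_relaxation, together with the rough, rough_iter and fine_iter keywords, can be pictured as a schedule of iteration budgets. The generator below is a minimal sketch assuming the documented defaults (4 rough runs of 2 iterations, then fine runs of 20 iterations); relaxation_schedule is a hypothetical name, and the stopping criterion (convergence, walltime) is left to the caller rather than reproducing matador's logic.

```python
from itertools import islice


def relaxation_schedule(rough=4, rough_iter=2, fine_iter=20):
    """Hypothetical sketch: yield the geometry-iteration budget for each restart."""
    for _ in range(rough):
        # short "rough" runs allow the k-point grid to be re-meshed early and often
        yield rough_iter
    while True:
        # afterwards, longer "fine" runs repeat until the caller stops iterating
        yield fine_iter
```

With the defaults, list(islice(relaxation_schedule(), 6)) gives [2, 2, 2, 2, 20, 20].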
- run_castep_singleshot(calc_doc, seed, keep=True, intermediate=False)
Perform a singleshot calculation with CASTEP. Singleshot runs do not attempt to remedy any errors raised.
Files from completed runs are moved to completed, if not in intermediate mode, and failed runs to bad_castep.
- Parameters:
- Keyword Arguments:
- Returns:
True iff SCF completed successfully, False otherwise.
- Return type:
bool
- static validate_calc_doc(calc_doc, required, forbidden)
Remove keys inside forbidden from calc_doc, and error if a required key is missing.
- Parameters:
- Raises:
AssertionError – if required key is missing.
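A minimal sketch of the behaviour described for validate_calc_doc, assuming calc_doc is a plain dict that is modified in place. This illustrates the documented contract and is not the method's source.

```python
def validate_calc_doc(calc_doc, required, forbidden):
    """Sketch: strip forbidden keys in place, then check that required keys exist."""
    for key in forbidden:
        # pop with a default so an absent forbidden key is not an error
        calc_doc.pop(key, None)
    for key in required:
        assert key in calc_doc, f"Required key {key!r} missing from calc_doc"
```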
- static get_seekpath_compliant_input(calc_doc, spacing, debug=False)
Return seekpath cell/kpoint path for the given cell and spacing.
- run_convergence_tests(calc_doc)
Run kpoint and cutoff_energy convergence tests based on options passed to ComputeTask.
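One plausible way to turn the conv_cutoffs and conv_kpts keywords into individual calculations is sketched below: each test point copies the base calc_doc and varies a single quantity. cut_off_energy and kpoints_mp_spacing are standard CASTEP keywords, but convergence_docs and the (kind, value, doc) tuples it returns are hypothetical, not matador's internal format.

```python
import copy


def convergence_docs(calc_doc, conv_cutoffs=(), conv_kpts=()):
    """Hypothetical: build one calc_doc per convergence-test point."""
    docs = []
    for cutoff in conv_cutoffs:
        doc = copy.deepcopy(calc_doc)  # never mutate the base document
        doc["cut_off_energy"] = cutoff  # vary only the plane-wave cutoff
        docs.append(("cutoff", cutoff, doc))
    for spacing in conv_kpts:
        doc = copy.deepcopy(calc_doc)
        doc["kpoints_mp_spacing"] = spacing  # vary only the k-point spacing
        docs.append(("kpts", spacing, doc))
    return docs
```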
- parse_executable(seed)
Turn executable string into list with arguments to be executed.
Example
With self.executable='castep17' and seed='test', ['castep17', 'test'] will be returned.
Example
With self.executable='pw6.x -i $seed.in > $seed.out' and seed='test', ['pw6.x', '-i', 'test.in', '>', 'test.out'] will be returned.
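The two documented examples can be reproduced with a short sketch. It assumes the special string $seed is substituted first and the result split with shlex, with the seedname appended as a final argument when $seed is absent; the standalone signature is hypothetical and the real method may tokenise differently. Note that '>' survives only as an ordinary token in the list.

```python
import shlex


def parse_executable(executable, seed):
    """Hypothetical stand-in for ComputeTask.parse_executable."""
    if "$seed" in executable:
        # substitute the seedname wherever the special string appears
        return shlex.split(executable.replace("$seed", seed))
    # otherwise the seedname is passed as the final argument
    return shlex.split(executable) + [seed]
```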
- test_exec()
Test if <executable> --version returns a valid string.
- Raises:
CriticalError – if executable not found.
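A sketch of the check performed by test_exec, using only the standard library. CriticalError here is a local placeholder, check_executable is a hypothetical name, and treating empty output as a failure is an assumption about what counts as a "valid string".

```python
import subprocess
import sys


class CriticalError(Exception):
    """Local placeholder for matador's CriticalError."""


def check_executable(executable):
    """Hypothetical sketch: return the output of `<executable> --version`."""
    try:
        result = subprocess.run(
            [executable, "--version"], capture_output=True, text=True, timeout=30
        )
    except FileNotFoundError as exc:
        # the binary is not on PATH (or the path does not exist)
        raise CriticalError(f"Executable {executable!r} not found") from exc
    # some programs print their version to stderr, so inspect both streams
    output = (result.stdout + result.stderr).strip()
    if not output:
        raise CriticalError(f"{executable!r} --version produced no output")
    return output
```

Calling check_executable(sys.executable) returns the interpreter's own version string, which makes a convenient smoke test.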
- property mpi_library
Property to store/compute desired MPI library.
- set_mpi_library()
Combines command-line MPI arguments into a string and calls MPI library detection if no args are present.
- static detect_mpi()
Test which MPI library is used when calling mpirun.
- Returns:
‘intel’, ‘archer’, or ‘default’.
- Return type:
mpi_library (str)
- run_command(seed)
Calls executable on seed with desired number of cores.
- Parameters:
seed (str) – seedname to append to the CASTEP command, e.g. <seed> or --version.
- Returns:
process to run.
- Return type:
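run_command combines the executable with an MPI launcher. The mapping below from MPI library to launcher follows the archer (aprun), slurm (srun) and intel keyword descriptions above, but build_mpi_command and the exact command line are hypothetical: the -n flag is accepted by aprun, srun and common mpirun implementations, whereas matador may pass further options.

```python
def build_mpi_command(mpi_library, ncores, exec_args):
    """Hypothetical: prefix the executable's argument list with an MPI launcher."""
    launchers = {
        "archer": "aprun",    # Cray-style launcher, as forced by archer=True
        "slurm": "srun",      # launch through Slurm, as forced by slurm=True
        "intel": "mpirun",    # Intel MPI's mpirun
        "default": "mpirun",  # whatever mpirun is on PATH
    }
    if mpi_library not in launchers:
        raise ValueError(f"Unknown MPI library: {mpi_library}")
    return [launchers[mpi_library], "-n", str(ncores), *exec_args]
```

The resulting list could then be handed to subprocess.Popen to obtain a process object.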
- mv_to_bad(seed)
Move all files associated with “seed” to bad_castep, from both the compute directory (if it exists) and the root directory.
- Parameters:
seed (str) – filename of structure.
- mv_to_completed(seed, completed_dir='completed', keep=False, skip_existing=False)
Move all associated files to completed, removing any remaining files in the root_folder and compute_dir.
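The file moves performed by mv_to_bad and mv_to_completed can be sketched with glob and shutil. This hypothetical helper assumes that "files associated with seed" means files named <seed>.* in each searched folder (for example the root folder and compute_dir); the real methods may match a different set of files and handle existing targets through options such as skip_existing.

```python
import glob
import os
import shutil


def move_seed_files(seed, dest, search_dirs=(".",)):
    """Hypothetical: move every `<seed>.*` file from search_dirs into dest."""
    os.makedirs(dest, exist_ok=True)
    moved = []
    for folder in search_dirs:
        # glob.escape guards against seeds containing *, ? or [
        pattern = os.path.join(folder, glob.escape(seed) + ".*")
        for path in glob.glob(pattern):
            target = os.path.join(dest, os.path.basename(path))
            shutil.move(path, target)
            moved.append(target)
    return moved
```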
- tidy_up(seed)
Delete all created files before quitting.
- Parameters:
seed (str) – filename for structure.
- class matador.compute.BatchRun(seed, **kwargs)
Bases: object
A class that implements the running of multiple generic jobs on a series of files without collisions with other nodes using the ComputeTask class. Jobs that have been started are listed in jobs.txt, failed jobs are moved to bad_castep/, and completed jobs are moved to completed/.
Interface initially inspired by run.pl, run2.pl and PyAIRSS class CastepRunner.
Check directory has valid contents and prepare log files and directories if not already prepared, then begin running calculations.
Note
This class is usually initialised by the run3 script, which has a full description of possible arguments.
- Parameters:
seed (list of str) – either a single-entry list containing the param/cell file seed for CASTEP geometry optimisations of res files, or a list of seednames ($seed) to run arbitrary executables on. For example, ['LiAs'] if LiAs.cell and LiAs.param exist in a cwd full of res files, or ['LiAs_1', 'LiAs_2'] if LiAs_1.in and LiAs_2.in exist and executable = ‘pw6.x < $seed.in’.
- Keyword Arguments:
An exhaustive list is found in the argparse parser inside matador/cli/run3.py.
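BatchRun avoids clashes between nodes by recording started jobs in jobs.txt. A minimal sketch of that idea follows, assuming an exclusive fcntl.flock lock on jobs.txt itself; matador's own lock-file scheme may differ, and claim_next_job is a hypothetical name. fcntl is Unix-only, and advisory locks are not reliable on every networked filesystem, so treat this as illustrative.

```python
import fcntl


def claim_next_job(candidates, jobs_file="jobs.txt"):
    """Hypothetical: record and return the first seed not yet listed in jobs_file."""
    with open(jobs_file, "a+") as handle:
        # exclusive advisory lock: other processes block here until it is released
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX)
        try:
            handle.seek(0)
            started = set(handle.read().splitlines())
            for seed in candidates:
                if seed not in started:
                    handle.write(seed + "\n")  # "a+" always appends at the end
                    handle.flush()  # make the claim visible before unlocking
                    return seed
            return None  # every candidate has already been started
        finally:
            fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
```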
- spawn(join=False)
Spawn processes to perform calculations.
- Keyword Arguments:
join (bool) – whether or not to attach to ComputeTask process. Useful for testing.
- perform_new_calculations(res_list, error_queue, proc_id)
Perform all calculations that have not already failed or finished to completion.
- Parameters:
error_queue (multiprocessing.Queue) – queue to push exceptions to
proc_id (int) – process id for logging
- matador.compute.reset_job_folder(debug=False)
Remove all lock files and clean up jobs.txt ready for job restart.
Note
This should not be called by a ComputeTask instance, in case other instances are running.
- Returns:
number of structures left to relax
- Return type:
num_remaining (int)
Submodules
matador.compute.batch module
This file implements the BatchRun class for chaining ComputeTask instances across several structures with high-throughput.
- class matador.compute.batch.BatchRun(seed, **kwargs)
Bases: object
Its documentation is identical to that of matador.compute.BatchRun above.
- exception matador.compute.batch.BundledErrors
Bases: Exception
Raise this after collecting all exceptions from processes.
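The pattern suggested by perform_new_calculations(res_list, error_queue, proc_id) and BundledErrors, in which worker processes push exceptions onto a multiprocessing.Queue and the parent raises once after collecting them, can be sketched as below. Everything here is a simplified stand-in: seeds are pre-split into chunks instead of being claimed dynamically, run_one is a placeholder for a ComputeTask, and the fork start method makes the sketch Unix-only.

```python
import multiprocessing


class BundledErrors(Exception):
    """Local stand-in for matador.compute.batch.BundledErrors."""


def run_one(seed):
    # hypothetical placeholder for running a ComputeTask on one seed
    if seed.startswith("bad"):
        raise RuntimeError(f"{seed} failed")


def worker(proc_id, seeds, error_queue):
    errors = []
    for seed in seeds:
        try:
            run_one(seed)
        except Exception as exc:
            errors.append(f"proc {proc_id}: {exc!r}")
    # exactly one message per worker, so the parent knows how many to read
    error_queue.put(errors)


def spawn_workers(seed_chunks):
    ctx = multiprocessing.get_context("fork")  # Unix-only start method
    error_queue = ctx.Queue()
    procs = [
        ctx.Process(target=worker, args=(proc_id, chunk, error_queue))
        for proc_id, chunk in enumerate(seed_chunks)
    ]
    for proc in procs:
        proc.start()
    # drain the queue before joining, so no child blocks on a full pipe
    errors = [msg for _ in procs for msg in error_queue.get()]
    for proc in procs:
        proc.join()
    if errors:
        raise BundledErrors(errors)
```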
matador.compute.compute module
This file implements the ComputeTask class for handling calculations on a single structure.
- class matador.compute.compute.ComputeTask(res, ncores, nnodes, node, **kwargs)
Bases: object
Its documentation is identical to that of matador.compute.ComputeTask above.
matador.compute.pbs module
This file implements a simple interface to basic PBS functionality, mostly for monitoring walltime of jobs submitted via PBS.
- class matador.compute.pbs.PBSQueueManager
Bases: QueueManager
Wrapper for the PBS queueing system.
- token = 'pbs'
- get_walltime()
Query available walltime with qstat on the current job.
- Parameters:
pbs_dict (dict) – pbs env parameters to query.
- Raises:
RuntimeError – if PBS_JOBID not present in PBS env.
subprocess.CalledProcessError – if unable to use qstat.
- Returns:
maximum allowed walltime in seconds.
- Return type:
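A sketch of turning qstat output into the walltime in seconds. It assumes the full job listing (as printed by qstat -f) contains a line of the form Resource_List.walltime = HH:MM:SS; parse_pbs_walltime is a hypothetical helper, not part of matador.

```python
import re


def parse_pbs_walltime(qstat_output):
    """Hypothetical: extract Resource_List.walltime (HH:MM:SS) as seconds."""
    match = re.search(
        r"Resource_List\.walltime\s*=\s*(\d+):(\d{2}):(\d{2})", qstat_output
    )
    if match is None:
        raise RuntimeError("walltime not found in qstat output")
    hours, minutes, seconds = (int(group) for group in match.groups())
    return 3600 * hours + 60 * minutes + seconds
```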
matador.compute.queueing module
This file implements a simple agnostic interface to various queueing systems.
- class matador.compute.queueing.QueueManager
Bases: ABC
Abstract base class for queue managers.
- token = None
- property walltime
Return the allotted walltime in seconds, returning None if not available.
- property ntasks
- property max_memory
Return the allotted memory in MB, returning None if not available.
- property array_id
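The QueueManager contract (walltime in seconds, or None when unavailable) can be mirrored with Python's abc module. In this sketch get_walltime is assumed to be the abstract hook, since both documented subclasses implement it, and the exceptions swallowed by the walltime property are the ones those subclasses document; the class names are hypothetical.

```python
import subprocess
from abc import ABC, abstractmethod


class QueueManagerSketch(ABC):
    """Hypothetical mirror of the documented QueueManager interface."""

    token = None  # subclasses set e.g. 'slurm' or 'pbs'

    @abstractmethod
    def get_walltime(self):
        """Return the allotted walltime in seconds, or raise on failure."""

    @property
    def walltime(self):
        # documented contract: seconds, or None if not available
        try:
            return self.get_walltime()
        except (RuntimeError, subprocess.CalledProcessError):
            return None


class FixedQueueManager(QueueManagerSketch):
    """Toy subclass that always reports the same walltime."""

    token = "fixed"

    def __init__(self, seconds):
        self.seconds = seconds

    def get_walltime(self):
        return self.seconds
```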
- matador.compute.queueing.get_queue_env(token)
Read os.environ variables for either PBS or SLURM prefixes, and return a dictionary of those vars only.
- Parameters:
token (str) – choose one of either SLURM or PBS explicitly.
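A minimal sketch of the filtering performed by get_queue_env. The optional environ argument is a hypothetical addition for testability, and matching on an upper-case prefix followed by an underscore (SLURM_..., PBS_...) is an assumption.

```python
import os


def get_queue_env(token, environ=None):
    """Sketch: keep only environment variables such as SLURM_* or PBS_*."""
    environ = os.environ if environ is None else environ
    prefix = token.upper() + "_"
    return {key: value for key, value in environ.items() if key.startswith(prefix)}
```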
- matador.compute.queueing.get_queue_manager()
Detects whether PBS, SLURM or neither is being used by probing the environment variables SLURM_NTASKS and PBS_TASKNUM.
- Returns:
either “slurm”, “pbs” or None.
- Return type:
str or None
- Raises:
SystemExit – if both SLURM and PBS were found.
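The detection logic of get_queue_manager can be sketched using the two documented environment variables. detect_queue and its environ argument are hypothetical; only the probed variable names, the return values and the SystemExit on ambiguity come from the description above.

```python
import os


def detect_queue(environ=None):
    """Sketch: return 'slurm', 'pbs' or None based on environment variables."""
    environ = os.environ if environ is None else environ
    probes = (("slurm", "SLURM_NTASKS"), ("pbs", "PBS_TASKNUM"))
    found = [name for name, variable in probes if variable in environ]
    if len(found) > 1:
        raise SystemExit("Both SLURM and PBS environment variables were found.")
    return found[0] if found else None
```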
matador.compute.slurm module
This file implements a simple interface to basic SLURM functionality, including creating and submitting slurm scripts and cancelling jobs.
- class matador.compute.slurm.SlurmQueueManager
Bases: QueueManager
Wrapper for the Slurm queueing system.
- token = 'slurm'
- get_walltime()
Query available walltime with scontrol on the current job.
- Parameters:
slurm_dict (dict) – slurm env parameters to query.
- Raises:
RuntimeError – if SLURM_JOB_ID not present in slurm env.
subprocess.CalledProcessError – if unable to use scontrol.
- Returns:
maximum allowed walltime in seconds.
- Return type:
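A sketch of the parsing step behind the Slurm get_walltime, assuming the TimeLimit field of scontrol show job is formatted as [D-]HH:MM:SS. Values such as UNLIMITED are rejected, and parse_slurm_timelimit is a hypothetical helper.

```python
import re


def parse_slurm_timelimit(scontrol_output):
    """Hypothetical: convert TimeLimit=[D-]HH:MM:SS to seconds."""
    match = re.search(r"TimeLimit=(?:(\d+)-)?(\d+):(\d{2}):(\d{2})", scontrol_output)
    if match is None:
        raise RuntimeError("TimeLimit not found (or UNLIMITED) in scontrol output")
    # the optional days group is None when absent, so default it to zero
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```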
- matador.compute.slurm.submit_slurm_script(slurm_fname, depend_on_job=None, num_array_tasks=None)
Submit a SLURM job.
- Parameters:
slurm_fname (str) – SLURM job file to submit.
- Keyword Arguments:
- Raises:
subprocess.CalledProcessError – if jobfile doesn’t exist or has failed.
- Returns:
submitted SLURM job ID.
- Return type:
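The submission can be sketched as building an sbatch command line and parsing its output. --dependency and --array are real sbatch options, but the afterok dependency type and the 0-based array range are assumptions about what depend_on_job and num_array_tasks map to; both helper names are hypothetical.

```python
import re


def build_sbatch_command(slurm_fname, depend_on_job=None, num_array_tasks=None):
    """Hypothetical: assemble an sbatch call mirroring submit_slurm_script's arguments."""
    command = ["sbatch"]
    if depend_on_job is not None:
        # start only once the given job has finished successfully (assumed semantics)
        command.append(f"--dependency=afterok:{depend_on_job}")
    if num_array_tasks is not None:
        # array indices 0 .. num_array_tasks - 1
        command.append(f"--array=0-{num_array_tasks - 1}")
    command.append(slurm_fname)
    return command


def parse_job_id(sbatch_stdout):
    """Extract the ID from sbatch's usual 'Submitted batch job <id>' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    if match is None:
        raise RuntimeError(f"Unexpected sbatch output: {sbatch_stdout!r}")
    return int(match.group(1))
```

Running the list with subprocess.run(command, check=True, capture_output=True, text=True) raises subprocess.CalledProcessError on failure, consistent with the documented exception, and parse_job_id can then read result.stdout.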
- matador.compute.slurm.get_slurm_header(slurm_dict, walltime_hrs, num_nodes=None)
Write a SLURM script header from a set of slurm parameters.
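A sketch of rendering such a header, assuming (hypothetically) that slurm_dict maps long sbatch option names to values; matador's real slurm_dict keys are not documented here, and make_slurm_header is a hypothetical name. --time, --nodes, --job-name and --partition are standard sbatch options.

```python
def make_slurm_header(slurm_dict, walltime_hrs, num_nodes=None):
    """Hypothetical: render #SBATCH lines from a dict of long option names."""
    lines = ["#!/bin/bash"]
    for option, value in slurm_dict.items():
        lines.append(f"#SBATCH --{option}={value}")
    # sbatch accepts hours:minutes:seconds for --time
    lines.append(f"#SBATCH --time={int(walltime_hrs)}:00:00")
    if num_nodes is not None:
        lines.append(f"#SBATCH --nodes={num_nodes}")
    return "\n".join(lines) + "\n"
```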