matador.compute package¶

The compute package contains several submodules; the three main ones are compute, batch and slurm.

The compute submodule contains the ComputeTask class for performing continually restarted geometry optimisation and SCF calculations in CASTEP, as well as the execution of arbitrary programs with mpirun.

The batch submodule contains the BatchRun class for running several independent ComputeTask instances on a folder of structures, without clashes.

The slurm submodule provides a wrapper around useful Slurm commands and functions for writing Slurm job submission files.

class matador.compute.ComputeTask(res, ncores, nnodes, node, **kwargs)[source]¶

Bases: object

The main use of this class is to call an executable on a given structure. The various parameters are passed to this class by the common entrypoints, run3 and ilustrado. It is unlikely that you will want to use this class directly. Each keyword is saved as an attribute for later use.

self.final_result¶

stores the final result of the calculation, if it was successful.

Type:: dict

Note

By default, calculations are run inside a folder with the same name as the host (e.g. node12). This decreases the load on parallel file systems such as Lustre.

Make the files to run the calculation and call the desired program.

Parameters:

res (str/dict) – filename or input structure dict
ncores (int) – number of cores per node for mpirun call
nnodes (int) – number of nodes for mpirun call (if None, use 1)
node (str) – node name to run on (if None, run on localhost)

Keyword Arguments:

param_dict (dict) – dictionary of CASTEP parameters
cell_dict (dict) – dictionary of CASTEP cell options
executable (str) – name of binary to execute (DEFAULT: ‘castep’). Special string $seed will be parsed as the seedname, e.g. executable = ‘pw6.x -i $seed.in > $seed.out’ (requires mode=’generic’).
mode (str) – either ‘castep’ or ‘generic’ (DEFAULT: ‘castep’)
custom_params (bool) – use custom per-structure param file (DEFAULT: False)
output_queue (multiprocessing.Queue) – write results to queue rather than file.
rough (int) – number of small “rough” calculations (DEFAULT: 4)
rough_iter (int) – number of iterations per rough calculation (DEFAULT: 2)
fine_iter (int) – number of iterations per fine calculation (DEFAULT: 20)
spin (bool) – break spin symmetry in first calculation by amount specified (DEFAULT: None if not present, 5 if no argument)
conv_cutoffs (list of float) – list of cutoffs to use for SCF convergence test
conv_kpts (list of float) – list of kpt spacings to use for SCF convergence test
kpts_1D (bool) – treat z-direction as special and create kpt_grid [1 1 n_kz] (DEFAULT: False)
noise (bool) – add noise to the positions (DEFAULT: False)
squeeze (bool/float) – add an external pressure to the first steps (DEFAULT: False)
archer (bool) – force use of aprun over mpirun (DEFAULT: False)
slurm (bool) – force use of srun over mpirun (DEFAULT: False)
intel (bool) – force use of Intel mpirun-style calls (DEFAULT: False)
redirect (str) – file to redirect stdout to (DEFAULT: /dev/null unless debug).
exec_test (bool) – test executable with <exec> --version before progressing (DEFAULT: True)
start (bool) – begin calculation immediately or manually call it (DEFAULT: True)
reopt (bool) – whether to optimise one more time after success (DEFAULT: False)
memcheck (bool) – perform CASTEP dryrun to estimate memory usage, do not proceed if fails (DEFAULT: False)
maxmem (int) – maximum memory allowed in MB for memcheck (DEFAULT: None)
killcheck (bool) – check for file called $seed.kill during operation, and kill executable if present (DEFAULT: True)
compute_dir (str) – folder to run computations in; default is None (i.e. cwd), if not None, prepend paths with this folder
verbosity (int) – either 0, 1, 2 or 3+, corresponding to ERROR, WARNING, INFO and DEBUG logging levels.
timings (tuple of int) – tuple containing max and elapsed time in seconds
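
The verbosity values listed above correspond to Python's standard logging levels. A minimal sketch of such a mapping follows, assuming that any value of 3 or more means DEBUG and that negative values behave like 0; the function name verbosity_to_level is hypothetical and is not part of matador.

```python
import logging


def verbosity_to_level(verbosity):
    """Map an integer verbosity onto a standard logging level."""
    levels = [logging.ERROR, logging.WARNING, logging.INFO, logging.DEBUG]
    # Clamp the index: negative values act like 0, values >= 3 give DEBUG.
    index = max(0, min(int(verbosity), len(levels) - 1))
    return levels[index]


# Example: configure a logger from a verbosity flag.
logger = logging.getLogger("compute_sketch")
logger.setLevel(verbosity_to_level(2))
```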

Raises:

WalltimeError – if desired/allotted walltime is exceeded, current run will be tidied up, ready to be restarted from intermediate state.
CriticalError – if a fatal error occurs, failed run will be moved to bad_castep and no further calculations will be attempted.
CalculationError – if a structure-level error occurs, causing the seed files to be moved to bad_castep.

begin()[source]¶: Run the prepared ComputeTask. Catches CalculationError objects and cleans up, passing all other errors upwards.

run_castep()[source]¶

Set up and run CASTEP calculation on the prepared structure, self.res_dict, using the parameters in self.cell_dict and self.param_dict.

Raises:

WalltimeError – if max_walltime is exceeded.
CriticalError – if no further calculations should be performed on this thread.
CalculationError – if this structure errored in some way, but others will hopefully be okay.

Returns:

True if calculations were successful, False otherwise. In the case of convergence tests, this is always True unless every calculation fails.

Return type:

bool

run_generic(intermediate=False, mv_bad_on_failure=True)[source]¶

Run a generic mpi program on the given seed. Files from completed runs are moved to “completed” (unless intermediate is True) and failed runs to “bad_castep”.

Keyword Arguments:

intermediate (bool) – whether we want to run more calculations on the output of this, i.e. whether to move to completed or not.
mv_bad_on_failure (bool) – whether to move files to bad_castep on failure, or leave them in place.

Returns:

True if calculations progressed without error.

Return type:

bool

run_castep_relaxation(intermediate=False)[source]¶

Set up a structural relaxation that is restarted intermittently in order to re-mesh the kpoint grid. Completed calculations are moved to the “completed” folder, and failures to “bad_castep”.

Keyword Arguments:

intermediate (bool) – whether we want to run more calculations on the output of this, i.e. whether to move to completed or not.

Returns:

True iff structure was optimised, False otherwise.

Return type:

bool

Raises:

CalculationError – if structure-level error occurred.
CriticalError – if fatal global error occurred.
WalltimeError – if walltime was reached, and jobs need to stop.
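
One way to picture the intermittent restarts is as a list of iteration counts, one per restart, built from the rough, rough_iter and fine_iter keyword arguments of ComputeTask. The sketch below is an illustration only: the function name and the max_iter cap are assumptions, and it does not claim to reproduce how matador actually builds its schedule.

```python
def iteration_schedule(rough=4, rough_iter=2, fine_iter=20, max_iter=100):
    """Return the number of geometry steps to run before each restart."""
    if fine_iter <= 0:
        raise ValueError("fine_iter must be positive")
    # Start with a few short "rough" runs...
    schedule = [rough_iter] * rough
    remaining = max_iter - sum(schedule)
    # ...then fill the rest of the (hypothetical) budget with "fine" runs.
    while remaining > 0:
        step = min(fine_iter, remaining)
        schedule.append(step)
        remaining -= step
    return schedule


schedule = iteration_schedule()
```

With the documented defaults and the assumed budget of 100 steps, the k-point grid would be regenerated nine times, once at the start of each entry in the schedule.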

run_castep_singleshot(calc_doc, seed, keep=True, intermediate=False)[source]¶

Perform a singleshot calculation with CASTEP. Singleshot runs do not attempt to remedy any errors raised.

Files from completed runs are moved to completed, if not in intermediate mode, and failed runs to bad_castep.

Parameters:

calc_doc (dict) – dictionary containing parameters and structure
seed (str) – structure filename

Keyword Arguments:

intermediate (bool) – whether we want to run more calculations on the output of this, i.e. whether to move to completed or not.
keep (bool) – whether to keep intermediate files e.g. .bands

Returns:

True iff SCF completed successfully, False otherwise.

Return type:

bool

static validate_calc_doc(calc_doc, required, forbidden)[source]¶

Remove keys inside forbidden from calc_doc, and error if a required key is missing.

Parameters:

calc_doc (dict) – dictionary of structure and parameters.
required (list) – list of required key strings.
forbidden (list) – list of forbidden keys.

Raises:

AssertionError – if required key is missing.
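
The behaviour described above can be sketched in a few lines. This is an assumption-laden illustration rather than the library code: the function name is hypothetical, the dictionary is modified in place, and the CASTEP-style keys in the usage example are chosen only for demonstration.

```python
def validate_calc_doc_sketch(calc_doc, required, forbidden):
    """Drop forbidden keys from calc_doc and check that required keys exist."""
    for key in forbidden:
        # pop with a default so that absent forbidden keys are ignored
        calc_doc.pop(key, None)
    missing = [key for key in required if key not in calc_doc]
    if missing:
        raise AssertionError(f"Missing required keys: {missing}")
    return calc_doc


doc = {"cut_off_energy": 500, "kpoints_mp_spacing": 0.05, "spectral_kpoints_list": []}
validate_calc_doc_sketch(
    doc, required=["cut_off_energy"], forbidden=["spectral_kpoints_list"]
)
```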

static get_seekpath_compliant_input(calc_doc, spacing, debug=False)[source]¶

Return seekpath cell/kpoint path for the given cell and spacing.

Parameters:

calc_doc (dict) – structural and calculation parameters.
spacing (float) – desired kpoint path spacing.

Returns:

dictionary containing the standardised unit cell, and list containing the kpoints.

Return type:

(dict, list)

run_convergence_tests(calc_doc)[source]¶

Run kpoint and cutoff_energy convergence tests based on options passed to ComputeTask.

Parameters:: calc_doc (dict) – the structure to converge.
Returns:: True unless every single calculation failed.
Return type:: bool

parse_executable(seed)[source]¶

Turn executable string into list with arguments to be executed.

Example

With self.executable='castep17' and seed='test', ['castep17', 'test'] will be returned.

Example

With self.executable='pw6.x -i $seed.in > $seed.out' and seed='test', ['pw6.x', '-i', 'test.in', '>', 'test.out'] will be returned.

Parameters:: seed (str) – filename to replace $seed with in command.
Returns:: list called by subprocess.Popen.
Return type:: list of str
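
The two examples above can be reproduced with a minimal sketch, assuming that the seed is appended as a final argument whenever $seed does not appear in the executable string. The function name is hypothetical and the code is not taken from matador; how the '>' token is treated afterwards is outside the scope of the sketch.

```python
def parse_executable_sketch(executable, seed):
    """Split an executable string into an argument list for a given seed."""
    if "$seed" in executable:
        # Substitute the seedname everywhere, then split on whitespace.
        return executable.replace("$seed", seed).split()
    # No placeholder: assume the seed is passed as the last argument.
    return executable.split() + [seed]


castep_args = parse_executable_sketch("castep17", "test")
generic_args = parse_executable_sketch("pw6.x -i $seed.in > $seed.out", "test")
```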

test_exec()[source]¶

Test if <executable> --version returns a valid string.

Raises:: CriticalError – if executable not found.

property mpi_library¶: Property to store/compute desired MPI library.

set_mpi_library()[source]¶: Combines command-line MPI arguments into a string and calls MPI library detection if no args are present.

static detect_mpi()[source]¶

Test which MPI library is being used when calling mpirun.

Returns:: ‘intel’, ‘archer’, or ‘default’.
Return type:: mpi_library (str)

do_memcheck(calc_doc, seed)[source]¶

Perform a CASTEP dryrun to estimate memory usage.

Parameters:

calc_doc (dict) – dictionary of structure and CASTEP parameters
seed (str) – filename for structure

Returns:

True if the memory estimate is less than 90% of node RAM, or less than self.maxmem if that is set.

Return type:

bool
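
The acceptance rule can be written as a single comparison. The sketch below assumes that maxmem, when given, replaces the 90%-of-RAM limit rather than being combined with it, and that all quantities are in MB; the function and argument names are hypothetical.

```python
def memory_estimate_ok(estimate_mb, node_ram_mb, maxmem=None):
    """Decide whether an estimated memory footprint is acceptable."""
    # Assumption: an explicit maxmem takes precedence over the RAM-based limit.
    limit_mb = maxmem if maxmem is not None else 0.9 * node_ram_mb
    return estimate_mb < limit_mb
```

For instance, an estimate of 9500 MB on a 10000 MB node would be rejected under the default rule but accepted if maxmem were set to 12000.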

run_command(seed)[source]¶

Calls executable on seed with desired number of cores.

Parameters:: seed (str) – seedname to append to the CASTEP command, e.g. <seed> or --version.
Returns:: process to run.
Return type:: subprocess.Popen

mv_to_bad(seed)[source]¶

Move all files associated with “seed” to bad_castep, from both the compute directory (if it exists) and the root dir.

Parameters:: seed (str) – filename of structure.

mv_to_completed(seed, completed_dir='completed', keep=False, skip_existing=False)[source]¶

Move all associated files to completed, removing any remaining files in the root_folder and compute_dir.

Parameters:

seed (str) – filename for structure.

Keyword Arguments:

completed_dir (str) – folder for completed jobs.
keep (bool) – whether to also move intermediate files.
skip_existing (bool) – if True, skip files that already exist, otherwise throw an error.

cp_to_input(seed, ext='res', glob_files=False)[source]¶

Copy initial cell and res to input folder.

Parameters:

seed (str) – filename of structure.

Keyword Arguments:

ext (str) – file extension for structure.
glob_files (bool) – whether to glob all related seed files.

tidy_up(seed)[source]¶

Delete all created files before quitting.

Parameters:: seed (str) – filename for structure.

static remove_compute_dir_if_finished(compute_dir)[source]¶

Delete the compute directory, provided it contains no calculation data.

Parameters:

compute_dir (str) – path to compute directory.

Returns:

True if folder was deleted as no res/castep files were found, otherwise False.

Return type:

bool
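
A minimal sketch of this kind of clean-up, assuming that "calculation data" means any file ending in .res or .castep directly inside the folder. The helper name is hypothetical and the real method may apply different checks.

```python
import glob
import os
import shutil
import tempfile


def remove_dir_if_no_results(compute_dir):
    """Delete compute_dir unless it still holds .res or .castep files."""
    if not os.path.isdir(compute_dir):
        return False
    for ext in ("res", "castep"):
        if glob.glob(os.path.join(compute_dir, f"*.{ext}")):
            # Results are still present, so leave the folder alone.
            return False
    shutil.rmtree(compute_dir)
    return True


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    empty_dir = os.path.join(root, "node1")
    busy_dir = os.path.join(root, "node2")
    os.makedirs(empty_dir)
    os.makedirs(busy_dir)
    open(os.path.join(busy_dir, "LiAs_1.res"), "w").close()
    removed_empty = remove_dir_if_no_results(empty_dir)
    removed_busy = remove_dir_if_no_results(busy_dir)
    busy_still_exists = os.path.isdir(busy_dir)
```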

scf(*args, **kwargs)[source]¶: Alias for backwards-compatibility.

relax(*args, **kwargs)[source]¶: Alias for backwards-compatibility.

class matador.compute.BatchRun(seed, **kwargs)[source]¶

Bases: object

A class that implements the running of multiple generic jobs on a series of files without collisions with other nodes using the ComputeTask class. Jobs that have been started are listed in jobs.txt, failed jobs are moved to bad_castep/, completed jobs are moved to completed/.

Interface initially inspired by run.pl, run2.pl and the PyAIRSS class CastepRunner.

Check directory has valid contents and prepare log files and directories if not already prepared, then begin running calculations.

Note

This class is usually initialised by the run3 script, which has a full description of possible arguments.

Parameters:: seed (list of str) – single entry of param/cell file seed for CASTEP geometry optimisations of res files, or a list of filenames of $seed to run arbitrary executables on. For example, ['LiAs'] if LiAs.cell and LiAs.param exist in a cwd full of res files, or ['LiAs_1', 'LiAs_2'] if LiAs_1.in and LiAs_2.in exist and executable = ‘pw6.x < $seed.in’.
Keyword Arguments:: Exhaustive list found in the argparse parser inside matador/cli/run3.py.

spawn(join=False)[source]¶

Spawn processes to perform calculations.

Keyword Arguments:: join (bool) – whether or not to attach to ComputeTask process. Useful for testing.

perform_new_calculations(res_list, error_queue, proc_id)[source]¶

Perform all calculations that have not already failed or finished to completion.

Parameters:

res_list (list of str) – list of structure filenames.
error_queue (multiprocessing.Queue) – queue to push exceptions to
proc_id (int) – process id for logging
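
Running "without collisions" relies on each process claiming a structure before working on it. The sketch below shows one common way to do this with an atomically created lock file plus an entry in jobs.txt. The ".lock" suffix, the helper name and the exact bookkeeping are assumptions for illustration, not a description of matador's internals.

```python
import os
import tempfile


def try_claim(seed, folder):
    """Try to claim a seed; return True only for the first caller."""
    lock_path = os.path.join(folder, seed + ".lock")
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    # Record the started job, mirroring the jobs.txt list described above.
    with open(os.path.join(folder, "jobs.txt"), "a") as jobs_file:
        jobs_file.write(seed + "\n")
    return True


with tempfile.TemporaryDirectory() as folder:
    first = try_claim("LiAs_1", folder)
    second = try_claim("LiAs_1", folder)
    with open(os.path.join(folder, "jobs.txt")) as jobs_file:
        jobs = jobs_file.read().split()
```

Under this scheme, a second worker asking for the same seed simply moves on to the next structure in its list.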

generic_setup()[source]¶: Undo things that are set ready for CASTEP jobs…

castep_setup()[source]¶: Set up CASTEP jobs from res files, and $seed.cell/param.

convergence_run_setup()[source]¶: Set the correct args for a convergence run.

matador.compute.reset_job_folder(debug=False)[source]¶

Remove all lock files and clean up jobs.txt ready for job restart.

Note

This should not be called by a ComputeTask instance, in case other instances are running.

Returns:: number of structures left to relax
Return type:: num_remaining (int)

Submodules¶

matador.compute.batch module¶

This file implements the BatchRun class for chaining ComputeTask instances across several structures with high-throughput.

class matador.compute.batch.BatchRun(seed, **kwargs)[source]¶

Bases: object

A class that implements the running of multiple generic jobs on a series of files without collisions with other nodes using the ComputeTask class. Jobs that have been started are listed in jobs.txt, failed jobs are moved to bad_castep/, completed jobs are moved to completed/.

Interface initially inspired by run.pl, run2.pl and the PyAIRSS class CastepRunner.

Check directory has valid contents and prepare log files and directories if not already prepared, then begin running calculations.

Note

This class is usually initialised by the run3 script, which has a full description of possible arguments.

Parameters:: seed (list of str) – single entry of param/cell file seed for CASTEP geometry optimisations of res files, or a list of filenames of $seed to run arbitrary executables on. For example, ['LiAs'] if LiAs.cell and LiAs.param exist in a cwd full of res files, or ['LiAs_1', 'LiAs_2'] if LiAs_1.in and LiAs_2.in exist and executable = ‘pw6.x < $seed.in’.
Keyword Arguments:: Exhaustive list found in the argparse parser inside matador/cli/run3.py.

spawn(join=False)[source]¶

Spawn processes to perform calculations.

Keyword Arguments:: join (bool) – whether or not to attach to ComputeTask process. Useful for testing.

perform_new_calculations(res_list, error_queue, proc_id)[source]¶

Perform all calculations that have not already failed or finished to completion.

Parameters:

res_list (list of str) – list of structure filenames.
error_queue (multiprocessing.Queue) – queue to push exceptions to
proc_id (int) – process id for logging

generic_setup()[source]¶: Undo things that are set ready for CASTEP jobs…

castep_setup()[source]¶: Set up CASTEP jobs from res files, and $seed.cell/param.

convergence_run_setup()[source]¶: Set the correct args for a convergence run.

exception matador.compute.batch.BundledErrors[source]¶

Bases: Exception

Raise this after collecting all exceptions from processes.

matador.compute.batch.reset_job_folder(debug=False)[source]¶

Remove all lock files and clean up jobs.txt ready for job restart.

Note

This should not be called by a ComputeTask instance, in case other instances are running.

Returns:: number of structures left to relax
Return type:: num_remaining (int)

matador.compute.batch.reset_single_seed(seed)[source]¶

Remove the file lock and jobs.txt entry for a single seed.

Parameters:: seed (str) – the seedname to remove.

matador.compute.compute module¶

This file implements the ComputeTask class for handling calculations on a single structure.

class matador.compute.compute.ComputeTask(res, ncores, nnodes, node, **kwargs)[source]¶

Bases: object

The main use of this class is to call an executable on a given structure. The various parameters are passed to this class by the common entrypoints, run3 and ilustrado. It is unlikely that you will want to use this class directly. Each keyword is saved as an attribute for later use.

self.final_result¶

stores the final result of the calculation, if it was successful.

Type:: dict

Note

By default, calculations are run inside a folder with the same name as the host (e.g. node12). This decreases the load on parallel file systems such as Lustre.

Make the files to run the calculation and call the desired program.

Parameters:

res (str/dict) – filename or input structure dict
ncores (int) – number of cores per node for mpirun call
nnodes (int) – number of nodes for mpirun call (if None, use 1)
node (str) – node name to run on (if None, run on localhost)

Keyword Arguments:

param_dict (dict) – dictionary of CASTEP parameters
cell_dict (dict) – dictionary of CASTEP cell options
executable (str) – name of binary to execute (DEFAULT: ‘castep’). Special string $seed will be parsed as the seedname, e.g. executable = ‘pw6.x -i $seed.in > $seed.out’ (requires mode=’generic’).
mode (str) – either ‘castep’ or ‘generic’ (DEFAULT: ‘castep’)
custom_params (bool) – use custom per-structure param file (DEFAULT: False)
output_queue (multiprocessing.Queue) – write results to queue rather than file.
rough (int) – number of small “rough” calculations (DEFAULT: 4)
rough_iter (int) – number of iterations per rough calculation (DEFAULT: 2)
fine_iter (int) – number of iterations per fine calculation (DEFAULT: 20)
spin (bool) – break spin symmetry in first calculation by amount specified (DEFAULT: None if not present, 5 if no argument)
conv_cutoffs (list of float) – list of cutoffs to use for SCF convergence test
conv_kpts (list of float) – list of kpt spacings to use for SCF convergence test
kpts_1D (bool) – treat z-direction as special and create kpt_grid [1 1 n_kz] (DEFAULT: False)
noise (bool) – add noise to the positions (DEFAULT: False)
squeeze (bool/float) – add an external pressure to the first steps (DEFAULT: False)
archer (bool) – force use of aprun over mpirun (DEFAULT: False)
slurm (bool) – force use of srun over mpirun (DEFAULT: False)
intel (bool) – force use of Intel mpirun-style calls (DEFAULT: False)
redirect (str) – file to redirect stdout to (DEFAULT: /dev/null unless debug).
exec_test (bool) – test executable with <exec> --version before progressing (DEFAULT: True)
start (bool) – begin calculation immediately or manually call it (DEFAULT: True)
reopt (bool) – whether to optimise one more time after success (DEFAULT: False)
memcheck (bool) – perform CASTEP dryrun to estimate memory usage, do not proceed if fails (DEFAULT: False)
maxmem (int) – maximum memory allowed in MB for memcheck (DEFAULT: None)
killcheck (bool) – check for file called $seed.kill during operation, and kill executable if present (DEFAULT: True)
compute_dir (str) – folder to run computations in; default is None (i.e. cwd), if not None, prepend paths with this folder
verbosity (int) – either 0, 1, 2 or 3+, corresponding to ERROR, WARNING, INFO and DEBUG logging levels.
timings (tuple of int) – tuple containing max and elapsed time in seconds

Raises:

WalltimeError – if desired/allotted walltime is exceeded, current run will be tidied up, ready to be restarted from intermediate state.
CriticalError – if a fatal error occurs, failed run will be moved to bad_castep and no further calculations will be attempted.
CalculationError – if a structure-level error occurs, causing the seed files to be moved to bad_castep.

begin()[source]¶: Run the prepared ComputeTask. Catches CalculationError objects and cleans up, passing all other errors upwards.

run_castep()[source]¶

Set up and run CASTEP calculation on the prepared structure, self.res_dict, using the parameters in self.cell_dict and self.param_dict.

Raises:

WalltimeError – if max_walltime is exceeded.
CriticalError – if no further calculations should be performed on this thread.
CalculationError – if this structure errored in some way, but others will hopefully be okay.

Returns:

True if calculations were successful, False otherwise. In the case of convergence tests, this is always True unless every calculation fails.

Return type:

bool

run_generic(intermediate=False, mv_bad_on_failure=True)[source]¶

Run a generic mpi program on the given seed. Files from completed runs are moved to “completed” (unless intermediate is True) and failed runs to “bad_castep”.

Keyword Arguments:

intermediate (bool) – whether we want to run more calculations on the output of this, i.e. whether to move to completed or not.
mv_bad_on_failure (bool) – whether to move files to bad_castep on failure, or leave them in place.

Returns:

True if calculations progressed without error.

Return type:

bool

run_castep_relaxation(intermediate=False)[source]¶

Set up a structural relaxation that is restarted intermittently in order to re-mesh the kpoint grid. Completed calculations are moved to the “completed” folder, and failures to “bad_castep”.

Keyword Arguments:

intermediate (bool) – whether we want to run more calculations on the output of this, i.e. whether to move to completed or not.

Returns:

True iff structure was optimised, False otherwise.

Return type:

bool

Raises:

CalculationError – if structure-level error occurred.
CriticalError – if fatal global error occurred.
WalltimeError – if walltime was reached, and jobs need to stop.

run_castep_singleshot(calc_doc, seed, keep=True, intermediate=False)[source]¶

Perform a singleshot calculation with CASTEP. Singleshot runs do not attempt to remedy any errors raised.

Files from completed runs are moved to completed, if not in intermediate mode, and failed runs to bad_castep.

Parameters:

calc_doc (dict) – dictionary containing parameters and structure
seed (str) – structure filename

Keyword Arguments:

intermediate (bool) – whether we want to run more calculations on the output of this, i.e. whether to move to completed or not.
keep (bool) – whether to keep intermediate files e.g. .bands

Returns:

True iff SCF completed successfully, False otherwise.

Return type:

bool

static validate_calc_doc(calc_doc, required, forbidden)[source]¶

Remove keys inside forbidden from calc_doc, and error if a required key is missing.

Parameters:

calc_doc (dict) – dictionary of structure and parameters.
required (list) – list of required key strings.
forbidden (list) – list of forbidden keys.

Raises:

AssertionError – if required key is missing.

static get_seekpath_compliant_input(calc_doc, spacing, debug=False)[source]¶

Return seekpath cell/kpoint path for the given cell and spacing.

Parameters:

calc_doc (dict) – structural and calculation parameters.
spacing (float) – desired kpoint path spacing.

Returns:

dictionary containing the standardised unit cell, and list containing the kpoints.

Return type:

(dict, list)

run_convergence_tests(calc_doc)[source]¶

Run kpoint and cutoff_energy convergence tests based on options passed to ComputeTask.

Parameters:: calc_doc (dict) – the structure to converge.
Returns:: True unless every single calculation failed.
Return type:: bool

parse_executable(seed)[source]¶

Turn executable string into list with arguments to be executed.

Example

With self.executable='castep17' and seed='test', ['castep17', 'test'] will be returned.

Example

With self.executable='pw6.x -i $seed.in > $seed.out' and seed='test', ['pw6.x', '-i', 'test.in', '>', 'test.out'] will be returned.

Parameters:: seed (str) – filename to replace $seed with in command.
Returns:: list called by subprocess.Popen.
Return type:: list of str

test_exec()[source]¶

Test if <executable> --version returns a valid string.

Raises:: CriticalError – if executable not found.

property mpi_library¶: Property to store/compute desired MPI library.

set_mpi_library()[source]¶: Combines command-line MPI arguments into a string and calls MPI library detection if no args are present.

static detect_mpi()[source]¶

Test which MPI library is being used when calling mpirun.

Returns:: ‘intel’, ‘archer’, or ‘default’.
Return type:: mpi_library (str)

do_memcheck(calc_doc, seed)[source]¶

Perform a CASTEP dryrun to estimate memory usage.

Parameters:

calc_doc (dict) – dictionary of structure and CASTEP parameters
seed (str) – filename for structure

Returns:

True if the memory estimate is less than 90% of node RAM, or less than self.maxmem if that is set.

Return type:

bool

run_command(seed)[source]¶

Calls executable on seed with desired number of cores.

Parameters:: seed (str) – seedname to append to the CASTEP command, e.g. <seed> or --version.
Returns:: process to run.
Return type:: subprocess.Popen

mv_to_bad(seed)[source]¶

Move all files associated with “seed” to bad_castep, from both the compute directory (if it exists) and the root dir.

Parameters:: seed (str) – filename of structure.

mv_to_completed(seed, completed_dir='completed', keep=False, skip_existing=False)[source]¶

Move all associated files to completed, removing any remaining files in the root_folder and compute_dir.

Parameters:

seed (str) – filename for structure.

Keyword Arguments:

completed_dir (str) – folder for completed jobs.
keep (bool) – whether to also move intermediate files.
skip_existing (bool) – if True, skip files that already exist, otherwise throw an error.

cp_to_input(seed, ext='res', glob_files=False)[source]¶

Copy initial cell and res to input folder.

Parameters:

seed (str) – filename of structure.

Keyword Arguments:

ext (str) – file extension for structure.
glob_files (bool) – whether to glob all related seed files.

tidy_up(seed)[source]¶

Delete all created files before quitting.

Parameters:: seed (str) – filename for structure.

static remove_compute_dir_if_finished(compute_dir)[source]¶

Delete the compute directory, provided it contains no calculation data.

Parameters:

compute_dir (str) – path to compute directory.

Returns:

True if folder was deleted as no res/castep files were found, otherwise False.

Return type:

bool

scf(*args, **kwargs)[source]¶: Alias for backwards-compatibility.

relax(*args, **kwargs)[source]¶: Alias for backwards-compatibility.

matador.compute.pbs module¶

This file implements a simple interface to basic PBS functionality, mostly for monitoring walltime of jobs submitted via PBS.

class matador.compute.pbs.PBSQueueManager[source]¶

Bases: QueueManager

Wrapper for the PBS queueing system.

token = 'pbs'¶

get_ntasks()[source]¶

get_max_memory()[source]¶

get_array_id()[source]¶

get_walltime()[source]¶

Query available walltime with qstat on the current job.

Parameters:

pbs_dict (dict) – pbs env parameters to query.

Raises:

RuntimeError – if PBS_JOBID not present in PBS env.
subprocess.CalledProcessError – if unable to use qstat.

Returns:

maximum allowed walltime in seconds.

Return type:

int

matador.compute.queueing module¶

This file implements a simple agnostic interface to various queueing systems.

class matador.compute.queueing.QueueManager[source]¶

Bases: ABC

Abstract base class for queue managers.

token = None¶

property walltime¶: Return the allotted walltime in seconds, returning None if not available.

property ntasks¶

property max_memory¶: Return the allotted memory in MB, returning None if not available.

property array_id¶

abstract get_walltime()[source]¶

abstract get_max_memory()[source]¶

abstract get_array_id()[source]¶
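
The split between read-only properties and abstract get_* methods can be illustrated with a small stand-alone sketch. It assumes that each property wraps the corresponding getter and returns None when the getter fails; all class names here are hypothetical and only the walltime pair is shown.

```python
from abc import ABC, abstractmethod


class QueueManagerSketch(ABC):
    """Hypothetical stand-in for an abstract queue manager."""

    token = None

    @property
    def walltime(self):
        # Assumption: fall back to None if the queue cannot be queried.
        try:
            return self.get_walltime()
        except (RuntimeError, KeyError):
            return None

    @abstractmethod
    def get_walltime(self):
        """Return the allotted walltime in seconds."""


class FixedQueue(QueueManagerSketch):
    token = "fixed"

    def get_walltime(self):
        return 3600


class NoJobQueue(QueueManagerSketch):
    token = "nojob"

    def get_walltime(self):
        raise RuntimeError("no job ID found in environment")
```

Callers can then read manager.walltime without handling queue-specific errors, while each subclass only implements the query itself.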

matador.compute.queueing.get_queue_env(token)[source]¶

Read os.environ variables with either PBS or SLURM prefixes, and return a dictionary of those variables only.

Parameters:: token (str) – choose one of either SLURM or PBS explicitly.

Returns:

dictionary of keys from the detected/specified queue, and a string containing either “slurm” or “pbs”.

Return type:

(dict, str)
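
Filtering the environment by prefix might look like the sketch below. It takes the environment as an argument so that it can be tried without a scheduler; the "SLURM_" and "PBS_" prefixes follow the description above, while the function name and the ({}, None) result when nothing matches are assumptions.

```python
def queue_env_sketch(environ, token=None):
    """Return (variables, queue_name) for the first queue prefix that matches."""
    prefixes = {"slurm": "SLURM_", "pbs": "PBS_"}
    # Restrict the search to one queue if a token was given explicitly.
    names = [token.lower()] if token else list(prefixes)
    for name in names:
        prefix = prefixes[name]
        found = {key: val for key, val in environ.items() if key.startswith(prefix)}
        if found:
            return found, name
    return {}, None


fake_env = {"SLURM_NTASKS": "16", "SLURM_JOB_ID": "42", "HOME": "/home/user"}
queue_vars, queue_name = queue_env_sketch(fake_env)
```

In real use, os.environ would be passed in place of fake_env.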

matador.compute.queueing.get_queue_manager()[source]¶

Detects whether PBS, SLURM or neither is being used by probing the environment variables SLURM_NTASKS and PBS_TASKNUM.

Returns:: either “slurm”, “pbs” or None.
Return type:: str or None
Raises:: SystemExit – if both SLURM and PBS were found.
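
The detection logic described above is short enough to sketch directly. The function name is hypothetical, and the environment is passed in as a mapping so the behaviour can be checked without a scheduler.

```python
def detect_queue(environ):
    """Return "slurm", "pbs" or None based on two marker variables."""
    has_slurm = "SLURM_NTASKS" in environ
    has_pbs = "PBS_TASKNUM" in environ
    if has_slurm and has_pbs:
        # Ambiguous environment: refuse to guess.
        raise SystemExit("Both SLURM and PBS environment variables were found.")
    if has_slurm:
        return "slurm"
    if has_pbs:
        return "pbs"
    return None
```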

matador.compute.slurm module¶

This file implements a simple interface to basic SLURM functionality, including creating and submitting slurm scripts and cancelling jobs.

class matador.compute.slurm.SlurmQueueManager[source]¶

Bases: QueueManager

Wrapper for the Slurm queueing system.

token = 'slurm'¶

get_array_id()[source]¶

get_ntasks()[source]¶

get_max_memory()[source]¶

get_walltime()[source]¶

Query available walltime with scontrol on the current job.

Parameters:

slurm_dict (dict) – slurm env parameters to query.

Raises:

RuntimeError – if SLURM_JOB_ID not present in slurm env.
subprocess.CalledProcessError – if unable to use scontrol.

Returns:

maximum allowed walltime in seconds.

Return type:

int
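
Converting a Slurm time limit into seconds is the main arithmetic step in a query like this. The sketch below assumes the limit arrives as a string in either D-HH:MM:SS or HH:MM:SS form; other values such as UNLIMITED are not handled and raise ValueError. The function name is hypothetical and the scontrol call itself is not shown.

```python
def slurm_time_to_seconds(time_limit):
    """Convert "D-HH:MM:SS" or "HH:MM:SS" into a number of seconds."""
    days = 0
    if "-" in time_limit:
        # Split off the optional leading day count.
        day_str, time_limit = time_limit.split("-", 1)
        days = int(day_str)
    hours, minutes, seconds = (int(part) for part in time_limit.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```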

matador.compute.slurm.scancel_all_matching_jobs(name=None)[source]¶

Cancel all of the user’s jobs.

Keyword Arguments:: name (str) – optional name to pass to scancel
Returns:: output from scancel.
Return type:: str

matador.compute.slurm.submit_slurm_script(slurm_fname, depend_on_job=None, num_array_tasks=None)[source]¶

Submit a SLURM job.

Parameters:

slurm_fname (str) – SLURM job file to submit.

Keyword Arguments:

depend_on_job (int) – job ID to make current job depend on.
num_array_tasks (int) – number of array tasks to submit.

Raises:

subprocess.CalledProcessError – if jobfile doesn’t exist or has failed.

Returns:

submitted SLURM job ID.

Return type:

int
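
A submission like this has two parts: building the sbatch command line and reading the job ID back from its output. The sketch below covers both without running anything. The choice of an afterany dependency, the 0-based array range and both function names are assumptions, not details confirmed by this page.

```python
import re


def build_sbatch_command(slurm_fname, depend_on_job=None, num_array_tasks=None):
    """Assemble an sbatch argument list for the given job file."""
    command = ["sbatch"]
    if num_array_tasks is not None:
        # Assumption: array tasks are numbered from 0.
        command.append(f"--array=0-{num_array_tasks - 1}")
    if depend_on_job is not None:
        # Assumption: start once the other job ends, whatever its exit state.
        command.append(f"--dependency=afterany:{depend_on_job}")
    command.append(slurm_fname)
    return command


def parse_job_id(sbatch_output):
    """Extract the integer job ID from sbatch's confirmation message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"Could not find a job ID in: {sbatch_output!r}")
    return int(match.group(1))


command = build_sbatch_command("run.job", depend_on_job=123, num_array_tasks=4)
job_id = parse_job_id("Submitted batch job 45678\n")
```

The command list could then be handed to subprocess.check_output, which raises subprocess.CalledProcessError on a non-zero exit status.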

matador.compute.slurm.get_slurm_header(slurm_dict, walltime_hrs, num_nodes=None)[source]¶

Write a SLURM script header from a set of slurm parameters.

Parameters:

slurm_dict (dict) – dictionary of SLURM environment variables.
walltime_hrs (int) – allowed walltime in hours

Keyword Arguments:

num_nodes (int) – overrides $SLURM_JOB_NUM_NODES with a custom value.

Returns:

the SLURM file header.

Return type:

header (str)
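
As a rough illustration of what such a header could contain, the sketch below turns a few SLURM environment variables into #SBATCH directives. Which variables matador actually reads and which directives it writes are not stated here, so the keys used, the directive set and the function name are all assumptions.

```python
def slurm_header_sketch(slurm_dict, walltime_hrs, num_nodes=None):
    """Build a minimal SLURM script header from environment-style settings."""
    # A custom node count overrides the value found in the environment.
    nodes = num_nodes if num_nodes is not None else slurm_dict["SLURM_JOB_NUM_NODES"]
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={slurm_dict['SLURM_JOB_NAME']}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks={slurm_dict['SLURM_NTASKS']}",
        f"#SBATCH --time={walltime_hrs}:00:00",
    ]
    return "\n".join(lines) + "\n"


example_env = {
    "SLURM_JOB_NAME": "relax",
    "SLURM_JOB_NUM_NODES": "2",
    "SLURM_NTASKS": "32",
}
header = slurm_header_sketch(example_env, walltime_hrs=12, num_nodes=1)
```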

matador.compute.slurm.write_slurm_submission_script(slurm_fname, slurm_dict, compute_string, walltime_hrs, template=None)[source]¶

Write a full slurm submission script based on the input settings.

Parameters:

slurm_fname (str) – the desired filename for the submission script
slurm_dict (dict) – dictionary of SLURM environment variables
compute_string (str) – the compute commands to run
walltime_hrs (int) – maximum walltime in hours

Keyword Arguments:

template (str) – filename containing job preamble, e.g. module loads