.. index:: run3_geom

.. _run3_geom:

Example 1: High-throughput geometry optimisations with CASTEP
-------------------------------------------------------------

.. _ex1:

Example 1.1: Using run3 locally
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this example, we will suppose that you want to perform a geometry optimisation on several different polymorphs of TiO\ :sub:`2` from the ICSD. The files for this example can be found in ``examples/run3_tutorial``, `here `_.

Setting up the files
^^^^^^^^^^^^^^^^^^^^

By default, run3 expects the following files in the job folder:

* one ``.res``/SHELX file `per structure` to optimise
* one ``$seed.cell`` file that contains the CASTEP CELL keywords common to every structure, e.g. pseudopotentials, kpoint spacing, etc.
* one ``$seed.param`` file that contains the CASTEP PARAM keywords common to every calculation, e.g. ``cut_off_energy``, ``xc_functional``, ``task``, ``geom_max_iter``, etc.
* any required pseudopotential files.

.. tip:: If you have a database set up with structures from the OQMD, these could be obtained via ``matador query --db oqmd_1.1 -f TiO2 --icsd --res``.

.. tip:: Alternatively, you can turn many file types into ``.res`` using the various ``3shx`` scripts (shx standing for SHELX), e.g. ``cell3shx *.cell``.

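
Before calling run3, it can be worth confirming that the files listed above are actually present. The following is a minimal sketch only: ``check_job_folder`` is a hypothetical helper written for this tutorial, not part of matador or run3, and it checks only for the shared ``.cell`` and ``.param`` files and at least one ``.res`` file (pseudopotential files are not checked).

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical pre-flight check (NOT part of matador/run3).
    # Usage: check_job_folder <seed>, run from inside the job folder.
    check_job_folder() {
        local seed="$1"
        local status=0
        local ext
        # the shared CELL and PARAM files must both exist
        for ext in cell param; do
            if [ ! -f "${seed}.${ext}" ]; then
                echo "missing ${seed}.${ext}" >&2
                status=1
            fi
        done
        # at least one structure file is needed
        if ! ls ./*.res >/dev/null 2>&1; then
            echo "no .res files found" >&2
            status=1
        fi
        return "$status"
    }

    check_job_folder TiO2 && echo "job folder looks ready for: run3 TiO2"

The function returns a non-zero status if anything is missing, so it can also be used as a guard at the top of a submission script.
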
The job folder should look something like this::

    $ ls
    O2Ti-OQMD_112497-CollCode171670.res  O2Ti-OQMD_2575-CollCode9852.res     O2Ti-OQMD_7500-CollCode41493.res
    O2Ti-OQMD_117323-CollCode182578.res  O2Ti-OQMD_3070-CollCode15328.res    O2Ti-OQMD_84685-CollCode97008.res
    O2Ti-OQMD_13527-CollCode75179.res    O2Ti-OQMD_31247-CollCode657748.res  O2Ti-OQMD_97161-CollCode154036.res
    O2Ti-OQMD_19782-CollCode154035.res   O2Ti-OQMD_5979-CollCode31122.res    TiO2.cell
    O2Ti-OQMD_2475-CollCode9161.res      O2Ti-OQMD_7408-CollCode41056.res    TiO2.param

with ``.param`` file containing::

    $ cat TiO2.param
    task                 : geometryoptimization
    xc_functional        : LDA
    cut_off_energy       : 300.0 eV
    geom_force_tol       : 0.1
    spin_polarized       : false
    fix_occupancy        : false
    max_scf_cycles       : 100
    opt_strategy         : speed
    page_wvfns           : 0
    perc_extra_bands     : 40
    num_dump_cycles      : 0
    backup_interval      : 0
    geom_method          : LBFGS
    geom_max_iter        : 300
    mix_history_length   : 20
    finite_basis_corr    : 0
    fixed_npw            : false
    write_cell_structure : true
    write_checkpoint     : none
    write_bib            : false
    bs_write_eigenvalues : false
    calculate_stress     : true

and ``.cell`` file containing::

    $ cat TiO2.cell
    kpoint_mp_spacing: 0.07
    %block species_pot
    QC5
    %endblock species_pot
    symmetry_generate
    symmetry_tol: 0.01
    snap_to_symmetry

.. highlight:: bash

Calling run3
^^^^^^^^^^^^

Once these files are in place, we can begin the geometry optimisations. To run on the current host machine, simply call::

    $ run3 TiO2

This will start a single-node CASTEP job on the current machine, using all available cores. If you are on a local cluster without a queuing system, and wish to run on several nodes at once (say ``node3``, ``node6`` and ``node8``), the oddjob script can be used as follows::

    $ oddjob 'run3 TiO2' -n 3 6 8

This will start 3 single-node CASTEP jobs on the desired nodes. If instead your nodes are called ``cpu00010912``, ``cpu323232`` and ``cpu123123``, the ``--prefix`` flag is needed::

    $ oddjob 'run3 TiO2' --prefix cpu -n 00010912 323232 123123

.. tip:: On a supercomputer with a queuing system, e.g. PBS or SLURM, run3 must be called in your submission script. Array jobs are typically an effective way of spreading out over multiple nodes. An example of this kind can be found in `example 1.2 `__.

Monitoring your calculations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you look at the job folder as run3, er... runs, you will see several files and folders being created. Firstly, three ``.txt`` files will be made:

* ``jobs.txt``: this file contains a list of jobs that, at some point, **started** running.
* ``finished_cleanly.txt``: this file lists jobs that completed without error.
* ``failures.txt``: this file lists jobs that produced an error.

Every structure in progress will have a ``.lock`` file to prevent clashes with other nodes.

Several folders will also be created:

* ``logs/``: a log file per structure containing a complete history of the run.
* ``input/``: a backup of the starting configuration as a ``.res`` file.
* ``completed/``: all successful calculations will end up here, usually as a ``.res`` file with the final configuration, a concatenated ``.castep`` file containing every job step, and if requested (via ``write_cell_structure: true``), CASTEP's ``-out.cell`` file.
* ``bad_castep/``: all failed calculations end up here, including all auxiliary files.
* ``<hostname>/``: a folder is created per hostname (e.g. when running on multiple nodes) that contains the interim calculations. On failures/timeouts, all files in here are moved back to the main job folder.

Eventually, all jobs will hopefully be moved to ``completed/``, and then you are done!

Example 1.1.1: High-throughput geometry optimisations with CASTEP with per-structure parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are a few occasions where you might need a custom ``.param`` file for each structure, for example, if using the implicit nanotube ``%devel_code`` in CASTEP.
These calculations are performed in exactly the same way as above, except that a ``.param`` file must be made for each structure, containing the required DFT parameters and the nanotube parameters. In this case, run3 must now be called as::

    $ run3 --custom_params TiO2

.. tip:: If you have a ``.res`` file that contains a PyAIRSS "REM NT_PROPS" line, this will be ignored.

Example 1.2: High-throughput geometry optimisations with CASTEP on a supercomputer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each HPC facility has its own quirks, so in this example we will try to be as explicit as possible. The setup of the job is exactly the same as in `example 1 `__, but we must now add run3 to our job submission script. The following examples are for the SLURM queuing system on the BlueBear machine at the University of Birmingham and PBS on ARCHER (Tier-1), but run3 has also been tested on CSD3 (Tier-2), HPC Midlands-Plus (Tier-2), Thomas (Tier-2) and several local group-scale clusters.

Example 1.2.1: SLURM on BlueBear
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this example, we will submit a run3 job that performs CASTEP calculations across two 24-core nodes per structure. Let us presume we have many thousands of structures to run. The submission script looks as follows::

    $ cat run3.sub
    #!/bin/bash -l

    ###### MACHINE/USER-SPECIFIC OPTIONS ######

    #SBATCH --ntasks 48
    #SBATCH --nodes 2-2
    #SBATCH --time 24:00:00
    #SBATCH --qos
    ##SBATCH --qos bbshort
    #SBATCH --mail-type ALL
    #SBATCH --account=

    module purge
    export PATH="$HOME/bin/CASTEP-17.21:$HOME/.conda/bin"
    module load bluebear
    module load mpi/impi/2017.1.132-iccifort-2017.1.132
    unset I_MPI_PMI_LIBRARY

    # RUN3 COMMANDS
    # (assuming installation guide followed at
    #  https://matador-db.readthedocs.io/en/latest/install.html)

    source activate matador
    run3 -nc 48 -v 4 --executable castep.mpi --ignore_jobs_file TiO2

Let's unpick a few of the flags used to call run3 here:

* ``-nc/--ncores``: the number of cores to use per structure, per calculation. It is often worth specifying this if more than one node is being used, as the correctness of run3's core counter is queue/machine-specific.
* ``-v 4``: sets the verbosity in the log file to the highest level.
* ``--ignore_jobs_file``: by default run3 will check for both ``.lock`` files and entries in ``jobs.txt`` before running a new structure. It is often worth disabling the ``jobs.txt`` check if it is not expected that all structures complete in one job submission (see below).
* ``--executable``: by default run3 will try to call an executable called simply ``castep``. On many machines, CASTEP is installed as ``castep.mpi``.

Now to submit this script as a 200-node array job (i.e. running a maximum of 100 structures concurrently, depending on the queue), we call the following::

    $ sbatch --array=1-100 run3.sub

It may be that this job is not large enough to optimise all structures within the walltime limit. In this case, it can be resubmitted using the same command. Jobs that were running when the walltime ran out should automatically be pushed back into the job folder so that they will be available to the next run3 call. In the event that this does not happen (for example, MPI kept control of the Python thread for too long, so the queuing system interrupted run3's clean-up), a ``<hostname>`` folder will be left hanging around in the main job folder. Jobs must then be manually made restartable by deleting ``<structure>.lock`` (and removing ``<structure>`` from ``jobs.txt`` if not using ``--ignore_jobs_file``). It may also be that the intermediate CASTEP calculation was not copied over from the ``<hostname>`` folder: in this case, the CASTEP files can be updated by running::

    $ cp -u node*/*.castep .

from inside the root job folder.

Example 1.2.2: PBS on ARCHER
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. _ex.1.2:

Instructions are almost identical to the above, but the array job script looks a little different, for the same 100 copies of 2-node jobs (this time 24 cores per node)::

    $ cat run3.job
    #!/bin/bash --login
    # PBS job options (name, compute nodes, job time)
    # PBS -N is the job name (e.g. Example_MixedMode_Job)
    #PBS -N my_run3_job
    # PBS -l select is the number of nodes requested (e.g. 128 node=3072 cores)
    #PBS -l select=2
    # PBS -l walltime, maximum walltime allowed (e.g. 6 hours)
    #PBS -l walltime=24:00:00
    # Replace [budget code] below with your project code (e.g. t01)
    #PBS -A
    #PBS -m abe
    #PBS -M
    #PBS -J 1-100
    #PBS -r y

    # Make sure any symbolic links are resolved to absolute path
    export PBS_O_WORKDIR=$(readlink -f $PBS_O_WORKDIR)

    # Change to the directory that the job was submitted from
    # (remember this should be on the /work filesystem)
    cd $PBS_O_WORKDIR

    source $HOME/.bashrc
    module load anaconda-compute/python3
    source activate $HOME/work/.conda/matador

    run3 --archer -v 4 -nc 48 KSnP

Notice here we have specified ``--archer``: again, run3 should be able to detect that ``mpirun`` is missing and thus try ``aprun``, but it can be worth specifying just in case. With PBS, the whole array can be submitted with just::

    $ qsub run3.job

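
Between resubmissions of an array job, it can be useful to see how far the run has progressed. The sketch below summarises the bookkeeping files described under "Monitoring your calculations". It assumes that ``jobs.txt``, ``finished_cleanly.txt`` and ``failures.txt`` hold one entry per line, which is an assumption made here and not a documented guarantee. ``count_entries`` and ``run3_progress`` are hypothetical helpers, not part of matador.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical progress summary (NOT part of matador/run3).
    # Assumes one job entry per line in each .txt file.
    count_entries() {
        # print the number of non-empty lines in a file, or 0 if it is absent
        if [ -f "$1" ]; then
            grep -c . "$1"
        else
            echo 0
        fi
    }

    run3_progress() {
        echo "started:  $(count_entries jobs.txt)"
        echo "finished: $(count_entries finished_cleanly.txt)"
        echo "failed:   $(count_entries failures.txt)"
        # .lock files in the job folder mark structures currently claimed
        echo "locked:   $(find . -maxdepth 1 -name '*.lock' | wc -l | tr -d ' ')"
    }

    # run from inside the root job folder
    run3_progress

If the queue is empty but the "locked" count is still non-zero, those structures are candidates for the manual clean-up described in the SLURM example.
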