GPU-Accelerated MD Simulation Example¶
Running host-guest (CB7) absolute free energy calculation with GPU acceleration:¶
This example demonstrates how to run the DDM workflow using GPU acceleration with pmemd.cuda. GPU acceleration can significantly speed up MD simulations, especially for larger systems or when running multiple replicas.
All the example input scripts can be found in the script_examples directory. Let’s look at the GPU-optimized YAML config file gpu_config.yaml:
hardware_parameters:
working_directory: working_directory
executable: "pmemd.cuda" # GPU-accelerated executable
CUDA: True # Enable GPU acceleration
output_directory_name: gpu_md_dir
cache_directory_output: path/to/cache_directory
endstate_parameter_files:
complex_parameter_filename: script_examples/structs/cb7-mol01_Hmass.parm7
complex_coordinate_filename: script_examples/structs/cb7-mol01.rst7
number_of_cores_per_system:
complex_ncores: 1 # Typically 1 core per GPU
ligand_ncores: 1 # Typically 1 core per GPU
receptor_ncores: 1 # Typically 1 core per GPU
AMBER_masks:
receptor_mask: ":CB7" # CB7 receptor atoms
ligand_mask: ":M01" # M01 ligand atoms
workflow:
endstate_method: basic_md # Basic MD with GPU acceleration
endstate_arguments:
md_template_mdin: script_examples/basic_md.template
intermediate_states_arguments:
mdin_intermidate_config: script_examples/user_intermidate_mdin_args.yaml
igb_solvent: 2 # GB solvent model
temperature: 300
exponent_conformational_forces: [
-8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2,
2.584963, 3, 3.584963, 4
]
exponent_orientational_forces: [
-4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6,
6.584963, 7, 7.584963, 8
]
restraint_type: 2 # CoM-Heavy_Atom restraints
Key GPU Configuration Changes:¶
Executable: Changed to
pmemd.cudafor GPU accelerationCUDA Flag: Set to
Trueto enable GPU featuresNo MPI Command: IMPORTANT: Do not specify
mpi_commandfor GPU simulations as this will bottleneck GPU performance and severely underutilize the GPU resourcesCore Count: Typically reduced to 1 core per system since GPU handles the computation
GPU Allocation: No need to specify
num_accelerators- this defaults to 1 GPU per simulation job automaticallyMemory Management: GPU simulations often require different memory considerations
MDIN Template¶
The MDIN template for GPU simulations follows the same process as MPI CPU options - no real difference in the template itself. Use the same template as the Basic MD Example.
For the complete MDIN template details, see the Basic MD mdin template.
The template requires only that you set DISANG=$restraint - the restraint file will be loaded during the workflow.
GPU Batch Script Example¶
Example batch file for SLURM scheduler with GPU allocation:
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1 # Change to gpu:N for N GPUs running in parallel
#SBATCH --cpus-per-task=30 # Use high number for post-energy analysis (imin=5) parallelization
#SBATCH --time=09:30:00
#SBATCH --job-name=gpu_ddm_example
#SBATCH --export=all
# Load required modules
module load amber/22
module load cuda/11.8
conda activate isddm_env
pwd
mkdir /scratch/username/gpu_workflow/
# Run GPU-accelerated DDM workflow
run_implicit_ddm.py file:jobstore_gpu_example --config_file script_examples/config_files/gpu_config.yaml --workDir /scratch/username/gpu_workflow/
Multi-GPU Scaling¶
The GPU workflow supports scaling to multiple GPUs for parallel execution:
Single GPU: Use
--gres=gpu:1(one GPU per simulation job)Multiple GPUs: Use
--gres=gpu:Nwhere N is the number of GPUsParallel Execution: Each GPU runs one simulation job in parallel
Optimal Scaling: N GPUs will run N simulation jobs simultaneously
Example for 4 GPUs:
#SBATCH --gres=gpu:4 # 4 GPUs running in parallel
#SBATCH --cpus-per-task=30 # High CPU count for post-energy analysis parallelization
GPU Performance Considerations¶
MDIN Template: Uses the same template as CPU simulations - no special GPU-specific parameters needed
System Size: GPU acceleration is most beneficial for larger systems
Multiple GPUs: Scale to N GPUs for N parallel simulation jobs
High CPU Count: Use 30+ CPUs for optimal performance - the Sander post-energy analysis (imin=5) runs on CPUs and completion time scales linearly with the number of CPUs
Resource Allocation: Ensure proper GPU allocation in your batch script
Troubleshooting GPU Issues¶
Critical: Do not use
mpi_commandwith GPU simulations - this will bottleneck performance and severely underutilize GPU resourcesGPU Allocation: Do not specify
num_accelerators- this defaults to 1 GPU per simulation job automaticallyEnsure CUDA drivers and AMBER GPU support are properly installed
Check GPU memory availability for your system size
Verify that
pmemd.cudais available in your AMBER installationMonitor GPU utilization during simulation to ensure proper acceleration
If GPU utilization is low, check that you’re not accidentally using MPI commands or multiple cores per GPU