GPU-Accelerated MD Simulation Example

Running host-guest (CB7) absolute free energy calculation with GPU acceleration:

This example demonstrates how to run the DDM workflow using GPU acceleration with pmemd.cuda. GPU acceleration can significantly speed up MD simulations, especially for larger systems or when running multiple replicas.

All the example input scripts can be found in the script_examples directory. Let’s look at the GPU-optimized YAML config file gpu_config.yaml:

hardware_parameters:
    working_directory: working_directory
    executable: "pmemd.cuda" # GPU-accelerated executable
    CUDA: True # Enable GPU acceleration
    output_directory_name: gpu_md_dir
    cache_directory_output: path/to/cache_directory

endstate_parameter_files:
    complex_parameter_filename: script_examples/structs/cb7-mol01_Hmass.parm7
    complex_coordinate_filename: script_examples/structs/cb7-mol01.rst7

number_of_cores_per_system:
    complex_ncores: 1 # Typically 1 core per GPU
    ligand_ncores: 1 # Typically 1 core per GPU
    receptor_ncores: 1 # Typically 1 core per GPU

AMBER_masks:
    receptor_mask: ":CB7" # CB7 receptor atoms
    ligand_mask: ":M01" # M01 ligand atoms

workflow:
    endstate_method: basic_md # Basic MD with GPU acceleration
    endstate_arguments:
        md_template_mdin: script_examples/basic_md.template
    intermediate_states_arguments:
        mdin_intermidate_config: script_examples/user_intermidate_mdin_args.yaml
        igb_solvent: 2 # GB solvent model
        temperature: 300
        exponent_conformational_forces: [
            -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2,
            2.584963, 3, 3.584963, 4
        ]
        exponent_orientational_forces: [
            -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6,
            6.584963, 7, 7.584963, 8
        ]
        restraint_type: 2 # CoM-Heavy_Atom restraints

Key GPU Configuration Changes:

  1. Executable: Changed to pmemd.cuda for GPU acceleration

  2. CUDA Flag: Set to True to enable GPU features

  3. No MPI Command: IMPORTANT: Do not specify mpi_command for GPU simulations as this will bottleneck GPU performance and severely underutilize the GPU resources

  4. Core Count: Typically reduced to 1 core per system since GPU handles the computation

  5. GPU Allocation: No need to specify num_accelerators - this defaults to 1 GPU per simulation job automatically

  6. Memory Management: GPU simulations often require different memory considerations

MDIN Template

The MDIN template for GPU simulations follows the same process as MPI CPU options - no real difference in the template itself. Use the same template as the Basic MD Example.

For the complete MDIN template details, see the Basic MD mdin template.

The template requires only that you set DISANG=$restraint - the restraint file will be loaded during the workflow.

GPU Batch Script Example

Example batch file for SLURM scheduler with GPU allocation:

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1  # Change to gpu:N for N GPUs running in parallel
#SBATCH --cpus-per-task=30  # Use high number for post-energy analysis (imin=5) parallelization
#SBATCH --time=09:30:00
#SBATCH --job-name=gpu_ddm_example
#SBATCH --export=all

# Load required modules
module load amber/22
module load cuda/11.8
conda activate isddm_env
pwd

mkdir /scratch/username/gpu_workflow/

# Run GPU-accelerated DDM workflow
run_implicit_ddm.py file:jobstore_gpu_example --config_file script_examples/config_files/gpu_config.yaml --workDir /scratch/username/gpu_workflow/

Multi-GPU Scaling

The GPU workflow supports scaling to multiple GPUs for parallel execution:

  • Single GPU: Use --gres=gpu:1 (one GPU per simulation job)

  • Multiple GPUs: Use --gres=gpu:N where N is the number of GPUs

  • Parallel Execution: Each GPU runs one simulation job in parallel

  • Optimal Scaling: N GPUs will run N simulation jobs simultaneously

Example for 4 GPUs:

#SBATCH --gres=gpu:4  # 4 GPUs running in parallel
#SBATCH --cpus-per-task=30  # High CPU count for post-energy analysis parallelization

GPU Performance Considerations

  1. MDIN Template: Uses the same template as CPU simulations - no special GPU-specific parameters needed

  2. System Size: GPU acceleration is most beneficial for larger systems

  3. Multiple GPUs: Scale to N GPUs for N parallel simulation jobs

  4. High CPU Count: Use 30+ CPUs for optimal performance - the Sander post-energy analysis (imin=5) runs on CPUs and completion time scales linearly with the number of CPUs

  5. Resource Allocation: Ensure proper GPU allocation in your batch script

Troubleshooting GPU Issues

  • Critical: Do not use mpi_command with GPU simulations - this will bottleneck performance and severely underutilize GPU resources

  • GPU Allocation: Do not specify num_accelerators - this defaults to 1 GPU per simulation job automatically

  • Ensure CUDA drivers and AMBER GPU support are properly installed

  • Check GPU memory availability for your system size

  • Verify that pmemd.cuda is available in your AMBER installation

  • Monitor GPU utilization during simulation to ensure proper acceleration

  • If GPU utilization is low, check that you’re not accidentally using MPI commands or multiple cores per GPU