Updates and Changelog

This section documents the major updates and changes made to the Implicit Solvent DDM package across different versions.

Version 1.1.2 - Critical GPU Bug Fixes

Release Date: October 2025 Status: Latest Stable Release

Critical Bug Fixes

GPU Distribution & Utilization

This release addresses critical GPU underutilization issues and improves GPU resource distribution across simulation jobs:

Major Fixes: - Fixed GPU Underutilization: Removed mpiexec from pmemd.cuda runs to prevent GPU underutilization - Fixed Multi-Process GPU Bug: Resolved critical issue where multiple processes were created on a single GPU, now limited to one process per GPU - GPU Isolation: Each GPU now runs only one simulation job at a time with proper CUDA_VISIBLE_DEVICES assignment - Sequential GPU Execution: Implemented proper GPU batching with sequential execution per GPU device - Enhanced GPU Distribution: Each simulation job now runs on a separate GPU with proper device assignment - MPI Command Handling: Fixed handling of mpi_command=None for CUDA simulations to prevent underutilization

GPU Configuration Improvements:

system_parameters:
  executable: "pmemd.cuda"
  mpi_command:   # No MPI for single GPU jobs, leave blank or remove from yaml config
  CUDA: True
  num_accelerators: 1  # One GPU per simulation job

System Configuration Fixes

  • Default GPU Count: Set default num_accelerators to 1 for better GPU distribution instead of auto-detection

  • CUDA Device Assignment: Proper CUDA_VISIBLE_DEVICES assignment for each GPU job

  • Sequential GPU Execution: Jobs are now chained sequentially per GPU to maximize utilization

Workflow Execution Improvements

  • GPU Job Batching: GPU jobs are now properly distributed across available devices

  • GPU Environment Isolation: Each GPU job gets its own CUDA_VISIBLE_DEVICES environment variable

  • Sequential Execution: Jobs on the same GPU run sequentially to prevent resource conflicts

  • CPU/GPU Separation: Clear separation between CPU and GPU job execution paths

  • Resource Optimization: Better resource allocation for mixed CPU/GPU workflows

Code Quality Improvements

  • Simplified System Names: Standardized receptor and ligand system naming (receptor_system, ligand_system)

  • Cleaner GPU Logic: Removed commented-out GPU batching code and implemented proper distribution

  • Better Error Handling: Improved GPU detection and assignment logic

GPU Environment Management

  • Environment Variable Assignment: Each GPU job gets its own sim.env[“CUDA_VISIBLE_DEVICES”] assignment

  • GPU ID Distribution: Jobs are distributed across GPUs using gpu_id = i % num_gpus pattern

  • Sequential Job Chaining: Jobs on the same GPU are chained using addFollowOn() for sequential execution

  • Resource Isolation: Prevents multiple simulations from competing for the same GPU resources

  • One Process Per GPU: Fixed bug where multiple processes were created on a single GPU, now strictly one process per GPU

Performance Improvements

GPU Utilization: - Eliminated GPU Underutilization: Fixed issues where GPUs were not being fully utilized - Fixed Multi-Process GPU Bug: Resolved critical issue where multiple processes were created on a single GPU - Proper Resource Distribution: Each simulation now gets dedicated GPU resources - GPU Environment Isolation: Each GPU job runs with isolated CUDA_VISIBLE_DEVICES environment - Sequential GPU Execution: Prevents GPU resource conflicts while maintaining efficiency - Better Resource Tracking: Enhanced logging for GPU job distribution and execution

Workflow Efficiency: - Faster GPU Jobs: Removed unnecessary MPI overhead for single GPU jobs - Better Resource Allocation: Optimized CPU/GPU job separation - Improved Job Chaining: Sequential execution per GPU prevents resource conflicts

Configuration Changes

Breaking Changes: - Default GPU Count: num_accelerators now defaults to 1 instead of auto-detection - MPI Command Handling: mpi_command=None now properly handled for CUDA simulations - System Naming: Standardized system naming conventions

Migration Guide: For existing configurations: 1. Update num_accelerators to 1 for single GPU per job 2. Set mpi_command: null for CUDA simulations to avoid underutilization 3. Verify GPU device assignment in logs

Dependencies & Environment

Updated Dependencies: - Toil: Updated to version 8.2.0 (from 5.12.0) - Version Bump: Updated package version to 1.1.2

System Requirements: - CUDA Support: Requires CUDA-compatible AMBER installation - GPU Memory: Optimized for single GPU per simulation job - Resource Management: Better handling of multi-GPU systems

Testing & Validation

  • Re-enabled Workflow Tests: Fixed and re-enabled comprehensive workflow tests

  • GPU Distribution Testing: Added validation for proper GPU job distribution

  • Resource Utilization Testing: Verified GPU utilization improvements

Key Files Modified

  • implicit_solvent_ddm/config.py - Fixed default GPU count and MPI command handling

  • implicit_solvent_ddm/runner.py - Implemented proper GPU batching and sequential execution

  • implicit_solvent_ddm/simulations.py - Fixed CUDA execution list handling

  • implicit_solvent_ddm/alchemical.py - Standardized system naming

  • setup.py - Updated version and Toil dependency

Critical: This release fixes GPU underutilization issues that significantly impact simulation performance.

Version 1.1.1 - GPU Acceleration Support

Release Date: October 2025 Status: Previous Development Release

Major Features

GPU Acceleration Support

The package now includes comprehensive GPU acceleration support for molecular dynamics simulations using CUDA-enabled AMBER executables.

Key Features: - Automatic GPU Detection: The system automatically detects available GPUs when CUDA: True is set - Flexible GPU Allocation: Support for specifying the number of GPUs per simulation - CUDA-Aware Job Scheduling: Intelligent job distribution across available GPU resources - Fallback Support: Graceful fallback to CPU execution when GPUs are unavailable

Supported GPU Executables: - pmemd.cuda - GPU-accelerated PMEMD engine - Custom CUDA-enabled AMBER executables

Configuration Changes

New Configuration Parameters

The following new parameters have been added to the configuration system:

System Settings: - CUDA (bool): Enable/disable GPU acceleration (default: False) - num_accelerators (int): Number of GPUs to request (default: 0 - auto-detect)

Example Configuration:

system_parameters:
  executable: "pmemd.cuda"  # GPU-enabled executable
  mpi_command: "srun"       # or "mpirun"/"mpiexec"
  CUDA: True                # Enable GPU acceleration
  num_accelerators: 2       # Number of GPUs (0 = auto-detect)
  output_directory_name: "gpu_simulation"

Complete CUDA Configuration Example:

# CUDA-enabled configuration example
system_parameters:
  executable: "pmemd.cuda"
  mpi_command: "srun"
  CUDA: True
  num_accelerators: 2       # Use 2 GPUs
  memory: "10G"            # Increased memory for GPU jobs
  disk: "20G"              # Increased disk space
  output_directory_name: "cuda_ddm_run"

endstate_parameter_files:
  complex_parameter_filename: "path/to/complex.parm7"
  complex_coordinate_filename: "path/to/complex.rst7"

number_of_cores_per_system:
  complex_ncores: 8        # CPU cores for complex simulation
  ligand_ncores: 4         # CPU cores for ligand simulation
  receptor_ncores: 4       # CPU cores for receptor simulation

AMBER_masks:
  receptor_mask: ":RECEPTOR"
  ligand_mask: ":LIGAND"

workflow:
  endstate_method: "basic_md"
  endstate_arguments:
    md_template_mdin: "path/to/template.mdin"
  intermediate_states_arguments:
    mdin_intermediate_file: "path/to/intermediate.mdin"
    igb_solvent: 2
    temperature: 300
    exponent_conformational_forces: [-8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
    exponent_orientational_forces: [-4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8]
    restraint_type: 2

Implementation Details

SystemSettings Class Updates: - Added CUDA boolean field with automatic GPU detection - Added num_accelerators field with intelligent defaults - Enhanced __post_init__ method for GPU environment validation

Simulation Class Enhancements: - Updated command-line argument generation for CUDA executables - Enhanced GPU-aware job scheduling logic - Improved environment variable handling for CUDA_VISIBLE_DEVICES

Runner Class Improvements: - Added GPU job batching and resource management - Implemented intelligent GPU allocation across simulation batches - Enhanced error handling for GPU resource conflicts

Key Code Changes:

In config.py:

@dataclass
class SystemSettings:
    CUDA: bool = field(default=False)
    num_accelerators: int = field(default=0)

    def __post_init__(self):
        if self.CUDA and self.num_accelerators == 0:
            try:
                from numba import cuda
                self.num_accelerators = len(cuda.gpus)
            except ImportError:
                raise RuntimeError("CUDA requested but 'cuda' module not available.")

In simulations.py:

def setup(self):
    if self.CUDA and self.system_type in ["complex", "receptor"]:
        self.exec_list.append(self.executable)
    # ... rest of setup logic

Performance Improvements

  • GPU Acceleration: Significant speedup for large system simulations

  • Resource Optimization: Better utilization of available computational resources

  • Memory Management: Enhanced memory handling for GPU-accelerated simulations

  • Job Scheduling: Improved parallel execution with GPU-aware scheduling

Backward Compatibility

  • All existing CPU-only configurations remain fully functional

  • Default behavior unchanged (CPU execution)

  • No breaking changes to existing API or configuration format

  • Seamless upgrade path from previous versions

Migration Guide

For Existing Users:

To enable GPU acceleration, simply add the following to your configuration:

system_parameters:
  executable: "pmemd.cuda"  # Change from "pmemd.MPI" to "pmemd.cuda"
  CUDA: True                # Add this line
  num_accelerators: 1       # Optional: specify number of GPUs

Hardware Requirements: - CUDA-enabled GPU with sufficient memory - CUDA-compatible AMBER installation - Appropriate CUDA drivers and runtime

Software Dependencies: - AMBER with CUDA support - CUDA toolkit (version 10.0 or higher recommended) - Python packages: numba (for GPU detection)

GPU-Enabled PyMBAR Analysis

For GPU-accelerated free energy analysis using PyMBAR, you can enable JAX CUDA support to leverage GPU computing for MBAR calculations.

Installation Requirements:

Follow the JAX installation guide for NVIDIA GPU support with CUDA 12:

# Install JAX with CUDA 12 support
pip install -U "jax[cuda12]"

Verification:

Check your CUDA version:

nvcc --version

Configuration:

Once JAX with CUDA support is installed, PyMBAR will automatically detect and use GPU acceleration when available. The analysis will be performed using JAX’s GPU-accelerated operations, significantly speeding up MBAR calculations for large datasets.

GPU Allocation:

Note that only one GPU will be used per simulation job. This means each individual simulation (such as complex lambda windows, endstate simulations, charge lambda windows, etc.) will utilize a single GPU for acceleration. Multiple simulations can run in parallel across different GPUs when available.

Benefits: - Accelerated MBAR free energy calculations - Faster convergence for large simulation datasets - Reduced analysis time for complex systems - Automatic GPU detection and utilization

References: - JAX Installation Guide - JAX CUDA Support

Known Issues and Limitations

  • GPU memory requirements may be higher than CPU simulations

  • Some small systems may not benefit significantly from GPU acceleration

  • CUDA_VISIBLE_DEVICES environment variable management requires careful configuration in multi-GPU setups

Version 1.0.0 - Initial Stable Release

Release Date: December 19, 2024 Status: First Stable Release for Publication

This version represents the first stable release of the Implicit Solvent DDM package, containing the exact code used for the paper submission and publication.

Key Features: - Complete DDM workflow implementation - Implicit solvent support (GBSA models) - Multi-engine compatibility (AMBER executables) - Parallel computing support (SLURM/PBS) - Automated restraint generation - Temperature replica exchange (TREMD) - Integrated MBAR analysis

For detailed information about v1.0.0 features, see Installation and Implementation Details.