refactor: add OpenMP parallelization for LJ virial reduction, thermostats, and Langevin #7529
Open
yyya18 wants to merge 17 commits into
…icle_thermo velocity scaling
…T, NHC, FIRE, LJ)

Cover 6 remaining hot-path per-atom loops that were not parallelized in the prior merge-openmp branch:
- md_func.cpp: rescale_vel(): apply the velocity rescaling factor
- msst.cpp: vel_sum(): norm2 reduction; propagate_vel(): exp-based velocity propagation (highest compute density among the uncovered loops)
- nhchain.cpp: vel_baro(): NPT per-atom velocity scaling
- fire.cpp: check_fire(): triple reduction + velocity mixing + zero
- esolver_lj.cpp: runner(): N² neighbor pair computation with schedule(dynamic) for load balancing, per-thread virial accumulation

All optimizations use schedule(static) with a nat>=256 threshold (LJ uses dynamic,32 for neighbor-count imbalance). No data dependencies changed; all loops are per-atom independent. No conflict with the prior merge-openmp branch.
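For reviewers, the "triple reduction" in check_fire() can be pictured roughly as below. This is a minimal sketch, not the code in fire.cpp: it assumes the three sums are the power F·v and the squared norms of v and F, as in the standard FIRE algorithm, and that velocities and forces are stored as flat arrays. The names FireSums and fire_sums are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical result type: the three scalars a FIRE step needs.
struct FireSums
{
    double power;  // sum of F_i * v_i
    double vnorm2; // sum of v_i^2
    double fnorm2; // sum of F_i^2
};

// Assumption: vel and force are flat arrays of equal length (3 * nat).
FireSums fire_sums(const std::vector<double>& vel, const std::vector<double>& force)
{
    assert(vel.size() == force.size());
    const int n = static_cast<int>(vel.size());

    double power = 0.0;
    double vnorm2 = 0.0;
    double fnorm2 = 0.0;

    // Each thread accumulates private copies of the three sums; OpenMP
    // combines them when the loop ends. Without OpenMP the pragma is
    // ignored and the loop runs serially with the same result.
#pragma omp parallel for schedule(static) reduction(+ : power, vnorm2, fnorm2)
    for (int i = 0; i < n; ++i)
    {
        power += force[i] * vel[i];
        vnorm2 += vel[i] * vel[i];
        fnorm2 += force[i] * force[i];
    }

    return FireSums{power, vnorm2, fnorm2};
}
```

Fusing the three sums into one loop means the velocity and force arrays are traversed once rather than three times.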
The 'if' clause is only valid on '#pragma omp parallel', not on '#pragma omp for' when used inside an explicit parallel region. This caused a compile error: 'if' is not valid for '#pragma omp for'.
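The fix can be illustrated with a small sketch: the if clause moves to the enclosing parallel construct, and the worksharing loop keeps only its schedule clause. The function scale_velocities and its flat-array signature are hypothetical stand-ins, not the actual rescale_vel() code; the 256-atom threshold is the one quoted in the commits.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a per-atom velocity rescale.
// Assumption: vel holds x, y, z for each atom in a flat array.
void scale_velocities(std::vector<double>& vel, double factor)
{
    assert(vel.size() % 3 == 0);
    const int nat = static_cast<int>(vel.size() / 3);

    // The if clause belongs on the parallel construct: below the
    // threshold the region executes on a single thread.
#pragma omp parallel if (nat >= 256)
    {
        // The worksharing construct accepts schedule, but not if.
#pragma omp for schedule(static)
        for (int i = 0; i < nat; ++i)
        {
            for (int k = 0; k < 3; ++k)
            {
                vel[3 * i + k] *= factor;
            }
        }
    }
}
```

The combined form `#pragma omp parallel for if (nat >= 256)` is also accepted, because there the clause applies to the parallel part of the directive.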
…Langevin, and FIRE

- Replace #pragma omp critical with 9 independent #pragma omp atomic in LJ runner virial reduction (esolver_lj.cpp) to eliminate lock contention at high thread counts.
- Add thread-safe random number generators (md_func.h/cpp): gaussrand_thread_safe() and uniform_rand_thread_safe() using thread_local std::mt19937, enabling OpenMP parallelization of thermostat and Langevin functions that were previously serial due to thread-unsafe std::rand() / gaussrand().
- Parallelize CSVR thermostat noise summation with reduction(+:) and velocity scaling with parallel for (verlet.cpp).
- Parallelize Anderson thermostat with thread-safe RNG and parallel for (verlet.cpp).
- Parallelize Langevin post_force with thread-safe RNG and parallel for, moving fictitious_force inside the loop body for thread privacy (langevin.cpp).
- Parallelize FIRE check_force with reduction(max:max) (fire.cpp).

All new #pragma omp parallel for directives use default(none) with explicitly listed shared variables.
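The thread-safe RNG idea can be sketched as follows. This is an illustration under stated assumptions, not the gaussrand_thread_safe() implementation in md_func.cpp: the helper names gaussian_thread_local and add_noise are hypothetical, and seeding each engine from std::random_device is an assumption the commit message does not confirm.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: one engine and one distribution per thread, so
// concurrent calls never touch shared mutable state.
// Assumption: each thread's engine is seeded from std::random_device.
double gaussian_thread_local()
{
    thread_local std::mt19937 engine{std::random_device{}()};
    thread_local std::normal_distribution<double> dist(0.0, 1.0);
    return dist(engine);
}

// Hypothetical Langevin-style loop: add Gaussian noise of width sigma
// to every force component.
void add_noise(std::vector<double>& force, double sigma)
{
    assert(sigma >= 0.0);
    const int n = static_cast<int>(force.size());

#pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i)
    {
        force[i] += sigma * gaussian_thread_local();
    }
}
```

With this seeding, trajectories are not reproducible from run to run. A scheme that derives each thread's seed from a user-supplied seed and the thread index would restore reproducibility for a fixed thread count.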
Summary
This PR adds OpenMP parallelization to the remaining serial hot paths
in the MD module, complementing existing NEP/DPMD OpenMP optimizations.
Changes
1. LJ runner virial lock elimination (esolver_lj.cpp)
- Replace #pragma omp critical (single lock for 9 virial components) with 9 independent #pragma omp atomic operations.
2. Thread-safe RNG infrastructure (md_func.h/cpp)
- Add gaussrand_thread_safe() and uniform_rand_thread_safe() using thread_local std::mt19937.
- Enable parallelization of thermostat and Langevin functions previously blocked by thread-unsafe std::rand() and static state in gaussrand().
3. Thermostat parallelization (verlet.cpp)
- CSVR: noise summation with reduction(+:) and velocity scaling with parallel for.
- Anderson: parallel for with thread-safe RNG.
4. Langevin post_force parallelization (langevin.cpp)
- Replace std::rand() with uniform_rand_thread_safe().
- Use parallel for; move fictitious_force inside the loop body for thread privacy.
5. FIRE check_force parallelization (fire.cpp)
- Use parallel for with reduction(max:max).
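Change 1 can be illustrated with a rough sketch. It assumes each thread builds a private 3×3 virial and merges it into the shared tensor once, using one atomic update per component. The Pair record and accumulate_virial are hypothetical, and the sign and prefactor conventions of the real LJ virial are deliberately left out.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical pair record: separation vector r and the force f along it.
struct Pair
{
    std::array<double, 3> r;
    std::array<double, 3> f;
};

// Sums the outer products r (x) f over all pairs into a row-major 3x3 tensor.
std::array<double, 9> accumulate_virial(const std::vector<Pair>& pairs)
{
    std::array<double, 9> virial{};
    double* const out = virial.data();
    const int npairs = static_cast<int>(pairs.size());

#pragma omp parallel
    {
        // Thread-private accumulator: no synchronization inside the hot loop.
        std::array<double, 9> local{};

#pragma omp for schedule(dynamic, 32) nowait
        for (int p = 0; p < npairs; ++p)
        {
            for (int a = 0; a < 3; ++a)
            {
                for (int b = 0; b < 3; ++b)
                {
                    local[3 * a + b] += pairs[p].r[a] * pairs[p].f[b];
                }
            }
        }

        // Merge: nine independent atomic updates per thread instead of one
        // critical section guarding all nine components.
        for (int c = 0; c < 9; ++c)
        {
#pragma omp atomic
            out[c] += local[c];
        }
    }
    return virial;
}
```

Because each thread issues only nine atomic updates in total, the merge cost does not grow with the number of pairs.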