refactor: add OpenMP parallelization for LJ virial reduction, thermostats, and Langevin #7529

Open
yyya18 wants to merge 17 commits into deepmodeling:develop from Audrey-777:refactor/md-openmp

yyya18 commented Jun 26, 2026

Summary

This PR adds OpenMP parallelization to the remaining serial hot paths
in the MD module, complementing existing NEP/DPMD OpenMP optimizations.

Changes

1. LJ runner virial lock elimination (esolver_lj.cpp)

  • Replace #pragma omp critical (single lock for 9 virial components)
    with 9 independent #pragma omp atomic operations.
  • Removes the single shared lock at the reduction point, reducing
    thread serialization there.
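
As a rough illustration of the pattern described above, the sketch below accumulates a thread-private 9-component partial sum and then merges it into the shared tensor with one `#pragma omp atomic` per component. The function name `accumulate_virial` and the `pair_virial` input are hypothetical and are not taken from esolver_lj.cpp; the real kernel presumably computes the contributions inline.

```cpp
#include <array>
#include <vector>

// Hypothetical helper (not from esolver_lj.cpp): sums per-pair virial
// contributions into one 9-component tensor.
std::array<double, 9> accumulate_virial(const std::vector<std::array<double, 9>>& pair_virial)
{
    std::array<double, 9> virial{};  // shared result, zero-initialized
    double* out = virial.data();     // plain pointer keeps the atomic update simple
    const long n = static_cast<long>(pair_virial.size());

#pragma omp parallel
    {
        std::array<double, 9> local{};  // thread-private partial sum

#pragma omp for schedule(static) nowait
        for (long i = 0; i < n; ++i)
        {
            for (int k = 0; k < 9; ++k)
            {
                local[k] += pair_virial[i][k];
            }
        }

        // One atomic update per component instead of a single critical
        // section guarding all nine components at once.
        for (int k = 0; k < 9; ++k)
        {
#pragma omp atomic
            out[k] += local[k];
        }
    }
    return virial;
}
```

Because each thread performs only nine atomic updates in total, the cost of the merge step is small compared with the pair loop. The code also compiles and runs serially when OpenMP is disabled, since the pragmas are then ignored.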

2. Thread-safe RNG infrastructure (md_func.h/cpp)

  • Add gaussrand_thread_safe() and uniform_rand_thread_safe()
    using thread_local std::mt19937.
  • Enables parallelization of thermostat/Langevin functions previously
    blocked by thread-unsafe std::rand() and static state in gaussrand().
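
For readers unfamiliar with the `thread_local` approach, here is a minimal sketch of what such generators could look like. The names `uniform_rand_sketch` and `gaussrand_sketch` are hypothetical stand-ins, and the seeding via `std::random_device` is an assumption of this sketch; the PR text does not show how md_func.cpp seeds its engines.

```cpp
#include <random>

// Each thread lazily constructs its own engine on first use, so no
// generator state is shared between threads.
inline std::mt19937& thread_engine()
{
    // Assumption: seed from std::random_device. A reproducible run would
    // instead need a fixed base seed combined with a thread index.
    thread_local std::mt19937 engine(std::random_device{}());
    return engine;
}

// Uniform sample in [0, 1), drawn from the calling thread's engine.
double uniform_rand_sketch()
{
    thread_local std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(thread_engine());
}

// Standard normal sample (mean 0, variance 1). The distribution object is
// also thread_local because std::normal_distribution may cache a value.
double gaussrand_sketch()
{
    thread_local std::normal_distribution<double> dist(0.0, 1.0);
    return dist(thread_engine());
}
```

One consequence of per-thread engines is that the random stream depends on how iterations are distributed over threads, so trajectories are generally not bitwise reproducible across different thread counts.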

3. Thermostat parallelization (verlet.cpp)

  • CSVR: parallelize noise summation with reduction(+:)
    and velocity scaling with parallel for.
  • Anderson: add parallel for with thread-safe RNG.
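
The two CSVR loops follow a common shape: a sum reduction followed by an independent per-element scaling. The sketch below shows that shape only; `sum_noise_squared` and `scale_velocities` are hypothetical names, and the noise values are passed in as an array rather than generated inside the loop as verlet.cpp presumably does.

```cpp
#include <vector>

// Sum of squared noise terms, computed with an OpenMP sum reduction.
double sum_noise_squared(const std::vector<double>& noise)
{
    double sum = 0.0;
    const long n = static_cast<long>(noise.size());
#pragma omp parallel for reduction(+ : sum) schedule(static)
    for (long i = 0; i < n; ++i)
    {
        sum += noise[i] * noise[i];  // each thread adds into a private copy
    }
    return sum;  // private copies have been combined here
}

// Per-element velocity scaling: iterations are independent, so a plain
// parallel for is enough.
void scale_velocities(std::vector<double>& vel, double factor)
{
    const long n = static_cast<long>(vel.size());
#pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i)
    {
        vel[i] *= factor;
    }
}
```

Note that a floating-point reduction combines partial sums in an order that depends on the thread count, so the result can differ at rounding level between serial and parallel runs.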

4. Langevin post_force parallelization (langevin.cpp)

  • Replace std::rand() with uniform_rand_thread_safe().
  • Add parallel for; move fictitious_force inside the loop body
    for thread privacy.
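
To show why moving the temporary into the loop body matters, the following sketch declares `fictitious_force` inside the parallel loop so that every iteration owns its copy. The force expression (friction plus a uniform random kick), the parameter names, and `uniform_rand_sketch` are assumptions made for illustration; the actual formula in langevin.cpp is not shown in this PR text.

```cpp
#include <array>
#include <random>
#include <vector>

using Vec3 = std::array<double, 3>;

// Hypothetical stand-in for a thread-safe uniform generator in [0, 1).
inline double uniform_rand_sketch()
{
    thread_local std::mt19937 engine(std::random_device{}());
    thread_local std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(engine);
}

// Adds a friction term and a random kick to each atom's force.
// mass, damp and noise_amp are illustrative parameters.
void add_langevin_force(const std::vector<Vec3>& vel, std::vector<Vec3>& force,
                        double mass, double damp, double noise_amp)
{
    const long nat = static_cast<long>(vel.size());
#pragma omp parallel for schedule(static)
    for (long i = 0; i < nat; ++i)
    {
        // Declared inside the loop body: each iteration, and therefore each
        // thread, gets its own copy. Declared outside the loop it would be
        // shared by default and the threads would race on it.
        Vec3 fictitious_force;
        for (int k = 0; k < 3; ++k)
        {
            fictitious_force[k] = -mass * vel[i][k] / damp
                                  + noise_amp * (uniform_rand_sketch() - 0.5);
        }
        for (int k = 0; k < 3; ++k)
        {
            force[i][k] += fictitious_force[k];
        }
    }
}
```

An alternative would be to keep the declaration outside and mark it `private(fictitious_force)`, but the loop-local declaration makes the ownership obvious from the code itself.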

5. FIRE check_force parallelization (fire.cpp)

  • Add parallel for with reduction(max:max).
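
A max reduction of this kind might look like the sketch below, assuming the convergence check needs the largest absolute force component. The function name `max_abs_force` and the flat `force` array are hypothetical; the `max` reduction operator for C and C++ requires OpenMP 3.1 or later.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Returns the largest absolute force component using an OpenMP max reduction.
double max_abs_force(const std::vector<double>& force)
{
    double max_f = 0.0;
    const long n = static_cast<long>(force.size());
#pragma omp parallel for reduction(max : max_f) schedule(static)
    for (long i = 0; i < n; ++i)
    {
        // Each thread tracks its own maximum; OpenMP combines them at the end.
        max_f = std::max(max_f, std::fabs(force[i]));
    }
    return max_f;
}
```

Unlike a sum reduction, a max reduction gives the same result regardless of thread count, because taking the maximum does not depend on evaluation order.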

Audrey-777 and others added 17 commits May 30, 2026 21:04
…T, NHC, FIRE, LJ)

Cover 6 remaining hot-path per-atom loops that were not parallelized
in the prior merge-openmp branch:

- md_func.cpp: rescale_vel() — velocity rescaling factor apply
- msst.cpp: vel_sum() — norm2 reduction, propagate_vel() — exp-based
  velocity propagation (highest compute density among uncovered loops)
- nhchain.cpp: vel_baro() — NPT per-atom velocity scaling
- fire.cpp: check_fire() — triple reduction + velocity mixing + zero
- esolver_lj.cpp: runner() — N² neighbor pair computation with
  schedule(dynamic) for load balancing, per-thread virial accumulation

All optimizations use schedule(static) with nat>=256 threshold
(LJ uses dynamic,32 for neighbor-count imbalance).
No data dependencies changed — all loops are per-atom independent.
No conflict with prior merge-openmp branch.

The 'if' clause is valid on '#pragma omp parallel' (and on the combined
'#pragma omp parallel for'), but not on a standalone '#pragma omp for'
inside an explicit parallel region. Using it there caused a compile
error: 'if' is not valid for '#pragma omp for'.
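
A minimal sketch of the corrected placement, assuming a size threshold of 256 as mentioned in the earlier commit message: the `if` clause sits on the construct that creates the thread team, and the worksharing `for` inside it carries only the schedule. The function name `rescale_with_threshold` is hypothetical, and the vector length is used here in place of the atom count.

```cpp
#include <vector>

// Scales every element, going parallel only when the array is large enough.
void rescale_with_threshold(std::vector<double>& vel, double factor)
{
    const long nat = static_cast<long>(vel.size());

    // The threshold belongs on 'parallel': when the condition is false the
    // region runs on a single thread and avoids the team start-up cost.
#pragma omp parallel if (nat >= 256)
    {
        // No 'if' clause here; a worksharing 'for' does not accept one.
#pragma omp for schedule(static)
        for (long i = 0; i < nat; ++i)
        {
            vel[i] *= factor;
        }
    }
}
```
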
…Langevin, and FIRE

- Replace #pragma omp critical with 9 independent #pragma omp atomic
  in LJ runner virial reduction (esolver_lj.cpp) to eliminate lock
  contention at high thread counts.

- Add thread-safe random number generators (md_func.h/cpp):
  gaussrand_thread_safe() and uniform_rand_thread_safe() using
  thread_local std::mt19937, enabling OpenMP parallelization of
  thermostat and Langevin functions that were previously serial
  due to thread-unsafe std::rand() / gaussrand().

- Parallelize CSVR thermostat noise summation with reduction(+:)
  and velocity scaling with parallel for (verlet.cpp).

- Parallelize Anderson thermostat with thread-safe RNG and
  parallel for (verlet.cpp).

- Parallelize Langevin post_force with thread-safe RNG and
  parallel for, moving fictitious_force inside the loop body
  for thread privacy (langevin.cpp).

- Parallelize FIRE check_force with reduction(max:max) (fire.cpp).

All new #pragma omp parallel for directives use default(none)
with explicitly listed shared variables.
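
For illustration, a directive in that style could look like the sketch below. With `default(none)`, the compiler rejects any variable used in the region that has not been given an explicit data-sharing attribute, which catches accidental sharing at compile time. The function `kinetic_sum` and its parameters are hypothetical and not taken from the PR.

```cpp
#include <vector>

// Sums 0.5 * m * v^2 over all entries with every variable's sharing spelled out.
double kinetic_sum(std::vector<double>& vel, double mass)
{
    double ke = 0.0;
    long n = static_cast<long>(vel.size());

    // vel, mass and n are listed as shared; ke is covered by the reduction
    // clause; the loop index i is private automatically.
#pragma omp parallel for default(none) shared(vel, mass, n) reduction(+ : ke)
    for (long i = 0; i < n; ++i)
    {
        ke += 0.5 * mass * vel[i] * vel[i];
    }
    return ke;
}
```

The variables here are deliberately non-const: compilers differ on whether const-qualified variables must, or must not, appear in a `shared` list under `default(none)`, depending on the OpenMP version they implement.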
3 participants