Refactor: optimize MD and NEP hot paths with OpenMP-ready loops#7518
Refactor: optimize MD and NEP hot paths with OpenMP-ready loops#7518Audrey-777 wants to merge 4 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors several Molecular Dynamics (MD) hot paths and NEP/DPMD solver loops to be OpenMP-ready, while also consolidating MD unit test setup/teardown into shared fixtures and introducing small helper APIs for MD kinetic/stress state computation.
Changes:
- Added OpenMP-ready reductions/loops in MD helpers and per-atom update paths, and introduced
MDKineticState/MDStressStatehelper APIs. - Reduced per-step allocations in NEP/DPMD by reusing interface buffers and parallelized several NEP loops (coord fill, energy/virial reductions, force copy-back).
- Refactored MD unit tests to use shared fixtures (
md_test_fixture.h) and migrated runner ownership tostd::unique_ptrin tests andrun_md.cpp.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| source/source_md/test/verlet_test.cpp | Switches Verlet tests to shared integrator fixture. |
| source/source_md/test/nhchain_test.cpp | Uses shared MD test base + unique_ptr runner ownership. |
| source/source_md/test/msst_test.cpp | Switches MSST tests to shared integrator fixture and updates casts for unique_ptr. |
| source/source_md/test/md_test_fixture.h | Adds shared gtest fixtures for MD integrators, MD_func, and LJ pot tests. |
| source/source_md/test/md_func_test.cpp | Uses shared fixture; adds direct tests for new kinetic/stress state helpers. |
| source/source_md/test/lj_pot_test.cpp | Uses shared fixture and migrates esolver ownership to unique_ptr. |
| source/source_md/test/langevin_test.cpp | Switches Langevin tests to shared integrator fixture. |
| source/source_md/test/fire_test.cpp | Switches FIRE tests to shared integrator fixture and updates casts for unique_ptr. |
| source/source_md/test/CMakeLists.txt | Updates MD test build dependency list (per diff). |
| source/source_md/run_md.cpp | Introduces unique_ptr MD runner creation and removes manual delete. |
| source/source_md/md_statistics.h | Adds MDKineticState / MDStressState structs. |
| source/source_md/md_func.h | Declares new state helper APIs and includes md_statistics.h. |
| source/source_md/md_func.cpp | Adds OpenMP-ready loops/reductions; implements state helpers; refactors stress/temp computations. |
| source/source_md/md_base.cpp | Adds OpenMP-ready per-atom loops for update_pos / update_vel. |
| source/source_esolver/esolver_nep.h | Adds persistent NEP buffers and atom index mapping fields. |
| source/source_esolver/esolver_nep.cpp | Reuses NEP buffers; adds OpenMP-ready reductions and loop parallelism. |
| source/source_esolver/esolver_dp.h | Adds persistent DP buffers for cell/coord and raw model outputs. |
| source/source_esolver/esolver_dp.cpp | Reuses DP buffers and avoids repeated temporary vector allocations. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| { | ||
| if (cal_stress) | ||
| { | ||
| ModuleBase::matrix t_vector; | ||
|
|
||
| temp_vector(unit_in.nat, vel, allmass, t_vector); | ||
|
|
||
| for (int i = 0; i < 3; ++i) | ||
| { | ||
| for (int j = 0; j < 3; ++j) | ||
| { | ||
| stress(i, j) = virial(i, j) + t_vector(i, j) / unit_in.omega; | ||
| } | ||
| } | ||
| stress = calc_stress_state(unit_in.nat, unit_in.omega, vel, allmass, virial).stress; | ||
| } |
There was a problem hiding this comment.
Fixed. compute_stress() now fills the caller-provided stress matrix in place again instead of move-assigning from calc_stress_state(...).stress, so the hot path keeps reusing the existing 3x3 storage. calc_stress_state() remains available for tests/value-returning callers.
| #include <memory> | ||
| #include <vector> |
There was a problem hiding this comment.
Fixed. md_test_fixture.h now temporarily undefines private / protected before including gtest, , and , then restores the macros before including project headers, keeping the access-control hack scoped to project code.
| temp_vector(natom, vel, allmass, state.temperature_tensor); | ||
| state.stress.create(3, 3); | ||
|
|
There was a problem hiding this comment.
Fixed. calc_stress_state() now uses state.stress.create(3, 3, false) because all 9 entries are filled immediately.
| @@ -475,17 +506,45 @@ | |||
| { | |||
| t_vector.create(3, 3); | |||
There was a problem hiding this comment.
Fixed. temp_vector() now uses t_vector.create(3, 3, false) because all 9 entries are overwritten before return.
Summary
This PR refactors selected MD and NEP/DPMD hot paths and adds low-risk OpenMP-ready loop optimizations.
Main changes:
std::unique_ptrfor MD runner lifetime management.This PR intentionally does not include LJ runner parallelization, MPI domain decomposition, CUDA work, benchmark scripts, or unrelated PW/Hamilt changes.
Validation
Local checks passed:
git diff --check upstream/develop...HEADMODULE_MD_funcMODULE_MD_LJ_potMODULE_MD_fireMODULE_MD_verletMODULE_MD_nhcMODULE_MD_msstPerformance Notes
A 2048-atom LJ NVE case on a 16-logical-CPU machine did not show stable end-to-end speedup:
This is expected because this PR does not parallelize the LJ runner, which remains the main full-MD bottleneck.
MPI-only and MPI+OpenMP hybrid tests were also slower than serial/OpenMP-only because the current MD/LJ path does not partition LJ force work across MPI ranks.