Skip to content

feat(pt_expt): dpa1(attn_layer=0) graph-native NeighborGraph forward#5583

Draft
wanghan-iapcm wants to merge 44 commits into
deepmodeling:masterfrom
wanghan-iapcm:feat-dpmodel-graph-dpa1
Draft

feat(pt_expt): dpa1(attn_layer=0) graph-native NeighborGraph forward#5583
wanghan-iapcm wants to merge 44 commits into
deepmodeling:masterfrom
wanghan-iapcm:feat-dpmodel-graph-dpa1

Conversation

@wanghan-iapcm

@wanghan-iapcm wanghan-iapcm commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds the graph-native forward path for dpa1(attn_layer=0) (the factorizable, mixed-types case), built on the NeighborGraph foundation from #5581. Geometry enters the descriptor only through per-edge edge_vec; the neighbor-axis reduction becomes a segment_sum over edge centers. For pt_expt this becomes the default forward (force/virial via a single autograd backward through edge_vec).

What it adds

  • dpmodel: edge_env_mat (per-edge env-mat 4-vector), DescrptBlockSeAtten._call_graph + DescrptDPA1.call_graph, model call_lower_graph (energy), neighbor_graph_from_ijs + an optional ASE O(N) carry-all builder.
  • pt_expt: edge_energy_deriv (autograd grad(E, edge_vec)edge_force_virial) + forward_common_lower_graph (energy + force + virial + atom_virial).
  • The dense DescrptDPA1.call becomes a thin adapter (from_dense_quartet → call_graph) preserving the 5-tuple ABI; a shape-static converter keeps it jax.jit / torch.export-traceable.

Default behavior

  • pt_expt defaults graph-eligible dpa1(attn_layer=0, concat tebd, no exclude_types) models to the carry-all graph (it has the autograd force/virial path).
  • dpmodel/jax keep the dense default (they compute force/virial analytically; the graph lower is energy-only), and agree with pt_expt at non-binding sel.
  • Ineligible configs (attention, strip tebd, exclude_types, linear/ZBL) fall back to the dense path unchanged. neighbor_graph_method="legacy" forces dense; "dense"/"ase" force the graph.

Parity (graph vs legacy dense lower, fp64 CPU)

energy force virial atom_virial
max abs diff 0 ~1e-19 ~1e-18 ~1e-18

atom_virial matches the canonical TF==pt-legacy full-to-src convention. dpa1 descriptor + model consistency suites green across dp/jax/pt_expt.

Known limitations

  • Default-flip is pt_expt-only; full carry-all default for dp/jax needs analytical/jax graph force (follow-up).
  • make_fx (forward + grad) traces; full .pt2 AOTI export is a follow-up (PR-B). The carry-all builders (build_neighbor_graph/from_ijs) still use nonzero (eager-only); their static variants land with the export PR.
  • Single-rank only; CUDA unvalidated (CPU box); ASE is opt-in O(N) (vesin O(N) is a follow-up); no jax graph force / dpa2-3 message-passing yet.

Also folds in three follow-up fixes to the #5581 foundation from @OutisLi's review (dangling spec refs → design discussion, edge_force_virial jax int-sum short-circuit, Array typing).

Summary by CodeRabbit

  • New Features
    • Added graph-native “lowering” for DPA1 (enabled when compatible), including graph-native forward execution and descriptor→model output lowering.
    • Introduced opt-in neighbor_graph_method routing for energy/force/virial workflows, with carry-all graphs and new graph-output fitting/post-processing utilities.
    • Added new neighbor-graph builders/utilities (including ASE-based carry-all, (i,j,S) conversion, and edge environment matrices) and exported them publicly.
  • Bug Fixes
    • Improved traced edge_force_virial behavior when node capacity is provided, and improved masking/ABI parity for invalid/virtual atoms.
  • Tests
    • Added extensive parity/regression coverage for descriptor graph lowering, energy parity (including virtual atom masking), force/virial derivatives, neighbor-graph conversion, and FX tracing.

Han Wang added 26 commits June 25, 2026 17:26
…_graph

The dense path masks excluded type pairs; the graph path does not yet, so
raise NotImplementedError instead of silently diverging.
serialize roundtrip + dpmodel->pt_expt interop on the attn_layer=0 graph path
are already covered by test_dpa1.py::test_consistency (lines 86-113), which
routes through the graph forward via the Task-3 dense-call adapter.
…all back to dense

Task 3's adapter routed ALL attn_layer==0 through the graph, but the graph
only supports tebd_input_mode='concat', no exclude_types, and needs mapping
for ghosts. strip-mode / exclude / mapping-None-with-ghosts attn_layer=0
models raised/IndexError'd. uses_graph_lower() now encodes full eligibility
and ineligible configs fall back to the legacy dense body unchanged.
Fixes test_compressed_forward (attn_layer=0 strip).
…pt graph mask key; legacy opt-out in Option-B test

- _resolve_graph_method/_call_common_graph use getattr(atomic_model,'descriptor',None)
  so Linear/ZBL models (no descriptor) fall back to dense instead of AttributeError
- pt_expt _call_common_graph override adds the all-ones mask key for dense parity
- test_dpa1_graph_model_energy dense refs use neighbor_graph_method='legacy'
  to opt out of the now-default carry-all graph (decision deepmodeling#17 default-flip)
…nse default

dpmodel/jax compute force/virial analytically inside call_common (energy_derv_r);
the energy-only graph lower drops it -> KeyError when force is requested. Only
pt_expt has the autograd graph force/virial path, so only pt_expt defaults
eligible models to the graph. dpmodel base _resolve_graph_method no longer
auto-routes; pt_expt overrides it to re-enable AUTO.
…x int-sum, Array typing)

- swap dangling memory/spec_unified_edge_nlist.md refs -> public design
  discussion (#4) so the references resolve
- edge_force_virial: short-circuit n_out=int(node_capacity) when supplied so
  the static jax/export path never calls int() on a traced sum(n_node)
- derivatives.py: move Array import under TYPE_CHECKING (+ from __future__
  import annotations) for subpackage uniformity
@wanghan-iapcm wanghan-iapcm requested a review from iProzd June 25, 2026 09:43
@wanghan-iapcm wanghan-iapcm requested a review from OutisLi June 25, 2026 09:43
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0f81eaf0-7570-4d71-ba7d-5de96a26887d

📥 Commits

Reviewing files that changed from the base of the PR and between 7cfb2a8 and 58ef3fb.

📒 Files selected for processing (7)
  • deepmd/dpmodel/descriptor/dpa1.py
  • deepmd/dpmodel/model/make_model.py
  • deepmd/dpmodel/utils/neighbor_graph/ase_builder.py
  • deepmd/dpmodel/utils/neighbor_graph/env.py
  • deepmd/dpmodel/utils/neighbor_graph/from_ijs.py
  • deepmd/pt_expt/model/make_model.py
  • source/tests/common/dpmodel/test_dpa1_graph_model_energy.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • deepmd/dpmodel/utils/neighbor_graph/from_ijs.py
  • deepmd/dpmodel/utils/neighbor_graph/ase_builder.py
  • deepmd/pt_expt/model/make_model.py
  • deepmd/dpmodel/descriptor/dpa1.py

📝 Walkthrough

Walkthrough

The PR adds graph-based neighbor construction, DPA1 graph lowering, graph-routed energy execution in the array API and PyTorch model paths, and tests that compare the graph flow against dense references.

Changes

Graph-native execution

Layer / File(s) Summary
Neighbor-graph constructors
deepmd/dpmodel/utils/*, deepmd/dpmodel/utils/neighbor_graph/*
Adds dense-to-graph conversion, sparse (i, j, S) conversion, ASE-backed carry-all graph building, environment-matrix construction, derivative support updates, public exports, and validation for the new neighbor-graph utilities.
DPA1 graph lower
deepmd/dpmodel/descriptor/dpa1.py, source/tests/common/dpmodel/test_dpa1_call_graph_*.py, source/tests/pt_expt/descriptor/test_dpa1.py
Adds graph-lower eligibility checks, descriptor graph routing, and the block-level graph forward path, with parity, fallback, and tracing coverage.
DPModel graph energy
deepmd/dpmodel/model/make_model.py, source/tests/common/dpmodel/test_call_lower_graph.py, source/tests/common/dpmodel/test_dpa1_graph_model_energy.py
Adds neighbor_graph_method routing, graph-native descriptor-plus-fitting execution, lower-level graph output assembly, and dense-vs-graph energy parity coverage.
PyTorch graph output transform
deepmd/pt_expt/model/edge_transform_output.py, source/tests/pt_expt/model/test_edge_energy_deriv.py
Adds graph-native force/virial derivative assembly and tests for the edge-based transformation helpers.
PyTorch graph model routing
deepmd/pt_expt/model/make_model.py, source/tests/pt_expt/model/test_dpa1_graph_lower.py, source/tests/pt_expt/model/test_dos_graph.py
Adds PyTorch graph-method selection and graph-native model lowering, with parity coverage for energy and non-energy DPA1-based models.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90+ minutes

Suggested labels

Core

Suggested reviewers

  • njzjz
  • iProzd
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 70.37% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: a graph-native NeighborGraph forward path for DPA1 with attn_layer=0.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (1)
source/tests/common/dpmodel/test_from_ijs.py (1)

55-86: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add one torch-backed ASE smoke test alongside these parity checks.

These tests only cover NumPy inputs, but build_neighbor_graph_ase is also used from the torch pt_expt graph path. A minimal CPU torch case here would lock in that contract and catch host-conversion regressions in the ASE builder much earlier.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/tests/common/dpmodel/test_from_ijs.py` around lines 55 - 86, Add a
small CPU torch smoke test for build_neighbor_graph_ase alongside the existing
NumPy parity checks, since the function is also exercised through the pt_expt
path. Reuse the same test module and symbols (build_neighbor_graph_ase,
build_neighbor_graph) but pass torch tensors for coord, atype, and box, then
assert the neighbor set matches the in-tree reference to catch host-conversion
regressions. Keep the new case minimal and focused on verifying torch-backed
inputs work end to end.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepmd/dpmodel/model/make_model.py`:
- Around line 406-423: The graph path in make_model.py is treating padded
virtual atoms as real by passing the full flattened atype into
call_lower_graph() and setting mask to all ones. Update the lower-graph route in
the same model methods to either fall back to the legacy/dense path whenever any
local type is negative, or propagate a real-atom mask through the graph lowering
so atom_energy and energy are only exposed for valid atoms. Be sure the fix is
applied consistently in the call_lower_graph handling and the related graph-path
code referenced by the same symbols.
- Around line 318-347: Graph routing in model prediction is missing
`charge_spin`, so `self._call_common_graph()` can silently change behavior for
charge-spin models while the legacy `model_call_from_call_lower()` path still
receives `charge_spin=cs`. Update the routing in `make_model.py` around
`graph_method` / `neighbor_graph_method` to detect when `cs` is present and
force those cases through the legacy dense-nlist path instead of calling
`_call_common_graph()` or `call_lower_graph()` until the graph lower layer can
accept `charge_spin`. Apply the same fallback in the other affected prediction
block referenced by the comment so both paths stay behaviorally consistent.

In `@deepmd/dpmodel/utils/neighbor_graph/ase_builder.py`:
- Around line 86-90: The ASE neighbor-graph path in ase_builder.py is passing
backend tensors straight into np.asarray() for coord and box, which breaks when
make_model.py provides raw PyTorch tensors. Update the ASE builder flow around
the coord_np/box_np construction to explicitly detach tensors, move them to CPU,
and then convert to NumPy before reshaping, so neighbor_graph_method="ase" works
for CUDA and autograd-backed inputs.

In `@deepmd/dpmodel/utils/neighbor_graph/env.py`:
- Around line 82-87: The `EnvMat.call` neighbor-graph path is overriding all
near-zero lengths with `1`, which changes `denom` and breaks parity with the
dense formula for real overlapping atoms when `protection > 0`. Update the
`safe_len` handling in `env.py` so only masked padding edges use an arbitrary
finite fallback, while real zero/near-zero edges still preserve `length +
protection` semantics in the `t0` computation.

In `@deepmd/dpmodel/utils/neighbor_graph/from_ijs.py`:
- Around line 97-100: Normalize box to the same backend/device as coord before
any reshaping in from_ijs so it cannot remain on a mismatched CPU/GPU device. In
the box-handling branch, coerce box with the same dev/xp path used for i, j, and
nframe_id before calling xp.reshape, then continue with xp.take and S conversion
as before. Keep the fix localized to the from_ijs flow and preserve the existing
backend inference from coord.

In `@deepmd/pt_expt/model/make_model.py`:
- Around line 283-350: The graph-native default in forward_common_lower_graph is
unsafe for padded or virtual atoms because it flattens the full atype tensor and
then uses an all-ones mask, which can let negative-type slots contribute
energy/forces. Update this path to either fall back to the legacy lower whenever
any local atom type is negative, or pass the real-atom mask through the graph
lower so only valid atoms are carried by call_lower_graph and the related
builder logic in the referenced lower-graph code paths.
- Around line 283-293: The graph auto-routing currently selects the lower-graph
path even for models that require charge-spin input, but
`forward_common_lower_graph()` and `_call_common_graph()` do not accept
`charge_spin`, so those models lose part of their inputs. Update
`_resolve_graph_method()` to keep charge-spin-capable models on the dense/legacy
path unless the graph implementation is extended to consume `charge_spin`, and
make sure the routing logic checks for that input before choosing the graph
path. Use the existing method names `forward_common_lower_graph`,
`_call_common_graph`, and `_resolve_graph_method` to keep the fix scoped to the
model dispatch flow.

---

Nitpick comments:
In `@source/tests/common/dpmodel/test_from_ijs.py`:
- Around line 55-86: Add a small CPU torch smoke test for
build_neighbor_graph_ase alongside the existing NumPy parity checks, since the
function is also exercised through the pt_expt path. Reuse the same test module
and symbols (build_neighbor_graph_ase, build_neighbor_graph) but pass torch
tensors for coord, atype, and box, then assert the neighbor set matches the
in-tree reference to catch host-conversion regressions. Keep the new case
minimal and focused on verifying torch-backed inputs work end to end.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: db2db2ec-2d28-4dd2-ae92-b2b72ddaf4b2

📥 Commits

Reviewing files that changed from the base of the PR and between 99523dc and c2e0d96.

📒 Files selected for processing (21)
  • deepmd/dpmodel/descriptor/dpa1.py
  • deepmd/dpmodel/model/make_model.py
  • deepmd/dpmodel/utils/__init__.py
  • deepmd/dpmodel/utils/neighbor_graph/__init__.py
  • deepmd/dpmodel/utils/neighbor_graph/ase_builder.py
  • deepmd/dpmodel/utils/neighbor_graph/builder.py
  • deepmd/dpmodel/utils/neighbor_graph/derivatives.py
  • deepmd/dpmodel/utils/neighbor_graph/env.py
  • deepmd/dpmodel/utils/neighbor_graph/from_ijs.py
  • deepmd/dpmodel/utils/neighbor_graph/graph.py
  • deepmd/pt_expt/model/edge_transform_output.py
  • deepmd/pt_expt/model/make_model.py
  • source/tests/common/dpmodel/test_call_lower_graph.py
  • source/tests/common/dpmodel/test_dpa1_call_graph_block.py
  • source/tests/common/dpmodel/test_dpa1_call_graph_descriptor.py
  • source/tests/common/dpmodel/test_dpa1_graph_model_energy.py
  • source/tests/common/dpmodel/test_edge_env_mat.py
  • source/tests/common/dpmodel/test_from_ijs.py
  • source/tests/pt_expt/descriptor/test_dpa1.py
  • source/tests/pt_expt/model/test_dpa1_graph_lower.py
  • source/tests/pt_expt/model/test_edge_energy_deriv.py

Comment thread deepmd/dpmodel/model/make_model.py
Comment thread deepmd/dpmodel/model/make_model.py Outdated
Comment thread deepmd/dpmodel/utils/neighbor_graph/ase_builder.py Outdated
Comment thread deepmd/dpmodel/utils/neighbor_graph/env.py Outdated
Comment thread deepmd/dpmodel/utils/neighbor_graph/from_ijs.py
Comment on lines +283 to +293
def forward_common_lower_graph(
self,
atype: torch.Tensor,
n_node: torch.Tensor,
edge_index: torch.Tensor,
edge_vec: torch.Tensor,
edge_mask: torch.Tensor,
do_atomic_virial: bool = False,
fparam: torch.Tensor | None = None,
aparam: torch.Tensor | None = None,
) -> dict[str, torch.Tensor]:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Default graph routing ignores charge_spin.

forward_common_lower_graph() / _call_common_graph() have no charge_spin input, but _resolve_graph_method() now auto-selects the graph path for eligible descriptors. That means charge-spin models silently lose part of their input unless callers force "legacy". Please keep those models on the dense path until the graph lower can consume charge_spin.

Also applies to: 376-442

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt_expt/model/make_model.py` around lines 283 - 293, The graph
auto-routing currently selects the lower-graph path even for models that require
charge-spin input, but `forward_common_lower_graph()` and `_call_common_graph()`
do not accept `charge_spin`, so those models lose part of their inputs. Update
`_resolve_graph_method()` to keep charge-spin-capable models on the dense/legacy
path unless the graph implementation is extended to consume `charge_spin`, and
make sure the routing logic checks for that input before choosing the graph
path. Use the existing method names `forward_common_lower_graph`,
`_call_common_graph`, and `_resolve_graph_method` to keep the fix scoped to the
model dispatch flow.

Comment thread deepmd/pt_expt/model/make_model.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deepmd/dpmodel/descriptor/dpa1.py (1)

669-716: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Honor the optional type_embedding default before calling _call_graph.

Line 669 advertises type_embedding as optional, but None is forwarded into _call_graph and later passed to xp.asarray. Either make it required or populate it from self.type_embedding.call() before use.

Proposed fix
         xp = array_api_compat.array_namespace(graph.edge_vec)
         dev = array_api_compat.device(graph.edge_vec)
+        if type_embedding is None:
+            type_embedding = self.type_embedding.call()
+        atype = xp.asarray(atype, device=dev)
+        type_embedding = xp.asarray(type_embedding, device=dev)
         grrg_node, rot_mat_node = self.se_atten._call_graph(
             graph, atype, type_embedding=type_embedding
         )
@@
         # descriptor-level concat_output_tebd
         if self.concat_output_tebd:
-            tebd = xp.asarray(type_embedding, device=dev)
-            atype_local = xp.asarray(atype, device=dev)
-            atype_embd = xp.take(tebd, atype_local, axis=0)  # (nf*nloc, tebd_dim)
+            atype_embd = xp.take(type_embedding, atype, axis=0)  # (nf*nloc, tebd_dim)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/dpmodel/descriptor/dpa1.py` around lines 669 - 716, The optional
type_embedding handling in the descriptor graph path is inconsistent because
dpa1.py’s graph-native forward method forwards None into
DescrptBlockSeAtten._call_graph and later into xp.asarray. Update the forward
flow so type_embedding is resolved before use: either make it required in the
public signature or, if absent, fetch it from self.type_embedding.call() before
calling _call_graph and before the concat_output_tebd branch. Keep the fix
localized around forward and the self.se_atten._call_graph path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepmd/dpmodel/descriptor/dpa1.py`:
- Around line 625-627: The dense-path handling of sw in dpa1.py applies the 4-D
nlist_mask before sw has the trailing singleton dimension, which can
mis-broadcast the raw switch values in the graph adapter. Update the sw flow in
the descriptor logic around the dense-path masking so sw is reshaped to (nf,
nloc, nnei, 1) before calling xp.where with nlist_mask, using the same
sw/nlist_mask path in dpa1’s descriptor code.

---

Outside diff comments:
In `@deepmd/dpmodel/descriptor/dpa1.py`:
- Around line 669-716: The optional type_embedding handling in the descriptor
graph path is inconsistent because dpa1.py’s graph-native forward method
forwards None into DescrptBlockSeAtten._call_graph and later into xp.asarray.
Update the forward flow so type_embedding is resolved before use: either make it
required in the public signature or, if absent, fetch it from
self.type_embedding.call() before calling _call_graph and before the
concat_output_tebd branch. Keep the fix localized around forward and the
self.se_atten._call_graph path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 89804960-df1b-4a87-8104-06fe0bb20ecc

📥 Commits

Reviewing files that changed from the base of the PR and between c2e0d96 and 5a487c7.

📒 Files selected for processing (3)
  • deepmd/dpmodel/descriptor/dpa1.py
  • deepmd/dpmodel/utils/neighbor_graph/builder.py
  • source/tests/pt_expt/model/test_dpa1_graph_lower.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepmd/dpmodel/utils/neighbor_graph/builder.py

Comment on lines +625 to +627
nlist_mask = (nlist != -1)[:, :, :, None]
sw = xp.where(nlist_mask, sw, xp.zeros_like(sw))
sw = xp.reshape(sw, (nf, nloc, nnei, 1))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Reshape sw before applying the 4-D mask.

The dense path adds the trailing singleton before masking; doing the mask first can mis-broadcast raw (nf, nloc, nnei) switch values in the graph adapter.

Proposed fix
-        nlist_mask = (nlist != -1)[:, :, :, None]
-        sw = xp.where(nlist_mask, sw, xp.zeros_like(sw))
         sw = xp.reshape(sw, (nf, nloc, nnei, 1))
+        nlist_mask = (nlist != -1)[:, :, :, None]
+        sw = xp.where(nlist_mask, sw, xp.zeros_like(sw))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
nlist_mask = (nlist != -1)[:, :, :, None]
sw = xp.where(nlist_mask, sw, xp.zeros_like(sw))
sw = xp.reshape(sw, (nf, nloc, nnei, 1))
sw = xp.reshape(sw, (nf, nloc, nnei, 1))
nlist_mask = (nlist != -1)[:, :, :, None]
sw = xp.where(nlist_mask, sw, xp.zeros_like(sw))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/dpmodel/descriptor/dpa1.py` around lines 625 - 627, The dense-path
handling of sw in dpa1.py applies the 4-D nlist_mask before sw has the trailing
singleton dimension, which can mis-broadcast the raw switch values in the graph
adapter. Update the sw flow in the descriptor logic around the dense-path
masking so sw is reshaped to (nf, nloc, nnei, 1) before calling xp.where with
nlist_mask, using the same sw/nlist_mask path in dpa1’s descriptor code.

@codecov

codecov Bot commented Jun 25, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 95.15670% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.33%. Comparing base (99523dc) to head (53ec9a0).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
deepmd/pt_expt/model/edge_transform_output.py 91.83% 4 Missing ⚠️
deepmd/pt_expt/model/make_model.py 86.66% 4 Missing ⚠️
deepmd/dpmodel/descriptor/dpa1.py 96.25% 3 Missing ⚠️
deepmd/dpmodel/model/make_model.py 95.08% 3 Missing ⚠️
deepmd/dpmodel/utils/neighbor_graph/ase_builder.py 93.93% 2 Missing ⚠️
deepmd/dpmodel/utils/neighbor_graph/builder.py 97.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5583      +/-   ##
==========================================
+ Coverage   82.30%   82.33%   +0.03%     
==========================================
  Files         887      891       +4     
  Lines      100161   100758     +597     
  Branches     4043     4058      +15     
==========================================
+ Hits        82433    82955     +522     
- Misses      16269    16345      +76     
+ Partials     1459     1458       -1     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…g keys, not just energy)

call_lower_graph now reduces via the general fit_output_to_model_output (any
fitting output, not hard-coded 'energy'); pt_expt adds
fit_output_to_model_output_graph mirroring take_deriv but differentiating
w.r.t. edge_vec per scalar component via edge_energy_deriv, so force/virial/
atom_virial generalize to any r_differentiable output. Fixes the KeyError on
non-energy models (dos/dipole/polar/property) with attn_layer=0 descriptors.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deepmd/dpmodel/model/make_model.py (1)

585-589: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Reject multi-rank graph metadata instead of ignoring it.

n_local / comm_dict are accepted but ignored, while the reduction uses an all-local mask. A caller passing multi-rank graph data would double-count halo nodes as owned atoms. Since PR-A is single-rank only, fail fast when these are provided.

Proposed guard
 def call_lower_graph(
     self,
     atype: Array,
     n_node: Array,
     edge_index: Array,
     edge_vec: Array,
     edge_mask: Array,
     n_local: Array | None = None,
     fparam: Array | None = None,
     aparam: Array | None = None,
     comm_dict: dict | None = None,
 ) -> dict[str, Array]:
@@
     """
+    if n_local is not None or comm_dict is not None:
+        raise NotImplementedError(
+            "call_lower_graph currently supports single-rank graphs only"
+        )
     xp = array_api_compat.array_namespace(edge_vec)

Also applies to: 646-653

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/dpmodel/model/make_model.py` around lines 585 - 589, The model
reduction path in make_model should not silently accept multi-rank graph
metadata: in the relevant method(s) that take n_local and comm_dict, add a
fail-fast guard that rejects non-None values instead of proceeding with the
all-local mask. Update the same handling in both affected spots so callers of
the model-building/reduction flow get an explicit error when passing multi-rank
graph data, rather than having halo nodes counted as owned atoms.
🧹 Nitpick comments (1)
source/tests/pt_expt/model/test_dos_graph.py (1)

127-161: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Assert graph returns dense keys and cover eval mode.

The current intersection hides missing graph outputs, including derivative keys, and default Module.training=True skips the create_graph=False inference path.

Proposed test tightening
     `@pytest.mark.parametrize`(
         "make_model",
         [_make_dos, _make_dipole, _make_polar, _make_property],
     )  # one builder per fitting kind
-    def test_graph_matches_dense(self, make_model) -> None:
+    `@pytest.mark.parametrize`("training", [True, False])
+    def test_graph_matches_dense(self, make_model, training) -> None:
@@
         ds = _make_descriptor()
         m = make_model(ds)
+        m.train(training)
         graph = m.call_common(self.coord, self.atype, None, do_atomic_virial=True)
@@
-        shared = [
-            k
-            for k in graph
-            if k in dense and graph[k] is not None and dense[k] is not None
-        ]
-        # at least the reduced + per-atom output must be present and shared
-        assert len(shared) >= 2
-        for k in shared:
+        dense_keys = {k for k, v in dense.items() if v is not None}
+        graph_keys = {k for k, v in graph.items() if v is not None}
+        assert dense_keys <= graph_keys
+        for k in sorted(dense_keys):
             torch.testing.assert_close(
                 graph[k].to(torch.float64), dense[k].to(torch.float64), **tol
             )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/tests/pt_expt/model/test_dos_graph.py` around lines 127 - 161, Tighten
test_graph_matches_dense so it verifies the graph path returns the same expected
keys as the dense legacy path instead of only comparing the intersection, which
can hide missing outputs like derivative entries. Use the existing m.call_common
flow and compare against the dense result by asserting the key sets (or expected
shared keys) match for each make_model builder, and add an eval-mode case on the
model instance so the inference path with create_graph=False is exercised
instead of relying on the default Module.training=True behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepmd/dpmodel/model/make_model.py`:
- Around line 555-566: The `make_model` path is converting a traced graph value
to a Python int via `int(n_node[0])`, which breaks JAX/JIT-style tracing in the
`dpmodel` backend. Update the `make_model` logic (and the matching reshape path
in the same class) to derive `nloc` from the static atom count using
`atype.shape[0] // nf` instead of reading `n_node[0]`, so the reshape to `(nf,
nloc)` remains compatible with graph tracing.

In `@deepmd/pt_expt/model/edge_transform_output.py`:
- Around line 109-116: The masked intensive reduction in
edge_transform_output.py is still summing padded atoms in the numerator and uses
a denominator shape that is not safe for higher-rank tensors. Update the masked
branch in the reduction logic around the kk_redu handling to apply mask before
summing vv, then divide by a denominator reshaped to match the non-atom
dimensions so it broadcasts correctly for outputs like (nf, 3, 3). Use the
existing tensor/atom_axis logic in this block and keep the unmasked and
extensive paths unchanged.

---

Outside diff comments:
In `@deepmd/dpmodel/model/make_model.py`:
- Around line 585-589: The model reduction path in make_model should not
silently accept multi-rank graph metadata: in the relevant method(s) that take
n_local and comm_dict, add a fail-fast guard that rejects non-None values
instead of proceeding with the all-local mask. Update the same handling in both
affected spots so callers of the model-building/reduction flow get an explicit
error when passing multi-rank graph data, rather than having halo nodes counted
as owned atoms.

---

Nitpick comments:
In `@source/tests/pt_expt/model/test_dos_graph.py`:
- Around line 127-161: Tighten test_graph_matches_dense so it verifies the graph
path returns the same expected keys as the dense legacy path instead of only
comparing the intersection, which can hide missing outputs like derivative
entries. Use the existing m.call_common flow and compare against the dense
result by asserting the key sets (or expected shared keys) match for each
make_model builder, and add an eval-mode case on the model instance so the
inference path with create_graph=False is exercised instead of relying on the
default Module.training=True behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5a42be10-6fff-44b0-b303-ddf2472280ba

📥 Commits

Reviewing files that changed from the base of the PR and between 5a487c7 and 7cfb2a8.

📒 Files selected for processing (5)
  • deepmd/dpmodel/model/make_model.py
  • deepmd/pt_expt/model/edge_transform_output.py
  • deepmd/pt_expt/model/make_model.py
  • source/tests/common/dpmodel/test_call_lower_graph.py
  • source/tests/pt_expt/model/test_dos_graph.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/tests/common/dpmodel/test_call_lower_graph.py

Comment thread deepmd/dpmodel/model/make_model.py Outdated
Comment on lines +109 to +116
if mask is not None:
model_ret[kk_redu] = torch.sum(
vv.to(redu_prec), dim=atom_axis
) / torch.sum(mask, dim=-1, keepdim=True)
else:
model_ret[kk_redu] = torch.mean(vv.to(redu_prec), dim=atom_axis)
else:
model_ret[kk_redu] = torch.sum(vv.to(redu_prec), dim=atom_axis)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Apply the mask to the intensive numerator and reshape it rank-safely.

Line 110 sums all atoms even when mask marks padded atoms invalid, then divides by the real-atom count. Also, the (nf, 1) denominator is not broadcast-safe for multi-rank shapes like (nf, 3, 3).

Proposed fix
         if vdef.intensive:
             if mask is not None:
+                v_red = vv.to(redu_prec)
+                mask_view = mask.to(dtype=redu_prec, device=v_red.device).reshape(
+                    mask.shape + (1,) * len(shap)
+                )
                 model_ret[kk_redu] = torch.sum(
-                    vv.to(redu_prec), dim=atom_axis
-                ) / torch.sum(mask, dim=-1, keepdim=True)
+                    v_red * mask_view, dim=atom_axis
+                ) / torch.sum(mask_view, dim=atom_axis)
             else:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if mask is not None:
model_ret[kk_redu] = torch.sum(
vv.to(redu_prec), dim=atom_axis
) / torch.sum(mask, dim=-1, keepdim=True)
else:
model_ret[kk_redu] = torch.mean(vv.to(redu_prec), dim=atom_axis)
else:
model_ret[kk_redu] = torch.sum(vv.to(redu_prec), dim=atom_axis)
if mask is not None:
v_red = vv.to(redu_prec)
mask_view = mask.to(dtype=redu_prec, device=v_red.device).reshape(
mask.shape + (1,) * len(shap)
)
model_ret[kk_redu] = torch.sum(
v_red * mask_view, dim=atom_axis
) / torch.sum(mask_view, dim=atom_axis)
else:
model_ret[kk_redu] = torch.mean(vv.to(redu_prec), dim=atom_axis)
else:
model_ret[kk_redu] = torch.sum(vv.to(redu_prec), dim=atom_axis)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt_expt/model/edge_transform_output.py` around lines 109 - 116, The
masked intensive reduction in edge_transform_output.py is still summing padded
atoms in the numerator and uses a denominator shape that is not safe for
higher-rank tensors. Update the masked branch in the reduction logic around the
kk_redu handling to apply mask before summing vv, then divide by a denominator
reshaped to match the non-atom dimensions so it broadcasts correctly for outputs
like (nf, 3, 3). Use the existing tensor/atom_axis logic in this block and keep
the unmasked and extensive paths unchanged.

…ual-atom fallthrough, env protection parity, ASE/from_ijs device)
@wanghan-iapcm wanghan-iapcm marked this pull request as draft June 25, 2026 15:18
Han Wang and others added 12 commits June 25, 2026 23:39
The dpa1 graph dense->graph adapter (from_dense_quartet -> call_graph)
derived the per-edge neighbor type from the local owner atype[mapping[
neighbor]]. That equals the neighbor's true extended type atype_ext[
neighbor] for every PHYSICAL input (a ghost is a periodic image of its
owner), but diverges for an arbitrary external quartet whose mapping is
inconsistent -- e.g. the naively-permuted mapping in the universal
descriptor fixture. The dense path reads atype_ext[neighbor] directly,
so test_exclude_types (which routes the graph-eligible attn_layer=0
dpa1 through the adapter) failed graph-vs-dense parity.

Thread an optional per-edge nei_type override into call_graph/_call_graph;
the adapter computes it from atype_ext[neighbor] (matching the converter's
compact=False edge ordering). Geometry-native (carry-all) builders pass
None and keep using atype[src], correct for all real inputs.

Adds a non-vacuous regression test (corrupts ghost types so ghost type
!= owner type and asserts a ghost is actually a neighbor).
…o ghost-free

The test_exclude_types failure on deepmodeling#5583 was NOT a graph bug -- it was an
invalid test fixture. extend_coord_with_ghosts tiles the local atype and
aidx (nlist.py:369-370), so atype_ext[k] == atype[mapping[k]] is a hard
single-rank invariant: a ghost is a periodic image of its owner and shares
its type. The ghost-free graph (src = mapping[neighbor], nei_type =
atype[src]) is therefore correct.

The shared TestCaseSingleFrameWithNlist fixtures permuted `mapping` as
`mapping[:, perm]` WITHOUT remapping the local-index VALUES through
inv_perm (which they DO apply to nlist), desyncing atype_ext from mapping
and even breaking local self-mapping -- a quartet extend_coord_with_ghosts
can never emit. The ghost-free graph (correct) then diverged from dense on
this impossible input.

- Fix the 4 duplicated fixture copies: mapping permutation now mirrors the
  nlist one, `inv_perm[mapping[:, perm]]` (universal/common/cases/cases.py
  -- the copy the universal CI test imports -- plus common/dpmodel,
  common/test_mixins, pd/model/test_env_mat).
- Revert the per-edge nei_type channel (commit 99c707a) back to the
  ghost-free atype[src] gather; it was treating a symptom of the bad fixture.
- Replace the now-invalid inconsistent-mapping regression test with
  test_single_rank_extension_keeps_type_invariant, which pins
  atype_ext[k]==atype[mapping[k]] on extend_coord_with_ghosts output.

Multi-rank (real halo ghosts with independent types) needs the extended
atype table + comm fold and is the documented PR-B follow-on.
…tomic_graph; drop _graph_descriptor_fitting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant