Releases: pathsim/fastsim
v0.16.0 — native BVP solver, AlgebraicConstraint block, sparse implicit linear solver
Highlights
Native BVP1D block — scipy.solve_bvp rebuilt natively (Kierzenka–Shampine 4th-order Lobatto-IIIa/Simpson collocation + residual-based mesh refinement) with the Newton Jacobian from auto-differentiation of the traced fun/bc/icond. Matches scipy to 1e-7/1e-8 and is 80–340x faster (cold/warmstarted). Supports free parameters (eigenvalues, unknown fluxes) and interior/multipoint conditions at arbitrary ports (beyond scipy). Allocation-free hot path.
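As a rough illustration of the collocation scheme named above, the following sketch evaluates the 4th-order Lobatto IIIA (Simpson) residuals on a fixed mesh, using the same formulas scipy.integrate.solve_bvp documents. It is not fastsim's implementation: the function name collocation_residuals, the oscillator test problem and the list-based layout are assumptions for the example. A solver would drive these residuals, together with the boundary-condition residuals, to zero with Newton iterations.

```python
import math

def collocation_residuals(fun, x, y):
    """Simpson (Lobatto IIIA, order 4) collocation residuals on mesh x.

    fun(x, y) returns the derivative vector; y[i] is the state at x[i].
    """
    f = [fun(xi, yi) for xi, yi in zip(x, y)]
    residuals = []
    for i in range(len(x) - 1):
        h = x[i + 1] - x[i]
        # Cubic Hermite estimate of the state at the interval midpoint.
        y_mid = [0.5 * (a + b) - 0.125 * h * (fb - fa)
                 for a, b, fa, fb in zip(y[i], y[i + 1], f[i], f[i + 1])]
        f_mid = fun(x[i] + 0.5 * h, y_mid)
        # Simpson quadrature of y' = f across the interval.
        residuals.append([
            b - a - h / 6.0 * (fa + 4.0 * fm + fb)
            for a, b, fa, fm, fb in zip(y[i], y[i + 1], f[i], f_mid, f[i + 1])
        ])
    return residuals

def oscillator(x, y):
    # y1' = y2, y2' = -y1, solved exactly by (sin x, cos x).
    return [y[1], -y[0]]

mesh = [i * (math.pi / 2) / 16 for i in range(17)]
exact = [[math.sin(t), math.cos(t)] for t in mesh]
guess = [[t * 2 / math.pi, 1.0] for t in mesh]   # crude straight-line guess

def max_abs(rows):
    return max(abs(r) for row in rows for r in row)

res_exact = max_abs(collocation_residuals(oscillator, mesh, exact))
res_guess = max_abs(collocation_residuals(oscillator, mesh, guess))
```

The exact solution leaves only the discretization error in the residual, while the crude guess leaves a residual several orders of magnitude larger; mesh refinement targets intervals where the former is still too big.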
AlgebraicConstraint block — solves F(x, u) = 0 for x each evaluation (warmstarted Newton, AD Jacobian). The base primitive for instantaneous algebraic relations (chemical equilibrium, flash/VLE, steady-state operating points, implicit constitutive laws); a zeroed rate recovers the quasi-steady-state approximation.
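To make the warmstart idea concrete, here is a minimal scalar sketch, assuming a hand-written derivative in place of the AD Jacobian. The names newton_solve, F and dF_dx are hypothetical and not part of the fastsim API. It solves F(x, u) = 0 for a slowly varying input u, once from a cold start and once reusing the previous solution as the initial guess.

```python
def newton_solve(F, dF_dx, u, x0, tol=1e-12, max_iter=50):
    """Solve F(x, u) = 0 for scalar x; return (x, iterations used)."""
    x = x0
    for k in range(max_iter):
        r = F(x, u)
        if abs(r) < tol:
            return x, k
        x -= r / dF_dx(x, u)          # Newton update
    raise RuntimeError("Newton did not converge")

def F(x, u):
    # Illustrative implicit relation: x**3 + x = u (monotone, one real root).
    return x**3 + x - u

def dF_dx(x, u):
    return 3.0 * x**2 + 1.0

inputs = [2.0 + 0.01 * k for k in range(20)]

# Cold start: every evaluation begins from x = 0.
cold_iters = sum(newton_solve(F, dF_dx, u, 0.0)[1] for u in inputs)

# Warm start: every evaluation begins from the previous solution.
warm_iters, x = 0, 0.0
for u in inputs:
    x, k = newton_solve(F, dF_dx, u, x)
    warm_iters += k
```

Because consecutive inputs differ only slightly, the warmstarted sweep needs noticeably fewer Newton iterations in total than the cold one.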
Sparse implicit linear solver — LinearSolver now caches the sparse symbolic LU (pattern-keyed) and solves in place, shared across the implicit stage solvers, the DAE inner-Newton, and the BVP collocation. Measured speedups vs the previous implementation: DAE 2.2x, large banded-sparse stiff systems ~1.5x, small stiff 1.15–1.35x — bit-identical results.
Other changes
- BVP1D and all traced blocks now take inputs dynamically: n_inputs removed (no block declares an input count).
- Method-of-lines stiff PDE benchmarks (Brusselator, heat) added to the suite to keep the sparse implicit path measured (mol_pde).
- Hand-written block classes (BVP1D, AlgebraicConstraint, Scope, Spectrum) unified onto the central registry docstrings (pathsim format) and the standard info() introspection.
Full test suite: 397 Rust + 1365 Python tests passing.
v0.15.1 — tracer coverage + tape-lowering optimization
Patch over v0.15.0 (clippy lint gate fix only; runtime identical).
Tracer now covers array methods (x.sum()/dot()/clip()/...), extended ufuncs (radians, fmin/fmax, exp2, copysign, logaddexp, heaviside, expit), np.interp, constant factories as assignment targets (arange/linspace/eye/diag), extended indexing (constant fancy lists, negative steps, Ellipsis, newaxis), and mixed scalar/array ufunc dispatch. Python % and np.remainder now lower with correct floored-mod semantics. New tape-lowering pipeline (value-numbering canonicalization + chain fusion into Reduce/Dot kernels) cuts AD-Jacobian tapes ~38%; codegen output unchanged. Backed by a tracer coverage corpus and differential fuzzers (traced vs eager numpy).
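The floored-mod point is easy to get wrong when lowering to C, because C's fmod keeps the sign of the dividend while Python % and np.remainder keep the sign of the divisor. A minimal sketch of the correction follows; floored_mod is a hypothetical helper name, and the release does not say how fastsim implements the lowering.

```python
import math

def floored_mod(x, y):
    """Remainder with the sign of the divisor, matching Python's float %."""
    r = math.fmod(x, y)                      # C-style: sign follows x
    if r != 0.0 and (r < 0.0) != (y < 0.0):
        r += y                               # shift into the divisor's sign
    return r

pairs = [(-7.0, 3.0), (7.0, -3.0), (7.5, 2.0), (-7.5, 2.0)]
results = [(math.fmod(x, y), floored_mod(x, y), x % y) for x, y in pairs]
# For (-7.0, 3.0): fmod gives -1.0, floored mod and % both give 2.0.
```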
v0.15.0 — tracer coverage + tape-lowering optimization
Tracer now covers array methods (x.sum()/dot()/clip()/...), extended ufuncs (radians, fmin/fmax, exp2, copysign, logaddexp, heaviside, expit), np.interp, constant factories as assignment targets (arange/linspace/eye/diag), extended indexing (constant fancy lists, negative steps, Ellipsis, newaxis), and mixed scalar/array ufunc dispatch. Python % and np.remainder now lower with correct floored-mod semantics. New tape-lowering pipeline (value-numbering canonicalization + chain fusion into Reduce/Dot kernels) cuts AD-Jacobian tapes ~38%; codegen output unchanged. Backed by a tracer coverage corpus and differential fuzzers (traced vs eager numpy).
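For readers unfamiliar with value numbering, the sketch below shows the textbook form of the technique on a toy flat tape. It is an assumption-laden illustration, not the fastsim pipeline: the tape layout, the value_number function and the op names are invented for the example. Identical operations collapse to one entry, and commutative operands are sorted first so that a*b and b*a share a key.

```python
COMMUTATIVE = {"add", "mul"}

def value_number(tape):
    """Deduplicate identical operations on a flat tape.

    Each entry is (op, *args). For "input"/"const" the argument is a name or
    literal; for every other op the arguments are indices of earlier entries.
    Returns the compacted tape and a map from old index to new index.
    """
    seen, out, remap = {}, [], []
    for op, *args in tape:
        if op not in ("input", "const"):
            args = [remap[a] for a in args]   # rewrite operands first
            if op in COMMUTATIVE:
                args.sort()                   # canonical operand order
        key = (op, *args)
        if key not in seen:
            seen[key] = len(out)
            out.append(key)
        remap.append(seen[key])
    return out, remap

# (x * y) + (y * x): the second multiply is redundant.
tape = [
    ("input", "x"),
    ("input", "y"),
    ("mul", 0, 1),
    ("mul", 1, 0),
    ("add", 2, 3),
]
compact, remap = value_number(tape)
# compact has 4 entries; the final op becomes ("add", 2, 2).
```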
v0.14.0 — struct-only codegen (hierarchical + library + pure-discrete)
Struct API now honors structure (hierarchical: per-block blk_i_alg/blk_i_deriv) and layout (library: blocks.{h,c} + solver.{h,c}), and supports pure-discrete models (n_state==0). The plain API is removed; struct is the sole codegen path (reentrant, embeddable via get_signal/set_signal). FMU export intact.
v0.13.0
FMI 3.0 Model Exchange FMU export: turn any fastsim Simulation (or Subsystem) into a portable, self-contained source FMU.
FMU export (sim.to_fmu(...), block.to_fmu(...))
- FMI 3.0 source FMU: emits modelDescription.xml plus C sources (fmi3.h, model.{c,h}, fmu.c, buildDescription.xml) packaged as a .fmu zip. Built on the struct-everything codegen path (reentrant model_t, no globals), so the FMU compiles to a single translation unit.
- Continuous Model Exchange: states, derivatives, outputs and parameters map to FMI value references straight from the codegen ModelLayout (single source of truth for the variable map and the SIG_ enum). Verified against the native run via self-import.
Directional derivatives (fmi3GetDirectionalDerivative)
- Analytic forward-mode AD lowered to C (model_jvp): a tangent pass parallel to the primal, mirroring the SSA autodiff rules. Covers the full Jacobian surface: knowns over states, inputs and parameters; unknowns over derivatives and outputs (∂y/∂x, ∂ẋ/∂u, ∂y/∂u, ∂ẋ/∂p).
- Tangents for min/max reductions (subgradient select), fmod, and 1-D LUTs (segment slope); a fastsim_digamma C helper backs lgamma/tgamma derivatives.
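The tangent pass described above can be pictured with dual numbers: every value carries its derivative along a chosen direction, and each operation updates both. The sketch below is a generic forward-mode illustration in Python, not the generated model_jvp C code. The Dual class, dual_max and jvp are hypothetical names, and the max rule shows one way to do the subgradient select (the tangent follows whichever operand is selected).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    val: float   # primal value
    tan: float   # tangent (directional derivative)

    def __add__(self, other):
        return Dual(self.val + other.val, self.tan + other.tan)

    def __mul__(self, other):
        # Product rule, computed alongside the primal product.
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)

def dual_max(a, b):
    # Subgradient select: propagate the tangent of the selected operand.
    return a if a.val >= b.val else b

def jvp(f, x, v):
    """Value and directional derivative of f at x along direction v."""
    out = f([Dual(xi, vi) for xi, vi in zip(x, v)])
    return out.val, out.tan

def f(x):
    return dual_max(x[0] * x[1], x[0] + x[1])

# At (3, 2) the product branch is active, so d/dx0 = x1 = 2.
val_a, tan_a = jvp(f, [3.0, 2.0], [1.0, 0.0])
# At (1, 1) the sum branch is active, so d/dx0 = 1.
val_b, tan_b = jvp(f, [1.0, 1.0], [1.0, 0.0])
```

Seeding v with a unit vector yields one Jacobian column per pass, which is the quantity a directional-derivative call returns for a single known.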
Events
- Full event interface: fmi3GetEventIndicators / fmi3CompletedIntegratorStep / fmi3UpdateDiscreteStates for zero-crossing, condition and periodic events. valuesOfContinuousStatesChanged is reported only for state-modifying effects.
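In Model Exchange the importing tool owns the integrator, so it is the importer that watches the event indicators for a sign change and narrows down the crossing time. A minimal sketch of that search using bisection follows. The function locate_event and the falling-ball indicator are assumptions for illustration, and real importers may use other root-finding schemes.

```python
import math

def locate_event(indicator, t_lo, t_hi, tol=1e-10):
    """Bisect for the time where indicator(t) changes sign in [t_lo, t_hi]."""
    g_lo, g_hi = indicator(t_lo), indicator(t_hi)
    if g_lo * g_hi > 0.0:
        return None                       # no sign change in this step
    while t_hi - t_lo > tol:
        t_mid = 0.5 * (t_lo + t_hi)
        g_mid = indicator(t_mid)
        if g_lo * g_mid <= 0.0:
            t_hi = t_mid                  # crossing lies in the left half
        else:
            t_lo, g_lo = t_mid, g_mid     # crossing lies in the right half
    return 0.5 * (t_lo + t_hi)

def height(t):
    # Ball dropped from 1 m; the indicator crosses zero at ground contact.
    return 1.0 - 0.5 * 9.81 * t * t

t_hit = locate_event(height, 0.0, 1.0)
t_none = locate_event(height, 0.0, 0.1)   # ball still in the air
```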
Subsystem (open-system) export
- A Subsystem's interface inputs become FMI input variables (set via fmi3SetFloat64), interface outputs become FMI outputs, internal block outputs become locals. Resolves interface ports through arbitrary nesting and fan-out. Parameters are now tunable.
Closed continuous systems behave exactly as before.
v0.12.0 — vectorized Dot/Reduce + wider tracer surface
Two performance & ergonomics additions on top of the consolidated SSA core. No Python API change; still a drop-in for pathsim.
Vectorized Dot/Reduce (~1.55x on the tape matvec).
- The canonical 4-lane multiply-add dot/reduce now live in the op manifest (ssa::op), shared by the native F64Builder, the interpreter, and the flat tape, so all three agree bit-for-bit (previously the native and tape paths used different reduction orders).
- The tape's Dot/Reduce gather their operands into a contiguous scratch and run the 4-lane kernel, killing the per-element libm fma() call on the portable build and breaking the FP-add dependency chain.
- Bench jit_tape/matvec_dot: -36% (n=8), -35% (n=32), p<0.01.
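Why the reduction order matters for bit-exactness can be shown in a few lines. The sketch below compares a sequential dot product with a four-accumulator version. It assumes a particular lane layout and final combine order, uses a plain multiply followed by an add rather than a fused multiply-add, and the names dot_sequential and dot_4lane are hypothetical. The release does not spell out the kernel's exact order.

```python
def dot_sequential(a, b):
    total = 0.0
    for x, y in zip(a, b):
        total += x * y                # one long dependency chain
    return total

def dot_4lane(a, b):
    """Dot product with four independent accumulators, combined pairwise."""
    lanes = [0.0, 0.0, 0.0, 0.0]
    n4 = len(a) - len(a) % 4
    for i in range(0, n4, 4):
        for k in range(4):
            lanes[k] += a[i + k] * b[i + k]
    total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3])
    for i in range(n4, len(a)):       # scalar tail for leftover elements
        total += a[i] * b[i]
    return total

a = [1e16, 1.0, -1e16, 1.0]
ones = [1.0, 1.0, 1.0, 1.0]
seq = dot_sequential(a, ones)         # 1.0
lane = dot_4lane(a, ones)             # 0.0
```

Both answers are legitimate floating-point results of the same mathematical sum, which is why evaluators that must agree bit-for-bit need to share one canonical order.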
Wider tracer surface.
np.deg2rad/np.rad2deg/np.square/np.reciprocal and np.diff/np.cumsum now trace, evaluated bit-for-bit like numpy. They lower to compositions of existing ops (no new SSA op), so they get autodiff and C codegen for free: custom Python that uses them still compiles to a fused tape and to C.
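The idea of lowering to compositions of existing ops can be sketched in plain Python: each function below uses only multiply, divide, add and subtract, so any machinery that already handles those primitives covers the new functions too. These are plausible decompositions written for illustration, not the tracer's actual lowering rules.

```python
import math

def deg2rad(x):
    return x * (math.pi / 180.0)      # one multiply by a constant

def rad2deg(x):
    return x * (180.0 / math.pi)

def square(x):
    return x * x

def reciprocal(x):
    return 1.0 / x

def diff(xs):
    # Adjacent differences: one subtract per output element.
    return [b - a for a, b in zip(xs, xs[1:])]

def cumsum(xs):
    # Running sum: a chain of adds, each reusing the previous output.
    out = []
    for x in xs:
        out.append(x if not out else out[-1] + x)
    return out

angles = [deg2rad(d) for d in (0.0, 90.0, 180.0)]
steps = diff([1.0, 4.0, 9.0])         # [3.0, 5.0]
totals = cumsum([1.0, 2.0, 3.0])      # [1.0, 3.0, 6.0]
```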
Verified: Rust 372 lib + integration + differential fuzzer (interpret==tape bit-exact), native/codegen vector parity, clippy -D warnings, Python 607 passed / 302 subtests.
v0.11.0 — SSA graph as the single source of truth
Minor release consolidating the SSA-graph architecture. The SSA graph is now fastsim's physical single source of truth, and the module tree reflects it. No Python API change: still a drop-in replacement for pathsim, all tests green.
Highlights:
- New ssa module, the symbolic-numeric core (pyo3-free, always compiled): the op graph, the op manifest, the canonical f64 semantics, the fast tape evaluator, the optimizer, autodiff, and the native/symbolic Builder. The block runtime closures, the IR, compile, and the C codegen all attach here.
- New tracer module, the Python tracing frontend, one of several producers of an SSA graph (the misnamed jit module is gone).
- Op manifest (ssa/op.rs): the op vocabulary, canonical f64 semantics, flat-tape opcodes + mapping, and codegen C math-function names now live in one place; codegen is a thin backend.
- Typed flat→structured slot seam (blockops::slot_kind), replacing string parsing in the IR decoder.
- Block scheduler renamed to utils::schedule::Schedule, leaving exactly one Graph type in the crate (the SSA op graph).
Verified: Rust 372 lib + integration + differential fuzzer (2000 seeds, interpret==tape bit-exact), clippy -D warnings, Python 599 passed / 302 subtests, codegen C-compile matrix. Compiled vs interpreted event handling confirmed bit-identical.
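A differential fuzzer of the kind mentioned in the verification line can be sketched as follows: generate a random expression per seed, evaluate it with a recursive interpreter and with a flat tape, and require the two results to match bit-for-bit. This is a toy Python model under stated assumptions (three ops, three inputs, hypothetical function names) and says nothing about how the Rust fuzzer is actually structured.

```python
import random
import struct

OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def random_expr(rng, depth):
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(3))
        return ("const", rng.uniform(-2.0, 2.0))
    op = rng.choice(sorted(OPS))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))

def interpret(expr, xs):
    tag = expr[0]
    if tag == "x":
        return xs[expr[1]]
    if tag == "const":
        return expr[1]
    return OPS[tag](interpret(expr[1], xs), interpret(expr[2], xs))

def lower(expr, tape):
    """Append expr to tape in post-order; return the index of its result."""
    if expr[0] in OPS:
        a, b = lower(expr[1], tape), lower(expr[2], tape)
        tape.append((expr[0], a, b))
    else:
        tape.append(expr)
    return len(tape) - 1

def run_tape(tape, xs):
    vals = []
    for op, *args in tape:
        if op == "x":
            vals.append(xs[args[0]])
        elif op == "const":
            vals.append(args[0])
        else:
            vals.append(OPS[op](vals[args[0]], vals[args[1]]))
    return vals[-1]

def bits(v):
    return struct.pack("<d", v)       # compare raw IEEE-754 bytes

def fuzz(seeds):
    for seed in range(seeds):
        rng = random.Random(seed)
        expr = random_expr(rng, depth=5)
        xs = [rng.uniform(-10.0, 10.0) for _ in range(3)]
        tape = []
        lower(expr, tape)
        assert bits(interpret(expr, xs)) == bits(run_tape(tape, xs)), seed
    return seeds

checked = fuzz(200)
```

Comparing packed bytes rather than using == keeps the check strict for signed zeros and NaN payloads, which an ordinary float comparison would blur.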
v0.10.8 — SSA graph as single source of truth
Internal architecture refactor. No Python API change — still a drop-in replacement for pathsim; all tests green.
- Extracted the SSA graph core into its own ssa module (graph, op manifest, tape, optimize, autodiff, build), pyo3-free and always compiled. The Python tracing frontend is now tracer (the misnamed jit module is gone).
- New ssa/op.rs op manifest: the op vocabulary, canonical f64 semantics (apply_*), the flat-tape opcodes + Node→opcode mapping, and the codegen C math-function names all live in one place. codegen is now a thin backend.
- Typed the flat→structured slot seam via blockops::slot_kind (no more string parsing of slot names in the IR decoder).
- Renamed the block scheduler to utils::schedule::Schedule, so there is exactly one Graph type in the crate (the SSA op graph).
Verified: Rust 372 lib + integration + differential fuzzer (2000 seeds, interpret==tape bit-exact), clippy -D warnings, Python 599 passed / 302 subtests, codegen C-compile matrix.
v0.10.7
Fixes Simulation.compile() silently discarding the source simulation's solver choice.
CompiledSimulation
- Solver inheritance: compile() now carries over the source simulation's solver, adaptive tolerances (tolerance_lte_abs/tolerance_lte_rel) and timestep dt, so a compiled run integrates the same problem with the same method. Previously it fell back to the default explicit RKBS32, which on a stiff model is stability-bound and took orders of magnitude more steps than the implicit solver the user selected (e.g. a stiff Van der Pol run ballooned to millions of micro-steps).
- Adaptive gating: the compiled run loops now gate adaptive stepping by the solver's own adaptivity (adaptive && solver.is_adaptive), mirroring Simulation. A fixed-step solver combined with events no longer drives the event locator's step size down to dt_min (a runaway).
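The stability-bound behaviour behind the solver-inheritance fix can be reproduced on the linear test equation y' = -lam*y. The sketch below uses forward and backward Euler as stand-ins for the explicit and implicit solvers (it does not implement RKBS32 or any fastsim solver), with lam = 1000 as an assumed stiffness. Forward Euler is only stable for dt < 2/lam, so an explicit method is forced to tiny steps no matter how smooth the solution is.

```python
def explicit_euler(lam, t_end, dt):
    """Forward Euler on y' = -lam*y, y(0) = 1; returns |y(t_end)|."""
    y = 1.0
    for _ in range(round(t_end / dt)):
        y += dt * (-lam * y)
    return abs(y)

def implicit_euler(lam, t_end, dt):
    """Backward Euler on the same problem; returns |y(t_end)|."""
    y = 1.0
    for _ in range(round(t_end / dt)):
        y = y / (1.0 + dt * lam)      # solves y_new = y + dt * (-lam * y_new)
    return abs(y)

lam = 1000.0
blown_up = explicit_euler(lam, 1.0, 0.01)      # dt above 2/lam: diverges
decayed = implicit_euler(lam, 1.0, 0.01)       # same dt: decays as expected
stable = explicit_euler(lam, 1.0, 0.0015)      # dt below 2/lam: stable
```

With an adaptive explicit method the error controller hides the blow-up by shrinking the step instead, which shows up as a very large step count rather than a wrong answer.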
Simulation
- Added a solver getter returning the active solver's class name, so sim.compile().solver == sim.solver.
Override any inherited setting afterwards via set_solver / dt / log on the compiled object.
v0.10.6
Brings CompiledSimulation (the statically compiled run object from Simulation.compile()) to parity with Simulation.
CompiledSimulation
- Logging: compile() logs a COMPILE summary and each run() prints the same SOLVER setup line and interleaved TRANSIENT progress a Simulation run does. The continuous run loop now drives the standalone solver's per-step take_step (mirroring integrate, numerics unchanged) so progress is reported per step.
- API: a log toggle and set_time on the compiled object; detailed docstrings on every public method and property, in the established pybinding style.
- Internals: the compiled runtime moved to its own module (src/compile/runtime.rs), leaving compile/mod.rs as the compiler.