Grid integration optimization by Auxiliarycirclefzy · Pull Request #371 · abacusmodeling/abacus-develop

Auxiliarycirclefzy · 2026-06-07T07:04:32Z

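We are group bxcx-044b.
Group members: 涂敞 (2300011032), 冯梓烨 (2300011022), 闫奕成 (2300011070)
For the grid integration module, this PR optimizes the parallelization and adds data reordering and blocking optimizations.
Thank you to the instructors for reviewing!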

* Increase md_nstep from 3 to 4 * Increase md_nstep from 3 to 4 in INPUT file

…#6884) * Feature: Support ML EXX for training script. * Update the interface to libnpy * Refactor: Update the interface of libnpy in ml_tools * Refactor: Implement the class ML_Base, which is the base class of KEDF_ML * Feature: Add support to ML_EXX for KSDFT and OFDFT * Fix: Update hamilt_pw.cpp * Update ml_base.h and ml_base.cpp * Fix: Modify pot_ml_exx.cpp to avoid negative value of rho * Divide ml_base.cpp to ml_base.cpp and ml_base_pot.cpp * Fix: Update pot_ml_exx.cpp

…odeling#6881) * Refactor: save memory for kinetic and overlap force and stress * Test: add UT for ekinetic_new and overlap_new * Fix: error of force and stress after refactor * Fix: UT for ekinetic and overlap * Fix: gamma_only error of force_stress of edm * Refactor: unify force/stress calculation for overlap and ekinetic operators * Fix: overlap force stress error for nspin=2 * split test to serial part and parallel part --------- Co-authored-by: dyzheng <zhengdy@bjaisi.com>

…les (deepmodeling#6878) * update the examples of 02_NAO_Gamma * update * udpate * update * update tests in 02_NAO_Gamma * small updates of write_HS.hpp * update the format of H(k) and S(k) * update write_HS.hpp * update * update the number of md steps to make it equal to the input parameter, now md steps starts from 1, originally it starts from 0 * update 02_NAO_Gamma examples * add examples 002 and 003 in 02_NAO_Gamm * update examples 41 and 42 * updates of 43 and 57 examples * update example 17 in 03_NAO_multik * update 44 example of 03_NAO_multik * update 092 in 01_PW * update 01_PW examples * update 04_FF examples * update 05_rtTDDFT examples * update 06_SDFT examples * update 07_OFDFT examples * update 15_rtTDDFT_GPU examples * update 16 and 17 examples in 15_rtTDDFT_GPU * update 02 * fix bug * fix bug * update * update 16_SDFT_GPU * update * update 02 data * update 005 example in 02_NAO_Gamma * add 006 in 02 * update CASES_CPU.txt * fix a bug in 08_EXX 06 * fix bugs * update alllog test * fix a bug, when reading the orbital files and something went banana, the code should not quit immediately * update of some formats * fix a small bug * update examples in 03_NAO_multik * update * update 35 example for pchg * update dipole output in rt examples * update 01 example in rt-TDDFT * update rt-TDDFT input files * update some INPUT files in rt-TDDFT * Fix: Add missing return true in read_orb_file function to prevent double free error * fix unittests * update CASES_CPU.txt in 03_NAO_multik * Modify output filename from INPUT to INPUT.info in driver.cpp * update catch_properties --------- Co-authored-by: abacus_fixer <mohanchen@pku.eud.cn>

Replace token-based authentication with OIDC (OpenID Connect) for codecov-action. This is more secure and eliminates the need to manage upload tokens. Changes: - Add use_oidc: true to codecov-action configuration - Add id-token: write permission at workflow level - Remove token parameter from codecov-action (ignored when using OIDC) This improves security and follows codecov-action best practices. Generated by the task: njzjz-bot/njzjz-bot#25.

…esolver (deepmodeling#6892) * Refactor: Encapsulate timer functionality in timer_wrapper.h * Refactor timer code and clean_esolver function 1. Remove #ifdef __MPI from timer code, encapsulate in timer_wrapper.h 2. Move ESolver clean logic to after_all_runners method 3. Replace clean_esolver calls with direct delete p_esolver 4. Remove #ifdef __MPI from delete p_esolver 5. Add Cblacs_exit(1) in after_all_runners for LCAO calculations --------- Co-authored-by: abacus_fixer <mohanchen@pku.eud.cn>
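The idea of moving the `#ifdef __MPI` branches out of the timer code and into one wrapper can be sketched as follows. This is only an illustration under stated assumptions: the namespace `sketch` and the function `wall_time_seconds` are hypothetical names, and the actual contents of timer_wrapper.h are not shown in this log.

```cpp
#include <chrono>
#ifdef __MPI
#include <mpi.h>
#endif

namespace sketch
{
// Hypothetical helper: the only place that knows whether MPI is available.
// Callers take differences of two readings and never see the #ifdef.
inline double wall_time_seconds()
{
#ifdef __MPI
    // MPI_Wtime returns elapsed wall-clock seconds as a double.
    return MPI_Wtime();
#else
    // Serial fallback: a monotonic clock measured from the first call.
    using clock = std::chrono::steady_clock;
    static const clock::time_point start = clock::now();
    return std::chrono::duration<double>(clock::now() - start).count();
#endif
}
} // namespace sketch
```

With a wrapper of this kind, timer code can record `t1 - t0` from two calls without any preprocessor branches of its own.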

* Refactor: Encapsulate timer functionality in timer_wrapper.h * Refactor timer code and clean_esolver function 1. Remove #ifdef __MPI from timer code, encapsulate in timer_wrapper.h 2. Move ESolver clean logic to after_all_runners method 3. Replace clean_esolver calls with direct delete p_esolver 4. Remove #ifdef __MPI from delete p_esolver 5. Add Cblacs_exit(1) in after_all_runners for LCAO calculations * Refactor: Move heterogeneous parallel code to source_base/module_device --------- Co-authored-by: abacus_fixer <mohanchen@pku.eud.cn>

…ing#6888) * Feature: add Hessian operator <\phi|\nabla_x\nabla_y|\phi> * fix: UT of twocenterintegral --------- Co-authored-by: dyzheng <zhengdy@bjaisi.com>

* Refactor: Encapsulate timer functionality in timer_wrapper.h * Refactor timer code and clean_esolver function 1. Remove #ifdef __MPI from timer code, encapsulate in timer_wrapper.h 2. Move ESolver clean logic to after_all_runners method 3. Replace clean_esolver calls with direct delete p_esolver 4. Remove #ifdef __MPI from delete p_esolver 5. Add Cblacs_exit(1) in after_all_runners for LCAO calculations * Refactor: Move heterogeneous parallel code to source_base/module_device * Refactor heterogeneous parallel code and migrate exx_info to module_xc 1. Refactor global.h: - Removed heterogeneous parallel code (CUDA/ROCm error checking macros) - Added include for source_base/module_device/device_check.h - Removed GlobalC::exx_info declaration 2. Migrate exx_info: - Added GlobalC::exx_info declaration to exx_info.h - Created exx_info.cpp with GlobalC::exx_info definition - Removed exx_info definition from global.cpp - Removed duplicate exx_info definition from exx_helper.cpp 3. Update build system: - Added exx_info.cpp to xc_ library in CMakeLists.txt - Added exx_info.o to OBJS_XC in Makefile.Objects - Fixed formatting in Makefile.Objects 4. Ensure compatibility: - Verify pure PW compilation works with exx_info.cpp - Verify GPU compilation works with refactored code This refactoring improves code modularity by separating heterogeneous parallel functionality from global variables and moving EXX-related global variables to their own module. --------- Co-authored-by: abacus_fixer <mohanchen@pku.eud.cn>

…kage (deepmodeling#6898) * Refactor: Encapsulate timer functionality in timer_wrapper.h * Refactor timer code and clean_esolver function 1. Remove #ifdef __MPI from timer code, encapsulate in timer_wrapper.h 2. Move ESolver clean logic to after_all_runners method 3. Replace clean_esolver calls with direct delete p_esolver 4. Remove #ifdef __MPI from delete p_esolver 5. Add Cblacs_exit(1) in after_all_runners for LCAO calculations * Refactor: Move heterogeneous parallel code to source_base/module_device * Refactor heterogeneous parallel code and migrate exx_info to module_xc 1. Refactor global.h: - Removed heterogeneous parallel code (CUDA/ROCm error checking macros) - Added include for source_base/module_device/device_check.h - Removed GlobalC::exx_info declaration 2. Migrate exx_info: - Added GlobalC::exx_info declaration to exx_info.h - Created exx_info.cpp with GlobalC::exx_info definition - Removed exx_info definition from global.cpp - Removed duplicate exx_info definition from exx_helper.cpp 3. Update build system: - Added exx_info.cpp to xc_ library in CMakeLists.txt - Added exx_info.o to OBJS_XC in Makefile.Objects - Fixed formatting in Makefile.Objects 4. Ensure compatibility: - Verify pure PW compilation works with exx_info.cpp - Verify GPU compilation works with refactored code This refactoring improves code modularity by separating heterogeneous parallel functionality from global variables and moving EXX-related global variables to their own module. * Move GlobalC::restart to source_io/restart 1. Move GlobalC::restart declaration from global.h to restart.h 2. Move GlobalC::restart definition from global.cpp to restart.cpp 3. Keep the same functionality and usage 4. Improve code modularity by centralizing restart-related code in source_io module 5. 
Ensure compatibility with both pure PW and GPU compilation modes * Remove unnecessary global.h includes and fix line_search.cpp compilation error * update global.h * update global.h * update global.h * update global.h * update stress_pw.cpp * update global.h * update global.h in module_pwdft * update global.h * update module_stodft * delete global.h in source_io * fix source_io * delete inclusion of global.h in source_io * Refactor: Remove unnecessary includes and clean up global.h references * delete global.h in source_lcao * update * update * fix * update * fix * update source_cell * update source_esolver * update esolver * update * update module_charge * update module_pot * continue * update fix * fix * update dftu * update deepks * ifx * delete globalc.h in module_ri * fix * fix * fix dftu_io * fix diago_lapack.cpp * updates * update rdmft * update module_rt * update td operator * update module_pwdft/operator etc * solve fft * update xc * update * delete global.h and global.cpp, finally after nearly 20 years * fix op_exx_lcao * fix * fix --------- Co-authored-by: abacus_fixer <mohanchen@pku.eud.cn>

The gint_gpu_vars.h file already exists in the kernel directory. This temp_gint directory was left over from a previous refactoring. Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

…ling#6886) * Add files via upload * Add files via upload * Add files via upload * Add files via upload * Delete source/ctrl_output_td.h * Add files via upload * Add files via upload * Add files via upload * Add files via upload * Update td_info.cpp * Update td_current_io_comm.cpp --------- Co-authored-by: Mohan Chen <mohanchen@pku.edu.cn>

…eling#6903)

* Fix: Add override to Pot_ML_EXX::cal_v_eff to avoid compilation warning. * Fix: Provide a clearer, friendlier error when ML KEDF is used without ENABLE_MLALGO. * Fix: Add validation for out_elf and spin=4 combo.

* Refactor: Encapsulate timer functionality in timer_wrapper.h * Refactor timer code and clean_esolver function 1. Remove #ifdef __MPI from timer code, encapsulate in timer_wrapper.h 2. Move ESolver clean logic to after_all_runners method 3. Replace clean_esolver calls with direct delete p_esolver 4. Remove #ifdef __MPI from delete p_esolver 5. Add Cblacs_exit(1) in after_all_runners for LCAO calculations * Refactor spar_exx.h: add English comments and improve dependency structure - Added detailed English comments to cal_HR_exx function - Moved implementation to cpp file and added explicit instantiations - Improved header file organization with sections - Removed unnecessary LCAO_hamilt.hpp include - Enhanced endif comments for better code readability * Remove empty LCAO_hamilt.hpp file The LCAO_hamilt.hpp file was empty after moving its implementation to spar_exx.cpp. This commit removes the unused header file and updates all references to it. * Fix circular dependency between exx_info.h and xc_functional.h - Removed #include xc_functional.h from exx_info.h - Removed #include exx_info.h from xc_functional.h This breaks the circular dependency between these two header files, allowing them to compile independently. 
* Fix dependencies in LCAO sparse format headers - Removed unnecessary #include source_lcao/hamilt_lcao.h from spar_dh.h, spar_hsr.h, and spar_u.h - Added direct dependencies to spar_dh.h: matrix.h, parallel_orbitals.h, two_center_bundle.h, ORB_read.h - Adjusted include order in spar_hsr.h and spar_u.h - Added necessary include to spar_hsr.cpp for HamiltLCAO access * Add necessary includes to cpp files for compilation - Added xc_functional.h include to esolver_ks_pw.cpp for XC_Functional class access - Added xc_functional.h include to input_conv.cpp for XC_Functional class access - Added parallel_comm.h include to op_exx_pw.cpp for KP_WORLD communication - Added global_variable.h and exx_info.h includes to stress_pw.cpp for GlobalC namespace access These changes fix compilation errors caused by the dependency refactoring. * add #include <RI/global/Tensor.h> in spar_hsr.h --------- Co-authored-by: abacus_fixer <mohanchen@pku.eud.cn> Co-authored-by: Xiaoyang Zhang <tsfxwbbzxy@163.com>

* Fix header * Fix header

…ptimized doc (deepmodeling#6858) * Feature&Doc: support fix_axes and fix_ibrav for relax_new=false * Fix: UT error of fixed_axes

…g#6912) * Feature: support ELF with non-collinear spin (nspin = 4) * fix: UT for write_elf

* Refactor: Encapsulate timer functionality in timer_wrapper.h * Refactor timer code and clean_esolver function 1. Remove #ifdef __MPI from timer code, encapsulate in timer_wrapper.h 2. Move ESolver clean logic to after_all_runners method 3. Replace clean_esolver calls with direct delete p_esolver 4. Remove #ifdef __MPI from delete p_esolver 5. Add Cblacs_exit(1) in after_all_runners for LCAO calculations * move exx_helper to module_pwdft * rename pw files * Refactor: Move and rename nonlocal_pw files to module_pwdft directory * Refactor: Move and rename velocity_pw, veff_pw, and meta_pw files to module_pwdft directory * Refactor: Move and rename all operator_pw files to module_pwdft directory and clean up * Refactor: Rename stress_func_xxx files to stress_xxx by removing _func suffix * Rename V*_in_pw files to more concise names and update references This commit includes: 1. Renamed files in module_pwdft directory: - VL_in_pw.cpp/h → vl_pw.cpp/h - VNL_in_pw.cpp/h → vnl_pw.cpp/h - VNL_grad_pw.cpp → vnl_pw_grad.cpp - VSep_in_pw.cpp/h → vsep_pw.cpp/h 2. Updated CMakeLists.txt and Makefile.Objects to use new filenames 3. Updated include paths in 41 files across the codebase: - source_cell/test/klist_test.cpp and klist_test_para.cpp - source_esolver/esolver_fp.h, esolver_ks_pw.cpp, esolver_ks_pw.h - source_estate/module_pot/pot_sep.h, potential_new.h, setup_estate_pw.h - source_estate/test/elecstate_pw_test.cpp - source_io/test/for_testing_input_conv.h, for_testing_klist.h - source_lcao/LCAO_set.h - source_psi/psi_initializer.h and related files - source_pw/module_ofdft/of_stress_pw.h - source_pw/module_pwdft/* (multiple files) - source_pw/module_stodft/sto_stress_pw.h 4. Verified compilation success with make -j30 The renaming follows consistent naming conventions and makes filenames more concise. 
* fix Makefile.Objects * Fix CI/CD error: Update operator_pw paths to new op_pw locations This commit fixes the CI/CD build error by updating references to the old operator_pw directory structure: 1. Updated source/source_hsolver/test/CMakeLists.txt: - Changed all 7 references from '../../source_pw/module_pwdft/operator_pw/operator_pw.cpp' to '../../source_pw/module_pwdft/op_pw.cpp' 2. Updated source/source_hsolver/test/diago_mock.h: - Changed '#include "source_pw/module_pwdft/operator_pw/operator_pw.h"' to '#include "source_pw/module_pwdft/op_pw.h"' The operator_pw directory has been renamed and its files moved to the module_pwdft root directory with op_pw_ prefixes, so these path updates are necessary to ensure CI/CD builds succeed. --------- Co-authored-by: abacus_fixer <mohanchen@pku.eud.cn>

…deepmodeling#6910) * Fix: Hefei-NAMD interface of file syns_nao.csr and parameter cal_syns * refactor: esolver to ctrl_scf_lcao

…add unit tests (deepmodeling#6917) * add change * add numrial fast * add check for cal_bandgap Boundary * Revert "add change" This reverts commit 78444e9. * add back mlago * add back mlago format

* Refactor: Encapsulate timer functionality in timer_wrapper.h * Refactor timer code and clean_esolver function 1. Remove #ifdef __MPI from timer code, encapsulate in timer_wrapper.h 2. Move ESolver clean logic to after_all_runners method 3. Replace clean_esolver calls with direct delete p_esolver 4. Remove #ifdef __MPI from delete p_esolver 5. Add Cblacs_exit(1) in after_all_runners for LCAO calculations * Fix uninitialized variables in source_pw directory This commit fixes 37 uninitialized variables (19 int, 18 double) in 19 files within the source_pw directory. All variables are initialized to 0 or 0.0 to prevent undefined behavior and improve code safety. Affected files: - source_stodft/sto_wf.cpp - source_stodft/sto_tool.h - source_stodft/sto_iter.h - source_stodft/sto_iter.cpp - source_stodft/sto_forces.cpp - source_pwdft/stress_func_loc.cpp - source_pwdft/soc.cpp - source_pwdft/parallel_grid.cpp - source_pwdft/onsite_projector.cpp - source_pwdft/operator_pw/exx_pw_ace.cpp - source_pwdft/onsite_proj_tools.h - source_pwdft/nonlocal_maths.hpp - source_pwdft/fs_nonlocal_tools.h - source_pwdft/elecond.cpp - source_pwdft/forces_cc.cpp - source_pwdft/VNL_in_pw.cpp - source_pwdft/forces_scc.cpp - source_pwdft/forces.cpp - source_pwdft/VNL_grad_pw.cpp * Fix uninitialized variables in source_lcao directory This commit fixes uninitialized variables (int and double) in 18 files within the source_lcao directory. All variables are initialized to 0 or 0.0 to prevent undefined behavior and improve code safety. 
Affected files: - spar_hsr.cpp, spar_dh.cpp - wavefunc_in_pw.cpp - module_rt/velocity_op.cpp, module_rt/norm_psi.cpp, module_rt/propagator.cpp, module_rt/td_folding.cpp, module_rt/boundary_fix.cpp - module_ri/exx_abfs-io.cpp, module_ri/module_exx_symmetry/irreducible_sector_bvk.cpp, module_ri/module_exx_symmetry/symmetry_rotation.cpp - module_lr/dm_trans/dmr_complex.cpp, module_lr/esolver_lrtd_lcao.cpp - module_operator_lcao/dspin_force_stress.hpp, module_operator_lcao/dftu_force_stress.hpp - module_hcontainer/func_folding.cpp, module_hcontainer/test/test_hcontainer.cpp - module_gint/set_ddphi.cpp * fix: initialize all uninitialized variables in source_relax and source_md * fix: initialize all uninitialized variables in source_base * fix: initialize all uninitialized variables in source_cell and source_estate * fix: initialize all uninitialized variables in source_hsolver and source_io * fix: initialize all uninitialized variables in source_lcao and source_psi * fix some uninitalized variables * move exx_helper to module_pwdft * rename pw files * Refactor: Move and rename nonlocal_pw files to module_pwdft directory * Refactor: Move and rename velocity_pw, veff_pw, and meta_pw files to module_pwdft directory * Refactor: Move and rename all operator_pw files to module_pwdft directory and clean up * Refactor: Rename stress_func_xxx files to stress_xxx by removing _func suffix * Rename V*_in_pw files to more concise names and update references This commit includes: 1. Renamed files in module_pwdft directory: - VL_in_pw.cpp/h → vl_pw.cpp/h - VNL_in_pw.cpp/h → vnl_pw.cpp/h - VNL_grad_pw.cpp → vnl_pw_grad.cpp - VSep_in_pw.cpp/h → vsep_pw.cpp/h 2. Updated CMakeLists.txt and Makefile.Objects to use new filenames 3. 
Updated include paths in 41 files across the codebase: - source_cell/test/klist_test.cpp and klist_test_para.cpp - source_esolver/esolver_fp.h, esolver_ks_pw.cpp, esolver_ks_pw.h - source_estate/module_pot/pot_sep.h, potential_new.h, setup_estate_pw.h - source_estate/test/elecstate_pw_test.cpp - source_io/test/for_testing_input_conv.h, for_testing_klist.h - source_lcao/LCAO_set.h - source_psi/psi_initializer.h and related files - source_pw/module_ofdft/of_stress_pw.h - source_pw/module_pwdft/* (multiple files) - source_pw/module_stodft/sto_stress_pw.h 4. Verified compilation success with make -j30 The renaming follows consistent naming conventions and makes filenames more concise. * fix Makefile.Objects * Fix CI/CD error: Update operator_pw paths to new op_pw locations This commit fixes the CI/CD build error by updating references to the old operator_pw directory structure: 1. Updated source/source_hsolver/test/CMakeLists.txt: - Changed all 7 references from '../../source_pw/module_pwdft/operator_pw/operator_pw.cpp' to '../../source_pw/module_pwdft/op_pw.cpp' 2. Updated source/source_hsolver/test/diago_mock.h: - Changed '#include "source_pw/module_pwdft/operator_pw/operator_pw.h"' to '#include "source_pw/module_pwdft/op_pw.h"' The operator_pw directory has been renamed and its files moved to the module_pwdft root directory with op_pw_ prefixes, so these path updates are necessary to ensure CI/CD builds succeed. * fix * fix relax_sync.cpp --------- Co-authored-by: abacus_fixer <mohanchen@pku.eud.cn>

…deepmodeling#6920) * Refactor: rename to delete new from names of operators in source_lcao * Fix: stress of nonlocal * Fix: compiling error --------- Co-authored-by: dyzheng <zhengdy@bjaisi.com>

* Fix docs for init_chg * Fix warning for init_chg * Align format for pseudo warning

…ount) (deepmodeling#7357) dspInitHandle uses MY_RANK % dsp_count but dspDestoryHandle used raw MY_RANK, causing heap corruption when MY_RANK >= dsp_count. Fixes issue deepmodeling#7269.
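The bug class described here, where the init path and the destroy path compute different indices, can be avoided by routing both through a single mapping function. The sketch below is illustrative only: `dsp_index_for_rank` is a hypothetical name, and it assumes `my_rank >= 0` and `dsp_count > 0`.

```cpp
// Hypothetical helper shared by the init and destroy paths, so both
// always agree on which device slot a rank owns.
// Assumes my_rank >= 0 and dsp_count > 0.
inline int dsp_index_for_rank(int my_rank, int dsp_count)
{
    return my_rank % dsp_count;
}
```

For example, with `dsp_count = 4`, rank 5 maps to slot 1 on both paths, whereas using the raw rank on the destroy path would index slot 5, which does not exist.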

…md > 1` evolution strategy (deepmodeling#7360) * Remove unnecessary cout in TDDFT current file * Fix RT-TDDFT EXX bug when using estep_per_md * Modify cout format * Fix a compiling issue with respect to std::vector * Update test 08_EXX/14_NO_TDDFT_PBE0

…guard (deepmodeling#7361) * refactor(device): remove dead code from DeviceContext, add dsp_count guard Remove unused device_type subsystem from DeviceContext: - Delete set_device_type(), get_device_type(), is_cpu(), is_gpu(), is_dsp() methods (all zero callers verified via exhaustive search) - Delete is_initialized(), is_gpu_enabled() (zero callers) - Delete device_type_ private field (only consumed by removed methods) - Delete standalone get_device_type(const DeviceContext*) function (zero callers; all 48 call sites use the template version get_device_type(const Device*)) - Delete forward declaration in device_helpers.h Add assert(PARAM.inp.dsp_count > 0) guard in driver.cpp to prevent modulo-by-zero undefined behavior. All other DeviceContext members retained (init(), get_device_id(), get_device_count(), get_local_rank(); all have active callers). Build verified with cmake --build (MPI+LCAO). * fix(dsp): replace assert with runtime WARNING_QUIT for dsp_count assert() is removed in release builds (NDEBUG), leaving modulo-by-zero unprotected. Replace with WARNING_QUIT that works in all builds. Also remove now-unused #include <cassert> from the #ifdef __DSP block. Addresses PR review feedback on deepmodeling#7361.
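The difference between `assert` and a runtime guard can be illustrated with a check that survives `NDEBUG` builds. In this sketch a thrown `std::runtime_error` stands in for ABACUS's WARNING_QUIT, whose signature is not reproduced here; `require_positive_dsp_count` is a hypothetical name.

```cpp
#include <stdexcept>
#include <string>

// Runtime guard that is active in both debug and release builds.
// The exception is a stand-in for the project's own quit routine.
inline int require_positive_dsp_count(int dsp_count)
{
    if (dsp_count <= 0)
    {
        throw std::runtime_error("dsp_count must be > 0, got "
                                 + std::to_string(dsp_count));
    }
    return dsp_count;
}
```

Because the check is ordinary code rather than a macro compiled out under `NDEBUG`, a later `rank % dsp_count` cannot reach a zero divisor in any build type.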

…odeling#7364) * Remove useless headers * add type.h include * Remove headers in test files

- set_phi_dphi_kernel: add WantPhi non-type template parameter and dispatch from the launch site. The dphi-only callers (gint_tau) pass phi=nullptr; with WantPhi==false the compiler drops the phi[] stores and the per-iw `phi != nullptr` branch entirely. - phi_dot_dphi_kernel / phi_dot_dphi_r_kernel: replace the shared-memory tree reduce with a single-warp warpReduceSum and drop the dynamic shared-memory allocation at the launch sites. Launch configuration is pinned at blockDim.x == 32; a comment guards the invariant. - Plain `if` (not `if constexpr`) on WantPhi keeps the code C++11-compliant; ABACUS targets C++11 and nvcc otherwise emits warning #2912-D. WantPhi is still a non-type template parameter, so the compiler folds the constant and eliminates the dead branch. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
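The compile-time dispatch pattern described for WantPhi can be shown in plain host C++. This is a minimal sketch, not the actual kernel: `fill_phi_dphi`, `fill_phi_dphi_dispatch`, and the toy formulas (x squared and its derivative) are assumptions made for illustration.

```cpp
#include <cstddef>

// WantPhi is a compile-time constant, so the plain `if` below is folded
// by the compiler and the phi store disappears from the <false> variant.
template <bool WantPhi>
void fill_phi_dphi(const double* x, std::size_t n, double* phi, double* dphi)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        if (WantPhi)
        {
            phi[i] = x[i] * x[i]; // toy "value"
        }
        dphi[i] = 2.0 * x[i]; // toy "gradient"
    }
}

// The runtime null check happens once at the call site instead of once
// per element inside the loop.
inline void fill_phi_dphi_dispatch(const double* x, std::size_t n,
                                   double* phi, double* dphi)
{
    if (phi != nullptr)
    {
        fill_phi_dphi<true>(x, n, phi, dphi);
    }
    else
    {
        fill_phi_dphi<false>(x, n, phi, dphi);
    }
}
```

A gradient-only caller passes `nullptr` for `phi` and gets the `<false>` instantiation, which never touches the pointer. The code stays valid C++11 because no `if constexpr` is needed.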

…ecision loss (deepmodeling#7368) Across CPU and GPU gint paths, accumulator buffers (hr_gint, phi_dm, rho, and the vbatched GEMM C output) are now always allocated as double, even when the input phi/dm/vr_eff are fp32. Multiplies stay in fp32 (cheap), but per-block and global reductions are widened to fp64 so that summing many atom-pair contributions into the same element does not drift. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
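The effect of widening only the accumulator can be seen in a small host-side sketch. The function names are hypothetical and the dot product is a stand-in for the real per-block reductions; the point is that the multiply stays in fp32 while the running sum is fp64.

```cpp
#include <cstddef>
#include <vector>

// Multiply in fp32, accumulate in fp64.
inline double dot_fp32_mul_fp64_acc(const std::vector<float>& a,
                                    const std::vector<float>& b)
{
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        const float prod = a[i] * b[i]; // cheap fp32 multiply
        acc += static_cast<double>(prod); // widened reduction
    }
    return acc;
}

// All-fp32 version, kept only for comparison.
inline float dot_fp32_acc(const std::vector<float>& a,
                          const std::vector<float>& b)
{
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
    {
        acc += a[i] * b[i];
    }
    return acc;
}
```

Summing a million terms of `0.1f * 1.0f` with the fp32 accumulator drifts visibly away from 100000, while the fp64 accumulator stays within the rounding error of the inputs themselves.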

…ling#7365) * Remove parameter.h * Continue remove parameter.h * Remove parameter.h dependency in pw_basis * Remove dependency in pw_basis_k

…epmodeling#7369) 1. Add check_value callback to yukawa_potential to error out when both yukawa_potential and uramping are enabled simultaneously 2. Skip uramping_update() when Yukawa is enabled (U calculated directly from charge density every iteration) 3. Return true from u_converged() when Yukawa is enabled (U is self-consistently calculated, no ramping convergence needed)

…7370)

…chain setup (deepmodeling#7376)

…ling#7373) * Remove parameter.h * Continue remove parameter.h * Remove parameter.h dependency in pw_basis * Remove dependency in pw_basis_k * refactor(source_basis): remove last parameter.h dependencies Decouple module_ao and module_nao from source_io/parameter.h: - ORB_atomic_lm / ORB_nonlocal_lm: replace PARAM.globalv.global_out_dir with ModuleBase::get_quit_out_dir() (new getter mirroring the existing set_quit_out_dir injection point). - two_center_bundle: thread orbital_dir as a build_orb parameter; replace the two deepks_setorb guards with ndesc>0 / alpha_ non-null checks that are equivalent under the build_alpha invariant. source_basis is now free of parameter.h. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(tool_quit): rename get_quit_out_dir to get_global_out_dir Align the getter name with the original PARAM.globalv.global_out_dir it replaces. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ng#7382) * feat(tests): add tests/17_DS_DFTU test suite for DFT+U with deep potential spin constraints * feat(tests): add tests/17_DS_DFTU with CI-disabled configs and READMEs Add the 17_DS_DFTU test suite for DeltaSpin and DFT+U functionality: - 47 test cases covering LCAO/PW basis, collinear/noncollinear spin, DFT+U, DeltaSpin, and their combinations - Comment out tests in tests/CMakeLists.txt and tests/17_DS_DFTU/CMakeLists.txt to prevent CI failure until DeltaSpin code is merged into develop - Add single-line README to each test directory (printed during Autotest.sh) - Rewrite CASES_CPU.txt with clear English comments explaining disabled tests

…7389) * Remove debug output in renormalize_psi Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Fix: CD potential now applied to correct spin channel instead of always spin 0 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Refactor: Replace raw new/delete with std::vector in cal_vw_potential_phi for automatic memory management Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Refactor: Replace raw new/delete with std::vector in cal_CD_potential for automatic memory management Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Refactor: Replace abs(x)*abs(x) with std::norm for clarity and consistency with RK2 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Refactor: Remove unused <iostream> include Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Refactor: Remove dead nspin <= 0 checks that can never trigger Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
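Two of the refactors listed here, replacing raw new/delete with `std::vector` and replacing `abs(x)*abs(x)` with `std::norm`, can be illustrated together. The function `density_from_psi` is a hypothetical example and is not taken from the modified files.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Returns |psi_i|^2 for each element. std::norm computes re*re + im*im
// directly, avoiding the square root hidden inside std::abs, and the
// vector releases its memory automatically on every exit path.
inline std::vector<double> density_from_psi(
    const std::vector<std::complex<double> >& psi)
{
    std::vector<double> rho(psi.size());
    for (std::size_t i = 0; i < psi.size(); ++i)
    {
        rho[i] = std::norm(psi[i]);
    }
    return rho;
}
```

For `psi = {3+4i, i}` this yields `{25, 1}`.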

… diagonalization (deepmodeling#7388) * fix(sdft): add CT (Chebyshev Trace) iter_header for pure SDFT without diagonalization Pure SDFT (nbands=0) does not perform KS diagonalization, yet the SCF iteration table borrowed the ks_solver label (CG/DA/etc.). Add a "CT" entry to iter_header_dict and use it when esolver_type=sdft with nbands=0. Mixed SDFT (nbands>0) keeps the actual ks_solver label since it still diagonalizes KS orbitals. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(sdft): add unit tests for SDFT iter_header CT label Verify pure SDFT (nbands=0) outputs "CT" in ITER column, and mixed SDFT (nbands>0) outputs the actual ks_solver label (e.g. "DA"). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
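The label selection rule described in these commits can be written as a small pure function. This is a sketch under assumptions: `scf_iter_label` and its parameters are hypothetical, and the real code looks the label up in iter_header_dict rather than taking it as an argument.

```cpp
#include <string>

// Pure SDFT (esolver_type == "sdft" with nbands == 0) performs no KS
// diagonalization, so it gets the "CT" (Chebyshev Trace) label.
// Every other case keeps the label of the actual KS solver.
inline std::string scf_iter_label(const std::string& esolver_type,
                                  int nbands,
                                  const std::string& ks_solver_label)
{
    if (esolver_type == "sdft" && nbands == 0)
    {
        return "CT";
    }
    return ks_solver_label;
}
```

So `scf_iter_label("sdft", 0, "DA")` gives `"CT"`, while mixed SDFT with `nbands > 0` still reports `"DA"`.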

* update 18_md examples * update out_chg 2 * update out_pot function * feat(module_io): optimize initial charge density/potential output, with out_freq_ion control and dynamic file names - Add the gen_ini_filename() helper to generate initial charge density/potential file names in one place - When out_freq_ion=0, write a single file with a fixed name (no g#) - When out_freq_ion>0, write a separate file for each geometry step (with g#) - Update the documentation to explain the difference between the two modes Modified files: - docs/advanced/output_files/output-specification.md - source/source_io/module_chgpot/write_init.cpp - source/source_io/module_chgpot/write_init.h - source/source_io/module_parameter/read_input_item_output.cpp * fix(module_io): fix the initial charge density/potential output logic when out_freq_ion=0 - When out_freq_ion=0, write at every geometry step (overwriting the same file) - When out_freq_ion>0, write only when istep is a multiple of out_freq_ion - Update all related documentation and comments Modified files: - docs/advanced/output_files/output-specification.md - source/source_io/module_chgpot/write_init.cpp - source/source_io/module_chgpot/write_init.h - source/source_io/module_parameter/read_input_item_output.cpp * fix a bug about out_pot * fix bugs * update * update ELF and add openmp parallel * update elf * update elf * update example reference data * enable elf for rt-tddft, but results are wrong * fix elf test * fix elf test in 03_NAO_multik * fix output of write_elf * fix bug * update potential file, fix bug * fix elf test in ofdft * fix: Move write_pot_init to ElecState::init_scf for correct timing The write_pot_init was being called in ESolver_FP::before_scf before the effective potential was computed. This caused pot_ini.cube to contain: - All zeros for calculation=scf / first ionic step (istep=0) - Converged potential from previous ionic step for relax/md with istep>0 The fix moves write_pot_init to ElecState::init_scf, which is called after pot->init_pot(charge) computes the effective potential from the initial charge density. This ensures pot_ini.cube correctly contains the effective potential corresponding to the initial charge density.
Changes: - Modified ElecState::init_scf signature to accept istep, out_dir, inp parameters - Added write_pot_init call after pot->init_pot() in init_scf - Updated pw::setup_pot to pass through the new parameters - Updated all callers (LCAO and PW) to provide the new parameters - Removed the premature write_pot_init call from ESolver_FP::before_scf * Remove unused parameters from ElecState::init_scf - Removed unused 'symm' and 'wfcpw' parameters from init_scf function - Updated all call sites to match the new signature - Simplified function interface by removing parameters not used in implementation * Fix missing io_basic library link in elecstate tests - Added io_basic library dependency to MODULE_ESTATE_elecstate_base test - Added io_basic library dependency to MODULE_ESTATE_elecstate_pw test - Fixes undefined reference to ModuleIO::write_pot_init * update init_scf * fix * fix bug * remove dependence of parameter for write_cube.cpp * fix bugs * fix bug * add a new file init_scf * update estate tests * delete useless inclusion --------- Co-authored-by: abacus_fixer <mohanchen@pku.eud.cn>
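The out_freq_ion behavior described in the module_io fix in this commit series can be sketched as two small functions. Everything here is illustrative: `should_write_init` and `ini_filename` are hypothetical names (the real helper is gen_ini_filename(), whose signature is not shown), the `_g<istep>` tag format is an assumption, and `out_freq_ion` is assumed to be non-negative.

```cpp
#include <string>

// out_freq_ion == 0: write at every geometry step.
// out_freq_ion  > 0: write only when istep is a multiple of out_freq_ion.
inline bool should_write_init(int istep, int out_freq_ion)
{
    if (out_freq_ion == 0)
    {
        return true;
    }
    return istep % out_freq_ion == 0;
}

// out_freq_ion == 0: one fixed name, so each step overwrites the last.
// out_freq_ion  > 0: the geometry step is encoded in the name.
// The "_g" tag format below is an assumption made for this sketch.
inline std::string ini_filename(const std::string& stem, int istep,
                                int out_freq_ion)
{
    if (out_freq_ion == 0)
    {
        return stem + ".cube";
    }
    return stem + "_g" + std::to_string(istep) + ".cube";
}
```

With `out_freq_ion = 2`, steps 0, 2, 4 and so on each produce their own file, while `out_freq_ion = 0` keeps rewriting a single `pot_ini.cube`.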

- gint_info.cpp: replace serial init_atoms_ with #pragma omp parallel for using iat2it/iat2ia for atom type lookup, thread-private staging for conflict-free BigGrid add_atom merging - gint_rho.cpp: pre-allocate phi/phi_dm thread buffers to max_phi_len to eliminate per-BigGrid resize() malloc overhead - gint_vl.cpp: pre-allocate phi/phi_vldr3 thread buffers to max_phi_len - gint_fvl.cpp: pre-allocate 6 thread buffers (phi, phi_vldr3, phi_vldr3_dm, dphi_x/y/z) to max_phi_len, eliminate per-BigGrid 6x resize() overhead
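The thread-private staging idea mentioned for gint_info.cpp can be sketched with a generic collection loop. This is not the actual init_atoms_ code: `collect_even_parallel` and its even-number filter are placeholders, and the real merge targets BigGrid add_atom rather than a plain vector. The pragmas are ignored when OpenMP is disabled, in which case the function simply runs serially.

```cpp
#include <algorithm>
#include <vector>

// Each thread appends to its own staging vector, so the hot loop has no
// shared writes. The staged results are merged once per thread inside a
// critical section.
inline std::vector<int> collect_even_parallel(int n)
{
    std::vector<int> merged;
#pragma omp parallel
    {
        std::vector<int> staged; // thread-private staging buffer
#pragma omp for nowait
        for (int i = 0; i < n; ++i)
        {
            if (i % 2 == 0)
            {
                staged.push_back(i);
            }
        }
#pragma omp critical
        merged.insert(merged.end(), staged.begin(), staged.end());
    }
    // Merge order depends on thread scheduling; sort for determinism.
    std::sort(merged.begin(), merged.end());
    return merged;
}
```

The same principle underlies the buffer changes in gint_rho.cpp, gint_vl.cpp and gint_fvl.cpp: allocate per-thread storage once, outside the per-BigGrid loop, instead of resizing on every iteration.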

* Fix: use correct PSI for GPU in cal_tau for ELF * Refactor: split long cast into readable lines in after_scf

…deepmodeling#7418) * Remove obsolete cross-device copy constructor in HamiltPW * Delete corresponding .h code

…eepmodeling#7420)

…7421) * Fix Molden GTO normalization and coordinate conversion Title: Improve ABACUS Molden output for wavefunction analysis Summary: This PR fixes several Molden conversion issues in tools/molden/molden.py while keeping the default workflow unchanged as much as possible. Changes: - Correct the primitive Gaussian coefficient convention when writing Molden GTO data. The NAO-to-GTO fit uses unnormalized radial primitives, while Molden readers usually normalize primitive Gaussian functions internally. - Fix Cartesian_angstrom coordinate conversion. Coordinates in Angstrom are now converted to Bohr for the Molden [Atoms] AU section by dividing by 0.529177210903. - Add optional multi-start NAO-to-GTO fitting. A single -r value keeps the old single-start behavior; comma-separated -r values enable multi-start fitting and keep the fit with the lowest nonlinear error. - Add optional Molden [Nval] output via --write-nval. The values are read from UPF z_valence. This option is disabled by default. Notes: - The changes are limited to the Molden converter. Validation: - Ran the existing molden.py unit tests successfully. - Checked that default output does not contain [Nval]. - Checked that --write-nval writes C/O/H valence charges for the PhenolDimer test case. - Checked that Cartesian_angstrom coordinates are written at the correct Bohr scale. * Show default values in molden.py CLI help
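The Cartesian_angstrom fix amounts to a unit conversion by the constant quoted in the commit message. The converter itself is a Python script (tools/molden/molden.py); the sketch below restates the arithmetic in C++ for consistency with the other examples, and the names `BOHR_IN_ANGSTROM` and `angstrom_to_bohr` are hypothetical.

```cpp
// One Bohr radius expressed in Angstrom, as quoted in the commit message.
const double BOHR_IN_ANGSTROM = 0.529177210903;

// Coordinates given in Angstrom are divided by this constant to obtain
// atomic units (Bohr) for the Molden [Atoms] AU section.
inline double angstrom_to_bohr(double x_angstrom)
{
    return x_angstrom / BOHR_IN_ANGSTROM;
}
```

A coordinate of 1 Angstrom becomes roughly 1.8897 Bohr, so omitting the conversion would shrink every interatomic distance by nearly a factor of two.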

…g#7383)
* update ML KEDF output
* Refactor OFDFT ML KEDF logging output to use ofs_running stream
  Summary of changes:
  1. Modified ML_Base::set_device() to accept std::ostream& ofs_running parameter instead of using std::cout directly
  2. Updated KEDF_ML::set_para() to pass ofs_running through the call chain
  3. Modified KEDF_ML::init_data() to accept ofs_running parameter
  4. Updated NN_OFImpl constructor to accept ofs_running parameter for logging nnode/nlayer
  5. Modified Cal_MLKEDF_Descriptors::set_para() to accept ofs_running parameter for logging nkernel
  6. Updated ML_EXX class methods (set_para, init_data, localTest) to use ofs_running
  7. Updated all call sites to pass GlobalV::ofs_running
  8. Changed 'NN' to 'Neural Network' in device initialization messages
  9. Fixed 'WARNING: ML >= TF' message in KEDF_Manager::get_energy() to use ofs_running
  10. Reformatted KEDF_ML::set_para() and cal_tool->set_para() calls with one parameter per line
  All ML KEDF related output messages now write to the running log file instead of stdout.
* fix
* fix
* update the output formats
* update KEDF
* output format update
* update
* fix a potential bug when the net.pt model cannot be found
* update kedf and exx
* update
---------
Co-authored-by: abacus_fixer <mohanchen@pku.eud.cn>
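The refactor above follows a common pattern: a function receives the log stream as a `std::ostream&` argument instead of writing to `std::cout`, so the caller decides where the output goes. A minimal sketch follows, assuming a hypothetical free function `report_device` and a made-up message text; the real ML_Base::set_device() signature and log wording are not reproduced here.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: write a device message to whatever stream the caller
// supplies (a running log file in production, a string stream in tests).
inline void report_device(const std::string& device, std::ostream& ofs_running)
{
    assert(!device.empty());
    ofs_running << "Neural Network device: " << device << '\n';
}
```

A call site would pass its log stream, for example `report_device("cpu", GlobalV::ofs_running);` in the style the commit describes. Passing a `std::ostringstream` instead makes the output easy to check in a unit test.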

… for Native Windows system) (deepmodeling#7423) * Native Windows port (Phase 1 scaffolding): serial PW build on MinGW-w64 Lay the groundwork for a native Windows serial plane-wave build (no MPI, no LCAO, no ELPA/PEXSI/hybrid). Targets MinGW-w64 GCC, which ships the POSIX headers ABACUS uses and accepts its GCC attributes, so the source needs only minimal, Linux-safe portability shims. - source_base/fs_compat.h (new): portable ModuleBase::make_directory() wrapping _mkdir (Windows) / mkdir(path,0755) (POSIX). The Windows CRT mkdir takes no permission-mode argument. - global_file.cpp, global_function.cpp: route the 7 mkdir(path,0755) call sites through the helper; drop unistd.h/sys/stat.h includes. - CMakeLists.txt: * gate find_package(ScaLAPACK REQUIRED) on ENABLE_MPI so the serial build does not require a distributed-memory library; * define _USE_MATH_DEFINES/NOMINMAX/_CRT_SECURE_NO_WARNINGS on WIN32; * skip -O3 -g default flags and the -lm link for MSVC; * skip the post-install abacus symlink on Windows. - tools/windows/build-native-serial.ps1 (new): MinGW configure/build helper. - docs/advanced/install_windows_native.md (new): native-build documentation. All changes are guarded or platform-neutral; the Linux build is unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Native Windows port (Phase 1): serial PW build compiles, links, runs With these fixes the native Windows serial plane-wave build (abacus_pw_ser.exe, MinGW-w64 GCC + OpenBLAS + FFTW) compiles, links, and runs examples/02_scf/01_pw_Si2 to SCF convergence with a deterministic total energy (-215.5057 eV, bit-identical across runs). Build-system fixes: - cmake/FindBlas.cmake, cmake/FindLapack.cmake: the wrappers delegate to CMake's builtin FindBLAS/FindLAPACK, but on the case-insensitive Windows filesystem the wrapper matched itself and recursed forever. Drop our module dir from CMAKE_MODULE_PATH around the builtin call (no-op on Linux). 
Source portability fixes (all guarded or platform-neutral; Linux unaffected): - module_fft/fft_base.h, fft_cpu.h: remove __attribute__((weak)) from the FFT virtuals. The weak-without-definition pattern relied on the ELF linker resolving unbound weak symbols to null; on Windows/PE (MinGW) it produced null vtable slots, so the first FFT dispatch (FFT_Bundle::setupFFT) called address 0 and segfaulted. Base virtuals get trivial default bodies; the float overrides become concrete via ENABLE_FLOAT_FFTW=ON. - module_parameter/input_conv.h: port the POSIX <regex.h> expression parser to C++ <regex> (MinGW has no <regex.h>). - module_container/base/core/cpu_allocator.cpp: replace posix_memalign with _aligned_malloc/_aligned_free on Windows, applied consistently to both allocate overloads and free. - module_restart/restart.cpp: map POSIX S_IRUSR/S_IWUSR to _S_IREAD/_S_IWRITE and include <io.h> for low-level open/read/write/close on Windows. Tooling/docs: - tools/windows/build-native-serial.ps1: use the verified flags (BLA_VENDOR=OpenBLAS, ENABLE_FLOAT_FFTW=ON, COMMIT_INFO=OFF, the GCC-16 force-include workaround). - docs/advanced/install_windows_native.md: document the gcc-fortran package, the verified build/run, and every source change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Fix all-zero seeded random wavefunctions in serial (non-MPI) PW build psi_initializer::random_t, in the pw_seed>0 branch, generates per-stick random amplitude/phase into stickrr/stickarg and then distributes them into the gathered tmprr/tmparg arrays via stick_to_pool() -- but that call is guarded by #ifdef __MPI. In a serial build tmprr/tmparg therefore stay zero-initialized, so every seeded random wavefunction is all-zero. This later trips Gram-Schmidt orthonormalization ("psi_norm <= 0.0") and aborts the run. The path is never hit in CI because the integration tests run under MPI. 
Add the serial counterpart: copy each stick directly into tmprr/tmparg using the same mapping as stick_to_pool()'s rank-0 branch (out[ixy2is_[ir]*nz + iz] = stick[iz]). ixy2is_ is populated for both serial and MPI builds via pw_wfc_->getfftixy2is(). Verified on a representative set of 15 tests/01_PW cases run with the native Windows serial PW build (abacus_pw_ser.exe): all converged total energies now match the official result.ref references to <= ~7e-7 eV. Before this fix the 6 cases using pw_seed with random wavefunctions aborted; the other 9 already matched to ~1e-9 eV. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Windows: use the existing toolchain + serial test harness, drop bespoke scripts Per review feedback, the native-Windows support should plug into ABACUS's existing build/test infrastructure (like any other backend/variant) rather than carry its own scripts. Build: add a Windows toolchain variant, mirroring toolchain_gnu.sh / build_abacus_gnu.sh: - toolchain/toolchain_windows.sh -- installs the MinGW-w64 prerequisites via pacman on MSYS2 (gcc, gfortran, openblas, fftw, cmake, ninja) plus bc for the test harness; records the prefix in install/setup like the Linux variants. - toolchain/build_abacus_windows.sh -- configures + builds the serial PW binary (ENABLE_MPI/LCAO=OFF, OpenBLAS+FFTW) and writes abacus_env.sh. Removed the one-off tools/windows/build-native-serial.ps1. Test: reuse tests/integrate/Autotest.sh instead of a separate script. Added a serial mode: with -n 0 the harness runs the binary directly (no mpirun), so a serial build (any OS) reuses the standard catch_properties.sh / result.ref comparison. Added tests/integrate/CASES_SERIAL_PW.txt listing serial-PW cases. Validation (build_abacus_windows.sh, then Autotest.sh -n 0 -f CASES_SERIAL_PW.txt): all 15 01_PW cases run; total energies/forces/stresses match the Linux result.ref to ~1e-7 relative. 
The few WARNINGs (016/017 etot ~1e-7 eV; 003/009/019 stress/force) are absolute-threshold exceedances from cross-platform / cross-BLAS floating point, classified WARNING (not ERROR) by the harness. docs/advanced/install_windows_native.md updated to describe the toolchain + serial-Autotest flow. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Windows test: run the whole 01_PW suite, drop the curated case list Per review: the serial PW build should be checked against the existing PW test suite (tests/01_PW) via the standard harness, not a hand-picked subset. - Remove tests/integrate/CASES_SERIAL_PW.txt. The canonical list already exists at tests/01_PW/CASES_CPU.txt and is used by the standard ctest registration (tests/01_PW/CMakeLists.txt runs Autotest.sh from that directory). Serial runs just add -n 0: cd tests/01_PW bash ../integrate/Autotest.sh -a <abacus_pw_ser.exe> -n 0 - .gitattributes: force LF for *.sh and CASES_*.txt so the toolchain scripts, Autotest.sh and the bash-parsed case lists work on a fresh Windows checkout (core.autocrlf would otherwise rewrite them to CRLF). - docs/advanced/install_windows_native.md: document the whole-01_PW serial run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Windows toolchain: provide a generic `abacus` command after build Mirror the Linux toolchain UX: `source abacus_env.sh` then run `abacus`. build_abacus_windows.sh now copies the configured binary (abacus_pw_ser.exe) to abacus.exe in the build dir. Native Windows symlinks need elevation (so the CMake `abacus` symlink step is skipped on WIN32); the .exe copy lets a bare `abacus` resolve in the MSYS2 shell and in cmd/PowerShell. abacus_env.sh already puts that directory (and the MinGW runtime DLLs via the toolchain setup) on PATH. Verified: source abacus_env.sh; abacus --version -> runs from any directory. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Fix Binstream binary file I/O on Windows (force binary fopen mode) Binstream::Binstream/open pass the caller's fopen mode ("r"/"w"/"a") straight through. On Windows that opens in *text* mode, which translates CRLF and treats 0x1A as EOF, corrupting the binary wavefunction/charge files Binstream is built to read -> "Error in Binstream: Some data didn't be read". On POSIX "r" == "rb", so the bug is Windows-only. Binstream is always a binary stream, so append "b" to the mode when the caller omitted it. Harmless no-op on Linux. Fixes these serial 01_PW cases on the native Windows build (verified): - 056_PW_IW (init_wfc=file: read wfc from binary file) - 057_PW_SO_IW (SOC + init_wfc=file) - 075_PW_CHG_BINARY (binary charge I/O) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Fix uninitialized structure factor in serial bspline_sf (wrong energy) Structure_Factor::bspline_sf (nbspline>0, B-spline structure factor) scatters each real-space plane into tmpr via Parallel_Grid::zpiece_to_all, which is guarded by #ifdef __MPI. In a serial build tmpr is never filled (it is new double[nrxx], uninitialized), so real2recip(tmpr, strucFac) produces a garbage structure factor -> grossly wrong total energy, force and stress. CI never hits this path (integration tests run under MPI). Add the serial branch: fill tmpr directly using the SAME real-space layout as zpiece_to_all's serial path, rho[ir*nczp + znow] (xy outer, z innermost; nczp==nz, znow==iz when serial). Verified on tests/01_PW/032_PW_15_CF_CS_bspline (native Windows serial): energy and stress now match the reference to ~1e-8 (was ~1480 eV / 30000 kbar off); residual force ~5e-3 is B-spline interpolation + cross-platform float noise. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(windows): note pw_seed cross-platform non-reproducibility (078 is not a bug) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * toolchain(windows): clarify to run abacus_env.sh inside a mingw bash Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(lcao): guard null deref of DeePKS overlap_orb_alpha when DeePKS is off before_scf() unconditionally dereferenced *(two_center_bundle_.overlap_orb_alpha) to pass it to deepks.build_overlap(). overlap_orb_alpha is only built when DeePKS is enabled (descriptor orbitals); with DeePKS off it is a null unique_ptr, so forming the reference is undefined behaviour (caught as an abort in a debug libstdc++ build; benign in release as the DeePKS stub ignores it). Guard the call on the integrator being present. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Windows toolchain: add LCAO + MPI (MS-MPI + ScaLAPACK) build Extend the native-Windows toolchain to the full supported configuration, mirroring build_abacus_gnu.sh: - toolchain_windows.sh: also pacman-install cereal (LCAO), msmpi (MPI), and scalapack (distributed LCAO eigensolver). Documents that the MS-MPI runtime is a separate system-wide Microsoft redistributable. - build_abacus_windows.sh: build MPI + LCAO by default (abacus_basic_para.exe); ENABLE_MPI / ENABLE_LCAO env toggles select serial / PW-only. Point FindMPI at the MinGW MS-MPI import lib; ScaLAPACK is found automatically when ENABLE_MPI. abacus_env.sh now also exports OPENBLAS_NUM_THREADS=1 (required so OpenBLAS's multithread buffer allocator does not fail under multiple MPI ranks). - docs/advanced/install_windows_native.md: document the LCAO+MPI build, parallel testing (mpiexec / mpirun shim), and the known serial gamma-only LCAO bug (use the MPI build, which is correct to ~1e-11 even on a single rank). 
Validated against 01_PW / 02_NAO_Gamma / 03_NAO_multik via the standard harness: under MPI all three pass within the cross-platform error range; residual differences are float noise at strict absolute thresholds, gauge-dependent outputs, or excluded features (SCAN/meta-GGA needs LibXC, DFT+U needs MPI). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * toolchain(windows): make the unmodified test harness drive MS-MPI Running tests/integrate/Autotest.sh directly failed with "no mpirun found": MS-MPI ships only mpiexec, and the harness invokes `mpirun -np N`. Three Windows-specific gaps, all fixed in build_abacus_windows.sh so the standard harness works unchanged: * mpirun shim. The build now drops an `mpirun`->`mpiexec` shim next to the binary (on PATH via abacus_env.sh). MS-MPI's `-n`/`-np <N> <prog>` syntax matches what the harness passes, so forwarding args is enough. * OpenBLAS thread pinning. MSYS2's OpenBLAS is OpenMP-threaded (links libgomp), so OMP_NUM_THREADS -- not OPENBLAS_NUM_THREADS -- caps its threads. Autotest sets OMP_NUM_THREADS=nproc/np, so each rank spawned a multithreaded BLAS, the ranks oversubscribed the cores, and OpenBLAS's buffer allocator died ("Memory allocation still failed after 10 retries"). The shim and abacus_env.sh now pin OMP_NUM_THREADS=1 (ABACUS is built USE_OPENMP=OFF, so parallelism is MPI; the BLAS pin costs nothing). * DLL bundling. mpiexec does not propagate PATH to child ranks when stdout is redirected to a file (as the harness does), so the child abacus.exe failed to load libopenblas/libfftw3/libscalapack ("error while loading shared libraries"). The build now copies the dependent MinGW/OpenBLAS/FFTW/ScaLAPACK DLLs next to abacus.exe; Windows searches the application directory before PATH, making the binary self-contained. 
Verified end to end with the default invocation `bash Autotest.sh -a abacus` (np=4, via the shim): 01_PW/001, 02_NAO_Gamma/scf_afm (gamma-only LCAO), and 03_NAO_multik/scf_pp_upf201 all pass. Corrects the earlier docs/notes that cited OPENBLAS_NUM_THREADS and a hand-made shim. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * toolchain(windows): add MS-MPI Bin (MSMPI_BIN) to PATH in abacus_env.sh The mpirun shim died with `exec: mpiexec: not found`: MSYS2's MinGW shell does not inherit the Windows PATH, and MS-MPI's mpiexec.exe lives in its own Bin dir (only msmpi.dll is in System32). The MSMPI_BIN env var (set by the MS-MPI installer) *is* inherited, so abacus_env.sh now prepends `cygpath -u "$MSMPI_BIN"` to PATH, making both `mpiexec` and the shim resolve. Verified from a minimal PATH: which mpiexec/mpirun both resolve and 01_PW/001 passes via the default harness invocation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix: restore Linux link of FFT_CPU<float> and harden parse_expression Two issues from code review of the Windows-port commits: 1. FFT_CPU<float> undefined references on Linux (regression). The port removed __attribute__((weak)) from the FFT virtuals (it left null vtable slots on PE/MinGW and crashed). But the real FFT_CPU<float> methods live in fft_cpu_float.cpp, which is compiled only when ENABLE_FLOAT_FFTW=ON. With weak gone and float off (the Linux default), the FFT_CPU<float> vtable -- still emitted wherever the class is constructed (FFT_Bundle) -- referenced undefined symbols: undefined reference to `ModuleBase::FFT_CPU<float>::setupFFT()' ... Provide trivial FFT_CPU<float> method definitions in the always-compiled fft_cpu.cpp, guarded by `#if !defined(__ENABLE_FLOAT_FFTW)`, so every vtable slot is valid on any ABI without weak and without pulling in libfftw3f. The float CPU path stays unreachable at runtime (FFT_Bundle::setupFFT WARNING_QUITs for single/mixing CPU FFT unless the macro is set). 
When the macro is on, the stubs are excluded and fft_cpu_float.cpp supplies the real definitions -- no duplicate symbols. Verified by linking the float vtable TU against fft_cpu.o in both macro states (off: links via stubs; on: links via fft_cpu_float.o), and that dropping both reproduces the reported errors. 2. parse_expression (input_conv.h) could push indeterminate values into vec. If std::regex_search found no match, sub_str stayed empty and was parsed anyway; in the non-multiplication branch `T occ` was uninitialized and the `convert >> occ` extraction was unchecked. Now: a no-match token is an input error (WARNING_QUIT), occ is value-initialized, and a failed extraction fails fast. Consistent with the other expression parsers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fft: make the weak-vtable trick Windows-safe without touching Linux code Rework the FFT_CPU<float> vtable handling so Linux builds byte-for-byte as upstream and only Windows gets a delta. My earlier port had (a) removed __attribute__((weak)) outright and (b) added trivial float stubs in fft_cpu.cpp -- both changed working Linux core code, and (b) didn't even reach targets that compile fft_bundle.cpp without linking fft_cpu.cpp (e.g. MODULE_HAMILT_XCTest_VXC), so Linux still failed to link: undefined reference to `ModuleBase::FFT_CPU<float>::setupFFT()' ... Root cause: the upstream virtuals are __attribute__((weak)) so the ELF linker nulls the unused FFT_CPU<float> vtable slots when ENABLE_FLOAT_FFTW is off. MinGW/PE has no equivalent -- weak template members there collide ("multiple definition") or leave null slots that crash on dispatch (verified both empirically with g++). Fix, keeping Linux untouched: * Introduce ABACUS_FFT_WEAK = __attribute__((weak)) on non-Windows, empty on _WIN32, and use it in place of the raw attribute in fft_base.h / fft_cpu.h. 
Preprocessing with -U_WIN32 reproduces the upstream headers exactly (14 weak attrs, no extra defs); fft_cpu.cpp is reverted to pristine. * On Windows the empty macro makes the slots ordinary symbols; the build already sets ENABLE_FLOAT_FFTW=ON, so fft_cpu_float.cpp supplies the real FFT_CPU<float> methods. The non-pure FFT_BASE<T> virtuals (which had no body, relying on weak) get trivial bodies in a `#if defined(_WIN32)` block -- never executed (abstract base; backends override what they use). This block is compiled only on Windows. Verified with MinGW g++: constructing FFT_CPU<float> and dispatching through its vtable links (no multiple-definition, no undefined base/derived refs) and runs (no null-vtable crash); and the Linux-simulated preprocess output matches upstream. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * toolchain(windows): cap default build parallelism by available RAM The Windows build defaulted to -j nproc. On a 20-core box, 20 concurrent -O3 compilations of heavy template TUs (source_cell/module_symmetry/symmetry.cpp, read_pp_upf201.cpp, ...) exhausted memory and ninja died with "cc1plus.exe: out of memory allocating N bytes" -- even with 31 GB RAM. Default -j is now min(nproc, MemTotalGB / 3) (~3 GB budget per job), read from /proc/meminfo; an explicit -j still overrides, and the chosen value is printed with a hint to lower it if cc1plus runs out of memory. Falls back to nproc if /proc/meminfo is unreadable. Not a code issue -- the sources compiled fine up to the OOM. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(windows): remove install_windows_native.md This was a working note for the native-Windows build trial, not reference documentation for the repository. Drop it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
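One of the fixes in the commit above, the Binstream change, forces binary mode by appending "b" to the fopen mode when the caller omitted it. The sketch below shows that normalization under the assumption that the mode is handled as a plain string; `force_binary_mode` is a hypothetical helper name, not the function used in Binstream.

```cpp
#include <cassert>
#include <string>

// Return an fopen mode string that always requests binary I/O.
// On POSIX the extra "b" is ignored; on Windows it disables CRLF translation
// and the treatment of 0x1A as end-of-file.
inline std::string force_binary_mode(const std::string& mode)
{
    assert(!mode.empty());
    if (mode.find('b') != std::string::npos)
    {
        return mode; // already binary, leave untouched
    }
    return mode + "b"; // "r" -> "rb", "w" -> "wb", "r+" -> "r+b"
}
```

The result can be passed straight to `std::fopen`, for example `std::fopen(path, force_binary_mode("r").c_str())`.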

…EMM (deepmodeling#7395)
* perf(gint): shape-exact bucketing + tile ladder + wide-LDS vbatched GEMM
  Optimize the GPU gint batched-GEMM path (gemm_{nn,tn}_vbatch, driven from phi_mul_phi / phi_mul_dm) for FP64 on V100/A100-class GPUs.
  - phi_operator_gpu: replace the single max-shape vbatch launch with shape-exact bucketing. Atom pairs are grouped by (nw1, nw2) via a dense NW_MAX*NW_MAX counting-sort table, pre-enumerated once per batch in set_bgrid_batch, so each bucket hands the kernel a scalar (m, n, k) and the tile ladder picks the tightest tile per shape -- no cross-species tile waste, no over-launched blocks. A guard aborts if any atom nw >= NW_MAX.
  - dgemm_vbatch: scalar (m, n, k) dispatch (drops the per-batchid M/N/K device arrays) feeding a 4x2 (NN) / 4x4 (TN) BLK_{M,N} ladder over {8,16,32,48}.
  - gemm_{nn,tn}_vbatch: K-inner shared-memory layout + wide (double2/float4) LDS inner loop -- one 16-byte LDS feeds VK FMAs per (m,n); PAD keeps the shmem stride 16-byte aligned and warp access bank-conflict-free.
  C accumulators stay double regardless of input type T, preserving the mixed-precision fp64-accumulator fix (deepmodeling#7368); the phi_operator kernel optimizations from deepmodeling#7366 (WantPhi dispatch, single-warp reduce) are retained.
  FP64 15-case GPU benchmark: end-to-end ~1.05x (A800) / ~1.04x (V100), with cal_gint_vl up to ~1.5x and cal_gint_rho up to ~1.65x; energies and pressures match develop to ~1e-10 on every case.
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(gint): derive shape-bucket stride from ucell.nwmax, drop hardcoded NW_MAX
  The (nw1, nw2) shape-bucketing in phi_mul_phi / phi_mul_dm flattened pairs into a dense table key via `nw1 * NW_MAX + nw2`, with NW_MAX a hardcoded 64. That was both a magic number and an artificial ceiling: a basis with nw > 64 would abort(), and 64 was only a guess at the real max.
  The true upper bound is already known to the code as ucell.nwmax (max orbital count over all atom types), exposed via gint_gpu_vars_->nwmax. Use it: set nw_stride_ = nwmax + 1 once in the ctor so the bucket table is sized exactly to the basis -- no cap to maintain.
  A runtime stride can't index std::array<int, NW_MAX*NW_MAX>, so the three counting-sort tables (counts / base / cursor) move to mutable std::vector members allocated once and re-zeroed per call. For typical nwmax~25 that's ~676 ints vs the old fixed 4096, so the hot path zeroes less and never reallocates.
  The set_bgrid_batch() abort guard becomes a structurally-unreachable assert, since nwmax is by definition the largest nw. Drop now-unused includes (<array>, <cstdio>, <cstdlib>); add <cassert>.
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(gint): clarify GEMM kernel comments, hoist shape-bucket struct
  Follow-up cleanup on the shape-exact vbatched GEMM path. No behavior change.
  - gemm_{nn,tn}_vbatch, dgemm_vbatch, gint_helper: rewrite the kernel comments to describe the actual mechanism (K-inner shared-memory layout, wide vector loads feeding VK FMAs per load, the tile ladder, fp64 cross-item accumulation) and drop the internal "V1/V3/Phase" development shorthand that carried no meaning outside the original work log.
  - phi_operator_gpu: the local `Bucket` struct was declared identically inside both phi_mul_phi and phi_mul_dm. Hoist it to a named GemmShapeBucket type and reuse a single buckets_ member vector (cleared, not reallocated) across both, reserved once in the ctor -- one less per-call heap allocation on the hot path.
  - phi_operator_gpu: pair_scratch_offset_ is fully overwritten in Pass 1 before Pass 2 reads it, so resize() it instead of assign(..., -1); the -1 sentinel was never observed.
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
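The shape-exact bucketing in these commits is, at its core, a counting sort of atom pairs on the key `nw1 * stride + nw2` with `stride = nwmax + 1`. The host-side sketch below shows that grouping under stated assumptions: `ShapeBucket`, `BucketedPairs`, and `bucket_by_shape` are hypothetical names, and the real code keeps the counts/base/cursor tables as reusable members rather than locals.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one bucket: all pairs sharing (nw1, nw2).
struct ShapeBucket
{
    int nw1;
    int nw2;
    int begin; // first position of this bucket in `order`
    int count; // number of pairs in this bucket
};

struct BucketedPairs
{
    std::vector<int> order;           // pair indices grouped by shape
    std::vector<ShapeBucket> buckets; // one entry per non-empty shape
};

// Counting sort of atom pairs by (nw1, nw2), with the table sized to nwmax.
inline BucketedPairs bucket_by_shape(const std::vector<int>& nw1,
                                     const std::vector<int>& nw2,
                                     int nwmax)
{
    assert(nw1.size() == nw2.size());
    const int stride = nwmax + 1;
    const int npair = static_cast<int>(nw1.size());
    std::vector<int> counts(static_cast<std::size_t>(stride) * stride, 0);
    for (int i = 0; i < npair; ++i)
    {
        assert(nw1[i] >= 0 && nw1[i] <= nwmax && nw2[i] >= 0 && nw2[i] <= nwmax);
        ++counts[nw1[i] * stride + nw2[i]];
    }
    // Exclusive prefix sum gives each shape its start offset.
    BucketedPairs out;
    out.order.resize(npair);
    std::vector<int> cursor(counts.size(), 0);
    int running = 0;
    for (int key = 0; key < stride * stride; ++key)
    {
        cursor[key] = running;
        if (counts[key] > 0)
        {
            out.buckets.push_back({key / stride, key % stride, running, counts[key]});
        }
        running += counts[key];
    }
    // Scatter pair indices into their bucket; original order is kept per bucket.
    for (int i = 0; i < npair; ++i)
    {
        out.order[cursor[nw1[i] * stride + nw2[i]]++] = i;
    }
    return out;
}
```

Each bucket then corresponds to one batched-GEMM launch with a single scalar shape, which is what lets a tile size be chosen per shape rather than for the largest pair in the batch.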

Cstandardlib and others added 30 commits January 21, 2026 10:53

Update init_wfc message to be user-friendly (deepmodeling#6876)

c76692f

Fix compile warning of iteration index(int) undefined behavior (deepm…

b497e7b

…odeling#6874)

[Tests]Fix md_nsteps in 04_FF (50DP&101NEP) (deepmodeling#6872)

c82406b

* Increase md_nstep from 3 to 4 * Increase md_nstep from 3 to 4 in INPUT file

change col_major to row_major (deepmodeling#6885)

175ad10

Refactor wavefunction initialization in lcao_others (deepmodeling#6887)

2182aca

Update PEXSI_VERSION to use GitHub source (deepmodeling#6894)

c1c0d01

Feature: add Hessian operator <\phi|\nabla_x\nabla_y|\phi> (deepmodel…

0c8b6dc

…ing#6888) * Feature: add Hessian operator <\phi|\nabla_x\nabla_y|\phi> * fix: UT of twocenterintegral --------- Co-authored-by: dyzheng <zhengdy@bjaisi.com>

refactor(gint): remove obsolete temp_gint directory (deepmodeling#6899)

272d6b4

The gint_gpu_vars.h file already exists in the kernel directory. This temp_gint directory was left over from a previous refactoring. Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

Refactor&Doc: simplify read_atoms.cpp and update stru.md doc (deepmod…

3561573

…eling#6903)

Fix: enhance several warning messages (deepmodeling#6908)

e9dc0b7

* Fix: Add override to Pot_ML_EXX::cal_v_eff to avoid compilation warning. * Fix: Provide a clearer, friendlier error when ML KEDF is used without ENABLE_MLALGO. * Fix: Add validation for out_elf and spin=4 combo.

Fix: Minor fixes for EXX PW (deepmodeling#6877)

25b3125

* Fix header * Fix header

Feature&Doc: support fix_axes and fix_ibrav for relax_new=false and o…

4ed50cb

…ptimized doc (deepmodeling#6858) * Feature&Doc: support fix_axes and fix_ibrav for relax_new=false * Fix: UT error of fixed_axes

Feature: support ELF with non-collinear spin (nspin = 4) (deepmodelin…

d28f7ca

…g#6912) * Feature: support ELF with non-collinear spin (nspin = 4) * fix: UT for write_elf

Fix: Hefei-NAMD interface of file syns_nao.csr and parameter cal_syns (…

55b847b

…deepmodeling#6910) * Fix: Hefei-NAMD interface of file syns_nao.csr and parameter cal_syns * refactor: esolver to ctrl_scf_lcao

Replace goto with while loop in occupy.cpp (deepmodeling#6919)

c243e11

Fix: Robust handling of bandgap boundary conditions in ElecState and …

95f6450

…add unit tests (deepmodeling#6917) * add change * add numrial fast * add check for cal_bandgap Boundary * Revert "add change" This reverts commit 78444e9. * add back mlago * add back mlago format

Refactor: rename to delete new from names of operators in source_lcao (…

db59dcc

…deepmodeling#6920) * Refactor: rename to delete new from names of operators in source_lcao * Fix: stress of nonlocal * Fix: compiling error --------- Co-authored-by: dyzheng <zhengdy@bjaisi.com>

Docs: Fix doc and warning for init_chg, nspin=1 (deepmodeling#6929)

77d4f65

* Fix docs for init_chg * Fix warning for init_chg * Align format for pseudo warning

Cstandardlib and others added 30 commits May 18, 2026 14:56

fix(dsp): correct dspDestoryHandle to use cluster ID (MY_RANK % dsp_c…

8e50659

…ount) (deepmodeling#7357) dspInitHandle uses MY_RANK % dsp_count but dspDestoryHandle used raw MY_RANK, causing heap corruption when MY_RANK >= dsp_count. Fixes issue deepmodeling#7269.

Modify headers in source_basis to lower the dependency's depth (deepm…

f09748a

…odeling#7364) * Remove useless headers * add type.h include * Remove headers in test files

[Refactor] Remove parameter.h in some files in source_basis (deepmode…

e99d344

…ling#7365) * Remove parameter.h * Continue remove parameter.h * Remove parameter.h dependency in pw_basis * Remove dependency in pw_basis_k

Add the USE_KML option and change abstol/orfac for KML (deepmodeling#…

280e16f

…7370)

Refactor OpenMP directives for clarity (deepmodeling#7374)

b0bca19

Toolchain: Allow controlling parallel jobs for ABACUS build after tool…

78fc55c

…chain setup (deepmodeling#7376)

Update version to v3.11.0-beta.3. (deepmodeling#7397)

29d37f4

Applied data rearrangement and blocking optimizations

ed5b212

Fix: use correct \psi for GPU in cal_tau for ELF (deepmodeling#7378)

d33c34f

* Fix: use correct PSI for GPU in cal_tau for ELF * Refactor: split long cast into readable lines in after_scf

Refactor: Remove obsolete cross-device copy constructor in HamiltPW (…

46fe015

…deepmodeling#7418) * Remove obsolete cross-device copy constructor in HamiltPW * Delete corresponding .h code

relax the parameter constraints of device DSP to allow bndpar+kpar (d…

435c09b

…eepmodeling#7420)

Add optional DFT-D4 support (deepmodeling#7380)

69a663f

* Add optional DFT-D4 support * Docs and tests * Install dftd4 from toolchain in GitHub test * Fix stress calculation * Add regtest * Add D4S model * Add citations

Refactor: move exx files (deepmodeling#7352)

d83621a

* Refactor: move exx files * fix TO_STRING --------- Co-authored-by: linpz <linpz@mail.ustc.edu.cn> Co-authored-by: PeizeLin <78645006+PeizeLin@users.noreply.github.com>

Merge remote-tracking branch 'origin/Rearrange_data' into Grid_integr…

db50223

…al_optimization Merge Rearrange_data into Grid_integral_optimization

Merge remote-tracking branch 'origin/MPI_OPENMP' into Grid_integral_o…

399d42b

…ptimization merge OPENMP to Grid_integral_optimization

Grid integration optimization #371
Auxiliarycirclefzy wants to merge 472 commits into abacusmodeling:develop from Auxiliarycirclefzy:Grid_integral_optimization

Auxiliarycirclefzy commented Jun 7, 2026

20 participants
