Efficient Low-Scaling Calculation of THC-SOS-LR-CC2 and THC-SOS-ADC(2) Excitation Energies Through Density-Based Integral-Direct Tensor Hypercontraction
Abstract
In recent years, rapid improvements in computer hardware, as well as theoretical and algorithmic advances have enabled the calculation of ever larger systems in computational chemistry. In this avenue, we present efficient implementations of the scaled opposite-spin (SOS) second-order approximate coupled cluster (CC2) method and the closely related second-order algebraic diagrammatic construction (ADC(2)) method. Our implementations applies the least-squares tensor hypercontraction (THC) approximation, for which a new density-based integral-direct reformulation of the grid-projection of the electron integral tensor is presented. Together with screening based on local Cholesky orbitals stemming from the decomposition of the one-particle densities (CDD) in the Laplace integration and optimized block-sparse linear algebra, effectively scaling variants of linear-response (LR) SOS-CC2 and SOS-ADC(2) are obtained. The derived CDD-THC-SOS-LR-CC2/ADC(2) methods are shown to be capable of targeting excitation energies for systems with up to ∼1000 atoms and ∼10,000 basis functions on a single compute node.
Affiliations: † Chair of Theoretical Chemistry, Department of Chemistry, 9183University of Munich (LMU), D-81377 Munich, Germany; ‡ Max Planck Institute for Solid State Research, D-70569 Stuttgart, Germany
License: © 2025 The Authors. Published by American Chemical Society CC BY 4.0 This article is licensed under CC-BY 4.0
Article links: DOI: 10.1021/acs.jctc.5c00230 | PubMed: 40353457 | PMC: PMC12120923
Relevance: Relevant: mentioned in keywords or abstract
Full text: PDF (3.4 MB)
Introduction
An accurate description of electronic excited states of chemical systems is crucial for the useful interplay of theory and experiment,ref. ref1 where spectroscopy measures transitions between different quantum states, covering electronic, vibrational, or rotational excitations. Among the spectroscopic methods probing transitions between the ground and electronically excited states, UV–vis spectroscopy, which involves the absorption and emission of photons in the ultraviolet (UV) and visible (vis) regions, and X-ray absorption spectroscopy (XAS), which involves the excitation of core electrons, are two of the most widely used methods.ref2−ref3ref4 Thus, in order to relate the experimentally observed absorption/emission bands to electronic transitions, accurate methods to compute vertical excitation energies are required.
While exact excitation energies canin the limit of the Born–Oppenheimer approximation and a finite basis setbe obtained through the full configuration interaction (FCI) method,ref5,ref6 its exponentially scaling cost renders it inapplicable for all but the smallest systems. Therefore, considerable effort has been put into the formulation of approximate methods, also in combination with reduced scaling techniques to reach ever larger system sizes, preferably without sacrificing accuracy. Here, considerable theoretical as well as algorithmic advances have been made in the past decades.ref6−ref7ref8 The former include reformulations of many of the commonly encountered methods in quantum chemistry under the paradigm of response theory, i.e., the quasidegenerate second-order perturbation corrected configuration interaction singles (CIS(D∞)),ref. ref9 time-dependent density functional theory (TDDFT),ref10−ref11ref12 the family of complete active space self-consistent field (CASSCF)ref13−ref14ref15 methods, algebraic diagrammatic construction (ADC)ref7,ref16−ref17ref18ref19 methods, as well as linear-response coupled cluster (LR-CC)ref20−ref21ref22ref23ref24 theory. Here also approximate CC models were introduced, such as the commonly used approximate CC singles and doubles (CC2).ref. ref25 Without further approximations, CC2 and the closely related second-order ADC (ADC(2)) method exhibit quintic scaling, which necessitates additional algorithmic improvements in order for the methods to be applicable to larger systems. While conceptually simple, the scaled opposite-spin (SOS) approximation by Jung et al. achieves a significant reduction of the scaling prefactor by complete neglect of the expensive same-spin terms, while at the same time largely retaining the accuracy.ref26−ref27ref28ref29ref30 Due to the required transformation of the electron repulsion integral (ERI) tensor into the molecular orbital (MO) basis, the scaling remains as . However, a reduction of the scaling exponent can be achieved by using a factorized form of the ERI tensor, such as the resolution-of-the-identity (RI)ref29,ref31−ref32ref33ref34 approximation or the Cholesky decompositionref35,ref36 in conjunction with the Laplace transformationref29,ref37,ref38 to obtain a separable form of the orbital energy denominator. Recently, the groups of Ochsenfeld and Dreuwref39,ref40 introduced atomic orbital (AO) based formulations of SOS-CC2 and SOS-ADC(2), which achieve subquadratic to linear scaling through a combination of the RI approximation with an attenuated Coulomb metric (ω-RI)ref. ref41 and Cholesky decomposed densities (CDD).ref41−ref42ref43ref44ref45ref46 To achieve even further reduction of both the memory requirements and the number of required floating point operations (FLOP), tensor hypercontraction (THC) by Martínez and co-workersref47−ref48ref49ref50 can be applied, which entirely circumvents the necessity to store and contract third- (or higher-) order tensors. THC variants of CC methods have previously been reported for CC2,ref. ref51 CCSD,ref52−ref53ref54 and CCSD(T)ref. ref55 ground state energies as well as for excitation energies based on equation-of-motion (EOM) CC2.ref. ref56
In this work, we propose a density-matrix-based and integral-direct approach for obtaining the THC-factorized ERI tensor, which adds only a small overhead when applied to an electron correlation method. In this regard, SOS-LR-CC2 and SOS-ADC(2) are ideal methods for the application of THC, since they (1) require the AO-ERI tensor to be repeatedly transformed into different MO subspaces, which can efficiently be done by matrix–matrix multiplications, and (2) only include opposite-spin (OS) contributions, for which THC allows to reformulate the expressions to only use matrix linear algebra without the occurrence of any tensors higher than second order. By combining THC with local Cholesky pseudo-MOs from the CDD approach in the Laplace integration and block-sparse linear algebra, effectively scaling formulations of SOS-LR-CC2 and SOS-ADC(2) are obtained. The proposed excited state methods are benchmarked using a set of medium- to large-size systems to assess the scaling of the error with respect to the system size. The accuracy of the ground state energies is analyzed by considering an additional set previously used by DiStasio et al.ref. ref57 for relative energies. Finally, the efficiency of the proposed methods is demonstrated for nucleic acid double helices up to ∼1000 atoms and ∼10,000 basis functions, which reveals overall scaling.
Theory
Notation
Throughout this work, we employ the following notation:
- μ, ν, λ, σ: atomic orbital indices belonging to the AO basis {χμ} of size N bf.
- α, β, γ, δ: auxiliary function basis indices belonging to the density fitting basis {χα} of size N aux (usually N aux ≈ 3·N bf).
- P, Q, R, S: auxiliary function basis indices belonging to the THC basis; in LS-THC these are equivalent to grid points belonging to the LS-THC grid of size Ngrid (usually N grid ≈ 3·N aux).
- i, j, k: occupied molecular orbital indices belonging to the MO basis {ϕi} of size Nocc.
- a, b, c: virtual molecular orbital indices belonging to the MO basis {ϕa} of size Nvirt (N virt ≫ N occ).
- : occupied local Cholesky orbitals basis of size N occ; obtained via pivoted Cholesky decomposition of the occupied one-electron density.
- p, q, r, s: general orbital indices.
- τ: root of the Laplace quadrature with N τ integration points (usually 5 ≤ N τ ≤ 10 is sufficiently accurate).
Integral-Direct Tensor Hypercontraction
Basics of Tensor Hypercontraction
In its most general form, tensor hypercontraction (THC) is a low-dimensional representation of a multidimensional tensor, whichin the context of quantum mechanicsrepresents the interactions between particles in a system. For a two-body potential , said representation of the electron–electron interactions in a real one-particle basis is the integral tensor, the elements of which are given by
However, in practical quantum chemistry calculations, the manipulation and storage of such a high-dimensional tensor quickly becomes computationally intractable as the system or the basis set size grows. To alleviate this issue, THC provides the means to formally compress the fourth-order integral tensor into 5 second-order tensors as
The factorization is exact if the number of THC auxiliary functions is at least N bf(N bf + 1)/2,ref. ref58 but only reduces computational demands and storage requirements if it is significantly smaller. If sufficient accuracy is reached with significantly less than N bf 2 THC auxiliary functions, the THC factorization enables efficient storage and manipulation of the ERI tensor, making complex calculations feasible for larger systems.
In the least-squares variant of THC (LS-THC),ref. ref48 the THC auxiliary indices are taken to be grid points of a molecular grid similar to the ones commonly used in density function theory. For LS-THC the time-determining step of factorizing the ERI tensor into the THC format is the quintic scaling grid-projection of the ERI tensor, which in the AO basis is given by
where the so-called collocation matrices X are simply the AO basis functions χμ evaluated at the THC grid nodes r P scaled by the node’s weight w P, given as
In the general MO formulation, the above equation becomes
which requires a transformation of the AO ERI tensor into the MO basis. From E, the final Z tensor in eq eq2 is obtained as
where S –1 is the inverse of the THC grid metric, i.e., the inverse of
Note that eq eq6 can be solved either by direct inversion of the grid metric,ref. ref48 which generally requires pseudo-inversion due to its rank-deficiency, or by solving the associated system of linear equations.ref59,ref60
Density-Based Integral-Direct Tensor Hypercontraction
By undoing the AO-to-MO transformations in eq eq5 , i.e.,
and by summing up the MO indices first, the expression can be reformulated in terms of general one-particle density matrices D as
whichwhen contracting the densities with the THC X matricesbecomes
To highlight the universality of this approach, we will use the intermediate as a proxy for both the collocation matrix in the AO basis and when transformed with a general density matrix, i.e.,
Therefore, the expressions for the AO-THC and the MO-THC variant only differ in an additional contraction of the THC X matrices with a density matrix. Consequently, the same routines can be used for the construction of intermediate E in both the AO and an arbitrary MO basis. The latter is particularly convenient since for many correlation methods different kinds of integrals, e.g., (oo|vo), (vo|vo), and (vv|vo) are required. Furthermore, the above expression permits an integral-direct formulation, as outlined in the following, which avoids the prohibitive storage requirements of the full ERI tensor before the transformation into the grid basis. The key idea is that the contraction of the ket side of the ERI tensor can be viewed as N grid Coulomb matrix builds with slices of the joint collocation tensor R, defined as R λσ Q = X λ Q X σ Q, acting as the density matrix. Intermediate E in the AO basis from eq eq3 can then exemplarily be formed according to algorithm 1.
We note here, that for a memory efficient implementation the joint collocation tensor R should not be constructed explicitly. Instead, the required tensor slices R (Q) can be constructed on-the-fly as a vector outer product of all elements of the X tensor belonging to the given grid point Q, i.e., .
Together with the idea to reformulate the MO-THC equations in a density-based manner, the Coulomb matrix-based approach permits an efficient and simultaneous formation of all intermediates required for the integrals occurring in CC2/ADC(2) (see Section sec2.3.1 ) according to algorithm 2.
The key ingredient for an efficient implementation is to perform the expensive formation of the J intermediate once in the ov space, since all integral types, i.e., (oo|vo), (vo|vo), and (vv|vo), share this as a common ket. We note here, that most routines for the construction of Coulomb-type matrices assume the density to be symmetric. This is not the case for slices of the joint collocation tensor R (vo), which is why the transpose is added in line 7 and the resulting matrix is scaled by a factor of 1/2. Based on the resulting J intermediate, the final intermediates E for all integral types can be formed simultaneously without further integral evaluations. We also note that the frozen-core approximation can easily be included in this formulation by simply using the frozen-core density matrix for the occupied space.
To lower the formal scaling behavior, the resolution-of-the-identity approximation (RI)ref31−ref32ref33ref34 can be inserted into eq eq10 , which allows to perform the grid-projection of the bra and the ket side of the ERI tensor separately at reduced scaling. Inserting the RI approximation into eq eq10 leads to
where V is the two-center RI integral tensor. Intermediate Y is given by
and represents one side of the grid-projected ERI tensor. Like for the E intermediate, the formally quartic scaling formation of the Y intermediate can be done in an integral-direct fashion. The final factorization then becomes
with Γ, representing one-half of the Z tensor, defined as
Instead of employing routines for the construction of Coulomb matrices, for the Y intermediate existing routines for the contraction of density matrices with three-center RI integrals can be used. Among these routines, variants optimized for the contraction of multiple densities, such as the J-engine approach to SOS-RI-MP2 by Maurer et al.ref. ref61 or the RI-J implementation by Kussmann et al.ref. ref62 are employed since N grid Coulomb matrix builds need to be performed for each Y intermediate. While these algorithms provide performance improvements over a naïve implementation, in which the Coulomb matrix kernel is simply invoked N grid times, an optimized integral kernel for this kind of contraction is certainly favorable as outlined below.
Efficient Integral-Direct Algorithm for the Y Intermediate
Instead of relying on repetitive J-engine or RI-J based evaluations of the Coulomb potential for many density-like matrices, a more efficient algorithm is proposed inspired by our previous optimal-batching schemeref. ref63 for evaluating correlation energies on the random phase approximation (RPA) level of theory. In the integral-direct variant shown in algorithm 3, the necessary 3-center-2-electron (3c2e) integrals (μ′ν′|α) and the vector outer products R are computed on-the-fly during the formation of the Y intermediate removing the unfavorable memory complexity associated with storing the full third-order tensors. The following discussion is exemplariliy carried out for the virtual-occupied subspace but is applicable to all required Γ intermediates.
The naïve application of the optimization scheme of ref ref. ref63 would predict minimal batch-sizes for the function-pair index μ′ν′ and the auxiliary basis function index α but maximal batch-sizes for the quadrature point index P, since the more points P are included per batch, the more often any computed given 3c2e integral (μ′ν′|α) can be reused for the computation of Y. In practice, however, there are diminishing returns beyond 1000 P points per batch, so the batch size is rounded to the nearest power of 2, i.e., 1024 P points per batch. The batch sizes for α and μ′ν′ are chosen as 96, which is as small as possible, while still allowing for an efficient execution of the matrix–matrix multiplications within the formation of Y in line 17. In practice, the precise batch sizes of each batch slightly deviate from 96 to match a multiple of the number of functions/function-pairs for the respective l-quantum numbers of the shells/shell-pairs within a batch, because the 3c2e integral evaluation is most efficient if operating on full shell-triplets. This approach leads to batches with ∼10,000 elements for (μ′ν′|α) and ∼100,000 elements for , making the algorithm cache-friendly, i.e., all necessary quantities can be stored in temporary static random access memory (SRAM) storage (cache) close to the processing units, even for large systems. Moreover, we always aim for multiples of 16 for the innermost loops and order each participating tensor such that the leading index matches the innermost loop. Both design decisions improve the efficiency of memory accesses (cache-lines) and are very favorable for single instruction multiple data (SIMD) vector execution.
In particular, the 3c2e integrals are stored with the auxiliary-shell index as the leading index, so that the 3c2e integral evaluation can make optimal use of SIMD vector routines parallelizing over shell-triplets, i.e., each SIMD-thread handles a separate shell-triplet. For the most efficient use of SIMD vector routines, the auxiliary basis set is considered fully uncontracted, since varying numbers of primitive Gaussian basis functions per shell would otherwise interfere with vectorization, which requires an identical workload for each thread to be efficient. In practice, this is of little concern, since the auxiliary basis sets used for RI-fitting of electron correlation energies (e.g., the Dunning RI basis setsref64−ref65ref66 employed in Section sec4 ) are usually completelyor at least mostlyuncontracted already. In addition, the transformation from Cartesian to pure (spherical harmonics) basis functions is not performed at the 3-center integral level, instead the whole Y-build is carried out in the Cartesian basis so that only input and output need to be transformed, avoiding any transformations of third-order tensors.
The 3c2e integrals are evaluated using symbolically optimized Obara–Saikaref67,ref68 recursion relations, similar to the integral kernels used for the 3-center-1-electron (3c1e) integrals within our semi-numerical exchange method (sn-LinK),ref69−ref70ref71 but adjusted for 2-electron integrals, i.e., by including recurrence relations for both the AO shell-pair and the auxiliary shell. That is, for any given l-quantum number combination the recursion relations are fully expanded symbolically for each final primitive Cartesian integral within the shell-triplet until each integral is solely expressed in terms of primitive Boys integrals. Subsequently the entire set of equations is symbolically optimized by removing redundant subexpressions within the shell-triplet using common-subexpression-elimination (CSE) as provided by the SymPyref. ref72 package.
Overall, algorithm 3 formally scales as and the matrix–matrix multiplication for the formation of Y in line 17 is by far the slowest step. This formal scaling is easily reduced to asymptotically by exploiting the sparsity of the function-pairs μ′ν′, i.e., only function-pairs belonging to significantly overlapping shell-pairs according to the Schwarz integral are considered. In addition, the batch-wise nature of the algorithm is also straightforward to OpenMP parallelize over multiple CPU cores using both the P-batch as well as the μ′ν′ batch index for parallelization. While P-batch execution is already embarrassingly parallel, parallelization over μ′ν′ requires special treatment of the race-condition associated with accumulation over that index. In practice, the workload is organized such that each thread accumulates as many μ′ν′-batches as possible in a thread-private buffer, so the serial (OpenMP critical section; mutually exclusive between threads) accumulation to the global Y needs to be performed as rarely as possible.
THC-CC2 and THC-ADC(2) for Ground and Excited States
Basics of CC2 and ADC(2)
The SOS-CC2 ground state energy in the THC approximation is defined as
where is the T1-similarity-transformed Hamiltonian and T 2 os is the two-electron excitation operator acting on two electrons with different spins.ref27,ref29 The cluster amplitudes are determined by solving the CC equations, defined by
The Fock matrices are computed using the integral-direct RI-Jref. ref73 and sn-LinKref69−ref70ref71 kernels to form the Coulomb and exchange terms with high efficiency. The explicit expressions are provided in the literature.ref29,ref74 The OS scaling factor is set to c os = 1.3, as for SOS-MP2.ref27,ref29 The solution of eqs eq16 –eq18 scales as for the MO-based THC-SOS-CC2 formulation.ref. ref56 However, the MO energies in the denominator of the double amplitudes can be decoupled using Laplace integration
to rewrite eq eq18 as
The THC matrices in eqs eq16 –eq20 are formed using the transformation matrices
according to
The single amplitudes are obtained by inserting eq eq20 into eq eq17
The working equations of the intermediates in eqs eq25 –eq27 are provided in the Supporting Information. The evaluation of the MO-THC-SOS-CC2 correction to the ground state energy is then achieved with negligible computational cost as
which can also be used to get the related MO-THC-SOS-MP2 energy correction by simply setting t μ1 to zero.
Beyond ground state energies, the SOS-LR-CC2 excitation energies are obtained as eigenvalues of the Jacobian matrix, which is defined as the derivative of the vector functions (eqs eq17 and eq18) with respect to the cluster amplitudesref. ref25 and is given by
where the diagonal double–double block (A μ2ν2 ) is equal to ϵaibj = ϵa – ϵi + ϵb – ϵj. From eq eq30 , secular matrices of simpler excited states methods are derived. The SOS-CIS(D∞) approach introduced by Head-Gordon et al.ref. ref9 can be derived from SOS-LR-CC2 theory by setting the singles part of the ground state cluster amplitudes t 1 to zero, resulting in a vanishing contribution of the T1-similarity transformed Hamiltonian. The SOS-CIS(D∞) Jacobian matrix is given by
where T 2 os denotes the MP2 double amplitudes. In addition, the secular matrix for the SOS-ADC(2) method is related to eq eq31 by the symmetrizationref29,ref30,ref75
As discussed by Krauter et al.,ref. ref30 it is important to derive the SOS-ADC(2) secular matrix from the SOS-CC2 Jabobian in order to reduce the dimension of the block. In this work, the full CISD(D∞) matrix is not symmetrized but only the block, i.e., the non-Hermiticity of the coupling blocks is retained and only the block is scaled by the factor c os.ref. ref29 Using the diagonal form of the double–double block, the doubles part of the excitation vector is obtained as
and the eigenvalue problem of SOS-LR-CC2 and SOS-ADC(2) is solved in the single excitations manifoldref29,ref74 given by
where the effective A matrix is Hermitian for ADC(2) and σ denotes the matrix-vector product.
In order to solve the nonlinear equation part of eq eq34 , the excitation energies and eigenvectors R μ1 have to be found iteratively until self-consistency is reached. If the initial guess of the eigenvector and eigenvalue is close enough to the final results, a common choice for the solution of the nonlinear problem is an algorithm based on the direct inversion in the iterative subspace (DIIS) technique,ref36,ref74,ref76 which was shown to be stable and rapidly convergent.ref29,ref36 In addition, it has the advantage of being a single root algorithm, thus allowing to aim for high-lying excited states without converging all lower-lying states. On the contrary, if the initial guess is far from the converged CC2 or ADC(2) result, the eigenvalue and the eigenvector must be pre-optimized using an alternative algorithm. For this purpose, we use a modification of the Davidson algorithm,ref74,ref77 leveraging the fact that a set of converged eigenvalues and -vectors will fulfill the linear generalized eigenproblem
Both the DIIS and the Davidson procedure for CC2 and ADC(2) are described in literatureref29,ref36,ref74,ref77 and will not be discussed here further. The time-determining step for the solution of the eigenvalue problem is the formation of the matrix-vector product
Making use of the Laplace transformation of the energy denominator, the double part of the singlet excitation vector can be rewritten as
for which the state-specific THC matrices using the transformation matrices
are given according to
Inserting eq eq38 into eq eq36 , reduced scaling expressions for the most time-consuming contributions can be formulated as
where the contained intermediates are given by
The contribution σI,(1) is computed as
with
As for the ground state, the Fock-like terms
are not reformulated using the THC factorization. Here it is noted that by applying the THC decomposition to eq eq37 , a similar expression for the SOS-ADC(2) term is obtained. In fact, the contributions σ G,(2) and σ H,(2) are obtained from eqs eq42 and eq43, respectively, by simply setting t μ1 amplitudes to zero. The contribution σ I,(2) is computed according to
where
and is computed as in eq eq49 by setting and . The explicit expressions for E and in eqs eq36 and eq37, as well as the working equations for the MO-based matrix-vector product are provided in the Supporting Information.
Reformulation of the Ground State Equations
As demonstrated in our previous work on RI-SOS-CC2,ref. ref39 the equations for the ground state energy can be reformulated in a local basis and hence the intermediates can be expressed in terms of the occupied and virtual one-electron densities
in order to benefit from the locality of the electronic structure. Despite decreasing the scaling with respect to the system size for the evaluation of most intermediates, the use of the AO basis increases the scaling with respect to the basis set size for a given system. To counteract this, the Cholesky decomposition of the occupied ground state density matrix with complete pivotingref78,ref79
is applied. In addition, the idempotency relation of the occupied and virtual pseudodensity matrices P τ and Q τ
is used.
For the solution of the SOS-CC2 equations, another set of asymmetric one-electron density matrices is generated from the T1-transformed coefficients:
To obtain an expression for the single amplitudes in the AO basis, eq eq17 in the MO basis is back-transformed to the AO basis according to
for which the one-electron density matrices in eqs eq54 –eq63 permit to rewrite Ωμ1 as
Since the X matrices are sparse due to the locality of the basis functions, the contraction with the T1-transformed one-electron densities can be performed in linear time asref39,ref60,ref70
The working equations in the Cholesky basis are provided in Table together with the asymptotic computational scaling of each step.
1: Working Equations of the Intermediates for the Solution of the CDD-THC-SOS-CC2/MP2 Equations
| Intermediates | Asymptotic Scaling | |
|---|---|---|
| (a) | ||
| (b) | ||
| (c) | ||
| (d) | ||
| (e) | ||
| (f) | ||
| (e′) | ||
| (f′) | ||
| (g) | ||
| (h) | ||
| (i) | ||
| (j) | ||
| (k) | ||
| (l) | ||
| (m) | ||
| (n) |
a Throughout this table Einstein’s summation convention is used.
The order of contractions follows the general ideas presented in ref ref. ref80. Naturally, to minimize the number of FLOP, common intermediates, such as , , and should be reused as much as possible. For this, first all possible X tensors are contracted over the orbital index. For the subsequent formation of the intermediates two possible routes have to be considered. In order to minimize the number of FLOP, the Γ-factorized form of the THC Z tensor should be used and contracted in a total of four matrix–matrix multiplications in steps (d) and (e). However, the fact that the one-electron density becomes sparse for sufficiently large systems with a significant HOMO–LUMO gap can be used here, to perform the step in operations. The first contraction of Z with intermediate in steps (d′) and (f′) scales quadratically, since is sparse, whereas Z is not. For the multiplication with the second Z matrix the fact that the resulting matrix is Schur multiplied with either the or the matrix in steps (i) and (j) can be leveraged. Since intermediate is sparse, the Schur product with intermediate will only contain elements which are significant in . Therefore, only the elements in , which are significant in need to be computed for step (j) and likewise for step (i) with intermediate . In total, this enables the formation of the expensive intermediates in time. All final contractions in steps (k)–(n) are at most quadratically scaling, which results in overall quadratic scaling for the entire CDD-THC-SOS-CC2 method if Z is used. Notice that the same strategy can be leveraged when using Γ resulting in a further reduction of the computational effort. However, the formal computational complexity is not decreased because only the final matrix–matrix multiplication of steps (e) and (f) would scale quadratically. Therefore, the overall scaling behavior for the entire CDD-THC-SOS-CC2 method is cubic if Γ is used.
Reformulation of the Excited State Equations
The strategy outlined in the previous section can likewise be applied to the matrix-vector products of SOS-LR-CC2 and SOS-ADC(2). For simplicity, we will only discuss the reformulation of eq eq36 according to
The fourth and fifth term on the right-hand side are rewritten as
where the intermediates E, given by
depend only on ground state quantities and hence are computed once and stored on disk. The terms σ G,(1), σ H,(1) and σ I,(1) are given as
with
The matrices and , defined in eqs eq44 and eq45, are computed as shown in Table . The general approach for the order of contractions and the reduction of the scaling is similar to the ideas presented for Table . The state-specific THC matrices are given by
where the densities and are defined as
contain the information about the electronic transition. As discussed in Section sec2.3.1 , the SOS-ADC(2) matrix-vector product is easily obtained by setting the single amplitudes to zero and symmetrizing the intermediates in eqs eq75 and eq76. Notice that, eq eq51 can be reformulated as
2: Working Equations for the Evaluation of the σG and σH Contributions to the CDD-THC-SOS-ADC(2)/LR-CC2 Matrix-Vector Product
| Intermediates | Asymptotic Scaling | |
|---|---|---|
| (a) | ||
| (b) | ||
| (c) | ||
| (d) | ||
| (e) | ||
| (f) | ||
| (e′) | ||
| (f′) | ||
| (g) | ||
| (h) | ||
| (i) | ||
| (j) | ||
| (k) | ||
| (l) | ||
| (m) | ||
| (n) | ||
| (o) | ||
| (p) |
a Throughout this table Einstein’s summation convention is used. The ADC(2) equations are obtained by setting t μ1 = 0.
In order to minimize the number of FLOP, common intermediates, such as , , should be reused as much as possible. For this, first all possible X tensors are contracted over the orbital index. The fact that the one-electron density and transition density become sparse can be used here to form the intermediates with scaling behavior. As discussed in Section sec2.3.2 , it is possible to carry out the computation of the intermediates using two possible routes. To obtain the theoretical quadratic scaling behavior in the asymptotic limit, the THC Z tensor can be used and contracted in a total of two matrix–matrix multiplications in steps (e′ and f′) of Table . The first contraction of Z with intermediate in steps (e′ and f′) scales quadratically, since is significantly sparse whereas Z is not. For the multiplication with the second Z matrix the fact that the resulting matrix is Schur multiplied with either the or the matrix in steps (i and j) can be leveraged. Since intermediate is sparse, the Schur product with intermediate will only contain elements that are significant in . Therefore, only the elements in , which are significant in need to be computed for step (j) and likewise for step (i) with intermediate . In total, this enables the formation of the expensive intermediates in time. In order to minimize the number of FLOP, the Γ-factorized form of the THC Z tensor should be used and contracted in a total of four matrix–matrix multiplications in steps (d)–(f) of Table . Leveraging the fact that the , , and intermediates are significantly sparse (for large systems and local electronic excitations) steps (d)-(f) in Table effectively scale as in the asymptotic limit.
The ground state intermediates are recomputed as in Table for each new matrix-vector product and used in steps (k) and (n) of Table . By leveraging the sparsity of in step (k) and in step (n), it is possible to compute with quadratic time complexity. All contractions in steps (i)-(p) are at most linear scaling, which results in overall asymptotic quadratic scaling for the entire CDD-THC-SOS-LR-CC2/ADC(2) methods.
Computational Details
The presented CDD-THC-SOS-LR-CC2 and CDD-THC-SOS-ADC(2) methods as well as the MO-based variants are implemented in the FermiONs++ ref81−ref82ref83 program. The program was compiled with the Intel C/C++ Compiler 2022.0.2 and linked against the Intel Math Kernel Library 2022.0.2 for the employed matrix algebra. All developed code wasas far as possibleparallelized with OpenMP.ref. ref84 The underlying Hartree–Fock calculations have been converged to a maximum element of the error matrix in the DIIS procedure below 10–7. In the SCF, the Coulomb matrix is evaluated using the RI-J approach by Kussmann et al.ref. ref73 using the cc-pVDZ-JKfit or cc-pVTZ-JKfitref. ref85 RI-J basis set, respectively. For the exchange matrix, the seminumerical linear scaling exchange method by Laqua et al.ref70,ref71 is used. For the solution of the least-squares equations in THC the pivoted Cholesky decomposition ansatz by Matthewsref. ref59 is used. The hand-optimized grid for the cc-pVTZ basis set by Kokkila Schumacher et al.ref. ref86 is used as a parent grid together with a pruning threshold of 10–10, unless otherwise noted. As demonstrated in the Supporting Information, it is pivotal to reorder the grid points after the pruning, since the pivoting in the Cholesky decomposition significantly reduces the sparsity of intermediates based on the pruned collocation matrices. The computational performance of the CDD-THC-SOS-LR-CC2 and CDD-THC-SOS-ADC(2) methods is improved using sparse linear algebra. The current implementation leverages block-sparse (BS) matrices, which divide the matrices into smaller blocks of maximum size 96 × 96. Further information about the BS matrix implementation is provided in the Supporting Information of ref ref. ref39. The thresholds controlling the storage sparsity (ϑa) and the matrix–matrix multiply sparsity (ϑm) are set to 10–7 and 10–9, respectively. The threshold controlling the storage sparsity of the transition density matrix is set to 10–4. The single cluster amplitudes and ground state energy at the SOS-CC2 level are optimized via the DIIS procedure, which terminates when the L2-norm of the single vector function is lower than 10–5 and the variation in the energy is lower than 10–6. For excited states calculations, the trial excitation vectors and energies are first optimized at the ADC(1)/CCS level via the Davidson procedure, until the L2-norm of residuals and the variation in the eigenvalues are below 10–3. Then, SOS-LR-CC2 or SOS-ADC(2) excitation vectors and energies are optimized using the DIIS algorithm, which terminates when the L2-norm of the residuals are lower than 10–5 and the variations in the eigenvalues are lower than 10–6. It is important to note that the ADC(1)/CCS trial vectors and eigenvalues can be pre-optimized via the Davidson procedure at the CC2 or ADC(2) level if the DIIS procedure fails to converge to the correct root. Optimized minimax grids with 7 quadrature points for the Laplace expansion are used for both ground and excited states calculations. Moreover, the frozen-core approximation is used in the THC fitting as well as in the ground state and excited states calculations. Notice that calculations are carried out using steps (d)–(f) in Tables and tbl2. Throughout, the Dunning cc-pVXZ (X ∈ {D,T}) basis setsref64,ref65 are used together with their corresponding RI basis sets.ref. ref66 All calculations are performed using a compute node with two AMD EPYC 7452 32-Core 2.35 GHz CPUs (64 cores in total), 1 TB of RAM, and 24 TB of disk space. All runtimes are reported as wall times, not CPU times.
Results
Accuracy
The present work aims to increase the efficiency of the SOS-ADC(2) and SOS-LR-CC2 methods to extend their applicability to molecules with several hundreds of atoms while retaining accuracy as far as possible. To assess the accuracy of the ground state implementation, the errors produced by the MO-based as well as by the CDD-based THC-SOS-MP2/CC2 methods are investigated with respect to MO-RI-SOS-MP2/CC2. For this purpose, the THC error ΔE THC is defined as |E MO‑RI‑SOS‑MP2/CC2 – E MO‑THC‑SOS‑MP2/CC2|, i.e., measuring the error introduced by the THC approximation alone, whereas the CDD error ΔE CDD is defined as |E MO‑THC‑SOS‑MP2/CC2 – E CDD‑THC‑SOS‑MP2/CC2|, i.e., measuring the error introduced by CDD reformulation and the associated use of sparse matrix algebra. For the assessment of the accuracy for excitation energies, the definition of the ΔE THC error and the ΔE CDD error is analogous.
THC Error
Table summarizes the THC errors in terms of the mean absolute deviations (MAD), the maximum absolute errors (MAX), and the root-mean-square deviations (RMSD) for ground state calculations on three different benchmark sets: (1) the Thiel benchmark set,ref. ref87 (2) the benchmark set used by Hohenstein et al.ref. ref56 in their work on THC-EOM-CC2, and (3) a benchmark set comprised of 27 alanine tetrapeptide conformers.ref. ref57 For these sets of small and medium sized molecules, the CDD errors are negligible, cf. Section S5 of the Supporting Information containing detailed results for the three different benchmark sets.
3: Mean Absolute Errors for the Three Different Benchmark Sets in refs , and
| ΔE THC/eV | |||
|---|---|---|---|
| SOS-MP2 | SOS-CC2 | ||
| Thielref. ref87 | |||
| [Abs. Energy] | |||
| MAD | <0.001 | 0.006 | |
| MAX | 0.001 | 0.021 | |
| RMSD | <0.001 | 0.008 | |
| Hohensteinref. ref56 | |||
| [Abs. Energy] | |||
| MAD | 0.002 | 0.024 | |
| MAX | 0.004 | 0.055 | |
| RMSD | 0.002 | 0.029 | |
| Alanine tetrapeptidesref. ref57 | |||
| [Abs. Energy] | |||
| MAD | <0.001 | 0.036 | |
| MAX | 0.001 | 0.039 | |
| RMSD | <0.001 | 0.036 | |
| [Rel. Energy] | |||
| MAD | <0.001 | 0.004 | |
| MAX | <0.001 | 0.043 | |
| RMSD | <0.001 | 0.012 | |
a Errors are reported as mean absolute deviations (MAD), maximum absolute errors (MAX), and root mean square deviations (RMSD) as obtained by the MO-THC-SOS-MP2/CC2 methods relative to the MO-RI-SOS-MP2/CC2 results. All calculations are performed with the cc-pVTZ/cc-pVTZ-RI basis set combination.
Regarding the accuracy of the THC approximation in MP2, MO-THC-SOS-MP2 shows only small deviations of the absolute energies on the order of ∼10–4 to 10–3 eV compared to the reference method MO-RI-SOS-MP2. For CC2, however, the use of THC in MO-THC-SOS-CC2 produces errors that are roughly 1 order of magnitude larger with a maximum error of 0.055 eV. These errors exhibit only modest error cancellation for relative energies, e.g., the RMSD for alanine tetrapeptidesref. ref57 reduces from 0.036 to 0.012 eV going from absolute to relative energies and the maximum error even slightly increases from 0.039 to 0.043 eV. The larger THC errors in CC2 compared to MP2 are due to the inclusion of (ov|vv)-type integrals in the THC fitting as these are necessary for the iterative solution of the CC2 equations, whereas MP2 only requires (ov|ov)-type integrals. Fitting the much larger (ov|vv) subspace with the THC gridsoriginally optimized for MP2ref. ref86is substantially more challenging which explains the observed increase in the THC error going from MP2 to CC2. Note that these findings regarding larger THC errors for CC2 compared to MP2 are mostly in line with previous work,ref51,ref52 albeit somewhat more pronounced due to our more comprehensive collection of benchmark sets and the use of the larger triple-ζ basis sets (cc-pVTZ/cc-pVTZ-RI) in contrast to the double-ζ basis sets used previously. Nevertheless, since the errors are below ∼0.1 eV, they are considered to be acceptable in most applications and are still substantially smaller than the method error of CC2 itself.
Going beyond ground state energies, the applicability of THC-SOS-CC2/ADC(2) for excitation energies is assessed by comparing MO-THC-SOS-LR-CC2/ADC(2) with the RI-based reference implementation. The absolute deviations are summarized in Table for excitations to the three lowest-lying singlet (S) and triplet (T) states of the molecules in the Thiel benchmark set,ref. ref87 as well as in the benchmark set used by Hohenstein et al.ref. ref56 Analogous to the situation for ground state energies, the CDD errors here are also virtually zero, due to the small size of the molecules within the considered benchmark sets. Detailed data including the error for each individual molecule is provided in Section S5 of the Supporting Information.
4: Errors for the Excitation Energies to the Three Lowest-Lying Singlet and Triplet Excited States of the Molecules From the Benchmark Sets From Refs and
| ΔE THC/eV | |||
|---|---|---|---|
| SOS-ADC(2) | SOS-LR-CC2 | ||
| Thielref. ref87 | |||
| [singlet] | |||
| MAD | 0.010 | 0.019 | |
| MAX | 0.050 | 0.055 | |
| RMSD | 0.012 | 0.023 | |
| [triplet] | |||
| MAD | 0.007 | 0.018 | |
| MAX | 0.026 | 0.052 | |
| RMSD | 0.008 | 0.021 | |
| Hohensteinref. ref56 | |||
| [singlet] | |||
| MAD | 0.014 | 0.017 | |
| MAX | 0.041 | 0.067 | |
| RMSD | 0.007 | 0.023 | |
| [triplet] | |||
| MAD | 0.009 | 0.014 | |
| MAX | 0.018 | 0.035 | |
| RMSD | 0.004 | 0.012 | |
a Errors are reported as mean absolute deviations (MAD), maximum absolute errors (MAX), and root mean square deviations (RMSD) from the reference energies obtained with MO-RI-SOS-LR-CC2/ADC(2). All calculations are performed with the cc-pVTZ/cc-pVTZ-RI basis set combination.
Compared to the results for ground state energies, the application of THC to SOS-ADC(2) and SOS-LR-CC2 produces errors that are larger with mean absolute errors up to ∼0.019 eV and maximum errors up to 0.067 eV. This large maximum error is exceptional, as only the excitation to the THC-SOS-LR-CC2 first singlet excited states of acridine-red from the Hohenstein benchmark set shows a deviation as large as ∼0.067 eV. Leaving this outlier, all deviations are within ∼0.055 eV, which we would consider acceptable for most applications. As expected THC-SOS-ADC(2) consistently delivers errors smaller than THC-SOS-LR-CC2. Moreover, triplet excitations are described more accurately compared to singlet excitations, with the largest observed deviation being 0.026 eV for THC-SOS-ADC(2) and 0.052 eV for THC-SOS-LR-CC2. Again, these errors are in line with previous work,ref. ref56 although the exceptional cases with large errors were not seen before and are probably a consequence of the more comprehensive data sets and the use of larger triple-ζ basis sets. Unfortunately, improving the accuracy of THC beyond the limitations imposed by the predefined THC grids is not trivial and beyond the scope of this work. Instead, we rather focus on the reduced-scaling local CDD-based reformulation and its advantages compared to the nonlocal MO-based RI reference algorithms, particularly in terms of accessible molecule sizes.
CDD Error
After having assessed the intrinsic errors associated with LS-THC, we now turn our attention toward the errors introduced by the local Cholesky-MOs and the sparse matrix algebra. Since the errors are only significant for larger molecules, the assessment provided in Tables and tbl6 is carried out for a series of linear carboxylic acids (LCAn) and adenine-thymine base pair stacks (ATn).
5: Absolute ΔE THC and ΔE CDD Errors for the Ground State Energies for a Series of LCAn and ATn Molecules
| SOS-MP2 | SOS-CC2 | |||
|---|---|---|---|---|
| ΔE THC/meV | ΔE CDD/meV | ΔE THC/meV | ΔE CDD/meV | |
| LCA40 | 0.702 | 0.076 | 37.706 | 0.076 |
| LCA80 | 1.499 | 0.767 | 44.305 | 0.767 |
| LCA120 | 2.214 | 1.550 | 51.582 | 1.610 |
| AT1 | 2.511 | 0.007 | 48.738 | 0.002 |
| AT2 | 9.928 | 0.180 | 117.397 | 0.183 |
| AT4 | 21.910 | 0.117 | 233.018 | 0.167 |
a All calculations are performed with the (CDD-)THC-based reformulations of the SOS-MP2 and SOS-CC2 methods and employ the cc-pVTZ/cc-pVTZ-RI basis set combination.
6: Absolute ΔE THC and ΔE CDD Errors for the Excitation Energies to the Lowest-Lying Singlet Excited State for a Series of LCAn Molecules
| SOS-ADC(2) | SOS-LR-CC2 | |||
|---|---|---|---|---|
| ΔE THC/meV | ΔE CDD/meV | ΔE THC/meV | ΔE CDD/meV | |
| LCA40 | 13.393 | 0.002 | 70.689 | 0.021 |
| LCA80 | 12.854 | 0.025 | 64.477 | 0.280 |
| LCA120 | 12.731 | 0.028 | 53.472 | 0.230 |
a All calculations are performed with the (CDD-)THC-based reformulations of the SOS-ADC(2) and SOS-LR-CC2 methods and employ the cc-pVTZ/cc-pVTZ-RI basis set combination.
For the LCAn model systems, the CDD error increases linearly with the system size for ground state calculations using the CDD-THC-based versions of SOS-MP2 or SOS-CC2, as shown in Table . In passing, we note that the THC error also increases linearly with the system size, matching observations in previous work.ref. ref48 Moreover, the CDD error is almost identical between the CDD-THC-SOS-MP2 and the CDD-THC-SOS-CC2 method, whereas the THC error is increased going from SOS-MP2 to SOS-CC2, as discussed in the previous section. Overall, it is demonstrated that the CDD error is on the same order as the THC error only for SOS-MP2 energies of the LCAn series, while in the remaining cases its contribution to the total CDD-THC error is negligible with a maximal error of 1.6 meV.
Regarding the accuracy for excitation energies, Table demonstrates almost constant CDD errors as well as THC errors for the targeted excited states in the LCAn series beyond LCA80. This behavior is expected since the investigated electronic excitations are localized on the carboxylic acid group. This is unlike the ATn series, in which the locality of the excitation is not preserved for different chain lengths. Overall, the CDD errors are still smaller than the THC errors and are not expected to pose significant challenges in practical applications. Moreover, the CDD error can always be converged to the desired accuracy using tighter thresholds.
Considering the demonstrated influence of the application of the LS-THC approximation together with the CDD approach on the accuracy in Tables and tbl6, the focus of the remainder of this work is on the computational improvements gained thereby in terms of runtime, asymptotic scaling behavior, and memory requirements.
Computational Efficiency
The sparsity of the ground state one-electron density is closely related to the energy gap between the highest occupied (HOMO) and the lowest unoccupied MO (LUMO). Given a significant HOMO–LUMO gap, asymptotically linear-scaling variants of many common quantum chemistry methods have been proposed.ref88,ref89 In addition, for excited state calculations the transition density R with elements R μν in eqs eq82 and eq83 must be considered, for which the sparsity is related to the locality of the excitation. In order to assess the computational complexity of the newly proposed density-based grid-projection algorithm for LS-THC and the resulting CDD-THC-SOS-LR-CC2 and CDD-THC-SOS-ADC(2) methods, a series of LCAn molecules is selected as a best-case scenario. LCAn are particularly suitable for the demonstration of the asymptotic time complexity, due to the strong locality of their electronic ground state and first singlet excited state (S1), which is mostly localized on the carboxyl group. In addition, the time complexity of the presented methods is discussed for three-dimensional systems, here representatively DNA fragments in the ATn series.
Integral-Direct Tensor Hypercontraction
First, the efficiency of the proposed density-based, integral-direct algorithm for the grid-projection of the three-center RI integrals, i.e., for the formation of intermediates Y in eq eq13 , is demonstrated. For this, a comparison between the previously published Cholesky MO-based natural blocking approach (nat. block.)ref. ref60 and density-based implementations based on thein FermiONs++ readily availableJ-engine (J-engine)ref. ref61 and RI-J (RI-J)ref. ref62 methods, as well as the newly proposed method in Section sec2.2.3 (this work) is drawn in Table .
7: Wall Time in Hours Required for the Formation of the Y (oo), Y (vo), and Y (vv) Intermediates Based on the Previously Published Natural Blocking (nat. block.) Approach as Well as Density-Based and Integral-Direct Using the J-Engine (J-engine), the RI-J Algorithm (RI-J), and the Proposed Algorithm (this work) From Section
| cc-pVDZ | cc-pVTZ | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| nat. block | J-engine | RI-J | this work | nat. block | J-engine | RI-J | this work | ||
| Y(oo) | |||||||||
| LCA20 | <0.1 | <0.1 | <0.1 | 0.1 | <0.1 | <0.1 | |||
| LCA40 | 0.1 | <0.1 | <0.1 | 0.5 | 0.1 | <0.1 | |||
| LCA80 | 1.2 | 0.1 | <0.1 | 4.9 | 0.3 | <0.1 | |||
| LCA160 | 8.7 | 0.3 | <0.1 | 52.8 | 1.2 | 0.4 | |||
| Y(vo) | |||||||||
| LCA20 | <0.1 | 0.1 | <0.1 | <0.1 | <0.1 | 1.0 | 0.2 | <0.1 | |
| LCA40 | <0.1 | 0.7 | 0.1 | <0.1 | 0.1 | 9.0 | 0.8 | <0.1 | |
| LCA80 | <0.1 | 9.1 | 0.5 | <0.1 | 0.2 | 85.2 | 3.2 | 0.3 | |
| LCA160 | 0.1 | 69.8 | 1.9 | 0.2 | 0.7 | 924.0* | 13.1 | 2.7 | |
| Y(vv) | |||||||||
| LCA20 | 0.1 | 0.1 | <0.1 | 1.3 | 0.4 | <0.1 | |||
| LCA40 | 1.3 | 0.2 | <0.1 | 11.8 | 1.5 | 0.1 | |||
| LCA80 | 14.5 | 1.0 | <0.1 | 113.8 | 5.8 | 0.4 | |||
| LCA160 | 118.0 | 3.8 | 0.3 | 1235.4* | 24.1 | 3.2 | |||
a Since the natural blocking approach was optimized for the formation of Y in the virtual-occupied subspace, only timings for Y (vo) are reported. Extrapolated values are marked with an asterisk (*).
For the natural blocking algorithm, the recommended settings from ref ref. ref60 are used, most notably, attenuated Coulomb RIref. ref41 with an attenuation strength of ω = 0.1. The latter is required to achieve sparsity in the three-center integrals, which makes this approach efficient. In contrast, all density-based approaches use the regular 1/r 12 metric. As previously reported, the natural blocking-based formation of the Y intermediate approaches linear time complexity for the largest molecule sizes and is the fastest among the approaches presented in Table , due to the efficiency of the screening in combination with the best-case sparsity of the LCAn series. In contrast, the RI-J implementation exhibits quadratic scaling due to the considerable sparsity of the R (Q) slices acting as the density matrix in the Coulomb matrix build, which is leveraged by density- and shell-pair-based screening in the RI-J implementation.ref. ref62 Both the J-engine implementationref. ref61 as well as the algorithm proposed in Section sec2.2.3 show cubic time complexity since shell-pair sparsity is leveraged while density-based screening is not employed. It is noteworthy, that, while both variants scale cubically with the system size, the J-engine based method has a significantly higher prefactor, which leads to it being slower than all other variants. In contrast, for the proposedlikewise cubically scalingapproach from Section sec2.2.3 the prefactor is low enough to render the method competitive to the natural blocking approach even for very large and untypically sparse molecules like LCAs. In turn, the proposed algorithm is especially advantageous for more realistic, less sparse systems. In order to demonstrate the effectiveness of the algorithm, the time complexity for the formation of the Y (oo), Y (vo), and Y (vv) intermediates for LCAn up to LCA160 is shown in Figure (left).

While the asymptotic time complexity is cubic as expected for all MO subspaces (oo, ov, vv), the grid-pruning technique from Matthewsref. ref59 reduces the prefactor for forming the Y (oo) intermediate significantly due to the compactness of the occupied–occupied subspace. Additionally, the strong scaling, i.e., the time-to-solution behavior for the same molecule size but increasing numbers of threads, is assessed and shown in Figure (right). Keeping the individual workload high enough on all processors is key to ideal strong scaling speedup. In this regard, good parallel efficiency and almost ideal strong scaling is observed for up to 32 threads, with only slightly diminishing returns beyond.
In summary, while the proposed grid-projection algorithm has asymptotic cubic time complexity with respect to the systems size, its prefactor is diminutive and it exhibits near-perfect strong scaling speedups. Furthermore, it is competitive to the previously published natural blocking approach while only relying on the shell-pair sparsity of the atomic orbitals, which makes it better suited for nonsparse three-dimensional systems.
THC-SOS-LR-CC2 and THC-SOS-ADC(2)
The computational complexity of the proposed CDD-THC-SOS-LR-CC2 and CDD-THC-SOS-ADC(2) implementations is investigated by taking into account the number of floating-point operations (FLOP) and wall times required to form the singles-manifold CC vector and the matrix-vector product for ground and excited state calculations, respectively. In particular, the number of FLOP and timings for the last iteration of the DIIS procedure is assessed. Note that the FLOP count and timings of LR-SOS-CC2 and SOS-ADC(2) are similar because the steps required to form the matrix-vector products are mostly identical. Figure shows the time complexity measured by the number of FLOP (top) and wall times (bottom) for the ground (left) and the excited state calculations (right) for the MO-RI-SOS-LR-CC2/ADC(2) reference implementations (black) in comparison with the proposed MO/CDD-THC-SOS-LR-CC2/ADC(2) methods (colored) for the LCAn series.

The MO-based THC-SOS-LR-CC2 and THC-SOS-ADC(2) algorithms allow for a scaling evaluation of the ground state energy and excitation energies with an effort that is ∼19 and ∼33 times smaller than the respective scaling MO-RI reference variant for the largest molecule size considered. Reformulating the equations for the SOS-CC2 single amplitudes in the local Cholesky MO basis, as described in Section sec2.3.2 and using sparse linear algebra to solve eq eq64 further decreases the computational effort to ∼58-fold. Although the formal scaling behavior is cubic due to steps (e) and (f) in Table , a scaling exponent <3 is observed for the ground state CDD-THC-SOS-CC2 method by leveraging the sparsity, as described in Section sec2.3.2 . For CDD-THC-SOS-LR-CC2 and CDD-THC-SOS-ADC(2) scaling is obtained with a ∼86-fold diminution of the effort to evaluate the matrix-vector product in eq eq72 for the largest LCA size considered. The observed scaling exponent is slightly larger than the theoretically predicted one reported in Table , due to the reduced sparsity of the density matrices within the larger triple-ζ basis. For the smaller cc-pVDZ/cc-pVDZ-RI basis set combination, the observed scaling behavior is closer to the predicted quadratic scaling with an observed time complexity of , which is shown in Section S6.1 of the Supporting Information.
A significant reduction of the computational effort is likewise achieved for three-dimensional DNA fragments with up to 12 AT pairs. On the one hand, the cubically scaling MO-based THC-SOS-CC2 and THC-SOS-LR-CC2/ADC(2) methods grant ∼16-fold and ∼26-fold reduction of the effortrelative to the original MO-RI implementationfor computing the contributions to the single amplitudes in eq eq24 and the matrix-vector product in eq eq37 , respectively, as shown in Figure . On the other hand, the CDD-THC-SOS-CC2 and CDD-THC-SOS-LR-CC2/ADC(2) reformulations allow for the solution of the singles-manifold cluster equation in eq eq64 and the calculation of the matrix-vector product of eq eq72 with a computational effort in terms of FLOP that is ∼27 times smaller than their MO-RI variant in both cases. Consequently, the reduction of the number of FLOP translates to a decrease of the wall times up to 24 times for the ground state and up to 17 times for the excitation energy calculation. Despite the observed behavior for the calculation of the excitation energies of the ATn series, the scaling exponent is expected to decrease toward ∼2 in the asymptotic limit.

Overall, employing CDD-THC-SOS-ADC(2) makes it possible to compute the excitation energy to the two lowest-lying singlet excited states of AT12 for the cc-pVDZ/cc-pVDZ-RI basis set combination in ∼7.5 days, for which the most time-consuming steps are shown in Figure with their individual contributions to the overall wall time. Most of the time is spent computing the ADC(2) excitation energies, which requires ∼72% of the total computation time, while only ∼18% of the total computation time is spent evaluating the THC Γ matrices. Within the routines to evaluate the Γ intermediates, 33% of the time is spent forming the Y intermediates using the presented integral-direct routine. Due to the differing pruned grid sizes, the calculation of the Y (vv) intermediate for the virtual–virtual subspace contributes roughly 50% of the total time for all Y intermediates. Note that the ADC(2) optimization procedure is converged with thresholds of 10–6 and 10–5 for the excitation vector and the energy, respectively. The guesses for the DIIS procedure are optimized at the ADC(1) level via Davidson optimization, which requires ∼10% of the total time. The SCF procedure, the calculation of the MP2 energy correction, and the evaluation of the ground state intermediates E from eq eq76 together requires ∼1% of the total time.

Finally, the memory and storage savings of the THC-based algorithms should be considered which extends the applicability of SOS-ADC(2) and SOS-LR-CC2 to large basis sets and much larger systems, without the need of batching the workload until the ∼1000 atoms scale is reached. Indeed, in the THC-based reformulations only second-order tensors are stored and the space complexity is at most quadratic due to the integral-direct formation of the THC fitting tensors. Moreover, within the CDD-THC-based approach, sparse linear algebra decreases the space complexity of most intermediates, further reducing the memory and storage requirements of the resulting methods. The space complexity is discussed in detail in Section S6.2 of the Supporting Information.
Conclusion
An efficient reformulation of SOS-CC2 for ground state and SOS-LR-CC2 as well as SOS-ADC(2) for excitation energies is presented. The implementation leverages the THC-factorized representation of the ERI tensor in conjunction with local Cholesky pseudo-MOs from the CDD approach together with block-sparse linear algebra for the resulting tensor contractions. For the expensive formation of the Y intermediates, which are the grid-projected three-center RI integrals, an efficient near-perfect strong-scaling integral-direct and density-based method is proposed and implemented. The presented algorithm exhibits cubic time complexity with a very low prefactor, rendering it competitive to the previously published natural blocking approach. Further reduction to quadratic or even linear time complexity would require the use of a local metric, as in the natural blocking approach. In this way, an efficient way to obtain the THC tensors for any orbital space is provided, which is leveraged by THC-based reformulations of SOS-MP2, SOS-(LR-)CC2, and SOS-ADC(2) for both ground state and excitation energy calculations.
The derived methods show reasonable accuracy for both absolute and relative ground state energies. However, due to the larger fitting space of the (ov|vv)-type integrals required in SOS-CC2, the mean absolute errors are increased from ∼10–3 to 10–2 eV compared to SOS-MP2. The LS-THC errors are more pronounced for SOS-ADC(2) and SOS-LR-CC2 excitation energies, especially for the singlet states, for which the maximum errors are as large as ∼0.055 eV. Thus, more work regarding the accuracy of the THC approximation especially for (ov|vv)-type integrals is needed in the future. One possible solution is the use of robust THC,ref. ref90 which however entails further computational complexity and is therefore not suitable for the target applications. In addition, a more comprehensive hierarchy of THC specific integration grids would enable finer grained control over the THC error so that more accurate results could be obtained for applications where this is necessary.
In contrast, the additional use of the CDD approach together with the application of sparse linear algebra, does not incur significant further errors, justifying the applicability of the presented methods. In particular, for small molecules the error introduced by the CDD approach and by sparse algebra is virtually zero as no sparsity can be exploited. For large systems the CDD algorithm proved to yield energies close to the MO-based one with an error in the range of ∼10–5 to ∼10–3 eV, which can be rigorously controlled through the associated thresholds.
Since both SOS-LR-CC2 and SOS-ADC(2) only involve Coulomb-type integral contractions, no higher than second-order tensors have to be formed and all FLOP-intensive contractions can be performed efficiently by matrix–matrix multiplications. Application of the THC approximation yields cubic scaling MO-based algorithmscontrary to the fourth-order scaling RI-based onesand grants significant speedups for the evaluation of both ground and excitation energies. For ground state calculations, leveraging Cholesky-MOs and sparse algebra further reduces the computational effort and scaling, hence providing additional speedups. However, the scaling behavior of the time-determining steps and the overall scaling of CDD-THC-SOS-MP2 and CDD-THC-SOS-CC2 is reduced to only if the Z fitting tensor is used. Contrary to ground state calculations, the use of Cholesky-MOs and sparse algebra reduces the computational scaling of CDD-THC-SOS-ADC(2)/LR-CC2 to quadratic in the asymptotic limit while decreasing the computational effort by about 25-fold for DNA fragments with approximately 800 atoms. The reformulation in the Cholesky-MOs basis and the use of sparse algebra within the LS-THC framework also allow for a significant reduction of the memory demand, which has at most quadratic space complexity. The overall demonstrated computational savings, in both FLOP and wall time required for the solution of the energy expressions, as well as the presented memory savings, render the implemented CDD-THC-based algorithms promising candidates for calculations on large molecular systems.
Supplementary Materials
References
- L. Serrano-Andrés, M. Merchán. Quantum Chemistry of the Excited State: 2005 Overview. J. Mol. Struct. Theochem, 2005. [DOI]
- Hershenson, H. Ultraviolet and Visible Absorption Spectra; Elsevier, 2012.
- Rodger, A. ; Sanders, K. Encyclopedia of Spectroscopy and Spectrometry; Elsevier, 2017; pp 495–502.
- Agarwal, B. K. X-Ray Spectroscopy: An Introduction; Springer, 2013; Vol. 15.
- Sherrill, C. D. ; Schaefer, H. F. Advances in Quantum Chemistry; Elsevier, 1999; Vol. 34, pp 143–269.
- Helgaker, T. ; Jorgensen, P. ; Olsen, J. Molecular Electronic-Structure Theory; John Wiley & Sons, 2013.
- A. Dreuw, A. Papapostolou, A. L. Dempwolff. Algebraic Diagrammatic Construction Schemes Employing the Intermediate State Formalism: Theory, Capabilities, and Interpretation. J. Phys. Chem. A, 2023. [DOI | PubMed]
- T. D. Crawford, A. Kumar, A. P. Bazanté, R. Di Remigio. Reduced-scaling coupled cluster response theory: Challenges and opportunities. Wiley Interdiscip. Rev. Comput. Mol. Sci., 2019. [DOI]
- M. Head-Gordon, M. Oumi, D. Maurice. Quasidegenerate second-order perturbation corrections to single-excitation configuration interaction. Mol. Phys., 1999. [DOI]
- Casida, M. E. Recent Advances in Density Functional Methods, 1999; Vol. 1.
- M. E. Casida, M. Huix-Rotllant. Progress in time-dependent density-functional theory. Annu. Rev. Phys. Chem., 2012. [DOI | PubMed]
- A. Dreuw, M. Head-Gordon. Single-reference ab initio methods for the calculation of excited states of large molecules. Chem. Rev., 2005. [DOI | PubMed]
- B. O. Roos, P. R. Taylor, P. E. Sigbahn. A complete active space SCF method (CASSCF) using a density matrix formulated super-CI approach. Chem. Phys., 1980. [DOI]
- B. O. Roos, K. Andersson, M. P. Fulscher, P.-A. Malmqvist, L. Serrano-Andrés, K. Pierloot, M. Merchán. Multiconfigurational perturbation theory: Applications in electronic spectroscopy. Adv. Chem. Phys., 1996. [DOI]
- K. Andersson, P. A. Malmqvist, B. O. Roos, A. J. Sadlej, K. Wolinski. Second-order perturbation theory with a CASSCF reference function. J. Phys. Chem., 1990. [DOI]
- J. Schirmer. Beyond the random-phase approximation: A new approximation scheme for the polarization propagator. Phys. Rev. A, 1982. [DOI]
- A. Dreuw, M. Wormit. The algebraic diagrammatic construction scheme for the polarization propagator for the calculation of excited states. Wiley Interdiscip. Rev. Comput. Mol. Sci., 2015. [DOI]
- F. Mertins, J. Schirmer. Algebraic propagator approaches and intermediate-state representations. I. The biorthogonal and unitary coupled-cluster methods. Phys. Rev. A, 1996. [DOI | PubMed]
- P. H. Harbach, M. Wormit, A. Dreuw. The third-order algebraic diagrammatic construction method (ADC (3)) for the polarization propagator for closed-shell molecules: Efficient implementation and benchmarking. J. Chem. Phys., 2014. [DOI | PubMed]
- H. Sekino, R. J. Bartlett. A linear response, coupled-cluster theory for excitation energy. Int. J. Quantum Chem., 1984. [DOI]
- H. Koch, P. Jørgensen. Coupled cluster response functions. J. Phys. Chem., 1990. [DOI]
- T. B. Pedersen, H. Koch. Coupled cluster response functions revisited. J. Chem. Phys., 1997. [DOI]
- K. Sneskov, O. Christiansen. Excited state coupled cluster methods. Wiley Interdiscip. Rev. Comput. Mol. Sci., 2012. [DOI]
- O. Christiansen, H. Koch, P. Jørgensen. Response functions in the CC3 iterative triple excitation model. J. Phys. Chem., 1995. [DOI]
- O. Christiansen, H. Koch, P. Jørgensen. The second-order approximate coupled cluster singles and doubles model CC2. Chem. Phys. Lett., 1995. [DOI]
- S. Grimme. Improved second-order Møller-Plesset perturbation theory by separate scaling of parallel-and antiparallel-spin pair correlation energies. J. Chem. Phys., 2003. [DOI]
- Y. Jung, R. C. Lochan, A. D. Dutoi, M. Head-Gordon. Scaled opposite-spin second order Møller-Plesset correlation energy: an economical electronic structure method. J. Chem. Phys., 2004. [DOI | PubMed]
- Y. Jung, Y. Shao, M. Head-Gordon. Fast evaluation of scaled opposite spin second-order Møller-Plesset correlation energies using auxiliary basis expansions and exploiting sparsity. J. Comput. Chem., 2007. [DOI | PubMed]
- N. O. C. Winter, C. Hättig. Scaled Opposite-Spin CC2 for Ground and Excited States with Fourth Order Scaling Computational Costs. J. Chem. Phys., 2011. [DOI | PubMed]
- C. M. Krauter, M. Pernpointner, A. Dreuw. Application of the scaled-opposite-spin approximation to algebraic diagrammatic construction schemes of second order. J. Chem. Phys., 2013. [DOI | PubMed]
- J. L. Whitten. Coulombic Potential Energy Integrals and Approximations. J. Phys. Chem., 1973. [DOI]
- B. I. Dunlap, J. W. Connolly, J. R. Sabin. On Some Approximations in Applications of Xα Theory. J. Chem. Phys., 1979. [DOI]
- M. Feyereisen, G. Fitzgerald, A. Komornicki. Use of Approximate Integrals in Ab Initio Theory. An Application in MP2 Energy Calculations. Chem. Phys. Lett., 1993. [DOI]
- K. Eichkorn, O. Treutler, H. Öhm, M. Häser, R. Ahlrichs. Auxiliary Basis Sets to Approximate Coulomb Potentials. Chem. Phys. Lett., 1995. [DOI]
- N. H. F. Beebe, J. Linderberg. Simplifications in the Generation and Transformation of Two-Electron Integrals in Molecular Calculations. Int. J. Quantum Chem., 1977. [DOI]
- P. Baudin, J. S. Marín, I. G. Cuesta, A. M. J. Sánchez de Merás. Calculation of Excitation Energies from the CC2 Linear Response Theory Using Cholesky Decomposition. J. Chem. Phys., 2014. [DOI | PubMed]
- J. Almlöf. Elimination of energy denominators in MøllerPlesset perturbation theory by a Laplace transform approach. Chem. Phys. Lett., 1991. [DOI]
- M. Häser, J. Almlöf. Laplace transform techniques in Møller–Plesset perturbation theory. J. Chem. Phys., 1992. [DOI]
- F. Sacchetta, D. Graf, H. Laqua, M. A. Ambroise, J. Kussmann, A. Dreuw, C. Ochsenfeld. An Effective Sub-Quadratic Scaling Atomic-Orbital Reformulation of the Scaled Opposite-Spin RI-CC2 Ground-State Model Using Cholesky-decomposed Densities and an Attenuated Coulomb Metric. J. Chem. Phys., 2022. [DOI | PubMed]
- M. A. Ambroise, F. Sacchetta, D. Graf, C. Ochsenfeld, A. Dreuw. Scaled Opposite-Spin Atomic-Orbital Based Algebraic Diagrammatic Construction Scheme for the Polarization Propagator with Asymptotic Linear-Scaling Effort: Theory and Implementation. J. Chem. Phys., 2023. [DOI | PubMed]
- A. Luenser, H. F. Schurkus, C. Ochsenfeld. Vanishing-Overhead Linear-Scaling Random Phase Approximation by Cholesky Decomposition and an Attenuated Coulomb-Metric. J. Chem. Theory Comput., 2017. [DOI | PubMed]
- S. A. Maurer, L. Clin, C. Ochsenfeld. Cholesky-decomposed density MP2 with density fitting: Accurate MP2 and double-hybrid DFT energies for large systems. J. Chem. Phys., 2014. [DOI | PubMed]
- M. Glasbrenner, D. Graf, C. Ochsenfeld. Efficient Reduced-Scaling Second-Order Møller-Plesset Perturbation Theory with Cholesky-Decomposed Densities and an Attenuated Coulomb Metric. J. Chem. Theory Comput., 2020. [DOI | PubMed]
- M. Beer, C. Ochsenfeld. Efficient linear-scaling calculation of response properties: Density matrix-based Laplace-transformed coupled-perturbed self-consistent field theory. J. Chem. Phys., 2008. [DOI | PubMed]
- M. Beuerle, D. Graf, H. F. Schurkus, C. Ochsenfeld. Efficient calculation of beyond RPA correlation energies in the dielectric matrix formalism. J. Chem. Phys., 2018. [DOI | PubMed]
- D. Graf, M. Beuerle, C. Ochsenfeld. Low-scaling self-consistent minimization of a density matrix based random phase approximation method in the atomic orbital space. J. Chem. Theory Comput., 2019. [DOI | PubMed]
- E. G. Hohenstein, R. M. Parrish, T. J. Martínez. Tensor hypercontraction density fitting. I. Quartic scaling second- and third-order Møller-Plesset perturbation theory. J. Chem. Phys., 2012. [DOI | PubMed]
- R. M. Parrish, E. G. Hohenstein, T. J. Martínez, C. D. Sherrill. Tensor Hypercontraction. II. Least-squares Renormalization. J. Chem. Phys., 2012. [DOI | PubMed]
- R. M. Parrish, E. G. Hohenstein, T. J. Martínez, C. D. Sherrill. Discrete Variable Representation in Electronic Structure Theory: Quadrature Grids for Least-Squares Tensor Hypercontraction. J. Chem. Phys., 2013. [DOI | PubMed]
- R. M. Parrish, E. G. Hohenstein, N. F. Schunck, C. D. Sherrill, T. J. Martínez. Exact Tensor Hypercontraction: A Universal Technique for the Resolution of Matrix Elements of Local Finite-Range N-body Potentials in Many-Body Quantum Problems. Phys. Rev. Lett., 2013. [DOI | PubMed]
- E. G. Hohenstein, S. I. L. Kokkila, R. M. Parrish, T. J. Martínez. Quartic Scaling Second-Order Approximate Coupled Cluster Singles and Doubles via Tensor Hypercontraction: THC-CC2. J. Chem. Phys., 2013. [DOI | PubMed]
- R. M. Parrish, C. D. Sherrill, E. G. Hohenstein, S. I. L. Kokkila, T. J. Martínez. Communication: Acceleration of Coupled Cluster Singles and Doubles via Orbital-Weighted Least-Squares Tensor Hypercontraction. J. Chem. Phys., 2014. [DOI | PubMed]
- R. Schutski, J. Zhao, T. M. Henderson, G. E. Scuseria. Tensor-Structured Coupled Cluster Theory. J. Chem. Phys., 2017. [DOI | PubMed]
- E. G. Hohenstein, B. S. Fales, R. M. Parrish, T. J. Martínez. Rank-Reduced Coupled-Cluster. III. Tensor Hypercontraction of the Doubles Amplitudes. J. Chem. Phys., 2022. [DOI | PubMed]
- A. Jiang, J. M. Turney, H. F. Schaefer. Tensor Hypercontraction Form of the Perturbative Triples Energy in Coupled-Cluster Theory. J. Chem. Theory Comput., 2023. [DOI | PubMed]
- E. G. Hohenstein, S. I. Kokkila, R. M. Parrish, T. J. Martínez. Tensor Hypercontraction Equation-of-Motion Second-Order Approximate Coupled Cluster: Electronic Excitation Energies in O(N4) Time. J. Phys. Chem. B, 2013. [DOI | PubMed]
- R. A. DiStasio, Y. Jung, M. Head-Gordon. A resolution-of-the-identity implementation of the local triatomics-in-molecules model for second-order Møller-Plesset perturbation theory with application to alanine tetrapeptide conformational energies. J. Chem. Theory Comput., 2005. [DOI | PubMed]
- D. A. Matthews. A Critical Analysis of Least-Squares Tensor Hypercontraction Applied to MP3. J. Chem. Phys., 2021. [DOI | PubMed]
- D. A. Matthews. Improved Grid Optimization and Fitting in Least Squares Tensor Hypercontraction. J. Chem. Theory Comput., 2020. [DOI | PubMed]
- F. H. Bangerter, M. Glasbrenner, C. Ochsenfeld. Low-Scaling Tensor Hypercontraction in the Cholesky Molecular Orbital Basis Applied to Second-Order Møller-Plesset Perturbation Theory. J. Chem. Theory Comput., 2021. [DOI | PubMed]
- S. A. Maurer, J. Kussmann, C. Ochsenfeld. Communication: A Reduced Scaling J-engine Based Reformulation of SOS-MP2 Using Graphics Processing Units. J. Chem. Phys., 2014. [DOI | PubMed]
- J. Kussmann, H. Laqua, C. Ochsenfeld. Highly Efficient Resolution-of-Identity Density Functional Theory Calculations on Central and Graphics Processing Units. J. Chem. Theory Comput., 2021. [DOI | PubMed]
- V. Drontschenko, D. Graf, H. Laqua, C. Ochsenfeld. Lagrangian-Based Minimal-Overhead Batching Scheme for the Efficient Integral-Direct Evaluation of the RPA Correlation Energy. J. Chem. Theory Comput., 2021. [DOI | PubMed]
- T. H. Dunning. Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen. J. Chem. Phys., 1989. [DOI]
- D. E. Woon, T. H. Dunning. Gaussian basis sets for use in correlated molecular calculations. III. The atoms aluminum through argon. J. Chem. Phys., 1993. [DOI]
- F. Weigend, A. Köhn, C. Hättig. Efficient Use of the Correlation Consistent Basis Sets in Resolution of the Identity MP2 Calculations. J. Chem. Phys., 2002. [DOI]
- S. Obara, A. Saika. Efficient recursive computation of molecular integrals over Cartesian Gaussian functions. J. Chem. Phys., 1986. [DOI]
- S. Obara, A. Saika. General recurrence formulas for molecular integrals over Cartesian Gaussian functions. J. Chem. Phys., 1988. [DOI]
- H. Laqua, J. Kussmann, C. Ochsenfeld. Efficient and Linear-Scaling Seminumerical Method for Local Hybrid Density Functionals. J. Chem. Theory Comput., 2018. [DOI | PubMed]
- H. Laqua, T. H. Thompson, J. Kussmann, C. Ochsenfeld. Highly efficient, linear-scaling seminumerical exact-exchange method for graphic processing units. J. Chem. Theory Comput., 2020. [DOI | PubMed]
- H. Laqua, J. Kussmann, C. Ochsenfeld. Accelerating seminumerical Fock-exchange calculations using mixed single-and double-precision arithmethic. J. Chem. Phys., 2021. [DOI | PubMed]
- A. Meurer, C. P. Smith, M. Paprocki, O. Čertík, S. B. Kirpichev, M. Rocklin, A. Kumar, S. Ivanov, J. K. Moore, S. Singh, T. Rathnayake, S. Vig, B. E. Granger, R. P. Muller, F. Bonazzi, H. Gupta, S. Vats, F. Johansson, F. Pedregosa, M. J. Curry, A. R. Terrel, v. Roučka, A. Saboo, I. Fernando, S. Kulal, R. Cimrman, A. Scopatz. SymPy: symbolic computing in Python. PeerJ. Computer Science, 2017. [DOI]
- J. Kussmann, H. Laqua, C. Ochsenfeld. Highly efficient resolution-of-identity density functional theory calculations on central and graphics processing units. J. Chem. Theory Comput., 2021. [DOI | PubMed]
- C. Hättig, F. Weigend. CC2 excitation energy calculations on large molecules using the resolution of the identity approximation. J. Chem. Phys., 2000. [DOI]
- C. Hättig. Structure optimizations for excited states with correlated second-order methods: CC2 and ADC(2). Adv. Quantum Chem., 2005. [DOI]
- P. Pulay. Improved SCF convergence acceleration. J. Comput. Chem., 1982. [DOI]
- D. Kats, M. Schütz. A multistate local coupled cluster CC2 response method based on the Laplace transform. J. Chem. Phys., 2009. [DOI | PubMed]
- N. J. Higham. Cholesky factorization. Wiley Interdiscip. Rev. Comput. Stat., 2009. [DOI]
- H. Harbrecht, M. Peters, R. Schneider. On the low-rank approximation by the pivoted Cholesky decomposition. Appl. Numer. Math., 2012. [DOI]
- F. H. Bangerter, M. Glasbrenner, C. Ochsenfeld. Tensor-Hypercontracted MP2 First Derivatives: Runtime and Memory Efficient Computation of Hyperfine Coupling Constants. J. Chem. Theory Comput., 2022. [DOI | PubMed]
- J. Kussmann, C. Ochsenfeld. Pre-selective screening for matrix elements in linear-scaling exact exchange calculations. J. Chem. Phys., 2013. [DOI | PubMed]
- J. Kussmann, C. Ochsenfeld. Preselective screening for linear-scaling exact exchange-gradient calculations for graphics processing units and general strong-scaling massively parallel calculations. J. Chem. Theory Comput., 2015. [DOI | PubMed]
- J. Kussmann, C. Ochsenfeld. Hybrid CPU/GPU Integral Engine for Strong-Scaling Ab Initio Methods. J. Chem. Theory Comput., 2017. [DOI | PubMed]
- L. Dagum, R. Menon. OpenMP: an industry standard API for shared-memory programming. Computational Science & Engineering, IEEE, 1998. [DOI]
- F. Weigend. A Fully Direct RI-HF Algorithm: Implementation, Optimised Auxiliary Basis Sets, Demonstration of Accuracy and Efficiency. Phys. Chem. Chem. Phys., 2002. [DOI]
- S. I. L. Kokkila Schumacher, E. G. Hohenstein, R. M. Parrish, L. P. Wang, T. J. Martínez. Tensor Hypercontraction Second-Order Møller-Plesset Perturbation Theory: Grid Optimization and Reaction Energies. J. Chem. Theory Comput., 2015. [DOI | PubMed]
- M. Schreiber, M. R. Silva-Junior, S. Sauer, W. Thiel. Benchmarks for electronically excited states: CASPT2, CC2, CCSD, and CC3. J. Chem. Phys., 2008. [DOI | PubMed]
- P. E. Maslen, C. Ochsenfeld, C. A. White, M. S. Lee, M. Head-Gordon. Locality and sparsity of ab initio one-particle density matrices and localized orbitals. J. Phys. Chem. A, 1998. [DOI]
- C. Ochsenfeld, J. Kussmann, D. S. Lambrecht. Linear-scaling methods in quantum chemistry. Rev. Comput. Chem., 2007. [DOI]
- K. Pierce, V. Rishi, E. F. Valeev. Robust Approximation of Tensor Networks: Application to Grid-Free Tensor Factorization of the Coulomb Interaction. J. Chem. Theory Comput., 2021. [DOI | PubMed]
