...one of the most highly
regarded and expertly designed C++ library projects in the
world.

— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards

#include <boost/math/tools/estrin.hpp>

namespace boost { namespace math { namespace tools { // Advanced interface: Use if you can preallocate a scratch buffer of size (coeffs.size() +1)/2: // The scratch buffer size is *unchecked* in release compiles, so use at your own risk! template<typename RandomAccessContainer1, typename RandomAccessContainer2, typename RealOrComplex> inline auto evaluate_polynomial_estrin(RandomAccessContainer1 const & coeffs, RandomAccessContainer2& scratch, RealOrComplex z); // Template specialization for std::array, no preallocation is necessary so massive performance improvements are typically observed: template <typename RealOrComplex1, size_t n, typename RealOrComplex2> inline RealOrComplex2 evaluate_polynomial_estrin(const std::array<RealOrComplex1, n> &coeffs, RealOrComplex2 z); }}} // namespaces

Boost.math provided free functions which evaluate polynomials by Estrin's method.
Though Estrin's method is not optimal from the standpoint of minimizing arithmetic
operations (that claim goes to Horner's method), it nonetheless is well-suited
to SIMD pipelines on modern CPUs. For example, on an 2022 M1 Pro, evaluating
a double precision polynomial of length N using Estrin's method with scratch
space takes 0.28 N nanoseconds for large N, whereas Horner's method takes 1.24
N ns. If you know your polynomial coefficients at compile time and can store
them in a `std::array`

, then Estrin's method is unconditionally
faster. If the coefficients are computed at runtime, then only for N roughly
greater than 90 is Estrin's method better. These numbers are highly dependent
on compiler flags and architecture; ensure the compiler is allowed to emit
vector instructions and fmas to take full advantage of the benefits of Estrin's
method.

To measure the performance on your system, refer to the google benchmark file
`reporting/performance/estrin_performance.cpp`

.

Note that Estrin's method is less accurate that Horner's method. Refer to
`example/estrin_vs_horner_accuracy.cpp`

for details.

- Muller, Jean-Michel (2005). Elementary Functions: Algorithms and Implementation (2nd ed.). Birkhäuser. p. 58. ISBN 0-8176-4372-9.