Expressive Vector Engine – SIMD in C++
74 points by klaussilveira 5 days ago | 32 comments
  • vblanco 2 days ago |
    Interesting library, but I see it falls into the same trap as almost all SIMD libraries: they hardcode the vector target completely, and you can't mix and match feature levels within a build. The documentation recommends writing your kernels into DLLs and loading them dynamically, which is a huge mess: https://jfalcou.github.io/eve/multiarch.html

    Meanwhile xsimd (https://github.com/xtensor-stack/xsimd) has the feature level as a template parameter on its vector objects, which lets you branch at runtime between SIMD levels as you wish. I find it's a far better way of doing things if you actually want to ship the SIMD code to users.

    • spacechild1 2 days ago |
      Thanks, that's an important caveat!

      > Meanwhile xsimd (https://github.com/xtensor-stack/xsimd) has the feature level as a template parameter on its vector objects

      That's pretty cool because you can write function templates and instantiate different versions that you can select at runtime.

      • vblanco 2 days ago |
        Yeah, that's the fun of it: you write your kernel/function so that the SIMD level is a template parameter, and then you can use simple branching like:

        if(supports<avx512>){ myAlgo<avx512>(); } else{ myAlgo<avx>(); }

        I've also used it for benchmarking, to see whether my code scales well to different SIMD widths, and it's a huge help.
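        A minimal sketch of that pattern, assuming hypothetical arch tags (`scalar_arch`, `sse_like`, `avx_like`, `avx512_like`) that carry nothing but a lane count. This is not xsimd's API and it emits no real SIMD instructions; it only shows the structure: the kernel is written once with the SIMD level as a template parameter, and each instantiation processes a different number of lanes per step, which is what makes comparing widths easy.

        ```cpp
        #include <cstddef>
        #include <vector>

        // Hypothetical arch tags: each carries only a compile-time lane count.
        // A real library attaches actual register types and intrinsics instead.
        struct scalar_arch { static constexpr std::size_t lanes = 1; };
        struct sse_like    { static constexpr std::size_t lanes = 4; };
        struct avx_like    { static constexpr std::size_t lanes = 8; };
        struct avx512_like { static constexpr std::size_t lanes = 16; };

        // The kernel is written once; Arch decides how many elements form one "vector".
        template <typename Arch>
        float sum(const std::vector<float>& v) {
            constexpr std::size_t W = Arch::lanes;
            float acc[W] = {};                       // one partial sum per lane
            std::size_t i = 0;
            for (; i + W <= v.size(); i += W)        // full vectors
                for (std::size_t l = 0; l < W; ++l)
                    acc[l] += v[i + l];
            float total = 0.0f;
            for (std::size_t l = 0; l < W; ++l)      // horizontal reduction
                total += acc[l];
            for (; i < v.size(); ++i)                // scalar tail
                total += v[i];
            return total;
        }
        ```

        On a 37-element input of ones, `sum<sse_like>(v)` and `sum<avx512_like>(v)` both give 37; for general floating-point data the different grouping of additions can change the last bits of the result.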

        • dyaroshev 2 days ago |
          FYI: You don't want to do this. `supports<avx512>` is an expensive check. You really want to put this check in a static.
          • spacechild1 a day ago |
            I guess this was just pseudo-code. Of course you don't want to do a runtime feature check over and over again.
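            Taking the advice above about putting the check in a static, a minimal sketch for GCC or Clang on x86. `__builtin_cpu_supports` and `__attribute__((target("avx2")))` are real compiler extensions; `scale`, `scale_generic`, `scale_avx2` and `pick_scale` are hypothetical names. Both variants share a plain loop here (the AVX2 copy is simply compiled with AVX2 enabled); in a real library they would be instantiations of a templated kernel such as `myAlgo`.

            ```cpp
            #include <cassert>
            #include <cstddef>

            // Hypothetical kernel: scale a buffer in place. Portable fallback.
            static void scale_generic(float* p, std::size_t n, float k) {
                for (std::size_t i = 0; i < n; ++i) p[i] *= k;
            }

            #if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
            // Same body, but the compiler may use AVX2 here, so it must only be
            // called after a runtime check confirms the CPU supports AVX2.
            __attribute__((target("avx2")))
            static void scale_avx2(float* p, std::size_t n, float k) {
                for (std::size_t i = 0; i < n; ++i) p[i] *= k;
            }
            #endif

            using scale_fn = void (*)(float*, std::size_t, float);

            // The (comparatively expensive) CPU feature query lives here.
            static scale_fn pick_scale() {
            #if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
                if (__builtin_cpu_supports("avx2")) return scale_avx2;
            #endif
                return scale_generic;
            }

            void scale(float* p, std::size_t n, float k) {
                // Initialized once, on first call (thread-safe since C++11);
                // later calls are just an indirect call through the cached pointer.
                static const scale_fn impl = pick_scale();
                impl(p, n, k);
            }
            ```

            The function-local static means the feature query is paid once per process rather than on every call.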
    • kookamamie 2 days ago |
      100% agreed. This is the main reason ISPC is my go-to tool for explicit vectorization.
    • janwas 2 days ago |
      +1, dynamic dispatch is important. Our Highway library has extensive support for this.

      Detailed intro by kfjahnke here: https://github.com/kfjahnke/zimt/blob/multi_isa/examples/mul...

    • vlovich123 2 days ago |
      Since you seem knowledgeable about this, what does this do differently from other SIMD libraries like xsimd / highway? Is it the addition of algorithms similar to the STD library that are explicitly SIMD optimized?
      • dyaroshev 2 days ago |
        I tried to make the algorithms as good as I knew how; maybe 95% there. Nice tail handling. A lot of things are supported. I like our interface better than the alternatives, but I'm biased here. A really massive math library.
    • dyaroshev 2 days ago |
      Our answer to this is dynamic dispatch: if you want multiple versions of the same kernel compiled, compile multiple DLLs.

      The big problem here is ODR violations. We really didn't want to do the xsimd thing of forcing the user to pass an arch everywhere.

      Also, that kind of defeats the purpose of "SIMD portability": any code written for AVX2 can't work on an ARM platform.

      eve just works everywhere.

      Example: https://godbolt.org/z/bEGd7Tnb3
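      For contrast with passing an arch everywhere, a minimal sketch of the "one native width per build" model: the lane count comes from the compiler's predefined target macros (`__AVX512F__`, `__AVX__`, `__SSE2__`, `__ARM_NEON`), so the kernel never names an architecture and the same file builds for x86 or ARM. `native_lanes` and `count_above` are hypothetical names, not EVE's API (in EVE the native-width type is `eve::wide<T>`), and the inner loops only stand in for vector instructions.

      ```cpp
      #include <cstddef>

      // Lane count for float, fixed at compile time by the target flags:
      // e.g. 8 when built with -mavx2, 4 on AArch64 (NEON), 1 otherwise.
      constexpr std::size_t native_lanes =
      #if defined(__AVX512F__)
          16;
      #elif defined(__AVX__)
          8;
      #elif defined(__SSE2__) || defined(__ARM_NEON) || defined(__ARM_NEON__)
          4;
      #else
          1;
      #endif

      // Portable kernel: count elements greater than a threshold.
      std::size_t count_above(const float* p, std::size_t n, float t) {
          constexpr std::size_t W = native_lanes;
          std::size_t count = 0, i = 0;
          for (; i + W <= n; i += W)          // whole native-width blocks
              for (std::size_t l = 0; l < W; ++l)
                  count += p[i + l] > t;      // stands in for a vector compare + popcount
          for (; i < n; ++i)                  // tail shorter than one block
              count += p[i] > t;
          return count;
      }
      ```

      The flip side, as discussed in this thread, is that each build targets exactly one feature level, so shipping several levels means compiling the file several times, hence the DLL recommendation.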

      • janwas 2 days ago |
        It is possible to avoid ODR violations :) We put the per-target code into unique namespaces, and export a function pointer to them.
        • dyaroshev 2 days ago |
          You can do many things with macros and inline namespaces, but I believe they run into problems when modules come into play. Can you compile the same code twice, with different flags, with modules?
          • janwas a day ago |
            We use pragma target instead of compiler flags :)
            • dyaroshev a day ago |
              I don't think we understand each other.

              We want to take one function and compile it twice:

              ```
              namespace MEGA_MACRO {

              void foo(std::span<int> s) { super_awesome_platform_specific_thing(s); }

              } // namespace MEGA_MACRO
              ```

              Whatever you do - the code above has to be written once but compiled twice. In one file/in many files - doesn't matter.

              My point is - I don't think you can compile that code twice if you support modules.

              • janwas a day ago |
                I think I do understand, this is exactly what we do. (MEGA_MACRO == HWY_NAMESPACE)

                Then we have a table of function pointers to &AVX2::foo, &AVX3::foo, etc. As long as the module exports one single thing, which either calls into or exports this table, I do not see how this is incompatible with building your project with modules enabled.

                (The way we compile the code twice is to re-include our source file, taking care that only the SIMD parts are actually seen by the compiler, and stuff like the module exports would only be compiled once.)
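                A compressed sketch of that scheme with hypothetical names (`DEFINE_KERNELS`, `N_SSE4`, `N_AVX2`, `sum_dispatch`). Highway re-includes the source file and uses `HWY_NAMESPACE`; here a macro stands in for the re-include, and no per-target compiler options are applied, so both copies compile identically. The point is that the copies live in distinct namespaces, so there is no ODR clash, and only the dispatching function needs to be exported.

                ```cpp
                #include <cstddef>

                // Stand-in for "re-including the source file": the kernel body is
                // written once and expanded inside one namespace per target.
                #define DEFINE_KERNELS                                   \
                    int sum(const int* p, std::size_t n) {               \
                        int s = 0;                                       \
                        for (std::size_t i = 0; i < n; ++i) s += p[i];   \
                        return s;                                        \
                    }

                namespace N_SSE4 { DEFINE_KERNELS }  // distinct namespaces, so the two
                namespace N_AVX2 { DEFINE_KERNELS }  // definitions of sum do not collide

                #undef DEFINE_KERNELS

                using sum_fn = int (*)(const int*, std::size_t);

                // Table of function pointers, one entry per compiled target.
                enum Target : std::size_t { kSSE4 = 0, kAVX2 = 1, kNumTargets };
                constexpr sum_fn kSumTable[kNumTargets] = {&N_SSE4::sum, &N_AVX2::sum};

                // The single entry point to export; the caller passes a target index
                // chosen once from a CPU feature check.
                int sum_dispatch(Target t, const int* p, std::size_t n) {
                    return kSumTable[t](p, n);
                }
                ```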

                • dyaroshev a day ago |
                  > is to re-include our source file

                  Yeah - that means your source file is never a module. We would really like eve to be modularized, the CI times are unbearable.

                  I'd love to be proven wrong here, that'd be amazing. But I don't think Google Highway can be modularized.

                  • janwas a day ago |
                    What leads you to that conclusion? It is still possible to use #include in module implementations. We can use that to make the module implementation look like your example.

                    Thus it ought to be possible, though I have not yet tried it.

                    • dyaroshev a day ago |
                      Well.

                      You have a file, something like: load.h

                      You need to include it multiple times, compiled with different flags.

                      So - it's never going to be in load.cxx or whatever that's called.

                      • janwas a day ago |
                        As mentioned ("re-include our source file"), we are indeed able to put the SIMD code, as well as the self-#include, in a load.cxx TU.

                        Here is an example: https://github.com/google/gemma.cpp/blob/9dfe2a76be63bcfe679...

                        • dyaroshev 21 hours ago |
                          I don't think this works if your files are modules.

                          Let's stop here, it doesn't seem like we understand each other.

  • nickpsecurity 2 days ago |
    I also found this looking for portable SIMD:

    https://github.com/google/highway

  • shadowpho 2 days ago |
    Wait, what about AMD? They only claim support for Intel and ARM.
    • Sadiinso 2 days ago |
      « AMD » is x86
    • dyaroshev 2 days ago |
      We support AMD pretty well. I tested on Zen 1 and, a bit, on Zen 4.
  • Conscat 2 days ago |
    EVE is personally my favorite SIMD library in any programming language. It's the only one I've tried that provides masked lane operations in a declarative style, aside from SPMD languages like CUDA or OpenMP. The [] syntax for that is admittedly pretty exotic C++, but I think the usefulness of the feature is worth it. I wish the documentation was better, though. When I first started, I struggled to figure out how to simply make a 4-lane float vector that I can pass into shaders, because almost all of the examples are written for the "wide" native-SIMD size.
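    To illustrate what a masked lane operation means, a scalar emulation with hypothetical `f32x4`/`mask4` types and an `add_if` function: lanes where the mask is set are updated and the others pass through unchanged. As far as I understand EVE's documentation, the real spellings are a fixed-size type like `eve::wide<float, eve::fixed<4>>` and a conditional call like `eve::add[mask](a, b)`; treat those as pointers to the docs rather than verified signatures.

    ```cpp
    #include <cstddef>

    // Scalar emulation of a 4-lane float vector and a per-lane mask (not EVE's types).
    struct f32x4 { float lane[4]; };
    struct mask4 { bool lane[4]; };

    // Lane-wise comparison against a scalar produces a mask.
    mask4 operator>(const f32x4& a, float b) {
        mask4 m{};
        for (std::size_t i = 0; i < 4; ++i) m.lane[i] = a.lane[i] > b;
        return m;
    }

    // Masked add: lanes where the mask is set get a + b, other lanes keep a.
    f32x4 add_if(const mask4& m, const f32x4& a, const f32x4& b) {
        f32x4 r{};
        for (std::size_t i = 0; i < 4; ++i)
            r.lane[i] = m.lane[i] ? a.lane[i] + b.lane[i] : a.lane[i];
        return r;
    }
    ```

    With `a = {1, -2, 3, -4}` and `b` all tens, `add_if(a > 0.0f, a, b)` yields `{11, -2, 13, -4}`: the condition reads like the declarative `[mask]` form rather than an explicit blend.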
  • dyaroshev 2 days ago |
    Hi!

    Thanks for your interest in the library.

    Here is a godbolt example: https://godbolt.org/z/bEGd7Tnb3

    Here are some simple examples: https://github.com/jfalcou/eve/blob/fb093a0553d25bb8114f1396...

    I personally think we have the following strengths:

    * Algorithms. Writing SIMD loops is very hard, so we give you a lot of ready-to-go loops (find, search, remove, set_intersection, to name a few).
    * zip and SOA support out of the box.
    * High-quality codegen. I haven't seen other libraries care about unrolling or aligning data accesses, yet these give you substantial improvements.
    * Supporting more than transform/reduce. For example, we have a really decent compress implemented for SSE/AVX/NEON.
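    The compress primitive mentioned above is what algorithms like remove are built on. A minimal scalar emulation, assuming a pretend width of 4 lanes and hypothetical names (`compress_store`, `remove_if_compress`): each full block is filtered and packed with one compress store, and the leftover elements go through the same path as a shorter tail block. Real SIMD code replaces the inner loops with a vector compare plus a shuffle table or a native compress instruction; this is not EVE's implementation.

    ```cpp
    #include <cstddef>
    #include <vector>

    constexpr std::size_t W = 4;  // pretend vector width (hypothetical)

    // Emulated "compress store": copy the lanes of `in` whose keep flag is set
    // to `out`, packed contiguously; returns how many were written.
    std::size_t compress_store(const int* in, const bool* keep, std::size_t lanes,
                               int* out) {
        std::size_t k = 0;
        for (std::size_t l = 0; l < lanes; ++l)
            if (keep[l]) out[k++] = in[l];
        return k;
    }

    // remove_if built on compress: full blocks of W, then one shorter tail block.
    // In-place is safe: the write position never passes the read position.
    template <typename Pred>
    std::size_t remove_if_compress(std::vector<int>& v, Pred pred) {
        std::size_t out = 0, i = 0;
        bool keep[W];
        for (; i + W <= v.size(); i += W) {
            for (std::size_t l = 0; l < W; ++l) keep[l] = !pred(v[i + l]);
            out += compress_store(&v[i], keep, W, &v[out]);
        }
        const std::size_t tail = v.size() - i;   // fewer than W elements left
        for (std::size_t l = 0; l < tail; ++l) keep[l] = !pred(v[i + l]);
        if (tail) out += compress_store(&v[i], keep, tail, &v[out]);
        v.resize(out);
        return out;
    }
    ```

    For example, removing negatives from `{1, -2, 3, -4, 5, 6, -7, 8, 9, -10, 11}` processes two full blocks and a 3-element tail and leaves `{1, 3, 5, 6, 8, 9, 11}`.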

    The following weaknesses:

    * We don't support runtime-sized SVE/RVV (only fixed size). We tried really hard, but unfortunately the C++ language just refuses to play ball there. Here is a discussion about that: https://stackoverflow.com/questions/73210512/arm-sve-wrappin...

    If this is something you need we recommend compiling a few dynamic libraries with the correct fixed lengths. Google Highway manage to pull it off but the trade off is a variadics interface that I personally find very difficult.

    * Runtime dispatch based on arch.

    We again recommend DLLs for this. The problem here is the ODR. I believe there is a solution based on the preprocessor and namespaces that I could use, but it breaks as soon as modules come into play. So, in the modules world, we don't have an option. I'm happy to hear suggestions.

    * No MSVC support

    C++20 support in MSVC is still not good enough, and each new version breaks something that was already working. Sad times.

    * Just tricky to get started.

    I don't know what to do about that. I'm happy to just write examples for people. If you want to try the library, please create an issue or discussion; I'm happy to take some time and try to solve your case.

    We talked about the library at CppCon: https://youtu.be/WZGNCPBMInI?si=buFteQB1e1vXRT5M

    If you want to learn how SIMD algorithms work, here are a couple of talks I gave: https://youtu.be/PHZRTv3erlA?si=b87DBYMDskvzYcq1 https://youtu.be/vGcH40rkLdA?si=WL2e5gYQ7pSie9bd

    Feel free to ask any questions.

    • janwas a day ago |
      > Google Highway manage to pull it off but the trade off is a variadics interface that I personally find very difficult.

      I'm curious what you mean by 'variadics', and what exactly you find difficult?

      People new to Highway are often surprised by the d/tag argument to loads that say whether to load half/full vector, or no more than 4 elements, etc. The key is to understand these are just zero-sized structs used for type information, and are not the actual vector/data. After that, I observe introductory workshop participants are able to get started/productive quickly.
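      A minimal sketch of the idea, with hypothetical types (`Simd`, `Vec`) that only mimic the shape of Highway's `Load(d, p)` and `Lanes(d)` calls and are not Highway's code: the tag is an empty struct whose template arguments encode lane type and count, so passing a full-width or half-width tag changes which overload runs and how many elements it loads, while the data lives in the separate vector type.

      ```cpp
      #include <cstddef>
      #include <type_traits>

      // Hypothetical descriptor ("tag"): no members, only type information.
      template <typename T, std::size_t N>
      struct Simd {
          using lane_type = T;
          static constexpr std::size_t lanes = N;
      };

      // Hypothetical vector type: this is what actually holds the data.
      template <typename T, std::size_t N>
      struct Vec { T v[N]; };

      template <typename T, std::size_t N>
      constexpr std::size_t Lanes(Simd<T, N>) { return N; }

      // The tag only selects the overload and return type: Load(d, p) reads
      // exactly Lanes(d) elements, so a half-width tag loads fewer.
      template <typename T, std::size_t N>
      Vec<T, N> Load(Simd<T, N>, const T* p) {
          Vec<T, N> r{};
          for (std::size_t i = 0; i < N; ++i) r.v[i] = p[i];
          return r;
      }

      static_assert(std::is_empty_v<Simd<float, 8>>, "tags carry no data");
      ```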

      • dyaroshev a day ago |
        I struggle to read the Highway documentation; it focuses on things that aren't relevant to me. So sorry if I'm wrong.

        Let me write the std::ranges code and ask you to write the equivalent with Highway.

        https://godbolt.org/z/3s1b8P3sj

        PS: this is how it looks in eve: https://godbolt.org/z/Kzxqqdrez

        • janwas 17 minutes ago |
          Thanks for sharing :) Any thoughts on what kind of things you are looking for and didn't find?

          I cannot recall anyone saying this kind of thing is a bottleneck for them. We don't use std::ranges, but searching for a negative value can look like this: https://gcc.godbolt.org/z/8bbb16Eea

          It looks like smaller codegen than EVE's (https://godbolt.org/z/fEn9r175v)?
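          The godbolt links aren't reproduced here, so as a neutral illustration of the task being compared (find the first negative value), a minimal sketch with hypothetical names: a standard-algorithm baseline (`std::find_if`; `std::ranges::find_if` is the C++20 spelling) and a block-wise version shaped like a SIMD find, which tests a whole block with one OR reduction and only scans lanes inside a block that hits. It is neither EVE's nor Highway's code and says nothing about their codegen.

          ```cpp
          #include <algorithm>
          #include <cstddef>
          #include <vector>

          // Baseline: index of the first negative value, or v.size() if none.
          std::size_t first_negative_std(const std::vector<int>& v) {
              auto it = std::find_if(v.begin(), v.end(), [](int x) { return x < 0; });
              return static_cast<std::size_t>(it - v.begin());
          }

          // Block-wise version: one "any lane negative?" test per block of W lanes.
          std::size_t first_negative_blocked(const std::vector<int>& v) {
              constexpr std::size_t W = 8;   // pretend vector width (hypothetical)
              std::size_t i = 0;
              for (; i + W <= v.size(); i += W) {
                  int any = 0;
                  // The sign bit of the OR is set iff at least one lane is negative.
                  for (std::size_t l = 0; l < W; ++l) any |= v[i + l];
                  if (any < 0)
                      for (std::size_t l = 0; l < W; ++l)
                          if (v[i + l] < 0) return i + l;
              }
              for (; i < v.size(); ++i)      // tail shorter than one block
                  if (v[i] < 0) return i;
              return v.size();
          }
          ```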

  • thrtythreeforty 2 days ago |
    This library's eve::soa_vector is the first attempt I've seen at dealing with the "SOA problem," which is that if you write good, parallel-friendly code, all your types go to hell and never come back because the language can't express concepts like "my object is made from element 7 of each of these 6 pointers." Instead you write really FORTRAN-looking array processing code with no types or methods in sight.

    Does anyone know of other libraries that help a C++ programmer deal with struct-of-arrays?
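    A minimal sketch of the common hand-rolled workaround, with hypothetical names (`ParticlesSoA`, `Ref`, `translate`): one array per field, plus a small proxy of references so that "element i of each array" can be used like an object with methods again. This is not `eve::soa_vector`'s interface.

    ```cpp
    #include <cstddef>
    #include <vector>

    // Hypothetical struct-of-arrays container: each field in its own array,
    // which is the layout vectorized loops want.
    struct ParticlesSoA {
        std::vector<float> x, y, mass;

        // Proxy "object view": element i of each array, bound by reference.
        struct Ref {
            float& x;
            float& y;
            float& mass;
            void translate(float dx, float dy) { x += dx; y += dy; }  // writes through
        };

        void push_back(float px, float py, float m) {
            x.push_back(px); y.push_back(py); mass.push_back(m);
        }
        std::size_t size() const { return x.size(); }
        Ref operator[](std::size_t i) { return Ref{x[i], y[i], mass[i]}; }
    };
    ```

    Usage like `ps[1].translate(10.0f, 20.0f)` updates `ps.x[1]` and `ps.y[1]` in place, while bulk loops can still run over `ps.x` and `ps.y` directly.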

    • dyaroshev a day ago |
      CppCast talked about soagen (https://cppcast.com/soagen/), though I didn't look into it too much.
      • thrtythreeforty 20 hours ago |
        Thank you!