Expressive Vector Engine – SIMD in C++

74 points by klaussilveira 5 days ago | 32 comments

vblanco 2 days ago |
Interesting library, but I see it falls into the same trap as almost all SIMD libraries: they hardcode the vector target completely, so you can't mix and match feature levels within a build. The documentation recommends putting your kernels into DLLs and loading them dynamically, which is a huge mess: https://jfalcou.github.io/eve/multiarch.html
Meanwhile xsimd (https://github.com/xtensor-stack/xsimd) has the feature level as a template parameter on its vector objects, which lets you branch at runtime between SIMD levels as you wish. I find it's a far better way of doing things if you actually want to ship the SIMD code to users.
spacechild1 2 days ago |
Thanks, that's an important caveat!
> Meanwhile xsimd (https://github.com/xtensor-stack/xsimd) has the feature level as a template parameter on its vector objects
That's pretty cool because you can write function templates and instantiate different versions that you can select at runtime.
vblanco 2 days ago |
Yeah, that's the fun of it: you write your kernel/function so that the SIMD level is a template parameter, and then you can use simple branching like:
if(supports<avx512>){ myAlgo<avx512>(); } else{ myAlgo<avx>(); }
I've also used it for benchmarking, to see whether my code scales well to different SIMD widths, and it's a huge help.
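To make the pattern above concrete, here is a minimal sketch of a kernel templated on an architecture tag, with a runtime branch choosing the instantiation. The tags `scalar_tag` and `wide_tag`, the `supports` query, and the `sum` kernel are all hypothetical stand-ins written for illustration; they are not xsimd's API, and the "wide" path only imitates lane-wise accumulation with plain integers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical architecture tags; a real library would supply its own.
struct scalar_tag { static constexpr std::size_t lanes = 1; };
struct wide_tag   { static constexpr std::size_t lanes = 8; };

// Hypothetical capability query. A real one would ask the CPU.
template <class Arch> bool supports();
template <> inline bool supports<scalar_tag>() { return true; }
template <> inline bool supports<wide_tag>()   { return false; }  // pretend it is missing

// The kernel is written once; the architecture is a template parameter.
template <class Arch>
int sum(const std::vector<int>& v) {
    int acc[Arch::lanes] = {};
    std::size_t i = 0;
    // Main loop: one "register" worth of lanes per step.
    for (; i + Arch::lanes <= v.size(); i += Arch::lanes)
        for (std::size_t l = 0; l < Arch::lanes; ++l) acc[l] += v[i + l];
    // Tail: leftover elements, then the horizontal reduction.
    int total = 0;
    for (; i < v.size(); ++i) total += v[i];
    for (int a : acc) total += a;
    return total;
}

// Runtime branch between the compiled instantiations.
inline int sum_dispatch(const std::vector<int>& v) {
    if (supports<wide_tag>()) return sum<wide_tag>(v);
    return sum<scalar_tag>(v);
}
```

Because every instantiation has its own name (`sum<wide_tag>`, `sum<scalar_tag>`), they can coexist in one binary, which is also what makes the benchmarking use case straightforward.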
dyaroshev 2 days ago |
FYI: You don't want to do this. `supports<avx512>` is an expensive check. You really want to put this check in a static.
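One way to "put this check in a static" is a function-local static, which C++11 and later initialize exactly once (and thread-safely). The sketch below assumes a hypothetical `probe_wide_simd` function standing in for the expensive feature query; the call counter exists only to show that the probe runs a single time.

```cpp
// Counts how often the (hypothetical) expensive probe actually runs.
inline int probe_calls = 0;

// Stand-in for an expensive CPU feature query; always reports "no" here.
inline bool probe_wide_simd() {
    ++probe_calls;
    return false;
}

// Cheap to call repeatedly: the probe runs only on the first call.
inline bool has_wide_simd() {
    static const bool cached = probe_wide_simd();
    return cached;
}
```

Hot code can then branch on `has_wide_simd()`, or better, resolve a function pointer once and call through it.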
spacechild1 a day ago |
I guess this was just pseudo-code. Of course you don't want to do a runtime feature check over and over again.
kookamamie 2 days ago |
100% agreed. This is the main reason ISPC is my go-to tool for explicit vectorization.
janwas 2 days ago |
+1, dynamic dispatch is important. Our Highway library has extensive support for this.
Detailed intro by kfjahnke here: https://github.com/kfjahnke/zimt/blob/multi_isa/examples/mul...
vlovich123 2 days ago |
Since you seem knowledgeable about this, what does this do differently from other SIMD libraries like xsimd / highway? Is it the addition of algorithms similar to the STD library that are explicitly SIMD optimized?
dyaroshev 2 days ago |
The algorithms I tried to make as good as I knew how. Maybe 95% there. Nice tail handling. A lot of things supported. I like our interface over other alternatives, but I'm biased here. Really massive math library.
dyaroshev 2 days ago |
Our answer to this is dynamic dispatch. If you want to have multiple versions of the same kernel compiled, compile multiple DLLs.
The big problem here is: ODR violations. We really didn't want to do the xsimd thing of forcing the user to pass an arch everywhere.
Also that kinda defeats the purpose of "simd portability" - any code with avx2 can't work for an arm platform.
eve just works everywhere.
Example: https://godbolt.org/z/bEGd7Tnb3
janwas 2 days ago |
It is possible to avoid ODR violations :) We put the per-target code into unique namespaces, and export a function pointer to them.
dyaroshev 2 days ago |
You can do many things with macros and inline namespaces, but I believe they run into problems when modules come into play. Can you compile the same code twice, with different flags, with modules?
janwas a day ago |
We use pragma target instead of compiler flags :)
dyaroshev a day ago |
I don't think we understand each other.
We want to take one function and compile it twice:
```
namespace MEGA_MACRO {
void foo(std::span<int> s) { super_awesome_platform_specific_thing(s); }
}  // namespace MEGA_MACRO
```
Whatever you do - the code above has to be written once but compiled twice. In one file/in many files - doesn't matter.
My point is - I don't think you can compile that code twice if you support modules.
janwas a day ago |
I think I do understand, this is exactly what we do. (MEGA_MACRO == HWY_NAMESPACE)
Then we have a table of function pointers to &AVX2::foo, &AVX3::foo etc. As long as the module exports one single thing, which either calls into or exports this table, I do not see how it is incompatible with building your project using modules enabled?
(The way we compile the code twice is to re-include our source file, taking care that only the SIMD parts are actually seen by the compiler, and stuff like the module exports would only be compiled once.)
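The "written once, compiled twice, reached through a table of function pointers" idea can be sketched in a single file. In this sketch a macro stands in for the re-included source file, and the names `DEFINE_KERNELS`, `narrow`, `wide`, and `kernel_table` are hypothetical; this is not Highway's actual mechanism, which relies on re-inclusion and per-target attributes rather than a width parameter.

```cpp
#include <array>
#include <cstddef>

// Stand-in for the re-included source: the kernel body is written once and
// stamped into whichever namespace the macro is expanded for.
#define DEFINE_KERNELS(NS, WIDTH)                                        \
    namespace NS {                                                       \
    inline std::size_t width() { return WIDTH; }                         \
    inline int sum(const int* p, std::size_t n) {                        \
        int acc[WIDTH] = {};                                             \
        std::size_t i = 0;                                               \
        for (; i + WIDTH <= n; i += WIDTH)                               \
            for (std::size_t l = 0; l < WIDTH; ++l) acc[l] += p[i + l];  \
        int total = 0;                                                   \
        for (; i < n; ++i) total += p[i];                                \
        for (std::size_t l = 0; l < WIDTH; ++l) total += acc[l];         \
        return total;                                                    \
    }                                                                    \
    }

// "Compile it twice": two distinct entities, so no ODR clash.
DEFINE_KERNELS(narrow, 4)
DEFINE_KERNELS(wide, 8)
#undef DEFINE_KERNELS

// One row per target; only this table needs to be visible to callers.
struct kernel_entry {
    std::size_t (*width)();
    int (*sum)(const int*, std::size_t);
};

inline constexpr std::array<kernel_entry, 2> kernel_table{{
    {&narrow::width, &narrow::sum},
    {&wide::width, &wide::sum},
}};
```

A dispatcher would pick a row once, based on a CPU check, and call through it afterwards. Whether the same arrangement survives when the file is a C++20 module unit is exactly the open question in this exchange, and the sketch does not settle it.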
dyaroshev a day ago |
> is to re-include our source file
Yeah - that means your source file is never a module. We would really like eve to be modularized, the CI times are unbearable.
I'd love to be proven wrong here, that'd be amazing. But I don't think google highway can be modularized.
janwas a day ago |
What leads you to that conclusion? It is still possible to use #include in module implementations. We can use that to make the module implementation look like your example.
Thus it ought to be possible, though I have not yet tried it.
dyaroshev a day ago |
Well.
You have a file, something like: load.h
You need to include it multiple times, compiled with different flags.
So - it's never going to be in load.cxx or whatever that's called.
janwas a day ago |
As mentioned ("re-include our source file"), we are indeed able to put the SIMD code, as well as its self-#include, in a load.cxx TU.
Here is an example: https://github.com/google/gemma.cpp/blob/9dfe2a76be63bcfe679...
dyaroshev 21 hours ago |
I don't think this works if your files are modules.
Let's stop here, it doesn't seem like we understand each other.
nickpsecurity 2 days ago |
I also found this looking for portable SIMD:
https://github.com/google/highway
shadowpho 2 days ago |
Wait what about AMD? They only claim support for intel and arm
Sadiinso 2 days ago |
« AMD » is x86
dyaroshev 2 days ago |
AMD we support pretty well. I tested Zen1 and a bit Zen4
Conscat 2 days ago |
EVE is personally my favorite SIMD library in any programming language. It's the only one I've tried that provides masked lane operations in a declarative style, aside from SPMD languages like CUDA or OpenMP. The [] syntax for that is admittedly pretty exotic C++, but I think the usefulness of the feature is worth it. I wish the documentation was better, though. When I first started, I struggled to figure out how to simply make a 4-lane float vector that I can pass into shaders, because almost all of the examples are written for the "wide" native-SIMD size.
dyaroshev 2 days ago |
Hi!
Thanks for your interest in the library.
Here is a godbolt example: https://godbolt.org/z/bEGd7Tnb3
Here is a bunch of simple examples: https://github.com/jfalcou/eve/blob/fb093a0553d25bb8114f1396...
I personally think we have the following strengths:
* Algorithms. Writing SIMD loops is very hard. We give you a lot of ready-to-go loops (find, search, remove, set_intersection, to name a few).
* zip and SOA support out of the box.
* High quality codegen. I haven't seen other libraries care about unrolling/aligning data accesses - meanwhile these give you substantial improvements.
* Supporting more than transform/reduce. For example, we have a really decent compress implemented for sse/avx/neon.
The following weaknesses:
* We don't support runtime-sized sve/rvv (only fixed size). We tried really hard, but unfortunately the C++ language just refuses to play ball there. Here is a discussion about that: https://stackoverflow.com/questions/73210512/arm-sve-wrappin...
If this is something you need, we recommend compiling a few dynamic libraries with the correct fixed lengths. Google Highway manages to pull it off, but the trade-off is a variadics interface that I personally find very difficult.
* Runtime dispatch based on arch.
We again recommend DLLs for this. The problem here is ODR. I believe there is a solution based on the preprocessor and namespaces that I could use, but it breaks as soon as modules become a thing. So, in the module world, we don't have an option. I'm happy to hear suggestions.
* No MSVC support
C++20 support in MSVC is still not quite there. And each new version breaks something that was already working. Sad times.
* Just tricky to get started.
I don't know what to do about that. I'm happy to just write examples for people. If you want to try the library, please create an issue/discussion or something - I'm happy to take some time and try to solve your case.
We talked about the library at CppCon: https://youtu.be/WZGNCPBMInI?si=buFteQB1e1vXRT5M
If you want to learn how SIMD algorithms work, here are a couple of talks I gave: https://youtu.be/PHZRTv3erlA?si=b87DBYMDskvzYcq1 https://youtu.be/vGcH40rkLdA?si=WL2e5gYQ7pSie9bd
Feel free to ask any questions.
janwas a day ago |
> Google Highway manages to pull it off, but the trade-off is a variadics interface that I personally find very difficult.
I'm curious what you mean by 'variadics', and what exactly you find difficult?
People new to Highway are often surprised by the d/tag argument to loads that say whether to load half/full vector, or no more than 4 elements, etc. The key is to understand these are just zero-sized structs used for type information, and are not the actual vector/data. After that, I observe introductory workshop participants are able to get started/productive quickly.
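The "zero-sized struct used only for type information" idea can be illustrated without any SIMD library. The sketch below uses hypothetical names (`lanes_tag`, `vec`, `load`, `reduce_sum`); it is not Highway's API, only a plain C++ imitation of passing a descriptor tag that tells a load how many lanes to read.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical descriptor tag: element type and lane count live in the type.
// It has no data members, so passing it around costs nothing at runtime.
template <class T, std::size_t N>
struct lanes_tag {
    using value_type = T;
    static constexpr std::size_t size = N;
};

// Hypothetical vector type: this is what actually holds the data.
template <class T, std::size_t N>
struct vec {
    T lane[N];
};

// The tag argument selects which overload (how many lanes) gets loaded.
template <class T, std::size_t N>
vec<T, N> load(lanes_tag<T, N>, const T* p) {
    vec<T, N> v;
    std::memcpy(v.lane, p, sizeof(T) * N);
    return v;
}

template <class T, std::size_t N>
T reduce_sum(lanes_tag<T, N>, const vec<T, N>& v) {
    T total = T{};
    for (std::size_t i = 0; i < N; ++i) total += v.lane[i];
    return total;
}

// The tag really is empty: it is type information, not the vector.
static_assert(std::is_empty_v<lanes_tag<float, 4>>);
```

With `lanes_tag<float, 4>` and `lanes_tag<float, 8>` the same `load` call reads four or eight floats from the same pointer, which is the role the d/tag argument plays in the description above.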
dyaroshev a day ago |
I struggle to read the highway documentation, it focuses on things that are unrelated to me. So sorry if I'm wrong.
Let me write the std::ranges code and ask you to write the same thing with Highway.
https://godbolt.org/z/3s1b8P3sj
PS: this is how it looks in eve: https://godbolt.org/z/Kzxqqdrez
janwas 17 minutes ago |
Thanks for sharing :) Any thoughts on what kind of things you are looking for and didn't find?
I cannot recall anyone saying this kind of thing is a bottleneck for them. We don't use std::ranges, but searching for a negative value can look like: https://gcc.godbolt.org/z/8bbb16Eea
It looks like smaller codegen than EVE's (https://godbolt.org/z/fEn9r175v)?
thrtythreeforty 2 days ago |
This library's eve::soa_vector is the first attempt I've seen at dealing with the "SOA problem," which is that if you write good, parallel-friendly code, all your types go to hell and never come back because the language can't express concepts like "my object is made from element 7 of each of these 6 pointers." Instead you write really FORTRAN-looking array processing code with no types or methods in sight.
Does anyone know of other libraries that help a C++ programmer deal with struct-of-arrays?
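For readers unfamiliar with the "SOA problem", here is a minimal hand-rolled sketch of one common answer: store each field in its own array, and hand out a proxy object that refers to element i of every array, so methods and types survive. The `particle` type and `particle_soa` container are hypothetical and far simpler than eve::soa_vector or soagen; they only show the shape of the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type, as you would write it for array-of-structs.
struct particle {
    float x;
    float y;
    int id;
};

// Struct-of-arrays storage: one contiguous column per field.
class particle_soa {
public:
    // Proxy "reference": element i of each column, with methods attached.
    struct ref {
        float& x;
        float& y;
        int& id;

        // Gather the fields back into a plain value.
        operator particle() const { return {x, y, id}; }

        // Methods work on the proxy, so code keeps its object-like shape.
        void move(float dx, float dy) const { x += dx; y += dy; }
    };

    void push_back(const particle& p) {
        xs_.push_back(p.x);
        ys_.push_back(p.y);
        ids_.push_back(p.id);
    }

    std::size_t size() const { return xs_.size(); }

    ref operator[](std::size_t i) {
        assert(i < size());
        return {xs_[i], ys_[i], ids_[i]};
    }

    // Column access for array-style (vectorizable) processing.
    const std::vector<float>& xs() const { return xs_; }

private:
    std::vector<float> xs_;
    std::vector<float> ys_;
    std::vector<int> ids_;
};
```

Bulk loops can run over `xs()` directly, while per-element code uses `soa[i].move(...)`. The cost is that the proxy and the column list have to be written by hand for every type, which is the boilerplate the libraries mentioned in this thread try to generate.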
dyaroshev a day ago |
cppcast talked about soagen https://cppcast.com/soagen/ I didn't look into it too much.
thrtythreeforty 20 hours ago |
Thank you!