Meanwhile xsimd (https://github.com/xtensor-stack/xsimd) has the feature level as a template parameter on its vector objects, which lets you branch at runtime between simd levels as you wish. I find it's a far better way of doing things if you actually want to ship the simd code to users.
> Meanwhile xsimd (https://github.com/xtensor-stack/xsimd) has the feature level as a template parameter on its vector objects
That's pretty cool because you can write function templates and instantiate different versions that you can select at runtime.
if(supports<avx512>){ myAlgo<avx512>(); } else{ myAlgo<avx>(); }
I've also used it for benchmarking to see if my code scales well to different simd widths, and it's a huge help.
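For anyone who hasn't seen the pattern, here is a minimal sketch of "arch as a template parameter". It does not use xsimd's real types: `scalar_arch`, `wide_arch`, `supports_wide`, `sum` and `sum_dispatch` are hypothetical names, the "lanes" are emulated with a plain array, and the CPU check is a stub.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical architecture tags; each one carries the lane count to use.
struct scalar_arch { static constexpr std::size_t lanes = 1; };
struct wide_arch   { static constexpr std::size_t lanes = 8; };

// The kernel is written once and instantiated once per architecture tag.
template <class Arch>
int sum(const std::vector<int>& v) {
    int acc[Arch::lanes] = {};                 // one partial sum per lane
    std::size_t i = 0;
    for (; i + Arch::lanes <= v.size(); i += Arch::lanes)
        for (std::size_t l = 0; l < Arch::lanes; ++l)
            acc[l] += v[i + l];
    int total = 0;
    for (std::size_t l = 0; l < Arch::lanes; ++l) total += acc[l];
    for (; i < v.size(); ++i) total += v[i];   // scalar tail
    return total;
}

// Stand-in for a real CPU feature query (assumption, always false here).
inline bool supports_wide() { return false; }

// Runtime branch between the two instantiations.
inline int sum_dispatch(const std::vector<int>& v) {
    return supports_wide() ? sum<wide_arch>(v) : sum<scalar_arch>(v);
}
```

Because both instantiations exist in the same binary, you can also call `sum<scalar_arch>` and `sum<wide_arch>` side by side in a benchmark.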
Detailed intro by kfjahnke here: https://github.com/kfjahnke/zimt/blob/multi_isa/examples/mul...
The big problem here is: ODR violations. We really didn't want to do the xsimd thing of forcing the user to pass an arch everywhere.
Also that kinda defeats the purpose of "simd portability" - any code with avx2 can't work for an arm platform.
eve just works everywhere.
Example: https://godbolt.org/z/bEGd7Tnb3
We want to take one function and compile it twice:
```
namespace MEGA_MACRO {
void foo(std::span<int> s) { super_awesome_platform_specific_thing(s); }
} // namespace MEGA_MACRO
```
Whatever you do - the code above has to be written once but compiled twice. In one file/in many files - doesn't matter.
My point is - I don't think you can compile that code twice if you support modules.
Then we have a table of function pointers to &AVX2::foo, &AVX3::foo etc. As long as the module exports one single thing, which either calls into or exports this table, I do not see how it is incompatible with building your project with modules enabled?
(The way we compile the code twice is to re-include our source file, taking care that only the SIMD parts are actually seen by the compiler, and stuff like the module exports would only be compiled once.)
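To make the shape of this concrete, here is a rough single-file sketch. It is not Highway's actual mechanism: a function-like macro stands in for re-including the source with different flags, and `DEFINE_KERNELS`, `TARGET_A`, `TARGET_B`, `pick_target` and `sum_dispatch` are all hypothetical names. The CPU detection is a stub.

```cpp
#include <cstddef>

// One body, stamped into a caller-chosen namespace. Distinct namespaces
// give distinct symbols, so the two copies cannot clash under the ODR.
#define DEFINE_KERNELS(NS, STEP)                              \
    namespace NS {                                            \
    inline long sum(const int* p, std::size_t n) {            \
        const std::size_t step = STEP;                        \
        long total = 0;                                       \
        std::size_t i = 0;                                    \
        for (; i + step <= n; i += step)                      \
            for (std::size_t l = 0; l < step; ++l)            \
                total += p[i + l];                            \
        for (; i < n; ++i) total += p[i];                     \
        return total;                                         \
    }                                                         \
    }

DEFINE_KERNELS(TARGET_A, 4)  // stands in for the body built for one ISA
DEFINE_KERNELS(TARGET_B, 8)  // stands in for the body built for another

using SumFn = long (*)(const int*, std::size_t);

// Stub for runtime CPU detection (assumption); returns an index into the table.
inline std::size_t pick_target() { return 0; }

// The single entry point the rest of the program would see.
long sum_dispatch(const int* p, std::size_t n) {
    static const SumFn table[] = {&TARGET_A::sum, &TARGET_B::sum};
    return table[pick_target()](p, n);
}
```

Only `sum_dispatch` needs to be visible to callers; whether the stamped bodies can themselves live in a module interface is exactly the open question in this thread.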
Yeah - that means your source file is never a module. We would really like eve to be modularized, the CI times are unbearable.
I'd love to be proven wrong here, that'd be amazing. But I don't think Google Highway can be modularized.
Thus it ought to be possible, though I have not yet tried it.
You have a file, something like: load.h
You need to include it multiple times, compiled with different flags.
So - it's never going to be in load.cxx or whatever that's called.
Here is an example: https://github.com/google/gemma.cpp/blob/9dfe2a76be63bcfe679...
Let's stop here, it doesn't seem like we understand each other.
Thanks for your interest in the library.
Here is a godbolt example: https://godbolt.org/z/bEGd7Tnb3 Here is a bunch of simple examples: https://github.com/jfalcou/eve/blob/fb093a0553d25bb8114f1396...
I personally think we have the following strengths:
* Algorithms. Writing SIMD loops is very hard. We give you a lot of ready-to-go loops (find, search, remove, set_intersection, to name a few).
* zip and SOA support out of the box.
* High-quality codegen. I haven't seen other libraries care about unrolling/aligning data accesses, yet these give you substantial improvements.
* Supporting more than transform/reduce. For example, we have a really decent compress implemented for sse/avx/neon.
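In case "SOA" is unfamiliar, here is a small plain-C++ sketch of the layout difference. It is not eve's API; `ParticleAoS`, `ParticlesSoA` and `sum_x` are hypothetical names made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Array-of-structs: the fields of one element sit next to each other, so a
// loop over `x` alone also strides over `y` and `mass`.
struct ParticleAoS { float x, y, mass; };

// Struct-of-arrays: each field is its own contiguous array, which is the
// layout a SIMD load of several consecutive `x` values wants.
struct ParticlesSoA {
    std::vector<float> x, y, mass;

    void push_back(const ParticleAoS& p) {
        x.push_back(p.x);
        y.push_back(p.y);
        mass.push_back(p.mass);
    }
    std::size_t size() const { return x.size(); }
};

// Touches only the contiguous `x` array.
inline float sum_x(const ParticlesSoA& ps) {
    float total = 0.0f;
    for (float v : ps.x) total += v;
    return total;
}
```

A library with SOA support takes care of keeping the parallel arrays in sync and lets you iterate over them as if they were one sequence of structs.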
The following weaknesses:
* We don't support runtime sized sve/rvv (only fixed size). We tried really hard, but unfortunately the C++ language itself refuses to play ball there. Here is a discussion about that: https://stackoverflow.com/questions/73210512/arm-sve-wrappin...
If this is something you need, we recommend compiling a few dynamic libraries with the correct fixed lengths. Google Highway manages to pull it off, but the trade-off is a variadics interface that I personally find very difficult.
* Runtime dispatch based on arch.
We again recommend dlls for this. The problem here is ODR. I believe there is a solution based on preprocessor and namespaces I could use but it breaks as soon as modules become a thing. So - in the module world - we don't have an option. I'm happy for suggestions.
* No MSVC support
C++20 on MSVC is still not solid enough, and each new version breaks something that was already working. Sad times.
* Just tricky to get started.
I don't know what to do about that. I'm happy to just write examples for people. If you wanna try the library - please create an issue/discussion or something - I'm happy to take some time and try to solve your case.
We talked about the library at CppCon: https://youtu.be/WZGNCPBMInI?si=buFteQB1e1vXRT5M
If you want to learn how SIMD algorithms work, here are a couple of talks I gave: https://youtu.be/PHZRTv3erlA?si=b87DBYMDskvzYcq1 https://youtu.be/vGcH40rkLdA?si=WL2e5gYQ7pSie9bd
Feel free to ask any questions.
I'm curious what you mean by 'variadics', and what exactly you find difficult?
People new to Highway are often surprised by the d/tag argument to loads, which says whether to load a half/full vector, or no more than 4 elements, etc. The key is to understand these are just zero-sized structs used for type information, and are not the actual vector/data. After that, I observe introductory workshop participants are able to get started/productive quickly.
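A minimal sketch of the idea, for readers who have not seen tag arguments before. This is not Highway's API: `Tag`, `Vec` and `load` are hypothetical names, and `std::array` stands in for a real vector register type.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// Empty tag: carries the element type and lane count purely in its type.
template <class T, std::size_t N>
struct Tag {
    using value_type = T;
    static constexpr std::size_t lanes = N;
};

// Stand-in for a SIMD register; a real library would wrap an intrinsic type.
template <class T, std::size_t N>
using Vec = std::array<T, N>;

// The tag selects the overload and the return type; it holds no data.
template <class T, std::size_t N>
Vec<T, N> load(Tag<T, N>, const T* p) {
    Vec<T, N> v{};
    for (std::size_t i = 0; i < N; ++i) v[i] = p[i];
    return v;
}

static_assert(std::is_empty<Tag<float, 4>>::value, "tags hold no data");
```

With this, `load(Tag<float, 8>{}, ptr)` and `load(Tag<float, 4>{}, ptr)` read a "full" or "half" vector from the same pointer, and the only thing that differs at the call site is the tag's type.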
Let me write the std::ranges code and ask you to write it with Highway.
https://godbolt.org/z/3s1b8P3sj
PS: this is how it looks in eve: https://godbolt.org/z/Kzxqqdrez
I cannot recall anyone saying this kind of thing is a bottleneck for them. We don't use std::ranges, but searching for a negative value can look like: https://gcc.godbolt.org/z/8bbb16Eea
It looks like smaller codegen than EVE's https://godbolt.org/z/fEn9r175v?
Does anyone know of other libraries that help a C++ programmer deal with struct-of-arrays?