CMake's "find_file" can return non-executable files or directories, so "find_program" must be used to ensure that only executable files are detected.
Fixes issue #3290
When building with LTO, the compiler warns about a violation of the
One Definition Rule (ODR). This commit resolves the two conflicting
definitions by moving the structs into anonymous namespaces.
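For illustration, a minimal sketch of the pattern. The names `Params`, `area` and `computeArea` are hypothetical and not taken from the code base. A struct inside an anonymous namespace gets internal linkage, so a differently shaped struct of the same name in another translation unit no longer conflicts with it:

```cpp
#include <cassert>

namespace
{

// Internal linkage: this `Params` is private to this translation unit and
// cannot clash with another `Params` defined in a different .cc file.
struct Params {
    int width;
    int height;
};

int area(const Params& p)
{
    return p.width * p.height;
}

} // namespace

// Only this function is visible to other translation units.
int computeArea(int width, int height)
{
    assert(width >= 0 && height >= 0);
    return area(Params{width, height});
}
```
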
User jgschaefer [reported an error on pixls.us](https://discuss.pixls.us/t/rt-build-from-git-crash-on-launch-debian-testing-64-bit/1425)
which could be traced back to an empty basename for a HaldCLUT. The
original implementation did not throw an exception due to the use of
`std::string::substr()` instead of `std::string::erase()`, but silently
assigned the first working profile to `profile_name`.
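A rough sketch of how such a suffix strip can be guarded. The helper `stripSuffix` is hypothetical and is not the code from this commit. It relies only on the documented behaviour that `std::string::erase(pos)` throws `std::out_of_range` when `pos > size()`, so the empty case is rejected before any position is computed:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: remove `suffix` from the end of `name` if present.
// Returns true when the suffix was found and removed.
bool stripSuffix(std::string& name, const std::string& suffix)
{
    // Guard first: an empty or too-short name can never carry the suffix,
    // and `name.size() - suffix.size()` would wrap around for it.
    if (name.empty() || name.size() < suffix.size()) {
        return false;
    }

    const std::string::size_type pos = name.size() - suffix.size();

    if (name.compare(pos, suffix.size(), suffix) != 0) {
        return false;
    }

    assert(pos <= name.size());
    name.erase(pos); // safe: the guard above guarantees pos <= name.size()
    return true;
}
```
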
Ingo has provided a solution for the strange Windows crash with
`_mm_cvtpu16_ps()`: It was not an alignment problem, but the use of
MMX instructions which led to the SEGV.
Ingo's solution now omits MMX instructions altogether and is
nevertheless faster than the `_mm_set_ps()` workaround.
Many thanks to @heckflosse!
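For reference, one well-known way to convert four 16-bit values to floats with SSE2 only, i.e. without touching MMX registers. This is a sketch and not necessarily identical to Ingo's code; the function name `convert4` is made up, and a scalar fallback is used when SSE2 is not available:

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Convert four consecutive unsigned 16-bit values to four floats.
inline void convert4(const std::uint16_t* src, float* dst)
{
    assert(src != nullptr && dst != nullptr);
#if defined(__SSE2__)
    // Load 4 x u16 into the low 64 bits of an XMM register (no alignment needed).
    const __m128i packed = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(src));
    // Interleave with zeros to zero-extend each value to 32 bits.
    const __m128i widened = _mm_unpacklo_epi16(packed, _mm_setzero_si128());
    // The values fit into a signed 32-bit integer, so the signed conversion is exact.
    _mm_storeu_ps(dst, _mm_cvtepi32_ps(widened));
#else
    for (int i = 0; i < 4; ++i) {
        dst[i] = static_cast<float>(src[i]);
    }
#endif
}
```
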
Ingo had some cleanup suggestions in #3154 which I tried to implement with
this commit. Although switching to `vfloat2` is a clever idea, I can see
no further speedup.
Ingo found a buffer overrun due to an inverted mask when deciding
whether to take the SSE path or not. This fix applies his suggested
pattern for border cases. Kudos to @heckflosse.
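A generic sketch of such a border-case pattern, not the actual code from this fix. `kWidth` and `scale` are illustrative names, and the inner loop merely stands in for one SIMD operation. The wide path is taken only while a full group still fits into the buffer, and the remainder is handled element by element:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kWidth = 4; // assumed vector width (four floats for SSE)

void scale(std::vector<float>& data, float factor)
{
    const std::size_t n = data.size();
    std::size_t i = 0;

    // Wide path: the loop condition guarantees that i .. i + kWidth - 1 are valid.
    for (; i + kWidth <= n; i += kWidth) {
        for (std::size_t k = 0; k < kWidth; ++k) { // stands in for one SIMD operation
            data[i + k] *= factor;
        }
    }

    // Border case: fewer than kWidth elements remain, handle them one by one.
    assert(n - i < kWidth);

    for (; i < n; ++i) {
        data[i] *= factor;
    }
}
```
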
Vectorize color space conversion for HaldCLUT depending on the
definition of `VECTLENSP`. It's not fully AVX compatible because `F2V`,
`LVF`, and `STVF` are SSE only.
Instead of using an `Image16`, which is organized in planes, store the
HaldCLUT in an `AlignedBuffer<std::uint16_t>` with sequential RGBx
values. This gives a speedup of roughly 23% on my machine.
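To make the layout concrete, a simplified sketch in which a plain `std::vector` stands in for `AlignedBuffer` (an assumption for the sake of a self-contained example; `InterleavedClut` is a made-up name). All channels of one entry sit next to each other, so a lookup touches one small memory region instead of three separate planes:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct InterleavedClut {
    // Four values per entry: R, G, B and one padding value ("x").
    explicit InterleavedClut(std::size_t entries) : values(entries * 4, 0) {}

    void set(std::size_t index, std::uint16_t r, std::uint16_t g, std::uint16_t b)
    {
        assert(index * 4 + 3 < values.size());
        std::uint16_t* const p = &values[index * 4];
        p[0] = r;
        p[1] = g;
        p[2] = b; // p[3] stays untouched as padding
    }

    // Returns a pointer to the four sequential values of one entry.
    const std::uint16_t* get(std::size_t index) const
    {
        assert(index * 4 + 3 < values.size());
        return &values[index * 4];
    }

    std::vector<std::uint16_t> values; // R, G, B, x, R, G, B, x, ...
};
```
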
This commit adds a true LRU cache to `rtengine` which is used in the new
`CLUTStore` class. The code in `clutstore.*` was cleaned up with C++11
features and small optimizations taken from my `clutbench` project.
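For readers unfamiliar with the idea, a minimal LRU cache sketch. This is not the `rtengine` implementation; the class name `LruCache` and its interface are assumptions. A list keeps the entries in order of use and a hash map points into that list, so lookup, insertion and eviction all run in constant time on average:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

template<typename K, typename V>
class LruCache
{
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity)
    {
        assert(capacity > 0);
    }

    // Copies the cached value into `value` and marks the entry as most
    // recently used. Returns false if the key is not cached.
    bool get(const K& key, V& value)
    {
        const auto it = index_.find(key);

        if (it == index_.end()) {
            return false;
        }

        order_.splice(order_.begin(), order_, it->second);
        value = it->second->second;
        return true;
    }

    // Inserts or updates an entry; evicts the least recently used one when full.
    void set(const K& key, V value)
    {
        const auto it = index_.find(key);

        if (it != index_.end()) {
            it->second->second = std::move(value);
            order_.splice(order_.begin(), order_, it->second);
            return;
        }

        if (order_.size() == capacity_) {
            index_.erase(order_.back().first);
            order_.pop_back();
        }

        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
    }

    std::size_t size() const
    {
        return order_.size();
    }

private:
    using Entry = std::pair<K, V>;

    std::size_t capacity_;
    std::list<Entry> order_; // front = most recently used
    std::unordered_map<K, typename std::list<Entry>::iterator> index_;
};
```

`std::list::splice` moves a node without invalidating iterators, which is why the map can safely store list iterators.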
The `CLUTStore` class was converted to a true singleton.
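A common way to write such a singleton in C++11 is a function-local static, sketched below. The class `Store` and its members are hypothetical and only illustrate the pattern, not the actual `CLUTStore` interface:

```cpp
class Store
{
public:
    // The function-local static is created on first use; C++11 guarantees
    // that this initialization is thread-safe.
    static Store& getInstance()
    {
        static Store instance;
        return instance;
    }

    // No copies, so there is exactly one instance.
    Store(const Store&) = delete;
    Store& operator=(const Store&) = delete;

    int nextId()
    {
        return ++counter_;
    }

private:
    Store() = default; // only getInstance() may construct

    int counter_ = 0;
};
```

Callers always go through `Store::getInstance()`, so state such as the counter is shared across the whole program.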