The whole point of these formats is to compress down to less
space, so let's not leave the job of actually packing data into
the space-saving form to the client code.
It is identical to the 32-bit format, except with more precision
and range due to using more bits. This format should comfortably
store any color information with precision easily exceeding the
limits of human vision.
This also makes encoding faster. However, it no longer rounds
to the nearest representable value when encoding, and instead
floors. This seems like a reasonable tradeoff: if you want more
precision... you should use a format with more precision.
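
As a rough illustration of the tradeoff (not the actual encoder), here
is a sketch that drops the low 16 bits of an `f32`, once by flooring
and once by rounding. The function names and the 16-bit layout are
assumptions made up for the example, and it only considers
non-negative finite inputs.

```rust
/// Encodes by truncating the low 16 bits. For non-negative input this
/// floors to the next lower representable value.
fn encode_floor(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

/// Encodes by rounding to nearest (ties round up): add half of the
/// dropped range before truncating. This costs an extra add, and it can
/// round the largest finite values up to infinity.
fn encode_round(x: f32) -> u16 {
    (x.to_bits().wrapping_add(0x8000) >> 16) as u16
}

/// Decodes by putting the 16 stored bits back in the top half of an f32.
fn decode(e: u16) -> f32 {
    f32::from_bits((e as u32) << 16)
}
```

With flooring the decoded value is never larger than the input, so the
error is one-sided and at most one step of the reduced format.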
It was a worthwhile experiment, but to work well it needs a
proper luma-chroma separation, which is both slower than I want
and requires knowing the colorspace being used.
I might make another go at this based on the TIFF LogLuv color
format, requiring XYZ as input.
Before this it needed SSE 4.1, which is not guaranteed to be
present on all x86-64 platforms. This will still compile the faster
path if SSE 4.1 is available, but has an alternate path as well
that works on all x86-64 platforms.
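
A minimal sketch of this kind of compile-time path selection. The
`floor4` function is hypothetical and chosen only because
`_mm_floor_ps` is an SSE 4.1 intrinsic; which operations actually
needed SSE 4.1 here is not something this example claims to show.

```rust
/// Fast path: only compiled when SSE 4.1 is enabled for the target.
#[cfg(all(target_arch = "x86_64", target_feature = "sse4.1"))]
fn floor4(v: [f32; 4]) -> [f32; 4] {
    use std::arch::x86_64::{_mm_floor_ps, _mm_loadu_ps, _mm_storeu_ps};
    let mut out = [0.0f32; 4];
    // Safe to call because the cfg above guarantees SSE 4.1 support,
    // and both pointers refer to arrays of exactly four f32 values.
    unsafe {
        let x = _mm_loadu_ps(v.as_ptr());
        _mm_storeu_ps(out.as_mut_ptr(), _mm_floor_ps(x));
    }
    out
}

/// Alternate path: plain scalar code for every other configuration.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse4.1")))]
fn floor4(v: [f32; 4]) -> [f32; 4] {
    [v[0].floor(), v[1].floor(), v[2].floor(), v[3].floor()]
}
```

Both versions have the same signature, so callers do not need to know
which one was compiled in.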
The biggest one is avoiding a bunch of bit reversals by keeping
numbers in bit-reversed form for as long as we can.
Also reduced the hashing rounds: just 2 rounds seem to be enough
for a reasonable amount of statistical independence in both the
scrambling and the shuffling. I tested each independently, with
the other disabled. This makes sense: in normal contexts 3 rounds
are enough, but in this case both act as input to yet another
hash, which is effectively doing more rounds.
Turns out golden ratio sampling interferes with the Sobol sampler.
Also tweaked some other things about sampling after removing
golden ratio sampling, to make things better.
Due to the undefined behavior of shifting a number by its
bit-width, the Sobol sampler would panic when sample index
`1 << 15` was requested.
This fixes it without introducing any additional checks or
operations.
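
To illustrate the class of bug (this is not the sampler code itself):
in Rust, shifting a `u32` by 32 or more panics in debug builds. A
common way to avoid it without an extra branch is to rearrange the
expression so the shift amount stays in range. The helper below is a
hypothetical example of that idea.

```rust
/// Returns a mask with the low `n` bits set, for `n` in 1..=32.
///
/// The naive `(1u32 << n) - 1` shifts by the full bit width when
/// `n == 32`. Shifting an all-ones value right by `32 - n` computes the
/// same mask while keeping the shift amount in 0..=31.
fn low_bits_mask(n: u32) -> u32 {
    debug_assert!((1..=32).contains(&n));
    u32::MAX >> (32 - n)
}
```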
This limits the number of samples per dimension to 2^16, but that
should be more than enough for any rendering situation. And this
reduces the direction numbers table size by a factor of 4.
This commit also takes advantage of the reduced bit space to
provide even better Owen scrambling, by utilizing the unused
16 bits for better mixing.
- Added an additional scramble round to the Owen scrambling, with
new optimized constants.
- Reordered the dimensions of the direction numbers to improve 2D
projections between adjacent dimensions. Only did this for
dimensions under ~40.
- Updated constants for Owen scrambling, based on better optimization
criteria.
- Increased randomness for the higher bits in the Owen scrambling.
- A simple and efficient implementation of Cranley-Patterson rotation
for the Sobol sampler.
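
For reference, Cranley-Patterson rotation adds a fixed offset to each
sample and wraps around modulo 1. A minimal sketch, assuming samples
are held as 32-bit fixed-point values (the function names are made up
for the example):

```rust
/// Cranley-Patterson rotation on a sample stored as a 32-bit
/// fixed-point fraction: wrapping addition is addition modulo 1.
fn cranley_patterson_u32(sample: u32, rotation: u32) -> u32 {
    sample.wrapping_add(rotation)
}

/// Converts a 32-bit fixed-point fraction to a float in [0, 1),
/// keeping the top 24 bits so the result is exactly representable.
fn to_unit_float(x: u32) -> f32 {
    (x >> 8) as f32 * (1.0 / (1u32 << 24) as f32)
}
```

For example, rotating 0.75 (`0xC000_0000`) by 0.5 (`0x8000_0000`)
wraps around to 0.25.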
This produces identical results, but generates the direction
vectors from the original sources at build time. This makes
the source code quite a bit leaner, and will also make it easier
to play with other direction vectors in the future if the
opportunity arises.
1. Use better constants for the hash-based Owen scrambling.
2. Use golden ratio sampling for the wavelength dimension.
On the use of golden ratio sampling:
Since hero wavelength sampling uses multiple equally-spaced
wavelengths, and most samplers only consider the spacing of
individual samples, those samplers weren't actually doing a
good job of distributing all the wavelengths evenly. Golden
ratio sampling, on the other hand, does this effortlessly by
its nature, and the resulting reduction of color noise is huge.
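
A small sketch of the idea, assuming 4 hero wavelengths and a
caller-supplied wavelength range (both are assumptions for the
example, as are the function names):

```rust
/// Fractional part of the golden ratio, (sqrt(5) - 1) / 2.
const GOLDEN_RATIO_CONJUGATE: f64 = 0.618_033_988_749_894_9;

/// The i-th golden ratio sample in [0, 1).
fn golden_ratio_sample(i: u32) -> f64 {
    (i as f64 * GOLDEN_RATIO_CONJUGATE).fract()
}

/// Expands one sample `u` in [0, 1) into 4 equally-spaced wavelengths
/// that wrap around within [min_nm, max_nm].
fn hero_wavelengths(u: f64, min_nm: f64, max_nm: f64) -> [f64; 4] {
    let mut out = [0.0; 4];
    for (k, w) in out.iter_mut().enumerate() {
        let t = (u + k as f64 / 4.0).fract();
        *w = min_nm + t * (max_nm - min_nm);
    }
    out
}
```

Because successive golden ratio samples always land in the largest
remaining gap, the combined set of all hero wavelengths stays evenly
spread as samples accumulate.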
The previous implementation was fundamentally broken because it
was mixing the bits in the wrong direction. This fixes that.
The constants have also been updated. I created a (temporary)
implementation of slow but full Owen scrambling to test against,
and these constants appear to give results consistent with that
on all the test scenes I rendered. It is still, of course,
possible that my full implementation was flawed, so more validation
in the future would be a good idea.
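
To show what "direction" means here: in Owen scrambling each output
bit may depend only on the input bits above it. A hash built from
adds and multiplies only mixes low bits into high bits, so it has to
run on the bit-reversed value. The sketch below uses the Laine-Karras
style constants as an illustration; they are not the constants from
this commit.

```rust
/// Hash-based Owen scrambling sketch. The constants are illustrative.
fn owen_scramble_hash(x: u32, seed: u32) -> u32 {
    // Reverse so the most significant bits become the lowest bits.
    let mut n = x.reverse_bits();

    // Each step only lets lower bits affect higher bits: addition
    // carries upward, and multiplying by an even constant shifts
    // information upward before the XOR.
    n = n.wrapping_add(seed);
    n ^= n.wrapping_mul(0x6c50_b47c);
    n ^= n.wrapping_mul(0xb82f_1e52);
    n ^= n.wrapping_mul(0xc7af_e638);
    n ^= n.wrapping_mul(0x8d22_f6e6);

    // Reverse back: each output bit now depends only on higher bits.
    n.reverse_bits()
}
```

One consequence that is easy to check: two inputs that share their
top k bits produce outputs that also share their top k bits.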
This gives better variance than random digit scrambling, at a
very tiny runtime cost (so tiny it's lost in the noise of the
rest of the rendering process).
The important thing here is that I figured out how to use the
scrambling parameter properly to decorrelate pixels. Using the
same approach as with Halton (just adding an offset into the sequence)
is very slow with Sobol, since moving into the higher samples is
more computationally expensive. So using the scrambling parameter
instead was important.
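
A sketch of the general approach, assuming the scramble value is
derived by hashing the pixel coordinates. The hash, the function
names, and the use of plain XOR scrambling as a stand-in are all
assumptions for the example.

```rust
/// A generic 32-bit integer mixer (a bijection on u32). Any good
/// integer hash would do here.
fn hash_u32(mut x: u32) -> u32 {
    x ^= x >> 16;
    x = x.wrapping_mul(0x7feb_352d);
    x ^= x >> 15;
    x = x.wrapping_mul(0x846c_a68b);
    x ^= x >> 16;
    x
}

/// Derives a per-pixel scramble value from the pixel coordinates.
fn pixel_scramble(px: u32, py: u32) -> u32 {
    hash_u32(px ^ hash_u32(py))
}

/// Applies the scramble to the raw sample bits. The cost is the same
/// for every sample index, unlike offsetting into the sequence.
fn scrambled_sample(sample_bits: u32, scramble: u32) -> u32 {
    sample_bits ^ scramble
}
```

Every pixel then reads the same low sample indices, but with a
different scramble, so neighboring pixels are decorrelated without
paying for high sample indices.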
It has a slight color cast to it at the moment, I believe due to
incorrect color space conversions, not because of the upsampling
method itself. So Meng upsampling is still the active method
at the moment.
Tests random vectors, and makes sure that an encoding/decoding
round trip introduces only precision errors below a certain
threshold.
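
The shape of such a test, sketched with a stand-in codec (truncating
each `f32` to its top 16 bits) and a tiny built-in PRNG. The codec,
the value range, and the threshold are assumptions for the example,
not the format being tested here.

```rust
/// Stand-in codec: keep only the top 16 bits of each f32 component.
fn encode(v: [f32; 3]) -> [u16; 3] {
    v.map(|x| (x.to_bits() >> 16) as u16)
}

fn decode(e: [u16; 3]) -> [f32; 3] {
    e.map(|b| f32::from_bits((b as u32) << 16))
}

/// Minimal xorshift32 PRNG: deterministic and dependency-free.
fn xorshift32(state: &mut u32) -> u32 {
    *state ^= *state << 13;
    *state ^= *state >> 17;
    *state ^= *state << 5;
    *state
}

/// Returns the largest relative round-trip error over `count` random
/// vectors with positive components of roughly 0.001 to 1000.
fn max_round_trip_error(count: u32, seed: u32) -> f32 {
    let mut state = seed.max(1); // xorshift state must be non-zero
    let mut worst = 0.0f32;
    for _ in 0..count {
        let mut v = [0.0f32; 3];
        for c in v.iter_mut() {
            let unit = (xorshift32(&mut state) >> 8) as f32 / (1u32 << 24) as f32;
            *c = 0.001 + unit * 1000.0;
        }
        let d = decode(encode(v));
        for i in 0..3 {
            worst = worst.max((v[i] - d[i]).abs() / v[i]);
        }
    }
    worst
}
```

The stand-in codec keeps 7 mantissa bits, so the relative error
should stay below 2^-7, which is the threshold a test would assert.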
Pretty confident that the implementation is correct now.
This way the executable code can be worked with directly, instead
of via the Python file that generates the Rust code.
Also introduced some small optimizations.
Rust 1.27 stabilized a variety of CPU intrinsics, including SIMD
on x86 and x86-64 platforms. This commit moves to using those
intrinsics for the optimized Float4 implementation. This means
Psychopath now compiles on stable Rust with all optimizations. Yay!