It currently has a slight color cast, which I believe is due to
incorrect color space conversions rather than the upsampling
method itself. So Meng upsampling remains the active method
for now.
Eventually the Surface trait will be changed to actually mean the
ability to be processed _into_ a MicropolyBatch. So it's ultimately
nonsensical for MicropolyBatch to implement it.
This uses a normalized version of blackbody radiation, so the
colors still vary but the brightness doesn't vary nearly as
wildly as with genuine blackbody radiation.
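
A minimal sketch of one way to normalize blackbody radiation,
assuming the normalization divides Planck's law by the total
radiance sigma*T^4/pi. The function names and the choice of
normalization are assumptions for illustration, not necessarily
what the renderer does.

```rust
// Physical constants in SI units.
const H: f64 = 6.62607015e-34; // Planck constant, J*s
const C: f64 = 299792458.0; // Speed of light, m/s
const K: f64 = 1.380649e-23; // Boltzmann constant, J/K
const SIGMA: f64 = 5.670374419e-8; // Stefan-Boltzmann constant, W/(m^2 K^4)

/// Spectral radiance from Planck's law.
/// `wavelength` is in meters, `temperature` in kelvin.
fn plancks_law(wavelength: f64, temperature: f64) -> f64 {
    let a = 2.0 * H * C * C / wavelength.powi(5);
    let b = (H * C / (wavelength * K * temperature)).exp() - 1.0;
    a / b
}

/// Planck's law divided by the total radiance at that temperature
/// (sigma * T^4 / pi). This normalization is an assumption: it keeps
/// the spectrum's shape but removes the T^4 growth in total energy.
fn plancks_law_normalized(wavelength: f64, temperature: f64) -> f64 {
    let total = SIGMA * temperature.powi(4) / std::f64::consts::PI;
    plancks_law(wavelength, temperature) / total
}
```

With this choice, the peak of the curve scales linearly with
temperature instead of with its fifth power, so the color still
shifts but the brightness stays in a much narrower range.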
The sampling method used before is numerically unstable for very
small lights. That sampling method is still used for large/close
lights, since it works very well for that. But for small/distant
lights a simpler and numerically stable method is used.
It *seemed* to fix the problem I was running into, but it actually
made the SphereLight ray intersection code incorrect, and was just
avoiding intersections that should have happened.
I should test better before committing. :-)
Thanks to a discovery by Petra Gospodnetic during her GSOC
project, I was able to substantially improve light tree sampling
for lambert surfaces. As part of this, the part of the surface
closure API relevant to light tree sampling has been adjusted to
be more flexible.
These improvements do not yet affect GTR surface light tree
sampling.
More specifically: prior to this, SurfaceLights returned the
shadow ray direction vector to use. That was fine, but it
kept the responsibility of generating proper offsets (to account
for floating point error) inside the lights.
Now the SurfaceLights return the world-space point on the light
to sample, along with its surface normal and error magnitude.
This allows the robust shadow ray generation code to be in one
place inside the renderer code.
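
A rough sketch of the shape this gives the API, assuming a
hypothetical LightSample struct and shadow_ray function (the
names and fields are illustrative only). The light only reports
where the sample is and how inaccurate it may be, and the
renderer builds the offset shadow ray in one place.

```rust
type V3 = [f32; 3];

fn sub(a: V3, b: V3) -> V3 {
    [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
}

fn dot(a: V3, b: V3) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Hypothetical return value of a light's sampling method.
struct LightSample {
    point: V3,  // World-space point on the light.
    normal: V3, // Unit surface normal at that point.
    err: f32,   // Error magnitude of `point`.
}

/// Hypothetical shadow ray: `origin + dir * t` for t in [0, 1].
struct ShadowRay {
    origin: V3,
    dir: V3,
}

/// Moves `p` along its normal by `err`, on the side facing `target`.
fn offset_toward(p: V3, n: V3, err: f32, target: V3) -> V3 {
    let side = if dot(n, sub(target, p)) >= 0.0 { 1.0 } else { -1.0 };
    [
        p[0] + n[0] * err * side,
        p[1] + n[1] * err * side,
        p[2] + n[2] * err * side,
    ]
}

/// Renderer-side shadow ray generation. Both end points are pushed
/// off their surfaces, so the lights need no offset logic of their own.
fn shadow_ray(pos: V3, nor: V3, err: f32, light: &LightSample) -> ShadowRay {
    let origin = offset_toward(pos, nor, err, light.point);
    let end = offset_toward(light.point, light.normal, light.err, pos);
    ShadowRay {
        origin,
        dir: sub(end, origin),
    }
}
```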
Reorganized light and surface traits so that light sources are
surfaces as well, which will let them slide easily into
intersection tests with the rest of the scene geometry.
It's not used right now, but in the future I want shaders to be
able to vary over time and have motion blur. This serves as a
nice little reminder by putting it in the API.
The main change is that SurfaceClosures now have the hero
wavelength baked into them. Since surface closures come from
surface intersections, and intersections are always specific to
a ray or path, and rays/paths have a fixed wavelength, it doesn't
make sense for the surface closure to constantly be converting
from a more general color representation to spectral samples
whenever it's used.
This is also nice because it keeps surface closures removed from
any particular representation of color. All color space handling
etc. can be kept inside the shaders.
The SAH split would happily repeatedly split on the same axis
as long as the surface area was reduced as much as splitting
on the other axes. This resulted in sliver-like bounding boxes
for some scenes, which is terrible for the light tree.
The SAH splitting code now accounts for the diagonal of the
bounding box, favoring smaller ones. This seems to work well,
fixing the issue without introducing any apparent performance
regressions.
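
To illustrate the idea only: one possible way to make a split
cost favor smaller diagonals is to scale the usual surface area
term by the diagonal length. The split_cost function below and
its exact weighting are assumptions, not the formula used in the
actual splitting code.

```rust
#[derive(Clone, Copy)]
struct BBox {
    min: [f32; 3],
    max: [f32; 3],
}

impl BBox {
    fn extent(&self) -> [f32; 3] {
        [
            self.max[0] - self.min[0],
            self.max[1] - self.min[1],
            self.max[2] - self.min[2],
        ]
    }

    fn surface_area(&self) -> f32 {
        let d = self.extent();
        2.0 * (d[0] * d[1] + d[1] * d[2] + d[2] * d[0])
    }

    fn diagonal(&self) -> f32 {
        let d = self.extent();
        (d[0] * d[0] + d[1] * d[1] + d[2] * d[2]).sqrt()
    }
}

/// Hypothetical split cost: the surface area times item count of each
/// child, additionally scaled by that child's diagonal so that
/// sliver-like boxes lose to compact boxes of equal surface area.
fn split_cost(left: &BBox, left_count: usize, right: &BBox, right_count: usize) -> f32 {
    let side = |b: &BBox, n: usize| b.surface_area() * b.diagonal() * n as f32;
    side(left, left_count) + side(right, right_count)
}
```

A unit cube and a 2 x 1 x 1/3 box have essentially the same
surface area, but the longer diagonal of the second box gives
it a higher cost under this weighting.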
The previous commit dealt with triangles self-shadowing. This
commit deals with _other_ objects very near a triangle being
erroneously shadowed by it.
This turned out to be a rather interesting one. The water-tight
ray/triangle intersection algorithm, while very accurate for
finding if there is an intersection with a line segment, is
not as remarkably accurate for determining if that intersection
is within the interval of the ray.
This is because of the coordinate transformation it does
depending on ray direction: for triangles lying flat on one of
the axis planes near zero, that near-zero coordinate can get
transformed to a much less accurate space for testing. In fact,
generally speaking, because of the coordinate transform, you can
only rely on the test being as accurate as the least accurate
axis.
The ray-origin offset code was doing offsets based on the
assumption that the errors on the major axes are independent, but
as this triangle intersection algorithm shows, you can't actually
depend on that being the case. So rather than handling triangle
intersection as a special case, I've changed the intersection
position error to be a single float, representing the maximum
possible error on any axis. This should be robust for any
geometry type added in the future, and also solves the immediate
issue in a correct way.
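
A sketch of how an offset can be computed from a single error
value, assuming the position may be off by up to max_err on
every axis at once. The function name and signature are
hypothetical. The offset distance is how far that error cube can
extend along the normal.

```rust
type V3 = [f32; 3];

fn dot(a: V3, b: V3) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Offsets an intersection position so a new ray starting there
/// cannot re-hit the surface it came from.
///
/// `pos`: computed intersection position.
/// `nor`: unit surface normal at `pos`.
/// `max_err`: maximum absolute error of `pos` on any axis.
/// `dir`: direction of the ray about to be spawned.
fn offset_ray_origin(pos: V3, nor: V3, max_err: f32, dir: V3) -> V3 {
    // The true position lies in a cube of half-width `max_err` around
    // `pos`. Its farthest extent along the unit normal is
    // max_err * (|nx| + |ny| + |nz|).
    let d = max_err * (nor[0].abs() + nor[1].abs() + nor[2].abs());

    // Offset to the side of the surface the new ray leaves from.
    let side = if dot(nor, dir) >= 0.0 { 1.0 } else { -1.0 };

    [
        pos[0] + nor[0] * d * side,
        pos[1] + nor[1] * d * side,
        pos[2] + nor[2] * d * side,
    ]
}
```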
Turns out going higher arity makes a huge positive difference
in sampling quality. Currently have 32-arity set as the default,
as it seems to be worth it for the better sampling.
For some reason the ulp incrementing is unreliable when starting
at zero. It creates subnormal numbers, and that seems to be an
issue somewhere in the pipeline, ultimately leading to weird
render artifacts. Not entirely sure why.
This fixes it by avoiding subnormal numbers in the final offset
ray origin. Left a note suggesting investigating in more detail
at some point.
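
For reference, a minimal sketch of an ulp increment that steps
over the subnormal range. The function name is hypothetical, and
snapping to the nearest normal value (or zero) is an assumption
about how the subnormals could be avoided.

```rust
/// Returns the next f32 above `x`, but never a subnormal: a result
/// that would land in the subnormal range is rounded up out of it.
fn next_up_no_subnormal(x: f32) -> f32 {
    if x.is_nan() || x == f32::INFINITY {
        return x;
    }

    // Plain one-ulp step toward positive infinity.
    let stepped = if x == 0.0 {
        // Covers both +0.0 and -0.0: smallest positive subnormal.
        f32::from_bits(1)
    } else if x > 0.0 {
        f32::from_bits(x.to_bits() + 1)
    } else {
        // Negative values move up by decreasing the bit pattern.
        f32::from_bits(x.to_bits() - 1)
    };

    // Round subnormal results up to the next non-subnormal value.
    if stepped != 0.0 && stepped.abs() < f32::MIN_POSITIVE {
        if stepped > 0.0 {
            f32::MIN_POSITIVE
        } else {
            0.0
        }
    } else {
        stepped
    }
}
```

Starting from zero, this yields f32::MIN_POSITIVE rather than
the smallest subnormal, so the offset stays conservative.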
Very small triangles were being missed because of the
not-so-robust ray-triangle intersection algorithm I was using.
Switched to the algorithm from the paper "Watertight
Ray/Triangle Intersection" by Woop et al. Happily, the new
algorithm doesn't seem to measurably slow down renders at all.
They are now generated by a build.rs script from nothing but the
colorspace's primaries, which makes it super easy to add more
colorspaces. So easy that I added three more: ACES AP0, ACES AP1
and Rec.2020.
This lays the foundation for supporting output to different
colorspaces.
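
As an illustration of the kind of computation involved, here is
the standard derivation of an RGB-to-XYZ matrix from xy
chromaticities. It assumes the white point is supplied along
with the primaries, and the function names are hypothetical
rather than taken from the build script.

```rust
/// Determinant of a 3x3 matrix stored as m[row][col].
fn det3(m: [[f64; 3]; 3]) -> f64 {
    m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
}

/// Converts an xy chromaticity to XYZ with Y = 1.
fn xy_to_xyz(xy: (f64, f64)) -> [f64; 3] {
    [xy.0 / xy.1, 1.0, (1.0 - xy.0 - xy.1) / xy.1]
}

/// Builds the RGB -> XYZ matrix for the given primaries and white
/// point, such that RGB (1, 1, 1) maps to the white point with Y = 1.
fn rgb_to_xyz_matrix(
    r: (f64, f64),
    g: (f64, f64),
    b: (f64, f64),
    w: (f64, f64),
) -> [[f64; 3]; 3] {
    // Columns of `m` are the unscaled XYZ coordinates of the primaries.
    let p = [xy_to_xyz(r), xy_to_xyz(g), xy_to_xyz(b)];
    let mut m = [[0.0; 3]; 3];
    for col in 0..3 {
        for row in 0..3 {
            m[row][col] = p[col][row];
        }
    }

    // Solve m * s = white for the per-primary scales (Cramer's rule).
    let white = xy_to_xyz(w);
    let d = det3(m);
    let mut s = [0.0; 3];
    for col in 0..3 {
        let mut mc = m;
        for row in 0..3 {
            mc[row][col] = white[row];
        }
        s[col] = det3(mc) / d;
    }

    // Scale each column by its primary's weight.
    let mut out = m;
    for row in 0..3 {
        for col in 0..3 {
            out[row][col] = m[row][col] * s[col];
        }
    }
    out
}
```

Feeding in the Rec.709 primaries with a D65 white point gives
the familiar luminance weights of roughly 0.2126, 0.7152 and
0.0722 in the middle row.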
This eliminates writing temp files to disk for any part of the
Blender/Psychopath integration.
The option to export to a file still exists, however, by
specifying an export output path.
This is more a peace-of-mind thing than anything else. But it
also lets us make the number of LDS dimensions lower without
worrying, which in turn makes the code smaller.
After implementation, it does appear to make rendering slower
by a noticeable bit compared to what I was doing before. At very
low sampling rates it does provide a bit of visual improvement,
but by the time you get to even just 16 samples per pixel its
benefits seem to disappear.
Due to the slowdown and the minimal gains, I'll be removing
this in the next commit. But I want to commit it so I don't
lose the code, since it was an interesting experiment with
some promising results.
I couldn't make the BVH4 faster than the BVH, and the bitstack
was bloating the AccelRay struct. Removing the bitstack gives
a small but noticeable speedup in rendering.
Specifically, LightPath is now significantly smaller, and as a
result faster to process.
Also finally fixed the bug where without light sources the light
from the sky wouldn't affect surfaces.
If the average surface area of all the time samples is close enough
to the surface area of their union, just take the union and use that.
This both makes the BVH smaller in memory (time samples don't
propagate up the tree beyond their usefulness) and makes it
faster since traversal can avoid interpolating BBoxes when there's
only one BBox for a node.
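
A sketch of the merge test described above. The
merge_time_samples name and the threshold parameter (how close
is "close enough") are assumptions for illustration.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct BBox {
    min: [f32; 3],
    max: [f32; 3],
}

impl BBox {
    fn surface_area(&self) -> f32 {
        let d = [
            self.max[0] - self.min[0],
            self.max[1] - self.min[1],
            self.max[2] - self.min[2],
        ];
        2.0 * (d[0] * d[1] + d[1] * d[2] + d[2] * d[0])
    }

    fn union(&self, other: &BBox) -> BBox {
        let mut out = *self;
        for i in 0..3 {
            out.min[i] = self.min[i].min(other.min[i]);
            out.max[i] = self.max[i].max(other.max[i]);
        }
        out
    }
}

/// Collapses a node's time samples into a single box when that costs
/// little: if the average surface area of the samples is at least
/// `threshold` times the surface area of their union, only the union
/// is kept. Otherwise the samples are returned unchanged.
fn merge_time_samples(samples: &[BBox], threshold: f32) -> Vec<BBox> {
    if samples.len() <= 1 {
        return samples.to_vec();
    }

    let union = samples[1..].iter().fold(samples[0], |acc, b| acc.union(b));
    let avg_area =
        samples.iter().map(|b| b.surface_area()).sum::<f32>() / samples.len() as f32;

    if avg_area >= union.surface_area() * threshold {
        vec![union]
    } else {
        samples.to_vec()
    }
}
```

With a threshold of 0.9, a box that barely moves between time
samples collapses to one box, while a box that moves far keeps
all of its samples.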
Reduced from 64 to 42. This still allows each BVH to hold 4.4
trillion elements, but it guarantees that the accel ray's
traversal bitstack can accommodate at least two nested max-depth
trees.
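
The arithmetic, assuming a binary tree with one element per
leaf and one traversal bit per tree level (both assumptions on
my part, as are the names below):

```rust
/// Maximum tree depth after this change.
const MAX_DEPTH: u32 = 42;

/// Leaves in a full binary tree of the given depth.
fn max_elements(depth: u32) -> u64 {
    1u64 << depth
}

/// Traversal bits needed to walk `nesting` max-depth trees in a row,
/// at one bit per level.
fn bits_for_nested_trees(depth: u32, nesting: u32) -> u32 {
    depth * nesting
}
```

Here max_elements(MAX_DEPTH) is 4,398,046,511,104, and two
nested max-depth trees need 84 traversal bits.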
In practice it worked fine, but only by accident. NaNs were
being passed to the lerp_slice function, which led to the
correct result in this case but is icky and dependent
on how lerp_slice is implemented.
The BVH building code is now largely split out into a separate
type, BVHBase. The intent is that this will also be used by
the BVH4 when I get around to it.
The BVH itself now uses references instead of indexes, allocating
and pointing directly into the MemArena. This allows the nodes
to all be right next to their bounding boxes in memory.
This seems to work more nicely than a fixed block size, because
it adapts to how much memory is being requested overall. For
example, a small scene won't allocate large amounts of RAM,
but a large scene with large data won't be penalized with a
lot of tiny allocations.
Not tested yet, just a straightforward conversion from the C++
Psychopath codebase. So there are probably bugs in it from the
conversion. But it compiles!
Also created a proper World struct in the process, to store all
infinite-extent type stuff.
Note that I goofed and did a new rustfmt pass but forgot to
commit before making these changes, so there are a lot of
formatting changes in this too. *sigh*