Thanks to a discovery by Petra Gospodnetic during her GSOC
project, I was able to substantially improve light tree sampling
for lambert surfaces. As part of this, the part of the surface
closure API relevant to light tree sampling has been adjusted to
be more flexible.
These improvements do not yet affect GTR surface light tree
sampling.
The SAH split would happily repeatedly split on the same axis
as long as the surface area was reduced as much as splitting
on the other axes. This resulted in sliver-like bounding boxes
for some scenes, which is terrible for the light tree.
The SAH splitting code now accounts for the diagonal of the
bounding box, favoring smaller ones. This seems to work well,
fixing the issue without introducing any apparent performance
regressions.
Turns out going higher arity makes a huge positive difference
is sampling quality. Currently have 32-arity set as the default,
as it seems to be worth it for the better sampling.
I couldn't make the BVH4 faster than the BVH, and the bitstack
was bloating the AccelRay struct. Removing the bitstack gives
a small but noticable speedup in rendering.
If the average surface area of all the time samples is close enough
to the surface area of their union, just take the union and use that.
This both makes the BVH smaller in memory (time samples don't
propigate up the tree beyond their usefulness) and makes it
faster since traversal can avoid interpolating BBoxes when there's
only one BBox for a node.
Reduced from 64 to 42. This still allows each BVH to hold 4.4
trillion elements, but it guarantees that the accel ray's
traversal bitstack can accommodate at least two nested max-depth
trees.
In practice it worked fine, but only by accident. NaN's were
being passed to the lerp_slice function, which led to the
correct result in this case but is icky and dependant
on how lerp_slice is implemented.
The BVH building code is now largely split out into a separate
type, BVHBase. The intent is that this will also be used by
the BVH4 when I get around to it.
The BVH itself now uses references instead of indexes, allocating
and pointing directly into the MemArena. This allows the nodes
to all be right next to their bounding boxes in memory.