The SAH split would happily split on the same axis over and over,
as long as doing so reduced surface area as much as splitting on
the other axes did. For some scenes this resulted in sliver-like
bounding boxes, which are terrible for the light tree.
The SAH splitting code now also accounts for the diagonal of the
bounding box, favoring splits that produce smaller diagonals.
This seems to work well,
fixing the issue without introducing any apparent performance
regressions.
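Roughly, the idea is this (an illustrative sketch, not the actual
code; the exact weighting is made up):

    #[derive(Copy, Clone)]
    struct BBox {
        min: [f32; 3],
        max: [f32; 3],
    }

    impl BBox {
        fn extent(&self) -> [f32; 3] {
            [self.max[0] - self.min[0],
             self.max[1] - self.min[1],
             self.max[2] - self.min[2]]
        }

        fn surface_area(&self) -> f32 {
            let e = self.extent();
            2.0 * (e[0] * e[1] + e[1] * e[2] + e[2] * e[0])
        }

        fn diagonal(&self) -> f32 {
            let e = self.extent();
            (e[0] * e[0] + e[1] * e[1] + e[2] * e[2]).sqrt()
        }
    }

    // Weighting the plain SAH cost by the children's diagonals
    // penalizes sliver-shaped bounds even when their surface area
    // is competitive.
    fn split_cost(left: &BBox, n_left: usize, right: &BBox, n_right: usize) -> f32 {
        let sah = left.surface_area() * n_left as f32
            + right.surface_area() * n_right as f32;
        sah * (left.diagonal() + right.diagonal())
    }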
The previous commit dealt with triangles self-shadowing. This
commit prevents intersections with _other_ objects very near a
triangle from being erroneously shadowed by it.
This turned out to be a rather interesting one. The watertight
ray/triangle intersection algorithm, while very accurate at
determining whether there is an intersection with the ray's line,
is not nearly as accurate at determining whether that intersection
falls within the ray's interval.
This is because of the coordinate transformation it performs based
on ray direction: for triangles lying flat on one of the axis
planes near zero, that near-zero coordinate can get transformed
into a much less accurate space for testing. In fact, generally
speaking, because of the coordinate transform you can only rely on
the test being as accurate as the least accurate axis.
The ray-origin offset code was doing offsets based on the
assumption that the errors on the major axes are independent, but
as this triangle intersection algorithm shows, you can't actually
depend on that being the case. So rather than handling triangle
intersection as a special case, I've changed the intersection
position error to be a single float representing the maximum
possible error on any axis. This should be robust for any
geometry type added in the future, and it also solves the
immediate issue in a correct way.
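Sketched out, the offset now looks something like this (names
illustrative):

    // A single scalar bounds the positional error on every axis, so
    // the new ray origin just gets pushed out along the geometric
    // normal by at least that much.
    fn offset_ray_origin(pos: [f32; 3], pos_err: f32, nor: [f32; 3]) -> [f32; 3] {
        [pos[0] + nor[0] * pos_err,
         pos[1] + nor[1] * pos_err,
         pos[2] + nor[2] * pos_err]
    }

A robust version would also nudge each component outward by a ulp
or so, but the key change is the scalar pos_err replacing a
per-axis error vector.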
Turns out going higher arity makes a huge positive difference
in sampling quality. I currently have 32-arity set as the default,
as it seems to be worth it for the better sampling.
For some reason the ulp incrementing is unreliable when starting
at zero. It creates subnormal numbers, and that seems to be an
issue somewhere in the pipeline, ultimately leading to weird
render artifacts. Not entirely sure why.
This fixes it by avoiding subnormal numbers in the final offset
ray origin. I left a note suggesting a more detailed investigation
at some point.
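One plausible shape for the workaround (assuming the fix just
snaps subnormal components to the smallest normal float):

    // Anything with magnitude below f32::MIN_POSITIVE is subnormal;
    // snap it to the nearest normal value, preserving sign.
    fn avoid_subnormal(x: f32) -> f32 {
        if x != 0.0 && x.abs() < f32::MIN_POSITIVE {
            f32::MIN_POSITIVE.copysign(x)
        } else {
            x
        }
    }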
Very small triangles were being missed because of the
not-so-robust ray-triangle intersection algorithm I was using.
Switched to the algorithm from the paper "Watertight
Ray/Triangle Intersection" by Woop et al. Happily, the new
algorithm doesn't seem to measurably slow down renders at all.
They are now generated by a build.rs script from nothing but the
colorspace's primaries, which makes it super easy to add more
colorspaces. So easy that I added three more: ACES AP0, ACES AP1
and Rec.2020.
This lays the foundation for supporting output to different
colorspaces.
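The math involved is small enough to sketch. This is the standard
chromaticity-to-matrix derivation; the actual generator may be
organized differently:

    type Mat3 = [[f64; 3]; 3];

    fn invert(m: &Mat3) -> Mat3 {
        let det = m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
        let mut inv = [[0.0f64; 3]; 3];
        for i in 0..3 {
            for j in 0..3 {
                // Adjugate via cyclic indices; the cofactor signs
                // are baked into the index pattern.
                inv[i][j] = (m[(j + 1) % 3][(i + 1) % 3] * m[(j + 2) % 3][(i + 2) % 3]
                    - m[(j + 1) % 3][(i + 2) % 3] * m[(j + 2) % 3][(i + 1) % 3])
                    / det;
            }
        }
        inv
    }

    // prim holds the (x, y) chromaticities of R, G, B; white is the
    // white point. Returns the RGB-to-XYZ matrix.
    fn rgb_to_xyz(prim: [(f64, f64); 3], white: (f64, f64)) -> Mat3 {
        // Columns are the XYZ of each primary, with Y = 1.
        let mut m = [[0.0f64; 3]; 3];
        for (i, &(x, y)) in prim.iter().enumerate() {
            m[0][i] = x / y;
            m[1][i] = 1.0;
            m[2][i] = (1.0 - x - y) / y;
        }
        // Scale the columns so R = G = B = 1 maps to the white point.
        let w = [white.0 / white.1, 1.0, (1.0 - white.0 - white.1) / white.1];
        let inv = invert(&m);
        let mut s = [0.0f64; 3];
        for i in 0..3 {
            s[i] = inv[i][0] * w[0] + inv[i][1] * w[1] + inv[i][2] * w[2];
        }
        for i in 0..3 {
            for j in 0..3 {
                m[i][j] *= s[j];
            }
        }
        m
    }

The build script can then print the resulting matrix (and its
inverse) out as Rust constants.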
This eliminates writing temp files to disk for any part of the
Blender/Psychopath integration.
The option to export to a file still exists, however, by
specifying an export output path.
This is more a peace-of-mind thing than anything else. But it
also lets us make the number of LDS dimensions lower without
worrying, which in turn makes the code smaller.
After implementation, it does appear to make rendering noticeably
slower compared to what I was doing before. At very low sampling
rates it provides a bit of visual improvement, but by the time you
get to even just 16 samples per pixel its benefits seem to
disappear.
Due to the slowdown and the minimal gains, I'll be removing
this in the next commit. But I want to commit it so I don't
lose the code, since it was an interesting experiment with
some promising results.
I couldn't make the BVH4 faster than the BVH, and the bitstack
was bloating the AccelRay struct. Removing the bitstack gives
a small but noticeable speedup in rendering.
Specifically, LightPath is now significantly smaller and, as a
result, faster to process.
Also finally fixed the bug where, without any light sources,
light from the sky wouldn't affect surfaces.
If the average surface area of all the time samples is close enough
to the surface area of their union, just take the union and use that.
This both makes the BVH smaller in memory (time samples don't
propagate up the tree beyond their usefulness) and makes it
faster, since traversal can avoid interpolating BBoxes when
there's only one BBox for a node.
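The check itself is tiny. A sketch, assuming a Copy BBox type with
union and surface_area methods and a made-up tolerance:

    fn collapse_time_samples(bboxes: &[BBox]) -> Option<BBox> {
        let mut union = bboxes[0];
        for b in &bboxes[1..] {
            union = union.union(b);
        }
        let avg_area = bboxes.iter().map(|b| b.surface_area()).sum::<f32>()
            / bboxes.len() as f32;
        // If the union is barely bigger than the average time
        // sample, one box is as good as the whole set.
        if union.surface_area() <= avg_area * 1.01 {
            Some(union) // store a single BBox for the node
        } else {
            None // keep the per-time-sample boxes
        }
    }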
Reduced from 64 to 42. This still allows each BVH to hold 4.4
trillion elements, but it guarantees that the accel ray's
traversal bitstack can accommodate at least two nested max-depth
trees.
In practice it worked fine, but only by accident. NaNs were
being passed to the lerp_slice function, which happened to yield
the correct result in this case but is icky and dependent
on how lerp_slice is implemented.
The BVH building code is now largely split out into a separate
type, BVHBase. The intent is that this will also be used by
the BVH4 when I get around to it.
The BVH itself now uses references instead of indexes, allocating
and pointing directly into the MemArena. This allows the nodes
to all be right next to their bounding boxes in memory.
This seems to work more nicely than a fixed block size, because
it adapts to how much memory is being requested overall. For
example, a small scene won't allocate large amounts of RAM,
but a large scene with large data won't be penalized with a
lot of tiny allocations.
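One way to implement such a policy, with made-up constants:

    // Each new block is proportional to the total memory handed out
    // so far, clamped to sane bounds. Small scenes stay small; large
    // scenes get large blocks instead of many tiny ones.
    fn next_block_size(total_allocated: usize) -> usize {
        const MIN_BLOCK: usize = 1 << 12; // 4 KiB
        const MAX_BLOCK: usize = 1 << 26; // 64 MiB
        (total_allocated / 2).clamp(MIN_BLOCK, MAX_BLOCK)
    }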
Not tested yet, just a straightforward conversion from the C++
Psychopath codebase. So there are probably bugs in it from the
conversion. But it compiles!
Also created a proper World struct in the process, to store all
infinite-extent type stuff.
Note that I goofed and did a new rustfmt pass but forgot to
commit before making these changes, so there's a lot of
formatting changes in this too. *sigh*
After some experimentation, it's pretty clear that the LightTree
performs a lot better with a model of spherical _volume_ light
sources. This makes sense considering that generally they
represent a distribution of other lights in space.
This is a quick hack to make it behave a bit more like that. But
the long-term solution will be to adjust how
estimate_eval_over_solid_angle() of surface closures is implemented.
Turns out that the standard min/max functions were slow for
some reason, and simple if statements are much faster. This
simple change improves render times by over 30%. Crazy.
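The replacement is just a branch. My best guess as to the cause:
the standard functions have to honor IEEE NaN semantics, which a
bare comparison skips:

    // NaN-ignoring min/max: a plain branch, with none of the NaN
    // bookkeeping that f32::min and f32::max must do.
    #[inline(always)]
    fn fast_min(a: f32, b: f32) -> f32 {
        if a < b { a } else { b }
    }

    #[inline(always)]
    fn fast_max(a: f32, b: f32) -> f32 {
        if a > b { a } else { b }
    }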
The bug was in the previous commit, where I thought I was
preventing out-of-bounds access during traversal by limiting
the tree depth. While the idea was correct, I forgot that the
traversal stack needs _2_ extra slots on top of the tree depth,
not just 1. Fixed.
This avoids exceeding the max BVH depth even in pathological
cases. Still need to improve non-worst-case building, but this
at least prevents crashes in the worst case.
The lighting is super crappy, and pretty much hacked in. Will
need to redo this properly soon. However, this verifies that
certain other parts of the code are (mostly) working properly.
The part of the renderer responsible for light transport has been
split out into a LightPath struct. Also moving over to spectral
rendering, although it's a bit silly at the moment.
BVH traversal still happens in local space, but final actual
surface intersection calculations are done in world space by
transforming the triangle into world space. This is to improve
numerical consistency between intersections.
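Schematically, with stand-in types:

    #[derive(Copy, Clone)]
    struct Point(f32, f32, f32);
    struct Triangle(Point, Point, Point);
    struct Xform; // local-to-world transform

    impl Xform {
        fn transform_point(&self, p: Point) -> Point {
            p // placeholder; the real one applies the matrix
        }
    }

    // Traversal finds the candidate triangle in local space, but the
    // hit test runs against world-space vertices, so overlapping
    // instances see consistent intersection points.
    fn to_world(tri: &Triangle, xform: &Xform) -> Triangle {
        Triangle(
            xform.transform_point(tri.0),
            xform.transform_point(tri.1),
            xform.transform_point(tri.2),
        )
    }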
This, of course, depends on the simd ops being there, which
currently they are not. But in the future, hopefully this will
make things speedy. Will need to test, of course.
This is mainly just to make the tracer code read more cleanly.
All of the pushing and popping logic obscured the big picture
and made things a bit confusing.
The test scene isn't rendering properly, presumably because
something isn't correct in the parsing (although it's not clear
it's in the mesh parsing). Need to investigate.
The AssemblyBuilder is responsible for collecting the data needed
to actually create an Assembly. AssemblyBuilders are now the
only way to create an Assembly, which guarantees that Assemblies
aren't half-baked.
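A minimal sketch of the pattern (names illustrative):

    struct Object; // stand-in for real scene data

    struct Assembly {
        objects: Vec<Object>,
        // instances, transforms, ...
    }

    struct AssemblyBuilder {
        objects: Vec<Object>,
    }

    impl AssemblyBuilder {
        fn new() -> AssemblyBuilder {
            AssemblyBuilder { objects: Vec::new() }
        }

        fn add_object(&mut self, obj: Object) {
            self.objects.push(obj);
        }

        // Consuming the builder is the point: the only way to get an
        // Assembly is through here, after all the data is in place.
        fn build(self) -> Assembly {
            Assembly { objects: self.objects }
        }
    }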
Also got instancing working with transforms and such. It may not
be _really_ working because I don't have a complex test case for
it yet. But that will come later.
Apparently this is what UnsafeCell is for, and the code I wrote
before wasn't technically correct, even though it worked in
practice. Hooray for doing things properly!
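For reference, the shape of the pattern (a sketch, not the actual
code):

    use std::cell::UnsafeCell;

    // UnsafeCell is the sanctioned way to mutate data behind a
    // shared reference; mutating through a pointer derived from a
    // plain &T is undefined behavior even when it appears to work.
    struct Scratch {
        buf: UnsafeCell<Vec<f32>>,
    }

    impl Scratch {
        // Safety: the caller must ensure no other reference to the
        // buffer is alive. That invariant lives in the surrounding
        // code, which is why this is an unsafe fn.
        unsafe fn get_mut(&self) -> &mut Vec<f32> {
            &mut *self.buf.get()
        }
    }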
Includes:
- More scene parsing code. Making good progress!
- Making the rendering code actually use the Scene and Assembly
types.
- Bare beginnings of a Tracer type.
Weird, to be frank. It was a lot of work. Can't believe I don't
even remember doing it before. Oh well.
In any case, I've improved the 'old' one quite a bit. It should
be more robust now, and will provide errors that may actually be
useful to people when a file fails to parse.
Everything is done with indices anyway, so there was no reason
for it to store an internal reference to the object data. This
gets rid of the type parameter and lifetime parameter on the BVH
struct itself, which will also make it easier to bundle it with
the data it indexes, which will be important later on.
Before this the BVH traversal was always traversing into the
same child first regardless of the situation. Now it checks
the direction of the first ray of the batch and compares it
to the split axis of the node, and traverses into the closest
node first.
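The test itself is tiny. A sketch, assuming children are stored
low-side first:

    // If the ray points in the positive direction along the node's
    // split axis, the low-side child is the nearer one.
    fn child_order(ray_dir: [f32; 3], split_axis: usize) -> (usize, usize) {
        if ray_dir[split_axis] >= 0.0 {
            (0, 1) // visit the low child first
        } else {
            (1, 0) // visit the high child first
        }
    }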
It yields the objects that the ray needs to be tested against.
Thus it is the responsibility of the code using the iterator
to actually do the object-level ray tests and update the ray's
max_t etc. accordingly.
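Schematically, usage looks something like this (types and names
are stand-ins):

    struct Ray {
        max_t: f32,
        // origin, direction, ...
    }

    // candidates is whatever the BVH iterator yields for this ray;
    // hit_test is the object-level intersection test.
    fn trace_objects<'a, T: 'a>(
        candidates: impl Iterator<Item = &'a T>,
        ray: &mut Ray,
        hit_test: impl Fn(&Ray, &T) -> Option<f32>,
    ) {
        for obj in candidates {
            if let Some(t) = hit_test(ray, obj) {
                if t < ray.max_t {
                    // Updating max_t is the caller's job; the real
                    // iterator reads it to prune far-away nodes.
                    ray.max_t = t;
                }
            }
        }
    }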
This keeps all of the BVH-related code generic with respect to
what kind of object/data the BVH actually contains, which means
the same BVH code can be used for both scene-level and
triangle-level data.
The BVH is now generic over any kind of data. The building
function takes in a closure that can bound the given data type
in 3d space, and the rest just works.
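The shape of the API, sketched (names illustrative):

    struct BBox {
        min: [f32; 3],
        max: [f32; 3],
    }

    struct BVH {
        // nodes, object indices, ...
    }

    impl BVH {
        // bounder is the closure described above: anything that can
        // report a BBox for an object can go in the BVH.
        fn from_objects<T, F>(objects: &mut [T], bounder: F) -> BVH
        where
            F: Fn(&T) -> BBox,
        {
            // Partition objects by the bounds bounder reports and
            // emit nodes. Elided here.
            unimplemented!()
        }
    }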
Since it's generated code, it doesn't need to be formatted
nicely, and rustfmt was spewing out a bunch of errors because
of its too-long lines anyway.