This is more a peace-of-mind thing than anything else. But it
also lets us lower the number of LDS dimensions without
worrying, which in turn makes the code smaller.
After implementation, it does appear to make rendering
noticeably slower compared to what I was doing before. At very
low sampling rates it does provide a bit of visual improvement,
but by the time you get to even just 16 samples per pixel its
benefits seem to disappear.
Due to the slowdown and the minimal gains, I'll be removing
this in the next commit. But I want to commit it so I don't
lose the code, since it was an interesting experiment with
some promising results.
I couldn't make the BVH4 faster than the BVH, and the bitstack
was bloating the AccelRay struct. Removing the bitstack gives
a small but noticeable speedup in rendering.
Specifically, LightPath is now significantly smaller and, as a
result, faster to process.
Also finally fixed the bug where, without any light sources,
light from the sky wouldn't affect surfaces.
If the average surface area of all the time samples is close enough
to the surface area of their union, just take the union and use that.
This both makes the BVH smaller in memory (time samples don't
propagate up the tree beyond their usefulness) and makes it
faster since traversal can avoid interpolating BBoxes when there's
only one BBox for a node.
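As a rough sketch of the idea (the BBox type, its methods, and
the 1% threshold here are all invented for illustration, not the
actual code):

    #[derive(Copy, Clone)]
    struct BBox {
        min: [f32; 3],
        max: [f32; 3],
    }

    impl BBox {
        fn union(&self, other: &BBox) -> BBox {
            let mut min = [0.0f32; 3];
            let mut max = [0.0f32; 3];
            for i in 0..3 {
                min[i] = self.min[i].min(other.min[i]);
                max[i] = self.max[i].max(other.max[i]);
            }
            BBox { min, max }
        }

        fn surface_area(&self) -> f32 {
            let d = [
                self.max[0] - self.min[0],
                self.max[1] - self.min[1],
                self.max[2] - self.min[2],
            ];
            2.0 * (d[0] * d[1] + d[1] * d[2] + d[2] * d[0])
        }
    }

    /// If the average surface area of the per-time-sample bounds is
    /// close enough to the surface area of their union, return the
    /// union so the node stores just one BBox.  Assumes at least one
    /// sample.
    fn collapse_time_samples(bounds: &[BBox]) -> Option<BBox> {
        let union_bbox = bounds[1..].iter().fold(bounds[0], |a, b| a.union(b));
        let avg_area =
            bounds.iter().map(|b| b.surface_area()).sum::<f32>() / bounds.len() as f32;
        if union_bbox.surface_area() <= avg_area * 1.01 {
            Some(union_bbox)
        } else {
            None
        }
    }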
Reduced the maximum tree depth from 64 to 42. This still allows
each BVH to hold about 4.4 trillion elements, but it guarantees
that the accel ray's traversal bitstack can accommodate at least
two nested max-depth trees.
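For illustration, something like this (the constant and type
names are made up, and the u128 bitstack is an assumption):

    // Illustrative only; the real names differ.
    const BVH_MAX_DEPTH: usize = 42;

    // Assuming the accel ray's traversal bitstack is a u128, a depth-42
    // tree can address about 2^42 ≈ 4.4 trillion leaves, and two nested
    // max-depth traversals need only 2 * 42 = 84 bits of bitstack.
    type TraversalBitstack = u128;
    const _: () =
        assert!(BVH_MAX_DEPTH * 2 <= std::mem::size_of::<TraversalBitstack>() * 8);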
In practice it worked fine, but only by accident. NaNs were
being passed to the lerp_slice function, which led to the
correct result in this case but is icky and dependent on how
lerp_slice is implemented.
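The fix presumably looks something like this (a speculative
sketch; the helper and parameter names are invented):

    /// Compute a well-defined interpolation factor even in the
    /// degenerate single-time-sample case, instead of letting
    /// 0.0 / 0.0 produce a NaN and relying on lerp_slice's internals
    /// to cope with it.
    fn time_alpha(time: f32, start_time: f32, end_time: f32) -> f32 {
        if end_time > start_time {
            (time - start_time) / (end_time - start_time)
        } else {
            // Single time sample / zero-length range.
            0.0
        }
    }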
The BVH building code is now largely split out into a separate
type, BVHBase. The intent is that this will also be used by
the BVH4 when I get around to it.
The BVH itself now uses references instead of indexes, allocating
and pointing directly into the MemArena. This allows the nodes
to all be right next to their bounding boxes in memory.
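Roughly this shape, as a sketch (field and type names are
illustrative and the arena itself is elided; the point is that
child links are plain references with the arena's lifetime,
sitting right next to their bounds):

    struct BBox {
        min: [f32; 3],
        max: [f32; 3],
    }

    enum BVHNode<'a> {
        Internal {
            bounds: &'a [BBox], // Allocated from the same arena as the node.
            children: (&'a BVHNode<'a>, &'a BVHNode<'a>),
            split_axis: u8,
        },
        Leaf {
            bounds: &'a [BBox],
            object_range: (usize, usize), // Indices into the object list.
        },
    }

    struct BVH<'a> {
        root: Option<&'a BVHNode<'a>>,
        depth: usize,
    }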
This seems to work more nicely than a fixed block size, because
it adapts to how much memory is being requested overall. For
example, a small scene won't allocate large amounts of RAM,
but a large scene with large data won't be penalized with a
lot of tiny allocations.
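Something along these lines (a toy sketch, not the real
MemArena; the growth fraction and minimum block size are
invented):

    struct GrowingArena {
        blocks: Vec<Vec<u8>>,
        total_allocated: usize,
    }

    const MIN_BLOCK_SIZE: usize = 4096;
    const GROWTH_FRACTION: usize = 4; // Each new block ≈ 1/4 of everything so far.

    impl GrowingArena {
        fn new() -> GrowingArena {
            GrowingArena {
                blocks: Vec::new(),
                total_allocated: 0,
            }
        }

        /// Reserve a new block sized relative to what has already been
        /// handed out, so block size grows with overall demand.
        fn add_block(&mut self, at_least: usize) {
            let adaptive = self.total_allocated / GROWTH_FRACTION;
            let size = at_least.max(adaptive).max(MIN_BLOCK_SIZE);
            self.blocks.push(Vec::with_capacity(size));
            self.total_allocated += size;
        }
    }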
Not tested yet, just a straightforward conversion from the C++
Psychopath codebase. So there are probably bugs in it from the
conversion. But it compiles!
Also created a proper World struct in the process, to store all
infinite-extent type stuff.
Note that I goofed and did a new rustfmt pass but forgot to
commit before making these changes, so there's a lot of
formatting changes in this too. *sigh*
After some experimentation, it's pretty clear that the LightTree
performs a lot better with a model of spherical _volume_ light
sources. This makes sense, considering that they generally
represent a distribution of other lights in space.
This is a quick hack to make it behave a bit more like that. But
the long-term solution will be to adjust how
estimate_eval_over_solid_angle() of surface closures is implemented.
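For reference, the solid angle a bounding sphere subtends at the
shading point is the quantity such a model is built around
(standard formula; how it feeds into the LightTree weighting
isn't shown here):

    /// Solid angle subtended by a sphere of the given radius at the
    /// given distance from its center.
    fn sphere_solid_angle(radius: f32, distance: f32) -> f32 {
        use std::f32::consts::PI;
        if distance <= radius {
            // The shading point is inside the sphere.
            4.0 * PI
        } else {
            let sin2 = (radius / distance) * (radius / distance);
            2.0 * PI * (1.0 - (1.0 - sin2).sqrt())
        }
    }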
Turns out that the standard min/max functions were slow for
some reason, and simple if statements are much faster. This
simple change improves render times by over 30%. Crazy.
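Presumably the std versions pay for NaN handling; either way,
the change is just this sort of thing:

    #[inline(always)]
    fn fast_min(a: f32, b: f32) -> f32 {
        if a < b { a } else { b }
    }

    #[inline(always)]
    fn fast_max(a: f32, b: f32) -> f32 {
        if a > b { a } else { b }
    }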
The bug was in the previous commit, where I thought I was
preventing out-of-bounds access during traversal by limiting
the tree depth. While the idea was correct, I forgot that the
traversal stack needs _2_ extra slots on top of the tree depth,
not just 1. Fixed.
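In other words, something like this (names are illustrative):

    const BVH_MAX_DEPTH: usize = 42;

    // Two slots beyond the tree depth, not one: a pop-one/push-two
    // traversal can briefly hold more entries than the depth alone
    // suggests, depending on where in the loop the pushes happen.
    const TRAVERSAL_STACK_SIZE: usize = BVH_MAX_DEPTH + 2;

    struct NodeStack {
        nodes: [u32; TRAVERSAL_STACK_SIZE], // Node indices awaiting traversal.
        len: usize,
    }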
This avoids exceeding the max BVH depth even in pathological
cases. Still need to improve non-worst-case building, but this
at least prevents crashes in the worst case.
The lighting is super crappy, and pretty much hacked in. Will
need to redo this properly soon. However, this verifies that
certain other parts of the code are (mostly) working properly.
The part of the renderer responsible for light transport has been
split out into a LightPath struct. Also moving over to spectral
rendering, although it's a bit silly at the moment.
BVH traversal still happens in local space, but final actual
surface intersection calculations are done in world space by
transforming the triangle into world space. This is to improve
numerical consistency between intersections.
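Sketched out, the shape of it is roughly this (stand-in types
and a plain Möller-Trumbore test; not the actual intersection
code):

    type Point = [f32; 3];

    struct Ray {
        orig: Point,
        dir: Point,
    }

    /// Apply an affine transform: rotation/scale in `m`, translation in `t`.
    fn xform_point(m: [[f32; 3]; 3], t: Point, p: Point) -> Point {
        let mut out = t;
        for i in 0..3 {
            out[i] += m[i][0] * p[0] + m[i][1] * p[1] + m[i][2] * p[2];
        }
        out
    }

    /// Plain Möller-Trumbore ray/triangle test, returning the hit distance.
    fn ray_triangle(ray: &Ray, tri: [Point; 3]) -> Option<f32> {
        let sub = |a: Point, b: Point| [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
        let cross = |a: Point, b: Point| {
            [
                a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0],
            ]
        };
        let dot = |a: Point, b: Point| a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

        let e1 = sub(tri[1], tri[0]);
        let e2 = sub(tri[2], tri[0]);
        let pvec = cross(ray.dir, e2);
        let det = dot(e1, pvec);
        if det.abs() < 1e-8 {
            return None;
        }
        let inv_det = 1.0 / det;
        let tvec = sub(ray.orig, tri[0]);
        let u = dot(tvec, pvec) * inv_det;
        if u < 0.0 || u > 1.0 {
            return None;
        }
        let qvec = cross(tvec, e1);
        let v = dot(ray.dir, qvec) * inv_det;
        if v < 0.0 || u + v > 1.0 {
            return None;
        }
        let t = dot(e2, qvec) * inv_det;
        if t > 0.0 { Some(t) } else { None }
    }

    /// The key bit: transform the *triangle* into world space and test
    /// it against the original world-space ray, so every instance of a
    /// mesh sees identical world-space arithmetic.
    fn intersect_instance_triangle(
        world_ray: &Ray,
        local_to_world: ([[f32; 3]; 3], Point),
        local_tri: [Point; 3],
    ) -> Option<f32> {
        let (m, t) = local_to_world;
        let world_tri = [
            xform_point(m, t, local_tri[0]),
            xform_point(m, t, local_tri[1]),
            xform_point(m, t, local_tri[2]),
        ];
        ray_triangle(world_ray, world_tri)
    }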
This, of course, depends on the simd ops being there, which
currently they are not. But in the future, hopefully this will
make things speedy. Will need to test, of course.
This is mainly just to make the tracer code read more cleanly.
All of the pushing and popping logic obscured the big picture
and made things a bit confusing.
The test scene isn't rendering properly, presumably because
something isn't correct in the parsing (although it's not clear
it's in the mesh parsing). Need to investigate.
The AssemblyBuilder is responsible for collecting the data needed
to actually create an Assembly. AssemblyBuilders are now the
only way to create an Assembly, which guarantees that Assemblies
aren't half-baked.
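In sketch form, the pattern is roughly this (fields and methods
invented; the real builder collects a lot more than this):

    pub struct Assembly {
        objects: Vec<String>,               // Stand-in for real object data.
        instances: Vec<(usize, [f32; 16])>, // (object index, transform).
    }

    pub struct AssemblyBuilder {
        objects: Vec<String>,
        instances: Vec<(usize, [f32; 16])>,
    }

    impl AssemblyBuilder {
        pub fn new() -> AssemblyBuilder {
            AssemblyBuilder {
                objects: Vec::new(),
                instances: Vec::new(),
            }
        }

        pub fn add_object(&mut self, name: &str) -> usize {
            self.objects.push(name.to_string());
            self.objects.len() - 1
        }

        pub fn add_instance(&mut self, object: usize, xform: [f32; 16]) {
            self.instances.push((object, xform));
        }

        /// Consuming `build()` is the only way to get an Assembly, so an
        /// Assembly can never be observed half-constructed.
        pub fn build(self) -> Assembly {
            Assembly {
                objects: self.objects,
                instances: self.instances,
            }
        }
    }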
Also got instancing working with transforms and such. It may not
be _really_ working because I don't have a complex test case for
it yet. But that will come later.
Apparently this is what UnsafeCell is for, and the code I wrote
before wasn't technically correct, even though it worked in
practice. Hooray for doing things properly!
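The general pattern, as a sketch (not the specific type from
this change): mutation behind a shared reference has to go
through UnsafeCell, even if raw-pointer tricks appear to work
without it.

    use std::cell::UnsafeCell;

    struct SharedBuffer {
        data: UnsafeCell<Vec<f32>>,
    }

    // UnsafeCell is !Sync, so sharing across threads requires an
    // explicit (and carefully justified) Sync impl.
    unsafe impl Sync for SharedBuffer {}

    impl SharedBuffer {
        fn new(len: usize) -> SharedBuffer {
            SharedBuffer {
                data: UnsafeCell::new(vec![0.0; len]),
            }
        }

        /// Caller must guarantee that no two callers touch the same
        /// index concurrently (e.g. each thread owns a disjoint range).
        unsafe fn write(&self, i: usize, value: f32) {
            (*self.data.get())[i] = value;
        }
    }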
Includes:
- More scene parsing code. Making good progress!
- Making the rendering code actually use the Scene and Assembly
types.
- Bare beginnings of a Tracer type.
Weird, to be frank. It was a lot of work. Can't believe I don't
even remember doing it before. Oh well.
In any case, I've improved the 'old' one quite a bit. It should
be more robust now, and will provide errors that may actually be
useful to people when a file fails to parse.
Everything is done with indices anyway, so there was no reason
for it to store an internal reference to the object data. This
gets rid of the type parameter and lifetime parameter on the BVH
struct itself, which will also make it easier to bundle it with
the data it indexes, which will be important later on.
Before this the BVH traversal was always traversing into the
same child first regardless of the situation. Now it checks
the direction of the first ray in the batch against the node's
split axis, and traverses into the nearer child first.
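The ordering decision itself is tiny (names here are
illustrative):

    struct Ray {
        dir: [f32; 3],
    }

    /// Child indices in near-to-far order for this ray, given the
    /// node's split axis.
    fn child_order(ray: &Ray, split_axis: usize) -> (usize, usize) {
        if ray.dir[split_axis] >= 0.0 {
            (0, 1) // Ray points up the axis: the lower child is closer.
        } else {
            (1, 0)
        }
    }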
It yields the objects that the ray needs to be tested against.
Thus it is the responsibility of the code using the iterator
to actually do the object-level ray tests and update the ray's
max_t etc. accordingly.
This keeps all of the BVH-related code generic with respect to
what kind of object/data the BVH actually contains, which means
the same BVH code can be used for both scene-level and
triangle-level data.
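As a stand-in sketch of that division of labor (a plain slice
and a closure here stand in for the BVH's iterator and the real
object-level tests):

    struct Ray {
        max_t: f32,
    }

    /// The calling code owns the object-level tests and the max_t
    /// updates; whatever yields `candidates` doesn't need to know
    /// about either.
    fn closest_hit<T, F>(ray: &mut Ray, candidates: &[T], test: F) -> Option<usize>
    where
        F: Fn(&T, &Ray) -> Option<f32>,
    {
        let mut hit = None;
        for (i, obj) in candidates.iter().enumerate() {
            if let Some(t) = test(obj, ray) {
                if t < ray.max_t {
                    ray.max_t = t; // Tighten max_t so later candidates cull earlier.
                    hit = Some(i);
                }
            }
        }
        hit
    }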
The BVH is now generic over any kind of data. The building
function takes in a closure that can bound the given data type
in 3d space, and the rest just works.
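The build interface looks roughly like this (signature invented,
but in the spirit of the text: the builder only learns about the
data through a bounding closure):

    struct BBox {
        min: [f32; 3],
        max: [f32; 3],
    }

    struct BVH {
        bounds: Vec<BBox>, // Node storage elided; leaves index back into the objects.
    }

    impl BVH {
        fn from_objects<T, F>(objects: &mut [T], bounder: F) -> BVH
        where
            F: Fn(&T) -> BBox,
        {
            // The real builder recursively partitions `objects` in place;
            // this only shows how the closure supplies the bounds.
            BVH {
                bounds: objects.iter().map(|o| bounder(o)).collect(),
            }
        }
    }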
Since it's generated code anyway, it doesn't need to be formatted
nicely, and rustfmt was spewing out a bunch of errors because of
too-long lines.
The code here is a bit messy right now. Just did enough to get
it working. But it needs to be cleaned up and report parse
errors in a human-readable way, among other things.
It was using bounds-checked indexing in the basic operations. Now
it's using non-bounds-checked indexing, since all of the indices
are constants that we know to be within bounds.
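For example (a hedged sketch; the real code indexes vector and
matrix components, but the shape of the change is this):

    #[derive(Copy, Clone)]
    struct Vec3 {
        data: [f32; 3],
    }

    impl Vec3 {
        #[inline(always)]
        fn dot(self, other: Vec3) -> f32 {
            // Sound: the indices are constants that can't exceed the length.
            unsafe {
                *self.data.get_unchecked(0) * *other.data.get_unchecked(0)
                    + *self.data.get_unchecked(1) * *other.data.get_unchecked(1)
                    + *self.data.get_unchecked(2) * *other.data.get_unchecked(2)
            }
        }
    }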