The bug was introduced in the previous commit, where I thought I
was preventing out-of-bounds stack access during traversal by
limiting the tree depth. The idea was correct, but I forgot that
the traversal stack needs _2_ extra slots on top of the tree
depth, not just 1. Fixed.
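How many extra slots a traversal stack needs depends on how depth is counted and on what the traversal pushes. As a hypothetical illustration (the `Node` type and all functions below are made up, not from this codebase), this sketch measures peak stack occupancy for a plain pop-one/push-both-children traversal. Depth is counted as edges from the root to the deepest leaf, and a capacity check stands in for a fixed-size array. Under these assumptions the peak is `depth + 1`. A builder that counts depth differently, or a traversal that pushes an extra entry, needs more, which is exactly the kind of off-by-one that is easiest to catch by measuring.

```rust
/// Hypothetical, minimal BVH node: interior nodes have two children.
enum Node {
    Leaf,
    Interior(Box<Node>, Box<Node>),
}

/// Full binary tree whose leaves are all `depth` edges below the root.
/// For this push order, a full tree is the worst case for a given depth.
fn full_tree(depth: usize) -> Node {
    if depth == 0 {
        Node::Leaf
    } else {
        Node::Interior(Box::new(full_tree(depth - 1)), Box::new(full_tree(depth - 1)))
    }
}

/// Push onto the traversal stack, refusing to grow past `capacity`
/// (stands in for a fixed-size array indexed by a stack pointer).
fn push<'a>(stack: &mut Vec<&'a Node>, capacity: usize, node: &'a Node) -> bool {
    if stack.len() == capacity {
        return false; // Would write past the end of a fixed-size stack.
    }
    stack.push(node);
    true
}

/// Visit every node (pop one, push both children) and return the peak
/// stack occupancy, or `None` if `capacity` slots were not enough.
fn peak_stack_usage(root: &Node, capacity: usize) -> Option<usize> {
    let mut stack: Vec<&Node> = Vec::with_capacity(capacity);
    if !push(&mut stack, capacity, root) {
        return None;
    }
    let mut peak = stack.len();
    while let Some(node) = stack.pop() {
        if let Node::Interior(left, right) = node {
            // Right is pushed first so the left child is popped next.
            if !push(&mut stack, capacity, right) || !push(&mut stack, capacity, left) {
                return None;
            }
            peak = peak.max(stack.len());
        }
    }
    Some(peak)
}
```

For example, a tree whose deepest leaf is 3 edges below the root peaks at 4 occupied slots with this push order, so a stack sized to exactly 3 overflows.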

BVH traversal still happens in local space, but the final
ray/triangle intersection tests are now done in world space, by
transforming the triangle into world space. That way all hits are
computed in the same space, which improves numerical consistency
between intersections.
This, of course, depends on the SIMD ops being there, which they
currently are not. Hopefully it will make things speedy in the
future, but that still needs testing.
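A minimal sketch of the leaf-level step, assuming an affine local-to-world transform and a Möller–Trumbore ray/triangle test. Which intersection algorithm is actually used isn't stated, so that choice is an assumption, as are all the names here (`Vec3`, `Transform`, `intersect_in_world_space`). Traversal would still run with the ray moved into local space. Only once a leaf's triangle is reached are its vertices moved into world space and tested against the original, untransformed world-space ray.

```rust
use std::ops::{Add, Sub};

#[derive(Clone, Copy, Debug, PartialEq)]
struct Vec3 { x: f32, y: f32, z: f32 }

impl Vec3 {
    fn new(x: f32, y: f32, z: f32) -> Self { Vec3 { x, y, z } }
    fn dot(self, o: Vec3) -> f32 { self.x * o.x + self.y * o.y + self.z * o.z }
    fn cross(self, o: Vec3) -> Vec3 {
        Vec3::new(self.y * o.z - self.z * o.y,
                  self.z * o.x - self.x * o.z,
                  self.x * o.y - self.y * o.x)
    }
}
impl Add for Vec3 {
    type Output = Vec3;
    fn add(self, o: Vec3) -> Vec3 { Vec3::new(self.x + o.x, self.y + o.y, self.z + o.z) }
}
impl Sub for Vec3 {
    type Output = Vec3;
    fn sub(self, o: Vec3) -> Vec3 { Vec3::new(self.x - o.x, self.y - o.y, self.z - o.z) }
}

const IDENTITY3: [[f32; 3]; 3] = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]];

/// Hypothetical affine local-to-world transform: 3x3 linear part plus translation.
#[derive(Clone, Copy, Debug)]
struct Transform { m: [[f32; 3]; 3], t: Vec3 }

impl Transform {
    fn identity() -> Self { Transform { m: IDENTITY3, t: Vec3::new(0.0, 0.0, 0.0) } }
    fn translation(t: Vec3) -> Self { Transform { m: IDENTITY3, t } }
    /// Transform a point (the translation is applied).
    fn point(&self, p: Vec3) -> Vec3 {
        let m = &self.m;
        Vec3::new(m[0][0] * p.x + m[0][1] * p.y + m[0][2] * p.z,
                  m[1][0] * p.x + m[1][1] * p.y + m[1][2] * p.z,
                  m[2][0] * p.x + m[2][1] * p.y + m[2][2] * p.z) + self.t
    }
}

/// Möller–Trumbore ray/triangle test; returns the ray parameter `t` of a hit.
fn intersect_triangle(orig: Vec3, dir: Vec3, tri: [Vec3; 3]) -> Option<f32> {
    let e1 = tri[1] - tri[0];
    let e2 = tri[2] - tri[0];
    let p = dir.cross(e2);
    let det = e1.dot(p);
    if det.abs() < 1e-12 {
        return None; // Ray is parallel to the triangle's plane.
    }
    let inv_det = 1.0 / det;
    let s = orig - tri[0];
    let u = s.dot(p) * inv_det;
    if u < 0.0 || u > 1.0 {
        return None;
    }
    let q = s.cross(e1);
    let v = dir.dot(q) * inv_det;
    if v < 0.0 || u + v > 1.0 {
        return None;
    }
    let t = e2.dot(q) * inv_det;
    if t > 0.0 { Some(t) } else { None } // Ignore hits behind the origin.
}

/// Leaf-level test: the triangle was reached during local-space traversal;
/// move its vertices to world space and test the original world-space ray.
fn intersect_in_world_space(ray_orig: Vec3, ray_dir: Vec3,
                            local_tri: [Vec3; 3], xform: &Transform) -> Option<f32> {
    let world_tri = local_tri.map(|v| xform.point(v));
    intersect_triangle(ray_orig, ray_dir, world_tri)
}
```

The extra cost of this approach is three point transforms per triangle tested, which is presumably the part the SIMD ops would speed up.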