I couldn't make the BVH4 faster than the BVH, and the bitstack
was bloating the AccelRay struct. Removing the bitstack gives
a small but noticeable speedup in rendering.
The BVH building code is now largely split out into a separate
type, BVHBase. The intent is that this will also be used by
the BVH4 when I get around to it.
The BVH itself now uses references instead of indexes, allocating
and pointing directly into the MemArena. This allows the nodes
to all be right next to their bounding boxes in memory.
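
As a rough illustration of the reference-based layout described above,
here is a minimal sketch. It assumes Rust, and every name in it (BBox,
Node, count_leaves, example_leaf_count) is hypothetical rather than
taken from the actual code. Local variables stand in for the MemArena
so the example stays self-contained; in the real structure the nodes
and bounding boxes would be allocated in the arena.

```rust
// Hypothetical types for illustration only; not the real definitions.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BBox {
    min: [f32; 3],
    max: [f32; 3],
}

impl BBox {
    // Smallest box enclosing both `self` and `other`.
    fn union(self, other: BBox) -> BBox {
        let mut result = self;
        for i in 0..3 {
            result.min[i] = self.min[i].min(other.min[i]);
            result.max[i] = self.max[i].max(other.max[i]);
        }
        result
    }
}

// Nodes hold references to their bounds and children instead of
// indices into separate arrays, so traversal follows pointers directly.
#[allow(dead_code)]
enum Node<'a> {
    Internal {
        bounds: &'a BBox,
        children: (&'a Node<'a>, &'a Node<'a>),
    },
    Leaf {
        bounds: &'a BBox,
        object: usize, // assumed: index of the object held by this leaf
    },
}

// Recursively counts the leaves reachable from `node`.
fn count_leaves(node: &Node) -> usize {
    match node {
        Node::Internal { children, .. } => {
            count_leaves(children.0) + count_leaves(children.1)
        }
        Node::Leaf { .. } => 1,
    }
}

// Builds a two-leaf tree out of local values (standing in for arena
// allocations) and returns its leaf count.
fn example_leaf_count() -> usize {
    let a = BBox { min: [0.0; 3], max: [1.0; 3] };
    let b = BBox { min: [2.0; 3], max: [3.0; 3] };
    let ab = a.union(b);
    let leaf_a = Node::Leaf { bounds: &a, object: 0 };
    let leaf_b = Node::Leaf { bounds: &b, object: 1 };
    let root = Node::Internal { bounds: &ab, children: (&leaf_a, &leaf_b) };
    count_leaves(&root)
}
```

With references, the lifetime parameter ties every node to the storage
it points into, so a node cannot outlive its allocation, and no index
lookup is needed during traversal.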