It's a bit inefficient because a separate thread is spawned for each pixel. Bucketing (rendering the image in tiles, one task per tile) still needs to be implemented.
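
A minimal sketch of what bucketing could look like, assuming a fixed-size worker pool and square tiles. The names `make_buckets`, `render_bucket`, `render`, and the `shade(x, y)` callback are hypothetical and stand in for whatever the renderer actually uses to compute a pixel; this is not the project's code.

```python
from concurrent.futures import ThreadPoolExecutor


def make_buckets(width, height, size):
    """Split a width x height image into tiles of at most size x size pixels.

    Each bucket is (x0, y0, x1, y1) with x1 and y1 exclusive.
    """
    return [
        (x, y, min(x + size, width), min(y + size, height))
        for y in range(0, height, size)
        for x in range(0, width, size)
    ]


def render_bucket(bucket, shade):
    """Render every pixel of one bucket; shade(x, y) is a hypothetical per-pixel function."""
    x0, y0, x1, y1 = bucket
    return [(x, y, shade(x, y)) for y in range(y0, y1) for x in range(x0, x1)]


def render(width, height, shade, size=16, workers=4):
    """Render the image with one task per bucket instead of one thread per pixel."""
    image = [[None] * width for _ in range(height)]
    buckets = make_buckets(width, height, size)
    # A fixed pool of workers pulls buckets from the list, so the number of
    # threads stays constant no matter how many pixels the image has.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for pixels in pool.map(lambda b: render_bucket(b, shade), buckets):
            for x, y, color in pixels:
                image[y][x] = color
    return image
```

The point of the sketch is the work partitioning: thread count is bounded by `workers`, and each task covers a whole tile. In CPython the global interpreter lock limits the speedup for pure-Python shading, so the same structure would pay off mainly in a compiled renderer or with a process pool.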
The problem was in how the camera rays were generated.