two people wearing VR Video headsets

Here at Pixvana we spend all of our time thinking about ways to make it easier to create and share superlatively high-quality immersive VR video with viewers.  We’ve made explainer videos on this subject outlining why many VR videos look fuzzy, but it’s time to go deeper!  Many, many factors affect the image quality of VR videos, but we can summarize these thematically as:

  1. Cameras – how camera optics and sensors impact VR video quality
  2. Software – how software settings used for stitching and mastering (editing, color correction, visual effects) affect your master VR video’s quality
  3. Encoding – how video codecs and packaging settings impact the delivered VR video quality
  4. Headsets – how Head Mounted Display (HMD) processors, displays, and optical lenses render the final experience of the VR video

In this post we’ll dive into #1 and #2 to get started, and we’ll follow up with the others in Part 2.

Camera Optics and Sensors and their impact on VR Video quality 

The camera system used to capture your VR video will have the first dramatic impact on your VR video image quality.  Digital camera sensors and lenses have been studied in great detail over the years, and sites like DPReview have developed standardized test charts (illustrated below) and shooting conditions for great side-by-side comparisons.  When comparing cameras and lenses head-to-head, this type of controlled, side-by-side image comparison is necessary to really understand the varying images that can be achieved.

DPReview.com's camera comparison tool

DPReview.com’s most excellent camera comparison chart allows a variety of cameras to be compared side-by-side, as-shot in a controlled studio setting using a standard image comprised of various sample charts, colors, and fine-detail objects such as paint brushes, sand, crayons, etc.  You can check it out on their site.

Beyond the traditional camera resolution/quality tests for a video camera system, there are two factors that are of particular concern in spherical Virtual Reality video photography: the effects of very wide-angle lenses, and absolute image resolution that is achieved cumulatively by all cameras in a 360 video camera/rig.

How Wide-Angle Lenses Impact VR Video Image Quality

VR video cameras range from as few as two lenses/sensors in most consumer-grade pocket VR video cameras like the Ricoh Theta or GoPro Fusion, to as many as seventeen (17) or more for a camera like the Google Jump.  Generally, a pro-level VR video camera will be comprised of four to eight (4-8) lenses and sensors, such as those on a GoPro Omni or Insta360 Pro camera.  Each of these lenses needs a relatively wide field of view, such as an 8-16mm focal length.  This means that the placement of the camera and its lenses and sensors relative to the scene being filmed will have a dramatic effect on how the scene is captured.  Consider the following two images, where we can see the overall framing and a cropped detail of the same part of the scene.

two framings of a scene with camera pointed 90 degrees apart, captured on a professional grade DSLR

Shown above are two framings of a scene with the camera pointed 90 degrees apart, captured on a professional-grade DSLR using an 8mm fisheye lens that captures a 180-degree field of view.  The full frame is shown above, and a 1:1 pixel-resolution crop comparing the same subject matter is shown below.

In Image A the resolution chart is framed in the center of the lens/sensor, producing approximately 500×500 pixels of absolute resolution that is of very high quality.  In Image B the framing of this same area is against the edge of the lens/sensor, producing both spatial and color distortions that reduce the effective image quality.  Absent any other concerns, this initial lens/sensor framing relative to the action will have a dramatic impact on the clarity and resolution of this part of the VR video scene.  This optical degradation would be much more marked with a lower quality lens or camera sensor (such as a GoPro).

two production stills showing Pixvana’s Aaron Rhodes on location

Take a look at this issue in the two production stills showing Pixvana’s Aaron Rhodes on location.  

On the left, Aaron is actively considering the orientation of the camera to the scene, making sure the key action areas are centered along one of the six (6) lenses/sensors of the rig.  On the right, he is (relatively) oblivious to the orientation of the lenses to the action, as there are sixteen (16) lenses/sensors along the horizon, providing much more redundant/overlapping image coverage.

Absolute Sensor Image Quality

VR video cameras are simply traditional video cameras arranged into arrays, so the raw images produced by these cameras are no different from those of traditional video cameras.  The size/quality of each sensor in the array can thus be measured just like a traditional video camera’s.  Consider the following two images, where we can see the overall image resolution/quality from two cameras shooting the same scene.

images captured by a consumer camera’s sensor (left) and a professional grade DSLR (right)

Shown above are images captured by a consumer camera’s sensor (left) and a professional-grade DSLR (right).  The upper images show the overall framings, which are similar, but the crops below show the 1:1 pixel size/clarity, which is roughly 3x more detailed from the higher quality camera.

In Image A the resolution chart measures ~250×250 pixels, while in the higher quality Image B crop we get ~450×450 pixels, so the cumulative pixel count is roughly 3x greater in Image B.  The full-frame sensor of Camera B simply resolves more detail.  The much higher quality lens, coupled with the larger sensor, captures more color and contrast detail that distinguishes the different light intensities in the scene, as well as more fine detail.  Most camera sensor comparisons incorporate the metric of MTF (Modulation Transfer Function), which you can read about on photographer-blogger Ken Rockwell’s blog.  A VR video captured on these two camera systems will yield dramatically different in-headset VR video quality.
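
If you like seeing the arithmetic spelled out, here is a tiny Python sketch of the comparison above (the crop dimensions are simply the approximate figures quoted in this post, not new measurements):

```python
# Back-of-the-envelope comparison of the two crops described above.
# The crop dimensions are the approximate figures quoted in this post.
consumer_crop = 250 * 250   # ~62,500 pixels covering the chart (Image A)
dslr_crop = 450 * 450       # ~202,500 pixels covering the same chart (Image B)

ratio = dslr_crop / consumer_crop
print(f"Image B resolves ~{ratio:.1f}x the pixels of Image A "
      f"({dslr_crop:,} vs {consumer_crop:,})")
# -> Image B resolves ~3.2x the pixels of Image A (202,500 vs 62,500)
```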

aaron on location with vr 360 cameras

Here’s Aaron again on location with two camera systems of dramatically different horsepower. 

On the left he is capturing six (6) video streams from GoPros, each stream a 4K video.  On the right is a custom camera rig that uses 5 RED professional cinema cameras, each capturing 6K of video.  Not only is the higher quality rig spatially higher resolution (4K vs. 6K per camera), but there are even greater differences in pixel quality, as typically measured by dynamic range and pixel-to-angular resolution.  The RED cameras capture superb images with dramatic dynamic range and sharpness, providing tremendous latitude for post-processing and mastering at as high as 12K of 360-degree resolution.
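
For the curious, here is a minimal sketch of what we mean by pixel-to-angular resolution: how many sensor pixels land on each degree of the scene. The per-lens field-of-view value below is an illustrative placeholder, not the actual spec of either rig:

```python
# Pixel-to-angular resolution: sensor pixels per degree of scene coverage.
# The 120-degree field of view is an illustrative placeholder, not a real spec.
def pixels_per_degree(horizontal_pixels, horizontal_fov_degrees):
    return horizontal_pixels / horizontal_fov_degrees

print(f"4K lens over a 120-degree view: ~{pixels_per_degree(3840, 120):.0f} px/degree")
print(f"6K lens over a 120-degree view: ~{pixels_per_degree(6144, 120):.0f} px/degree")
```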

Here’s an example of the kind of rich imagery that can be generated by professional cinema camera arrays when shooting 12K-resolution VR content.  Notice the detail in the images, particularly in the challenging indoor dark/low-light environments that would be muddy and low-resolution if a lesser camera system had been used.

 

 

 

Software Post-Production and Mastering considerations for high-quality VR video

Many VR video post-production choices will affect the overall image quality of your VR video.  Just like with traditional video, you will want to work at the highest quality (lowest compression) settings for all mezzanine versions and renders, so that heavy compression is deferred to the last step before delivery.  But 360/VR video post-production introduces several factors which are specific to the medium:

  • Stitching and mastering resolution (4K, 8K, etc.) for all editing and effects work
  • Mono or Stereo (simulated 3D depth) and how these affect resolution of the images delivered to the headset
  • The projection map used to store the VR video (e.g., equirectangular, cube maps, etc.) and how these change the potential image quality

Stitching and Mastering Resolution

Once the individual video streams from your camera system are stitched together, the resulting video tends to be your “raw/master” video for all subsequent post-production.  Let’s consider how stitching and choosing a target mastering resolution will affect image quality.

 

Play the above animation to see a visualization of how the six different 4K videos produced by this VR camera are aligned and stitched into a complete 360-degree video stream at 8K resolution.

If a VR camera is comprised of six (6) cameras, each recording a 4K video stream, the stitching software will look for overlapping image areas at the edges of each stream’s frame and use those areas to align and merge the images together into a “stitched” image that solves for the camera geometry and scene.

The image areas highlighted in red are generally overlapping areas that are used to align the cameras together to synthesize a full camera stitch of the scene.  This overlapping pixel area reduces the number of true absolute pixels captured from the scene, since they are duplicated across multiple videos.

As an example, these six (6) videos give us six (6) cameras × ~4000 (horizontal) × ~2000 (vertical) pixels ≈ 48,000,000 pixels per frame available before the stitching software’s analysis.  Depending on the amount of overlapping image area, the actual unique pixels-from-sensor will be reduced by anywhere from 30-50% once the overlapping areas are removed from the final stitched image.  The math varies depending on how many videos are being stitched and the resolution of each video (we’ll elaborate on this math in a subsequent blog post).  In this particular configuration, the six (6) videos yield an 8K-resolution stitched video, which becomes the “master” for that camera footage for all subsequent post-production.
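
Here is a rough Python sketch of that overlap math. The 33% overlap figure is just an illustrative value inside the 30-50% range mentioned above, and the function is ours for this example only:

```python
def estimate_stitch_pixels(num_cameras, cam_width, cam_height, overlap_fraction):
    """Rough estimate of the unique pixels left once overlapping regions are removed.

    overlap_fraction is the share of captured pixels duplicated between cameras
    (roughly 0.3-0.5 for typical rigs, per the discussion above).
    """
    raw_pixels = num_cameras * cam_width * cam_height
    unique_pixels = raw_pixels * (1.0 - overlap_fraction)
    return raw_pixels, unique_pixels

raw, unique = estimate_stitch_pixels(6, 4000, 2000, overlap_fraction=0.33)
print(f"raw: {raw:,} px, unique after overlap: {unique:,.0f} px")
# raw: 48,000,000 px, unique after overlap: 32,160,000 px
# ...which is close to an 8K (8192 x 4096, ~33.6M px) equirectangular master.
```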

comparison of a region in raw video, 8k stitch, and 4k stitch

Image A is a direct cropped region from the source video file, as shot by one of the six (6) cameras in the array.  Image B is a cropped sample from an 8K stitched video.  Image C is that same area from a 4K stitch.

In the illustration above we can compare the pixels seen by the raw camera file (Image A) to the resulting pixels in an 8K and a 4K stitch.  The pixels are relatively uniform between the raw camera video (Image A) and the 8K stitch (Image B), which reveals a very similar absolute resolution: the 8K image visibly preserves the original video’s image detail.  By contrast, Image C shows how dramatically pixel resolution has been reduced when using only a 4K stitch resolution, effectively halving the linear resolution of the scene (a 4x reduction in pixel count).
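
To put hard numbers on those resolution steps, here is the raw pixel arithmetic, assuming 2:1 equirectangular frame sizes:

```python
# Pixel count scales with the square of the linear resolution, which is why an
# 8K master costs roughly 4x a 4K one and 16x a 2K one.  The frame sizes below
# assume 2:1 equirectangular masters.
resolutions = {"2K": (2048, 1024), "4K": (4096, 2048), "8K": (8192, 4096)}
base = resolutions["2K"][0] * resolutions["2K"][1]
for name, (w, h) in resolutions.items():
    pixels = w * h
    print(f"{name}: {pixels:>10,} px per frame ({pixels // base}x the pixels of 2K)")
```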

Remember that an 8K video is 4x greater in spatial resolution than a single 4K video (because it is both 2x the width AND 2x the height), or ~32,000,000 pixels per frame!  This adds meaningful cost to all compute, transfer, storage, rendering, etc., usually at least 4x as much time and sometimes more.  Because of this, many high-end television and film productions, even blockbuster films, still do VFX post-production at 2K resolution instead of full 4K resolution.  8K resolution is 16x more data than a 2K stream!  It is important to balance high quality against all production costs, and remember that simply cranking up a stitching software’s output resolution is not necessarily the right decision for your project.

Image A is a direct cropped region from the source video file, Image B is a cropped sample from an 8K stitched video, Image C is that same area from a 4K stitch.

In this variation we can see that not all camera systems are meant to produce higher resolution stitched video.  Here Image A is a direct cropped region from the source video file, Image B is a cropped sample from an 8K stitched video, and Image C is that same area from a 4K stitch.

In the above example, the raw video detail is up-sampled by the stitching process to an artificially higher resolution.  This “interpolated upscale” does not produce a higher quality image; it just generates more pixels that you will now be paying for in every step of post-production.  For this particular camera configuration, we can see that a target output of 4K resolution would be approximately correct, as it preserves a level of pixel resolution congruent with what existed in the original raw video.  To exaggerate this point, consider that no camera system today could produce a true 32K-resolution stitch, although you could ask many stitching software applications to generate that target resolution.
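
As a hedged rule of thumb (our own sketch here, not what any particular stitching package actually does), you can pick the 2:1 master size whose pixel count roughly matches the unique pixels the rig actually captured:

```python
import math

def matched_equirect_width(unique_pixels):
    """Width of a 2:1 equirectangular frame holding roughly unique_pixels pixels.

    For a 2:1 frame, width * (width / 2) = unique_pixels, so width = sqrt(2 * unique_pixels).
    """
    return math.sqrt(2 * unique_pixels)

print(f"~32M unique px -> width of {matched_equirect_width(32_000_000):,.0f} (about an 8K master)")
print(f"~8M unique px  -> width of {matched_equirect_width(8_000_000):,.0f} (about a 4K master)")
```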

Mono or Stereo Quality Consideration for VR Video

The choice between monoscopic and stereoscopic (simulated 3D depth, using a separate video for the left and right eye during playback) will dramatically impact the quality of your resulting VR video.  Subjective and artistic merits of the “3D depth effect” aside (not everyone or every scene benefits from the depth effect), there is a very real technical problem in stereoscopic VR video: representing both the left and right eye images will reduce either the vertical or horizontal resolution of the overall scene by 50%.

same pixel resolution, left to right: monoscopic, left/right stereoscopic, top/bottom stereoscopic

Pictured are three (3) videos of the same overall pixel resolution: a monoscopic version, a left/right stereoscopic version, and a top/bottom stereoscopic version.

Notice that the mono version of the video has 100% of the horizontal and vertical resolution available to represent the scene.  The left/right video reduces the horizontal resolution of each eye by 50%, and the top/bottom version does the same to the vertical resolution.  Generally speaking, the stereo effect adds a sense of depth to a scene at the expense of reducing overall resolution and clarity.
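
Here is a quick sketch of the per-eye numbers, assuming a single 4096 x 2048 equirectangular frame (the frame size is illustrative only):

```python
# Per-eye resolution for common VR video frame layouts, packed into one frame.
frame_w, frame_h = 4096, 2048   # illustrative 2:1 equirectangular frame

layouts = {
    "monoscopic":        (frame_w,      frame_h),       # one eye, full frame
    "stereo left/right": (frame_w // 2, frame_h),       # each eye gets half the width
    "stereo top/bottom": (frame_w,      frame_h // 2),  # each eye gets half the height
}
for name, (w, h) in layouts.items():
    print(f"{name:>17}: {w} x {h} per eye ({w * h:,} px)")
```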

We will write more on this subject in Part 2 of this post (coming soon), with samples for in-headset viewing.

Projections for Spherical Video

Our final post-production concern is the projection method used to represent the spherical information for a scene.  As we’ve covered previously, representing spheres inside flat planes (which is the geometry of a rectangular video stream) is impossible without a transformation, or “projection,” that maps the geometry of the sphere onto the plane.  This article and video elaborate on VR video projection challenges and some solutions.

For now let’s just review the basics, most importantly how a projection map can preserve or degrade the effective image resolution for a VR video or a region of a video.  The most common VR video projections are the equirectangular and cube-map projections, and most post-production and 3D graphics packages will expect or require one or the other of these two techniques.

Here’s a quick introduction to Diamond-Plane projections, which we’ll use as an example:

To illustrate the benefits of proper spherical mapping, let’s consider the raw resolution of an image from the camera video and the resulting image resolution in Equirectangular and Diamond Plane formats.  Pixvana has spent a lot of time pondering better projections for VR video that balance quality and compression efficiency (for streaming), and Diamond Plane is our current best solution on the subject.

An equirectangular image from spherical VR video, and diamond plane projection mapped onto 20-sided icosahedron

Pictured: an equirectangular image from a spherical VR video, and that same frame in a “diamond plane” projection that maps the sphere onto a 20-sided icosahedron-shaped geometry.

Equirectangular projection maps tend to exaggerate pixel resolution in the upper and lower thirds of the image (which is wasteful).  This means that parts of the VR video sphere that are below and above the horizon are over-represented in the video file: they take up more spatial resolution than the original raw video had in detail, and more than they will actually require when projected to the viewer in the VR headset.
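
To quantify that waste: every row of an equirectangular image spans the full frame width, but the circle of latitude it represents shrinks as cos(latitude), so the stored horizontal pixel density grows the closer you get to the poles. A quick sketch:

```python
import math

# Horizontal oversampling of an equirectangular map relative to the horizon:
# a row of pixels always spans the full frame width, but the latitude circle it
# represents shrinks as cos(latitude), so pixels near the poles carry less and
# less real scene detail per stored pixel.
for lat_deg in (0, 30, 60, 75, 85):
    oversample = 1.0 / math.cos(math.radians(lat_deg))
    print(f"latitude {lat_deg:>2} deg: ~{oversample:.1f}x horizontal oversampling")
```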

Image comparison from the middle and upper third of the video projection in equirectangular and diamond plane

These proportional cropped areas from each of the projections show the relative size and distortions of image areas that are in the middle of the horizon of the spherical video (the chef) and the upper third towards zenith where distortion is more dramatic and inefficient in the equirectangular projection (the plant on the shelf).

When using different projections, different distortions will be introduced in *some* area of the image, as no projection can perfectly represent the pixel density/areas of a sphere on a plane.  Equirectangular is particularly bad in the upper and lower thirds of the image.  We developed the diamond-plane projection technique because it is more balanced across the entire scene, but it is not generally available in post-production pipelines and is used here only to highlight the issue.  This projection subject also affects the last-mile streaming/delivery to the headset, which we will cover later in Part 2 of this post (coming soon).

Summary so far and more to discuss

As you can probably tell, we care a lot about video quality at Pixvana! This article is the first in a new series of guides that cover the best ways to optimize video resolution through each stage of the VR video pipeline – from the camera to the headset.

In this post we discussed VR video quality issues that arise from camera optics and resolution, and from the post-production software processes used to master your VR video.  In Part 2 we will continue this discussion and cover encoding and streaming considerations and, of greatest importance, the VR-headset-specific issues that actually render the image for the viewer.

To bring to life the issues we’ve covered here, check out our VR Video Camera Image Quality Test Shoot, where we show side-by-side differences in sensor, image quality, and stitched resolution of footage captured by a variety of VR camera systems.

 
