
Peak quality 3D 180 immersive video in Oculus Go and Gear VR


In March 2018, I shot and edited a stereoscopic 3D-180 mini-documentary about Bob Kramer, one of the most well-known kitchen knife makers in the world. This is a case study about the project, which was conceived to develop a workflow to enable a single person to generate high-quality stereoscopic video content for VR headsets. Also, Bob’s work is gorgeous, and it was a good opportunity to create an experience to describe what he does!

How to watch the documentary in a VR headset:

For Oculus Rift, you can download and play either version. Players like Virtual Desktop and SKYBOX VR will play the video back if you install a codec package like K-Lite Codec Pack Standard, but you won’t get spatial audio. 

The challenges of 360 video

Stereoscopic (3D) video experiences for VR headsets are notoriously difficult to create. The resulting experiences are often beset by technical issues, many of which are insurmountable without large VFX budgets and custom VR app development.

Two particularly difficult challenges are stitching for stereoscopic output and limited video decoder resolution. Stitching for stereoscopic output is much more complicated than stitching for monoscopic output, and it is difficult to recover from a failed stitch. Most mobile devices are able to decode a 4K video—plenty of pixels for a flat video. However, in 3D-360, pixels are not only spread across the entire possible field of view (360 degrees x 180 degrees), they are also split between two eyes. For example, a 3840-pixel-wide equirectangular frame offers only about ten pixels per degree of horizontal field of view, and a stereoscopic layout splits even those between the two eyes. More pixels are needed to present a sharp picture.

The goal of content, especially in VR, is to allow users to lose themselves in an experience. In stereoscopic 360, technical issues like stitching artifacts and low viewing resolution can prevent content from getting through. Furthermore, storytelling in 360 is still an experimental art. 360 videos represent a new kind of media experience, and few storytellers know how to do it effectively.

Stereoscopic 180

For the past few months, I’ve been shooting, editing, and viewing a lot of stereoscopic 180 video (“3D-180”). Oculus Video and third party VR video players (e.g., Skybox VR, Virtual Desktop, Pixvana SPIN Player) have supported 3D-180 playback for a long time in side-by-side (“SBS”), cropped-equirectangular format.

3D-180 has the following technical advantages over 3D-360:

  • Old-school stereographic shooting doesn’t require complicated stitching
  • Higher resolution in headset: pixels are packed into half the field of view
  • Long-time Oculus side-load support
  • Camera person and production team can stay in the room during a shoot!

One of the biggest benefits is non-technical: filmmakers and visual storytellers already know how to tell stories in a front-facing format. When I brief a filmmaker on how to shoot in 180, there isn’t actually much I need to do other than describe what shooting distances work best, and where stereo is most effective. Storytelling mechanics are already well understood.

Disadvantages of 3D-180:

  • Cannot capture rear half of a scene (also can be an advantage)
  • Fewer 3D-180 cameras available
  • Workflow tools are still being developed

Test shoot: a knife maker in Seattle

I went to Bellingham, Washington, for a day to shoot a 3D-180 documentary about Bob Kramer, one of the world’s best knife makers. The Oculus Experiences team has been talking for a long time about how the Maker Movement could provide good subjects for a video series, and it was a good opportunity to shoot a test. For a camera, I used a Z CAM K1 Pro, a $3K integrated 3D-180 camera with Micro Four Thirds sensors and good calibration for de-warping in post (the software “stitcher” is provided and just works). Although the Z CAM K1 Pro is Google VR180-certified (a lot of the VR180 marketing content was shot with this camera), it also exports in SBS cropped equirectangular.

The shoot itself was very straightforward, and all camera positions were locked off on a tripod. The camera’s lenses have adjustable apertures but are focused at the hyperfocal distance, meaning that no focus adjustment is necessary. I used the Z CAM iOS app over a Wi-Fi connection to change camera settings. Aside from setting a manual exposure and white balance, it was a one-button affair.

I recorded audio using a few different microphones, including a lavalier mic, a stereo shotgun mic, and a Zoom H2n for first-order ambisonics. The shop was a very loud environment, and I wanted to make sure that Bob’s voice could be clearly heard, while capturing enough information to do a spatial-audio mix to recreate the vibe of the shop. At the beginning of each shot, I used a digital clapperboard (on iPad) for sync, and said the take number out loud.

For almost all of the shots, the camera’s lenses were placed at my chin level–around 5’3″ (I’m 5’8″). Although it’s commonplace to put VR cameras “at eye level”, in practice, I’ve found that this feels too high. I shot with Bob about 4-6 feet away from the camera, although stereo from this camera seems to be comfortable from about 3 feet out. In one scene, an anvil is about 2 feet away from the camera; stereo still seems to be fine, but vergence-accommodation conflict starts to be an issue.

Over the course of 8 hours, we shot 38 clips covering the entire process of making a kitchen knife, from forging steel (from a donated barrel of an AR-15!) to acid-etching and polishing the final product. The shop was a fantastic setting for immersive capture; what’s not to love when one combines steel, fire, and exotic machinery?

Post-processing and editing

Each of the two “eyes” of the Z CAM K1 Pro outputs a fisheye video at 2880 x 2880 in high-bitrate h.265, and saves it onto a separate SD card. Z CAM provides post-processing software called WonderStitch, which is used to stitch footage from all of their cameras. Processing K1 Pro footage is still called “stitching,” but it’s really de-warping and re-projecting. The de-warp from fisheye to a normalized projection (in this case, equirectangular) uses factory calibration specific to each individual camera. Calibration data is stored in the cloud, and an online serial-number lookup automatically pulls calibration data down to the stitcher, where it is cached for future use.
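
For illustration, here is a rough approximation of that re-projection for a single eye using ffmpeg’s v360 filter (available in recent ffmpeg builds). The lens field of view and filenames are assumptions, and this skips the per-camera calibration that makes WonderStitch’s output accurate, so treat it as a sketch of the geometry rather than a substitute for the real stitcher:

# Re-project one fisheye eye to half (180 x 180) equirectangular; FOV and filenames are hypothetical.
ffmpeg -i "left_eye_fisheye.mp4" -vf "v360=input=fisheye:ih_fov=190:iv_fov=190:output=hequirect" -c:v libx264 -crf 18 -pix_fmt yuv420p "left_eye_equirect.mp4"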

For the main edit, I used Adobe Premiere Pro, which supports stereoscopic, side-by-side, equirectangular projections at any horizontal and vertical field of view (in this case, both were set to 180 degrees). A year ago, traditional non-linear editors did not support immersive media very well, and the post-processing workflow in the mid-range was rough (the high-end VFX world has always had good tools for folks who know what they are doing). Today, the tools are in much better shape, especially for simple video and audio edits.

Post-processing workflow

Ingest and organize

After copying files from SD and microSD cards (2 for the camera, and 3 for the audio recorders), I moved all of the video and audio files for each clip into individual folders. I named the folders based on clip number.

“Stitch”

I processed the individual files from the Z CAM K1 Pro using WonderStitch, outputting masters in h.265 at 5760 x 2880 (side-by-side cropped equirectangular at 2880 x 2880 per eye). Stitching is approximately real-time on an NVIDIA GTX 1080 Ti GPU. For reference, a Razer Blade laptop with a GTX 1060 stitches at about 1/2 to 2/3 real-time. This is an order of magnitude faster than processing 3D-360 footage.

Although Z CAM offers an option to export at 6144 x 3072, the feature exists only to satisfy marketing demands for “6K” content. In practice, there is no benefit to outputting more than 5760 x 2880 from the K1 Pro. 

WonderStitch currently has no option for exporting in the highest quality (e.g., PNG sequence, DNxHR, or ProRes).

Although the K1 Pro is calibrated at the factory, calibration isn’t perfect, and camera hardware can also move or drift over time. My production K1 Pro produced relatively good stereoscopic images, but there was still vertical disparity in the final output. The footage looked good, better than stereoscopic footage from any 360 camera, but watching it caused a little discomfort, commonly felt as that “pinch” in the front part of your brain. A few days before my final render, Z CAM shipped me an alpha version of an update to WonderStitch, which includes a feature called “Smart Align” (now available in the public release). After checking the “Smart Align” box, re-processing, and watching the footage, I was absolutely floored: the discomfort pinch was completely gone. Overlaying the left and right eyes for analysis showed a complete absence of vertical disparity! Z CAM’s “Smart Align” feature is one of the biggest updates ever released in the history of immersive stereoscopic video, and it has helped to renew my faith in the medium. After seeing such a big difference in output, I re-stitched all of the clips from the shoot.

Tip: WonderStitch supports batch stitching. Create a folder of folders, with each child folder containing the left- and right-eye clips for one shot, and point the stitcher at the top level.
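
As a concrete (hypothetical) illustration of that layout, with folder and file names that are purely made up:

shoot/                      <- point WonderStitch at this top-level folder
  clip_01/                  <- one child folder per clip
    clip_01_left.mp4  clip_01_right.mp4
  clip_02/
    clip_02_left.mp4  clip_02_right.mp4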

Video Edit

After stitching, I brought all of the stitched masters into Adobe Premiere Pro CC 2018 and edited them into a sequence in exactly the same way as I would have with rectilinear video. The K1 Pro records audio using a built-in microphone, so I had a way to preview both video and audio. The audio track served as reference audio for synchronizing all of the other audio sources (more on audio later).

Premiere Pro supports VR preview, which means that I could re-project the footage as rectilinear in the preview window (simulating what one would see in a headset). I set the sequence settings to match the projection and arrangement of the source video (180-degree hFOV, 180-degree vFOV, side-by-side stereoscopic), and it just worked. However, Adobe’s Immersive HMD playback (which uses SteamVR) assumes that all content is 360; in the headset, content was horizontally stretched to fill a full 360-degree field of view, despite the sequence settings being set to 180.

Once I locked the video edit, I did an OMF export to hand the spatial audio portion of the edit to Abesh Thakur, who was able to open the project in Pro Tools / Spatial Workstation. While Abesh did the audio mix, I color graded the footage.

Color Grading and Masking

I have decades of experience as a photographer, but I’m not really a video guy. I find video grading tools to be mysterious, and it takes me a lot of time to make something look good. Luckily, Premiere Pro’s Lumetri Color panel mirrors the common processing tools photographers know and love. I was able to do a quick color grade across all of the clips in the sequence in less than an hour.

Because 180 cameras capture a full hemisphere, the left “eye” (lens) can see the right lens, and the right eye can see the left lens. This does not look good when viewed in 3D in headset, so the lenses must either be masked out or stitched to fill using content from the other eye. My goal was to do little-to-no post processing on the video footage, so I chose to mask out the lenses using a mask I created in Photoshop. An identical mask in the left eye and right eye removes the lens from the frame in a way that is comfortable in headset.

The black mask hides the lenses that are visible on the sides of the frame
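
I created the mask in Photoshop and applied it in my editing workflow, but for anyone working outside an NLE, one alternative (sketched here with hypothetical filenames) is to burn the mask in with ffmpeg’s overlay filter, using a transparent PNG sized to the full side-by-side frame with black shapes over both lens areas:

# Burn a transparent PNG mask over the side-by-side frame (filenames are illustrative).
ffmpeg -i "stitched_sbs.mp4" -i "lens_mask_sbs.png" -filter_complex "[0:v][1:v]overlay=0:0" -c:v libx264 -crf 18 -pix_fmt yuv420p -c:a copy "masked_sbs.mp4"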

Spatial Audio Mix

Bow down to Abesh Thakur! We are not worthy!

But seriously, Abesh volunteered to do a spatial audio mix, and I am deeply grateful. I had audio from a wireless lavalier mic, a stereo shotgun mic, 4-channel spatial audio from a Zoom H2n, and default audio from the K1 Pro. Using the audio from the K1 Pro as a sync track, Abesh created the soundspace you hear in the video, blending sounds from the shop with Bob’s voice.

After the mix was done, Abesh exported the tracks in the following formats:

  • 9-channel, 2nd-order ambisonics + 2-channel headlocked stereo (for FB/Oculus)
  • 4-channel, 1st-order ambisonics w/mixed-in intro sequence (for YouTube and others)
  • 2-channel, binaural stereo (for players that don’t support spatial audio)

The title sequence

For the title sequence, I pulled a mono still frame from the 180 video, did a black and white conversion, and put it in the background. Bob gave me a bunch of still images of his knives and of him working in the shop, and I used Premiere Pro’s Immersive “Plane to Sphere” filter to project the images into 360. I adjusted post-projection roll, pitch, and yaw to position the images in the sphere, and adjusted convergence to pull them forward from the background. I had to work in a 3D-360 sequence because the immersive filters don’t yet work on side-by-side 180. After I finished the 360 title sequence, I pulled the left and right eyes out and arranged them side by side in a 3D-180 sequence.

The April 2018 update of Adobe Creative Cloud (Premiere Pro 12.1) vastly improved the “Plane to Sphere” filter. The original filter produced re-projected content full of aliasing and moiré, especially obvious in Bob’s knives, whose Damascus steel patterns are intricately detailed. Installing the Premiere Pro update completely fixed the aliasing issues. This update landed just a few days before my final export, and it made the title sequence possible.

Export

Adobe Premiere Pro is capable of exporting video with ambisonics, and an all-Premiere workflow would certainly work fine for most 3D-180 content. In fact, any video editing software that is capable of exporting high-resolution video would work. However, Premiere Pro isn’t capable of exporting a video track with both ambisonics and a stereo headlocked track (a headlocked stereo track plays simultaneously with the ambisonic track, so if you wanted stereo audio that didn’t change based on head movement, you would put it in the headlocked track). In my case, I exported video using Premiere Pro, and then used Facebook Spatial Workstation’s FB360 Encoder to mux the video with the ambisonic and headlocked tracks provided by Abesh.

I want to stress that this step is only necessary for folks doing a separate ambisonics mix. Projects without this audio complexity can be edited and exported using any video editor.

I wanted my final videos in a few different formats:

  1. 5.7K 3D-180 master (5760 x 2880)
  2. 5K 3D-180 (5120 x 2560) w/2nd-order ambisonics (peak display quality for Gear VR w/Exynos S7/S8)
  3. 4K 3D-180 (4096 x 2048) w/2nd-order ambisonics

But first, a sidenote about RGB color values in video.

16-235 is evil

I’m sure many of you have watched online videos and wondered why blacks are dark gray. The Rec.709 HDTV standard uses 8-bit RGB color values from 16-235 (“limited range RGB”) for its color content. (16, 16, 16) is reference black, and (235, 235, 235) is reference white. Values below 16 and above 235 are reserved as footroom and headroom, with 0 and 255 historically reserved for synchronization. When a player doesn’t know (or doesn’t respect) color space tagging, it can cause headaches in color matching; I won’t go into details here, but the end result is often that blacks are never fully black, and whites are never fully white. This is most obvious when black areas in the video touch the edge of the frame, because the area around the frame is often fully black.

If a player is expecting limited-range video, you’ll still get black blacks and white whites, but you lose about 14% of the possible color values (only 220 of the 256 code values carry picture information), which might lead to banding. If the player is expecting full-range video and you give it limited-range content, the content will look “washed out”, since (16, 16, 16) is shown as dark gray, and (235, 235, 235) as very light gray.

Top: random export creates RGB values of 16-235 in output. Bottom: full-range export.
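
For reference, expanding limited range to full range is just a linear stretch of the code values:

full = clip(round((limited - 16) * 255 / 219), 0, 255)
for example: 16 -> 0, 128 -> 130, 235 -> 255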

This is a complicated topic that could consume hours (weeks? years!?) of one’s life, so I’ll stop here. The point is that the limited-range issue sneaks in all over the place in video workflows. If you output h.264 somewhere in a video editing workflow, chances are that the content has been normalized into the limited range. This is OK if your player knows that the output is limited in range, but if it is expecting full-range RGB, your blacks will turn dark gray, and your whites will turn light gray. Indeed, in most of my editing workflow for this project, I couldn’t achieve full range in headset viewing.
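
One quick way to see whether a file has been tagged as limited or full range is ffprobe; note that this reports only the metadata tag, not the actual pixel values, and untagged files show up as “unknown”:

# Reports "tv" for limited range (16-235), "pc" for full range (0-255), or "unknown" if untagged.
ffprobe -v error -select_streams v:0 -show_entries stream=color_range -of default=noprint_wrappers=1 "out.mp4"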

Here is one solution to achieve full range RGB in video exports.

First, in Windows 10, I opened the NVIDIA Control Panel and set dynamic range to full:

Set “Dynamic range” to “Full (0-255)” in the Advanced tab in “Adjust Video Color Settings” in the NVIDIA Control Panel (Control Panel->NVIDIA Control Panel).

Back to export…

I exported the 5.7K 3D-180 master from the Premiere Pro sequence using Adobe Media Encoder as a Cineform QuickTime file, which I then used as a master for further encoding. Then I used ffmpeg–with a special flag that asks it to output full-range color–to resize and re-encode in h.264 or h.265 at a CRF value of 18 (17 for h.265), the recommended value for “visually lossless”, at least for flat viewing. Here is a sample ffmpeg command that retains full-range RGB, assuming your source is full range:

ffmpeg -i "in.mov" -vf "scale=4096x2048:out_range=full:out_color_matrix=bt709" -c:v libx264 -preset fast -crf 18 -pix_fmt yuv420p -c:a copy -g 60 -movflags faststart "out.mp4"
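
For the h.265 variant mentioned above, a minimal sketch of the equivalent command (swapping in libx265 and dropping the CRF to 17) would look something like this:

ffmpeg -i "in.mov" -vf "scale=4096x2048:out_range=full:out_color_matrix=bt709" -c:v libx265 -preset fast -crf 17 -pix_fmt yuv420p -c:a copy -g 60 -movflags faststart "out_h265.mp4"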

Finally, I used Facebook Spatial Workstation’s FB360 Encoder to mux the video and audio, outputting in “FB360 Matroska” for Oculus headsets, and in “YouTube Video (with 1st-order ambiX)” for other players like Pixvana SPIN Player.

Exporting a mezzanine file to use as an intermediate “master” saves a lot of time in encoding for  multiple distribution formats. I can batch up and parallelize encodes and walk away from the computer.
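
As a sketch of what that batching can look like, with illustrative resolutions and filenames rather than the exact ones from this project:

# Encode multiple distribution sizes from one mezzanine master, running the encodes in parallel.
for SIZE in 5120x2560 4096x2048; do
  ffmpeg -i "master.mov" -vf "scale=${SIZE}:out_range=full:out_color_matrix=bt709" -c:v libx264 -preset fast -crf 18 -pix_fmt yuv420p -c:a copy -g 60 -movflags faststart "out_${SIZE}.mp4" &
done
wait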

Distribution

The 3D-180 distribution ecosystem is in its infancy. As of today, Facebook supports 3D-180 ingest, and if you upload to Facebook, you’ll see the videos in the same interactive way you can see 360 videos.  If you’re uploading to a Facebook Page, the Page Composer interface will now have a way for you to specify that you’re uploading 180 content. For uploading to Facebook profiles, 3D-180 videos will need a new metadata tag, and unfortunately, metadata injectors are not yet available. Keep an eye on the Facebook 360 page for future news about how to inject metadata for 3D-180 videos.

Oculus Video, Gallery, and third-party VR video players generally support 180 and 3D-180 as sideloaded files in internal storage, or via streaming from media servers. We’re also starting to see apps like Pixvana SPIN Player, which specialize in distributing video to managed devices. These players and services have varying levels of support for spatial audio, so you’ll have to play around to find the best fit.

YouTube supports 3D-180 in their proprietary VR180 format, and only through device-specific upload apps.

Here are some ways you can play back the Kramer Knives video, which is 26 minutes long.

  • Stream from Facebook in the Facebook 360 VR experience, or in News Feed for flat playback.
  • Download the original video file and sideload onto your Gear VR or Oculus Go
    • Download links
    • How to watch in Gear VR
      • Connect your phone to your computer via USB, copy the downloaded file to your Movies folder (or another folder), and watch it in Oculus Video.
      • or, install Pixvana SPIN Player and hit this link from your Samsung phone 
    • How to watch in Oculus Go
      • Plug your Go into your computer, put the headset on, and allow access at the prompt. Copy the file onto your Go (you can put it in your Downloads folder), and it will show up in Gallery under “Internal Storage”. (You can also copy the file with adb, as sketched after this list.)
      • Put the video file in Dropbox, Google Drive, or on a local media server. Launch Gallery, connect to your networked source, and stream or download the video. If spatial audio does not play when streaming from the network, use the interface to download the file from the server into internal storage.
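
If your computer doesn’t show the Go’s storage even after you authorize the connection, adb is another way to copy the file over. This requires developer mode on the headset, and the filename below is illustrative:

# Copy the downloaded video into the Go's standard Download folder over USB.
adb push kramer_knives_3d180.mp4 /sdcard/Download/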

180 vs 360 comparison

Here’s a clip you can download of the 360 version of the Kramer Knives video (half the world is black). Because the video resolution is the same in each file, each eye in 360 has half the resolution it has in 180.

Rapid tools development

The combination of an updated Premiere Pro, the “Smart Align” pre-release from Z CAM, and a newly worked-out full-range RGB export workflow led to a series of re-renders over a few days, each producing a noticeably better-looking export. These tools change quickly, and going back to see the effect they have on old or existing projects can make a huge difference.

I hope that you enjoy the documentary, and that this case study was useful for those of you who plan to pursue 3D-180 video projects. Please feel free to leave comments with questions, corrections, opinions, etc. I’d also love to hear about your current and future plans to shoot in this format!

Sloths!

Another video that is fun to watch in headset features sloths in Panama. You can check it out on Facebook, or download the files to sideload.

Full Equipment List:

Software / Editing Tools List: