Illustration of our prediction and spherical disparity definition. On the right are the 360° images in equirectangular projection and our depth estimation. On the left, (a): spherical projections Ptop and Pbottom; (b)(c): top and bottom equirectangular projections, where Pt and Pb are the projections of a 3D point onto the spherical surfaces; d is the angular disparity associated with the projections Pt and Pb; rt and rb are the projection vectors of each spherical projection; θt and θb are the angles between the south pole and their respective projection vectors; and θp is the polar angle.
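The figure defines disparity as an angle rather than a pixel offset. A minimal sketch of how that angle relates to depth, assuming the top camera sits a baseline B directly above the bottom camera and both angles are measured from the south pole (so d = θb − θt > 0): the triangle formed by the two camera centres and the 3D point has angle d at the point, and the law of sines gives the distances below. The function names are hypothetical and this is our derivation, not code from the authors.

```python
import math


def depth_from_angular_disparity(theta_t: float, d: float, baseline: float) -> float:
    """Distance from the TOP camera centre to the 3D point.

    Assumes the top camera is `baseline` units directly above the bottom camera,
    angles are measured from the south pole, and d = theta_b - theta_t > 0.
    Law of sines: rho_t = B * sin(theta_b) / sin(d)
                        = B * (sin(theta_t) / tan(d) + cos(theta_t)).
    """
    if d <= 0:
        raise ValueError("angular disparity must be positive")
    return baseline * (math.sin(theta_t) / math.tan(d) + math.cos(theta_t))


def depth_from_bottom(theta_t: float, d: float, baseline: float) -> float:
    """Distance from the BOTTOM camera centre: rho_b = B * sin(theta_t) / sin(d)."""
    if d <= 0:
        raise ValueError("angular disparity must be positive")
    return baseline * math.sin(theta_t) / math.sin(d)
```

For example, with B = 1 and a point 1 unit away horizontally at the bottom camera's height, θt = d = π/4, giving a top-camera distance of √2 and a bottom-camera distance of 1.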


Recently, end-to-end trainable deep neural networks have significantly improved stereo depth estimation for perspective images. However, 360° images captured under equirectangular projection cannot benefit from directly adopting existing methods due to the distortion this projection introduces (i.e., lines in 3D are not projected into lines in 2D). To tackle this issue, we present a novel architecture specifically designed for spherical disparity using the setting of top-bottom 360° camera pairs. Moreover, we propose to mitigate the distortion issue by: 1) an additional input branch capturing the position and relation of each pixel in the spherical coordinate, and 2) a cost volume built upon a learnable shifting filter. Due to the lack of 360° stereo data, we collect two 360° stereo datasets from Matterport3D and Stanford3D for training and evaluation. Extensive experiments and ablation studies are provided to validate our method against existing algorithms. Finally, we show promising results in real-world environments, capturing images with two consumer-level cameras.
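The additional input branch receives each pixel's position on the sphere. In an equirectangular image the polar angle depends only on the row, so a sketch of such an input is a one-channel map that is constant along each row. This assumes row 0 is the north pole and the angle is measured from the north pole; the function name and layout (1, H, W) are hypothetical, not the authors' code.

```python
import numpy as np


def polar_angle_map(height: int, width: int) -> np.ndarray:
    """Return a (1, H, W) float32 map of per-pixel polar angles in radians.

    Assumes row 0 of the equirectangular image is the north pole, so the polar
    angle (measured from the north pole) grows linearly from ~0 to ~pi down the rows.
    """
    # Angle at the centre of each row.
    rows = (np.arange(height) + 0.5) / height * np.pi
    # The angle depends only on the row, so repeat it across all columns.
    grid = np.repeat(rows[:, None], width, axis=1).astype(np.float32)
    # Add a leading channel axis so it can be fed as a 1-channel image.
    return grid[None]
```

The same map can be built once per input resolution and fed alongside every stereo pair, since it does not depend on image content.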

Video Demonstration


Overview of the proposed 360SD-Net architecture. Our network consists of three main parts: 1) a two-branch feature extractor that takes the stereo equirectangular images and the polar angle as input for feature concatenation, 2) an ASPP module to enlarge the receptive field, and 3) a learnable shifting filter that constructs the cost volume with an optimal step size. Finally, a 3D encoder-decoder extracts deeper context and regresses the final disparity map.
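To illustrate why a non-integer step size can be useful, here is a hypothetical sketch of a concatenation cost volume whose levels shift the bottom features by multiples of a fractional step. It is not the authors' learnable shifting filter: the step is a plain float here, and realizing a fractional shift by linear interpolation between neighbouring integer shifts is our assumption. Assuming row 0 is the north pole and the top camera is above the bottom one, a scene point appears at a larger row index in the top image, so the bottom features are shifted down to align with the top.

```python
import numpy as np


def shift_rows(feat: np.ndarray, n: int) -> np.ndarray:
    """Shift a (C, H, W) map down by an integer n >= 0 rows, zero-filling the top."""
    out = np.zeros_like(feat)
    h = feat.shape[1]
    if n < h:
        out[:, n:, :] = feat[:, : h - n, :]
    return out


def fractional_shift(feat: np.ndarray, shift: float) -> np.ndarray:
    """Shift down by a non-negative, possibly fractional number of rows.

    Linear interpolation between the two neighbouring integer shifts makes the
    result a smooth function of `shift`.
    """
    if shift < 0:
        raise ValueError("shift must be non-negative")
    k = int(np.floor(shift))
    frac = shift - k
    return (1.0 - frac) * shift_rows(feat, k) + frac * shift_rows(feat, k + 1)


def build_cost_volume(top: np.ndarray, bottom: np.ndarray,
                      num_levels: int, step: float) -> np.ndarray:
    """Concatenation cost volume of shape (2C, D, H, W).

    Level i pairs the top features with the bottom features shifted down
    by i * step rows.
    """
    levels = [np.concatenate([top, fractional_shift(bottom, i * step)], axis=0)
              for i in range(num_levels)]
    return np.stack(levels, axis=1)
```

In a trained network the step would be a parameter updated by back-propagation; the interpolation above is one way to keep the output differentiable with respect to it.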

Synthetic Dataset Results

Qualitative depth estimation results on the MP3D stereo dataset, with error maps. We compare the ground truth, PSMNet, and our 360SD-Net; zoomed-in insets highlight selected regions.
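The page does not specify how the error maps are computed. A minimal sketch, assuming a per-pixel absolute error and that pixels with no ground truth are stored as 0 (both assumptions), with a hypothetical helper that also reports the mean absolute error over valid pixels:

```python
import numpy as np


def abs_error_map(pred: np.ndarray, gt: np.ndarray, valid_min: float = 0.0):
    """Per-pixel |pred - gt| (0 where ground truth is invalid) and the valid-pixel MAE."""
    valid = gt > valid_min                      # assumed: invalid GT pixels are <= valid_min
    err = np.where(valid, np.abs(pred - gt), 0.0)
    mae = float(err[valid].mean()) if valid.any() else float("nan")
    return err, mae
```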

Qualitative 3D point-cloud results on the MP3D 360° stereo dataset. (a) Ground-truth point cloud; (b) estimated point cloud from our 360SD-Net; (c) estimated point cloud from PSMNet. First row: top views of the point clouds, highlighting geometric consistency; second row: perspective views, highlighting wave and wrinkle distortions in 3D.
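Point clouds like these can be obtained by back-projecting every pixel of an equirectangular depth map along its viewing ray. A sketch, assuming depth is the Euclidean distance from the camera centre, row 0 is the north pole (+z), and columns span longitudes from −π to π; the function name is hypothetical.

```python
import numpy as np


def equirect_depth_to_points(depth: np.ndarray) -> np.ndarray:
    """Convert an (H, W) equirectangular depth map to an (H*W, 3) point cloud."""
    h, w = depth.shape
    polar = (np.arange(h) + 0.5) / h * np.pi              # 0 at the north pole (row 0)
    lon = (np.arange(w) + 0.5) / w * 2 * np.pi - np.pi    # -pi .. pi across the columns
    polar, lon = np.meshgrid(polar, lon, indexing="ij")   # both (H, W)
    # Spherical-to-Cartesian conversion with z pointing to the north pole.
    x = depth * np.sin(polar) * np.cos(lon)
    y = depth * np.sin(polar) * np.sin(lon)
    z = depth * np.cos(polar)
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```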

Real-world Static Results

The following three videos demonstrate real-world predictions from the model trained on the MP3D synthetic dataset.

Video 1: Reconstruction of our laboratory from a pair of 360° images. The walls and bookshelf are well reconstructed.

Video 2: Reconstruction of a storage room from a pair of 360° images. The depth results are sharp, yielding a clean point cloud of both the room layout and object details.

Video 3: Reconstruction of a staircase from a pair of 360° images. The directions heading upstairs and downstairs are clear, and the stair floor is well reconstructed.

Synthetic Dataset

Our collected 360° stereo datasets on MP3D and SF3D. First row: RGB equirectangular top images; second row: ground-truth depth maps; third row: ground-truth disparity maps.
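Ground-truth disparity maps can be derived from the depth maps and the baseline. A sketch under the same assumptions as the angular disparity definition (top camera a baseline B above the bottom one, angles from the south pole, row 0 of the image at the north pole) and assuming the depth map stores Euclidean distances from the top camera; this is a hypothetical helper, not the authors' data-generation script.

```python
import numpy as np


def depth_to_angular_disparity(depth_top: np.ndarray, baseline: float) -> np.ndarray:
    """Angular disparity d = theta_b - theta_t (radians) for every pixel of the top view."""
    h, w = depth_top.shape
    # Polar angle from the north pole at each row centre (row 0 = north pole) ...
    polar = (np.arange(h) + 0.5) / h * np.pi
    # ... converted to the angle from the south pole, as in the figure.
    theta_t = np.broadcast_to((np.pi - polar)[:, None], (h, w))
    horizontal = depth_top * np.sin(theta_t)           # distance from the vertical axis
    downward = depth_top * np.cos(theta_t) - baseline  # downward component seen from the bottom camera
    theta_b = np.arctan2(horizontal, downward)
    return theta_b - theta_t
```

To express d in image rows instead of radians, multiply by H/π, since one row of an equirectangular image spans π/H radians of polar angle.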

Real-world Data Collection Device

Our camera rig for real-world data collection.
We use two Insta360 ONE X cameras in a top-bottom aligned setting and calibrate them with a 6×6 AprilGrid.
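The page does not say which calibration toolbox was used or how the resulting extrinsics are represented. Assuming the calibration yields the bottom camera's centre expressed in the top camera's frame, with z pointing up (both assumptions), the hypothetical helper below reports the baseline B and how far the rig deviates from the ideal top-bottom alignment assumed by the disparity definition.

```python
import numpy as np


def baseline_and_tilt(t_bottom_in_top, up=(0.0, 0.0, 1.0)):
    """Baseline length and its tilt (degrees) away from the ideal straight-down direction.

    `t_bottom_in_top` is the bottom camera centre expressed in the top camera frame;
    `up` is the assumed vertical axis of that frame.
    """
    t = np.asarray(t_bottom_in_top, dtype=float)
    down = -np.asarray(up, dtype=float)
    down /= np.linalg.norm(down)
    baseline = float(np.linalg.norm(t))
    # atan2(|t x down|, t . down) is numerically stable for small tilts.
    tilt = np.arctan2(np.linalg.norm(np.cross(t, down)), np.dot(t, down))
    return baseline, float(np.degrees(tilt))
```

A tilt close to zero indicates that the vertical-baseline geometry holds; a larger tilt would call for rectifying the image pair before estimating disparity.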

Code and Paper Links


	title = {360SD-Net: 360$^{\circ}$ Stereo Depth Estimation with Learnable Cost Volume},
	author = {Ning-Hsu Wang and Bolivar Solarte and Yi-Hsuan Tsai and Wei-Chen Chiu and Min Sun},
	booktitle = {International Conference on Robotics and Automation (ICRA)},
	year = {2020}

	title = {360SD-Net: 360° Stereo Depth Estimation with Learnable Cost Volume},
	author = {Ning-Hsu Wang and Bolivar Solarte and Yi-Hsuan Tsai and Wei-Chen Chiu and Min Sun},
	journal = {arXiv preprint arXiv:1911.04460},