Ning-Hsu Albert Wang
email: albert.nhwang[at]gmail.com

| CV | Google Scholar | Github |
| LinkedIn | Twitter |

I am a Machine Learning Engineer at Google, focusing on GenAI and ISP ML model development and acceleration. Prior to this role, I was a Machine Learning Engineer at Taiwan AILabs, where I concentrated on advancing 360° image technologies for AR/VR and the Metaverse.

I received my Master's degree in Electrical Engineering from National Tsing Hua University (NTHU), advised by Prof. Min Sun, and my Bachelor's degree in Mechanical Engineering from National Chiao Tung University. During my graduate research, I was fortunate to collaborate with Prof. Wei-Chen Chiu and Dr. Yi-Hsuan Tsai on the 360° Stereo Depth project, and with Prof. Hwann-Tzong Chen on Planar Reconstruction. During my last semester and the following year, I had a wonderful time as a Computer Vision research intern at MediaTek, Taiwan, working on Depth Estimation, All-in-Focus Reconstruction, and Computational Photography.

My research interests lie in Computer Vision and its applications, especially 360° imagery, 3D Geometry & Reconstruction, VR/AR applications, GenAI, Robotics Perception, and Computational Photography. I am fascinated with understanding and recreating the 3D world. In my free time, I enjoy listening to music, restaurant/cafe/bar hopping, and badminton.

Google
ML Engineer
Jun. 2024 - Present

Taiwan AILabs
ML Engineer
Aug. 2021 - Apr. 2024

MediaTek, Taiwan
Research Intern
Feb. 2020 - Mar. 2021

NTHU
M.Sc.
Feb. 2018 - Aug. 2020

Atos
Onsite Engineer
Jul. 2017 - Aug. 2017

NCTU
B.Sc.
Sep. 2013 - Jun. 2017

  News
  • [09/2024] One paper accepted at NeurIPS'24.
  • [06/2024] Joined Google as a Machine Learning Engineer.
  • [04/2024] One US Patent issued, patent no.: 11,967,096.
  • [02/2022] One US Patent filed.
  • [07/2021] One paper accepted at ICCV'21.
  • [03/2021] One paper accepted at CVPR'21 as an oral paper.
  • [02/2021] Finished my internship at MediaTek.
  • [10/2020] Finished my military training.
  • [08/2020] Received my M.Sc. degree from National Tsing Hua University.
  • [07/2020] Selected as an honorary member of the Phi Tau Phi Scholastic Honor Society of the Republic of China.
  • [02/2020] Started my research internship at MediaTek, working with Dr. Yu-Lin Chang, Dr. Chia-Ping Chen, and Senior Engineers Yu-Lun Liu, Ren Wang, and Yu-Hao Huang.
  • [01/2020] One paper accepted at ICRA'20.
  • [08/2019] One paper accepted at the ICCV'19 360° Perception and Interaction Workshop.
  Publications

Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation
Ning-Hsu Wang, Yu-Lun Liu
NeurIPS 2024

webpage | abstract | bibtex | arXiv | huggingface demo | code (TBD)

Accurately estimating depth in 360-degree imagery is crucial for virtual reality, autonomous navigation, and immersive media applications. Existing depth estimation methods designed for perspective-view imagery fail when applied to 360-degree images due to different camera projections and distortions, whereas 360-degree methods perform inferior due to the lack of labeled data pairs. We propose a new depth estimation framework that utilizes unlabeled 360-degree data effectively. Our approach uses state-of-the-art perspective depth estimation models as teacher models to generate pseudo labels through a six-face cube projection technique, enabling efficient labeling of depth in 360-degree images. This method leverages the increasing availability of large datasets. Our approach includes two main stages: offline mask generation for invalid regions and an online semi-supervised joint training regime. We tested our approach on benchmark datasets such as Matterport3D and Stanford2D3D, showing significant improvements in depth estimation accuracy, particularly in zero-shot scenarios. Our proposed training pipeline can enhance any 360 monocular depth estimator and demonstrates effective knowledge transfer across different camera projections and data types.

            @article{wang2024depthanywhere,
                title={Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation},
                author={Ning-Hsu Wang and Yu-Lun Liu},
                journal={arXiv},
                year={2024}
            }
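
A rough illustration of the pseudo-labeling step: the sketch below samples the six cube faces from an equirectangular image so a perspective teacher model can be run on each face. This is a minimal, hypothetical sketch rather than the released code; the face orientation conventions and the teacher() callable are assumptions.

    # Hypothetical sketch of six-face cube projection for pseudo-labeling
    # (assumed conventions; not the official Depth Anywhere code).
    import torch
    import torch.nn.functional as F

    def equirect_to_cube_faces(erp, face_size=256):
        # erp: (B, C, H, W) equirectangular image.
        B = erp.shape[0]
        t = torch.linspace(-1.0, 1.0, face_size)
        vv, uu = torch.meshgrid(t, t, indexing="ij")
        ones = torch.ones_like(uu)
        # Outward ray directions per face (x right, y down, z forward);
        # an assumed convention -- adjust to your coordinate system.
        dirs = {
            "front": torch.stack([uu, vv, ones], -1),
            "right": torch.stack([ones, vv, -uu], -1),
            "back":  torch.stack([-uu, vv, -ones], -1),
            "left":  torch.stack([-ones, vv, uu], -1),
            "up":    torch.stack([uu, -ones, vv], -1),
            "down":  torch.stack([uu, ones, -vv], -1),
        }
        faces = {}
        for name, d in dirs.items():
            d = d / d.norm(dim=-1, keepdim=True)
            lon = torch.atan2(d[..., 0], d[..., 2])     # longitude in [-pi, pi]
            lat = torch.asin(d[..., 1].clamp(-1, 1))    # latitude in [-pi/2, pi/2]
            grid = torch.stack([lon / torch.pi, 2.0 * lat / torch.pi], -1)
            faces[name] = F.grid_sample(erp, grid.expand(B, -1, -1, -1),
                                        align_corners=True)
        return faces

    # Pseudo-labeling: run a perspective teacher on every face, e.g.
    #   pseudo = {k: teacher(v) for k, v in equirect_to_cube_faces(erp).items()}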
          

Bridging Unsupervised and Supervised Depth from Focus via All-in-Focus Supervision
Ning-Hsu Wang, Ren Wang, Yu-Lun Liu, Yu-Hao Huang, Yu-Lin Chang, Chia-Ping Chen, Kevin Jou
ICCV 2021

webpage | abstract | bibtex | arXiv | code

Depth estimation is a long-lasting yet important task in computer vision. Most of the previous works try to estimate depth from input images and assume images are all-in-focus (AiF), which is less common in real-world applications. On the other hand, a few works take defocus blur into account and consider it as another cue for depth estimation. In this paper, we propose a method to estimate not only a depth map but an AiF image from a set of images with different focus positions (known as a focal stack). We design a shared architecture to exploit the relationship between depth and AiF estimation. As a result, the proposed method can be trained either supervisedly with ground truth depth, or unsupervisedly with AiF images as supervisory signals. We show in various experiments that our method outperforms the state-of-the-art methods both quantitatively and qualitatively, and also has higher efficiency in inference time.

          @inproceedings{Wang-ICCV-2021,
            author    = {Wang, Ning-Hsu and Wang, Ren and Liu, Yu-Lun and Huang, Yu-Hao and Chang, Yu-Lin and Chen, Chia-Ping and Jou, Kevin}, 
            title     = {Bridging Unsupervised and Supervised Depth from Focus via All-in-Focus Supervision}, 
            booktitle = {International Conference on Computer Vision},
            year      = {2021}
          }
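
To make the shared formulation concrete, here is a minimal sketch (assumed tensor shapes, not the released code) of how per-pixel attention over the focal stack can produce both outputs: depth as the expected focus distance, and the AiF image as a probability-weighted blend of the stack.

    # Hypothetical sketch of joint depth / AiF estimation from a focal stack
    # (assumed shapes; not the official implementation).
    import torch

    def depth_and_aif(stack, focus_dists, logits):
        # stack: (B, S, 3, H, W) focal stack; focus_dists: (S,) focus distances;
        # logits: (B, S, H, W) per-pixel scores over the S focus positions.
        prob = torch.softmax(logits, dim=1)                    # attention over slices
        depth = (prob * focus_dists.view(1, -1, 1, 1)).sum(1)  # expected focus distance
        aif = (prob.unsqueeze(2) * stack).sum(1)               # weighted AiF image
        return depth, aif

    # Supervised training regresses depth against ground truth; unsupervised
    # training instead supervises aif with an all-in-focus reference image.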
        

Indoor Panorama Planar 3D Reconstruction via Divide and Conquer
Cheng Sun, Chi-Wei Hsiao, Ning-Hsu Wang, Min Sun, Hwann-Tzong Chen
CVPR 2021 Oral

abstract | bibtex | arXiv | code

Indoor panorama typically consists of human-made structures parallel or perpendicular to gravity. We leverage this phenomenon to approximate the scene in a 360-degree image with (H)orizontal-planes and (V)ertical-planes. To this end, we propose an effective divide-and-conquer strategy that divides pixels based on their plane orientation estimation; then, the succeeding instance segmentation module conquers the task of planes clustering more easily in each plane orientation group. Besides, parameters of V-planes depend on camera yaw rotation, but translation-invariant CNNs are less aware of the yaw change. We thus propose a yaw-invariant V-planar reparameterization for CNNs to learn. We create a benchmark for indoor panorama planar reconstruction by extending existing 360 depth datasets with ground truth H&V-planes (referred to as "PanoH&V" dataset) and adopt state-of-the-art planar reconstruction methods to predict H&V-planes as our baselines. Our method outperforms the baselines by a large margin on the proposed dataset.

        @inproceedings{SunHWSC21,
          author    = {Cheng Sun and
                        Chi{-}Wei Hsiao and
                        Ning{-}Hsu Wang and
                        Min Sun and
                        Hwann{-}Tzong Chen},
          title     = {Indoor Panorama Planar 3D Reconstruction via Divide and Conquer},
          booktitle = {CVPR},
          year      = {2021},
        }
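
The divide-and-conquer step can be pictured with the hypothetical sketch below: pixels are first divided by their predicted plane-orientation group, and plane instances are then clustered independently within each group. The inputs and the DBSCAN clusterer are illustrative assumptions, not the paper's exact pipeline.

    # Hypothetical sketch of orientation-grouped plane clustering
    # (assumed inputs; not the official implementation).
    import numpy as np
    from sklearn.cluster import DBSCAN

    def cluster_planes(orientation, embedding, eps=0.5):
        # orientation: (H, W) integer orientation-group id per pixel.
        # embedding: (H, W, D) per-pixel features for instance clustering.
        instance = np.full(orientation.shape, -1, dtype=np.int64)
        next_id = 0
        for g in np.unique(orientation):
            mask = orientation == g                    # divide: one orientation group
            labels = DBSCAN(eps=eps).fit_predict(embedding[mask])
            labels[labels >= 0] += next_id             # conquer: cluster within group
            instance[mask] = labels
            next_id = max(next_id, int(instance.max()) + 1)
        return instance                                # -1 marks unassigned pixels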
        

360SD-Net: 360° Stereo Depth Estimation with Learnable Cost Volume
Ning-Hsu Wang, Bolivar Solarte, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun
ICRA 2020

webpage | abstract | bibtex | arXiv | code

Recently, end-to-end trainable deep neural networks have significantly improved stereo depth estimation for perspective images. However, 360° images captured under equirectangular projection cannot benefit from directly adopting existing methods due to the distortion introduced (i.e., lines in 3D are not projected as lines in 2D). To tackle this issue, we present a novel architecture specifically designed for spherical disparity using the setting of top-bottom 360° camera pairs. Moreover, we propose to mitigate the distortion issue by: 1) an additional input branch capturing the position and relation of each pixel in spherical coordinates, and 2) a cost volume built upon a learnable shifting filter. Due to the lack of 360° stereo data, we collect two 360° stereo datasets from Matterport3D and Stanford2D3D for training and evaluation. Extensive experiments and an ablation study are provided to validate our method against existing algorithms. Finally, we show promising results in real-world environments, capturing images with two consumer-level cameras.

          @inproceedings{wang20icra,
            title = {360SD-Net: 360$^{\circ}$ Stereo Depth Estimation with Learnable Cost Volume},
            author = {Ning-Hsu Wang and Bolivar Solarte and Yi-Hsuan Tsai and Wei-Chen Chiu and Min Sun},
            booktitle = {International Conference on Robotics and Automation (ICRA)},
            year = {2020}
          }
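
The first of the two distortion remedies, the extra coordinate branch, is straightforward to picture: a minimal sketch (assumed shapes and usage, not the released implementation) builds a per-pixel polar-angle map for an equirectangular input, which can be concatenated with the RGB channels or processed in its own branch before fusion.

    # Hypothetical sketch of the polar-angle coordinate input
    # (assumed convention; not the official 360SD-Net code).
    import torch

    def polar_angle_map(batch, height, width):
        # Latitude of each pixel row in an equirectangular image, tiled to (B, 1, H, W).
        lat = torch.linspace(-torch.pi / 2, torch.pi / 2, height)
        return lat.view(1, 1, height, 1).expand(batch, 1, height, width)

    # e.g. x = torch.cat([rgb, polar_angle_map(B, H, W)], dim=1)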
        
  Projects
Panorama Scene Generation

  • Developed a comprehensive strategy for multi-task Stable Diffusion model development.
  • Focused primarily on style generation and super-resolution.
  • Provided support for panorama image generation.

  • Skills: Generative AI · Stable Diffusion · Transformers

Video Enhancement and Restoration

  • Initiated a video enhancement and recovery project for documentary videos.
  • Within a month, presented a method combining compression-artifact removal and video super-resolution that was comparable to a competitor's approach.

  • Skills: Transformer · Video Processing · Sequential Data · Data/Model Parallelism

Light Source Estimation

  • Designed a novel light-source representation for 3D object insertion in the Metaverse product, built with Babylon.js.
  • Created a new light-source dataset based on this newly introduced representation.
  • Developed a neural network model from scratch and deployed it to the product within half a year.

  • Skills: Neural Network Model Design · Computer Graphics · Computational Photography · Semantic segmentation

360° Stereo Depth Estimation

  • Presented a new 360° stereo dataset.
  • Implemented deep neural network baselines as well as conventional methods.
  • Presented a deep neural network with several novel modules for 360° stereo depth estimation.

  • | code |

3D Horror Scene: Horror Style Transfer Using 360° Views and 3D Reconstruction

  • Collected 5,000 horror-scene images by web crawling YouTube horror game videos for style-transfer training.
  • Implemented CycleGAN for style transfer and LayoutNet for 360° layout reconstruction.
  • Combined both models' outputs (horror-style 360° images and 3D room layouts) to form 3D models of horror rooms.

Unmanned Aircraft Remote Delivery System (Drone)

  • Designed and implemented the drone mechanism, motor control, delivery, and real-time surveillance system.
  • Demonstrated UAV control for object delivery to unseen locations with a 300 g payload.

KNR Robot Navigation and Object Detection in Maze

  • Designed and manufactured the KNR mechanism with multiple sensors (camera, ultrasonic, and infrared).
  • Programmed the navigation system, including motor control, multi-sensor feedback, and image processing in LabVIEW.

Validation of the LAMBDA Method for Integer Ambiguity Estimation

  • Implemented the LAMBDA method for integer ambiguity estimation in a MATLAB simulation.

  Patent

Methods and Apparatuses of Depth Estimation from Focus Information
Ren Wang, Yu-Lun Liu, Yu-Hao Huang, Ning-Hsu Wang
U.S. Patent Appl. 17/677,365, filed Feb. 2022, published Sep. 2022, issued Apr. 2024, patent no. 11,967,096
  Professional Activities
  • Reviewer: RA-L, TPAMI, IJCV, AAAI 2023, CVPR 2023, ECCV 2024
  Awards
  • Honorary Member of the Phi Tau Phi Scholastic Honor Society of the Republic of China
  • Appier Conference Scholarship for Top Researchers in Artificial Intelligence
  • Arctic Code Vault Contributor (GitHub)

Template from this awesome website.