Vive Center Publications


Optimistic Dual Extrapolation for Non-monotone Variational Inequality
Author(s): Chaobing Song, Yichao Zhou, Zhengyuan Zhou, Yong Jiang, and Yi Ma
Journal/Conference: NeurIPS, December 2020.

Stochastic Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization
In this paper, we introduce a simplified and unified method for finite-sum convex optimization, named Stochastic Variance Reduction via Accelerated Dual Averaging (SVR-ADA). In the nonstrongly convex and smooth setting, SVR-ADA can attain an O(1/n)-accurate solution in O(n log log n) stochastic gradient evaluations, where n is the number of samples; meanwhile, SVR-ADA matches the lower bound…

Author(s): Chaobing Song, Yong Jiang, and Yi Ma
Journal/Conference: NeurIPS, December 2020.

Robust Recovery via Implicit Bias of Discrepant Learning Rates for Double Over-parameterization
Recent advances have shown that implicit bias of gradient descent on over-parameterized models enables the recovery of low-rank matrices from linear measurements, even with no prior knowledge on the intrinsic rank. In contrast, for robust low-rank matrix recovery from grossly corrupted measurements, over-parameterization leads to overfitting without prior knowledge on both the intrinsic rank and sparsity of corruption. This paper shows that with a double over-parameterization for both…

Author(s): Chong You, Zhihui Zhu, Qing Qu, and Yi Ma
Journal/Conference: NeurIPS (spotlight), December 2020.

Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction
To learn intrinsic low-dimensional structures from high-dimensional data that most discriminate between classes, we propose the principle of Maximal Coding Rate Reduction (MCR²), an information-theoretic measure that maximizes the coding rate difference between the whole dataset and the sum of each individual class. We clarify its relationships with most existing frameworks such as cross-entropy, information bottleneck, information gain, contractive…

Author(s): Yaodong Yu, Kwan Ho Ryan Chan, Chong You, Chaobing Song, and Yi Ma
Journal/Conference: NeurIPS, December 2020.

TransceiVR: Bridging Asymmetrical Communication Between VR Users and External Collaborators
Virtual Reality (VR) users often need to work with other users, who observe them outside of VR using an external display. Communication between them is difficult; the VR user cannot see the external user’s gestures, and the external user cannot see VR scene elements outside of the VR user’s view. We carried out formative interviews with experts to understand these asymmetrical interactions and identify their goals and challenges. From this, we identify high-level system design goals to facilitate asymmetrical interactions and a corresponding space of implementation approaches based on the level of programmatic access to a VR application. We present TransceiVR, a system that utilizes VR platform APIs to enable asymmetric communication interfaces for third-party applications without requiring source code access…

Author(s): Balasaravanan Thoravi Kumaravel, Cuong Nguyen, Stephen DiVerdi, Bjoern Hartmann. 2020.
Conference: Will be presented at UIST’20.

Complete Dictionary Learning via L4-Norm Maximization over the Orthogonal Group
This paper considers the fundamental problem of learning a complete (orthogonal) dictionary from samples of sparsely generated signals. Most existing methods solve the dictionary (and sparse representations) based on heuristic algorithms, usually without theoretical guarantees for either optimality or complexity. The recent ℓ1-minimization based methods do provide such guarantees but the associated algorithms recover the dictionary one column at a time. In this work, we propose a new formulation that maximizes the ℓ4-norm over the orthogonal group, to learn the entire dictionary…

Author(s): Yuexiang Zhai, Zitong Yang, Zhenyu Liao, John Wright, and Yi Ma
Journal/Conference: Journal of Machine Learning Research (JMLR), 2020.

Understanding L4-based Dictionary Learning: Interpretation, Stability, and Robustness

Author(s): Yuexiang Zhai, Hermish Mehta, Zhengyuan Zhou, and Yi Ma
Journal/Conference: International Conference on Learning Representations (ICLR), 2020.

Rethinking Bias-Variance Trade-off for Generalization of Neural Networks
The classical bias-variance trade-off predicts that bias decreases and variance increases with model complexity, leading to a U-shaped risk curve. Recent work calls this into question for neural networks and other over-parameterized models, for which it is often observed that larger models generalize better. We provide a simple explanation for this by measuring the bias and variance of neural networks: while the bias is monotonically decreasing as in the classical theory, the variance is unimodal or bell-shaped: it increases then decreases with the width of the network…

Author(s): Zitong Yang, Yaodong Yu, Chong You, Jacob Steinhardt, and Yi Ma
Journal/Conference: International Conference on Machine Learning (ICML), June 2020.
arXiv:2002.11328 [cs.LG]

Deep Isometric Learning for Visual Recognition
Initialization, normalization, and skip connections are believed to be three indispensable techniques for training very deep convolutional neural networks and obtaining state-of-the-art performance. This paper shows that deep vanilla ConvNets without normalization or skip connections can also be trained to achieve surprisingly good performance on standard image recognition benchmarks. This is achieved by enforcing the convolution kernels to be near isometric during initialization and training, as well as by using a variant of ReLU that is shifted towards being isometric…

Author(s): Haozhi Qi, Chong You, Xiaolong Wang, Yi Ma, and Jitendra Malik
Journal/Conference: International Conference on Machine Learning (ICML), June 2020.


TutoriVR: A Video-Based Tutorial System for Design Applications in Virtual Reality
Virtual Reality painting is a form of 3D painting done in a Virtual Reality (VR) space. Being a relatively new kind of art form, there is a growing interest within the creative practices community to learn it. Currently, most users learn using community-posted 2D videos on the internet, which are screencast recordings of the painting process by an instructor. While such an approach may suffice for teaching 2D software tools, these videos by themselves fail to deliver crucial details that are required by the user to understand actions in a VR space…

Author(s): Balasaravanan Thoravi Kumaravel, Cuong Nguyen, Stephen DiVerdi, Bjoern Hartmann. 2019.
Conference: Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI ’19).

Tracking of Deformable Human Avatars through Fusion of Low-Dimensional 2D and 3D Kinematic Models
We propose a method to estimate and track the 3D posture as well as the 3D shape of the human body from a single RGB-D image. We estimate the full 3D mesh of the body and show that 2D joint positions greatly improve 3D estimation and tracking accuracy. The problem is inherently very challenging because of the complexity of the human body, lighting, clothing, and occlusion. To solve the problem, we leverage a custom MobileNet implementation of OpenPose CNN to construct a 2D skeletal model of the human body. We then fit a low-dimensional deformable body model called SMPL to the observed point cloud using initialization from the 2D skeletal model…

Author(s): Ningjian Zhou and S. Shankar Sastry
Technical Report No. UCB/EECS-2019-87
Publication Date: May 19, 2019

Temporal IK: Data-Driven Pose Estimation for Virtual Reality
High-quality human avatars are an important part of compelling virtual reality (VR) experiences. Animating an avatar to match the movement of its user, however, is a fundamentally difficult task, as most VR systems only track the user’s head and hands, leaving the rest of the body undetermined. In this report, we introduce Temporal IK, a data-driven approach to predicting full-body poses from standard VR headset and controller inputs. We describe a recurrent neural network that, when given a sequence of positions and rotations from VR tracked objects, predicts the corresponding full-body poses in a manner that exploits the temporal consistency of human motion…

Author(s): James Lin and James O’Brien
Technical Report No. UCB/EECS-2019-59
Publication Date: May 17, 2019

Real-Time Hand Model Estimation from Depth Images for Wearable Augmented Reality Glasses
This work presents a hand model estimation method designed specifically with augmented reality (AR) glasses and 3D AR interface in mind. The proposed work is capable of estimating the 3D positions of all ten fingers from a single depth image. By leveraging a low-dimensional hand model and exploiting hand geometries from an ego-centric view, we build a lightweight algorithm that is accurate, environment agnostic, and runs in real time on mobile hardware. One major consideration in our design for AR is that the user’s hand is likely to interact with planar surfaces since they serve as ideal “touchscreens”…

Author(s): Bill Zhou, Alex Yu, Joseph Menke and Allen Yang
DOI: 10.1109/ISMAR-Adjunct.2019.00-31

A User Experience Study of Locomotion Design in Virtual Reality Between Adult and Minor Users
Virtual reality (VR) is an important new technology that is fundamentally changing the way people experience entertainment and education content. Because most currently available VR products are one-size-fits-all, the user experience of the content interface and user interaction for children is not well understood compared to that for adults. In this study, we seek to explore user experience of locomotion in VR between healthy adults and healthy minors along both objective and subjective dimensions…

Author(s): Zhijiong Huang, Yu Zhang, Kathryn C. Quigley, Ramya Sankar, and Allen Yang
DOI: 10.1109/ISMAR-Adjunct.2019.00027

NeurVPS: Neural Vanishing Point Scanning via Conic Convolution
We present a simple yet effective end-to-end trainable deep network with geometry-inspired convolutional operators for detecting vanishing points in images. Traditional convolutional neural networks rely on aggregating edge features and do not have mechanisms to directly exploit the geometric properties of vanishing points as the intersections of parallel lines. In this work, we identify a canonical conic space in which the neural network can effectively compute the global geometric information of vanishing points locally, and we propose a novel operator named conic convolution…

Author(s): Yichao Zhou, Haozhi Qi, and Yi Ma
Conference: NeurIPS, 2019

L-CNN: End-to-End Wireframe Parsing
We present a conceptually simple yet effective algorithm to detect wireframes in a given image. Compared to the previous methods which first predict an intermediate heat map and then extract straight lines with heuristic algorithms, our method is end-to-end trainable and can directly output a vectorized wireframe that contains semantically meaningful and geometrically salient junctions and lines…

Author(s): Yichao Zhou, Haozhi Qi, and Yi Ma
Conference: International Conference on Computer Vision (ICCV), 2019

Learning to Reconstruct 3D Manhattan Wireframes from a Single Image
In this paper, we propose a method to obtain a compact and accurate 3D wireframe representation from a single image by effectively exploiting global structural regularities. Our method trains a convolutional neural network to simultaneously detect salient junctions and straight lines, as well as predict their 3D depth and vanishing points. Compared with the state-of-the-art learning-based wireframe detection methods, our network is much simpler and more unified, leading to better 2D wireframe detection…

Author(s): Yichao Zhou, Haozhi Qi, Yuexiang Zhai, Qi Sun, Zhili Chen, Li-Yi Wei, and Yi Ma
Conference: International Conference on Computer Vision (ICCV), 2019

Faculty Researchers

Ruzena Bajcsy

Topics: Exoskeletons, Human Kinematic & Dynamic Modeling, Telemedicine, Health Telemonitoring, and Human Musculoskeletal Modeling

Francesco Borrelli

Topics: Applications

Luisa Caldas


Bjoern Hartmann


Richard Koci Hernandez


Ren Ng

Topics: Imaging

James O'Brien

Topics: Graphics

Shankar Sastry


Claire Tomlin


Stella Yu


Allen Yang

Topics: Localization, Immersion, Applications, and Interaction