Bugra Tekin

I am a scientist at the Microsoft Mixed Reality & AI group in Zurich headed by Prof. Marc Pollefeys. I received my Ph.D. degree from the Computer Vision Laboratory of EPFL under the supervision of Prof. Pascal Fua and Prof. Vincent Lepetit. Before that, I obtained my M.Sc. degree in Electrical Engineering from EPFL in 2013, and my B.Sc. degree in Electrical & Electronics Engineering from Bogazici University in 2011 with high honors. I also spent time at Microsoft Research during my Ph.D. I received the Qualcomm Innovation Fellowship Europe in 2017.

Email  /  Google Scholar  /  LinkedIn


I'm interested in computer vision, machine learning, deep learning, image processing, and augmented reality. Much of my research is about the semantic understanding of humans and objects in the 3D world from camera images. In particular, I work on 2D/3D human pose estimation, hand pose estimation, action recognition, 3D object detection, and 6D pose estimation. In the past, I have also worked on biomedical imaging.


Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction
Yana Hasson, Bugra Tekin, Federica Bogo, Ivan Laptev, Marc Pollefeys, Cordelia Schmid
Computer Vision and Pattern Recognition (CVPR), 2020.

In this paper, we propose a new method for dense 3D reconstruction of hands and objects from monocular color images. We further present a self-supervised learning approach leveraging photo-consistency between sparsely supervised frames.

H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions
Bugra Tekin, Federica Bogo, Marc Pollefeys
Computer Vision and Pattern Recognition (CVPR), 2019. (oral)

In this work, we propose, for the first time, a unified method to jointly recognize 3D hand and object poses and their interactions from egocentric monocular color images. Our method jointly estimates the hand and object poses in 3D, models their interactions, and recognizes the object and activity classes with a single feed-forward pass through a neural network.

Domain-Specific Priors and Meta Learning for Low-shot First-Person Action Recognition
Huseyin Coskun, Zeeshan Zia, Bugra Tekin, Federica Bogo, Nassir Navab, Federico Tombari, Harpreet Sawhney
arXiv Preprint, arXiv:1907.09382, 2019.

We develop an effective method for low-shot transfer learning for first-person action classification. We leverage independently trained local visual cues to learn representations that can be transferred from a source domain providing primitive action labels to a target domain with only a handful of examples.

Real-Time Seamless Single Shot 6D Object Pose Prediction
Bugra Tekin, Sudipta N. Sinha, Pascal Fua
Computer Vision and Pattern Recognition (CVPR), 2018.
supplementary / code

We introduce a new deep learning architecture that naturally extends the single-shot 2D object detection paradigm to 6D object pose estimation. It demonstrates state-of-the-art accuracy with real-time performance and is at least 5 times faster than the existing methods (50 to 94 fps depending on the input resolution).

Learning Latent Representations of 3D Human Pose with Deep Neural Networks
Isinsu Katircioglu*, Bugra Tekin*, Mathieu Salzmann, Vincent Lepetit, Pascal Fua
International Journal of Computer Vision (IJCV), 2018.

We propose an efficient Long Short-Term Memory (LSTM) network for enforcing consistency of 3D human pose predictions across temporal windows.

Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation
Bugra Tekin, Pablo Marquez-Neila, Mathieu Salzmann, Pascal Fua
International Conference on Computer Vision (ICCV), 2017.
supplementary / code / project

We introduce an approach to learn where and how to fuse the streams of a two-stream convolutional neural network operating on different input modalities for 3D human pose estimation.

Fusing 2D Uncertainty and 3D Cues for Monocular Body Pose Estimation
Bugra Tekin, Pablo Marquez-Neila, Mathieu Salzmann, Pascal Fua
arXiv Preprint, arXiv:1611.05708, 2016.

We propose to jointly model 2D uncertainty and leverage 3D image cues in a regression framework for reliable monocular 3D human pose estimation.

Structured Prediction of 3D Human Pose with Deep Neural Networks
Bugra Tekin*, Isinsu Katircioglu*, Mathieu Salzmann, Vincent Lepetit, Pascal Fua
British Machine Vision Conference (BMVC), 2016. (oral)

We introduce a deep learning regression architecture for structured prediction of 3D human pose from monocular images. It relies on an overcomplete auto-encoder to learn a high-dimensional latent pose representation and to account for joint dependencies.

Direct Prediction of 3D Body Poses from Motion Compensated Sequences
Bugra Tekin, Artem Rozantsev, Vincent Lepetit, Pascal Fua
Computer Vision and Pattern Recognition (CVPR), 2016.

We propose to predict the 3D human pose from a spatiotemporal volume of bounding boxes. We further propose a CNN-based motion compensation method that increases the stability and reliability of our 3D pose estimates.

Predicting People's 3D Poses from Short Sequences
Bugra Tekin, Xiaolu Sun, Xinchao Wang, Vincent Lepetit, Pascal Fua
arXiv Preprint, arXiv:1504.08200, 2015.

We propose an efficient approach to exploiting motion information from consecutive frames of a video sequence to recover the 3D pose of people. Instead of computing candidate poses in individual frames and then linking them, as is often done, we regress directly from a spatio-temporal block of frames to a 3D pose in the central one.

Learning Separable Filters
Amos Sironi*, Bugra Tekin*, Roberto Rigamonti, Vincent Lepetit, Pascal Fua
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2015.
supplementary / code 2D / code 3D

We introduce an efficient approach to approximate a set of nonseparable convolutional filters by linear combinations of a smaller number of separable ones. We demonstrate that this greatly reduces the computational complexity at no cost in terms of performance for image recognition tasks with convolutional filters and CNNs.

Benefits of Consistency in Image Denoising with Steerable Wavelets
Bugra Tekin, Ulugbek Kamilov, Emrah Bostan, Michael Unser
International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013. (oral)

We propose a technique for improving the performance of L1-based image denoising in the steerable wavelet domain. Our technique, which we call consistency, refers to the fact that the solution obtained by the algorithm is constrained to the space spanned by the basis functions of the transform, which results in a certain norm equivalence between image-domain and wavelet-domain estimations.

(* indicates equal contribution)


Learning Robust Features and Latent Representations for Single View 3D Pose Estimation of Humans and Objects
Bugra Tekin
Ph.D. Thesis, September 2018

Learning Separable Filters with Shared Parts
Bugra Tekin
M.Sc. Thesis, June 2013


Method, System and Device for Direct Prediction of 3D Body Poses from Motion Compensated Sequence
Pascal Fua, Vincent Lepetit, Artem Rozantsev, Bugra Tekin
US Patent, Pub. No: US 2017-0316578 A1, Pub. Date: November 02, 2017


Deep Learning, TA, 2018

Computer Vision, TA, 2016, 2017

Numerical Methods for Visual Computing, TA, 2016

Programmation (C/C++) / (Java), TA, 2013, 2015

Principles of Digital Communications, TA, 2013

Circuits and Systems I/II, TA, 2011, 2012, 2013

pronunciation of my name, Buğra / website template