Peter Kocsis

PhD student in Inverse Rendering

Supervisor: Prof. Dr. Matthias Niessner
Visual Computing & Artificial Intelligence Lab, Technical University of Munich
peter.kocsis(at)tum.de

About

I am currently doing my PhD in the Visual Computing & Artificial Intelligence Lab at the Technical University of Munich under the supervision of Prof. Dr. Matthias Niessner. I finished my bachelor's degree in Mechatronics Engineering, then completed my master's in Robotics, Cognition, Intelligence. Previously, I worked on Reinforcement Learning for control and planning, and later moved into Active Learning for image classification. During my PhD, I am focusing on photorealistic 3D reconstruction, specifically on lighting and material decomposition.

Publications

LightIt: Illumination Modeling and Control for Diffusion Models

CVPR 2024
Peter Kocsis, Julien Philip, Kalyan Sunkavalli, Matthias Niessner, Yannick Hold-Geoffroy
Recent generative methods lack lighting control, which is crucial to numerous artistic aspects of image generation, such as setting the overall mood or cinematic appearance. To overcome these limitations, we propose to condition the generation on shading and normal maps. We model the lighting with single-bounce shading, which includes cast shadows. We first train a shading estimation module to generate a dataset of paired real-world images and shading maps. Then, we train a control network using the estimated shading and normals as input. Our method demonstrates high-quality image generation and lighting control in numerous scenes.

Intrinsic Image Diffusion for Single-view Material Estimation

CVPR 2024
Peter Kocsis, Vincent Sitzmann, Matthias Niessner
Intrinsic image decomposition is a highly ambiguous task. Deep-learning-based methods often fail due to the lack of large-scale real-world data. We propose to formulate the problem probabilistically and generate possible decompositions using a generative model. This way, we can also utilize the strong image prior of diffusion models for the task of material estimation, which largely helps generalization.

The Unreasonable Effectiveness of Fully-Connected Layers for Low-Data Regimes

NeurIPS 2022
Peter Kocsis, Peter Súkeník, Guillem Brasó, Matthias Niessner, Laura Leal-Taixé, Ismail Elezi
Convolutional neural networks were the standard for solving many computer vision tasks until recently, when Transformer- or MLP-based architectures started to show competitive performance. These architectures typically have a vast number of weights and need to be trained on massive datasets; hence, they are not suitable for low-data regimes. In this work, we propose a simple yet effective framework to improve generalization from small amounts of data. We augment modern CNNs with fully-connected (FC) layers and show the massive impact this architectural change has in low-data regimes.

Projects

Active Learning with Transformers

2021
Technical University of Munich
During my master's thesis, I worked on using inter-sample message passing for active learning. Active learning requires uncertainty estimation over the unlabeled pool. Providing inter-sample information to the network helps it identify out-of-domain samples more reliably.
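The selection step in active learning can be sketched with a simple entropy-based acquisition function. This is a generic uncertainty baseline for illustration, not the message-passing model from the thesis:

```python
import numpy as np

def entropy_acquisition(probs: np.ndarray, k: int) -> np.ndarray:
    """Select the k unlabeled samples with the highest predictive entropy.

    probs: (N, C) array of class probabilities for the unlabeled pool.
    Returns the indices of the k most uncertain samples.
    """
    eps = 1e-12  # avoid log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)  # shape (N,)
    return np.argsort(entropy)[::-1][:k]

# Toy pool of three samples with increasingly uncertain predictions.
pool_probs = np.array([
    [0.98, 0.01, 0.01],  # confident -> low entropy
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],  # nearly uniform -> highest entropy
])
query = entropy_acquisition(pool_probs, k=1)  # picks the last sample
```

The selected samples would then be sent for labeling and added to the training set in the next round.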

Reinforcement Learning for Motion Planning

2020
Technical University of Munich
CommonRoad is a generic framework for developing and testing motion planning algorithms for autonomous vehicles. Besides working on the platform as a working student, I also participated in research on reinforcement-learning-based motion planning with dense and sparse rewards.

Neural Ball-balancing Table

2019
Budapest University of Technology and Economics
During my bachelor's thesis, I constructed a ball-balancing table and implemented various control algorithms. I implemented a virtual twin in Unity, trained a neural-network-based controller in simulation, and then transferred it to the real-world device.
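A classical baseline for such a system is a PID position controller. The sketch below runs a discrete PID loop on a heavily simplified 1-D ball-on-table model; the gains and dynamics are illustrative assumptions, not the actual hardware parameters:

```python
# Discrete PID controller driving a toy 1-D ball-on-table model,
# where the control signal (table tilt) directly sets the ball's
# acceleration. Gains are illustrative, not tuned for real hardware.

def pid_step(error, prev_error, integral, dt, kp=8.0, ki=0.5, kd=4.0):
    """One PID update; returns (control output, updated integral)."""
    integral += error * dt
    derivative = (error - prev_error) / dt
    return kp * error + ki * integral + kd * derivative, integral

dt = 0.01
pos, vel = 0.2, 0.0        # ball starts 0.2 m off-center, at rest
integral, prev_err = 0.0, 0.2
target = 0.0               # balance the ball at the table center
for _ in range(2000):      # simulate 20 s
    err = target - pos
    u, integral = pid_step(err, prev_err, integral, dt)
    prev_err = err
    vel += u * dt          # simplified: tilt ~ acceleration
    pos += vel * dt        # integrate ball position
```

With these gains the loop is well damped, so the simulated ball settles at the center; the neural controller mentioned above replaces this hand-tuned law with a learned policy.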

Monocular Localization

2017
Machine Perception Research Laboratory
The goal of the project was to reimplement and potentially improve the paper "Visual localization within LIDAR maps for automated urban driving" (Wolcott and Eustice, 2014). Given a pre-scanned map, we render synthetic views around an estimated pose. Then, we match the synthetic views to the camera feed.
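The matching step can be sketched as scoring each candidate render against the camera frame and keeping the best one. The snippet below uses zero-mean normalized cross-correlation as a simplified stand-in for the matching metric (the original paper uses normalized mutual information):

```python
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation between two images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def best_pose(camera: np.ndarray, renders: list) -> int:
    """Index of the synthetic view that best matches the camera frame."""
    scores = [ncc(camera, r) for r in renders]
    return int(np.argmax(scores))

# Toy example: the second candidate equals the camera frame up to a
# brightness offset, so the mean-normalized score should prefer it.
rng = np.random.default_rng(0)
cam = rng.random((8, 8))
candidates = [rng.random((8, 8)), cam + 0.3, rng.random((8, 8))]
```

In the full pipeline, the pose of the winning render (refined over a grid of candidate poses) gives the vehicle's localization estimate.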