dc.description.abstract |
In robotics, simulations are crucial, especially during the testing stage. However, the
sim2real gap remains a concern. For example, object segmentation learned in simulation
may not transfer well to real-world data, and vice versa. Thus, we cannot train robots,
particularly those that rely on object detection, in a simulator and expect them to perform
as well in the real world. This gap between simulation and reality has been the subject of
extensive research, which has accelerated with the development of deep learning.
For straightforward neural networks, such as U-Net, we require paired data between the two
domains of simulation and reality, which is unfortunately not always available. This is
where image-to-image (“pix2pix”-style) generative models come in. CycleGAN is a prominent
example: it requires no paired data, and it not only maps images from one domain to the
other but also ensures, through adversarial and cycle-consistency losses, that translated
images match the distribution of the target domain.
We hypothesize that this simulation-to-reality gap can be narrowed by a more focused
approach that considers realism in depth alone, a significant dimension of RGB-D images.
While adjustments to color and texture might improve photorealism, changes to depth may
also be beneficial, because there is a discernible difference between real and simulated
depth that classification models can detect. Additionally, models incorporating diffusion
and CLIP could be applied to further improve the results. |
en_US |