| 11 | 3D world captured by humans and AI: Comparing depth estimation bias using large-scale human data |
|---|---|
Humans can naturally estimate 3D structures from 2D images, and recent advances in artificial intelligence (AI) have enabled physical devices to develop similar abilities. Our research investigates whether these systems rely on the same visual cues as humans in depth estimation. To this end, we collected large-scale human-annotated data for indoor and outdoor images and compared them with predictions from various AI models. Our results show that many AI models exhibit estimation biases similar to those of humans (e.g., perceiving distant objects as closer than they physically are). Additionally, we identify an accuracy-similarity trade-off: highly accurate AI models often behave less like humans. By precisely modeling human-like error patterns, our work contributes to the development of AI models that better align with human perception. This may support safer and more intuitive applications, such as remote robot operation, where visual misunderstandings can lead to accidents.
[1] Y. Kubota, T. Fukiage, “Human-like monocular depth biases in deep neural networks,” PLOS Computational Biology, Vol. 21, No. 8, e1013020, 2025.
[2] Y. Kubota, T. Fukiage, “Accuracy does not guarantee human-likeness in monocular depth estimators,” arXiv, 2512.08163, 2025.
[3] Y. Kubota, T. Fukiage, “Benchmarking human and DNN biases in monocular depth estimation,” under review, 2026.
Yuki Kubota, Sensory Representation Research Group, Human Information Science Laboratory