We are excited to announce that our dataset, the Places Audio Caption (Japanese) 100k Corpus, is now available. This speech corpus was collected to support research on learning spoken language (words, sub-word units, higher-level semantics, etc.) from visually grounded speech.