# Recognition Research Group

## Publications

### 2021

#### Journal Papers

- T. Nakamura and H. Kameoka, "Harmonic-Temporal Factor Decomposition for Unsupervised Monaural Separation of Harmonic Sounds," IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, vol. 29, pp. 68-82.
- H. Kameoka, W.-C. Huang, K. Tanaka, T. Kaneko, N. Hojo, and T. Toda, "Many-to-Many Voice Transformer Network," IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, vol. 29, pp. 656-670.
- W.-C. Huang, T. Hayashi, Y.-C. Wu, H. Kameoka, and T. Toda, "Pretraining Techniques for Sequence-to-Sequence Voice Conversion," IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, vol. 29, pp. 745-755.
- C. Watanabe and H. Kameoka, "X-DC: Explainable Deep Clustering based on Learnable Spectrogram Templates," Neural Computation, 2021, vol. 33, no. 7.
- Y. Fujiwara, S. Kanai, Y. Ida, A. Kumagai, and N. Ueda, "Fast Algorithm for Anchor Graph Hashing," Very Large Data Base (VLDB) Endowment Inc., 2021, vol. 14, no. 6, pp. 916-928.
- X. Wu, Y. Sun, T. Kawanishi, and K. Kashino, "Contrast Enhancement based on Discriminative Co-occurrence Statistics," Multimedia Tools and Applications, 2021, vol. 80, no. 4, pp. 6413-6442.
- X. Wu, T. Kawanishi, and K. Kashino, "Reflectance-guided histogram equalization and comparametric approximation," IEEE Transactions on Circuits and Systems for Video Technology, 2021, vol. 31, no. 3, pp. 863-876.

#### Peer-reviewed Conference Papers

- Y. Ohishi, Y. Tanaka, and K. Kashino, "Unsupervised Co-Segmentation for Athlete Movements and Live Commentaries Using Crossmodal Temporal Proximity," in Proc. the 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 9137-9142.
- D. Ikami, G. Irie, and T. Shibata, "Constrained Weight Optimization for Learning without Activation Normalization," in Proc. the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 2606-2614.
- Y. Fujiwara, Y. Ida, S. Kanai, A. Kumagai, and N. Ueda, "Fast Similarity Computation for t-SNE," in Proc. the 37th IEEE International Conference on Data Engineering (ICDE), 2021.
- S. Inoue, H. Kameoka, L. Li, and S. Makino, "SepNet: A Deep Separation Matrix Prediction Network for Multichannel Audio Source Separation," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 191-195.
- X. Wu, Y. Sun, A. Kimura, and K. Kashino, "Reflectance-Oriented Probabilistic Equalization for Image Enhancement," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 1835-1839.
- T. Kaneko, H. Kameoka, K. Tanaka, and N. Hojo, "MaskCycleGAN-VC: Learning Non-Parallel Voice Conversion with Filling in Frames," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 5919-5923.
- Y. Mitsuzumi, G. Irie, D. Ikami, and T. Shibata, "Generalized Domain Adaptation," in Proc. the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
- T. Kaneko, "Unsupervised Learning of Depth and Depth-of-Field Effect from Natural Images with Aperture Rendering Generative Adversarial Networks," in Proc. the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
- D. Niizumi, D. Takeuchi, Y. Ohishi, N. Harada, and K. Kashino, "BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation," in Proc. the International Joint Conference on Neural Networks (IJCNN), 2021.
- S. Matsuo, X. Wu, G. Atarsaikhan, A. Kimura, K. Kashino, B. K. Iwana, and S. Uchida, "Attention to warp: Deep metric learning for multivariate time series," in Proc. IEEE International Conference on Document Analysis and Recognition (ICDAR), 2021.
- T. Shibata, G. Irie, D. Ikami, and Y. Mitsuzumi, "Learning with Selective Forgetting," in Proc. International Joint Conference on Artificial Intelligence (IJCAI2021), August 2021.
- T. Shibata, M. Tanaka, and M. Okutomi, "Geometric Data Augmentation Based on Feature Map Ensemble," in Proc. IEEE International Conference on Image Processing (ICIP2021), September 2021.
- S. Matsuda, A. Kimura, and S. Uchida, "Impression2Font: Generating fonts by specifying impressions," in Proc. International Conference on Document Analysis and Recognition (ICDAR), 2021.
- M. Ueda, A. Kimura, and S. Uchida, "Which parts determine the impression of fonts?" in Proc. International Conference on Document Analysis and Recognition (ICDAR), 2021.

### 2020

#### Journal Papers

- X. Wu, T. Kawanishi, and K. Kashino, "Reflectance-guided histogram equalization and comparametric approximation," IEEE Transactions on Circuits and Systems for Video Technology, 2020.
- C. Watanabe, K. Hiramatsu, and K. Kashino, "Knowledge discovery from layered neural networks based on non-negative task matrix decomposition," IEICE Transactions on Information and Systems, vol. E103.D, no. 2, pp. 390-397, 2020.
- I. Takahashi, N. Suzuki, N. Yasuda, A. Kimura, N. Ueda, M. Tanaka, N. Tominaga, and N. Yoshida, "Photometric Classification of Hyper Suprime-Cam Transients using Machine Learning," Publications of the Astronomical Society of Japan, vol. 72, no. 5, 2020.
- H. Kameoka, T. Kaneko, K. Tanaka, and N. Hojo, "Nonparallel Voice Conversion with Augmented Classifier Star Generative Adversarial Networks," IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, vol. 28, pp. 2982-2995.

#### Peer-reviewed Conference Papers

- N. Hojo, Y. Ijima, H. Sugiyama, N. Miyazaki, T. Kawanishi, and K. Kashino, "DNN-based speech synthesis considering dialogue-act information and its evaluation with respect to illocutionary act naturalness," in Proc. the 10th International Conference on Speech Prosody (Speech Prosody 2020), 2020.
- D. Takeuchi, K. Yatabe, Y. Koizumi, Y. Oikawa, and N. Harada, "Invertible DNN-based nonlinear time-frequency transform for speech enhancement," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 6644-6648.
- Y. Ohishi, A. Kimura, T. Kawanishi, K. Kashino, D. Harwath, and J. Glass, "Trilingual semantic embeddings of visually grounded speech with self-attention mechanisms," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 4352-4356.
- X. Wu, T. Kawanishi, and K. Kashino, "Reflectance-guided, contrast-accumulated histogram equalization," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 2498-2502.
- S. Kurihara, M. Fukui, S. Shimauchi, and N. Harada, "Subjective quality estimation using PESQ for hands-free terminals," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 921-925.
- D. Takeuchi, K. Yatabe, Y. Koizumi, Y. Oikawa, and N. Harada, "Real-time speech enhancement using equilibriated RNN," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 851-855.
- Y. Masuyama, K. Yatabe, Y. Koizumi, Y. Oikawa, and N. Harada, "Phase reconstruction based on recurrent phase unwrapping with deep neural networks," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 826-830.
- Y. Koizumi, M. Yasuda, S. Murata, S. Saito, H. Uematsu, and N. Harada, "SPIDERnet: Attention network for one-shot anomaly detection in sounds," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 281-285.
- S. Emura, H. Sawada, S. Araki, and N. Harada, "A frequency-domain BSS method based on l1 norm, unitary constraint, and cayley transform," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 111-115.
- L. Li, H. Kameoka, and S. Makino, "Determined Audio Source Separation with Multichannel Star Generative Adversarial Network," in Proc. the 30th International Workshop on Machine Learning for Signal Processing (MLSP), 2020, pp. 1-6.
- H. Takeuchi, K. Kashino, Y. Ohishi, and H. Saruwatari, "Harmonic Lowering for Accelerating Harmonic Convolution for Audio Signals," in Proc. the 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, pp. 185-189.
- M. Yasuda, Y. Ohishi, Y. Koizumi, and N. Harada, "Crossmodal Sound Retrieval based on Specific Target Co-occurrence Denoted with Weak Labels," in Proc. the 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, pp. 1446-1450.
- Y. Ohishi, A. Kimura, T. Kawanishi, K. Kashino, D. Harwath, and J. Glass, "Pair Expansion for Learning Multilingual Semantic Embeddings Using Disjoint Visually-Grounded Speech Audio Datasets," in Proc. the 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, pp. 1486-1490.
- T. Kaneko, H. Kameoka, K. Tanaka, and N. Hojo, "CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-Spectrogram Conversion," in Proc. the 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, pp. 2017-2021.
- W. Huang, T. Hayashi, Y. Wu, H. Kameoka, and T. Toda, "Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining," in Proc. the 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, pp. 4676-4680.
- D. Takeuchi, Y. Koizumi, Y. Ohishi, N. Harada, and K. Kashino, "Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning," in Proc. the fifth workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2020.
- Y. Fujiwara, A. Kumagai, S. Kanai, Y. Ida, and N. Ueda, "Efficient Algorithm for the b-Matching Graph," in Proc. the 26th Conference on Knowledge Discovery and Data Mining (ACM SIGKDD), 2020, pp. 187-197.
- O. Krishna, G. Irie, X. Wu, T. Kawanishi, and K. Kashino, "Adaptive Spotting: Deep Reinforcement Object Search in 3D Point Clouds," in Proc. the Asian Conference on Computer Vision (ACCV), 2020.
- G. Irie, D. Ikami, T. Kawanishi, and K. Kashino, "Cascaded Transposed Long-range Convolutions for Monocular Depth Estimation," in Proc. the Asian Conference on Computer Vision (ACCV), 2020.
- M. Nakano, A. Kimura, T. Yamada, and N. Ueda, "Baxter Permutation Process," in Proc. Advances in Neural Information Processing Systems (NeurIPS), vol. 33, 2020, pp. 8648-8659.
- X. Wu, A. Kimura, K. Kashino, and S. Uchida, "Total whitening for online signature verification based on deep representation," in Proc. International Conference on Pattern Recognition (ICPR), 2020, pp. 655-661.
- Y. Mitsuzumi, G. Irie, A. Kimura, and A. Nakazawa, "A Generative Self-Ensemble Approach To Simulated+Unsupervised Learning," in Proc. ICIP 2020 - 2020 IEEE International Conference on Image Processing (ICIP), 2020, pp. 2151-2155.

### 2019

#### Journal Papers

- S. Seki, H. Kameoka, L. Li, T. Toda, and K. Takeda, "Underdetermined source separation based on generalized multichannel variational autoencoder," IEEE Access, vol. 7, pp. 168104-168115, 2019.
- H. Kameoka, T. Kaneko, K. Tanaka, and N. Hojo, "ACVAE-VC: Non-parallel voice conversion with auxiliary classifier variational autoencoder," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 9, pp. 1432-1443, 2019.
- H. Kameoka, L. Li, S. Inoue, and S. Makino, "Supervised determined source separation with multichannel variational autoencoder," Neural Computation, vol. 31, pp. 1-24, 2019.
- C. Watanabe, K. Hiramatsu, and K. Kashino, "Understanding community structure in layered neural networks," Neurocomputing, vol. 367, pp. 84-102, 2019.

#### Peer-reviewed Conference Papers

- C. Watanabe, "Interpreting layered neural networks via hierarchical modular representation," in Proc. ICONIP 2019 - the 26th International Conference on Neural Information Processing, 2019, pp. 376-388.
- M. Yamaguchi, G. Irie, T. Kawanishi, and K. Kashino, "Subspace structure-aware spectral clustering for robust subspace clustering," in Proc. ICCV 2019 - 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 9874-9883.
- S. Ikawa and K. Kashino, "Neural audio captioning based on conditional sequence-to-sequence model," in Proc. DCASE 2019 - the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE), 2019, pp. 99-103.
- M. Tsuchida, H. Gunji, H. Nakajima, T. Kawanishi, K. Kashino, and A. Matsui, "Development of the stool color card for early detection of biliary atresia using multispectral image," in Proc. CIC 2019 - the 27th Color and Imaging Conference, 2019, pp. 304-307.
- X. Wu, A. Kimura, B. K. Iwana, S. Uchida, and K. Kashino, "Deep dynamic time warping: End-to-end local representation learning for online signature verification," in Proc. ICDAR 2019 - International Conference on Document Analysis and Recognition (ICDAR), 2019, pp. 1103-1110.
- K. Ueno, G. Irie, M. Nishiyama, and Y. Iwai, "Weakly supervised triplet learning of canonical plane transformation for joint object recognition and pose estimation," in Proc. ICIP 2019 - 2019 IEEE International Conference on Image Processing (ICIP), 2019, pp. 2476-2480.
- G. Irie, T. Kawanishi, and K. Kashino, "Robust learning for deep monocular depth estimation," in Proc. ICIP 2019 - 2019 IEEE International Conference on Image Processing (ICIP), 2019, pp. 964-968.
- N. Hojo and N. Miyazaki, "Evaluating intention communication by TTS using explicit definitions of illocutionary act performance," in Proc. Interspeech 2019 - the 20th Annual Conference of the International Speech Communication Association, 2019, pp. 1536-1540.
- T. Kaneko, H. Kameoka, K. Tanaka, and N. Hojo, "StarGAN-VC2: Rethinking conditional methods for StarGAN-based voice conversion," in Proc. Interspeech 2019 - the 20th Annual Conference of the International Speech Communication Association, 2019, pp. 679-683.
- S. Inoue, H. Kameoka, L. Li, S. Seki, and S. Makino, "Joint separation, dereverberation and classification of multiple sources using multichannel variational autoencoder with auxiliary classifier," in Proc. ICA 2019 - the 23rd International Congress on Acoustics (ICA), 2019, pp. 6988-6995.
- M. Yamaguchi, G. Irie, T. Kawanishi, and K. Kashino, "Delving deep into least square regression model for subspace clustering," in Proc. BMVC 2019 - the 30th British Machine Vision Conference, 2019, p. 118.
- S. Seki, H. Kameoka, L. Li, T. Toda, and K. Takeda, "Generalized multichannel variational autoencoder for underdetermined source separation," in Proc. EUSIPCO 2019 - the 27th European Signal Processing Conference (EUSIPCO), 2019, pp. 1-5.
- M. Tsuchida, H. Sato, S. Imamura, T. Kawanishi, K. Kashino, and K. Yano, "Giga-pixel multispectral imaging using commercially available digital camera," in Proc. CIDOC 2019 - 2019 International Committee for Documentation Annual Conference (CIDOC), 2019.
- M. Tsuchida, T. Kawanishi, and K. Kashino, "Virtual color restoration of decolored object using spectrally programmable light source," in Proc. CIDOC 2019 - 2019 International Committee for Documentation Annual Conference (CIDOC), 2019.
- M. Tsuchida, H. Sato, T. Kawanishi, K. Kashino, and K. Yano, "High resolution image retrieval, browsing and visual guide system for museum using smartphone," in Proc. CIDOC 2019 - 2019 International Committee for Documentation Annual Conference (CIDOC), 2019.
- T. Kaneko, H. Kameoka, K. Tanaka, and N. Hojo, "CycleGAN-VC2: Improved CycleGAN-based non-parallel voice conversion," in Proc. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 6820-6824.
- K. Tanaka, H. Kameoka, T. Kaneko, and N. Hojo, "ATTS2S-VC: Sequence-to-sequence voice conversion with attention and context preservation mechanisms," in Proc. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 6805-6809.
- G. Irie, M. Ostrek, H. Wang, H. Kameoka, A. Kimura, T. Kawanishi, and K. Kashino, "Seeing through sounds: Predicting visual semantic segmentation results from multichannel audio signals," in Proc. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 3961-3964.
- M. Yamaguchi, Y. Koizumi, and N. Harada, "AdaFlow: Domain-adaptive density estimator with application to anomaly detection and unpaired cross-domain translation," in Proc. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 3647-3651.
- X. Wu, A. Kimura, S. Uchida, and K. Kashino, "Prewarping siamese network: Learning local representations for online signature verification," in Proc. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 2467-2471.
- O. Krishna, G. Irie, X. Wu, T. Kawanishi, and K. Kashino, "Learning search path for region-level image matching," in Proc. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 1967-1971.
- L. Li, H. Kameoka, and S. Makino, "Fast MVAE: Joint separation and classification of mixed sources based on multichannel variational autoencoder with auxiliary classifier," in Proc. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 546-550.
- S. Inoue, H. Kameoka, L. Li, S. Seki, and S. Makino, "Joint separation and dereverberation of reverberant mixtures with multichannel variational autoencoder," in Proc. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 96-100.
- Y. Ida and Y. Fujiwara, "Network implosion: Effective model compression for ResNets via static layer pruning and retraining," in Proc. IJCNN 2019 - 2019 International Joint Conference on Neural Networks (IJCNN), 2019, pp. 1-8.
- A. Kumagai, T. Iwata, and Y. Fujiwara, "Transfer metric learning for unseen domains," in Proc. ICDM 2019 - 2019 International Conference on Data Mining (ICDM), 2019, pp. 2467-2477.
- Y. Fujiwara, Y. Ida, S. Kanai, A. Kumagai, J. Arai, and N. Ueda, "Fast random forest algorithm via incremental upper bound," in Proc. CIKM 2019 - 2019 ACM International Conference on Information and Knowledge Management (CIKM), 2019, pp. 2205-2208.
- A. Kumagai, T. Iwata, and Y. Fujiwara, "Transfer anomaly detection by inferring latent domain representations," in Proc. NeurIPS 2019 - 2019 Annual Conference on Neural Information Processing Systems (NeurIPS), 2019, pp. 2467-2477.
- Y. Ida, Y. Fujiwara, and H. Kashima, "Fast sparse group lasso," in Proc. NeurIPS 2019 - 2019 Annual Conference on Neural Information Processing Systems (NeurIPS), 2019, pp. 1700-1708.
- S. Kanai, Y. Ida, Y. Fujiwara, M. Yamada, and S. Adachi, "Absum: Simple regularization method for reducing structural sensitivity of convolutional neural networks," in Proc. AAAI 2020 - 2020 AAAI Conference on Artificial Intelligence, 2020.
- M. Eshghi, K. Tanaka, K. Kobayashi, H. Kameoka, and T. Toda, "An investigation of features for fundamental frequency pattern prediction in electrolaryngeal speech enhancement," in Proc. the 10th ISCA Speech Synthesis Workshop (SSW2019), 2019, pp. 251-256.
- D. Wang, H. Kameoka, and K. Shinoda, "A modified algorithm for multiple input spectrogram inversion," in Proc. Interspeech 2019 - the 20th Annual Conference of the International Speech Communication Association, 2019, pp. 4569-4573.

### 2018

#### Journal Papers

- H. Kameoka, T. Higuchi, M. Tanaka, and L. Li, "Nonnegative matrix factorization with basis clustering using cepstral distance regularization," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 6, pp. 1029-1040, Jun. 2018.
- C. Watanabe, K. Hiramatsu, and K. Kashino, "Modular Representation of Layered Neural Networks," Neural Networks, vol. 97, pp. 62-73, 2018.
- X. Wu, K. Hiramatsu, and K. Kashino, "Label propagation with ensemble of pairwise geometric relations: Towards robust large-scale retrieval of object instances," International Journal of Computer Vision, vol. 126, no. 7, pp. 689-713, 2018.

#### Peer-reviewed Conference Papers

- K. Tanaka, T. Kaneko, N. Hojo, and H. Kameoka, "Synthetic-to-natural speech waveform conversion using cycle-consistent adversarial networks," in Proc. SLT 2018 - IEEE Spoken Language Technology Workshop (SLT), 2018, pp. 632-639.
- H. Kameoka, T. Kaneko, K. Tanaka, and N. Hojo, "StarGAN-VC: Non-parallel many-to-many voice conversion using star generative adversarial networks," in Proc. SLT 2018 - IEEE Spoken Language Technology Workshop (SLT), 2018, pp. 266-273.
- M. Tsuchida, T. Kawanishi, and K. Kashino, "Color enhancement factors to control spectral power distribution of illumination," in Proc. SIGGRAPH Asia 2018 Posters, 2018.
- Y. Kodama, Y. Kawanishi, T. Hirayama, D. Deguchi, I. Ide, H. Murase, H. Nagano, and K. Kashino, "Localizing the gaze target of a crowd of people," Computer Vision - ACCV 2018 Workshops - 14th Asian Conference on Computer Vision, Perth, Australia, December 2-6, 2018, Revised Selected Papers, pp. 15-30, Springer LNCS 11367, June 2019.
- S. Ikawa and K. Kashino, "Acoustic event search with an onomatopoeic query: measuring distance between onomatopoeic words and sound," in Proc. DCASE 2018 - the Detection and Classification of Acoustic Scenes and Events (DCASE), 2018.
- X. Wu, G. Irie, K. Hiramatsu, and K. Kashino, "Weighted generalized mean pooling for deep image retrieval," in Proc. ICIP 2018 - the 25th IEEE International Conference on Image Processing (ICIP), 2018, pp. 495-499.
- M. Mori and M. Nakano, "Efficient cyclic learning rate schedules and their evaluations for neural network ensemble," in Proc. MLSP 2018 - the 28th IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2018, pp. 1-6.
- K. Oyamada, H. Kameoka, T. Kaneko, K. Tanaka, N. Hojo, and H. Ando, "Generative adversarial network-based approach to signal reconstruction from magnitude spectrogram," in Proc. EUSIPCO 2018 - the 26th European Signal Processing Conference (EUSIPCO), 2018, pp. 2514-2518.
- N. Hojo, H. Kameoka, K. Tanaka, and T. Kaneko, "Automatic speech pronunciation correction with dynamic frequency warping-based spectral conversion," in Proc. EUSIPCO 2018 - the 26th European Signal Processing Conference (EUSIPCO), 2018, pp. 2310-2314.
- T. Kaneko and H. Kameoka, "CycleGAN-VC: Non-parallel voice conversion using cycle-consistent adversarial networks," in Proc. EUSIPCO 2018 - the 26th European Signal Processing Conference (EUSIPCO), 2018, pp. 2100-2104.
- A. Kimura, Z. Ghahramani, K. Takeuchi, T. Iwata, and N. Ueda, "Few-shot learning of neural networks from scratch by pseudo example optimization," in Proc. BMVC 2018 - the British Machine Vision Conference (BMVC), 2018.
- K. Yano, M. Tsuchida, S. Imamura, and M. Yamaji, "WebGIS-based application for comparing rakuchu rakugai-zu folding screens," in Proc. DSDAH 2018 - the 1st KDD Workshop on Data Science for Digital Art History: tackling big data Challenges, Algorithms, and Systems (DSDAH), 2018.
- B. K. Iwana, M. Mori, A. Kimura, and S. Uchida, "Introducing local distance-based features to temporal convolutional neural networks," in Proc. ICFHR 2018 - the 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2018, pp. 92-97.
- T. Kaneko, K. Hiramatsu, and K. Kashino, "Generative adversarial image synthesis with decision tree latent controller," in Proc. CVPR 2018 - IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 6606-6615.
- K. Tanaka, H. Kameoka, and K. Morikawa, "VAE-SPACE: Deep generative model of voice fundamental frequency contours," in Proc. ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 5779-5783.
- L. Juvela, B. Bollepalli, X. Wang, H. Kameoka, M. Airaksinen, J. Yamagishi, and P. Alku, "Speech waveform synthesis from MFCC sequences with generative adversarial networks," in Proc. ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 5679-5683.
- R. Sato, H. Kameoka, and K. Kashino, "Statistical phrase/accent command estimation algorithm utilizing linguistic information," in Proc. ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 5569-5573.
- X. Wu, G. Irie, K. Hiramatsu, and K. Kashino, "Query expansion with diffusion on mutual rank graphs," in Proc. ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 1653-1657.
- S. Ikawa and K. Kashino, "Generating sound words from audio signals of acoustic events with sequence-to-sequence model," in Proc. ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 346-350.
- H. Kagami, H. Kameoka, and M. Yukawa, "Joint separation and dereverberation of reverberant mixtures with determined multichannel non-negative matrix factorization," in Proc. ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 31-35.
- L. Li and H. Kameoka, "Deep clustering with gated convolutional networks," in Proc. ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 16-20.
- Y. Mukuta, A. Kimura, D. Adrian, and Z. Ghahramani, "Weakly supervised collective feature learning from curated media," in Proc. AAAI 2018 - the 32nd AAAI Conference on Artificial Intelligence (AAAI), 2018, pp. 7260-7267.

### 2017

#### Journal Papers

- F. Wang, H. Nagano, K. Kashino, and T. Igarashi, "Visualizing Video Sounds with Sound Word Animation to Enrich User Experience," IEEE Transactions on Multimedia, vol. 19, no. 2, pp. 418-429, Feb. 2017.

#### Peer-reviewed Conference Papers

- P. L. Tobing, H. Kameoka, and T. Toda, "Deep acoustic-to-articulatory inversion mapping with latent trajectory modeling," in Proc. 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2017, pp. 1274-1277.
- K. Oyamada, H. Kameoka, T. Kaneko, H. Ando, K. Hiramatsu, and K. Kashino, "Non-native speech conversion with consistency-aware recursive network and generative adversarial network," in Proc. 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2017, pp. 182-188.
- C. Watanabe, K. Hiramatsu, and K. Kashino, "Modular representation of autoencoder networks," in Proc. SSCI - IEEE Symposium Series on Computational Intelligence (SSCI), 2017, pp. 1-8.
- C. Watanabe, K. Hiramatsu, and K. Kashino, "Recursive extraction of modular structure from layered neural networks using variational Bayes method," in Proc. Discovery Science, 2017, pp. 207-222.
- L. Li, H. Kameoka, and S. Makino, "Mel-generalized cepstral regularization for discriminative non-negative matrix factorization," in Proc. MLSP 2017 - IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), 2017, pp. 1-6.
- S. Seki, H. Kameoka, T. Toda, and K. Takeda, "Missing component restoration for masked speech signals based on time-domain spectrogram factorization," in Proc. MLSP 2017 - IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), 2017, pp. 1-6.
- X. Wu, X. Liu, K. Hiramatsu, and K. Kashino, "Contrast-accumulated histogram equalization for image enhancement," in Proc. ICIP 2017 - 2017 IEEE International Conference on Image Processing (ICIP), 2017, pp. 3190-3194.
- T. Kaneko, S. Takaki, H. Kameoka, and J. Yamagishi, "Generative adversarial network-based postfilter for STFT spectrograms," in Proc. Interspeech 2017 - the 18th Annual Conference of the International Speech Communication Association, 2017, pp. 3389-3393.
- L. Li, H. Kameoka, T. Toda, and S. Makino, "Speech enhancement using non-negative spectrogram models with Mel-generalized cepstral regularization," in Proc. Interspeech 2017 - the 18th Annual Conference of the International Speech Communication Association, 2017, pp. 1998-2002.
- S. Takaki, H. Kameoka, and J. Yamagishi, "Direct modeling of frequency spectra and waveform generation based on phase recovery for DNN-based speech synthesis," in Proc. Interspeech 2017 - the 18th Annual Conference of the International Speech Communication Association, 2017, pp. 1128-1132.
- T. Kaneko, H. Kameoka, K. Hiramatsu, and K. Kashino, "Sequence-to-sequence voice conversion with similarity metric learned using generative adversarial networks," in Proc. Interspeech 2017 - the 18th Annual Conference of the International Speech Communication Association, 2017, pp. 1283-1287.
- K. Tanaka, H. Kameoka, T. Toda, and S. Nakamura, "Physically constrained statistical F0 prediction for electrolaryngeal speech enhancement," in Proc. Interspeech 2017 - the 18th Annual Conference of the International Speech Communication Association, 2017, pp. 1069-1073.
- M. Murata, K. Hiramatsu, and S. Satoh, "Information retrieval model using generalized Pareto distribution and its application to instance search," in Proc. SIGIR 2017 - the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017, pp. 1117-1120.
- M. Nakano, "Infinite number place," in Proc. the 11th conference on Bayesian Nonparametrics, 2017, p. 46.
- A. Kimura, I. Takahashi, M. Tanaka, N. Yasuda, N. Ueda, and N. Yoshida, "Single-epoch supernova classification with deep convolutional neural networks," in Proc. ICDCSW 2017 - IEEE 37th International Conference on Distributed Computing Systems Workshops (ICDCSW), 2017, pp. 354-359.
- R. Sato, H. Kameoka, and K. Kashino, "Fast algorithm for statistical phrase/accent command estimation based on generative model incorporating spectral features," in Proc. ICASSP 2017 - 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 5595-5599.
- Y. Tajiri, H. Kameoka, T. Toda, and S. Nakamura, "A noise suppression method for body-conducted soft speech based on non-negative tensor factorization of air- and body- conducted signals," in Proc. ICASSP 2017 - 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 4960-4964.
- T. Kaneko, H. Kameoka, N. Hojo, Y. Ijima, K. Hiramatsu, and K. Kashino, "Generative adversarial network-based postfilter for statistical parametric speech synthesis," in Proc. ICASSP 2017 - 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 4910-4914.
- X. Wu, T. Kawanishi, M. Mori, K. Hiramatsu, and K. Kashino, "Edited film alignment via selective Hough transform and accurate template matching," in Proc. ICASSP 2017 - 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 1707-1711.
- X. Liu, T. Kawanishi, X. Wu, K. Hiramatsu, and K. Kashino, "Deep salience map guided arbitrary direction scene text recognition," in Proc. ICASSP 2017 - 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 1642-1646.
- H. Kagami, H. Kameoka, and M. Yukawa, "A majorization-minimization algorithm with projected gradient updates for time-domain spectrogram factorization," in Proc. ICASSP 2017 - 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 561-565.
- H. Kameoka, H. Kagami, and M. Yukawa, "Complex NMF with the generalized Kullback-Leibler divergence," in Proc. ICASSP 2017 - 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 56-60.
- M. Tsuchida, K. Yano, K. Hiramatsu, and K. Kashino, "Visualizing lost designs in degraded early modern tapestry using infra-red image," in Proc. the 6th Computational Color Imaging Workshop (CCIW'17), Springer LNCS vol. 10213, 2017, pp. 144-149.
- L. Li, H. Kameoka, and S. Makino, "Discriminative non-negative matrix factorization with majorization-minimization," in Proc. HSCMA 2017 - the 5th Joint Workshop on Hands-free Speech Communication and Microphone Arrays, 2017, pp. 141-145.
- Y. Hirose, A. Kimura, and H. Fujishiro, "Cleansing, organizing and training: Two guidelines for generating attractive news headlines for social media," Computation + Journalism Symposium (C+J2017), 2017.
- T. Kaneko, K. Hiramatsu, and K. Kashino, "Generative attribute controller with conditional filtered generative adversarial networks," in Proc. CVPR 2017 - the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6089-6098.

### 2016

#### Journal Papers

- Masaya Murata, Hidehisa Nagano, Kaoru Hiramatsu, Kunio Kashino and Shin'ichi Satoh, "Bayesian Exponential Inverse Document Frequency and Region-of-Interest Effect for Enhancing Instance Search Accuracy", IEICE Transactions on Information and Systems, vol. E99-D, no. 9, pp. 2320-2331, Sep. 2016.
- Daichi Kitamura, Nobutaka Ono, Hiroshi Sawada, Hirokazu Kameoka, Hiroshi Saruwatari, “Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 9, pp. 1626-1641, Sep. 2016.
- K. O’Hanlon, H. Nagano, N. Keriven and M. D. Plumbley, “Non-Negative Group Sparsity with Subspace Note Modeling for Polyphonic Transcription,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 3, pp. 530-542, Mar. 2016.
- Xiaomeng Wu, Jun Shimamura, Taiga Yoshida, Hidehisa Nagano, Kunio Kashino, Takahito Kawanishi, Kaoru Hiramatsu, Takayuki Koizumi, Tetsuya Kinebuchi, “Spatial Verification via Pairwise Geometric Constraints and 3D View-Directional Voting,” ITE Transactions on Media Technology and Applications, 2016.

#### Book Chapter, Tutorial Papers

- Hirokazu Kameoka, "Non-negative matrix factorization and its variants for audio signal processing," in Applied Matrix and Tensor Variate Data Analysis, T. Sakata (Ed.), Springer Japan, Feb. 2016.
- Takehiro Moriya, Ryosuke Sugiura, Yutaka Kamamoto, Hirokazu Kameoka and Noboru Harada, "Progress in LPC-based frequency-domain audio coding," APSIPA Transactions on Signal and Information Processing, 2016.

#### Invited Talks

- Hirokazu Kameoka, Hideaki Kagami, "Complex non-negative matrix factorization: Phase-aware sparse representation of audio spectrograms," 5th Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan, 29th Nov. 2016

#### Peer-reviewed Conference Papers

- Xinhao Liu, Takahito Kawanishi, Xiaomeng Wu, Kunio Kashino: Scene text recognition with CNN classifier and WFST-based word labeling. International Conference on Pattern Recognition (ICPR) 2016
- Aki Hayashi, Hirokazu Kameoka, Tatsushi Matsubayashi, Hiroshi Sawada, "Non-negative periodic component analysis for music source separation," in Proc. Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2016 (APSIPA ASC 2016), Dec. 2016.
- Nobutaka Ono, Kazuaki Shibata, Hirokazu Kameoka, "Self-localization and channel synchronization of smartphone arrays using sound emissions," in Proc. Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2016 (APSIPA ASC 2016), Dec. 2016.
- Masaru Tsuchida, Kaoru Hiramatsu, and Kunio Kashino, “Designing Spectral Power Distribution of Illumination with Color Chart to Enhance Color Saturation”, 24th Color and Imaging conference (CIC24), pp. 278-282, Nov., 2016
- Shuya Ito, Koichi Ito, Takafumi Aoki, and Masaru Tsuchida, "A 3D Reconstruction Method with Color Reproduction from Multi-band and Multi-view Images," ACCV 2016 Workshop (e-heritage), Springer LNCS vol. 10117, pp. 236-247, Nov., 2016
- Takuhiro Kaneko, Kaoru Hiramatsu, Kunio Kashino, "Adaptive Visual Feedback Generation for Facial Expression Improvement with Multi-task Deep Neural Networks", The 24th ACM International Conference on Multimedia (ACMMM), Amsterdam, The Netherlands, October 2016.
- Kota Nagayama, Akisato Kimura, Hiroyuki Fujishiro "Make it go viral - Generating attractive headlines for distributing news articles on social media," Proc. Computation + Journalism Symposium (C+J2016), Stanford, CA, USA, September-October 2016.
- Li Li, Hirokazu Kameoka, Takuya Higuchi, Hiroshi Saruwatari, "Semi-supervised joint enhancement of spectral and cepstral sequences of noisy speech," in Proc. The 17th Annual Conference of the International Speech Communication Association (Interspeech 2016), pp. 3753-3757, Sep. 2016.
- Patrick Lumban Tobing, Tomoki Toda, Hirokazu Kameoka, Satoshi Nakamura, "Acoustic-to-articulatory inversion mapping based on latent trajectory Gaussian mixture model," in Proc. The 17th Annual Conference of the International Speech Communication Association (Interspeech 2016), pp. 953-957, Sep. 2016.
- Lauri Juvela, Hirokazu Kameoka, Manu Airaksinen, Junichi Yamagishi, Paavo Alku, "Majorisation-minimisation based optimisation of the composite autoregressive system with application to glottal inverse filtering," in Proc. The 17th Annual Conference of the International Speech Communication Association (Interspeech 2016), pp. 968-972, Sep. 2016.
- Naoki Murata, Hirokazu Kameoka, Keisuke Kinoshita, Shoko Araki, Tomohiro Nakatani, Shoichi Koyama, Hiroshi Saruwatari, "Reverberation-robust underdetermined source separation with non-negative tensor double deconvolution," in Proc. 2016 24th European Signal Processing Conference (EUSIPCO 2016), pp. 1648-1652, Aug. 2016.
- Masaya Murata, Hidehisa Nagano, Kaoru Hiramatsu and Kunio Kashino, "Filter Design based on Multiple Model Estimation", The 2016 American Control Conference (ACC 2016), pp. 7061-7066, Jul. 2016.
- Kou Tanaka, Hirokazu Kameoka, Tomoki Toda, Satoshi Nakamura, “Statistical F0 prediction for electrolaryngeal speech enhancement considering generative process of F0 contours within product of experts framework,” in Proc. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2016), Mar. 2016.
- Tomohiko Nakamura, Hirokazu Kameoka, “Shifted non-negative matrix factorization with source-filter model for monaural audio source separation,” in Proc. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2016), Mar. 2016.
- Naoki Murata, Shoichi Koyama, Hirokazu Kameoka, Norihiro Takamune, Hiroshi Saruwatari, “Sparse sound field decomposition with multichannel extension of complex NMF,” in Proc. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2016), Mar. 2016
- Katsuhiko Ishiguro, Issei Sato, Naonori Ueda, Masahiro Nakano, Akisato Kimura "Infinite plaid models for infinite bi-clustering," Proc. International AAAI Conference on Artificial Intelligence (AAAI2016), February 2016.
- Xinhao Liu, Takahito Kawanishi, Xiaomeng Wu, Kunio Kashino, “Scene Text recognition with high performance CNN classifier and efficient word inference,” 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2016), Mar. 2016

### 2015

#### Journal Papers

- Xiaomeng Wu, Kunio Kashino: Interest point selection by topology coherence for multi-query image retrieval. Multimedia Tools and Applications 74(17): 7147-7180 (2015)
- Xiaomeng Wu, Kunio Kashino, “Second-Order Configuration of Local Features for Geometrically Stable Image Matching and Retrieval,” IEEE Transactions on Circuits and Systems for Video Technology, Vol.25, no.8, pp.1395-1408, 2015.

#### Book Chapter, Tutorial Papers

- Hirokazu Kameoka, “Non-negative matrix factorization and its variants for audio signal processing,” in Applied Matrix and Tensor Variate Data Analysis, T.Sakata (Ed.), Springer Japan, 2015.

#### Invited Talks

- Akisato Kimura "Computation models of human visual attention driven by auditory cues," International Symposium on Brainware LSI, Sendai, Miyagi, Japan, March 2015.

#### Peer-reviewed Conference Papers

- M. Murata, H. Nagano, and K. Kashino, “Gaussian Unscented Filter,” The 54th IEEE Conference on Decision and Control (CDC 2015), pp. 4338-4343, Dec. 2015.
- X. Wu, T. Yoshida, J. Shimamura, H. Nagano, K. Kashino, T. Kawanishi, K. Hiramatsu, T. Kurozumi and T. Kinebuchi, “NTT at TRECVID 2015: Instance Search,” Proc. of TRECVID 2015, Nov. 2015.
- M. Tsuchida, M. Mori, K. Kashino, and J. Yamato, “Reproduction of Reflective and Fluorescent Components using Eight-band Imaging,” 23rd Color and Imaging Conference (CIC23), pp.52-57, Oct. 2015.
- Xiaomeng Wu, M. Mori, K. Kashino, “Data-driven taxonomy forest for fine-grained image categorization,” Multimedia and Expo (ICME), 2015 IEEE International Conference on, pp.1-6, 2015
- Xiaomeng Wu, Kunio Kashino, “Adaptive Dither Voting for Robust Spatial Verification,” IEEE International Conference on Computer Vision (ICCV), pp.1877-1885, Dec, 2015
- Xiaomeng Wu, Kunio Kashino, “Robust Spatial Matching as Ensemble of Weak Geometric Relations,” British Machine Vision Conference, British Machine Vision Association, pp.25.1-25.12, Sep. 2015.
- Takuya Higuchi, Hirokazu Kameoka, “Unified approach for audio source separation with multichannel factorial HMM and DOA mixture model,” EUSIPCO, EURASIP, pp.2043-2047, Nice, Aug. 2015.
- Minoru Mori, Xiaomeng Wu, Kunio Kashino, “Trademark Image Retrieval Using Inverse Total Feature Frequency and Multiple Detectors,” 16th International Conference on Computer Analysis of Images and Patterns (CAIP), pp.778-789, Sep. 2015.
- Hirokazu Kameoka, “Modeling speech parameter sequences with latent trajectory hidden Markov model,” in Proc. The 25th IEEE International Workshop on Machine Learning for Signal Processing (MLSP2015), pp.1-6, Sep. 2015.
- Jiro Nakajima, Akisato Kimura, Akihiro Sugimoto, Kunio Kashino "Visual attention driven by auditory cues: Selecting visual features in synchronization with attracting auditory events," Proc. International Conference on Multimedia Modeling (MMM2015), Vol.2, pp.74-86, Sydney, Australia, January 2015.
- Daichi Kitamura, Nobutaka Ono, Hiroshi Sawada, Hirokazu Kameoka, Hiroshi Saruwatari, “Relaxation of rank-1 spatial constraint in overdetermined blind source separation,” The 2015 European Signal Processing Conference (EUSIPCO 2015), pp.1261-1262, Aug. 2015.
- Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi "Distributed forests for MapReduce-based machine learning," Proc. IAPR Asian Conference on Pattern Recognition (ACPR 2015), Kuala Lumpur, Malaysia, November 2015.
- Sawa Kourogi, Akisato Kimura, Hiroyuki Fujishiro, Hitoshi Nishikawa "Identifying attractive headlines for social media," Proc. ACM International Conference on Information and Knowledge Management (CIKM2015), pp.1859-1862, Melbourne, Australia, October 2015.
- Jun Fujiki, Masaru Tanaka, Hitoshi Sakano, Akisato Kimura "Geometric interpretation of Fisher's linear discriminant analysis through communication theory," Proc. IAPR International Conference on Machine Vision and Applications (MVA2015), pp.333-336, Tokyo, Japan, May 2015.

#### Other Conference Papers

- M. Murata, H. Nagano, K. Hiramatsu, and K. Kashino, “Current Issues of Particle Filtering and Some Algorithmic Improvements,” The 58th Japan Joint Automatic Control Conference (JACC), 2015. (Invited)
- Masaya Murata, Kaoru Hiramatsu, and Kunio Kashino, “Current Issues of Particle Filtering and Some Algorithmic Improvements,” The 47th ISCIE International Symposium on Stochastic Systems Theory and Its Applications (SSS’15), Dec. 2015.

### 2014

#### Journal Papers

- A.Kimura, K.Duh, T.Hirao, K.Ishiguro, T.Iwata, C.M.Au Yeung, "Creating stories from socially curated microblog messages," IEICE Transactions on Information and Systems, Vol.E97-D, No.6, June 2014.
- M.Mori, S.Uchida, H.Sakano,"Global Feature for Online Character Recognition," Pattern Recognition Letters, vol.35, no.1, pp.142-148, 2014.
- X.Wu, K.Kashino,"Interest Point Selection by Topology Coherence for Multi-Query Image Retrieval.", Multimedia Tools and Applications

#### Book Chapter, Tutorial Papers

- A.Kimura, "Large-scale cross-media analysis and mining from socially curated contents." Progress in Informatics, Mar.2014

#### Peer-reviewed Conference Papers

- Kento Kadowaki, Tatsuma Ishihara, Nobukatsu Hojo, Hirokazu Kameoka, "Speech prosody generation for text-to-speech synthesis based on generative model of F0 contours," in Proc. The 15th Annual Conference of the International Speech Communication Association (Interspeech 2014), Sep. 2014. (to appear)
- Takuya Higuchi, Hirofumi Takeda, Tomohiko Nakamura, Hirokazu Kameoka, "A unified approach for underdetermined blind signal separation and source activity detection by multichannel factorial hidden Markov models," in Proc. The 15th Annual Conference of the International Speech Communication Association (Interspeech 2014), Sep. 2014. (to appear)
- Ryosuke Sugiura, Yutaka Kamamoto, Noboru Harada, Hirokazu Kameoka, Takehiro Moriya, "Direct linear conversion of LSP parameters for perceptual control in speech and audio coding," in Proc. The 2014 European Signal Processing Conference (EUSIPCO 2014), Sep. 2014. (to appear)
- Ryosuke Sugiura, Yutaka Kamamoto, Noboru Harada, Hirokazu Kameoka, Takehiro Moriya, "Representation of spectral envelope with warped frequency resolution for audio coder," in Proc. The 2014 European Signal Processing Conference (EUSIPCO 2014), Sep. 2014. (to appear)
- Tomohiko Nakamura, Hirokazu Kameoka, "Fast signal reconstruction from magnitude spectrogram of continuous wavelet transform based on spectrogram consistency," Accepted for publication in Proc. the 17th International Conference on Digital Audio Effects (DAFx-14), Sep. 2014.
- R.Ogata, M.Mori, V. Frinken, S.Uchida, "Constrained AdaBoost for Totally-Ordered Global Features," ICFHR2014, Sept.
- X.Wu, K.Kashino,"Image Retrieval Based on Anisotropic Scaling and Shearing Invariant Geometric Coherence.", ICPR, Aug.2014
- M.Mori, H.Kimiyama, M.Ogawara,"Search-Based Content Analysis System on Online Collaborative Platform for Film Production.", ICPR2014, Aug.2014
- M.Mori, T.Kurozumi, H.Nagano, K.Kashino,"Video Content Detection with Single Frame Level Accuracy Using Dynamic Thresholding Technique.", ICPR2014, Aug.2014
- M. Tsuchida, K. Kashino, and J. Yamato, "Experimental Evaluation of Chromostereopsis with Varying Center Wavelength and FWHM of Spectral Power Distribution", ICISP2014 (MCS2014), LNCS 8509, Jun 2014.
- M.Nakano, K.Ishiguro, A.Kimura, T.Yamada, N.Ueda "Rectangular tiling process," to appear, International Conference on Machine Learning (ICML2014), June 2014.
- X.Wu, K.Kashino,"Image Retrieval Based on Spatial Context with Relaxed Gabriel Graph Pyramid.", ICASSP2014, May.2014
- Y.Ohishi, D.Mochihashi, H.Kameoka, K.Kashino,"Mixture of Gaussian Process experts for predicting sung melodic contour with expressive dynamic fluctuations.", ICASSP2014, May.2014
- Takuya Higuchi, Norihiro Takamune, Tomohiko Nakamura, Hirokazu Kameoka, "Underdetermined blind separation and tracking of moving sources based on DOA-HMM," in Proc. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2014), pp. 3215-3219, May 2014.

### 2013

#### Journal Papers

- M.Tsuchida, S.Sakai, M.Miura, K.Ito, T.Kawanishi, K.Kashino, J.Yamato, T.Aoki,"Stereo One-shot Six-band Camera System for Accurate Color Reproduction.",Journal of Electronic Imaging, Vol.22, Issue03, Jul.2013
- A.Kimura, M.Sugiyama, H.Kameoka, H.Sakano "Designing various component analysis at will via generalized pairwise expression," IPSJ Transactions on Mathematical Modeling and its Applications, Vol.6, No.1, pp.128-135, March 2013.
- A.Kimura, M.Sugiyama, T.Nakano, H.Kameoka, H.Sakano, E.Maeda, K.Ishiguro "SemiCCA: Efficient semi-supervised learning of canonical correlations," IPSJ Transactions on Mathematical Modeling and its Applications, Vol.6, No.1, pp.136-145, March 2013.
- A.Kimura, R.Yonetani, T.Hirayama "Computational models of human visual attention and their implementations: A survey," IEICE Transactions on Information and Systems, Vol.E96-D, No.3, pp.562-578, March 2013.

#### Invited Talks

- M.Tsuchida, "High-resolution and Multiband Image-capturing System.",International Meeting on Information Display, Aug.2013
- M.Tsuchida, S.Sakai, K.Ito, R.Mukai, K.Kashino, J.Yamato, T.Aoki, "A stereo six-band motion picture capturing using 4K digital cinema camera.", SIGGRAPH DCAJ Session, Jul.2013
- A. Kimura "Social curation as corpora for large-scale multimedia content analysis," ACM International Conference on Multimedia Retrieval (ICMR2013), April 2013.

#### Peer-reviewed Conference Papers

- K.Takeuchi, R.Tomioka, K.Ishiguro, A.Kimura, H.Sawada "Non-negative multiple tensor factorization," Proc. IEEE International Conference on Data Mining (ICDM2013), pp.1199-1204, December 2013.
- M.Murata, H.Nagano, K.Kashino, S.Sato,"NTT Communication Science Laboratories and National Institute of Informatics at TRECVID 2013 Instance Search Task."TRECVID Workshop 2013, Nov.2013
- M.Tsuchida, K.Kashino, J.Yamato,"An eleven-band stereoscopic camera system for accurate color and spectral reproduction.",Color and Imaging Conference, Nov.2013
- A.Kimura, K.Ishiguro, A.Marcos Alvarez, K.Kataoka, K.Murasaki, M.Yamada "Image context discovery from socially curated contents" Proc. ACM International Conference on Multimedia (ACMMM2013), pp.565-568, October 2013.
- A.M.Alvarez, M.Yamada, A.Kimura,"Exploiting socially-generated side information to improve dimensionality reduction.", Int. Work. Socially-Aware Multimedia, Oct.2013
- A.M.Alvarez, M.Yamada, A.Kimura, T.Iwata,"Clustering-Based Anomaly Detection in Multi-View Data.", CIKM2013, Oct.2013
- M.Tsuchida, A.Takayanagi, W.Wakita, K.Kashino, J.Yamato, H.Tanaka,"Digital Archiving of Tapestries of Kyoto Gion Festival using a High-definition and Multispectral Image Capturing System."The International Conference on Culture and Computing. Sep.2013
- K.Takeuchi, K.Ishiguro, A.Kimura, H.Sawada "Non-negative multiple matrix factorization," Proc. International Joint Conference on Artificial Intelligence (IJCAI2013), pp.1713-1720, August 2013.
- M.Yamada, A.Kimura, F.Naya, H.Sawada, "Change-point detection with feature selection in high-dimensional time-series data," Proc. International Joint Conference on Artificial Intelligence (IJCAI2013), pp.1827-1833, August 2013.
- H.Kameoka, K.Yoshizato, T.Ishihara, Y.Ohishi, K.Kashino, S.Sagayama,"Generative modeling of speech F0 contours.", 2013Interspeech, Aug.2013
- T.Ishihara, H.Kameoka, K.Yoshizato, D.Saito, S.Sagayama,"Probabilistic speech F0 contour model incorporating statistical vocabulary model of phrase-accent command sequence.", 2013Interspeech, Aug.2013
- N.Hojo, K.Yoshizato, H.Kameoka, D.Saito, S.Sagayama, "Text-to-speech synthesizer based on combination of composite wavelet and hidden Markov models.", 8th ISCA Speech Synthesis Workshop, Aug.2013
- M.Tsuchida, T.Kawanishi, R.Mukai, K.Kashino, J.Yamato,"Extension of Dynamic Range of Camera System based on Multi-band image Capturing.",12th International AIC Congress, Jul.2013
- M.Tsuchida, A.Takayanagi, Y.Sakaguchi, R.Mukai, K.Kashino, J.Yamato, H.Tanaka, "Estimation of Spectral Reflectance from Six-band Images based on Partial Least-squares Regression." 12th International AIC Congress, Jul.2013
- K.O'Hanlon, H.Nagano, M.D.Plumbley,"Using Oracle Analysis for Decomposition-Based Automatic Music Transcription.", LNCS (CMMR 2012, Revised Selected Papers), Jun 2013.
- M.Murata, K.Kashino,"Normalized Unscented Kalman Filter and Normalized Unscented RTS Smoother for Nonlinear State-Space Model Identification.", 2013 American Control Conference, Jun.2013
- T.Higuchi, N.Takamune, T.Nakamura, H.Kameoka,"Underdetermined blind separation and tracking of moving sources based on DOA-HMM.", ICASSP2013, May.2013
- Y.Ohishi, D.Mochihashi, T.Matsui, M.Nakano, H.Kameoka, T.Izumitani, K.Kashino,"Bayesian Semi-supervised Audio Event Diarization Based on Markov Indian Buffet Process.", ICASSP2013, May.2013
- K.O'Hanlon, H.Nagano, M.D.Plumbley, "Structured Sparsity for Automatic Music Transcription.", ICASSP2012, Mar.2013

### 2012

#### Journal Papers

- T.Endrjukaite, N.Kosugi,"Music Visualization Technique of Repetitive Structure Representation to Support Intuitive Estimation of Music Affinity and Lightness.", Journal of Mobile Multimedia, 8(1):49-71 (2012), Apr.2012

#### Invited Talks

- M.Tsuchida, T.Kawanishi, K.Kashino, J.Yamato, "A stereo nine-band camera for accurate color and spectrum reproduction," SIGGRAPH DCAJ Session, Aug.2012

#### Peer-reviewed Conference Papers

- M.Murata, H.Nagano, K.Kashino,"Robustifying Kalman Filter to Rapidly Adapt Significant Changes in System Model Parameters of State-Space Models.", The 52th IEEE Conference on Decision and Control, Dec.2012
- T.Endrjukaite, N.Kosugi,"Time-dependent genre recognition by means of Instantaneous Frequency Spectrum based on Hilbert-Huang Transform.", The 14th iiWAS, Dec.2012
- M.Murata, T.Izumitani, H.Nagano, R.Mukai, K.Kashino, S.Sato,"NTT Communication Science Laboratories and National Institute of Informatics at TRECVID 2012 Instance Search and Multimedia Event Detection Tasks.", TRECVID2012, Nov.2012
- M.Tsuchida, S.Sakai, M.Miura, K.Ito, T.Kawanishi, K.Kashino, J.Yamato, T.Aoki, "A six-band stereoscopic video camera system for accurate color reproduction.", Color and Imaging Conference, Nov.2012
- W.Wakita, M.Tsuchida, S.Tanaka, T.Kawanishi, K.Kashino, J.Yamato, H.Tanaka, "High-resolution and Multi-spectral Capturing for Digital Archiving of Large 3D Woven Cultural Artifacts.", The 2nd ACCV Workshop on e-Heritage 2012, Nov.2012
- H.Kameoka, K.Ochiai, M.Nakano, M.Tsuchiya, S.Sagayama,"CONTEXT-FREE 2D TREE STRUCTURE MODEL OF MUSICAL NOTES FOR BAYESIAN MODELING OF POLYPHONIC SPECTROGRAMS.", The 13th International Society for Music Information Retrieval Conference (ISMIR), Oct.2012
- K.Yoshizato, H.Kameoka, D.Saito, S.Sagayama,"Hidden Markov convolutive mixture model for pitch contour analysis of speech.", 13th Annual Conference of the International Speech Communication Association(Interspeech2012), Sep.2012
- H.Kameoka, M.Sato, T.Ono, N.Ono, S.Sagayama,"BLIND SEPARATION OF INFINITELY MANY SPARSE SOURCES.", International Workshop on Acoustic Signal Enhancement 2012 (IWAENC), Sep.2012
- Y.Ohishi, H.Kameoka, D.Mochihashi, K.Kashino,"A Stochastic Model of Singing Voice F0 Contours for Characterizing Expressive Dynamic Components.",Interspeech2012, Sep.2012
- M.Mori, S.Uchida, H.Sakano,"Dynamic Programming Matching with Global Features for Online Character Recognition.", International Conference on Frontiers in Handwriting Recognition(ICFHR2012), Sep.2012
- N.Kosugi, M.Kondo,"Community Site for Music Therapists Based on the Records.", The 15th International Conference on Network-Based, Sep.2012
- W.Wakita, M.Tsuchida, S.Tanaka, T.Kawanishi, K.Kashino, J.Yamato, H.Tanaka, "High-definition and Multispectral Capturing for Digital Archiving of Large 3D Woven Cultural Artifacts.", SIGGRAPH '12 ACM SIGGRAPH 2012 Posters, Aug.2012
- T.Murayama, D.Peter,"Rate Distortion Codes for the Collective Estimation from Independent Noisy Observations.", ISIT2012, Jul.2012
- D.Mikami, S.Kimura, K.Kadota, M.Kashino, K.Kashino,"INTER-TRIAL DIFFERENCE ANALYSIS THROUGH APPEARANCE-BASED MOTION TRACKING.", International society of biomechanics in sports, Jul.2012
- K.O'Hanlon, H.Nagano, M.D.Plumbley, "Structured Sparsity for Automatic Music Transcription.", ICASSP2012, Mar 2012.
- K.O'Hanlon, M.D.Plumbley, H.Nagano,"Group Non-negative Basis Pursuit for Automatic Music Transcription.", MML2012, Jun.2012
- K.Yoshizato, H.Kameoka, D.Saito, S.Sagayama,"Statistical approach to Fujisaki-model parameter estimation from speech signals and its quantitative evaluation.", in Proc. Speech Prosody 2012, May.2012
- M.Nakano, Y.Ohishi, H.Kameoka, R.Mukai, K.Kashino,"Bayesian nonparametric music parser.", ICASSP2012, Mar.2012
- K.Ochiai, H.Kameoka, S.Sagayama,"EXPLICIT BEAT STRUCTURE MODELING FOR NON-NEGATIVE MATRIX FACTORIZATION-BASED MULTIPITCH ANALYSIS.", ICASSP2012 Mar.2012
- H.Tachibana, H.Kameoka, N.Ono, S.Sagayama,"COMPARATIVE EVALUATIONS OF VARIOUS HARMONIC/PERCUSSIVE SOUND SEPARATION ALGORITHMS BASED ON ANISOTROPIC CONTINUITY OF SPECTROGRAM.", ICASSP2012, Mar.2012
- H.Kameoka, H.Nakano, K.Ochiai, Y.Imoto, K.Kashino, S.Sagayama,"Constrained and Regularized Variants of Non-negative Matrix Factorization Incorporating Music-Specific Constraints.", ICASSP2012, Mar.2012
- M.Mori, S.Uchida, H.Sakano,"How Is the Importance of Global Structure for Characters?", DAS2012, Mar.2012
- D.Mikami, K.Otsuka, S.Kumano, J.Yamato,"Enhancing Memory-based Particle Filter with Detection-based Memory Acquisition for Robustness under Severe Occlusion.", VISAPP2012, Feb.2012

#### Other Conference Papers

- K.Kashino,"Detection and the Use of Similarity and Dissimilarity", DMASM 2012, Feb.2012

### 2011

#### Journal Papers

- D.Mikami, K.Otsuka, S.Kumano, J.Yamato,"Enhancing Memory-based Particle Filter with Detection-based Memory Acquisition for Robustness under Severe Occlusion.", Trans. IEICE. Jpn., Vol.E95-D No.11 pp.2693-2703, Nov.2011
- R.Mukai, T.Kurozumi, T.Kawanishi, H.Nagano, K.Kashino,"Robust Media Search Technology for Content-Based Audio and Video Identification."IEEE COMSOC MMTC E-Letter, Vol.6, No.1, Jan.2011

#### Book Chapter, Tutorial Papers

- M.Mori,"Recent Advances in Document Recognition and Understanding."Recent Advances in Document Recognition and Unders, InTech, Oct.2011
- K.Otsuka,"Conversation Scene Analysis.", IEEE Signal Processing Magazine, 28(4):127-131, 2011

#### Peer-reviewed Conference Papers

- A.Gumulia, B.Puzon, N.Kosugi,"Music Visualization: predicting the perceived speed of a composition.", ACM Multimedia, Dec.2011
- B.Puzon, N.Kosugi,"Extracting and Visualizing the Repetitive Structure of Music in Acoustic Data -- Misual Project --.", iiWAS2011, Dec.2011
- T.Kawanishi, K.Kashino, Y.Q.Sun, S.Sato, D.D.Le, C.Zhu,"NTT Communication Science Laboratories and NII at TRECVID 2011 Instance Search Task.", TRECVID2011, Dec.2011
- R.Mukai, T.Kurozumi, T.Kawanishi, H.Nagano, K.Kashino,"NTT Communication Science Laboratories at TRECVID 2011 Content-Based Copy Detection."TRECVID2011, Dec.2011
- M.Tsuchida, S.Sakai, K.Ito, T.Kawanishi, K.Kashino, J.Yamato, T.Aoki,"Evaluating Color Reproduction Accuracy of Stereo One-shot Six-band Camera System.", Color and Imaging Conference, Nov.2011
- K.Takeda, H.Kameoka, H.Sawada, S.Araki, T.Yamada, S.Makino,"UNDERDETERMINED BSS WITH MULTICHANNEL COMPLEX NMF ASSUMING W-DISJOINT ORTHOGONALITY OF SOURCES.", TENCON2011, Nov.2011
- M.Nakano, J.Le Roux, H.Kameoka, T.Nakamura, N.Ono, S.Sagayama,"Bayesian Nonparametric Spectrogram Modeling Based on Infinite Factorial Infinite Hidden Markov Model.", WASPAA2011, Oct.2011
- K.Kashino,"Content Identifies Itself - Production/Use Management for Moving Pictures with Robust Media Search Technology -.",CineGrid@TIFF, Oct.2011
- K.Kashino,"Large-scale audio and video analysis and identification.", MLSP2011. Sep.2011
- K.Otsuka,"Multimodal Conversation Scene Analysis for Understanding People’s Communicative Behaviors in Face-to-face Meetings.", HCI International 2011, Jul.2011
- M.Nakano, J.L.Roux, H.Kameoka, N.Ono, S.Sagayama,"INFINITE-STATE SPECTRUM MODEL FOR MUSIC SIGNAL ANALYSIS.", ICASSP, May 2011
- J.Takagi, Y.Ohishi, A.Kimura, M.Sugiyama, M.Yamada, H.Kameoka,"AUTOMATIC AUDIO TAG CLASSIFICATION VIA SEMI-SUPERVISED CANONICAL DENSITY ESTIMATION",ICASSP, May.2011
- N.Yasuraoka, H.Kameoka, T.Yoshioka, H.Okuno,"I-DIVERGENCE-BASED DEREVERBERATION METHOD WITH AUXILIARY FUNCTION APPROACH.", ICASSP2011, May.2011
- T.Nakano, A.Kimura, H.Kameoka, S.Miyabe, S.Sagayama, N.Ono, K.Kashino, T.Nishimoto,"AUTOMATIC VIDEO ANNOTATION VIA HIERARCHICAL TOPIC TRAJECTORY MODEL CONSIDERING CROSS-MODAL CORRELATIONS.",ICASSP2011, May.2011
- S.Kumano, K.Otsuka, D.Mikami, J.Yamato,"Analyzing Empathetic Interactions based on the Probabilistic Modeling of the Co-occurrence Patterns of Facial Expressions in Group Meetings.",FG (Automatic Face and Gesture Recognition) ,Mar.2011
- K.Hiramatsu, R.Mukai, K.Kashino,"NTT's new Media Retrieval System."2011 HPA Tech Retreat, Feb.2011
- R.Mukai, K.Hiramatsu,"Demonstration of RMS Technology.", DMASM 2011, Feb.2011
- K.Kashino,"Research and Development of Robust Media Search Technology.", DMASM2011 , Feb.2011

### 2010

#### Journal Papers

- K.Akamine, K.Fukuchi, A.Kimura, S.Takagi,"Fully automatic extraction of salient objects from videos in near real-time.", The Computer Journal.
- U.Watchareeruetai, A.Kimura, C.Bao, T.Kawanishi, K.Kashino,"Interest point detection based on stochastically derived stability.", IPSJ Transactions

#### Book Chapter, Tutorial Papers

- A.Kimura, H.Kameoka, K.Kashino,"Media Scene Learning: A novel framework for automatically extracting meaningful parts from audio and video signals.",NTT Technical Review,Vol. 8 No. 11 Nov. 2010
- M.Tsuchida, T.Kawanishi, J.Yamato,"High-resolution multiband imaging for accurate color reproduction.",NTT-Technical Review,Vol. 8 No. 11 Nov. 2010
- M.Mori,"Character Recognition", Sciyo『Character Recognition 』 Sep.2010

#### Peer-reviewed Conference Papers

- M.Mori, K.Kashino,"Fast Template Matching Based on Normalized Cross Correlation Using Adaptive Block Partitioning and Initial Threshold Estimation.", ISM (International Symposium on Multimedia), Dec.2010
- T.Maekawa, A.Kimura, H.Sakano "Wearable sensor device for automatic recording of hand drawings," Asian Conference on Computer Vision (ACCV2010), November 2010.
- S.Gorga, K.Otsuka,"Conversation Scene Analysis based on Dynamic Bayesian Network and Image-based Gaze Detection.", ICMI-MLMI201, Nov.2010
- N.Kosugi,"Misual music visualization based on acoustic data.", iiWAS 2010, Nov.2010
- T.Kawanishi, A.Kimura, K.Kashino, S.Sato, D.L.Duy, X.Wu, S.Poullot,"NTT Communication Science Laboratories and NII at TRECVID 2010 Instance Search Task.",TRECVID 2010, Nov.2010
- R.Mukai, T.Kurozumi, K.Hiramatsu, T.Kawanishi, H.Nagano, K.Kashino,"NTT Communication Science Laboratories at TRECVID 2010 Content-Based Copy Detection.", TRECVID2010, Nov.2010
- T.Nakano, A.Kimura, H.Kameoka, S.Miyabe, S.Sagayama, N.Ono, K.Kashino, T.Nishimoto,"NTT-UT TRECVID2010 Semantic Indexing and Known-Item Search.",TRECVID Workshop, Nov.2010
- H.Kameoka, T.Yoshioka, M.Hamamura, J.Le Roux, K.Kashino,"Statistical Model of Speech Signals Based on Composite Autoregressive System with Application to Blind Source Separation.",LVA/ICA 2010, Sep.2010
- K.Kashino,"Robust Media Search in the Cloud.", 13th German-Japanese Symposium(GJS2010), Sep.2010
- J.Le Roux, H.Kameoka, N.Ono, S.Sagayama,"Fast signal reconstruction from magnitude STFT spectrogram based on spectrogram consistency."Internat. Conf. on Digital Audio Effects (DAFx),Sep 2010
- J.Le Roux, E.Vincent, Y.Mizuno, H.Kameoka, N.Ono, S.Sagayama,"Consistent Wiener filtering: generalized time-frequency masking respecting spectrogram consistency.",LVA/ICA, Sep.2010
- M.Nakano, J.L.Roux, H.Kameoka, Y.Kitano, N.Ono, S.Sagayama,"Nonnegative Matrix Factorization with Markov-chained Bases for Modeling Time-varying patterns in Music Spectrograms."LVA/ICA, Sep.2010
- Y.Ohishi, H.Kameoka, D.Mochihashi, H.Nagano, K.Kashino,"Statistical Modeling of F0 Dynamics in Singing Voices Based on Second-order Linear System."Interspeech 2010, Sep.2010
- H.Kameoka, J.Le Roux, Y.Ohishi,"A Statistical Model of Speech F0 Contours.", 2010SAPA, Sep.2010
- A.Kimura, H.Kameoka, M.Sugiyama, T.Nakano, E.Maeda, H.Sakano, K.Ishiguro "SemiCCA: Efficient semi-supervised learning of canonical correlations," Proc. IAPR International Conference on Pattern Recognition (ICPR 2010), pp. 2933-2936, August 2010.
- M.Nakano, H.Kameoka, J.Le Roux, Y.Kitano, N.Ono, S.Sagayama,"CONVERGENCE-GUARANTEED MULTIPLICATIVE ALGORITHMS FOR NONNEGATIVE MATRIX FACTORIZATION WITH BETA-DIVERGENCE.", MLSP2010, Aug.2010
- M.Tsuchida, T.Kawanishi, K.Ito, J.Yamato, T.Aoki,"Development of stereo-type one-shot multi-band camera system for accurate color reproduction.", ACM SIGGRAPH, Jul.2010
- S.Kuzuoka, A.Kimura, T.Uyematsu "Universal source coding for multiple decoders with side information," Proc. of International Symposium on Information Theory (ISIT2010), pp. 1-5, June 2010.
- M.Tsuchida, T.Kawanishi, J.Yamato, "Capturing and browsing Technology for Multiband Large Pixel-Number Pictures --- Giga-Pixels Accurate Colour Image Capturing System and Interactive Image Viewing Software ---", asiagraph2010, Jun.2010
- H.Tanaka, K.Yano, K.Hachimura, T.Nishiura, W.Choi, T.Fukumori, K.Furukawa, W.Wakita, M.Tsuchida, N.Saiwaki,"Digital Archiving of the World Intangible Cultural Heritage 'Gion Festival in Kyoto': Reproduction of 'Fune-boko' Float of the Gion Festival Parade in 'Virtual Kyoto'", asiagraph2010, Jun.2010

### 2009

#### Journal Papers

- S. Kumano, K. Otsuka, J. Yamato, E. Maeda, and Y. Sato, “Pose-Invariant Facial Expression Recognition Using Variable-Intensity Templates”, International Journal of Computer Vision, Vol. 83, Issue 2, pp. 178--194, Jun. 2009.
- A. Kimura, T. Uyematsu, S. Kuzuoka and S. Watanabe, "Universal source coding over generalized complementary delivery networks," IEEE Transactions on Information Theory, Vol.55, No.3, pp. 1360--1373, Mar. 2009.

#### Book Chapter, Tutorial Papers

- K. Otsuka and S. Araki, “Audio-Visual Technology for Conversation Scene Analysis”, NTT Technical Review, Vol. 7, No. 2, Feb. 2009

#### Peer-reviewed Conference Papers

- Kumano, Otsuka, Mikami, Yamato, "Recognizing Communicative Facial Expressions for Discovering Interpersonal Emotions in Group Meetings," ICMI, Nov 2009
- Miyazato, Kimura, Takagi, Yamato, "Real-time estimation of human visual attention with MCMC-based particle filter," ICME, Jul 2009
- Fukuchi, Miyazato, Kimura, Takagi, Yamato, "Saliency-based video segmentation with graph cuts and sequentially-updated priors," ICME, Jul 2009
- Kimura, Kashino, Fukuchi, Miyazato, Akamine, Takagi, "Towards cognitive developmental approach to visual scene understanding: Framework and core technologies," IEEE International Workshop on Computer Vision for Humanoid Robots, Sep 2009
- Mikami, Otsuka, Yamato, "Memory-based particle filter for face pose tracking under complex dynamics," CVPR, Jun 2009
- Kameoka, Nakatani, Yoshioka, "Robust speech dereverberation based on non-negativity and sparse nature of speech spectrograms" ICASSP, Apr 2009
- Kameoka, Ono, Kashino, Sagayama, "Complex NMF: A new sparse representation for acoustic signals" ICASSP, Apr 2009
- Kameoka, Kashino, "Composite autoregressive system for sparse source-filter representation of speech" ISCAS, May 2009
- T. Yoshioka, H. Kameoka, T. Nakatani, and H. G. Okuno, "Statistical Models for Speech Dereverberation," to appear in Proc. 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2009), Oct. 2009.
- T. Kako, Y. Ohishi, H. Kameoka, K. Kashino and K. Takeda , "Automatic Identification for Singing Style Based on Sung Melodic Contour Characterized in Phase Plane," to appear in Proc. International Conference on Music Information Retrieval, (ISMIR 2009)
- S. Kumano, K. Otsuka, D. Mikami and J. Yamato, "Recognizing Communicative Facial Expressions for Discovering Interpersonal Emotions in Group Meetings", to appear in Proc. International Conference on Multimodal Interfaces (ICMI), Sep. 2009.
- K. Ishizuka, S. Araki, K. Otsuka, T. Nakatani, M. Fujimoto, “A Speaker Diarization based on the Probabilistic Fusion of Audio-Visual Location Information”, to appear in Proc. ACM ICMI-MLMI2009
- Y. Minami, H. Kameoka, "Switching Acausal Filters for Speech Modeling," to appear in Proc. 2009 IEEE International Workshop on Machine Learning for Signal Processing (Formerly the IEEE Workshop on Neural Networks for Signal Processing)
- K. Fukuchi, K. Miyazato, A. Kimura, S. Takagi and J. Yamato "Saliency-based video segmentation with graph cuts and sequentially-updated priors," in Proc. International Conference on Multimedia and Expo (ICME2009), pp. 638--641, New York, New York, USA, Jun.-Jul. 2009.
- K. Miyazato, A. Kimura, S. Takagi and J. Yamato "Real-time estimation of human visual attention with MCMC-based particle filter," in Proc. International Conference on Multimedia and Expo (ICME2009), pp. 250--257, New York, New York, USA, Jun.-Jul. 2009.
- D. Mikami, K. Otsuka, and J. Yamato, “Memory-based particle filter for face pose tracking robust under complex dynamics”, in Proc. IEEE Conference on Computer Vision and Pattern Recognition 2009 (CVPR2009), Oral Presentation (acceptance rate=4.1%)
- H. Kameoka, K. Kashino , "Composite Autoregressive System for Sparse Source-Filter Representation of Speech," in Proc. 2009 IEEE International Symposium on Circuits and Systems (ISCAS2009), pp. 2477--2480, 2009.
- H. Kameoka, N. Ono, K. Kashino , S. Sagayama, "Complex NMF: A New Sparse Representation for Acoustic Signals," in Proc. 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2009), pp. 3437--3440, 2009.
- H. Kameoka, T. Nakatani, T. Yoshioka, "Robust Speech Dereverberation Based on Non-negativity and Sparse Nature of Speech Spectrograms," in Proc. 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2009), pp. 45--48, 2009.

#### Other Conference Papers

- J. Le Roux, H. Kameoka, E. Vincent, N. Ono, K. Kashino and S. Sagayama, "Complex NMF under spectrogram consistency constraints," in Proc. ASJ Autumn Meeting, 2-4-5, Sep. 2009.
- S. Kuzuoka, A. Kimura and T. Uyematsu, "Universal source coding for multiple decoders with side information," to appear in Proc. Shannon Theory Workshop (STW2009, domestic), Matsuyama, Ehime, Japan, Sep. 2009.
- J. Le Roux, H. Kameoka, N. Ono and S. Sagayama, "Spectrogram consistency and its application to phase reconstruction," in Proc. IPSJ SIGMUS Summer Workshop, 2009-MUS-81-8, Jul. 2009.

### 2008

#### Journal Papers

- O. Lozano and K. Otsuka, “Real-time visual tracker by Stream processing ---Simultaneous and fast 3D tracking of multiple faces in video sequences by using a particle filter ---,” Journal of VLSI Signal Processing Systems (Freely downloadable from http://www.springerlink.com/content/pk22n1632859082k/ )
- S. Saito, H. Kameoka, K. Takahashi, T. Nishimoto, S. Sagayama, "Specmurt Analysis of Polyphonic Music Signals," IEEE Transactions on Audio, Speech and Language Processing, Vol. 16, No. 3, pp. 639--650, 2008.
- A. Kimura, K. Kashino , T. Kurozumi and H. Murase, "A quick search method for audio signals based on piecewise linear representation of feature trajectories," IEEE Transactions on Audio, Speech and Language Processing, Vol.16, No.2, pp. 396--407, Feb. 2008.

#### Peer-reviewed Conference Papers

- Kimura, Pang, Takeuchi, Yamato, Kashino, "Dynamic Markov random fields for stochastic modeling of visual attention," ICPR, Dec 2008
- A. Kimura, D. Pang, T. Takeuchi, J. Yamato and K. Kashino "Dynamic Markov random fields for stochastic modeling of visual attention," in Proc. International Conference on Pattern Recognition (ICPR2008), Mo.BT8.35, Tampa, Florida, USA, Dec. 2008.
- M. Mori, M. Sawaki, J. Yamato, “Robust character recognition using adaptive feature extraction,” 23rd International Conference Image and Vision Computing New Zealand, Christchurch, NZ, Nov. 2008.
- K. Otsuka, S. Araki, K. Ishizuka, M. Fujimoto, M. Heinrich, and J. Yamato, "A Realtime Multimodal System for Analyzing Group Meetings by Combining Face Pose Tracking and Speaker Diarization", in Proc. ACM 10th Int. Conf. Multimodal Interfaces (ICMI2008) , pp. 257--264 , 2008
- J. Le Roux, H. Kameoka, N. Ono, A. de Cheveigne, S. Sagayama, "Computational Auditory Induction by Missing-Data Non-Negative Matrix Factorization," in Proc. SAPA 2008 Workshop on Statistical and Perceptual Audition (SAPA 2008), in CD-ROM, Sep. 2008.
- Y. Ohishi, H. Kameoka, K. Kashino , K. Takeda , "Parameter Estimation Method of F0 Control Model for Singing Voices," in Proc. Interspeech2008 International Conference on Spoken Language Processing (ICSLP2008), pp. 139--142, Sep. 2008.
- N. Ono, K. Miyamoto, H. Kameoka, S. Sagayama, "A Real-time Equalizer of Harmonic and Percussive Components in Music Signals," in Proc. Ninth International Conference on Music Information Retrieval (ISMIR2008), pp. 139--144, Sep. 2008.
- T. Izumitani and K. Kashino, "A Robust Musical Audio Search Method Based on Diagonal Dynamic Programming Matching of Self-Similarity Matrices," Ninth International Conference on Music Information Retrieval (ISMIR2008), pp. 609--613, Sep. 2008.
- K. Otsuka and J. Yamato, “Fast and Robust Face Tracking for Analyzing Multiparty Face-to-Face Meetings”, 5th Joint Workshop on Machine Learning and Multimodal Interaction (MLMI2008) , Lecture Notes in Computer Science, Vol. 5237, pp. 14--25, 2008.
- S. Kumano , K. Otsuka, J. Yamato, E. Maeda and Y. Sato, "Combining Stochastic and Deterministic Search for Pose-Invariant Facial Expression Recognition", in Proc. British Machine Vision Conference (BMVC), 2008.
- N. Ono, K. Miyamoto, J. Le Roux, H. Kameoka, S. Sagayama, "Separation of a Monaural Audio Signal into Harmonic/Percussive Components by Complementary Diffusion on Spectrogram," in Proc. 2008 16th European Signal Processing Conference (EUSIPCO 2008), in CD-ROM, Aug. 2008.
- S. Kuzuoka, A. Kimura and T. Uyematsu "Universal coding for lossy complementary delivery problem," in Proc. International Symposium on Information Theory (ISIT2008), pp. 2177--2181, Toronto, Canada, Jul. 2008.
- D. Pang, A. Kimura, T. Takeuchi, J. Yamato and K. Kashino "A stochastic model of selective visual attention with a dynamic Bayesian network," in Proc. International Conference on Multimedia and Expo (ICME2008), pp. 1073--1076, Hannover, Germany, Jun. 2008.
- T. Izumitani, R. Mukai, and K. Kashino, "A Background Music Detection Method Based on Robust Feature Extraction," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2008), pp. 13--16, Apr. 2008.
- H. Kameoka, N. Ono, S. Sagayama, "Auxiliary Function Approach to Parameter Estimation of Constrained Sinusoidal Model for Monaural Speech Separation," in Proc. 2008 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2008), pp. 29--32, Mar. 2008.
- J. Le Roux, H. Kameoka, N. Ono, S. Sagayama, A. de Cheveigne, "Modulation Analysis of Speech through Orthogonal FIR Filterbank Optimization," in Proc. 2008 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2008), pp. 4189--4192, Mar. 2008.
- K. Miyamoto, H. Kameoka, T. Nishimoto, N. Ono, S. Sagayama, "Harmonic-Temporal-Timbral Clustering (HTTC) for the Analysis of Multi-instrument Polyphonic Music Signals," in Proc. 2008 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2008), pp. 113--116, Mar. 2008.
- O. Lozano and K. Otsuka, “Simultaneous and Fast 3D Tracking of Multiple Faces in Video by GPU-based Stream Processing”, in Proc. IEEE ICASSP2008(The 33rd International Conference on Acoustics, Speech, and Signal Processing), pp. 713--716, 2008 http://www.springerlink.com/content/pk22n1632859082k/

#### Other Conference Papers

- A. Kimura, D. Pang, T. Takeuchi, J. Yamato and K. Kashino, "Dynamic Markov random fields for stochastic modeling of visual attention," IEICE Technical Report (domestic), PRMU2008-117 (MVE2008-66), Toyonaka, Osaka, Japan, Nov. 2008.
- A. Kimura, "Particle-based simulation of the Gel'fand-Pinsker channel capacity and the Wyner-Ziv rate-distortion function," in Proc. Symposium on Information Theory and its Applications (SITA2008, domestic), pp. 200--203, Kinugawa, Tochigi, Japan, Oct. 2008.
- D. Pang, A. Kimura, T. Takeuchi, J. Yamato and K. Kashino, "A stochastic model of selective visual attention with a dynamic Bayesian network," in Proc. Meeting on Image Recognition and Understanding (MIRU2008, domestic), pp. 1500--1505, Karuizawa, Nagano, Japan, Jul. 2008. (Best Interactive Session Award)

### 2007

#### Journal Papers

- A. Kimura, T. Uyematsu and S. Kuzuoka, "Universal coding for correlated sources with complementary delivery," IEICE Transactions on Fundamentals, Vol.E90-A, No.9, pp. 1840--1847, Sep. 2007. Published online in IEICE Transactions Online.
- J. Le Roux, H. Kameoka, N. Ono, A. de Cheveigne and S. Sagayama, "Single and Multiple Pitch Contour Estimation through Parametric Spectrogram Modeling of Speech in Noisy Environments," IEEE Transactions on Audio, Speech and Language Processing, Vol. 15, No. 4, pp. 1135--1145, May. 2007.
- H. Kameoka, T. Nishimoto, S. Sagayama, "A Multipitch Analyzer Based on Harmonic Temporal Structured Clustering," IEEE Transactions on Audio, Speech and Language Processing, Vol. 15, No. 3, pp. 982--994, Mar. 2007.

#### Peer-reviewed Conference Papers

- K. Otsuka, H. Sawada, and J. Yamato, "Automatic Inference of Cross-modal Nonverbal Interactions in Multiparty Conversations", in Proc. ACM 9th Int. Conf. Multimodal Interfaces (ICMI2007), pp. 255--262, Nov. 2007. (Outstanding Paper Award)
- S. Kumano, K. Otsuka, J. Yamato, E. Maeda, and Y. Sato, "Pose-Invariant Facial Expression Recognition Using Variable-Intensity Templates," in Proc. 8th Asian Conference on Computer Vision (ACCV2007), Part I, LNCS Vol. 4843, pp. 324--334, 2007. (Honorable Mention)
- J. Le Roux, H. Kameoka, N. Ono, A. de Cheveigne, S. Sagayama, "Single Channel Speech and Background Segregation through Harmonic-Temporal Clustering," in Proc. 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA2007), pp. 279--282, Oct. 2007.
- T. Izumitani and K. Kashino, "A Musical Audio Search Method Based on Self-Similarity Features", International Conference on Multimedia & Expo (ICME2007), Jul. 2007.
- C. Leung, A. Kimura, T. Takeuchi and K. Kashino "A computational model of saliency depletion/recovery phenomena for the salient region extraction of videos," in Proc. International Conference on Multimedia and Expo (ICME2007), pp. 300--303, Beijing, China, Jul. 2007.
- A. Kimura, T. Uyematsu and S. Kuzuoka, "Universal coding for correlated sources with complementary delivery," in Proc. International Symposium on Information Theory (ISIT2007), pp. 1756--1760, Nice, France, Jun. 2007.
- K. Kashino , A. Kimura, H. Nagano, and T. Kurozumi : "Robust Search Methods for Music Signals Based on Simple Representation", in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol.IV, pp. 1421--1424 (Apr. 2007).
- K. Miyamoto, H. Kameoka, H. Takeda, T. Nishimoto, S. Sagayama, "Probabilistic Approach to Automatic Music Transcription from Audio Signals," in Proc. 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2007), Vol. 2, pp. 697--700, Apr. 2007.

#### Other Conference Papers

- A. Kimura, T. Uyematsu and S. Kuzuoka, "Universal coding for correlated sources over generalized complementary delivery networks," in Proc. Symposium on Information Theory and its Applications (SITA2007, domestic).
- S. Kuzuoka, A. Kimura and T. Uyematsu, "Simple coding schemes for lossless and lossy complementary delivery problems," in Proc. Shannon Theory Workshop (STW2007, domestic), pp. 43--50, Izu, Shizuoka, Japan, Sep. 2007.
- C. Leung, A. Kimura, T. Takeuchi and K. Kashino "A computational model of saliency depletion/recovery phenomena for the salient region extraction of videos," in Proc. Meeting on Image Recognition and Understanding (MIRU2007, domestic), pp. 582--587, Hiroshima, Japan, Jul. 2007.

### 2006

#### Journal Papers

- Y. Takemae, K. Otsuka, J. Yamato, and S. Ozawa, "The Subjective Evaluation Experiments of an Automatic Video Editing System Using Vision-based Head Tracking for Multiparty Conversations," IEEJ Trans. Electronics, Information and Systems, Vol. 126-C, No. 4, pp. 435--442, Apr. 2006.

#### Book Chapter, Tutorial Papers

- K. Kashino : "Auditory Scene Analysis in Music Signals", A. Klapuri and M. Davy (eds.): Signal Processing Methods for Music Transcription, pp. 299--325 (May. 2006).

#### Peer-reviewed Conference Papers

- A. Kimura and T. Uyematsu, "Multiterminal source coding with complementary delivery," in Proc. International Symposium on Information Theory and Its Applications (ISITA2006), pp. 189--194, Seoul, South Korea, Oct. 2006.
- T. Izumitani and K. Kashino, "Frequency Component Restoration for Music Sounds Using Local Probabilistic Models with Maximum Entropy Learning", ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition (SAPA 2006), pp. 12--17, Sep. 2006.
- C. Chen, T. Kurozumi, and J. Yamato, "Poster Image Matching by Color Scheme and Layout Information", in Proc. ICME2006, Jul. 2006.
- K. Otsuka, J. Yamato, Y. Takemae, and H. Murase, "Conversation Scene Analysis with Dynamic Bayesian Network Based on Visual Head Tracking," in Proc. ICME'06, Jul. 2006.
- T. Izumitani and K. Kashino, "Frequency Component Restoration for Music Sounds Using a Markov Random Field and Maximum Entropy Learning", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006), pp. V257--V260, May. 2006.
- K. Otsuka, J. Yamato, Y. Takemae, and H. Murase, "Quantifying Interpersonal Influence in Face-to-face Conversations based on Visual Attention Patterns," in Proc. ACM CHI Extended Abstract, pp. 1175--1180, Apr. 2006.

#### Other Conference Papers

- A. Kimura, T. Uyematsu and S. Kuzuoka, "Universal source coding for complementary delivery," in Proc. Symposium on Information Theory and its Applications (SITA2006, domestic), Vol.2, pp. 803--806, Hakodate, Hokkaido, Japan, Nov.-Dec. 2006.
- A. Kimura and T. Uyematsu, "Information-theoretical analysis of index searching: Revised," in Proc. Symposium on Information Theory and its Applications (SITA2006, domestic), Vol.1, pp. 73--76, Hakodate, Hokkaido, Japan, Nov.-Dec. 2006.
- A. Kimura and T. Uyematsu, "Multiterminal source coding for cascading and feedback refinement systems," in Proc. Shannon Theory Workshop (STW2006, domestic), pp. 25--31, Kinosaki, Hyogo, Japan, Sep. 2006.
- A. Kimura and T. Uyematsu, "Multiterminal source coding with complementary delivering," IEICE Technical Report, IT2006-8, pp. 7--12, Nara, Japan, May. 2006, Presented at 2006 Hawaii, IEICE and SITA Joint Conference on Information Theory.

### 2005

#### Journal Papers

- M. Mori, M. Sawaki, N. Hagita, “Video text recognition using category-dependent feature extraction based on feature compensation,” Systems and Computers in Japan, Vol. 36, Issue 10, pp. 1--8, Sep. 2005.

#### Peer-reviewed Conference Papers

- T. Kawanishi, M. Tsuchida, S. Takagi, A. Kimura and J. Yamato, "Small cylindrical display using aspherical mirror for anthropomorphic agents", in Proc. International Display Workshop / Asia Display (IDW/AD'05), pp. 1755--1758, Takamatsu, Kagawa, Japan, Dec. 2005.
- K. Otsuka, Y. Takemae, J. Yamato, and H. Murase, “A Probabilistic Inference of Multiparty-Conversation Structure Based on Markov-Switching Models of Gaze Patterns, Head Directions, and Utterances,” in Proc. ACM Int. Conf Multimodal Interfaces (ICMI)'05, pp. 191--198, Oct. 2005.
- Y. Takemae, K. Otsuka, J. Yamato: Effects of Automatic Video Editing System Using Stereo-Based Head Tracking for Archiving Meetings, IEEE International Conference on Multimedia & Expo (IEEE/ICME 2005).
- K. Otsuka, Y. Takemae, J. Yamato, and H. Murase, “Probabilistic Inference of Gaze Patterns and Structure of Multiparty Conversations from Head Directions and Utterances,” in Proc. 1st. International Workshop on Conversational Informatics, pp. 7--12, 2005.
- Y. Takemae, K. Otsuka, and J. Yamato, “Development of Automatic Video Editing System Based on Stereo-Based Head Tracking for Archiving Meetings,” The Third International Conference on Active Media Technology (AMT2005), p.269, 2005.
- Y. Takemae, K. Otsuka, and J. Yamato, “Automatic Video Editing System Using Stereo-Based Head Tracking for Multiparty Conversation,” ACM Conference on Human Factors in Computing Systems (ACM/CHI2005), pp. 1817--1820, 2005.

### 2004

#### Journal Papers

- M. Mori, M. Sawaki, N. Hagita, H. Murase, N. Mukawa, “Robust feature extraction method based on run-length compensation for degraded character recognition,” Systems and Computers in Japan, Vol. 35, Issue 9, pp. 1--9, Aug. 2004.

#### Peer-reviewed Conference Papers

- Y. Takemae, K. Otsuka, N. Mukawa, “An Analysis of Speakers' Gaze Behavior for Automatic Addressee Identification in Multiparty Conversation and Its Application to Video Editing,” in Proc. IEEE International Workshop on Robot and Human Interactive Communication (IEEE/RO-MAN 2004), pp. 581--586, 2004.
- T. Kawanishi, T. Kurozumi, K. Kashino and S. Takagi, "A Fast Template Matching Algorithm with Adaptive Skipping Using Inner-Subtemplates' Distances", in Proc. ICPR2004, Aug. 2004.
- K. Kashino , A. Kimura, and T. Kurozumi : "A quick video search method based on local and global feature clustering", in Proc. International Conference on Pattern Recognition (ICPR) (Aug. 2004).
- K. Otsuka and N. Mukawa, “A Particle Filter for Tracking Densely Populated Objects Based on Explicit Multiview Occlusion Analysis,” in Proc. ICPR2004(International Conference on Pattern Recognition), Volume. 4, pp. 23--26, Aug. 2004.
- A. Kimura, T. Kawanishi and K. Kashino , "Acceleration of similarity-based partial image retrieval using multistage vector quantization," in Proc. International Conference on Pattern Recognition (ICPR2004), Vol.2, pp. 993--996, Cambridge, United Kingdom, Aug. 2004.
- K. Otsuka and N. Mukawa, “Multiview Occlusion Analysis for Tracking Densely Populated Objects Based on 2-D Visual Angles,” in Proc. CVPR2004(IEEE Conference on Computer Vision and Pattern Recognition), Volume 1, pp. 90--97,Jun. 2004.
- A. Kimura, T. Kawanishi and K. Kashino , "Similarity-based partial image retrieval guaranteeing same accuracy as exhaustive matching," in Proc. International Conference on Multimedia and Expo (ICME2004), Vol. 3, pp. 1895--1898, Taipei, Taiwan, Jun. 2004.
- K. Kashino and Simon Godsill: "Bayesian estimation of simultaneous musical notes based on frequency domain modelling", in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP) (May. 2004).
- Y. Takemae, K. Otsuka, and N. Mukawa, “Impact of Video Editing Based on Participants’ Gaze in Multiparty Conversation,” CHI2004(ACM Conference on Human Factors in Computing Systems), pp. 1333--1336 , 2004.

### 2003

#### Journal Papers

- K. Kashino , T. Kurozumi , and H. Murase: "A Quick Search Method for Audio and Video Signals Based on Histogram Pruning", IEEE Transactions on Multimedia, vol.5, no.3, pp. 348--357 (Sep. 2003). (IEEE Transactions on Multimedia Paper Award)
- K. Kashino , T. Kurozumi , and H. Murase: "Learning-based Active Search Library Enables Instantaneous Information Retrieval for Broadcast Commercials and Music", NTT REVIEW, vol.15, no.2, pp. 38--41 (Mar. 2003).

#### Peer-reviewed Conference Papers

- T. Kawanishi, T. Kurozumi, S. Takagi and K. Kashino, "Skipping Template Matching Guaranteeing Same Accuracy with Exhaustive Search", in Proc. ICAPR2003, pp. 209--212, Dec. 2003.
- M. Mori, “Video text recognition using feature compensation as category-dependent feature extraction,” 7th International Conference on Document Analysis and Recognition, pp. 645--649, Edinburgh, Scotland, Aug. 2003.
- A. Kimura, K. Kashino , T. Kurozumi and H. Murase, "Dynamic-segmentation-based feature dimension reduction for quick audio/video searching," in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP2003), Vol.3, pp. 357--360, Hong Kong, Apr. 2003 (cancelled). in Proc. International Conference on Multimedia and Expo (ICME2003), Vol.2, pp. 389--392, Baltimore, Maryland, USA, Jul. 2003.
- H. Nagano, K. Kashino , and H. Murase: "A Fast Search Algorithm for Background Music Signals Based on the Search for Numerous Small Signal Components", in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), presented at ICME2003, vol.5, pp. 796--799 (Apr. 2003).
- Y. Takemae, K. Otsuka, and N. Mukawa, "Video Cut Editing Rule Based on Participants' Gaze in Multiparty Conversation," ACM Multimedia 2003, pp. 303--306, 2003.

### 2002

#### Peer-reviewed Conference Papers

- T. Kurozumi, K. Kashino and H. Murase, "A Robust Audio Searching Method for Cellular-Phone-Based Music Information Retrieval", in Proc. ICPR2002, Vol. 3, pp. 991--994, Aug. 2002.
- H. Nagano, K. Kashino , and H. Murase: "Fast Music Retrieval Using Polyphonic Binary Feature Vectors", IEEE International Conference on Multimedia and Expo (ICME), vol.1, pp. 101--104 (Aug. 2002).
- M. Mori, M. Sawaki, N. Hagita, “Category-dependent Feature Extraction for recognition of degraded handwritten characters,” 16th International Conference on Pattern Recognition, vol.3, pp. 155--159, Quebec, Canada, Aug. 2002.
- A. Kimura, K. Kashino , T. Kurozumi and H. Murase, "A quick search method for multimedia signals using feature compression based on piecewise linear maps," in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP2002), Vol.4, pp. 3656--3659, Orlando, Florida, USA, May. 2002.

### 2001

#### Book Chapter, Tutorial Papers

- K. Kashino , H. Murase: "A Sound Source Identification Method for Music Performances Using Auditory Stream Extraction", NTT REVIEW, vol.13, no.2, pp. 40--47 (Feb. 2001).

#### Peer-reviewed Conference Papers

- T. Kurozumi, K. Kashino and H. Murase, "A Method for Robust and Quick Video Searching Using Probabilistic Dither-voting", in Proc. ICIP2001, Vol. 2, pp. 653--656, Oct. 2001.
- M. Mori, M. Sawaki, N. Hagita, H. Murase, N. Mukawa, “Robust Feature Extraction Based on Run-length Compensation for Degraded Handwritten Character Recognition,” Sixth International Conference on Document Analysis and Recognition, pp. 650--654, Seattle, Washington, Sep. 2001.
- A. Kimura, K. Kashino , T. Kurozumi and H. Murase, "Very quick audio searching : Introducing global pruning to the Time-Series Active Search," in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP2001), Vol.3, pp. 1429--1432, Salt Lake City, Utah, USA, May. 2001.

### 2000

#### Journal Papers

- K. Kashino , G. Smith, and H. Murase: "Quick Audio Retrieval Based on Histogram Feature Sequences", Journal of Acoustical Society of Japan (E), Vol.21, no.4, pp. 217--219 (Jul. 2000).

#### Peer-reviewed Conference Papers

- K. Kashino , T. Kurozumi , and H. Murase: "Feature Fluctuation Absorption for a Quick Audio Retrieval from Long Recordings", in Proc. International Conference for Pattern Recognition (ICPR), vol.3, pp. 102--105 (Sep. 2000).

### 1999

#### Journal Papers

- K. Kashino and H. Murase: "A Sound Source Identification System for Ensemble Music Based on Template Adaptation and Music Stream Extraction", Speech Communication, Vol.27, pp. 337--349 (Mar. 1999).

#### Invited Talks

- K. Kashino and H. Murase: "Quick Audio-Visual Search Using Time-Series Active Search", in Proc. IWHIT/SM99, pp. 9--14 (Oct. 1999).

#### Peer-reviewed Conference Papers

- K. Kashino , G. Smith, and H. Murase: "Time-Series Active Search for Quick Retrieval of Audio and Video", in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol.6, pp. 2993--2996 (Mar. 1999).

### 1998

#### Book Chapter, Tutorial Papers

- K. Kashino , K. Nakadai, T. Kinoshita, and H. Tanaka: "Application of Bayesian Probability Network to Music Scene Analysis", In "Computational Auditory Scene Analysis", Lawrence Erlbaum Associates, pp. 21--26 (May. 1998).

#### Peer-reviewed Conference Papers

- K. Kashino and H. Murase: "Music Recognition Using Note Transition Context", in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol.6, pp. 3593--3596 (May. 1998).
- G. Smith, H. Murase, and K. Kashino : "Quick Audio Retrieval Using Active Search", in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol.6, pp. 3777--3780 (May. 1998).

### 1997

#### Peer-reviewed Conference Papers

- K. Kashino and H. Murase: "A Music Stream Segregation System Based on Adaptive Multi-Agents", in Proc. International Joint Conference on Artificial Intelligence (IJCAI), vol.2, pp. 1126--1131 (Aug. 1997).
- K. Kashino and H. Murase: "Sound Source Identification for Ensemble Music Based on the Music Stream Extraction", Working Notes of IJCAI Workshop of Computational Auditory Scene Analysis (IJCAI-CASA), pp. 127--134 (Aug. 1997).
- T. Nakatani, K. Kashino , and H. G. Okuno: "Integration of Speech Stream and Music Stream Segregations Based on a Sound Ontology", Working Notes of IJCAI Workshop of Computational Auditory Scene Analysis (IJCAI-CASA), pp. 25--32 (Aug. 1997).

We are developing techniques for detecting and locating desired media content, such as sounds, images, and video fragments, within the huge amount of information available in the real world and on the Internet. We are also developing recognition techniques that use non-verbal information to improve both human-to-human and human-machine communication.

Group Leader Akisato Kimura