Kevin Duh
Research Associate
Linguistic Intelligence Research Group
NTT Communication Science Laboratories
2-4 Hikaridai, Seika-cho, Keihanna Science City
Kyoto 619-0237, JAPAN
Email: firstname.lastname@lab.ntt.co.jp
(my old email still works: X@cslab.kecl.ntt.co.jp [where X = kevinduh])

Research Interests

  • Natural Language Processing (machine translation, computational linguistics for resource-poor languages)
  • Machine Learning (semi-supervised learning, structured prediction, graphical models, domain adaptation)
  • Web and Information Retrieval (web search, learning to rank, social media)

Biosketch

I received my B.S. from Rice University in 2003 and Ph.D. from the University of Washington in 2009, both in Electrical Engineering. My research interests are in natural language processing, machine learning, and information retrieval. My Ph.D. dissertation explored "semi-supervised learning for ranking", and I was graciously supported by a U.S. National Science Foundation Fellowship during this time. I have had the opportunity to work with and learn from many good folks, including my professors at the UW SSLI Lab (Katrin Kirchhoff, Jeff Bilmes, Mari Ostendorf) and my mentors at Microsoft Research (Sumit Basu, John Dunagan, Simon Corston-Oliver).

My current research focus here at NTT CS Labs is Statistical Machine Translation.

Selected Publications [Full List]

  1. Distributed Learning-to-Rank on Streaming Data using Alternating Direction Method of Multipliers (NIPS-BigLearn2011)
    • Large-scale distributed/stream training for web search
  2. Alignment Inference and Bayesian Adaptation for Machine Translation (MT Summit 2011)
    • A flexible way to adapt MT using Bayesian update of alignment inference results (as opposed to model parameters)
  3. Generalized Minimum Bayes Risk System Combination (IJCNLP 2011)
    • A generalization of the MBR principle for improved MT system combination.
  4. Is Machine Translation Ripe for Cross-lingual Sentiment Classification? (ACL 2011)
    • Examines what might go wrong (from the domain adaptation perspective) when we use MT to bootstrap classifiers across languages.
  5. Flexible Sample Selection Strategies for Transfer Learning in Ranking (IPM Journal, 2011)
    • A new transfer ranking algorithm based on sample selection in function space, with experiments in Yahoo LTR Challenge and Microsoft LETOR.
  6. Analysis of Translation Model Adaptation in SMT (IWSLT 2010) [slides]
    • Analysis of the influence of domain adaptation on different parts of the training pipeline, for multiple languages and datasets (EMEA, Europarl, KDE, OpenSubtitles, TED Talks).
  7. Automatic Evaluation of Translation Quality for Distant Language Pairs (EMNLP 2010) [code]
    • An evaluation metric that is sensitive to word order correlates better with human judgments for, e.g., SVO-SOV language translation.
  8. N-best Reranking by Multitask Learning (WMT 2010) [slides]
    • Discriminative training is difficult when the number of features grows rapidly as data size increases. By recasting the problem as multitask learning, we can limit overfitting.
  9. Learning to Rank with Partially-Labeled Data (SIGIR 2008) [slides]
    • Frames a semi-supervised ranking problem where some lists have no labels whatsoever. Learning novel feature representations from unlabeled data outperforms supervised ranking.
  10. Beyond Log-linear Models: Boosted Minimum Error Rate Training for N-best Re-ranking (ACL 2008)
    • A simple Boosting meta-algorithm for improving minimum error rate training in machine translation

Frequent terms from paper abstracts (by Wordle):

Activities

  • ACL International Sponsorship Committee 2010-2011, Asia-Pacific Representative (with Haifeng Wang)
  • Co-organizer, NAACL 2009 Workshop on Semi-supervised Learning for Natural Language Processing (with Qin Wang and Dekang Lin)
  • Instructor, EE511, Statistical Learning. Co-taught in Spring 2008 with Prof. Mari Ostendorf
  • Co-Chair, ACL/COLING 2006 Student Research Workshop (with Marine Carpuat and Rebecca Hwa (faculty mentor))
  • Program Committee / Reviewer:
    • Journals: Information Retrieval Journal, Computer Speech and Language Journal, IEEE Transactions on Audio, Speech, and Language Processing, ACM Transactions on Asian Language Information Processing, IEEE Potentials Magazine, Natural Language Engineering, IEEE Trans. on Computers
    • NLP Conferences: ACL (2012, 2011, 2010, 2009, 2008, 2006), NAACL (2012, 2010, 2009, 2007), PACLING 2007, EMNLP (2010, 2008, 2007), MT Summit 2009, LREC 2010, EACL 2012
    • ML/AI Conferences: AISTATS (2012, 2010, 2007), IJCAI 2011, AAAI (2012, 2011), ICML (2012, 2011), NIPS 2011
    • IR Conferences: SIGIR 2010
