Media Intelligent Processing Research

Research Overview

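Within artificial intelligence, we research and develop technologies that handle document data, speech and acoustic data, measurement data, and similar data. We also pursue research and development on the transparency and fairness of artificial intelligence, aiming to realize AI technology that society can trust. We are advancing technologies such as machine learning, deep learning, language processing, large language models, pattern recognition, knowledge processing, inference, and explainable AI, while also applying the technologies we develop to real-world settings.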

Technical Keywords

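Natural language processing (large language model construction, natural language inference, argumentation structure analysis), acoustic/time-series signal foundation models (anomaly diagnosis, audio captioning), speech recognition (acoustic/language model adaptation, End-to-End, diarization, speech enhancement/separation, speaker verification, use of Kaldi/ESPnet), acoustic recognition (anomalous sound detection, scene classification, caption generation), signal processing and machine learning (sparse modeling, signal restoration, machine learning for state estimation/prediction), dialogue agents, risk inference, knowledge learning, explainable AI, trustworthy AI (diagnosis and improvement of explainability, transparency, fairness, robustness, etc.)
Application examples: business-specific LLMs, patent search, text information extraction, dialogue analysis, chatbots, advanced RPA, contact center speech transcription, meeting minutes creation, automated voice response, acoustic diagnosis, maintenance knowledge support, interview support, etc.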

Publication List

English-language publications from 2021 through December 2024 are listed below.

Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yuki Takashima, Yohei Kawaguchi, "Online Neural Diarization of Unlimited Numbers of Speakers," IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023.

Shota Horiguchi, Yuki Takashima, Shinji Watanabe, Paola Garcia, "Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization," SLT 2022.

Kota Dohi, Tomoya Nishida, Harsh Purohit, Ryo Tanabe, Takashi Endo, Masaaki Yamamoto, Yuki Nikaido, Yohei Kawaguchi, "MIMII DG: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection for Domain Generalization Task," DCASE 2022.

Kota Dohi, Keisuke Imoto, Noboru Harada, Daisuke Niizumi, Yuma Koizumi, Tomoya Nishida, Harsh Purohit, Ryo Tanabe, Takashi Endo, Masaaki Yamamoto, Yohei Kawaguchi, "Description and Discussion on DCASE 2022 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques," DCASE 2022.

Yuki Takashima, Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yohei Kawaguchi, "Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models," INTERSPEECH 2022.

Harsh Purohit, Masaaki Yamamoto, Takashi Endo, Yohei Kawaguchi, "Hierarchical Conditional Variational Autoencoder Based Acoustic Anomaly Detection," EUSIPCO 2022.

Kota Dohi, Takashi Endo, Yohei Kawaguchi, "Disentangling Physical Parameters for Anomalous Sound Detection Under Domain Shifts," EUSIPCO 2022.

Tomoya Nishida, Kota Dohi, Takashi Endo, Masaaki Yamamoto, Yohei Kawaguchi, "Anomalous Sound Detection Based on Machine Activity Detection," EUSIPCO 2022.

Natsuo Yamashita, Shota Horiguchi, Takeshi Homma, "Improving the Naturalness of Simulated Conversations for End-to-End Neural Diarization," Odyssey 2022.

Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Paola Garcia, "Encoder-Decoder Based Attractors for End-to-End Neural Diarization," IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022.

Y. Okamoto, S. Horiguchi, M. Yamamoto, K. Imoto, and Y. Kawaguchi, "Environmental Sound Extraction Using Onomatopoeia," in Proc. IEEE ICASSP, 2022.

S. Horiguchi, Y. Takashima, P. Garcia, S. Watanabe, and Y. Kawaguchi, "Multi-Channel End-to-End Neural Diarization with Distributed Microphones," in Proc. IEEE ICASSP, 2022.

T. Homma, Q. Sun, T. Fujioka, R. Takawaki, E. Ankyu, K. Nagamatsu, D. Sugawara, and E. T. Harada, "Emotional Speech Synthesis for Companion Robot to Imitate Professional Caregiver Speech," arXiv preprint, 2021.

Y. Kawaguchi, K. Imoto, Y. Koizumi, N. Harada, D. Niizumi, K. Dohi, R. Tanabe, H. Purohit, and T. Endo, "Description and Discussion on DCASE 2021 Challenge Task 2," in Proc. DCASE, 2021.

S. Horiguchi, S. Watanabe, P. Garcia, Y. Xue, Y. Takashima, and Y. Kawaguchi, "Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors," in Proc. IEEE ASRU, 2021.

S. Horiguchi, Y. Fujita, S. Watanabe, Y. Xue, and P. Garcia, "Encoder-Decoder Based Attractor Calculation for End-to-End Neural Diarization," arXiv preprint, 2021.

R. Tanabe, H. Purohit, K. Dohi, T. Endo, Y. Nikaido, T. Nakamura, and Y. Kawaguchi, "MIMII DUE: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection with Domain Shifts due to Changes in Operational and Environmental Conditions," in Proc. IEEE WASPAA, 2021.

A. Yamaguchi, G. Morio, H. Ozaki, K. Yokote, and K. Nagamatsu, "Team Hitachi @ AutoMin 2021: Reference-free Automatic Minuting Pipeline with Argument Structure Construction over Topic-based Summarization," in Proc. AutoMin, 2021.

K. Ito, T. Fujioka, Q. Sun, and K. Nagamatsu, "Audio-Visual Speech Emotion Recognition by Disentangling Emotion and Identity Attributes," in Proc. INTERSPEECH, 2021.

Y. Takashima, Y. Fujita, S. Horiguchi, S. Watanabe, P. Garcia, and K. Nagamatsu, "Semi-Supervised Training with Pseudo-Labeling for End-to-End Neural Diarization," in Proc. INTERSPEECH, 2021.

S. Horiguchi, N. Yalta, P. Garcia, Y. Takashima, Y. Xue, D. Raj, Z. Huang, Y. Fujita, S. Watanabe, and S. Khudanpur, "The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap," The Third DIHARD Speech Diarization Challenge, 2021. (2nd place in all the tasks)

A. I. Adiba, T. Homma, and T. Miyoshi, "Towards Immediate Backchannel Generation Using Attention-Based Early Prediction Model," in Proc. IEEE ICASSP, 2021.

K. Dohi, T. Endo, H. Purohit, R. Tanabe, and Y. Kawaguchi, "Flow-Based Self-Supervised Density Estimation for Anomalous Sound Detection," in Proc. IEEE ICASSP, 2021.

K. Ito, M. Yamamoto, and K. Nagamatsu, "Audio-Visual Speech Enhancement Method Conditioned on the Lip Motion and Speaker Discriminative Embeddings," in Proc. IEEE ICASSP, 2021.

S. Horiguchi, P. Garcia, Y. Fujita, S. Watanabe, and K. Nagamatsu, "End-to-End Speaker Diarization as Post-Processing," in Proc. IEEE ICASSP, 2021.

H. Ozaki, G. Morio, T. Morishita, and T. Miyoshi, "Project-Then-Transfer: Effective Two-Stage Cross-Lingual Transfer for Semantic Dependency Parsing," in Proc. EACL, 2021.

G. Morio*, H. Ozaki*, Y. Koreeda, T. Morishita, and T. Miyoshi, "i-Parser: Interactive Parser Development Kit for Natural Language Processing," in Proc. AAAI 2021. (*Equal contribution).

S. Horiguchi, Y. Fujita, and K. Nagamatsu, "Block-Online Guided Source Separation," in Proc. IEEE SLT, 2021.

Y. Takashima, Y. Fujita, S. Watanabe, S. Horiguchi, P. Garcia, and K. Nagamatsu, "End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection," IEEE SLT, 2021.

Y. Xue, S. Horiguchi, Y. Fujita, S. Watanabe, P. Garcia, and K. Nagamatsu, "Online End-to-End Neural Diarization with Speaker-Tracing Buffer," IEEE SLT, 2021.

M. Mase, A. B. Owen, and B. B. Seiler, "Cohort Shapley values for algorithmic fairness," arXiv preprint, 2021.

B. B. Seiler, M. Mase, and A. B. Owen, "What makes you unique?," arXiv preprint, 2021.

M. Hamamoto and M. Egi, "Model-agnostic Ensemble-based Explanation Correction Leveraging Rashomon Effect," in Proc. IEEE SSCI, 2021.

H. Namba and M. Egi, "Piecewise Simplification Approach for Accurate and Understandable Model," in Proc. IEEE SSCI, 2021.

Y. Koreeda and C. Manning, "ContractNLI: A Dataset for Document-level Natural Language Inference for Contracts," in Findings of the Association for Computational Linguistics: EMNLP 2021, 2021.

Gaku Morio, Hiroaki Ozaki, Terufumi Morishita, Kohsuke Yanai, "End-to-end Argument Mining with Cross-corpora Multi-task Learning," Transactions of the Association for Computational Linguistics, 2022.

Terufumi Morishita, Gaku Morio, Shota Horiguchi, Hiroaki Ozaki, Nobuo Nukaga, "Rethinking Fano’s Inequality in Ensemble Learning," in Proc. ICML 2022.

Naokazu Uchida, Takeshi Homma, Makoto Iwayama, Yasuhiro Sogawa, "Reducing Offensive Replies in Open Domain Dialogue System," in Proc. INTERSPEECH 2022.

Amalia Istiqlali Adiba, Takeshi Homma, Yasuhiro Sogawa, "Unsupervised Domain Adaptation on Question-Answering System with Conversation Data," in Proc. SIGDIAL 2022.

Masaki Hamamoto, Hiroyuki Namba, Masashi Egi, "Ensemble-Based Method for Correcting Global Explanation of Prediction Model," in IEICE Transactions on Information and Systems 2023.

Benjamin B. Seiler, Masayoshi Mase, Art B. Owen, "What makes you unique?," in Electronic Journal of Statistics 2023.

Y. Tsuchiya, Y. Mori, and M. Egi, "Explainable Reinforcement Learning Based on Q-Value Decomposition by Expected State Transitions," Proceedings of the AAAI 2023 Spring Symposium on Challenges Requiring the Combination of Machine Learning and Knowledge Engineering (AAAI-MAKE 2023), 2023.

Y. Tsuchiya and M. Hamamoto, "Explanation Framework for Optimization-Based Scheduling: Evaluating Contributions of Constraints and Parameters by Shapley Values," ICAPS 2023 Workshop Human-Aware and Explainable Planning (HAXP), 2023.

Masayoshi Mase, Art B. Owen, and Benjamin B. Seiler, "Variable Importance Without Impossible Data," Annual Review of Statistics and Its Application, 2023.

N. Hama, Masayoshi Mase, Art B. Owen, "Deletion and Insertion Tests in Regression Models," Journal of Machine Learning Research (JMLR), 2023.

T. Nishida, T. Endo, and Y. Kawaguchi, "Zero-Shot Domain Adaptation of Anomalous Samples for Semi-Supervised Anomaly Detection," ICASSP, 2023

T. Morishita, G. Morio, A. Yamaguchi, and Y. Sogawa, "Learning Deductive Reasoning from Synthetic Corpus based on Formal Logic," ICML, 2023

A. Ito, S. Horiguchi, "Spoofing Attacker Also Benefits from Self-Supervised Pretrained Model," INTERSPEECH, 2023

K. Shimonishi, K. Dohi, and Y. Kawaguchi, "Anomalous Sound Detection Based on Sound Separation," INTERSPEECH, 2023

T. Okamoto, K. Shimonishi, K. Imoto, K. Dohi, S. Horiguchi, and Y. Kawaguchi, "CAPTDURE: Captioned Sound Dataset of Individual Sources," INTERSPEECH, 2023

M. Tsunokake, A. Yamaguchi, Y. Koreeda, H. Ozaki, and Y. Sogawa, "Hitachi at SemEval-2023 Task 4: Exploring Various Task Formulations Reveals the Importance of Description Texts on Human Values," SemEval, 2023

Y. Koreeda, K. Yokote, H. Ozaki, A. Yamaguchi, M. Tsunokake, and Y. Sogawa, "Hitachi at SemEval-2023 Task 3: Exploring Cross-lingual Multi-task Strategies for Genre and Framing Detection in Online News," SemEval, 2023

T. Sasazawa, T. Morishita, H. Ozaki, O. Imaichi, and Y. Sogawa, "Controlling Keywords and Their Positions in Text Generation," INLG, 2023

T. Fujii, K. Shibata, A. Yamaguchi, T. Morishita, and Y. Sogawa, "How do different tokenizers perform on downstream tasks in scriptio continua languages?: A case study in Japanese," ACL Student Research Workshop, 2023

A. Yamaguchi, H. Ozaki, T. Morishita, G. Morio, and Y. Sogawa, "How Does the Task Complexity of Masked Pretraining Objectives Affect Downstream Performance?," ACL, 2023

Y. Koreeda, T. Morishita, O. Imaichi, and Y. Sogawa, "LARCH: Large Language Model-based Automatic Readme Creation with Heuristics," CIKM, 2023

T. V. Ho, S. Horiguchi, S. Watanabe, P. Garcia, and T. Sumiyoshi, "Synthetic Data Augmentation for ASR with Domain Filtering," APSIPA ASC, 2023

T. Morishita, Y. Koreeda, A. Yamaguchi, G. Morio, O. Imaichi, and Y. Sogawa, "CHICOT: A Developer-Assistance Toolkit for Code Search with High-Level Contextual Information," AAAI, 2024

S. Horiguchi, K. Dohi, and Y. Kawaguchi, "Streaming Active Learning for Regression Problems Using Regression via Classification," ICASSP, 2024

K. Dohi and Y. Kawaguchi, "Distributed Collaborative Anomalous Sound Detection by Embedding Sharing," EUSIPCO, 2024

T. Vu Ho, K. Dohi, and Y. Kawaguchi, "Stream-based Active Learning for Streaming Anomalous Sound Detection in Machine Condition Monitoring," INTERSPEECH, 2024

T. Morishita, A. Yamaguchi, G. Morio, H. Tomonari, O. Imaichi, and Y. Sogawa, "JFLD: A Japanese Benchmark for Deductive Reasoning based on Formal Logic," LREC-COLING, 2024

T. Morishita, G. Morio, A. Yamaguchi, and Y. Sogawa, "Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus," NeurIPS, 2024

R. Nagase, T. Sumiyoshi, N. Yamashita, K. Dohi, and Y. Kawaguchi, "Can We Estimate Purchase Intention Based on Zero-shot Speech Emotion Recognition?," in Proc. APSIPA ASC, 2024.

K. Dohi, A. Ito, H. Purohit, T. Nishida, T. Endo, and Y. Kawaguchi, "Domain-Independent Automatic Generation of Descriptive Texts for Time-Series Data," arXiv preprint, 2024.

H. Purohit, T. Nishida, K. Dohi, T. Endo, and Y. Kawaguchi, "MIMII-Gen: Generative Modeling Approach for Simulated Evaluation of Anomalous Sound Detection System," arXiv preprint, 2024.

N. Yamashita, M. Yamamoto, and Y. Kawaguchi, "End-to-End Integration of Speech Emotion Recognition with Voice Activity Detection using Self-Supervised Learning Features," arXiv preprint, 2024.

R. Ogura, T. Nishida, and Y. Kawaguchi, "Retrieval-Augmented Approach for Unsupervised Anomalous Sound Detection and Captioning without Model Trainings," arXiv preprint, 2024.

T. Nishida, H. Purohit, K. Dohi, T. Endo, and Y. Kawaguchi, "Timbre Difference Capturing in Anomalous Sound Detection," arXiv preprint, 2024.

A. Ito, K. Dohi, and Y. Kawaguchi, "CLaSP: Learning Concepts for Time-Series Signals from Natural Language Supervision," arXiv preprint, 2024.

H. Ozaki, N. Tanahashi, N. Masuda, K. Yamada, M. Kato, and N. Isagawa, "Analytical Methodology and a Simulator for ESG-Financial Indicators Based on Causal Hypothesis Graph," STAI, 2024

For the latest publication list, please see the page below.
https://hitachi-speech.github.io/

Academic Achievements

Topics, "Developed AI simplification technology that converts complex black-box AI into AI with clear decision criteria,"
https://www.hitachi.co.jp/rd/news/topics/2021/2112_ais.html

Qiita Zine, "XAI and NLP: Why Hitachi continues joint research in the AI field with world-class Stanford University,"
https://zine.qiita.com/interview/202112-hitachi/

Hitachi Industrial AI blog, "Automatic meeting minutes generation using state-of-the-art natural language processing,"
https://www.hitachi.com/rd/sc/aiblog/202112_automatic-meeting-minutes-generation/index.html

Qiita Zine, "Exploring the R&D approach of Hitachi members in the increasingly prominent field of speech processing technology,"
https://zine.qiita.com/interview/202111-hitachi-5/

News release, "Launch of the 'Speech-to-Text Cloud Service,' which supports improved customer experience through the use of speech data,"
https://www.hitachi.co.jp/New/cnews/month/2021/10/1012.html

News release, "Mitsubishi UFJ Morgan Stanley Securities introduces a customer interaction monitoring system using speech recognition and AI,"
https://www.hitachi.co.jp/New/cnews/month/2021/10/1001.html

Topics, "Prototyped a cyber-physical system (CPS) that streamlines the operation of a community-based shopping mall and helps attract more customers,"
https://www.hitachi.co.jp/rd/news/topics/2021/0614_nonowa_poc.html

Announcement, "Hitachi's anomalous sound detection solution receives the 29th Technical Development Award of the Acoustical Society of Japan,"
https://www.facebook.com/hitachi.it/posts/%E6%97%A5%E7%AB%8B%E3%81%AE%E7%95%B0%E9%9F%B3%E6%A4%9C%E7%9F%A5%E3%82%BD%E3%83%AA%E3%83%A5%E3%83%BC%E3%82%B7%E3%83%A7%E3%83%B3%E3%81%8C%E7%AC%AC29%E5%9B%9E-%E6%97%A5%E6%9C%AC%E9%9F%B3%E9%9F%BF%E5%AD%A6%E4%BC%9A%E6%8A%80%E8%A1%93%E9%96%8B%E7%99%BA%E8%B3%9E%E3%82%92%E5%8F%97%E8%B3%9E%E3%81%97%E3%81%BE%E3%81%97%E3%81%9F%E7%95%B0%E9%9F%B3%E6%A4%9C%E7%9F%A5%E3%82%BD%E3%83%AA%E3%83%A5%E3%83%BC%E3%82%B7%E3%83%A7%E3%83%B3%E3%81%AF%E6%97%A5%E7%AB%8B%E3%81%AE%E8%87%AA%E7%A4%BE%E5%B7%A5%E5%A0%B4%E3%81%A7%E3%81%AE%E5%AE%9F%E7%B8%BE%E3%83%8E%E3%82%A6%E3%83%8F%E3%82%A6%E3%82%92%E3%82%82%E3%81%A8%E3%81%AB%E5%AE%9F%E7%94%A8%E5%8C%96%E3%81%97%E3%81%9F%E3%83%9E%E3%82%A4%E3%82%AF%E6%A9%9F%E8%83%BD%E6%90%AD/4000020563392835/

Qiita Zine, "Taking risks is what makes a researcher: the mindset of Hitachi researchers who keep delivering results in acoustics and image recognition,"
https://zine.qiita.com/interview/202103-hitachi/

News release, "Established 'AI Ethics Principles' for the Social Innovation Business,"
https://www.hitachi.co.jp/New/cnews/month/2021/02/0222.html

Qiita Zine, "Is AI a black box? Hitachi tackles social issues using XAI, which explains the basis for decisions,"
https://zine.qiita.com/interview/202102-hitachi/

Hitachi Industrial AI blog, "Uncovering the mystery of ensemble learning through the information theoretical lens,"
https://www.hitachi.com/rd/sc/aiblog/202209_theoretical-framework-of-el/index.html

Qiita Zine, "That is why speech is interesting! How Hitachi builds new services that visualize human emotions,"
https://zine.qiita.com/interview/202207-hitachi-2/

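Kyōsō-no-Mori Webinar, "What is the essential perspective for humans and AI to co-evolve? | Kyōsō-no-Mori Webinar No. 17 'Social Implementation of Cyber Systems and Its Challenges,' Program 3 'Toward a Society Where Humans and AI Co-evolve,'"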
https://linkingsociety.hitachi.co.jp/_ct/17664780

Insights from AI/Analytics, "From deliberation and discussion to policy participation: AI technology supporting cyber democracy,"
https://www.hitachi.co.jp/rd/sc/ai-analytics/009/index.html

Research Topics, "Developed a basic technology that automatically creates training data to strengthen the logical reasoning ability of generative AI,"
https://rd.hitachi.co.jp/_ct/17736579

Front-Runners in Generative AI Use, "Taking generative AI to the next stage through advanced RAG,"
https://deh.hitachi.co.jp/_ct/17733925