Publications

You can also find my articles on my Google Scholar profile.

Papers


MedVoiceBias: A Controlled Study of Audio LLM Behavior in Clinical Decision-Making

Published in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2026), 2026

We investigate whether audio LLMs introduce new biases through paralinguistic cues in clinical decision-making. Testing 170 clinical cases across 36 voice profiles, we find that surgical recommendations vary by up to 35% between audio and text inputs, with age-related disparities reaching 12%, raising serious concerns about perpetuating healthcare disparities.

Recommended citation: Tam, Z.R., & Chen, Y.N. (2026). "MedVoiceBias: A Controlled Study of Audio LLM Behavior in Clinical Decision-Making." In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Download Paper

Expected Harm: Rethinking Safety Evaluation of (Mis) Aligned LLMs

Published in arXiv preprint, 2026

We introduce the Expected Harm metric, which combines threat severity with execution likelihood. It reveals that models exhibit Inverse Risk Calibration: they disproportionately refuse difficult-to-execute threats while remaining vulnerable to easily executable ones. This miscalibration increases jailbreak success rates by up to 2×.

Recommended citation: Chen, Y.S., Tam, Z.R., Wu, C.K., & Chen, Y.N. (2026). "Expected Harm: Rethinking Safety Evaluation of (Mis) Aligned LLMs." arXiv preprint arXiv:2602.01600.
Download Paper

The Context Trap: Why End-to-End Audio Language Models Fail Multi-turn Dialogues

Published in Proceedings of the 16th International Workshop on Spoken Dialogue System Technology (IWSDS 2026), 2026

We systematically compare end-to-end audio language models with modular systems in multi-turn dialogue tasks, revealing that E2E configurations consistently underperform due to deficiencies in context maintenance and topic tracking, challenging assumptions about their superiority over modular approaches.

Recommended citation: Tam, Z.R., Chang, W.Y., & Chen, Y.N. (2026). "The Context Trap: Why End-to-End Audio Language Models Fail Multi-turn Dialogues." In Proceedings of the 16th International Workshop on Spoken Dialogue System Technology, pages 76–82, Trento, Italy.
Download Paper

None of the Above, Less of the Right: Parallel Patterns between Humans and LLMs on Multi-Choice Questions Answering

Published in arXiv preprint, 2025

This study examines how “None of the Above” (NA) options affect LLM performance on multiple-choice questions. Results reveal a consistent 30-50% performance drop when NA is the correct answer. The effect is domain-dependent: minimal on math reasoning but severe on uncertainty-handling tasks such as business ethics.

Recommended citation: Tam, Z.R., Wu, C.K., & Chen, Y.N. (2025). "None of the Above, Less of the Right: Parallel Patterns between Humans and LLMs on Multi-Choice Questions Answering." arXiv preprint arXiv:2503.01550.
Download Paper

Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models

Published in arXiv, 2025

This study formalizes the task of risk-aware decision making in LLMs, explores how models adapt their decisions to different risk levels, and proposes skill decomposition solutions to improve performance. The findings show that even advanced LMs require explicit prompt chaining to handle risk-aware decision making effectively.

Recommended citation: Wu, C.K., Tam, Z.R., Lin, C.Y., Chen, Y.N., & Lee, H. (2025). "Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models." arXiv preprint arXiv:2503.01332.
Download Paper

Clear Minds Think Alike: What Makes LLM Fine-tuning Robust? A Study of Token Perplexity

Published in NeurIPS 2025, 2025

A systematic analysis revealing that fine-tuning with LLM-generated data not only improves target-task performance but also reduces out-of-domain degradation compared to fine-tuning with ground-truth data. It also explores ways to mitigate that degradation.

Recommended citation: Wu, C.C., Tam, Z.R., Lin, C.Y., Lee, H.Y., & Chen, Y.N. (2025). "Clear Minds Think Alike: What Makes LLM Fine-tuning Robust? A Study of Token Perplexity." arXiv preprint arXiv:2501.14315.
Download Paper

Let Me Speak Freely? A Study On The Impact Of Format Restrictions On Large Language Model Performance

Published in EMNLP 2024 Industry Track, 2024

Structured generation, the process of producing content in standardized formats like JSON and XML, is widely utilized in real-world applications to extract key output information from large language models (LLMs).

Recommended citation: Tam, Z.R., Wu, C.K., Tsai, Y.L., Lin, C.Y., Lee, H., & Chen, Y.N. (2024). "Let Me Speak Freely? A Study On The Impact Of Format Restrictions On Large Language Model Performance." EMNLP Industry Track, 1218-1236.
Download Paper

I Need Help! Evaluating LLM’s Ability to Ask for Users’ Support: A Case Study on Text-to-SQL Generation

Published in EMNLP 2024 Main Track, 2024

This study explores the proactive ability of LLMs to seek user support. We propose metrics to evaluate the trade-off between performance improvements and user burden, and investigate whether LLMs can determine when to request help under varying information availability.

Recommended citation: Wu, C.K., Tam, Z.R., Wu, C.C., Lin, C.Y., Lee, H., & Chen, Y.N. (2024). "I Need Help! Evaluating LLM's Ability to Ask for Users' Support: A Case Study on Text-to-SQL Generation." arXiv preprint arXiv:2407.14767.
Download Paper

Personalized EDM Subject Generation via Co-factored User-Subject Embedding

Published in PAKDD, 2024

This paper introduces the Co-Factored User-Subject Embedding based Personalized EDM Subject Generation Framework (COUPES), a model for creating personalized Electronic Direct Mail (EDM) subjects.

Recommended citation: Chen, Y.H., Tam, Z.R., & Shuai, H.H. (2024). "Personalized EDM Subject Generation via Co-factored User-Subject Embedding." Pacific-Asia Conference on Knowledge Discovery and Data Mining, 55-67.
Download Paper

An improved Traditional Chinese evaluation suite for foundation model

Published in arXiv, 2024

We present TMMLU+, a new benchmark designed for Traditional Chinese language understanding. TMMLU+ is a multi-choice question-answering dataset with 66 subjects from elementary to professional level. It is six times larger and boasts a more balanced subject distribution than its predecessor, Taiwan Massive Multitask Language Understanding (TMMLU).

Recommended citation: Tam, Z.R., Pai, Y.T., Lee, Y.W., Chen, J.D., Chu, W.M., Cheng, S., & Shuai, H.H. (2024). "An improved Traditional Chinese evaluation suite for foundation model." arXiv preprint arXiv:2403.01858.
Download Paper

OpenAssistant Conversations - Democratizing Large Language Model Alignment

Published in NeurIPS, 2024

Aligning large language models (LLMs) with human preferences has proven to drastically improve usability and has driven rapid adoption as demonstrated by ChatGPT.

Recommended citation: Köpf, A., Kilcher, Y., von Rütte, D., Anagnostidis, S., Tam, Z.R., et al. (2024). "OpenAssistant Conversations - Democratizing Large Language Model Alignment." NeurIPS, 36.
Download Paper

Improving entity disambiguation using knowledge graph regularization

Published in PAKDD, 2022

Entity disambiguation bridges words of interest in an input text document and unique entities in a target Knowledge Base (KB).

Recommended citation: Tam, Z.R., Wu, Y.L., & Shuai, H.H. (2022). "Improving entity disambiguation using knowledge graph regularization." Pacific-Asia Conference on Knowledge Discovery and Data Mining, 341-353.
Download Paper

Gradient normalization for generative adversarial networks

Published in ICCV, 2021

In this paper, we propose a novel normalization method called gradient normalization (GN) to tackle the training instability of Generative Adversarial Networks (GANs) caused by the sharp gradient space.

Recommended citation: Wu, Y.L., Shuai, H.H., Tam, Z.R., & Chiu, H.Y. (2021). "Gradient normalization for generative adversarial networks." ICCV, 6373-6382.
Download Paper

Character-preserving coherent story visualization

Published in ECCV, 2020

Story visualization aims at generating a sequence of images to narrate each sentence in a multi-sentence story.

Recommended citation: Song, Y.Z., Tam, Z.R., Chen, H.J., Lu, H.H., & Shuai, H.H. (2020). "Character-preserving coherent story visualization." European Conference on Computer Vision, 18-33.
Download Paper