Masahiro_Fukui

1.

Structural Analysis of Rebuttals to Evaluate Argumentative Interaction in Parliamentary Debates
Masahiro Fukui (1), Satoshi Nakamura (1)
(1) Meiji University

2.

Contribution
• We created a corpus of 20 English parliamentary debates containing 679 rebuttals.
• We proposed four structural features of rebuttals and showed that they moderately predict the quality of argumentative interaction as rated by an expert and an LLM.
• We developed DebaTube, a system that visualizes rebuttal structures for exploring debate videos.
[Figures: graphs of rebuttal structures; user interface of DebaTube. See the paper for details.]

3.

Background
• Parliamentary debate is a turn-based format in which two teams argue for or against a given topic. → It is educationally valuable for learning dialogic skills.
• Previous work has focused on winner prediction and ignored argumentative interaction, i.e., engaging with and building upon each other's arguments.
[Figure: the rules of parliamentary debate]

4.

Definition of Rebuttal Structure
• Rebuttal structures can reflect characteristics of dialogic debates.
• E.g., the order of rebuttals is aligned with that of the opponent's arguments; frequent counter-rebuttals suggest deeper engagement.
• We define a rebuttal structure as a graph whose nodes are Argumentative Discourse Units (ADUs), placed chronologically, and whose edges are rebuttals.
[Figure legend: red = proposition, blue = opposition]
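The graph definition above can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the class names `ADU` and `RebuttalGraph`, the `speech` and `side` fields, and the two validity checks (a rebuttal comes later than its target and attacks the other team) are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ADU:
    index: int   # chronological position of the ADU in the debate
    speech: int  # index of the speech that contains the ADU (assumed field)
    side: str    # "proposition" or "opposition" (assumed field)
    text: str = ""


@dataclass
class RebuttalGraph:
    adus: list = field(default_factory=list)       # nodes, in chronological order
    rebuttals: list = field(default_factory=list)  # edges as (source, target) indices

    def add_adu(self, speech, side, text=""):
        adu = ADU(len(self.adus), speech, side, text)
        self.adus.append(adu)
        return adu

    def add_rebuttal(self, source, target):
        # Assumption: a rebuttal is made after the ADU it attacks ...
        if source.index <= target.index:
            raise ValueError("a rebuttal must come after the ADU it attacks")
        # ... and it attacks an ADU of the opposing team.
        if source.side == target.side:
            raise ValueError("a rebuttal must target the opposing team")
        self.rebuttals.append((source.index, target.index))


# Hypothetical three-ADU exchange: claim, rebuttal, counter-rebuttal.
g = RebuttalGraph()
a1 = g.add_adu(0, "proposition", "claim")
a2 = g.add_adu(1, "opposition", "rebuts the claim")
a3 = g.add_adu(2, "proposition", "defends the claim")
g.add_rebuttal(a2, a1)
g.add_rebuttal(a3, a2)
```

Storing edges as index pairs keeps the chronological order of the nodes explicit, which the structural features rely on.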

5.

Research question
What structural features of rebuttals indicate the quality of argumentative interaction in parliamentary debate?

6.

Structural Features of Rebuttals
Four structural features of rebuttals:
• Rally → Re-rebuttals
• Order → Crossings of rebuttal edges
• Distance → Rebuttals to 2+ speeches back
• Interval → Gaps between rebuttals to the same statement
[Figure: example panels labeled Good and Bad]
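As a rough illustration of how three of these features (Rally, Distance, Interval) could be read off a rebuttal graph, the sketch below uses simple counts. The exact formulas are given in the paper, not on the slides, so the function names, the `speech_of` mapping, and the counting rules here are assumptions for the example only.

```python
from collections import defaultdict

# Each rebuttal is a (source_adu, target_adu) pair of ADU indices.
# speech_of maps an ADU index to the index of the speech containing it.


def rally_count(rebuttals):
    """Count re-rebuttals: rebuttals whose target is itself a rebuttal."""
    sources = {s for s, _ in rebuttals}
    return sum(1 for _, t in rebuttals if t in sources)


def distant_count(rebuttals, speech_of, min_gap=2):
    """Count rebuttals that reach back min_gap or more speeches."""
    return sum(1 for s, t in rebuttals if speech_of[s] - speech_of[t] >= min_gap)


def interval_gaps(rebuttals, speech_of):
    """Gaps, in speeches, between consecutive rebuttals to the same ADU."""
    by_target = defaultdict(list)
    for s, t in rebuttals:
        by_target[t].append(speech_of[s])
    gaps = []
    for speeches in by_target.values():
        speeches.sort()
        gaps.extend(b - a for a, b in zip(speeches, speeches[1:]))
    return gaps


# Toy data (not from the corpus): five ADUs spread over four speeches.
speech_of = {0: 0, 1: 1, 2: 2, 3: 3, 4: 3}
rebuttals = [(1, 0), (2, 1), (3, 0), (4, 2)]

rally = rally_count(rebuttals)               # 2: edges (2, 1) and (4, 2)
distant = distant_count(rebuttals, speech_of)  # 1: edge (3, 0) reaches back 3 speeches
gaps = interval_gaps(rebuttals, speech_of)     # [2]: ADU 0 is rebutted in speeches 1 and 3
```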

7.

Structural Features of Rebuttals
[Figure: Proposition and Opposition columns with ADU 1 to ADU 8; panels labeled Good and Bad]

8.

Structural Features of Rebuttals
[Figure: Proposition and Opposition columns, each listing Topic A, Topic B, Topic C in the same order]

9.

Structural Features of Rebuttals
[Figure: Proposition column listing Topic A, Topic B, Topic C; Opposition column listing Topic C, Topic A, Topic B]
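The Topic A, B, C example can be turned into a small crossing count to show what "crossings of rebuttal edges" means. This is a sketch under assumptions: positions are taken within each team's column, each Opposition item is assumed to rebut the Proposition item on the same topic, and `count_crossings` and `edges_for` are hypothetical helper names.

```python
from itertools import combinations


def count_crossings(edges):
    """Count pairs of edges that cross.

    edges: (source_position, target_position) pairs. Two edges cross when
    their sources and their targets appear in opposite orders.
    """
    return sum(
        1
        for (s1, t1), (s2, t2) in combinations(edges, 2)
        if (s1 - s2) * (t1 - t2) < 0
    )


prop_order = ["A", "B", "C"]  # order in which Proposition raises the topics


def edges_for(opp_order):
    """Link each Opposition rebuttal to the Proposition topic it answers."""
    return [(i, prop_order.index(topic)) for i, topic in enumerate(opp_order)]


aligned = count_crossings(edges_for(["A", "B", "C"]))   # 0 crossings
shuffled = count_crossings(edges_for(["C", "A", "B"]))  # 2 crossings
```

When the Opposition follows the Proposition's order, no edges cross; answering Topic C first makes its edge cross the edges for Topic A and Topic B.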

10.

Structural Features of Rebuttals
[Figure: speeches in the order Prop. 1st, Opp. 1st, Prop. 2nd, Opp. 2nd]

11.

Structural Features of Rebuttals
[Figure: Proposition and Opposition columns with ADU 1 to ADU 8; panel labeled Bad]

13.

Corpus Construction
Target: 20 videos of intermediate-level debate rounds by Japanese students speaking in English.
Raters: an expert judge and an LLM [1].
Criteria: whether both teams demonstrated high-quality argumentative interaction, rated on a four-point Likert scale.
[1] Liu, X., Liu, P., He, H.: An empirical analysis on large language models in debate evaluation. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Volume 2, pp. 470–487 (2024)

14.

Result
• The perfect agreement rate between the two raters was 60%, and Cohen's kappa for binary classification (scores 3-4 vs. 1-2) was 0.490, indicating moderate agreement.
• In leave-one-out cross-validation, the multiple linear regression model performed best (r = 0.609).
• Feature contribution: Rally > Distance and Interval >> Order
[Figure: panels labeled Best]
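For readers unfamiliar with the two statistics reported here, the sketch below shows how Cohen's kappa on binarized scores and a leave-one-out evaluation of a multiple linear regression can be computed. It is a plain-Python illustration, not the authors' pipeline: all scores and feature values are made-up toy numbers, and the function names are hypothetical.

```python
def binarize(scores, threshold=3):
    """Map four-point scores to 1 (scores 3-4) or 0 (scores 1-2)."""
    return [int(s >= threshold) for s in scores]


def cohen_kappa(a, b):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in set(a) | set(b))
    return (observed - expected) / (1 - expected)


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [p - f * q for p, q in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def loocv_predictions(X, y):
    """Predict each sample with a model fitted on all the other samples."""
    preds = []
    for i in range(len(X)):
        coef = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(coef[0] + sum(c * v for c, v in zip(coef[1:], X[i])))
    return preds


def pearson_r(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    return cov / (sum((p - mu) ** 2 for p in u) * sum((q - mv) ** 2 for q in v)) ** 0.5


# Toy ratings from two raters (not the corpus data).
expert = [4, 3, 2, 1, 3, 2]
llm = [3, 3, 2, 2, 4, 3]
kappa = cohen_kappa(binarize(expert), binarize(llm))

# Toy features and ratings that are exactly linear: y = 1 + 2*x1 - x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
y = [1, 3, 0, 2, 4, 1]
preds = loocv_predictions(X, y)
r = pearson_r(preds, y)
```

Because the toy targets are exactly linear in the features, the held-out predictions match the targets and r is close to 1; real ratings would give a lower value, such as the 0.609 reported on the slide.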

15.

Discussion: Rally, Distance & Interval
These features succeeded in reflecting strong and weak characteristics of arguments.
- (a) Best Rally and (b) best Distance & 4th best Interval captured strengths.
- (c) Worst Distance captured a weakness of arguments.
[Figure: (a), (b) Good (rated 3.0 / 3.5); (c) Bad (rated 2.5)]

18.

Discussion: Order
Order failed to capture characteristics of strong arguments.
- (d) Although the round was rated highly and the debaters showed well-ordered rebuttals, its Order was the third worst.
[Figure: Good (rated 3.5), well-ordered rebuttals]

19.

Discussion: Order
[Figure: Good (rated 3.5), highlighting a speech with badly ordered rebuttals]

20.

Application: DebaTube
To bridge the gap between graph-based mechanical evaluation frameworks and human interpretability in practical debate learning, we developed DebaTube.

21.

Application: DebaTube
DebaTube visualizes rebuttal structures so that learners can explore debate videos. Users can get an overview of each round's characteristics, pin rounds, and click an ADU to jump to the corresponding scene. This lets users find interesting rounds efficiently, by checking typical patterns of strong rebuttals or comparing rounds by the same speakers, before watching the videos.

22.

Summary
Background: Existing debate evaluation methods focus on individual rebuttals or winner prediction, ignoring a debate's quality as a dialogue.
RQ: What structural features of rebuttals indicate high-quality argumentative interaction in parliamentary debate?
Experiment: Analyzed 20 debates using four proposed structural features, compared against expert and LLM ratings via regression.
Result: Rally was the strongest predictor (40.7%), followed by Distance (28.8%) and Interval (25.3%), with moderate correlation (r = 0.609).

Nakamura Laboratory (Meiji University)
