manpu2022(Sakurai_SpeakerLineDataset)

September 26, 22

#comic #manga #text line #speaker-line dataset #natural language processing #dataset construction #annotation #speaker estimation

Nakamura Laboratory (Meiji University)

@nkmr-lab

Satoshi Nakamura Laboratory, Department of Frontier Media Science, School of Interdisciplinary Mathematical Sciences, Meiji University

Text of each page

A Method to Annotate Who Speaks a Text Line in Manga and Speaker-Line Dataset for Manga109
Tsubasa Sakurai, Risa Ito, Kazuki Abe and Satoshi Nakamura
School of Interdisciplinary Mathematical Sciences, Meiji University

Background
Increased research and services utilizing e-comics
Automatic translation, content-based recommendation and search, spoiler prevention
➔ Various studies on the contents of comics are required
Figure: automatic translation (Mantra). © Akamatsu Ken, LoveHina

Background
Recognition of the components of comics
The area of comic frames, the area of text lines and the face of a character
Figure labels: Frame, Line, Face. © Akamatsu Ken, LoveHina

Background
Recognition of the components of comics
The area of comic frames, the area of text lines and the face of a character
Other studies on the components of comics
The content of lines, facial expressions, the speaker of the lines, relationships between characters, etc.

Required dataset
Focus on the relationship between lines and characters
Who speaks each text line in the frame?
Figure labels: Line, Face. © Akamatsu Ken, LoveHina

Related work
Methods for automatic estimation of the speaker
Estimation by distance from the tail of the speech balloon
➔ Speech balloon and speaker association for comics and manga understanding [Rigaud et al. 2015]
© Shindou Uni, NichijouSoup
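The distance-based idea on this slide can be sketched roughly as follows. This is a simplified illustration, not the method of Rigaud et al.: it assumes the tip of the balloon tail and the character bounding boxes are already known, and the `Box` class, `nearest_speaker` function, and all coordinates are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import hypot


@dataclass
class Box:
    """Axis-aligned bounding box of a character (hypothetical structure)."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def center(self) -> tuple[float, float]:
        return ((self.xmin + self.xmax) / 2, (self.ymin + self.ymax) / 2)


def nearest_speaker(tail_tip: tuple[float, float], characters: list[Box]) -> str:
    """Return the name of the character whose box center is closest to the tail tip."""
    tx, ty = tail_tip

    def distance(box: Box) -> float:
        cx, cy = box.center()
        return hypot(cx - tx, cy - ty)

    return min(characters, key=distance).name


# Made-up coordinates: character A stands left, character B stands right.
characters = [Box("A", 0, 0, 100, 200), Box("B", 300, 0, 400, 200)]
print(nearest_speaker((120, 80), characters))  # A
```

A rule like this has nothing to work with when a line has no speech balloon, and it picks the wrong character when the true speaker is far from the tail.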

Difficulty of speaker estimation
Examples where existing methods cannot be used
No speech balloon
Distant character is the speaker
© Sorata Akizuki, Snow White with the Red Hair
© Taira Masami, KuroidoGanka

Difficulty of speaker estimation
Examples where existing methods cannot be used
No speech balloon
Distant character is the speaker
➔ Clarify which factors machine learning needs to consider and where the difficulties lie
© Sorata Akizuki, Snow White with the Red Hair
© Taira Masami, KuroidoGanka

10.

Related work
The eBDtheque dataset
Speaker information is available, but the amount of data is small
➔ eBDtheque: A Representative Database of Comics [Rigaud et al. 2013]
The Manga109 dataset
Large amount of data, but no speaker information
➔ Sketch-based manga retrieval using manga109 dataset [Matsui et al. 2017]

11.

Related work
The Manga109 dataset
109 comics drawn by professional cartoonists, with annotations
➔ Sketch-based manga retrieval using manga109 dataset [Matsui et al. 2017]
4 types of annotations
- Position of the frames
- Body position and character name
- Face position and character name
- Text line position and text string
Figure labels: Frame, Line, Body, Face. © Akamatsu Ken, LoveHina
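As a rough picture of the four annotation types listed on this slide, one page could be held in memory along these lines. The class and field names are hypothetical and chosen for illustration; they are not the official Manga109 file format or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Region:
    """Rectangle in page pixel coordinates (hypothetical field names)."""
    xmin: int
    ymin: int
    xmax: int
    ymax: int


@dataclass
class CharacterRegion(Region):
    """Body or face rectangle, tagged with the character's name."""
    character: str = ""


@dataclass
class TextLine(Region):
    """Text line rectangle with its transcribed string."""
    text: str = ""


@dataclass
class Page:
    """The four annotation types of one page."""
    frames: list[Region] = field(default_factory=list)
    bodies: list[CharacterRegion] = field(default_factory=list)
    faces: list[CharacterRegion] = field(default_factory=list)
    lines: list[TextLine] = field(default_factory=list)


# Made-up example page with one annotation of each type.
page = Page(
    frames=[Region(0, 0, 800, 600)],
    bodies=[CharacterRegion(100, 100, 300, 580, character="A")],
    faces=[CharacterRegion(150, 110, 250, 210, character="A")],
    lines=[TextLine(320, 120, 420, 300, text="Hello")],
)
print(len(page.frames), len(page.bodies), len(page.faces), len(page.lines))
```

Nothing in this structure links a text line to a character, which is the gap the speaker-line dataset is meant to fill.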

12.

Research purpose
Propose and develop a system to easily construct datasets
Analyze the Speaker-Line Dataset for Manga109
Identify the characteristics of comics relevant to speaker estimation
Figure labels: Line, Face. © Akamatsu Ken, LoveHina

13.

Conventional method
How to assign annotations
Manual selection of speakers is very difficult

14.

Informational Design
Speakers and lines are often close together
Enables quick annotation
© Akamatsu Ken, LoveHina

15.

Fitts's law
Fitts's law [Fitts 1954]
Which tasks are difficult?

16.

Fitts's law
Fitts's law [Fitts 1954]
The larger the target and the shorter the distance to the target, the quicker the movement
$T = a + b \log_2\left(\frac{D}{W} + 1\right)$
D: Distance to the target
W: Size of the target
a: Time taken for start and end operations
b: Effect of mouse speed on time taken
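To make the formula concrete, the sketch below evaluates T = a + b log2(D/W + 1) for a near target and a far target of the same size. The constants `a` and `b` are placeholder values picked for illustration, not measurements from the slides, and `fitts_time` is a hypothetical helper name.

```python
from math import log2


def fitts_time(distance: float, width: float, a: float = 0.2, b: float = 0.1) -> float:
    """Predicted movement time T = a + b * log2(D / W + 1).

    a and b are device-dependent constants; the defaults are placeholders.
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    return a + b * log2(distance / width + 1)


# Same target size, different distances (arbitrary pixel values).
near = fitts_time(distance=100, width=100)  # log2(2) = 1
far = fitts_time(distance=700, width=100)   # log2(8) = 3
print(f"near: {near:.2f}, far: {far:.2f}")  # near: 0.30, far: 0.50
```

Under these assumed constants the far target takes longer to reach, which is why dropping a line onto a nearby speaker is expected to be quick.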

17.

Proposed method
How to assign annotations
Drag and drop lines to the speaker
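One way a drag-and-drop annotation tool might resolve the drop position to a speaker is a simple hit test against character bounding boxes. The slides do not describe the implementation, so this is only a sketch: the `speaker_at` function, the box layout, and the smallest-box tie-break are all assumptions.

```python
from __future__ import annotations

# Character name -> (xmin, ymin, xmax, ymax); made-up coordinates.
BoxMap = dict[str, tuple[int, int, int, int]]


def speaker_at(drop_point: tuple[int, int], characters: BoxMap) -> str | None:
    """Return the character whose box contains the drop point, or None.

    If several boxes contain the point, the smallest box wins
    (an assumed tie-break, not taken from the slides).
    """
    x, y = drop_point
    hits = [
        (name, (xmax - xmin) * (ymax - ymin))
        for name, (xmin, ymin, xmax, ymax) in characters.items()
        if xmin <= x <= xmax and ymin <= y <= ymax
    ]
    if not hits:
        return None
    return min(hits, key=lambda hit: hit[1])[0]


characters = {"A": (0, 0, 100, 200), "B": (80, 50, 180, 150)}
print(speaker_at((10, 10), characters))    # A
print(speaker_at((90, 100), characters))   # B (inside both boxes, B is smaller)
print(speaker_at((500, 500), characters))  # None
```

Returning `None` leaves room for lines whose speaker is not drawn where the line was dropped.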

18.

Dataset Construction
Dataset construction system
Building a dataset of mappings between lines and speakers

19.

Dataset Construction
Dataset construction system
Building a dataset of mappings between lines and speakers
Number of annotations
- A total of 749,856 annotations assigned by 56 people
- An average of about 5 people evaluating each line

20.

Speaker-Line Dataset for Manga109
Result of dataset construction
Number of annotations assigned per person
Manga109: 147,918 text lines in total
Fewer annotations ➔ less validity
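The totals on the slides can be cross-checked with a little arithmetic. The sketch assumes that 147,918 is the number of text lines in Manga109 and that the 749,856 annotations reported for the 56 annotators are spread over those lines; the per-annotator figure is only a mean and says nothing about how evenly the work was distributed.

```python
total_annotations = 749_856  # annotations assigned by all annotators
text_lines = 147_918         # text lines in Manga109 (assumed meaning of the slide's total)
annotators = 56

per_line = total_annotations / text_lines
per_annotator = total_annotations / annotators

print(f"annotations per line: {per_line:.2f}")            # 5.07
print(f"annotations per annotator: {per_annotator:.0f}")  # 13390
```

The first figure agrees with the stated average of about 5 people evaluating each line.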

21.

Appropriate number of evaluators
Number of evaluators and rating agreement
© Shindou Uni, NichijouSoup

23.

Appropriate number of evaluators
Number of evaluators and rating agreement
Possibility that the speaker candidate changes
© Shindou Uni, NichijouSoup

24.

Analysis of our dataset
Result of dataset construction
Agreement rate of the annotations

25.

Appropriate number of evaluators
Percentage of lines on which the evaluations were in perfect agreement
By one annotator: Perfect Match Rate = 100%

26.

Appropriate number of evaluators
Percentage of lines on which the evaluations were in perfect agreement
By one annotator: Perfect Match Rate = 100%
By two annotators: Perfect Match Rate = 60%
By three annotators: Perfect Match Rate = 40%
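The perfect match rate can be sketched as the share of lines on which the first k annotators all chose the same speaker. The toy labels below are invented so that the rates come out at the slide's illustrative 100%, 60% and 40%; they are not taken from the dataset, and `perfect_match_rate` is a hypothetical helper.

```python
def perfect_match_rate(annotations: list[list[str]], k: int) -> float:
    """Share of lines on which the first k annotators all chose the same speaker."""
    if not annotations or k < 1:
        raise ValueError("need at least one line and k >= 1")
    agreed = sum(1 for labels in annotations if len(set(labels[:k])) == 1)
    return agreed / len(annotations)


# One row per text line; each entry is the speaker chosen by one annotator.
toy_annotations = [
    ["A", "A", "A"],
    ["B", "B", "B"],
    ["A", "A", "B"],
    ["A", "B", "B"],
    ["C", "A", "A"],
]

for k in (1, 2, 3):
    print(k, f"{perfect_match_rate(toy_annotations, k):.0%}")
# 1 100%
# 2 60%
# 3 40%
```

With a single annotator the rate is always 100%, and each added annotator can only keep it level or lower it.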

27.

Appropriate number of evaluators
Changes in the rate of perfect matches in ratings
A sufficient number of evaluators, above a certain level, is needed
Figure annotation: 10 points

28.

Analysis of our dataset
Result of dataset construction
Presence or absence of speaker in the frame

29.

Analysis of our dataset
Result of dataset construction
Presence or absence of speaker in the frame
© Akamatsu Ken, LoveHina

36.

Discussion and prospects
Speaker-Line Dataset construction system
- System for annotating even a large number of lines
Inconsistent evaluations in certain genres and scenes
- Genre: science fiction, battle
- Scene: difficult-to-grasp frames (e.g. battle scenes, darkness)
Difficulty of speaker estimation
- Inconsistency exists even in human evaluation
- Consideration of the best number of people to annotate

37.

Discussion and prospects
Efficiency of annotation assignment
- Fewer annotators for easy scenes
- More annotators for difficult scenes
Figure label: who? © Yagami Ken, HisokaReturns © Kato Masaki, ARMS

38.

Discussion and prospects
Efficiency of annotation assignment
- Fewer annotators for easy scenes
- More annotators for difficult scenes

39.

Discussion and prospects
Efficiency of annotation assignment
- Fewer annotators for easy scenes
- More annotators for difficult scenes
➔ Clarify the level of difficulty in annotation and reconsider the required number of annotators
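One possible way to assign fewer annotators to easy scenes and more to difficult ones is to keep requesting annotations for a line until agreement is high enough. The slides only state the goal, so the rule below is a sketch under assumptions: the agreement measure, the `threshold` of 0.8 and the cap of 7 annotators are hypothetical choices.

```python
from collections import Counter


def agreement(labels: list[str]) -> float:
    """Share of annotators who chose the most common speaker for a line."""
    if not labels:
        raise ValueError("need at least one label")
    return Counter(labels).most_common(1)[0][1] / len(labels)


def needs_more_annotators(labels: list[str], threshold: float = 0.8,
                          max_annotators: int = 7) -> bool:
    """True if the line still looks difficult and the annotator cap is not reached."""
    return len(labels) < max_annotators and agreement(labels) < threshold


print(needs_more_annotators(["A", "A", "A"]))  # False: easy line, everyone agrees
print(needs_more_annotators(["A", "B", "C"]))  # True: difficult line, no agreement
```

The agreement score could also serve as a rough per-line difficulty level when reconsidering how many annotators are required.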

40.

Summary
Background
- The need to recognize the components of comics
- Focus on the relationship between lines and characters (speaker estimation)
Research purpose
- Propose and develop systems to easily construct datasets
- Analysis of Speaker-Line Dataset for Manga109
Proposed methods
- The larger the target and the shorter the distance to the target, the quicker the movement
- Speakers and lines are often close together ➔ Drag and drop lines to the speaker
Dataset construction
- A total of 749,856 annotations assigned by 56 people
- An average of about 5 people evaluating each line
Analysis of datasets
- Agreement rate of the annotations
- Changes in the rate of perfect matches in ratings
- Scenes that are difficult to map
Discussion and prospects
- Inconsistent evaluations in certain genres and scenes
- Difficulty of speaker estimation
- Efficiency of annotation assignment