manpu2022(Sakurai_SpeakerLineDataset)

September 26, 2022

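Satoshi Nakamura Laboratory, Department of Frontier Media Science, School of Interdisciplinary Mathematical Sciences, Meiji University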

Text of each slide
1.

A Method to Annotate Who Speaks a Text Line in Manga and Speaker-Line Dataset for Manga109
Tsubasa Sakurai, Risa Ito, Kazuki Abe and Satoshi Nakamura
School of Interdisciplinary Mathematical Sciences, Meiji University

2.

Background
Increased research and services utilizing e-comics: automatic translation, content-based recommendation and search, spoiler prevention
➔ Various studies on the contents of comics are required
[Figure: automatic translation (Mantra)] © Akamatsu Ken, LoveHina

3.

Background
Recognition of the components of comics: the areas of comic frames, the areas of text lines and the faces of characters
[Figure: sample page with frame, text line and face regions labeled] © Akamatsu Ken, LoveHina

4.

Background
Recognition of the components of comics: the areas of comic frames, the areas of text lines and the faces of characters
Other studies on the components of comics: the content of lines, facial expressions, the speakers of lines, relationships between characters, etc.

5.

Required dataset
Focus on the relationship between lines and characters: who speaks each text line in the frame?
[Figure: text lines and character faces highlighted] © Akamatsu Ken, LoveHina

6.

Related work
Methods for automatic estimation of the speaker: estimation by distance from the tail of the speech balloon
➔ Speech balloon and speaker association for comics and manga understanding [Rigaud et al. 2015]
© Shindou Uni, NichijouSoup

7.

Related work
Methods for automatic estimation of the speaker: estimation by distance from the tail of the speech balloon
➔ Speech balloon and speaker association for comics and manga understanding [Rigaud et al. 2015]
[Figure: the tail of the speech balloon highlighted] © Shindou Uni, NichijouSoup
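The tail-distance idea can be illustrated with a minimal nearest-face heuristic. This is a simplified sketch for illustration only, not the method of Rigaud et al.: it assumes the tail tip and face bounding boxes are already known, and `Box`, `nearest_speaker` and the character ids are hypothetical names.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box in page pixel coordinates."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def center(self) -> tuple[float, float]:
        return ((self.xmin + self.xmax) / 2, (self.ymin + self.ymax) / 2)


def nearest_speaker(tail: tuple[float, float], faces: dict[str, Box]) -> str | None:
    """Pick the character whose face center is closest to the balloon tail tip.

    Simplified distance heuristic (hypothetical); not the full published method.
    Returns None when no faces are annotated on the page.
    """
    if not faces:
        return None
    return min(faces, key=lambda cid: math.dist(tail, faces[cid].center()))


# Hypothetical page: the tail tip points down toward char_a.
faces = {
    "char_a": Box(100, 320, 160, 380),
    "char_b": Box(400, 100, 460, 160),
}
print(nearest_speaker((120, 300), faces))  # char_a
```

Such a heuristic needs a balloon with a tail and a speaker near it, which is exactly what the cases on the next slides lack.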

8.

Difficulty of speaker estimation
Examples where existing methods cannot be used:
• No speech balloon
• A distant character is the speaker
© Sorata Akizuki, Snow White with the Red Hair; © Taira Masami, KuroidoGanka

9.

Difficulty of speaker estimation
Examples where existing methods cannot be used:
• No speech balloon
• A distant character is the speaker
➔ Clarify which factors to consider in machine learning and what the difficulties are
© Sorata Akizuki, Snow White with the Red Hair; © Taira Masami, KuroidoGanka

10.

Related work
The eBDtheque dataset: speaker information is available, but the dataset is small
➔ eBDtheque: A Representative Database of Comics [Rigaud et al. 2013]
The Manga109 dataset: a large amount of data, but no speaker information
➔ Sketch-based manga retrieval using manga109 dataset [Matsui et al. 2017]

11.

Related work
The Manga109 dataset: 109 comics drawn by professional cartoonists, with annotations
➔ Sketch-based manga retrieval using manga109 dataset [Matsui et al. 2017]
4 types of annotations:
• Position of the frames
• Body position and character name
• Face position and character name
• Text line position and text string
[Figure: sample page with frame, text line, body and face annotations] © Akamatsu Ken, LoveHina
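To make the four annotation types concrete, here is a small loader sketch. It assumes an XML layout modeled on Manga109's annotation files (a `book` with `characters` and `pages`, and per-page `frame`, `face`, `body` and `text` elements carrying `xmin`/`ymin`/`xmax`/`ymax`); the sample content is made up, and the exact schema should be checked against the official files.

```python
import xml.etree.ElementTree as ET

# Hand-written sample modeled on the Manga109 layout (ids, names, coordinates are made up).
SAMPLE = """
<book title="Sample">
  <characters>
    <character id="c1" name="Alice"/>
  </characters>
  <pages>
    <page index="0" width="1654" height="1170">
      <frame id="f1" xmin="0" ymin="0" xmax="800" ymax="600"/>
      <face id="fa1" xmin="100" ymin="100" xmax="200" ymax="200" character="c1"/>
      <body id="b1" xmin="80" ymin="90" xmax="300" ymax="580" character="c1"/>
      <text id="t1" xmin="250" ymin="50" xmax="320" ymax="250">Hello</text>
    </page>
  </pages>
</book>
"""


def box(elem):
    """Read the four bounding-box attributes of an annotation element."""
    return tuple(int(elem.get(k)) for k in ("xmin", "ymin", "xmax", "ymax"))


def load_annotations(xml_text):
    """Return one dict per page with frames, faces, bodies and text lines."""
    root = ET.fromstring(xml_text.strip())
    names = {c.get("id"): c.get("name") for c in root.iter("character")}
    pages = []
    for page in root.iter("page"):
        pages.append({
            "frames": [box(f) for f in page.findall("frame")],
            # Faces and bodies are linked to a character; text lines are not.
            "faces": [(box(f), names.get(f.get("character"))) for f in page.findall("face")],
            "bodies": [(box(b), names.get(b.get("character"))) for b in page.findall("body")],
            "texts": [(box(t), t.text) for t in page.findall("text")],
        })
    return pages


pages = load_annotations(SAMPLE)
print(pages[0]["texts"])  # [((250, 50, 320, 250), 'Hello')]
```

In this layout, `text` elements carry no `character` attribute, which is the missing speaker link that the Speaker-Line Dataset adds.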

12.

Research purpose
• Propose and develop a system to easily construct datasets
• Analyze the Speaker-Line Dataset for Manga109
• Identify the characteristics of comics relevant to speaker estimation
[Figure: text lines and character faces highlighted] © Akamatsu Ken, LoveHina

13.

Conventional method
How to assign annotations: manually selecting the speaker is very difficult

14.

Informational design
Speakers and lines are often close together ➔ enables quick annotation
© Akamatsu Ken, LoveHina

15.

Fitts's law
Fitts's law [Fitts 1954]: which tasks are difficult?

16.

Fitts's law
Fitts's law [Fitts 1954]: the larger the target and the shorter the distance to the target, the quicker the movement
$$T = a + b \log_2\left(\frac{D}{W} + 1\right)$$
T: movement time; D: distance to the target; W: width of the target
a: time taken for the start and end of the operation; b: effect of the mouse speed on the time taken
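The formula can be evaluated directly to compare a short drag with a long one. A minimal sketch: the constants `a` and `b` below are hypothetical placeholder values, not measurements from this study, and `fitts_time` is an illustrative helper.

```python
import math


def fitts_time(distance, width, a, b):
    """Predicted movement time T = a + b * log2(D / W + 1) (Shannon formulation)."""
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty


# Hypothetical constants: a = 0.1 s, b = 0.15 s/bit.
A, B = 0.1, 0.15
near = fitts_time(distance=70, width=10, a=A, b=B)   # ID = log2(8) = 3 bits
far = fitts_time(distance=310, width=10, a=A, b=B)   # ID = log2(32) = 5 bits
print(round(near, 2), round(far, 2))  # 0.55 0.85
```

Because speakers and their lines tend to be close together, dragging a line onto the nearby speaker keeps D small and the predicted time low.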

17.

Proposed method
How to assign annotations: drag and drop lines onto the speaker
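The slides do not show how the system records a drop, so the following is only a hypothetical sketch of one way a drop handler could map a dragged line to a speaker: hit-test the drop point against face bounding boxes and store the match. `Face`, `on_drop` and the ids are illustrative names, not the authors' code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Face:
    """Face bounding box of one character on the page (hypothetical structure)."""
    character: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, x: float, y: float) -> bool:
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


def on_drop(line_id: str, drop_x: float, drop_y: float,
            faces: list[Face], annotations: dict[str, str]) -> bool:
    """Record line_id -> character if the line was dropped onto a face.

    Returns True if an annotation was recorded, False if the drop missed.
    If face boxes overlap, the first match in list order wins.
    """
    for face in faces:
        if face.contains(drop_x, drop_y):
            annotations[line_id] = face.character
            return True
    return False


faces = [Face("c1", 100, 100, 200, 200), Face("c2", 300, 100, 400, 200)]
annotations: dict[str, str] = {}
on_drop("t1", 150, 150, faces, annotations)
print(annotations)  # {'t1': 'c1'}
```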

18.

Dataset construction
Dataset construction system: building a dataset that maps lines to speakers

19.

Dataset construction
Dataset construction system: building a dataset that maps lines to speakers
Number of annotations:
• A total of 749,856 annotations by 56 people
• An average of about 5 annotators per line

20.

Speaker-Line Dataset for Manga109
Result of dataset construction: number of annotations assigned per person
[Figure: annotations per annotator; Manga109 contains 147,918 text lines in total]
Fewer annotations ➔ less validity
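A quick consistency check of the reported totals, assuming 147,918 is the total number of text lines in Manga109: the average number of annotations per line matches "about 5 annotators per line", and the per-annotator figure is a simple mean that says nothing about how evenly the work was spread.

```latex
\frac{749{,}856\ \text{annotations}}{147{,}918\ \text{lines}} \approx 5.07\ \text{annotations per line},
\qquad
\frac{749{,}856\ \text{annotations}}{56\ \text{annotators}} \approx 13{,}390\ \text{annotations per annotator}
```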

21.

Appropriate number of evaluators
Number of annotators and rating agreement
© Shindou Uni, NichijouSoup


23.

Appropriate number of evaluators
Number of annotators and rating agreement: the chosen speaker candidate can change depending on the number of annotators
© Shindou Uni, NichijouSoup

24.

Analysis of our dataset
Result of dataset construction: agreement rate of the annotations
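The slides do not define the agreement rate precisely. A common choice, assumed here, is the share of annotators who picked the majority speaker for a line. The sketch below uses hypothetical labels and an illustrative helper name.

```python
from collections import Counter


def majority_and_agreement(labels):
    """Return (majority speaker, share of annotators who chose it) for one line.

    Ties are broken by first-seen order, as Counter.most_common does.
    """
    counts = Counter(labels)
    speaker, votes = counts.most_common(1)[0]
    return speaker, votes / len(labels)


# Hypothetical annotations for three lines (character ids are made up).
annotations = {
    "line_1": ["c1", "c1", "c1", "c1", "c1"],
    "line_2": ["c1", "c2", "c1", "c1", "c3"],
    "line_3": ["c2", "c3"],
}
for line_id, labels in annotations.items():
    print(line_id, majority_and_agreement(labels))
```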

25.

Appropriate number of evaluators
Percentage of lines on which all annotators chose the same speaker (perfect match rate)
Perfect match rate = 100% with one annotator

26.

Appropriate number of evaluators
Percentage of lines on which all annotators chose the same speaker (perfect match rate)
Perfect match rate = 100% with one annotator
Perfect match rate = 60% with two annotators
Perfect match rate = 40% with three annotators
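The perfect match rate can be computed for the first k annotators of each line, which shows why it can only stay level or fall as k grows and is trivially 100% for k = 1. A minimal sketch on hypothetical toy data (not the paper's results); skipping lines with fewer than k labels is an assumption, since the slides do not say how such lines were handled.

```python
def perfect_match_rate(annotations, k):
    """Share of lines whose first k annotators all picked the same speaker.

    Lines with fewer than k annotations are skipped (assumption).
    Returns NaN if no line has at least k annotations.
    """
    eligible = [labels[:k] for labels in annotations.values() if len(labels) >= k]
    if not eligible:
        return float("nan")
    matches = sum(1 for labels in eligible if len(set(labels)) == 1)
    return matches / len(eligible)


# Hypothetical toy data, not the paper's results.
annotations = {
    "line_1": ["c1", "c1", "c1"],
    "line_2": ["c1", "c1", "c2"],
    "line_3": ["c2", "c3", "c3"],
    "line_4": ["c1", "c1", "c1"],
}
for k in (1, 2, 3):
    print(k, perfect_match_rate(annotations, k))  # 1.0, then 0.75, then 0.5
```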

27.

Appropriate number of evaluators
Changes in the perfect match rate as the number of annotators increases
➔ The number of annotators needs to be above a certain level
[Figure label: 10 points]

28.

Analysis of our dataset
Result of dataset construction: presence or absence of the speaker in the frame

29.

Analysis of our dataset
Result of dataset construction: presence or absence of the speaker in the frame
© Akamatsu Ken, LoveHina
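The slides do not say how presence in the frame was determined. One hypothetical way, given Manga109-style boxes and a speaker label, is to find the frame containing the text line's center and check whether a face of the annotated speaker lies in that frame. Using faces (rather than bodies) as the proxy and the helper names are assumptions.

```python
def center(box):
    """Center point of an (xmin, ymin, xmax, ymax) box."""
    xmin, ymin, xmax, ymax = box
    return (xmin + xmax) / 2, (ymin + ymax) / 2


def inside(point, box):
    """True if the point lies within the box (edges inclusive)."""
    x, y = point
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def speaker_in_frame(text_box, speaker, frames, faces):
    """Check whether `speaker` has a face in the frame that contains the text line.

    frames: list of boxes; faces: list of (box, character_id) pairs.
    Returns None if the text line's center falls in no frame.
    """
    frame = next((f for f in frames if inside(center(text_box), f)), None)
    if frame is None:
        return None
    return any(ch == speaker and inside(center(box), frame) for box, ch in faces)


# Hypothetical page with two side-by-side frames.
frames = [(0, 0, 500, 500), (500, 0, 1000, 500)]
faces = [((100, 100, 200, 200), "c1"), ((600, 100, 700, 200), "c2")]
line_box = (300, 50, 350, 250)  # lies in the left frame
print(speaker_in_frame(line_box, "c1", frames, faces))  # True
print(speaker_in_frame(line_box, "c2", frames, faces))  # False
```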


31.

Scenes that are difficult to map
Specific situations (on battleships)
© Kato Masaki, ARMS

32.

Scenes that are difficult to map
Specific situations (darkness)
© Kato Masaki, ARMS

33.

Scenes that are difficult to map
Specific scenes (battle scenes)
© Oi Masakazu, Joouari

34.

Scenes that are difficult to map
Specific situations (inner monologue)
© Oi Masakazu, Joouari

35.

Scenes that are difficult to map
Unusual cases (Case Closed series): which character is speaking?
© Gosho Aoyama, Detective Conan (Case Closed)

36.

Discussion and prospects
Speaker-Line Dataset construction system
• A system for annotating even a large number of lines
Blurred (inconsistent) evaluations in certain genres and scenes
• Genre: science fiction, battle
• Scene: frames that are difficult to grasp (e.g., battle scenes, darkness)
Difficulty of speaker estimation
• Blurring exists even in human evaluation
• The best number of annotators needs to be considered

37.

Discussion and prospects
Efficiency of annotation assignment
• Fewer annotators for easy scenes
• More annotators for difficult scenes
[Figure: which character is speaking?] © Yagami Ken, HisokaReturns; © Kato Masaki, ARMS

38.

Discussion and prospects
Efficiency of annotation assignment
• Fewer annotators for easy scenes
• More annotators for difficult scenes

39.

Discussion and prospects
Efficiency of annotation assignment
• Fewer annotators for easy scenes
• More annotators for difficult scenes
➔ Clarify the level of difficulty in annotation and reconsider the required number of annotators
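One way to realize "fewer annotators for easy scenes, more for difficult scenes" is a sequential stopping rule: keep requesting annotators for a line until enough of them agree, up to a cap. This is a hypothetical sketch, not the authors' method; `ask`, `min_agree=2` and `max_annotators=5` are arbitrary illustrative choices.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Optional


def annotate_adaptively(ask: Callable[[int], str], min_agree: int = 2,
                        max_annotators: int = 5) -> tuple[Optional[str], list[str]]:
    """Request labels one annotator at a time for a single text line.

    Stops as soon as `min_agree` annotators pick the same speaker (easy lines
    finish early) or after `max_annotators` labels (hard lines get more).
    `ask(i)` returns the speaker chosen by the i-th annotator (hypothetical hook).
    Returns (speaker, labels); speaker is None if no agreement was reached.
    """
    labels: list[str] = []
    while len(labels) < max_annotators:
        labels.append(ask(len(labels)))
        speaker, votes = Counter(labels).most_common(1)[0]
        if votes >= min_agree:
            return speaker, labels
    return None, labels


# Hypothetical answer sequences for an easy and a hard line.
easy = ["c1", "c1", "c1", "c1", "c1"]
hard = ["c1", "c2", "c3", "c2", "c1"]
print(annotate_adaptively(lambda i: easy[i]))  # ('c1', ['c1', 'c1'])
print(annotate_adaptively(lambda i: hard[i]))  # ('c2', ['c1', 'c2', 'c3', 'c2'])
```

Lines that end with no agreement could be flagged as difficult, which ties in with clarifying the level of difficulty in annotation.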

40.

Summary
Background
• The need to recognize the components of comics
• Focus on the relationship between lines and characters (speaker estimation)
Research purpose
• Propose and develop a system to easily construct datasets
• Analyze the Speaker-Line Dataset for Manga109
Proposed method
• The larger the target and the shorter the distance to the target, the quicker the movement
• Speakers and lines are often close together ➔ drag and drop lines onto the speaker
Dataset construction
• A total of 749,856 annotations by 56 people
• An average of about 5 annotators per line
Analysis of the dataset
• Agreement rate of the annotations
• Changes in the perfect match rate
• Scenes that are difficult to map
Discussion and prospects
• Blurred evaluations in certain genres and scenes
• Difficulty of speaker estimation
• Efficiency of annotation assignment