September 26, 22
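Slide overview
Satoshi Nakamura Laboratory, Department of Frontier Media Science, School of Interdisciplinary Mathematical Sciences, Meiji University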
A Method to Annotate Who Speaks a Text Line in Manga and Speaker-Line Dataset for Manga109 Tsubasa Sakurai, Risa Ito, Kazuki Abe and Satoshi Nakamura School of Interdisciplinary Mathematical Sciences, Meiji University
Background Increased research and services utilizing e-comics Automatic translation, content-based recommendation and search, spoiler prevention ➔ Various studies on the contents of comics are required Automatic translation (Mantra) © Akamatsu Ken, LoveHina
Background Recognition of the components of comics The area of comic frames, the area of text lines and the face of a character Frame Line Face © Akamatsu Ken, LoveHina Line
Background Recognition of the components of comics The area of comic frames, the area of lines and the face of a character Other studies on the components of comics the content of lines, facial expressions, the speaker of the lines and relationships between characters etc.
Required dataset Focus on the relationship between lines and characters: who speaks each text line in the frame Line Face © Akamatsu Ken, LoveHina Line
Related work Methods for automatic estimation of the speaker Estimation by distance from the tail of the speech balloon ➔ Speech balloon and speaker association for comics and manga understanding [Rigaud et al. 2015] Tail of the speech balloon © Shindou Uni, NichijouSoup
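The distance-based idea above can be sketched as a nearest-neighbor lookup from the tip of the balloon tail to the characters in the frame. This is only a minimal illustration and not the method of Rigaud et al.; the `Box` class, the pixel coordinates, and the use of box centers as the distance reference are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned character box in page pixels (hypothetical layout)."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def center(self) -> tuple[float, float]:
        return ((self.xmin + self.xmax) / 2, (self.ymin + self.ymax) / 2)


def nearest_speaker(tail_tip: tuple[float, float],
                    characters: list[Box]) -> Optional[str]:
    """Return the character whose box center is closest to the tail tip."""
    if not characters:
        return None  # no candidate in the frame
    tx, ty = tail_tip

    def distance(box: Box) -> float:
        cx, cy = box.center()
        return hypot(cx - tx, cy - ty)

    return min(characters, key=distance).name


# Illustrative values only; they do not come from any dataset.
characters = [
    Box("A", 100, 200, 180, 320),
    Box("B", 400, 180, 470, 300),
]
print(nearest_speaker((200, 210), characters))  # prints "A"
```

A rule like this has nothing to measure from when a line has no balloon, and it picks the wrong character when the true speaker is far away or outside the frame.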
Difficulty of speaker estimation Examples where existing methods cannot be used © Sorata Akizuki, Snow White with the Red Hair © Taira Masami, KuroidoGanka No speech balloon Distant character is the speaker
Difficulty of speaker estimation Examples where existing methods cannot be used © Sorata Akizuki, Snow White with the Red Hair © Taira Masami, KuroidoGanka No speech balloon Distant character is the speaker ➔ Clarify the factors to consider in machine learning and what the difficulties are
Related work The eBDtheque dataset Speaker information is available, but the amount of data is small ➔ eBDtheque: A Representative Database of Comics [Rigaud et al. 2013] The Manga109 dataset Large amount of data, but no speaker information ➔ Sketch-based manga retrieval using Manga109 dataset [Matsui et al. 2017]
Related work The Manga109 dataset 109 comics drawn by professional cartoonists, with annotations ➔ Sketch-based manga retrieval using Manga109 dataset [Matsui et al. 2017] 4 types of annotations: (1) position of the frames; (2) body position and character name; (3) face position and character name; (4) text line position and text string. Figure labels: Frame, Line, Body, Face © Akamatsu Ken, LoveHina
Research purpose Propose and develop systems to easily construct datasets Analyze Speaker-Line Dataset for Manga109 Identifying the characteristics of comics for speaker estimation Line Face © Akamatsu Ken, LoveHina Line
Conventional method How to assign annotations Manual selection of speakers is very difficult
Informational Design Speakers and lines are often close together Enables quick annotation © Akamatsu Ken, LoveHina
Fitts's law Fitts's law [Fitts 1954] Which tasks are difficult?
Fitts's law Fitts's law [Fitts 1954] The larger the target and the shorter the distance to the target, the quicker the movement: $T = a + b \log_2\left(\frac{D}{W} + 1\right)$, where $D$ is the distance to the target, $W$ is the width of the target, $a$ is the time taken for start and end operations, and $b$ is the effect of mouse speed on the time taken
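As a small worked example, Fitts's law in the form T = a + b log2(D/W + 1) can be evaluated for a near target and a far target of the same width. The constants `A` and `B` below are illustrative assumptions, not values measured in this study; in practice they are fitted to pointing data for a given device and user.

```python
from math import log2


def fitts_time(distance: float, width: float, a: float, b: float) -> float:
    """Predicted movement time: T = a + b * log2(D / W + 1)."""
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    return a + b * log2(distance / width + 1)


# Assumed constants for illustration only (seconds, seconds per bit).
A, B = 0.2, 0.1

near_target = fitts_time(distance=100, width=100, a=A, b=B)  # log2(2) = 1 bit
far_target = fitts_time(distance=700, width=100, a=A, b=B)   # log2(8) = 3 bits

print(f"near: {near_target:.2f} s, far: {far_target:.2f} s")
# prints "near: 0.30 s, far: 0.50 s"
```

With these assumed constants, moving the target seven times farther away adds two bits of difficulty and 0.2 s of predicted time. This is the reason for keeping the drop target (the speaker) close to the line being dragged.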
Proposed method How to assign annotations Drag and drop lines to the speaker
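One way to picture the drag-and-drop step is as a hit test: when a line is dropped, the system looks up which character region contains the drop point. The slides do not describe the implementation, so this is a hypothetical sketch; the function name `resolve_drop`, the box format, and the "smallest box wins" tie-break are all assumptions.

```python
from typing import Optional

# (xmin, ymin, xmax, ymax) in page pixels; a hypothetical layout.
Box = tuple[float, float, float, float]


def resolve_drop(drop_point: tuple[float, float],
                 character_boxes: dict[str, Box]) -> Optional[str]:
    """Return the character whose box contains the drop point.

    If several boxes overlap at that point, the smallest one wins.
    Returns None when the line is dropped outside every character box.
    """
    x, y = drop_point
    hits = []
    for name, (xmin, ymin, xmax, ymax) in character_boxes.items():
        if xmin <= x <= xmax and ymin <= y <= ymax:
            area = (xmax - xmin) * (ymax - ymin)
            hits.append((area, name))
    return min(hits)[1] if hits else None


# Illustrative boxes only: "B" lies partly inside "A".
boxes = {
    "A": (100, 200, 180, 320),
    "B": (150, 250, 170, 300),
}
print(resolve_drop((160, 260), boxes))  # prints "B"
print(resolve_drop((0, 0), boxes))      # prints "None"
```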
Dataset Construction Dataset construction system Building a dataset of mapping between lines and speakers
Dataset Construction Dataset construction system Building a dataset of mapping between lines and speakers Number of the annotations: a total of 749,856 annotations assigned by 56 people; on average about 5 persons evaluating each line
Speaker-Line Dataset for Manga109 Result of dataset construction Number of annotations assigned per person Manga109 contains 147,918 lines in total Fewer annotations ➔ less validity
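The reported totals can be cross-checked with simple arithmetic. The sketch below assumes that 147,918 is the number of text lines in Manga109 and that 749,856 is the total number of annotations across all 56 annotators; the variable names are illustrative.

```python
TOTAL_ANNOTATIONS = 749_856  # annotations collected (from the slides)
TOTAL_LINES = 147_918        # assumed: number of text lines in Manga109
ANNOTATORS = 56              # people who took part (from the slides)

per_line = TOTAL_ANNOTATIONS / TOTAL_LINES
per_annotator = TOTAL_ANNOTATIONS / ANNOTATORS

print(f"annotations per line: {per_line:.2f}")            # prints 5.07
print(f"annotations per annotator: {per_annotator:.0f}")  # prints 13390
```

Under these assumptions, the ratio of about 5.07 is consistent with the stated average of about 5 evaluators per line.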
Appropriate number of evaluators Number of people evaluated and rating agreement © Shindou Uni, NichijouSoup © Shindou Uni, NichijouSoup
Appropriate number of evaluators Number of people evaluated and rating agreement Possibility to change the speaker candidate © Shindou Uni, NichijouSoup © Shindou Uni, NichijouSoup
Analysis of our dataset Result of dataset construction Agreement rate of the annotations
Appropriate number of evaluators Percentage of lines on which all evaluators chose the same speaker (perfect match rate): 100% with one annotator, 60% with two annotators, 40% with three annotators
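The perfect match rate can be sketched as the share of lines on which the first k annotators all chose the same speaker. The toy data below is invented so that it reproduces the 100% / 60% / 40% pattern from the slide; it is not taken from the dataset, and the function name is an assumption.

```python
def perfect_match_rate(annotations: dict[str, list[str]], k: int) -> float:
    """Share of lines whose first k annotators all chose the same speaker.

    Lines with fewer than k annotations are skipped.
    """
    eligible = [labels for labels in annotations.values() if len(labels) >= k]
    if not eligible:
        raise ValueError("no line has at least k annotations")
    matched = sum(1 for labels in eligible if len(set(labels[:k])) == 1)
    return matched / len(eligible)


# Invented toy data: line id -> speaker chosen by each annotator, in order.
annotations = {
    "line1": ["A", "A", "A"],
    "line2": ["A", "A", "B"],
    "line3": ["B", "B", "B"],
    "line4": ["A", "B", "B"],
    "line5": ["C", "A", "C"],
}

for k in (1, 2, 3):
    print(k, f"{perfect_match_rate(annotations, k):.0%}")
# prints "1 100%", "2 60%", "3 40%"
```

The rate can only stay the same or fall as annotators are added, because one dissenting choice is enough to break a perfect match.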
Appropriate number of evaluators Changes in the rate of perfect matches in ratings An appropriate number of evaluators, above a certain level, is needed (figure annotation: 10 points)
Analysis of our dataset Result of dataset construction Presence or absence of speaker in the frame
Analysis of our dataset Result of dataset construction Presence or absence of speaker in the frame © Akamatsu Ken, LoveHina
Scenes that are difficult to map Specific situations (on battleships) © Kato Masaki, ARMS
Scenes that are difficult to map Specific situations (darkness) © Kato Masaki, ARMS
Scenes that are difficult to map Specific scenes (battle scenes) © Oi Masakazu, Joouari
Scenes that are difficult to map Specific situations (inner speech) © Oi Masakazu, Joouari
Scenes that are difficult to map Unusual cases (Case Closed series): which character is speaking? © Gosho Aoyama, Detective Conan (Case Closed)
Discussion and prospects Speaker-Line Dataset construction system: a system for annotating even a large number of lines. Blurred evaluations in certain genres and scenes. Genre: science fiction, battle. Scene: difficult-to-grasp frames (e.g. battle scenes, darkness). Difficulty of speaker estimation: blurring exists even in human evaluation. Consideration of the best number of people to annotate
Discussion and prospects Efficiency of annotation assignment: fewer annotators for easy scenes, more annotators for difficult scenes. Who is speaking? © Yagami Ken, HisokaReturns © Kato Masaki, ARMS
Discussion and prospects Efficiency of annotation assignment: fewer annotators for easy scenes, more annotators for difficult scenes ➔ Clarify the level of difficulty in annotation and reconsider the required number of annotators
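One possible reading of "fewer annotators for easy scenes, more for difficult scenes" is a stopping rule: stop collecting annotations for a line as soon as the answers so far agree, and keep adding annotators otherwise. The slides leave the actual policy as future work, so this is a hypothetical sketch; the thresholds `min_votes` and `max_votes` are assumed values.

```python
def annotators_needed(votes: list[str],
                      min_votes: int = 2,
                      max_votes: int = 5) -> int:
    """Number of votes used before the stopping rule fires.

    Votes are read in order. Collection stops once at least `min_votes`
    have been read and all of them agree; otherwise it continues until
    `max_votes` (or until the available votes run out).
    """
    limit = min(max_votes, len(votes))
    for n in range(min_votes, limit + 1):
        if len(set(votes[:n])) == 1:
            return n  # unanimous so far: treat the line as easy
    return limit      # never unanimous: use every vote allowed


easy_line = ["A", "A", "A", "A", "A"]
hard_line = ["A", "B", "A", "C", "A"]

print(annotators_needed(easy_line))  # prints 2
print(annotators_needed(hard_line))  # prints 5
```

Under this assumed rule, an easy line costs two annotations and a contested line costs five. The annotation effort then goes to the scenes where human judgments are blurred.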
Summary
Background: The need to recognize the components of comics; focus on the relationship between lines and characters (speaker estimation)
Research purpose: Propose and develop systems to easily construct datasets; analysis of Speaker-Line Dataset for Manga109
Proposed methods: The larger the target and the shorter the distance to the target, the quicker the movement; speakers and lines are often close together ➔ Drag and drop lines to the speaker
Dataset construction: A total of 749,856 annotations assigned by 56 people; average of about 5 persons evaluating per line
Analysis of datasets: Agreement rate of the annotations; changes in the rate of perfect matches in ratings; scenes that are difficult to map
Discussion and prospects: Blurred evaluations in certain genres and scenes; difficulty of speaker estimation; efficiency of annotation assignment