November 14, 19
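Slide overview
ICEC presentation material
Satoshi Nakamura Laboratory, Department of Frontier Media Science, School of Interdisciplinary Mathematical Sciences, Meiji University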
Music Video Clip Impression Emphasis Method by Font Fusion Synchronized with Music Kosuke Nonaka (Meiji University) Junki Saito, Satoshi Nakamura (Meiji University) HII
Background We have many opportunities to listen to music, and songs usually have lyrics.
Background The design of lyrics is usually static throughout a music video, as if the text were merely conveying the content of the lyrics. This can cause problems such as giving viewers a boring impression.
Background Lyric Speaker (COTODAMA Inc.) Various ways of visualizing lyrics have been proposed. They are useful for extending the auditory and visual experience.
Related Work : Influence on Content Comic / Poster Creators choose appropriate fonts to emphasize the content.
Related Work : Influence on Content We want to make not only comics and posters but also music videos richer.
Related Work : Selecting a suitable font Selecting an existing font FontMatcher [Choi et al. 2018] The system suggests suitable fonts for a design according to an input image. The fonts suggested by the system were rated on par with fonts selected by designers.
Related Work : Character Design Creating a new font Fusing a basic font with any font [Suveeranont et al. 2010] Fusing fonts with similar shapes on a two-dimensional map [Campbell et al. 2014] They focus only on the shape of the font.
Our purpose Enhancing impression Selecting a font Creating a new font We want to create a font that fits the impression of the music video by blending several fonts, rather than selecting an existing font.
Our purpose We want to make the viewing experience of music videos richer with fonts. To realize this... We investigated whether blended fonts can enhance the impression more than existing fonts.
Proposed method Our final goal Font Blending system 1. Input a music video. 2. Estimate its impression. 3. Generate a blended font by fusing four appropriate fonts. Output
Proposed method Our actual method Font Blending system Input music video We constructed an impression dataset. We applied Saito's method to blend fonts. Output
Dataset construction All participants were Japanese, so we used Japanese text and music videos. 19 participants read a text set in a given font, watched music videos, and rated the impressions C1 to C6 on a 5-point scale. C1 Grand C2 Vigorous C3 Sad C4 Violence C5 Funny C6 Cute
14 fonts used in this study: fonts used in Saito's method, and fonts added by us in this study
Impression classes 5 impression classes from MIREX [Music Information Retrieval Evaluation eXchange] + "cute" [Yamamoto et al. 2016] Adjectives C1 grand, massive C2 vigorous, exciting C3 sad, painful C4 violence, aggressive C5 funny, unique C6 cute, lovely
Proposed method Fontender [Saito et al. 2016] 2 impression classes Input from user Fusing 4 nearest fonts Blended result Cute Grand
Proposed method Our method 6 impression classes Impression of a music video as input Fusing 4 nearest fonts Blended result Cute Grand
Blending algorithm 1. We expressed each character as a numerical formula. stroke $x = a_0 \cos 0t + b_0 \sin 0t + a_1 \cos 1t + \cdots$, $y = c_0 \cos 0t + d_0 \sin 0t + c_1 \cos 1t + \cdots$ https://nkmr-lab.github.io/Char2Fourier/
Blending algorithm 1. We expressed each character as a numerical formula. stroke $x = a_0 \cos 0t + b_0 \sin 0t + a_1 \cos 1t + \cdots$, $y = c_0 \cos 0t + d_0 \sin 0t + c_1 \cos 1t + \cdots$ diameter $z = e_0 \cos 0t + f_0 \sin 0t + e_1 \cos 1t + \cdots$ By regarding a character as a set of circles, we can express the stroke weight.
Blending algorithm 2. The blended font was generated by taking a weighted average of the coefficients. stroke $x = a_0 \cos 0t + b_0 \sin 0t + a_1 \cos 1t + \cdots$, $y = c_0 \cos 0t + d_0 \sin 0t + c_1 \cos 1t + \cdots$ diameter $z = e_0 \cos 0t + f_0 \sin 0t + e_1 \cos 1t + \cdots$
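The weighted averaging of coefficients can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `eval_series` and `blend_coeffs`, the coefficient values, and the weights are all hypothetical. It shows only the x series; the same averaging would apply to the y and z (diameter) coefficients.

```python
import math


def eval_series(cos_coeffs, sin_coeffs, t):
    """Evaluate sum over k of cos_coeffs[k]*cos(k*t) + sin_coeffs[k]*sin(k*t)."""
    return sum(
        a * math.cos(k * t) + b * math.sin(k * t)
        for k, (a, b) in enumerate(zip(cos_coeffs, sin_coeffs))
    )


def blend_coeffs(coeff_lists, weights):
    """Weighted average of several coefficient lists of equal length."""
    if len(coeff_lists) != len(weights):
        raise ValueError("need exactly one weight per font")
    n = len(coeff_lists[0])
    if any(len(c) != n for c in coeff_lists):
        raise ValueError("all fonts must use the same number of coefficients")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        sum(w * c[k] for w, c in zip(weights, coeff_lists)) / total
        for k in range(n)
    ]


# Hypothetical cosine (a_k) and sine (b_k) coefficients of x(t) for the
# same stroke in two fonts; real values would come from the font outlines.
a_font1, b_font1 = [0.0, 1.0, 0.5], [0.0, 0.5, 0.0]
a_font2, b_font2 = [2.0, 3.0, 0.5], [0.0, 1.5, 1.0]
weights = [3.0, 1.0]  # assumed rates: font 1 at 75%, font 2 at 25%

a_blend = blend_coeffs([a_font1, a_font2], weights)
b_blend = blend_coeffs([b_font1, b_font2], weights)
x_at_1 = eval_series(a_blend, b_blend, 1.0)  # blended x coordinate at t = 1
```

Because the series is linear in its coefficients, averaging the coefficients gives the same curve as averaging the curves point by point, which is why the fonts must share the same stroke structure.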
Blending algorithm Number of fonts : 4 These 4 fonts were the nearest to the impression of the music video. The closer a font is to the impression of the music video, the higher its blending rate. 6-dimensional space defined by the 6 impression classes
Blending algorithm Blending rate The closer a font is to the impression of the music video, the higher its blending rate. Font A Font B
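One way to turn "the closer, the higher the rate" into numbers is sketched below. The slides do not state the actual weighting formula, so the inverse-distance weighting here is an assumption, as are the function names, the font names, and the impression vectors (ordered C1 to C6).

```python
import math


def nearest_fonts(target, font_impressions, k=4):
    """Return the k (name, distance) pairs closest to target, nearest first."""
    dists = [(name, math.dist(target, vec)) for name, vec in font_impressions.items()]
    return sorted(dists, key=lambda pair: pair[1])[:k]


def blend_rates(neighbors, eps=1e-9):
    """Assumed scheme: rate proportional to 1 / distance, normalized to sum to 1."""
    inverse = [(name, 1.0 / (dist + eps)) for name, dist in neighbors]
    total = sum(w for _, w in inverse)
    return {name: w / total for name, w in inverse}


# Hypothetical impression vectors (C1..C6) for five fonts.
fonts = {
    "font_a": (2, 0, 0, 2, 0, 0),
    "font_b": (1, 1, 0, 1, 0, 0),
    "font_c": (0, 0, 2, 0, 0, 2),
    "font_d": (0, 2, 0, 0, 1, 0),
    "font_e": (0, 0, 1, 0, 2, 1),
}
video_impression = (2, 0, 0, 1, 0, 0)  # hypothetical: grand and somewhat violent

neighbors = nearest_fonts(video_impression, fonts, k=4)
rates = blend_rates(neighbors)
```

The resulting rates could then serve as the weights when averaging the coefficients of the four selected fonts.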
Demo Music video impression: Grandness, Violence -> bold font. Music video impression: Sadness, Cuteness -> thin, rounded font.
Experiment All participants were Japanese, so we used music videos with Japanese lyrics. 19 participants 54 videos (3 fonts applied to each of 18 videos) Proposed method Blended font Blending 4 fonts on the basis of impression values Baseline method Neighborhood font The existing font determined to be closest to the music video impression Impression_0 font A font generated with the fusion input value set to 0
Experiment 19 participants, 54 videos (3 fonts applied to each of 18 videos). Participants rated C1 (grandness), C2 (vigorousness), C3 (sadness), C4 (violence), C5 (funniness), and C6 (cuteness) on a scale from -2 to +2, and answered "Does the font match the music video?" on a scale from -2 to +2.
Result (matching degree) High matching degree C2(vigorous), C6(cute) Low matching degree C1(grand), C4(violence)
Investigating why inappropriate fonts were generated Was the selected font appropriate? Did the degree of enhancement depend on impression classes? Did the font really match?
Was the selected font appropriate? [Graph: font impression values sorted in ascending order for each impression class, C3 (sad) and C4 (violence)] Very few fonts were rated highly.
Was the selected font appropriate? C3 (sad) The same font was selected every time. For music videos with high impression values for C3 (sad) and C4 (violence), the same font may have been selected every time, so the blending result might be biased. -> We should select fonts with the variation of impression values in mind.
Did the degree of enhancement depend on impression classes? The distribution of font impression values is uneven. We examined which impression classes tend to be enhanced.
Did the degree of enhancement depend on impression classes? C3(sadness), C5(funniness), C6(cuteness) -> expressible by font design C1 (grandness), C2(vigorousness), C4(violence) -> difficult to express
Analysis focusing on matching degree In our method, we blend 4 fonts... This font carries an unnecessary impression! Blending it in produces a font with a low matching degree!
Preventing the generation of inappropriate fonts 6-dimensional space defined by the 6 impression classes Setting a threshold distance for blending We plan to avoid fusing fonts that are too far apart.
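The planned threshold could look like the sketch below. The slides describe this only as a plan, so everything here is an assumption: the function name, the Euclidean distance, the threshold value of 2.0, and the example impression vectors (ordered C1 to C6).

```python
import math


def select_fonts_for_blending(target, font_impressions, k=4, max_distance=2.0):
    """Pick up to k nearest fonts, dropping any farther than max_distance.

    Returns names ordered from nearest to farthest. The list may hold fewer
    than k names, or none if every font is beyond the threshold.
    """
    ranked = sorted(
        (math.dist(target, vec), name) for name, vec in font_impressions.items()
    )
    return [name for dist, name in ranked[:k] if dist <= max_distance]


# Hypothetical impression vectors (C1..C6) for five fonts.
fonts = {
    "font_a": (2, 0, 0, 2, 0, 0),
    "font_b": (1, 1, 0, 1, 0, 0),
    "font_c": (0, 0, 2, 0, 0, 2),
    "font_d": (0, 2, 0, 0, 1, 0),
    "font_e": (0, 0, 1, 0, 2, 1),
}
video_impression = (2, 0, 0, 1, 0, 0)

selected = select_fonts_for_blending(video_impression, fonts, k=4, max_distance=2.0)
```

With these example values only the two nearest fonts pass the threshold, so the distant fonts that would add an unrelated impression are left out of the blend.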
Analysis focusing on matching degree Font with a low matching degree! We want to demonstrate the utility of blended fonts, so an analysis restricted to fonts with a high matching degree is needed.
Result The degree of matching with the music video is greatly affected by the blended font. The degree of enhancement was calculated as the difference from the value obtained with the impression_0 font. High matching degree Low matching degree
Result When an existing font is used, the degree of enhancement is almost constant. The degree of enhancement was calculated as the difference from the value obtained with the impression_0 font. High matching degree Low matching degree
Result When a blended font is used, the degree of enhancement is affected by the degree of matching. The degree of enhancement was calculated as the difference from the value obtained with the impression_0 font. High matching degree Low matching degree
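The degree of enhancement described on these slides is a per-class difference, which can be written out as below. The function name and the rating values are hypothetical and are not results from the experiment.

```python
def enhancement(ratings_with_font, ratings_with_impression0):
    """Per-class difference: rating with the tested font minus the impression_0 rating."""
    return {
        cls: ratings_with_font[cls] - ratings_with_impression0[cls]
        for cls in ratings_with_font
    }


# Hypothetical mean ratings on the -2..+2 scale for one music video.
blended = {"C3": 1.5, "C6": -0.5}
impression0 = {"C3": 0.5, "C6": 0.25}

delta = enhancement(blended, impression0)
```

A positive value means the font strengthened that impression relative to the impression_0 font, and a negative value means it weakened it.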
Analysis focusing on matching degree If the font matches the mood of the music video -> it can enhance the impression. If the font doesn't match the mood of the music video -> it suppresses the impression instead.
Application example We suggest some application examples... Applying our method to karaoke, with fonts that reflect the impression Users may be able to sing more emotionally. We want to link our method with a system that automatically estimates the impressions of lyrics and music videos This would make it possible to generate blended fonts automatically and efficiently.
Challenges / Future work The experiments also revealed some challenges. It is necessary to select fonts with the variation of impression values in mind. We should add fonts to fill the gaps in C3 (sad) and C4 (violence). We should try other presentation methods and fusion parameters. We want to use color and animation in the future. It may also be effective to use the content of the lyrics. [Sato et al. 2016]
Limitation This fusion method cannot be used when the number of strokes or the stroke order differs between fonts. Result of averaging
Summary Purpose To investigate whether impression can be emphasized Result C3(sadness), C5(funniness), C6(cuteness) -> expressible by font design C1 (grandness), C2(vigorousness), C4(violence) -> difficult to express The degree of matching is greatly affected by the blended font.