777 Views
September 13, 24
スライド概要
Spoilers of sports matches reduce the enjoyment of time-shifted viewing. On YouTube, users who like sports often inadvertently know the outcomes of matches by seeing thumbnails of recommended sports videos. Therefore, this paper focused on YouTube video thumbnails and verified the possibility of detecting images that contain spoiler information on YouTube. We constructed a dataset of sports spoiler images comprising 4,531 thumbnails from baseball, soccer, and basketball. In addition, we proposed three detection methods: the Image-Recognition method using optical character recognition (OCR), emotion assessment, and posture assessment; the Vision-Direct method using the OpenAI Vision API only; and the Vision-Text method that judges using the spoiler dictionary for an image’s description by the OpenAI Vision API. We evaluated the accuracy of these methods, and our results indicated that the Vision-Text method achieved an accuracy of 85% in detecting spoiler images. Furthermore, the evaluation results indicated that the Vision-Text method might be the most effective for detecting spoiler images in baseball and soccer. In contrast, the Vision-Direct method seems to be the most effective in basketball.
明治大学 総合数理学部 先端メディアサイエンス学科 中村聡史研究室
2024.09.13 CollabTech 2024 Detecting Sports Spoiler Images on YouTube Yuichiro Kinoshita, Takumi Takaku, Satoshi Nakamura Meiji University, Japan 1
Which Team Won? 2
Spoiler Problems Spoilers: Information that reveals key details of content Spoilers negatively impact people’s experiences. • In movies: Reduce the desire to watch the film [Tsang+ 2009] • In comics: Decrease the interest in continuing the story [Maki+ 2018] • In sports: Diminish the enjoyment of watching the game [Shiratori+ 2018] This study focuses on spoilers in sports 3
Online Sports Viewing Many people prefer to watch matches in real time. Time differences or personal reasons often hinder live viewing. People use recordings or rebroadcasts. Time differences At 8 p.m. in Spain At 3 a.m. in Japan 4
Spoilers in Sports When watching a match later, there is a risk of encountering spoilers about the match outcome. Spoilers reduce tension and enjoyment [Shiratori+ 2018] W.Davis @spoilertweets・1m GOAL!! Spain2-1France #EURO2024 Spoilers on news sites (https://edition.cnn.com) Spoilers on Twitter (X) 5
Related Work on Sports Spoilers Blocking spoilers on the Web by masking the text[Nakamura+ 2012] Detecting Twitter (X) posts containing spoilers about baseball [Sasano+ 2019] and football [Jeon+ 2016] [Shiratori+ 2018] Previous studies have focused on text-based spoilers 6
Spoilers in Images The match outcome can be revealed through images. No method for preventing spoilers through images Displaying the final score Showing the standout player 7
Study Goal & Approach Study goal: Preventing image-based spoilers We verified the possibility of detecting spoiler images by ①Constructing a dataset of sports spoiler images ②Analyzing the characteristics of spoiler images ③Proposing and evaluating the performance of three methods for detecting spoiler images 8
Targeted Media Focused on YouTube thumbnails of sports highlight videos Why YouTube Thumbnails? Avoiding spoilers on YouTube is especially difficult due to its recommendation algorithm. 9
Encountering Spoilers on YouTube If you often watch FC Barcelona videos, YouTube is likely to recommend FC Barcelona videos in any situation. Encountering spoilers through the thumbnails of recommended videos Spoiler!! 10
Dataset Construction Focused on baseball, football, and basketball Collecting thumbnails from 13 YouTube channels using YouTube Data API Totaling 4,531 thumbnails (around 1,500 for each sport) 11
Definition of Spoiler Images The criteria for determining spoilers vary among individuals. There are various definitions of spoilers, but the definition varies depending on the study [Maki+ 2018] [Shiratori+ 2018] We defined spoiler images as images that enable the prediction of match outcomes based on the results of a preliminary annotation. 12
Levels of Spoilers The level of spoilers varies depending on the image. • Including the final score • Clearly revealing the outcome • Showing players only • Merely allowing speculation about the result 13
Establishing Three Spoiler Levels • Match outcomes cannot be predicted • Match outcomes can be somewhat predicted • Match outcomes are clearly predicted Annotating images by choosing one from three spoiler levels 14
Spoiler Label Annotation Each of the 3 annotators evaluated all 4,531 images. We determined an image as a spoiler when: • 2 or more annotators selected “Match outcomes can be somewhat predicted” • At least 1 annotator selected “Match outcomes are clearly predicted” 15
Annotation Results The label agreement rate: 0.78 The proportion of spoiler images: 0.24 Number of images Proportion of spoilers Baseball 1,506 0.19 Football 1,620 0.58 Basketball 1,405 0.20 Higher proportion compared to other sports 16
Spoiler and Non-Spoiler Images Spoiler images Non-spoiler images 17
Four Characteristics of Spoiler Images • The final match outcome is displayed • Players’ expressions include smiling or shouting • Players strike poses that express joy or excitement • Players from the same team gather to celebrate 18
Detection Methods of Spoiler Images Based on the characteristics of spoiler images, we propose 3 detection methods: • Image-Recognition Method • Vision-Direct Method • Vision-Text Method 19
Image-Recognition Method Leveraging the Google Cloud Vision API and YOLOv8 • OCR: Identifying scores • Emotion assessment: Detecting smiles or shouting • Pose assessment: Recognizing raising or spreading arms OCR Emotion assessment Pose assessment
Vision-Direct Method Leveraging the OpenAI Vision API (gpt-4-vision-preview) Using the API’s responses directly as the detection result Vision Spoiler / Non-Spoiler Prompt Please analyze this YouTube video thumbnail and determine if it is a spoiler or nonspoiler image. Define a spoiler image as one that reveals the outcome of an event, characterized by the presence of a score or result-related words, players exhibiting emotions of joy or triumph, such as smiling or cheering poses. If the image is a spoiler, respond with ‘Spoiler.’ If it is a non-spoiler, respond with ‘Non-spoiler.’ Do not output 21
Vision-Text Method Leveraging the OpenAI Vision API (gpt-4-vision-preview) Converting images to text and determining spoiler images through word matching with a spoiler dictionary Vision Word matching The image shows … Spoiler / Non-spoiler Prompt This image is a thumbnail for a YouTube video. Please describe this image. 22
Evaluation of Proposed Methods Evaluating the performance of our proposed methods using the constructed dataset Evaluation metrics • Recall • Precision • F1 score 23
Results Using the Entire Dataset Image-Recognition Recall Precision F1 score 0.90 0.39 0.55 Vision-Direct Vision-Text 0.72 0.75 0.74 0.80 0.76 0.78 Vision-Text > Vision-Direct > Image-Recognition (F1 score) Image-Recognition: The highest recall but lowest precision Determining most images as spoilers 24
Results for Each Sport (F1 Score Only) Image-Recognition Baseball Football Basketball 0.39 0.77 0.35 Vision-Direct Vision-Text 0.57 0.78 0.82 0.65 0.83 0.75 Vision-Text: Top F1 score in baseball & football Vision-Direct: Top F1 score in basketball F1 scores varied significantly across different sports. Football tended to have higher F1 score than other sports. 25
Variations in Results Across Sports Scoring frequency might have influenced the differences in results. Football Low scoring frequency The team that scored is likely to win Images capturing scoring scenes are likely considered spoilers Baseball & Basketball High scoring frequency Difficult to predict match results Need to consider the context in which the points were scored 26
Misdetection of Image-Recognition High recall and low precision Identifying most images as spoilers A very high number of false positives due to overly simplistic detection of poses expressing joy or excitement 27
Misdetection of Vision-Direct Seems to be an effective method (F1 score: 0.74) Configuring the prompt to output only detection results Unable to clarify the causes of misdetection False positive False negative 28
Misdetection of Vision-Text Appears to be the most effective method (F1 score: 0.78) Some false positives due to the reliance on word matching The image is showing a scene from a soccer match. …Their dynamic postures suggest they are either running or about to make contact with the ball. …The vivid imagery, team details, and action pose of the players are all aimed at capturing the excitement of the game. Example of output (false positive) 29
Future Work Improving the accuracy of the Vision-Text method Word matching outputs Evaluating the number and frequency of words from the spoiler dictionary in Expanding the dataset Increasing the variety of sports Collecting images from news sites 30
Summary Background: Spoilers through seeing YouTube thumbnails Purpose: Detecting sports spoiler images Methods: Image-Recognition, Vision-Direct, & Vision- Text Results: Vision-Text achieved the highest F1 score at 0.78 Future Work: Improving the accuracy of the Vision-Text Expanding the dataset 31