DIS-CO: Discovering Copyrighted Content in VLMs Training Data

1Carnegie Mellon University, 2Instituto Superior Técnico, 3UC Berkeley

Motivated by the hypothesis that a VLM can recognize images from its training data, we propose DIS-CO, a novel approach to infer whether copyrighted content was included during the model's development.

Introduction

Large Vision-Language Models (VLMs) are trained on vast datasets scraped from the web, often with little transparency regarding data sources. This raises ethical and legal concerns, particularly when copyrighted content is suspected of having been used.

So how can we verify whether a VLM has seen a specific copyrighted work without access to its training data?

The Key Idea:

In a black-box setting, where model attributes like token probabilities are inaccessible, the most reliable way to determine whether a model has seen specific content is to make it reveal knowledge that goes beyond general understanding.
To achieve this, we need a task designed with two key properties:

  • The success rate is low if the target content was not in training. This helps reduce false positives and ensures that models are not merely guessing correctly by chance.
  • A high success rate indicates the target content was in training. If a model consistently provides correct responses, it strongly suggests prior exposure to that content.
We propose a Frame-to-Title Prediction task. Simply put, if a model was trained on a particular movie, it should be able to recognize frames from that movie. If it wasn’t, it should have a much harder time identifying them.
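To make the task concrete, here is a minimal sketch of a single frame-to-title query against a chat-style VLM API. The prompt wording and model choice are illustrative, not necessarily the paper's exact setup:

```python
import base64
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def frame_to_title(image_path: str, model: str = "gpt-4o") -> str:
    """Ask a chat-style VLM which movie a single frame comes from."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                # Illustrative prompt; the paper's wording may differ.
                {"type": "text",
                 "text": "Which movie is this frame from? Answer with the title only."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()

# e.g. frame_to_title("frames/suspect_movie/frame_001.jpg")
```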

If you are still unsure about this idea, below you'll find an interactive challenge where you can try the same task the models performed. Can you score better than GPT-4o? 👀

Movie Guessing Quiz

[Interactive quiz: guess the movie title from a single frame. Available on the project page.]

Whether you completed the full quiz or just explored a few examples, we hope the key idea came across: this task isn't easy! And for the cases you got right, think about it: had you seen those movies before? 🤔

What we found particularly intriguing is that even with these neutral frames, the models still manage to identify the source movie fairly regularly.

This surprising ability led us to investigate the phenomenon more systematically, ultimately shaping our approach and leading to the development of DIS-CO.

DIS-CO

We begin by assembling our MovieTection benchmark, where we collect 14,000 movie frames, categorized into main and neutral types to introduce varying levels of difficulty. For each frame, we also generate a corresponding caption using the Qwen2-VL 7B model.
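For the captioning step, a minimal sketch of how a frame could be captioned with Qwen2-VL 7B via Hugging Face transformers follows; the prompt and generation settings are our assumptions, not necessarily the exact configuration used for MovieTection:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

def caption_frame(image_path: str) -> str:
    """Generate a free-form caption for one movie frame."""
    image = Image.open(image_path).convert("RGB")
    messages = [{"role": "user", "content": [
        {"type": "image"},
        # Illustrative prompt; the paper's captioning prompt may differ.
        {"type": "text", "text": "Describe this image in detail."},
    ]}]
    prompt = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(text=[prompt], images=[image],
                       return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    # Decode only the newly generated tokens, not the prompt.
    generated = output[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(generated, skip_special_tokens=True)[0]
```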

Models are then queried with both the image frames and their corresponding captions, generating free-form predictions for each. We can then refine our detection of suspect content by excluding cases where the image-based predictions overlap with caption-based ones.
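Concretely, this overlap removal can be seen as a set difference over per-frame hits. A minimal sketch, assuming predictions are stored as frame-id to free-form-answer dictionaries and using a simple substring match (the paper's matching criterion may be stricter):

```python
def floor_disco_hits(image_preds: dict, caption_preds: dict, title: str) -> set:
    """Frames kept by the filtered variant: identified correctly from the
    image alone, excluding frames whose caption already gave the movie away."""
    def is_hit(pred: str) -> bool:
        # Illustrative matching rule; the paper's criterion may differ.
        return title.lower() in pred.lower()

    image_hits = {fid for fid, p in image_preds.items() if is_hit(p)}
    caption_hits = {fid for fid, p in caption_preds.items() if is_hit(p)}
    return image_hits - caption_hits
```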

Our rationale is that when a model correctly identifies a movie based solely on its caption, the frame is likely highly representative of the movie: even a textual description provides enough clues for the model to make an accurate guess from general knowledge acquired from public data (e.g., OpenSubtitles), rather than from memorization.

Finally, when deciding whether a suspect movie was part of a model's training data, we compare its task performance against a baseline value reflecting general movie knowledge. If a movie's recognition rate is significantly higher than that baseline, especially after removing cases where captions alone sufficed for identification, it strongly suggests the model was exposed to that content during training.
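A sketch of this decision rule, with an illustrative margin rather than the paper's exact threshold:

```python
def is_suspect(n_hits: int, n_frames: int, clean_baseline: float,
               margin: float = 0.05) -> bool:
    """Flag a movie when its overlap-filtered recognition rate clearly
    exceeds the baseline measured on clean movies. The margin value is
    illustrative, not the paper's exact setting."""
    return n_hits / n_frames > clean_baseline + margin
```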

Key Findings

DIS-CO is the better approach ⭐

Accuracy scores on MovieTection (Suspect Split, Neutral Frames)

             GPT-4o   Gemini-1.5 Pro   LLaMA-3.2 90B   Qwen2-VL 72B
Captions      0.128        0.079            0.078          0.075
MCQA          0.721        0.550            0.540          0.617
⌊DIS-CO⌋      0.226        0.152            0.134          0.122
DIS-CO        0.338        0.209            0.176          0.176
Note. The two DIS-CO variants differ in whether the overlap between caption-based and image-based predictions was removed. ⌊DIS-CO⌋ applies this removal, while DIS-CO considers all frames.
  • Multiple-Choice (MCQA) achieves high accuracy on suspect movies but also produces false positives on clean movies. This tradeoff reduces its reliability, since it misclassifies clean content as suspect.
  • Captions are weak indicators of memorization. Their suspect-level accuracy remains low across models, with Qwen2-VL 72B performing below 10%, highlighting their limitations.
  • Both DIS-CO variants improve detection for suspect movies, while clean movies are rarely identified, indicating that DIS-CO produces virtually no false positives.

Popular / Higher-Quality movies are more easily recognized 📈

  • Our analysis shows that box-office success correlates with the models' performance on the task, suggesting that popular movies are more frequently included in training data.
  • Higher IMDb-rated movies also tend to be more recognizable, indicating that quality can serve as a proxy for the likelihood of memorization (see the sketch below).
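A hypothetical version of this analysis, assuming per-movie records with accuracy, box-office, and IMDb-rating fields (names are ours) and using a rank correlation:

```python
from scipy.stats import spearmanr

def popularity_correlations(movies: list[dict]) -> dict:
    """Rank-correlate per-movie recognition accuracy with popularity
    and quality proxies. Field names are illustrative."""
    acc = [m["accuracy"] for m in movies]
    rho_box, p_box = spearmanr([m["box_office"] for m in movies], acc)
    rho_imdb, p_imdb = spearmanr([m["imdb_rating"] for m in movies], acc)
    return {"box_office": (rho_box, p_box), "imdb_rating": (rho_imdb, p_imdb)}
```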

It's possible to prevent memorization leaks 🙊

  • We found that replacing a movie's name with a neutral label (e.g., "Copyrighted Content") and fine-tuning on this label effectively suppresses memorization disclosure in our task.
  • A small number of fine-tuning frames (~600) is sufficient to suppress memorization of a specific movie, making this a practical and efficient mitigation strategy (a sketch of the data preparation is shown below).
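A sketch of how such fine-tuning pairs might be assembled, with a hypothetical record schema; the actual fine-tuning setup (prompt format, trainer) follows whatever framework the target model uses:

```python
def neutralize_labels(frames: list[dict],
                      neutral_label: str = "Copyrighted Content") -> list[dict]:
    """Map frames of a target movie to a neutral label instead of its
    real title. The record schema here is illustrative."""
    return [
        {
            "image": frame["image_path"],
            "prompt": "Which movie is this frame from?",
            "target": neutral_label,  # replaces the real movie title
        }
        for frame in frames
    ]

# Roughly 600 frames of the target movie sufficed in the authors' experiments.
```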

BibTeX

@misc{duarte2025disco,
  title={{DIS-CO: Discovering Copyrighted Content in VLMs Training Data}},
  author={André V. Duarte and Xuandong Zhao and Arlindo L. Oliveira and Lei Li},
  year={2025},
  eprint={2502.17358},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2502.17358},
}