I am a second-year PhD student in the Dual Degree program between Carnegie Mellon University & Instituto Superior Técnico, supervised by Prof. Lei Li and Prof. Arlindo Oliveira.
Before starting my PhD, I completed my undergraduate studies in Electrical and Computer Engineering and a master's in Data Science, both at Instituto Superior Técnico.
My research is mainly focused on the Security and Privacy of GenAI models, with an emphasis on Membership Inference Attacks. In other words: trying to figure out whether specific data was used to train a certain model!
I'm always happy to connect and collaborate. If you work in a similar area or see potential for us to work together, feel free to reach out :)
André V. Duarte, Xuandong Zhao, Arlindo L. Oliveira, Lei Li
Under review, 2025
DIS-CO identifies copyrighted content in the training data of VLMs by showing that models can link movie frames to their titles in a free-form text generation setting, even when the frames are highly challenging, which suggests prior exposure during training.
André V. Duarte, João Marques, Miguel Graça, Miguel Freire, Lei Li, Arlindo L. Oliveira
Findings EMNLP 2024
LumberChunker is a document segmentation method that uses LLMs to improve retrieval by splitting documents into contextually coherent, variable-sized chunks.
André V. Duarte, João Marques, Miguel Graça, Miguel Freire, Lei Li, Arlindo L. Oliveira
Proceedings of ICML 2024; Best Scientific Paper Award at Responsible AI Forum
DE-COP is a novel method for identifying copyrighted content in LLM training datasets. It works by showing that a model can recognize verbatim text excerpts it saw during training, and it applies to models both with and without logit outputs.