Proceedings of the Second Workshop on Privacy in NLP

Oluwaseyi Feyisetan, Sepideh Ghanavati, Shervin Malmasi, Patricia Thaine (Editors)


Anthology ID:
2020.privatenlp-1
Month:
November
Year:
2020
Address:
Online
Venues:
EMNLP | PrivateNLP
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/2020.privatenlp-1

Surfacing Privacy Settings Using Semantic Matching
Rishabh Khandelwal | Asmit Nayak | Yao Yao | Kassem Fawaz

Online services provide privacy settings that give users control over their data. However, these settings are often hard to locate, so users fall back on provider-chosen default values. In this work, we train privacy-settings-centric encoders and leverage them to create an interface that allows users to search for privacy settings using free-form queries. To achieve this goal, we create a custom semantic-similarity dataset consisting of real user queries covering various privacy settings, and use it to fine-tune a state-of-the-art encoder. Using this fine-tuned encoder, we perform semantic matching between user queries and privacy settings to retrieve the most relevant setting. Finally, we use the encoder to generate embeddings of privacy settings from the top 100 websites and perform unsupervised clustering to characterize the types of online privacy settings. We find that the most common types of privacy settings are ‘Personalization’ and ‘Notifications’, with coverage of 35.8% and 34.4%, respectively, in our dataset.
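The retrieval step described in the abstract can be illustrated with a minimal sketch: embed the query and each candidate setting, then return the setting with the highest cosine similarity. The paper uses a fine-tuned neural encoder; here a toy bag-of-words embedding stands in for it, and the setting names are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words embedding; the paper fine-tunes a neural
    # sentence encoder instead, but the retrieval logic is the same.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, settings):
    # Return the setting whose embedding is most similar to the query's.
    q = embed(query)
    return max(settings, key=lambda s: cosine(q, embed(s)))

# Hypothetical privacy-setting names, not taken from the paper's dataset.
settings = ["ad personalization", "email notifications", "location history"]
print(retrieve("turn off ad personalization", settings))  # -> ad personalization
```

With a learned encoder, `embed` would map semantically related but lexically different phrases (e.g. "stop targeted ads") near the same setting, which is what the fine-tuning on real user queries is for.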

Differentially Private Language Models Benefit from Public Pre-training
Gavin Kerrigan | Dylan Slack | Jens Tuyls

Language modeling is a keystone task in natural language processing. When training a language model on sensitive information, differential privacy (DP) allows us to quantify the degree to which the private data are protected. However, training algorithms that enforce differential privacy often degrade model quality. We study the feasibility of learning a language model that is simultaneously high-quality and privacy-preserving by fine-tuning a public base model on a private corpus. We find that DP fine-tuning boosts the performance of language models in the private domain, making the training of such models possible.
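Differentially private training of the kind the abstract refers to is commonly realized with DP-SGD: each example's gradient is clipped to bound its influence, and Gaussian noise is added to the aggregated gradient. Below is a minimal sketch of one such update step under that assumption; the parameter names are illustrative, not taken from the paper.

```python
import math
import random

def dp_sgd_step(params, per_example_grads, clip_norm, noise_mult, lr):
    # Clip each per-example gradient to L2 norm <= clip_norm,
    # bounding any single example's influence on the update.
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])

    # Sum the clipped gradients, add Gaussian noise calibrated to the
    # clipping bound, then average over the batch.
    n = len(per_example_grads)
    sigma = noise_mult * clip_norm
    noisy_avg = [
        (sum(g[i] for g in clipped) + random.gauss(0.0, sigma)) / n
        for i in range(len(params))
    ]

    # Ordinary gradient descent step on the privatized gradient.
    return [p - lr * g for p, g in zip(params, noisy_avg)]
```

In DP fine-tuning, only the private-corpus phase uses this noisy update; the public pre-training incurs no privacy cost, which is why starting from a public base model helps.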