The Limits of Privacy-Preserving Data Publishing

Date 01/03/2024 - 31/12/2024
Type Policy
Authors Theresa Stadler, Carmela Troncoso, Melanie Kolbe-Guyot
EPFL Laboratory Security and Privacy Engineering Laboratory (SPRING)

As the European Union advances towards its planned data spaces and the open data movement continues to gain momentum, it has become imperative to reevaluate the current landscape of data privacy policies and publishing practices. This policy paper identifies a fundamental problem in the way sensitive data is published and protected, and argues for a necessary paradigm shift in the approach to data privacy.

At the core of the issue is a misconception surrounding Privacy Enhancing Technologies (PETs) and synthetic data. These technologies promise privacy preservation, yet empirical scrutiny exposes their inherent limitations and the false assurances they provide. PETs are over-relied upon, even though the evidence demonstrates that no technology, including anonymization, can entirely prevent the fundamental leakage of sensitive data during analysis.
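To illustrate what "fundamental leakage during analysis" means in practice, consider a classic differencing attack, a textbook example that is not drawn from the paper itself; the dataset and names below are entirely hypothetical. Two aggregate statistics that each look safe in isolation combine to expose a single individual's record:

```python
# Minimal sketch of a differencing attack: two aggregate queries,
# each harmless on its own, combine to reveal one person's record.
# The dataset and names are hypothetical, for illustration only.

salaries = {
    "alice": 82_000,
    "bob": 61_000,
    "carol": 95_000,
    "dave": 58_000,
}

# Query 1: total salary of everyone (a typical "safe" statistic).
total_all = sum(salaries.values())

# Query 2: total salary of everyone except Alice
# (e.g. a filtered view that happens to exclude one person).
total_without_alice = sum(v for k, v in salaries.items() if k != "alice")

# Subtracting the two published aggregates recovers Alice's exact
# salary, even though no individual record was ever released.
inferred = total_all - total_without_alice
print(inferred)  # 82000
```

No release mechanism that answers both queries accurately can prevent this inference; protections such as noise addition can only reduce, never eliminate, the leakage.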

Current policies focus on specific technologies but fail to recognize these fundamental limits. The trade-off between data utility and privacy cannot be fully mitigated; it is an inherent compromise that every anonymization technology must navigate. Consequently, “open data” released without constraints or predefined analysis functions cannot simultaneously carry strong privacy guarantees, as the sketch below illustrates.
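To make the trade-off concrete, here is a minimal sketch of the standard Laplace mechanism from the differential-privacy literature. This is not the paper's own analysis, and all function names and parameter values are illustrative assumptions. Shrinking the privacy budget epsilon strengthens the formal guarantee but adds proportionally more noise to the released statistic, degrading its utility:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from Laplace(0, scale) via the inverse-CDF method.

    Sketch only: ignores the measure-zero edge case u == -0.5.
    """
    u = random.random() - 0.5  # uniform on [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count: int, epsilon: float) -> float:
    """Release a count under epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person
    changes it by at most 1), so Laplace noise with scale 1/epsilon
    suffices for an epsilon-DP release.
    """
    return true_count + laplace_noise(1.0 / epsilon)

# Stronger privacy (smaller epsilon) forces noisier, less useful answers:
true_count = 1000
for epsilon in (10.0, 1.0, 0.1):
    noisy = private_count(true_count, epsilon)
    print(f"epsilon={epsilon:5}: released count ~ {noisy:.1f}")
```

Running the loop shows typical errors of a fraction of a count at epsilon = 10 but of tens of counts at epsilon = 0.1: no setting of epsilon yields both exact answers and a meaningful privacy guarantee.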

The management of data brings systemic responsibilities. The paper notes that anonymization solutions such as synthetic data are not only technically challenging but also inconsistently effective. It therefore urges a reconsideration of how responsibilities are distributed, emphasizing the importance of more comprehensive control over the entire data supply chain: data managers often bear the brunt of liability, while technology producers may not be held accountable for the promises they make.

This paper argues that the continued emphasis on technology-specific privacy solutions presents clear risks. Relying on practitioners’ discretion to define and evaluate risks is shown to be flawed. Instead, policy options should include the adoption of procedural standards and thorough case-by-case assessments of privacy and utility trade-offs.

To address these challenges, the paper offers key policy recommendations focused on shifting the paradigm from dependence on specific technologies to the intrinsic nature of data-use cases. It proposes developing an evaluation framework that prioritizes data utility while acknowledging fundamental leakage. The recommendations also encourage a strategic pivot towards nuanced risk and leakage assessments, emphasizing the need for systemic transparency and for restructuring accountability throughout the data supply chain.

This policy paper concludes by reiterating the urgent need for a paradigm shift in how data-sharing policies are conceptualized and enacted. In the spirit of an informed and pragmatic approach, it calls on data protection authorities and publishers alike to redefine the role of data publishers in the privacy ecosystem, to rethink technology recommendations, and to align data protection strategies with the empirical realities of privacy trade-offs. The overarching goal is a more holistic and accountable data privacy framework that respects individual rights while maintaining the integrity and utility of shared data.