A new approach to data differential privacy based on regression models under heteroscedasticity with applications to machine learning repository data

Manchini, Carlos; Ospina, Raydonal; Leiva, Victor; Martin-Barreiro, Carlos

Abstract

The generation of massive data in the digital age can lead to violations of individual privacy, and exposure through searches for personal data is increasingly common. The present work belongs to the area of differential privacy, which guarantees data confidentiality and robustness against invasive identification attacks. This area stands out in the literature for its rigorous mathematical basis, which is capable of quantifying the loss of privacy. A differentially private method based on regression models was developed to prevent inversion attacks while retaining model efficacy. In this paper, we propose a novel approach to improve data privacy based on regression models under heteroscedasticity, an aspect that is common in practical situations of differential privacy but has not been studied. The influence of the privacy restriction on the statistical performance of the estimators of the model parameters is evaluated using Monte Carlo simulations, including a study of performance associated with test rejection rates for the proposed approach. The results of the numerical evaluation show high inferential distortion for stricter privacy restrictions. Empirical illustrations with real-world data are presented to show potential applications. (c) 2022 Elsevier Inc. All rights reserved.
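
To make the trade-off between privacy level and inferential distortion concrete, the sketch below runs a small Monte Carlo experiment. It is not the method proposed in the paper: as an assumption for illustration, it uses the standard Laplace mechanism on the mean of data bounded in [0, 1], and the function names, sample size, and number of replications are all hypothetical. A smaller privacy budget epsilon means a stricter privacy restriction.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean_rmse(epsilon, n=100, reps=1000, seed=0):
    """Monte Carlo RMSE of a Laplace-perturbed sample mean (illustrative only).

    The error is measured against the non-private sample mean, so it
    isolates the distortion introduced by the privacy noise.
    """
    rng = random.Random(seed)
    # Assumption: each record lies in [0, 1], so changing one record
    # moves the mean by at most 1 / n (the sensitivity of the query).
    sensitivity = 1.0 / n
    scale = sensitivity / epsilon
    squared_errors = []
    for _ in range(reps):
        data = [rng.random() for _ in range(n)]
        sample_mean = sum(data) / n
        released = sample_mean + laplace_noise(scale, rng)
        squared_errors.append((released - sample_mean) ** 2)
    return math.sqrt(sum(squared_errors) / reps)


if __name__ == "__main__":
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps:>4}: RMSE of released mean = {private_mean_rmse(eps):.5f}")
```

Under these assumptions the noise has standard deviation sqrt(2) / (n * epsilon), so the simulated error grows as epsilon shrinks, which mirrors the qualitative finding reported in the abstract.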

More information

Title according to WOS: A new approach to data differential privacy based on regression models under heteroscedasticity with applications to machine learning repository data
Journal title: INFORMATION SCIENCES
Volume: 627
Publisher: Elsevier Science Inc.
Publication date: 2023
Start page: 280
End page: 300
DOI: 10.1016/j.ins.2022.10.076

Notes: ISI