Researchers have developed a system that allows data owners to regulate how much of their privacy may be breached when personal information is being analyzed.
The novel system, APEx, also lessens the burden on data scientists, who traditionally have had to sacrifice the accuracy of their analyses to give their clients certain privacy guarantees.
APEx translates data scientists’ queries and accuracy bounds into an appropriate differentially private mechanism. Differential privacy is a rigorous mathematical definition of privacy guaranteeing that, by looking at the output, it cannot be determined whether any individual’s data was included in the original dataset. The chosen mechanism incurs the least privacy leakage possible and returns a noisy answer that satisfies the accuracy guarantee the data scientist specified beforehand.
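To make the idea of a noisy answer concrete, here is a minimal sketch of the standard Laplace mechanism, one of the basic building blocks such systems draw on. This is an illustration of the general technique, not APEx’s actual implementation; the function name and parameters are my own.

```python
import random

def laplace_mechanism(true_answer, sensitivity, epsilon):
    """Return a differentially private answer by adding Laplace noise.

    sensitivity: how much one individual's data can change the true answer
                 (1 for a counting query).
    epsilon: the privacy-loss parameter -- larger epsilon means more
             privacy leakage but a less noisy, more accurate answer.
    """
    scale = sensitivity / epsilon
    # The difference of two Exp(1) draws is a standard Laplace variate.
    noise = scale * (random.expovariate(1.0) - random.expovariate(1.0))
    return true_answer + noise
```

For example, a true count of 100 with `sensitivity=1` and `epsilon=1.0` would typically come back as something like 99.3 or 101.6; averaged over many runs the noise cancels out, which is why the mechanism preserves statistical utility while hiding any single individual’s contribution.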
“While general purpose differentially private query answering systems exist, they are not really meant to support interactive querying, and they fall short in two key respects,” said Chang Ge, a PhD candidate in Waterloo’s David R. Cheriton School of Computer Science.
“In order to achieve high accuracy, the analyst has to be familiar with the privacy literature to understand how the system adds noise and to identify if the desired results can be achieved in the first place. And somewhat ironically, these systems do not provide any guarantees to the data analyst on the quality they really care about, namely correctness of query answers.”
APEx solves these two issues by choosing the suitable private mechanism with the least privacy loss that answers an input query under a specified accuracy guarantee. Data analysts can then reliably explore data while ensuring a provable guarantee of privacy to data owners.
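For a single counting query answered with the Laplace mechanism, the translation from an accuracy guarantee to the least privacy loss has a simple closed form, which gives a flavour of what such a system computes internally. This sketch is a textbook calculation under that assumption, not APEx’s algorithm, which supports a much wider class of queries and mechanisms.

```python
import math

def min_epsilon_for_accuracy(sensitivity, alpha, beta):
    """Smallest epsilon so that Laplace noise exceeds the error bound
    alpha with probability at most beta, for one query.

    The Laplace mechanism adds noise of scale sensitivity/epsilon, and
    P(|noise| > alpha) = exp(-alpha * epsilon / sensitivity).
    Setting this equal to beta and solving for epsilon gives the
    minimum privacy loss that still meets the accuracy guarantee.
    """
    return sensitivity * math.log(1.0 / beta) / alpha
```

For instance, tolerating an error of up to 10 on a count, with at most a 5 per cent chance of exceeding it, requires `epsilon = ln(20)/10 ≈ 0.30`; loosening the accuracy requirement (larger `alpha` or `beta`) lowers the required epsilon, which is exactly the accuracy-for-privacy trade-off APEx automates.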
In developing APEx, Ge, professor Ihab Ilyas and assistant professor Xi He of Waterloo’s Cheriton School of Computer Science, and Duke University’s Associate Professor Ashwin Machanavajjhala conducted a comprehensive empirical evaluation on real datasets with query and application benchmarks. They found that many tasks do not require 100 per cent precise answers; 90 per cent correctness can be good enough for data scientists to complete their jobs.
“This system could help prevent future data breaches if policymakers were to pass legislation that would require APEx to be implemented by companies,” said Ge. “The policymaker will determine the privacy budget for a particular dataset. Once this is determined you can just leave the rest to APEx and customers could, in turn, be more confident that their data is protected.”
The paper detailing the new system, titled “APEx: Accuracy-Aware Differentially Private Data Exploration,” was authored by Ge, Ilyas and He of Waterloo’s Faculty of Mathematics and Duke University’s Machanavajjhala, and is slated to be presented at the 2019 SIGMOD conference in June.