ACS findings highlight the challenges of preserving users’ privacy in a connected world.

While many organisations sharing data to improve services might be struggling to protect individual privacy, a project by the Australian Computer Society (ACS) has identified five ways governments and businesses can reduce risks.

A three-week ACS Directed Ideation competition, similar to a hackathon, saw teams explore how to secure privacy while sharing data. It led to a report laying out a framework for preserving individuals’ personally identifiable information.

With smart services for homes, factories, cities, and governments rolling out, the opportunity to create locally optimised or individually personalised services becomes more realistic. However, that data sharing also presents a number of privacy-based challenges.

The report’s findings from the three-week competition included:

  • Conclusion 1: The use case for data strongly influences the risk framework for data safety and the methods (aggregation, generalisation, obfuscation, perturbation) appropriate for increasing data safety. 
  • Conclusion 2: It’s feasible to develop a meaningful measure for Personal Information Factor (PIF). Information theoretic metrics such as PIF show promise as a way to measure the privacy risk of unit record data for aggregated, generalised or obfuscated data, and can be enhanced to cover perturbed data. Additional work would need to be done to relate the privacy risk metric to the legal definition of privacy, and the assumed attacker model. 
  • Conclusion 3: It’s feasible to develop a meaningful measure of Relative Utility for datasets which have been protected through aggregation, generalisation, obfuscation and perturbation. Information Theoretic metrics based on Mutual Information (between original and protected datasets) show promise. 
  • Conclusion 4: Dealing with “trajectories” is a critical problem for the release and use of datasets. Developing means to deal with datasets linked to form a trajectory is possible. The methods explored show some promise; however, the complexity of the approaches may limit real-world implementation.
  • Conclusion 5: Understanding the relationship between different features in a dataset helps to identify those features which pose the highest risk of reidentification and those which have the greatest impact on utility after protection methods are applied.
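
To make Conclusion 3 more concrete, the following is a minimal sketch of one way a mutual-information-based utility score could be computed for a single column. It is not the report's method: the normalisation (mutual information divided by the entropy of the original column), the function names and the toy age data are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def relative_utility(original, protected):
    """Hypothetical utility score: the share of the original column's
    information that survives protection (an assumed normalisation)."""
    h = entropy(original)
    if h == 0:
        return 1.0  # a constant column has no information to lose
    return mutual_information(original, protected) / h


def generalise_age(age, width=10):
    """Generalisation: replace an exact age with a band such as '20-29'."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"


# Toy data, invented for this example.
ages = [23, 27, 31, 38, 42, 45, 51, 58]
banded = [generalise_age(a) for a in ages]   # generalised column
suppressed = ["*"] * len(ages)               # fully obfuscated column

utility_banded = relative_utility(ages, banded)
utility_suppressed = relative_utility(ages, suppressed)
```

In this toy case the eight distinct ages carry three bits of information and the four ten-year bands retain two of them, so the banded column scores two-thirds, while the fully suppressed column scores zero. A real assessment would need to handle many columns, continuous values and perturbed data, none of which this sketch attempts.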

The 2019 program explored aspects of the 2018 ACS Technical Whitepapers on Privacy preserving Data Sharing and further developed frameworks explored in a similar event in February 2019. Both events attempted to develop a Personal Information Factor (PIF) and Data Safety factors based on the description in the technical whitepapers. The PIF and frameworks were evaluated on open datasets and synthetic datasets.

NSW Chief Data Scientist, Dr Ian Oppermann, said there was still a long way to go before the problem was completely solved.

“We had teams throw incredible intellectual effort, a lot of computing and quite a lot of really creative intelligence against a problem which is complex and subtle,” Oppermann said.

“We turned the dial another half-a-turn and that has helped inform a lot of what we will do going forwards, but clearly there’s a lot more work to be done.” 

The full report can be downloaded from the ACS website.