Differentially Private Machine Learning: Theory, Algorithms, and Applications

Kamalika Chaudhuri, Dept. of Computer Science and Engineering, UC San Diego
Anand D. Sarwate, Dept. of Electrical and Computer Engineering, Rutgers University

A tutorial given at the 2017 Neural Information Processing Systems (NIPS)
December 4-9, 2017
Long Beach, CA, USA.

Abstract

Differential privacy has emerged as one of the de facto standards for measuring privacy risk when performing computations on sensitive data and disseminating the results. Algorithms that guarantee differential privacy are randomized, which causes a loss in performance, or utility. Managing the privacy-utility tradeoff becomes easier with more data. Many machine learning algorithms can be made differentially private through the judicious introduction of randomization, usually in the form of noise, within the computation. In this tutorial we will describe the basic framework of differential privacy, key mechanisms for guaranteeing privacy, and how to find differentially private approximations to several contemporary machine learning tools: convex optimization, Bayesian methods, and deep learning.
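To make the "judicious introduction of noise" concrete, here is a minimal sketch (not part of the tutorial itself) of the Laplace mechanism of Dwork, McSherry, Nissim, and Smith: to release a statistic with epsilon-differential privacy, add Laplace noise scaled to the statistic's global sensitivity. The function and variable names below are illustrative, not from the tutorial.

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Release `value` with epsilon-differential privacy.

    `sensitivity` is the global sensitivity of the query: the largest
    change in its output when one individual's record is added,
    removed, or changed. Noise scale = sensitivity / epsilon.
    """
    scale = sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale)

# Example: privately release the mean of n values known to lie in [0, 1].
# Changing one record moves the mean by at most 1/n, so the global
# sensitivity of the mean query is 1/n.
rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=1000)
sensitivity = 1.0 / len(data)
private_mean = laplace_mechanism(data.mean(), sensitivity, epsilon=0.5, rng=rng)
```

Note how the tradeoff mentioned above appears directly in the noise scale: larger datasets shrink the sensitivity 1/n, and a larger privacy budget epsilon shrinks the noise, both improving utility.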

Downloads and links

View the slides from the tutorial.
Video from the tutorial should be coming soon.
(Partial) list of references (.bib format).

Biographies

Kamalika Chaudhuri received a Bachelor of Technology degree in Computer Science and Engineering in 2002 from the Indian Institute of Technology, Kanpur, and a PhD in Computer Science from the University of California, Berkeley in 2007. She held a postdoctoral researcher position at the Information Theory and Applications Center at UC San Diego from 2007 to 2009, and a postdoctoral researcher position at UC San Diego from 2009 to 2010. In July 2010, she joined the CSE department at UC San Diego as an assistant professor. She received an NSF CAREER Award in 2013 and a Hellman Faculty Fellowship in 2012. Kamalika's research interests are in the statistical foundations of machine learning. She is particularly interested in privacy-preserving machine learning, which addresses how to learn good models and predictors from sensitive data while preserving the privacy of individuals.

Anand D. Sarwate is an Assistant Professor in the Department of Electrical and Computer Engineering at Rutgers, the State University of New Jersey. He has B.S. degrees in Mathematics and Electrical Science and Engineering from MIT (2002) and M.S. and Ph.D. degrees from UC Berkeley (2005, 2008). Before joining Rutgers he was a postdoc at the ITA Center at UCSD and a Research Assistant Professor at TTI-C. He received the NSF CAREER award in 2015. Anand's interests are in information theory, machine learning, and signal processing, with applications to distributed systems, privacy and security, and biomedical research.

Support

The work of K. Chaudhuri was supported by the NSF under IIS-1253942, ONR under N00014-16-1-2616 and a Google Faculty Research Award. The work of A.D. Sarwate was supported by the NSF under CCF-1453432 and SaTC-1617849, the NIH under 1R01DA040487-01A1, and DARPA and US Navy under contract N66001-15-C-4070.

References

  1. M. Abadi, A. Chu, I. Goodfellow, H.B. McMahan, I. Mironov, K. Talwar, L. Zhang, Deep Learning with Differential Privacy, Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS '16), pp. 303--318, October 24--28 2016. [BibTeX entry]
  2. L. Backstrom, C. Dwork, J.M. Kleinberg, Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography, WWW, 2007. [BibTeX entry]
  3. R. Bassily, K. Nissim, A. Smith, T. Steinke, U. Stemmer, J. Ullman, Algorithmic Stability for Adaptive Data Analysis, Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing (STOC '16), pp. 1046--1059, June 19--21 2016. [BibTeX entry]
  4. R. Bassily, A. Smith, A. Thakurta, Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds, Proceedings of the 55th Annual Symposium on Foundations of Computer Science (FOCS '14), October 18--21 2014. [BibTeX entry]
  5. R. Bassily, A. Smith, A. Thakurta, Private Empirical Risk Minimization, Revisited, arXiv preprint arXiv:1405.7085 [cs.LG], May 2014. [BibTeX entry]
  6. A. Beimel, H. Brenner, S.P. Kasiviswanathan, K. Nissim, Bounds on the Sample Complexity for Private Learning and Private Data Release, Machine Learning 94(3): pp. 401--437, March 2014. [BibTeX entry]
  7. O. Bousquet, A. Elisseeff, Stability and Generalization, Journal of Machine Learning Research 2: pp. 499--526, 2002. [BibTeX entry]
  8. K. Chaudhuri, S.A. Vinterbo, A Stability-based Validation Procedure for Differentially Private Machine Learning, Advances in Neural Information Processing Systems 26, December 2013. [BibTeX entry]
  9. K. Chaudhuri, C. Monteleoni, A.D. Sarwate, Differentially Private Empirical Risk Minimization, Journal of Machine Learning Research 12: pp. 1069--1109, March 2011. [BibTeX entry]
  10. J.C. Duchi, M.I. Jordan, M.J. Wainwright, Local Privacy and Statistical Minimax Rates, Proceedings of the 54th Annual Symposium on Foundations of Computer Science (FOCS '13), October 26--29 2013. [BibTeX entry]
  11. C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, M. Naor, Our Data, Ourselves: Privacy Via Distributed Noise Generation, Advances in Cryptology - EUROCRYPT 2006, 2006. [BibTeX entry]
  12. C. Dwork, G. Rothblum, S. Vadhan, Boosting and Differential Privacy, Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS '10), pp. 51--60, October 2010. [BibTeX entry]
  13. C. Dwork, F. McSherry, K. Nissim, A. Smith, Calibrating Noise to Sensitivity in Private Data Analysis, Theory of Cryptography, pp. 265--284, March 4--7 2006. [BibTeX entry]
  14. C. Dwork, J. Lei, Differential Privacy and Robust Statistics, Proceedings of the 41st Annual ACM Symposium on Theory of Computing (STOC '09), 2009. [BibTeX entry]
  15. K. Fukuchi, Q.K. Tran, J. Sakuma, Differentially Private Empirical Risk Minimization with Input Perturbation, arXiv preprint arXiv:1710.07425 [stat.ML], October 2017. [BibTeX entry]
  16. P. Jain, A.G. Thakurta, (Near) Dimension Independent Risk Bounds for Differentially Private Learning, Proceedings of the 31st International Conference on Machine Learning, 2014. [BibTeX entry]
  17. P. Kairouz, S. Oh, P. Viswanath, The Composition Theorem for Differential Privacy, IEEE Transactions on Information Theory 63(6): pp. 4037--4049, June 2017. [BibTeX entry]
  18. F. McSherry, K. Talwar, Mechanism Design via Differential Privacy, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS '07), pp. 94--103, October 2007. [BibTeX entry]
  19. I. Mironov, Rényi Differential Privacy, Proceedings of IEEE 30th Computer Security Foundations Symposium (CSF 2017), August 21--25 2017. [BibTeX entry]
  20. A. Narayanan, V. Shmatikov, Robust De-anonymization of Large Sparse Datasets, Proceedings of the 2008 IEEE Symposium on Security and Privacy (SSP), 2008. [BibTeX entry]
  21. K. Nissim, S. Raskhodnikova, A. Smith, Smooth sensitivity and sampling in private data analysis, Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing (STOC '07), 2007. [BibTeX entry]
  22. B.I.P. Rubinstein, P.L. Bartlett, L. Huang, N. Taft, Learning in a Large Function Space: Privacy-Preserving Mechanisms for SVM Learning, Journal of Privacy and Confidentiality 4(1): pp. 65--100, 2012. [BibTeX entry]
  23. S. Song, K. Chaudhuri, A.D. Sarwate, Learning from Data with Heterogeneous Noise using SGD, Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, pp. 894--902, May 2015. [BibTeX entry]
  24. S. Song, K. Chaudhuri, A.D. Sarwate, Stochastic Gradient Descent with Differentially Private Updates, Proceedings of the 2013 Global Conference on Signal and Information Processing (GlobalSIP 2013), pp. 245--248, December 2013. [BibTeX entry]
  25. R. Wang, Y. Li, X. Wang, H. Tang, X. Zhou, Learning Your Identity and Disease from Research Papers: Information Leaks in Genome Wide Association Study, Proceedings of the 16th ACM conference on Computer and Communications Security (CCS '09), pp. 534--544, November 9--13 2009. [BibTeX entry]
  26. S.L. Warner, Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias, Journal of the American Statistical Association 60(309): pp. 63--69, March 1965. [BibTeX entry]
  27. J. Zhang, Z. Zhang, X. Xiao, Y. Yang, M. Winslett, Functional Mechanism: Regression Analysis Under Differential Privacy, Proceedings of the VLDB Endowment 5(11): pp. 1364--1375, July 2012. [BibTeX entry]