PrivQuant: Communication-Efficient Private Inference with Quantized Network/Protocol Co-Optimization

Abstract

Private deep neural network (DNN) inference based on secure two-party computation (2PC) provides privacy protection for both the server and the client. However, existing 2PC frameworks suffer from high inference latency due to their enormous communication. Because the communication of both linear and non-linear DNN layers decreases with the bit widths of weights and activations, in this paper we propose PrivQuant, a framework that jointly optimizes the 2PC-based quantized inference protocols and the network quantization algorithm to enable communication-efficient private inference. PrivQuant proposes DNN architecture-aware optimizations of the 2PC protocols for communication-intensive quantized operators and performs graph-level operator fusion to reduce communication. Moreover, PrivQuant develops a communication-aware mixed-precision quantization algorithm that improves inference efficiency while maintaining high accuracy. This network/protocol co-optimization enables PrivQuant to outperform prior-art 2PC frameworks. Extensive experiments demonstrate that PrivQuant reduces communication by 11×, 2.5×, and 2.8×, resulting in 8.7×, 1.8×, and 2.4× latency reduction compared with SiRNN, COINN, and CoPriv, respectively.
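The premise that communication shrinks with bit width can be illustrated with a minimal sketch of 2-out-of-2 additive secret sharing over the ring Z_{2^l}, the kind of arithmetic sharing commonly used in 2PC inference. This is an illustrative assumption, not PrivQuant's protocol: the helper names `share`, `reconstruct`, and `bytes_to_send` are hypothetical, and real 2PC protocols exchange many ring elements per operation (for example inside oblivious-transfer messages) rather than one share per value. The sketch only shows that every transmitted ring element costs l bits, so lowering the bit width l lowers the bytes on the wire proportionally.

```python
import secrets

# Illustrative sketch only (hypothetical helpers, not PrivQuant's protocol):
# 2-out-of-2 additive secret sharing over the ring Z_{2^bits}.


def share(x: int, bits: int) -> tuple[int, int]:
    """Split x into two additive shares modulo 2^bits."""
    mod = 1 << bits
    r = secrets.randbelow(mod)      # uniformly random mask held by one party
    return r, (x - r) % mod         # the other party holds (x - r) mod 2^bits


def reconstruct(s0: int, s1: int, bits: int) -> int:
    """Recover x by adding both shares modulo 2^bits."""
    return (s0 + s1) % (1 << bits)


def bytes_to_send(num_elements: int, bits: int) -> int:
    """Bytes needed to send one share per element, tightly packed at `bits` bits each."""
    return (num_elements * bits + 7) // 8


if __name__ == "__main__":
    x_shares = share(200, 8)
    print(reconstruct(*x_shares, bits=8))   # 200
    # Hypothetical 1,000,000-element activation tensor at different bit widths.
    for bits in (32, 8, 4):
        print(bits, bytes_to_send(1_000_000, bits))
```

Under these assumptions, sending one share per element of a million-element tensor costs 4,000,000 bytes at 32 bits but only 500,000 bytes at 4 bits. This linear scaling is what motivates co-designing low-bit quantization with the 2PC protocols.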

Publication
2024 IEEE/ACM International Conference on Computer Aided Design
Tianshi Xu
Second-Year PhD student

Tianshi Xu is currently a second-year Ph.D. student at the School of Integrated Circuits, Peking University. His research interests include the privacy and security of AI, especially privacy-preserving deep learning (PPDL).