PrivQuant: Communication-Efficient Private Inference with Quantized Network/Protocol Co-Optimization

Tianshi Xu, Shuzhang Zhong, Wenxuan Zeng, Runsheng Wang, Meng Li

August, 2024

Abstract

Private deep neural network (DNN) inference based on secure two-party computation (2PC) protects the privacy of both the server and the client. However, existing secure 2PC frameworks suffer from high inference latency due to heavy communication overhead. Because the communication of both linear and non-linear DNN layers decreases as the bit widths of weights and activations shrink, we propose PrivQuant, a framework that jointly optimizes the 2PC-based quantized inference protocols and the network quantization algorithm, enabling communication-efficient private inference. PrivQuant introduces DNN architecture-aware optimizations of the 2PC protocols for communication-intensive quantized operators and conducts graph-level operator fusion to reduce communication. Moreover, PrivQuant develops a communication-aware mixed-precision quantization algorithm to improve inference efficiency while maintaining high accuracy. The network/protocol co-optimization enables PrivQuant to outperform prior-art 2PC frameworks. With extensive experiments, we demonstrate that PrivQuant reduces communication by 11×, 2.5×, and 2.8×, which results in 8.7×, 1.8×, and 2.4× latency reduction compared with SiRNN, COINN, and CoPriv, respectively.

Type

Conference paper

Publication

2024 IEEE/ACM International Conference on Computer Aided Design

Tianshi Xu

Second-Year PhD student