Posted on 2025-11-04, 22:04. Authored by H Gao, C Yue, Anh Dinh, Z Huang, BC Ooi
Modern data analytics pipelines are highly dynamic: data engineers and data scientists constantly monitor and fine-tune them. Recent pipeline management systems make it easier to create and deploy pipelines and to track their evolution. However, privacy concerns emerge because many of them are deployed on public clouds that are only partially trusted or not trusted at all. Unfortunately, the unique nature of pipelines prevents the adoption of existing confidential computing techniques, which target different computational patterns and incur large performance overhead. Trusted execution environments (TEEs) are a promising alternative, as they efficiently protect the confidentiality and integrity of data and computation. However, fast-changing pipelines with latency requirements make it challenging to reduce the cold start overhead, which is the main bottleneck in the latest TEEs. To support end-to-end private pipeline evolution, we present SecCask, a TEE-based data analytics pipeline management system. By administering enclaves and runtimes, SecCask overcomes the problems of a naive design that isolates the complete pipeline execution in one enclave. To reduce cold start overhead, SecCask reuses trusted runtimes across different pipeline components and caches them to avoid the cost of initialization. We evaluate SecCask on representative workloads using the latest Intel SGX. The results demonstrate that SecCask reduces the total execution time by 68.4% compared with a design that does not reuse runtimes, is faster than running all components in one enclave, and incurs a modest average performance overhead of 29.9% over insecure baselines.
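The runtime reuse and caching idea can be illustrated with a minimal Python sketch. This is not SecCask's implementation: the names `Runtime`, `RuntimePool`, and `run_pipeline` are hypothetical, a counter stands in for the expensive enclave and runtime initialization, and the state sanitization that a real system would likely need between components is omitted.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Runtime:
    """Stand-in for an initialized trusted runtime (hypothetical type)."""
    kind: str       # illustrative label, e.g. "py" for a Python runtime
    runs: int = 0   # number of components this runtime has executed

    def execute(self, component: str) -> str:
        self.runs += 1
        return f"{component} ran on {self.kind}"


class RuntimePool:
    """Caches idle runtimes by kind so later components can skip initialization."""

    def __init__(self) -> None:
        self._idle = defaultdict(deque)  # kind -> queue of idle runtimes
        self.cold_starts = 0
        self.warm_hits = 0

    def acquire(self, kind: str) -> Runtime:
        idle = self._idle[kind]
        if idle:
            # A cached runtime of the right kind exists: reuse it.
            self.warm_hits += 1
            return idle.popleft()
        # No cached runtime: pay the (simulated) cold start cost.
        self.cold_starts += 1
        return Runtime(kind)

    def release(self, runtime: Runtime) -> None:
        # Return the runtime to the cache instead of destroying it.
        self._idle[runtime.kind].append(runtime)


def run_pipeline(pool: RuntimePool, components: list[tuple[str, str]]) -> list[str]:
    """Run (component_name, runtime_kind) pairs in order, reusing runtimes."""
    outputs = []
    for name, kind in components:
        runtime = pool.acquire(kind)
        try:
            outputs.append(runtime.execute(name))
        finally:
            pool.release(runtime)
    return outputs


if __name__ == "__main__":
    pool = RuntimePool()
    run_pipeline(pool, [("load", "py"), ("train", "py"), ("score", "r")])
    print(pool.cold_starts, pool.warm_hits)
```

In this sketch, a pipeline whose components share a runtime kind pays the simulated initialization cost once per kind rather than once per component, and a later pipeline version that reuses the same pool starts warm.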