CRII: CNS: An Experimental Infrastructure to Reduce Latency Long-tail in Real-time Stream Processing

Project: Research project

Project Details

Description

Because of Covid, businesses are reassessing their decades-old traditional strategies. Leading companies such as Walmart are embracing event streaming and cloud-centric web applications to rapidly reshape their business models. However, unexpected performance variances in these web-facing applications could significantly and adversely affect consumers and, ultimately, online businesses. The project addresses this challenge with an experimental infrastructure and research aimed at relieving potential performance concerns from service providers to achieve good performance and high resource efficiency for web-facing applications.Consistent low latency has become essential for user-facing, latency-sensitive web applications such as e-commerce and real-time stream processing due to its business impact. For example, the Bing search engine team reported that a 500-millisecond delay could lead to a decrease in revenue by 1.2%. Despite continuous efforts made by practitioners, the latency long-tail problem still consistently occurs, where a small number of requests take a long response time (e.g., multiple seconds) to return. Existing (and state-of-the-art) infrastructures mainly focus on the correlation between execution time variance localized in the critical paths (i.e., the longest path in time for a request from start to finish) and latency long-tail in web-facing applications. However, our preliminary experimental results suggest that asynchronous, very short but intense resource demands (milliseconds level, referred to as the “millibottlenecks”) outside of critical paths can also cause significant latency long-tail, causing serious end-user dissatisfaction and leading to significant revenues loss. This project targets the unique challenges of millibottlenecks outside of critical paths by designing a fine-grained monitoring toolkit and developing sophisticated quantitative analyses for millibottlenecks as well as effective cures for latency long-tail problems to achieve the goals of reducing latency long-tail in emerging real-time processing applications.This project responds to this research challenge with an experimental infrastructure, to reduce latency long-tail caused by millibottlenecks in real-time stream processing. The project will proceed along with three tasks. First, the experimental infrastructure starts with the initial results of the observed latency long-tail in emerging real-time stream processing through fine-grained performance monitoring (e.g., both system- and application-level metrics). Second, the team will propose to methodically evaluate and validate millibottlenecks (including that outside of the critical path) and to determine their causal relationship to the latency long-tail. Third, effective cures will be proposed to reduce latency long-tail by disrupting the causality from millibottlenecks to the latency long-tail problem with negligible overhead. Furthermore, the knowledge gained on the latency long-tail problem caused by various millibottlenecks will enable more reliable performance studies and long-term advances in the design and implementation of mission-critical web applications in clouds.This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
StatusActive
Effective start/end date4/1/233/31/25

Funding

  • National Science Foundation: $175,000.00

Fingerprint

Explore the research topics touched on by this project. These labels are generated based on the underlying awards/grants. Together they form a unique fingerprint.