Article

Developing Systems for Real-Time Streaming Analysis

Journal

Journal of Computational and Graphical Statistics
Volume 21, Issue 3, Pages 561-580

Publisher

American Statistical Association
DOI: 10.1080/10618600.2012.657144

Keywords

Massive datasets; Radio astronomy; Real-time methods; Streaming data

Funding

  1. Los Alamos National Laboratory LDRD [20080729DR]
  2. U.S. Department of Energy [DE-AC52-06NA25396]

Sources of streaming data are proliferating and so are the demands to analyze and mine such data in real time. Statistical methods frequently form the core of real-time analysis, and therefore, statisticians increasingly encounter the challenges and implicit requirements of real-time systems. This work recommends a comprehensive strategy for development and implementation of streaming algorithms, beginning with exploratory data analysis in a flexible computing environment, leading to specification of a computational algorithm for the streaming setting and its initial implementation, and culminating in successive improvements to computational efficiency and throughput. This sequential development relies on a collaboration between statisticians, domain scientists, and the computer engineers developing the real-time system. This article illustrates the process in the context of a radio astronomy challenge to mitigate adverse impacts of radio frequency interference (noise) in searches for high-energy impulses from distant sources. The radio astronomy application motivates discussion of system design, code optimization, and the use of hardware accelerators such as graphics processing units, field-programmable gate arrays, and IBM Cell processors. Supplementary materials, available online, detail the computing systems typically used for streaming systems with real-time constraints and the process of optimizing code for high efficiency and throughput.
