Temporal Data Mining (TDM)

Developing a Temporal Data Mining (TDM) System for In-Situ Decommissioning (ISD) Sensor Network Test Bed

Big Data, a widely used term today, refers to data sets so large and complex that they must be analyzed computationally to reveal patterns, trends, and associations, and that are difficult to process by non-computational methods. Temporal Data Mining (TDM) is an active and rapidly evolving area of Big Data science. In 2006, Laxman and Unnikrishnan first presented a complete survey of TDM theory and developed new algorithms for discovering frequent episodes in event streams. These algorithms are significantly more efficient, in both space and time, than the earlier algorithms reported. The new TDM techniques soon found real-world applications, such as neuronal network studies and the automobile industry.

Can TDM concepts and algorithms be borrowed and applied to the nuclear sciences, especially to nuclear site monitoring and restoration? In-situ monitoring and decommissioning, like an automobile assembly line, generates a large amount of data that is time-specific, age-specific, and developmental-stage-specific. With temporal data mining techniques, this data may reveal previously unknown patterns of material failure, system breakdown, radiation field change, and liquid leakage.

At SRNL, researchers have established an ISD Sensor Network Test Bed, a unique, small-scale, and configurable environment, for the assessment of prospective sensors on actual ISD system material, at a minimal cost. The extensive data collected by the ISD sensors are ideal for temporal data mining to validate ISD system performance and predict possible system failures (or future accidents). A fast, robust, and efficient real-time data acquisition and data mining system based on current computer technology is urgently needed. We propose:

  1. Design and implement a real-time data acquisition system for the ISD sensor network testbed located at SRNL. Through the data acquisition system, real-time data from various sensors (temperature, pressure, humidity, radiation field, leakage, etc.) will be synchronized and then streamed to the data server with time stamps. The data acquisition system will also be able to control the devices remotely according to feedback from data analysis and temporal data mining (TDM).
  2. Design and implement a web-based temporal data mining system for the ISD sensor network testbed. This web-based temporal data analysis and data mining system will frequently access the data server, located off-site, through web GUIs. The system can be reached from anywhere over the internet through secure member authorization. A sketch of the data acquisition and data mining system is shown below.
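
As a small code illustration of the synchronization step in item 1 (separate from the system sketch), the following Python fragment shows one possible way to align readings from several sensors onto a shared time grid before they are streamed to the data server. The function name `synchronize`, the one-second grid period, and the sample readings are all assumptions made for illustration; they are not part of the proposed system's design.

```python
from collections import defaultdict

def synchronize(readings, period_s=1.0):
    """Align (sensor, unix_time, value) readings onto a shared time grid.

    Each reading is assigned to the grid tick at or before its timestamp.
    If a sensor reports more than once within a tick, the latest value wins.
    """
    grid = defaultdict(dict)
    # Process readings in time order so later values overwrite earlier ones.
    for sensor, t, value in sorted(readings, key=lambda r: r[1]):
        tick = int(t // period_s) * period_s
        grid[tick][sensor] = value
    # One time-stamped row per tick, ready to be sent to a data server.
    return [{"timestamp": tick, **grid[tick]} for tick in sorted(grid)]

# Hypothetical readings: (sensor name, time in seconds, measured value).
readings = [
    ("temperature", 100.2, 21.5),
    ("pressure", 100.7, 101.3),
    ("temperature", 101.1, 21.7),
    ("humidity", 101.4, 40.0),
]
rows = synchronize(readings, period_s=1.0)
for row in rows:
    print(row)
```

Binning to the preceding tick is only one choice; interpolation or nearest-tick assignment may suit slowly varying sensors better.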

A new algorithm for sequence prediction over long categorical event streams will be applied to this large dataset. In this approach, the set of significant frequent episodes associated with each target event type is obtained from formal connections between frequent episodes and Hidden Markov Models (HMMs), and a mixture of such HMMs is used to estimate the likelihood of each target event type (e.g., material failure or accidents).
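
To make the notion of a frequent episode concrete, the sketch below counts non-overlapped occurrences of a serial episode (an ordered list of event types) in a categorical event stream, which is one commonly used frequency measure in episode mining. It is a simplified illustration only: it does not implement the HMM mixture described above, and the event names and the function `count_non_overlapped` are hypothetical.

```python
def count_non_overlapped(events, episode):
    """Count non-overlapped occurrences of a serial episode in an event stream.

    A single automaton waits for the episode's event types in order,
    ignoring unrelated events in between. When the last event type is
    seen, one occurrence is counted and the automaton resets.
    """
    if not episode:
        raise ValueError("episode must contain at least one event type")
    count, pos = 0, 0
    for event in events:
        if event == episode[pos]:
            pos += 1
            if pos == len(episode):
                count += 1
                pos = 0  # start looking for the next occurrence
    return count

# Hypothetical categorical event stream derived from sensor data.
events = [
    "temp_high", "noise", "pressure_drop", "leak",
    "temp_high", "pressure_drop", "noise", "leak",
    "temp_high",
]
episode = ("temp_high", "pressure_drop", "leak")
n = count_non_overlapped(events, episode)
print(n)  # 2
```

An episode would be considered frequent when this count exceeds a chosen threshold; the threshold and any time-window constraint are left out of this sketch.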

This project is unique in being the first in the world to combine real-time computerized data gathering, temporal data mining, and possibly remote control in a nuclear deactivation and decommissioning setting. It applies contemporary computing concepts (big data, data mining, machine learning) to a traditional area of nuclear science (permanent entombment of large, contaminated nuclear structures via in-situ decommissioning). If implemented, this project will greatly improve the speed of emergency response and reduce the need for nuclear personnel, which in turn will significantly reduce costs across the nuclear industry.