Virtual Sensors to Predict and Control THM Formation on a Distribution System

Central Arkansas Water uses virtual sensing to address disinfection by-products in its distribution system.


Central Arkansas Water serves the metropolitan Little Rock, AR area. One of its two water treatment plants, the 24 mgd Ozark Point Water Treatment Plant, was experiencing elevated total trihalomethane (TTHM) levels in its distribution system. Fortunately, TTHM levels in the finished water were frequently well within the new Stage 2 DBP standards, but TTHM formation continued within the distribution system. The utility wanted to know what the TTHM levels would be at any location in the distribution system under all operating conditions, and what preventive measures could be taken to minimize TTHM formation. Advanced Data Mining Intl (ADMi) of Greenville, SC used lab and SCADA data, along with machine learning techniques, to build a virtual sensor that estimates daily TTHM levels at the critical quarterly monitoring points in the distribution system. A similar approach has been used in other systems to control THM and HAA5 species, and could also be used to predict levels of non-regulated contaminants such as nitrosamines.

Building Virtual Sensors – An Introduction

Computer models of physical systems generally fall into one of two categories: deterministic (or mechanistic) models and empirical correlation functions, which are also referred to as models when deployed as predictive applications. Deterministic models are created from first-principles equations, while empirical approaches adapt generalized mathematical functions to fit a line or surface through data from two or more variables. The most commonly used empirical approach is ordinary least squares (OLS), which relates variables using straight lines, planes, or hyper-planes, whether the actual relationships are linear or not. For systems that are well characterized by data, empirical models can be developed much faster and can be more accurate; however, empirical models are prone to problems when poorly applied. Over-fitting, and multi-collinearity caused by correlated input variables, can lead to invalid mappings between input and output variables (Roehl et al. 2003). Calibrating either a deterministic or an empirical model attempts to optimally synthesize a line or surface through the observed data. Calibration is made difficult when:

  • Calibration data have substantial measurement error – statistical measures of accuracy will be poor because the measurements are the standard to which model predictions are compared.
  • Calibration data are incomplete.
    • Limited data do not cover the full range of behaviors of interest.
    • Limited variables may only be able to provide a partial explanation of the causes of variability.
  • The model’s functional form is inadequate for describing the process’s physics – e.g., the physics must be described by multiple interdependent variables having non-linear relationships.
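
The first and last of these difficulties can be illustrated with a small numerical sketch. It is not ADMi's method and uses no utility data: the process (a response that depends on the square of a single input), the noise level, and the helper names ols_fit and rmse are all assumptions made for illustration. An OLS straight line is fitted to the synthetic data, and then the same OLS machinery is fitted again with a squared term added so that the functional form matches the process.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical process (an assumption for illustration, not utility data):
# the response depends on the input x through a squared term.
x = np.linspace(0.0, 4.0, 50)
noise_sd = 0.2  # assumed measurement error (standard deviation)
y_obs = 2.0 + 0.5 * x**2 + rng.normal(0.0, noise_sd, size=x.size)


def ols_fit(X, y):
    """Return the least-squares coefficients beta for y ~ X @ beta."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def rmse(y, y_hat):
    """Root-mean-square error between observations and predictions."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


# Model 1: straight line (intercept + x). The functional form is inadequate.
X_lin = np.column_stack([np.ones_like(x), x])
# Model 2: same OLS fit, but with an x^2 column so the form matches the process.
X_quad = np.column_stack([np.ones_like(x), x, x**2])

rmse_lin = rmse(y_obs, X_lin @ ols_fit(X_lin, y_obs))
rmse_quad = rmse(y_obs, X_quad @ ols_fit(X_quad, y_obs))

print(f"straight-line RMSE: {rmse_lin:.3f}")
print(f"quadratic RMSE:     {rmse_quad:.3f}")
```

In this sketch the straight line leaves a systematic error that no amount of calibration can remove, while the model with the matching form is limited mainly by the assumed measurement noise: its error should settle near noise_sd rather than zero, because the noisy measurements are the standard against which predictions are compared.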