D1.1 Technology survey: Prospective and challenges - Revised version (2018)
2.2 Uncertain data
Uncertainty in measurements arises from the randomness and complexity of physical phenomena and from errors in the observation and/or processing of the measured data. Users of one discipline also need to understand the uncertainty of the data and products of another discipline before using them, and to be assured that the methodology used in uncertainty estimation is consistent. Finally, various communities, including the public, benefit from seeing the uncertainty expressed for the data and products of the various disciplines (see Uncertainty analysis).
There are 160 standards related to the scope of UA issued by various ISO technical committees, which can be grouped into three types: General UA, Flow Measurement UA, and Specific Flow Measurement UA (see Standards related to the scope of UA).
Uncertainty sources involve the following classes:
- variables that are used (i.e., instruments, protocols, design site and flow characteristics);
- spatial changes in the stream cross section due to the presence of bed forms, vegetation, and ice/debris presence;
- temporal changes in the flow due to backwater, variable channel storage, and unsteady flows (see Uncertainty sources).
The implementation of the UA assumes that the uncertainties involved are small compared with the measured values, with the exception being when the measurements are close to zero (see Practical Considerations).
Uncertainty analysis [top]
Uncertainty analysis is a rigorous methodology for estimating uncertainties in measurements, and in the results calculated from them, by combining statistical and engineering concepts. The objective of a measurement is to determine the value of a measurand, that is, the value of the particular quantity to be measured. A measurement of a specified measurand therefore entails the measurement methods and procedures along with the effect of the influence quantities (environmental factors). In general, a measurement has imperfections that give rise to an error in the measurement result. Consequently, the result of a measurement is only an approximation or estimate of the value of the measurand and thus is complete only when accompanied by a statement of the uncertainty of that estimate. In practice, the required specification or definition of the measurand is dictated by the required accuracy of the measurement. The accuracy of a measurement indicates the closeness of agreement between the result of a measurement and the value of the measurand.
The measurement error is defined as the result of a measurement minus a true value of the measurand. Neither the true value nor the value of the measurand can ever be known exactly because of the uncertainty arising from various effects.
In typical measurement situations, several physical parameters (e.g., flow velocity, depth, and channel width) are physically measured to obtain a derived quantity (e.g., stream discharge). The individual physical measurements are then used in a data reduction equation (e.g., velocity–area method) to obtain the targeted value. Consequently, the two major steps involved in the uncertainty analysis are:
- identification and estimation of the uncertainties associated with the measurement of the individual variables, and
- propagation of the individual measurement uncertainties in the final result.
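The two steps above can be sketched for the velocity–area example. The numbers and variable names below are illustrative, not taken from the survey; the propagation uses the first-order Taylor-series method discussed below, assuming uncorrelated inputs:

```python
import math

# Step 1: individual measurements and their standard uncertainties
# (illustrative values for one panel of a stream cross section)
v, u_v = 0.85, 0.03      # mean velocity [m/s] and its uncertainty
d, u_d = 1.20, 0.02      # depth [m] and its uncertainty
w, u_w = 4.00, 0.05      # panel width [m] and its uncertainty

# Data reduction equation (velocity-area method for one panel): Q = v * d * w
Q = v * d * w

# Step 2: propagate the individual uncertainties into the final result
# via a first-order Taylor-series expansion (law of propagation of
# uncertainty), assuming the inputs are uncorrelated
u_Q = math.sqrt((d * w * u_v) ** 2 + (v * w * u_d) ** 2 + (v * d * u_w) ** 2)

# For a pure product, relative uncertainties combine in quadrature:
u_Q_rel = math.sqrt((u_v / v) ** 2 + (u_d / d) ** 2 + (u_w / w) ** 2)
```

For a full cross section, the same propagation would be applied panel by panel and the panel discharges summed.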
While the methods for estimating the elemental sources of uncertainty are quite similar among the various communities (statistical analysis or use of previous experience, expert opinion, and manufacturer specifications), the methods used to determine how those sources of uncertainty are accounted for in the final result have differed widely [TCHME, 2003]. In addition, variations can occur even within a given methodology. Coleman and Steele [Coleman, 1999] discuss six different variations of the Taylor series expansion estimation method (the most widely used approach for the propagation of uncertainties).
Uncertainty analysis is a critical component of assessing the performance of flow measurement instruments and techniques, for both conventional and newer instrumentation and methodologies. These analyses are of fundamental importance to the application of risk management procedures and to sustainable water resources management, by ensuring that the methodology and instrumentation selected for a task will deliver the accuracy that is needed. They also enable investments in hydrological instrumentation to be made in the most cost-effective manner.
Standards related to the scope of UA [top]
Given the vast number of publications on the topic, a recent overview of the flow measurement standards issued by the International Organization for Standardization (ISO, the most authoritative institution in the area of standards) lists about 160 standards related to the scope of UA issued by various ISO technical committees. These can be grouped into three types of Uncertainty Analysis (UA) publications (i.e., frameworks, standards, guidelines, or references):
- General UA (GUA)
- Flow Measurement UA (FMUA), and
- Specific Flow Measurement UA (SFMUA).
General UA (GUA) approaches
UA has been a major concern of scientists and practitioners, as well as of the standardization bodies. In 1986, the efforts of the American Society of Mechanical Engineers (ASME) led to the adoption of the ASME-PTC 19.1 Measurement Uncertainty standard [ASME, 1986], which was also recognized by: the Society of Automotive Engineers (SAE); the American Institute of Aeronautics and Astronautics (AIAA); ISO; the Instrument Society of America, currently the Instrumentation, Systems, and Automation Society (ISA); the US Air Force; and the Joint Army Navy NASA Air Force (JANNAF).
In parallel, due to intense international debates and a lack of consensus, the problem of a unified approach to uncertainty in measurements was addressed in 1978 by the Bureau International des Poids et Mesures (BIPM), at the initiative of the world's highest authority in metrology, the Comité International des Poids et Mesures (CIPM), and a set of recommendations was elaborated. Eventually, the diverse approaches were consolidated by ISO, which assembled a joint group of international experts representing seven organizations: BIPM, ISO, the International Electrotechnical Commission (IEC), the International Federation of Clinical Chemistry (IFCC), the International Union of Pure and Applied Chemistry (IUPAC), the International Union of Pure and Applied Physics (IUPAP), and the International Organization of Legal Metrology (OIML). This group prepared the “Guide to the Expression of Uncertainty in Measurement” [GUM, 1993], the first set of widely internationally recognized guidelines for the conduct of uncertainty analysis.
GUM provides general rules for the evaluation and expression of uncertainty in measurement rather than providing detailed and specific instructions tailored to any specific field of study. The main distinction between GUM and previous methods is that there is no inherent difference between an uncertainty arising from a random effect and one arising from a correction for a systematic effect (an error is classified as random if it contributes to the scatter of the data; otherwise, it is a systematic error). GUM uses a classification based on how the uncertainties are estimated:
- Type A: evaluated statistically;
- Type B: evaluated by other means.
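A minimal sketch of the two evaluation types and of their identical treatment when combined; the readings and the manufacturer tolerance are illustrative, not taken from any cited standard:

```python
import math
import statistics

# Type A: repeated readings of the same quantity, evaluated statistically.
readings = [10.02, 9.98, 10.05, 9.97, 10.03, 10.01]
mean = statistics.mean(readings)
# standard uncertainty of the mean = sample standard deviation / sqrt(n)
u_typeA = statistics.stdev(readings) / math.sqrt(len(readings))

# Type B: evaluated by other means, e.g. a manufacturer tolerance of
# +/- 0.05 treated as a rectangular distribution: u = a / sqrt(3)
u_typeB = 0.05 / math.sqrt(3)

# GUM's fundamental principle: both are standard uncertainties of the
# same nature and combine identically (root sum of squares)
u_combined = math.sqrt(u_typeA ** 2 + u_typeB ** 2)
```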
GUM provides a realistic value of uncertainty based on the standard's fundamental principle that all components of uncertainty are of the same nature and are to be treated identically. The GUM / JCGM (100:2008) methodology is recognized today as the most authoritative framework for a rigorous uncertainty assessment; however, it provides general rules for evaluating and expressing uncertainty in measurement rather than detailed, science- or engineering-specific instructions. GUM / JCGM (100:2008) does not discuss how the uncertainty of a particular measurement result, once evaluated, may be used for different purposes such as, for example, to draw conclusions about the compatibility of that result with other similar results, to establish tolerance limits in a manufacturing process, or to decide if a certain course of action may be safely undertaken.
Flow Measurement UA (FMUA) and Specific FMUA (SFMUA) approaches
Minimal guidance is available on UA for flow measurements [WMO, 2007; Pilon et al., 2007]. A new edition of the Guide to Hydrological Practices published in 2008 [WMO, 2008] reviews new instrumentation and technologies for producing hydrological information but does not address the uncertainty analysis aspects of the data and information. Despite the many authoritative documents on flow measurement that were available (e.g., [ASME, 1971]), the first effort at developing a national standard for flow measurement in the U.S.A. was initiated in 1973 [Abernethy, 1985]. The first standard on flow measurement developed by ISO was “Measurement of Flow by Means of Orifice Plates and Nozzles” [ISO, 1967], based on compromises between USA procedures and those in use throughout Western Europe. All of these efforts addressed the accuracy of flow measurement with various degrees of thoroughness. However, each of the resulting publications reported “personalized” procedures for estimating the uncertainty and was often biased by the judgment of the individuals involved in developing the procedure [Abernethy, 1985].
Because of the diversity and large number of available standards on flow measurement (there are 64 ISO standards), guidance is necessary on the different types of standards (how and when they can be used), on the decision process for implementing the standards, and on the key access points for information about the standards and their availability. ISO/TR 8363 [ISO, 1997] is recommended as the “standard of the standards” for flow measurements, as it gives the most qualified guidance on the selection of an open channel flow measurement method and of an applicable standard. The first criterion that ISO (1997) uses to select a specific flow measurement instrument or technique is the required or expected level of uncertainty of the measurement.
Uncertainty sources [top]
The estimation of the uncertainties of the stream-flow estimates at a gaging station based on rating curves associated with the HQRC and IVRC methods involves two distinct aspects:
- the estimation of the accuracy of the direct measurements for constructing and subsequently using the RCs, and
- the estimation of the accuracy of the RCs themselves (i.e., regression, extrapolation, shifting).
Similarly, the CSA method is subject to uncertainty from the direct measurements and from the analytical methods and their assumptions.
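Aspect (ii), the accuracy of the RC itself, can be illustrated by fitting a power-law rating curve to a handful of gaugings. The stage–discharge pairs and the cease-to-flow stage h0 below are illustrative, not measured data:

```python
import math

# Illustrative stage-discharge gaugings (h in m, Q in m^3/s) and a known
# cease-to-flow stage h0; real gaugings also carry measurement uncertainty
h0 = 0.20
gaugings = [(0.50, 1.0), (0.80, 3.2), (1.20, 7.5), (1.60, 13.2), (2.10, 21.9)]

# Fit Q = a * (h - h0)**b by ordinary least squares in log-log space
xs = [math.log(h - h0) for h, _ in gaugings]
ys = [math.log(q) for _, q in gaugings]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
    sum((x - xbar) ** 2 for x in xs)
a = math.exp(ybar - b * xbar)

def rating_curve(h):
    """Fitted HQRC: discharge predicted from stage alone."""
    return a * (h - h0) ** b

# Residual scatter in log space is one simple indicator of the rating
# curve's own (regression) uncertainty, separate from gauging uncertainty
residuals = [y - (math.log(a) + b * x) for x, y in zip(xs, ys)]
rmse_log = math.sqrt(sum(r ** 2 for r in residuals) / n)
```

Extrapolation beyond the highest gauging and shifts after channel changes would add further uncertainty not captured by this residual scatter.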
Using the generic grouping of the sources of uncertainties in gaging methods proposed by [Fread, 1975], we can distinguish the following classes:
- variables that are used (i.e., instruments, protocols, design site and flow characteristics)
- spatial changes in the stream cross section due to the presence of bed forms, vegetation, and ice/debris. These changes typically evolve more slowly (over time scales ranging from a storm event to a season) and can be reversible or permanent.
- temporal changes in the flow due to backwater, variable channel storage, and unsteady flows. Typically, these changes are of the order of hours or days.
Assessment of the individual sources of uncertainties in the three categories above is not available for several reasons:
- there is no comprehensive and widely accepted methodology to conduct uncertainty analysis (UA) for hydrometric measurements at this time. Efforts are being made in this community to identify robust standardized methodologies for the assessment of uncertainties for both direct measurements (e.g., [Muste, 2012]) and rating curves (e.g., [Le Coz, 2014]). These efforts are quite extensive, as the conduct of UA requires specialized experiments similar to the calibrations executed by manufacturers for industrial flow meters. Obviously, such calibrations are much more difficult to conduct in field conditions.
- the level of uncertainty in the HQRC, IVRC, and CSA method estimates induced by site conditions and changes in the flow status is unknown. The situation is especially critical for high flows (e.g., floods), as these events are infrequent and the preparations for acquiring measurements are more involved than in steady flows.
Despite the challenge and high cost, these efforts are currently increasing as the demand for data quality also increases [Le Coz, 2015], [Muste, 2015].
The same method generates different results depending on the evolution of the related physical phenomena. Thus, for steady flows, the HQRC vs. IVRC analysis presented in [Muste, 2015] suggests that the IVRC estimates are less precise (i.e., show more scatter) than the HQRC estimates.
Another aspect that distinguishes the IVRC from HQRC is that the former method is sensitive to the change in the flow structure passing through the line of sight.
To compare the performance of HQRC vs. CSA, studies were conducted at the USGS streamgage station 05454220 located on Clear Creek, a small stream in Iowa, USA. The differences between the two curves are up to 20% for this site (differences are site specific), indicating that the less expensive (no calibration needed) synthetic RC can be used as a surrogate when lack of resources is a concern for the monitoring agencies. Moreover, the increased availability of affordable radar- or acoustic-based sensors that non-intrusively measure the free-surface elevation makes this simplified CSA approach attractive for a plethora of applications where this degree of uncertainty is acceptable.
Unsteady flows are ephemeral but unavoidable in natural streams, therefore hysteresis is always present to some degree irrespective of the river size.
It is often stated that for most streams the hysteresis effects are small and cannot be distinguished from the uncertainty of the instruments and methods involved in constructing the RCs. On the other hand, theoretical considerations leave no doubt that the use of HQRCs for unsteady flows is associated with hysteresis, however small it may be [Perumal, 2014]. It is only by acquiring direct discharge measurements continuously during the whole extent of the propagation of the flood wave, as is done in the present study, that the magnitude of the hysteresis effect can be demonstrated. Fortunately, the advancement and efficiency of the new measurement technologies make this task increasingly possible.
The non-uniqueness of the relationships between flow variables during unsteady flows was also observed in detailed laboratory experiments conducted by Song and Graf [Song, 1996], where it was shown that during the passage of the hydrograph the mean cross-sectional velocities on the rising limb are larger than on the falling limb for the same flow depth. Unfortunately, this level of detail cannot be easily achieved in field conditions.
The experimental evidence [Muste, 2013] suggests that recourse needs to be made to the fundamental equations for the unsteady open channel flow (e.g., Saint-Venant equations) when formulating protocols for IVRC method. The correction protocols would be similar to the corrections applied for the HQRC protocols used in unsteady flows. Another alternative for enhancing the performance of IVRC for unsteady flow would be to use the segmentation approach described by Ruhl and Simpson [Ruhl, 2005] in the construction of the curve for unsteady flows.
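One well-known Saint-Venant-derived correction of this kind (offered here as an example, not as the specific protocol the survey refers to) is the Jones formula, Q = Qr * sqrt(1 + (1/(S0*c)) * dh/dt), which adjusts the steady rating discharge on each limb of the hydrograph. The channel parameters below are illustrative:

```python
import math

def jones_correction(q_steady, dhdt, s0, celerity):
    """Jones-formula correction of a steady rating curve discharge for
    unsteady (looped) flow: Q = Qr * sqrt(1 + dh/dt / (S0 * c)).

    q_steady : discharge from the steady HQRC [m^3/s]
    dhdt     : rate of change of stage [m/s] (positive on the rising limb)
    s0       : channel bed slope [-]
    celerity : flood wave celerity [m/s]
    """
    return q_steady * math.sqrt(1.0 + dhdt / (s0 * celerity))

# Illustrative values: mild bed slope, moderate flood wave celerity
q_rc = 50.0          # steady rating curve discharge [m^3/s]
s0, c = 3e-4, 2.0    # bed slope [-] and wave celerity [m/s]

q_rising = jones_correction(q_rc, +2.0e-4, s0, c)   # rising limb: Q > Qr
q_falling = jones_correction(q_rc, -2.0e-4, s0, c)  # falling limb: Q < Qr
```

The correction reproduces the expected hysteresis loop: for the same stage, the rising-limb discharge exceeds the steady-rating value and the falling-limb discharge falls below it.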
The comparison of CSA vs. HQRC [Muste, 2015] shows that on the rising limbs of the CSA method, the high flows occur faster and are larger than those predicted by the HQRC method. These findings are consistent with previous laboratory and field measurements (e.g., [Song, 1996]; [Perumal, 2004]; [Gunawan, 2010]) and have practical implications for both flood and stream transport processes.
The main conclusion on the performance of the conventional methods in observing steady flows [Muste, 2015] is that the HQRC method is more robust and less sensitive to changes in flow structure (produced by imperfections in the gaging site selection and ephemeral changes in the flow distribution) compared to the IVRC and simplified CSA methods. In contrast, the HQRC performs more poorly than the other methods in unsteady flows, as the typical construction protocol for RCs is based on steady-flow assumptions.
Many distributed systems use the event-driven approach in support of monitoring and reactive applications. Examples include: supply chain management, transaction cost analysis, baggage management, traffic monitoring, environment monitoring, ambient intelligence and smart homes, threat / intrusion detection, and so forth.
Events can be primitive, which are atomic and occur at one point in time, or composite, which include several primitive events that occur over a time interval and have a specific pattern. A composite event has an initiator (primitive event that starts a composite event) and a terminator (primitive event that completes the composite event). The occurrence time can be that of the terminator (point-based semantics) or can be represented as a pair of times, one for the initiator event, and the other for the terminator event [Paschke, 2008, Dasgupta 2009]. The interval temporal logic [Allen, 1994] is used for deriving the semantics of interval based events when combining them by specific operators in a composite event structure.
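The initiator/terminator structure and the two occurrence-time semantics can be sketched as follows; the event names and a simple sequence operator are illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PrimitiveEvent:
    """Atomic event occurring at one point in time."""
    name: str
    time: float

@dataclass(frozen=True)
class CompositeEvent:
    """Composed of primitive events over a time interval."""
    name: str
    initiator: PrimitiveEvent   # primitive event that starts the composite
    terminator: PrimitiveEvent  # primitive event that completes it

    @property
    def interval(self):
        # interval-based semantics: (initiator time, terminator time)
        return (self.initiator.time, self.terminator.time)

    @property
    def point_time(self):
        # point-based semantics: occurrence time of the terminator
        return self.terminator.time

def sequence(name, a, b):
    """SEQ operator: the composite event occurs only if b follows a."""
    if b.time > a.time:
        return CompositeEvent(name, initiator=a, terminator=b)
    return None

e1 = PrimitiveEvent("door_open", 10.0)
e2 = PrimitiveEvent("alarm_armed", 12.5)
ce = sequence("intrusion_window", e1, e2)
```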
Event streams are time-ordered sequences of events, usually append-only (events cannot be removed from a sequence). An event stream may be bounded by a time interval or by another conceptual dimension (content, space, source, certainty), or be open-ended and unbounded. Event stream processing handles multiple streams, aiming at identifying the meaningful events and deriving relevant information from them. This is achieved by means of detecting complex event patterns, event correlation and abstraction, event hierarchies, and relationships between events such as causality, membership, and timing. Thus, event stream processing is focused on high-speed querying of data in streams of events and on applying transformations to the event data. Processing a stream of events in their order of arrival has some advantages: algorithms increase the system throughput since they process the events “on the fly”; more specifically, they process the events in the stream when they occur and send the results immediately to the next computation step. The main applications benefiting from event streams are algorithmic trading in financial services, RFID event processing applications, fraud detection, process monitoring, and location-based services in telecommunications.
Temporal and causal dependencies between events must be captured by specification languages and treated by event processors. The expressivity of the specification should handle different application types with various complexities, being able to capture common use patterns. Moreover, the system should allow complete process specification without imposing any limiting assumptions about the concrete event process architecture, requiring a certain abstraction of the modelling process. The pattern of the interesting events may change during execution; hence the event processing should allow and capture these changes through a dynamic behaviour. The usability of the specification language should be coupled with an efficient implementation in terms of runtime performance: near real-time detection and non-intrusiveness [Mühl, 2006]. Distributed implementations for the events detectors and processors often achieve these goals. We observe that, by distributing the composite event detection, the scalability is also achieved by decomposing complex event subscriptions into sub-expressions and detecting them at different nodes in the system [Anicic, 2009]. We add to these requirements the fault tolerance constraints imposed to the event composition, namely: the correct execution in the presence of failures or exceptions should be guaranteed based on formal semantics. One can notice that not all these requirements can be satisfied simultaneously: while a very expressive composite event service may not result in an efficient or usable system, a very efficient implementation of composite event detectors may lead to systems with low expressiveness. In this chapter, we describe the existing solutions that attempt to balance these trade-offs.
Composite events can be described as hierarchical combinations of events that are associated with the leaves of a tree and are combined by operators (specific to an event algebra) that reside in the other nodes. Another approach is continuous queries, which consists of applying queries to streams of incoming data [Chandrasekaran, 2002]. A derived event is generated from other events and is frequently enriched with data from other sources. The event representation must completely describe the event in order to make this information usable to potential consumers without needing to go back to the source to find other information related to the event.
Many event processing engines are built around the Event, Condition, Action (ECA) paradigm [Chakravarthy, 2007], which was first used in Data Base Management Systems (DBMS) and was then extended to many other categories of systems. These elements are described as a rule that has three parts: the event that triggers the rule invocation; the condition that restricts the performance of the action; and the action executed as a consequence of the event occurrence. To fit this model, the event processing engine includes components for complex event detection, condition evaluation, and rule management. In this model, event processing means detecting complex events from primitive events that have occurred, evaluating the relevant context in which the events occurred, and triggering some actions if the evaluation result satisfies the specified condition. Event detection uses an event graph, which is a merge of several event trees [Chakravarthy, 1994]. Each tree corresponds to the expression that describes a composite event. A leaf node corresponds to a primitive event, while intermediate nodes represent composite events. The event detection graph is obtained by merging common sub-graphs. When a primitive event occurs, it is sent to its corresponding leaf node, which propagates it to its parents. When a composite event is detected, the associated condition is submitted for evaluation. The context, which can have different characteristics (e.g., temporal, spatial, state, and semantic), is preserved in variables and can be used not only for condition evaluation but also in performing the action.
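A minimal sketch of such an event detection graph with an attached ECA rule, assuming a single AND operator and illustrative event names; leaf nodes receive primitive events and propagate them to their parents, and the composite node evaluates the condition in the preserved context before triggering the action:

```python
class LeafNode:
    """Receives a primitive event and propagates it to parent operators."""
    def __init__(self, event_name):
        self.event_name = event_name
        self.parents = []

    def receive(self, event_name, payload):
        if event_name == self.event_name:
            for parent in self.parents:
                parent.child_occurred(self, payload)

class AndNode:
    """Detects a composite event once all child events have occurred,
    then applies the ECA rule: evaluate condition, trigger action."""
    def __init__(self, children, condition, action):
        self.pending = {}          # context preserved per child event
        self.children = children
        self.condition = condition
        self.action = action
        for child in children:
            child.parents.append(self)

    def child_occurred(self, child, payload):
        self.pending[child.event_name] = payload
        if len(self.pending) == len(self.children):
            context = dict(self.pending)   # context for condition/action
            self.pending.clear()
            if self.condition(context):
                self.action(context)

fired = []
smoke = LeafNode("smoke")
heat = LeafNode("heat")
fire_rule = AndNode(
    [smoke, heat],
    condition=lambda ctx: ctx["heat"] > 60,   # threshold is illustrative
    action=lambda ctx: fired.append(ctx),
)

smoke.receive("smoke", True)
heat.receive("heat", 75)   # completes the composite event; rule fires
```

A real engine would merge common sub-graphs across rules, as described above; here each node simply keeps a list of parents so sharing a leaf between several composite nodes is already possible.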
Studies emphasize the strong dependence on the test location. The findings are in agreement with theoretical considerations and consistent with a handful of previous studies of a similar nature. The existing studies point out that there is a need for initiating a systematic effort to evaluate the effect of flow unsteadiness on the various types of RCs used at gages located in medium and small streams. Fortunately, this task is considerably eased nowadays by the availability of the new generation of non-intrusive (i.e., optical and image-based) instruments that can be used to programmatically target the monitoring of flood events throughout their duration.
Practical Considerations [top]
The implementation of the UA assumes that the uncertainties involved are small compared with the measured values, with the exception being when the measurements are close to zero. For this to be true, the following conditions have to be carefully ensured [AIAA, 1995]; [GUM, 1993]:
- the measurement process is understood, critically analysed, and well defined
- the measurement system and process are controlled
- all appropriate calibration corrections have been applied
- the measurement objectives are specified
- the instrument package and data reduction procedures are defined
- uncertainties quoted in the analysis of a measurement are obtained with full intellectual honesty and professional skill.
If all of the quantities on which the result of a measurement depends are varied, its uncertainty can be evaluated by statistical means (Type A evaluation method). However, because this is rarely possible in practice due to limited time and resources, the uncertainty of a measurement result is usually evaluated using a mathematical model of the measurement and the law of propagation of uncertainty. Thus implicit in GUM / JCGM (100:2008) is the assumption that a measurement can be modelled mathematically to the degree imposed by the required accuracy of the measurement. Because the mathematical model may be incomplete, all relevant quantities should be varied to the fullest practical extent so that the evaluation of the uncertainty can be based as much as possible on observed data.
The implementation of the Guide assumes that the result of a measurement has been corrected for all recognized significant systematic effects and that every effort has been made to identify such effects. In some cases, the uncertainty of a correction for a systematic effect need not be included in the evaluation of the uncertainty of a measurement result. Although the uncertainty has been evaluated, it may be ignored if its contribution to the combined standard uncertainty of the measurement result is insignificant. In order to decide if a measurement system is functioning properly, the experimentally observed variability of its output values, as measured by their observed standard deviation (the end-to-end approach in the [AIAA, 1995] terminology), is often compared with the predicted standard deviation obtained by combining the various uncertainty components that characterize the measurement. In such cases, only those components (whether obtained from Type A or Type B evaluations) that could contribute to the experimentally observed variability of these output values should be considered.
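The end-to-end comparison described above can be sketched as follows; the output values, the uncertainty components, and the factor-of-two screening criterion are illustrative assumptions, not prescriptions from the cited documents:

```python
import math
import statistics

# Observed variability: repeated output values of the measurement system
outputs = [4.05, 4.11, 4.02, 4.09, 4.06, 4.08, 4.03, 4.10]
observed_sd = statistics.stdev(outputs)

# Predicted variability: combine only those uncertainty components that
# could contribute to run-to-run scatter (e.g., repeatability, resolution);
# a fixed, fully correlated calibration offset would be excluded here
components = [0.02, 0.025]
predicted_sd = math.sqrt(sum(u ** 2 for u in components))

# Simple (illustrative) screening criterion for proper functioning:
# observed scatter should not greatly exceed the predicted scatter
consistent = observed_sd <= 2.0 * predicted_sd
```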
It is recommended that a preliminary uncertainty analysis be done before measurements are taken. This procedure allows corrective action to be taken prior to acquiring measurements in order to reduce uncertainties. The pre-test uncertainty analysis is based on data and information that exist before the test, such as calibration histories, previous tests with similar instrumentation, prior measurement uncertainty analyses, expert opinions, and, if necessary, special tests. Pre-test analysis determines whether the result can be measured with sufficient accuracy, compares alternative instrumentation and experimental procedures, and identifies corrective actions. Corrective actions resulting from pre-test analysis may include:
- improvements to instrument calibrations if systematic uncertainties are unacceptable
- selection of a different measurement method to obtain the parameter of interest
- repeated testing and/or increased sample sizes if uncertainties are unacceptable
Cost and time may dictate the choice of the corrective actions. If corrective actions cannot be taken, there may be a high risk that test objectives will not be met because of the large uncertainty interval, and cancellation of the test should be a consideration. Post-test analysis validates the pre-test analysis, provides data for validity checks, and provides a statistical basis for comparing test results.
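One of the corrective actions listed above, increasing the sample size, lends itself to a simple pre-test calculation, sketched here under the assumption that the random standard uncertainty of the mean scales as s/sqrt(n); the prior scatter and target values are illustrative:

```python
import math

def required_repeats(prior_sd, target_u_mean):
    """Pre-test estimate of the number of repeated measurements needed so
    that the random standard uncertainty of the mean (s / sqrt(n)) drops
    to the target value. prior_sd would come from calibration histories
    or previous tests with similar instrumentation."""
    return math.ceil((prior_sd / target_u_mean) ** 2)

# Illustrative pre-test check: prior scatter 0.12, target uncertainty of
# the mean 0.03, so (0.12 / 0.03)^2 = 16 repeats are needed
n = required_repeats(0.12, 0.03)
```

If the resulting n is infeasible within the available cost and time, this is exactly the situation where the pre-test analysis would flag a high risk of not meeting the test objectives.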