D1.1 Technology survey: Prospective and challenges - Revised version (2018)

4 ICT based systems for monitoring, control and decision support

4.7 Fault tolerance - faults are the norm, not the exception

Failures and network partitions are common in large-scale distributed systems; replicating data avoids single points of failure. Replication has become an essential feature of storage systems and is leveraged extensively in cloud environments [Ghemawat, 2003]: it is the main reason behind the high availability potential of cloud storage systems. An important issue to consider in geo-replicated storage systems is data consistency. Because strong consistency by means of synchronous replication may limit the performance of some applications, relaxed consistency models (e.g., weak, eventual, causal) have been introduced to improve performance while still meeting the consistency requirements of the specific application [DeCandia, 2007][Cassadra, 2009].
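The trade-off between strong and relaxed consistency can be illustrated with quorum-based replication, one common approach in Dynamo-style stores. The sketch below is a minimal, in-memory illustration and not the API of any particular system: the names `Replica`, `write`, `read` and `quorums_overlap` are hypothetical, and the choice of which replicas answer is fixed to the worst case on purpose. With N replicas, a write acknowledged by W replicas and a read that contacts R replicas are guaranteed to intersect only when R + W > N.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    # Hypothetical in-memory replica: key -> (version, value).
    store: dict = field(default_factory=dict)


def write(replicas, key, value, version, w):
    """Apply the write to the first w replicas only; the others lag behind."""
    for replica in replicas[:w]:
        replica.store[key] = (version, value)


def read(replicas, key, r):
    """Query the last r replicas (worst case) and return the newest value seen."""
    answers = [rep.store.get(key, (0, None)) for rep in replicas[-r:]]
    return max(answers, key=lambda answer: answer[0])[1]


def quorums_overlap(n, r, w):
    """True when every read quorum must intersect every write quorum."""
    return r + w > n


if __name__ == "__main__":
    n = 3
    replicas = [Replica() for _ in range(n)]
    write(replicas, "sensor-42", "v1", version=1, w=3)  # fully replicated
    write(replicas, "sensor-42", "v2", version=2, w=1)  # fast, weakly replicated
    print(quorums_overlap(n, r=1, w=1), read(replicas, "sensor-42", r=1))
    print(quorums_overlap(n, r=3, w=1), read(replicas, "sensor-42", r=3))
```

Small quorums (R = W = 1) give low latency but may return stale data, whereas overlapping quorums always see the latest acknowledged write at the cost of contacting more replicas, possibly across sites.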

With unpredictable diurnal/monthly changes in data access patterns and variations in network latency, static and traditional consistency solutions are not adequate [Li, 2012][Peglar, 2012]. A few studies have therefore explored adaptive consistency models for one specific application [Chihoub, 2012]. Environmental applications, however, exhibit complex data access affinity and, more importantly, their data are shared by multiple applications with a mix of consistency requirements. Consequently, our goal is to study new metrics and configurable consistency models that maintain a high rate of consistency for shared data in geo-replicated storage systems while improving the performance and throughput of multiple applications. Moreover, as the speed and size of generated data are growing rapidly, we intend to explore new techniques based on erasure coding to reduce the required storage capacity while maintaining high data availability.
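To make the storage argument for erasure coding concrete, the sketch below uses the simplest possible code, a single XOR parity block over k data blocks, which tolerates the loss of any one block. It is an illustration under stated assumptions rather than the technique the project will adopt: the function names are hypothetical, and production systems typically rely on codes such as Reed-Solomon that tolerate several simultaneous losses. A code with k data blocks and m parity blocks stores (k + m) / k bytes per byte of user data, compared with 3 for triple replication.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the stripe: the k data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Rebuild the single block at lost_index from all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


def storage_overhead(k, m):
    """Stored bytes per byte of user data for k data and m parity blocks."""
    return (k + m) / k


if __name__ == "__main__":
    stripe = encode([b"abcd", b"efgh", b"ijkl"])
    print(recover(stripe, 1))       # rebuilds the second data block
    print(storage_overhead(3, 1))   # single-parity stripe used here
    print(storage_overhead(4, 2))   # e.g. a code with 4 data and 2 parity blocks
```

In this example the stripe survives the loss of any one of its four blocks while storing about 1.33 times the user data, which is the kind of saving over full replication that motivates the approach.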