GAMES project

The Energy Sensing and Monitoring Infrastructure (ESMI)

ESMI Architecture

The ESMI is the environment in which several software components cooperate to collect monitoring data from the IT equipment, compute and monitor key and green performance indicators, mine different kinds of relations among indicators, assess the current status of the overall monitored IT infrastructure, and provide all the extracted information to the DTE and RTE through both synchronous and asynchronous interfaces.

The Infrastructure Access Module (IAM) is the software through which several types of measurements are taken on the IT equipment and sent to a central collector. It consists of a number of Nagios plug-ins, some of them developed to meet specific GAMES requirements, that are installed, configured and running on servers and power sensors. Monitoring data are packaged into messages and sent to a Nagios Server which, in turn, stores them in a database by means of the NDOUtils add-on. This is not the only function of the IAM: the module also translates the adaptation strategies applied by the Run Time Environment modules into actions on the IT equipment. For instance, it provides the means to execute workload consolidation actions (such as virtual task deployment and migration) and dynamic power management actions (such as turning servers on and off).
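To make the plug-in side of the IAM more concrete, the following is a minimal sketch of a Nagios-style check written in Python. It relies only on the general Nagios plug-in conventions (exit codes 0 to 3 and a status line followed by `|` and performance data). The function name `check_power`, the label `power_watts` and the thresholds are illustrative assumptions, not the actual GAMES plug-ins.

```python
# Standard Nagios plug-in exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_power(watts, warn, crit):
    """Return (exit_code, output_line) for one power reading in watts.

    Hypothetical helper: how the reading is obtained from a power
    sensor is outside the scope of this sketch.
    """
    if watts is None:
        return UNKNOWN, "POWER UNKNOWN - no reading available"
    if watts >= crit:
        code = CRITICAL
    elif watts >= warn:
        code = WARNING
    else:
        code = OK
    # Human-readable text, then '|' and performance data in the form
    # label=value;warn;crit
    line = (
        f"POWER {STATUS_NAMES[code]} - {watts:.1f} W"
        f" | power_watts={watts:.1f};{warn};{crit}"
    )
    return code, line
```

A real plug-in would print the returned line on standard output and terminate with the returned exit code, which is how the Nagios Server learns both the status and the measured value.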

The Aggregated Data Base (ADB) is the structured and normalized version of the monitoring data repository. Even when they follow the Nagios transmission syntax exactly, the data produced and sent by the Nagios plug-ins must be parsed and structured so that they can be used not only for monitoring but also for supplying the adaptation strategies.

The ADB and its manager are part of another internal ESMI module, the Finalised Context Instances Module (FCIM), which is in charge of collecting measures, organising them in a structured way and giving the other ESMI components access to them. The FCIM has followed the evolution of the Nagios enhancements and includes a wide set of parsers to correctly and quickly sample, organise and store raw monitoring data in the ADB.
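As an illustration of the kind of parsing the FCIM has to perform, the sketch below turns one line of Nagios plug-in output into structured records. It is only a sketch under stated assumptions: the `Measure` record and the function `parse_plugin_output` are hypothetical names, the real ADB schema is not described here, and quoted labels containing spaces and range-style thresholds are not handled.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# A number optionally followed by a unit of measure such as '%' or 's'.
VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)([A-Za-z%]*)$")


@dataclass
class Measure:
    """Hypothetical structured form of one performance-data item."""
    host: str
    label: str
    value: float
    unit: str
    warn: Optional[float]
    crit: Optional[float]


def _to_float(text: str) -> Optional[float]:
    """Convert a threshold field, returning None if empty or not a plain number."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_plugin_output(host: str, output: str) -> List[Measure]:
    """Split 'text | perfdata' and turn each perfdata item into a Measure."""
    _, separator, perfdata = output.partition("|")
    if not separator:
        return []  # the plug-in reported no performance data
    measures = []
    for item in perfdata.split():
        label, _, rest = item.partition("=")
        fields = rest.split(";")
        match = VALUE_RE.match(fields[0])
        if match is None:
            continue  # skip items that do not follow the expected syntax
        warn = _to_float(fields[1]) if len(fields) > 1 else None
        crit = _to_float(fields[2]) if len(fields) > 2 else None
        measures.append(
            Measure(host, label, float(match.group(1)), match.group(2), warn, crit)
        )
    return measures
```

Once the raw text has been reduced to records of this kind, storing them in a normalized table and querying them by host, label or time becomes straightforward.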

On top of the ESMI internal monitoring data repository, another ESMI software component checks whether, and which, events among a well-defined set actually occur: the Provenance and Tracking Interface (PTI) module, which implements the event-based messaging system inside the ESMI. The asynchronous messaging system adopted in GAMES is based on JMS middleware. A set of wrappers has also been developed to provide the same functionality through a REST service interface.
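The event-based behaviour described for the PTI can be pictured with a small publish/subscribe sketch. This is not JMS and not the PTI implementation: it is an in-process stand-in, and the names `EventBus`, `check_events` and the topic `temperature.high` are assumptions made purely for illustration.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-process publish/subscribe bus (a stand-in for a message broker)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callable to be invoked for every message on the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver the message to every subscriber of the topic."""
        for callback in self._subscribers[topic]:
            callback(message)


def check_events(bus, rules, measure):
    """Publish an event for every rule that the measure triggers.

    rules is a list of (topic, label, predicate) tuples; measure is a
    dict with at least the keys 'label' and 'value'.
    Returns the list of topics on which an event was published.
    """
    fired = []
    for topic, label, predicate in rules:
        if measure["label"] == label and predicate(measure["value"]):
            bus.publish(topic, measure)
            fired.append(topic)
    return fired
```

A consumer interested in, say, high room temperature would subscribe to the corresponding topic and be notified only when a stored measure satisfies the rule, instead of polling the repository.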

The three ESMI components mentioned above are linked together by a purpose-built web application, the ESMI Link. It allows the IAM, the FCIM and the PTI to be easily configured, started and controlled through a simple web interface. The GUI is intended for system administrators, to ease the installation and setup of the ESMI.

While the finalized context instances are stored in the ADB and managed by the FCIM module, a more complex Context Model has been designed to represent the service centre energy related data in a programmatic manner.

The Energy Aware Context Model (EACM) is constructed by mapping the RAP (Context Resources, Adaptation Actions, Context Policies) context model onto the service centre energy efficiency domain, with the aim of identifying and classifying service centre specific elements into the RAP context model sets.

Context Resources define the physical or virtual entities that capture and/or process the energy related context data. In a service centre we have identified three sub-types of Context Resources: Facility Resources (service centre sensors and facilities), Computing Resources (resources that consume energy as a result of executing a specific workload, i.e. servers) and Application Resources (service centre workload applications).

Context Actions define the set of adaptation actions, enabled at design time, that may be executed at run time to enforce the service centre energy efficiency goals. We have identified three types of adaptation actions: Facility Adaptation Actions (e.g. adjust the room temperature or start the air conditioner), Application Adaptation Actions (e.g. application redesign for energy efficiency) and IT Computing Adaptation Actions. Two main types of IT Computing Adaptation Actions are represented in the EACM model: Resource Consolidation actions (e.g. activity migration/deployment) and Dynamic Power Management actions (e.g. turn a server off/on, change the P-state of a processor, etc.).

Context Policies define the service centre energy efficiency goals through a set of Green and Key Performance Indicators (GPIs/KPIs) defined at design time. The model defines three types of GPIs/KPIs: (1) Environmental, imposing restrictions on the service centre ambient conditions (e.g. the temperature in the service centre must be under 21 °C), (2) IT Computing, describing the energy/performance characteristics of the service centre computing resources (e.g. the server CPU is efficiently used for a load between 60% and 80%) and (3) Application, specifying the rules (QoS requests) imposed by the business application for execution (e.g. for optimal execution time the application needs to have 1 GB of physical memory allocated).
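The three RAP sets and their sub-types can be sketched as a small data model. The following Python fragment is an assumption-laden illustration rather than the EACM itself: the class names, the field names and the choice to express a policy as a half-open numeric interval are all hypothetical, and only the classification and the example thresholds come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class ResourceType(Enum):
    FACILITY = "facility"
    COMPUTING = "computing"
    APPLICATION = "application"


class ActionType(Enum):
    FACILITY = "facility"
    APPLICATION = "application"
    RESOURCE_CONSOLIDATION = "resource consolidation"
    DYNAMIC_POWER_MANAGEMENT = "dynamic power management"


class PolicyType(Enum):
    ENVIRONMENTAL = "environmental"
    IT_COMPUTING = "it computing"
    APPLICATION = "application"


@dataclass(frozen=True)
class ContextPolicy:
    """Hypothetical policy: the indicator must lie in [low, high)."""
    name: str
    kind: PolicyType
    resource: str
    indicator: str
    low: float
    high: float

    def is_satisfied(self, value: float) -> bool:
        return self.low <= value < self.high


def violated_policies(policies, readings):
    """Return the names of the policies violated by the given readings.

    readings maps (resource, indicator) to the latest value; policies
    without a matching reading are skipped.
    """
    violated = []
    for policy in policies:
        key = (policy.resource, policy.indicator)
        if key in readings and not policy.is_satisfied(readings[key]):
            violated.append(policy.name)
    return violated
```

With this representation, the environmental example becomes a policy with an upper bound of 21 and no lower bound, and the IT Computing example becomes the interval from 60 to 80 on a CPU load indicator. A violated policy is the natural trigger for selecting one of the adaptation action types.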

In order to process monitoring data and understand the information that can be extracted from them, everything related to the configuration and status of the data centre must be formalized and made available to the other parts of the overall GAMES software architecture. This information, covering hardware, installed software and relevant indicators, is stored in an ad hoc database, the Energy Practice Knowledge Base (EPKB).

By querying the EPKB, for instance, another GAMES software component can retrieve the historical values of indicators as well as get the number of servers that are running or the configurations of the virtual machines installed on a given server. It is worth noting that the Energy Practice Knowledge Base holds a central role in the whole architecture.
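The two kinds of query mentioned above can be sketched against a toy relational store. The schema below (tables `server` and `indicator_value`), the sample rows and the function names are invented for illustration and say nothing about the real EPKB schema; SQLite is used only because it ships with Python.

```python
import sqlite3


def create_demo_epkb():
    """Build an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE server (name TEXT PRIMARY KEY, status TEXT);
        CREATE TABLE indicator_value (
            indicator TEXT, server TEXT, ts TEXT, value REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO server VALUES (?, ?)",
        [("srv01", "running"), ("srv02", "off"), ("srv03", "running")],
    )
    conn.executemany(
        "INSERT INTO indicator_value VALUES (?, ?, ?, ?)",
        [
            ("cpu_usage", "srv01", "2011-05-01T10:00", 62.0),
            ("cpu_usage", "srv01", "2011-05-01T10:05", 71.0),
            ("cpu_usage", "srv03", "2011-05-01T10:00", 35.0),
        ],
    )
    return conn


def running_servers(conn):
    """Count the servers whose status is 'running'."""
    row = conn.execute(
        "SELECT COUNT(*) FROM server WHERE status = 'running'"
    ).fetchone()
    return row[0]


def indicator_history(conn, indicator, server):
    """Return the (timestamp, value) history of one indicator on one server."""
    cursor = conn.execute(
        "SELECT ts, value FROM indicator_value"
        " WHERE indicator = ? AND server = ? ORDER BY ts",
        (indicator, server),
    )
    return cursor.fetchall()
```

For example, `running_servers(create_demo_epkb())` counts the running servers in the sample data, and `indicator_history(conn, "cpu_usage", "srv01")` returns the stored values of that indicator in chronological order.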

One of the ESMI software components that best exploits the EPKB is the Integrated Energy Assessment Tool (IEAT), which plays a key role in supporting the data centre manager during the analysis of the collected data. The knowledge that can be extracted from these data helps to reduce energy consumption and, at the same time, to ensure that the service level agreement is fulfilled. In particular, the assessment tool starts from the assumption that a set of indicators, including both GPIs (Green Performance Indicators) and KPIs (Key Performance Indicators), has been identified. These indicators state how well the running data centre works in terms of infrastructure, middleware and applications. Indeed, the indicators may refer to hardware elements (e.g., CPU), virtual machines, or business processes.

Generally speaking, the IEAT is composed of two main elements:

- The back end, which performs all the operations required

  i. to continuously obtain the values of the GPIs and KPIs, and

  ii. to execute data mining algorithms that discover correlations or associations among the values of the indicators.

- The front end, which offers the final user (e.g., the data centre manager) a graphical tool for analysing the indicator values and for requesting the execution of data mining algorithms.

The GPI Calculator integrated into the IEAT periodically calculates the values of the indicators, both GPIs and KPIs, that are defined in the EPKB, starting from the values gathered by the monitoring infrastructure. The Data Mining Module (DMM), in turn, is the software component that integrates into the ESMI the data mining and machine learning algorithms implemented in WEKA. By means of the DMM, association rules among indicators can be discovered, and correlations and clusters can be computed. For example, in order to associate GPIs and KPIs through the integrated tool page, the miner can select a subset of indicators, request the computation of correlations among indicator values in a well-defined time slot, set the minimum confidence and support, launch the Apriori algorithm, and manually validate the extracted association rules.
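To clarify what minimum support and minimum confidence mean in this workflow, the sketch below mines association rules from discretised indicator values. It is a deliberately simplified, pure-Python illustration restricted to rules with one item on each side. It does not use WEKA and is not the full Apriori algorithm, although it applies the same pruning idea (only items that are frequent on their own can form a frequent pair). The item names such as `cpu=high` are assumptions.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for transaction in transactions if itemset <= transaction)
    return hits / len(transactions)


def pair_rules(transactions, min_support, min_confidence):
    """Return rules (lhs, rhs, support, confidence) between single items.

    Each transaction is a collection of discretised indicator states
    observed in the same time slot.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    # Pruning step: an infrequent item cannot belong to a frequent pair.
    frequent = [i for i in items if support(transactions, [i]) >= min_support]
    rules = []
    for a, b in combinations(frequent, 2):
        pair_support = support(transactions, [a, b])
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support(transactions, [lhs])
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules
```

Given four time slots in which `power=high` always co-occurs with `cpu=high`, but `cpu=high` also appears once with `power=low`, a minimum confidence of 0.9 keeps the rule from `power=high` to `cpu=high` and discards the reverse rule. Rules that pass both thresholds would still need the manual validation step described above.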