GAMES project

Run Time Environment (RTE)


This environment is responsible for applying the adaptation actions required to reduce the energy consumption of the Data Centre without affecting the performance level of the applications running on it. It can be roughly split into three main software components, described in the following sections.

Global Control Loop implementation

The Global Control Loop proactively identifies over-provisioned computing resources in the service centre, with the goal of putting them into low-power states. It is based on techniques such as resource virtualisation, energy-efficient resource provisioning, resource consolidation and dynamic power management. Its main implementation modules are the Context Monitoring Module, the Context Analysis Module, the Action Planning Module and the Action Plan Enforcement Module.

The Context Monitoring Module interacts with the Context Accessing and Processing API (part of ESMI) through RMI (Remote Method Invocation) calls, both to update the local copy of the current EACM model ontology instance and to retrieve adaptation knowledge from the EPKB.

The Context Analysis Module implements the analysis phase of the Global Control Loop adaptation methodology: it evaluates the low-level GPIs/KPIs (such as server load values) to determine whether adaptation action planning is required. This evaluation is implemented by means of reasoning techniques. Its result is stored in the current EACM ontology instance and is used by the Context Analysis Module to calculate the context entropy value.
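The analysis step can be pictured with a small sketch. The reasoning rules and the exact context entropy formula are not given here, so everything concrete below is an assumption: each GPI/KPI (the `Indicator` class is hypothetical) has an acceptable interval, its normalised deviation from that interval is weighted, and the weighted sum stands in for the context entropy; planning is triggered when it exceeds a hypothetical threshold.

```python
from dataclasses import dataclass

@dataclass
class Indicator:
    """A low-level GPI/KPI and its acceptable interval (hypothetical structure)."""
    name: str
    value: float
    lower: float
    upper: float
    weight: float = 1.0

def deviation(ind: Indicator) -> float:
    """Distance of the value from its interval, normalised by the interval width."""
    width = ind.upper - ind.lower
    if ind.value < ind.lower:
        return (ind.lower - ind.value) / width
    if ind.value > ind.upper:
        return (ind.value - ind.upper) / width
    return 0.0  # value inside the interval: the indicator is fulfilled

def context_entropy(indicators: list) -> float:
    """Hypothetical stand-in for the context entropy: weighted sum of deviations."""
    return sum(ind.weight * deviation(ind) for ind in indicators)

def planning_required(indicators: list, threshold: float = 0.1) -> bool:
    """Trigger the planning phase when the context drifts far enough (threshold assumed)."""
    return context_entropy(indicators) > threshold

# Usage: server load values evaluated against a 40% to 80% target interval.
servers = [
    Indicator("server1_load", 0.15, 0.4, 0.8),  # under-loaded: consolidation candidate
    Indicator("server2_load", 0.60, 0.4, 0.8),  # within the interval
]
```

Here the under-loaded server contributes a deviation of (0.4 - 0.15) / 0.4 = 0.625, which is enough on its own to request planning.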

The Action Planning Module implements the planning phase of the Global Control Loop adaptation methodology, determining the adaptation action plans that need to be executed. It has two main sub-modules: (i) the Similar Context Identification Module, which identifies similar context situations using clustering techniques, and (ii) the Adaptation Action Plan Generation Module, which constructs the adaptation action plans by means of reinforcement learning.
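The two planning sub-modules can be illustrated together in a minimal sketch. It assumes that a new context is matched to the nearest centroid of previously clustered contexts (Euclidean distance) and that the matched cluster serves as the state of a plain tabular Q-learning agent choosing among adaptation actions. The action names, context features, learning parameters and the `QPlanner` class are hypothetical; the specific clustering and reinforcement learning algorithms used in the project are not stated here.

```python
import math
import random
from collections import defaultdict

def nearest_context(context, centroids):
    """Similar context identification: index of the closest cluster centroid."""
    return min(range(len(centroids)), key=lambda i: math.dist(context, centroids[i]))

class QPlanner:
    """Tabular Q-learning over (context cluster, adaptation action) pairs."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # (state, action) -> estimated long-term reward
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy: usually the best known action, sometimes a random one."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Standard Q-learning update after observing the effect of an action."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

# Usage with hypothetical (average load, memory use) context clusters.
centroids = [(0.1, 0.2), (0.7, 0.8)]
state = nearest_context((0.15, 0.25), centroids)
planner = QPlanner(["none", "consolidate_vms", "power_off_idle"], epsilon=0.0)
planner.update(state, "consolidate_vms", reward=1.0, next_state=1)
```

Using the cluster index as the state means a context similar to one seen before reuses the action values learnt for it, which is one plausible way the clustering and learning steps could fit together.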

The Action Plan Enforcement Module executes the adaptation action plans through the Infrastructure Access Module.
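Taken together, the four modules resemble a MAPE-style (monitor, analyse, plan, execute) feedback loop. The sketch below only illustrates how one iteration could chain them; the callables stand in for the real modules (RMI-based monitoring, reasoning, clustering and reinforcement learning, and the Infrastructure Access Module), and the class and method names are hypothetical.

```python
from typing import Callable, Iterable, Optional

class GlobalControlLoop:
    """One iteration = monitor, analyse, plan, enforce (hypothetical wiring)."""

    def __init__(self, monitor: Callable[[], dict],
                 analyse: Callable[[dict], bool],
                 plan: Callable[[dict], Iterable[str]],
                 enforce: Callable[[str], None]):
        self.monitor, self.analyse, self.plan, self.enforce = monitor, analyse, plan, enforce

    def run_once(self) -> Optional[list]:
        context = self.monitor()             # Context Monitoring Module
        if not self.analyse(context):        # Context Analysis Module
            return None                      # no adaptation needed this round
        actions = list(self.plan(context))   # Action Planning Module
        for action in actions:
            self.enforce(action)             # Action Plan Enforcement Module
        return actions
```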

Local Control Loops implementation

The Local Control Loops are based on the ACPI standard, which allows the processor power states (C-states) and performance states (P-states) to be changed dynamically through low-level operating system calls. The main energy-saving idea implemented by the local loops is to put the server CPU into a sleep state (C3) when no workload is executed and, while the CPU is in the operating state (C0), to switch between P-states according to the workload characteristics. When deciding to change the P-state of the CPU, the Local Control Loops trade off performance against energy: in a higher-performance P-state (closer to P0) the CPU consumes more power, while in a lower-performance P-state the performance degradation is greater. Two types of Local Control Loops have been implemented: a Fuzzy Logic based Local Control Loop and a Bio-inspired Local Control Loop. Both are installed on the service centre servers and can be dynamically activated or deactivated based on the suggestions received from the Design Time Environment (i.e. the co-design methodology workload and energy consumption analysis).
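On Linux, P-state changes of this kind are commonly exposed through the cpufreq sysfs interface. The following sketch assumes the `userspace` cpufreq governor is active (so `scaling_setspeed` is writable, which requires root) and approximates P-state index k by the k-th highest entry of `scaling_available_frequencies`, as listed by drivers such as acpi-cpufreq; the helper names are hypothetical. C-state entry is normally left to the kernel's cpuidle subsystem, so only P-states are covered.

```python
from pathlib import Path

def available_frequencies(cpufreq_dir: Path) -> list:
    """Supported frequencies in kHz, highest (closest to P0) first."""
    text = (cpufreq_dir / "scaling_available_frequencies").read_text()
    return sorted((int(f) for f in text.split()), reverse=True)

def set_pstate(cpufreq_dir: Path, pstate: int) -> int:
    """Request P-state `pstate` (0 = highest performance); returns the frequency set."""
    freqs = available_frequencies(cpufreq_dir)
    pstate = max(0, min(pstate, len(freqs) - 1))  # clamp to the valid range
    (cpufreq_dir / "scaling_setspeed").write_text(str(freqs[pstate]))
    return freqs[pstate]

# Usage on a real system (needs root and the userspace governor):
# set_pstate(Path("/sys/devices/system/cpu/cpu0/cpufreq"), 2)
```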

The Fuzzy Logic Local Control Loop

The Fuzzy Logic based Local Control Loop tackles the CPU energy efficiency problem by using fuzzy sets and fuzzy functions to characterize the current CPU workload intensity and to determine the appropriate P-state transitions. We chose fuzzy logic because of its ability to filter out noise and to adapt progressively to changes. The adaptation algorithm filters out situations in which the workload fluctuates for short periods of time, because the cost of P-state transitions (in terms of consumed energy) can outweigh the benefit of the adaptation. It determines the membership of the workload level in a high or low fuzzy interval and computes the value of a normalized control variable. The value of the control variable determines the dynamic frequency scaling action to be taken: if it is above 1, the CPU is transitioned to a higher-performance P-state, and if it is below -1, the CPU is transitioned to a lower-performance P-state.
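A minimal sketch of such a controller follows. The membership functions (linear ramps between 20% and 60%, and between 40% and 80% utilisation), the gain and the way the control variable accumulates are assumptions chosen for illustration; only the thresholds of 1 and -1 come from the description above. The class and helper names are hypothetical.

```python
def ramp(x: float, a: float, b: float) -> float:
    """Membership rising linearly from 0 at x=a to 1 at x=b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

class FuzzyPStateController:
    """P-state index 0 is the highest-performance state, as in ACPI."""

    def __init__(self, num_pstates: int, gain: float = 0.5):
        self.num_pstates = num_pstates
        self.pstate = num_pstates - 1  # start in the lowest-performance P-state
        self.gain = gain               # hypothetical: how fast evidence accumulates
        self.control = 0.0             # normalized control variable

    def step(self, utilisation: float) -> int:
        mu_high = ramp(utilisation, 0.4, 0.8)       # membership in "high workload"
        mu_low = 1.0 - ramp(utilisation, 0.2, 0.6)  # membership in "low workload"
        self.control += self.gain * (mu_high - mu_low)
        if self.control > 1.0:     # sustained high load: raise performance
            self.pstate = max(0, self.pstate - 1)
            self.control = 0.0
        elif self.control < -1.0:  # sustained low load: save energy
            self.pstate = min(self.num_pstates - 1, self.pstate + 1)
            self.control = 0.0
        return self.pstate
```

Accumulating the difference between the two memberships is what filters short fluctuations: a single load spike followed by an idle sample cancels out, while three consecutive fully loaded samples push the control variable past 1 and trigger one transition.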

The Bio-inspired Local Control Loop

The Bio-inspired Local Control Loop tackles the CPU energy efficiency problem using models, techniques and algorithms inspired by the human immune system. The loop implements the following immune system models: (i) an antigen-inspired model that represents the server's current energy/performance data, (ii) an immune cell inspired model that represents associations between a detector (a non-optimal server energy/performance state) and an effector (an adaptation action used to bring the server into an optimal energy/performance state), and (iii) an immune memory inspired model that represents the adaptation/optimization knowledge base (storing all the immune cells determined during the adaptation processes). The loop also implements the following immune system techniques: (i) self/non-self classification, to determine whether the current antigen represents an optimal or non-optimal energy/performance state, (ii) negative selection, to create the detectors and effectors for the antigens classified as non-self, and (iii) mutation-based clonal selection, to determine the adaptation actions (dynamic frequency scaling) to be executed to bring the server into an optimal energy consumption state.
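The immune-inspired techniques can be illustrated with a deliberately simplified sketch. Everything concrete here is an assumption: an antigen is a pair (CPU load, current frequency as a fraction of the maximum), the "self" (optimal) states are those where the two are close, detectors are random points censored by negative selection, and clonal selection mutates a candidate frequency adjustment (the effector) while keeping the fittest clone. The immune memory is a plain dictionary keyed by the rounded antigen; all function names are hypothetical.

```python
import math
import random

def is_self(antigen, tolerance=0.15):
    """Self/non-self rule (hypothetical): optimal when frequency share tracks load."""
    load, freq = antigen
    return abs(load - freq) <= tolerance

def negative_selection(self_samples, n_detectors, radius, rng):
    """Generate random detectors and discard any that match a self sample."""
    detectors = []
    while len(detectors) < n_detectors:
        candidate = (rng.random(), rng.random())
        if all(math.dist(candidate, s) > radius for s in self_samples):
            detectors.append(candidate)
    return detectors

def affinity(antigen, delta):
    """How close the adjusted frequency would be to the current load (0 is best)."""
    load, freq = antigen
    return -abs(load - min(1.0, max(0.0, freq + delta)))

def clonal_selection(antigen, rng, generations=20, clones=10, sigma=0.1):
    """Clone the best effector, mutate the clones, keep the fittest (elitist)."""
    best = 0.0
    for _ in range(generations):
        mutants = [best + rng.gauss(0.0, sigma) for _ in range(clones)]
        best = max(mutants + [best], key=lambda d: affinity(antigen, d))
    return best

def respond(antigen, detectors, memory, rng, radius=0.1):
    """Return a frequency adjustment, reusing immune memory when possible."""
    if not any(math.dist(antigen, d) <= radius for d in detectors):
        return 0.0  # no detector matches: classified as self, nothing to do
    key = (round(antigen[0], 1), round(antigen[1], 1))
    if key not in memory:
        memory[key] = clonal_selection(antigen, rng)  # new immune cell
    return memory[key]

# Self samples: grid points classified as optimal by the hypothetical rule.
grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
self_samples = [p for p in grid if is_self(p)]
detectors = negative_selection(self_samples, 200, 0.1, random.Random(42))
```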

Storage Advanced Management Controller for Energy Efficiency (SAMCEE)

The Storage Advanced Management Controller for Energy Efficiency (SAMCEE) manages application data with the goal of reducing the energy consumed by the storage subsystem while maintaining a specified QoS level. It includes two separate controllers: one for disk mode and one for file placement.

The disk mode controller attempts to select the most energy-efficient disk acoustic mode (AMM) for a given device's access patterns.

The file placement controller includes a backend Network Attached Storage (NAS) server that manages the backend files and exports the managed files and directories via NFS. Application-level files are split into smaller chunks, and each chunk is placed separately on a selected device. The most appropriate location for a chunk is chosen based on device and chunk rankings, calculated according to usage-centric energy efficiency criteria for storage: the disks for chunk placement are selected among the top-ranking devices as computed at a given point in time. Data consolidation and high device usage are achieved by giving a high weight to the capacity energy efficiency metric. Consolidating the data onto a small number of active devices allows unallocated disks to be put in standby mode, which saves energy. High device usage, in turn, makes it possible to exploit disk access optimizations and to select an appropriate acoustic mode. The main SAMCEE sub-components are:

a) The two controllers for file placement and acoustic mode selection described above
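The ranking-based chunk placement performed by the file placement controller can be sketched as follows. The scoring function, its weights and the `Device` fields are hypothetical; the only property taken from the text is that a high weight on the capacity metric favours devices that are already well used, so new chunks consolidate onto few disks and empty ones can stay in standby.

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    capacity_gb: float
    used_gb: float
    iops_headroom: float  # spare I/O capacity in [0, 1] (hypothetical metric)

def rank(dev: Device, w_capacity: float = 0.8, w_io: float = 0.2) -> float:
    """Higher is better; the heavily weighted capacity term rewards fuller devices."""
    return w_capacity * dev.used_gb / dev.capacity_gb + w_io * dev.iops_headroom

def place_chunk(devices: list, chunk_gb: float) -> str:
    """Put the chunk on the top-ranked device that still has room for it."""
    fitting = [d for d in devices if d.capacity_gb - d.used_gb >= chunk_gb]
    if not fitting:
        raise RuntimeError("no device can hold the chunk")
    best = max(fitting, key=rank)
    best.used_gb += chunk_gb
    return best.name

devices = [Device("A", 1000, 700, 0.3), Device("B", 1000, 100, 0.9), Device("C", 1000, 0, 1.0)]
```

With these example values, small chunks land on the already busy disk A, larger ones spill over to B, and disk C stays empty and could be put in standby.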

b) The FUSE file splitter and file-level statistics collection module

A file splitter and file-level statistics collection mechanism was designed and developed using the user-space file system framework FUSE. This module defines a mount point (exportable via NFS), and every user file stored under this mount point is managed, split into chunks and monitored for its read/write IOPS and throughput. SAMCEE uses this mechanism and the collected information for file placement control.
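The chunking logic can be sketched independently of FUSE itself. Below, a read or write on a byte range of a user file is mapped to the chunks it touches and counted per chunk, which is the kind of per-chunk IOPS and throughput information the placement controller could consume. The 64 MiB chunk size, the class and the function names are hypothetical; in a real module these functions would be called from the FUSE read and write handlers.

```python
from collections import defaultdict

CHUNK_SIZE = 64 * 1024 * 1024  # hypothetical chunk size (64 MiB)

def chunks_for_range(offset, length, chunk_size=CHUNK_SIZE):
    """Yield (chunk_index, offset_in_chunk, nbytes) for a byte range of a file."""
    end = offset + length
    while offset < end:
        index = offset // chunk_size
        start_in_chunk = offset % chunk_size
        nbytes = min(chunk_size - start_in_chunk, end - offset)
        yield index, start_in_chunk, nbytes
        offset += nbytes

class ChunkStats:
    """Per-chunk operation and byte counters, keyed by (path, chunk index)."""

    def __init__(self):
        self.counters = defaultdict(lambda: {"reads": 0, "writes": 0, "bytes": 0})

    def record(self, path, op, offset, length, chunk_size=CHUNK_SIZE):
        for index, _, nbytes in chunks_for_range(offset, length, chunk_size):
            entry = self.counters[(path, index)]
            entry["reads" if op == "read" else "writes"] += 1
            entry["bytes"] += nbytes
```

A 20-byte read starting at offset 90 with a 100-byte chunk size, for example, touches the last 10 bytes of chunk 0 and the first 10 bytes of chunk 1, and each chunk is charged one operation.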

c) High-level application-to-file mapping agent module

The application-to-file mapping agent identifies high-level services and processes and maps them to storage file resources. It also sends SAMCEE (via API calls or the EPKB database) annotations (high-level indications) on application files, which SAMCEE may use in its file placement decisions and to aggregate the power consumption of an application across its associated files.
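One common way to discover which files a Linux process is using is to read the symbolic links under /proc/<pid>/fd. The sketch below combines that with the per-application power aggregation mentioned above; the function names, the per-file power figures and the idea of simply summing them are assumptions for illustration, not the agent's documented behaviour. It only sees files that are open at the moment of the scan.

```python
import os

def open_files(pid) -> set:
    """Regular paths currently opened by a process, from /proc/<pid>/fd (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    paths = set()
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # descriptor was closed in the meantime
        if target.startswith("/"):
            paths.add(target)  # skip sockets, pipes and anonymous inodes
    return paths

def application_power(app_files: dict, file_power_w: dict) -> dict:
    """Sum hypothetical per-file power estimates (watts) over each application's files."""
    return {app: sum(file_power_w.get(f, 0.0) for f in files)
            for app, files in app_files.items()}

# Usage: map a process to its files, then aggregate power per application.
# app_files = {"db": open_files(1234)}  # 1234 is a placeholder PID
```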

d) Linux extension module

In order to obtain statistical data that is unavailable in current Linux OS versions, the Linux kernel was extended to collect additional data items. Examples from the Linux patch include an extension of the Linux VFS layer to collect the proportion of sequential disk accesses, the disk queue length, the number of read/write I/Os per device and the average throughput.

The Linux patch writes the newly collected data to the standard diskstats pseudo-file (/proc/diskstats), which is read by a dedicated agent that manages the data. Information is collected for all devices identified by Linux.
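The standard part of /proc/diskstats has a documented column layout: major and minor numbers, the device name, then counters such as reads and writes completed, sectors read and written, and I/Os currently in progress, with sectors counted in 512-byte units. The sketch below, with hypothetical function names, parses those first eleven counters and keeps any further columns uninterpreted, since newer kernels append discard and flush counters and the position of the patch's extra fields is not documented here. IOPS and throughput are derived from the difference between two samples.

```python
FIELDS = ("reads", "reads_merged", "sectors_read", "ms_reading",
          "writes", "writes_merged", "sectors_written", "ms_writing",
          "ios_in_progress", "ms_io", "weighted_ms_io")

def parse_diskstats(text: str) -> dict:
    """Map device name to counters from the text of /proc/diskstats."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3 + len(FIELDS):
            continue  # skip malformed or short lines
        values = [int(v) for v in parts[3:]]
        record = dict(zip(FIELDS, values))
        record["extra"] = values[len(FIELDS):]  # discard/flush or patch-specific columns
        stats[parts[2]] = record
    return stats

def rates(before: dict, after: dict, interval_s: float) -> dict:
    """Per-device IOPS, throughput (bytes/s) and current queue depth."""
    result = {}
    for name, old in before.items():
        new = after.get(name)
        if new is None:
            continue  # device disappeared between samples
        ios = (new["reads"] - old["reads"]) + (new["writes"] - old["writes"])
        sectors = ((new["sectors_read"] - old["sectors_read"])
                   + (new["sectors_written"] - old["sectors_written"]))
        result[name] = {
            "iops": ios / interval_s,
            "throughput_Bps": sectors * 512 / interval_s,  # diskstats sectors are 512 bytes
            "queue": new["ios_in_progress"],
        }
    return result

# Usage on Linux:
# with open("/proc/diskstats") as f:
#     sample = parse_diskstats(f.read())
```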

e) Python low level collection agent

In order to collect data from a given set of devices (e.g. SATA disks) and communicate that information conveniently, a Python agent, perfAgent, was designed and developed; it can be configured and accessed over a TCP socket. Another agent, performanceService, persists the information into a MySQL disk_perf table for further use.
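A TCP-accessible agent of this kind can be sketched with the standard library. The request format (one JSON line listing device names), the reply format, the class and function names and the disk_perf columns are all assumptions, and sqlite3 stands in for MySQL so that the example is self-contained; a real performanceService would use a MySQL client instead.

```python
import json
import socket
import socketserver
import sqlite3
import threading

class PerfHandler(socketserver.StreamRequestHandler):
    """Answers a one-line JSON request with the latest per-device statistics."""

    def handle(self):
        request = json.loads(self.rfile.readline())
        devices = request.get("devices") or list(self.server.latest)
        reply = {d: self.server.latest.get(d) for d in devices}
        self.wfile.write((json.dumps(reply) + "\n").encode())

class PerfAgent(socketserver.ThreadingTCPServer):
    allow_reuse_address = True

    def __init__(self, address, latest):
        super().__init__(address, PerfHandler)
        self.latest = latest  # device name -> statistics dict, refreshed by a collector

def query(address, devices):
    """Client side: ask the agent for statistics of the given devices."""
    with socket.create_connection(address) as sock:
        sock.sendall((json.dumps({"devices": devices}) + "\n").encode())
        with sock.makefile("r") as reader:
            return json.loads(reader.readline())

def persist(conn, device, stats):
    """Stand-in for performanceService: append one row to disk_perf (columns assumed)."""
    conn.execute("CREATE TABLE IF NOT EXISTS disk_perf "
                 "(device TEXT, iops REAL, throughput REAL)")
    conn.execute("INSERT INTO disk_perf VALUES (?, ?, ?)",
                 (device, stats["iops"], stats["throughput_Bps"]))
    conn.commit()
```

Binding to port 0 lets the operating system pick a free port, which is convenient for local experiments; a deployed agent would listen on a fixed, configured port.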