A large data processing algorithm for energy efficiency in a heterogeneous cluster
ITM Web of Conferences
Lei Wang
Weichun Ge
Zhao Li
Zhenjiang Lei
Shuo Chen
ICT Department
State Grid Liaoning Electric Power Co.
Shenyang
China
It is reported that the electricity cost to operate a cluster may well exceed its acquisition cost, and processing big data requires large-scale clusters and long running periods. Therefore, energy-efficient processing of big data is essential for data owners and users. In this paper, we propose a novel algorithm, MinBalance, to process I/O-intensive big data tasks energy-efficiently in a heterogeneous cluster. In the first step, four greedy policies are used to select suitable nodes, taking the heterogeneity of the cluster into account. In the second step, the workloads of the selected nodes are balanced to avoid the energy wasted while nodes wait for one another. MinBalance is a universal algorithm and is not affected by the data storage strategy. Experimental results indicate that MinBalance can achieve over 60% energy reduction for large data sets compared with traditional methods that power down part of the nodes.
1 Introduction
With the development and application of information technology, the volume of data produced
is growing explosively, and how to store, manage, and apply these data has become a general
concern of both industry and academia. Big data holds great value, and a large body of research
has been built on it; many scholars call it the fourth paradigm of scientific research [1]-[2].
Cloud computing, an emerging model based on economies of scale, has become the primary
platform for storing and processing big data. Open-source cloud computing platforms such as
Hadoop, HBase, and HadoopDB have been widely studied and applied. More and more
businesses are building their own big data platforms to handle growing business data and even
to offer various services based on big data [3]. Handling big data requires a large amount of
hardware, including servers, PCs, and even mobile devices. Operating these devices consumes
a great deal of energy, mainly electricity, and because electricity worldwide is generated mainly
by thermal power, big data also poses great challenges to energy and the environment [4].
Studies in 2005 showed that the total electricity a server consumes over its lifetime had already
exceeded its purchase cost. Research also shows that in 2008 the world's 4400 servers consumed
0.8% of total electricity and that, at that rate, the proportion would reach 3.2% by 2020. A report
by the EPA (US Environmental Protection Agency) stated that in 2006 the total electricity
consumption of US servers and data centers was 61 billion kWh, with an electricity bill of
$4.5 billion [5]-[7]. Therefore, while pursuing big data storage and processing performance,
sufficient attention must also be paid to energy consumption [8].
This paper mainly discusses I/O-intensive big data processing tasks. Computationally intensive
tasks are strongly affected by the real-time running state of the processor, and the processor
control mechanisms provided by different hardware and operating systems differ, so this paper
does not consider computationally intensive big data tasks. Because data-intensive tasks depend
only weakly on the processor, the time and power a given server spends processing each data
block can be regarded as essentially constant. Suppose a cluster of n heterogeneous nodes
processes a MapReduce task, and let C be the set of nodes involved in task processing. The
total energy consumed during task processing is
$$\text{Total\_cost} = \max_{n_i \in C} T_i \cdot \sum_{n_i \in C} P_i \qquad (1)$$
where T_i and P_i denote the processing time and the power consumption of node n_i,
respectively. By Eq. (1), the total energy is mainly affected by two factors: the set of nodes that
perform the task, whose power consumptions are summed, and the maximum processing time
among those nodes. Eq. (1) therefore suggests two ways to process data energy-efficiently:
1) Select suitable nodes to perform the task, reducing total power consumption;
2) Balance the load across the nodes to reduce the maximum execution time.
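The two levers in Eq. (1) can be checked numerically. The sketch below is a minimal illustration assuming a hypothetical three-node cluster (the per-block times, powers, and the helper names `total_energy` and `energy_of` are invented for this example, not taken from the paper), with T_i computed as the number of blocks times a constant per-block time, as argued above.

```python
from typing import Sequence

# Hypothetical three-node cluster (illustrative values, not from the paper):
# t_i = seconds to process one block on node i, P_i = watts drawn by node i.
PER_BLOCK_TIME = [2.0, 1.0, 1.0]
POWER = [100.0, 150.0, 150.0]


def total_energy(times: Sequence[float], powers: Sequence[float]) -> float:
    """Eq. (1): every node in C stays on until the slowest one finishes,
    so the energy is max(T_i) * sum(P_i) over the selected set C."""
    if not times or len(times) != len(powers):
        raise ValueError("need one (T_i, P_i) pair per selected node")
    return max(times) * sum(powers)


def energy_of(assignment: dict[int, int]) -> float:
    """assignment maps node index -> number of blocks; its keys form the set C."""
    nodes = list(assignment)
    # T_i = b_i * t_i, since a block takes a constant time on a given node
    times = [assignment[i] * PER_BLOCK_TIME[i] for i in nodes]
    powers = [POWER[i] for i in nodes]
    return total_energy(times, powers)


# Twelve blocks distributed three different ways (energy in joules):
print(energy_of({0: 4, 1: 4, 2: 4}))  # even split, slow node lags -> 3200.0
print(energy_of({0: 2, 1: 5, 2: 5}))  # balanced load, smaller max T_i -> 2000.0
print(energy_of({1: 6, 2: 6}))        # slow node powered down, smaller sum P_i -> 1800.0
```

In this toy setting, balancing the load shortens max T_i, while dropping the slow node shrinks the power sum; which lever wins depends on the node parameters, which is why both are considered.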
Based on the actual characteristics of each node, MinBalance determines which tasks each node
performs; that is, it balances the load across the nodes to shorten task execution time and
thereby further reduce the total energy consumption of the system. The method has three
distinct advantages:
1) It fully considers the heterogeneity of the nodes;
2) It is independent of the replica storage strategy;
3) It jointly considers both factors of energy consumption: the number of nodes used and the
load balance among them.
2 Problem description
The energy-efficient processing problem for I/O-intensive big data tasks in a heterogeneous
cluster can be formalized as follows. Given a cluster composed of h heterogeneous nodes
N = {n1, n2, n3, …, nh}, where node ni (1 ≤ i ≤ h) takes the time and power required to
process a block of d (...truncated)