Analytical Review of Data Visualization Methods in Application to Big Data
Hindawi Publishing Corporation
Journal of Electrical and Computer Engineering
Volume 2013, Article ID 969458, 7 pages
http://dx.doi.org/10.1155/2013/969458
Review Article
Evgeniy Yur’evich Gorodov and Vasiliy Vasil’evich Gubarev
Novosibirsk State Technical University, St. Karla Marksa, 20-630073 Novosibirsk, Russia
Correspondence should be addressed to Vasiliy Vasil’evich Gubarev;
Received 28 March 2013; Revised 29 September 2013; Accepted 10 October 2013
Academic Editor: Mohammad S. Alam
Copyright © 2013 E. Y. Gorodov and V. V. Gubarev. This is an open access article distributed under the Creative Commons
Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
This paper examines the term Big Data from the standpoint of data representation and visualization. Big Data visualization
raises several specific problems; we define these problems and describe a set of approaches for avoiding them. We also review
existing data visualization methods as applied to Big Data, taking the described problems into account. Summarizing
the results, we provide a classification of visualization methods in application to Big Data.
1. Introduction
Customers' need to process secondary data, which is not
directly connected to their business, has led
to the phenomenon called Big Data. Below we provide
a definition of the term Big Data.
Big Data, as described by Vasiliy Vasil’evich Gubarev,
is a phenomenon that has no clear boundaries and can
take the form of unlimited or even infinite data accumulation.
Moreover, the accumulated data can be presented in
various formats, most of which are not structured data
flows.
Usually, the term Big Data refers to a large
data set whose volume grows exponentially. Such a data set can
be too large, too “raw”, or too unstructured for the classical data
processing methods used in relational database theory. Still,
the main concern is not the data volume but
the field of application of the data [1].
Analytical literature commonly ascribes the following properties
to Big Data: large volume of data
(Volume), multiformat data presentation (Variety), and high
data processing speed (Velocity). It is thought that data
satisfying only two of the three properties can already
be assigned to the Big Data class [2, 3]. Therefore,
the following Big Data classes are currently distinguished: “Volume-Velocity”
class, “Volume-Variety” class, “Velocity-Variety” class, and
“Volume-Velocity-Variety” class.
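The class assignment described above can be sketched in a few lines of Python. This is only an illustration of the "at least two of three properties" rule; the function name `big_data_class` and its boolean parameters are assumptions made for this example and do not come from the cited literature.

```python
from typing import Optional


def big_data_class(volume: bool, velocity: bool, variety: bool) -> Optional[str]:
    """Name the Big Data class of a data set, or return None.

    Each flag states whether the data set exhibits that property.
    A data set with fewer than two properties is not treated as Big Data.
    """
    present = [
        name
        for name, flag in (("Volume", volume), ("Velocity", velocity), ("Variety", variety))
        if flag
    ]
    if len(present) < 2:
        return None
    # Joining in a fixed order yields exactly the four class names listed above.
    return "-".join(present)


print(big_data_class(volume=True, velocity=False, variety=True))
```

How a particular data set earns each flag (for example, what counts as "large" volume) is left open here, as the text does not give thresholds.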
Big Data processing is not a trivial task, and it
requires special methods and approaches. Graphical thinking
is a very simple and natural type of data processing for a
human being, so image-based data representation can be regarded
as an effective method that eases data understanding
and provides sufficient support for decision making.
In the case of Big Data, however, most classical data representation
methods become less effective or even inapplicable for
concrete tasks. Analyzing the applicability of these methods to each of the concrete
classes of Big Data is a topical problem in the subject area, as
no such case studies have been carried out before. Therefore, the
purpose of this paper is to classify existing visualization
methods by the criterion of their applicability to each of the described
Big Data classes.
To assign a method to one of the described
Big Data classes, the method needs to be analyzed from the
following points of view: applicability to large volumes of data, ability
to visualize data presented in different formats,
and speed and performance of data presentation.
2. Big Data Visualization Problems
Considering the described Big Data properties, we can
identify the following problems, which make visualization a
nontrivial task.
2.1. Visual Noise. Simply presenting the whole array of
data being studied can turn into a total mess on the screen:
we see only one big spot consisting of points, each of which
represents a data row. This problem arises because
most of the objects in the dataset are too closely related to each
other, so the viewer cannot distinguish them as
separate objects on the screen. As a result, the analyst sometimes cannot obtain even
a bit of useful information from a visualization of the whole data set
without preprocessing. Note that
noise in this context does not mean any
data damage or distortion; it should be understood as a
loss of data visibility.
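The effect can be illustrated with a minimal Python sketch that maps a synthetic set of closely related rows onto a small pixel grid and counts how many distinct pixels are actually lit. The data, the grid size, and the helper names `occupied_pixels` and `clamp` are assumptions chosen for illustration only.

```python
import random


def occupied_pixels(points, width, height):
    """Count the distinct pixels hit by points whose coordinates lie in [0, 1)."""
    return len({(int(x * width), int(y * height)) for x, y in points})


def clamp(v):
    """Keep a coordinate inside [0, 1) so it always maps to a valid pixel."""
    return min(max(v, 0.0), 0.999999)


random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical data set: 100,000 closely related rows clustered around one centre.
points = [
    (clamp(random.gauss(0.5, 0.05)), clamp(random.gauss(0.5, 0.05)))
    for _ in range(100_000)
]

width, height = 200, 100  # assumed plot area in pixels
visible = occupied_pixels(points, width, height)
print(f"{len(points)} rows drawn, {visible} distinct pixels lit")
```

Because the plot area has only 20,000 pixels, most rows necessarily share a pixel with other rows, which is the "one big spot" described above.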
2.2. Large Image Perception. One solution to the above
problem is to distribute the data
over a larger screen. However, this can lead to another
problem: large image perception. Human perception of any
data visualization has a certain limit. Although this limit
is much higher for graphical visualization than for
tabular visualization, it still exists.
Once this limit is reached,
a person simply loses the ability to extract any useful
information from the overloaded view. All visualization
methods are limited by the resolution of the output
device, so there is a limit on the number of
points that can be shown per visualization. Of course, we can replace
the visualization device with a more modern one, or with a group of devices
that each visualize part of the data, which lets us present a more
detailed image with a larger number of data points. But even
if we could repeat this process indefinitely,
we would still run into the limits of human perception. As the volume
of data shown at once grows, a person finds it increasingly
difficult to understand and analyze the data.
Therefore, data visualization methods
are limited not only by the aspect ratio and resolution of the device
but also by the physical limits of human perception.
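The device-side limit can be made concrete with simple arithmetic. The sketch below assumes one data point per pixel as the best case; the function names and the 1920 x 1080 display are hypothetical values used only for illustration.

```python
import math


def rows_per_pixel(n_rows: int, width_px: int, height_px: int) -> float:
    """Average number of data rows competing for each pixel of one display."""
    return n_rows / (width_px * height_px)


def displays_needed(n_rows: int, width_px: int, height_px: int) -> int:
    """Displays required to give every row its own pixel (best case)."""
    return math.ceil(n_rows / (width_px * height_px))


n_rows = 10**9  # assumed data set size
print(rows_per_pixel(n_rows, 1920, 1080))
print(displays_needed(n_rows, 1920, 1080))
```

Adding displays removes the device limit but not the perceptual one: a viewer still has to take in the combined image.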
2.3. Information Loss. Alternatively, approaches that
reduce the visible data set can be used.
Although they solve the above problems, these approaches
lead to another problem: information loss. They
rely on data aggregation and filtering
based on the relatedness of objects in a concrete dataset by
one or more criteria. Such approaches can mislead
the analyst, who may fail to notice some interesting hidden
objects. In addition, a complex aggregation process can
consume a large amount of time and computing resources
in order to obtain accurate and required information.
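A small Python sketch shows how aggregation can hide an interesting object. It averages a synthetic series into fixed-size bins; the series, the bin size, and the function name `aggregate_mean` are assumptions made for this example, and mean binning is only one of many possible aggregation schemes.

```python
from statistics import mean


def aggregate_mean(values, bin_size):
    """Replace each run of bin_size consecutive values with its mean."""
    return [mean(values[i:i + bin_size]) for i in range(0, len(values), bin_size)]


# Hypothetical series: 1,000 ordinary readings and a single anomaly.
values = [10.0] * 1000
values[417] = 500.0

bins = aggregate_mean(values, 100)  # 10 points are plotted instead of 1,000

print("largest raw value:       ", max(values))
print("largest aggregated value:", max(bins))
```

The anomaly is fifty times larger than its neighbours in the raw data, yet after aggregation its bin differs from the others by less than a factor of two, so an analyst looking only at the reduced view could easily overlook it.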
2.4. High Performance Requirements. Graphical analysis
is not confined to static image visualization, and the above
problems become more significant in dynamic visualization.
And there is also another problem, which can be hardly
noti (...truncated)