Analytical Review of Data Visualization Methods in Application to Big Data

Hindawi Publishing Corporation
Journal of Electrical and Computer Engineering
Volume 2013, Article ID 969458, 7 pages
http://dx.doi.org/10.1155/2013/969458

Review Article

Evgeniy Yur’evich Gorodov and Vasiliy Vasil’evich Gubarev

Novosibirsk State Technical University, St. Karla Marksa, 20-630073 Novosibirsk, Russia

Correspondence should be addressed to Vasiliy Vasil’evich Gubarev.

Received 28 March 2013; Revised 29 September 2013; Accepted 10 October 2013

Academic Editor: Mohammad S. Alam

Copyright © 2013 E. Y. Gorodov and V. V. Gubarev. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper describes the term Big Data from the standpoint of data representation and visualization. Big Data visualization raises several specific problems, so we define these problems and present a set of approaches to avoid them. We also review existing data visualization methods as applied to Big Data, taking the described problems into account. To summarize the results, we provide a classification of visualization methods in application to Big Data.

1. Introduction

Customers need to process secondary data that is not directly connected to their business, which has led to the phenomenon called Big Data. Below we provide a definition of the term Big Data.

Big Data, as described by Vasiliy Vasil’evich Gubarev, is a phenomenon that has no clear borders and can take the form of unlimited or even infinite data accumulation. Moreover, the accumulated data can be presented in various formats, most of which are unstructured data flows. Usually, the term Big Data is understood to mean a large data set whose volume grows exponentially.
This data set can be too large, too “raw”, or too unstructured for the classical data processing methods used in relational database theory. Still, the main concern here is not the data volume but the field of application of the data [1].

Analytical literature sources commonly attribute the following properties to Big Data: large volume of data (Volume), multiformat data presentation (Variety), and high data processing speed (Velocity). It is considered that data satisfying as few as two of these three properties can be assigned to the Big Data class [2, 3]. Therefore, the following Big Data classes are currently distinguished: the “Volume-Velocity” class, the “Volume-Variety” class, the “Velocity-Variety” class, and the “Volume-Velocity-Variety” class.

Big Data processing is not a trivial task, and it requires special methods and approaches. Graphical thinking is a very simple and natural type of data processing for a human being, so visual data representation can be called an effective method that eases data understanding and provides sufficient support for decision making. In the case of Big Data, however, most classical data representation methods become less effective or even inapplicable to concrete tasks. Analyzing the applicability of these methods to each concrete Big Data class is a topical problem in the subject area, as no such case studies have been carried out before.

The purpose of this paper is therefore to classify existing visualization methods by their applicability to the described Big Data classes. To assign a method to one of these classes, the method needs to be analyzed from the following points of view: applicability to large volumes of data, ability to visualize data presented in different formats, and the speed and performance of data presentation.

2. Big Data Visualization Problems

Given the Big Data properties described above, we can identify the following problems, which make visualization a nontrivial task.

2.1. Visual Noise. Simply presenting the whole data array under study can produce a total mess on the screen: we see only one big spot made up of points, each representing a data row. This problem arises because most objects in the dataset are too closely related to each other, so the viewer cannot distinguish them as separate objects on the screen. As a result, the analyst sometimes cannot extract even a bit of useful information from a visualization of the whole data set without preprocessing. Note that noise in this context does not mean any data damage or distortion; it should be understood simply as a loss of data visibility.

2.2. Large Image Perception. One solution to the above problem is to distribute the data over a larger screen. Occasionally, however, this leads to another problem: large image perception. There is a certain level of human perception for each kind of data visualization. Although this level is much higher for graphical visualization than for tabular visualization, it has its own limits. Once this level is reached, a person simply loses the ability to acquire any useful information from the overloaded view. All visualization methods are limited by the resolution of the output device, so there is a limit on the number of points that can be shown per visualization. Of course, we can replace the visualization device with a more modern one, or with a group of devices for partial data visualization, which allows a more detailed image with a larger number of data points; but even if we could repeat this process indefinitely, we would still run into the limits of human perception.
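The resolution limit of Section 2.2 and the visual noise of Section 2.1 can be illustrated with a minimal sketch. It is not taken from the paper: it assumes points distributed uniformly over a hypothetical 200 x 100 pixel canvas, and the function names occupied_pixels and overplotting_ratio are illustrative only. The sketch counts how many data points end up drawn on each occupied pixel as the data volume grows.

```python
import random


def occupied_pixels(points, width, height):
    """Map normalized (x, y) points in [0, 1) onto a width x height pixel
    grid and return the set of distinct pixels that receive a point."""
    return {(int(x * width), int(y * height)) for x, y in points}


def overplotting_ratio(n_points, width, height, seed=0):
    """Average number of data points drawn onto each occupied pixel.

    Assumption: points are uniformly random; real data is usually more
    clustered, which makes the overplotting worse, not better.
    """
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n_points)]
    pixels = occupied_pixels(points, width, height)
    return n_points / len(pixels)


if __name__ == "__main__":
    # Hypothetical canvas of 200 x 100 = 20,000 pixels.
    for n in (1_000, 100_000, 1_000_000):
        ratio = overplotting_ratio(n, 200, 100)
        print(f"{n:>9} points -> {ratio:.1f} points per occupied pixel")
```

Because the canvas has only 20,000 pixels, one million points must place at least 50 points on each occupied pixel on average, whatever the data looks like. Individual rows can then no longer be told apart, which is the single "big spot" described in Section 2.1.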
As the volume of data shown at once grows, a person has increasing difficulty understanding and analyzing the data. Therefore, data visualization methods are limited not only by the aspect ratio and resolution of the device but also by the limits of physical perception.

2.3. Information Loss. On the other hand, approaches that reduce the visible data set can be used. Although they solve the above problems, these approaches lead to another problem: information loss. They rely on data aggregation and filtering based on the relatedness of objects in a concrete dataset by one or more criteria. Using these approaches can mislead the analyst, who may fail to notice some interesting hidden objects. In addition, a complex aggregation process can sometimes consume a large amount of time and computing resources in order to obtain accurate and required information.

2.4. High Performance Requirements. Graphical analysis is not confined to static image visualization, so the above problems become more significant in dynamic visualization. And there is also another problem, which can be hardly noti (...truncated)