Preface

Ann. Data. Sci. (2015) 2(1):1–3
DOI 10.1007/s40745-015-0036-x

Some Advanced Techniques in Data Science

Yong Shi (1, 2, 3) · Yingjie Tian (1, 2)

1 Research Center on Fictitious Economy & Data Science, Chinese Academy of Sciences, Beijing, China
2 Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100190, China
3 College of Information Science and Technology, University of Nebraska at Omaha, Omaha, NE 68182, USA

Published online: 26 May 2015
© Springer-Verlag Berlin Heidelberg 2015

This 2015 issue of Annals of Data Science (Volume 2, No. 1) presents seven papers from several areas of data science. They were contributed by 20 authors from six countries and regions: Australia, Brazil, Iran, Spain, Russia and the UK.

The first paper, “Forecasting with Big Data: A Review,” by Hossein Hassani and Emmanuel Sirimal Silva, presents a comprehensive review of the use of Big Data for forecasting by identifying and reviewing the problems, potential, challenges and, most importantly, the related applications. Skills, hardware and software, algorithm architecture, statistical significance, the signal-to-noise ratio and the nature of Big Data itself are identified as the major challenges hindering the process of obtaining meaningful forecasts from Big Data. The review finds that, at present, the fields of economics, energy and population dynamics have been the major exploiters of Big Data forecasting, whilst factor models, Bayesian models and neural networks are the most common tools adopted for forecasting with Big Data.

The second paper, “Bayesian Nonparametric Approaches to Abnormality Detection in Video Surveillance,” by Vu Nguyen, Dinh Phung, Duc-Son Pham and Svetha Venkatesh, revisits the abnormality detection problem through the lens of Bayesian nonparametric (BNP) methods and develops a novel use of BNP methods for this problem.
In data science, anomaly detection is the process of identifying the items, events or observations that do not conform to expected patterns in a dataset. As widely acknowledged in the computer vision community and in security management, discovering suspicious events is the key issue for abnormality detection in video surveillance. The important steps in identifying such events include stream data segmentation and hidden pattern discovery. However, the crucial challenge in both steps is that the number of coherent segments in the surveillance stream and the number of traffic patterns are unknown and hard to specify. In particular, this paper employs the infinite hidden Markov model and Bayesian nonparametric factor analysis for stream data segmentation and pattern discovery. In addition, it introduces an interactive system allowing users to inspect and browse suspicious events.

The third paper, “Indebted Households Profiling: A Knowledge Discovery from Database Approach,” by Rodrigo Arnaldo Scarpel, Alexandros Ladas and Uwe Aickelin, employs a knowledge discovery from database process to identify groups of indebted households and describe their profiles, using a database collected by the Consumer Credit Counselling Service (CCCS) in the UK. Using a framework that handles categorical and continuous data together to find hidden structures in unlabelled data, the authors establish the ideal number of clusters and describe those clusters in order to identify the households that exhibit a high propensity for excessive debt levels.

The fourth paper, “Refining a Taxonomy by Using Annotated Suffix Trees and Wikipedia Resources,” by Ekaterina Chernyak and Boris Mirkin, presents a step-by-step approach to taxonomy construction. In the first step, the upper-layer frame of the taxonomy is built manually according to educational materials.
In the next steps, the frame is refined at a chosen topic using the Wikipedia category tree and articles, both cleaned of noise. The main tool for this is a naturally defined string-to-text relevance score based on annotated suffix trees. The relevance scoring is used for several tasks: (1) cleaning the Wikipedia tree or page set of noise; (2) allocating Wikipedia categories to taxonomy topics; (3) deciding whether an allocated category should be included as a child of the taxonomy topic, etc. The resulting fragment of the taxonomy consists of three parts: the manually set upper-layer topic, the adopted part of the Wikipedia category tree, and Wikipedia articles as leaves. Every leaf is assigned a set of so-called descriptors; these are phrases explaining aspects of the leaf topic. The method is illustrated by its application to two domains in the area of mathematics: (a) “Probability theory and mathematical statistics”, (b) “Numerical mathematics” (both in Russian).

The fifth paper, “Estimation of Stress–Strength Reliability for the Generalized Pareto Distribution Based on Progressively Censored Samples,” by S. Rezaei, R. Alizadeh Noughabi and S. Nadarajah, deals with the estimation of the stress–strength reliability parameter, R = P(Y < X), based on progressively Type II censored samples when stress and strength are two independent generalized Pareto random variables. The maximum likelihood estimators, their asymptotic distributions, asymptotic confidence intervals, bootstrap-based confidence intervals and Bayes estimators are derived for R. Using Monte Carlo simulations, the MSE, Bayes risk estimators, credible sets and coverage probabilities are computed and compared.

The sixth paper, “Event Management for Sensing Enterprises with Decision Support Systems,” by Andrés Boza, M. M. E. Alemany, Llanos Cuenca and Angel Ortiz, exposes a decision support system (DSS) to real-time events, making it possible to restart the decision process from scratch if any unexpected internal or external events take place. An event monitoring and management system should interact with the DSS to manage events that might affect its decisions. It should act as a supra-system that identifies when decisions already made are still valid and when they need to be reanalyzed. The traditional configuration of a DSS (where it collects internal and external information about the organization and the decision-maker is involved in the decision-making process) should be extended to handle event management through a monitoring and management system, which monitors internal and external information and facilitates the introduction of non-monitored events. Such monitoring and management systems are becoming increasingly necessary because the steady incorporation of new technologies enables companies to be more context-sensitive. Furthermore, the new and/or more accurate information obtained for the organization requires proper management.

The last paper, “Novel Approach for Network Traffic Pattern Analysis using Clustering-based Collective Anomaly (...truncated)