Factor Space, the Theoretical Base of Data Science

Annals of Data Science, Oct 2014

This paper introduces factor space theory, which provides a general coordinate system to describe the real world and a theoretical base for data science. Based on the theory, factorial databases are presented, which carry a new kind of statistics for intelligent analysis of the coming tide of Big Data.

Ann. Data. Sci. (2014) 1(2):233–251
DOI 10.1007/s40745-014-0017-5

Pei-Zhuang Wang · Zeng-Liang Liu · Yong Shi · Si-Cong Guo

Received: 1 July 2014 / Revised: 15 August 2014 / Accepted: 10 September 2014 / Published online: 28 October 2014
© Springer-Verlag Berlin Heidelberg 2014

Keywords Factor space · Factorial databases · Background relation · Factorial neural networks · Factor vane · Sample cultivation · Information fusion

Mathematics Subject Classification 90C05

P.-Z. Wang (✉) · S.-C. Guo: College of Intelligence Engineering and Mathematics, Liaoning Technical University, Fuxin 123000, Liaoning, China
Z.-L. Liu: National Defense University PLA China, Beijing 100091, China
Y. Shi: Research Center of Fictitious Economy and Data Science, Chinese Academy of Science, Beijing 100080, China

1 Introduction

Big Data leads the current tide in style, and its various parlances dazzle people with delighted surprise mixed with confusion. However, we are acutely aware that the core task in this tide is to promote intelligence in Big Data. As the journal’s preface emphasized [1], data science ‘should have its own scientific contents such as axioms, laws and rules, which are fundamentally important for experts in different fields to explore their own interests from Big Data’. Even though there are remarkable achievements in this area, data science still lacks a theoretical base for intelligence. As Tsien Hsueshen emphasized [2], ‘To develop intelligent engineering, most important task is building the mathematical theory towards intelligence!’

This paper aims to introduce factor space, which provides a general coordinate system for describing things in the world and is the very mathematical base for data science. Factor space [3] was, coincidentally, published in the same year as formal concept analysis [4] and rough sets [5]. The three branches were pioneers in the mathematics of intelligence, but the first of them focused for several years on the genetic analysis of uncertainty.

Factor space is a bridge connecting randomness and certainty. The two ends can be transformed into each other as the dimension of the factors varies [6]. Based on this idea, intentionally or unintentionally, Kolmogorov presented the fundamental space, a factor space, in the axiomatic definition of the probability field. He brought randomness into a framework of certainty and so advanced mathematics toward random phenomena. Without the idea of factor space, probability theory could not have been modernized in the 1930s.

Factor space is also a bridge connecting fuzziness and certainty. This bridge and the one mentioned above exhibit a duality: fuzziness on the ground, the universe U, can be viewed as randomness in the sky, the power set P(U) of U. Based on the idea of factor space, Wang presented the theory of Fuzzy Shadows [7], which treats a fuzzy set as the covering function of a random set; this provides a firm base for fuzzy set theory and has been applied in fuzzy controllers [8] and several other areas. This work was summarized in the book “Fuzzy System Theory and Fuzzy Computer” [9], published in 1997. The development of factor space toward intelligence can be traced in the books ‘Mathematical theory of Knowledge Representation’ [10], ‘Theory and Applications of Factorial Neural Networks’ [11], ‘Attribute method in Thinking and Intelligence Science’ [12], etc. Some representative papers can be found in [13–27].
Factor space shares common goals with formal concept analysis and rough sets. The authors of this paper emphasize the importance of the background relation (called the formal background by R. Wille) and study this relation in depth. Factor space provides a population theory for the information systems of rough sets. All of these branches will cooperate to establish firm mathematical bases for data science.

The paper is organized as follows: Sect. 2 introduces factors and factor space; Sect. 3 treats knowledge representation; Sect. 4 presents factorial databases; Sect. 5 gives a discussion, including a brief conclusion and the main tasks ahead. Owing to time constraints, all proofs of propositions are omitted.

2 What is Factor and Factor Space?

The gene is the key to biology: it forms, generates, and identifies all living things. Likewise, there exists a key that opens the door to recognizing all things in the universe: the generalized gene, which we call the factor. The gene was originally called the Mendelian factor; a factor is a fact-or, where ‘fact’ stands for any thing and ‘-or’ is that which describes, determines, and identifies all things. The Chinese translation of factor, YINSU, fits this meaning closely, so factor is the best name for the generalized gene.

Fig. 1 Factor state space

A gene is like a switch with two or more states, plugged into a node on a chromosome; each state determines a biological property or quality. The gene is the quality-root of living beings. A factor switches among a series of states. For example, Color is a factor that switches among three basic states: Red, Yellow and Blue. The factor is the quality-root of things.

From the viewpoint of mathematics, a factor f is essentially a mapping defined on a domain O; every object in O is mapped into X_f, the range of f. A state in X_f can be an attribute, a feature, a characteristic, a degree, etc.; these states form a dimension with respect to f. A factor is the name of the dimension, which hooks a series of attributes.
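The view of a factor as a mapping f from a domain O into a state set X_f can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' formalism: the objects in `O`, the `ColorState` enum, and the helper `reached_states` are hypothetical names introduced here, and only the three states Red, Yellow and Blue come from the text.

```python
from enum import Enum
from typing import Callable, Iterable, Set


class ColorState(Enum):
    """X_f for the factor Color: the three basic states named in the text."""
    RED = "Red"
    YELLOW = "Yellow"
    BLUE = "Blue"


# Hypothetical domain O: a few objects with a raw color description.
O = {
    "apple": {"color": "Red"},
    "banana": {"color": "Yellow"},
    "blueberry": {"color": "Blue"},
}


def color(obj_name: str) -> ColorState:
    """The factor f = Color: maps an object of O to one state in X_f."""
    return ColorState(O[obj_name]["color"])


def reached_states(factor: Callable[[str], ColorState],
                   domain: Iterable[str]) -> Set[ColorState]:
    """Collect the states of X_f that the factor takes on the given domain."""
    return {factor(name) for name in domain}


if __name__ == "__main__":
    print(color("apple"))            # the state assigned to one object
    print(reached_states(color, O))  # the dimension spanned by Color on O
```

Here the factor's name (`color`) labels the dimension, while the enum members are the attributes it "hooks".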
A factor is the attribute of attributes. Any concrete object has a complex quality, which cannot be recognized except by taking a ‘photo’ from a specified angle or aspect. The factor is the angle of analysis; without a factor, no analysis can be made. An object can be analyzed by many factors to obtain a record, for example, Height (John) = 1.75 m, Weight (John) = 70 kg, Age (John) = 25, Sex (John) = Male, etc. Taking a synthesis after these analyses, the Cartesian product of those dimensions forms the factor state space, a coordinate system whose dimensions are named by factors. John is mapped to a point in this coordinate system. Factor space provides a general coordinate system to describe all things in the universe (see Fig. 1).

Not only does factor space extend the field of vision, it also brings flexibility to the coordinate system: for a given task, factor space reduces its dimension as far as possible. Some operations on factors need to be introduced. For example, Color, Aroma, Taste are three simple factors in foods, and Color–aroma, Color–taste, Aroma–taste, Color–aroma–ta (...truncated)
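The factor state space described above, a Cartesian product of factor dimensions in which an object becomes a point, can be illustrated with a minimal Python sketch. The record for John uses the values given in the text; the dictionary layout, the `to_point` helper, and the discretized state sets `AGE_BANDS` and `SEX_STATES` are assumptions made here purely for illustration.

```python
from itertools import product
from typing import Any, Callable, Dict, Sequence, Tuple

# Raw object; the field names are hypothetical, the values come from the text.
john = {"height_m": 1.75, "weight_kg": 70, "age": 25, "sex": "Male"}

# Each factor is a mapping from an object to a state on its own dimension.
FACTORS: Dict[str, Callable[[dict], Any]] = {
    "Height": lambda o: o["height_m"],
    "Weight": lambda o: o["weight_kg"],
    "Age": lambda o: o["age"],
    "Sex": lambda o: o["sex"],
}


def to_point(obj: dict, factor_names: Sequence[str]) -> Tuple[Any, ...]:
    """Map an object to its coordinates along the chosen factors."""
    return tuple(FACTORS[name](obj) for name in factor_names)


# John as a point in the four-dimensional factor state space.
point = to_point(john, ["Height", "Weight", "Age", "Sex"])

# For a given task, fewer factors may suffice: a lower-dimensional view.
reduced = to_point(john, ["Age", "Sex"])

# With finite state sets, the state space is literally a Cartesian product.
AGE_BANDS = ["young", "middle", "old"]  # hypothetical discretization
SEX_STATES = ["Male", "Female"]
state_space = list(product(AGE_BANDS, SEX_STATES))

if __name__ == "__main__":
    print(point)
    print(reduced)
    print(len(state_space), "states in the Age x Sex space")
```

Choosing the list of factor names passed to `to_point` is one simple way to picture how the dimension of the coordinate system can be lowered for a given task.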


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1007%2Fs40745-014-0017-5.pdf
Article home page: https://link.springer.com/article/10.1007/s40745-014-0017-5

Pei-Zhuang Wang, Zeng-Liang Liu, Yong Shi, Si-Cong Guo. Factor Space, the Theoretical Base of Data Science, Annals of Data Science, 2014, pp. 233-251, Volume 1, Issue 2, DOI: 10.1007/s40745-014-0017-5