APPLICATION OF EVENT SOURCING AND CQRS PATTERNS IN DISTRIBUTED SYSTEMS

https://asac.kpi.ua/article/download/178224/179217

Interdepartmental scientific and technical collection «Adaptive Systems of Automatic Control» (Міжвідомчий науково-технічний збірник «Адаптивні системи автоматичного управління»), No. 1 (34), 2019, ISSN 1560-8956

UDC 004-042

S.O. Diakov, T. E. Zubrei, A.S. Samoidiuk

Annotation: The purpose of this report is to find suitable approaches to a common issue in modern high-load distributed systems, in particular the ability to recreate system state. To that end, the report reviews existing problems and compares conventional design with the proposed architectural solutions. A combination of command query responsibility segregation (CQRS) and event sourcing is suggested to solve performance and design issues that often arise in conventional information systems development.

Keywords: CRUD, CQRS, event sourcing, software architecture, design patterns, data modeling.

Problem Statement

The common template for any data-oriented application is a multi-level architecture [1]. The main idea is to divide responsibility so that presentation, data storage and business logic are kept separate from each other. The other layers should not know about the mechanisms used to store and retrieve data; the persistence layer alone is responsible for operations on the storage. A data layer has to deal with the various relations between entities and often works as a bridge from the application domain model to its normalized view in the data storage. Data changes are generally expressed as the C, U and D of CRUD (create, update and delete).

Fig. 1 - A traditional CRUD architecture

So, what is wrong with this approach? The model is so popular that most people do not even think about an alternative, and for simple applications it may not cause any problems. However, this conventional architecture has a few shortcomings.

A common problem of CRUD applications is that they receive all data models and views from the primary data storage on which they depend. This imposes two different requirements on the data structure: fast writes and fast reads. These are hard to balance with a single solution, and in most cases the problem is lessened by adding caches. However, caches bring additional complexity that requires considerable expertise to handle properly.

Another issue with CRUD-like systems is the violation of the single responsibility principle, since an update operation may not only perform the update but also read the newly changed data. A User object may have an id, an update datetime or other generated data that is present when the object is read, yet the persistence layer will forbid you from updating these fields yourself. As a result, the code becomes harder to maintain as the scope of the application grows. In fact, writing and reading can be distinguished by their priorities:

Table 1. Comparison of areas of concern of the two operations

Writing                                   | Reading
------------------------------------------|-------------------------------------------------------------
Assuring data integrity                   | Performing efficient queries and lookups
Enabling atomic updates and transactions  | Calculating derived and aggregated values (sum, average, etc.)
Optimistic concurrency or locking         | Providing a number of data views
Enforcing write permissions               | Enforcing row and column level permissions

An overview of existing solutions

There are many ways to deal with some of the issues above. Most applications use caches [8] as a fast, denormalized way to access heavily requested data. However, introducing them adds a new layer of complexity: caches need to be synchronized across all instances of the application, and the size of a cache and the objects it should contain are debatable, since different applications use caches for different reasons. The most difficult task regarding caches, though, is keeping them up to date.
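The difficulty of keeping a cache up to date can be illustrated with a minimal sketch. It is not taken from the article: `ReadThroughCache`, `database` and the key `"user:1"` are hypothetical names, and a plain dictionary stands in for the primary data storage.

```python
class ReadThroughCache:
    """Caches values loaded from a backing store (the source of truth)."""

    def __init__(self, store):
        self._store = store      # hypothetical primary storage: a plain dict
        self._entries = {}       # cached copies of values from the store

    def get(self, key):
        # Load from the store only on a cache miss.
        if key not in self._entries:
            self._entries[key] = self._store[key]
        return self._entries[key]

    def invalidate(self, key):
        # Drop the cached copy so the next read goes back to the store.
        self._entries.pop(key, None)


database = {"user:1": "Alice"}
cache = ReadThroughCache(database)

first_read = cache.get("user:1")    # miss: value is loaded from the store
database["user:1"] = "Alicia"       # a write that bypasses the cache
stale_read = cache.get("user:1")    # hit: the cache still holds the old value
cache.invalidate("user:1")
fresh_read = cache.get("user:1")    # miss again: the new value is loaded
```

Every writer has to remember to invalidate the entry, on every application instance, which is exactly the synchronization burden described above.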
Caches are not persistent data storage, which means they have to be rebuilt from some source on each application launch. This creates a gap between the data storage (the source of truth) and the caches.

Transactions [9] are often considered the silver bullet of CRUD applications. While they are effective at preserving data integrity in SQL databases, they cause a lot of overhead and push business logic into the data storage, further violating the single responsibility principle. Transactions that are held open for a long time force the data storage to keep tracking changed rows of frequently modified tables that could otherwise be cleared. Moreover, rolling back transactions is costly: in some databases, rolling a transaction back takes more time than committing it.

Another way to increase application performance is vertical scaling [10], that is, adding more resources to a single instance so that it can process larger amounts of data faster. In most scenarios the servers are already at full physical capacity, so seeing an actual impact of scaling typically involves purchasing an entire new server to replace the old one. This is still vertical scaling from the database/application point of view. One high-performance server is generally more expensive than several less powerful ones.

Proposed solution

The main idea of CQRS (Command Query Responsibility Segregation) is the separation of the models for updating and for reading information. Collaboration and staleness are the two driving forces of CQRS.

Collaboration refers to the set of rules governing how multiple participants use and change the same shared data. Often there are rules that indicate which actor can execute which modifications, and a modification that is accepted in one case may be forbidden in another. Actors can be people, such as ordinary users, or automated, such as software.
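The separation of models and the collaboration rules just described can be sketched as follows. This is only an illustration under assumed names (`DocumentWriteModel`, `DocumentReadModel`, `try_edit` and the document-editing scenario are all hypothetical, not taken from the article): the write model enforces which actor may modify the data and in which state, while a separate read model only serves a view.

```python
from dataclasses import dataclass


@dataclass
class Document:
    owner: str
    text: str
    locked: bool = False


class DocumentWriteModel:
    """Write side: applies modifications and enforces collaboration rules."""

    def __init__(self):
        self._documents = {}

    def create(self, doc_id, owner, text):
        self._documents[doc_id] = Document(owner=owner, text=text)

    def lock(self, doc_id):
        self._documents[doc_id].locked = True

    def edit(self, doc_id, actor, new_text):
        doc = self._documents[doc_id]
        # Rule 1: which actor may execute which modification.
        if actor != doc.owner:
            raise PermissionError(f"{actor} may not edit {doc_id}")
        # Rule 2: the same modification can be forbidden in another state.
        if doc.locked:
            raise RuntimeError(f"{doc_id} is locked")
        doc.text = new_text

    def export(self):
        """Return plain copies of the current texts for the read side."""
        return {doc_id: doc.text for doc_id, doc in self._documents.items()}


class DocumentReadModel:
    """Read side: a separate view that is only ever queried."""

    def __init__(self):
        self._texts = {}

    def refresh(self, texts):
        self._texts = dict(texts)

    def text_of(self, doc_id):
        return self._texts[doc_id]


def try_edit(write_model, doc_id, actor, new_text):
    """Return 'accepted' or the name of the violated rule's exception."""
    try:
        write_model.edit(doc_id, actor, new_text)
        return "accepted"
    except (PermissionError, RuntimeError) as error:
        return type(error).__name__


write_model = DocumentWriteModel()
read_model = DocumentReadModel()
write_model.create("doc-1", owner="alice", text="draft")

results = [
    try_edit(write_model, "doc-1", "bob", "hacked"),    # wrong actor
    try_edit(write_model, "doc-1", "alice", "final"),   # allowed
]
write_model.lock("doc-1")
results.append(try_edit(write_model, "doc-1", "alice", "late change"))  # locked

read_model.refresh(write_model.export())
```

In this sketch the read model is refreshed explicitly; how and when the read side is brought up to date is a design decision the sketch deliberately leaves open.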
Staleness refers to the fact that, in shared use, once data has been shown to one actor it may be changed by another. Almost any system that uses caches serves stale data, often for performance reasons. This means that the decisions of its actors cannot be fully trusted, because they may have been made on out-of-date data.

Standard layered architectures do not address either of these issues. Although keeping everything in one database is a step in the direction of handling collaboration, staleness is usually made worse in these architectures, because caches are added as a performance improvement after the data modeling is already done.

CQRS addresses these issues by segregating the read and write models: queries are used for reading, while commands deal with data updates.

- Commands have to tell what needs to be done rather than how it should be done ("Switch the lights on", not "set LightStatus to ON"). In most scenarios they are put in a queue for later asynchronous processing.
- Queries are not allowed to change anything in the data storage. They should return a DTO, which is a container for data. You may think of it as a struct.

Comparing data flow of CRUD-based applications to CQRS ones, i (...truncated)