APPLICATION OF EVENT SOURCING AND CQRS PATTERNS IN DISTRIBUTED SYSTEMS
Міжвідомчий науково-технічний збірник «Адаптивні системи автоматичного управління» № 1’ (34) 2019
UDC 004-042
S.O. Diakov, T. E. Zubrei, A.S. Samoidiuk
Annotation: The purpose of this report is to find suitable approaches to design issues in modern high-load distributed systems, in particular the ability to recreate system state. To this end, the report reviews existing problems and compares conventional design with the proposed architectural solutions. A combination of command query responsibility segregation (CQRS) and event sourcing is suggested to solve the performance and design issues that often arise in conventional information systems development.
Keywords: CRUD, CQRS, event sourcing, software architecture, design patterns, data
modeling.
Problem Statement
The common template for any data-oriented application is a multi-tier architecture [1]. The main idea is to divide responsibility so that presentation, business logic and data storage are kept separate from each other. The business logic should not know about the mechanisms used to store and retrieve data; the persistence layer alone is responsible for operations on the storage. The data layer has to deal with the various relations between entities and often works as a bridge between the application domain model and its normalized representation in the data storage. Data changes are generally expressed as the C, U and D of CRUD (create, update and delete).
Fig. 1 - A traditional CRUD architecture
So, what is wrong with this approach? The model is so popular that most people do not even consider an alternative, and for simple applications it may not cause any problems. However, this conventional architecture has a few shortcomings.
The common problem of CRUD applications is that they receive all data
ISSN 1560-8956
models and views from the primary data storage on which they depend. This imposes two different requirements on the data structure: fast writes and fast reads. These requirements are hard to balance with a single solution, and in most cases the problem is mitigated by adding caches. However, caches bring additional complexity, which takes considerable expertise to handle properly.
Another issue with CRUD-like systems is the violation of the single responsibility principle, since an update operation may not only perform the update but also read the newly changed data. A User object may have an id, an update datetime or other generated data that is present when the object is read, but the persistence layer forbids the caller from updating these fields. As a result, the code becomes harder to maintain as the scope of the application grows.
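The mixing of responsibilities described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `_users` dictionary stands in for the primary data storage, and `create_user`, `update_user` and `GENERATED_FIELDS` are hypothetical names, not part of any particular framework.

```python
from datetime import datetime, timezone
from itertools import count

# Hypothetical names for illustration only.
GENERATED_FIELDS = {"id", "updated_at"}  # owned by the persistence layer
_next_id = count(1)
_users = {}  # stands in for the primary data storage


def create_user(name):
    user = {
        "id": next(_next_id),
        "name": name,
        "updated_at": datetime.now(timezone.utc),
    }
    _users[user["id"]] = user
    return dict(user)


def update_user(user_id, changes):
    """CRUD-style update: it writes, regenerates fields and reads back."""
    forbidden = GENERATED_FIELDS.intersection(changes)
    if forbidden:
        # The caller may read these fields but is not allowed to set them.
        raise PermissionError(f"generated fields are read-only: {sorted(forbidden)}")
    record = _users[user_id]
    record.update(changes)                             # write
    record["updated_at"] = datetime.now(timezone.utc)  # generated on write
    return dict(record)                                # read of the new state


created = create_user("Alice")
updated = update_user(created["id"], {"name": "Bob"})
```

In this sketch a single function carries write logic, field generation and a read of the resulting state, which is the kind of coupling the paragraph above refers to.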
In fact, writing and reading can be differentiated by their priorities:
Table 1
Comparison of areas of concern of two operations
Writing                                   | Reading
Assuring data integrity                   | Perform efficient queries and lookups
Enabling atomic updates and transactions  | Calculate derived and aggregated values (sum, average, etc.)
Optimistic concurrency or locking         | Provide a number of data views
Enforcing write permissions               | Enforce row and column level permissions
An overview of existing solutions
There are many ways to deal with some of the issues above. Most applications use caches [8] as a fast, denormalized way to access heavily requested data. However, introducing them adds a new layer of complexity: caches need to be synchronized across all instances of the application, and the size of a cache and the objects it holds are debatable, since different applications use caches for different reasons. The most difficult task, however, is keeping caches up to date. Caches are not persistent data storage, so they have to be rebuilt from some source on each application launch. This creates a gap between the data storage, which is the source of truth, and the caches.
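The gap between the source of truth and a cache can be illustrated with a minimal Python sketch. It assumes a naive read-through cache without automatic invalidation; `database`, `CachedReader` and `update_price` are hypothetical names chosen for the example.

```python
# Hypothetical in-memory "source of truth"; a real system would use a database.
database = {"item-1": 100}


class CachedReader:
    """Naive read-through cache without automatic invalidation (an assumption)."""

    def __init__(self, store):
        self._store = store
        self._cache = {}

    def get(self, key):
        # Serve from the cache when possible, otherwise load from the store.
        if key not in self._cache:
            self._cache[key] = self._store[key]
        return self._cache[key]

    def invalidate(self, key):
        # Has to be called on every application instance when the store changes.
        self._cache.pop(key, None)


def update_price(store, key, value):
    # The write goes straight to the store and knows nothing about caches.
    store[key] = value


reader = CachedReader(database)
first_read = reader.get("item-1")      # loads the value into the cache
update_price(database, "item-1", 120)  # the source of truth changes
stale_read = reader.get("item-1")      # the cache still serves the old value
reader.invalidate("item-1")
fresh_read = reader.get("item-1")      # reloaded from the store
```

Until `invalidate` is called, the reader keeps returning the old value, and with several application instances each one would need to be notified separately.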
Transactions [9] are often considered the silver bullet of CRUD applications. While they are effective at preserving data integrity in SQL databases, they cause a lot of overhead and place business logic on the data storage, further violating the single responsibility principle. Transactions that are held open for a long time force the data storage to keep tracking changed rows of frequently modified tables that could otherwise be cleared. Moreover, rolling back transactions is costly: in some databases a rollback takes more time than a commit.
Another way to increase application performance is vertical scaling [10]. It is the concept of adding more resources to a single instance, allowing for faster computation
of larger amounts of data. In most scenarios, however, the servers are already at full physical capacity, so seeing an actual impact from scaling typically involves purchasing an entirely new server to replace the old one. This is still vertical scaling from the database/application point of view. One high-performance server is generally more expensive than several less powerful ones.
Proposed solution
The main idea of CQRS (Command Query Responsibility Segregation) is the separation of the models for updating and reading information. Collaboration and staleness are the two driving forces of CQRS. Collaboration refers to situations in which multiple participants use or change the same shared data. Often there are rules that indicate which actor can perform which modifications, and modifications that are acceptable in one case may be forbidden in another. Actors can be people, such as ordinary users, or automated, such as software.
Staleness refers to the fact that, in shared use, data shown to one actor may already have been changed by another. Almost any system that uses caches serves stale data, often for performance reasons. This means that we cannot fully trust the decisions of actors, because they may have been made on out-of-date data. Standard layered architectures do not address either of these issues. Although putting everything in one database could be a step in the direction of handling collaboration, staleness is usually worse in these architectures because caches are added as a performance improvement after data modeling is already done.
CQRS addresses these issues by segregating the read and write models: queries are for reading, while commands deal with data updates.
Commands have to tell what needs to be done rather than how it should be done ("switch the lights on", not "set LightStatus to ON"). In most scenarios they are put in a queue for later asynchronous processing.
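A command that expresses intent and is queued for later processing might look like the following Python sketch. The names `SwitchLightsOn`, `LightsWriteModel` and `command_queue` are hypothetical, and the queue is drained synchronously here only to keep the example self-contained; a real system would typically use a message broker and a separate worker.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class SwitchLightsOn:
    """Command: states the intent, not the field to mutate."""
    room_id: str


class LightsWriteModel:
    def __init__(self):
        self._status = {}  # room_id -> "ON" / "OFF"

    def handle(self, command):
        # The handler, not the caller, decides how the intent is carried out.
        if isinstance(command, SwitchLightsOn):
            self._status[command.room_id] = "ON"
        else:
            raise ValueError(f"unknown command: {command!r}")

    def status(self, room_id):
        # Exposed only so that the sketch can be inspected.
        return self._status.get(room_id, "OFF")


command_queue = deque()
write_model = LightsWriteModel()

# The caller only enqueues the command and returns immediately.
command_queue.append(SwitchLightsOn(room_id="kitchen"))
status_before = write_model.status("kitchen")

# A worker drains the queue later (synchronously here, for simplicity).
while command_queue:
    write_model.handle(command_queue.popleft())
status_after = write_model.status("kitchen")
```

Between enqueueing and processing the state is unchanged, which is why callers of a queued command cannot assume the update has already been applied.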
Queries are not allowed to change anything in the data storage. They should return a DTO (data transfer object), which is a container for data. You may think of it as a struct.
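A query side that returns a DTO can be sketched as follows. `UserDto`, `UserQueries` and the dictionary-based `read_store` are hypothetical names used for illustration; the only assumption is that the read model can be looked up by user id.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserDto:
    """Plain data container returned to callers; it carries no behaviour."""
    user_id: int
    name: str


class UserQueries:
    def __init__(self, read_store):
        self._read_store = read_store

    def get_user(self, user_id):
        # Read-only: the store is looked up but never modified.
        row = self._read_store[user_id]
        return UserDto(user_id=user_id, name=row["name"])


read_store = {1: {"name": "Alice"}}  # stands in for the read model
queries = UserQueries(read_store)
dto = queries.get_user(1)
```

Making the dataclass frozen is a design choice of this sketch: it keeps the returned object a passive snapshot, so a caller cannot mistake changing the DTO for changing the stored data.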
Comparing data flow of CRUD-based applications to CQRS ones, i (...truncated)