Evaluating cloud database migration options using workload models
Ellison et al. Journal of Cloud Computing: Advances, Systems
and Applications
Martyn Ellison
Radu Calinescu
Richard F. Paige
Department of Computer Science, University of York, Deramore Lane, York, UK
A key challenge in porting enterprise software systems to the cloud is the migration of their database. Choosing a cloud provider and service option (e.g., a database-as-a-service or a manually configured set of virtual machines) typically requires the estimation of the cost and migration duration for each considered option. Many organisations also require this information for budgeting and planning purposes. Existing cloud migration research focuses on the software components, and therefore does not address this need. We introduce a two-stage approach which accurately estimates the migration cost, migration duration and cloud running costs of relational databases. The first stage of our approach obtains workload and structure models of the database to be migrated from database logs and the database schema. The second stage performs a discrete-event simulation using these models to obtain the cost and duration estimates. We implemented software tools that automate both stages of our approach. An extensive evaluation compares the estimates from our approach against results from real-world cloud database migrations.
Database modelling; Cloud migration; Enterprise systems; Model-driven engineering
Introduction
The benefits of hosting an enterprise system on the
cloud — instead of on-premise physical servers — are
well understood and documented [1]. Some organisations
have been using clouds for over a decade and are
considering switching provider [2], while others are planning
an initial migration [3]. In either case, the most
challenging component to migrate is often the database due to
the size and importance of the data it contains.
However, the existing cloud migration work focuses on the
software components and gives minimal consideration
to data. For instance, the ARTIST [4] and REMICS [5]
cloud migration methodologies refer to the database but
do not address any database-specific challenges. Similarly,
cloud deployment simulators like CDOSim [6] focus only
on compute resources. The limitations of these existing
cloud migration methodologies are described further in
the “Related work” section.
Migrating large relational databases from physical
infrastructure into the cloud presents many significant
challenges, e.g., managing system downtime, choosing
suitable cloud instances, and choosing a cloud provider.
The database could be deployed on a database-as-a-service
offered by one of several public cloud providers,
or installed and configured on one or more virtual machines. With
either option, selecting the appropriate cloud resources
requires knowledge of the database workload and size.
The infrastructure of the source database may impact the
migration duration; if it has limited available capacity or
bandwidth, then it will take longer to extract the data. An
organisation may wish to upgrade the existing database
hardware to speed up migration, or schedule downtime to
migrate the database while it is idle.
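To illustrate how source capacity and bandwidth can bound the migration duration, the following is a minimal sketch and not the estimation method of our approach. It assumes the transfer is limited by the slowest of three stages (reading from the source, the network link, and writing to the target). The function name, its parameters, and the example figures are hypothetical.

```python
def estimate_transfer_hours(db_size_gb, source_read_mbps, network_mbps, target_write_mbps):
    """Rough lower bound on transfer time, in hours.

    Assumption (illustrative only): throughput is limited by the slowest
    stage, and all rates are given in megabits per second.
    """
    bottleneck_mbps = min(source_read_mbps, network_mbps, target_write_mbps)
    if bottleneck_mbps <= 0:
        raise ValueError("all throughput values must be positive")
    size_megabits = db_size_gb * 8 * 1000  # decimal gigabytes to megabits
    seconds = size_megabits / bottleneck_mbps
    return seconds / 3600


# Hypothetical 450 GB database behind a 100 Mbit/s uplink.
baseline = estimate_transfer_hours(450, 800, 100, 400)
# Same database after doubling the uplink bandwidth.
upgraded = estimate_transfer_hours(450, 800, 200, 400)
print(baseline, upgraded)
```

Under these assumptions the uplink is the bottleneck, so doubling it halves the estimate. A real migration also incurs export, import and index rebuilding overheads that this bound ignores.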
In this work, we assist with this decision-making
process via a tool-supported approach for evaluating
cloud database migration options. Our approach has
two stages—database workload and structure modelling,
and database migration simulation—and estimates
migration duration, migration costs, and future cloud running
costs. We assume that the source and target databases have
identical schema, type (e.g., relational or NoSQL), vendor
(e.g., Oracle or MySQL), and software version.
Changing any of these parameters is a complex activity, which
organisations tend to perform separately (as discussed in
the “Approach overview” section).
Given logs and a schema of a candidate database,
the database modelling stage generates: (i) a workload
model conforming with the Structured Metrics
Metamodel (SMM) [7], and (ii) a structure model
conforming with the Knowledge Discovery Metamodel (KDM)
[8]. The second stage of the approach uses these
models, alongside a cost model of the target cloud platform,
to perform a discrete-event simulation of the database
migration and deployment. To ease the adoption of the
new approach, we implemented two software tools that
automate the main tasks.
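The general idea of a discrete-event simulation that yields duration and cost estimates can be sketched as follows. This is a simplified illustration under stated assumptions, not our simulator: tables are transferred one at a time over a link of fixed throughput, the target instance is billed per hour, and inbound data is billed per gigabyte. The names Table and simulate_migration, and all parameters and figures, are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    size_gb: float


def simulate_migration(tables, link_gb_per_hour, instance_price_per_hour,
                       price_per_gb_in=0.0):
    """Return (duration_hours, cost) for a sequential table-by-table transfer.

    Assumptions (illustrative only): fixed link throughput, one transfer
    at a time, flat hourly instance price, flat per-GB ingress price.
    """
    events = []              # priority queue of (time, seq, kind, table)
    seq = 0                  # unique tie-breaker so tuples never compare tables
    pending = list(tables)
    clock = 0.0
    transferred_gb = 0.0

    if pending:
        heapq.heappush(events, (0.0, seq, "start", pending.pop(0)))

    while events:
        # Advance the simulation clock to the next scheduled event.
        clock, _, kind, table = heapq.heappop(events)
        if kind == "start":
            seq += 1
            finish = clock + table.size_gb / link_gb_per_hour
            heapq.heappush(events, (finish, seq, "done", table))
        else:  # "done": account for the data and start the next table
            transferred_gb += table.size_gb
            if pending:
                seq += 1
                heapq.heappush(events, (clock, seq, "start", pending.pop(0)))

    cost = clock * instance_price_per_hour + transferred_gb * price_per_gb_in
    return clock, cost


duration, cost = simulate_migration(
    [Table("orders", 30.0), Table("customers", 10.0)],
    link_gb_per_hour=20.0,
    instance_price_per_hour=0.5,
)
print(duration, cost)
```

With sequential transfers the result reduces to a simple sum. The event queue becomes useful once concurrent transfers, a live workload on the source database, or time-varying bandwidth are modelled as additional event types.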
We carried out an extensive evaluation of the approach
using several open-source enterprise applications, and a
closed-source system from our industrial project
partner Science Warehouse [9]. In particular, our database
modelling method and tool were applied to 15
systems (including Apache OFBiz and MediaWiki) to obtain
workload and structure models. In each case, the system
was installed on a server and configured with an Oracle
or MySQL database. The experimental results (detailed
later in the paper) show that our tool can extract models
from a (...truncated)