Cloud storage cost: a taxonomy and survey
World Wide Web
(2024) 27:36
https://doi.org/10.1007/s11280-024-01273-4
Akif Quddus Khan1 · Mihhail Matskin2 · Radu Prodan3 · Christoph Bussler4 ·
Dumitru Roman5,6 · Ahmet Soylu6
Received: 24 November 2023 / Revised: 11 March 2024 / Accepted: 28 April 2024
© The Author(s) 2024
Abstract
Cloud service providers offer application providers virtually infinite storage and computing resources, while providing cost-efficiency and various other quality of service (QoS)
properties through a storage-as-a-service (StaaS) approach. Organizations also use multi-cloud or hybrid solutions by combining multiple public and/or private cloud service providers
to avoid vendor lock-in, achieve high availability and performance, and optimize cost. Indeed,
cost is one of the important factors for organizations when adopting cloud storage; however,
cloud storage providers offer complex pricing policies, including the actual storage cost and
the cost related to additional services (e.g., network usage cost). In this article, we provide
a detailed taxonomy of cloud storage cost and a taxonomy of other QoS elements, such as
network performance, availability, and reliability. We also discuss various cost trade-offs,
including storage and computation, storage and cache, and storage and network. Finally, we
provide a cost comparison across different storage providers under different contexts and
a set of user scenarios to demonstrate the complexity of cost structure and discuss existing
literature for cloud storage selection and cost optimization. We intend the work presented in
this article to give decision-makers and researchers focusing on cloud storage selection
for data placement, cost modelling, and cost optimization a better understanding of, and
insights into, the elements contributing to the storage cost and this complex problem
domain.
Keywords Storage-as-a-Service · Cloud storage · Cost taxonomy · Cost optimization ·
Multi-cloud · Hybrid cloud
1 Introduction
The amount of data that is being created, gathered, processed, and analyzed through advanced
analytics, artificial intelligence, and machine learning (AI/ML) techniques is ever-increasing,
driven by the widespread adoption of information and communication technologies (ICT), such as social
media applications, the Internet of Things (IoT), and sensors. Cloud computing offers the
elasticity needed in terms of storage and computing resources to scale with such growth while
providing cost-efficiency and various other quality of service (QoS) properties, such as availability and security, through the storage-as-a-service (StaaS) approach [73]. Indeed, cost is
one of the important factors for organizations while adopting cloud storage [95]; however,
cloud storage providers offer complex pricing policies, covering both the actual storage cost and
the cost of related services (e.g., network usage) [63]. Given the increasing use of StaaS and its rapidly
growing economic value [62], cost optimization for StaaS has become a challenging endeavor
for industry and research. The goal is to minimize the cost of data storage under complex
and diverse pricing policies coupled with varying storage and network resources and services
offered by cloud service providers (CSPs) [70].
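
To make the composition of the bill concrete, the following is a minimal sketch, assuming a monthly bill that consists only of a per-GB storage charge and a per-GB network (egress) charge. The names PricePlan, Workload, monthly_cost, and cheapest are hypothetical, and any prices passed in are illustrative placeholders rather than real provider rates; actual pricing policies include further elements.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PricePlan:
    """Hypothetical, simplified price plan (not a real provider's tariff)."""
    name: str
    storage_per_gb_month: float  # USD per GB stored per month (assumed)
    egress_per_gb: float         # USD per GB transferred out (assumed)


@dataclass(frozen=True)
class Workload:
    """Usage characteristics of an application for one month."""
    stored_gb: float
    egress_gb_per_month: float


def monthly_cost(plan: PricePlan, workload: Workload) -> float:
    """Storage cost plus network usage cost for one month."""
    storage = plan.storage_per_gb_month * workload.stored_gb
    network = plan.egress_per_gb * workload.egress_gb_per_month
    return storage + network


def cheapest(plans: Iterable[PricePlan], workload: Workload) -> PricePlan:
    """Return the plan with the lowest monthly cost for this workload."""
    return min(plans, key=lambda plan: monthly_cost(plan, workload))
```

As an illustration with made-up prices, take plan A at 0.02 per GB-month of storage and 0.10 per GB of egress, and plan B at 0.03 and 0.05. A workload storing 1000 GB and reading 10 GB per month costs about 21 under A and 30.5 under B, whereas one storing 1000 GB and reading 500 GB per month costs about 70 under A and 55 under B. Even in this two-term model, the cheaper plan depends on the usage pattern.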
Cloud storage cost can vary widely depending on a company’s needs. A wide range
of interrelated parameters affect the cost, from retrieval frequency and storage capacity to
network bandwidth. There are also various cost trade-offs emerging due to the varying pricing policies of cloud storage providers and the usage characteristics of application providers
[73], which require a cost-benefit analysis based on a cost model. An example could be the
cost trade-off between storing the results of computation and re-computing the results, where
the decision depends on the size and usage pattern of the data. Furthermore, organizations
also use multi-cloud or hybrid solutions [117] by combining multiple public and/or private
cloud storage providers to avoid vendor lock-in, achieve high availability and performance,
and optimize cost [104]. An application deployed across multiple public and/or private cloud
providers distributed over several regions can improve its performance while substantially reducing cost. According to a survey [52], user satisfaction
decreases dramatically with an increase of as little as 100 ms in Web page presentation time, i.e.,
latency; keeping latency low therefore requires storing the requested data in data centers close to the users of the
Web application. Data locality improves availability, but it
is costly since it usually requires storing multiple replicas, with known cost trade-offs concerning bandwidth or computing [73]. The cost structure for using cloud storage services is
complex and unclear, particularly in a multi-cloud or hybrid ecosystem. Comprehensive models and mechanisms are required to optimize the cost of using cloud storage
services and to make informed storage service selection decisions for data placement, for
which it is essential to understand this complex cost structure, associated parameters, and
trade-offs between different parameters affecting the cost.
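
The trade-off between storing and re-computing a result, mentioned above, can be sketched as a simple break-even comparison. This is a minimal illustration under stated assumptions, not a cost model from the literature: it assumes a flat per-GB monthly storage price, a fixed compute cost per regeneration, and it ignores network and request charges. The names Dataset, should_store, and break_even_accesses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """A computed result that can either be stored or regenerated on demand."""
    size_gb: float             # size of the stored result
    accesses_per_month: float  # usage pattern: reads per month
    recompute_cost: float      # assumed compute cost (USD) per regeneration


def monthly_cost_store(d: Dataset, storage_price_gb_month: float) -> float:
    """Cost of keeping the result in storage for one month."""
    return d.size_gb * storage_price_gb_month


def monthly_cost_recompute(d: Dataset) -> float:
    """Cost of regenerating the result on every access during one month."""
    return d.accesses_per_month * d.recompute_cost


def should_store(d: Dataset, storage_price_gb_month: float) -> bool:
    """True if storing is no more expensive than re-computing on each access."""
    return monthly_cost_store(d, storage_price_gb_month) <= monthly_cost_recompute(d)


def break_even_accesses(d: Dataset, storage_price_gb_month: float) -> float:
    """Accesses per month at which storing and re-computing cost the same."""
    if d.recompute_cost <= 0:
        raise ValueError("recompute_cost must be positive")
    return monthly_cost_store(d, storage_price_gb_month) / d.recompute_cost
```

With illustrative numbers (a 100 GB result, 0.02 per GB-month, 0.5 per regeneration), storing costs about 2 per month, so storing pays off once the result is read roughly four or more times per month. This matches the observation that the decision depends on the size and usage pattern of the data.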
Cloud storage providers tout ostensibly simple usage-based pricing plans;
however, a practical cost analysis of cloud storage is not straightforward [53], and
only a limited number of studies focus on cost optimization across multiple CSPs
with varying pricing policies [60]. According to a survey among record-keeping professionals
[78], 86% of the respondents opt for cloud storage to save costs, while only 19% use cost
models. This is because cost models are either complicated to implement or do not meet their
requirements. In this respect, this article provides a detailed taxonomy of cloud storage cost,
including storage and network usage costs, and a taxonomy of QoS elements such as network performance, availability, and reliability [48]. We collected and analyzed data from the
documentation of three major cloud service providers to find commonalities and differences
and to provide a comprehensive taxonomy of cloud storage cost. The taxonomy fills this gap by providing
a structured approach that can be used to develop tools for cost optimization and provides
a basis for more meaningful cost comparisons between cloud storage providers, which can
help organizations make more informed decisions about their cloud storage strategy. We also
discuss various cost trade-offs, including storage and computation, storage and cache, and
storage and network, and provide cost and latency comparison examples for major cloud
storage providers, such as Google Cloud Storage, Microsoft Azure Storage, and Amazon
Web Services, along with a set of user scenarios to demo (...truncated)