On the diffuseness and the impact on maintainability of code smells: a large scale empirical investigation
Empir Software Eng (2018) 23:1188–1221
DOI 10.1007/s10664-017-9535-z
Fabio Palomba¹ · Gabriele Bavota² ·
Massimiliano Di Penta³ · Fausto Fasano⁴ ·
Rocco Oliveto⁴ · Andrea De Lucia⁵
Published online: 7 August 2017
© The Author(s) 2017. This article is an open access publication
1 Delft University of Technology, Delft, The Netherlands
2 Università della Svizzera italiana (USI), Lugano, Switzerland
3 University of Sannio, Benevento, Italy
4 University of Molise, Campobasso, Italy
5 University of Salerno, Fisciano, Italy
Communicated by: Ahmed Hassan
Abstract Code smells are symptoms of poor design and implementation choices that
may hinder code comprehensibility and maintainability. Despite the effort devoted by the
research community to studying code smells, the extent to which code smells in software
systems affect software maintainability remains unclear. In this paper we present a large
scale empirical investigation on the diffuseness of code smells and their impact on code
change- and fault-proneness. The study was conducted across a total of 395 releases of 30
open source projects and considering 17,350 manually validated instances of 13 different
code smell kinds. The results show that smells characterized by long and/or complex code
(e.g., Complex Class) are highly diffused, and that smelly classes have a higher change- and
fault-proneness than smell-free classes.
Keywords Code smells · Empirical studies · Mining software repositories
1 Introduction
Bad code smells (also known as “code smells” or “smells”) were defined as symptoms of
poor design and implementation choices applied by programmers during the development
of a software project (Fowler 1999). As a form of technical debt (Cunningham 1993), they
could hinder the comprehensibility and maintainability of software systems (Kruchten et al.
2012). An example of a code smell is the God Class, a large and complex class that centralizes
the behavior of a portion of a system and only uses other classes as data holders. God Classes
can rapidly grow out of control, making it harder and harder for developers to understand
them, to fix bugs, and to add new features.
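To make the God Class description concrete, the following is a minimal illustrative sketch in Python. It is not taken from the studied systems: the names `Customer`, `Order`, and `ShopManager` are hypothetical and chosen only to show one class centralizing behavior while the others act as plain data holders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Customer:
    # Pure data holder: no behavior of its own.
    name: str
    loyalty_points: int = 0


@dataclass
class Order:
    # Pure data holder: no behavior of its own.
    customer: Customer
    prices: List[float] = field(default_factory=list)
    paid: bool = False


class ShopManager:
    """Hypothetical God Class: pricing, payment, loyalty, and reporting all live here."""

    def __init__(self) -> None:
        self.orders: List[Order] = []

    def add_order(self, order: Order) -> None:
        self.orders.append(order)

    def total(self, order: Order) -> float:
        # Pricing logic that arguably belongs to Order.
        return sum(order.prices)

    def pay(self, order: Order) -> None:
        # Payment and loyalty logic mixed in one method.
        order.paid = True
        order.customer.loyalty_points += int(self.total(order) // 10)

    def report(self) -> str:
        # Reporting logic, yet another responsibility.
        paid = sum(1 for o in self.orders if o.paid)
        return f"{paid}/{len(self.orders)} orders paid"
```

Every new feature in such a design tends to land in `ShopManager`, which is how a class of this kind keeps growing and becomes harder to understand and change.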
The research community has been studying code smells from different perspectives. On
the one hand, researchers developed methods and tools to detect code smells. Such tools
exploit different types of approaches, including metrics-based detection (Lanza and Marinescu
2010; Moha et al. 2010; Marinescu 2004; Munro 2005), graph-based techniques (Tsantalis
and Chatzigeorgiou 2009), mining of code changes (Palomba et al. 2015a), textual analysis
of source code (Palomba et al. 2016b), or search-based optimization techniques (Kessentini
et al. 2010; Sahin et al. 2014). On the other hand, researchers investigated how relevant code
smells are for developers (Yamashita and Moonen 2013; Palomba et al. 2014), when and
why they are introduced (Tufano et al. 2015), how they evolve over time (Arcoverde et al.
2011; Chatzigeorgiou and Manakos 2010; Lozano et al. 2007; Ratiu et al. 2004; Tufano
et al. 2017), and whether they impact software quality properties, such as program
comprehensibility (Abbes et al. 2011), fault- and change-proneness (Khomh et al. 2012; Khomh
et al. 2009a; D’Ambros et al. 2010), and code maintainability (Yamashita and Moonen 2012,
2013; Deligiannis et al. 2004; Li and Shatnawi 2007; Sjoberg et al. 2013).
Similar to some previous work (Khomh et al. 2012; Li and Shatnawi 2007; Olbrich
et al. 2010; Gatrell and Counsell 2015), this paper investigates the relationship
between the occurrence of code smells in software projects and software change- and
fault-proneness. Specifically, while previous work shows a significant correlation between smells
and code change/fault-proneness, the empirical evidence provided so far is still limited
because of:
– Limited size of previous studies: the study by Khomh et al. (2012) was conducted on
  four open source systems, while the study by D’Ambros et al. (2010) was performed
  on seven systems. Furthermore, the studies by Li and Shatnawi (2007), Olbrich et al.
  (2010), and Gatrell and Counsell (2015) were conducted considering the change history
  of only one software project.
– Detected smells vs. manually validated smells: Previous work studying the impact of
  code smells on change- and fault-proneness, including the one by Khomh et al. (2012),
  relied on data obtained from automatic smell detectors. Although such smell detectors
  are often able to achieve a good level of accuracy, it is still possible that their intrinsic
  imprecision affects the results of the study.
– Lack of analysis of the magnitude of the observed phenomenon: previous work
  indicated that some smells can be more harmful than others, but the analysis did not
  take into account the magnitude of the observed phenomenon. For example, even if a
  specific smell type may be considered harmful when analyzing its impact on
  maintainability, this may not be relevant if the number of occurrences of such a smell type
  in software projects is limited.
– Lack of analysis of the magnitude of the effect: Previous work indicated that classes
  affected by code smells have more chances to exhibit defects (or to undergo changes)
  than other classes. However, no study has observed the magnitude of such changes
  and defects, i.e., no study addressed the question: How many defects would a class
  affected by a code smell exhibit on average, as compared to another class affected by a
  different kind of smell, or not affected by any smell at all?
– Lack of within-artifact analysis: sometimes, a class has intrinsically a very high
  change-proneness and/or fault-proneness, e.g., because it plays a core role in the system
  or because it implements a very complex feature. Hence, the class may be intrinsically
  “smelly”. In contrast, there may be classes that become smelly during their lifetime
  because of maintenance activities (Tufano et al. 2017), or classes from which the smell
  was removed, possibly because of refactoring activities (Bavota et al. 2015). For such
  classes, it is of paramount importance to analyze the change- and fault-proneness of the
  class during its evolution, in order to better relate the cause (presence of smell) with the
  possible effect (change- or fault-proneness).
– Lack of a temporal relation analysis between smell presence and fault introduction:
  While previous work correlated the presence of code smells with high fault- and
  change-proneness, one may wonder whether the artifact was smelly when the fault was
  introduced, or whether the fault was introduced before the class became smelly.
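Two of the limitations listed above, the magnitude of the effect and the temporal relation between smell presence and fault introduction, can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the analysis procedure of this paper: the record layout, the function names `mean_defects_by_smell` and `smelly_when_fault_introduced`, and the toy values are all invented for the example.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record: (class name, smell kind or None if smell-free, number of defects).
Record = Tuple[str, Optional[str], int]


def mean_defects_by_smell(records: Iterable[Record]) -> Dict[str, float]:
    """Average number of defects per class, grouped by smell kind."""
    groups = defaultdict(list)
    for _name, smell, defects in records:
        groups[smell or "smell-free"].append(defects)
    return {kind: mean(values) for kind, values in groups.items()}


def smelly_when_fault_introduced(smelly_from: Optional[int],
                                 fault_introduced_at: int) -> bool:
    """True if the class was already smelly at the release where the fault appeared.

    Simplifying assumption: a class becomes smelly at release index `smelly_from`
    (None if never) and stays smelly afterwards, i.e., smell removal is ignored.
    """
    return smelly_from is not None and smelly_from <= fault_introduced_at


# Invented toy data, for illustration only.
records = [
    ("A", "God Class", 6),
    ("B", "God Class", 4),
    ("C", "Complex Class", 3),
    ("D", None, 1),
    ("E", None, 2),
]
averages = mean_defects_by_smell(records)
```

Comparing the per-group averages, rather than only testing whether smelly classes are more likely to be faulty, is what distinguishes an analysis of magnitude from an analysis of correlation. The second function captures the temporal question in its simplest form: a fault introduced at release 3 in a class that became smelly at release 5 cannot be attributed to that smell.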
To cope with the aforementione (...truncated)