Optimal shortening of uniform covering arrays

Jose Torres-Jimenez¹, Nelson Rangel-Valdez², Himer Avila-George³, Oscar Carrizalez-Turrubiates⁴

1 Information Technology Laboratory, CINVESTAV-Tamaulipas, Cd. Victoria, Tamaulipas 87130, México
2 CONACYT-TecNM, Instituto Tecnológico de Ciudad Madero, Cd. Madero, Tamaulipas 89440, México
3 Unidad de Transferencia Tecnológica Tepic, CONACYT-CICESE, Tepic, Nayarit 63173, México
4 SVAM International de México, Cd. Victoria, Tamaulipas 87130, México

Editor: M. Sohel Rahman, Bangladesh University of Engineering and Technology, BANGLADESH
Software test suites based on the concept of interaction testing are very useful for testing software components in an economical way. Test suites of this kind may be created using mathematical objects called covering arrays. A covering array, denoted by CA(N; t, k, v), is an N × k array over $Z_v = \{0, \ldots, v-1\}$ with the property that every N × t subarray covers all t-tuples of $Z_v^t$ at least once. Covering arrays can be used to test systems in which failures occur as a result of interactions among components or subsystems. They are often used in areas such as hardware Trojan detection, software testing, and network design. Because system testing is expensive, it is critical to reduce the amount of testing required. This paper addresses the Optimal Shortening of Covering ARrays (OSCAR) problem, an optimization problem whose objective is to construct, from an existing covering array matrix of uniform level, an array with dimensions of (N − δ) × (k − Δ) such that the number of missing t-tuples is minimized. Two applications of the OSCAR problem are (a) to produce smaller covering arrays from larger ones and (b) to obtain quasi-covering arrays (arrays in which the number of missing t-tuples is small) to be used as input to a metaheuristic algorithm that produces covering arrays. In addition, it is proven that the OSCAR problem is NP-complete, and twelve different algorithms are proposed to solve it. An experiment was performed on 62 problem instances, and the results demonstrate the effectiveness of solving the OSCAR problem to facilitate the construction of new covering arrays.

Data Availability Statement: All relevant data are within the paper and its Supporting Information files. The supporting data are additionally available at http://www.tamps.cinvestav.mx/~oc/OSCAR.

Funding: The authors acknowledge the General Coordination of Information and Communications Technologies (CGSTIC) at CINVESTAV for providing HPC resources on the hybrid cluster supercomputer "Xiuhcoatl," which have contributed to the research results reported. The following projects have funded the research reported in this paper: 238469, CONACYT, Métodos Exactos para Construir Covering Arrays Óptimos, to JTJ; 2143, Cátedras CONACYT, Fortalecimiento de las capacidades de TICs en Nayarit, to HAG; and 148784, Fondo Mixto CONACYT y Gobierno del Estado de Nayarit, Unidad de Transferencia Tecnológica CICESE-Nayarit, to HAG.

Competing interests: The authors have declared that no competing interests exist.
Introduction
Functionality tests during software development demand special attention, and they are
generally important for preventing malfunctions in software components. During the testing phase,
it is desirable to find all errors that could arise in a software component before it is delivered
to the user. If a software component has a large number of parameters, then testing it
exhaustively might be expensive because of the large number of configurations that can arise from the
different parameters' values; e.g., a software component with just 20 parameters of 2 different
values each would require $2^{20} = 1{,}048{,}576$ tests. An alternative is to test the system using a
small, randomly generated test suite, but in this case, there is no guarantee of the testing
coverage; instead, a better choice is to use a combinatorial testing approach that provides a coverage
guarantee for small test suites. This combinatorial testing approach (also called interaction
testing) guarantees the coverage of all interactions of a certain size among different values of
the input parameters of a software component. This approach is based on evidence presented
by [1] that many errors are produced by the interactions of only a few parameter values.
Specifically, the cited authors showed evidence that test suites with an interaction size of 6 are
sufficient to detect all known errors in a collection of different software components.
A uniform covering array (CA), denoted by CA(N; t, k, v), is a commonly used structure in interaction testing. It is an array C with dimensions of N × k constructed over $Z_v = \{0, \ldots, v-1\}$ with the property that every N × t subarray covers all members of $Z_v^t$ at least once. The value of N is the number of rows of C, i.e., the number of test cases; k is the number of columns or parameters; v is the number of values that each parameter can take; and t is the degree of interaction among the parameters. Because there are $\binom{k}{t}$ sets of t columns {c1, . . ., ct}, the number of different t-tuples that must be covered at least once in C is $v^t \binom{k}{t}$. When a specific t-tuple is missing in a set of t columns (c1, . . ., ct), we refer to it as a missing t-wise combination (or a missing combination, for short). Table 1 shows a CA(6; 2, 5, 2) in which all $2^2 \binom{5}{2} = 40$ 2-wise combinations are covered at least once.
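To make the definition concrete, the following C program (a sketch written for this presentation, not code from the paper) counts the missing t-wise combinations of an array for t = 2; the array hard-coded in main is one valid CA(6; 2, 5, 2), although not necessarily the one shown in Table 1, so the count printed is zero.

#include <stdio.h>

/* Counts the 2-wise symbol combinations that no row covers; an array is a
 * CA(N; 2, k, v) exactly when this count is zero. The column count is fixed
 * at 5 to keep the sketch simple. */
static int missing_pairs(int N, int k, int v, int A[][5]) {
    int missing = 0;
    for (int c1 = 0; c1 < k; c1++) {
        for (int c2 = c1 + 1; c2 < k; c2++) {
            int seen[16] = {0};                 /* assumes v * v <= 16 */
            for (int r = 0; r < N; r++)
                seen[A[r][c1] * v + A[r][c2]] = 1;
            for (int s = 0; s < v * v; s++)
                if (!seen[s]) missing++;        /* pair s never appears */
        }
    }
    return missing;
}

int main(void) {
    int A[6][5] = {
        {0,0,0,0,0}, {1,1,1,0,0}, {1,0,0,1,1},
        {0,1,0,1,0}, {0,0,1,1,1}, {1,1,0,0,1}
    };
    printf("missing 2-wise combinations: %d\n", missing_pairs(6, 5, 2, A));
    return 0;
}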
The covering array construction (CAC) problem is the search for the covering array number (CAN), i.e., the minimum value N for which a CA(N; t, k, v) exists. Formally, the CAN can be defined as $\mathrm{CAN}(t, k, v) = \min\{N : \exists\, \mathrm{CA}(N; t, k, v)\}$.
In some theoretical studies, the following asymptotic bound is adopted: $\mathrm{CAN}(t, k, v) = O(v^t \log k)$ [2]. This bound is interesting because as the number of columns grows linearly, the number of rows grows only logarithmically. This is an advantage of such combinatorial structures because of the possibility of deriving small test suites. For instance, for a software component with 126 binary parameters, exhaustive testing would require $2^{126}$ tests, whereas interaction testing with strength 2 would require only 10 tests.
A complementary problem to the CAC problem is known as the test suite reduction problem (TSRP), which consists of finding, for a given array, the smallest subset of rows that covers all t-wise combinations [3]. The CAC problem is a special case of the TSRP in which the input is an array that contains all $v^k$ distinct test cases.
For some special cases, there are algorithms that can solve the CAC problem in polynomial
time:
· when v = t = 2 [4],
· when v is a prime power and k ≤ v + 1 [5], and
· when k = t + 1 [6].
However, the CAC problem remains highly combinatorial in most cases. Moreover, some variants have been proven to be NP-complete; e.g., the work presented in [2, 7] shows the NP-completeness of the problem of extending a matrix by one row with no fewer than m missing t-wise combinations. The problem defined in the current work is also NP-complete, as proven in this paper.
Various methods have been developed to address the CAC problem. Exact methods solve it to optimality; however, they usually require exponential time to achieve their goal [8–11]. As a result of this complexity, various approximate methods have been proposed as alternatives, including recursive [4, 12, 13], algebraic [14, 15], greedy [16–20], and metaheuristic approaches. This last category includes methods based on strategies such as genetic algorithms [21], simulated annealing [22], and tabu search [23].
These approximate algorithms can be used to build non-optimal CAs in a reasonable time; some of these algorithms depend on the quality of their inputs to produce small CAs. Most of the time, these inputs are based on matrices that are nearly CAs. The objective of the present work is to construct matrices with sufficiently few missing combinations to still be considered quasi-CAs. Such arrays are created by solving the problem known as the Optimal Shortening
of Covering ARrays (OSCAR); related results were published in [24]. The OSCAR problem is
relevant to the construction of CAs because it can produce smaller CAs or excellent
initialization matrices for metaheuristic algorithms for constructing CAs. The main contributions of
this work are as follows. It formalizes three of the five algorithms presented in [24]. It also
presents seven new approximate strategies for solving the OSCAR problem. In addition, the
present work offers a complete analysis of the performance of all of the new and old algorithms,
something that has not been done before. Furthermore, it proposes three new benchmarks
with more than 800 OSCAR instances, which extend the range of study to matrices with
strengths of t = {2, 3, 4, 5}, whereas previous works have studied only t = 2; these benchmarks
are used as part of the experiments conducted to analyze the strategies. These experiments not
only evaluate how effectively the algorithms solve the OSCAR problem but also compare the
best of them against state-of-the-art strategies. These experiments provide evidence that solving the OSCAR problem using the proposed approaches enables the creation of quasi-CAs that are better than those produced by other reported initialization functions and even by the fast and versatile IPOG-F, a state-of-the-art algorithm for constructing CAs; the main result is that the arrays produced using the proposed algorithms have 90% fewer missing t-wise combinations than those generated using the other approaches considered for comparison.
This paper is organized as follows. In the problem definition section, the OSCAR problem is formally defined; its NP-completeness is proven, and some of its applications are described.
In the related work section, some of the work related to initialization functions for
metaheuristics for CA construction is presented. Subsequently, the algorithms proposed in this work
for solving the OSCAR problem are presented. In the experimentation section, an experiment
performed to test the proposed algorithms for the construction of matrices with few missing
combinations is presented. Finally, in the conclusions section, final comments regarding this
work are provided.
Problem definition
Let A denote a CA(N; t, k, v) or a quasi-CA(N; t, k, v) (a quasi-CA is a matrix with a relatively small number of missing t-combinations). Then, the OSCAR problem can be defined as $\min\{\tau_t(B_{N' \times k'}) \mid B \text{ is a submatrix of } A\}$, where $\tau_t(B)$ is a function that counts the number of missing t-wise combinations in the given array and N′ = N − δ and k′ = k − Δ are defined in terms of two predefined integer values, $0 \le \delta \le N - v^t$ and $0 \le \Delta \le k - t$, which satisfy $\delta > 0 \lor \Delta > 0$. Hence, an OSCAR instance is specified by the tuple (A, δ, Δ).
The search space for an OSCAR instance consists of all submatrices B of the given matrix A. Accordingly, the number of feasible solutions that form such a space can be estimated to be $\binom{N}{N-\delta}\binom{k}{k-\Delta}$, where $\binom{N}{N-\delta}$ and $\binom{k}{k-\Delta}$ represent the numbers of different ways to choose subsets of rows and columns, respectively, from the original matrix A. Throughout the remainder of this document, for a given submatrix B, we use JR to denote the subset of rows chosen from A and JC to denote the subset of columns.
We present an example of a solution to the OSCAR instance specified by A = CA(6; 2, 5, 2) (see Table 1), δ = 2, and Δ = 2, for which it is feasible to construct a solution B (see Table 2) where $\tau_t(B) = 0$. The solution for this instance is obtained by choosing the rows $J_R = \{0, 2, 4, 5\}$ and the columns $J_C = \{2, 3, 4\}$ of A. Because $\tau_t(B) = 0$, the solution B is a CA(4; 2, 3, 2).
Alternatively, the matrix A can be represented by another matrix A′ with dimensions of $N \times \binom{k}{t}$. This matrix has the same number of rows as A and contains one column for each subset of t columns derived from A. Each cell $a'_{i,j} \in A'$ contains a value from the set $\{0, 1, \ldots, v^t - 1\}$; this value represents the t-tuple covered by row i in the subset of t columns associated with column j.
The OSCAR instance (A, δ, Δ) = (CA(4; 2, 3, 2), 1, 0) is shown in Tables 3, 4 and 5. The initial matrix A is shown in Table 3, the t-tuples and sets of columns are shown in Table 4, and the new matrix representation A′ is presented in Table 5 (t-wise combinations covered).
Table 4 lists the sets of t columns: $t_1 = (c_1, c_2)$, $t_2 = (c_1, c_3)$, and $t_3 = (c_2, c_3)$.
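As an illustration (a sketch written for this presentation, not the authors' code), the following program builds A′ for a CA(4; 2, 3, 2); the input array is one valid CA of that size, though not necessarily identical to Table 3. Each pair of columns of A becomes one column of A′, and each cell stores the rank of the covered t-tuple in $\{0, \ldots, v^t - 1\}$.

#include <stdio.h>

#define N 4
#define K 3
#define V 2
#define PAIRS (K * (K - 1) / 2)   /* C(k, t) column subsets for t = 2 */

int main(void) {
    int A[N][K] = { {0,0,0}, {0,1,1}, {1,0,1}, {1,1,0} };  /* a CA(4; 2, 3, 2) */
    int Ap[N][PAIRS];

    int j = 0;
    for (int c1 = 0; c1 < K; c1++)
        for (int c2 = c1 + 1; c2 < K; c2++, j++)
            for (int i = 0; i < N; i++)
                Ap[i][j] = A[i][c1] * V + A[i][c2];   /* rank of the t-tuple */

    for (int i = 0; i < N; i++) {                     /* print A' row by row */
        for (j = 0; j < PAIRS; j++)
            printf("%d ", Ap[i][j]);
        printf("\n");
    }
    return 0;
}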
Finally, the tuple (A′, δ, Δ) is used to define an instance of the OSCAR problem, and the NP-completeness of the problem can be proven based on this new representation. The remainder of this section is devoted to this proof.
The proof that the OSCAR problem is NP-complete
To demonstrate the NP-completeness of the OSCAR problem, it is necessary to show it to be equivalent to a problem that is already known to be NP-complete. For this purpose, this work presents the transformation of the maximum cover (or MAXCOVER) problem (cf. [25] for a review of this problem) into the OSCAR problem. For the proof, the previously defined notation (A′, δ, Δ) for an OSCAR instance is extended to (A′, δ, Δ, h), where the value h denotes an integer that supports the following question: is there a subarray B of A′ with dimensions of (N − δ) × (k − Δ) such that $\tau_t(B) \le h$? This question transforms the OSCAR problem into its decision form, which is required for this demonstration.
First, it is proven that the OSCAR problem is in NP. Let us begin with the case in which Δ = 0, meaning that the matrix B is a subset of only the rows of A′. Clearly, the size of the search space is reduced to $\binom{N}{N-\delta}$. The claim that the problem is in NP holds because computing the value of $\tau_t(B)$ requires time proportional to $O(N\binom{k}{t})$ to examine all possible t-wise combinations, which are equal in number to the number of columns of B. In other words, the question of whether $\tau_t(B) \le h$ for the OSCAR problem can be answered in time polynomial in the dimensions of B.
Now that it has been shown that the OSCAR problem is in NP, let us proceed with the transformation of the NP-complete MAXCOVER problem. The objective of the MAXCOVER problem is to cover a given set $Q = \{q_1, q_2, \ldots, q_l\}$, regarded as the universe, using at most C of the subsets $Y = \{Y_1, Y_2, \ldots, Y_m\}$, where each subset $Y_i \subseteq Q$, for all $1 \le i \le m$, is given in advance. This problem can be characterized by the tuple (Q, Y, C) and can be transformed into an OSCAR instance (A′, δ, Δ, h) as follows: a) The matrix A′ is constructed, with m + 1 rows and $l + \max_i |Y_i| + 1$ columns. b) For $1 \le i \le m$ and $1 \le j \le l$, the value $a'_{i,j}$ of each cell is 1 if subset $Y_i$ covers element $q_j$ or 0 otherwise. c) For $1 \le i \le m$ and j > l, the value $a'_{i,j}$ of each cell is 0. d) For i = m + 1, the value $a'_{i,j}$ of each cell is 0 if $1 \le j \le l$ or 1 otherwise. e) The values of δ, Δ, and h are set to m − C, 0, and 0, respectively. The matrix A′ can be constructed in a time of O(lm), and the derived OSCAR instance is denoted by (A′, δ, 0, 0).
Table 6 shows an example of the transformation of the MAXCOVER problem into the OSCAR problem. The following elements are used in this case:
· $Q = \{q_1, q_2, q_3, q_4, q_5\}$
· $Y = \{Y_1, Y_2, Y_3, Y_4, Y_5\}$, with $Y_1 = \{q_1, q_2, q_5\}$, $Y_2 = \{q_2, q_4, q_5\}$, $Y_3 = \{q_1, q_4, q_5\}$, $Y_4 = \{q_1, q_2, q_3\}$, and $Y_5 = \{q_2, q_3, q_4\}$
· C = 3
The array A′ has dimensions of 6 × 9, and δ is equal to 5 − 3 = 2.
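The construction can be checked mechanically; the following sketch (illustrative code written for this presentation, not from the paper) applies rules a) through e) to this example instance and prints the 6 × 9 matrix A′.

#include <stdio.h>

#define M 5     /* number of subsets Y_i */
#define L 5     /* size of the universe Q */
#define MAXY 3  /* max |Y_i| */

int main(void) {
    /* covers[i][j] = 1 iff subset Y_{i+1} covers element q_{j+1} */
    int covers[M][L] = {
        {1,1,0,0,1},  /* Y1 = {q1, q2, q5} */
        {0,1,0,1,1},  /* Y2 = {q2, q4, q5} */
        {1,0,0,1,1},  /* Y3 = {q1, q4, q5} */
        {1,1,1,0,0},  /* Y4 = {q1, q2, q3} */
        {0,1,1,1,0}   /* Y5 = {q2, q3, q4} */
    };
    int cols = L + MAXY + 1;      /* l + max|Yi| + 1 = 9 columns */
    int Ap[M + 1][L + MAXY + 1];

    for (int i = 0; i < M; i++)
        for (int j = 0; j < cols; j++)
            Ap[i][j] = (j < L) ? covers[i][j] : 0;  /* rules b) and c) */
    for (int j = 0; j < cols; j++)
        Ap[M][j] = (j < L) ? 0 : 1;                 /* rule d), row m + 1 */

    /* rule e): delta = m - C = 2, Delta = 0, h = 0 for the derived instance */
    for (int i = 0; i <= M; i++) {
        for (int j = 0; j < cols; j++) printf("%d", Ap[i][j]);
        printf("\n");
    }
    return 0;
}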
Finally, to complete the proof that the OSCAR problem is NP-complete, we demonstrate that the OSCAR instance (A′, δ, 0, 0) built from the MAXCOVER instance (Q, Y, C) has a solution if and only if the latter has a solution. For this purpose, we start by showing that an optimal solution for (A′, δ, 0, 0) must include row $Y_{m+1}$ of A′. This fact can be easily proven since all t-tuples must be covered in (A′, δ, 0, 0) and those with the value 1 in any column j > l can only be covered by row $Y_{m+1}$.
The next step is to show that there is a solution with C subsets for the MAXCOVER instance iff there is a matrix with C + 1 rows that solves (A′, δ, 0, 0). This condition can also be easily proven. We first note that the t-tuple with value 0 is covered for any column $j \le l$ by the row $Y_{m+1}$. The same tuple is also covered for any column j > l by any row from {Y1, . . ., Ym}. With this information, the only t-tuples that remain uncovered are those with value 1 in any column $j \le l$. Given that during the construction of the OSCAR instance, a t-tuple with value 1 is assigned only to those rows in columns $j \le l$ that are associated with a subset of Y, the following claim is valid: any subset of Y that is formed of C elements and represents a solution for the MAXCOVER instance can also be transformed into a solution for the OSCAR instance. This claim is justified since the rows associated with the chosen C elements cover all t-tuples for every column except those with value 1 in columns j > l. Then, it is necessary only to add row m + 1 to cover the missing t-tuples. It is also true that a solution with C + 1 rows for the OSCAR instance is a valid solution for the equivalent MAXCOVER instance, since it is necessary only to choose those subsets of Y associated with the rows selected in the solution for the OSCAR instance. Finally, if one of these instances has no solution, then neither does the other; this claim holds because of the equivalence between such solutions, which has already been shown. Hence, it is demonstrated that a solution to the MAXCOVER problem exists if and only if a solution to the derived OSCAR problem exists.
Finally, any instance of the OSCAR problem for the case of Δ > 0 is equivalent to $\binom{k}{k-\Delta}$ instances of the problem with Δ = 0. Since this special case has been proven to be NP-complete, the general case of the OSCAR problem is at least as hard.
Applications of the OSCAR problem
Methods of solving the OSCAR problem have the following applications: a) they can reduce the search space in the CAC problem; b) they can directly construct CAs, when there are no t-wise combinations missing in the matrices they generate; c) they can be used as initializing functions for metaheuristics for CA construction; d) they can aid in the identification of better upper bounds for CA matrices; and e) they can be used for fine-tuning in experimental design. Each of these applications is detailed in the remainder of this section.
The OSCAR problem yields a quasi-CA that has zero or a small number of missing t-wise combinations. Such a situation is convenient since, instead of searching for a CA(N; t, k, v) in a feasible region of size $O\binom{v^k}{N}$ corresponding to the original domain, it may be possible to construct such a CA from an existing CA(N + δ; t, k + Δ, v) by exploring a relaxed region of smaller size, $\binom{N+\delta}{\delta}\binom{k+\Delta}{\Delta}$.
The second and third applications of the OSCAR problem are related to the construction of CAs. The OSCAR problem enables the direct construction of CAs when $\tau_t(B) = 0$, i.e., when B is a CA. Additionally, whenever the matrix constructed as a solution to an OSCAR instance is not a CA (i.e., the number of missing t-tuples is greater than zero), this solution can still be used indirectly for CA construction because it can serve as the initial solution for metaheuristic algorithms. Note that the performance of a metaheuristic for constructing CAs depends on the quality of the initial matrix. Hence, the subarray obtained as a solution to the OSCAR problem is adequate for this purpose because it has only a few missing t-wise combinations; this is in contrast to arrays of the same size constructed using random initialization functions, which are likely to be missing a large number of the possible t-wise combinations due to their random nature. Some of the existing metaheuristic algorithms designed for CA construction, which show dependence on the initial matrix, are reported in [21, 23, 26, 27]. It is in algorithms of this type that the OSCAR problem finds its main area of application, namely, the generation of initial matrices with few missing t-wise combinations.
The fourth application of the OSCAR problem is the identification of new upper bounds
for CA matrices. Many such upper bounds have been reported in the literature. For example,
the best upper bounds for some CAs can be found in the repositories of [28, 29]. In addition,
some bounds on CAN(t, k, v) can be found in [29]; however, the corresponding CAs have
values of N that are far from optimal.
Because of the hardness of the CAC problem, the value of CAN(t, k, v) for any arbitrary set
of values of t, k, and v is generally unknown. However, suitable new upper bounds can be
obtained from existing matrices; e.g., between CA(174; 2, 110, 9) and CA(177; 2, 117, 9), the
upper bounds on the required numbers of columns for the cases of N = 175 and N = 176 are
unknown, but it can be inferred that they should lie between 111 and 116. Because most of
these upper bounds have not been shown to be optimal, the question arises as to whether other
upper bounds can be found. We conclude that inputs derived by solving the OSCAR problem
can be used to test potential upper bounds in order to find new bounds for CA(N; t, k, v); this
can be achieved through the proper selection of the values δ and Δ used to reduce the matrix
size.
Some specific cases of the values of δ and Δ are as follows:
1. When δ > 0 and Δ = 0, i.e., only the number of rows is to be reduced, the rows that are
selected to be discarded are those whose elimination results in the minimum number of
missing combinations in the final array.
2. When δ = 0 and Δ > 0, i.e., only the number of columns is to be reduced, the columns that are selected to be discarded are similarly those whose elimination results in the minimum number of missing t-wise combinations. However, this case makes sense only when the array A is not a CA.
Finally, another application of solutions to the OSCAR problem is their direct use in testing scenarios. Through the careful selection of the OSCAR problem parameters δ and Δ, it is possible to ensure that the resulting subarray has the desired numbers of rows (i.e., test cases) and columns (i.e., parameters) to produce a quasi-CA (with 90–100% coverage of the t-tuples) that provides the required level of assurance.
The proposed methodology for the construction of CAs consists of generating an initial
solution for a metaheuristic algorithm by solving an instance of the OSCAR problem; i.e., the
OSCAR problem is solved to obtain the solution B, which is then used as the initial array in a
metaheuristic algorithm.
Related work
The construction of CAs is a highly combinatorial problem that can benefit from the use of
approximate algorithms to construct CAs of a desired size within a reasonable amount of time.
Many researchers, instead of directing their efforts toward finding CAs with the minimum
number of rows using an exact approach, have designed approximate algorithms to improve
the best known upper bound for CAs and then reduce the gap between that bound and the
CAN. These CA construction algorithms can be classified, in accordance with their
characteristics, into the following types: (a) algebraic approaches, (b) exact approaches, (c) greedy
approaches, (d) transformations, and (e) metaheuristic approaches.
Algebraic methods have the characteristic that the CA construction process involves
formulas or operations using mathematical objects such as vectors, finite fields, groups, and
CAs with small values of t, k, and v. Some algebraic methods yield optimal constructions,
including the CA(N; 2, k, 2) methods of [30] and [31]; Bush's construction method for
CA(N; t, q + 1, q), where q is a prime or a prime power and $q \ge t$ (cf. [5]); and the zero-sum method of [6], which yields an optimal CA(N; t, t + 1, v) for any $t \ge 2$. The main feature of these approaches is that most of them require small CAs or quasi-CAs from which to construct larger CAs.
Exact methods are exhaustive approaches for the construction of optimal CAs. Although
some approaches include techniques for accelerating the search process, they generally require
exponential time to complete their task, making them practical only for the construction of
small optimal CAs. This category includes branchandbound (B&B) strategies, such as the
work proposed by [10], which incorporates symmetry-breaking techniques, partial t-wise verification and fixed blocks in the bounding process, and the work of [8], which, for the generation of a non-isomorphic CA(N; 2, k, 2), uses a pruning strategy based on bounds defined by the minimum ranks established in terms of the CA size.
Greedy strategies are commonly used for combinations of the parameters N, t, k, and v for
which exact methods are impractical, with the basic purpose of producing a good solution in a
short time. The majority of commercial and open-source tools for generating test data (including AETG [32], TCG [17], ACTS [33], IPOG-F [19], and DDA [34]) use greedy algorithms for CA construction.
Transformations generally exploit the structure of existing CAs either to make them smaller
or to support other approaches, e.g., algebraic approaches, in creating smaller CAs. This task is
usually performed in one of two ways: a) through the identification of redundancy or b)
through the construction of submatrices. Redundancy in a CA can be identified through the
permutation of rows or columns or through the changing of symbols (cf. [35], [36] and [37]).
However, approaches based on the construction of submatrices provide a better basis for new
CAs, and the present work can be considered to be of this type.
Finally, similar to greedy methods, metaheuristic approaches are strategies that are not
guaranteed to find a CA with the minimum number of rows. In practice, metaheuristic
methods yield very good results, but they consume more CPU time than greedy algorithms. Some
metaheuristics that have been used to solve the CAC problem include simulated annealing
(SA) [22], tabu search (TS) [38], memetic algorithms (MAs) [27], and genetic algorithms
(GAs) [21].
For all of the strategies described above, the main goal is the construction of CAs, i.e., matrices with zero missing t-wise combinations. However, the CA construction performance of algebraic and metaheuristic approaches is improved when the initial matrices are quasi-CAs, i.e., when they are missing only a small number of the possible t-wise combinations. This situation raises the question of how an initial matrix should be constructed for these approaches.
The answer is to use initialization functions. Hence, these initialization functions are a key
element of the development of metaheuristics for CA construction.
The main initialization functions used in state-of-the-art methods are as follows: a) random matrix initialization [21, 23, 26, 27], b) initialization with a balanced number of symbols per column [27], c) initialization through row augmentation [39], d) initialization based on submatrices [40], and e) initialization based on greedy strategies [41, 42]. The first four strategies do not consider the number of missing t-wise combinations in the construction of the initial matrix. Strategies of the last type can be used to build CAs, but the CAs they build are typically larger than the required matrix size; this situation results in random discarding of rows and/or columns that is also not optimized in terms of the number of missing t-wise combinations. Hence, an
alternative is to use an existing matrix of greater size and optimize the row/column reduction
process until a matrix of the required size is obtained. This optimization is exactly equivalent
to solving the OSCAR problem, and this work proposes a wide variety of new metaheuristic
and hybrid strategies for this purpose.
In summary, whereas CA construction approaches (e.g., exact, greedy, algebraic, metaheuristic and transformation methods) produce matrices with no missing t-wise combinations, the strategies presented in this work solve the OSCAR problem to generate quasi-CAs. Quasi-CAs are important because they can be used as initial matrices for CA construction strategies based on algebraic and metaheuristic methods and can thus improve the performance of these methods in the construction of new CAs.

The remainder of this section provides a more detailed introduction to some of the relevant initialization functions found in the scientific literature related to this topic. These four initialization functions will be denoted by I1, I2, I3, and I4 in this paper; Fig 1 shows an example of each initialization function.
Each of the four initialization functions creates an array C with N rows and k columns, in which each cell is initialized with a symbol of the given alphabet {0, 1, . . ., v − 1} of v symbols. The function I1 is presented in [21, 23, 26, 27]; this function initializes each cell $c_{i,j}$ of $C_{N \times k}$ with a symbol drawn at random from the set {0, 1, . . ., v − 1}. Fig 1 (random) shows an example of the use of I1 to initialize a matrix $C_{10 \times 4}$.
The function I2 initializes $C_{N \times k}$ with a balanced number of randomly generated symbols per column. Each column $k_i$, where $1 \le i \le k$, will contain an almost uniform distribution of the symbols {0, 1, . . ., v − 1}. To achieve such uniformity, a symbol is generated at random for each of the N rows of column $k_i$, but during the random generation process, it is ensured that the first $R_1 = v - (N - \lfloor N/v \rfloor v)$ symbols appear $\lfloor N/v \rfloor$ times and that the remaining $R_2 = N - \lfloor N/v \rfloor v$ symbols appear $\lceil N/v \rceil$ times. For example, in a 10 × 4 matrix C with an alphabet size of v = 3, each of the four columns contains $R_1 = 3 - (10 - \lfloor 10/3 \rfloor \cdot 3) = 2$ symbols that appear $\lfloor 10/3 \rfloor = 3$ times and $R_2 = 10 - \lfloor 10/3 \rfloor \cdot 3 = 1$ symbol that appears $\lceil 10/3 \rceil = 4$ times; this situation is exemplified in Fig 1 (balanced). The use of I2 guarantees that each column has a balance in the cardinalities of each symbol, something that cannot be guaranteed when using I1. The function I2 is a generalization of the initialization function presented in [27] for solving the binary CAC problem using an SA approach.
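A minimal sketch of I2 (written for this presentation, not the authors' code): each column is filled with an almost uniform multiset of symbols and then shuffled, which yields the same per-column symbol counts as the incremental generation described above.

#include <stdio.h>
#include <stdlib.h>

static void init_balanced(int N, int k, int v, int *C) {
    for (int col = 0; col < k; col++) {
        /* lay out the symbols almost uniformly: R2 = N - floor(N/v)*v symbols
         * appear ceil(N/v) times, the remaining R1 = v - R2 appear floor(N/v) times */
        for (int r = 0; r < N; r++)
            C[r * k + col] = r % v;
        /* Fisher-Yates shuffle of the column to randomize symbol positions */
        for (int r = N - 1; r > 0; r--) {
            int s = rand() % (r + 1);
            int tmp = C[r * k + col];
            C[r * k + col] = C[s * k + col];
            C[s * k + col] = tmp;
        }
    }
}

int main(void) {
    int N = 10, k = 4, v = 3;
    int C[10 * 4];
    srand(7);
    init_balanced(N, k, v, C);
    for (int r = 0; r < N; r++) {
        for (int c = 0; c < k; c++) printf("%d ", C[r * k + c]);
        printf("\n");
    }
    return 0;
}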
The function I3 initializes $C_{N \times k}$ one row at a time. This function generates the first row $r_1$ at random; i.e., each of its cells will contain a symbol randomly chosen from {0, 1, . . ., v − 1}. Subsequently, each new row is selected from a set of two random candidate rows $d_1$ and $d_2$ and is added to C. The chosen candidate row is the one that maximizes the Hamming distance with respect to all rows $r_s$ that already exist in C. The Hamming distance between two rows is equal to the number of positions at which the corresponding symbols are different; correspondingly, the Hamming distance between a candidate row $d_j$ and all rows already in C is equal to the number of positions l in each row $r_s$ that differ from the corresponding positions in $d_j$, summed over all existing rows $r_s$. Formally, this latter definition can be expressed as $g(d_j, C) = \sum_{s=0}^{i-1} \sum_{l=0}^{k-1} h(r_{s,l}, d_{j,l})$, where i is the number of rows already added to C and $h(r_{s,l}, d_{j,l}) = 1$ if $r_{s,l} \ne d_{j,l}$ or 0 otherwise. This process is repeated until all N rows have been created. This initialization function has been used previously in [39].
An example of the selection of a row as defined in I3 is shown in Fig 2; the matrix C already contains 2 rows, and the third row will be the candidate $d_1$ because it maximizes the value of $g(d_j, C)$. Fig 1 (Hamming) shows the full initial matrix.

Fig 2. Example of the Hamming distances between the two rows r1 and r2 that are already in the matrix C and the two candidate rows d1 and d2.
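A minimal sketch of I3 (written for this presentation, not the authors' code), with g(d, C) computed exactly as defined above; ties between the two candidates are broken in favor of d1.

#include <stdio.h>
#include <stdlib.h>

/* Hamming-distance sum g(d, C) over the first `rows` rows of C (row-major, k columns). */
static int g(const int *d, const int *C, int rows, int k) {
    int sum = 0;
    for (int s = 0; s < rows; s++)
        for (int l = 0; l < k; l++)
            if (C[s * k + l] != d[l]) sum++;
    return sum;
}

static void random_row(int *d, int k, int v) {
    for (int l = 0; l < k; l++) d[l] = rand() % v;
}

int main(void) {
    int N = 10, k = 4, v = 3;
    int C[10 * 4], d1[4], d2[4];
    srand(7);
    random_row(C, k, v);                        /* first row is random */
    for (int i = 1; i < N; i++) {
        random_row(d1, k, v);                   /* two random candidates */
        random_row(d2, k, v);
        const int *best = (g(d1, C, i, k) >= g(d2, C, i, k)) ? d1 : d2;
        for (int l = 0; l < k; l++) C[i * k + l] = best[l];
    }
    for (int r = 0; r < N; r++) {
        for (int c = 0; c < k; c++) printf("%d ", C[r * k + c]);
        printf("\n");
    }
    return 0;
}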
Finally, the function I4 initializes $C_{N \times k}$ based on groups of t columns. This function is based on the subarray $C' = \mathrm{CA}(v^t; t, t, v)$, which is constructed using the $v^t$ combinations of symbols derived from an alphabet of size v and a strength value of t; e.g., $C' = \mathrm{CA}(3^2; 2, 2, 3)$ will be formed of the elements in the set {00, 01, 02, 10, 11, 12, 20, 21, 22}, where each element represents a row in C′. The function I4 is performed in two steps. In the first step, C′ is used to define the symbols in the first t columns of the matrix C. During this process, juxtaposition of C′ is applied to complete the N rows of C; specifically, C′ is juxtaposed $\lfloor N/v^t \rfloor$ times, and the remaining $N - \lfloor N/v^t \rfloor v^t$ rows of C are filled with the first rows of C′. In the second step, the first t columns of C are copied into the next subset of t columns whose symbols have not yet been defined, and the values are changed in some pairs of rows; these changes are executed by randomly choosing $\lceil N/2 \rceil$ pairs of rows and, for each pair, exchanging the values of those columns in each row. This step is repeated until all k columns of C have been defined. If the number of columns in the last subset (t′) is smaller than t, then only the first t′ columns of C′ are used.
Fig 1. Initialization functions. (a) I1 results in 20 missing combinations. (b) I2 results in 18 missing combinations. (c) I3 results in 15 missing combinations. (d) I4 results in 7 missing combinations.
This function is a generalization of the last initialization function presented in [40]. An example of this initialization method is shown in Fig 1 (t-groups).
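A minimal sketch of the first step of I4 (written for this presentation, not the authors' code): the rows of C′ = CA(v^t; t, t, v) are juxtaposed to fill the first t columns, wrapping around when N is not a multiple of v^t; the second step (copying the t columns into the remaining groups and exchanging values in ⌈N/2⌉ random row pairs) is omitted for brevity.

#include <stdio.h>

int main(void) {
    int N = 10, v = 3, t = 2;      /* t is fixed at 2 in this sketch */
    int vt = 1;
    for (int e = 0; e < t; e++) vt *= v;   /* v^t rows in C' */
    int C[10][2];

    for (int r = 0; r < N; r++) {
        int tuple = r % vt;        /* wraps around after v^t rows */
        C[r][0] = tuple / v;       /* first symbol of the t-tuple */
        C[r][1] = tuple % v;       /* second symbol of the t-tuple */
    }
    for (int r = 0; r < N; r++)
        printf("%d %d\n", C[r][0], C[r][1]);
    return 0;
}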
Algorithms for solving the OSCAR problem
This paper has formally defined the OSCAR problem and has proven that it is NPcomplete.
Now, various strategies are proposed for solving this problem. This section is devoted to this
purpose; throughout the remainder of the section, each proposed approach is described in
detail.
Given that a solution to a specific instance of the OSCAR problem is defined by two sets, JR and JC, and considering that each of these two sets can be selected using one of three different approaches (exact (E), greedy (G), or metaheuristic (M)), it is possible to define 9 basic algorithms, as shown in Table 7. The superscripts for the EE and GG options indicate the number of variants that have been defined. For the EE approach, the two corresponding algorithms are denoted by EECR (first the number of columns is reduced, then the number of rows) and EERC (first the number of rows is reduced, then the number of columns). Three variants have been defined for the GG approach; these variants are denoted by GGCR (first the number of columns is reduced, then the number of rows), GGRC (first the number of rows is reduced, then the number of columns), and GGR/C (the numbers of rows and columns are reduced in an alternating fashion). Thus, we ultimately present a total of 12 possible algorithms for solving the OSCAR problem.
We first describe the reduction of the numbers of rows and columns of the initial matrix A
using the greedy approach. Afterward, the three greedy algorithms GGRC, GGCR, and GGR/C are defined. Next, we introduce the exact algorithms EERC and EECR for solving the OSCAR problem by exploring the entire search space, which has a size of $\binom{N}{N-\delta}\binom{k}{k-\Delta}$; these algorithms are based on a B&B approach [43]. Next, the metaheuristic algorithm MM for solving the OSCAR problem is presented; this algorithm is based on the SA approach. Finally, the six hybrid algorithms GE, EG, GM, MG, ME, and EM for solving the OSCAR problem are defined.

Table 7. The nine basic algorithms, classified by the approach (E, G, or M) used to select the rows (first letter) and the columns (second letter): EE², EG, EM; GE, GG³, GM; ME, MG, MM.

Fig 3. OSCAR instance. Problem instance specified by the matrix A = CA(6; 2, 5, 2) and the values Δ = 1 and δ = 2. Related information: t-wise combinations, t-tuples, and the P matrix.
Greedy algorithms GGCR, GGRC, and GGR/C for solving the OSCAR problem
The proposed greedy algorithms are based on two functions (FR and FC) that reduce the number of rows or columns one element (row or column) at a time, starting from the array A, while considering the number of missing t-wise combinations after the reduction process. Examples of all of the greedy strategies proposed in this section are based on the OSCAR instance shown in Fig 3, which presents the problem instance specified by the matrix A = CA(6; 2, 5, 2) and the values Δ = 1 and δ = 2. The figure shows each combination of columns, or each t-tuple, that is derived from A and all of the possible t-wise combinations of symbols that could be found in each of them; it also shows the sets $A_R$ and $A_C$ of rows and columns, respectively. In addition, it shows the auxiliary structure P, which is used to store the number of times that each t-wise combination is covered in each t-tuple; this structure P is a matrix of $v^t = 2^2 = 4$ rows and $\binom{k}{t} = \binom{5}{2} = 10$ columns, in which each cell $p_{i,j}$ contains the number of times that the ith t-wise combination of symbols appears in A in the subset of columns defined by the jth t-tuple.
Greedy approach for reducing the number of rows. The greedy function that reduces the number of rows is denoted by FR, and it is defined below. Let $A_R = \{r_1, r_2, \ldots, r_N\}$ be the set of rows of A, and let $O = \{o_1, o_2, \ldots, o_N\}$ be a vector in which each element is associated with a row $r_i$ and has a value equal to the number of t-wise combinations that are exclusively covered by the associated row. The function FR selects the row $r_i$ such that $o_i = \min_j\{o_j\}$ to be discarded; ties are broken randomly.

The function FR uses the vector O that describes the initial array A to choose a row $r_i$ to be discarded such that the value $o_i$ is minimized. Discarding that row from A results in an array A′ such that $\tau_t(A') = \tau_t(A) + o_i$, since once row $r_i$ is discarded, the t-wise combinations that were covered exclusively by row $r_i$ are no longer covered in A′. Therefore, the resulting array A′ without row $r_i$ will be missing the minimum possible number of t-wise combinations because $o_i$ has the minimum value among the elements of O. When $o_i = 0$, row $r_i$ is clearly superfluous, since it does not cover any t-wise combinations exclusively.

Every time that a row $r_i$ is discarded in the reduction process, the number of rows that cover each of the t-wise combinations covered by the discarded row $r_i$ must be decreased by one. Whenever a t-wise combination is then covered by only a single remaining row j, the value of $o_j$ must be increased by one. We update the vector O in this way.
The time required to initially populate O for the function FR is $O\left(\left(N + v^t + v^t\frac{N}{2}\right)\binom{k}{t}\right)$ (in all algorithms with a greedy component, this process is called getN()), since it is necessary to explore all rows per set of t columns, to determine the number of times that each t-tuple is covered, and to confirm the t-wise combinations that are covered by only one row. The time required to discard a row and update O is $O\left(N + (N-1)\binom{k}{t}\right)$, since it is necessary to first explore the vector O and then, for each set of t columns, verify the number of times that each t-tuple is covered in at most N − 1 rows.
Tables 8, 9 and 10 illustrate the application of getN(A) and FR(O) to the matrix A defined in Fig 3. The vector O shown in Table 8 is the result of the call to getN(A). Each element of this vector has a value equal to the number of unique t-wise combinations covered by the corresponding row; e.g., the value $o_1 = 2$ implies that row $r_1$ contains two t-wise combinations that are exclusively covered by this row (these are the symbol combinations 00 and 10 corresponding to the t-tuples (c2, c4) and (c3, c5), respectively). Now, a call to FR(O) will result in an arbitrary selection from among the rows {r1, r2, r5, r6}; let us assume that r2 is chosen. The elimination of this row will produce the new vector O shown in Table 10. To illustrate the update operation of FR, Table 9 shows how the auxiliary structure P is modified in accordance with the t-wise combinations that are eliminated with the deletion of row r2; note that there are 8 new t-wise combinations that are now uniquely covered in the remaining rows. In the new
vector O, the value of the element corresponding to each of these rows is incremented by the number of t-wise combinations in that row for which the corresponding value in P has been changed to 1 after the elimination of row r2. For example, the t-wise combinations 01, 00 and 10 associated with the t-tuples (c1, c2), (c1, c3), and (c2, c3) are newly exclusively covered by row r5 after the removal of row r2; consequently, $o_5$ is increased from 2 to 5 in the new vector O.
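A minimal sketch (not the authors' code) of the selection step of FR: given the vector O of exclusive-coverage counts, pick a row with the minimum $o_i$, breaking ties at random among all minima. The tied minima {r1, r2, r5, r6} and the value o4 = 7 come from the running example; o3 is an illustrative value, since Table 8 is not reproduced here.

#include <stdio.h>
#include <stdlib.h>

static int FR(const int *O, int N) {
    int min = O[0], ties = 0, pick = -1;
    for (int i = 1; i < N; i++)
        if (O[i] < min) min = O[i];
    /* reservoir-style random tie breaking among rows with o_i == min */
    for (int i = 0; i < N; i++)
        if (O[i] == min && rand() % (++ties) == 0)
            pick = i;
    return pick;   /* index of the row to discard */
}

int main(void) {
    /* o1 = o2 = o5 = o6 = 2 and o4 = 7 per the text; o3 = 3 is illustrative */
    int O[6] = {2, 2, 3, 7, 2, 2};
    srand(7);
    printf("discard row r%d\n", FR(O, 6) + 1);  /* one of r1, r2, r5, r6 */
    return 0;
}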
Greedy approach for reducing the number of columns. The function that reduces the number of columns using the greedy approach is denoted by FC and is defined below. Let $A_C$ be the set of columns of A; let K, with dimensions of k × k, be an array in which each element $k_{i,j}$ stores the number of times that columns i and j together are involved in a missing t-wise combination; and let $U = \{u_1, u_2, \ldots, u_k\}$ be a vector in which $u_i = \sum_{j=1}^{k} k_{i,j}$. The function FC selects the column $c_i$ such that $u_i = \max_j\{u_j\}$ to be discarded; ties are broken randomly. Whenever a column i is discarded, the vector U is updated by subtracting the value $k_{i,j}$ from $u_j$ for all $j \ne i$. Each element in U is associated with a column, and its value is equal to the number of times that column is involved in a missing t-wise combination.
In summary, the function FC chooses the column i associated with the maximum value $u_i$ in the vector U. When discarding column i, we obtain an array A′ from which the missing combinations involving column i have been deleted (for t = 2, $\tau_t(A') = \tau_t(A) - u_i$, since each missing pair involving column i is counted exactly once in $u_i$). Therefore, the resulting array A′ will have the minimum number of missing t-wise combinations, since $u_i$ has the greatest value among the elements of U.
When we discard a column, the values of the elements of U must be updated. To do so, a value of −1 is assigned to $u_i$, and the value of each element $u_j$ such that $j \ne i$ is updated based on its interaction with the recently discarded column; i.e., $u_j = u_j - k_{i,j}$. This process can be illustrated intuitively as follows. Suppose that we have a set of n criminals who are accused of having committed m crimes together, and suppose that the authorities have found that a certain criminal s is the only one who committed l of these crimes, where $l \le m$; then, the number of crimes of which each of the remaining criminals is accused must be decreased in accordance with his initially suspected degree of participation in committing crimes with criminal s.
The time required to initially populate U and K for the function FC is $O\left(\left(N + v^t + \binom{t}{2}\right)\binom{k}{t}\right)$ (in all algorithms with a greedy component, this process is called getK()), since for each set of t columns, all N rows must be explored, the vector U must then be updated based on the missing t-wise combinations in these columns, and K must be updated for all possible pairs in this set of t columns. The time required to discard a column and update U and K accordingly is O(2k), since the vector U must be explored to obtain the column i with the greatest value, and column i of K must then be explored to update U.
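A minimal sketch (not the authors' code) of one FC step over illustrative K and U values; a non-CA input is assumed here, since for the CA of the running example these structures are all zeros.

#include <stdio.h>

#define K_COLS 5

static int FC(int U[], int Km[][K_COLS], int k) {
    int best = 0;
    for (int j = 1; j < k; j++)
        if (U[j] > U[best]) best = j;      /* column involved in most missing combos */
    for (int j = 0; j < k; j++)
        if (j != best && U[j] >= 0)
            U[j] -= Km[best][j];           /* u_j = u_j - k_{best,j} */
    U[best] = -1;                          /* mark column `best` as discarded */
    return best;
}

int main(void) {
    /* illustrative symmetric pairwise-involvement counts */
    int Km[K_COLS][K_COLS] = {
        {0,1,0,2,0}, {1,0,1,0,0}, {0,1,0,1,1}, {2,0,1,0,1}, {0,0,1,1,0}
    };
    int U[K_COLS];
    for (int i = 0; i < K_COLS; i++) {     /* u_i = sum_j k_{i,j} */
        U[i] = 0;
        for (int j = 0; j < K_COLS; j++) U[i] += Km[i][j];
    }
    printf("discard column c%d\n", FC(U, Km, K_COLS) + 1);
    return 0;
}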
Tables 11 and 12 illustrate the application of getK(A) and FC(K, k, U) to the matrix A defined in Fig 3. The matrix K and the vector U shown in Table 11 are the results of the call to getK(A); given that the initial matrix A is a CA, all values in K and U are zero because there are no missing t-wise combinations.
Fig 4. Example of GGCR. (a) Discarding columns. (b) Discarding rows.

For each iteration of the row-elimination loop of GGCR, Fig 4(b) presents the vector O derived from the previous matrix A′, the set JR of rows chosen so far, the updated vector Onew obtained after the elimination of the row r selected in that iteration, and the resulting matrix A′ after that iteration. This second loop is repeated twice because δ = 2. The last matrix A′ obtained in the second part of GGCR is returned as the final matrix B.
Greedy algorithm GGRC. The greedy algorithm GGRC first removes δ rows from A using the function FR to obtain an array A′ with N − δ rows and k columns. Then, A′ is reduced to a matrix with k − Δ columns using the function FC to obtain the final solution B. Algorithm 2 describes the GGRC approach.
Algorithm 2
1: function GGRC(A, δ, Δ)
2:   JR ← ∅
3:   JC ← ∅
4:   getN(A)
5:   for i ← 0 to i < δ do
6:     r ← FR(O)
7:     JR ← JR ∪ {r}
8:   end for
9:   A′ ← A without the rows in JR
10:  getK(A′)
11:  for i ← 0 to i < Δ do
12:    c ← FC(K, k, U)
13:    JC ← JC ∪ {c}
14:  end for
15:  B ← A′ without the columns in JC
16:  return B
17: end function
The time required to execute GGRC can be calculated from the times required for populating and updating the necessary structures. The result is $O\left(\left(N + v^t + v^t\frac{N}{2}\right)\binom{k}{t} + \delta\left(N + (N-1)\binom{k}{t}\right) + \left(N + v^t + \binom{t}{2}\right)\binom{k}{t} + \Delta \cdot 2k\right)$, since δ rows are first discarded from the input array A, generating an array A′ with N − δ rows and k columns, and this array is then reduced to one with k − Δ columns to obtain the solution B.
Fig 5 shows an example of the application of GGRC to the problem instance presented in Fig 3. The figure illustrates how the initial matrix A evolves into the final matrix B. First, Fig 5(a) presents the changes made to A due to the elimination of rows (see the loop in lines 5 to 8); for each iteration i of this loop, the figure presents the initial vector O, the set JR of rows chosen so far, the vector Onew obtained by updating O after the elimination of the row r selected in that iteration, and the resulting matrix A′ after that iteration. This part of the algorithm is repeated twice because δ = 2. Subsequently, Fig 5(b) presents the changes made to the last matrix A′ obtained in the previous process due to the elimination of columns (see the loop in lines 11 to 14); for each iteration i of this loop, the figure presents the vector U derived from the last matrix A′, the set JC of columns chosen so far, the updated vector Unew after the elimination of the column c selected in that iteration, and the resulting matrix A′ after that iteration. This second loop is executed only once because Δ = 1. The last matrix A′ obtained in the second part of GGRC is returned as the final matrix B.
Greedy algorithm GGR/C. The greedy algorithm GGR/C distributes the elimination of Δ columns and δ rows in a round-robin fashion. This algorithm alternately discards first a single row and then some number of columns until the number of rows has been reduced to N − δ. This algorithm uses a vector D with δ elements, where each element $d_i$, corresponding to the ith discarded row, indicates the number of columns that should be discarded immediately after discarding that row. When Δ > δ, the first δ − 1 elements of D are each filled with a value of $\lfloor \Delta/\delta \rfloor$, and the last one is filled with a value of $\lceil \Delta/\delta \rceil$. When $\delta \ge \Delta$, the first Δ elements of D are each filled with a value of one.

Fig 5. Example of GGRC. (a) Discarding rows. (b) Discarding columns.

Fig 6. Example of GGR/C. Column 1 shows the main structures used throughout the algorithm, and each of the remaining columns represents a different iteration of the main algorithm.

Once the Δ columns have been distributed among the δ rows, one row is discarded, and then the number of columns is reduced by $d_i$. Hence, with the exploration of each element i of the vector D, the numbers of rows and columns of the array A′ will be decreased to $N_i = N - i$ and $k_i = k - \sum_{j=1}^{i-1} d_j$, respectively. Algorithm 3 describes the GGR/C approach.
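A minimal sketch (not the authors' code) of how the vector D can be filled; here the last entry receives the remaining number of columns so that the entries always sum to Δ, which coincides with the ⌈Δ/δ⌉ rule stated above whenever that rule is consistent.

#include <stdio.h>

static void fill_D(int delta, int Delta, int *D) {
    for (int i = 0; i < delta; i++) D[i] = 0;
    if (Delta > delta) {
        for (int i = 0; i < delta - 1; i++)
            D[i] = Delta / delta;              /* floor(Delta/delta) */
        D[delta - 1] = Delta - (delta - 1) * (Delta / delta);  /* remainder */
    } else {
        for (int i = 0; i < Delta; i++)
            D[i] = 1;   /* one column after each of the first Delta row deletions */
    }
}

int main(void) {
    int D[2];
    fill_D(2, 1, D);   /* the running example: delta = 2, Delta = 1 */
    printf("D = {%d, %d}\n", D[0], D[1]);  /* prints D = {1, 0} */
    return 0;
}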
The time required to execute GGR/C can be obtained by considering how the numbers of columns and rows of A′ are reduced while exploring the vector D; the result is $O\left(\sum_{i=1}^{\delta}\left[\left(N_i + v^t + v^t\frac{N_i}{2}\right)\binom{k_i}{t} + N_i + (N_i-1)\binom{k_i}{t} + \left(N_i + v^t + \binom{t}{2}\right)\binom{k_i}{t} + d_i \cdot 2k_i\right]\right)$. As each element i of D is explored, first, the necessary structures are populated to discard rows from the array A′, which has N − i + 1 rows and $k - \sum_{j=1}^{i-1} d_j$ columns, and the number of rows is reduced by one. Next, it is necessary to populate the structures needed to discard columns from the new array A′ with the reduced number of rows, and then, the number of columns is reduced by $d_i$.
Fig 6 shows an example of the application of GGR/C to the problem instance presented in Fig 3. Because Δ is not greater than δ in this instance, the number of columns that must be eliminated with the elimination of each row is given by the vector $D = \{d_1 = 1, d_2 = 0\}$; i.e., after the deletion of the first row, one column must be deleted, and then the algorithm proceeds to the deletion of the second row to satisfy the value δ = 2. The example shown in Fig 6 illustrates the reduction process for the instance given in Fig 3. The first column lists the main structures that are changed during the execution of the algorithm. Each of the remaining columns in Fig 6 represents a different iteration of the main loop of the algorithm.
Algorithm 3
1: function GGR/C(A, δ, Δ)
2:   JR ← ∅
3:   JC ← ∅
4:   if Δ > 0 then
5:     if Δ > δ then
6:       for i ← 0 to i < δ − 1 do
7:         d_i ← ⌊Δ/δ⌋
8:       end for
9:       d_{δ−1} ← ⌈Δ/δ⌉
10:    else
11:      for i ← 0 to i < Δ do
12:        d_i ← 1
13:      end for
14:    end if
15:  end if
16:  A′ ← A
17:  for i ← 0 to i < δ do
18:    getN(A′); r ← FR(O); JR ← JR ∪ {r}; remove row r from A′
19:    getK(A′)
20:    for j ← 0 to j < d_i do
21:      c ← FC(K, k, U); JC ← JC ∪ {c}; remove column c from A′
22:    end for
23:  end for
24:  B ← A′
25:  return B
26: end function
Metaheuristic algorithm MM
The approximate algorithm MM for searching for a solution to the OSCAR problem is based on the SA approach and is described in Algorithm 4. This approach is a general-purpose stochastic optimization strategy that has been proven to be an efficient means of approximating globally optimal solutions to many NP-complete combinatorial optimization problems. In this strategy, a solution $W_u$ is first constructed using the Initialize(...) method, and this solution is designated as the first global best solution W′; then, the algorithm enters an iterative improvement process, controlled by the length of the Markov chain, until a certain termination criterion is achieved. In each iteration of this improvement process, a new solution $W_v$ is generated using the GenerateNeighbor(...) method, and this new solution is substituted for $W_u$ whenever its quality is superior to that of the current solution or the probability condition is satisfied. The probability condition is based on the Boltzmann distribution, and it is defined with respect to the values of an initial temperature $T_i$, a final temperature $T_f$, and a quality function τ(...) of $W_u$ and $W_v$. The global best W′ is also updated every time a solution $W_v$ improves upon it. The details of these procedures are presented in the remainder of this subsection.
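A minimal sketch (not the authors' implementation) of the Boltzmann acceptance rule described above: improvements are always kept, while a worsening neighbor is accepted with probability $e^{-(\tau(W_v)-\tau(W_u))/T_i}$. The temperatures, cooling factor α, and chain length L below are illustrative values, not the tuned parameters of the paper.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 if the move from cost tau_u to cost tau_v is accepted at temperature T. */
static int accept(double tau_u, double tau_v, double T) {
    if (tau_v < tau_u) return 1;                  /* always accept improvements */
    double p = exp(-(tau_v - tau_u) / T);         /* Boltzmann probability */
    return ((double)rand() / RAND_MAX) < p;
}

int main(void) {
    srand(7);
    double Ti = 4.0, Tf = 0.01, alpha = 0.9;
    int accepted = 0, trials = 0;
    for (double T = Ti; T > Tf; T *= alpha)       /* geometric cooling */
        for (int j = 0; j < 100; j++, trials++)   /* Markov chain of length L = 100 */
            accepted += accept(10.0, 12.0, T);    /* a move that worsens tau by 2 */
    printf("accepted %d of %d worsening moves\n", accepted, trials);
    return 0;
}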
Hybrid algorithm EG. The algorithm EG combines the exact and greedy approaches to solve the OSCAR problem, using a strategy based on the exploration of all possible combinations of δ rows that can be eliminated from the original matrix. The algorithm proceeds in two phases. First, it chooses a set of δ rows and removes them from the initial matrix A; the resulting matrix is denoted by A′ and has dimensions of (N − δ) × k. Subsequently, the algorithm greedily discards Δ columns from A′ to construct a possible solution B′, i.e., a matrix with dimensions of (N − δ) × (k − Δ), for the OSCAR instance at hand. To obtain the best solution B, the algorithm explores all possible combinations of δ rows and identifies the best matrix B from among all matrices B′ constructed during the process described above.
The algorithm EG for solving the OSCAR problem is described in Algorithm 8. Each combination of N − δ rows is represented by the vector JR. Each new combination of N − δ rows is computed by the function GreaterThanPolynomial(). An array A′, with dimensions of (N − δ) × k, is constructed using the rows indicated by JR. Then, the algorithm populates the necessary structures to greedily reduce A′ to a matrix with k − Δ columns, and this reduction process yields an array B′. The best solution that has been found so far during the exploration process is represented by B. Whenever $\tau_t(B') < \tau_t(B)$ for a newly constructed matrix B′, the matrix B is replaced with B′.
Algorithm 8
1: function EG(A, δ, Δ)
2:   for i ← 0 to i < $\binom{N}{N-\delta}$ do
3:     JR ← GreaterThanPolynomial(JR)
4:     A′ ← A restricted to the rows in JR
5:     getK(A′)
6:     JC ← ∅
7:     for j ← 0 to j < Δ do
8:       c ← FC(K, k, U)
9:       JC ← JC ∪ {c}
10:    end for
11:    B′ ← A′ without the columns in JC
12:    if $\tau_t(B') < \tau_t(B)$ then
13:      B ← B′
14:      $\tau_t(B)$ ← $\tau_t(B')$
15:    end if
16:  end for
17:  return B
18: end function
The time required to execute EG can be derived from the times required for populating the necessary structures, reducing the number of columns, and updating the necessary values. This time is proportional to $O\left(\binom{N}{N-\delta}\left[\left(N + v^t + \binom{t}{2}\right)\binom{k}{t} + \Delta \cdot 2k\right]\right)$.
Hybrid algorithm GM. The algorithm GM uses a hybrid strategy that combines the SA metaheuristic [45] with the greedy approach to construct a solution to the OSCAR problem. In each iteration of GM, a local search is performed over the possible sets of columns that can be eliminated to obtain a matrix A′ with dimensions of N × (k − Δ). Afterward, the matrix A′ is subjected to a greedy process to reduce its size by δ rows and thus to construct a solution B with dimensions of (N − δ) × (k − Δ) for the OSCAR instance at hand. Once the matrix B has been built, the Boltzmann criterion is used as usual in SA. The details of the strategy are presented in the remainder of this subsection.

Algorithm 9 describes the proposed GM approach for solving the OSCAR problem. The algorithm GM uses a vector $W_u$ of size k to represent the state of each column in the solution B. The elements of the vector take values of $w_i \in \{0, 1\}$ for $1 \le i \le k$, where a value of 0 indicates that the corresponding column is not present in the solution and a value of 1 indicates otherwise. In addition, constraints are imposed to obtain a proper OSCAR solution; in particular, exactly k − Δ elements of $W_u$ must be equal to 1, so that exactly Δ columns are discarded.
The closing lines of Algorithm 9 implement the Boltzmann acceptance and the cooling schedule:
20: else if RANDOM(0 . . . 1) < $e^{-\frac{\tau(W_v) - \tau(W_u)}{T_i}}$ then
21:   $W_u$ ← $W_v$
22: end if
23: end for
24: $T_i$ ← α · $T_i$
25: end while
26: B ← constructMatrix(W′)
27: return B
28: end function
The algorithm GM uses a set of perturbations to the vector $W_u$ as its neighborhood function. For this purpose, it chooses two elements $w_i$ and $w_j$, where $w_i \ne w_j$, and interchanges their values. The new solution formed via this perturbation, which is a neighbor of $W_u$, is denoted by $W_v$; a sketch of this move follows.
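A minimal sketch (not the authors' code) of the perturbation; note that swapping two unequal entries preserves the number of ones in W, so the number of selected columns stays at k − Δ.

#include <stdio.h>
#include <stdlib.h>

static void neighbor(int *W, int k) {
    int i, j;
    do {                          /* assumes W contains both a 0 and a 1 */
        i = rand() % k;
        j = rand() % k;
    } while (W[i] == W[j]);       /* require w_i != w_j */
    int tmp = W[i]; W[i] = W[j]; W[j] = tmp;
}

int main(void) {
    int W[5] = {1, 1, 0, 1, 1};   /* k = 5, Delta = 1: one column excluded */
    srand(7);
    neighbor(W, 5);
    for (int i = 0; i < 5; i++) printf("%d ", W[i]);
    printf("\n");                 /* still exactly one 0, in a new position */
    return 0;
}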
Finally, the evaluation function used in GM is τ, the number of missing combinations in a created matrix. This function is also used to evaluate the matrices created during the local search.

The time required to execute GM is proportional to O(iLFR), where $i = \frac{\log T_f - \log T_i}{\log \alpha}$ is the number of temperature decrements necessary to reach $T_f$, L is the length of the Markov chain, and FR is the time cost of the greedy approach for eliminating rows.
Hybrid algorithm MG. The algorithm MG uses another hybrid strategy that combines the SA metaheuristic [45] with the greedy approach to construct a solution to the OSCAR problem. In each iteration of MG, a local search is performed over the possible sets of rows that can be eliminated to obtain a matrix A′ with dimensions of (N − δ) × k. Afterward, the matrix A′ is subjected to a greedy process to reduce its size by Δ columns and thus to construct a solution B with dimensions of (N − δ) × (k − Δ) for the OSCAR instance at hand. Once the matrix B has been built, the Boltzmann criterion is used as usual in SA. The details of the strategy are presented in the remainder of this subsection.

Algorithm 10 describes the proposed MG approach for solving the OSCAR problem. The algorithm MG uses a vector $W_u$ of size N to represent the state of each row in the solution B.
The closing lines of Algorithm 10 mirror those of Algorithm 9:
20: else if RANDOM(0 . . . 1) < $e^{-\frac{\tau(W_v) - \tau(W_u)}{T_i}}$ then
21:   $W_u$ ← $W_v$
22: end if
23: end for
24: $T_i$ ← α · $T_i$
25: end while
26: B ← constructMatrix(W′)
27: return B
28: end function
The algorithm MG uses a set of perturbations to the vector $W_u$ as its neighborhood function. For this purpose, it chooses two elements $w_i$ and $w_j$, where $w_i \ne w_j$, and interchanges their values. The new solution formed via this perturbation, which is a neighbor of $W_u$, is denoted by $W_v$.

Finally, the evaluation function used in MG is τ, the number of missing combinations in a created matrix. This function is also used to evaluate the matrices created during the local search.
The time required to execute MG is O(iLFC), where $i = \frac{\log T_f - \log T_i}{\log \alpha}$ is the number of temperature decrements necessary to reach $T_f$ and FC is the time cost of the greedy approach for eliminating columns.
Hybrid algorithm ME. The algorithm ME combines the metaheuristic and exact approaches to solve the OSCAR problem, using a strategy based on the exploration of all possible combinations of Δ columns that can be eliminated from the original matrix. The algorithm proceeds in two phases. First, it chooses a set of Δ columns and removes them from the initial matrix A; the resulting matrix is denoted by A′ and has dimensions of N × (k − Δ). Then, the algorithm uses the SA approach to discard δ rows from A′ to construct a possible solution B′, i.e., a matrix with dimensions of (N − δ) × (k − Δ), for the OSCAR instance at hand. To obtain the best solution B, the algorithm explores all possible combinations of Δ columns and identifies the best matrix B from among all matrices B′ constructed during the process described above.
The time required to execute ME is $O\left(\binom{k}{k-\Delta} \cdot iL\right)$, where $i = \frac{\log T_f - \log T_i}{\log \alpha}$ is the number of temperature decrements necessary to reach $T_f$.
Hybrid algorithm EM. The algorithm EM also combines the metaheuristic and exact
approaches to solve the OSCAR problem; compared with ME, the difference is that it explores
all possible combinations of δ rows that can be eliminated from the original matrix. The
algorithm proceeds in two phases. First, it chooses a set of δ rows and removes them from the
initial matrix A; the resulting matrix is denoted by A0 and has dimensions of (N − δ) × k. Then,
the algorithm uses the SA approach to discard Δ columns from A0 to construct a possible
solution B0, i.e., a matrix with dimensions of (N − δ) × (k − Δ), for the OSCAR instance at hand. To
obtain the best solution B, the algorithm explores all possible combinations of δ rows and
identifies the best matrix B from among all matrices B0 constructed during the process
described above.
Algorithm 12 describes our EM approach. For each possible combination of rows
JR ⊆ AR, the algorithm performs a metaheuristic search to define the set of columns to delete.
(closing lines of Algorithm 12)
17:   else if RANDOM(0 . . . 1) < e^(−(τ(Wv) − τ(Wu))/Ti) then
18:     Wu ← Wv
19:   end if
20:   end for
21:   Ti ← α · Ti
22: end while
23–24: if the τ value of the candidate solution W′ improves on the best found so far then
25:   B ← CONSTRUCTMATRIX(JR, W′)
26: end if
27: end for
28: return B
29: end function
The time required to execute EM is O(C(N, N − δ) · iL), where C(N, N − δ) is the binomial
coefficient counting the ways to choose the N − δ surviving rows and i = (log Tf − log Ti)/log α
is the number of temperature decrements necessary to reach Tf.
In the next section, we demonstrate the performance of our 12 algorithms.
Experimentation
This section presents the experimental design used to test the performance of the proposed
algorithms for solving the OSCAR problem. The methodology consisted of the following steps:
1) A set of benchmark instances was defined. 2) The parameters of the SA algorithm were
subjected to a fine-tuning process. 3) The performances of the algorithms were evaluated by using
them to solve the benchmark problem instances. 4) A performance comparison against
state-of-the-art initialization functions was conducted. 5) The results derived from the algorithms
were used to define new upper bounds for existing CAs.
The proposed algorithms were implemented in the C language and compiled using gcc
with the optimization option -O3. We used a computer with 72 Intel Xeon 1.6 GHz CPU
cores and 64 GB of RAM. The remainder of this section describes the experimental
methodology in detail.
Definition of the benchmarks
This subsection introduces the three benchmarks used to properly test the proposed set of
OSCAR algorithms. The benchmark L1 (S1 dataset) consists of 12 small CAs, which are
described in Table 13, and it is used to analyze the performance of all algorithms presented in
this document; then, the algorithms that achieve the best experimental results on this
benchmark in terms of both time and solution quality are further tested on the following benchmark.
The benchmark L2 (S2 dataset), presented in Table 14, consists of 62 CAs; it is an extension of
the benchmark presented in [24] such that the adjusted values of δ and Δ provide support for
the discovery of a greater number of new upper bounds for the related CAs. This benchmark
aids in the identification of the OSCAR solver with the best overall experimental performance,
and it is also used to compare the results of the proposed OSCAR solvers against other
state-of-the-art initialization functions. Finally, the benchmark L3 (S3 dataset) consists of 820
instances (see Table 15); this benchmark is used to evaluate the quasi-CA construction
performance of IPOG-F, a classical and versatile (in the sense that it can rapidly construct any type
of CA) greedy algorithm that is widely used in the literature, against the best OSCAR strategies
identified in the experiments on the previous benchmarks, in terms of both the time required
for matrix construction and the quality of the constructed matrices. Table 15 presents the
instances included in benchmark L3, organized into 20 sets. In each set, one OSCAR instance
is defined per value of k considered (from 10 to 50), as shown in column 1; the remaining
columns show the values for v, t, δ, and Δ, which correspond to the alphabet size, the strength,
and the numbers of rows and columns to be eliminated, respectively. We note that the
benchmark L3 is also characterized by its wide variety of values of the strength t and the alphabet
size v.
Fine-tuning of the parameters of MM
The MM approach is the basis for several of our other approaches. Because this approach
uses the SA algorithm, a fine-tuning process is necessary to adjust the values of its parameters
to improve its performance. During the tuning process performed in this study, the Markov
chain length L, the final temperature Tf, and the initialization function G were fixed; all
remaining parameters (i.e., the initial temperature Ti, the decrement factor α, and the
maximum number of evaluations E) were subjected to adjustment. Because different neighborhood
[Table 14. Benchmark L2: 62 OSCAR instances, each defined by an input CA, ranging from CA(53; 2, 52, 5) to CA(622; 2, 26, 24) plus CA(136; 5, 68, 2), together with its values of δ and Δ.]
functions are used in our approach, each with a certain probability of being applied, a fourth
parameter was also considered during the tuning process: the application probability of each
neighbor function, denoted by P. The goal of this fine-tuning process was to test the
performance of MM using different configurations of the parameter values to identify the
configuration that yielded the best performance.
The sets of values considered for the parameters Ti, α, and E were {1, 4}, {0.90, 0.99}, and
{100L, 500L}, respectively. In the fine-tuning approach presented in [46, 47], a CA is used as a
means of systematically sampling the entire set of parameter value combinations; the method
starts at an initial level of interaction t, which is used to construct a CA(N; t, k, v), and t is then
increased until the generated sample is suitable for the purposes of the experiment. The
present study required the smallest possible sample in order to reduce the experimental time; this
sample was constructed using an interaction level of t = 2. A summary of the final
combinations of values tested, derived from the constructed CA(4; 2, 3, 2), is shown in Table 16.
Meanwhile, the vector of probabilities P used for the initialization functions Si was defined
based on solutions to the Diophantine equation a1 + a2 + a3 = 10, following the approach
presented in [44]. During this process, each of the 66 solutions to the Diophantine equation
was used to generate a possible vector P, in which the probability value for each initialization
function i was estimated as ai/10.
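The generation of these candidate vectors can be sketched in a few lines of C (an illustration of ours; variable names are assumptions): the number of non-negative integer solutions of a1 + a2 + a3 = 10 is C(12, 2) = 66, and each solution yields P = (a1/10, a2/10, a3/10).

    #include <stdio.h>

    int main(void)
    {
        int count = 0;
        /* Enumerate all non-negative integer solutions of a1 + a2 + a3 = 10;
         * each solution induces a probability vector P = (a1, a2, a3)/10. */
        for (int a1 = 0; a1 <= 10; a1++) {
            for (int a2 = 0; a1 + a2 <= 10; a2++) {
                int a3 = 10 - a1 - a2;
                count++;
                printf("P%-2d = (%.1f, %.1f, %.1f)\n",
                       count, a1 / 10.0, a2 / 10.0, a3 / 10.0);
            }
        }
        printf("%d probability vectors\n", count);   /* C(12, 2) = 66 */
        return 0;
    }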
Table 16. Parameter value combinations tested during the fine-tuning of MM, derived from CA(4; 2, 3, 2).

    Ti   α     E
    1    0.90  100L
    4    0.99  100L
    4    0.90  500L
    1    0.99  500L

Because we considered 4 different configurations of the values of the parameters Ti, α,
and E and 66 configurations of the probability vector P, the experiment to fine-tune MM
involved 264 different parameter value configurations. Each configuration was used to solve
two instances of the OSCAR problem, specified by A = CA(31; 2, 35, 4) with δ = 5 and Δ = 5
and by A = CA(255; 2, 18, 15) with δ = 6 and Δ = 8, with a total of 31 runs per instance, where
the value of the solution reported was the best among all the runs.
The results obtained from the fine-tuning process indicated that the optimal parameter
values for MM are Ti = 4, α = 0.99, and E = 100L and that the desired solution to the
Diophantine equation is a1 = 4, a2 = 3, and a3 = 3. This configuration was also used in the algorithms
GM, MG, ME, and EM, which also use the metaheuristic MM approach.
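Putting the tuned values together, a configuration record of the kind these algorithms could share might look as follows in C; the struct and field names are our own illustration, not taken from the authors' source code.

    /* Tuned SA parameters reported above; L is the (fixed) Markov chain
     * length, and the P entries follow from a1 = 4, a2 = 3, a3 = 3. */
    struct sa_config {
        double Ti;        /* initial temperature                             */
        double alpha;     /* temperature decrement factor                    */
        long   E_per_L;   /* evaluation budget E, as a multiple of L         */
        double P[3];      /* application probabilities of the init functions */
    };

    static const struct sa_config MM_TUNED = {
        .Ti = 4.0, .alpha = 0.99, .E_per_L = 100,
        .P = { 0.4, 0.3, 0.3 }
    };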
Evaluation of the 12 proposed algorithms
This section presents the evaluation of the 12 proposed algorithms for solving the OSCAR
problem. All algorithms were tested on the smaller set of 12 instances, L1, to identify the three
best algorithms. Then, the larger set L2 was solved using only those three algorithms to further
evaluate the general performance of these approaches.
The algorithms were first tested using the benchmark consisting of 12 OSCAR instances
derived from 10 CAs taken from the literature. The values of δ and Δ for these instances were
fixed such that the size of the resulting array B would represent a possible new upper bound.
In addition, these instances were created such that the size of the search space would permit
us to solve them using all 12 algorithms; this was a concern because exact algorithms must
explore all possible combinations of rows and columns, and therefore, if the search space is too
large, they may require an excessive amount of time.
Table 17 presents the results obtained using each algorithm based solely on the greedy
approach (i.e., the algorithms GGCR, GGRC, and GGR^C) when solving the benchmark L1. Note
that GGRC and GGR^C show better performance than GGCR; when all instances are considered,
the former algorithms result in equal or fewer missing t-wise combinations compared with the
latter. Therefore, the findings show that it is beneficial to eliminate rows before columns (as in
GGRC and GGR^C) when working with initial matrices that are already CAs. This is because when
rows are removed from the matrix, those that contribute the least to the CA are chosen for
deletion, and the t-wise combinations that are lost as a result can subsequently be compensated
for by eliminating the columns that produce them. Meanwhile, although the time performance
of GGR^C is superior to that of the others for this particular set of instances, it will worsen rapidly
with increasing values of δ and Δ. In general, the time performance of GGR^C will be the worst
among the three greedy approaches, as indicated by the theoretical complexities presented
alongside the definitions of these algorithms, mainly because of the greater number of calls to
the greedy strategies for eliminating rows and columns.
Table 18 shows the results obtained using the hybrid approaches that combine the greedy
strategy with either the exact approach or the metaheuristic approach (i.e., the algorithms GE,
EG, and GM) when solving the benchmark L1. An increase in running time is observed for
these approaches, mainly due to the use of the more elaborate strategies of the exact and
metaheuristic algorithms. However, the results achieved also improve upon some of the results
obtained by the solely greedy algorithms. All of these hybrid algorithms achieve the same
results; however, the average time increase for GM is much greater than that for the
algorithms that include exact strategies. We note that the small running times observed for the
exact approach are consistent with its expected theoretical behavior.
Table 19 shows the results for another set of hybrid approaches, all involving the
metaheuristic strategy in combination with either the exact approach or the greedy approach (i.e., the
algorithms MG, ME, and EM), when solving the benchmark L1. From these results and the
previous ones shown in Table 18 for GM, it can be seen that the algorithms GM, MG, ME,
and EM all find solutions with a number of missing t-wise combinations comparable to those
in the solutions created by the other (exact or greedy) approaches. However, it should be
noted that the initial matrices used in these algorithms were the best solutions obtained by a
greedy algorithm, and in most cases, the differences in the number of missing t-wise
combinations between these initial matrices and the results reported by the hybrid algorithms are
nearly zero. These findings indicate that the contribution of these hybrid approaches is
minimal.
Table 20 shows the results of solving the benchmark L1 using the algorithms MM, EECR,
and EERC. Note that EECR exhibits better performance than EERC because there are more
possible ways to select rows than columns. However, exhaustive search algorithms are impractical
for finding a solution to an OSCAR instance except when the values of δ and Δ are both quite
small. The exact algorithm EECR is more suitable when N − δ > k − Δ, since a greater portion of
the search space is then defined by N − δ; otherwise, EERC behaves better. For example, for
CA(255; 2, 18, 15) with δ = 6 and Δ = 8, there are C(255, 6) ≈ 3.6 × 10^11 ways to choose the
rows to delete but only C(18, 8) = 43758 ways to choose the columns. However, the execution
time of the proposed exact algorithms grows with the desired degree of reduction for a given
instance, and they can become infeasible.
Finally, some additional important observations are noted in the following. First, for every
instance, the solutions obtained by the algorithms GE and EG have the same number of
missing t-wise combinations. When N − δ > k − Δ, EG requires more time than GE; similarly,
when N − δ < k − Δ, GE requires more time than EG. These findings suggest that GE is
appropriate when N − δ > k − Δ and that EG is appropriate when N − δ < k − Δ.
Second, as δ and Δ increase for a given array A, the number of missing t-wise combinations
produced by the pure greedy algorithms increases in comparison with the results of the hybrid
algorithms that include exact strategies, i.e., EG and GE. For example, for the instances with
the input array CA(255; 2, 18, 15), we note that the solution obtained by GGR^C when δ = 3 and
Δ = 1 has 267 missing t-wise combinations, whereas the solution obtained by GE has 257
missing t-wise combinations; similarly, when δ = 9 and Δ = 11, the solution obtained by GGR^C has
65 missing t-wise combinations, whereas the solution obtained by GE has 61 missing t-wise
combinations. It can be inferred that the inclusion of an exact strategy contributes to reducing
the number of missing t-wise combinations, at the cost of an increase in the time required to
build the matrix.
Third, and most importantly, the algorithms that showed the best performance in the
experiment were GGR^C, GE and EG; all of them obtained comparable solutions, with only small
differences in both quality and time cost. The algorithms that include metaheuristic strategies
consumed considerably more time but showed little difference in the quality of their solutions,
whereas the exact approaches are too expensive for large values of δ and Δ.
To further evaluate the proposed approaches, the algorithms GGR^C, GE and EG were used
to solve the benchmark L2, which includes larger CAs. Table 21 summarizes the results
obtained when solving L2. In addition to this experiment, an instance specified by the array
A = CA(136; 5, 68, 2) with values of δ = 2 and Δ = 33 was also solved using the metaheuristic
algorithm MM; this algorithm produced a solution B with zero missing t-wise combinations,
meaning that the approach constructed a new CA of the form CA(134; 5, 35, 2). This last result
serves as evidence that an approach based on seeking a solution to the OSCAR problem can
also be used to construct CAs.
Performance comparison with state-of-the-art initialization algorithms
This subsection evaluates the performance of the proposed OSCAR approaches against the
performance of several state-of-the-art initialization functions. For this purpose, the
initialization functions described in the related work section are considered, and their results are
compared with the best solutions obtained using the approaches proposed in this work.
The performance comparison was performed as follows. The benchmark L2 was chosen as
the set of instances to be used in this evaluation. First, the OSCAR algorithms proposed in this
work were used to solve the benchmark, and the best matrix B among all of the results was
obtained for each instance. Then, the initialization functions, denoted by Ii, were used to
construct arrays Si of the same dimensions as the matrices derived by solving the OSCAR
instances; i.e., for each instance, we constructed an array S with N − δ rows and k − Δ columns.
Once all of the solutions generated by the OSCAR algorithms and the state-of-the-art
initialization functions had been obtained, they were evaluated with regard to the function τ, i.e., the
number of missing t-wise combinations in each newly constructed matrix. Table 22
summarizes the results of this experiment. Column one shows the identifier of each instance in L2,
column two shows the number of missing t-wise combinations in the best solution obtained
using the OSCAR approaches, and columns three to six present the numbers of missing t-wise
combinations in the solutions derived using the state-of-the-art initialization functions Ii.
Note that for the last problem instance, one of the proposed OSCAR approaches (the
metaheuristic algorithm MM) was able to construct a CA, as seen from the fact that the new
matrix has zero missing t-wise combinations.
Performance comparison with IPOG-F, a state-of-the-art CA construction approach
The experiment presented here involves the comparison of the GGRC and EG strategies against
the state-of-the-art IPOG-F algorithm for CA construction. The goal of this experiment was
to evaluate the performance when constructing CAs and/or quasi-CAs using IPOG-F, a fast
greedy algorithm for CA construction that is widely used in the literature and is versatile in the
sense that it can rapidly construct any type of CA. The GGRC and EG strategies are among the
best of the proposed OSCAR solvers, as indicated by the experiments on the previous
benchmarks. Both of these strategies were compared against IPOG-F in terms of the matrix
34 / 44
1070.3
59
60
61
62
GE
74
67
42
EG Instance
74 59
67 60
 61
353 62
construction time and the matrix quality (i.e., the number of missing t-wise combinations).
Table 23 summarizes the results of this comparison on L3. Column 1 lists each set of instances
in the benchmark. Columns 2 to 4 present the accumulated solution quality (i.e., the
accumulated number of missing t-wise combinations) per set for each strategy. Columns 5 to 7 report
the accumulated time per set and strategy.
The experiment reported in this section was conducted to test IPOG-F as an approach for
constructing CAs and/or quasi-CAs. The results shown in Table 23 reveal that the matrices
constructed using EG have up to 90% fewer missing t-wise combinations than those
constructed using IPOG-F. In addition, EG could generate CAs in 40 of the 820 OSCAR
instances by reducing the number of missing t-wise combinations to 0, whereas IPOG-F failed
to obtain any CA with the desired numbers of rows and columns. Finally, EG achieved
better running times than IPOG-F for small values of v and t; however, the time performance
of EG rapidly worsened with increasing values of the alphabet size and strength. By contrast,
the GGRC strategy achieved time consumption results similar to those of IPOG-F while also
improving the solution quality, making it a better choice than IPOG-F for the construction of
quasi-CAs.
Applications of the proposed approaches for solving the OSCAR problem
In this subsection, we demonstrate that the matrices constructed by solving the OSCAR
problem can be used as initial matrices for metaheuristics for CA construction to assist in the
construction of better matrices. For this purpose, the outputs of the initialization functions
described in the related work section and the best solutions obtained using the proposed
OSCAR approaches were used as the initial matrices for a metaheuristic reported in [22].
Table 24 shows the new upper bounds for CAN(t, k, v) obtained using our proposed
methodology. The second column shows the new CA bounds obtained when using the best matrices
generated by the proposed OSCAR algorithms as the initial matrices for the metaheuristic
algorithm, and the third column shows the previous upper bounds for those CAs. The best
produced solution for each specific instance of the OSCAR problem was used as the initial
matrix for the metaheuristic CA construction algorithm reported in [22], which is also based
on the SA algorithm. Because of the small number of missing t-wise combinations in all of the
produced initial matrices, the performance of the metaheuristic algorithm was improved. The
results define new upper bounds on CAN(t, k, v) for several CAs.
Conclusions
The present work has indirect implications for the interaction testing of software by aiding in
the construction of tests of economical size (a feasible number of test cases). In particular, this
paper presents and analyzes strategies for the construction of arrays with sufficiently few
missing combinations to be considered quasi-CAs. Such arrays are constructed by solving the
problem known as the Optimal Shortening of Covering ARrays (OSCAR) problem. The
development of these strategies is motivated by the fact that the arrays thus produced can be used as
excellent initialization matrices for algebraic or metaheuristic approaches for the construction
of CAs, which are mathematical objects that have broad applications in the testing of software
components.
This work presents an analysis of twelve different strategies for solving the OSCAR
problem. Five of them correspond to greedy and exact approaches previously described in the
literature, whereas the remaining seven algorithms are newly proposed here. The new approaches
involve the use of simulated annealing and hybridization in their design. We note that this
work also provides pseudocodes for the design of all presented algorithms, including, for the
first time, the designs for the greedy approaches, which have been only briefly described in
previous works. In addition, to test these strategies, three new OSCAR benchmarks with more
than 1,000 instances have been designed, representing a considerable improvement over the
previously reported 20-instance benchmark in terms of both size and variety in the values of
the strength and alphabet size parameters, t and v, respectively.
The experimental design developed for the comparative analysis involved all three
proposed benchmarks. The first benchmark, which consists of small instances, was
solved using all twelve strategies: three greedy algorithms {GGCR, GGRC, GGR^C}, two exact
algorithms {EECR, EERC}, one metaheuristic algorithm {MM}, and six hybrid approaches
{GE, EG, GM, MG, ME, EM}. Using this benchmark, the algorithms were compared in
terms of running time and solution quality (measured as the number of missing t-wise
combinations in each constructed array). As expected, the results showed that the greedy algorithms
were the fastest, the exact algorithms yielded the best solutions, and the metaheuristic
provided a balance between quality and time. It was also observed that the solution quality of the
pure greedy algorithms worsened with increasing instance size, but this situation could be
addressed through the use of hybrid algorithms. The hybrid algorithms involving a mixture of
greedy and exact approaches had higher running times but also higher solution quality. The
first experiment indicated that hybrid algorithms involving a mixture of the metaheuristic
and greedy strategies are a viable alternative. Such strategies had somewhat higher running
times for array construction but resulted in fewer missing t-wise combinations than the hybrid
greedy approaches, mainly when the numbers of rows and columns to be deleted were high. In
terms of solution quality, the experimental results indicated that the best algorithms were GE
and EG because they yielded solutions with as few missing t-wise combinations as the exact
approaches EERC and EECR but in less time. In terms of running time, the experiment indicated
that the best algorithms were GGR^C and GGRC because they were faster than any other algorithms
while maintaining an acceptable solution quality; however, we note that for larger instances,
the time performance of GGR^C will be worse than that of GGRC because it is more strongly
affected by the instance size and the numbers of rows and columns to be removed.
The second benchmark was used to perform an in-depth analysis of some of the best
strategies, namely, GGR^C, GE and EG. This experiment tested the performance of these algorithms on
larger OSCAR instances to yield a better understanding of their behavior. The best solutions
were still produced by the hybrid greedy-exact approaches GE and EG, but for the latter
approach the time increased exponentially.

[Table 24. New upper bounds for CAN(t, k, v) obtained using the proposed methodology, together with the previously known upper bounds.]

By contrast, the pure greedy algorithm GGR^C
continued to be fast, and its solutions deviated only slightly from those of the hybrid
algorithms. After this analysis, the same benchmark was used to compare the best results from
these approaches against the initialization functions generated using state-of-the-art methods.
The experimental results showed that in all instances, the number of missing t-wise
combinations was reduced by approximately 90% in the matrices constructed using the proposed
approach in comparison with those taken from the literature.
Finally, an experiment was conducted using the third benchmark to test IPOG-F as an
approach for constructing CAs and/or quasi-CAs. The results revealed that with EG, the
number of missing t-wise combinations was reduced by up to 90% compared with IPOG-F.
Moreover, it was found that EG could obtain CAs in 40 of the 820 OSCAR instances by
reducing the number of missing t-wise combinations to 0, whereas IPOG-F failed to obtain
any CA with the desired numbers of rows and columns. Finally, it was observed that the
running time of EG was better than that of IPOG-F for small values of v and t but worsened
rapidly with increasing values of the alphabet size and strength. By contrast, the GGRC strategy
achieved running times similar to those of IPOG-F while also improving the solution quality,
making it a better choice than IPOG-F for the construction of quasi-CAs.
A major drawback of some of the proposed approaches (with the exception of the greedy
ones) is the time consumed to solve the problem, which increases with the numbers of rows
and columns to be eliminated. Moreover, the experimental design could be improved to test a
wider range of possible values for adjusting the metaheuristic and to investigate a larger number
of strategies. The ranges of values of the alphabet size and strength parameters should be
extended to further probe the resulting changes in the performance of the different strategies.
Future work should also address the lack of an in-depth analysis of the use of the
metaheuristic approach to properly characterize its region of importance. In general, a more extensive
characterization study could provide better insight into the behavior of these strategies, and
this remains as future work.
Supporting information
S1 Dataset. Benchmark L1.
(ZIP)
S2 Dataset. Benchmark L2.
(ZIP)
S3 Dataset. Benchmark L3.
(ZIP)
Acknowledgments
The authors acknowledge the General Coordination of Information and Communications
Technologies (CGSTIC) at CINVESTAV for providing HPC resources on the Hybrid Cluster
Supercomputer ªXiuhcoatlº, which contributed to the research results reported here. The
research reported in this paper was funded through the following projects: CONACYTÐ
MeÂtodos Exactos para Construir Covering Arrays OÂptimos, project number 238469; and CaÂte
dras CONACYTÐFortalecimiento de las capacidades de TICs en Nayarit, project number
2143.
Compliance with ethical standards
All authors declare that a) we do not have any conflicts of interest, b) this manuscript is the
authors' original work and has not been published nor simultaneously submitted elsewhere,
and c) we have acknowledged all entities that have funded this work in any way.
Author Contributions
Formal analysis: Jose Torres-Jimenez, Nelson Rangel-Valdez.
Writing – original draft: Jose Torres-Jimenez, Himer Avila-George, Oscar Carrizalez-Turrubiates.
Writing – review & editing: Jose Torres-Jimenez, Nelson Rangel-Valdez, Himer Avila-George.
References
Lawrence JF, Kacker RN, Lei Y, Kuhn DR, Forbes M. A survey of binary covering arrays. Journal of Combinatorial Designs. 2011; 18(1):1–30.
3. Jones JA, Harrold MJ. Test-suite reduction and prioritization for modified condition/decision coverage. IEEE Transactions on Software Engineering. 2003; 29(3):195–209. https://doi.org/10.1109/TSE.2003.1183927
Sloane NJA. Covering arrays and intersecting codes. Journal of Combinatorial Designs. 1993; 1(1):51–63. https://doi.org/10.1002/jcd.3180010106
Bush KA. Orthogonal arrays of index unity. Annals of Mathematical Statistics. 1952; 23(3):426–434. https://doi.org/10.1214/aoms/1177729387
Colbourn CJ, Dinitz JH. The CRC handbook of combinatorial designs. CRC Press; 1999.
Colbourn CJ. Combinatorial aspects of covering arrays. Le Matematiche. 2004; 58:121–167.
Meagher K. Non-isomorphic generation of covering arrays. University of Regina; 2002.
Lopez-Escogido D, Torres-Jimenez J, Rodriguez-Tello E, Rangel-Valdez N. Strength Two Covering Arrays Construction Using a SAT Representation. In: Gelbukh A, Morales EF, editors. MICAI 2008: Advances in Artificial Intelligence. vol. 5317 of Lecture Notes in Computer Science. Springer Berlin Heidelberg; 2008. p. 44–53.
Bracho-Rios J, Torres-Jimenez J, Rodriguez-Tello E. A New Backtracking Algorithm for Constructing Binary Covering Arrays of Variable Strength. In: MICAI 2009: Advances in Artificial Intelligence. vol. 5845 of Lecture Notes in Computer Science. Springer; 2009. p. 397–407.
Banbara M, Matsunaka H, Tamura N, Inoue K. Generating combinatorial test cases by efficient SAT encodings suitable for CDCL SAT solvers. In: Proceedings of the 17th international conference on Logic for programming, artificial intelligence, and reasoning. Springer-Verlag; 2010. p. 112–126.
Martirosyan S, Trung TV. On t-Covering Arrays. Designs, Codes and Cryptography. 2004; 32(1–3):323–339. https://doi.org/10.1023/B:DESI.0000029232.40302.6d
Chateauneuf M, Kreher DL. On the state of strength-three covering arrays. Journal of Combinatorial Design. 2002; 10(4):217–238. https://doi.org/10.1002/jcd.10002
Colbourn CJ, Martirosyan SS, Mullen GL, Shasha D, Sherwood GB, Yucas JL. Products of mixed covering arrays of strength two. Journal of Combinatorial Design. 2006; 14(2):124–138. https://doi.org/10.1002/jcd.20065
Cohen DM, Dalal SR, Fredman ML, Patton GC. The AETG system: An approach to testing based on combinatorial design. IEEE Transactions on Software Engineering. 1997; 23(7):437–444. https://doi.org/10.1109/32.605761
Tung YW, Aldiwan WS. Automating test case generation for the new generation mission software system. In: IEEE Aerospace Conference Proceedings. vol. 1. IEEE Computer Society; 2000. p. 431–437.
Bryce RC, Colbourn CJ, Cohen MB. A framework of greedy methods for constructing interaction test suites. In: Proceedings of the 27th International Conference on Software Engineering. ICSE'05; 2005. p. 146–155.
Forbes M, Lawrence J, Lei Y, Kacker RN, Kuhn DR. Refining the In-Parameter-Order Strategy for Constructing Covering Arrays. Journal of Research of the National Institute of Standards and Technology. 2008; 113(5):287–297. https://doi.org/10.6028/jres.113.022 PMID: 27096128
Colbourn CJ, Cohen MB, Turban R. A Deterministic Density Algorithm for Pairwise Interaction Coverage. In: Proceedings of the IASTED International Conference on Software Engineering; 2004. p. 242–252.
Shiba T, Tsuchiya T, Kikuno T. Using Artificial Life Techniques to Generate Test Cases for Combinatorial Testing. In: Proceedings of the 28th Annual International Computer Software and Applications Conference, 2004. COMPSAC 2004. vol. 1. IEEE Computer Society; 2004. p. 72–77.
22. Avila-George H, Torres-Jimenez J, Gonzalez-Hernandez L, Hernández V. Metaheuristic approach for constructing functional test-suites. IET Software. 2013; 7(2):104–117. https://doi.org/10.1049/iet-sen.2012.0074
Nurmela KJ. Upper bounds for covering arrays by tabu search. Discrete Applied Mathematics. 2004; 138(1–2):143–152. https://doi.org/10.1016/S0166-218X(03)00291-9
Carrizales-Turrubiates O, Rangel-Valdez N, Torres-Jimenez J. Optimal Shortening of Covering Arrays. In: Batyrshin I, Sidorov G, editors. Advances in Artificial Intelligence. vol. 7094 of Lecture Notes in Computer Science. Springer Berlin Heidelberg; 2011. p. 198–209.
Feige U. A threshold of ln n for approximating set cover. Journal of the ACM. 1998; 45(4):634–652. https://doi.org/10.1145/285055.285059
Cohen DM, Colbourn CJ, Ling ACH. Constructing strength three covering arrays with augmented annealing. Discrete Mathematics. 2008; 308(13):2709–2722. https://doi.org/10.1016/j.disc.2006.06.036
Rodriguez-Tello E, Torres-Jimenez J. Memetic Algorithms for Constructing Binary Covering Arrays of Strength Three. In: Collet P, Monmarché N, Legrand P, Schoenauer M, Lutton E, editors. Artificial Evolution. vol. 5975 of Lecture Notes in Computer Science. Springer; 2010. p. 86–97.
Colbourn CJ. Tables of Covering Arrays; 2017. Online: http://www.public.asu.edu/~ccolbou/src/tabby/catable.html
National Institute of Standards and Technology. Tables of Covering Arrays; 2017. Online: http://math.nist.gov/coveringarrays/ipof/ipofresults.html
Rényi A. Foundations of Probability. Wiley; 1971.
Kleitman DJ, Spencer J. Families of k-independent sets. Discrete Mathematics. 1973; 6(3):255–262. https://doi.org/10.1016/0012-365X(73)90098-8
Cohen DM, Dalal SR, Parelius J, Patton GC. The combinatorial design approach to automatic test generation. IEEE Software. 1996; 13(5):83–88. https://doi.org/10.1109/52.536462
Lei Y, Kacker RN, Kuhn DR, Okun V, Lawrence J. IPOG: A General Strategy for T-Way Software Testing. In: ECBS'07: Proceedings of the 14th Annual IEEE International Conference and Workshops on the Engineering of Computer-Based Systems. IEEE Computer Society; 2007. p. 549–556.
Bryce RC, Colbourn CJ. A density-based greedy algorithm for higher strength covering arrays. Software Testing, Verification and Reliability. 2009; 19(1):37–53. https://doi.org/10.1002/stvr.393
Quiz-Ramos P, Torres-Jimenez J, Rangel-Valdez N. Constant Row Maximizing Problem for Covering Arrays. In: Artificial Intelligence, 2009. MICAI 2009. Eighth Mexican International Conference on. IEEE Computer Society; 2009. p. 159–164.
Lara-Alvarez C, Avila-George H. New Algorithm for Post-Processing Covering Arrays. International Journal of Advanced Computer Science and Applications. 2015; 6(12):250–254. https://doi.org/10.14569/IJACSA.2015.061234
Avila-George H, Torres-Jimenez J, Hernández V. New bounds for ternary covering arrays using a parallel simulated annealing. Mathematical Problems in Engineering. 2012; 2012:1–18. https://doi.org/10.1155/2012/897027
Gonzalez-Hernandez L, Rangel-Valdez N, Torres-Jimenez J. Construction of mixed covering arrays of strengths 2 through 6 using a tabu search approach. Discrete Mathematics, Algorithms and Applications. 2012; 04(03):1–20. https://doi.org/10.1142/S1793830912500334
Bryce RC, Colbourn CJ. One-test-at-a-time Heuristic Search for Interaction Test Suites. In: Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation. GECCO'07; 2007. p. 1082–1089.
Bao X, Liu S, Zhang N, Dong M. Combinatorial Test Generation Using Improved Harmony Search Algorithm. International Journal of Hybrid Information Technology. 2015; 8(9):121–130. https://doi.org/10.14257/ijhit.2015.8.9.13
Kreher DL, Stinson DR. Combinatorial algorithms: generation, enumeration, and search. CRC Press; 1999.
Torres-Jimenez J, Rangel-Valdez N, Kacker RN, Lawrence JF. Combinatorial Analysis of Diagonal, Box, and Greater-Than Polynomials as Packing Functions. Applied Mathematics & Information Sciences. 2015; 9(6):2757–2766.
Kuhn DR, Wallace DL, Gallo AM. Software fault interaction and implications for software testing. IEEE Transactions on Software Engineering. 2004; 30(6):418–421. https://doi.org/10.1109/TSE.2004.24
45. Van Laarhoven PJM, Aarts EHL. Simulated Annealing: Theory and Applications. Philips Research Laboratories; 1992.
46. Rangel-Valdez N, Torres-Jimenez J, Bracho-Rios J, Quiz-Ramos P. Problem and Algorithm Fine-Tuning: A Case of Study using Bridge Club and Simulated Annealing. In: Correia AD, Rosa AC, Madani K, editors. IJCCI; 2009. p. 302–305.
47. Pérez-Espinosa H, Avila-George H, Rodríguez-Jacobo J, Cruz-Mendoza HA, Martínez-Miranda J, Espinosa-Curiel IE. Tuning the Parameters of a Convolutional Artificial Neural Network by Using Covering Arrays. Research in Computing Science. 2016; 121:69–81.