Adaptive Fault-Tolerant Routing in 2D Mesh with Cracky Rectangular Model
Hindawi Publishing Corporation
Journal of Applied Mathematics
Volume 2014, Article ID 592638, 10 pages
http://dx.doi.org/10.1155/2014/592638
Research Article
Yi Yang,1 Meirun Chen,2 Hao Li,3 and Lian Li1
1 School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
2 School of Applied Mathematics, Xiamen University of Technology, Xiamen 361024, China
3 Laboratoire de Recherche en Informatique, Bat 490, Universite Paris-Sud 11, 91405 Orsay Cedex, France
Correspondence should be addressed to Yi Yang;
Received 7 February 2014; Accepted 9 March 2014; Published 7 April 2014
Academic Editor: X. Song
Copyright © 2014 Yi Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which
permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This paper focuses on routing in two-dimensional mesh networks. We propose a novel faulty block model, the cracky
rectangular block, for fault-tolerant adaptive routing. All faulty nodes and faulty links are enclosed in blocks of this type,
which are convex structures, in order to avoid routing livelock. Additionally, the model constructs an interior spanning forest for
each block in order to keep in contact with the nodes inside the block. The block construction procedure is dynamic and
fully distributed, and the construction algorithm is simple and easy to implement. The block is fully adaptive: it
dynamically adjusts its size according to the state of the network, on either fault emergence or fault recovery, without
shutting down the system. Based on this model, we also develop a distributed fault-tolerant routing algorithm. We then give a
formal proof that, under this algorithm, messages always reach their destinations if and only if the destination nodes
remain connected to the mesh network. The new model and routing algorithm thus maximize the availability of the nodes in
the network, which is a noticeable overall improvement in the fault tolerance of the system.
1. Introduction
In recent decades, many researchers have studied
communication operations in networks with fixed topologies,
including the architecture models and routing algorithms of
parallel computers and of cluster or medium-area communication
networks (such as metropolitan networks covering a town
or a small region). The quality of such networks strongly
depends on the correct and efficient execution of communication
operations.
Direct networks [1] have become a popular architecture for
communication networks, especially in massively parallel
computer systems. In a direct network, each node (computer)
is connected to only a few other nodes, that is, its neighbours,
according to the topology of the network, and nodes communicate
with each other by exchanging messages. The mesh
structure is one of the most important topologies of direct
networks. In particular, low-dimensional mesh networks, owing
to their low node degree, are more popular than high-dimensional
mesh networks. Many parallel computer architectures
are based on the two-dimensional mesh
topology, for example, Seitz et al. 1988 [2], Intel Touchstone
DELTA [3, 4], and Intel Paragon.
Several models based on direct networks have been
studied ([5–9]), especially the two-dimensional mesh ([10–
16], etc.) for communication operations. These papers
mainly focus on how to route messages in
the two-dimensional mesh. Routing is the process of sending
messages from source nodes to destination nodes through
intermediate nodes. A very important aspect of message
routing is the ability to route from a source node to a
destination node while avoiding all faulty nodes and links.
Basically, there are two types of message routing:
(1) deterministic routing, in which the
routes between given pairs of nodes are determined
in advance of transmission,
(2) adaptive routing, which allows a message to take any path
between its source and its final destination; that is,
the path is adaptively constructed in the process of
routing.
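The difference between the two routing types can be illustrated with a minimal sketch. It assumes nodes are addressed by integer coordinates `(x, y)` and uses dimension-order (XY) routing as a representative deterministic scheme and a greedy minimal router as a representative adaptive scheme. The function names and the greedy policy are illustrative assumptions, not the algorithm proposed in this paper.

```python
from typing import List, Optional, Set, Tuple

Node = Tuple[int, int]  # (x, y) coordinate of a mesh node (assumed addressing)


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Deterministic dimension-order routing: correct x first, then y.

    The path depends only on (src, dst), so it is fixed before transmission.
    """
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def adaptive_minimal_route(src: Node, dst: Node,
                           faulty: Set[Node]) -> Optional[List[Node]]:
    """Greedy minimal adaptive routing (illustrative assumption).

    At each hop, pick the first nonfaulty neighbour that moves closer to
    dst. Returns None if every productive neighbour is faulty, since this
    simple sketch never backtracks or takes detours.
    """
    x, y = src
    path = [src]
    while (x, y) != dst:
        candidates = []
        if x != dst[0]:
            candidates.append((x + (1 if dst[0] > x else -1), y))
        if y != dst[1]:
            candidates.append((x, y + (1 if dst[1] > y else -1)))
        usable = [n for n in candidates if n not in faulty]
        if not usable:
            return None  # blocked: a real algorithm would route around
        x, y = usable[0]
        path.append((x, y))
    return path
```

For example, with a single faulty node at `(1, 0)`, `xy_route((0, 0), (2, 1))` still passes through `(1, 0)`, whereas `adaptive_minimal_route((0, 0), (2, 1), {(1, 0)})` builds the path hop by hop and steps to `(0, 1)` first. The greedy policy can still dead-end, which is why fault-tolerant schemes need a fault model to guide detours.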
Deterministic routing algorithms are simple and easy
to implement, which is their main advantage.
However, adaptive routing can reduce network
latency and increase network throughput, and its most
attractive property is that it can tolerate more faults than
deterministic routing [17]. Adaptive routing has therefore emerged
as an attractive field. Most papers in this field
consider how to build a path between source and
destination node pairs that avoids the faulty nodes, and most
of this work uses the disconnected rectangular block fault model
[11]. The disconnected rectangular blocks are composed of
the faulty nodes and their neighboring nonfaulty nodes,
grouped so that each block keeps a rectangular shape. As a result,
adaptive routing can tolerate faulty nodes by bypassing these
rectangles. However, in order to maintain its rectangular
shape, a block has to include some nonfaulty nodes,
called unsafe nodes in these papers. These unsafe
nodes are never used until their corresponding blocks
recover, and messages are never sent to them,
although they should be (as illustrated in Figure 1).
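The cost of keeping blocks rectangular can be seen in a short sketch. It assumes one commonly used labeling rule, which is not taken from this paper: a nonfaulty node becomes unsafe if it has a faulty or unsafe neighbour in both the x dimension and the y dimension, and the rule is applied until nothing changes. The function name `label_unsafe` and the coordinate addressing are hypothetical.

```python
from typing import Set, Tuple

Node = Tuple[int, int]  # (x, y) coordinate of a mesh node (assumed addressing)


def label_unsafe(width: int, height: int, faulty: Set[Node]) -> Set[Node]:
    """Return the nonfaulty nodes that get absorbed into rectangular blocks.

    Assumed rule (illustrative): a nonfaulty node is marked unsafe when it
    has a faulty/unsafe neighbour along x AND a faulty/unsafe neighbour
    along y. The rule is iterated to a fixed point.
    """
    blocked = set(faulty)  # faulty nodes plus nodes marked unsafe so far
    changed = True
    while changed:
        changed = False
        for x in range(width):
            for y in range(height):
                if (x, y) in blocked:
                    continue
                in_x = (x - 1, y) in blocked or (x + 1, y) in blocked
                in_y = (x, y - 1) in blocked or (x, y + 1) in blocked
                if in_x and in_y:
                    blocked.add((x, y))
                    changed = True
    return blocked - faulty  # only the healthy nodes that were sacrificed
```

In a 4 × 4 mesh with faults at `(1, 1)` and `(2, 2)`, this rule marks the two healthy nodes `(1, 2)` and `(2, 1)` as unsafe, so the block becomes a 2 × 2 rectangle and half of its nodes are working nodes that can no longer send or receive messages.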
Chien and Kim [18] present a partially adaptive algorithm
for mesh networks. The basic idea is to route messages
around any convex faulty region. If faulty regions are
not naturally convex, good nodes and links are marked as
faulty until the regions become convex. However, once
faults are located on a boundary, all nodes on that boundary
are treated as faulty in order to tolerate the faults. Boppana
and Chalasani [10] use 𝑓-chains and 𝑓-rings, an
extension of the disconnected rectangular block fault model, to
route messages around the faults, and the 𝑓-chain addresses the
boundary problem in Chien and Kim's paper. However, 𝑓-chains
and 𝑓-rings may connect with each other, which makes
the routing algorithm more complex than that of [18]. In [11], Su
and Shin assume a node to be the basic fault element. They
construct the blocks based only on the faulty nodes; thus they
can tolerate faulty nodes but not faulty links. Overall,
the construction of these faulty regions is static; that is, once
these regions are constructed, all nodes in them, including the good
ones, can no longer take part in routing. The
faulty regions are also not self-adaptive; that is, if some of the faulty
nodes in a region are repaired, the
region stays as it was, although it could release
some good nodes and shrink while keeping a convex
shape.
Adaptive fault-tolerant routing techniques are also
used in WSN (Wireless Sensor Networks), MEMS (Micro-Electro-Mechanical Systems), and SoC (System on Chip)
to increase the usability (...truncated)