K-means clustering for SAT-AIS data analysis
WMU Journal of Maritime Affairs
https://doi.org/10.1007/s13437-021-00241-3
IAMU SECTION ARTICLE
Marta Mieczyńska¹ · Ireneusz Czarnowski²
Received: 19 December 2019 / Accepted: 16 May 2021
© The Author(s) 2021
Abstract
This paper addresses the analysis of automatic identification system (AIS) data, in
particular reducing the impact of AIS packet collisions and detecting outliers in AIS
data. A clustering-based approach is proposed to solve this problem. AIS is a system
that supports the exchange of trajectory information between vessels, e.g. position,
speed or course. However, SAT-AIS, which enables the system to work on a global scale,
suffers from packet collisions because the satellite that receives AIS data from ships
has a field of view covering multiple areas that are not synchronized with one another.
As a result, the received data is difficult for AIS receivers to process, because most
of the messages are effectively noise. This paper presents the results of a
computational experiment that uses the k-means algorithm for packet recovery and for
dealing with noise. The outcome shows that a clustering-based approach could be used as
an initial step in AIS packet reconstruction when the original data is incorrect.
Keywords K-means · Clustering · SAT-AIS · Data analysis ·
Maritime data analytics
¹ Department of Marine Telecommunications, Gdynia Maritime University, Morska 81-87,
81-225 Gdynia, Poland
² Department of Information Systems, Gdynia Maritime University, Morska 81-87,
81-225 Gdynia, Poland
1 Introduction
An automatic identification system (AIS) is an automatic tracking system developed in
accordance with International Maritime Organisation (IMO) regulations. The aim was to
develop a technology that would automatically provide information about ships, including
their unique identifier, type, position, speed, course and current state, to other
vessels and shore stations (International Maritime Organisation (IMO) 2019). Dynamic
information is obtained from the ship's navigational sensors, such as its global
navigation satellite system (GNSS) receiver and gyrocompass. Static information (e.g.
the ship's Maritime Mobile Service Identity, MMSI), on the other hand, is permanently
programmed into the ship's equipment. Both kinds of information are encoded in binary
format to create AIS messages, which are transmitted regularly using dedicated
transponders. AIS messages are received by other ships or by land-based systems (e.g.
vessel traffic systems) (exactEarth 2015).
Most AIS messages are transmitted on a regular basis. For instance, messages
containing dynamic information are exchanged every 2 to 180 s (European
Space Agency 2019). Hence, a significant amount of data can be received during a given
recording period. Processing such a large dataset and deriving meaningful information
from it requires modern, advanced technology (Czarnowski 2019). Machine learning
methods are one possible approach, since they provide algorithms that cope with, among
other tasks, finding patterns in large datasets (Mieczyńska and Czarnowski 2019).
The need to analyze AIS data arises more and more often, because such analysis is used
by a growing range of applications. It is especially important for the maritime
industry, where data analysis may improve the monitoring and optimization of maritime
processes. Some of these applications relate to maritime safety. For instance, a system
that predicts vessels' movements may support early collision avoidance between ships
(Zhang et al. 2015). The same system may be indispensable for predicting a vessel's
location (Liang et al. 2019) in emergency situations, when the connection with that
ship is lost. Another example of analyzing both real-time and historical data is the
identification of abnormal vessel activity, which may lead to the detection of an act
of piracy (Lane et al. 2010). AIS data may also be useful in research on industrial
applications of maritime traffic analysis, such as predicting and optimizing the load
in seaports (Millefiori et al. 2016) or route planning (He et al. 2019).
The original, terrestrial AIS uses two very high frequency (VHF) channels
(161.975 MHz and 162.025 MHz), each with a bandwidth of 25 kHz. Access to the wireless
medium by multiple AIS transponders is managed using time division multiple access
(TDMA). A single device is allowed to transmit only during a predetermined period of
time (called a slot). More specifically, each AIS transponder must preannounce the time
slots it wants to use; this technique is called self-organizing TDMA (SOTDMA). Time
slots filled with information from various devices form a time frame. Nine 1-min-long
time frames (each consisting of 2250 time slots of about 26.67 ms per radio frequency
channel) are then grouped into a communication cell. Within such a communication cell,
slot selection is organized randomly. Devices choose their time slots so that they can
transmit at a predetermined rate (which depends on factors such as the speed of the
vessel or its heading). If an AIS transponder changes its slot assignment, it must
transmit the new assignment and the timeout for that assignment.
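The slot arithmetic described above can be illustrated with a short Python sketch. It only restates the figures given in the text (a 1-min frame divided into 2250 slots per channel) and adds a toy check for two transmissions landing in the same slot. The function names, the channel labels "A" and "B", and the MMSI values are hypothetical and serve illustration only; the sketch does not model the actual SOTDMA reservation protocol.

```python
from collections import defaultdict

# Figures taken from the text: one frame lasts 60 s and holds 2250 slots per channel.
FRAME_SECONDS = 60.0
SLOTS_PER_FRAME = 2250


def slot_duration_ms():
    """Length of a single TDMA slot in milliseconds (60 s / 2250 slots)."""
    return FRAME_SECONDS / SLOTS_PER_FRAME * 1000.0


def slot_index(seconds_into_frame):
    """Index (0..2249) of the slot that contains the given offset into a frame."""
    if not 0.0 <= seconds_into_frame < FRAME_SECONDS:
        raise ValueError("offset must lie within one frame")
    return int(seconds_into_frame * SLOTS_PER_FRAME / FRAME_SECONDS)


def find_collisions(reservations):
    """Group transmitters that use the same slot on the same channel.

    `reservations` is an iterable of (mmsi, channel, slot) tuples; this
    layout is an assumption made for the example, not an AIS message format.
    """
    by_slot = defaultdict(list)
    for mmsi, channel, slot in reservations:
        by_slot[(channel, slot)].append(mmsi)
    # Keep only the (channel, slot) pairs claimed by more than one transmitter.
    return {key: ships for key, ships in by_slot.items() if len(ships) > 1}


if __name__ == "__main__":
    print(f"slot duration: {slot_duration_ms():.2f} ms")
    demo = [
        (111000001, "A", slot_index(30.0)),
        (111000002, "A", slot_index(30.0)),  # same channel, same slot
        (111000003, "B", slot_index(30.0)),  # same slot, other channel
    ]
    print(find_collisions(demo))
```

Within one self-organized cell, preannouncing slots is what keeps `find_collisions` empty; the sketch merely shows what a collision looks like when that coordination is absent.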
Although the original (terrestrial) AIS has many advantages and potential applications,
it also has some drawbacks. As mentioned before, it was originally developed to provide
information about nearby vessels that could be used to prevent collisions. Information
about ships' movement (course, position, speed) is exchanged regularly between vessels
and shore stations, so they are able to recognize other vessels that may appear on
their paths. However, the main limitation of this communication is its range. Due to
the Earth's curvature, the horizontal range of terrestrial AIS is about 74 km
(40 nautical miles) from shore (European Space Agency 2019). Consequently, the original
AIS works on a local scale only, i.e. on a ship-to-ship basis or around coastal
zones.
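The effect of the Earth's curvature on range can be sketched with the standard line-of-sight horizon formula. The Python example below is a rough illustration only: the effective Earth radius factor of 4/3 (a common approximation for atmospheric refraction at VHF), the mean Earth radius and the antenna heights are assumptions that do not come from the source, and the sketch is not meant to reproduce the 74 km figure quoted above.

```python
import math

EARTH_RADIUS_KM = 6371.0       # mean Earth radius (assumed value)
KM_PER_NAUTICAL_MILE = 1.852   # definition of the international nautical mile


def horizon_km(antenna_height_m, k_factor=4 / 3):
    """Distance from an antenna to its radio horizon.

    Uses d = sqrt(2*R*h + h^2) with an effective Earth radius R = k * R_earth.
    The default k = 4/3 is an assumed refraction correction.
    """
    h_km = antenna_height_m / 1000.0
    r_km = EARTH_RADIUS_KM * k_factor
    return math.sqrt(2.0 * r_km * h_km + h_km ** 2)


def line_of_sight_km(height_a_m, height_b_m, k_factor=4 / 3):
    """Maximum line-of-sight distance between two antennas over a smooth Earth."""
    return horizon_km(height_a_m, k_factor) + horizon_km(height_b_m, k_factor)


def nautical_miles_to_km(nautical_miles):
    """Convert nautical miles to kilometres."""
    return nautical_miles * KM_PER_NAUTICAL_MILE


if __name__ == "__main__":
    # Hypothetical antenna heights: 100 m shore station, 20 m ship mast.
    print(f"line of sight: {line_of_sight_km(100.0, 20.0):.1f} km")
    print(f"40 nautical miles = {nautical_miles_to_km(40):.2f} km")
```

Because the horizon distance grows only with the square root of antenna height, raising a terrestrial antenna gives limited gains, which is consistent with the text's point that terrestrial AIS remains a local-scale system.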
To overcome this problem and enable AIS to work on a global scale, the SAT-AIS system has been proposed (European Space Agency 2019). In general, SAT-AIS
uses satellites (e.g. AAUSAT3) in low Earth orbit to increase the range of transmission. Messages sent by ships are recorded by a satellite (which has a broader range
of view due to its altitude) and then transmitted to ground stations for further processing and distribution (Wawrzaszek et al. 2019). Although it seems to solve man (...truncated)