SIDs 2020 Paper 64red
Figure 2. The proposed framework for monitoring the aircraft turnaround process. By detecting aircraft and ground support equipment and recognizing the aircraft type, the framework can detect activities and predict the push-back time.
Figure 4. Videos collected from a live camera in Tokachi-Obihiro airport with different aircraft types, times and weather conditions.

Figure 5. Objects are labelled as bounding boxes. The bounding boxes overlap heavily, which makes both labelling and detection difficult.
Figure 7. The AirNet recognizer adopted from [12] for aircraft type recognition.
Figure 12. Detected activities are displayed by the framework, starting when an aircraft arrives (second 0) and ending when the aircraft pushes back (green star), over time (from blue to red).

in/out signifies refueling. We do not distinguish between aft and forward cargo; therefore, we display those as the first cargo and the second cargo. Activities that have an attachment and a detachment are merged into a duration.

C. Push-back Prediction

From the TA history and manual, we establish the relationship between the time required to finish activities and the push-back time. Based on the TA history, we calculate the average time to complete each activity and the ratios of these average times to the actual push-back time, as shown in Table I. Additionally, the predicted push-back time cannot be shorter than the manual push-back time from the APM.

The predicted time is updated in real time, as shown in Figure 13. First, it is initialized to the departure flight schedule on the airport website. It remains at this value until the first activity (refueling) finishes. Since the refueling activity ends early, the predicted push-back time drops significantly. After

V. RESULTS

A. Aircraft Recognition and Object Detection

As mentioned in Section IV, we use the lightest version (i.e., a minimal number of trainable parameters) of AirNet for the aircraft type recognizer. At this minimal configuration, the recognizer yields an accuracy of 100% in recognizing the two aircraft types. If there are more than two aircraft types, high recognition accuracy is still achievable by using a more heavyweight version of AirNet, i.e., by introducing more trainable parameters.

Figure 14 shows the precision-recall curves of the detection for different objects. A high-recall detector aims to detect as many objects as possible, which helps to reduce false negatives. In contrast, a high-precision detector aims to detect objects as precisely as possible, which helps to reduce false positives. A precision-recall curve indicates the trade-off between precision and recall, which is important in determining the detection threshold. In Figure 14, the top-right corner is the most desirable region, where the detector is able to achieve high precision and high recall simultaneously. Since different thresholds are associated with different precision and recall values, Average Precision is an important metric that calculates the average precision over all possible thresholds. In other words, Average Precision, ranging from 0 to 1, indicates the detector's performance regardless of the detection threshold (i.e., the higher the Average Precision, the better the performance).
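To illustrate the metric (this is a simplified area-under-the-PR-curve sketch with hypothetical detections, not the evaluation code used in our experiments), Average Precision over a confidence-ranked detection list can be computed as follows:

```python
def average_precision(hits, num_gt):
    """Average Precision for one object class.

    hits:   booleans for detections ranked by confidence
            (True = detection matches a ground-truth object).
    num_gt: total number of ground-truth objects of this class.

    Sweeps the threshold down the ranking and accumulates
    precision weighted by each step in recall, i.e. the area
    under the precision-recall curve.
    """
    ap, tp, prev_recall = 0.0, 0, 0.0
    for i, hit in enumerate(hits, start=1):
        if hit:
            tp += 1
            precision = tp / i        # fraction of detections so far that are correct
            recall = tp / num_gt      # fraction of ground-truth objects found
            ap += precision * (recall - prev_recall)
            prev_recall = recall
    return ap

# Hypothetical ranking: 5 detections, 4 ground-truth objects.
print(average_precision([True, True, False, True, False], num_gt=4))  # -> 0.6875
```

A perfect detector (every detection correct and every object found) reaches an Average Precision of 1, matching the 0-to-1 range described above.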
Table II shows the results of object detection, achieving a high performance level with a mean Average Precision of 0.9514.
TABLE I. RATIO BETWEEN FINISH TIME OF ACTIVITIES AND ACTUAL PUSH-BACK TIME.

Activities   Bridge   1st Cargo Loader   2nd Cargo Loader   Refueling
Ratio        0.9263   0.5550             0.8319             0.3392

Large objects (such as aircraft, bridge, tow truck, and fuel truck) are detected with very high precision. The cargo loader and cargo truck are detected with lower precision than other large objects, as they suffer from intra-class variance and overlapping.
TABLE II. AVERAGE PRECISION OF AIRCRAFT AND DIFFERENT GROUND SUPPORT EQUIPMENT.
Class Aircraft Bridge Cargo Loader Cargo Truck Fuel Pipe Fuel Truck Tow Bar Tow Truck Mean
Average Precision 0.9996 0.9998 0.9304 0.9143 0.9028 0.9989 0.8663 0.9988 0.9514
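The ratio-based update of Section C can be sketched with the ratios from Table I. One natural reading of those ratios is that the push-back time is estimated by scaling an activity's observed finish time by its historical ratio, floored at the manual APM time; the function and the example times below are illustrative assumptions, not our exact implementation:

```python
# Ratio of each activity's average finish time to the actual push-back
# time, taken from Table I.
RATIOS = {
    "bridge_detach": 0.9263,
    "cargo_loader_1": 0.5550,
    "cargo_loader_2": 0.8319,
    "refueling": 0.3392,
}

def predict_pushback(activity, finish_time_s, apm_manual_s):
    """Estimate the push-back time (seconds after aircraft arrival)
    when `activity` finishes at `finish_time_s`.

    Scales the observed finish time by the historical ratio, and never
    predicts earlier than the manual push-back time from the APM.
    Before any activity finishes, the prediction would simply stay at
    the published departure flight schedule.
    """
    estimate = finish_time_s / RATIOS[activity]
    return max(estimate, apm_manual_s)

# Hypothetical turnaround: refueling ends 900 s after arrival.
print(predict_pushback("refueling", 900, apm_manual_s=2280))
```

With these numbers the scaled estimate (about 2653 s) exceeds the APM floor; had refueling ended very early, the prediction would be clamped to the APM manual time, as required above.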
Due to their small sizes and heavy overlapping by larger objects, the fuel pipe and tow bar have the lowest detection precision. However, this does not affect the activity detection, as we use different methods to detect refueling, and the tow bar does not contribute to the activity detection.

Figure 14. Precision-recall curves of the detection for different objects. The area under each curve indicates the performance of the corresponding detector (the larger the area, the better the accuracy).

Figure 15. The errors between detected and ground truth activities. Negative errors indicate activities detected earlier than the ground truth, and vice versa.

B. Activity Detection

As the ground truth activity time is labeled as a duration, while the detected activity time is estimated as a single instant, we normalize the ground truth activity time using the middle point of that duration. We divide the nine activities into four groups, namely bridge (bridge attachment/detachment), cargo loader (cargo loader attachment/detachment), refueling (drawing in/out of the fuel pipe), and other (aircraft arrival, tow truck connection, and push-back).

Figure 15 shows the errors between detected and ground truth activities, represented as box plots. The errors are calculated by subtracting the predicted activity time from the ground truth, as we want to keep both the magnitude and the sign. Therefore, negative errors indicate activities that are detected earlier than the ground truth, and positive errors indicate activities that are detected later. As can be seen, the median errors for the Bridge, Cargo Loader, and Other groups range from 1.5 to 2 seconds. As the fuel pipe has low detection performance, the refueling errors are less stable, resulting in more outliers. Even so, the median error of the refueling time is only 5.5 seconds, which is still accurate compared to the overall turnaround duration.

C. Push-back Prediction

As there are only five testing videos, we display all of their prediction results in Figure 13 and Figure 16. Based on the five videos, one can see that the departure flight schedule times are very different, ranging from 2855 seconds to 3898 seconds, compared to 2280 seconds from the manual APM. Interestingly, the difference between actual and scheduled push-back times seems to be variable, regardless of duration. For example, in the two figures with actual times of approximately 2700 seconds, the scheduled times range from 2917 to 3506 seconds, while in the two figures with scheduled times of about 3600 seconds, the actual times range from 2686 to 3446 seconds.

Moreover, there is no standard process for aircraft turnaround. For example, in the upper-right chart of Figure 16, the refueling time and the first cargo loader time are similar to each other, while in the lower-left chart of Figure 16, the bridge detaches before the second cargo loader does. The framework predicts push-back more accurately than the scheduled time from the time of the first activity completion (refueling) in two of the five videos. In two other videos, it predicts push-back more accurately than the scheduled time from the time the second cargo loader detaches in one video, and from the time the bridge detaches in the remaining video. This results in a more accurate push-back prediction being available in four out of the five videos.

Figure 16. Push-back time prediction of different TA processes. The difference between actual and scheduled push-back times seems to be variable, regardless of duration.

VI. CONCLUSION

We have proposed a novel computer vision-based framework that can monitor the aircraft turnaround process, including object detection, activity detection, and push-back prediction. By using ConvNets, aircraft type recognition achieves 100% accuracy, while object detection achieves a mean Average Precision of 0.9514. For activity detection, the median error is smaller than 6 seconds, which can be considered very low relative to the overall turnaround duration. Furthermore, the framework provided more accurate push-back predictions than the airport schedule in four out of five, or 80%, of the cases.

In the future, we intend to further improve push-back time prediction by collecting Gate CAM videos from different airports and incorporating information from the A-CDM system, such as the Target Start-up Approval time and the runway configuration and availability. It may lead to a reduction in
the aircraft waiting time at the runway holding points, reduced fuel consumption on the taxiways, and an improved passenger experience through a smoother departure flow.

ACKNOWLEDGMENT

This work was conducted under the Saab-NTU Joint Lab with support from Saab AB, Saab Singapore Pte Ltd., and the Air Traffic Management Research Institute, Nanyang Technological University, Singapore.

REFERENCES

[1] M. Schmidt, “A review of aircraft turnaround operations and simulations,” Progress in Aerospace Sciences, vol. 92, pp. 25–38, 2017.
[2] Boeing. (2011) Airplane characteristics for airport planning. https://www.boeing.com/commercial/airports/plan_manuals.page.
[3] M. M. Mota et al., “Simulation-based turnaround evaluation for Lelystad airport,” Journal of Air Transport Management, vol. 64, pp. 21–32, 2017.
[4] S. Okwir et al., “Managing turnaround performance through collaborative decision making,” Journal of Air Transport Management, vol. 58, pp. 183–196, 2017.
[5] M. Schultz and S. Reitmann, “Machine learning approach to predict aircraft boarding,” Transportation Research Part C: Emerging Technologies, vol. 98, pp. 391–408, 2019.
[6] B. Oreschko et al., “Turnaround prediction with stochastic process times and airport specific delay pattern,” in International Conference on Research in Air Transportation (ICRAT), Berkeley, 2012.
[7] A. Koutsia et al., “Automated visual traffic monitoring and surveillance through a network of distributed units,” in ISPRS, 2008.
[8] N. Pavlidou et al., “Using intelligent digital cameras to monitor aerodrome surface traffic,” IEEE Intelligent Systems, vol. 20, pp. 76–81, 2005. [Online]. Available: http://ieeexplore.ieee.org/document/1439483/
[9] H.-L. Lu et al., “Airport gate operation monitoring using computer vision techniques,” in 16th AIAA Aviation Technology, Integration, and Operations Conference, 2016, p. 3912.
[10] M. Tan and Q. V. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” arXiv preprint arXiv:1905.11946, 2019.
[11] M. Tan et al., “EfficientDet: Scalable and efficient object detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 10781–10790.
[12] P. Thai et al., “Deep4Air: A novel deep learning framework for airport airside surveillance,” 2020.
[13] Y. LeCun et al., “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[14] J. Deng et al., “ImageNet: A large-scale hierarchical image database,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[15] A. Krizhevsky et al., “ImageNet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, no. 6, pp. 84–90, May 2017. [Online]. Available: http://dl.acm.org/citation.cfm?doid=3098997.3065386
[16] C. Szegedy et al., “Going deeper with convolutions,” arXiv:1409.4842 [cs], Sep. 2014.
[17] K. He et al., “Deep residual learning for image recognition,” arXiv:1512.03385 [cs], Dec. 2015. [Online]. Available: http://arxiv.org/abs/1512.03385
[18] A. G. Howard et al., “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
[19] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.
[20] S. Ren et al., “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems, 2015, pp. 91–99.
[21] T.-Y. Lin et al., “Feature pyramid networks for object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117–2125.
[22] T.-Y. Lin et al., “Microsoft COCO: Common objects in context,” in European Conference on Computer Vision, Springer, 2014.
[23] P. F. Felzenszwalb et al., “Object detection with discriminatively trained part-based models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1627–1645, 2009.
[24] P. Sermanet et al., “Overfeat: Integrated recognition, localization and detection using convolutional networks,” arXiv preprint arXiv:1312.6229, 2013.