X-Vision An Augmented Vision Tool With Real-Time S
All content following this page was uploaded by Yongbin Sun on 02 July 2018.
efforts have been devoted to applying existing computer vision techniques to enhance the user-environment interaction experience for different purposes. Research work on this track benefits areas such as education [10], tourism [6], and navigation [25] by improving user experience. Our work follows this trend by fusing object recognition and 3D pose estimation techniques with RFID sensing capabilities, aiming to create a smart environment.

B. Emerging RFID Applications

RFID is widely used as an identification technology to support tracking in supply chains, and has so far been successfully deployed in various industries. Recently, industry focus has been shifting towards generating higher value from existing RFID setups by tagging more and more items, and by developing new applications using tags that allow for sensing, actuation and control [8], and even gaming [1]. Another exciting application with industrial benefit is fusion with emerging computer vision and AR technologies. The fusion of RFID and AR is an emerging field, and there are recent studies combining these technologies for gaming and education, yet we see much room to explore further, especially in going beyond the ID in RFID. One of the earlier papers [20] studied the use of RFID to interact with physical objects in a smartphone-based game, which enhanced the gaming experience. Another study [27] used a combination of smart bookshelves equipped with RFID tags and mixed-reality interfaces for information display in libraries. A further study [2] explores the use of AR with tags to teach geometry to students. These studies show a strong interest in the community in exploring mixed-reality applications that use tags for object IDs. In this paper, we use RFID not just for ID but also to wirelessly sense the environment and object attributes, creating a more intimate and comprehensive interaction between humans and surrounding objects.

III. SYSTEM

Our system (hardware and visualization shown in Fig. 1) contains two parallel branches (shown in Figure 2) to concurrently detect and sense objects with attached RFID tag-sensors. On one side, the system captures color and depth images using the depth camera for in-view target object identification and pose estimation. On the other side, the system collects tag data reflecting the target object's physical properties, such as temperature, using an RFID interrogator/reader. Information collected from both sources is uploaded to a shared central server, where the heterogeneous information is unified and delivered to the HoloLens for augmented visualization. Details are given in the following subsections.

A. Object Identification and Pose Estimation

Our system uses an Intel RealSense D415 depth camera to capture color and depth information. It is attached to a HoloLens via a custom mount provided by [9], and faces in the same direction as the HoloLens (Figure 1). The captured images are used to identify the in-view target object and estimate its pose.

Object Identification: Object recognition is a well-studied problem, and we adopt the local feature based method [3] in our system, since it is suitable for a small-scale database. Generally, to identify an in-view object from a given database, the local feature based method first extracts representative local visual features for both the scene image and the template object images, and then matches scene features with those
of each template object. The target object in the view is
identified as the template object with the highest number of
matched local features. If the number of matched features
of all template objects is not sufficiently large (below a
predetermined threshold), then the captured view is deemed to
not contain a target. Our system follows this scheme, and uses the SURF algorithm [3] to compute local features: compared to other local feature algorithms, such as SIFT [18], SURF is fast and handles blurred and rotated images well.
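The identification scheme above (a match count per template, plus a rejection threshold) can be sketched as follows. This is a library-agnostic illustration, not the system's code: the descriptor arrays are assumed to come from a feature extractor such as SURF, the nearest-neighbor ratio test is the standard Lowe-style check, and `min_matches` is a hypothetical name for the predetermined threshold.

```python
import numpy as np

def match_count(scene_desc, templ_desc, ratio=0.75):
    """Count scene features whose nearest template descriptor is much
    closer than the second-nearest (a Lowe-style ratio test)."""
    # Pairwise Euclidean distances, shape (n_scene, n_template).
    d = np.linalg.norm(scene_desc[:, None, :] - templ_desc[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    rows = np.arange(len(scene_desc))
    best = d[rows, order[:, 0]]
    second = d[rows, order[:, 1]]
    return int(np.sum(best < ratio * second))

def identify(scene_desc, templates, min_matches=10):
    """Return the name of the template with the most matches, or None
    if no template accumulates enough matches (no target in view)."""
    counts = {name: match_count(scene_desc, t) for name, t in templates.items()}
    name, n = max(counts.items(), key=lambda kv: kv[1])
    return name if n >= min_matches else None
```

The rejection branch is what lets the system report that the captured view contains no known target rather than forcing a best guess.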
Pose Estimation: After identifying the in-view object, our system estimates its position and orientation, namely its 3D pose, in space, so that augmented information can be rendered properly. We achieve this by constructing a point cloud of the scene and aligning the identified object's template point cloud with it. Many algorithms exist for point cloud alignment; we adopt the widely-used Iterative Closest Point (ICP) algorithm [4] in our system, since it usually finds a good alignment quickly. To obtain better pose estimation results,
especially for non-symmetric objects (e.g., a mug), a template object usually contains point clouds from multiple viewpoints. Yet, the performance of ICP relies on the quality of the initialization. Our system finds a good initial pose by moving a template object's point cloud to the 3D position that is back-projected from the centroid of the matched local feature coordinates in the scene image. The coordinates of correctly matched local features are the 2D projections of target object surface points, so back-projecting their centroid should return a 3D point close to the target object's surface. After initializing the pose of the template point cloud, our system refines it using ICP. Finally, the estimated pose can be represented as a 4 × 4 matrix, M_pose = M_ini M_icp, where M_ini is the transformation matrix for pose initialization, and M_icp is the transformation matrix for pose refinement using ICP. All the transformation matrices are of the form

    M = [ R  t ]
        [ 0  1 ]

where R is a 3×3 matrix representing rotation, and t is a 3×1 vector representing translation. Related details are given in [11].

B. RFID Sensing

An office space already equipped with RFID infrastructure is used as the tagged environment for the experiments in this study. The space is set up with an Impinj Speedway Revolution RFID reader connected to multiple circularly polarized Laird antennas with a gain of 8.5 dB. The reader system broadcasts at the FCC maximum of 36 dBm EIRP. For the tag-sensors, we use Smartrac's paper RFID tags with the Monza 5 IC as the backscattered-signal based water level sensors, and custom-designed tags with the EM4325 IC as the temperature sensors. We use the Low Level Reader Protocol (LLRP), implemented over the Sllurp Python library, to interface with the RFID readers and collect the tag data.

Purely-passive or semi-passive tags can be designed to sense multiple physical attributes and environmental conditions. One approach is based on the tag antenna's response to a changed environment as a result of a sensing event. A change in the signal power or response frequency of the RFID tag due to this antenna impedance shift can be attributed to sensing events such as a temperature rise [24], the presence of a gas concentration [19], soil moisture [13], etc. Another approach is to use the IC's on-board sensors, or external sensors interfaced with its GPIOs [7]. In this study, we use both the antenna impedance shift approach, to detect water level, and the IC's on-board temperature sensor, to detect the real-time temperature in a coffee cup.

Fig. 3. Tag-sensors for sensing object properties. Top row: water level sensing using paper tags; bottom row: temperature sensing using a custom tag with the EM4325 IC.

Water Level Sensing: The water-level sensor works on the concept of relating the detuning of the tag's antenna to the presence of water in the neighborhood of the tag. In this study, we used tags as water-level sensors on common household/office objects such as a paper coffee cup, a ceramic mug, and a plastic bottle. In the empty state, the background dielectric for the tags is air, and therefore the backscattered signal strength from the tags is at its maximum. When the mug contains water, the antenna is significantly detuned due to the change in background dielectric, and as a result the tag becomes unresponsive. However, when the mug is emptied, the tag can be read again, indicating an empty cup. We build on this concept to detect discrete levels of water in the container by using three tags to define states such as empty, low, mid, and high (illustrated in Table I; ✓ marks a responsive tag, ✗ an unresponsive one). Fig. 3(a) shows the level sensor labels implemented on a standard ceramic coffee mug.

TABLE I
WATER LEVEL INDICATION

    Status   A  B  C
    Empty    ✗  ✓  ✓
    Middle   ✗  ✗  ✓
    Full     ✗  ✗  ✗

Temperature Sensing: The temperature sensor is implemented using EM Microelectronic's EM4325, an RFID IC with an on-board temperature sensor. Fig. 3(b) shows a T-match antenna with the EM4325 IC and a button-cell battery implemented as a temperature sensor on a standard coffee cup. Temperature measurements from this IC can be made in both passive and semi-passive modes. In the passive mode, the tag has to be in range of a reader antenna. In the semi-passive mode, the battery or an external energy source keeps the IC on. The IC's temperature measurement is triggered by writing any value into a specific word of the user memory bank (memory bank 3, word pointer 256). The IC updates this word with the measured temperature from its on-board sensor; reading the word back then yields the current temperature. We have implemented this using the Sllurp library. Real-time temperature sensing is possible with this IC within −64 °C to +64 °C.
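The trigger-and-read sequence just described can be sketched as follows. This is a minimal illustration, not the actual Sllurp code: `write_word` and `read_word` are hypothetical callables standing in for the reader's Gen2 Write and Read operations, and the IC-specific conversion from the raw word to degrees Celsius is omitted.

```python
USER_BANK = 3        # EPC Gen2 user memory bank
TEMP_WORDPTR = 256   # word the EM4325 overwrites with its measurement

def read_temperature_word(write_word, read_word):
    """Trigger one EM4325 temperature measurement and return the raw word.

    write_word(bank, wordptr, value) and read_word(bank, wordptr) are
    assumed to perform Gen2 Write and Read on the tag (e.g., issued as
    LLRP access operations by the reader library).
    """
    write_word(USER_BANK, TEMP_WORDPTR, 0x0000)  # any value triggers sampling
    return read_word(USER_BANK, TEMP_WORDPTR)    # IC has replaced the word
```

In the semi-passive mode the same sequence applies; the battery only keeps the IC powered between interrogations.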
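The mapping from tag responsiveness to a discrete water level in Table I amounts to a small lookup. A sketch, under the assumption that ✓ in Table I means the tag answered during a read window (names are ours, not from the system's code):

```python
# Decode the discrete water level from which tags responded (Table I).
# Key order is (A, B, C); True means the tag answered during a read window.
WATER_LEVELS = {
    (False, True,  True):  "empty",
    (False, False, True):  "middle",
    (False, False, False): "full",
}

def water_level(a, b, c):
    """Return the level for this response pattern, or None for patterns
    not listed in Table I (e.g., a transient missed read)."""
    return WATER_LEVELS.get((a, b, c))
```

Returning None for unlisted patterns lets the caller skip a noisy read cycle instead of rendering a wrong level.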
C. Augmented Visualization
After obtaining the target object's identity, pose, and physical properties, the system superimposes augmented information (e.g., a CAD model) onto the object. Since the object's 3D pose is estimated in the depth camera coordinate system, a series of transformations is needed to obtain the 3D pose in the world coordinate system, which the HoloLens rendering system requires. Our system computes the transformation using:
    M_pose^world = T_HoloLens^world T_depcam^HoloLens M_pose^depcam

where M_pose^depcam and M_pose^world are the 3D poses in the depth camera and world coordinate systems, respectively, T_depcam^HoloLens maps the pose from the depth camera coordinate system to the HoloLens coordinate system, and T_HoloLens^world maps the pose from the HoloLens coordinate system to the world coordinate system. All the transformation matrices are in the same format as those described for pose estimation.

Fig. 4. Water level sensing results: (a) shows the HoloLens-rendered results before and after water is added into a tagged mug; (b) shows multiple objects with different detected water levels.
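The chain of 4 × 4 homogeneous transforms above composes by plain matrix multiplication. A minimal numpy sketch (function and variable names are ours, not from the system's code):

```python
import numpy as np

def make_T(R, t):
    """Assemble a 4x4 homogeneous transform [R t; 0 1]."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def pose_to_world(T_world_holo, T_holo_cam, M_pose_cam):
    """Map an object pose from depth-camera coordinates to world coordinates."""
    return T_world_holo @ T_holo_cam @ M_pose_cam
```

Each factor has the block form given in Section III-A, so the product is again a rotation plus a translation and can be handed directly to the renderer.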
IV. EVALUATIONS
A. Sensing Results Visualization
We first test our system’s performance on both water level
sensing and temperature sensing. A user’s augmented views
are recorded and shown to demonstrate the effectiveness of
the proposed system.
We present water level sensing results for both the single-object and multiple-object cases (Figure 4). The system projects 3D CAD models of identified objects into the scene according to their estimated poses. The color of a projected 3D model changes with height to reflect the different water levels. As can be observed, our system properly aligns the 3D models to the corresponding target objects, even for non-symmetric shapes (e.g., a mug). The detected discrete water levels (empty, low, mid, high) also match the actual water levels in our experiments.

Fig. 5. A sequence of temperature sensing results after hot water is added. The temperature change after adding hot water results from the temperature sensor latency.

Temperature sensing results are shown in Figure 5. These results are selected from a recorded video clip covering the whole temperature changing process after hot water is poured into a tagged cup. In the beginning, the tag-sensor reports room temperature for the empty cup. After hot water is added, the tag-sensor reports the water temperature (and intermediate temperatures). Temperatures are rendered using the color code shown on the right of Figure 5. Our system shows visually appealing results.

B. Pose Estimation Evaluation

We evaluate the pose estimation pipeline used in the system by considering recognition accuracy, pose estimation quality
TABLE III
POINT-TO-POINT RESIDUAL ERROR