Warehouse Env
All content following this page was uploaded by Ibrahim Shaer on 28 August 2023.
Abstract—On a global scale, buildings account for a significant portion of energy consumption and CO2 emissions, contributing to inclement environmental conditions. The Heating, Ventilation, and Air Conditioning (HVAC) system is an important energy consumer in buildings, which can be controlled to reduce overall energy consumption. This paper provides an overview of the HVAC control problem for energy reduction in warehouses. In particular, this paper first introduces the enabling technologies for HVAC control. After that, an extensive explanation of the warehouse environment is provided. Issues such as multi-zone spaces, occupancy profiling, and ambient environmental conditions are highlighted in connection to energy consumption. Next, a survey of traditional solutions is discussed, followed by the proposed solution of incorporating Reinforcement Learning (RL) and Supervised Learning techniques. After that, an in-depth summary of the challenges associated with the implementation of these techniques is presented. Lastly, a use case study is conducted to demonstrate the outlined challenges using a dataset collected in a real-world environment.

Index Terms—Energy Consumption Reduction, Warehouse HVAC systems, Use Case Study, Reinforcement Learning, Supervised Learning

I. INTRODUCTION

The total energy used in commercial buildings and industrial facilities accounts for around 40% and 36% of the global energy consumption, respectively [1, 2]. With the expansion of industrial activities to new geographical and technological territories, these numbers are expected to increase exponentially. In this regard, there is a strong correlation between the proliferation of industrial activities that use different carbon-based fuels to power their processes and the prominence of greenhouse effects [3]. These effects have resulted in dire consequences on the Earth’s ecosystem, manifested by the accelerated loss of Arctic ice, the rise of sea levels, and frequently inclement weather conditions [4]. Such conditions herald drastic climate changes that have long-lasting effects on the economy, environment, and daily life.

Many governments have implemented aggressive energy legislation to curb the free fall of the Earth’s ecosystem. The new policies are geared towards providing monetary incentives for industries that manage to limit their carbon footprint by relying on alternative sustainable sources of energy, such as wind and solar energy [5, 6]. In the face of these increased monetary costs and the volatility of the oil price market, operators of industrial buildings are compelled to utilize renewable sources of energy and optimize their energy expenditure.

The introduction of renewable energy disrupts the conventional production pipelines in factories that are heavily dependent on carbon-based fuels [7]. A gradual induction of these sources into the industrial infrastructure is favourable given the operational expenses associated with the complete replacement of traditional energy sources [8]. The integration of renewable sources of energy to power industrial building activities is challenged by many factors. On the one hand, the storage medium of renewable sources of energy is prone to gradual degradation and inefficiencies due to charging and discharging cycles [9]. On the other hand, renewable energy exhibits extreme volatility due to its dependence on natural conditions that can be highly unpredictable [10].

To complement the benefits of the partial integration of renewable energy, industrial building operators can optimize their energy consumption to decrease their energy bills. A common controllable energy consumer for all types of buildings is the Heating, Ventilation, and Air Conditioning (HVAC) system. In manufacturing buildings, HVAC systems are used to control the air quality to ensure the workers’ safety. For warehouses, the temperature, humidity, and particulate control achieved through HVAC systems is pertinent to their functioning. HVAC systems account for about 30% of the total energy consumption in warehouses [11]. Therefore, this energy footprint defines a sizable opportunity for technical solutions to optimize warehouse energy consumption.

While thermal comfort is integral for residential buildings, it is less of a pressing issue in warehouses. The HVAC control’s primary goal is to preserve the recommended indoor climate conditions to avoid inventory spoilage [12]. An essential part of the refrigerated inventory is dedicated to the food sector, which consumes about 72% of the total energy, whereby fossil fuels account for almost 79% of the energy consumed [13]. Every year, almost $35 billion is estimated to be lost in perishable item value worldwide due to spoilage, a condition that is detrimental to industries’ budgets and the Earth’s ecosystem. These conditions can be addressed by proper HVAC control [14]. The optimal control of HVAC systems in warehouses gains importance on three different levels: the social and environmental levels achieved through curbing greenhouse emissions, the legal level by following the regulatory bodies, and the financial level by limiting energy consumption.

Slashing the costs of HVAC systems in warehouses while maintaining the environmental requirements of the inventory
MANUSCRIPT PUBLISHED IN ELSEVIER INTERNET OF THINGS 2
is challenging. The uncertainty of renewable energy sources, the structural limitations of warehouses, and the human factor represented by occupancy patterns are all salient impediments that contribute to the operation of HVAC systems, and consequently, the energy costs of warehouses. Therefore, these challenges must be considered when designing and implementing an energy optimization strategy.

The warehouse operators introduced different methods into their systems to address each challenge. In particular, methods such as scheduled control, reactive control, and model predictive control are the most popular. These methods suffer from some limitations that undermine their utility in the dynamic warehouse environment. For instance, scheduled control degrades upon a sudden change in occupancy and renewable energy patterns. The reactive control has a myopic view of the environment, which prioritizes short-term environment variability over long-term goals. Lastly, Model Predictive Control (MPC) requires models that encompass most of the factors that contribute to the warehouse environment, a time- and resource-intensive endeavour. To address all of the limiting characteristics of traditional approaches, this work proposes Reinforcement Learning (RL) and Supervised Learning to maintain the environmental requirements of warehouses while reducing energy consumption. The integration of RL and Deep Learning (DL) techniques enables the high-dimensional mapping of the warehouse environment and addresses the long-term concerns of HVAC systems through its reward function formulation. Supervised Learning facilitates the decision-making process of RL by modelling and predicting the future values of environmental conditions.

The contributions of this paper are as follows:
• Detail the physical phenomena taking place in a warehouse environment;
• Discuss the traditional data-scarce methods leveraged for HVAC control and their limitations in a warehouse environment;
• Motivate the utilization of Reinforcement Learning combined with Supervised Learning methods as a replacement for traditional methods and highlight the challenges of their implementation in the warehouse environment;
• Conduct a case study that implements supervised learning techniques to predict occupant proxies using a dataset collected in a real-world setting; and,
• Analyze the prediction results in connection to the multi-zone and air diffusion challenges in the warehouse environment.

This paper is the first to provide an overview of the energy consumption issue in warehouses. It first introduces the core principles and technologies that are foundational to the energy optimization goal, which are explained in Section II. After that, the main challenges associated with the warehouse environment are explained in Section III. Towards the goal of energy optimization, Section IV discusses the traditional methods that are currently implemented. Section V explains the role of supervised learning and reinforcement learning in fulfilling the goal of energy reduction. Section VI details the challenges and solutions of supervised learning implementations. Similarly, Section VII explains the practical hurdles of reinforcement learning implementation. Section VIII presents a use case study to showcase the outlined challenges using a dataset collected in a real-world environment. Lastly, Section IX concludes the paper.

II. BACKGROUND

This section examines background information on the enablers for implementing an energy optimization strategy. These enablers include the Industrial Internet of Things (IIoT) devices that are connected using the communication capabilities of Wireless Sensor Networks (WSNs). Renewable energy sources are deployed to gradually replace carbon-based fuels in powering the warehouse. The Building Energy Management System (BEMS) can be developed to better manage the energy consumption of warehouses.

A. Industrial Internet of Things

The emergence of the Industry 4.0 concept is facilitated by the desire for the materialization of the inter-connected smart industry [15]. The envisioned transformation is spearheaded by integrating Internet of Things (IoT) technologies that enable the autonomous sensing, automation, and control of different operations in industrial complexes. Together, these technologies are referred to as the Industrial Internet of Things (IIoT). The inter-connectivity, storage, and computing capabilities of IIoT devices enable different industrial processes to extend their existing applications and envision new ways of operation. In particular, these technologies are responsible for creating a representative network of information on industrial processes that shapes the decision-making capabilities of IIoT systems. An example of an IIoT use case in a warehouse would be a sensor mounted on the warehouse entrance to identify a person entering or leaving the facility. This information collected over a time horizon can profile the occupancy in warehouses.

A drop-off in the costs and sizes of IIoT gateways and an increase in their processing power facilitate monitoring and analyzing information from different sources. With regard to energy consumption, the Annual Energy Outlook report published by the International Energy Agency recommends the adoption of next-generation sensors and control technologies, which can reduce energy costs annually by almost $18 billion [16]. As such, the integration of IIoT technologies is fundamental for the realization of efficient energy optimization strategies.

B. Wireless Sensor Networks

Wireless Sensor Networks (WSNs) are one of the foundational units in IIoT technology. A WSN is a group of spatially dispersed sensor nodes interconnected using wireless communication [17]. The battery-powered sensor node is formed of a processor, a storage unit, and a group of sensors. Its principal function is to capture the ambient conditions’ variations and convert them into electric signals processed by the node’s processor [17].

A favourable property of these sensors is their ability to sense when deployed far from the target phenomenon. To
monitor a phenomenon of interest, a host of sensors can be randomly deployed so that they capture every aspect of the bespoke phenomena. Such a deployment strategy eliminates the need for cables and complex wiring and promotes the flexible deployment of IIoT devices in remote and harsh locations [18]. The processing capabilities of each sensor enable pre-processing of the collected data so that only the filtered and useful data are sent to the data fusion sites.

The aforementioned harsh conditions are frequently encountered in warehouses. A use case of WSNs in warehouses is monitoring the ascent of high-temperature air. Towards that end, sensors are expected to be mounted on the warehouse’s high ceiling, which is a human-unreachable area.

C. Deployment of Renewable Sources of Energy

The integration of renewable energy sources into the warehouses’ electric grid should account for the warehouse environment and the cost-payoff trade-off. Photo-Voltaic (PV) panels, wind energy, biomass, and electromagnetic field utilization are considered prime candidates for supplying renewable energy. Compared to other renewable energy sources such as wind turbines and biomass, the characteristics of PV panels mitigate the constraints imposed by the warehouse environment. The warehouse rooftops provide a suitable medium for their placement because of the space available and the low probability of sunlight blockage. These conditions are favourable for on-site energy generation. Additionally, the applicability of this setup has been proven by recent implementations in large refrigerated warehouses [14].

D. Building Energy Management System

A Building Energy Management System (BEMS) is an entity responsible for controlling and monitoring loads of different electrical and mechanical entities inside a building. This overarching role can reduce the energy needed for illuminating, heating, and ventilating a building [19]. In the realm of HVAC control, BEMS handles its main components, including Air Handling Units (AHUs), chillers, and heating.

The connectivity and monitoring capabilities of IIoT devices and WSNs have driven the integration of IoT technology solutions into the BEMS. Moreover, the vast and heterogeneous data available and the sensing and control capabilities have necessitated the integration of computational intelligence into BEMS. Machine Learning techniques are suitable for achieving the main goals of BEMS in terms of efficient energy consumption, its integration with the smart grid technology, and its resilience to any updates of sensory data [20]. Furthermore, the migration to IP-based networking has enabled the remote monitoring of energy consumption by a centralized entity, promoted by the emergence of user-friendly cloud-based software-as-a-service applications [19].

E. Digital Twin

Grieves et al. [21] introduced the concept of Digital Twin (DT), defined as a virtual blueprint containing information about a physical product. In its nascent stages, the concept of Digital Twin was meant to mirror processes occurring in product life-cycle management [21]. At its core, this concept consists of three components: the physical entity, the virtual entity, and the bi-directional data connections. The physical entity is defined as the real-world existence of an entity. The real-world landscape within which the physical entity exists is defined as the physical environment. This environment encompasses all the factors that can contribute to altering the physical entities. On the other hand, the virtual entity is defined as the digitized representation of the physical entity. Different virtual processes such as optimization, analysis, and predictions are realized on the virtual entity’s end. The virtual environment mirrors the physical environment constructed using IIoT devices. The data connections are broken down into physical-to-virtual and virtual-to-physical connections. Technologies such as IIoT devices and WSNs continuously relay the physical environment to the virtual environment. The rate of this exchange is referred to as the twinning rate [22]. The flow of information obtained from virtual processes to change the state of the physical entity and its environment is materialized using the virtual-to-physical connection. Through its actuators, the BEMS can realize this connection, which bridges the gap between the hypotheses built in the virtual environment and the feedback obtained with their implementation in the physical environment [22].

Projecting these definitions to the studied environment is straightforward. Here, the physical entity represents the HVAC systems in a warehouse environment. The warehouse environment represents the real-world blueprint, which, in the DT terminology, is defined as the physical environment. The virtual entity is connected to the main purposes for constructing the DT. In the context of HVAC systems, these purposes can include predictive modelling of some environment-specific parameters, changing HVAC setpoints, and scheduling these systems. Lastly, the virtual environment encompasses all the factors deemed necessary to aid the virtual entity in achieving its main purposes.

III. PHYSICAL PHENOMENA IN WAREHOUSES

Warehouses are characterized by their spacious areas that store the goods and merchandise of a host of big corporations. The vast spaces and the warehouses’ architecture complicate indoor climate control, resulting in many challenges to the goal of energy optimization. Each of these challenges is explained in the following subsections as they represent the physical environment in the DT terminology.

A. Large Air Leaks

The common leak points in warehouses include large doors and windows that, when opened, introduce an outdoor airflow that disrupts the internal climate conditions [23]. The extent of this disruption depends on the outdoor thermal conditions, which can vary based on the warehouse location that dictates the ambient climate. Warehouses may encounter drastic changes in internal thermal conditions in scenarios where shipping doors are opened and closed to fulfil frequent deliveries. Therefore, any operation with dock doors faces
In contrast to office, manufacturing, and residential buildings with fixed or predictable occupancy schedules linked to regular working shifts (8-5), workers’ occupancy profile in warehouses displays high volatility.

The working hours in warehouses extend to 12-hour shifts, exceeding the expected number of hours in other types of buildings [33]. These prolonged shifts occur at irregular hours due to the all-year-round operations of warehouses. Warehouses expect shipments at different hours of the day as they represent a hub connecting various economies and supply chains. As such, workers’ presence is needed to fulfil these deliveries. Data have revealed that half of the warehouse labour hours in North America are spent searching for products in warehouses [34]. The walking distances and the effort exerted in each part of the warehouse disrupt the stability of thermal conditions inside warehouses.

Warehouse workers of big corporations such as Amazon Inc. are challenged by mandatory overtime that is caused by whimsical patterns of consumption [33] and labour shortages [35]. Social media trends and unexpected events, such as global pandemics, are the main contributors to the radical changes in consumption [36]. The high demand for some products results in the volatility of occupancy in some warehouse zones. These patterns are expected in warehouses with diverse products that cater to a large customer base, implying frequent warehouse activities. Receiving and distributing the goods, order picking, and shipping [34] are among the most prevalent activities in warehouses. For HVAC systems, this volatility translates into the reactive operation of HVAC, contributing to the increase in energy consumption.

The unpredictable patterns of consumption were accentuated by the COVID-19 pandemic, which forced people to resort to online shopping to avoid contracting the virus in retail stores [36]. The pandemic has also introduced the new concept of social distancing, which enforced approaches to track and organize occupancy in closed areas to avoid spreading the COVID-19 virus. Warehouses are heavily involved in this organizational shift that has implicated the supply chains [37]. The insights gathered from applying social distancing show its contribution to the decline of influenza [38]. Warehouse owners can leverage such insights to limit the future exposure of their workers to flu and address one of the many health hazards associated with warehouse jobs. In conclusion, many factors that contribute to the uncertainty of warehouse occupancy should be considered to maintain the indoor climate of each warehouse zone.

large cities or urban concentrations determines the construction of new warehouses. These conditions along with the brand of satisfying customer needs contribute to the distribution of warehouses associated with specific industries over a large geographical area. The dispersion of warehouses is favourable for profitable operations.

The consequences of distributed warehouses operating at continental or national levels are numerous for creating global warehouse HVAC systems. From the structural perspective, warehouse dispersion translates to building warehouses that follow the local area regulations. These pre-conditions are set to adapt the warehouses to the dominant climate [40]. The unique regulations create diverse indoor climate profiles when exposed to outdoor environmental changes [41]. With the dispersion of warehouses over the European Union, China, the USA, and Canada, an array of climates can be encountered for warehouses operated by one company. Climates exhibit distinctive characteristics that are inherently integrated by the overarching governmental regulations, which naturally affect the HVAC systems. In that regard, the U.S. Department of Energy (DOE) recommends specific envelope, lighting, and HVAC systems for warehouses in each of its 8 defined climates [41]. When one industry or company is involved, the structural differences of warehouses invoke the need for approaches tailored to the ambient climatic environment.

The following example illustrates the effect of the dominant climate on the indoor climate. When the warehouse door is opened, a warehouse in the subtropical climate of Austin, Texas, located in the Southern U.S., experiences distinct changes in internal temperature compared to a warehouse in the humid continental climate of Detroit, located in the Northern U.S. This example showcases the effect of climate and seasons on the indoor environmental conditions and highlights the grave effects of the large doors on these conditions, as explained in the Air Leaks challenge. Seifhashemi et al. [40] studied the effect of cool rooftops on energy saving in different Australian climates. The results suggest drastic differences between cool and warm temperatures, which emphasizes the effects of climate on the HVAC systems in warehouses. Therefore, their integration is paramount for the reduction of energy consumption in warehouses. A summary of warehouses’ challenges is presented in Table I.
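The door-opening example in the climate discussion can be made concrete with a small numerical sketch. It assumes a well-mixed zone in which a fixed fraction of the indoor air is replaced by outdoor air each time a dock door opens. This mixing model, the function names, and all numeric values (setpoint, exchange fraction, outdoor temperatures) are illustrative assumptions and are not taken from the paper or its dataset.

```python
def indoor_temp_after_door_opening(t_indoor_c, t_outdoor_c, exchange_fraction):
    """Well-mixed estimate of zone temperature after a door event.

    exchange_fraction is the (assumed) share of indoor air replaced by
    outdoor air while the door is open, between 0 and 1.
    """
    if not 0.0 <= exchange_fraction <= 1.0:
        raise ValueError("exchange_fraction must be between 0 and 1")
    return (1.0 - exchange_fraction) * t_indoor_c + exchange_fraction * t_outdoor_c


# Hypothetical values, chosen only for illustration.
SETPOINT_C = 18.0          # assumed indoor setpoint
EXCHANGE_FRACTION = 0.2    # assumed share of air exchanged per door event


def deviation_from_setpoint(t_outdoor_c):
    """Temperature shift (in °C) caused by one door event at the setpoint."""
    mixed = indoor_temp_after_door_opening(SETPOINT_C, t_outdoor_c, EXCHANGE_FRACTION)
    return mixed - SETPOINT_C


# Two assumed outdoor conditions standing in for a warm and a cold climate.
outdoor_examples_c = {"warm-climate site": 35.0, "cold-climate site": -5.0}

for label, t_out in outdoor_examples_c.items():
    print(f"{label}: indoor temperature shifts by {deviation_from_setpoint(t_out):+.1f} °C")
```

Under these assumed values, the same door event pushes the zone about 3.4 °C above the setpoint at the warm site and about 4.6 °C below it at the cold site, so the HVAC system must respond in opposite directions and with different effort depending on the ambient climate.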
IV. DATA-DEPRIVED TRADITIONAL SOLUTIONS

TABLE I: Summary of warehouses’ challenges
Multi-zone Space [29, 30, 31] | Fulfillment of supply chain requirements and diversifying the goods’ portfolio | Multiple models should be created to account for each zone
Occupancy Profile [32, 33] | – Fulfillment of deliveries throughout the day; – Emergence of E-commerce and the effect of unexpected events on the supply chain | Occupancy profiling experiences large drifts and diverges from residential occupancy models
Environmental Conditions [40, 41] | Dispersion of warehouses to adapt to accelerated economic changes | Outdoor airflow effects drastically differ between warehouses of the same industry
A. Programmable and Scheduled Control

Programmable and scheduled control allows regular occupants to either manually change the HVAC settings or input fixed occupancy schedules to aid the operation of HVAC systems. Such an approach is common in commercial buildings, whereby the occupancy profiles can be deterministically quantified [42]. In this case, the building operators are responsible for defining the setback or setpoint conditions. The conditions refer to the requirements for controlling a specific space’s indoor climate. The setback conditions are the minimum acceptable requirements for a space when no occupants are expected, which are defined to conserve energy. On the other hand, the setpoint conditions are the levels of conditioning that need to be attained. The manual calibration of thermostats is prevalent in residential houses, allowing the adjustment of internal temperatures based on occupants’ level of comfort.

The literature provides many examples of scheduled control, also referred to as intermittent control, to replace the continuous operation of HVAC systems with the goal of reducing energy consumption. This method was evaluated under different conditions of building structures and climates and varied in complexity and depth of analysis. Works such as [43, 44] evaluated the utility of this approach in small house and office spaces, whereby energy savings of 5% and 30% are reported, respectively. More profound approaches incorporated intermittent control as part of a broader analysis such as the selection of an insulation layer [45], peak energy shifting [46], and selecting the best schedule control strategy [47, 48]. All of these methods displayed an impressive energy-saving potential, gauged at around 42% for [45], in the range of 14-29% and 18-43% for [46, 47], respectively, and 17% for [48].

The implications of such an approach in warehouses are manifold. The unpredictability of workers’ schedules in warehouses causes a discrepancy between the actual and the pre-defined occupancy profiles. The incompatibility in schedules can cause the HVAC systems to operate unnecessarily, contributing to energy wastage. In a similar vein, the manual calibration of thermostats by workers entering the warehouses has a similar effect. The workers are compelled to change the setpoints due to the drastic differences between indoor and outdoor environmental conditions. However, these changes do not account for the indoor climate requirements needed to preserve the freshness of the inventory. The confluent effects of this approach render the scheduled control unsuitable for the warehouse environment because it is likely to violate its thermal constraints.

B. Reactive Control

This approach addresses the limitations of the Programmable and Scheduled Control by reacting to any occupancy changes or by setting thresholds that trigger specific changes in HVAC systems, known as the rule-based approach. This method exploits the occupancy profiles built using multiple sensors to react to any occupancy changes. Due to its reactive nature, the HVAC system is triggered upon workers’ arrival to transition from its current setback environmental conditions to its setpoints. Under these definitions, the control method represents the virtual processes that are fed with the sensor data that mirrors the physical environment. In what follows, the focus will be on the temperature setpoints as they represent the most calibrated environmental condition.

The literature adopting reactive approaches for HVAC control differs depending on the occupancy-detection strategy. A common method is the integration of different sensors such as Passive Infrared (PIR) sensors [30, 49, 50], and occupancy counting and presence methods in the works of [51, 52], into the decision-making processes of HVAC systems. Compared to manual and scheduled control, the applied methodologies achieved remarkable energy savings of 42% for [52], between 2% and 48% in different climates [51], 20-30% in unoccupied periods in [30], north of 54% for [49], and around 28% for [50].

The seasonal changes result in noticeable effects on the recovery time between setback and setpoint temperatures. The authors in [53] studied the lag time between the setback and setpoint temperatures in residential houses during the winter
and summer seasons. The lag time can also be referred to as the pre-conditioning time [54]. The study reported that the estimated lag time is between 2.25 hours and 6 hours. These intervals are widened in warehouses that are more spacious than residential houses.

The gradual degradation of the thermal conditions from setpoint to setback temperature or vice versa, upon occupancy changes, can spoil the inventory because of the fluctuation in the inventory’s ambient temperature. Furthermore, the reactive approach does not integrate the long-term occupancy and energy price changes but is more fixated on the instantaneous changes. Such a myopic view of the environment results in HVAC operations of high energy consumption given the unpredictability of occupancy behaviour in warehouses.

C. Model Predictive Control

Optimal control approaches are proposed to address the shortcomings of reactive control, especially those related to overlooking the long-term implications of any control action. These approaches depend on two pillars: accurate modelling of the building’s thermal conditions and predicting stochastic factors. However, the two pillars were not addressed in conjunction. Therefore, the analysis will explain each factor separately and then elaborate on their combined effect in a warehouse environment. Both modelling methods fall under the virtual processes when adopting Model Predictive Control (MPC).

Creating an accurate physical model encompassing all the factors contributing to a building’s thermodynamics is challenging. Building-related factors such as the structure, material, and buildings’ decay and internal factors such as lighting and occupancy affect the thermal conditions of buildings [55]. However, additional factors such as the thermal insulation and flows from each zone exist in a warehouse setting. Given that many factors are involved, devising a mathematical model that explains the buildings’ thermodynamics is a time- and resource-intensive task [56]. If one of these models is available, its accuracy is expected to degrade with time, especially with the changes that the warehouse structure undergoes [57]. Thermal insulation degradation and changes in internal zones’ distribution because of renovations are two prominent examples of such changes.

The stochastic factors involved in an HVAC system encompass occupancy behaviour. While occupancy profiling is not straightforward, several approaches have been applied to predict occupancy patterns within a specific time horizon. Defining this horizon is predicated on the pre-conditioning time that depends on the accurate modelling of the buildings’ thermal dynamics and occupancy profile. Therefore, any accurate HVAC control relies on the accurate prediction of these factors [58]. Given the occupancy models’ stochasticity and

requirements and the energy consumption of HVAC systems. In that regard, the contributions to this field target the predictive component, the control component, or the changes in ambient circumstances. For example, Xu et al. [60] incorporate the climate and building information to predict the warm-up time using MPC for a building’s heating system, contributing to 20% energy savings. The work in [61] capitalized on the insights of the previous work by investigating the performance of MPC in a range of climates and outdoor weather conditions, with the intention of evaluating the climatic conditions that can provide the most energy savings. On average, a 10% reduction in energy consumption is reported that varies with climatic conditions. The work by Wang et al. [62] expanded the MPC implementation to multiple zones with different requirements and thermal dynamics. Energy savings of 24% are reported compared to methods without HVAC control. On another note, in [63], MPC was utilized to control HVAC systems, mechanical ventilation, and lighting, resulting in energy consumption reduction in the range of 15-20% compared to reactive control.

The gradual degradation of the MPC methods’ models limits their applicability in a highly dynamic warehouse environment. The shortcomings of the MPC approach are not limited to modelling only. Solving MPC problems is time-consuming and resource-intensive, hindering their deployment on IIoT gateways of limited resources and their applicability in real-time scenarios given the dynamic warehouse environment [55, 57]. The models built to reflect the thermal dynamics are specific to the environment or zone they were built for. The implications of this supposition are two-fold. First, the developed models cannot be applied to other zones or buildings, whereas, for a warehouse setting, the models should be easily transferable to different zones or warehouses. Second, the geographical dispersion of warehouses prevents the use of a “one-size-fits-all” model of the environmental conditions of warehouses given the differences in structural regulations in each geographical location. These factors are critical in devising proper HVAC control in warehouses.

In summary, the traditional methods suffer from three salient limitations that undermine their applicability to reducing energy consumption while maintaining the indoor climate in warehouses. The programmable and scheduled control is oblivious to the dynamicity of the occupancy profile in warehouses. The reactive control is more concerned about short-term goals by reacting instantly to any occupancy changes, which disregards the implications of these myopic decisions on the long-term goals of the energy optimization strategy. This effect is more profound considering the fluctuations in energy prices, occupancy, and weather conditions. As for the MPC method, mathematical models developed to mirror the building’s thermodynamics are prone to degradation due to changing conditions. A summary of different methods and
the wear and tear of warehouses, the model defining the pre- their respective pros and cons are provided in Table II.
conditioning either drifts from the actual models or needs to
be constantly calibrated.
V. P OTENTIAL ML-D RIVEN S OLUTIONS
Model Predictive Control (MPC) represents the conver-
gences of the thermodynamics modelling and the modelling of This section discusses the technical requirements of the
the stochastic factors [59]. The literature provides ample exam- warehouse environment and, accordingly, outlines the method-
ples of MPC implementations that jointly consider the thermal ologies that can fulfil these requirements. These solutions
MANUSCRIPT PUBLISHED IN ELSEVIER INTERNET OF THINGS 8
represent the methods that are part of the virtual processes in the realm of DT.
A. Warehouse Requirements
The warehouse HVAC requirements must be translated to constraints for any proposed HVAC control methodologies. To begin, the warehouse environment experiences high dynamicity in occupancy behaviour, renewable energy generation output, outdoor environmental conditions, and electricity prices. Therefore, the controllers’ ambient environment is unknown, which eliminates the utility of physical modelling encompassing all of these conditions [64]. However, these factors should be incorporated into warehouse HVAC operations to preserve indoor climate conditions. Predicting these values within a pre-defined time window is a viable method to mitigate their associated uncertainty [64].
The HVAC control decisions should account for their implications on future decisions, given the decisions’ temporal correlations [7]. For example, activating the HVAC system to full capacity for precooling purposes leads to an immediate spike in consumption and costs; however, in the long term, this action might prove to be useful. The reaped benefits originate from the probable increase in occupancy or electricity prices compared to current conditions. As such, proper HVAC control necessitates balancing the long-term and short-term goals in its decision-making processes.
B. Candidate Solutions
Facilitated by the widespread deployment of IoT sensors, the ubiquity of data, and powerful computing, data-driven models have proven their merit in reducing buildings’ energy consumption [27]. These factors, along with the control and actuation capabilities provided by IIoT devices and BEMS, bolster Machine Learning (ML) techniques as solutions to concerns related to energy efficiency in buildings [20] and in warehouses in particular. ML is a data-driven method that fits into three disciplines: supervised learning, unsupervised learning, and reinforcement learning (RL). Given the need to evaluate any energy consumption methodology and the outlined tasks of predicting important factors and controlling HVAC systems, unsupervised learning is eliminated as a candidate method. Therefore, supervised learning and RL techniques are employed for the energy optimization goal in warehouses. The following subsections explain each of these techniques and their contribution to the problem under study.
1) Supervised Learning: Supervised learning is concerned with learning accurate predictions for a set of labelled data by modelling the relationship between a set of variables, denoted by features or predictors, and the output variable of interest. Depending on the type of output variable, the supervised learning task can be either a classification or regression task. With supervised learning, the predictors and output variables, which serve as ground-truth data, are always available for training ML algorithms.
In regard to HVAC control, supervised learning can be leveraged to predict the future values of different factors that can facilitate the decision-making process. The number of occupants, occupant proxies, energy load predictions, weather conditions, renewable energy, and energy prices are factors that fall under this category. Many studies incorporated this additional information to improve the HVAC control performance. For example, the works of [52, 65] reported significant energy savings with occupancy-driven HVAC control. Similarly, weather, electricity price, and renewable energy forecasts were incorporated in multiple studies [64, 66, 67] with the goal of providing a better overview of the current environment. Developing models that can accurately predict each of these values can diminish some of their uncertainty. The utility of these models is determined by the time window of the resultant predictions, which is dictated by the predicted factor and the space under study [68]. The discussion about lag time or pre-conditioning time in the warehouse requirements subsection presents a good example of the bespoke window.
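The role of the prediction time window described above can be illustrated with a minimal sketch. It assumes a single synthetic, noiseless hourly proxy signal and a plain least-squares regressor on lagged values; the helper names (`make_windows`, `fit_linear`, `predict`), the number of lags, and the horizon are hypothetical choices for illustration and are not taken from the surveyed works.

```python
import numpy as np

def make_windows(series, n_lags, horizon):
    """Build (X, y): each row of X holds n_lags past values, y is the value horizon steps ahead."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_lags - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested window")
    X = np.stack([series[i:i + n_lags] for i in range(n_samples)])
    first_target = n_lags + horizon - 1
    y = series[first_target:first_target + n_samples]
    return X, y

def fit_linear(X, y):
    """Least-squares fit of y ~ X @ w + b; returns the stacked coefficients [w, b]."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply the fitted weights and bias to new lagged windows."""
    return X @ coef[:-1] + coef[-1]

# Synthetic, noiseless hourly proxy with a daily cycle (illustrative assumption).
hours = np.arange(24 * 14)
proxy = 20.0 + 10.0 * np.sin(2 * np.pi * hours / 24)

# Hypothetical window: 6 past hours as predictors, target 4 hours ahead.
X, y = make_windows(proxy, n_lags=6, horizon=4)
split = len(X) * 3 // 4                      # chronological train/test split
coef = fit_linear(X[:split], y[:split])
max_error = float(np.max(np.abs(predict(coef, X[split:]) - y[split:])))
print(f"largest held-out error: {max_error:.2e}")
```

In this sketch the `horizon` argument plays the part of the bespoke window: for a pre-conditioning use case it would be set to the lag time of the zone, which is why the same predictor may be useful in one space and not in another.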
While supervised learning addresses some of the issues at hand, it does not enforce any sequential decision-making. For HVAC control, this factor is essential to realize the energy optimization strategy. As such, supervised learning acts as a complementary piece to any HVAC control strategy.
2) Reinforcement Learning: Compared to supervised and unsupervised learning, RL is a data-driven approach used for sequential decision-making. The RL algorithms possess many advantages that set them apart from other ML approaches.
RL’s advantages: First, RL’s main goal is to balance short-term and long-term goals, reflected by immediate feedback and reward mechanisms. In relation to HVAC control, the trade-off between these goals stems from the need to immediately satisfy indoor environmental conditions versus the uncertainty of future occupancy behaviour, energy prices, and renewable energy sources. Second, RL does not require pre-defined training data to learn. Since it is a control-centred algorithm, RL interacts with its environment and learns sequentially using a trial-and-error methodology. This characteristic is desirable when no optimal strategy for HVAC control is available. Lastly, given that RL methods interact with their environment, they instantly receive feedback about the usefulness of their HVAC control actions, a property similar to accuracy measures in supervised learning [69].
RL’s Elements: RL is applied to control HVAC systems by interacting with an environment that follows a Markov Decision Process (MDP). The RL agent starts interacting with the MDP from an initial state and performs an action, which results in rewards that guide the agent’s future actions. Upon the action completion, the MDP transitions to the next environment state based on the MDP’s transition dynamics. The rewards are accumulated in a time-discounted fashion, which means that less weight is attributed to rewards that lie further in the future. While this Markovian modelling simplifies the environment, the definition of the current state can be expanded to include either lagged versions of state variables or their predictions achieved using supervised models. The MDP can be represented using a tuple P = (µ0, S, A, T, λ, R) such that:
• µ0 is the initial state.
• S is the state space that reflects the studied environment. This set can include the factors deemed necessary for HVAC control decisions.
• A is the action space that represents the decision taken by the agent to control HVAC systems.
• R : S × A × S → ℝ represents the reward distribution, where R(s, a, s′) is the reward gained when applying an action a to a state s to transition to a state s′.
• T : S × A × S → ℝ is the transition probability distribution, such that T(s′ | s, a) indicates the probability of transitioning to state s′ when taking an action a at a state s.
• λ represents the discount (decay) factor applied to the accumulated rewards.
The agent’s behaviour when interacting with the MDP is determined using a policy π that maps states to actions. The notation π(a | s) represents the probability of taking an action a at state s, representing a stochastic policy. The quality of a state s is determined using a value function. Here, the quality assesses the expected reward that an agent can get starting at state s and following the policy π.
Limitations of Vanilla RL: Trial-and-error methods allow the RL agent to accumulate experiences involving states and actions, producing a comprehensive environment model. The discretization of the state and action pairs to determine their corresponding value function results in a large number of state-action pairs [69]. The tabular methods that assign a value for each of these pairs are inefficient because they consider all combinations of state-action pairs. In that regard, the growth in these combinations is superfluous as many combinations cannot be encountered in real-world scenarios. If tabular methods are followed, continuous values such as energy prices are partitioned into coarse-grained brackets to mitigate the explosion in the number of possible discrete values. Such a method implies less accurate mappings between the actual values of energy prices and the HVAC control decision. To that end, different function approximation methods were developed to address these shortcomings.
Function Approximation methods: Different feature construction methods, such as linear and coarse coding methods, can be utilized as a basis for function approximation [70]. However, these methods are either limited by their assumptions (linearity) or require expert knowledge or extra processing to decide some parameters [70]. As a result, nonlinear function approximation using Artificial Neural Networks (ANN) provides the tools to approximate value functions and creates high-dimensional combinations of states using the provided raw data. The combination of ANN and RL is referred to as Deep Reinforcement Learning (DRL).
The integration of ANNs and their deeper variants, Deep Neural Networks (DNNs), results in some prominent advantages. High-dimensional feature combinations of state and action pairs could not have been quantified using linear or polynomial interactions or encountered in real-world data obtained from the trial-and-error method. An additional advantage of using deep learning (DL) is the ability to integrate Transfer Learning (TL) into the mapping of value functions and state-action pairs [55]. As a result, DRL acquires transferability, which is a favourable property for HVAC control in warehouses. The integration of DRL methods for HVAC control is garnering increased attention, manifested by its prominent adoption in the literature through works such as [55, 67, 69].
3) Advantages of Data-driven methods: Supervised and Reinforcement learning techniques can address the challenges imposed by the warehouse environment and the limitations of the traditional approaches. On the one hand, supervised learning can predict some of the future values of the factors contributing to HVAC control, thereby addressing the shortcomings of reactive and scheduled control. On the other hand, RL combines a long-term outlook, reflected by the reward function, and integrates feedback into its decision-making process. These two characteristics are lacking in other traditional methods, which highlights the superiority of the combination of supervised and reinforcement learning. While this combination improves upon the traditional methods, developing a holistic HVAC control model using these methods for the warehouse environment presents its own challenges.
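To make the MDP elements and the tabular limitation discussed in this section concrete, the following is a minimal sketch of tabular Q-learning on a toy cooling problem. Everything in it is an illustrative assumption rather than a method from the surveyed works: the five temperature bins, the two coarse price brackets, the costs in `PRICES`, the comfort weight, and the transition rule in `step` are hypothetical. The `discount` argument plays the role of λ in the tuple above.

```python
import numpy as np

N_TEMP, N_PRICE, N_ACTIONS = 5, 2, 2   # temperature bins, price brackets, actions {0: off, 1: on}
COMFORT_BIN = 2                        # hypothetical target temperature bin
PRICES = np.array([1.0, 3.0])          # hypothetical cost of one HVAC step in each price bracket

def step(temp, price, action, rng):
    """Hypothetical transition T and reward R: cooling lowers the bin, idling lets it rise."""
    drift = -1 if action == 1 else 1
    next_temp = int(np.clip(temp + drift, 0, N_TEMP - 1))
    next_price = int(rng.integers(N_PRICE))        # price bracket changes at random
    energy_cost = PRICES[price] * action           # short-term objective
    discomfort = abs(next_temp - COMFORT_BIN)      # indoor-climate objective
    reward = -energy_cost - 2.0 * discomfort
    return next_temp, next_price, reward

def q_learning(episodes=300, horizon=48, discount=0.95, lr=0.1, eps=0.1, seed=0):
    """Trial-and-error learning of one value per (temperature, price, action) combination."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_TEMP, N_PRICE, N_ACTIONS))
    for _ in range(episodes):
        temp, price = int(rng.integers(N_TEMP)), int(rng.integers(N_PRICE))
        for _ in range(horizon):
            if rng.random() < eps:                 # occasional exploratory action
                action = int(rng.integers(N_ACTIONS))
            else:                                  # otherwise act greedily on current estimates
                action = int(np.argmax(q[temp, price]))
            next_temp, next_price, reward = step(temp, price, action, rng)
            target = reward + discount * np.max(q[next_temp, next_price])
            q[temp, price, action] += lr * (target - q[temp, price, action])
            temp, price = next_temp, next_price
    return q

q_table = q_learning()
policy = np.argmax(q_table, axis=-1)   # greedy action for every (temperature, price) state
print("table entries:", q_table.size)
```

Even in this toy setting the table holds one entry per state-action combination, so refining the price brackets or adding further state variables multiplies its size. This is the growth that motivates the function approximation methods described above.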
VI. SUPERVISED LEARNING IN WAREHOUSES: CHALLENGES AND POTENTIAL SOLUTIONS
The optimal HVAC control is predicated on the predictions of specific factors that include outdoor and indoor environmental conditions, energy load predictions, and occupancy profiling. Compared to other factors, occupancy profiling is less straightforward to quantify. The subsequent sections involve an extensive discussion of occupancy profiling methods, the availability of datasets to quantify occupancy, and the adopted feature engineering techniques, which represent the steps necessary for predicting occupancy in a specific space. The discussion analyzes the current state of the art, its challenges, and prospective solutions for each step.
A. SL1: Occupancy Inference Method
Multiple sensors can accurately collect the occupancy behaviour in a specific space. Classifying occupancy inference methods depends on the data acquisition of occupants’ presence and movement, which can be divided into two main categories. The first category requires the direct involvement of occupants by collecting their identities upon arrival or using videos or images of the involved space to infer the number of occupants [71]. The second category is predicated on proxy estimators of occupants using either motion sensors [30], Wi-Fi signal disruptions [72], or CO2 levels [73] in specific residential areas.
The first category of methods is intrusive and raises some privacy concerns. Even if participants’ consent is granted, these adopted occupancy inference methods also expose some technical and practical limitations. Works such as [65, 71] explored image- and video-based occupancy inference. These methods require the adoption of Deep Neural Networks (DNNs) that are data- and resource-intensive algorithms. Additionally, the process of obtaining ground-truth data on occupants’ numbers requires manual labour, which adds monetary concerns to the building operators. Furthermore, this segment of research is in its nascent stages with no performance guarantees. Lastly, surveillance cameras need to be mounted on warehouse zone levels to effectively capture occupants and their activity. To realize this function, significant financial investments should be carried out on the warehouse operators’ end. These confluent factors undermine the image- and video-based occupant inference.
Inferring the number of occupants using proxy indicators has many advantages. The proxy indicators can include temperature, activity levels, CO2 concentrations, and Wi-Fi connectivity. All these indicators preserve the occupants’ privacy, which represents a salient concern for surveillance-based methods. In the case of Wi-Fi signals, anonymization of MAC addresses can address privacy concerns [74]. The listed proxy indicators are already part of the warehouse infrastructure, so no new equipment needs to be installed for occupancy inference. Each of these indicators reflects an aspect of occupancy, which can be leveraged to infer the occupants’ count [75]. The rise in indoor temperature can be attributed to the thermal energy produced by new occupants. However, multiple factors can affect the indoor temperature such as the ones explained in the Large Air Leaks challenges. The Wi-Fi connectivity, activity levels, and CO2 can serve as better indicators of occupancy. For example, Wang et al. [74] employed ML techniques to predict Wi-Fi connectivity counts and compared them to manually collected ground truth data in an office environment. Their methodology accurately predicted the number of occupants; however, some limitations compromise their approach’s applicability. The association between the Wi-Fi connectivity count and occupancy is based on the assumption that the occupants will undoubtedly use their cell phones when entering a specific area. The reported results showed that this method failed to predict peak occupancy as a result of this strong assumption. The limitations of this method also extend to the adopted feature engineering technique, which extracts time-based features predicated on the existing data periodicity. Again, this assumption precludes the method’s implementation in situations that are not compliant with these periodic trends. Here, the assumptions of Wi-Fi connectivity counts are unfit to be implemented in a warehouse environment. This analysis leaves the CO2 and activity levels as good indicators of occupancy.
The activity levels can be obtained using a Passive Infrared (PIR) sensor that increments an internal counter when a certain activity is captured. The workers’ constant movement to load, unload, or search for inventory boosts the chances of employing these sensors to reflect the number of workers in a warehouse. These sensors’ deployment in a warehouse environment presents a more compelling case than their deployment in an office environment characterized by minimal movements. This fact is challenged by the possibility of accounting for a single occupant twice while moving in opposite directions during the sampling periods. Therefore, the overestimation of current occupants is possible when employing PIR sensors.
The CO2 concentration, which occupants produce through respiration, represents a lagged indicator of occupancy [68]. The double-dipping phenomenon of activity levels and the interference of other influencing factors for temperature are not encountered when utilizing CO2 concentrations. The existence of occupants in a specific area automatically influences the collected CO2 concentrations. Therefore, the variability in CO2 concentrations in a specific time window mirrors the change in the number of occupants in a past time window. The CO2 concentrations are also used to trigger the ventilation system, which means that sensors that record these concentrations are part of the warehouse infrastructure. Since it is a lagged indicator, supervised learning techniques can be leveraged to predict the future values of CO2 concentrations using the current state of the environment. One concern with associating CO2 variability with the occupancy change is linked to the type of activity executed by occupants. The work by Kapalo et al. demonstrates the differences in CO2 production [76] as a result of the physical activity, which can be projected to the warehouse environment. A worker who is unloading or re-racking the inventory drastically affects the CO2 concentrations compared to a worker who is walking around. This factor must be considered when solely relying on the CO2 concentration changes to quantify occupancy changes.
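The lagged relationship between occupancy and CO2 described above can be examined with a short sketch. It assumes synthetic data in which the CO2 reading follows the occupant count with a fixed delay; the baseline of 420 ppm, the 15 ppm contribution per occupant, the noise level, and the function name `estimate_lag` are hypothetical values chosen only for illustration.

```python
import numpy as np

def estimate_lag(occupancy, co2, max_lag):
    """Return (lag, correlation) for the shift at which CO2 best matches past occupancy."""
    occupancy = np.asarray(occupancy, dtype=float)
    co2 = np.asarray(co2, dtype=float)
    best_lag, best_corr = 0, -np.inf
    for lag in range(max_lag + 1):
        past_occupancy = occupancy[:len(occupancy) - lag]   # occupancy up to `lag` samples ago
        later_co2 = co2[lag:]                               # CO2 observed `lag` samples later
        corr = np.corrcoef(past_occupancy, later_co2)[0, 1]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr

# Synthetic example: occupant counts held for 5 samples, CO2 responding 3 samples later.
rng = np.random.default_rng(0)
true_lag = 3
occupancy = np.repeat(rng.integers(0, 11, size=100), 5)
delayed = np.concatenate([np.full(true_lag, occupancy[0]), occupancy[:-true_lag]])
co2 = 420.0 + 15.0 * delayed + rng.normal(0.0, 1.0, size=len(occupancy))

lag, corr = estimate_lag(occupancy, co2, max_lag=10)
print(f"estimated lag: {lag} samples")
```

A fixed per-occupant contribution is the strongest simplification here. As noted above, the activity type changes CO2 production, so in practice the same count of workers would map to different concentration changes.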
VII. DEEP REINFORCEMENT LEARNING: CHALLENGES AND POTENTIAL SOLUTIONS
This section investigates the challenges of applying DRL to control warehouse HVAC systems. Each subsection details these challenges’ roots and suggests solutions based on the available literature and authors’ experience.
A. DRL1: Transferability
This subsection explains the warehouse environment conditions that prompt the transferability challenge in DRL. Later, use cases that showcase these challenges in connection with DRL are explained and future directions are investigated.
1) Facilitating Factors: The transferability property is fundamental to any developed model, but it gains more importance in the warehouse environment. Warehouses are in a constant state of updating their business strategies to accommodate new customers or adapt to any disruptions in supply chains caused by changes in the economies or the occurrence of unexpected events. These factors contribute to the expansion or shrinkage of the inventory. As a result, warehouse operators construct new zones or brand-new buildings, which require a fully functioning HVAC control method and potent supervised learning methods without historical data. The changes to the warehouse can be localized, such as retrofitting efforts meant to address the wear and tear of warehouses. The newly created environments diverge from the old ones in terms of their internal environmental dynamics, which is consequential to the DRL agent with insufficient experience to make an informed decision.
All these conditions facilitate the application of TL methods. TL is a method to transfer the knowledge acquired on a source dataset to initiate learning in the target dataset [80]. TL is applied by transferring the weights learned from the source task to the target task, which in this case represents the value functions. Due to differences between source and target tasks, re-tuning of the models using the target dataset is applied. This process is achieved when some neural network layers are frozen to retain information from the source task while others are retrained to introduce new information to the model. TL is required whenever little to no information is available to train the models specific to the target task. The implications of applying specific HVAC control actions and the partial observability of the environment are prohibitive for exploratory interactions with the environment. TL techniques allow for a smooth transition phase, leveraging the previously acquired knowledge until enough data is accumulated to cater to the new task at hand.
TL methods can be employed in a warehouse environment on two levels. On a general level, the warehouses operated by companies are in a constant state of expansion, driven by the changing needs of economies and supply chains. Therefore, the establishment of warehouses over large geographical areas is favourable for increasing profits. As a result, environmental conditions and local legislation specific to each area challenge the seamless transfer of models to new locations. On the local level, the TL methods can be applied to warehouses’ multi-zone space. A DRL agent developed in a zone can be transferred to another zone.
The direct transfer of developed models to a new building or zone is faced with multiple hurdles. The degree of similarity between the source and the destination task needs to be quantified to avoid negative transfers [81]. This prerequisite entails a thorough investigation of the structural, climatic, and internal warehouse dynamics similarity. Experts can resort to the physical modelling of this similarity when the data is absent, which can potentially meet this requirement. However, if sufficient data exist, this similarity can be quantified using indoor climatic phenomena such as heat transfer [82]. An additional layer of concern is attributed to the selection of a convenient source or reference warehouse, especially when many warehouses or zones are involved, which can be realized using many criteria. For example, when different climates are involved, the warehouse experiencing the full spectrum of seasons can be used as a reference warehouse to other warehouses that experience mild seasonal changes. Another example that better applies to the warehouses pertains to patterns of consumption. A warehouse that is close to populous metropolitan areas aggregates the consumption models of sparsely populated areas, which can better serve as a reference warehouse. This discussion represents a broad analysis of the
transferability aspect in warehouses in general, but it is yet to analyze the TL mechanisms in DRL.
2) Use Cases: In the field of TL applied to DRL, the differences between the source and the target buildings affect any component of the MDP, which includes the state space, action space, reward function, transition dynamics, and policy [83]. Few works have applied TL in the context of DRL for HVAC building control. These works are based on strong assumptions of uniformity of the state and action spaces, and reward functions [84, 85]. However, such conditions are rarely encountered on a local level across the warehouse zones and on a global level across the warehouses scattered over the globe. These conditions engender two broad challenges, one related to the indoor climate requirements of each zone and the second pertaining to structural and automation level differences between warehouses. Use cases that highlight these aspects will be detailed.
Multi-zone Space: The warehouse’s multi-zone space essentially means that HVAC control systems should account for a distinctive set of environments. For example, the flammable zone should include Metal Oxide Gas (MOX) sensors, which monitor the hazardous gases or toxic vapours that are common in factories and chemical warehouses. These sensors are connected to alarm systems for fast detection of any leakages that pose occupational hazards for workers when inhaled [86]. Meanwhile, these conditions should activate the ventilation systems to curb the instantaneous effects of these gases. On the other hand, the dry zone requires a special set of sensors that does not include hazardous gas detectors. Therefore, if the initial DRL model is developed in the dry zone, the transferability of this model is hampered by the inclusion of the MOX sensors. Another example of the changes in state space is caused by different climates. Differences in climates introduce new dynamics to the indoor environmental conditions and new variables to the state space. While temperature and humidity are sufficient indicators of outdoor conditions in hot arid climates, rainfall and snowfall amounts are crucial to better understanding the outdoor environmental conditions in humid climates. These subtle nuances are fundamental to the HVAC control and are accordingly integrated into the state space. The expansion of the state space has a trickle-down effect on all MDP components that should incorporate this new state, a situation that is foreign to conventional TL techniques.
Warehouse inventory is stacked with items that cater to the requirements of the customer base they serve. For example, agrarian communities are more conscious of their consumption, which explains their disposition for fresh and less processed food with limited shelf life. On a different note, crops produced by these communities should be monitored to maintain their quality for future transportation. Based on these factors, the preferences of the nearby populace and their economic activities are major determinants of the types of stored items. These conditions require integrating sensors that can detect the deterioration of perishable items [87], which is not a common requirement across warehouses. Similar to other cases, these conditions alter the state space, precluding the employment of conventional TL techniques.
Structural and Automation Levels: The distinctive levels of automation and technological advancements in warehouses reflect on the state space and the dynamics of the indoor climate. The process of picking up, loading, and unloading inventory is carried out manually using forklifts or automatically by automated storage and retrieval systems. The transition from one technology to another is yet to be fully materialized, and the path to full automation is still in its nascent stages [88]. The carbon emissions of forklifts alter the indoor dynamics, including the sensor readings, compared to a more automated warehouse. Even if the state and action spaces are uniform, these circumstances cause a divergence of the transition dynamics between the source and the target warehouses. The integration of renewable energy sources in the operation of HVAC control is another use case of the effect of technological advancement. The incentives of local governments, the availability of installation equipment, and the enabling ambient climate are all important factors that dictate the viability of integrating renewable energy sources into the warehouses’ power grid. Examples of these programs include Ontario’s Clean Energy Credit Registry and the European Green Deal, which promote the replacement of traditional energy sources [89, 90]. The effective integration of renewable energy sources augments the state and action spaces. On one end, the action space encompasses two sets of actions. The first set is concerned with the control of HVAC setpoints, which is a common theme in warehouses. The second set involves the scheduling of an HVAC system to draw on either the conventional electric grid or renewable energy. In a similar vein, the state space is appended with past, present, and future renewable energy levels that affect the decision-making process of the DRL agent.
The last portion of the transferability analysis is related to the MDP’s reward changes. As previously explained, the reward function reflects warehouse operators’ priorities, which can be categorized into two themes: reducing energy consumption and maintaining indoor environmental conditions within predefined setpoints. However, augmenting the electric grid with renewable energy and the thermal requirements of the inventory contribute to changing warehouse operators’ objectives. The storage medium of renewable energy sources is prone to gradual degradation and inefficiencies due to the charging and discharging cycles [9], and its charging levels should be maintained within specific thresholds to avoid possible spoilage. The reward function would integrate this aspect in warehouses with renewable energy generation. The macro-theme of maintaining indoor environmental conditions hides many subtleties that have a consequential effect on perishable items. The extreme fluctuations in indoor climate can result in adverse effects on perishable goods [91]. To achieve a reduction in energy consumption, the indoor climate can be susceptible to large swings in thermal conditions. Therefore, the permissiveness of these changes depends on the type of perishable items, which needs to be factored into the reward function.
Potential Solutions: The challenges of DRL transferability are associated with the warehouse environment and its requirements. Zooming into each of these challenges shows that they
are shared with the challenges of the IoT sensors. At their core, the hurdles of transferability are related to the non-uniformity of the source and target domains, manifested by changes in the state and action spaces. In the DRL framework, such an alteration affects the neural network (NN) architectures that map state-action pairs to their values, known as value functions. Surveying the literature does not yield any decisive methods for addressing these concerns. Thus, it is important to tackle such concerns from a fundamental angle based on the representation of state and action spaces. Finding a common representation between different sets has been extensively studied in the literature. These methods predominantly project the sets into a shared space using linear methods, such as Kernel Principal Components and its special use cases [92], and non-linear methods, such as Autoencoders and their variations [93]. However, applying these methods misses each environment’s shared aspects, which are reflected by their respective NN architectures and trained weights. The commonality in feature space is mirrored by the commonality in the NN’s architecture [94]. Therefore, methods that extract these shared aspects are good candidates to address the non-uniformity of NN architectures induced by changes in the action and state spaces. While originally tailored to different data representations, such as audio or video, multi-view representation learning [95] is a suitable candidate to tackle TL in divergent environments. A variation of this method was implemented in a Federated Learning (FL) use case, whereby the local models have heterogeneous NN architectures [96]. The direct application of this type of learning provides insights into the shared facets of these architectures by generating common representations. While not previously implemented in the realm of TL, this suggestion can spawn intriguing applications to address the challenges of TL in the warehouse environment. An extra benefit of these approaches is quantifying the similarity between NNs and handling different NN architectures with common features.
In terms of the variation in warehouse indoor climate dynamics and alteration of the reward function, reward shaping (RS) [97] is a mature field developed to address these two issues. RS exploits external knowledge to restructure the reward and re-tune the agent’s policy. It is important to identify the type of TL technique applicable to each use case. These techniques are zero-shot transfer, few-shot transfer, and sample-efficient transfer [83]. Inferring the type of transfer is predicated on the similarity between the source and target warehouses under study.

B. DRL2: Scalability
This subsection outlines the root causes of the scalability challenge and its implications on DRL models.
Root Causes: The common approaches in HVAC control-related literature delegate this control to a single RL agent. However, this strategy is impractical and exposes some limitations. First, this strategy is based on loose assumptions of the uniformity of indoor climate dynamics between different rooms or halls. The location of these rooms, their sizes, and the occupancy patterns are all factors that erode the foundations of such an assumption. Second, collecting the sensory data of each room and mapping these states to actions specific to each room causes an explosion in inputs to the NN representing the value function. The convergence to an acceptable solution demands considerable data that would not necessarily be available. Fortunately, the research community proposes multi-agent DRL to address scalability issues by breaking down the MDP environment into co-dependent ones to reduce the complexity of training each agent. No concrete methods are followed to demarcate the state and action spaces of the spawned DRL agents. However, building structures can easily define these agents based on the rooms or thermal zones.
Potential Solutions: On the warehouses’ end, multi-agent DRL reaps multiple benefits. First, each agent can train its model based on the requirements and the occupancy behaviours of each zone, which conforms with the needs of warehouse operators. Each agent defines its state and action spaces and reward functions. Second, the decoupling of the requirements of each zone provides more tractable and easier-to-train agents given the reduction in the state and action spaces and the data supplied for training.
The multi-agent algorithms can be divided into three groups: fully cooperative, fully competitive, and a hybrid approach [98]. In short, cooperative agents work towards optimizing a long-term goal that is represented by the reward function. On the other hand, competitive agents interact with their environments to yield a zero-sum game. Lastly, the mixed model includes competitive and cooperative aspects. Assigning the multi-agent space of the warehouse HVAC control requires an in-depth analysis of the environment dynamics that govern each zone. Despite the prominent distinction in each zone’s indoor requirements, the main goal of HVAC control in each zone is to maintain that zone’s climate within specific thresholds and reduce energy consumption. Therefore, the long-term goals are shared between the agents, alluding to the cooperative nature of the multi-agent environment. This type of cooperative multi-agent DRL is referred to as team-average reward [99]. However, this cooperative regime is challenged when renewable energy is integrated into the equation. In this case, the agents have to compete to exploit this rare resource to reduce their contribution to energy consumption, which separates a cooperative multi-agent environment from a hybrid one. The literature provides some methods to address the mixed multi-agent DRL environment, such as decentralized Q-Learning [100] and Morse-Smale games [101]. These methods are yet to be applied in building settings, let alone in the warehouse setup.
Under the cooperative or hybrid paradigm, the multi-agent environment faces challenges related to non-stationarity. The environment’s non-stationarity is well established in the air leaks challenge. In particular, when the agent controlling the cold zone applies an action, this action will affect its zone through HVAC functions and other zones by air diffusion. Therefore, the perceived environment of one agent is affected by other agents’ actions, which invalidates the basic assumptions of stationarity in a single-agent setting. Incorporating the actions of other agents into the state space of one agent is a shortcut that can mitigate the non-stationarity of the
environment. However, the connectivity issues that may arise in the IoT environment hinder the applicability of such an approach. If this issue is addressed, the explosion of each agent’s state space is another issue to consider. All of these conditions present considerable challenges to applying multi-agent DRL systems in the warehouse environment. Figure 2 depicts a schematic that summarizes the DRL challenges.

VIII. USE CASE STUDY
This section presents a use case to highlight the above-mentioned environment-specific and solution-specific challenges. An extensive discussion about the supervised learning-related challenges is conducted to motivate their connection to the warehouse environment’s structural challenges.

A. Objectives
This manuscript has extensively explained the myriad challenges related to the field of energy reduction in the warehouse environment. Since the literature and the research community are yet to investigate this theme, the use case study is applied to the residential environment. However, the analysis, discussions, and conclusions are projected onto the warehouse environment. Toward that end, a publicly available dataset collected in a residential building is employed [102]. Based on the detailed structural- and solution-based challenges, the objectives of this use case fall into three main themes:
• Showcase the effect of different feature engineering techniques on the obtained results, which is connected to the supervised learning challenge SL3 (o1);
• Investigate the contribution of the features resulting from the feature engineering technique on the prediction accuracy and draw conclusions linked to the challenges of the warehouse environment (o2); and,
• Analyze the predictions’ divergence in light of the warehouse challenges (o3).
Each of these objectives is studied independently and traced to the physical phenomena taking place in the warehouse environment. The code is available on the GitHub repository (footnote 1). Such an analysis is instrumental to demonstrate the challenges of the warehouse environment and their effect on the applied supervised learning procedures. The discussions in connection with this environment spawn many research questions to be addressed by the research community. Addressing these hurdles is crucial not only from the standpoint of intellectual curiosity but, more importantly, because of the challenges’ downstream effect on the supply chains and the Earth’s ecosystem.
1 https://github.com/Western-OC2-Lab/Data-driven-Methods-for-the-Reduction-of-Energy-Consumption-in-Warehouses-Use-Case.git

B. Dataset Description
The shortcomings of the datasets mentioned in SL2 are partially addressed in the dataset used [103] for this case study. This dataset includes data points collected at one-minute intervals in various seasons across two years in a building that includes 7 floors. This property engenders a comprehensive dataset that mitigates the scarcity of data, allows extensive analysis of the seasons’ effect on any predicted property, and facilitates the analysis of the methods’ transferability on different floors and single-floor zones. Lastly, this dataset
and Random Forest. Each of these algorithms includes a set of hyper-parameters to tune. In particular, the lasso and ridge regressions should tune the penalty parameter of their objective function (λ). The trees should tune the maximum depth (MD) of their grown trees and the minimum samples to execute a split (MSS). In addition to the HPO process, it is important to define the history and future time windows. The history time window is important for approaches a1 and a3 that use lagged versions of features. The future time window defines the prediction horizon. The history and future time windows are defined as h-w f-w, such that w ∈ {5, 10, 15, 20}. The summary of these parameters is provided in Table IV.

Window     Method  Algorithm  Parameters              MAE
h-10 f-10  a1      1D-CNN     NCL = [64], NFL = [32]  1.11
h-10 f-10  a2      Gboost     MD = 10, MSS = 2        5.71
h-10 f-10  a3      Gboost     MD = 3, MSS = 10        0.91
h-20 f-20  a1      1D-CNN     NCL = [64], NFL = [64]  1.13
h-20 f-20  a2      Gboost     MD = 10, MSS = 2        5.72
h-20 f-20  a3      Gboost     MD = 10, MSS = 5        1.08
TABLE V: HPO results for a1, a2, and a3
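The history and future time windows can be made concrete with a short sketch. The snippet below is illustrative only and is not taken from the released code: the function name make_lagged_frame, the column names, and the choice to count the current reading as lag 0 are all assumptions. It builds lagged copies of every feature over the history window and shifts the target forward by the future window, so that with one-minute sampling a lag of k corresponds to k minutes.

```python
import pandas as pd


def make_lagged_frame(df: pd.DataFrame, target: str, history: int, future: int):
    """Build lagged features (history window) and a shifted target (future window).

    Assumption: lag 0 is the current reading, so `history` rows are used per feature.
    """
    # One column per (feature, lag) pair: value of `col` observed `lag` steps ago.
    lagged = {
        f"{col}_lag{lag}": df[col].shift(lag)
        for col in df.columns
        for lag in range(history)
    }
    X = pd.DataFrame(lagged, index=df.index)

    # Target is the value of `target` observed `future` steps ahead.
    y = df[target].shift(-future).rename(f"{target}_t+{future}")

    # Drop rows where the history or the horizon runs past the data boundaries.
    valid = X.notna().all(axis=1) & y.notna()
    return X[valid], y[valid]


if __name__ == "__main__":
    # Toy data with hypothetical column names (not the dataset's real schema).
    toy = pd.DataFrame({
        "temp": [float(i) for i in range(10)],
        "ac_energy": [float(10 + i) for i in range(10)],
    })
    X, y = make_lagged_frame(toy, target="ac_energy", history=3, future=2)
    print(X.shape)  # (6, 6)
```

Under these assumptions, an h-10 f-10 configuration would correspond to history=10 and future=10; the resulting X and y can then be passed to any of the regressors discussed above.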
averaged for each month and the standard deviation is reported.
Figures 4a and 4b depict the variation of MAE with respect to each season over prediction horizons of 10 and 20 minutes for approach a3. The AC units are activated over all seasons due to the ambient climate, whereby drastic changes in temperature are not expected. However, these variations are manifested in the adopted binning structure. The highest bin values (bin 5), corresponding to the highest AC units’ energy consumption, are only encountered in the summer season, which aligns with the high-temperature conditions. The figures show the importance of the binning process to better extrapolate the prediction performance, especially if the regression task is skewed towards an interval of values. Generally, a3 successfully predicted the energy consumption for different bins in the Winter, Spring, and Fall seasons for both prediction horizons. An uptick in the MAE is noticeable with the expansion of the prediction time window, an expected observation resulting from the increase in uncertain conditions with such expansion. The increase in MAE follows the increase in the energy consumption bin, which suggests two interpretations. First, instances of higher energy consumption are scarce, preventing the accurate prediction of energy in such circumstances. Second, the feature engineering process of a3 may not encompass all the factors that contribute to the prediction, which can include conditions that are not sensed or exogenous factors. These aspects are investigated to fulfil o3.
The good results obtained in the Winter and Fall seasons with limited data points undermine the time-based feature engineering technique, as a3 successfully extrapolated common aspects between all seasons based on lagged versions of environmental features. The multi-month seasons, such as the spring and summer seasons, experience some fluctuations in their predictions attributed to the subtle variations between months or the transition from one season to another.
3) Feature Importance Analysis: The comparison between various feature engineering approaches and their performance on various energy consumption bins provides a surface-level understanding of the underlying dynamics of the environment. To better grasp the radical deviations in performance between summer and the other seasons, it is important to zoom into the contributing factors. Since the feature engineering process produces a myriad of features, the importance of these features can be analyzed to explain the reported results. Fortunately, the best a3 configuration involves a tree-based algorithm (Gradient Boosting), which facilitates the feature importance inference process. The equivalent methods for DNNs require a more profound approach involving the calculation of gradients in the absence of a feature or a set of features to infer the features’ importance [105].
Before diving into the specifics of the features’ importance for Gradient Boosting, explaining how their respective values are calculated is instrumental. The feature importance is a weighted factor that gauges the utility of a feature in the boosted trees’ construction. Since the decision trees are formed from many decision nodes, the feature’s importance is measured by its ability to improve the performance measure. After that, the individual importance of each feature is averaged across all boosted trees [106].
The following approach is adopted to answer the lingering questions that arose from the discrepancy in performance between the summer season and other seasons. The feature importance is extracted for each model trained with one month held out, as was done when the feature engineering approaches were evaluated. After that, the feature importances of the model with no fall months and of the models without one of the summer months are set aside for analysis. Next, the top 6 features that contribute to over 90% of the importance of the model with no fall months are highlighted. Later, the percentage difference of the feature importance between each of the models without one summer month and the model with no fall months is calculated. This percentage is averaged across the three models without one summer month. The adopted process highlights the contribution of each summer month to the deterioration of performance with respect to higher energy consumption bins (3, 4, 5).
Figure 5 depicts the difference in feature importance associated with the top features for a 10-minute prediction horizon. The names on the vertical axis follow this convention: z{zone number}_{feature name}(unit)_{lag time}. The most drastic changes are experienced by features that are not associated with the AC units’ energy consumption. This observation alludes to the possible effect of outdoor environments on indoor conditions, which is supported by the features that displayed the greatest deviation. In particular, the importance of relative humidity and temperature and their respective lags has reversed. This factor can be attributed to changes in outdoor environmental conditions that are not captured in the available dataset, which contributed to the rise in energy consumption. The increase in the importance of plug energy consumption showcases the effect of occupants on the AC’s energy consumption, which is not part of the captured environment.
4) Warehouse-related Observations: These experiments shed light on many aspects that should be integrated into the supervised learning process of the building HVAC system, which can be extended to the warehouse environment. First, the history and future time windows are critical aspects that determine the utility of predictions. The changes in lagged features’ importance and their contributions to predictions’ accuracy demonstrate the effect of the defined time window. Given that spatial configurations dictate the delayed impact of activating AC units, the variations in these configurations in the warehouse environment highlight the importance of finding a suitable time window. Second, the biased prediction results obtained with the variation of energy consumption bins have shown the interference of exogenous factors. It is hypothesized that conditions such as outdoor environmental conditions and occupancy are affecting these predictions, as reflected by the feature importance changes. Quantifying the occupancy and its inference method is at the heart of the warehouse challenges. Additionally, the effect of environmental conditions detailed in its corresponding challenge is manifested in the conducted study.
To demonstrate the viability of the effect of some external factors on prediction accuracy, some additional experiments
[Figure: two panels, each plotting MAE against Season (Winter, Spring, Summer, Fall).]
AC1 0.43 0.84