3.1. OBEQ Query Filter
The transmitting of data is extremely power consuming for wireless sensors. At the beginning of the bounded-error query process, the local database uses sinks to propagate query instruction to its WSN. However, for some queries, it is not necessary to access all the SNs and EDs. A location-based query is one example. When we query only some data from sensors in the “Taipei city” sink, the query should not be sent to sensors in the “New Taipei city” sink. The query filter is proposed to avoid redundant data transmission by analyzing the WHERE clause of the SQL query.
Since SN is responsible for transmitting query instructions to particular sets of EDs, the EDs under one SN contain three situations to be judged by the relation between the WHERE clause in the SQL query and the metadata of EDs.
All EDs under the SN violate the WHERE clause in a query instruction.
Part of the EDs under the SN violate the WHERE clause in a query instruction.
All EDs under the SN are qualified by the WHERE clause in a query instruction.
Therefore, when an SN detects the first situation, the query instruction is discarded. When an SN detects the second situation, the query instruction should be sent to confirm that the target sensors, which are specified in the WHERE clause, will receive the query. For example, a query asks for water-meter data because the EDs under this SN are composed of both watt-meters and water meters. When an SN detects the third situation, the WHERE clause that all EDs are qualified to be acknowledged will be truncated because all EDs should execute the query. For example, a worker of the Taiwan Power Company wants to query all the power data of Taipei city. If all EDs under the SN are in Taipei city, then the “WHERE city = Taipei” clause in the SQL instruction is removed by the SN, and the rest of the query instruction is transmitted.
In the following equations,
represents all the EDs that are qualified in the constraint of the SQL WHERE clause,
represents
SNi,
represents all the EDs under
,
is the bit length of the query instruction
received, and
represents the bit length of the query instruction after
truncates the original query with the query filter. When Equation (1) is true,
abandons the query instruction without forwarding to its EDs. When Equation (2) is true,
truncates the constraints specified in the WHERE clause to reduce the length of the query instruction forwarded to its EDs. Therefore,
is not smaller than
as shown in Equation (3).
In addition,
is used to represent the energy cost of a cluster of EDs under an SN without using a query filter,
represents the power usage for sending a bit, and
is the energy cost of using a query filter. The cost of sending a filter-processed query packet and no-filter one can be deduced as follows.
Thus, certain metadata [
27] should be collected to judge whether a node violates a WHERE constraint or not. The metadata applied in our query filter with the ED_CLUSTER_METADATA clause is shown in
Table 1. To judge if EDs (usually, a cluster of EDs under an SN) are in certain region, our query filter extracts the location from E the D_CLUSTER_METADATA clause and then compares the location within the constrained area specified in the query instruction. Note that each SN must record the locations of its EDs. In addition, an SN has to record the deployed time of EDs. When a query filter judges that EDs are qualified for a specific time interval, the deployed time is extracted and compared with the time interval in the query instruction. The same idea can be applied to sensor types and attribute columns.
3.2. Bounded-Error Query Processing
The bounded-error query process designs an SQL-like language, which is constructed by the SELECT-FROM-WHERE-GROUPBY structure and bounded-error parameter. The bounded-error of each sensor’s data is guaranteed to be limited in the user-specified range of his/her query. DDL depicts the record, fields, and sets of the user data model and connects the conceptual level and the internal level of the database. CREATE, ALTER, and DROP are the three main syntaxes of DDLs. In an OBEQ, DDL is extended to sensor modules and enables users to control sensor modules’ flexibly to manage hardware resources in EDs.
In real applications, the user will need to modify the database constantly due to newly sensed data. However, it is resource-consuming for sensors to keep collecting suspended data while the newly sensed data are redundant. For example, some electric services provided by Taiwan Power Company [
28] are suspended due to a user’s request. Hence, it is necessary to turn on/off the running sensor modules according to the database columns. To determine the relationship between the table columns and sensor modules, the edge-computing local database stores a SENSOR_COLUMN table.
Table 2 shows an example of a SENSOR_COLUMN table and depicts the status of modules on wireless sensors. The corresponding syntactic examples of DDL are shown in
Figure 4.
While the user sends a DDL instruction to the local database, it is separated into two instructions, one for offline databases and the other for online wireless sensors in the WSN. For example, “ALTER TABLE SENSOR_COLUMN DROP COLUMN current” is sent to modify the offline database directly. For WSN, “ALTER TABLE SENSOR_COLUMN” is redundant for wireless sensors because every ED needs to respond to this instruction. To reduce the transmission energy, edge-computing local databases can truncate “ALTER TABLE SENSOR_COLUMN” for its SNs and EDs before sending it to WSN. When an online query instruction arrives at the EDs, it will check the MODULE_COLUMN table, as shown in
Table 3, to turn on/off the sensor and then gather corresponding information so that this table can be initialized before the sensors are deployed in the WSN.
As shown in
Figure 3, the flowchart indicates that users first issue a query with bounded-error parameters. Then, this query is analyzed before being transmitted to the WSN. When a wireless device of either an SN or ED receives a query, it starts to process the query and generates a local query result. Then, finally, the local query results are sent to the local database and merged at the sink. Each sensor data error in a query result is bounded in the range specified by the user from an OBEQ. Meanwhile, one important feature of an OBEQ is that the applied bounded-error compression becomes better as bigger bounded-errors are assigned. Hence, it is important to maximize the bounded-error without damaging the QoS
2 and QoD required for IoT applications.
In our modified SQL instructions, “.be(
τ)” is used to set up the bounded-error for a query result. For example, the query instruction “SELECT Data.be(
τ) FROM SENSORS” allows the query result to have an error no larger than
τ. The results of query example “SELECT voltage.be(2) FROM SENSORS” are shown in
Table 4. It shows the relation of raw data, physical data, and output query results for voltage data in EDs. The query result is allowed to have data distortion no bigger than two. Our OBEQ confirms that the worst scenario of data distortion will not exceed the specified bounded-error range. We assume that a value of 116 is the value from an ED and the exact value of physical data is 116.5, any value between 116.5 ± 2 (e.g., 117) is an acceptable query result.
Again, we use “SELECT Data.be(
τ) FROM SENSORS” as an example for an ED to sense raw data according to the query. The raw data sensed by nodes have already shown a system error
(see
Table 3) due to sensor accuracy. After a local query result is generated, it needs to be compressed before sending tit o the SN for reducing power consumption. The compression for sensor output can be completed in the specified bounded-error
without violating the QoS
2/QoD required in diversified IoT applications. So, eventually, when the local query results are all sent to the local database, the total error
τ of the query result is equal to
plus
, as shown in Equation (7). The value of system error
is related to the hardware equipment the user deploys in the WSN. Since the hardware has been settled, the value of
remains unchanged during the query process. The error value
can be found in the datasheet of the sensor. Since the bounded-error
τ is usually given by users for the required QoS
2/QoD in their IoT applications, the compression error
can be arranged to further reduce the communication cost for query results. Finally, the compression error
can be deduced as follows.
The SQL query of “SELECT voltage.be(2) FROM SENSORS” is used as a demonstration. In this query, voltage.be(2) indicates that all voltage data in the query result should have a data error lower than two. At the time that the data are sensed, a system error is added to a datum, as shown in
Figure 5. The voltage in the physical world is a value between 116 ± 1.5 instead of 116 since the system error
for voltage data is 1.5. The relationship between the raw data and the status of the physical world is shown in
Figure 5. The upper bound and lower bound of the expected value of the status of the physical world are 114.5 and 117.5, respectively.
Because the worst scenario of data error is guaranteed to not exceed the user-specified bounded-error (i.e.,
τ = 2), the bounded-error data offered have to fully cover all the possible value that might happen in the physical world. In this example, if the upper bound is 117.5 for the raw data of voltage 116, according to the system error (
= 1.5), then we can accept any sensor value between 115.5 and 119.5. Where the lower bound is 114.5, we accept any value between 112.5 and 116.5 because the user-specified bounded-error is 2. As shown in
Figure 6, the intersection of the above two ranges is 115.5 to 116.5, which exactly equals 116 ± 0.5, where 0.5 equals the user-specified bounded-error value
that subtracts the system error
. With the example described above, the error bound can be limited strictly as the user expects, and then the power consumption can be reduced further with our previous BESDC [
11].
Our OBEQ method also supports addition, subtraction, multiplication, and division. It is a frequently used function in query processes. Parentheses and “.be()” are used to represent the error bound of a query result. For example, “SELECT (Data*m/d + n).be(τ)” indicates that the error of the calculated data should be limited in τ, where m stands for multiplicand and d stands for divider.
“SELECT (power/1000).be(1) FROM SENSORS” is an example to change the unit of the power column from W to kW. In this example, the bounded-error of the calculated result should not exceed 1 Wh. In WSNs, the calculations can be performed either in EDs or in the local database. How the calculations are performed by either an ED or a local database and how they influence the bounded-error are described below.
So, we use “SELECT (Data*
m/
d +
n).be(
τ)” as an example; if nodes are used to calculate the query result, following the principle of the four fundamental operations, the user-assigned error
τ bounded for sensor data is related to (
m/
d)
due to the multiplication and division with the system error
, then the error of the raw data stays (
m/
d)
after addition and subtraction. After the query process and calculations are performed, the sensor has to compress the local query result before sending it to the SN. Using
as a compression error, our sensor compresses the local query result and sends the compressed data that have bounded-error equal to (
m/
d)
+
to the local database. Hence, we can deduce that our
can be calculated as follows.
If the local database is used to calculate the query result, the local query result that the sensors send to the local database would preserve the bounded-error (
+
). In the local database, the local query result with error (
+
) is calculated, and the error of the local query result becomes (
m/
d) (
+
) after multiplication/division and stays (
m/
d) (
+
) after addition/subtraction. The user expects (
m/
d) (
+
) to be smaller than
τ, so we can deduce the compression error as the following equations.
Using “SELECT (power/1000).be(1) FROM SENSORS” as a demonstration, when EDs are assigned to do the calculation, each power datum has a system error equal to 0.5 (see
Table 3). After the power is divided by a thousand, the system error becomes 1/2000. Then, we calculate the compression error and obtain 1999/2000. After compression, the error bound of the data becomes one to meet the user’s request and then the compressed data are sent to the local database. When the local database performs the calculation, sensors send the local query result with an error equal to (0.5 +
). The local database divides the local query results by one thousand, so now the error bound becomes (0.0005 + 0.001
), which should equal one. Hence, it can be calculated that the bounded-error for data compression to be used in this case is 999.5.
Considering the system error, the real status of the physical world is uncertain, yet can be deduced in a certain range. Using comparison operations, it is easy to judge that the status of the physical world is larger or less than a particular value if the possible range of the physical status is fully larger or less than that value. However, when the possible range covers the compared value, it is impossible to know whether the status of the physical world is larger or less than the value.
We describe three possible approaches to deal with it in data comparisons. Each proposed technique has a different syntax with unique advantages listed below.
Syntax: column_name<?value?
The recall rate is maximized to 100% with a system error. If the range of the physical world covers the value, it is mapped into the query result.
Syntax: column_name<“value”
The precision rate is maximized to 100% with a system error. If the range of the physical world exceeds the value, it will not be mapped into the query result.
Syntax: column_name<value
The range of the physical world is treated as the value of raw data.
The first syntax makes sure that all the data with system error and possibly smaller than the value are listed in the query result without false negatives in comparison. The second syntax makes sure that each datum in the query result is strictly smaller than the value, since the system error may cause false positives in comparison. The third syntax maps the query result by comparing the sensed data with the value without considering the system error.
For example, “SELECT node FROM SENSORS WHERE power < 100” queries nodes whose power usages are with a value smaller than 100. The system error of power according to
Table 3 is 0.5. So, when raw data are 100, the physical status is the range between 99.5 and 100.5. Considering the system error, with this kind of query, it is difficult to determine whether the power usage of the physical world is less than 100 or not.
As illustrated in
Table 5, when the system error is
, the principles of data selection are described as follows: Using the example in the previous paragraph, in the syntax of “power < “100””, we have to avoid mapping those data that are possibly larger than 100 into query result. So, those data that are more than 95.5 are not mapped to a query result. In the syntax of “power < ?100?”, all possible data that are smaller than 100 have to be mapped into a query result. Hence, all the data that are smaller than 100.5 are mapped into query results. In the syntax of “power < 100”, the value of the sensed data in a query result is simply compared with 100.
3.3. Data Aggregation Functions
In an OBEQ, MAX, MIN, SUM, AVG, and GROUP BY, aggregation functions of SQL are also considered. Two syntactic formats in MAX/MIN aggregations are listed below. One restricts the bounded-error of the target column and the other restricts the bounded-error of the aggregated value.
The first syntactic example indicates that the maximum value is allowed to have bounded-error τ. The second syntactic example indicates that every datum, even the maximum value, is allowed to have bounded-error τ. Both syntactic examples return to the same results. Hence, both are executed in the same way. Using the example of “SELECT MAX(Data).be(τ) FROM SENSORS”, sensors firstly process raw data with maximum aggregation. After the MAX value is selected, the error of maximum value data is . After query processing has finished, the local results are compressed and sent to the local database. The total error of compressed data becomes (). On the local database, the result received from the sensors preserves a data error of (), so we can deduce that the compression error for a sensor to use is mentioned above in Equation (8).
The AVG aggregation has two syntactic formats. One restricts the bounded-error of the target column and the other restricts the bounded-error of the aggregated values. Syntactic examples are shown below.
The first syntactic format indicates that every average value in a query result is allowed to have bounded-error τ. The second syntactic example indicates that every raw datum is allowed to have bounded-error τ. In the second syntactic format, after the average logic is operated on a dataset in which each datum has bounded-error τ, the bounded-error of the average value would be τ. Both syntactic examples return the result with bounded-error τ, so both syntactic formats are executed in the same way. The query of “SELECT AVG(power.be(1)) FROM SENSORS” is a demonstration that queries the average power of all the sensor readings. The average value is allowed to have bounded-error 1. Using “SELECT AVG(Data).be(τ) FROM SENSORS” as an example, sensors process the average value on raw data and obtain an average value that has system error . After the local query result is generated, it can be compressed with a compression error of . The compressed query result with error ( + ) is sent to the local database. Users expect the error of the query result to be , so the compression error can be deduced as shown in Equation (8).
The SUM aggregation has two syntactic formats. One restricts the bounded-error of the target column and the other restricts the bounded-error of the aggregated value. Syntactic examples are shown below.
The first syntactic format indicates that every summation value in the query result is allowed to have bounded-error τ. The second syntactic example indicates that all raw data are allowed to have bounded-error τ. In the second syntactic format, after operating the summation logic on a dataset with n raw data, the bounded-error of the summation value of the local query result would be nτ. Each syntactic format generates different results, so they are executed in a different way.
For example, for “SELECT SUM(power).be(1) FROM SENSORS WHERE city = Taipei” and “SELECT SUM (power.be(1)) FROM SENSORS”, the former one indicates that the bounded-error range for the SUM aggregation value is allowed to be one, and the second indicates that the raw data are allowed to have bounded-error 1. In the example of “SELECT SUM(Data).be(
τ) FROM SENSORS”, because the query result is merged by multiple local query results, the error of different local query results has to be added up to merge to the final query result. To ensure that the error of the final result is
τ, each local result can only use part of
τ. There might be M sensors under an SN. If the quantity of the sensors answering this query can be predicted, this number M can be used. For example, if the user wants to know the summation of the power usage of Taipei City, the old dataset can be used to count the number of sensors in Taipei. If the query cannot be predicted, then each bounded-error sensor can use
so that the error of the final query result does not fail to meet the user’s expectations. Hence, the local database would rewrite the query instruction before propagating the query to the sensors. If there are
n raw data in a sensor, the sensor adds up the data and obtains a summation value with system error
n. It is possible that
n is larger than the bounded-error
. If
n is larger than
, it means that the system fails to meet the QoS
2/QoD requirement of the user’s needs. In this case, it is strongly recommended for the users to improve the quality of their hardware. If
n does not exceed
, then we use (
) as a compression error
(as shown in Equation (13)) to compress the local query result.
In the example of “SELECT SUM(Data.be(
τ)) FROM SENSORS”, each piece of raw data is allowed to have bounded-error
τ. If a sensor has N raw data, the summation of these raw data has a bounded-error equal to
. Because each datum is allowed to have bounded-error
, the summation value is allowed to have bounded-error
n. Hence, the allowed compression error
of this sensor is as per the equation below.
The query example of “SUM(power).be(1) FROM SENSORS WHERE city = “Taipei”” firstly calculates how much bounded-error from an ED can be allowed. The local database searches the old data and predicts how many nodes execute the query. If we have two nodes in Taipei SN, each node is allowed to have a one out of two bounded-error. Then, the local database changes instructions to “SUM(power).be(0.5)” and sends it to the WSN. Taking “SELECT SUM (power.be(1)) FROM SENSORS” as a demonstration, when an ED receives “SELECT SUM (power.be(1))”, it aggregates 10 pieces of raw data, and the total system errors will be added up to five. Then, the compression error will be five, according to Equation (14). In this case, the bounded-error of the summation value in the local result is 10.
The GROUP BY statement usually comes with aggregation functions. In the aggregation functions, it is common to merge multiple data into one group, which means many local query results from different sensors may have to be merged together. However, in other cases, such as “GROUP BY node”, local query results are not merged. Hence, the group can be divided into two kinds: one needs cross-node aggregation, and the other does not. If a query contains “GROUP BY node”, it is not necessary to send group by tags to an SN or to allocate bounded-error. Multiple local query results can simply be put together and generate the final result without aggregating the groups again. If a query does not contain a “GROUP BY node”, multiple local query results need to be aggregated from different nodes in order to generate the final query result. To merge local query results, group tags must be sent. In the example of query “SELECT SUM(power) FROM SENSORS GROUP BY HOUR(timestamp), DATE(timestamp)”, the database generates the total power consumption each hour every day. Furthermore, the AVG aggregation function needs additional metadata, which are the size of the averaged raw data. In the local database and SN, the result is merged again according to the group tags and metadata. For cross-node merging, bounded-error allocation is calculated as mentioned above in other aggregation functions.