1. Introduction
Nakamoto proposed Bitcoin [1], which has multi-center characteristics, is tamper-proof, and is traceable. Bitcoin is a digital currency built on a P2P [2] network that uses asymmetric cryptography to secure all links. With the continuous development of digital currencies, blockchain, as their underlying technology, has gradually been extended from the financial field to non-financial fields such as healthcare, the Internet of Things, and edge computing [3,4].
As a distributed system, the blockchain needs to ensure consensus among all nodes in the system. Consensus algorithms are an essential part of blockchain technology and directly affect the system’s operational efficiency and fault tolerance [5]. The current mainstream consensus algorithms include PoW, PoS [6], and PBFT [7].
In practical blockchain deployments, the throughput of the PBFT algorithm has been found to be poor and unable to meet the performance requirements of real scenarios. In [8], a scalable PBFT-based multilayer consensus mechanism is proposed in which nodes are hierarchically grouped into different layers and communication is restricted to within each group. In [9], a parallel sequencing framework called SAREAK is proposed; it divides the service state to exploit parallelism during both the protocol and execution, ultimately doubling throughput at half the latency.
One class of optimizations groups nodes by their attributes: nodes are divided into groups by a clustering algorithm, and consensus is reached first within each group and then between groups. In [10], seed nodes serve as centers around which nodes self-cluster into several subgroups; consensus is reached within each group through an optimized Byzantine election algorithm, and the global consensus is accomplished jointly by the agents of each group. In [11], the K-medoids clustering algorithm is used to cluster and hierarchically classify the large-scale network nodes involved in blockchain consensus based on their characteristics. When appropriate clustering features are chosen to judge the similarity between nodes, the number of communications in the consensus process can be reduced by three orders of magnitude. In [12], the RBFT consensus algorithm is proposed, which uses an improved Raft mechanism to elect leaders that form a committee for PBFT consensus.
A second class of optimizations uses a reputation model: nodes with high trust are selected to act as master nodes, or nodes with high confidence form the consensus group instead of all nodes participating in consensus. In [13], the advantages of the PoS and PBFT algorithms are combined in a hybrid consensus algorithm for blockchain, which is divided into two stages: classification and witnessing. The number of consensus nodes is reduced to a constant value by verifiable pseudo-random ordering, and transactions are performed between nodes. In [14], the PBFT algorithm is optimized by thresholding, the master node is selected by a voting mechanism, and the node with the highest ranking becomes the notary node.
Most of the PBFT optimizations proposed for blockchain revolve around simplifying the process or introducing other mechanisms. This type of approach improves the performance of the traditional PBFT algorithm but reduces its security and reliability. The attribute grouping method and the reputation model optimize the PBFT algorithm to a certain extent, but both have problems. The grouping method uses attribute division to ensure that nodes do not all communicate directly with each other; however, once the number of network nodes reaches a certain scale, it still faces high network overhead. The trust model has no penalty mechanism, so when a node with a high trust level makes a mistake there is no way to correct it, which affects the security of the algorithm.
In this paper, we propose an improved CART-based PBFT algorithm that evaluates and classifies all nodes using CART decision trees [15]. Binary splitting and a point mechanism are used to identify nodes with high trustworthiness, reduce the probability of non-honest nodes acting as master nodes, and improve the reliability of master node selection. To avoid the heavy resource consumption of the PBFT consensus algorithm when broadcasting messages, the communication volume of the consensus process is reduced by assigning voting weights to the nodes. Network-wide consensus is reached when the voting weights of the confirming consensus nodes reach half of the total weight, which improves consensus performance.
2. Design of C-PBFT Consensus Algorithm
The C-PBFT algorithm consists of three parts: weighted impurity variables, point grouping, and a voting mechanism. The attribute division uses weighted impurity variables to select the best splitting point and so ensure an accurate division of the sample. Point grouping ranks nodes by effectiveness and prioritizes nodes with high trust as consensus nodes, which shortens the time the system needs to enter a virtuous cycle. The voting mechanism optimizes the threshold parameters and reduces the mass propagation of messages in the network. The consensus process is shown in Figure 1, and the specific steps are as follows:
First, the sample is divided using weighted impurity variables, and the CART algorithm recursively subdivides the nodes. Second, a point-grouping mechanism is designed, with the weighted impurity variables of node attributes serving as the basis for scoring the initial points. Every node is classified as a consensus node, a candidate node, or an alternate node, and those with high points become consensus nodes. Finally, when a network-wide message is published, the consensus nodes participate in consensus on behalf of all nodes and collect confirmation messages from other nodes. When the confirmation messages reach the voting threshold, the node updates the ledger and completes the consensus. When the confirmation messages do not reach the voting threshold, the errant node is dealt with through a penalty mechanism, and a new consensus node is elected for consensus.
2.1. CART Algorithm Improvement
The CART algorithm uses binary splitting to divide the samples recursively into subsets. The splitting attribute is chosen according to the Gini value of each attribute. In the blockchain consensus scenario, the conditional characteristics of nodes can affect each other during node attribute classification, which reduces accuracy. To improve the classification accuracy, the concept of a weighted impurity variable is introduced.
When the number of attribute categories is $C$ and $p_i$ is the probability of attribute value $i$ in node attribute $A$, the Gini impurity of node $A$ is

$$Gini(A) = 1 - \sum_{i=1}^{C} p_i^{2}.$$
When the attributes in a node belong to the same class or there is no sample division, then this node is the root or leaf node. If the above conditions are not met, then bifurcation is performed based on the attributes and attribute values of the sample. If the current node $D$ is divided into $D_1$ and $D_2$ nodes based on the attribute value of attribute $A$, and the proportion of $D_1$ in $D$ is $m$ and the proportion of $D_2$ in $D$ is $n$, sample $D$ is divided into $D_1$ and $D_2$ sub-nodes. The change amount is

$$\Delta Gini(A) = Gini(D) - m\,Gini(D_1) - n\,Gini(D_2).$$
denotes the sum of the effects of the other attributes of attribute
A on the impurity variables,
d indicates the number of attributes
, and
denotes the importance of each conditional attribute;
. Then the weighted impurity variable is
From the formula, the larger the impurity changes of attribute A and of the other attributes, the higher the purity of the child nodes after division. The weighted impurity variable is used as the selection metric for attribute value division, and the attribute with the maximum weighted impurity variable is selected as the best splitting attribute. The algorithm design process is as follows (Algorithm 1):
Algorithm 1 CART Algorithm
Input: samples, threshold of Gini coefficient
Output: decision tree
Create root node D;
if the samples belong to the same category or only one sample remains then
    return the decision subtree;
else
    calculate the weighted impurity change of the samples;
    select the characteristic value with the minimum Gini coefficient;
    if the number of consensus nodes N is based on credit rating then
        calculate the value of the weighted impurity variable and check it against the threshold;
    end
end
According to the optimal eigenvalue, divide the dataset into D1 and D2 to establish the left node and right node of the current node;
Use the above steps recursively for the left and right sub-nodes to generate a decision tree;
Return to the root node to continue execution;
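As a rough illustration of the splitting step in Algorithm 1, the following Python sketch computes the Gini impurity and selects the binary split with the largest impurity decrease. It uses the plain, unweighted CART criterion, because the attribute weights of the weighted impurity variable are not specified here; the attribute names `uptime` and `region` and the class labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_i^2) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def impurity_change(rows, labels, attr, value):
    """Impurity decrease of the binary split rows[i][attr] == value."""
    left = [y for x, y in zip(rows, labels) if x[attr] == value]
    right = [y for x, y in zip(rows, labels) if x[attr] != value]
    m = len(left) / len(labels)   # proportion of samples sent to D1
    n = len(right) / len(labels)  # proportion of samples sent to D2
    return gini(labels) - m * gini(left) - n * gini(right)


def best_split(rows, labels):
    """Return (attr, value, change) for the split with the largest decrease."""
    candidates = {(attr, row[attr]) for row in rows for attr in row}
    scored = [(impurity_change(rows, labels, a, v), a, v) for a, v in candidates]
    change, attr, value = max(scored, key=lambda item: item[0])
    return attr, value, change


# Hypothetical node attributes and labels, for illustration only.
rows = [
    {"uptime": "high", "region": "a"},
    {"uptime": "high", "region": "b"},
    {"uptime": "low", "region": "a"},
    {"uptime": "low", "region": "b"},
]
labels = ["honest", "honest", "faulty", "faulty"]
attr, value, change = best_split(rows, labels)
```

In this toy dataset, splitting on `uptime` separates the two classes completely, while splitting on `region` leaves both children as mixed as the parent.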
2.2. Point-Grouping Mechanism
The point-grouping mechanism is an essential element of the C-PBFT algorithm improvement and lays the foundation for the election of consensus nodes. The node scoring mechanism of the C-PBFT algorithm consists of two parts: the initial node points and the point rules.
In the initial state, all nodes are scored and ranked according to their combined behavior. Blockchain network nodes are divided into consensus nodes, candidate nodes, and alternate nodes, and the ranked nodes are grouped in the ratio 2f + 1 : f/2 : f/2. Consensus nodes represent all nodes and complete the consensus process of the system, and they take precedence as master nodes. Candidate and alternate nodes do not participate in consensus; they only synchronize the updates verified by the consensus nodes to the blockchain ledger that they maintain themselves. In practice, the ratio of candidate nodes to alternate nodes is adjusted according to system requirements, but the number of consensus nodes is never less than 2f + 1, where f is the maximum number of faulty nodes that can be tolerated.
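The grouping by rank can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `group_nodes` is hypothetical, `f // 2` uses integer division, and all nodes left over after the consensus and candidate groups are treated as alternates.

```python
def group_nodes(scores, f):
    """Split nodes ranked by score into consensus, candidate, and alternate groups.

    scores: dict mapping node id to score.
    f: maximum number of faulty nodes to tolerate.
    Assumption: 2f + 1 consensus nodes, f // 2 candidates, the rest alternates.
    """
    ranked = sorted(scores, key=lambda node: scores[node], reverse=True)
    n_consensus = 2 * f + 1
    if len(ranked) < n_consensus:
        raise ValueError("need at least 2f + 1 nodes")
    n_candidate = f // 2
    return {
        "consensus": ranked[:n_consensus],
        "candidate": ranked[n_consensus:n_consensus + n_candidate],
        "alternate": ranked[n_consensus + n_candidate:],
    }


# Hypothetical scores for nine nodes, highest first.
scores = {f"n{i}": 10 - 0.3 * i for i in range(9)}
groups = group_nodes(scores, f=2)
```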
Newly joined nodes cannot become consensus nodes directly; they must first be authorized and scored by the consortium and are then assigned the corresponding level. Consensus nodes have a score greater than 9, candidate nodes have a score between 8 and 9, and alternate nodes have a score less than or equal to 8. Nodes can therefore be in one of three states, and nodes can move between states according to the following rules:
(1) Extra point rule: For each consensus completed by a node, 0.02 is added to the points of nodes with more than 9 points, 0.1 is added to the points of nodes with points between 8 and 9, and 0.2 is added to the points of nodes with points below 8. The maximum number of points is 10, and the points do not increase after reaching 10:

$$S' = \begin{cases} \min(S + 0.02,\ 10), & S > 9 \\ S + 0.1, & 8 \le S \le 9 \\ S + 0.2, & S < 8 \end{cases}$$

where $S$ is the pre-consensus points and $S'$ is the post-consensus points.
(2) Penalty rule: If an alternate node makes a mistake, its points are reduced by 0.5. If a consensus node makes a mistake, its points are set directly to 8 and it is converted to a candidate node; the candidate node with the highest points is converted to a consensus node, and its points are set to 9. If a candidate node makes a mistake, it is converted to an alternate node; the alternate node with the highest points is converted to a candidate node, and its points are set to 8.
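The two rules above can be sketched in Python as follows. This is an illustrative reading rather than a reference implementation: the function names and the dictionary layout of a node are hypothetical, a score of exactly 8 or 9 is assumed to fall in the middle band, and only the consensus-node and alternate-node penalties are modeled.

```python
MAX_POINTS = 10.0


def reward(points):
    """Points after one completed consensus (extra point rule)."""
    if points > 9:
        bonus = 0.02
    elif points >= 8:  # assumption: scores of exactly 8 or 9 fall in this band
        bonus = 0.1
    else:
        bonus = 0.2
    return min(points + bonus, MAX_POINTS)


def handle_consensus_fault(nodes, faulty):
    """Demote a faulty consensus node and promote the best candidate."""
    candidates = [node for node in nodes if node["role"] == "candidate"]
    promoted = max(candidates, key=lambda node: node["points"]) if candidates else None
    if promoted is not None:
        promoted["role"], promoted["points"] = "consensus", 9.0
    faulty["role"], faulty["points"] = "candidate", 8.0
    return promoted


def handle_alternate_fault(node):
    """Deduct 0.5 points from a faulty alternate node."""
    node["points"] -= 0.5


# Hypothetical nodes, for illustration only.
nodes = [
    {"id": "a", "role": "consensus", "points": 9.5},
    {"id": "b", "role": "candidate", "points": 8.7},
    {"id": "c", "role": "candidate", "points": 8.2},
    {"id": "d", "role": "alternate", "points": 7.0},
]
promoted = handle_consensus_fault(nodes, nodes[0])
handle_alternate_fault(nodes[3])
```

The promotion is chosen before the faulty node is demoted, so the demoted node cannot be selected as its own replacement.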
As shown in Figure 2, the nodes are sorted using the point mechanism. The 2f + 1 nodes with the highest trust degree are numbered {0, …, 2f} from high to low and act as consensus nodes. The master node is selected from the first N nodes with a high trust degree, using points as the reference standard, which ensures the stability of the master node.
2.3. Voting Mechanism Strategy
The communication overhead of the blockchain network is reduced by having only a small number of highly trusted consensus nodes vote. The higher a node ranks in points, the greater its voting weight, which reflects the differences among consensus nodes.
Consensus nodes interact pairwise to determine the master node, using the node number as the election credential. The nodes broadcast their credentials to the other consensus nodes, and each node compares a received credential with its own and keeps the better one.
When the number of consensus nodes of the blockchain network C-PBFT algorithm is N, node k reflects the correctness and responsiveness in the form of voting weights, denoted as
, when the consensus verification message is consistent,
. The higher the consensus node ranking in the point mechanism, the greater the voting weight.
The traditional PBFT algorithm needs to collect confirmation messages with 2/3 of the total weight to complete consensus. Because the point-grouping mechanism ensures the reliability of the consensus nodes, the voting threshold of the consensus nodes can be lowered. To reduce the consumption of network resources and improve consensus efficiency, the voting weight threshold is set to half of the total weight. This reduces the bandwidth consumed by network broadcasts and improves the performance of the algorithm. The sum of the consensus node voting weights in the C-PBFT algorithm is denoted as $W = \sum_{k=1}^{N} w_k$, where $w_k$ is the voting weight of consensus node $k$.
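A minimal sketch of the weighted threshold check is given below. The linear weight assignment in `rank_weights` is a hypothetical choice, since the text only requires that higher-ranked nodes receive larger weights, and the strict comparison against half of the total weight is likewise an assumption.

```python
def rank_weights(ranked_nodes):
    """Assign larger voting weights to higher-ranked nodes (linear, hypothetical)."""
    count = len(ranked_nodes)
    return {node: count - rank for rank, node in enumerate(ranked_nodes)}


def reaches_threshold(weights, confirmations, fraction=0.5):
    """True if the confirming nodes hold more than `fraction` of the total weight."""
    total = sum(weights.values())
    # Use a set so that duplicate confirmations from one node count once.
    collected = sum(weights[node] for node in set(confirmations))
    return collected > fraction * total


# Five consensus nodes ranked by points: weights 5, 4, 3, 2, 1 (total 15).
weights = rank_weights(["n0", "n1", "n2", "n3", "n4"])
```

With these weights, the two top-ranked nodes together hold 9 of 15, which is enough to pass the half-weight threshold, whereas the three lowest-ranked nodes hold only 6.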
2.4. C-PBFT Consensus Process
As shown in Figure 3, the C-PBFT algorithm consists of four parts: the initialization phase, the pre-preparation phase, the preparation phase, and the commit phase. The C-PBFT algorithm retains the original three-phase protocol but adds an initialization process before it. Since the number of consensus nodes is fixed during the operation of the PBFT consensus algorithm, nodes cannot join or exit freely. Therefore, the initialization phase only starts when a new node joins or an old node exits.
(1) Initialization phase: The nodes in the network are classified by the improved CART algorithm model, node numbers are sorted and levels are assigned based on the point rules, and voting weights are assigned for the selection of the master node. If an error occurs in the master node P in view N, the master node is switched according to the penalty rule.
(2) Pre-preparation phase: The master node receives the proposal n and propagates it to Replica1, Replica2, and Replica3. When a slave node receives the pre-prepare message from the master node, it verifies the legitimacy of the consensus node. If the verification passes, a prepare message carrying its own voting weight is propagated to the whole network and written to the message log.
(3) Preparation phase: The consensus node collects prepare messages from other nodes, and if the sum of the collected voting weights is greater than the threshold of the total weight, the node enters the commit phase. Otherwise, a message reporting that consensus verification failed is sent to the client, the points are ranked again, and a new master node is selected.
(4) Commit phase: The consensus master node disseminates an acknowledgment message carrying its own voting weight to the whole network and collects acknowledgment messages from other nodes. If the voting weight of the collected acknowledgment messages exceeds the voting weight threshold, the node enters the commit state and executes the client’s request.
(5) Reply phase: Finally, the results are returned to the client and the whole network begins to update the ledger.
When the consensus nodes have finished consensus, the master node packs the consensus data into blocks and writes them to the blockchain. If the master node does not generate a data block within the specified time, or its feedback differs from that of the majority of nodes, it is considered a Byzantine node. In that case, a new master node is selected according to the node penalty mechanism and continues to complete the consensus.
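The phase transitions described in this section can be summarized as a small state machine. The sketch below is a simplification under several assumptions: message validation and networking are omitted, the `Phase` names and function signatures are hypothetical, and the threshold comparison is taken to be strictly greater than half of the total weight.

```python
from enum import Enum


class Phase(Enum):
    PRE_PREPARE = 1
    PREPARE = 2
    COMMIT = 3
    REPLY = 4
    FAILED = 5


def next_phase(phase, collected_weight, total_weight, threshold=0.5):
    """Advance one step of a simplified C-PBFT round."""
    if phase is Phase.PRE_PREPARE:
        # Replicas accepted the master's message and start sending prepare messages.
        return Phase.PREPARE
    if phase in (Phase.PREPARE, Phase.COMMIT):
        if collected_weight > threshold * total_weight:
            return Phase.COMMIT if phase is Phase.PREPARE else Phase.REPLY
        # Threshold not reached: points are re-ranked and a new master is chosen.
        return Phase.FAILED
    return phase


def run_round(prepare_weight, commit_weight, total_weight):
    """Run one round given the weight collected in each voting phase."""
    phase = next_phase(Phase.PRE_PREPARE, 0, total_weight)
    phase = next_phase(phase, prepare_weight, total_weight)
    if phase is Phase.COMMIT:
        phase = next_phase(phase, commit_weight, total_weight)
    return phase
```

A round reaches `Phase.REPLY` only if both the prepare and the commit phase collect more than half of the total weight; otherwise it ends in `Phase.FAILED`, which corresponds to re-ranking the points and selecting a new master node.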
3. Analysis and Experiments
To address the shortcomings of the PBFT consensus algorithm, such as non-honest nodes acting as master nodes corrupting the consensus process, serious communication overhead, and low fault tolerance, analysis of latency, transaction throughput, and system fault tolerance is performed to compare the results before and after optimization.
3.1. Time Delay Analysis
Latency is the time required from the submission of a transaction request by a node to the end of consensus and can measure the performance of the blockchain network and the efficiency of the consensus algorithm. The lower the transaction latency, the shorter the confirmation time for that transaction, the better the performance of the blockchain system, and the more efficient the consensus. The final confirmation time is measured by simulating different transaction volumes with the same number of consensus nodes.
$$Delay = T_{confirm} - T_{generate},$$

where $Delay$ represents the delay, $T_{generate}$ represents the time when the transaction is generated, and $T_{confirm}$ represents the time when the transaction is confirmed.
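As a small worked illustration of the delay definition, the following sketch computes per-transaction latency and its average over several runs. The function names and the timestamps are hypothetical and are not measurements from the experiments.

```python
from statistics import mean


def latency(generated_at, confirmed_at):
    """Delay between transaction generation and confirmation, in seconds."""
    if confirmed_at < generated_at:
        raise ValueError("confirmation cannot precede generation")
    return confirmed_at - generated_at


def mean_latency(records):
    """Average latency over (generated_at, confirmed_at) pairs."""
    return mean(latency(generated, confirmed) for generated, confirmed in records)


# Hypothetical timestamps in seconds, for illustration only.
records = [(0.0, 0.5), (1.0, 2.0)]
average = mean_latency(records)
```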
As shown in Figure 4, to keep the consensus delay low as the consensus efficiency of the algorithm changes, each layer has a maximum of eight nodes. Block sizes in the range (0, 900] are selected for the simulation test; the average value over different states is taken as the final consensus delay for each state, which gives the delay for blocks of different sizes.
The transaction delay of both the C-PBFT and PBFT algorithms increases with block size. The bandwidth of the blockchain network remains unchanged, but the volume of requested transactions increases, so broadcasting and hashing the consensus node messages take longer. The transaction delay of the optimized algorithm is slightly lower than that of the traditional PBFT consensus algorithm. Although both PBFT and C-PBFT require three phases to complete consensus, the optimized algorithm introduces voting weights to complete the distributed consensus. Consensus is reached through a small number of consensus nodes with high points, which improves the operational efficiency of the system.
3.2. Throughput Analysis
Throughput (TPS) refers to the number of transactions the system can handle per unit of time and is a key performance measure for consensus algorithms. A blockchain with high transaction throughput indicates that its consensus algorithm performs well: transactions are processed quickly and the system is more efficient. Comparison experiments were conducted using different time intervals:

$$TPS = \frac{Transactions}{\Delta t},$$

where $\Delta t$ is the transaction completion time, and Transactions is the total amount of transactions processed within the transaction completion time.
This experiment takes the time interval as the variable, simulates eight blockchain systems, and takes a measurement every 5 s within [5, 40] s. The average value of the experimental results is taken as the throughput value under different numbers of nodes.
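The throughput calculation and the averaging over repeated runs can be sketched as follows. The function names and the sample figures are hypothetical and serve only to show the arithmetic.

```python
def throughput(transactions, interval_seconds):
    """Transactions processed per second over one measurement interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return transactions / interval_seconds


def mean_throughput(samples):
    """Average TPS over (transactions, interval_seconds) samples."""
    values = [throughput(count, interval) for count, interval in samples]
    return sum(values) / len(values)


# Hypothetical samples: 1000 transactions in 5 s and 3000 transactions in 10 s.
average_tps = mean_throughput([(1000, 5), (3000, 10)])
```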
Figure 5 shows the throughput test results of the C-PBFT and PBFT algorithms.
As the time interval increases, the throughput of both C-PBFT and PBFT algorithms shows an upward trend. The larger time interval means an increase in the amount of transaction data in the newly generated blocks. Using the CART algorithm model division and integral rule sorting, the consensus node with a high trust degree is given priority to serve as the master node, and dishonest nodes are prevented from serving as the master node. Distributed consensus is accomplished through a small number of highly trusted consensus nodes, thereby reducing the waiting time for client transaction request validation. Throughout the process, the throughput of the C-PBFT algorithm is always greater than that of the PBFT algorithm, which can process more transactions per unit of time. It is suitable for alliance chain environments with higher throughput requirements.
The transaction throughput of the blockchain system is an essential criterion for quantifying the system’s efficiency. The optimized algorithm effectively reduces the transaction delay and improves the throughput. A TPS comparison of this optimized algorithm and other consensus algorithms is shown in Table 1.
3.3. Fault Tolerance Analysis
The Byzantine consensus algorithm can tolerate up to $f = \lfloor (|R| - 1)/3 \rfloor$ faulty nodes, where $|R|$ is the total number of nodes and its value range is [4, 100]. The optimized algorithm uses the voting weight as a reference standard, and the entire network can reach a consensus once the collected voting weight exceeds the threshold.
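For reference, classical PBFT with n nodes tolerates at most ⌊(n − 1)/3⌋ Byzantine nodes, which the short sketch below evaluates for a few network sizes. The function names are illustrative.

```python
def max_faulty(n):
    """Maximum number of Byzantine nodes classical PBFT tolerates among n nodes."""
    if n < 1:
        raise ValueError("n must be positive")
    return (n - 1) // 3


def min_nodes(f):
    """Minimum number of nodes classical PBFT needs to tolerate f Byzantine nodes."""
    return 3 * f + 1


# Tolerance at the ends of the range [4, 100] and for eight nodes.
tolerance = {n: max_faulty(n) for n in (4, 8, 100)}
```

For eight nodes the bound is two faulty nodes, so a classical PBFT deployment of that size is expected to stop reaching consensus once three nodes fail.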
To verify that the C-PBFT consensus algorithm has an advantage in fault tolerance, the relationship between transaction throughput and the number of error nodes is tested after the model runs stably. Eight consensus nodes are deployed on the blockchain simulation platform, and the number of error nodes is set to 0, 1, 2, 3, 4, and 5 in turn. The average of 10 experiments is taken to compare whether the PBFT consensus algorithm can reach consensus before and after optimization and to find the maximum number of error nodes that can be tolerated. The results are shown in Figure 6.
The experimental results show that with the PBFT algorithm, when the number of Byzantine nodes is below three, the throughput is within the normal range and the system can operate normally. When the number of error nodes reaches or exceeds three, the throughput becomes 0 and the consensus cannot be completed. With the C-PBFT algorithm, however, the TPS of the eight consensus nodes becomes zero only when there are more than three error nodes. A threshold is used for consensus, and consensus is reached once half of the voting weight is collected. Compared with the traditional Byzantine consensus algorithm, the optimized Byzantine consensus algorithm has better fault tolerance. Consensus nodes are selected based on node points, which ensures that nodes with greater bandwidth and more stability are used as consensus nodes and handle data transactions more reliably.