1 Introduction

The Knowledge Graph [1] was first introduced by Google; its main function is to improve the quality of answers returned by search engines and the efficiency of users' queries. Owing to the highly specialized nature of medical knowledge, a domain knowledge graph contains a large amount of structured knowledge, which makes natural language processing of clinical electronic medical records play a vital role in tasks such as disease prediction [2], entity linking [3], and correlation feature prediction [4]. Data mining refers to the process of discovering hidden information in large volumes of data through algorithms, and data mining technology also opens new possibilities for medical knowledge reasoning based on the semantic associations in electronic medical records. In recent years, large-scale knowledge graphs such as DBpedia [5], Wikidata [6], YAGO [7], and Freebase [8] have attracted increasing attention. Although these knowledge graphs contain tens of millions of entities and hundreds of millions of factual triples, they remain incomplete compared with the domain knowledge that exists in the real world. Therefore, how to acquire new knowledge from external resources, perform Knowledge Base Completion (KBC) in the medical domain, infer a large number of high-quality knowledge triples, and provide support for correlation feature prediction has become a current research hotspot.

Methods to complete a knowledge graph mainly focus on two aspects: (1) using reasoning over the knowledge already inside the graph; (2) extracting new knowledge from unstructured text. The unstructured portion of electronic medical record data contains a large amount of symptom, sign, and diagnosis-related information. Combined with an efficient and feasible mode of collecting clinical real-world data, it can be organized into structured data for analysis, allowing this information to be exploited more fully in research.

Lumbar disc herniation is a clinical syndrome caused by herniated disc tissue compressing the nerve root and cauda equina, manifested as low back pain and radiating pain, numbness, and weakness of the lower limbs [9,10,11]. In traditional medicine it belongs to the categories of lumbago or bi disease and has strong domain-specific characteristics. At the same time, lumbar disc herniation has a benign natural course: most patients improve with conservative treatment [12,13,14], and the success rate of non-surgical treatment is about 80–90% [15, 16]. Therefore, non-surgical treatment should be the first choice for patients without significant nerve damage. Traditional medicine has advantages in non-surgical treatment, using herbal medicine and acupuncture to alleviate patients' symptoms [17]. This paper adopts a bag-of-words representation model that abstracts text semantics from unstructured data and explores the regularities of Chinese medical syndromes; the regularities of traditional Chinese medicine and acupuncture in treating lumbar disc herniation have guiding significance for the clinic. This paper proposes TCM-KR, a knowledge reasoning method based on electronic medical records and unstructured-text-enhanced association rules. It extracts information from fragmented knowledge in the medical literature and from the unstructured text of electronic medical records, and uses a graph convolutional network to predict unknown associations in viscera, channel tropism, and channel distribution.

The task of textual semantic enhancement modeling is to abstract a unified representation from a set of similar texts. The traditional Bag of Words (BoW) model represents text by word frequency. Although it has achieved success in language modeling and text classification, its shortcomings, such as sparsity and the neglect of word context and word position, still strongly affect natural language processing tasks. To alleviate the noise generated by distant supervision and improve relation classification, REHESSION [8] defines commonly used expression templates for relations; however, given the diversity of unstructured text, modeling it with a fixed set of templates is impractical. RLSW [18] proposed a Bag of Distributions (BOD) model to establish a unified representation for similar texts: it fits a Beta distribution to the positions of each word in a text set, so the set can be represented by a series of Beta distributions. Compared with other approaches, BODs can generate templates from any set of texts, making them better suited to pattern modeling of unstructured text and to supporting subsequent rule mining. However, BOD only models the words between the subject and the object, ignoring the words before and after them, which may contain important knowledge.
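As a concrete illustration of the BOD idea, the sketch below fits a Beta distribution to the normalized positions of one word across a set of similar sentences. The method-of-moments estimator and the sample positions are assumptions for illustration; the original BOD formulation does not prescribe this particular fitting procedure.

```python
# Sketch: fit a Beta distribution to the normalized positions of one word
# across a set of similar sentences (the core idea of the BOD model).
# Method-of-moments estimation is an illustrative assumption.

def fit_beta(positions):
    """Estimate Beta(alpha, beta) parameters from positions in (0, 1)."""
    n = len(positions)
    m = sum(positions) / n                          # sample mean
    v = sum((p - m) ** 2 for p in positions) / n    # sample variance
    common = m * (1 - m) / v - 1                    # method-of-moments factor
    return m * common, (1 - m) * common             # (alpha, beta)

# Hypothetical normalized positions of one relation word ("causes")
# in four similar sentences, i.e. word index / sentence length.
positions = [0.40, 0.45, 0.50, 0.42]
alpha, beta = fit_beta(positions)
```

Note that the Beta mean alpha / (alpha + beta) recovers the sample mean exactly, which is what makes the distribution a compact stand-in for the word's position set.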

TransE [19] assumes that a correct factual triple in the knowledge graph satisfies h + r = t: the head entity vector plus the relation vector should equal the tail entity vector. An objective function is then defined to learn vector representations of entities and relations. Once entities and relations are embedded in this way, additional factual triples can be inferred to complete the knowledge graph. However, TransE has difficulty handling complex relations such as one-to-many, many-to-one, and many-to-many. TransH [20] overcomes this shortcoming: for each relation it assumes a relation-specific hyperplane and projects entities onto it, so the same entity vector has different representations under different relations. Another approach to logical reasoning is to learn rules of the form rel1(e1,e2) ∧ rel2(e2,e3) → rel3(e1,e3) from the knowledge graph. For example, AMIE [21] mines rules under the Open World Assumption (OWA) and proposes a new way to simulate negative examples, but its search strategy limits its use on large knowledge graphs; AMIE+ [22] therefore introduced a series of pruning strategies and query-rewriting techniques that allow effective association rule mining over large graphs. RDF2Rules [23] generates association rules by mining Frequent Predicate Cycles (FPCs) in the knowledge graph. However, because the knowledge graph itself is incomplete and imperfect, these methods have potential problems and only work for entities and relations that already exist in the graph.
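The translation assumption h + r ≈ t behind TransE can be sketched in a few lines; the toy 3-dimensional embeddings below are hypothetical, and a real TransE model would learn them by minimizing a margin-based objective rather than fixing them by hand.

```python
# Minimal sketch of the TransE scoring idea: a true triple (h, r, t)
# should satisfy h + r ≈ t, so candidates are scored by the negative
# L2 distance ||h + r - t||. All embeddings here are toy values.

def transe_score(h, r, t):
    """Negative L2 distance; higher means the triple is more plausible."""
    diff = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    return -sum(d * d for d in diff) ** 0.5

h = [0.1, 0.2, 0.3]        # head entity embedding
r = [0.4, 0.1, -0.1]       # relation embedding
t_good = [0.5, 0.3, 0.2]   # tail consistent with h + r
t_bad = [-0.8, 0.9, 0.0]   # unrelated tail

assert transe_score(h, r, t_good) > transe_score(h, r, t_bad)
```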

Completing a medical-domain knowledge graph mainly means adding new entities, relations, entity attributes, and attribute values. Fragmented, unstructured medical literature contains a large amount of medical knowledge and is updated rapidly, so extracting medical knowledge from unstructured text to complete the knowledge graph has attracted growing attention. For example, some relation classification methods [24, 25] use distant supervision to align the knowledge graph with natural text and then apply various algorithms to identify relations between entities. At present, knowledge extraction based on joint extraction is limited by the relation types: methods such as [26, 27] can only obtain coarse-grained entity types, and the extraction in [28] is confined to the entity type hierarchy. Different from the above studies, this paper mainly uses unstructured association rules to build a bridge between knowledge graphs and unstructured text. With these rules, new knowledge triples can be inferred directly from unstructured text, so the rules can discover and integrate fragmented knowledge from large-scale unstructured corpora.

2 Experimental Details of TCM-KR Methodology

In this section, we introduce our proposed TCM-KR method in detail.

2.1 Data Pre-processing

In the clinical diagnosis and treatment process of traditional medicine, the quality of electronic medical record data is often uneven, exhibiting redundancy and incompleteness, which makes direct analysis impossible or the analysis results unsatisfactory. To facilitate subsequent processing, reduce analysis time, and make model training more effective, medical data are usually preprocessed before model training. Preprocessing mainly includes the following parts:

2.1.1 Data Cleaning

In general, data cleaning includes filling in missing attributes of data records, smoothing data noise, identifying and deleting anomalies or outliers, and resolving data consistency problems. It mainly involves standardizing data formats, clearing abnormal data, correcting incorrect data, and removing duplicates.

  • Missing values: Missing values are common in most medical data sets, and how well they are handled directly affects the final results of the model. Common processing methods include the mean method, median method, interpolation method, and model-based methods.

  • Outliers: These can be divided into two categories: abnormal points and outliers. Abnormal points are handled by direct deletion, distance-based algorithms (including K-means, KNN, etc.), or mean replacement; outliers are commonly handled by model-based detection, proximity methods, and so on.

  • Noise treatment: Noise is random error or variance in a measured variable, including error values or outliers that deviate from expectation. Binning and regression are the main methods used to handle noise.
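Two of the steps above, mean imputation for missing values and distance-based screening of abnormal points, can be sketched as follows; the field values and the z-score threshold are hypothetical.

```python
# Illustrative sketch of two common cleaning steps: mean imputation for
# missing values and z-score screening for abnormal points.

def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def zscore_outliers(values, threshold=3.0):
    """Return indices of values whose z-score exceeds the threshold."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [i for i, v in enumerate(values) if abs(v - mean) > threshold * std]

ages = [34, None, 51, 46, None, 39]   # hypothetical patient ages
cleaned = mean_impute(ages)           # missing entries become 42.5
```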

2.1.2 Data Integration

To resolve inconsistencies caused by data coming from different sources, a data warehouse is established through relevant technologies (such as ID mapping), and the data from multiple sources are integrated organically and stored in a unified way.

2.1.3 Data Transformation

Through smoothing aggregation, data generalization, data normalization, and similar means, each attribute of the data is transformed into a form suitable for analysis.

2.1.4 Data Reduction

Massive data often make the duration of analysis hard to guarantee. Data reduction techniques obtain a reduced representation of the data, effectively shrinking its scale while preserving the integrity of the source data and ensuring that results are the same, or nearly the same, as before reduction.

2.2 Construction of Medicine Domain Knowledge Graph

The model is given a set of unstructured text sentences S and a set of facts F, as shown in Fig. 1. These sentences contain entities e ∈ E, and each entity pair corresponds to a factual triple f(e1,e2) = 〈e1,rel,e2〉 ∈ F. The model mines first-order association rules of the form (ptn,e1,e2) → f(e1,e2), where ptn is a text pattern fitted to S and f(e1,e2) is a factual triple involving e1 or e2. Since ptn is obtained from unstructured text, these are called unstructured-text-enhanced association rules. Furthermore, if the knowledge graph already contains factual triples about e1 and e2, more triples can be inferred, so the first-order rule can be expanded, in combination with knowledge-graph facts, into a second-order rule: (ptn,e1,e2) ∧ f(e1,e2) → f′(e1,e2).
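To make the rule forms concrete, the toy sketch below applies a second-order rule (ptn, e1, e2) ∧ f(e1, e2) → f′(e1, e2): when a sentence matches the text pattern and the knowledge graph already holds the body fact, the head fact is inferred. Pattern matching is reduced to keyword containment, and all entity and relation names are hypothetical.

```python
# Toy application of a second-order rule: pattern match plus an existing
# knowledge-graph fact yields a new inferred triple. Names are illustrative.

def apply_rule(sentence, e1, e2, pattern_words, body_fact, head_rel, kg):
    """Return the inferred head triple, or None if the rule does not fire."""
    matches = all(w in sentence for w in pattern_words)
    if matches and body_fact in kg:
        return (e1, head_rel, e2)
    return None

kg = {("du_huo", "enters_channel", "kidney")}   # known fact f(e1, e2)
new = apply_rule(
    "du_huo enters and tonifies the kidney",
    "du_huo", "kidney",
    ["enters"],                                  # text pattern ptn
    ("du_huo", "enters_channel", "kidney"),      # body fact f
    "treats_deficiency_of",                      # head relation of f'
    kg,
)
```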

Fig. 1

Schematic diagram of the construction of the knowledge graph in the medical field

2.3 TCM-KR Method

This section describes the overall framework of the TCM-KR knowledge reasoning method.

The main purpose of this paper is to construct association rules and extract factual triples from unstructured text. Owing to the diversity of unstructured texts, direct point-to-point rule matching is unrealistic. Generally speaking, similar texts tend to contain the same factual triples, so a natural idea is to cluster similar sentences to mine triples and then integrate each cluster, as a unified pattern, into the rules. Accordingly, the overall framework of the method, shown in Fig. 2, consists of four parts: relational text clustering, text pattern modeling, unstructured association rule mining, and the Dropout clustering loss function.

Fig. 2

Overall framework of TCM-KR method

2.3.1 Relational Text Clustering

In general, electronic medical record texts expressing the same relation or attribute tend to be similar. This paper uses distant supervision to collect similar medical literature texts, with the following steps: (1) collect entity pairs (e1, e2) corresponding to pre-defined relations from the medical literature collection; (2) crawl the medical literature corresponding to e1 and match the sentences containing the entity pair; matching includes exact matching, synonym matching, partial matching, pronoun matching, etc.; (3) for each sentence, take the 3 words before and after e1 and e2 and the words between them as the relational text. Although expressions of the same relation are very similar, they may convey opposite or entirely different meanings: if only lexical information is used and word order is ignored, the respective roles of e1 and e2 in the relation cannot be distinguished. LSWMD proposes an algorithm that computes sentence similarity using both the semantic and the positional information of words. The specific formulas are as follows:

$$ loc(w_{i} ) = \frac{1}{n} * (i - 0.5) $$
(1)
$$ d(w_{1} ,w_{2} ) = a * ed(w_{1} ,w_{2} ) + (1 - a) * \left| {loc(w_{1} ) - loc(w_{2} )} \right| $$
(2)

where \(w_{i}\) is the i-th word in an n-word relational text, \(ed(w_{1} ,w_{2} )\) is the Euclidean distance between the embeddings of \(w_{1}\) and \(w_{2}\), and \(a \in [0,1]\) is a weighting coefficient.
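Eqs. (1) and (2) can be transcribed directly; here ed is taken to be the Euclidean distance between word embedding vectors, and the weighting coefficient a = 0.7 is an assumed value, since the text does not fix it.

```python
# Direct transcription of Eqs. (1)-(2): loc gives a word's normalized
# position in an n-word text, and d mixes embedding distance with the
# positional gap. Embeddings and a = 0.7 are illustrative assumptions.

def loc(i, n):
    """Eq. (1): normalized position of the i-th word (1-indexed) of n words."""
    return (1.0 / n) * (i - 0.5)

def euclid(v1, v2):
    """Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(v1, v2)) ** 0.5

def word_distance(v1, i1, v2, i2, n, a=0.7):
    """Eq. (2): a * ed(w1, w2) + (1 - a) * |loc(w1) - loc(w2)|."""
    return a * euclid(v1, v2) + (1 - a) * abs(loc(i1, n) - loc(i2, n))
```

For example, identical words at the same position have distance 0, while either a semantic or a positional gap increases the distance.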

2.3.2 Text Pattern Modeling

A cluster consists of relational texts with similar semantics and syntax. To put these clusters into unstructured association rules, each must be expressed as a unified text pattern. The traditional bag-of-words model uses only word-frequency information and ignores word position. The Bag of Distributions (BoD) model can represent a cluster by fitting a Beta distribution to the position distribution of each word, sorting words by frequency, and using the Beta distributions of high-frequency words to represent the cluster. However, BoD models only the words between the subject and the object, ignoring the words before and after them, so important information may be lost. Therefore, this paper proposes an improved BoD (BoD ∗) to model relational text. The specific steps are: (1) compute the set of positions at which each word in the cluster appears in the relational texts; (2) fit the position set of each word with a Gaussian distribution. The BoD ∗ pattern corresponding to a cluster can be expressed as follows:

$$ BoD*({\text{c}}) = \left\{ {(\mu_{i} ,\sigma_{i} ,p_{i} )\left| {\omega_{i} \in W_{c} } \right.} \right\} $$
(3)

where \({\text{c}}\) is a cluster, \(W_{c}\) is the set of all words appearing in \({\text{c}}\), \(p_{i}\) is the frequency of \(\omega_{i}\) in \({\text{c}}\), and \(\mu_{i}\) and \(\sigma_{i}\) are the mean and standard deviation of the position set of word \(\omega_{i}\).
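A minimal sketch of constructing BoD ∗ (c) from a cluster of tokenized relational texts, collecting each word's normalized-position mean, standard deviation, and frequency per Eq. (3); the sentences are hypothetical.

```python
# Sketch of Eq. (3): for each word in a cluster, record the mean and
# standard deviation of its normalized positions plus its frequency.

from collections import defaultdict

def bod_star(cluster):
    """Return {word: (mu_i, sigma_i, p_i)} for a cluster of token lists."""
    positions = defaultdict(list)
    for tokens in cluster:
        n = len(tokens)
        for i, w in enumerate(tokens, start=1):
            positions[w].append((i - 0.5) / n)       # normalized position
    total = sum(len(p) for p in positions.values())
    pattern = {}
    for w, pos in positions.items():
        mu = sum(pos) / len(pos)
        sigma = (sum((p - mu) ** 2 for p in pos) / len(pos)) ** 0.5
        pattern[w] = (mu, sigma, len(pos) / total)   # (mu_i, sigma_i, p_i)
    return pattern

cluster = [["e1", "compresses", "e2"], ["e1", "presses", "on", "e2"]]
pattern = bod_star(cluster)
```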

2.3.3 Unstructured Text Enhanced Association Rule Mining

Since each cluster contains relational texts with similar syntax and semantics, the entities in these texts may share the same factual triples. A factual triple consists of an entity, one of its attributes in the knowledge graph, and the corresponding attribute value. Note that, to expose similarity between triples, the entity pair is replaced with the placeholders e1 and e2. If only identical triples were used as rule consequents, the number of rules might be too small; therefore, support and confidence are computed for each rule, and rules whose support and confidence exceed preset thresholds are retained. To compute support and confidence, inspired by association rule mining, the factual triples corresponding to each relational text in the cluster are expressed as transactions. Corresponding to the first-order and second-order rules defined above, frequent 1-itemsets and frequent 2-itemsets are mined from the transaction set. For each frequent 1-itemset \(f\) mined in a cluster, the rule BoD ∗ (c) → f is added to the first-order rule set; similarly, for each frequent 2-itemset (f,f′), the rules BoD ∗ (c)∧f → f′ and BoD ∗ (c)∧f′ → f are added to the second-order rule set. The support and confidence of a first-order rule (BoD ∗ (c) → f) are computed as follows:

$$ \sup (r) = \frac{{\left| {\left\{ {t\left| {f \in t\& t \in T} \right.} \right\}} \right|}}{\left| T \right|} $$
(4)
$$ {\text{conf}}(r) = \frac{{\left| {\left\{ {t\left| {f \in t\& t \in T} \right.} \right\}} \right|}}{{\left| {\left\{ {t\left| {f \otimes t \ne \emptyset \& t \in T} \right.} \right\}} \right|}} $$
(5)

where T is the set of all transactions and each transaction t contains some factual triples; \(f \otimes t \ne \emptyset\) indicates that the attribute corresponding to f appears in t. The support and confidence of a second-order rule (BoD ∗ (c)∧f → f′) are computed as follows:

$$ \sup (r) = \frac{{\left| {\left\{ {t\left| {f,f^{^{\prime}} \in t\& t \in T} \right.} \right\}} \right|}}{\left| T \right|} $$
(6)
$$ {\text{conf}}(r) = \frac{{\left| {\left\{ {t\left| {f,f^{^{\prime}} \in t\& t \in T} \right.} \right\}} \right|}}{{\left| {\left\{ {t\left| {f^{^{\prime}} \otimes t \ne \emptyset \& t \in T} \right.} \right\}} \right|}} $$
(7)

Rule mining depends on support and confidence, and some rules may obtain higher values in finer-grained clusters. Therefore, to mine more rules at finer granularity, this paper uses a top-down hierarchical clustering algorithm to generate clusters of different granularities. When mining rules in these clusters, child clusters can form new rules by inheriting the rules of their parent cluster; when consequents are duplicated, only one is kept.
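Eqs. (4) and (5) can be sketched as follows, with facts represented as (attribute, value) pairs so that the f ⊗ t ≠ ∅ condition becomes an attribute match; the transaction set is hypothetical.

```python
# Sketch of Eqs. (4)-(5): support counts transactions containing fact f;
# confidence divides by the transactions whose attribute matches f's
# attribute (the f ⊗ t ≠ ∅ condition). Facts are (attribute, value) pairs.

def support(f, transactions):
    """Eq. (4): fraction of transactions containing the fact f."""
    return sum(f in t for t in transactions) / len(transactions)

def confidence(f, transactions):
    """Eq. (5): among transactions sharing f's attribute, fraction with f."""
    attr = f[0]
    covered = [t for t in transactions if any(g[0] == attr for g in t)]
    return sum(f in t for t in covered) / len(covered)

T = [
    {("channel", "kidney"), ("flavor", "bitter")},
    {("channel", "kidney")},
    {("channel", "liver")},
    {("flavor", "sweet")},
]
f = ("channel", "kidney")
```

The second-order forms in Eqs. (6) and (7) follow the same pattern with the pair (f, f′) in the numerator.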

2.3.4 Dropout Clustering Loss Function

Because texts in the medical field have a uniquely cohesive character, this paper designs a clustering loss function based on adjacency-matrix Dropout to enhance the generalization ability of the model. Adjacency-matrix Dropout further divides the training set Train-Set into two subsets: the training base set Train-Set-Base and the training target set Train-Set-Target. The adjacency matrix A′ used by the GCN is built only from Train-Set-Base, while the positive samples used in training come only from Train-Set-Target. This scheme is inspired by dropout in the forward propagation of neural networks and is equivalent to randomly disconnecting the original adjacency matrix A with probability p; this study uses p = 0.5. During training, Train-Set-Base and Train-Set-Target are re-divided at random after each round, so that the learner adapts to networks with different missing connections. During testing, adjacency-matrix dropout is turned off and the complete adjacency matrix A, composed of the edges of the whole training set Train-Set, is used for prediction, with the weights of all edges scaled by a factor of 1 − p.
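The scheme above can be sketched as follows, under the assumption that edges are simple node pairs; the random base/target split and the 1 − p test-time scaling follow the description, while the edge data are hypothetical.

```python
# Sketch of adjacency-matrix dropout: each round, randomly split the
# training edges into a base set (builds the GCN adjacency matrix A')
# and a target set (positive samples). At test time the full edge set
# is used with weights scaled by 1 - p.

import random

def adjacency_dropout(train_edges, p=0.5, seed=0):
    """Split edges: each edge goes to target with probability p, else base."""
    rng = random.Random(seed)
    base, target = [], []
    for e in train_edges:
        (target if rng.random() < p else base).append(e)
    return base, target                # A' is built from base only

def full_adjacency(train_edges, p=0.5):
    """Test-time adjacency: all edges kept, weights scaled by 1 - p."""
    return {e: 1 - p for e in train_edges}

edges = [("s1", "v1"), ("s1", "v2"), ("s2", "v1"), ("s3", "v3")]
base, target = adjacency_dropout(edges)
```

Re-seeding (or omitting the seed) each round reproduces the paper's behavior of re-dividing the split after every epoch.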

3 Results

3.1 Data Set

In this study, the method was verified on three public data sets: CG, PC, and MLEE. The CG data set focuses on the extraction of cancer-related events, covering effects at the molecular, cell/tissue, and organ levels. The PC data set focuses on reaction events related to cellular molecular pathways. The MLEE data set includes biological events at multiple levels of organization, from the molecular level to the organ-system level. All three data sets provide entity tags, so the task can focus on event extraction. The experiments also involve medical electronic medical records containing entity pairs and many relations, with the content of the record page corresponding to each entity used as its semantic text resource. Each data set is divided into a training set and a test set at a ratio of 7:3.

This paper uses hierarchical clustering to mine multi-granularity rules, because the experiments show that with only one level of clustering there is always one cluster containing a large amount of relational text while other clusters contain very little. As the clustering hierarchy deepens, each cluster contains fewer and fewer sentences (transactions), weakening its expressive power, so the cluster depth should not be too large. In the experiments, the cluster depth is set to 3.

3.2 Performance Comparison of Each Method

We use recall, precision, and F1 score to measure performance. Table 1 compares the method in this paper with the existing best methods on the CG, PC, and MLEE data sets. The baseline RelAgent [29] is a rule-based biological event extraction system using linguistic analysis; NCBI [30] uses an approximate subgraph matching method; Zhou and Zhong [31] adopted a semi-supervised learning method with unlabeled prediction. Both TEES [32, 33] and EventMine [34] are SVM-based pipeline pattern recognition methods using rich, manually compiled feature sets; TEES CNN [35] is an upgraded version of TEES with a CNN added. Wang et al. [36] and Li et al. [37] both use convolutional network methods.

Table 1 Performance comparison of each method on CG, PC and MLEE tasks

As shown in Table 1, the method in this paper achieves the highest F1 scores on all three data sets: 59.11% on CG, 56.66% on PC, and 62.25% on MLEE, demonstrating both its effectiveness and its generalization performance. The improvement is most significant on the MLEE task and least obvious on the PC task. In all tasks, precision is significantly higher than recall, possibly because the relatively small training set cannot cover the large number of highly variable event patterns. The TCM-KR training error curve is shown in Fig. 3.

Fig. 3

TCM-KR training error curve (30 rounds)

4 Discussion

4.1 The L1 Association Prediction of Lumbar Intervertebral Disc Prolapse

Syndromes of lumbar intervertebral disc prolapse are classically divided into blood stasis syndrome, cold-dampness syndrome, excessive heat syndrome, and liver and kidney deficiency syndrome. However, this classification has not been widely adopted in clinics and a common consensus has not been reached in the field, so it is hard to normalize syndrome differentiation for lumbar intervertebral disc prolapse in clinical practice. Normally, clinicians take the data obtained by the four TCM diagnostic methods as a reference when diagnosing the syndrome. Given the great individual differences among patients, the practical classification of TCM syndromes is flexible and diversified, and the above four syndromes cannot simply be applied. As a result, many other syndromes are recorded in the Electronic Medical Record Database, including kidney deficiency syndrome, damp-heat syndrome, qi stagnation and blood stasis syndrome, wind-cold dampness stagnation, and obstruction of phlegm and blood stasis syndrome. Among these complex types, it is possible to find certain rules governing the high-frequency syndromes.

According to the statistics in Table 2, the frequencies of qi stagnation and blood stasis syndrome, deficiency of liver and kidney syndrome, cold and dampness syndrome, and deficiency of kidney syndrome in the electronic medical records were obviously higher than those of other syndromes. Each record carefully documents the relations between syndrome differentiation and the relevant viscera of the individual patient, so the weight of each viscus appearing in the records can reflect the relationship between syndromes and viscera. Take “qi stagnation and blood stasis syndrome” as an example. In TCM theory, the liver controls blood and qi, holds the pivot of qi movement, controls conveyance and dispersion, and tends to spread out freely and actively. With depression of liver qi, disordered qi movement leads to qi stagnation; since qi is the commander of blood and blood circulation depends on the promotion of qi, this finally causes blood stasis. The heart, lung, and spleen also have a certain connection with the syndrome, as these organs participate in the generation and activity of qi and blood. However, the liver is the viscus most affected in qi stagnation and blood stasis syndrome, as its weight is much higher than the others. After collecting and analyzing all the high-frequency syndromes, we found that the weights of the liver and kidney are obviously higher than those of other viscera and concluded that the liver and kidney are the most closely related organs; that is, dysfunction of the liver and kidney may play a role in the pathogenesis of lumbar intervertebral disc prolapse.

Table 2 Results of L1 prediction associated with lumbar intervertebral disc prolapse

4.2 The L2 Association Prediction of Lumbar Intervertebral Disc Prolapse

Physicians choose appropriate Chinese medicinals to formulate a prescription according to the patient data collected with the four diagnostic methods, but individual differences and each physician's medication principles result in flexible, unfixed medicinal combinations, even when two patients are diagnosed with the same syndrome. Based on the theory of viscera manifestation and channel-collateral, channel tropism indicates the affinity and therapeutic effect of a Chinese medicinal on the corresponding viscera and channels, which helps guide its clinical application. However, simple frequency counts of channel tropism in the electronic medical records may be inconsistent with the actual situation. For instance, pubescent angelica root (Angelicae pubescentis radix) enters the Kidney and Bladder Channels, and notopterygium rhizome and root (Notopterygii rhizoma et radix) enters the Bladder and Kidney Channels, with the same occurrence frequency; the earlier a channel appears in the order, the stronger the affinity and therapeutic effect of the medicinal. When we studied channel tropism with our research methods, however, we found that for pubescent angelica root the weight of the kidney channel is higher than that of the bladder channel, while for notopterygium rhizome and root the bladder channel outweighs the kidney channel. The statistics in Table 3 reflect the relationship between the medicinals used in the electronic medical records for treating lumbar intervertebral disc prolapse and their channel tropism. We conclude that the weights of the liver channel and kidney channel are much higher than those of other channels, suggesting that Chinese medicinals entering these two channels play a critical role in treating lumbar intervertebral disc prolapse.

Table 3 Results of L2 prediction associated with lumbar intervertebral disc prolapse

4.3 The L3 Association Prediction of Lumbar Intervertebral Disc Prolapse

To clarify the potential relationships among commonly used acupoints, affected viscera, and involved channels in treating lumbar intervertebral disc prolapse, our study focused on acupoints of the Twelve Regular Channels, which are directly related to the viscera, and excluded acupoints on the eight extra channels and extraordinary acupoints, which have only an indirect correlation. Most acupoints are located on a single channel, while some lie on several channels. For example, Fengshi (GB 31) is located only on the Gallbladder Channel; Feiyang (BL 58) belongs to the Bladder Channel but also connects to the Kidney Channel through the interior-exterior correspondence. Take Sanyinjiao (SP 6) as another example: located on the Spleen Channel, it is also an intersecting point of the Spleen, Liver, and Kidney Channels, so the Liver and Kidney account for part of its weight, though less than the Spleen Channel. Because analyzing only the channel on which an acupoint is located, while neglecting the weights of other viscera, would make the analysis deviate from clinical application, our research took all the channels pertaining to each applied acupoint into consideration, thereby analyzing the relationship between acupoints and their channels comprehensively.

The statistics in Table 4 show that the weight of the Gallbladder and Bladder Channels, on which the commonly used acupoints for this disease are located, is obviously higher than that of other channels, indicating that acupoints on these two channels are likely to be superior to others.

Table 4 Results of L3 prediction associated with lumbar intervertebral disc prolapse

5 Conclusion

In this paper, we analyzed and predicted the correlations of syndromes with viscera, medicinals with channel tropism, and acupoints with channel distribution, after extracting information from the literature and from unstructured text in electronic medical records, using a knowledge reasoning method based on unstructured-text-enhanced association rules. We found that dysfunction and disorder of the liver and kidney were the key pathogenesis for all syndromes of lumbar intervertebral disc prolapse. The Liver and Kidney Channels are crucial in the channel tropism of medicinals, and the Gallbladder and Bladder Channels are the ones on which acupoints are most commonly distributed. There is a connection among these four viscera and channels: the liver and the gallbladder form an exterior–interior relationship through the mutual connections of their meridians, as do the kidney and the bladder. This potential rule suggests a treatment principle focused on improving the dysfunction of the liver and kidney, with the Liver Channel, the Kidney Channel, and the channels sharing their interior–exterior relationships taking priority in selecting specific Chinese medicinals and acupoints. In the future, we will incorporate other data types in EHRs, adopt medical graphs as external knowledge, and explore effective ways to provide more interpretability in health event predictions.