¹ Department of Information Science and Technology
College of Emergency Preparedness, Homeland Security and Cybersecurity
University at Albany, SUNY, NY, USA
² Department of Psychiatry
Norton College of Medicine
Upstate Medical University SUNY, NY, USA

A New Perspective on ADHD Research: Knowledge Graph Construction with LLMs and Network Based Insights

Hakan T. Otal¹, Stephen V. Faraone², M. Abdullah Canbaz¹

Abstract

Attention-Deficit/Hyperactivity Disorder (ADHD) is a challenging disorder to study due to its complex symptomatology and diverse contributing factors. To explore how we can gain deeper insights into this topic, we performed a network analysis on a comprehensive knowledge graph (KG) of ADHD, constructed by integrating scientific literature and clinical data with the help of Large Language Models (LLMs). The analysis, including k-core techniques, identified critical nodes and relationships that are central to understanding the disorder. Building on these findings, we developed a context-aware chatbot using LLMs and Retrieval-Augmented Generation (RAG) to support accurate and informed interactions. Our knowledge graph not only advances the understanding of ADHD but also provides a powerful tool for research and clinical applications.

keywords:

ADHD, Knowledge Graph, Network Analysis, Large Language Models, Retrieval-Augmented Generation

1 Introduction

ADHD is a neurodevelopmental disorder characterized by persistent patterns of inattention, hyperactivity, and impulsivity that are disruptive and inappropriate for an individual’s developmental level. ADHD is one of the most common psychiatric disorders in children, with symptoms often continuing into adulthood [8]. The disorder has a complex etiology, influenced by a combination of genetic, neurobiological, and environmental factors. Despite extensive research, the precise mechanisms underlying ADHD remain poorly understood, partly due to its heterogeneous nature [8, 11].

The complexity of ADHD is reflected in its diverse symptomatology, comorbidities, and the wide range of cognitive, behavioral, and social outcomes observed in affected individuals. Traditional approaches in ADHD research have typically focused on identifying specific deficits or abnormalities within isolated domains, such as neuroimaging or genetic studies [16]. While these studies have provided valuable insights, they sometimes fail to capture the intricate interconnections between different biological, cognitive, and behavioral aspects of this medical condition.

In parallel, network analysis, rooted in graph theory and extensively utilized across disciplines such as sociology, biology, and computer science, offers a robust framework for examining complex systems. By representing these systems as networks composed of nodes (e.g., brain regions, genes, symptoms) and edges (relationships between them), network analysis facilitates the exploration of the intricate interactions and dependencies within the system as a whole. This approach is particularly advantageous for studying multifactorial conditions like ADHD, where the interplay between various biological, genetic, and environmental factors is crucial to understanding the disorder’s etiology and manifestation.

Recently, knowledge graphs (KGs) have emerged as a critical tool within the domain of network analysis, particularly with the rapid advancements in Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems. A knowledge graph is a structured representation of information that organizes data into triplets, each consisting of a head entity, a relationship, and a tail entity. These triplets form a graph structure in which entities are nodes and their interrelationships are edges, allowing complex, multi-hop interactions within a given domain to be represented [12, 13]. In the context of ADHD research, KGs provide a powerful means to systematically integrate and represent diverse sources of data, including genetic information, neuroimaging findings, and clinical symptoms, thereby offering a comprehensive view of the disorder.
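To make the triplet structure concrete, the following minimal sketch (assuming Python with the NetworkX library, which this work also uses later) stores a few hypothetical triplets as a directed multigraph and runs a simple two-hop query. The entities and relation names are illustrative only and are not taken from the ADHD knowledge graph.

```python
import networkx as nx

# Hypothetical (head, relation, tail) triplets for illustration only.
triplets = [
    ("ADHD", "comorbid_with", "Autism"),
    ("ADHD", "treated_with", "Stimulant medication"),
    ("Stimulant medication", "acts_on", "Dopamine transporter"),
]

# A MultiDiGraph keeps edge direction and allows several relations per node pair.
kg = nx.MultiDiGraph()
for head, relation, tail in triplets:
    kg.add_edge(head, tail, relation=relation)

# Multi-hop query: entities reachable from "ADHD" in at most two hops.
hops = nx.single_source_shortest_path_length(kg, "ADHD", cutoff=2)
reachable = sorted(node for node, dist in hops.items() if dist > 0)
print(reachable)  # ['Autism', 'Dopamine transporter', 'Stimulant medication']
```

The two-hop answer "Dopamine transporter" is only reachable by chaining two triplets, which is the kind of multi-hop interaction the graph form makes explicit.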

Moreover, KGs have recently become increasingly valuable in addressing one of the significant challenges associated with LLMs: hallucination. Hallucination refers to LLM outputs that are coherent but not grounded in factual data. KGs mitigate this issue by serving as a structured, evidence-based foundation for LLM outputs [10]. Aligning LLM-generated content with the factual data embedded in KGs can reduce the occurrence of hallucinations, thereby enhancing the reliability and validity of AI-generated insights. This is particularly crucial in the medical and psychological domains, where the accuracy of information can directly affect clinical decision-making and patient outcomes.

The primary objective of this paper is to address the need for a comprehensive and structured representation of ADHD knowledge by leveraging Large Language Models (LLMs) to construct a multimodal knowledge graph specifically focused on ADHD. This study seeks to integrate information from a variety of sources, including scientific literature, clinical data, and expert knowledge, to create a rich and interconnected representation of ADHD. By systematically organizing and analyzing this knowledge using network-based approaches, the paper aims to achieve several key goals:

• Build a Ground-Truth Evidence Base: Collect and validate expert-curated data to establish a reliable foundation for the ADHD knowledge graph. This evidence base will serve as a robust resource for downstream analysis and support further research and clinical applications.
• Support Decision-Making and Clinical Practice: Demonstrate how the ADHD knowledge graph can serve as the evidence base of a Retrieval-Augmented Generation (RAG) system for an ADHD-Expert LLM. This application aims to create a ‘proof of concept’ expert system showing how curated evidence could educate users and enhance decision-making and patient care by providing patients, clinicians, and researchers with accurate, well-structured, and comprehensive information about ADHD.

2 Methodology

2.1 Data Collection and Preprocessing

This paper’s foundation rests substantially on the work of Stephen V. Faraone, Ph.D., a Distinguished Professor at SUNY Upstate Medical University and a preeminent figure in ADHD research. Dr. Faraone’s extensive body of work, encompassing psychiatric genetics, childhood mental disorders, and psychopharmacology, has been instrumental in shaping our current understanding of ADHD. His recent application of advanced machine learning techniques to these domains has further expanded the frontiers of ADHD research [6, 7].

Dr. Faraone spearheaded the identification of the first genes associated with ADHD, significantly influenced the diagnostic criteria for adult ADHD, and established the ADHD Molecular Genetics Network in the 1990s [9]. His research output, comprising over 1,000 publications with 195,382 citations, underscores the impact and relevance of his work [7, 17]. The scientific papers, resources, and empirical data on ADHD that he provided were crucial in establishing the evidence base and ground-truth information for our study.

To create a preliminary proof-of-concept model, Dr. Faraone provided several review articles and other files that he had curated for this project. To process these resources, we used the LangChain library in Python to load the directory of documents. The documents were then split into smaller text chunks of 1,500 characters each, with an overlap of 150 characters between consecutive chunks. This preprocessing step kept the text segments manageable while preserving context across chunk boundaries, facilitating more accurate downstream analysis.
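The chunking step can be sketched in plain Python as a sliding character window. This is a simplified stand-in, not the authors' code: LangChain's text splitters (for example `RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=150)`) additionally try to break at separators such as paragraph boundaries, so their chunks can be shorter than 1,500 characters. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 1500, chunk_overlap: int = 150) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


doc = "x" * 4000
chunks = sliding_chunks(doc)
print(len(chunks), [len(c) for c in chunks])  # 3 [1500, 1500, 1300]
```

Each chunk repeats the last 150 characters of the previous one, so a sentence cut at a boundary still appears intact in at least one chunk when it is shorter than the overlap.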

2.2 Semantic Concept Extraction with LLM

To extract relevant concepts from the text chunks, we employed the Llama3.1-8B [3] model, a state-of-the-art Large Language Model (LLM). The model’s primary function was to identify and categorize key concepts within the text, subsequently organizing these concepts into a structured graph where nodes represent the concepts and edges denote the relationships between them.

The extracted concepts were categorized into predefined types; concepts that did not match any of these types were labeled ‘other’. The prompt given to the LLM is shown in Listing 1.
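One way to enforce the predefined categories is to normalize whatever type label the LLM returns. The sketch below assumes the model replies with a JSON list of edges using hypothetical keys (`node_1`, `node_1_type`, `node_2`, `node_2_type`, `edge`), because Listing 1 is truncated here and does not show the output format. It also assumes the predefined categories are the non-‘other’ node types reported in Table 2.

```python
import json

# Assumption: the predefined categories are the node types reported in Table 2.
ALLOWED_TYPES = {
    "concept", "condition", "documents", "service", "entity",
    "person", "organization", "location", "date",
}


def normalize_type(raw: str) -> str:
    """Map an LLM-proposed category onto the predefined set, defaulting to 'other'."""
    label = raw.strip().lower()
    return label if label in ALLOWED_TYPES else "other"


def parse_llm_output(raw_json: str) -> list[dict]:
    """Parse a (hypothetical) JSON reply and normalize both endpoint types."""
    records = json.loads(raw_json)
    for rec in records:
        rec["node_1_type"] = normalize_type(rec.get("node_1_type", ""))
        rec["node_2_type"] = normalize_type(rec.get("node_2_type", ""))
    return records


reply = ('[{"node_1": "ADHD", "node_1_type": "Condition", '
         '"node_2": "Methylphenidate", "node_2_type": "Medication", '
         '"edge": "treated with"}]')
records = parse_llm_output(reply)
print(records[0]["node_1_type"], records[0]["node_2_type"])  # condition other
```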

2.3 Graph Construction and Refinement

The graph construction process began with the creation of an initial graph using the NetworkX library. In this graph, nodes represented the extracted concepts, while edges denoted the relationships between these concepts. Each edge was enriched with attributes such as edge_type, edge_details, weight, and a reference to the originating text chunk, ensuring that relationships were contextually grounded and traceable back to their source.
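A minimal sketch of this construction step, assuming a simple undirected `networkx.Graph` in which repeated relations between the same pair of concepts are merged into one edge whose `weight` counts the repetitions. The input records and the `chunk_ids` attribute name are hypothetical; the paper only states that each edge references its originating chunk.

```python
import networkx as nx

# Hypothetical extraction records: (concept_a, concept_b, relation text, source chunk id).
extracted = [
    ("ADHD", "executive function", "is associated with deficits in", "chunk_07"),
    ("ADHD", "methylphenidate", "is commonly treated with", "chunk_12"),
    ("ADHD", "executive function", "impairs", "chunk_31"),
]

G = nx.Graph()
for a, b, details, chunk_id in extracted:
    if G.has_edge(a, b):
        # Repeated relation: strengthen the edge and keep every source chunk for traceability.
        G[a][b]["weight"] += 1
        G[a][b]["edge_details"].append(details)
        G[a][b]["chunk_ids"].append(chunk_id)
    else:
        G.add_edge(a, b, edge_type="relation", edge_details=[details],
                   weight=1, chunk_ids=[chunk_id])

print(G["ADHD"]["executive function"]["weight"])  # 2
```

Keeping the chunk identifiers on each edge is what allows a statement in the graph to be traced back to the passage it came from.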

2.3.1 Contextual Proximity Edges:

To further enrich the graph, we calculated contextual proximity between concepts. Concepts that appeared within the same text chunk were connected by edges, with edge weights reflecting the frequency of co-occurrence. This approach allowed us to capture the strength of associations between concepts, highlighting their contextual relevance within the broader network.
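Co-occurrence weighting can be sketched as follows, assuming each chunk has already been reduced to a list of extracted concepts. The contextual-proximity edges are built in a separate graph here for clarity; how the original pipeline combined them with the LLM-extracted relation edges is not specified (a `networkx.MultiGraph` would be one way to keep both edge types between the same pair). Chunk IDs and concepts are hypothetical.

```python
from collections import Counter
from itertools import combinations

import networkx as nx

# Hypothetical concepts extracted from each chunk.
chunk_concepts = {
    "chunk_01": ["ADHD", "autism", "genetics"],
    "chunk_02": ["ADHD", "autism"],
    "chunk_03": ["ADHD", "sleep problems"],
}

# Count how many chunks each unordered concept pair shares.
pair_counts = Counter()
for concepts in chunk_concepts.values():
    for a, b in combinations(sorted(set(concepts)), 2):
        pair_counts[(a, b)] += 1

G = nx.Graph()
for (a, b), count in pair_counts.items():
    G.add_edge(a, b, edge_type="contextual proximity", weight=count)

print(G["ADHD"]["autism"]["weight"])  # 2 (shared by chunk_01 and chunk_02)
```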

2.3.2 Eliminating Redundant Nodes:

In the refinement stage, we addressed redundancy within the graph. Redundancy arises when concepts with the same or very similar meanings are represented as separate nodes. For example, ‘ADHD’ and ‘Attention-deficit/hyperactivity disorder’ may appear as two different nodes even though they refer to the same thing.

We represented each node as a vector by generating an embedding with the “gte-base-en-v1.5” transformer model [19]. This model was selected from the MTEB embedding leaderboard [14] for its strong performance at a moderate size (at most 1B parameters), providing a robust semantic representation of each concept. Using the DBSCAN clustering algorithm [5] with a very high similarity threshold (0.9), we identified clusters of semantically near-identical nodes. The threshold was set high because only nodes with essentially the same meaning, such as synonymous terms and abbreviations, should be merged. In the final graph, each cluster was represented by its highest-degree node, with edges updated accordingly to reflect this consolidation, thereby reducing redundancy and enhancing the clarity and utility of the graph.
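The merge step can be sketched with scikit-learn's `DBSCAN` and NetworkX. Two assumptions are made explicit here: the 0.9 threshold is interpreted as cosine similarity and therefore passed to DBSCAN as a cosine distance `eps` of 0.1, and `min_samples=1` is used so that every node belongs to some cluster. Toy 3-dimensional vectors stand in for the gte-base-en-v1.5 embeddings, and the function name `merge_similar_nodes` is hypothetical.

```python
import networkx as nx
import numpy as np
from sklearn.cluster import DBSCAN


def merge_similar_nodes(G: nx.Graph, names: list[str], embeddings: np.ndarray,
                        min_similarity: float = 0.9) -> nx.Graph:
    """Merge near-duplicate nodes, keeping each cluster's highest-degree node.

    Assumes `names[i]` is a node of G and `embeddings[i]` is its vector.
    """
    # DBSCAN works with distances: cosine similarity >= 0.9 means cosine distance <= 0.1.
    labels = DBSCAN(eps=1.0 - min_similarity, min_samples=1,
                    metric="cosine").fit_predict(embeddings)

    # Map every member of a cluster to the member with the highest degree.
    rep = {}
    for label in set(labels):
        members = [n for n, lab in zip(names, labels) if lab == label]
        best = max(members, key=G.degree)
        rep.update({m: best for m in members})

    merged = nx.Graph()
    merged.add_nodes_from(set(rep.values()))
    for u, v, data in G.edges(data=True):
        a, b = rep[u], rep[v]
        if a == b:
            continue  # an edge between two synonyms would become a self-loop; drop it
        w = data.get("weight", 1)
        if merged.has_edge(a, b):
            merged[a][b]["weight"] += w  # parallel edges collapse into one heavier edge
        else:
            merged.add_edge(a, b, weight=w)
    return merged


names = ["ADHD", "attention-deficit/hyperactivity disorder", "autism", "sleep"]
vectors = np.array([[1.0, 0.0, 0.0], [0.99, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
G = nx.Graph([("ADHD", "autism"), ("ADHD", "sleep"),
              ("attention-deficit/hyperactivity disorder", "autism"),
              ("ADHD", "attention-deficit/hyperactivity disorder")])
merged = merge_similar_nodes(G, names, vectors)
print(sorted(merged.nodes()), merged["ADHD"]["autism"]["weight"])  # ['ADHD', 'autism', 'sleep'] 2
```

With `min_samples=1`, DBSCAN groups any nodes linked by a chain of near-duplicate pairs, so in principle a long chain of small steps could merge concepts that are not direct synonyms; a very high threshold keeps that risk low.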

3 Experimental Results

In this section, we present the results of our network analysis on the constructed ADHD knowledge graph. Unlike typical network graphs, this knowledge graph is characterized by links that carry semantic meanings, representing relationships between different concepts. As such, the network science metrics reported here must be understood in the context of a graph where edges do not merely signify connections but also encode specific, meaningful relationships.

3.1 ADHD Knowledge as Multi-Layer Network

Figure 1: ADHD Knowledge Graph Visualization

Figure 1 is a visualization of a knowledge graph centered around the concept of ADHD, with nodes representing various entities and concepts and edges illustrating the relationships between these nodes. The node labeled “ADHD” is prominently positioned at the top of the graph, serving as a central hub. This placement indicates that ADHD is the primary subject being explored, with numerous connections radiating out to other entities across different categories. The dense web of relationships around ADHD suggests a complex and multifaceted nature, with significant connections to a wide array of entities.

Colors are assigned to nodes based on their types, such as “Person,” “Organization,” “Concept,” “Documents,” and so forth. This color-coding helps to visually distinguish between different types of entities within the knowledge graph, making it easier to identify patterns and relationships within and across categories. The diverse coloring highlights how various aspects of ADHD are interconnected, providing a clearer understanding of how different entities relate to the central concept.

Edges between nodes are also color-coded based on their types. Red edges represent “relation” edges, indicating direct relationships between entities, such as a service provided by an organization or a document discussing a particular concept related to ADHD. In contrast, gray edges represent “contextual proximity” edges, suggesting that the entities are connected within a certain context, though not necessarily through a direct or explicit relationship. This might indicate shared attributes, common references, or similar thematic content, providing a richer context around the more explicit connections.

The categorization of nodes into types such as “Document,” “Service,” and “Condition” shows the broad scope of ADHD’s influence. For instance, the connections between ADHD and “Condition” nodes likely represent medical or psychological conditions associated with ADHD, while “Service” nodes might represent healthcare or educational services designed for individuals with ADHD. Similarly, “Document” nodes likely refer to research papers, guidelines, or policy documents that discuss ADHD or its related aspects, creating a knowledge base that informs the other categories.

This visualization was created with Gephi [1], a popular open-source tool for network analysis and visualization. Gephi allows nodes to be colored by type and edges by relationship type, as seen in this graph, which makes the complex relationships within the dataset easier to interpret. The layered structure of the graph, with “ADHD” at the top and various categories spreading out below, suggests a comprehensive approach to understanding how ADHD interacts with different aspects of society, healthcare, and research.
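If the graph is built in NetworkX, one common route into Gephi is the GEXF file format, which preserves node and edge attributes. This sketch assumes the node type is stored in a hypothetical `node_type` attribute and the edge type in `edge_type`; list-valued attributes (such as lists of source chunks) may need to be joined into strings before export.

```python
import os
import tempfile

import networkx as nx

G = nx.Graph()
G.add_node("ADHD", node_type="condition")
G.add_node("Methylphenidate", node_type="concept")
G.add_edge("ADHD", "Methylphenidate", edge_type="relation", weight=1.0)

# GEXF keeps the attributes, which Gephi can then use for partition-based coloring
# (node_type for nodes, edge_type for edges).
path = os.path.join(tempfile.gettempdir(), "adhd_kg.gexf")
nx.write_gexf(G, path)
```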

The less dense connections in categories like “Organization” and “Location” could suggest that while these entities are related to ADHD, they might play more specialized or niche roles compared to other categories. This visualization offers valuable insights into the broad impacts of ADHD and could be instrumental in identifying key areas of focus or gaps in research and understanding. For instance, if certain categories or nodes have fewer connections, it could indicate areas that need further exploration or development. The graph can serve as a powerful tool for researchers, clinicians, or policymakers to better understand the complex network of entities related to ADHD and to develop targeted interventions or policies accordingly.

3.2 Network Metrics

Table 1: Network Metrics and Community Detection Results

Metric	Value
Nodes	2347
Edges	8655
Density	0.0031
Avg. degree	6.289
Triadic closure	0.0505
Avg. clustering coefficient	0.8349
Avg. node similarity	0.0696
Eigenvector centrality	0.0461
Assortativity coefficient	-0.1242
Leiden modularity	0.6195
Leiden communities	31
Louvain modularity	0.6157
Louvain communities	26
Girvan-Newman modularity	0.5876
Girvan-Newman communities	74

Table 2: Node and Edge Type Distributions

Node type	Count
concept	904
condition	579
documents	434
service	268
other	221
entity	207
person	115
organization	105
location	61
date	20

Edge type	Count
contextual proximity	15289
direct relationship	3640

Table 1 summarizes the key network metrics and community detection results obtained from the analysis. The knowledge graph comprises 2,347 nodes and 8,655 edges, reflecting the rich and varied relationships within the ADHD domain. With a network density of 0.0031, the graph is sparse: only about 0.3% of all possible node pairs are connected, so relationships are formed selectively rather than between most pairs of concepts.
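As a sanity check, the reported density can be reproduced from the node and edge counts, assuming the standard definition for an undirected simple graph (the one NetworkX's `nx.density` uses for undirected graphs):

```latex
\text{density} = \frac{2\,|E|}{|V|\,(|V|-1)}
              = \frac{2 \times 8655}{2347 \times 2346}
              = \frac{17310}{5506062} \approx 0.0031
```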

The average degree of 6.289 suggests that each concept, on average, is connected to about six other concepts. This level of connectivity highlights the nuanced interplay among entities, conditions, and other elements within the ADHD knowledge graph. The contrast between the low triadic closure (0.0505) and the high average clustering coefficient (0.8349) is informative: most nodes sit in tightly knit local groups whose members are also connected to one another, while high-degree hubs link many nodes that are not connected to each other, which pulls the global value down. This is consistent with the idea that certain ADHD concepts are closely related or frequently co-occur in the literature.
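The gap between the two clustering measures can be reproduced on a toy graph, assuming “triadic closure” here denotes global transitivity (the fraction of connected triples that are closed). In a hub-and-triangles (“windmill”) graph every non-hub node has a local clustering coefficient of 1, but the hub’s many unconnected neighbor pairs dominate the global count. The helper name `windmill` is hypothetical.

```python
import networkx as nx


def windmill(num_triangles: int) -> nx.Graph:
    """Hub node 0 joined to num_triangles triangles that share only the hub."""
    G = nx.Graph()
    for i in range(num_triangles):
        a, b = 2 * i + 1, 2 * i + 2
        G.add_edges_from([(0, a), (0, b), (a, b)])
    return G


G = windmill(10)
print(round(nx.transitivity(G), 4))        # 0.1429: global, 3 * triangles / connected triples
print(round(nx.average_clustering(G), 4))  # 0.9549: mean of the per-node local coefficients
```

Here the hub has 20 neighbors but only 10 of their 190 pairs are linked, so its local coefficient is 1/19 while every other node scores 1; averaging per node hides the hub, whereas transitivity weights it by its many open triples.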

The relatively low average node similarity (0.0696) and eigenvector centrality (0.0461) reflect the diversity of concepts in the knowledge graph, where not all nodes are uniformly central or similar across the network. The assortativity coefficient of -0.1242 suggests a slight disassortative mixing pattern, where nodes of differing degrees tend to connect, potentially indicating that more specific or rare concepts may be linked to more general or well-established ones.
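Degree assortativity is the Pearson correlation between the degrees at the two ends of each edge, ranging from -1 to 1. The toy graphs below, which are unrelated to the ADHD data, show the two extremes; the reported -0.1242 lies much closer to neutral mixing (0) than to either.

```python
import networkx as nx

# Disassortative extreme: every edge joins the degree-5 hub to a degree-1 leaf.
star = nx.star_graph(5)
# Assortative extreme: every edge joins two nodes of equal degree (a triangle and a 5-clique).
cliques = nx.disjoint_union(nx.complete_graph(3), nx.complete_graph(5))

r_star = nx.degree_assortativity_coefficient(star)
r_cliques = nx.degree_assortativity_coefficient(cliques)
print(round(r_star, 3), round(r_cliques, 3))  # -1.0 1.0
```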

3.3 Community Detection

To further analyze the structure of the ADHD knowledge graph, we applied several widely used community detection algorithms: the Leiden algorithm [18], the Louvain algorithm [2], and the Girvan-Newman algorithm [15]. Community detection is essential in identifying groups of closely connected nodes, which can reveal the underlying structure of the graph and help in understanding how related concepts cluster together within the knowledge domain.

The Leiden algorithm [18] optimizes the modularity of the graph partition by iteratively moving nodes between communities, refining the resulting partition, and aggregating communities into a coarser network; its refinement step guarantees that the detected communities are internally well connected. In our analysis, the Leiden algorithm yielded a modularity score of 0.6195 and identified 31 communities, indicating a well-defined community structure within the graph.

The Louvain algorithm [2], on which Leiden builds, also optimizes modularity through local node moves and aggregation but lacks the refinement step. It produced a modularity score of 0.6157 and identified 26 communities; the similarity to the Leiden results suggests that the identified community structure is robust.

On the other hand, the Girvan-Newman algorithm takes a divisive approach by iteratively removing edges with high betweenness centrality, which are likely to connect different communities. By removing these edges, the algorithm gradually splits the graph into smaller communities. This method identified 74 communities with a slightly lower modularity score of 0.5876, reflecting a more granular partition of the graph.
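The following sketch shows how Louvain and Girvan-Newman partitions and their modularity scores can be computed with NetworkX, using Zachary's karate club network as a stand-in for the ADHD graph. Leiden is not part of core NetworkX; separate packages such as `leidenalg` provide it. The random seed and the choice of keeping the Girvan-Newman level with the highest modularity are assumptions, since the paper does not state which level of the hierarchy was reported.

```python
import networkx as nx
from networkx.algorithms import community as nx_comm

# Stand-in graph: Zachary's karate club, with edge weights dropped for simplicity.
G = nx.Graph(list(nx.karate_club_graph().edges()))

# Louvain: greedy modularity optimization via local node moves and graph aggregation.
louvain = nx_comm.louvain_communities(G, seed=42)
q_louvain = nx_comm.modularity(G, louvain)

# Girvan-Newman: repeatedly remove the highest-betweenness edge; each yielded split has
# one more community. Keep the split with the highest modularity.
best_q, best_split = float("-inf"), None
for split in nx_comm.girvan_newman(G):
    q = nx_comm.modularity(G, split)
    if q > best_q:
        best_q, best_split = q, split

print("Louvain:", len(louvain), "communities, Q =", round(q_louvain, 3))
print("Girvan-Newman:", len(best_split), "communities, Q =", round(best_q, 3))
```

Girvan-Newman recomputes edge betweenness after every removal, which is why it scales far worse than Louvain or Leiden on a graph of several thousand nodes.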

The application of these different algorithms aims to uncover the underlying community structure of the ADHD knowledge graph. The identified communities can provide insights into the grouping of related concepts and help identify key themes or subdomains within the ADHD knowledge landscape. Moreover, this community information is valuable for various Graph-RAG (Retrieval Augmented Generation) applications. For instance, when a user queries the system with a specific question related to ADHD, the model can leverage community information to efficiently retrieve the most relevant subgraphs or concepts [4]. This targeted retrieval can enhance the accuracy and context-specificity of the generated responses.
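A minimal sketch of community-based retrieval, assuming a partition such as those reported in Table 1 is already available. This is a simplified keyword-matching variant for illustration only; the Graph RAG approach of [4] instead summarizes communities with an LLM. The function name, the substring matching, and the toy graph are hypothetical.

```python
import networkx as nx


def retrieve_community_context(G, communities, query):
    """Return the subgraph spanned by every community containing a concept named in the query.

    Concept matching is naive case-insensitive substring search (an assumption for illustration).
    """
    q = query.lower()
    matched = {n for n in G if str(n).lower() in q}
    selected = set()
    for community in communities:
        if community & matched:
            selected |= community
    return G.subgraph(selected).copy()


# Hypothetical toy graph and partition (e.g., as produced by Leiden or Louvain).
G = nx.Graph([("ADHD", "autism"), ("ADHD", "genetics"), ("autism", "genetics"),
              ("sleep", "melatonin"), ("genetics", "sleep")])
communities = [{"ADHD", "autism", "genetics"}, {"sleep", "melatonin"}]

sub = retrieve_community_context(G, communities, "How is ADHD related to autism?")
print(sorted(sub.nodes()))  # ['ADHD', 'autism', 'genetics']
```

Retrieving whole communities rather than only the matched nodes pulls in closely related context (here "genetics") that the question did not name explicitly.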

3.4 Node and Edge Type Distributions

Table 2 provides a detailed breakdown of the types of nodes and edges present within the knowledge graph. The node distribution reflects the diverse range of concepts captured, with ‘concept’ nodes being the most prevalent (904 instances), followed by ‘condition’ nodes (579 instances) and ‘documents’ nodes (434 instances). This variety indicates the graph’s comprehensive nature, encompassing various aspects of ADHD research, from clinical conditions to associated documents and services.

The edge type distribution is particularly telling of the knowledge graph’s distinct nature. ‘Contextual proximity’ edges are the most common, with 15,289 instances (about 81% of the typed edges), reflecting how often concepts co-occur within the same text chunks. Most of the graph’s connectivity therefore comes from contextual rather than explicit relationships. The remaining 3,640 ‘direct relationship’ edges denote explicit connections between concepts as extracted by the LLM, capturing the semantically specific relationships present in the graph.

The distribution of node and edge types emphasizes the unique structure of the ADHD knowledge graph, in which every link is typed as either a contextual or a direct relationship, providing a rich resource for further analysis and understanding of ADHD’s complex web of related concepts.

3.5 K-Core Analysis

The k-core analysis of the ADHD knowledge graph, constructed from the evidence base provided by Dr. Faraone, offers several significant insights into the structure and focus areas of the graph. A k-core is the maximal subgraph in which every node has at least k neighbors, so the nodes that persist in the maximum k-core (k = 16) are the most interconnected and central concepts, underscoring their importance in the study of ADHD.
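NetworkX exposes this computation directly. The sketch below, on a hypothetical toy graph whose edges are not taken from the ADHD knowledge graph, computes every node’s core number and extracts the maximum k-core.

```python
import networkx as nx

# Toy graph: a 5-node clique (each member has 4 neighbors inside it) plus a short tail.
G = nx.complete_graph(["ADHD", "autism", "depression", "obesity", "schizophrenia"])
G.add_edges_from([("ADHD", "sleep"), ("sleep", "melatonin")])

core = nx.core_number(G)          # largest k for which each node survives in the k-core
k_max = max(core.values())
max_core = nx.k_core(G, k=k_max)  # repeatedly strips nodes with fewer than k_max neighbors
print(k_max, sorted(max_core))    # 4 ['ADHD', 'autism', 'depression', 'obesity', 'schizophrenia']
```

The tail nodes are peeled away first because each has fewer than two neighbors once its outer neighbor is removed, which is why membership in a high k-core reflects dense mutual connectivity rather than degree alone.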

Type	Nodes
Concept	Years of schooling, Cigarettes smoked/day, Age at menopause, Subjective well-being, Genetic correlation, Intelligence
Condition	Lung cancer, Autism, Cannabis use disorder, Schizophrenia, Obesity, ADHD, Coronary artery disease, Type 2 diabetes
Documents	Number of children, FMRI studies, College completion, Body mass index, Age of first birth, Smoking initiation
Entity	Major depression

One key insight is the centrality of health conditions such as “ADHD,” “Schizophrenia,” “Autism,” “Obesity,” and “Type 2 diabetes.” Their inclusion in the maximum k-core highlights their prominent role within the research domain. These conditions are densely interconnected, suggesting that they may be important for understanding ADHD as comorbidities, risk factors, or consequences. The presence of “Major depression” and “Coronary artery disease” further emphasizes the complex and multifaceted nature of ADHD, indicating its intersection with both mental and physical health domains.

Another significant finding is the importance of cognitive and behavioral concepts, such as “Intelligence,” “Years of schooling,” and “Subjective well-being,” which are also part of the highest k-core. This underscores their significance in ADHD research, indicating that these factors are not peripheral but central to understanding and managing ADHD. The persistence of these nodes in the maximum k-core suggests that ADHD is intricately linked with broader cognitive and educational outcomes, which are crucial for both academic research and practical interventions.

Additionally, the research focus on measurement and documentation is evident, with nodes related to empirical studies and outcomes, such as “FMRI studies of cognitive control,” “Body mass index,” and “Smoking initiation,” being present in the maximum k-core. This emphasizes the importance of empirical data and research documentation in ADHD studies. The inclusion of these nodes suggests that the methodologies and data sources used in ADHD research are fundamental to the domain, providing the empirical foundation upon which other concepts and conditions are built.

Finally, the thematic grouping and network robustness observed in the k-core analysis highlight the solid structure of the ADHD knowledge graph. The clustering of related concepts, such as health conditions and cognitive factors, within the maximum k-core indicates that the graph is well structured around these central themes, which are likely to be resilient to changes or perturbations. The high value of k (16) reflects the strength of the interconnections among these core nodes, suggesting that the central themes of the graph are well supported by the evidence base drawn from Dr. Faraone’s research.

Overall, the k-core analysis of the ADHD knowledge graph reveals the most central and interconnected concepts within the domain, emphasizing the importance of specific health conditions, cognitive factors, and research methodologies. The high k-core value highlights the robustness of the graph’s structure, indicating that these core concepts are deeply embedded in the research landscape of ADHD. This analysis not only enhances our understanding of the knowledge graph but also provides valuable guidance for future research and interventions in the field of ADHD.

3.6 RAG-Based Chatbot Demo

After converting our resources into a knowledge graph, we developed a RAG-based LLM chatbot that grounds its answers in the ADHD knowledge graph. Dr. Faraone tested the chatbot for the accuracy of its contextual responses and identified cases where the system hallucinates or provides incomplete information; these issues will be addressed in subsequent versions of the knowledge graph, as described below.
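The retrieval and prompting logic of the chatbot is not detailed here. Purely as an illustration, one simple way to ground an LLM in the graph is to collect the relation edges around concepts mentioned in a question and place them in the prompt. The function names, the substring matching, and the prompt wording below are hypothetical, and the call to the LLM itself is omitted.

```python
import networkx as nx


def build_context(G, query, max_facts=20):
    """Turn 'relation' edges touching concepts named in the query into plain-text facts."""
    q = query.lower()
    seeds = [n for n in G if str(n).lower() in q]
    facts, seen = [], set()
    for n in seeds:
        for nbr, data in G[n].items():
            key = frozenset((n, nbr))  # avoid listing an edge twice when both ends match
            if data.get("edge_type") != "relation" or key in seen:
                continue
            seen.add(key)
            facts.append(f"{n} -- {data.get('edge_details', 'related to')} -- {nbr}")
    return "\n".join(facts[:max_facts])


def build_prompt(G, query):
    """Wrap the retrieved facts in an instruction that asks the LLM to stay grounded."""
    return ("Answer using only the facts below. If they are insufficient, say so.\n\n"
            f"Facts:\n{build_context(G, query)}\n\nQuestion: {query}")


G = nx.Graph()
G.add_edge("ADHD", "methylphenidate", edge_type="relation",
           edge_details="is commonly treated with")
G.add_edge("ADHD", "autism", edge_type="contextual proximity", weight=3)

prompt = build_prompt(G, "What is ADHD commonly treated with?")
print(prompt)
```

Passing only explicit `relation` edges keeps the context compact; contextual-proximity edges could be added as weaker evidence at the cost of a longer prompt.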

4 Discussion

The knowledge graph shows that, even with the relatively few documents provided for this preliminary work, the graph is large, with 2,347 nodes and 8,655 edges. It will likely grow much larger when built from a more comprehensive set of documents. One way to reduce the complexity of the graph would be to eliminate the clusters corresponding to persons and locations. These occur because some of the source documents include the names of authors, their institutions, and geographical locations. Excluding this information would not reduce the accuracy of the graph but would reduce computational complexity. Likewise, dates appear as a cluster because article citations include dates, which are also not relevant to the goals of this project. These considerations also apply to the community detection analyses, which show many nodes corresponding to dates, persons, and locations. Deleting these nodes would also substantially reduce the number of edges.
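The pruning proposed here is a short filter if each node stores its type. Assuming a hypothetical `node_type` attribute holding the labels from Table 2, the sketch drops person, location, and date nodes together with their incident edges; by the Table 2 counts, this would remove 196 of the 2,347 nodes (115 person, 61 location, 20 date).

```python
import networkx as nx

# Node types suggested for removal (the "node_type" attribute name is hypothetical).
EXCLUDED_TYPES = {"person", "location", "date"}


def drop_node_types(G, excluded=EXCLUDED_TYPES):
    """Return a copy of G without nodes whose type is in `excluded`; their edges go too."""
    keep = [n for n, t in G.nodes(data="node_type") if t not in excluded]
    return G.subgraph(keep).copy()


G = nx.Graph()
G.add_nodes_from([("ADHD", {"node_type": "condition"}),
                  ("stimulants", {"node_type": "concept"}),
                  ("J. Doe", {"node_type": "person"}),
                  ("Syracuse", {"node_type": "location"}),
                  ("2019", {"node_type": "date"})])
G.add_edges_from([("ADHD", "stimulants"), ("ADHD", "J. Doe"),
                  ("ADHD", "2019"), ("J. Doe", "Syracuse")])

pruned = drop_node_types(G)
print(pruned.number_of_nodes(), pruned.number_of_edges())  # 2 1
```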

The k-core analysis is instructive. It avoids the problem of dates, persons, and locations and highlights concepts that are mostly relevant to ADHD. Yet some of the nodes in the maximum k-core are either not relevant or only peripherally related to ADHD. “Number of children,” for example, emerges in the k-core but is not useful for understanding ADHD or creating a chatbot. The k-core analysis also does not distinguish concepts that have a strong connection to ADHD in the research literature (e.g., depression) from those with a weak (schizophrenia) or questionable (lung cancer) relationship. It is also odd that two of the k-core nodes refer to methods (fMRI studies, genetic correlation). Further work is needed to understand why no treatment approaches appear in the k-core.

5 Conclusion

In this study, we applied network analysis to Attention-Deficit/Hyperactivity Disorder (ADHD) by constructing a multimodal knowledge graph that integrates data from diverse sources and expert knowledge. This structured representation maps intricate relationships between various factors associated with ADHD and offers deeper insights into its etiology and manifestation. Dr. Faraone’s extensive research was instrumental in informing our approach, providing validated measures that enhanced the relevance and robustness of our study. By building upon his contributions, we grounded our analysis in the most authoritative understanding of ADHD, further strengthening the credibility of our findings.

Our methodology combines LLM-based concept extraction, contextual proximity calculations, and transformer-based node embeddings to extract and structure ADHD-related knowledge. This approach captured subtle semantic connections and reduced redundancy by clustering and merging semantically similar concepts, resulting in a streamlined and nuanced knowledge graph. Despite using a relatively small dataset, the graph uncovered an extensive network of ADHD-related concepts, revealing its broad connections across various topics and disciplines. The graph’s potential for expansion further underscores its value as a research tool.

We also identified areas for improvement, such as eliminating irrelevant nodes to enhance interpretability and reduce computational complexity. This refinement process highlighted challenges in data extraction and preprocessing, suggesting methodological improvements to optimize the graph further. The knowledge graph maps ADHD’s interdisciplinary connections, revealing intersections with conditions like depression and schizophrenia and with unexpected areas like lung cancer. Additionally, the absence of treatment-related nodes in the maximum k-core points to gaps in the dataset, suggesting that therapeutic strategies may be underrepresented and that more refined techniques are needed to distinguish core ADHD concepts.

Once refined, the graph can serve as a foundation for tools such as targeted chatbots that provide valuable information to patients, clinicians, and educators. This comprehensive resource also facilitates interdisciplinary collaboration, supporting both future research and practical applications, such as enhancing clinical decision-making and powering Retrieval-Augmented Generation (RAG) systems for ADHD-Expert LLMs.

This preliminary work uncovers the richness and complexity of ADHD knowledge and lays the groundwork for further research. Addressing current limitations and refining the graph will enhance its accuracy and usefulness, contributing to improved outcomes in the diagnosis, treatment, and management of ADHD.

References

[1] Bastian, M., Heymann, S., Jacomy, M.: Gephi: An open source software for exploring and manipulating networks (2009)
[2] Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment (2008). 10.1088/1742-5468/2008/10/P10008
[3] Dubey, A., et al.: The llama 3 herd of models (2024)
[4] Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., Larson, J.: From local to global: A graph rag approach to query-focused summarization (2024). URL http://arxiv.org/abs/2404.16130. ArXiv:2404.16130 [cs]
[5] Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. AAAI Press (1996)
[6] Faraone, S.V., Banaschewski, T., Coghill, D., Zheng, Y., Biederman, J., Bellgrove, M.A., Newcorn, J.H., Gignac, M., Al Saud, N.M., Manor, I., et al.: Attention-deficit/hyperactivity disorder. Nature Reviews Disease Primers 5(1), 1–27 (2019)
[7] Faraone, S.V., Banaschewski, T., Coghill, D., Zheng, Y., Biederman, J., Bellgrove, M.A., Newcorn, J.H., Gignac, M., Al Saud, N.M., Manor, I., et al.: The world federation of adhd international consensus statement: 208 evidence-based conclusions about the disorder. Neuroscience & Biobehavioral Reviews 128, 789–818 (2021)
[8] Faraone, S.V., Bellgrove, M.A., Brikell, I., Cortese, S., Hartman, C.A., Hollis, C., Newcorn, J.H., Philipsen, A., Polanczyk, G.V., Rubia, K., Sibley, M.H., Buitelaar, J.K.: Attention-deficit/hyperactivity disorder. Nature Reviews Disease Primers (2024)
[9] Faraone, S.V., Perlis, R.H., Doyle, A.E., Smoller, J.W., Goralnick, J.J., Holmgren, M.A., Sklar, P.: Molecular genetics of attention-deficit/hyperactivity disorder. Biological psychiatry 57(11), 1313–1323 (2005)
[10] Guo, T., Yang, Q., Wang, C., Liu, Y., Li, P., Tang, J., Li, D., Wen, Y.: KnowledgeNavigator: leveraging large language models for enhanced reasoning over knowledge graph. Complex & Intelligent Systems
[11] Hayashi, W., Iwanami, A.: Biological mechanisms of ADHD. Brain and Nerve (2018)
[12] Jin, W., Zhao, B., Yu, H., Tao, X., Yin, R., Liu, G.: Improving Embedded Knowledge Graph Multi-hop Question Answering by introducing Relational Chain Reasoning. Data Mining and Knowledge Discovery 37(1), 255–288 (2023). 10.1007/s10618-022-00891-8. arXiv:2110.12679 [cs]
[13] Li, Z., Cao, Z., Li, P., Zhong, Y., Li, S.: Multi-hop question generation with knowledge graph-enhanced language model. Multidisciplinary Digital Publishing Institute (MDPI), Number 9
[14] Muennighoff, N., Tazi, N., Magne, L., Reimers, N.: MTEB: Massive text embedding benchmark (2022)
[15] Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Physical Review E (2004)
[16] Sonuga-Barke, E.J., Becker, S.P., Bölte, S., Castellanos, F.X., Franke, B., Newcorn, J.H., Nigg, J.T., Rohde, L.A., Simonoff, E.: Annual research review: Perspectives on progress in ADHD science – from characterization to cause. Journal of Child Psychology and Psychiatry (2023)
[17] SUNY Upstate Medical University: Stephen V. Faraone, PhD. https://www.upstate.edu/psych/research/faraone/ (2024). Accessed: 2024-08-31
[18] Traag, V.A., Waltman, L., van Eck, N.J.: From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports 9(1) (2019). 10.1038/s41598-019-41695-z
[19] Zhang, X., Zhang, Y., Long, D., Xie, W., Dai, Z., Tang, J., Lin, H., Yang, B., Xie, P., Huang, F., Zhang, M., Li, W., Zhang, M.: mGTE: Generalized long-context text representation and reranking models for multilingual text retrieval (2024)

Appendix

Listing 1: System prompt used for node and edge extraction


You are a network graph maker who extracts terms and their relations from a given context.

You are provided with a context chunk (delimited by ```)

Your task is to extract the ontology of terms mentioned in the given context.

These terms should represent the key concepts as per the context.

Thought 1: While traversing through each sentence, Think about the key terms mentioned in it. Terms must only include entity, location, organization, person, condition, documents, service, concept, date. Terms should be as atomistic as possible.

Thought 2: Think about how these terms can have one on one relation with other terms. Terms that are mentioned in the same sentence or the same paragraph are typically related to each other. Terms can be related to many other terms.

Thought 3: Find out the relation between each such related pair of terms.


\colorbox{lightgray}{%

\parbox{0.95\textwidth}{%

\textcolor{entitycolor}{\textbf{Atomoxetine}} (Strattera) is approved by the \textcolor{organizationcolor}{\textbf{FDA}} for the treatment of \textcolor{conditioncolor}{\textbf{ADHD}} in \textcolor{personcolor}{\textbf{children}}, \textcolor{personcolor}{\textbf{adolescents}}, and \textcolor{personcolor}{\textbf{adults}}. It works by increasing the amount of certain neurotransmitters—\textcolor{entitycolor}{\textbf{norepinephrine}} (and \textcolor{entitycolor}{\textbf{dopamine}})—available for cells to communicate with each other in the brain. Unlike the \textcolor{conceptcolor}{\textbf{stimulants}}, \textcolor{entitycolor}{\textbf{atomoxetine}} is not a controlled substance, which makes it easier to obtain a \textcolor{othercolor}{\textbf{several-month supply of medication}}.

}}

\end{center}

\vspace{-8mm}

\begin{table}[!htb]

\centering

\caption{Extracted nodes and edges from the example text}

\label{tab:example}

\begin{tabular}{|>{\centering}p{2.3cm}|>{\centering}p{1.8cm}||>{\centering}p{2.3cm}|>{\centering}p{1.8cm}||p{4cm}|}

\hline

\multicolumn{2}{|c||}{\textbf{Node 1}} & \multicolumn{2}{c||}{\textbf{Node 2}} & \multicolumn{1}{c|}{\textbf{Edge}} \\

\cline{1-2} \cline{3-4}

\textbf{Name} & \textbf{Type} & \textbf{Name} & \textbf{Type} & \\

\hline

atomoxetine & entity & fda & organization & is approved by \\

\hline

atomoxetine & entity & adhd & condition & treats \\

\hline

atomoxetine & entity & children & person & is approved for the treatment of in \\

\hline

atomoxetine & entity & adolescents & entity & is approved for the treatment of in \\

\hline

atomoxetine & entity & adults & person & is approved for the treatment of in \\

\hline

atomoxetine & entity & norepinephrine & entity & works by increasing the amount of certain available for cells to communicate with each other in \\

\hline

atomoxetine & entity & dopamine & entity & works by increasing the amount of certain available for cells to communicate with each other in \\

\hline

atomoxetine & entity & stimulants & concept & unlike, is not a controlled substance \\

\hline

atomoxetine & entity & several-month supply of medication & other & makes it easier to obtain a \\

\hline

\end{tabular}

\vspace{-6mm}

\end{table}
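Each row of the table of extracted nodes and edges is a typed triple that becomes part of the knowledge graph. A minimal sketch, assuming rows are merged into an undirected graph keyed by lowercased node names, with all relation labels between the same pair collected together; the helper names `build_graph` and `unexpected_types` and the merge policy are hypothetical. It also flags node types outside the list given in Thought 1 of Listing 1, such as `other` in the table's last row.

```python
from collections import defaultdict

# Term types permitted by Thought 1 of Listing 1.
ALLOWED_TYPES = {
    "entity", "location", "organization", "person", "condition",
    "documents", "service", "concept", "date",
}

# Rows transcribed from the table: (node1, type1, node2, type2, edge).
ROWS = [
    ("atomoxetine", "entity", "fda", "organization", "is approved by"),
    ("atomoxetine", "entity", "adhd", "condition", "treats"),
    ("atomoxetine", "entity", "children", "person", "is approved for the treatment of in"),
    ("atomoxetine", "entity", "adolescents", "entity", "is approved for the treatment of in"),
    ("atomoxetine", "entity", "adults", "person", "is approved for the treatment of in"),
    ("atomoxetine", "entity", "norepinephrine", "entity",
     "works by increasing the amount of certain available for cells to communicate with each other in"),
    ("atomoxetine", "entity", "dopamine", "entity",
     "works by increasing the amount of certain available for cells to communicate with each other in"),
    ("atomoxetine", "entity", "stimulants", "concept", "unlike, is not a controlled substance"),
    ("atomoxetine", "entity", "several-month supply of medication", "other", "makes it easier to obtain a"),
]


def build_graph(rows):
    """Build an undirected graph as adjacency sets. Parallel edges between
    the same pair are merged and their relation labels kept in a set."""
    node_types = {}
    adjacency = defaultdict(set)
    relations = defaultdict(set)
    for n1, t1, n2, t2, edge in rows:
        # Assumed normalisation: case-insensitive node identity.
        n1, n2 = n1.strip().lower(), n2.strip().lower()
        # Keep the first type seen for each node (one possible policy).
        node_types.setdefault(n1, t1)
        node_types.setdefault(n2, t2)
        adjacency[n1].add(n2)
        adjacency[n2].add(n1)
        relations[frozenset((n1, n2))].add(edge)
    return node_types, adjacency, relations


def unexpected_types(node_types):
    """Return node types that are not in the prompt's allowed list."""
    return {t for t in node_types.values() if t not in ALLOWED_TYPES}


node_types, adjacency, relations = build_graph(ROWS)
degree = {n: len(nbrs) for n, nbrs in adjacency.items()}
print(max(degree, key=degree.get), degree["atomoxetine"])  # atomoxetine 9
print(unexpected_types(node_types))  # {'other'}
```

Keeping the first-seen type is only one policy; when the same term is typed differently across chunks, a full pipeline would need an explicit rule for reconciling them before running degree- or k-core-based analyses.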

\end{document}