6.1. Principal Findings
The JAMA and GQS scores were grouped into three categories, "Good," "Average," and "Poor," to facilitate detailed statistical analysis and enhance the interpretability of the results. For the JAMA score, 0-1 was categorized as "Poor," 2 as "Average," and 3-4 as "Good." For the GQS score, 0-1 was classified as "Poor," 2-3 as "Average," and 4-5 as "Good."
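For illustration, this binning can be reproduced in a few lines of Python; the data frame and column names (jama_score, gqs_score) below are hypothetical and not the study dataset.

```python
import pandas as pd

# Hypothetical per-video scores; column names are assumptions for illustration only.
videos = pd.DataFrame({
    "jama_score": [1, 2, 3, 4, 2],
    "gqs_score":  [1, 3, 4, 5, 2],
})

# JAMA: 0-1 = Poor, 2 = Average, 3-4 = Good
videos["jama_category"] = pd.cut(
    videos["jama_score"], bins=[-0.5, 1.5, 2.5, 4.5],
    labels=["Poor", "Average", "Good"],
)

# GQS: 0-1 = Poor, 2-3 = Average, 4-5 = Good
videos["gqs_category"] = pd.cut(
    videos["gqs_score"], bins=[-0.5, 1.5, 3.5, 5.5],
    labels=["Poor", "Average", "Good"],
)

print(videos)
```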
This categorization allowed for a structured and comparative analysis of the videos based on their quality and reliability. By assigning specific ranges to these categories, we could systematically evaluate and compare the performance of the videos under study. This structured approach enabled us to identify patterns and trends in the data, such as the proportion of videos falling into each category and the factors contributing to these classifications.
The use of these predefined categories also facilitated a clearer understanding of the overall quality of the videos in relation to the standards set by the JAMA and GQS criteria. For instance, videos classified as "Good" according to the JAMA score demonstrated a high level of adherence to the standards of medical accuracy and reliability. In contrast, those rated as "Poor" indicated significant deficiencies in these areas. Similarly, the GQS score classification helped in assessing the overall quality and usefulness of the videos from the viewer's perspective.
This categorization not only streamlined the statistical analysis process but also provided a robust framework for discussing the implications of our findings. It highlighted the areas where the videos excelled or fell short, thereby offering valuable insights for future content creators and researchers aiming to improve the quality of online health information.
Our study provided informative data on the performance scores of Doctors and Non-Doctors with respect to their knowledge and understanding of nutrition and diet for CKD patients. Among the Doctors (n=18, representing 60% of the total participants), 44.4% (n=8) achieved an "Average" rating, while a notable 55.6% (n=10) attained a "Good" rating. None of the Doctors scored in the "Poor" category, indicating a relatively high baseline of knowledge and competence within this group.
In contrast, the Non-Doctors (n=12, accounting for 40% of the total participants) displayed a different pattern in their performance scores. A significant majority, 83.3% (n=10), scored "Average," suggesting a more moderate level of understanding in this cohort. Only 8.3% (n=1) of Non-Doctors achieved a "Good" rating, markedly lower compared to their Doctor counterparts. Additionally, 8.3% (n=1) of Non-Doctors scored "Poor," highlighting a critical area for improvement within this group.
From these findings, it can be concluded that Doctors are significantly more likely to score "Good" compared to Non-Doctors (55.6% vs. 8.3%, respectively). This disparity underscores the advanced knowledge base and possibly the more rigorous training and exposure that Doctors have concerning CKD nutrition and diet management. Conversely, Non-Doctors are more inclined to score "Average" (83.3% vs. 44.4% for Doctors), indicating a generally acceptable but less proficient level of understanding. The presence of a "Poor" score exclusively among the Non-Doctors (8.3%) further emphasizes the need for enhanced educational interventions targeting this group.
These results (presented below in Table 10) suggest a clear division in the levels of expertise between Doctors and Non-Doctors, with Doctors demonstrating superior performance. This underscores the importance of targeted educational programs to elevate the knowledge base of Non-Doctors, ensuring a more uniformly high standard of care and information dissemination across all healthcare providers involved in the management of CKD.
Figure 4 (below) provides a graphical representation of the relationship between professional status and video content quality described above.
In our study, we sought to uncover the relationships between four key variables: the number of views, JAMA scores, GQS scores, and likes of the videos, employing Spearman's rank correlation coefficient for this analysis. Our findings revealed several noteworthy correlations.
Firstly, when comparing the number of views to JAMA scores, a moderate positive correlation was identified (ρ = 0.422, p = 0.020). This relationship was statistically significant at the 0.05 level, indicating that videos with higher view counts tend to have better JAMA scores. This suggests that videos perceived as more credible or higher in quality are viewed more frequently by the audience.
In contrast, the comparison between views and GQS scores yielded a weak positive correlation (ρ = 0.229, p = 0.223) that was not statistically significant, indicating that GQS scores do not strongly correlate with the number of views. This lack of significance suggests that the GQS, which may capture a different dimension of video quality or user engagement, is not meaningfully associated with view counts.
Additionally, the relationship between GQS scores and likes showed a moderate positive correlation (ρ = 0.345, p = 0.062). Although this approached statistical significance, it did not meet the 0.05 threshold, suggesting a possible trend in which higher GQS scores are associated with more likes; further investigation is required to confirm this pattern. The comparison between likes and JAMA scores revealed a strong positive correlation (ρ = 0.530, p = 0.003), statistically significant at the 0.01 level. This was the strongest and most significant relationship observed, indicating that videos with higher JAMA scores, reflecting greater quality and credibility, also tend to receive more likes and that likeability, as an engagement metric, aligns closely with the quality standards measured by JAMA. The moderate correlation between views and JAMA scores points in the same direction, though less strongly than likes. The weaker, non-significant correlations involving GQS scores suggest that the GQS may be measuring a different aspect of performance or quality compared to the other metrics.
Taken together, these findings imply that engagement metrics such as views and likes are positively related to JAMA scores, indicating that more engaging content tends to be of higher quality as measured by JAMA. However, causality cannot be inferred from these correlations alone; further research would be necessary to explore the causal relationships and underlying factors influencing these associations.
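As a minimal sketch of how such pairwise Spearman coefficients and p-values can be computed, assuming hypothetical column names (views, likes, jama, gqs) and illustrative values rather than the study data:

```python
import pandas as pd
from scipy.stats import spearmanr

# Illustrative per-video metrics; the values and column names are assumptions.
df = pd.DataFrame({
    "views": [1200, 5400, 800, 15000, 3200, 700],
    "likes": [30, 210, 12, 540, 95, 18],
    "jama":  [2, 3, 1, 4, 2, 2],
    "gqs":   [3, 4, 2, 4, 3, 2],
})

# Pairwise Spearman rank correlations with two-sided p-values,
# mirroring the comparisons reported above.
for x, y in [("views", "jama"), ("views", "gqs"), ("likes", "gqs"), ("likes", "jama")]:
    rho, p = spearmanr(df[x], df[y])
    print(f"{x} vs {y}: rho = {rho:.3f}, p = {p:.3f}")
```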
The scatter plot matrix below (Figure 5) illustrates the correlation analysis described above.
To mitigate potential individual rater bias, we employed a multi-rater approach wherein three independent evaluators assessed the same video content using identical scorecards. To quantify the overall concordance among these three raters and ensure alignment in their evaluative approaches, we conducted Bland-Altman analyses. This method allowed us to assess the degree of agreement between raters and verify that their interpretations of the scoring criteria were consistently applied across the sample. The Bland-Altman plots suggested that there is generally good agreement between all three raters for both JAMA and GQS scores. However, the agreement appeared to be stronger for JAMA scores compared to GQS scores, as evidenced by the tighter clustering of points around the mean difference line in the JAMA score plots. The wider spread of points in the GQS score plots indicates more variability in ratings between pairs of raters for this scoring system. This could suggest that the GQS scoring criteria may be more subjective or open to interpretation compared to the JAMA scoring criteria. Despite some variability, most data points fall within the limits of agreement for both scoring systems, indicating an acceptable level of agreement between raters overall.
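For reference, the quantities underlying a Bland-Altman plot (the mean difference, or bias, and the 95% limits of agreement) can be computed as in the following sketch; the helper function and rater scores are illustrative, not the study data.

```python
import numpy as np

def bland_altman_limits(scores_a, scores_b):
    """Return the bias and 95% limits of agreement for two raters' scores."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    diff = a - b
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical JAMA scores for the same videos from two raters.
rater1 = [2, 3, 3, 4, 2, 1, 3, 2]
rater2 = [2, 3, 4, 4, 2, 2, 3, 2]
bias, lower, upper = bland_altman_limits(rater1, rater2)
print(f"bias = {bias:.2f}, limits of agreement = [{lower:.2f}, {upper:.2f}]")
```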
Figure 6.
Bland-Altman Plot for JAMA Scoring of Rater 1 vs Rater 2.
Figure 7.
Bland-Altman Plot for GQS Scoring of Rater 1 vs Rater 2.
Figure 8.
Bland-Altman Plot for JAMA Scoring of Rater 2 vs Rater 3.
Figure 9.
Bland-Altman Plot for GQS Scoring of Rater 2 vs Rater 3.
Figure 10.
Bland-Altman Plot for JAMA Scoring of Rater 1 vs Rater 3.
Figure 11.
Bland-Altman Plot for GQS Scoring of Rater 1 vs Rater 3.
To examine inter-rater reliability further, Cronbach's alpha was calculated for each set of ratings, and the reliability of the ratings provided by the three raters (Rater 1, Rater 2, and Rater 3) was analyzed using a two-way mixed-effects intraclass correlation model with absolute agreement.
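As a minimal sketch, Cronbach's alpha can be computed directly from a videos-by-raters matrix as shown below (the ratings are illustrative, not the study data); the single and average measures ICCs reported here would typically be obtained from a dedicated routine, for example the intraclass_corr function of the pingouin package or SPSS's reliability procedure.

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha for a matrix with rows = videos and columns = raters."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                          # number of raters
    rater_vars = ratings.var(axis=0, ddof=1)      # variance of each rater's scores
    total_var = ratings.sum(axis=1).var(ddof=1)   # variance of the summed scores
    return (k / (k - 1)) * (1 - rater_vars.sum() / total_var)

# Hypothetical JAMA ratings from Raters 1-3 (one row per video).
jama_ratings = [
    [2, 2, 3],
    [3, 3, 3],
    [1, 2, 1],
    [4, 3, 4],
    [2, 2, 2],
    [3, 4, 3],
]
print(f"Cronbach's alpha = {cronbach_alpha(jama_ratings):.3f}")
```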
For the JAMA Score, the overall internal consistency was found to be good, with a Cronbach's alpha of 0.801 (0.803 based on standardized items). This high level of inter-rater reliability suggests that the raters were consistent in their evaluations. The single measures Intraclass Correlation Coefficient (ICC) was 0.554 (95% Confidence Interval: 0.348 - 0.733), indicating moderate reliability for individual ratings. The average measures ICC was 0.789 (95% Confidence Interval: 0.615 - 0.892), reflecting good reliability when considering the mean of all three raters. Both single and average measures ICCs were statistically significant (p < 0.001), confirming that the observed ICCs are significantly different from zero. The inter-item correlation matrix supported these findings, with moderate to strong positive correlations between raters, ranging from 0.542 - 0.608. This further underscores the consistency among the raters in their scoring of the JAMA criteria.
Regarding the GQS scores, the overall internal consistency of the ratings was likewise good, with a Cronbach's alpha of 0.831 (0.835 based on standardized items), indicating a high level of inter-rater reliability. The single measures ICC was 0.594 (95% Confidence Interval: 0.391 - 0.762), suggesting moderate to good reliability for individual ratings. The average measures ICC was 0.814 (95% Confidence Interval: 0.658 - 0.906), indicating good reliability for the mean ratings of all three raters. Both single and average measures ICCs were statistically significant (p < 0.001), confirming the significant departure of the observed ICCs from zero. The inter-item correlation matrix revealed moderate to strong positive correlations between raters, ranging from 0.548 - 0.775. Notably, the correlation between two of the raters was particularly strong (0.775), while the correlations involving the third rater were slightly lower but still substantial.
Item statistics for the GQS scores showed mean ratings ranging from 2.57 - 2.87 across raters, with standard deviations between 0.626 and 0.730, indicating reasonable consistency in scoring patterns among the raters. The high Cronbach's alpha values, significant ICCs, and strong inter-rater correlations collectively highlight the reliability and consistency of the ratings for both JAMA and GQS scores. These findings demonstrate that the scoring methodology used in this study is robust and dependable, ensuring that the evaluations of video content are consistent and reproducible across different raters.
As a next step, the Kruskal-Wallis H test was used to investigate the relationship between designation groups and the two scoring systems: JAMA (Journal of the American Medical Association) and GQS (Global Quality Scale). The Kruskal-Wallis test, a non-parametric method, determines whether statistically significant differences exist between two or more groups of an independent variable on a continuous or ordinal dependent variable.
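As a minimal sketch, assuming three designation groups (consistent with the 2 degrees of freedom reported below) and illustrative scores, the test can be run with SciPy as follows:

```python
from scipy.stats import kruskal

# Illustrative JAMA scores for three hypothetical designation groups;
# these values are not the study data.
group_1 = [3, 4, 3, 2, 4, 3]
group_2 = [2, 3, 2, 2, 3]
group_3 = [1, 2, 2, 1, 2]

h_stat, p_value = kruskal(group_1, group_2, group_3)
print(f"H(2) = {h_stat:.3f}, p = {p_value:.3f}")
```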
For JAMA scores, the Kruskal-Wallis H statistic was 7.403, with 2 degrees of freedom and an asymptotic significance (p-value) of 0.025. This result indicates a statistically significant difference among the designation groups, suggesting that variations in JAMA scores are associated with different professional categories (H(2) = 7.403, p = 0.025).
Similarly, the analysis for GQS scores revealed a Kruskal-Wallis H statistic of 8.542, with 2 degrees of freedom and an asymptotic significance (p-value) of 0.014. This finding also indicates a statistically significant difference among the designation groups, highlighting that GQS scores vary significantly across different professional categories (H(2) = 8.542, p = 0.014).
Both results are significant at the 0.05 level, with the GQS comparison yielding a slightly smaller p-value than the JAMA comparison. These findings suggest that the designation or professional category of the uploader is significantly associated with both JAMA and GQS scores.
The implications of these results are noteworthy. The significant differences in scores indicate that factors related to professional designation, such as background in medicine and professional expertise, may affect performance on these scoring systems. Specifically, it can be inferred that videos produced by doctors tend to be of higher quality and contain better information compared to those created by non-doctors. This disparity underscores the importance of professional expertise in producing content that meets higher quality standards, as reflected in the JAMA and GQS scores.
Lastly, the Kruskal-Wallis H test results highlight the impact of professional designation on the quality of video content related to nutrition and diet for CKD patients. These findings emphasize the need for content creators to have a strong professional background to ensure the delivery of high-quality, reliable information to the audience.
To further explore the relationship between professional designation (Doctor vs. Non-Doctor) and performance on the content scorecard, Fisher's Exact test was conducted.
The content scorecard utilized in this study was systematically divided into three specific categories: "Good," "Average," and "Poor." These categories were defined by score ranges designed to facilitate a more nuanced and detailed statistical analysis of the evaluated videos. Specifically, a score of 1-2 was categorized as "Poor," indicating significant deficiencies in the quality and comprehensiveness of the content. Scores in the range of 3-4 were classified as "Average," reflecting an acceptable level of content that meets basic standards but lacks excellence in certain critical areas. Finally, scores of 5-6 were categorized as "Good," representing high-quality content that thoroughly addresses essential aspects of the topic with clarity and accuracy.
This structured categorization was crucial for the analysis and interpretation of the data, allowing for a clear differentiation between varying levels of content quality. By assigning specific ranges to each category, the scorecard provided a systematic method to evaluate and compare the performance of the videos under study. This approach enabled the identification of patterns and trends, such as the proportion of videos falling into each category and the factors contributing to these classifications.
Moreover, the predefined categories facilitated a comprehensive understanding of the overall quality of the videos in relation to the established standards. Videos rated as "Good" demonstrated a high level of adherence to quality criteria, ensuring that viewers received reliable and valuable information. In contrast, those rated as "Poor" highlighted areas where significant improvements were needed. This clear delineation helped in pinpointing specific areas of content that require enhancement and provided actionable insights for content creators aiming to elevate the standard of their work.
The results indicate a statistically significant association between these variables (p = 0.001). The crosstabulation of the data reveals notable differences in scorecard performance between Doctors and Non-Doctors.
Doctors demonstrated a higher tendency to score in the "Average Good" (50.0%) and "Good" (27.8%) categories, collectively accounting for 77.8% of their scores. In contrast, Non-Doctors predominantly scored in the "Average" category (75.0%), with a smaller proportion in the "Poor" category (16.7%). It is particularly noteworthy that no Doctors were rated as "Poor," while no Non-Doctors achieved a "Good" rating. These differences underscore a clear disparity in performance between the two groups.
Supporting these findings, the Chi-Square test results (χ² = 14.712, df = 3, p = 0.002) also suggest a significant association between designation and scorecard performance. However, it is important to consider that 62.5% of cells have expected counts less than 5, which may impact the reliability of the Chi-Square test. Given these limitations, Fisher's Exact test provides a more robust indication of the significant relationship, as it does not depend on minimum expected cell frequencies.
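For reference, the chi-square statistic and the expected-count check can be reproduced from the Table 11 counts as in the sketch below; this is illustrative and will not necessarily match the reported statistic, which reflects the study's full category coding. Note that SciPy's fisher_exact handles only 2x2 tables, so an exact test on a larger designation-by-category table relies on other software (e.g., SPSS or R's fisher.test).

```python
from scipy.stats import chi2_contingency

# Contingency table from Table 11: rows = NHP, Non-NHP; columns = Average, Good, Poor.
table = [
    [13, 5, 0],
    [10, 0, 2],
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p:.3f}")
print("expected counts:")
print(expected)  # cells below 5 flag the need for an exact test
```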
In conclusion, these results highlight a strong association between professional designation (Doctor vs. Non-Doctor) and scorecard performance, with Doctors generally achieving higher scores than Non-Doctors. This finding has significant implications for professional development and performance evaluation within the field. It suggests that the expertise and training associated with a medical degree may contribute to higher-quality content creation, particularly in the context of dietary information for dialysis patients. Therefore, targeted interventions to enhance the skills of Non-Doctors in this area could be beneficial in elevating the overall quality of health information disseminated to the public.
Table 11.
Cross-tabulation for Content Coverage/Authenticity Scoring.
Designation | Data | Scores: Average | Scores: Good | Scores: Poor | Total
Nephro Health Professionals (NHP) | Count | 13 | 5 | 0 | 18
Nephro Health Professionals (NHP) | % within designation | 72.20% | 27.80% | 0.00% | 100%
Non-NHP | Count | 10 | 0 | 2 | 12
Non-NHP | % within designation | 83.30% | 0.00% | 16.70% | 100%
Figure 12 (below) provides a graphical representation of the relationship between professional status and video content quality described above.
We further examined the relationships between four variables: View, Content, Like, and Correctness, using Spearman's rank correlation coefficient to assess these relationships due to the ordinal nature of the data. The findings revealed significant and nuanced correlations among these variables.
A significant positive correlation was found between View and Content (ρ = 0.422, p = 0.020), indicating that the number of views is associated with higher content quality; this may mean that better content attracts more views, or that more widely viewed content tends to be of better quality. The correlation between View and Correctness approached but did not reach significance (ρ = 0.353, p = 0.056), suggesting a weak to moderate positive relationship. This trend indicates that higher view counts might be associated with greater correctness, but the relationship is not statistically significant at the 0.05 level.
A significant positive correlation was observed between Correctness and Like (ρ = 0.409, p = 0.025), implying that content with higher correctness tends to receive more likes or that more liked content tends to be more correct. The strongest correlation was found between Like and Content (ρ = 0.555, p = 0.001), indicating a moderately strong relationship. This suggests that higher quality content tends to receive more likes or that more liked content is perceived as higher quality.
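As a minimal sketch, assuming a data frame with these four variables (illustrative values only), the full Spearman correlation matrix and a scatter plot matrix analogous to Figure 13 can be produced as follows:

```python
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

# Illustrative per-video variables; the names mirror the analysis above,
# but the values are not the study data.
df = pd.DataFrame({
    "View":        [1200, 5400, 800, 15000, 3200, 700],
    "Like":        [30, 210, 12, 540, 95, 20],
    "Content":     [3, 5, 2, 6, 4, 3],
    "Correctness": [2, 4, 2, 5, 3, 2],
})

# Full Spearman rank-correlation matrix (coefficients only;
# p-values can be obtained pairwise with scipy.stats.spearmanr).
print(df.corr(method="spearman").round(3))

# Scatter plot matrix of the four variables.
scatter_matrix(df, figsize=(6, 6), diagonal="hist")
plt.tight_layout()
plt.show()
```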
The scatter plot matrix (shown below in Figure 13) visually supports these findings, showing positive trends in the relationships between these variables, particularly for Like vs. Content and View vs. Content. These results highlight significant interrelationships among views, likes, content quality, and correctness. The strongest association was found between likes and content quality, suggesting that user engagement, as measured by likes, is closely tied to the perceived quality of the content.
These findings have important implications for content creators and platform managers, emphasizing the need to produce high-quality, accurate content to drive user engagement and visibility. Future research could explore the causal relationships between these variables and investigate other factors that might influence content performance and user engagement. Understanding these dynamics can help optimize content strategies and improve the overall quality of information available to the public.
From the overall study, it was concluded that the content produced by doctors generally had higher content scores compared to that produced by non-doctors. This finding underscores the superior quality and reliability of information provided by professionals with medical training. Despite this, the analysis revealed that there was no strong correlation between view rate and content quality, suggesting that a higher number of views does not necessarily indicate better quality content. However, a strong positive correlation was observed between like rate and content quality. This indicates that while people may watch various videos, they tend to like content based on its reliability and authenticity. Essentially, likes are a more accurate reflection of the perceived quality and trustworthiness of the content rather than just the number of views. This relationship highlights the importance of producing high-quality, accurate content to garner positive user engagement and trust.