Influencers’ Reposts and Viral Diffusion: Prestige Bias in Online Communities.
Abstract
Cultural evolution theory suggests that prestige bias, whereby individuals preferentially learn from prestigious figures, has played a key role in human ecological success. However, its impact within online environments remains unclear, particularly regarding whether reposts by prestigious individuals amplify diffusion more effectively than reposts by non-influential users. Here, we analyzed over 55 million posts and 520 million reposts on Twitter (currently X) to examine whether users with high influence scores (hg-index) more effectively amplified the reach of others’ content. Our findings indicate that posts shared by influencers were more likely to be further shared than those shared by non-influencers. This effect persisted over time, especially in viral posts. Moreover, a small group of highly influential users accounted for approximately half of the information flow within repost cascades. These findings demonstrate a prestige bias in information diffusion within digital society, suggesting that cognitive biases shape content spread through reposting.
Keywords: influencer; information diffusion; prestige bias; social network; cultural evolution
*Corresponding authors: T.N. ([email protected]) and Y.N. ([email protected]).
1 Introduction
In the digital age, social media influencers have emerged as authoritative figures with significantly greater information diffusion power than ordinary users [1, 2]. The impact of influencers extends beyond the online world, as evidenced by the emergence of influencer-based viral marketing strategies [3, 4, 5] and legal disputes concerning reposts by an influencer on Twitter (currently X) [6]. Influencers are believed to significantly impact the distribution of information in online communities due to their voice and follower counts. In recent years, there have been various approaches to addressing problems related to online communication, such as the spread of misinformation or the homogeneity of received information (so-called echo chambers and filter bubbles). Research has shown that influencers, particularly experts in specific areas, can contribute to correcting misinformation in online communities [7]. Accordingly, a continued focus on influencers remains crucial in this context.
Previous research examining influencers’ role in information diffusion has focused on their capacity as source spreaders, i.e., their ability to generate and spread original content [1, 8, 2, 9]. However, a key feature of modern social networks is the repost (or share) function, which allows users to reproduce others’ posts and is a major contributor to information diffusion. Acerbi [10] pointed out that sharing content online fundamentally differs from oral retelling, as it does not require memorization or reproduction of the content. This facilitates and potentially accelerates the spread of information in online environments, contributing to viral-like diffusion patterns. Some researchers have examined influencers as brokers of information, analyzing their role in facilitating the spread of content created by others [11, 12, 9]. Despite these advances, significant gaps remain. Among these is the question of whether influencers who excel at spreading self-generated content can also amplify content generated by others through reposts. That is, are recipients more likely to share a post simply because an influencer has reposted it? The present study therefore investigates whether the diffusion capabilities of influencers extend beyond their own content, potentially affecting the propagation of information posted by others. Specifically, we examine how influencers shape the preferences and engagement behaviors of users who receive the content they share.
If recipients are indeed more likely to share content from influencers, this would suggest the presence of cognitive biases. In evolutionary anthropology and cultural psychology, researchers have long studied context-dependent biases in information acquisition, whereby individuals selectively adopt information based on its social context rather than its content alone. Among these, prestige bias is the tendency to learn from socially successful or prestigious individuals [13, 14, 15]; it is considered to have been adaptive over human history. This concept aligns with established theories of communication (e.g., the two-step flow [16] and diffusion of innovations [17]), which emphasize the role of opinion leaders and early adopters in information spread. Recent studies on social media diffusion patterns [18] have indicated that most sharing occurs within close proximity to the original poster, highlighting the importance of early-stage diffusion and influential users. Furthermore, research has shown influencers’ potential to correct misinformation [7]. Building on these theories and recent findings, this study focuses on the early stages of information diffusion. We track and analyze how influence changes over time, examining the effect of prestige bias in the digital contexts that characterize our information-rich society.
When applied to social media, the concept of prestige bias raises important questions: do influencers function as prestigious models whose reposts are more likely to be further reposted than those of non-influencers? Clearly, influencers’ original content is appealing to their audience of active followers; however, it remains uncertain whether they maintain a similar influence when merely reposting others’ information. Determining whether there is a cognitive tendency to prefer any information associated with prestigious individuals, be it original or reposted, would provide useful insights into how information recipients are influenced. Our research thus investigates whether a proven ability to spread original content (our definition of an “influencer”) translates into an enhanced ability to spread others’ content through reposts. Importantly, this study does not focus on the number of followers an influencer has; rather, it examines individual cognition and behavior, specifically whether individuals are more likely to spread information they receive from influencers than from non-influencers. Our investigation of the early stages of information diffusion allows us to explore the nuanced dynamics of information spread on social media, yielding insights into the cognitive processes underlying user behavior (content sharing) beyond simple metrics.
To test our hypothesis about the existence of prestige bias in online communities, we need to precisely track how information flows to subsequent recipients via reposts; this requires the introduction of several key concepts. First, we introduce the concept of a repost cascade, which builds upon the established notion of information cascades [19, 20, 2, 18, 21]. Figure 1 illustrates this concept and shows two key types of information spread that we distinguish in our study. This framework allows us to identify who received information from whom. As shown in Figure 1(a), a repost cascade represents the chain of reposts originating from an original post, capturing the cascading structure of information diffusion on social networks. The blue speech icon (speech bubble) represents an original post, while the green repost icons represent subsequent reposts, with arrows indicating the direction of information flow over time. Next, we distinguish between two types of information spread: primary spread, which is the diffusion of a user’s original post, and secondary spread, which is the diffusion of a user’s repost. Figure 1(b) depicts primary spread, where an original post (blue speech bubble icon) from a source spreader is directly shared with multiple users, including followers and followers’ followers. This process represents the propagation of original content through the network. Secondary spread, shown in Figure 1(c), represents the phenomenon whereby content is reposted by a user (indicated by the green repost icon) and then further spread among their followers. This secondary process captures how information continues to spread beyond its initial audience through the actions of intermediary users. This distinction is crucial for our analysis as it allows us to differentiate between a user’s ability to spread their own content (primary spread) and their capacity to amplify others’ messages (secondary spread). 
By focusing on these two types of spread, we can investigate whether accounts with powerful primary spread capabilities (influencers, quantified by the hg-index [22] in our study) also have powerful secondary spread capabilities.
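The distinction between primary and secondary spread can be made concrete with a small sketch. It assumes a common attribution heuristic from cascade studies, not necessarily the exact rule used here: each repost is attached to the most recent earlier sharer whom the reposter follows, falling back to the original poster when the reposter follows no earlier sharer. The names `Share` and `build_repost_cascade` are hypothetical. An edge whose parent is the original poster counts as primary spread; any other edge counts as secondary spread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Share:
    user: str
    time: float  # hypothetical unit (e.g. seconds since the original post)


def build_repost_cascade(original: Share, reposts: list[Share],
                         followees: dict[str, set[str]]) -> list[tuple[str, str, str]]:
    """Return (parent, child, spread_type) edges of a repost cascade.

    Heuristic (assumption): each reposter is attached to the most recent
    earlier sharer they follow; if they follow none, to the original poster.
    """
    edges = []
    sharers = [original]  # everyone who has shared so far, in time order
    for rp in sorted(reposts, key=lambda s: s.time):
        followed = followees.get(rp.user, set())
        parent = original.user  # fallback: exposure via the original post
        for prev in reversed(sharers):  # most recent sharer first
            if prev.user in followed:
                parent = prev.user
                break
        spread = "primary" if parent == original.user else "secondary"
        edges.append((parent, rp.user, spread))
        sharers.append(rp)
    return edges
```

For example, if B follows the original poster A, C follows B, and D follows both A and C, the cascade becomes A→B (primary) followed by B→C and C→D (secondary), because D's most recent followed sharer is C.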
Furthermore, to track who views and engages with reposts from specific users, we construct a virtual timeline from sampled reposts and follower-followee relationships, as illustrated in Figure 2(a). This approach allows us to simulate followers’ exposure to reposts and observe their subsequent engagement. Using this approach, we then introduce the concept of cascading repost probability (CRP) to measure the efficiency of information spread (Figure 2(b,c)). We consider a repost on a user’s virtual timeline to have been further shared if that user’s own repost of the same content appears in the post’s repost cascade, allowing us to calculate the probability that a user’s secondary spread continues (Figure 2(b)). The CRP quantifies the likelihood of a repost being further shared, enabling us to evaluate the efficiency of repost diffusion among users with varying levels of influence; the CRP thus measures a user’s influence in the context of secondary spread (details are shown in Figure 2(c)). We calculate the CRP by aggregating view and repost counts across all users’ timelines. By comparing the CRPs of influencers and non-influencers, we can assess whether prestige bias manifests in online environments. If influencers consistently demonstrate higher CRP values, it would suggest that their status enhances their ability to spread information, even when that information originates from others.
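For reference, the pooled aggregation described above can be written as a ratio of counts. The notation is ours, introduced for illustration rather than taken from the original figures: let $U$ be the set of users with a virtual timeline, $V_{x,c}$ the number of reposts by users in influence category $c$ that appear on user $x$'s virtual timeline (views), and $R_{x,c}$ the number of those reposts that $x$ subsequently reposted, as confirmed in the repost cascade.

$$
\mathrm{CRP}_c = \frac{\sum_{x \in U} R_{x,c}}{\sum_{x \in U} V_{x,c}}
$$

Because counts are pooled before dividing, every view carries equal weight; averaging per-user ratios instead would give users with very few views as much weight as users with many. As a purely hypothetical illustration, 3,000 further reposts out of 200,000 views would give $\mathrm{CRP}_c = 3{,}000 / 200{,}000 = 0.015$.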
These concepts (i.e., repost cascade, secondary spread, virtual timeline, and CRP) provide a framework to quantify and analyze users’ influence in terms of secondary information spread. Our methodology focuses on the dynamics of secondary spread and how they relate to users’ level of influence, particularly comparing influencers and non-influencers. Based on this framework, we hypothesize that influencers have higher CRP values than non-influencers, and that these values remain consistently higher over time, especially in the early stages after the original post is published. If supported, this would provide evidence for the effect of prestige bias in online information diffusion. Importantly, while the CRP represents the efficiency of reposts in secondary spread, it does not capture the actual scale of distribution. Therefore, to further test our hypothesis, we also quantify the proportion of views and reposts in secondary spread that can be attributed to influencers. This comprehensive approach allows us to examine not only the efficiency of influencers’ information spread, but also their overall impact on information diffusion in online communities.
We thus aim to provide a nuanced understanding of how user status and influence shape the dynamics of information spread in digital environments. Our study contributes to the broader understanding of information diffusion in online social networks, the role of influencers in this process, and the potential presence and impact of prestige bias in digital contexts. The results of this investigation have implications for our understanding of online information dynamics, influencer marketing strategies, and the design of social media platforms.
2 Results
2.1 Data Collection
Following established methodologies in previous studies [18, 21], we constructed repost cascades based on one month of Japanese-language posts sampled from Twitter (currently X) and their associated follower-followee relationships. The dataset comprised 55,882,528 source posts, 520,048,995 reposts, and 14,910,772 unique users. For detailed information on the data collection process and the rationale behind choosing Japanese-language posts, please refer to Section 4.1 in Methods.
2.2 User Influence Distribution and Source Post Popularity
To categorize users based on their influence as source spreaders, we employed the hg-index, which quantifies a user’s ability to consistently generate and spread original content. This metric extends the h-index [23], which is commonly used to measure scientific productivity, by incorporating additional factors to provide a more comprehensive measure of a user’s primary spread capability (details in Methods). Using the distribution of hg-index scores, we classified users into six influence categories through quantile binning: very high (top 1%), high (top 1-5%), upper-mid (top 5-10%), mid (top 10-30%), lower-mid (top 30-50%), and low (bottom 50%), where each category excludes its upper threshold. Table 1 presents a comprehensive overview of influence scores and the popularity of users’ original posts as measured by repost statistics for each user category. As shown in Table 1, influencers classified by the hg-index as having very high influence are highly effective source spreaders.
Table 1: User influence categories, hg-index scores, and repost statistics of users’ original posts.
User influence | No. of users | Mean hg-index | Mean reposts | Max. reposts |
---|---|---|---|---|
very high | 160,719 | 19.89 | 24.48 | 140,856 |
high | 545,992 | 5.69 | 4.39 | 63,496 |
upper-mid | 597,749 | 2.87 | 2.58 | 28,301 |
mid | 897,878 | 1.80 | 2.00 | 23,518 |
lower-mid | 3,416,185 | 1.00 | 1.48 | 16,407 |
low | 10,725,364 | 0.00 | 0.00 | 0 |
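The quantile binning used to build Table 1 can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the helper names are hypothetical, quantiles use linear interpolation (the same convention as NumPy's default), and we assume that a score equal to a threshold stays in the lower category, which keeps every user with an hg-index of 0 in the low category.

```python
from bisect import bisect_left

# Categories from lowest to highest, and the quantile cut points between them:
# low (bottom 50%), lower-mid (top 30-50%), mid (top 10-30%),
# upper-mid (top 5-10%), high (top 1-5%), very high (top 1%).
LABELS = ["low", "lower-mid", "mid", "upper-mid", "high", "very high"]
CUTS = [0.50, 0.70, 0.90, 0.95, 0.99]


def quantile(sorted_vals, p):
    """Linear-interpolation quantile of an ascending list (NumPy's default rule)."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bin_users(scores):
    """Map each user to an influence category from their hg-index score."""
    vals = sorted(scores.values())
    thresholds = [quantile(vals, p) for p in CUTS]  # nondecreasing
    # bisect_left counts thresholds strictly below the score, so a score equal
    # to a threshold stays in the lower category (our tie-handling assumption).
    return {user: LABELS[bisect_left(thresholds, s)] for user, s in scores.items()}
```

Because hg-index values are heavily tied (many users score exactly 0 or 1), bin sizes under this kind of scheme need not match the nominal percentages exactly.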
2.3 Analyzing Prestige Bias in Secondary Spread
We now explain how the chain of secondary spread (i.e., the impact of user influence on information diffusion) changes for each influence category. The chain of secondary spread is measured by the CRP, which is calculated by determining whether reposts flowing into the virtual timeline created for each user are reposted by that user and then aggregating the results across all users. To determine whether a repost occurred, we use repost cascades: when user Y’s repost appears in user X’s virtual timeline, we check the post’s repost cascade to confirm whether X subsequently shared that content. If such a repost by X is found in the cascade, we consider the content to have been shared (see Figure 2(b) for details). Since we are interested in the initial CRP and how it changes, we focus particularly on the first 6 hours after posting and on the temporal dynamics.
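A minimal sketch of the CRP computation described above, under stated assumptions: each repost is a hypothetical `(user, post_id, t)` tuple with `t` in hours; a user's membership in a repost cascade is represented by the time of their first repost of that post; a repost by user Y is "viewed" once by every follower X of Y (the virtual timeline); and the view counts as further shared if X's repost of the same post comes later. The `window_h` and `min_reposts` parameters loosely mirror the 6-hour window and the repost-count thresholds of Figure 3(a); all function and variable names are ours.

```python
from collections import defaultdict


def crp_by_category(reposts, post_time, followers, category,
                    window_h=6.0, min_reposts=1):
    """Pooled CRP per influence category of the sharer (illustrative sketch).

    reposts:   list of (user, post_id, t) tuples, t in hours (hypothetical format)
    post_time: {post_id: time of the original post, in hours}
    followers: {user: set of that user's followers}
    category:  {user: influence category label}
    """
    # Repost cascade membership: earliest repost time of each user, per post.
    cascade = defaultdict(dict)
    for user, pid, t in reposts:
        if user not in cascade[pid] or t < cascade[pid][user]:
            cascade[pid][user] = t

    views = defaultdict(int)
    further = defaultdict(int)
    for pid, members in cascade.items():
        if len(members) < min_reposts:
            continue  # keep only cascades with at least `min_reposts` reposters
        t0 = post_time[pid]
        for y, t_y in members.items():
            if t_y - t0 > window_h:
                continue  # only reposts made within the window after posting
            cat = category[y]
            for x in followers.get(y, ()):
                views[cat] += 1  # Y's repost lands on X's virtual timeline
                t_x = members.get(x)
                if t_x is not None and t_x > t_y:
                    further[cat] += 1  # X reposted the same post afterwards
    return {cat: further[cat] / views[cat] for cat in views}


# Toy example (hypothetical data): B, C, D repost post "p" at hours 1, 2, 3.
REPOSTS = [("B", "p", 1.0), ("C", "p", 2.0), ("D", "p", 3.0)]
POST_TIME = {"p": 0.0}
FOLLOWERS = {"B": {"C", "E"}, "C": {"D", "F"}, "D": {"G"}}
CATEGORY = {"B": "very high", "C": "low", "D": "low"}

if __name__ == "__main__":
    # B's repost is seen by C and E, and C reposts later: CRP 1/2.
    # C's and D's reposts yield 3 views and 1 further repost: CRP 1/3.
    print(crp_by_category(REPOSTS, POST_TIME, FOLLOWERS, CATEGORY))
```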
Figure 3(a) shows the CRP within the first 6 hours after posting, categorized by cascade size (<1,000, ≥1,000, ≥5,000, and ≥10,000 reposts) and user influence category. The results demonstrate that users with very high influence consistently exhibit higher CRP values for posts with a high number of reposts (≥1,000), indicating a greater ability to propagate information even when they are not the original source. Notably, the difference in CRP between influence categories becomes more pronounced as the minimum number of reposts increases, suggesting that the impact of user influence is particularly strong for highly popular content.
To examine the temporal dynamics of this effect, we analyzed the CRP over a 24-hour period for cascades with ≥5,000 reposts; Figure 3(b) displays the CRP for each influence category over time. While CRP generally decreases over time for all categories, users with very high influence maintain a substantially higher CRP throughout the observation period, further emphasizing their sustained impact on information diffusion. These findings consistently suggest that user prestige enhances the perceived value or interest of shared content, increasing the likelihood of further diffusion.
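The temporal breakdown in Figure 3(b) amounts to computing the same pooled ratio within time bins. The sketch below assumes each view event has already been labeled with the sharer's influence category, the view time in hours since the original post, and whether the viewer later reposted; one-hour bins and all names are our choices, not the paper's.

```python
import math
from collections import defaultdict


def hourly_crp(view_events, hours=24):
    """CRP per (influence category, hour since original post).

    view_events: iterable of (category, hours_since_post, reposted) tuples,
    a hypothetical pre-computed format; events outside [0, hours) are ignored.
    """
    views = defaultdict(int)
    further = defaultdict(int)
    for cat, h, reposted in view_events:
        if not 0 <= h < hours:
            continue
        key = (cat, math.floor(h))  # one-hour bins
        views[key] += 1
        further[key] += int(reposted)
    # Sorted keys give one time series per category, hour by hour.
    return {key: further[key] / views[key] for key in sorted(views)}


# Toy view events (hypothetical data).
EVENTS = [
    ("very high", 0.2, True), ("very high", 0.7, False),
    ("low", 0.5, False), ("low", 0.9, True), ("low", 0.1, False),
    ("very high", 5.5, False), ("low", 30.0, True),  # last one is outside 24 h
]
```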
Our analysis thus provides strong evidence for the role of prestige bias in information diffusion in online social networks, supporting our hypothesis. Users with higher influence, measured by the hg-index, show a greater ability to propagate information, even when it was originally created by someone else. This aligns with the theoretical framework of prestige bias proposed by Henrich and Gil-White [13] and Jiménez and Mesoudi [14], extending this cognitive tendency to digital environments. Notably, this effect is particularly pronounced for posts with high popularity (≥1,000 reposts) and persists throughout the initial diffusion stages. The CRP for influential users remains consistently higher over time, indicating that their status enhances their information spread capability. However, prestige bias appears to affect popular content (≥1,000 reposts) more strongly, suggesting that its influence varies across content types. These findings have significant implications for understanding the dynamics of information spread on social media and the role of influential users in shaping online discourse. The nuanced effect of prestige bias we observed highlights the critical importance of user status in information diffusion, while also indicating that content characteristics play a role.
2.4 Quantifying the Impact of Influencers on Secondary Spread
While CRP is a useful indicator of information propagation efficiency, it does not capture the scale of that propagation, which can be measured by the actual quantity of reposts. To address this limitation and provide a more comprehensive analysis of influencers’ role in shaping information flow through secondary spread, we investigated the proportion of views and reposts attributable to each influence category. In our analysis, we count the number of times users potentially see shared posts in their virtual timelines, which we refer to as views; these views represent instances where a user would encounter a repost in their timeline (Figure 2(a)). When analyzing secondary spread in the virtual timeline, it is crucial to distinguish between views and reposts, as views only indicate exposure to content, whereas reposts represent active propagation of the content to other users. By tracking views, we can determine how often reposts (secondary spread) by users with different influence levels are potentially seen in other users’ timelines. This allows us to quantify not only how many times content is reposted, but also its potential visibility across the repost cascade. Our approach therefore provides insights into both the spread efficiency (measured by CRP) and the potential reach of information shared by different user categories. Figure 4 illustrates the proportion of users in each influence category as well as the distribution of views and reposts in secondary spread across influence categories.
The results reveal that users with very high influence, despite comprising only 1% of the user base (Figure 4(a)), account for 58.0% of views and 53.3% of reposts in secondary spread across all posts (Figure 4(b), top). This disproportionate influence becomes somewhat less pronounced when the number of reposts is large (≥5,000), where very high influence users are responsible for 40.6% of views and 47.7% of reposts (Figure 4(b), bottom). Notably, for these highly popular posts, an interesting shift occurs in the behavior of very high influence users. While their share of both views and reposts decreases compared to all posts, their share of reposts (47.7%) now exceeds their share of views (40.6%). This contrasts with the pattern observed for all posts, where their share of views (58.0%) is higher than their share of reposts (53.3%). This reversal is particularly significant given that reposts can only occur after a user views the content. This finding not only aligns with the high CRP observed for this category in Figure 3(b) but also indirectly demonstrates that when influencers share content, it generally receives high engagement from their followers, thereby contributing to its popularity. In this way, the role of very high influence users shifts with regard to viral content: instead of merely exposing content to their large follower base, they effectively amplify it through their own sharing (by encouraging further reposting). This amplifying effect plays a crucial role in accelerating and expanding the propagation of popular content through secondary spread.
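The reversal discussed above has a simple arithmetic reading. If a category accounts for $v$ of $V$ total views and $r$ of $R$ total reposts in secondary spread, its repost share exceeds its view share ($r/R > v/V$) exactly when $r/v > R/V$, that is, when views of that category's reposts convert into further reposts at a higher rate than the overall pool. The sketch below computes both shares; the function names are hypothetical and the input counts are toy numbers, not values from Figure 4.

```python
def shares_by_category(counts):
    """Each category's share of views and of reposts in secondary spread.

    counts: {category: (views, reposts)}; toy numbers, not data from the paper.
    """
    total_views = sum(v for v, _ in counts.values())
    total_reposts = sum(r for _, r in counts.values())
    return {c: (v / total_views, r / total_reposts) for c, (v, r) in counts.items()}


def repost_share_exceeds_view_share(counts, category):
    """Equivalent to: reposts per view for `category` exceed the pooled rate R/V."""
    view_share, repost_share = shares_by_category(counts)[category]
    return repost_share > view_share


# Toy counts: "very high" converts 48/400 = 0.12 of views vs. a pooled 100/1000 = 0.10.
TOY = {"very high": (400, 48), "others": (600, 52)}
```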
2.5 Summary and Interpretation of Influencers’ Impact on Information Diffusion
Our analysis combining CRP with the quantification of view and repost shares provides a comprehensive picture of how user influence impacts information diffusion in online social networks. These results support our hypothesis regarding prestige bias and reveal the extent to which a small group of highly influential users shape information flow. As hypothesized, our findings align with the concept of prestige bias in several key ways:
1. Users with very high influence consistently demonstrate a higher CRP for popular posts over time (Figure 3).
2. Very high influence users (1% of the user base) account for 53.3% and 47.7% of reposts in secondary spread across all posts and across posts with ≥5,000 reposts, respectively (Figure 4(b), top and bottom).
3. For highly popular posts (≥5,000 reposts), very high influence users’ share of reposts (47.7%) exceeds their share of views (40.6%) (Figure 4(b), bottom).
4. Very high influence users maintain a substantially higher CRP throughout the 24-hour observation period (Figure 3(b)).
Importantly, these effects are not due to more frequent reposting by highly influential users. Analysis in the Supplementary Material (section on reposting behavior) demonstrates that very high influence users do not repost more often than users in other influence categories. Instead, the impact comes from the wider reach of influential users’ reposts and the higher likelihood of those reposts being shared further (higher CRP). This confirms that prestige bias manifests through enhanced diffusion of content shared by influencers. As discussed above, for highly popular posts (≥5,000 reposts), we observe a notable shift in very high influence users’ impact: their share of reposts exceeds their share of views, reversing the pattern seen across all posts (Figure 4(b)). This reversal in the view-to-repost ratio for viral content suggests that influential users become even more effective at driving engagement as content gains popularity, indicating that prestige bias plays a particularly potent role in amplifying the spread of viral information. To further investigate this phenomenon, we conducted additional analyses of the relationship between user influence and virality, which can be found in the Supplementary Material (section on viral diffusion). In summary, our results provide strong evidence for the effect of prestige bias in online information sharing. Content reposted by influential users consistently reaches a broader audience and is diffused more widely, supporting our initial hypothesis. These findings highlight the significant role that a small group of highly influential users plays in shaping information diffusion patterns on social media platforms. This has important implications for understanding and potentially managing information diffusion in digital environments.
3 Discussion
In this study, we introduced new concepts (secondary spread and CRP) to test the hypothesis that influencers (users with high hg-index scores) propagate information more effectively than non-influencers when resharing others’ content; this analysis was conducted using one month of Japanese-language posts on Twitter (currently X).
Our analysis revealed that influencers consistently demonstrate a significantly higher CRP for popular posts (≥1,000 reposts) in secondary spread than non-influencers. The CRP indicating influencers’ impact remained high over time, showing a robust, sustained influence on information diffusion. Moreover, influencers were found to substantially impact the actual distribution of reposts in secondary spread. Over half of the views and reposts in secondary spread across all posts could be attributed to the top 1% of users, and for highly popular posts (≥5,000 reposts) their share of reposts exceeded their share of views. One might argue that the CRP value is low in absolute terms (e.g., 0.01). However, this is likely because we count all reposts appearing in the virtual timeline as views; if we could evaluate only the posts that users actually viewed, the probability might be somewhat higher. Nevertheless, as suggested by previous studies [18, 20], consecutive reposts remain a rare phenomenon. Even if the absolute value of the CRP is low, its relatively high value in the early stages can still be considered a crucial metric if it influences the eventual size of the cascade for that post.
Our findings provide strong evidence for the effect of prestige bias in online social networks. Users with higher influence, as measured by their hg-index, consistently demonstrate a greater ability to propagate information, even when that information is not their own original content. This effect persists throughout the important early stages of information diffusion. These results align with the theoretical framework of prestige bias proposed in anthropology and psychology [13, 14], suggesting that this cognitive tendency extends to digital environments. Influential users’ consistently higher CRP values indicate that their status enhances their ability to spread information, supporting our initial hypothesis. The novelty of this research lies in its integration of the roles of influencers as source spreaders (originators of content) and brokers (information intermediaries), which have been the focus of previous influencer studies. By introducing the concepts of secondary spread and CRP, we have revealed how influencers effectively fulfill both roles, providing a new perspective that bridges existing research areas. Furthermore, this study empirically demonstrates the effect of prestige bias in online communities. We have shown that information reposted by influencers tends to spread more widely and persistently.
Interestingly, influencers did not demonstrate much influence in relation to unpopular posts. This finding suggests that prestige bias may depend on both influencer and content characteristics. Traditional theories assumed that prestige bias uniformly affected all kinds of information; our results suggest this may not be the case in online communities. In this regard, the research by Acerbi and Tehrani [24] provides important insights. Their experiments showed that, when participants selected quotations, the content mattered more than its attribution to famous individuals. This result suggests that in online environments, the quality of content may be more pertinent to evaluating information than the status of the sender. However, our results also suggest that the prestige of the person sharing information may become important when content has strong appeal; that is, prestige bias may play a stronger role when content quality is equivalent. Moreover, while previous research such as that by Brand et al. [25] has shown that source expertise influences the perceived reliability of information on social media, our findings extend this understanding by showing that the status of the user who reshares content, and not only that of its source, is associated with further sharing.
As detailed in the Supplementary Material (section on viral diffusion), our analysis of structural virality [26] in widely diffused posts reveals that influencers actively shape information diffusion patterns. Posts reposted by influencers not only reach a wider audience but also exhibit higher structural virality, indicating more complex and extensive diffusion. This, coupled with the high CRP observed among influencers for potentially viral content, suggests a dual role for influencers: they act as both broadcasters, directly reaching large audiences, and amplifiers, significantly increasing the likelihood of further sharing by their followers. By enhancing the shareability of their posts (and reposts), influencers create a ripple effect extending beyond their direct connections. Identifying this amplifying role in viral diffusion represents a significant advance in our understanding of social media information spread, moving beyond simple wide-reach effects to a more nuanced view of influencers’ impact on content propagation.
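For readers unfamiliar with the measure, structural virality is commonly defined as the average shortest-path distance between all pairs of nodes in a cascade tree; we assume this is the definition used with [26]. The short sketch below treats the cascade as an undirected tree given by hypothetical `(parent, child)` edges and runs a breadth-first search from every node, which is quadratic in cascade size and meant only as an illustration.

```python
from collections import defaultdict, deque


def structural_virality(edges):
    """Mean shortest-path distance over all ordered node pairs of a cascade tree.

    edges: list of (parent, child) pairs forming a tree with at least two nodes.
    """
    adj = defaultdict(list)
    for a, b in edges:  # distances ignore direction
        adj[a].append(b)
        adj[b].append(a)
    nodes = list(adj)
    n = len(nodes)
    if n < 2:
        raise ValueError("structural virality needs at least two nodes")
    total = 0
    for src in nodes:  # BFS from every node gives all pairwise distances
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total / (n * (n - 1))
```

For example, a pure broadcast with one root and three direct reposters scores 1.5, while a four-node chain of reposts scores 5/3. A broadcast star stays below 2 however many direct reposters it has, whereas a chain's score grows with its length, which is why the measure separates broadcast-like from viral-like diffusion.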
This research provides a bridge linking cultural evolution to a series of studies on online information diffusion from a computational social science perspective. Cultural evolution research has presented many insights into the adaptive aspects of human cognition through psychological experiments and mathematical simulations. Meanwhile, information diffusion research has presented quantitative analyses of the mechanisms of misinformation spread and echo chamber phenomena through the analysis of large-scale data from social network services. By integrating these two lines of research, our study confirmed that theories and hypotheses about adaptive cognition proposed in cultural evolution research can be observed even in social networks, a contemporary digital environment for human relationships. Furthermore, we demonstrated that prestige bias manifests in online communities, where information can be transmitted to others through easy sharing functionalities.
Our results emphasize the importance of influencers in marketing on social media [17, 2, 18, 5]. The finding that influencers’ impact on secondary spread varies depending on the content (i.e., it is stronger for certain types of content) can aid decision-making around influencer marketing strategies. These results also implicitly suggest that influencers play a role in the spread of misinformation, which has become a significant social issue. In recent years, research has attempted to propose solutions to the issue of misinformation from the perspectives of education and technology [27] and behavioral science [28]. Notably, it has been suggested that influencers (especially experts) can be effective in correcting misinformation [7]. The present study complements this finding, which addresses the perspective of the sender (the influencer), from the perspective of the receiver: our results suggest that information resharing (secondary spread) by influencers can also play an important role in the correction of misinformation. However, the study by Lim et al. [7] dealt with a limited case of correcting specific medical terms, and caution is needed in generalizing it to the correction of broader misinformation or to information diffusion in general. Nevertheless, our results suggest that it is important to effectively utilize the power of influencers when addressing misinformation, and they provide useful hints for narrowing down the targets of such interventions. This approach may enhance the efficiency and effectiveness of misinformation countermeasures.
In conclusion, this study empirically demonstrated the existence and function of prestige bias in online communities and revealed that influencers have a significant impact on information diffusion through secondary spread. These findings make important contributions to understanding the dynamics of information diffusion in social media and the role of influencers. The concepts of primary and secondary spread proposed in this study, as well as the framework using virtual timelines and repost cascades, provide new perspectives for research on information diffusion in online communities. This research deepens our understanding of the influence of social media and provides insights into the complex interactions between user status, content sharing, and information propagation in online spaces. These findings can inform various applications, such as the development of more effective information diffusion strategies and measures against the spread of misinformation.
4 Methods
4.1 Data Collection and Preprocessing
We sampled Japanese-language reposts from Twitter (currently X) using the streaming API from October 1 to 31, 2021. We chose Japanese posts as the object of analysis for the following reasons:
1. The existence of a large Japanese-speaking community on Twitter
2. A close correspondence between the Japanese language and national borders, enabling us to control for variations in cultural and social background
3. A more limited and homogeneous sample compared to English-language posts
Our one-month data collection period aligns with the timeframes used in previous research [18, 9]. We chose this timeframe for two reasons. First, our large-scale sampling method enabled us to gather a volume of data comparable to that of studies that used several years’ worth of data. Second, because the primary focus of this research is the relationship between influencers and instantaneous virality, a one-month period provides a sufficiently comprehensive snapshot to capture these dynamics.
To reconstruct follower-followee relationships as of October 2021, we sampled users every other day during September 2021, retrieved their following and follower lists, and synthesized these snapshots. We were unable to capture private accounts, whose follower/following information is not publicly accessible. Our network reconstruction is thus limited to publicly visible connections and only approximates the actual user links during the period; there may be some discrepancies with the actual follower-followee relationships.
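The text does not say exactly how the September snapshots were synthesized. As a minimal sketch, assuming purely for illustration that synthesis means taking the union of every followee edge observed on any sampling day, the step could look like this in Python; `merge_snapshots` and the dictionary layout are hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical layout: each snapshot maps a user ID to the set of users it follows
Snapshot = Dict[str, Set[str]]


def merge_snapshots(snapshots: Iterable[Snapshot]) -> Dict[str, Set[str]]:
    """Union of followee edges across snapshots (an assumed synthesis rule)."""
    merged: Dict[str, Set[str]] = {}
    for snap in snapshots:
        for user, followees in snap.items():
            # Any edge seen on any sampling day is kept in the merged graph
            merged.setdefault(user, set()).update(followees)
    return merged


# Private accounts never appear in any snapshot, so they are absent here too
day1 = {"A": {"B"}}
day3 = {"A": {"C"}, "D": {"A"}}
print(merge_snapshots([day1, day3]))
```

A union keeps links that were removed during the month, whereas using only the latest snapshot would miss links that existed earlier; the source does not state which trade-off was made.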
Notably, our analysis focused solely on simple reposts on Twitter and excluded quote posts; this limitation should be kept in mind when interpreting the results.
4.2 Quantifying User Influence
To quantify user influence, we adopted the hg-index [22], which combines the h-index [23] and g-index [29]. This metric was originally developed to evaluate scientific productivity, where citations are used as a measure of impact. In our application, we treat reposts of a user’s posts as analogous to citations, allowing us to quantify a user’s influence in social media contexts. This approach balances consistent diffusion power (captured by the h-index) and the scale of reposts (captured by the g-index, similar to degree centrality).
While network centrality measures such as degree centrality and PageRank are commonly used to analyze the influence of source spreaders in social networks [1], they may disproportionately emphasize users with a single viral post. Lü et al. [30] demonstrated through a Susceptible-Infected-Recovered (SIR) model that the h-index, despite being strongly correlated with degree centrality and coreness, more appropriately evaluates node influence. However, the h-index alone does not sufficiently account for the total number of reposts, which is important in social networks. Therefore, we adopted the hg-index as a more robust measure that considers both sustained influence (through the h-index) and engagement of individual posts (through the g-index).
The hg-index was calculated using the following algorithm:
1. Sort all posts by a user in descending order of repost count
2. Calculate the h-index: the largest $h$ such that the $h$-th post has at least $h$ reposts
3. Calculate the g-index: the largest $g$ such that the top $g$ posts have at least $g^2$ reposts in total
4. hg-index $= \sqrt{h \cdot g}$
Importantly, the hg-index mitigates the overestimation of influence based solely on total or maximum repost count, which can be skewed by a single viral post. Instead, it emphasizes users who achieve consistent engagement over time.
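The four steps can be sketched in Python as follows. This is a minimal illustration assuming each user’s input is a plain list of per-post repost counts; the function names are hypothetical, and $g$ is capped at the number of posts a user has (some definitions of the g-index instead pad with zero-count items, which changes the result for the second user below).

```python
import math
from typing import Sequence


def h_index(reposts: Sequence[int]) -> int:
    """Largest h such that the h-th most-reposted post has at least h reposts."""
    ranked = sorted(reposts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count < rank:
            break
        h = rank
    return h


def g_index(reposts: Sequence[int]) -> int:
    """Largest g such that the top g posts have at least g**2 reposts in total.

    g is capped at the number of posts (no zero-count padding is assumed).
    """
    ranked = sorted(reposts, reverse=True)
    g, total = 0, 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


def hg_index(reposts: Sequence[int]) -> float:
    """Geometric mean of the h- and g-indices."""
    return math.sqrt(h_index(reposts) * g_index(reposts))


steady = [10, 8, 5, 4, 3]  # a consistently reposted user
one_hit = [1000, 0, 0]     # a user with a single viral post
print(h_index(steady), g_index(steady), round(hg_index(steady), 3))     # 4 5 4.472
print(h_index(one_hit), g_index(one_hit), round(hg_index(one_hit), 3))  # 1 3 1.732
```

Under this cap, the one-hit user has by far the larger total repost count but the lower hg-index, matching the behavior described above. With zero-count padding, that user’s g would rise to 31 (since $31^2 = 961 \le 1000 < 32^2$), so the convention matters for users with few posts.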
Based on this metric, we classified users into six influence categories: very high (top 1%), high (top 1-5%), upper-mid (top 5-10%), mid (top 10-30%), lower-mid (top 30-50%), and low (bottom 50%). Each category excludes users already assigned to a higher category; for example, the high category includes users within the top 5% but outside the top 1%.
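The percentile binning can be sketched as below, assuming users are ranked by hg-index in descending order and each user receives the first category whose cutoff covers their rank. How ties at a threshold were handled is not stated; here ties are broken by input order (Python’s sort is stable), which is an assumption. `classify_users` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Upper cutoffs in percent of users ranked by hg-index, as listed in the text
CATEGORIES: List[Tuple[int, str]] = [
    (1, "very high"), (5, "high"), (10, "upper-mid"),
    (30, "mid"), (50, "lower-mid"), (100, "low"),
]


def classify_users(hg: Dict[str, float]) -> Dict[str, str]:
    """Assign each user to the first category whose cutoff covers their rank."""
    ranked = sorted(hg, key=lambda user: hg[user], reverse=True)
    n = len(ranked)
    labels: Dict[str, str] = {}
    for position, user in enumerate(ranked, start=1):
        for cutoff_pct, label in CATEGORIES:
            # Integer arithmetic avoids floating-point edge cases at the cutoffs
            if position * 100 <= cutoff_pct * n:
                labels[user] = label
                break
    return labels


# 100 synthetic users with distinct scores 1..100
scores = {f"user{k}": float(k) for k in range(1, 101)}
labels = classify_users(scores)
print(labels["user100"], labels["user96"], labels["user50"])  # very high high low
```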
4.3 Construction and Analysis of Repost Cascades
Repost cascades (information cascades) model how information spreads across social networks. This concept, proposed by Bikhchandani et al. [19] and further developed by Watts [20] and Kempe et al. [2], was applied in our study to capture repost chains on Twitter, following methodologies similar to those of Goel et al. [18] and Vosoughi et al. [21].
We constructed repost cascades using the following rules:
1. Set the original post as the root of the cascade
2. Select the temporally closest potential parent as the actual parent
3. Exclude official accounts (identified by specific keywords in their account name, screen name, or profile)
If a repost has multiple potential parents, we generally take the candidate with the most recent timestamp as the parent. This choice is based on our inference of how Twitter’s algorithm functioned at the time of the study. Using this method, we constructed 4,882,985 repost cascades.
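A minimal sketch of these rules follows. The source does not define "potential parent" precisely; this sketch assumes, in the spirit of Goel et al. [18] and Vosoughi et al. [21], that a reposter’s potential parents are the root author and any followee who shared the post earlier, and that a repost with no such candidate is attached to the root. `build_cascade` and its argument layout are hypothetical.

```python
from typing import AbstractSet, Dict, Iterable, Set, Tuple


def build_cascade(
    root_user: str,
    root_time: float,
    reposts: Iterable[Tuple[str, float]],
    followees: Dict[str, Set[str]],
    excluded: AbstractSet[str] = frozenset(),
) -> Dict[str, str]:
    """Return a child -> parent map for the cascade of one original post.

    reposts: (user, timestamp) pairs for every repost of the post.
    excluded: official accounts to drop (rule 3).
    """
    shared_at: Dict[str, float] = {root_user: root_time}
    parent: Dict[str, str] = {}
    for user, t in sorted(reposts, key=lambda r: r[1]):
        if user in excluded or user in shared_at:
            continue  # skip official accounts and repeat shares by the same user
        candidates = [
            f for f in followees.get(user, set())
            if f in shared_at and shared_at[f] <= t
        ]
        # Rule 2: temporally closest earlier sharer; fall back to the root (assumption)
        parent[user] = max(candidates, key=shared_at.__getitem__) if candidates else root_user
        shared_at[user] = t
    return parent


follows = {"B": {"A"}, "C": {"A", "B"}, "E": {"B", "C"}}
cascade = build_cascade("A", 0.0, [("B", 1.0), ("C", 2.0), ("D", 3.0), ("E", 4.0)], follows)
print(cascade)  # {'B': 'A', 'C': 'B', 'D': 'A', 'E': 'C'}
```

User C follows both A and B, and B shared more recently, so C is attributed to B; D follows nobody who shared and falls back to the root.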
By analyzing these cascades, we can understand how information propagates between users; in particular, the cascade identifies the source of each user’s repost. We can also calculate the depth and structural virality [26] of the partial cascades formed by propagation from a specific reposted user.
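Assuming a cascade is stored as a hypothetical child-to-parent dictionary, the partial cascade under a given reposted user, its depth, and its structural virality can be computed as sketched below. Structural virality follows Goel et al. [26] as the mean shortest-path distance over all pairs of nodes in the cascade tree; returning 0 for a single-node cascade is our own convention, and all function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List


def children_of(parent: Dict[str, str]) -> Dict[str, List[str]]:
    kids: Dict[str, List[str]] = defaultdict(list)
    for child, par in parent.items():
        kids[par].append(child)
    return kids


def subtree(parent: Dict[str, str], node: str) -> Dict[str, str]:
    """Child -> parent map of the partial cascade propagated from `node`."""
    kids = children_of(parent)
    sub: Dict[str, str] = {}
    stack = [node]
    while stack:
        current = stack.pop()
        for child in kids[current]:
            sub[child] = current
            stack.append(child)
    return sub


def depth(parent: Dict[str, str], root: str) -> int:
    """Length of the longest root-to-leaf path."""
    kids = children_of(parent)
    best, queue = 0, deque([(root, 0)])
    while queue:
        current, d = queue.popleft()
        best = max(best, d)
        queue.extend((child, d + 1) for child in kids[current])
    return best


def structural_virality(parent: Dict[str, str], root: str) -> float:
    """Mean shortest-path distance over all ordered pairs of distinct nodes."""
    adj: Dict[str, List[str]] = defaultdict(list)
    for child, par in parent.items():
        adj[child].append(par)
        adj[par].append(child)
    nodes = set(adj) | {root}
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0
    for source in nodes:  # BFS from every node: O(n^2), fine for a sketch
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))


cascade = {"B": "A", "C": "B", "D": "A", "E": "C"}
partial = subtree(cascade, "B")                     # {'C': 'B', 'E': 'C'}
print(depth(cascade, "A"), depth(partial, "B"))     # 3 2
print(round(structural_virality(partial, "B"), 3))  # 1.333
```

A star-shaped cascade (one broadcaster, many direct reposters) keeps structural virality near 2, while long person-to-person chains push it higher, which is why it complements depth.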
4.4 Virtual Timeline
We constructed a virtual timeline from sampled data and follower-followee relationships, as illustrated in Figure 2(a). This approach is inspired by the method used by Vosoughi et al. [21]. This virtual timeline simulates a single user’s timeline, providing a realistic representation of how users encounter and interact with content when scrolling through social media feeds. Notably, in our study this virtual timeline focuses exclusively on reposts.
We arranged reposts by each user’s followees in chronological order, applying the following conditions:
1. Include only users who made at least one repost during the period, in order to remove dormant accounts
2. Exclude official accounts (identified by specific keywords in their account name, screen name, or profile) from secondary spread cascades, as official users have different motivations for reposting. However, source posts from these accounts are not excluded from the analysis.
3. Remove reposts of a post by a user’s followees that occur after the user has already reposted that post
This method enabled us to analyze what information users have been exposed to and how secondary spread occurs.
The virtual timeline approach allows us to simulate users’ exposure to content in a way that closely mimics real-world social media interactions. By focusing on reposts, we can track information diffusion through the network more effectively.
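The three conditions can be sketched as follows. The tuple layout, the function name, and the reading that condition 2 drops reposts made by official accounts (while posts they authored can still circulate through other users’ reposts) are assumptions for illustration. The nested loop is quadratic; a real pipeline would index reposts by reposter.

```python
from typing import AbstractSet, Dict, Iterable, List, Set, Tuple

Repost = Tuple[str, str, float]  # (reposter, post_id, timestamp); hypothetical layout


def build_virtual_timelines(
    reposts: Iterable[Repost],
    followees: Dict[str, Set[str]],
    official: AbstractSet[str] = frozenset(),
) -> Dict[str, List[Tuple[float, str, str]]]:
    """Chronological (timestamp, reposter, post_id) feed of followees' reposts per user."""
    # Condition 2 (assumed scope): drop reposts made by official accounts
    kept = [r for r in reposts if r[0] not in official]
    # Earliest time each user reposted each post
    first: Dict[Tuple[str, str], float] = {}
    for user, post_id, t in kept:
        key = (user, post_id)
        if key not in first or t < first[key]:
            first[key] = t
    # Condition 1: only users with at least one repost get a timeline
    active = {user for user, _, _ in kept}
    timelines: Dict[str, List[Tuple[float, str, str]]] = {}
    for viewer in active:
        follows = followees.get(viewer, set())
        feed = []
        for reposter, post_id, t in kept:
            if reposter not in follows:
                continue
            own = first.get((viewer, post_id))
            # Condition 3: ignore followees' reposts made after the viewer's own repost
            if own is not None and t > own:
                continue
            feed.append((t, reposter, post_id))
        timelines[viewer] = sorted(feed)
    return timelines


follows = {"A": {"B", "C", "OFF"}}
log = [("B", "p1", 1.0), ("A", "p1", 2.0), ("C", "p1", 3.0),
       ("B", "p2", 4.0), ("OFF", "p3", 5.0)]
timelines = build_virtual_timelines(log, follows, official={"OFF"})
print(timelines["A"])  # [(1.0, 'B', 'p1'), (4.0, 'B', 'p2')]
```

C’s repost of p1 is dropped from A’s feed because A had already reposted p1, and the official account’s repost never appears.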
4.5 Cascading Repost Probability
To quantify the efficiency of secondary spread, we developed the concept of the cascading repost probability (CRP). Using the virtual timeline approach, we consider a post as reposted if it appears in a repost cascade, allowing us to calculate the probability that a user’s secondary spread continues, as shown in Figure 2(b).
The CRP quantifies the likelihood of a repost being further shared, enabling us to evaluate the efficiency of repost diffusion by users with varying levels of influence. It thus provides a measure of a user’s influence in the context of secondary spread. For a reposting user $u$, it is defined as:

$$\mathrm{CRP}(u) = \frac{R(u)}{V(u)} \qquad (1)$$

where $V(u)$ is the number of times reposts by $u$ are viewed on followers’ virtual timelines, and $R(u)$ is the number of those views that led to a further repost.
As illustrated in Figure 2(c), if a single reposted user’s reposted content is viewed by three followers and further shared by two, the CRP is $2/3 \approx 0.67$, indicating that approximately 67% of repost views led to further sharing.
To calculate the CRP for our analysis, we aggregated view counts and repost counts across all users’ timelines. This approach allowed us to assess the extent to which users promote information spread, taking into account both the reach of their posts and the likelihood of those posts being further shared.
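The aggregation can be sketched as a pooled ratio (total further reposts divided by total views) per influence category, which is one reading of "aggregated view counts and repost numbers"; a mean of per-user ratios would give different numbers. The sketch also assumes that a view counts as a further share only if the viewer’s repost of that post was attributed to that reposter in the cascade. All names and data layouts are hypothetical.

```python
from typing import Dict, List, Tuple

Timeline = List[Tuple[float, str, str]]  # (timestamp, reposter, post_id)


def crp_by_category(
    timelines: Dict[str, Timeline],
    parents: Dict[str, Dict[str, str]],
    category: Dict[str, str],
) -> Dict[str, float]:
    """Pooled CRP per influence category: further reposts / views."""
    views: Dict[str, int] = {}
    reshares: Dict[str, int] = {}
    for viewer, feed in timelines.items():
        for _, reposter, post_id in feed:
            cat = category[reposter]
            views[cat] = views.get(cat, 0) + 1
            # Count a further share only if the cascade attributes the
            # viewer's repost of this post to this reposter (assumption)
            if parents.get(post_id, {}).get(viewer) == reposter:
                reshares[cat] = reshares.get(cat, 0) + 1
    return {cat: reshares.get(cat, 0) / n for cat, n in views.items()}


# The Figure 2(c) case: three followers see U's repost, two of them reshare it
feeds = {f: [(1.0, "U", "p")] for f in ("F1", "F2", "F3")}
cascades = {"p": {"F1": "U", "F2": "U"}}
print(crp_by_category(feeds, cascades, {"U": "mid"}))  # {'mid': 0.6666666666666666}
```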
4.6 Ethical Considerations
As this study used only publicly available data and did not involve human subjects, it was exempt from ethics review according to the guidelines of the authors’ affiliated institutions.
5 Acknowledgement
This work was supported by JSPS KAKENHI Grant Numbers JP22K18150, JP23K28376. Initial English language improvements were assisted by AI chat services (Anthropic’s Claude 3.5 Sonnet and OpenAI’s ChatGPT). The final English language editing was provided by Editage (www.editage.jp), for which we are grateful.
References
- Pei et al. [2014] Sen Pei, Lev Muchnik, José S. Andrade Jr., Zhiming Zheng, and Hernán A. Makse. Searching for superspreaders of information in real-world social media. Scientific Reports, 4(1):5547, 2014.
- Kempe et al. [2003] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 137–146, 2003.
- Costello and Yesiloglu [2020] Joyce Costello and Sevil Yesiloglu, editors. Influencer Marketing: Building Brand Communities and Engagement. Routledge, 2020.
- Zietek [2016] Nathalie Zietek. Influencer marketing: the characteristics and components of fashion influencer marketing. Master’s thesis, University of Borås, Faculty of Textiles, Engineering and Business, 2016.
- Watts and Peretti [2007] Duncan J. Watts and Jonah Peretti. Viral marketing for the real world. Harvard Business Review, 85(5), 2007.
- Murakami [2021] Yuri Murakami. Artist, retweeters ordered to pay journalist over ‘fake rape’ posts, 2021. URL https://www.asahi.com/ajw/articles/14492778. Accessed on September 26, 2024.
- Lim et al. [2022] Dongwoo Lim, Fujio Toriumi, and Mitsuo Yoshida. Do you trust experts on Twitter? Successful correction of COVID-19-related misinformation. In IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, pages 518–523, 2022.
- Morone and Makse [2015] Flaviano Morone and Hernán A. Makse. Influence maximization in complex networks through optimal percolation. Nature, 524(7563):65–68, 2015.
- Tsugawa and Watabe [2023] Sho Tsugawa and Kohei Watabe. Identifying Influential Brokers on Social Media from Social Network Structure. Proceedings of the International AAAI Conference on Web and Social Media, 17(1):842–853, 2023.
- Acerbi [2022] Alberto Acerbi. From Storytelling to Facebook. Human Nature, 33(2):132–144, 2022.
- Burt [2000] Ronald S. Burt. The Network Structure Of Social Capital. Research in Organizational Behavior, 22:345–423, 2000.
- Araujo et al. [2017] Theo Araujo, Peter Neijens, and Rens Vliegenthart. Getting the word out on Twitter: the role of influentials, information brokers and strong ties in building word-of-mouth for brands. International Journal of Advertising, 36(3):496–513, 2017.
- Henrich and Gil-White [2001] Joseph Henrich and Francisco J. Gil-White. The evolution of prestige: freely conferred deference as a mechanism for enhancing the benefits of cultural transmission. Evolution and Human Behavior, 22(3):165–196, 2001.
- Jiménez and Mesoudi [2019] Ángel V. Jiménez and Alex Mesoudi. Prestige-biased social learning: current evidence and outstanding questions. Palgrave Communications, 5(1):20, 2019.
- Henrich and McElreath [2003] Joseph Henrich and Richard McElreath. The evolution of cultural evolution. Evolutionary Anthropology: Issues, News, and Reviews, 12(3):123–135, 2003.
- Katz et al. [1955] Elihu Katz, Paul F. Lazarsfeld, and Elmo Roper. Personal Influence: The Part Played by People in the Flow of Mass Communications. The Free Press, 1955.
- Rogers [2003] Everett M. Rogers. Diffusion of Innovations, 5th Edition. Free Press, 2003.
- Goel et al. [2012] Sharad Goel, Duncan J. Watts, and Daniel G. Goldstein. The structure of online diffusion networks. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 623–638, 2012.
- Bikhchandani et al. [1992] Sushil Bikhchandani, David Hirshleifer, and Ivo Welch. A theory of fads, fashion, custom, and cultural change as informational cascades. Journal of Political Economy, 100(5):992–1026, 1992.
- Watts [2002] Duncan J. Watts. A simple model of global cascades on random networks. Proceedings of the National Academy of Sciences, 99(9):5766–5771, 2002.
- Vosoughi et al. [2018] Soroush Vosoughi, Deb Roy, and Sinan Aral. The spread of true and false news online. Science, 359(6380):1146–1151, 2018.
- Alonso et al. [2010] Sergio Alonso, Francisco Javier Cabrerizo, Enrique Herrera-Viedma, and Francisco Herrera. hg-index: a new index to characterize the scientific output of researchers based on the h- and g-indices. Scientometrics, 82(2):391–400, 2010.
- Hirsch [2005] Jorge E. Hirsch. An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences, 102(46):16569–16572, 2005.
- Acerbi and Tehrani [2018] Alberto Acerbi and Jamshid J. Tehrani. Did Einstein Really Say that? Testing Content Versus Context in the Cultural Selection of Quotations. Journal of Cognition and Culture, 18(3-4):293–311, 2018.
- Brand et al. [2021] Charlotte O. Brand, Alex Mesoudi, and Thomas J. H. Morgan. Trusting the experts: The domain-specificity of prestige-biased social learning. PLOS ONE, 16(8):1–15, 2021.
- Goel et al. [2016] Sharad Goel, Ashton Anderson, Jake Hofman, and Duncan J. Watts. The structural virality of online diffusion. Management Science, 62(1):180–196, 2016.
- Toriumi and Yamamoto [2024] Fujio Toriumi and Tatsuhiko Yamamoto. Informational Health – Toward the Reduction of Risks in the Information Space, 2024. URL https://arxiv.org/abs/2407.14634.
- Lorenz-Spreen et al. [2020] Philipp Lorenz-Spreen, Stephan Lewandowsky, Cass R. Sunstein, and Ralph Hertwig. How behavioural sciences can promote truth, autonomy and democratic discourse online. Nature Human Behaviour, 4(11):1102–1109, 2020.
- Egghe [2006] Leo Egghe. Theory and practise of the g-index. Scientometrics, 69(1):131–152, 2006.
- Lü et al. [2016] Linyuan Lü, Tao Zhou, Qian-Ming Zhang, and H. Eugene Stanley. The H-index of a network node and its relation to degree and coreness. Nature Communications, 7(1):10168, 2016.