1 Introduction

Social robots have been shown to effectively elicit socially meaningful behaviours and emotions from humans across a number of experimental and real-world contexts [1,2,3]. Nevertheless, one of the challenges in human–robot interaction (HRI) research is implementing and evaluating long-term interactions, especially in people's natural settings. Since interactions with social robots are novel and exciting for many people, one particular concern in this area of HRI is the extent to which behavioural and emotional expressions might develop from initial interactions with a robot, when its novelty is particularly salient, to responses, behaviours, and perceptions that are sustained over time [4, 5]. Empirical studies in this area of HRI research are often limited to controlled laboratory settings, due to various logistical factors (e.g., the limited number of robots per lab and robots' high cost) and technical factors (e.g., the multiple computers or other controlling devices required to coordinate a robot's behaviours and/or requirements for skilled Wizard-of-Oz (WoZ) operation). These challenges can make it difficult for HRI researchers to gain insights into the factors that shape people's long-term interactions with social robots in natural, real-world settings. This is especially noticeable in studies focused on evaluating the use and utility of social robots in social settings. Robots for these settings are often designed to interact and communicate with humans or other agents (such as pets or other robots) by following social scripts and rules relevant to their role and function within a given social setting [1, 6]. Our understanding of social robots' potential scope and limitations will be substantially informed by experiments in which people can interact with robots over longer periods of time in natural social settings, such as one's home, workplace, local clinic, or school.

Overcoming this challenge will be particularly important when devising interactions to support people's well-being. Social robots are widely studied and are gradually being introduced into care settings, aimed at supporting people's physical and mental well-being [1]. However, due to the complexity of administering social robotic interventions, studies in the field rarely establish ecologically valid interactions with human users, and instead often rely on methodologically limited designs (e.g., single-subject studies, quasi-experimental designs, or cross-sectional research designs), or explore single interactions rather than ongoing longitudinal interventions [see 7].

Considering social robots' social features [8, 9], animated design [2, 10], and autonomous abilities [1], social robots situated in people's homes hold potential for helping to monitor physical health and improve emotional well-being by engaging in conversation that fosters self-disclosure. Self-disclosure is a communication behaviour aimed at introducing and revealing oneself to others, and it plays a key role in building relationships between two individuals [12, 13]. It serves an evolutionary function of strengthening interpersonal relationships, while also fostering a wide variety of health benefits, including helping people to cope with stress and traumatic events through eliciting help and support [14,15,16]. Moreover, self-disclosure appears to play a critical role in successful treatment outcomes [17] and has a positive impact on mental and physical health [18]. For health interventions to succeed, they depend on open channels of communication through which individuals can disclose needs and emotions, from which a listener can identify stressors and respond accordingly [19, 20]. This is crucial for interventions with social robots, as machines must analyse and synthesize human behaviour and emotions from human output in order to respond and react appropriately [21].

Given the necessity of studying social robotic interventions over longer time courses than those typically seen in one-off laboratory studies, as well as the importance of self-disclosure for psychological health and HRI, here we aimed to evaluate people's self-disclosure during interactions with a social robot over time. More specifically, we explored how prolonged and intensive interactions with a social robot affect people's self-disclosure behaviour toward the robot, perceptions of the robot, and additional factors related to well-being. Therefore, we were asking—

RQ1: To what extent are people’s self-disclosures, perceptions of the robot, and well-being affected over time during long-term interactions with a social robot?

To build a more complete understanding of the application of social robots in different emotional settings, we were also interested in the role of the interaction’s discussion frame. Hence, we were also asking—

RQ2: To what extent are people's self-disclosures, perceptions of the robot, and well-being affected by the discussion frame during long-term interactions with a social robot?

To address these research questions, we conducted a mediated long-term online experiment in which participants conversed with a social robot 10 times over 5 weeks about general everyday topics. Participants were allocated to one of two groups: one group discussed topics framed in the context of the Covid-19 pandemic, while the other discussed the same topics with no explicit mention of the pandemic.

2 Related Work

The mere-exposure effect refers to the psychological phenomenon whereby people develop a preference for things they are repeatedly exposed to [22]. In the context of long-term HRI, the mere-exposure effect operates differently than in Human–Computer Interaction (HCI) research focused on usability. In HCI, users often acquire a positive attitude towards tools and objects through repeated use and familiarity, leading to improved usability [23]. However, in HRI, where social interaction with robots is a key component [1], the dynamics change. Unlike HCI, where the focus is primarily on optimizing usability and functionality [23, 24], HRI incorporates a social component, aiming to create a sense of companionship or collaboration [e.g., 25]. Robots are designed to engage with humans in a more social manner, simulating human-like behaviours, gestures, and communication. This social element introduces a unique dynamic, in which humans naturally seek to establish social connections and rapport, and even attribute human-like qualities to robots [1, 9]. The social dynamics of HRI therefore require a deeper understanding of human–robot communication beyond usability, focusing instead on the development of a social bond between humans and robots.

Longitudinal designs are important for understanding people's long-term adoption of social robots and, moreover, for understanding how human behaviour towards and perception of social robots change over time [26]. While single-interaction studies provide interesting insights into human behaviour when engaging with robots, it is often difficult to generalize from them, as "in the wild" applications of robots aim to develop machines that people interact with over sustained periods of time [1, 27]. One of the most significant (and common) limitations of one-off HRI studies relates to novelty effects [see 4, 5], while long-term studies have often found evidence of reduced engagement with various robotic platforms over time [26, 28]. As social robots are an emerging technology that is still novel and exciting for most, users often have high expectations of social robots and experience dissonance when a robot's performance fails to meet those expectations. Accordingly, when users interact with robots over time, they tend to perceive them as less social as interactions go on, because their expectations of the robot are not fulfilled [5]. Previous studies show that even household robotic devices that are not particularly social (like the Roomba vacuum cleaner) suffer from the novelty effect [29], with users being excited about the device at first and using it less as they become familiar with it.

Due to the constrained and highly choreographed nature of many HRI studies, deep insights into people's responses and interactions with robots in natural settings remain relatively rare. Of the field studies that have conducted HRI research in these spaces, important insights are emerging from both single-interaction [e.g., 30, 31] and repeated-interaction [e.g., 32, 33, 34, 35, 36, 37, 38] studies, with much of this work taking place in public spaces or tied to specific settings like education [e.g., 36, 39, 40, 41, 42, 43], care [e.g., 44, 45, 37, 46, 38, 47], or rehabilitation [e.g., 32, 48, 35, 49, 50, 51]. Longitudinal studies that address similar questions with disembodied agents such as virtual assistants and chatbots [e.g., 52, 53, 54] benefit from access to users' personal devices, whereas research with physically embodied artificial agents (i.e., social robots) remains far rarer due to the logistical and cost barriers to situating these devices in users' domestic settings (i.e., in their home environment) to explore single or repeated interactions. While several attempts have been made to reduce the barriers to the feasibility of such work [cf. 55, 56, 33, 34, 38], we still know relatively little about user perceptions of, and behaviours toward, social robots when these interactions take place in familiar home environments. Further insights into the challenges and opportunities afforded by placing social robots into familiar domestic settings should aid human–robot communication in general, as well as further refine the development and utility of these machines for commercial use.

2.1 Social Robots for Well-Being

Social robots hold great potential for delivering or improving psycho-social interventions [7], supporting mental health [57], monitoring symptoms of chronic psychopathologies [58], aiding rehabilitation [35], and providing much-needed physical and social support across a number of daily life settings [1]. For example, a previous study by Nomura and colleagues [59] showed the benefits of employing social robots for minimising social tensions and anxieties, reporting that participants with higher social anxiety tended to feel less anxious and to show lower tension when they knew they would interact with robots rather than with humans. In fact, a recent paper [60] stresses the benefits of employing social robots as interventions for social anxiety, stating that these could complement the support provided by clinicians. The authors explain that social robots could help people get into therapy and maximize the effectiveness of therapy by increasing patients' engagement and continuing the support outside the therapy session. In a previous paper [58], we addressed similar benefits of using social robots for diagnosing and treating people suffering from post-traumatic stress disorder (PTSD): social robots can assist with overcoming several logistical and social barriers that trauma survivors face when required to monitor symptoms and when seeking mental health interventions.

Beyond supporting people with clinically diagnosed psychopathologies like PTSD and anxiety, social robots could also provide emotional support via self-managed interventions to healthy individuals who experience difficult emotional situations and stressors in their daily lives. Previous studies have applied social robots in emotionally supportive settings, showing meaningful outcomes in terms of cognitive change and affect. A study by Bodala and colleagues [47] employed a social robot delivering teleoperated mindfulness coaching for five weeks. Another example by Axelsson and colleagues [61] tested a robotic coach conducting positive psychology exercises, showing positive mood change after participation in the robotic intervention. Robotic interventions for people's well-being rarely take place in people's homes. One successful example is a study employing the social robot Jibo as a positive psychology coach to improve students' psychological well-being in their on-campus housing. The results describe a positive effect on students' psychological well-being, including positive mood change, with students also expressing motivation to improve their psychological well-being [38]. Other studies show positive outcomes in terms of behavioural change. A series of studies by Robinson and colleagues used social robots to deliver behaviour change interventions, applying verbal motivational interventions for reducing high-calorie snack consumption. The studies showed promising results for behavioural change using objective measurements like weight loss [see 45], and also via qualitative data addressing the subjective experiences of the participants during such interventions [see 62]. A similar intervention has been tested with a clinical population, showing potential for using social robots for diabetes management [44].

The recent COVID-19 pandemic further illuminated the potential of social robots as an assistive technology in times when strict infection control measures mandate physical distancing between people. Several researchers have argued that physically embodied social robots should be able to assist with a number of tasks to help keep people physically and mentally healthy, ranging from temperature taking and food and supply delivery to providing companionship for individuals suffering from loneliness [1, 63,64,65,66], and even mediating social interactions with other individuals [67]. Nevertheless, as discussions concerning the potential applications of social robots became more prominent during the pandemic, HRI research was limited due to social distancing and the inability to use lab facilities for research that is highly dependent on laboratory-constrained environments. The pandemic forced most individuals (including researchers) to adopt computer-mediated communication (CMC) [68]. Following the wholesale shift to CMC during the pandemic, the current research sets forth a means for conducting rigorous and reproducible social robotics research to explore people's engagement with social robot-mediated interactions within their own homes. More generally, this research sets the stage for further work exploring online, mediated, speech-based psychosocial interventions with social robots when public health, cost, or logistical barriers prevent situating a physically embodied robot in users' homes across the long term.

2.2 Self-Disclosing to Robots and Artificial Agents

Several studies address self-disclosure to social robots in single sessions [e.g., 3, 69, 70, 71, 72, 73]; however, few studies to date have addressed self-disclosures to robots in long-term settings [e.g., 74]. Previous studies describe that, in single interactions, people's subjective perceptions of their self-disclosures to robots tend to align well with their actual disclosures. Moreover, people tend to share more information with humans than with humanoid social robots or other artificial agents [3]. Yet, a different study by Nomura and colleagues [59] found that speech interactions with a social robot elicited lower tension compared to interactions with a human agent. Another recent study reports that people might self-disclose more when conversing with a robot that changes its listening attitude [72]. Long-term studies with disembodied conversational agents give us some evidence of the nature of long-term self-disclosure to artificial agents. For example, a longitudinal study by Croes and Antheunis [52] tested long-term interactions with the chatbot Mitsuku via 7 interactions conducted over 3 weeks. Their results show that social processes decreased after each interaction with Mitsuku and that participants reported lower feelings of friendship with Mitsuku across sessions. They describe the presence of the novelty effect, with participants describing how Mitsuku became predictable after the first session. An additional study by Croes and Antheunis [53] showed that, in self-disclosure interactions, despite feeling more anonymous when interacting with chatbots, people trust humans more than they trust chatbots and report higher degrees of social presence with human partners.

2.3 Using Self-Disclosure for a Social Robotic Intervention to Support Mental Health

The literature describes that various forms of human–human self-disclosure can support and improve mood and provide a convenient space for concealment and regulating emotions, with many health benefits. For example, James Pennebaker's written disclosure paradigm [75, 76] helps people to process their emotions by writing about their own experiences. Previous studies have reported that people in a bad mood benefited more from disclosing to a robot than from writing disclosures in a journal [70] or on social media [77]. Another good example is affect labelling, a simple and implicit emotion regulation technique aimed at explicitly expressing emotions, or in other words, putting feelings into words [78]. In addition, the act of self-disclosure is highly useful for emotional introspection and self-reflection on one's emotions, actions, and behaviours [79], and is a meaningful act of mindfulness [80]. In the previous section, we mentioned several studies applying social robots in emotionally supportive settings [see 47, 61, 38], but social robotic interventions rarely encourage open self-disclosure [see 81]. Considering the vast evidence for the positive effect of self-disclosure on emotional well-being, our behavioural paradigm was aimed at encouraging participants to self-disclose to a social robot as a therapeutic activity. Engaging a robot in a reciprocal conversational interaction is a complex technical task that might negatively affect people's disclosures and perceptions of the robot and the interaction due to the robot's communication limitations. However, here we suggest that employing a social robot to encourage and listen to people's disclosures (a role that places fewer demands on the robot's communication skills) would have a positive effect on people's disclosures and perceptions. Furthermore, we expected that engaging people in self-disclosure to a social robot would involve them in affect labelling [78], which would in turn positively affect their well-being.

3 Methods

Consistent with recent proposals [82, 83], we pre-registered the study and report how we determined our sample size, all data exclusions, all manipulations, and all measures in the study [see 84]. In addition, following open science initiatives [e.g., 85], the de-identified data set, stimuli, and analysis code associated with this study are freely available online [86]. By making the data available, we enable and encourage others to pursue tests of alternative hypotheses, as well as more exploratory analyses.

3.1 Experimental Design

A 2 (Discussion Frame: Covid-19-related or general; between groups) by 10 (chat sessions across time; repeated measures) experimental design was followed. Participants were randomly assigned to one of the two discussion frame groups, according to which they conversed with the robot Pepper (SoftBank Robotics) via Zoom video chats about general everyday topics (e.g., social relationships, work–life balance, health and well-being; see Table 1) for 10 sessions. One group's conversation topics were framed within the context of the Covid-19 pandemic (e.g., social relationships during the pandemic, sustaining mental health during the pandemic, etc.), whereas the other group's conversation topics were similar, except that no explicit mention of the Covid-19 pandemic was ever made (see Sect. 3.4.3). Each interaction consisted of the robot asking the participant 3 questions (x3 repetitions). The topic of each interaction, as well as the order of the questions, was assigned randomly before the experimental procedure started. Participants were scheduled to interact with the robot twice a week at prearranged times for five weeks.

3.2 Participants

3.2.1 Sample

A priori power calculations using G*Power software [87, 88] suggested that for reasonable power (0.83) to detect small to medium effect sizes, a sample size of 22 participants would be required. Due to the relatively complex data collection procedure and the potential for a high dropout rate, we recruited 40 participants via the Prolific website. One participant dropped out, resulting in a final sample size of 39 participants. Participants were between the ages of 18 and 60 (M = 36.41, SD = 12.20); 54% identified as female and the rest as male. More than half of the sample (59%) reported having a Bachelor's degree as their highest level of education, and more than half (51.3%) were employed full-time. 55% of the sample were either married (33.3%) or in a relationship (21.7%), and 41% had at least one child. Most participants (97.4%) did not live on their own during their participation in the study, with an average of 3.36 individuals (SD = 1.37) per household (including the participant). Almost all participants (92%) had no previous experience with robots.

3.2.2 Target Population

The target population for this study was exclusively adults from the general population aged 18 or over, with normal or corrected-to-normal vision, no known mental disability, hearing loss or difficulties, or physical handicap, who were native English speakers currently residing in Great Britain. Due to the technical requirements of the mediated experimental design, the target population also consisted of individuals with access to a personal computer with Zoom installed, a functioning web camera, a stable internet connection, a microphone, and speakers/headphones.

3.2.3 Recruitment

Participants were recruited via Prolific and were allowed to participate only after confirming that they were older than 18 years, were native English speakers, and had access to a computer with Zoom installed as well as a decent web camera, stable internet connection, microphone, and speakers/headphones. Prolific users were also asked to commit to attending 2 sessions a week across 5 weeks. Eligible Prolific users could access the Prolific page of the study to receive further information, consider their participation, and complete the induction questionnaire if interested. On the Prolific page of the study (of the induction questionnaire—Session 0) and in the induction questionnaire Qualtrics form, Prolific users were introduced to the study, the task, and the available time slots as part of the longitudinal experiment schedule. After receiving this information about the study's requirements, Prolific users were asked whether they would like to continue in the study by declaring that they could commit to the study's requirements. Finally, Prolific users were asked to choose their participation time slots, after which they received a participant number to start their participation. Participants were paid £3 for every 30 min of participation, or per session if the session lasted less than 30 min. Participants who completed all 10 sessions were paid an extra £20 after their final interaction. A detailed description of the recruitment procedure and a full list of the specific Prolific filters used for participant recruitment can be found on the study's OSF page [see 86].

3.2.4 Ethics and Communication

All study procedures were approved by the research ethics committee of the University of Glasgow (ethics approval numbers 300200094 & 300200132). All participants provided written informed consent before participating in the study. Participants were asked to provide, if they wished, optional consent to allow the research team to use their video and audio footage (including videos, audio, and photos made from video material) as materials for research publications, conference presentations, and other multimedia outputs that might be disseminated and distributed online, in the media, and in public presentations. All Prolific users interested in participating in the study were introduced to the study, the requirements of the study, and the task, but were not informed about the functionalities of the robot Pepper, in order to minimise prior knowledge of the robot and any priming. During each session (including Session 0), participants were re-introduced to the study and the study's schedule (regarding their chosen day of participation), and received reminders and information about what the study involved. Furthermore, they were reminded about the benefits and risks of their participation (i.e., ensuring that they would receive their payment, that no risks were anticipated as a result of study involvement, and their right to withdraw their participation at any time with no penalty or punishment). Participants were further informed how their data (i.e., behavioural and self-reported data collected in the study) would be used, and again reminded of their right to withdraw their data and/or ask that it not be used at any time during or after their participation. Participants were guaranteed that their right to privacy and anonymity would be respected and that no identifiable data would be shared with anyone beyond the research team. Participants were reminded that their participation was voluntary, and they were given the contact information of the main researcher and experimenter should they wish to follow up with any further questions. After completing the study, participants received a comprehensive debriefing message in Prolific (forwarded by Prolific to their associated email address), providing further information about the study and the deception used (i.e., the experimenter used a WoZ approach to control the robot so that it appeared to respond autonomously), and they were again given the contact information of the main researcher and experimenter should they wish to follow up with any further questions or feedback.

When completing each session, participants were reminded in the Qualtrics form about the date of their next session. Two days before each session, participants received an email via Prolific regarding the specifics of their next session. This message contained details about the session number, the time at which the Prolific page with the link for the session questionnaire form would be published, a reminder not to start the session before the allocated participation time slot, and a request to contact the experimenter if they were going to be late, could not remember their participation time slot, or could not make it. Finally, participants were thanked for their participation and cooperation in each of these messages and were reminded of their rights and of the fact that they were welcome to contact the experimenter at any time using the Prolific messaging system or by email. On the day of participation, participants received an automated message from Prolific at 08:00 AM indicating that the Prolific page of the session was available online. Later that day, 15 to 30 min before each participant's participation time slot, each participant received an individual message via Prolific from the experimenter about their upcoming session and where they could find the link to start the session. If participants were late to their session (without providing earlier notice), the experimenter messaged them via Prolific to ensure attendance or reschedule the session. When participants experienced any technical difficulties or needed to communicate with the experimenter, they were instructed to do so via Prolific or email, and not via the Zoom chat. This was to reduce any potential association between the robot interactions and the experimenter. Accordingly, all communications between participants and the experimenter took place via the Prolific messaging centre or, on rare occasions (when initiated by a participant), by email. The main researcher and experimenter (GL) signed his name in all communications with participants.

3.3 Stimuli

Fig. 1 The lab settings, including the robot Pepper (SoftBank Robotics) in front of a web camera, while the experimenter in the back is controlling the robot using the Wizard of Oz technique

Conversational interactions were guided by the robot Pepper (SoftBank Robotics), a humanoid robot capable of communicating via speech and gestures. Following the guidelines of [26] for designing social robots for long-term interactions, Pepper was chosen as a suitable robotic platform for this task, given the alignment between Pepper's humanoid embodiment and the social requirements of the conversational task [see 26; “Guidelines for Future Design”]. While Pepper's appearance and behaviours are somewhat human-like (i.e., Pepper has a head, face, torso, two arms, two hands, five fingers per hand, etc.), Pepper has not been designed to resemble a real person. Instead, Pepper's embodiment and behaviours convey human likeness without imitating an actual human (for example, Pepper communicates using human speech but does not display any facial expressions, given its rigid, immobile face and head).

Pepper was placed in front of a web camera (Logitech, 1080p) connected to the experimenter's computer (see Fig. 1). Behind Pepper was a white wall and a flowerpot with a green plant (see Fig. 2). Pepper communicated with participants in this study via the WoZ technique, controlled by the experimenter from a PC laptop. All pre-scripted questions and speech items were written and coded in the WoZ system, with the experimenter controlling Pepper by pressing buttons on the laptop. Accordingly, the procedure followed a clear pre-programmed protocol in which the experimenter did not need to speak or type anything during the interaction, but only pressed the relevant keys to trigger the required or appropriate text delivery via Pepper.

Fig. 2 The interaction from the perspective of the participants and the experimenter. The participants were exposed only to the robot Pepper (SoftBank Robotics) via the Zoom chats

Pepper responded to participants' answers and statements with neutral or empathetic responses. Pepper's vocabulary was limited and constrained to reflect the current state of speech recognition technology in social robotics. Following the guidelines of [26] for designing social robots for long-term interactions, Pepper's responses were affective and empathetic, aiming to convey an understanding of users' affective state, communicate appropriate responses, and also display contextualised affective reactions [see 26; “Guidelines for Future Design”]. Hence, a limited set of responses were pre-defined for answers and statements with neutral sentiment or containing factual information (e.g., "I understand", "I see", "okay"), for answers and statements of positive sentiment (e.g., "I am happy to hear that", "This is really interesting", "That's amazing"), and for answers and statements of negative sentiment (e.g., "I am sorry to hear that", "This sounds very challenging", "These are not easy times"). Moreover, Pepper had pre-defined statements for opening an interaction (e.g., "Hello there", "Hi!", "How are you doing today?"), closing an interaction ("That's it for now", "See you next time", "Have a good weekend", "Goodbye"), answering with basic polite gratitude (e.g., "I am fine, thank you!", "Thank you", "That is lovely of you to say so", "It was nice to chat with you too!"), and thanking participants for their cooperation and disclosures (e.g., "Thank you for sharing with me", "Thank you for telling me", "What a nice memory. Thank you for sharing with me"). Due to Pepper's high-pitched voice and robotic style of pronunciation, Pepper's answers and statements were structured using commas so that Pepper's speech segments would be clearer. See the OSF repository [86] for a file with all of Pepper's vocabulary and the structure of Pepper's speech segments.
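For illustration, the pre-scripted response bank described above could be organised along the following lines. This is a minimal sketch only: the data structure, category names, and random selection are illustrative (the complete vocabulary and speech-segment file is available in the OSF repository [86]), and in the actual WoZ system the experimenter triggered specific utterances by key press rather than by random sampling.

```python
import random

# Illustrative sketch of a WoZ response bank organised by sentiment/function.
# The dictionary layout and the pick_response helper are hypothetical; the
# phrases shown are examples quoted in the text above.
RESPONSE_BANK = {
    "neutral": ["I understand", "I see", "Okay"],
    "positive": ["I am happy to hear that", "This is really interesting", "That's amazing"],
    "negative": ["I am sorry to hear that", "This sounds very challenging", "These are not easy times"],
    "opening": ["Hello there", "Hi!", "How are you doing today?"],
    "closing": ["That's it for now", "See you next time", "Goodbye"],
    "gratitude": ["I am fine, thank you!", "Thank you", "It was nice to chat with you too!"],
    "thanks_for_sharing": ["Thank you for sharing with me", "Thank you for telling me"],
}


def pick_response(category: str) -> str:
    """Return one of the pre-scripted phrases from the chosen category."""
    return random.choice(RESPONSE_BANK[category])


# Example: the wizard selects a response after an answer with negative sentiment.
print(pick_response("negative"))
```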

Pepper communicated using a cheerful, high-pitched voice and expressive, animated body language that corresponded to the spoken content and Pepper's physical capabilities. Pepper's movements were self-initiated, based on the "Animated Speech" function of Pepper's demo software, in order to provide a sense of neutral interaction and to ensure replicability by future studies using the same functionality, with which all Pepper robots are equipped. Moreover, Pepper's gaze was almost always focused on the camera, but it occasionally shifted away from the camera with no pre-programmed logic, as the "Basic Awareness" function was left on. To ensure that the mediated interactions would come across as natural, Pepper's gaze was not programmed to be focused on the camera at all times, as this would not be normal behaviour when conversing with a human interlocutor. Therefore, Pepper's gaze shifts were allowed to occur naturally, following the demo software.
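As a rough sketch of how these two built-in behaviours can be switched on (not the study's actual control code), the NAOqi Python SDK exposes them roughly as follows; the robot's IP address is a placeholder, and exact module or method names may vary across NAOqi versions.

```python
# Hedged sketch: enabling Pepper's built-in "Animated Speech" and "Basic
# Awareness" behaviours via the NAOqi Python SDK (Python 2 style). PEPPER_IP
# is a placeholder, and method names can differ between NAOqi versions, so
# treat this as an illustration rather than the study's WoZ controller.
from naoqi import ALProxy

PEPPER_IP, PORT = "192.168.1.10", 9559  # hypothetical robot address

animated_speech = ALProxy("ALAnimatedSpeech", PEPPER_IP, PORT)
basic_awareness = ALProxy("ALBasicAwareness", PEPPER_IP, PORT)

# Leave the default engagement behaviour on so that small gaze shifts occur
# naturally rather than being scripted.
basic_awareness.setEnabled(True)

# Speech is accompanied by self-initiated, contextually chosen gestures.
animated_speech.say("Hello there, my name is Pepper!",
                    {"bodyLanguageMode": "contextual"})
```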

3.4 Behavioural Paradigm

The behavioural paradigm was specifically designed to encourage participants to engage in self-disclosure behaviour. By asking them questions about personal matters, we aimed to create an environment conducive to self-disclosure during interactions with the social robot. The intention behind this design was to facilitate meaningful exchanges and encourage participants to share personal information with the robot. In accordance with the guidelines of [26] for designing social robots for long-term interactions, the interactions followed a clear structure and routine, including greetings and farewells, identifying participants by their name, and demonstrating appropriate affective and empathetic responses to participants' answers, in order to provide a sense of personal interaction and encourage self-disclosure [see 26; “Guidelines for Future Design”].

3.4.1 Structure

Each interaction was guided by Pepper as a semi-structured interview discussing non-sensitive topics regarding general everyday experiences. Each interaction followed the same order, starting with greetings followed by 3 questions (x3 repetitions). The participants were instructed to have a short conversation with Pepper, following Pepper's lead in the interaction and answering Pepper's questions. Participants were instructed that no time limit applied to the interactions and that the interactions usually took about five to ten minutes. They were further encouraged to participate in the interactions the way they saw fit—speaking as little or as much as they wished. In addition, participants were instructed that there were no correct or incorrect answers, and they were encouraged to provide honest answers according to what they felt comfortable with. In the first interaction with Pepper (Session 1), participants were asked for their name by the robot as part of the robot's introduction (i.e., "Hello there, my name is Pepper, what is your name?"), as such a question would be part of a normal introduction in ongoing social exchanges with another person. Before the interaction started, participants were instructed that they were not obliged to share their name with the robot and that they could give a fake name if they preferred to do so. From the second interaction (Session 2) onwards, Pepper addressed each participant by the name they gave during the first interaction (Session 1), to provide a sense of natural and personalized interaction. The task followed this structure and order:

  • Short greetings/introduction (e.g., Hi there, how are you doing?).

  • One pre-defined general question about the participant’s day, week, or weekend, to build rapport (e.g., "how was your weekend? Did you do anything interesting?").

  • An opening statement introducing the topic of the question (e.g., "I am about to ask you about your social life").

  • Two pre-defined, non-sensitive questions that correspond to the topic that was randomly allocated to the interaction. These questions were either framed in the context of the COVID-19 pandemic or in a more general everyday context, depending on the discussion frame group assignment.

Table 1 The ten topics and corresponding quality-of-life categories, following the framework of [92]

3.4.2 Content

Previous studies that investigated relationship formation and disclosure with artificial agents followed conceptual frameworks for inducing rich disclosures and forming meaningful connections [e.g., 52, 53, 3, 89]. For example, a study by [53] implemented the 36 questions method for generating interpersonal closeness [see 90; “36 questions to love”] to elicit self-disclosure from human users to a chatbot. A previous study [3] demonstrated how simple questions about everyday experiences (i.e., work–life balance and finances, social life and relationships, and health and well-being) can elicit meaningful disclosures when communicated by a social robot. The questions and topics in that study were influenced by [12] and [91] as an elicitation technique aiming to capture participants' subjective experiences regarding various everyday topics. Here we used questions similar to those used in [3], adapting disclosure topics for the ten sessions from [91] and [92], and framing the disclosure topics and questions following the framework and guidelines of [92].

The framework of [92] introduces guidelines, via six main themes, for asking questions that capture and elicit disclosures relating to different elements of quality of life within counselling psychology and mental health therapy settings. The guidelines and themes were defined by [92] after reviewing and synthesizing qualitative research studies (especially from the counselling psychology, psychotherapy, and mental health therapy literature) that explicitly asked adult participants with mental health problems about the factors they considered important to their quality of life, or how it had been impacted by their mental health. Based on the results of this review, the six themes are: (1) Well-being and Ill-being, (2) Control, Autonomy, and Choice, (3) Self-Perception, (4) Belonging, (5) Activity, and (6) Hope and Hopelessness.

Each of the ten session topics corresponds to one or more of the six themes described by [92], aiming to elicit meaningful disclosures following the guidelines of [91], but also to initiate self-reflection and capture meaningful information regarding quality of life and mental health, following the framework of [92]. The ten topics and their corresponding themes according to the framework of [92] can be seen in Table 1. The phrasing of the two questions under each topic followed the approach of [90] for questions and practical methodology for creating interpersonal closeness in an experimental context (see the questions in the OSF repository [86]).

3.4.3 Discussion Frames

For both discussion frames, the interaction always started in the same way, with greetings and with the robot asking the first question about the participant's day/week/weekend (see Sect. 3.4.1 for the structure of the task). The following two pre-defined questions concerned a topic that was randomly allocated to the interaction from the 10 topics about general everyday experiences (see Table 1). For participants assigned to the neutral discussion frame group, the questions were not limited to any specific frame other than a general everyday context. For participants assigned to the Covid-related discussion frame group, questions were asked about the same topics; however, the questions were framed within the context of the COVID-19 pandemic. For example, participants were asked how their work situation had changed due to the pandemic, or how they were socializing during the pandemic. See the questions and the differences between conditions in the OSF repository [86].

3.5 Measurements

To ensure that our models only include high-quality data, we included only cases that were captured and processed correctly.

3.5.1 Demographics

Participants were requested to complete a short questionnaire that gathered information on demographic parameters including age, biological sex, gender identification, level of education, nationality, job, previous experience with robots, and whether English is their native language.

3.5.2 Self-Disclosure

We operationalized the concept of self-disclosure from three angles. A single dimension cannot capture the complex nature of self-disclosure, as it is a multidimensional behaviour [93]; subjective perceptions of self-disclosure can differ from objectively observed behaviour and content [94]. First, we measured participants' subjective perceptions of their own self-disclosures towards Pepper to gain subjective insights about the interaction from the participants' point of view. This method is practical, as it is easy to administer and provides a subjective perspective on self-disclosure; however, it may be subject to bias, as participants may not accurately report their behaviour. To capture self-disclosure behaviour more objectively, we focused on two additional quantitative measures: the duration of all utterances in an answer (i.e., self-disclosure duration in seconds) and the average word count per answer (i.e., self-disclosure length in number of words). These measures allowed us to examine the volume of disclosure during the interactions. Self-disclosure has been linked to the total number of words a person produces during an interaction or within a single turn, with higher word counts associated with greater self-disclosure [95,96,97]. These measures therefore provided further objective criteria to assess changes in disclosure behaviour and its relationship with the social robot. Finally, to capture the emotional and sentimental tone of the disclosures, we used a measure of disclosure compound sentiment, rating the sentiment present in each disclosure from negative to positive.

Subjective self-disclosure

Participants were requested to report their level of subjective self-disclosure via the work and studies disclosure sub-scale of Jourard's Self-Disclosure Questionnaire [91]. This questionnaire was adapted and adjusted for the context of the study, with the statements addressing general life experiences. The measurement included ten self-reported items for which participants reported the extent to which they disclosed information to Pepper on a scale of one (not at all) to seven (to a great extent). Accordingly, a mean scale was constructed (\(M\) = 3.60, \(SD\) = 1.17) which was found to be reliable (Cronbach’s \(\alpha \) =.83).
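For illustration, constructing such a mean scale and checking its reliability could be done along the following lines; the DataFrame, item names, and values are hypothetical, and the pingouin library is assumed here simply as one convenient way to obtain Cronbach's alpha.

```python
import pandas as pd
import pingouin as pg

# Minimal sketch: building a mean scale from questionnaire items and checking
# its internal consistency. One row per response, one column per item (1-7);
# the numbers below are invented for demonstration.
items = pd.DataFrame({
    "item_1": [4, 5, 3, 6, 2],
    "item_2": [4, 4, 3, 6, 3],
    "item_3": [5, 5, 2, 7, 2],
})

mean_scale = items.mean(axis=1)            # mean score per response
alpha, ci = pg.cronbach_alpha(data=items)  # reliability of the scale
print(f"M = {mean_scale.mean():.2f}, SD = {mean_scale.std():.2f}, alpha = {alpha:.2f}")
```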

Disclosure duration

Duration of speech in seconds from each recording was extracted and processed using Parselmouth [98], a Python library for Praat [99].
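A minimal sketch of this extraction step is shown below; the file name is a placeholder, and any additional silence handling used in the study is not shown here.

```python
import parselmouth
from parselmouth.praat import call

# Hedged sketch: extracting the duration (in seconds) of one recorded answer
# with Parselmouth, by passing the Praat "Get total duration" command through
# parselmouth.praat.call. "answer_01.wav" is a placeholder file name.
sound = parselmouth.Sound("answer_01.wav")
duration_seconds = call(sound, "Get total duration")
print(f"Disclosure duration: {duration_seconds:.2f} s")
```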

Disclosure length

Disclosure length was defined as the volume of disclosure in terms of the number of words per disclosure. The recordings were automatically processed using the IBM Watson speech recognition engine, applying the British telephony model. To ensure that all utterances within each disclosure were captured, we amplified the audio files by 7 decibels and slowed them down by reducing the sample rate by a factor of 0.55, which effectively decreased the playback speed (and pitch) of the recordings. This preprocessing was used only to obtain text output for analysis, not for the robotic interactions, and it was applied identically across all recordings in the study. The number of words per disclosure was then extracted from the text using a simple word count in Python.
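A minimal sketch of this preprocessing and word count is shown below; the file names are placeholders, the pydub library is assumed here as one way to apply the gain and sample-rate change, and the subsequent call to the IBM Watson speech-to-text service is omitted.

```python
from pydub import AudioSegment

# Hedged sketch of the preprocessing described above: amplify by 7 dB and slow
# the recording by reducing the sample rate by a factor of 0.55 (pydub's
# frame-rate override is a common trick for this; it lowers pitch as well).
audio = AudioSegment.from_wav("answer_01.wav")
louder = audio + 7  # +7 dB gain
slowed = louder._spawn(
    louder.raw_data,
    overrides={"frame_rate": int(louder.frame_rate * 0.55)},
)
slowed.export("answer_01_processed.wav", format="wav")  # file sent for transcription

# After transcription, disclosure length is simply the number of words.
transcript = "I mostly kept in touch with my family over video calls"  # example output
disclosure_length = len(transcript.split())
print(disclosure_length)
```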

Disclosure compound sentiment

Using VADER for Python [100], each disclosure was scored for its overall sentiment in terms of positive, neutral, and negative sentiment. The compound sentiment score rates a disclosure from negative (-1) to positive (+1), based on the calculated sentiment scores [see 100].
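For illustration, scoring a transcribed disclosure with the vaderSentiment package looks roughly like this; the example sentence is invented.

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Sketch: scoring a transcribed disclosure with VADER. The "compound" value
# ranges from -1 (most negative) to +1 (most positive).
analyzer = SentimentIntensityAnalyzer()
disclosure = "It was a hard week, but I really enjoyed seeing my friends again."
scores = analyzer.polarity_scores(disclosure)
print(scores["compound"], scores["pos"], scores["neu"], scores["neg"])
```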

3.5.3 Perception

Agency and experience

Research into mind perception has revealed that agency (the ability of an agent to plan and act) and experience (the ability of an agent to sense and feel) are two key dimensions in evaluating an agent's mind [101]. To determine whether any differences in mind perception emerged across the testing sessions, participants were asked to evaluate Pepper in terms of agency and experience after being introduced to these terms [adapted from 101]. Both concepts were rated using a 0-to-100 rating bar.

Friendliness and warmth

This scale was aimed at capturing how participants perceived Pepper in terms of friendliness and warmth using one item from [102] and two items from [103], as suggested by [54]. These items were evaluated on a seven-point scale ranging from 1 (not at all) to 7 (extremely). Accordingly, a mean scale was constructed (\(M\) = 6.11, \(SD\) = 1.02) which was found to be reliable (Cronbach’s \(\alpha \) =.94).

Communication competency

This scale was aimed at capturing how participants experienced and evaluated Pepper's communication competency, using an adapted and adjusted version of a scale by [104], following [52]. The scale included three items that were evaluated on a seven-point scale ranging from 1 (not at all) to 7 (extremely). Accordingly, a mean scale was constructed (\(M\) = 5.78, \(SD\) = 1.18) which was found to be reliable (Cronbach’s \(\alpha \) =.93).

Interaction quality

This scale was aimed at capturing how participants perceived and evaluated the interaction with Pepper, using an adapted and adjusted version of a scale by [105], following [52]. Each interaction included two random items out of seven, except for the mid-session (Session 5) and the last session (Session 10), which included all six items of the scale. These items were evaluated on a seven-point scale ranging from 1 (not at all) to 7 (extremely). Accordingly, a mean scale was constructed (\(M\) = 5.48, \(SD\) = 1.56) which was found to be reliable (Cronbach’s \(\alpha \) =.96).

3.5.4 Well-Being

Mood

To capture participants' mood change from their interactions with Pepper, participants reported their mood before and after the interaction with Pepper using the Immediate Mood Scaler [IMS-12; see 106]. The IMS-12 includes 12 items of polarized moods, ranging from 1 (for negative moods) to 7 (for the equivalent positive moods). The scale is a novel validated tool based on the Positive and Negative Affect Schedule [PANAS; 107], adapted and adjusted to capture current mood states in online and mobile experiments [106]. Reliable mean scales were constructed for participants' mood before the interaction (\(M\) = 5.35, \(SD\) = 1.16, Cronbach’s \(\alpha \) =.96) and after the interaction (\(M\) = 5.75, \(SD\) = 1.08, Cronbach’s \(\alpha \) =.97).

Comforting responses

To measure the extent to which participants perceived Pepper's responses as comforting, the comforting responses scale was adapted [see 108]. The scale includes 12 self-reported items rated on a seven-point scale, ranging from 1 (I strongly disagree) to 7 (I strongly agree). Accordingly, a mean scale was constructed (\(M\) = 5.50, \(SD\) =.89) which was found to be reliable (Cronbach’s \(\alpha \) =.91).

Loneliness

In each session, participants were requested to report their feelings and thoughts of loneliness over the previous three days using the short-form UCLA loneliness scale [ULS-8; see 109]. The scale includes 8 items rated on a seven-point scale, ranging from 1 (not at all) to 7 (all the time). Accordingly, a mean scale was constructed (\(M\) = 2.86, \(SD\) = 1.28) which was found to be reliable (Cronbach’s \(\alpha \) =.90).

Stress

Participants were requested to report their feelings and thoughts of periodic stress over the past month using the Perceived Stress Scale [110]. The scale includes 10 statement items rated on a five-point scale, ranging from 1 (never) to 5 (very often). A mean scale was constructed (\(M\) = 3.30, \(SD\) = 1.03) which was found to be reliable (Cronbach’s \(\alpha \) =.89).

3.6 Materials

3.6.1 Zoom Video Chat

All interactions (video chats) were conducted with the software Zoom, using a university staff account (see Fig. 2). The interactions were recorded using the recording functionality on Zoom and edited to include only those portions of the recordings where participants and/or Pepper were speaking.

3.6.2 Qualtrics Questionnaires

All of the questionnaires were administered via the survey software Qualtrics, using a university staff account. In the online questionnaires, the functionality of recording participants’ IP addresses was disabled to comply with GDPR guidelines.

3.7 Procedure

When recruited, participants completed an induction questionnaire (Session 0) approximately one week before beginning their video chat interactions with Pepper (Sessions 1 to 10). Participants were instructed that they would have a short conversation with Pepper about several topics that Pepper would bring up, that Pepper would ask them 3 questions, and that the interactions would take place twice a week across five weeks at prearranged times. They were further told that each interaction with Pepper should last about 5 to 10 min, and that another 10–15 min would be required to complete questionnaires afterwards. When answering the induction questionnaire (after providing consent to participate in the study), participants were instructed on how to position their video camera for the video chats and what the lighting in the room was expected to be like. Following this, participants reported several demographic parameters and completed several questionnaires (for the full list of questionnaires and their order in each session, see the OSF repository at [86]). Participants were redirected to the Prolific website upon completing the induction questionnaire (Session 0). A participant number was automatically generated for each participant who completed the induction questionnaire (Session 0) and proceeded to the following sessions. The random assignment of participants to conditions, the allocation of topics to sessions for each participant, and the order of questions in each interaction were randomized and allocated automatically, and an Excel sheet was created to help the experimenter control and follow the experimental design procedure across the five weeks (see the randomization and allocation code, and the experimenter notebook with the conditions, allocated topics, and order of questions for each participant, on the OSF repository at [86]).
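To illustrate the kind of automated allocation described above (the study's actual randomization code and experimenter notebook are available on the OSF repository [86]), a minimal sketch might look as follows; the participant numbers, topic labels, seed, and output file name are placeholders.

```python
import random
import pandas as pd

# Illustrative sketch of the automated allocation: random assignment to a
# discussion frame, random topic order across the ten sessions, and random
# question order within each session. All names and values are hypothetical.
random.seed(2021)
TOPICS = [f"topic_{i:02d}" for i in range(1, 11)]

rows = []
for participant in range(1, 41):
    frame = random.choice(["covid", "general"])          # between-groups assignment
    topic_order = random.sample(TOPICS, k=len(TOPICS))   # one topic per session
    for session, topic in enumerate(topic_order, start=1):
        question_order = random.sample([1, 2], k=2)      # order of the two topic questions
        rows.append({"participant": participant, "frame": frame,
                     "session": session, "topic": topic,
                     "question_order": question_order})

schedule = pd.DataFrame(rows)
schedule.to_csv("experimenter_schedule.csv", index=False)  # experimenter reference sheet
```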

When starting each session, participants were asked to enter their Prolific ID and their participant number. Next, participants were asked to complete the Immediate Mood Scaler [IMS-12; 106] to report their mood before interacting with Pepper. They then received a reminder regarding their interaction with Pepper, what the task required, and some basic instructions. The page included a link to the Zoom interaction, a frame with the Zoom landing page, and the experimenter's e-mail address with instructions on how to contact the experimenter in case of any issues during the interaction. Then, participants interacted with Pepper via a Zoom video chat (see Sect. 3.4), seeing only Pepper in the chat (see Fig. 2). After finishing their interaction with Pepper, participants returned to the Qualtrics page and answered the rest of the questionnaires. The full list of questionnaires and their order in each session can be found on the study's OSF page [see 86]. When they finished answering the questionnaire, participants were thanked for completing the session, reminded about the date and day of their upcoming session, provided again with the contact details of the experimenter, and directed back to Prolific to receive a completion message. After completing the last session, participants were told that this was indeed the last session, thanked for their participation, and provided with the contact details of the experimenter to ask any further questions about the study.

4 Results

4.1 Disclosure

We used lme4 [111] for R to perform a linear mixed-effects analysis of the effect of session number, discussion frame, and their interaction term on participants' disclosures to Pepper. As fixed effects, we entered session number, discussion frame, and their interaction term into the model. As a random effect, we included intercepts for subjects. Significance was calculated using the lmerTest package [112], which applies Satterthwaite's method to estimate degrees of freedom and generate p-values for mixed models.
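For readers who prefer Python, an analogous model can be sketched with statsmodels as shown below; the study itself fitted the models with lme4 and lmerTest in R, and the actual analysis code is in the OSF repository [86]. The column names are assumptions about a long-format table with one row per participant and session.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hedged sketch of a random-intercept mixed model analogous to the lme4 model
# described above. "disclosure_long.csv" and the column names are placeholders.
data = pd.read_csv("disclosure_long.csv")

model = smf.mixedlm(
    "disclosure ~ session * frame",    # fixed effects: session, frame, and their interaction
    data=data,
    groups=data["participant"],        # random intercepts for participants
)
result = model.fit()
print(result.summary())
```

Note that statsmodels reports Wald-type p-values rather than the Satterthwaite-approximated degrees of freedom used by lmerTest, so the exact p-values can differ slightly from those reported below.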

4.1.1 Subjective Self-Disclosure

The model explains 60.5% of the variance in participants' subjective perceptions of their self-disclosure to Pepper, with the fixed effects in the model explaining 3.4% of the variance (see Table 2). The results show that, despite the variance between participants (\(SD\) =.90), the session number has a significant positive fixed effect on participants' subjective perceptions of their self-disclosures (\(\beta \) =.07, SE =.02, \(p <.001\), see Fig. 3). Nevertheless, there were no significant fixed effects of the discussion frame (\(\beta = -.16, SE =.33, p =.627\)) or of the interaction term of session number and discussion frame (\(\beta = -.02, SE =.03, p =.529\)).

Table 2 Results of disclosure

4.1.2 Disclosure Duration

The model explains 49.1% of the variance in participants' disclosure duration (in seconds) to Pepper, with the fixed effects in the model explaining 3.7% of the variance (see Table 2). The results show that, despite the variance between participants (\(SD\) = 21.04), the session number has a significant positive fixed effect on participants' disclosure duration (\(\beta \) = 2.10, SE =.33, \(p <.001\), see Fig. 4). Nevertheless, there were no significant fixed effects of the discussion frame (\(\beta \) = 4.04, SE = 7.32, p =.583) or of the interaction term of session number and discussion frame (\(\beta = -.13, SE =.46, p =.774\)).

Fig. 3 Mean subjective disclosure scores by session number and discussion frame

Fig. 4 From left to right: (1) Mean disclosure duration (in seconds) by session number and discussion frame. (2) Mean disclosure duration (in seconds) by session number and discussion frame, including only the items corresponding to the disclosure topic

Another linear mixed-effects model was used to test whether the discussion frame, the session number, and their interaction term significantly predicted disclosure duration when interacting with the social robot Pepper, including only the items corresponding to the disclosure topic. The model explains 61.1% of the variance in participants' disclosure duration (in seconds) to Pepper, with the fixed effects in the model explaining 5.1% of the variance (see Table 3). The results reveal that, despite the variance between participants (\(SD\) = 26.53), the session number has a significant positive fixed effect on participants' disclosure duration (\(\beta \) = 2.54, SE =.40, \(p <.001\), see Fig. 4). Nevertheless, no significant fixed effects emerged for the discussion frame (\(\beta \) = 6.50, SE = 9.18, p =.482) or the interaction term of session number and discussion frame (\(\beta \) =.03, SE =.56, p =.964).

Table 3 Results of disclosure including only the items that corresponded to the topic of disclosure
Fig. 5 From left to right: (1) Mean disclosure length (in number of words) by session number and discussion frame. (2) Mean disclosure length (in number of words) by session number and discussion frame, including only the items corresponding to the disclosure topic

4.1.3 Disclosure Length

The model explains 49.6% of the variance in participants' disclosure length (in number of words) to Pepper, with the fixed effects in the model explaining 4.1% of the variance (see Table 2). The results show that, despite the variance between participants (\(SD\) = 49.32), the session number has a significant positive fixed effect on participants' disclosure length (\(\beta \) = 4.97, SE =.76, \(p <.001\), see Fig. 5). Nevertheless, no significant fixed effects emerged for the discussion frame (\(\beta \) = 9.11, SE = 17.13, p =.598) or the interaction term of session number and discussion frame (\(\beta = -.09, SE = 1.07, p =.936\)).

Another linear mixed-effects model was used to test whether the discussion frame, the session number, and their interaction term significantly predicted disclosure length when interacting with the social robot Pepper, including only the items corresponding to the disclosure topic. The model explains 61.6% of the variance in participants' disclosure length (in number of words) to Pepper, with the fixed effects in the model explaining 5.6% of the variance (see Table 3). The results show that, despite the variance between participants (\(SD\) = 61.89), the session number has a significant positive fixed effect on participants' disclosure length (\(\beta \) = 6.09, SE =.92, \(p <.001\), see Fig. 5). Nevertheless, no significant fixed effects emerged for the discussion frame (\(\beta \) = 15.58, SE = 21.39, p =.470) or the interaction term of session number and discussion frame (\(\beta \) =.24, SE = 1.29, p =.852).

4.1.4 Disclosure Compound Sentiment

The model explains 10.8% of the variance in the compound sentiment of participants' disclosures (see Sect. 3.5.2), whereas the fixed effects alone explain 2.1% of the variance (see Table 2). The results show that, despite the variance between participants (\(SD\) = .15), session number has a significant positive fixed effect on disclosure compound sentiment (\(\beta \) = .02, SE = .01, \(p < .001\)). However, neither the discussion frame (\(\beta = -.04\), SE = .08, p = .569) nor the interaction of session number and discussion frame (\(\beta = -.01\), SE = .01, p = .537) had a significant fixed effect.

Another linear mixed effects model was run to test whether the discussion frame, the session number, and their interaction term significantly predicted the disclosure compound sentiment when interacting with the social robot Pepper, including only the items that corresponded to the topic of disclosure. The model explains 9.4% of the variance in the compound sentiment of participants' disclosures to Pepper, whereas the fixed effects alone explain 2.6% of the variance (see Table 3). The results show that, despite the variance between participants (\(SD\) = .13), session number has a significant positive fixed effect on disclosure compound sentiment (\(\beta \) = .02, SE = .01, \(p = .005\)). However, neither the discussion frame (\(\beta = -.07\), SE = .09, p = .418) nor the interaction of session number and discussion frame (\(\beta = -.01\), SE = .01, p = .575) had a significant fixed effect.

4.2 Perception

We used lme4 [111] for R to perform linear mixed effects analysis of the effect of session number, discussion frame and their interaction term on participants’ perceptions of Pepper, including perceptions of agency and experience [see 101], friendliness and warmth, communication competency and interaction quality. As fixed effects, we entered the session order, the discussion frame and their interaction term into the model. As a random effect, we had intercepts for subjects. Significance was calculated using the lmerTest package [112], which applies Satterthwaite’s method to estimate degrees of freedom and generate p-values for mixed models.
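
For illustration only, the sketch below shows how this specification could be fitted to each perception outcome in lme4 and how the fixed-effect table with Satterthwaite-based p-values can be extracted. The data frame (perception) and column names are hypothetical placeholders; this is not the authors' analysis script.

library(lme4)
library(lmerTest)  # makes summary() report Satterthwaite df and p-values

# Hypothetical long-format data: one row per participant and session,
# with one column per perception scale.
outcomes <- c("agency", "experience", "warmth", "competence", "quality")

fits <- lapply(outcomes, function(y) {
  lmer(as.formula(paste(y, "~ session * frame + (1 | subject)")),
       data = perception)
})
names(fits) <- outcomes

coef(summary(fits[["agency"]]))  # estimate, SE, df, t, and p for each fixed effect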

4.2.1 Agency

The model explains 82.5% of the variance in participants' perceptions of Pepper's degree of agency, whereas the fixed effects alone explain 1.6% of the variance (see Table 4). The results show that, despite the variance between participants (\(SD\) = 21.38), session number has a significant positive fixed effect on participants' perceptions of Pepper's degree of agency (\(\beta \) = 1, SE = .25, \(p < .001\), see Fig. 6). However, neither the discussion frame (\(\beta \) = \(-\)1.34, SE = 7.20, p = .853) nor the interaction of session number and discussion frame (\(\beta \) = .08, SE = .35, p = .828) had a significant fixed effect.

Table 4 Results of perception

4.2.2 Experience

The model explains 79.4% of the variance in participants' perceptions of Pepper's degree of experience, whereas the fixed effects alone explain 3.8% of the variance (see Table 4). The results show that, despite the variance between participants (\(SD\) = 24.29), session number has a significant positive fixed effect on participants' perceptions of Pepper's degree of experience (\(\beta \) = 1.82, SE = .32, \(p < .001\), see Fig. 6). However, neither the discussion frame (\(\beta \) = \(-\)3.59, SE = 8.27, p = .666) nor the interaction of session number and discussion frame (\(\beta = -.11\), SE = .45, p = .811) had a significant fixed effect.

4.2.3 Friendliness and Warmth

The model explains 79.7% of the variance in participants' perceptions of Pepper's degree of friendliness and warmth, whereas the fixed effects alone explain 2.5% of the variance (see Table 4). The results show that, despite the variance between participants (\(SD\) = .93), session number has a significant positive fixed effect on participants' perceptions of Pepper's degree of friendliness and warmth (\(\beta \) = .05, SE = .01, \(p < .001\)). However, neither the discussion frame (\(\beta = -.07\), SE = .32, p = .828) nor the interaction of session number and discussion frame (\(\beta \) = .02, SE = .02, p = .245) had a significant fixed effect.

Fig. 6 From left to right: (1) Mean agency scores by session number and discussion frame. (2) Mean experience scores by session number and discussion frame

4.2.4 Communication Competence

The model explains 70.3% of the variance in participants' perceptions of Pepper's communication competence, whereas the fixed effects alone explain 1.2% of the variance (see Table 4). The results show that, despite the variance between participants (\(SD\) = 1), session number has a significant positive fixed effect on participants' perceptions of Pepper's communication competence (\(\beta \) = .03, SE = .02, \(p = .040\)). However, neither the discussion frame (\(\beta = -.16\), SE = .35, p = .655) nor the interaction of session number and discussion frame (\(\beta \) = .02, SE = .02, p = .456) had a significant fixed effect.

4.2.5 Interaction Quality

The model explains 66.4% of the variance in participants' perceptions of the interaction quality, whereas the fixed effects alone explain 4.1% of the variance (see Table 4). The results show that, despite the variance between participants (\(SD\) = 1.26), session number has a significant positive fixed effect on participants' perceptions of the interaction quality (\(\beta \) = .09, SE = .02, \(p < .001\)). However, neither the discussion frame (\(\beta = -.21\), SE = .45, p = .646) nor the interaction of session number and discussion frame (\(\beta \) = .04, SE = .03, p = .291) had a significant fixed effect.

4.3 Well-Being

We used lme4 [111] for R to perform linear mixed effects analysis of the effects of session number, discussion frame and their interaction term on participants’ perceptions of Pepper’s comforting responses, mood change, and feelings of loneliness. As fixed effects, we entered the session order, the discussion frame and their interaction term into the model. As a random effect, we had intercepts for subjects. Significance was calculated using the lmerTest package [112], which applies Satterthwaite’s method to estimate degrees of freedom and generate p-values for mixed models.

4.3.1 Mood

The model explains 69.8% of the variance in participants' mood, whereas the fixed effects alone explain 4.2% of the variance (see Table 5). The results show that, despite the variance between participants (\(SD\) = .94), mood change has a significant positive fixed effect, as participants reported a more positive mood after interacting with Pepper (\(\beta \) = .49, SE = .11, \(p < .001\)). Moreover, session number has a significant positive fixed effect on participants' mood (\(\beta \) = .03, SE = .33, \(p = .019\), see Fig. 7). However, no significant fixed effects emerged for the discussion frame (\(\beta = -.24\), SE = .32, p = .469), the interaction of session number and discussion frame (\(\beta \) = .02, SE = .02, p = .154), the interaction of session number and mood change (\(\beta = -.01\), SE = .02, p = .388), or the interaction of discussion frame and mood change (\(\beta = -.01\), SE = .09, p = .943).
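
Unlike the other well-being models, the mood model also includes the before/after measurement point ("mood change") and its interactions with session number and discussion frame. A minimal sketch of such a specification is shown below; the data frame (wellbeing) and the column coding the before/after measurement (phase) are hypothetical placeholders.

library(lme4)
library(lmerTest)

# Hypothetical data: one mood rating per participant, session, and phase,
# where phase codes whether the rating was taken before or after the
# interaction with Pepper (the "mood change" term).
m_mood <- lmer(mood ~ session * frame + session * phase + frame * phase +
                 (1 | subject),
               data = wellbeing)
summary(m_mood)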

4.3.2 Comforting Responses

The model explains 78.3% of the variance in participants' perceptions of Pepper's comforting responses, whereas the fixed effects alone explain 3.9% of the variance (see Table 5). The results show that, despite the variance between participants (\(SD\) = .79), session number has a significant positive fixed effect on participants' perceptions of Pepper's comforting responses (\(\beta \) = .07, SE = .01, \(p < .001\), see Fig. 8). However, neither the discussion frame (\(\beta \) = .13, SE = .27, p = .635) nor the interaction of session number and discussion frame (\(\beta = -.01\), SE = .02, p = .540) had a significant fixed effect.

Table 5 Results of well-being

4.3.3 Loneliness

The model explains 75.9% of the variance in participants' feelings of loneliness, whereas the fixed effects alone explain 7.8% of the variance (see Table 5). The results show that, despite the variance between participants (\(SD\) = 1.08), session number has a significant negative fixed effect on participants' feelings of loneliness (\(\beta = -.05\), SE = .01, \(p < .001\), see Fig. 9). However, neither the discussion frame (\(\beta \) = .63, SE = .37, p = .091) nor the interaction of session number and discussion frame (\(\beta \) = .01, SE = .02, p = .674) had a significant fixed effect.

Fig. 7 From left to right: (1) Mean mood scores of participants in the neutral discussion frame, before and after the interaction, by session number. (2) Mean mood scores of participants in the Covid-related discussion frame, before and after the interaction, by session number

Fig. 8 Mean comforting responses scores by session number and discussion frame

Fig. 9 Mean loneliness scores by session number and discussion frame

Another linear mixed effects model was used, omitting the data collected in the induction session (session 0), which took place before participants were exposed to the discussion frame manipulation, in order to better evaluate the effect of the discussion frame on participants' feelings of loneliness. The model explains 79.1% of the variance in participants' feelings of loneliness, whereas the fixed effects alone explain 8.4% of the variance (see Table 5). The results show that, despite the variance between participants (\(SD\) = 1.10), session number has a significant negative fixed effect on participants' feelings of loneliness (\(\beta = -.04\), SE = .02, \(p = .008\)), and the discussion frame has a significant fixed effect on participants' feelings of loneliness (\(\beta \) = .77, SE = .38, \(p = .046\), see Fig. 9): participants in the COVID-related experiences discussion frame group reported higher levels of loneliness than participants in the general experiences discussion frame group. No significant fixed effect emerged for the interaction of session number and discussion frame (\(\beta = -.01\), SE = .02, p = .575).
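
Re-fitting the loneliness model without the induction session amounts to dropping the session-0 rows before fitting; a minimal, hypothetical sketch follows (data frame and column names are placeholders).

library(lme4)
library(lmerTest)

# Exclude the induction session (session 0), which preceded the
# discussion-frame manipulation, then refit the same model.
wellbeing_post <- subset(wellbeing, session > 0)

m_loneliness <- lmer(loneliness ~ session * frame + (1 | subject),
                     data = wellbeing_post)
summary(m_loneliness)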

4.3.4 Stress

The model explains 77.3% of the variance in participants' feelings of stress, whereas the fixed effects alone explain 9.4% of the variance (see Table 5). The results show that, despite the variance between participants (\(SD\) = .86), the discussion frame has a significant fixed effect on participants' feelings of stress (\(\beta \) = 1.31, SE = .45, \(p = .005\)): participants in the COVID-related experiences discussion frame group reported higher levels of stress than participants in the general experiences discussion frame group. Moreover, the interaction of session number and discussion frame also has a significant fixed effect on participants' feelings of stress (\(\beta = -.01\), SE = .05, p = .042), with participants in the COVID-related experiences discussion frame group reporting that their feelings of stress decreased from the fifth session to the tenth, whereas participants in the general experiences discussion frame group reported increasing levels of stress over the same period. Finally, no significant fixed effect emerged for session number across the entire sample (\(\beta \) = .04, SE = .03, p = .205).

5 Discussion

Here we have introduced a novel long-term mediated experimental design to evaluate the extent to which a social robot can elicit and influence people's self-disclosures, and how perceptions of the robot develop over time. Moreover, we measured the extent to which interactions with the social robot affected participants' well-being in different ways across time. Participants conversed with the social robot Pepper during 10 sessions distributed over 5 weeks, about one of two sets of topics depending on random group assignment. One group's conversation topics were framed within the context of the Covid-19 pandemic (e.g., social relationships during the pandemic, sustaining mental health during the pandemic, etc.), whereas the other group's conversation topics were similar, except that no explicit mention of the Covid-19 pandemic was ever made. We evaluated the effect of time (session number) as well as of the discussion frame, comparing general everyday topics with the same topics framed around the Covid-19 pandemic to address a more emotional context.

5.1 People Self-disclose Increasingly More to a Social Robot Over Time

Our first key finding shows that across the 10 sessions, people speak longer and share more information in their disclosures to the social robot Pepper. Moreover, consistent with previous results [3], subjective perceptions of self-disclosure align well with the objective data and correspond to the observed length and duration of the disclosures, as people correctly perceived themselves to gradually share more information with Pepper across sessions. Finally, we found that people were more positive in their disclosures over time. The effects described here were even more pronounced when considering only disclosures related to the session's conversation topic. Nevertheless, our results also reveal that the discussion frame has no meaningful or significant effect on participants' disclosures to Pepper. Self-disclosure is a dynamic and socially complex human behaviour [12, 13], and accordingly, this key finding contributes to our understanding of humans' social behaviour and communication with robots. While numerous prior studies have explored humans' social behaviour towards robots in single-session studies, our knowledge of how people's behaviours towards robots change or develop over the longer term remains limited in social HRI. Naturally, we recognize that people are different and might adopt different behavioural patterns when conversing with social robots. Nonetheless, we showed that people self-disclosed increasingly more to Pepper over time in a systematic fashion, even when the potential for such inter-individual differences is taken into account through the use of rigorous methodology. This is a meaningful contribution to HRI theory, showing that prolonged and intensive interactions with social robots can overcome novelty effects, as evidenced by objective behavioural data and not only by users' self-reported subjective perceptions.

5.2 People Perceive a Robot as More Social and Competent Over Time

We found that across the 10 sessions, participants attributed higher qualities of mind to Pepper [see 101], in terms of agency and experience. Likewise, over time participants found Pepper to be friendlier and warmer, and judged Pepper's communication skills to be more competent. Finally, across time, participants also rated the interactions with Pepper as being of increasingly higher quality. Here again, our results show that the discussion frame has no meaningful or significant effect on the way people perceive Pepper and the interaction. This key finding highlights the extent of people's social perception of robots over time. Despite Pepper's limited responses, over time participants attributed more social qualities to this particular robot, thus providing evidence for the influence of social engagement with a robot on its social perception over time. Furthermore, beyond finding Pepper to be more social, participants also attributed higher degrees of competency to Pepper over time. It is of note that people's perceptions of the robot and the interactions corresponded to their self-disclosure behaviour toward the robot over time. This key finding supports previous research showing how people's behaviours aligned with their social perceptions of and attitudes towards the robot in single-session interactions [113]. Here we provided further support for this behavioural mechanism in HRI, and our results demonstrate how perceptions of robots and behaviours toward robots co-align over time during prolonged interactions.

5.3 Establishing Relationships with Social Robots

While previous longitudinal studies often report novelty effects in human–machine communication encounters [e.g., 52, 53], here we see a clear opposing trend, with evidence rooted in people's objective behaviour towards robots (i.e., the length and quality of participant disclosures increasing over time) and their subjective perceptions of robots (i.e., participants' social perceptions of Pepper increasing over time, in terms of Pepper's agency, experience, friendliness and warmth, communication competency, and the interaction quality). These findings are particularly interesting as they provide clear evidence for social robots' potential to establish meaningful relationships with human users. While consistent with previous suggestions on the matter [see 114, 115], the present study provides initial support for long-term relationships between human users and a social robot, supported by multidimensional data. Furthermore, our findings establish important foundations for future HRI studies looking into how human–robot relationships develop over time, as well as for roboticists trying to create meaningful relationships between their robots and their users. Finally, these results highlight how human–robot relationships could act as ideal settings for robotic interventions for well-being. In addition, compared to Croes's previous studies [e.g., 52, 53], and despite our previous results in single-session studies [see 3], the present study suggests differences between embodied and disembodied agents in long-term interactions. We assume that people might attribute more social qualities to embodied agents (for the scope of this study, social robots) and, accordingly, that relationships with such agents should evolve over time rather than suffer the same degree of novelty effects as reported in [52, 53]. Nevertheless, this calls for further investigation, and clear opportunities exist for future research to address the effects of embodiment on relationship establishment with artificial agents.

Our results further support the notion that the social dynamic in HRI, where humans often seek to establish social connections and rapport with robots, influences people's perceptions of and attitudes towards the robot. The results of the present study further confirm that the mere-exposure effect [22] operates differently in HRI than in traditional HCI, as the focus shifts from usability to the establishment of social bonds. By simulating human-like behaviours and engaging in social interactions, social robots, like the Pepper robot used in the present study, can elicit positive responses and be perceived as increasingly socially competent over time. This highlights the importance of understanding the dynamics of human–robot communication in long-term interactions and the potential for social robots to establish meaningful relationships with human users [8]. This distinction becomes evident when examining prolonged social interactions with a robot resulting from repeated exposure. In the current context, we observe that these interactions demonstrate increasingly social behaviour and perception, representing the richest form of adaptation toward a social robot. Our findings suggest that users are not solely treating the Pepper robot used here as an object, but are willing to engage in long-term social interactions, and perhaps even establish some form of social connection with Pepper. Thus, our study highlights the difference between learning how to use an object through repeated usage, as observed in traditional HCI studies [see 23], and the social behaviour and perception exhibited toward a robot in HRI settings. This distinction emphasises the unique nature of social interaction in HRI and the need for a deeper understanding of human–robot communication beyond traditional usability perspectives.

5.4 Talking to Robots Positively Affects People’s Well-Being

In terms of well-being, we found that participants' moods improved after interacting with Pepper, and also across the 10 sessions. Moreover, across the 10 sessions, participants reported Pepper's responses to be more comforting. Our results revealed that the discussion frame per se did not have a meaningful or significant effect on people's moods or on the way people perceived Pepper's comforting responses. These findings provide further valuable evidence for the positive outcomes of employing a social robot as an intervention supporting people's well-being. Moreover, our results add to previous studies [e.g., 47, 116, 38] that show the benefits of using robots for emotional support. Taken together with other results from this study (i.e., that people self-disclose increasingly more to a social robot over time and that people perceive a robot as more social over time), this study provides crucial evidence for establishing relationships with robots in health and care settings. These findings contribute to the introduction of social robots as conversational partners, and to understanding how this type of verbal interaction could support people with emotional regulation by talking about stressors and well-being. Simple tasks, like the one described in this study, are relatively easy to administer automatically in HRIs (by focusing on providing general and broad responses to users' disclosures), yet they can emulate effective self-disclosure-based procedures such as affect labelling [78] and other introspective emotional processes in which users self-reflect on their emotions and behaviours [79]. Accordingly, social robots can offer meaningful opportunities for self-managed interventions designed to support people's emotional health and well-being.

Another key finding in this regard has to do with people's feelings of loneliness. We found that over the course of the experiment, participants reported feeling significantly less lonely. Loneliness is both a risk factor for and a symptom of mental disorders, and it is a significant and growing public health issue with many comorbidities [117]. The recent COVID-19 pandemic stressed loneliness's tremendous effect on individuals' lives and society and highlighted the need for accessible intervention and support [118]. Social robots are often discussed as potential companions for people suffering from loneliness [see 119, 120, 121], especially concerning the Covid-19 pandemic [e.g., 63], with growing media attention [e.g., 122] and public initiatives [see 123]. Our results further support this using objective and systematic measures, showing that repeated interactions with a social robot reduced people's feelings of loneliness. This calls for further innovation and future research targeting loneliness as a public health issue using social robots.

5.5 Robots that Discuss Emotional Content Can Elicit Feelings

Consistent with previous results in single-session HRIs [3], the discussion frame did not affect people's self-disclosure toward the social robot or the way they perceived the robot or the interaction. However, our results do suggest that framing a discussion with a robot around a more emotional topic may elicit more emotional feelings among participants. This was specifically observed in this study with feelings of loneliness and stress. Our results showed that when Pepper addressed the COVID-19 pandemic, participants reported higher levels of loneliness and stress compared to participants in the general experiences discussion frame group. This important finding provides initial support for the notion that robots can trigger an emotional reaction through the interaction's content. When studying robots' affective capabilities, previous studies often address factors related to the robot's visual features (e.g., embodiment) or robotic functionalities (e.g., emotion recognition) [124]. Yet, studies aiming at developing and assessing social robotic interventions for well-being should also study the robot's ability to simulate human affect in different ways [see 124]. Our results highlight the role of content and framing when aiming to elicit human emotions and feelings during HRIs. They further show that robots can trigger complex emotions when addressing meaningful and personal moments and events. Nonetheless, our evidence here is based solely on two factors, loneliness and stress, in response to a single emotional frame (mentioning the Covid-19 pandemic). Thus, to further understand humans' emotional responses to social robotic stimuli, this should be studied with various feelings and emotions, across several settings, and in response to different frames. It is important to acknowledge that the limited differences observed between the conditions in this study may be attributed to ceiling effects. This could be because the study took place during the peak of the pandemic, causing participants to primarily focus on the consequences of the pandemic regardless of their assigned condition. Therefore, future studies could explore other approaches for manipulating emotional themes during experimental social interactions with social robots. While established emotion-elicitation techniques have long been used in human–human social interaction research [125], it may be challenging to seamlessly integrate them into HRI behavioural paradigms. Therefore, we believe that researchers interested in such questions should further explore these techniques and evaluate their capacity to evoke varying levels of emotion in social interactions with robots.

5.6 Methodological Contribution

Through the present research, we aimed to establish experimental methods that researchers from HRI, as well as from a number of related fields, including psychology, psychiatry, social work, anthropology, and computer science, might wish to use to further explore people's perceptions of a sociable, humanoid robot in natural everyday settings during prolonged conversational interactions. Beyond exploring general questions regarding how people engage with a social robot from their home settings and how it supports their well-being, the current research also provides a means to further examine the impact of novelty effects and of long-term social engagement with a robot on behaviour [c.f., 56]. Furthermore, this study can be replicated and tested with various populations, clinical and healthy, in order to understand how social robots could be introduced in different care settings and as interventions using speech-based interactions [c.f., 126]. By introducing this novel paradigm in detail here, and documenting results from a rigorous empirical study using this paradigm, we aim to provide a tool that we hope will be of use to the HRI research community more broadly, while also assisting with facilitating research rigour and reproducibility [1, 8, 127, 128], as well as the development of data-centric robotic models [c.f., 129, 130, 131]. Moreover, we would argue that the online computer-mediated means of human–robot communication used in this experimental design can overcome some of the challenges and barriers related to long-term HRI studies in natural, ecologically valid settings (such as the costs associated with sending individual robots home for an extended period of time with participants) and suggest alternative means for conducting HRI research in people's natural settings.

6 Limitations and Future Research

Our study has contributed valuable insights into the effects of long-term repeated interactions with a social robot on self-disclosure behaviour, perceptions, and well-being. However, several limitations should be acknowledged, which open up avenues for future research to deepen our understanding of HRIs and their potential applications in supporting emotional well-being.

6.1 Mediated Embodiment Limits Users’ Perception of Robots

Due to the mediated nature of the interactions, participants' perception of Pepper's embodiment and physical presence may have been limited. Conducting the study online enabled us to reach a larger and more diverse sample, enhancing the external validity of our findings while being cost-effective [132]. This method has proven valuable in generating insights and hypotheses that can later be examined further in real-life settings, enabling a more comprehensive understanding of the topic. Furthermore, while some previous studies argue for a moderating role of physical embodiment [133, 134], recent experimental studies that compared in-person interactions with mediated interactions involving social robots have reported no significant differences in participants' perception and behaviour [3, 135, 136]. Although online settings may not fully replicate real-life interactions with social robots, they provide an initial exploration of the potential effects of long-term interactions and allow us to examine the specific research questions we aimed to investigate. This is particularly significant due to the widespread adoption of CMC during the Covid-19 pandemic, which made online interactions more commonplace and therefore made our experiment more reflective of the prevailing social context [68]. The controlled environment of an online experiment also facilitated consistent conditions across participants and minimised confounding variables, which is essential for drawing reliable conclusions. While our study's outcomes offer significant benchmarks and valuable insights for future investigations conducted in real-life situations, they can also inform questions about the significance of robots' physical presence compared to the prolonged mediated interactions observed in our study. To address the limitation of generalizability, future research could incorporate real-life interactions with social robots to validate and extend our findings. By comparing outcomes from online and in-person interactions, researchers can gain insights into how embodiment influences the effectiveness of social robots as conversational partners.

6.2 The Absence of a Control Group

Another limitation of our study is the absence of a control group. Including a control group would have strengthened our ability to make direct comparisons and isolate the effects of the social robot [137]. However, we employed a mixed-factorial design with repeated interactions (i.e., measures), in which each participant served as their own control. This design provided a baseline for comparison and enabled us to examine changes within individuals over time through repeated interactions with the social robot [138, 139]. Moreover, the logistical challenges associated with recruiting and managing an additional control group influenced our decision to adopt this experimental design. Whilst the absence of a control group limits the strength of causal claims about the effects of the robot's presence, our repeated measures design provides valuable insights into the changes occurring over time in individuals' self-disclosure behaviour, perceptions, and well-being during long-term interactions with a social robot. Future research incorporating a control group and investigating additional variables or interventions alongside the social robot could provide a more rigorous comparison and further disentangle the specific impacts of the robot itself.

6.3 The Challenges of Measuring Self-Disclosure and Well-Being

The study primarily relied on self-reported and behavioural measures rather than physiological indicators to assess participants' well-being. While recognizing the value of incorporating physiological measures, the logistical constraints posed by conducting the experiment via Zoom and the long-term nature of the study required careful prioritisation of research objectives. Self-report measures were chosen as they are widely accepted for assessing subjective experiences and have been extensively used in well-being research [140,141,142]. Despite these limitations, the focus on self-report measures offers valuable insights into participants' subjective well-being experiences during long-term interactions with a social robot. Moreover, previous studies have found that self-reported measurements align with participants' objective behaviour [140,141,142], including in HRI research [3, 113] and with regard to self-disclosure behaviour [3]. Future studies can replicate the experimental design while incorporating additional physiological measures to further assess participants' well-being and emotional changes over time. This approach would enable researchers to build upon the findings from this study and explore the interplay between affective interactions with social robots, well-being, and objective physiological and behavioural indicators.

Further analysis of the disclosure content through automated and manual text analysis methods could provide deeper insights into the emotional content conveyed in participants' disclosures to the robot. Conducting a more comprehensive textual analysis using advanced computational techniques [see 143, 144] and manual evaluation [e.g., 145, 146, 147] could offer valuable insights into, and context for, participants' self-disclosures to the robot [93]. Although the study did not utilise manual coding of self-disclosure, the behavioural paradigm employed was specifically designed to encourage self-disclosure by asking participants questions about personal matters. Analysing both the full set of items and only the items related to the topic of disclosure provides a comprehensive view of the self-disclosure behaviour observed in the study. This approach allows for exploration of broader patterns and trends in self-disclosure and an examination of specific instances of personal disclosures within the defined topics. While manually coding each instance of self-disclosure may be challenging due to the substantial amount of data, using basic measurements such as duration and word count enhances reliability and consistency. These measurements objectively assess changes in participants' behaviour over time and correlate with personal and meaningful self-disclosures, as observed in previous research [95,96,97].

In future research, the open-ended questions incorporated in the study should be analysed using qualitative methods to deepen the understanding of human disclosures to social robots and explore subjective responses provided by participants [89]. Additionally, qualitatively analysing the content of the disclosures will allow for a more in-depth exploration of the nuances and frames within the self-disclosures. Furthermore, conducting secondary analyses to examine the effect of disclosure topics could provide valuable insights [c.f., 3]. This exploration could shed light on the effects of different topics and the order of presentation, contributing valuable knowledge to the field.

7 Conclusion

These results set the stage for addressing social robots as conversational partners in social settings, and for understanding how this type of verbal interaction could support people with emotional regulation by talking about stressors and well-being. The study provides crucial evidence for establishing relationships with robots and for their potential introduction as interventions supporting people's emotional health through encouraging self-disclosure. The findings also provide meaningful evidence regarding user experience, acceptance, and trust of social robots and other conversational agents [148, 149], highlighting how the perception of robots and behaviour towards them are closely related. These results hold several implications for assessing interactions as well as interventions with socially assistive robots, and for HRI research in general. Future research is encouraged to replicate and reproduce the current findings with different robots and different populations. In doing so, this will help to overcome the vast challenges and barriers related to long-term HRI studies in natural, ecologically valid settings.