Routledge Advances in Sociology

UNDERSTANDING CHINA
THROUGH BIG DATA
APPLICATIONS OF THEORY-ORIENTED
QUANTITATIVE APPROACHES
Yunsong Chen, Guangye He, and Fei Yan

Chen, He, and Yan present a range of applications of multiple-source big data
to core areas of contemporary sociology, demonstrating how a theory-guided
approach to macrosociology can help us understand social change in China,
especially where traditional approaches are limited by constrained and biased data.
In each chapter of the book, the authors highlight an application of
theory-guided macrosociology that has the potential to reinvigorate an ambitious,
open-minded, and bold approach to sociological research. These include social
stratification, social networks, medical care, and online behaviours among many
others. This research approach focuses on macro-level social processes and
phenomena by using quantitative models to statistically test for associations and
causalities suggested by a clearly hypothesised social theory. By deploying
theory-oriented macrosociology where it can best assure macro-level robustness and
reliability, big data applications can be more relevant to and guided by social theory.
An essential read for sociologists with an interest in quantitative and
macro-scale research methods, which also provides fascinating insights into Chinese
society as a demonstration of the utility of its methodology.

Yunsong Chen is Professor of Sociology at Nanjing University. He earned a
DPhil in sociology from Nuffield College, University of Oxford. His main
research interests lie in advanced quantitative methodology in sociology, social
capital, and big data in social science. He has published in Social Networks, British
Journal of Sociology, Social Science Research, The Sociological Review, Poetics,
Journal of Contemporary China, and leading Chinese journals.

Guangye He is Associate Professor in the Department of Sociology at Nanjing
University. Her research focuses on family sociology, social stratification, and
quantitative methodology in sociology. She has published in Social Science Research,
Chinese Sociological Review, China Review, and Journal of Contemporary China.

Fei Yan is Associate Professor of Sociology at Tsinghua University. He received
his PhD in sociology from the University of Oxford and completed a postdoctoral
fellowship at Stanford University. His research focuses on political sociology,
historical sociology, and sociology of development. His work has appeared in Social Science
Research, The Sociological Review, Social Movement Studies, Poetics, Urban Studies,
and Oxford Bibliographies in Sociology.
Routledge Advances in Sociology

310 Exploring Welfare Bricolage in Europe’s Superdiverse Neighbourhoods
Jenny Phillimore, Hannah Bradby, Tilman Brand, Beatriz Padilla and Simon Pemberton

311 The Home in the Digital Age
Antonio Argandoña, Joy Malala and Richard C. Peatfield

312 Coronavirus Capitalism Goes to the Cinema
Eugene Nulman

313 Suicide Social Dramas
Moral Breakdowns in the Israeli Public Sphere
Haim Hazan and Raquel Romberg

314 Understanding China through Big Data
Applications of Theory-oriented Quantitative Approaches
Yunsong Chen, Guangye He and Fei Yan

315 Transnationalism and the Negotiation of Symbolic Boundaries in the European Commission
Towards an Ever-Closer Union?
Daniel Drewski

316 Anxiety in Middle-Class America
Sociology of Emotional Insecurity in Late Modernity
Valérie de Courville Nicol

317 Boredom and Academic Work
Mariusz Finkielsztein

For more information about this series, please visit: https://www.routledge.com/Routledge-Advances-in-Sociology/book-series/SE0511
First published in English 2022
by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
and by Routledge
52 Vanderbilt Avenue, New York, NY 10017
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 2022 Yunsong Chen, Guangye He and Fei Yan
The right of Yunsong Chen, Guangye He and Fei Yan to be identified
as authors of this work has been asserted by them in accordance with
sections 77 and 78 of the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this book may be reprinted or
reproduced or utilised in any form or by any electronic, mechanical,
or other means, now known or hereafter invented, including
photocopying and recording, or in any information storage or retrieval
system, without permission in writing from the publishers.
Trademark notice: Product or corporate names may be trademarks
or registered trademarks, and are used only for identification and
explanation without intent to infringe.
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Cataloging in Publication Data
A catalog record for this book has been requested
ISBN: 978-0-367-75826-4 (hbk)
ISBN: 978-0-367-75825-7 (pbk)
ISBN: 978-1-003-16416-6 (ebk)
Typeset in Galliard
by Deanta Global Publishing Services, Chennai, India
Contents

List of figures
List of tables
Preface

PART I
Introduction

1 Bringing big data to quantitative macrosociology

PART II
Mapping public discourse and social stratification

2 Social stratification as a public discourse in China, 1949–2008

3 Public concerns about class immobility and economic inequality

4 Self-perceived social mobility and class solidification

5 Stratified nostalgia for the Chinese revolutions

PART III
Portraying social transformations and cultural practice

6 The international visibility of Chinese cities in modern times

7 The cultural determinant of foreign direct investment

8 The effect of cultural familiarity on inbound tourism

9 Coauthor networks in China’s humanities and social sciences

PART IV
Revealing public health and community wellness

10 Evaluating effect of PM2.5 exposure on suicidal ideation

11 Forecasting trends in prevalence and incidence of HIV/AIDS

12 Profiling the vaping epidemic and public favorites on e-cigarettes

13 Measuring public concerns in LGBT issues

References
Index
Figures

2.1 Changing trends of social class concern by macroeconomics, income inequality, political participation, and public opinion
2.2 The total value of the annual word frequency ratio in class- and stratum-related vocabulary
3.1 Rising search volume of immobility-related words on Baidu
3.2 Index of class immobility concerns of Chinese provinces (2008–2014)
4.1 Frequency of appearance of the phrase “class immobility” on Weibo (left), news webpages (middle), and online forums (right) in different provinces (2011–2016)
4.2 Monthly search volumes for “second-generation rich,” “second-generation official,” “second-generation impoverished,” and “second-generation poor” on Baidu (2010–2016)
5.1 IRN of the Chinese provinces, 2008–2014
5.2 IRN based on different types of red songs, 2008–2014
6.1 Top ten Chinese cities in the Google N-gram Corpus, 1700–2000
6.2 Top ten Chinese cities in New York Times, 1851–2000
6.3 z-Score of international visibility and media exposure of the Chinese cities, 1851–2000
6.4 z-Score of overall international visibility and media exposure of mainland Chinese cities, 1851–2000
7.1 International visibility of Chinese provinces (1900–2008)
9.1 Proportion of collaborative papers in the CSSCI Source Journal Catalogue (Chinese C journals) and top Chinese and foreign journals related to humanities and social sciences (2007–2017)
9.2 Collaboration networks of core journals in the humanities and social sciences (2007–2017)
9.3 Scale-free graph of node centrality and number of papers in collaboration networks
9.4 Collaborative network in economics
10.1 Weekly PM2.5 level and ISI in Chinese cities by season (2017)
11.1 Yearly HIV/AIDS incidence in China by province in 2009 and 2013
11.2 Standardized monthly HIV/AIDS incidence and BSI by province from January 2009 to December 2013
11.3 Standard error for PMG model
11.4 The true and predicted logCDC in six provinces with high HIV/AIDS concentration
11.5 The forecasted logCDC using seasonal ARIMA
11.6 The forecasted error by province
12.1 Mean index of electronic cigarettes search
13.1 Trend of LGBT-related content and associated sentiment in webpages (left) and Sina-Weibo feeds (right) in 31 provinces during 2011–2016
13.2 Index of searches for LGBT interest in 31 Chinese provinces (2009–2015)
13.3 Regional differences in LGBT interests in China
Tables

2.1 Statistics indicating vocabulary about class and stratum in Google N-gram (simplified Chinese) (1949–2008)
2.2 PCA result of stratum vocabulary
2.3 Granger causality test (1978–2008)
3.1 Descriptive statistics of variables of Chinese provinces related to class immobility concerns (2008–2014)
3.2 Dynamic panel regressions of ICC in Chinese provinces (2008–2014)
4.1 Statistics of the main variables in the model analysis
4.2 Multiple model regression results (dependent variable = ICC)
4.3 Robustness test model: measurement and dynamic panel model based on different ICC variables
5.1 The full list of red songs (search terms in the Baidu Index)
5.2 Descriptive statistics of variables of Chinese provinces related to the index of revolutionary nostalgia (2008–2014)
5.3 Dynamic panel regressions of IRN in the Chinese provinces, 2008–2014
6.1 Time series analysis on city international visibility (IV) and media exposure (ME)
6.2 Time series analysis on international visibility (IV), export trading (ET), and urban population (UP) in selected cities
7.1 Descriptive statistics of variables of Chinese provinces related to FDI
7.2 Dynamic panel regressions of FDI into Chinese provinces (1994–2004) (international visibility extracted from English books)
7.3 Dynamic panel regressions of FDI into Chinese provinces (1994–2004) (international visibility extracted from alternative corpora)
7.4 Temporal difference of the role of international visibility in FDI inflows (1994–2004)
7.5 Spatial difference of the role of international visibility in FDI inflows (1994–2004)
8.1 Descriptive statistics of variables related to the index of cultural familiarity
8.2 Dynamic panel regressions of predicting the effect of ICF on inbound tourism (1994–2004) (without controlling for NNV and TNV)
8.3 Dynamic panel regressions of predicting the effect of ICF on inbound tourism (1994–2004) (controlling for NNV and TNV)
8.4 Dynamic panel regressions of predicting the effect of ICF on foreign exchange earnings from tourism (1994–2004) (controlling for NNV and TNV)
8.5 Search terms for Chinese provincial-level regions
8.6 Dynamic panel regressions of FDI into Chinese provinces (1994–2004)
9.1 Comparison of paper collaboration in core journals in the humanities and social sciences from 2007 to 2017
9.2 Indicators of collaborative network in the humanities and social sciences from 2007 to 2017 (N = 8044)
9.3 Collaboration networks of core journals in the humanities and social sciences from 2007 to 2017
10.1 Factor loadings of depression search index in Baidu
10.2 Descriptive statistics of major variables related to the index of suicidal ideation
10.3 SGMM predicting suicidal ideation by season using ISI-1
10.4 SGMM predicting suicidal ideation by season using ISI-2
11.1 Descriptive statistics for variables related to HIV/AIDS incidence
11.2 Long- and short-run association of HIV/AIDS BSI and HIV/AIDS incidence
12.1 Descriptive statistics of selected variables related to the index of e-cigarettes search
12.2 Pooled OLS and fixed-effects model predicting ICS
12.3 Fixed-effects model predicting ICS2
12.4 Search terms for electronic cigarettes–related word
12.5 Construction of ICS1 using factor analysis
13.1 Descriptive statistics of key variables related to public interests in LGBT (2009–2015)
13.2 Dynamic panel models predicting interest in LGBT issues in China (2009–2015)
13.3 The SGMM results of different measures of LGBT interest
13.4 Search terms for LGBT-related words
Preface

This book, arguably even more than usual, is a product of its times. Far from
being a long-gestating project, the volume was conceived unexpectedly and in
haste during the outbreak of the novel coronavirus (2019-nCoV or COVID-19)
in early 2020. Like many of our colleagues, Guangye He, Fei Yan, and I found
ourselves living in lockdown conditions at home for months in order to avoid
infection, in downtown Shanghai and in suburban Nanjing. In this sense, the
book is one of the few positive by-products of the global pandemic. At the start of
the outbreak, I was preoccupied with a volunteer project to quantify anti-plague
risks for some 300 major cities in China using multisource big data. As the project
approached its end, I thought about what to do next. The rapidly evolving
conditions, particularly in China, gave me a new sense of motivation and urgency to
bring together several China-focused articles using big data and publish them in a
volume that would hopefully be more visible to scholars and policymakers, rather
than hidden in academic journals. I shared my plan and the proposed outline of
the book with both coauthors in a WeChat group one night in January 2020.
Their enthusiastic and immediate responses made it clear that I was not the only
one who felt the need to contribute meaningfully to the public discussions raging
around us. Faced with an onslaught of disease-related news, fake or true, we
felt frustrated, but increasingly determined to harness powerful data to describe,
explain, and even predict social transactions on a macro level in a more robust
way. As we watched in horror while Wuhan and other cities in China were ravaged by
COVID-19, our sense of obligation to make our research useful could never have
been stronger, pushing us to finish the project at an unheard-of pace.
In this book, we seek to (1) clarify what big data can add to quantitative
sociology in the present day and (2) show what a theory-oriented quantitative
macrosociology looks like by deploying big data to examine a range of social
processes in contemporary China. At the core of theory-oriented quantitative
macrosociology is the quantification of key social factors extracted from big data;
these quantified social factors are then subjected to macro-level theory testing
using conventional regression models. My coauthors and I are interested in how
big data can shift at least part of the focus of quantitative sociology from the
micro level to the macro level, where “bigness” can directly contribute to our
understanding of macro-social processes. For us, big data has powerful potential
and relevance for social science; used properly, it can link data, theory, and
computational methods more robustly and easily, thereby helping sociologists to
provide the world with more powerful arguments and greater influence. Despite the
advantages and value of this approach, however, big data macrosociology has, by
and large, been ignored by the extant big data literature in social science.
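The two-step workflow just described (quantify a social factor from big data, then test it in a macro-level regression) can be illustrated with a minimal Python sketch. It is a hypothetical illustration, not the authors' code: the per-10,000-users normalization, the (province, year) keys, the toy numbers, and the function names `search_index` and `within_slope` are our assumptions. The one-regressor fixed-effects (within) estimator is a simplified stand-in for the richer dynamic panel models named in the List of tables.

```python
from collections import defaultdict
from statistics import mean

def search_index(counts, users, per=10_000):
    """Hypothetical index: raw search counts per `per` internet users,
    keyed by (province, year)."""
    return {key: per * counts[key] / users[key] for key in counts}

def within_slope(y, x, groups):
    """One-regressor fixed-effects (within) estimator.

    Demeans y and x inside each group (e.g. a province) to remove
    time-invariant group effects, then returns the OLS slope of the
    demeaned y on the demeaned x.
    """
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    y_dm = [0.0] * len(y)
    x_dm = [0.0] * len(x)
    for idx in members.values():
        y_bar = mean(y[i] for i in idx)
        x_bar = mean(x[i] for i in idx)
        for i in idx:
            y_dm[i] = y[i] - y_bar
            x_dm[i] = x[i] - x_bar
    sxx = sum(v * v for v in x_dm)
    if sxx == 0:
        raise ValueError("x has no within-group variation")
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sxx

# Hypothetical usage: two provinces, three years each (made-up numbers).
keys = [("A", 2010), ("A", 2011), ("A", 2012),
        ("B", 2010), ("B", 2011), ("B", 2012)]
counts = dict(zip(keys, [120, 140, 160, 70, 90, 110]))
users = {k: 100_000 for k in keys}
index = search_index(counts, users)           # quantified social factor
y = [index[k] for k in keys]
x = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]            # hypothetical macro indicator
groups = [k[0] for k in keys]
print(within_slope(y, x, groups))             # → 2.0
```

Demeaning within provinces means the slope is identified only from change over time inside each province, which is why the two provinces' very different baseline levels do not bias the estimate in this toy example.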
The book is enriched by the collaboration with my coauthors, two diligent and
talented young sociologists with whom I have worked in recent years. Together,
we are committed to demonstrating the value of big data for core sociological
inquiries, with special attention to China, where we live and work. My
coauthors and I would like to extend our appreciation for the productive discussions
we have had with colleagues in China, Europe, and the United States.
First of all, we would like to thank Shuanglong Li, Senhu Wang, Jiankun Liu,
Buwei Chen, Ting Ge, Xiaoshan Lin, and Guodong Ju, who helped to write
some of the chapters and assisted with data cleaning and critical reviews. Special
thanks to Yu Xie, Andrew Walder, Peter Hedström, Michael Biggs, Xiaohong
Zhou, Yanjie Bian, Yi Zhang, and Xiaogang Wu for being wonderful mentors
to all of us. We have had the great fortune to be affiliated with several prestigious
academic institutions: the Hopkins Nanjing Center (Nanjing University–Johns
Hopkins University Center for Chinese and American Studies), the Institute of
Analytical Sociology at Linköping University, the Department of Sociology at
Nanjing University, the Department of Sociology at Tsinghua University, the
Shorenstein Asia-Pacific Research Center at Stanford University, the Chinese
Sociological Association, and the International Chinese Sociology Association.
These institutions have allowed us to focus on this project by providing
sufficient time and the most stimulating intellectual environments. Without their
generous support from the outset of this project, this book would never have
been written. The research presented in the book has also appeared in leading
journals including Social Science Research, The China Quarterly, and Urban
Studies, and has been supported by generous grants from the National Social Science
Fund of China. Ultimately, however, it is Simon Bates, our amazing editor at
Routledge, who deserves our greatest gratitude for his incredible patience and
support throughout this time. Thank you for helping us get this book to the
finishing line. By publishing as quickly as we have, we hope that the book will be
able to contribute to some of the most pressing questions and discussions in this
difficult time.
Yunsong Chen
Nanjing University
Part I

Introduction
1 Bringing big data to quantitative macrosociology

Introduction
Big data burst onto the scene of social science nearly a decade ago. The term,
which Manovich (2011) used to describe datasets too large to be stored and
analyzed with conventional software on personal computers, has become a
pervasive meme in fields as varied as business, sports, journalism, science, and public
health, entailing a near-universal pivot toward data-driven research, business, and
governance (Edelmann et al., 2020; Langlois, Redden & Elmer, 2015;
Mayer-Schönberger & Cukier, 2013; Veltri, 2017). The unprecedented scope and scale
of big data, together with the qualities it exhibits (variety, velocity, volume, and
value) as it digitally records the traces of social transactions, make it a
compelling subject for research into the “social world” (Kitchin & McArdle, 2016;
Savage & Burrows, 2007).
In the field of sociology, big data brings with it both high expectations and
heated debate. On the one hand, it represents an enormous new source of “digital
footprints” comprising individual actions and social transactions among billions
of people in real and historical time, along with a battery of new approaches
to collect, describe, and analyze them (Halford & Savage, 2017; McFarland,
Lewis & Goldberg, 2016; Watts, 2012). This unprecedented wealth of information
greatly raised expectations for its potential application to social science
research and scholarship, suggesting that the very foundation of empirical studies
in social science would be reconstructed (King, 2014).
Many scholars have pointed to the significance of big data in arming sociologists
with access to new research resources and opportunities. For example,
Lazer and Radford (2017) summarized five opportunities that big data can offer
sociologists, namely, accessing meaningful social behavior, monitoring social
phenomena, analyzing data on social systems, providing data for experiments,
and supporting data heterogeneity. Evans and Aceves (2016) surveyed
computational approaches for large-scale analyses of textual data, highlighting the use
of machine learning for theorizing the nature of collective attention, social
relationships, and communication lurking in enormous volumes of archives. Many
robust big data analyses have emerged in recent years, applying
multiple-source big data to diverse topics in core areas of contemporary
sociology. Overall, as Burrows and Savage (2014, p. 5) pointed out, “sociologists
need to be prepared to intervene in the world of Big Data in order to ensure we
command a voice in this new terrain.”
On the other hand, despite its promise, big data analytics in sociology has
two key limitations. One is that without the theoretically informed and
context-driven research that comes from domain expertise, the purely computational
approaches of big data analytics can cause research to devolve into speculative
data mining. For sociology, big data applications relying on black-box tools
conflict with the hermeneutic tradition that is at the core of the discipline (Kitchin,
2014a; Pasquale, 2015).
The other limitation of big data analytics is that despite its size, big data can
still be biased; the agents, applications, and devices producing and collecting the
data can themselves be either selective or manipulated. This points to the paradox
that despite its name, big data is likely to be either “small,” representing only a
subset of social transactions among particular demographics and thereby
capturing partial and/or fragmented information (McFarland & McFarland, 2015;
O’Brien, 2016; Park & Macy, 2015; Shaw, 2015); or “artifactual,” whereby
social forces, including censorship, political bots, and system errors, manipulate
the process of information production, leading to the proliferation of artifacts,
errors, and anomalies (see Lazer & Radford, 2017).
Sociology is now at a crossroads. Although pressured by burgeoning intellectual
forces, in particular those harnessing computational approaches and engaging
with big data, sociologists still lack a clear road map leading to effective
integration of big data analytics with contemporary sociology. Their resistance has much
to do with skepticism born of the deficiencies in approaches to big data. More
importantly, sociologists need to find some mode of study that can lead to
something more than fancy analytical tools and exciting results; we need tools
that lead to clear solutions, and we need templates for research that formally link
data, theory, and methodology in more robust, scientific, and sociological ways.
Put simply, we need to choose precisely where to insert big data into a range of
key facets of empirical sociology: whether it is best used to portray big
pictures, unveil hidden structures, test null hypotheses, or infer causality.
The answer is first to turn back to the data themselves and to ask not what
makes big data exciting, but rather which dimensions of sociology big data is
most aligned with. More precisely, can big data be a kind of macro-data? What is
big data’s advantage when compared with other solutions in sociological inquiry,
such as assembling survey data? In this chapter, we will address these concerns
and show that the empirical strength of big data can be expected to elicit the
emergence of a new type of research that we have so far largely ignored in the
territory of empirical sociology: theory-guided quantitative macrosociology.
For sociology, despite an initial surge of interest and a powerful residual
skepticism, big data has been expected to offer insights into each subfield of the
discipline, not only because each facet of our daily lives has been penetrated in real
time and over time by sophisticated big data apparatuses, but also because the
recorded social environment (the entirety of human behavior, interaction, and
thought) constitutes a panoramic data repertory that offers us a rare opportunity
to inspect society in an entirely new way. It is important to note that big data is
a composite of myriad transactions of myriad individuals. This reminds us that
despite early claims that the sheer size of big data can attenuate many of its flaws
and biases (Mayer-Schönberger & Cukier, 2013), ultimately it is not the size of
big data that matters but the ontological level of information that we can extract
from it. That is, we should critically interrogate available big data to harness its
strength at the macro level and from a macro perspective.
Theory-guided quantitative macrosociology has made notable inroads in its
integration of big data in macro-level analysis. This novel approach has the
potential to contribute to sociological studies by exploiting distant reading to get a big
picture of the sizable unread portions of the corpus, which cannot be achieved by
traditional qualitative approaches featuring close reading of selected archives or by
quantitative methods defined by model regressions on limited surveyed samples.
The rich spatial and temporal dynamics available through this line of research are
extremely promising.
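As a rough illustration of distant reading, the Python sketch below computes, for each year, the share of all tokens in a corpus that belong to a target vocabulary. This is in the spirit of the annual word-frequency ratio for class- and stratum-related vocabulary listed among the figures, but it is our own hypothetical sketch rather than the authors' procedure. It assumes the texts are already tokenized (Chinese text would first need word segmentation); the toy corpus and the name `annual_term_ratio` are made up for illustration.

```python
from collections import Counter

def annual_term_ratio(corpus_by_year, vocabulary):
    """Share of tokens per year that belong to `vocabulary`.

    corpus_by_year maps a year to a list of documents, each document
    being a list of (already segmented) tokens. Every document is
    counted, so no text is left "unread".
    """
    ratios = {}
    for year, docs in sorted(corpus_by_year.items()):
        counts = Counter(token for doc in docs for token in doc)
        total = sum(counts.values())
        hits = sum(counts[term] for term in vocabulary)  # missing terms count as 0
        ratios[year] = hits / total if total else 0.0
    return ratios

# Toy corpus (hypothetical tokens, not real data).
corpus = {
    1978: [["class", "struggle", "class"], ["reform"]],
    2008: [["stratum", "middle", "income", "growth"]],
}
print(annual_term_ratio(corpus, {"class", "stratum"}))  # → {1978: 0.5, 2008: 0.25}
```

Normalizing by the total token count per year, rather than reporting raw hits, keeps years with very different corpus sizes comparable.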

Data assemblage versus big data


Sociologists today are daunted by the same big questions that consumed
sociologists in the mid-twentieth century, including the relation between economy
and culture, the factors that lead to social inequality, and whether and why
social behaviors can be contagious. This is because when focusing on society
from an ecological or systemic perspective, no single information package is
sufficiently informative to capture the big picture over large temporal and spatial
scales. Consequently, to explore the configuration and regulation of sociocultural
environments, macro-sociologists tend to bypass quantitative methodology and
resort to abstract theoretical constructs, which in turn often invite criticism for
inducing tautology and ambiguity. While there are certainly some exceptional
macro-analyses using quantitative approaches, particularly some transnational analyses
in the traditional fields of sociology such as social stratification and inequality,
macro-analyses remain relatively rare compared with individual or micro-level
regressions, which are predominant in the arena of quantitative sociology, owing
to the availability of a vast number of well-designed social surveys and the lack
of data about macro-social indicators. This has cast a shadow across the entire
realm of macrosociology, despite the claim of self-sufficiency that macrosociology
shares with philosophy and the humanities.
There are two ways to tackle this problem. One, proposed by Halford and
Savage (2017), is “symphonic social science,” a new methodology that uses
data assemblages to test big theories. The other is big data itself, some inspiring
empirical applications of which have been introduced in sociological areas.
Because accessing and deploying various sources of surveyed sample data is
easier than harnessing big data, assemblage of survey data has a distinct
advantage; in fact, it can even be seen as a type of comparative analysis. Halford
and Savage (2017) argued that the symphonic research paradigm in effect
combines micro- and macro-level research and integrates information from
conventional surveys, regression statistics, and ethnographic and interview data under
the same framework. By exploring the contradictions and complementarities of
findings from diverse datasets, sociologists can pursue the understanding of major
social questions in a symphonic way.
Specifically, Halford and Savage (2017) used three well-known books to
illustrate symphonic social science research: Thomas Piketty’s Capital in the
Twenty-First Century (2013), Robert Putnam’s Bowling Alone (2000), and
Richard Wilkinson and Kate Pickett’s The Spirit Level (2011). The three works
similarly deployed large-scale heterogeneous data assemblages and repurposed
findings from multiple data sources instead of representative samples or
ethnographic case studies. The three books thus “relied on the deployment of
repeated ‘refrains,’ just as classical music symphonies introduce and return to
recurring themes, with subtle modifications, so that the symphony as a whole is
more than its specific themes” (Halford and Savage, 2017, p. 4). Compared to
conventional sociology using formal models and championing parsimony,
symphonic social science draws on a more aesthetic repertoire and sets more store
by prolixity.
Still, Halford and Savage (2017) conceded that symphonic projects are
time-consuming and that they require a significant workload and resources. The scope
of those projects also demands long-form presentation, such as books rather
than shorter works such as articles, to allow for the derivation of argument from
empirical and theoretical resources. More importantly, assembling conventional
survey data can only construct a data repertory containing information from
surveyed samples. This suggests that data assemblage improves only the scale of
data, not the informativity of data. In this regard, key factors of a macro-analysis
of interest, often characterized by large temporal and spatial scales, are very likely
to be unavailable in conventional survey datasets. Big data therefore matters more
for macrosociology.

Putting big data at the heart of macrosociology


Sociologists have long recognized the enormous potential of using big data to
dissect social processes and phenomena. In the last decade, especially over the past
five years, pioneering sociologists have endeavored to link theory, data, and
computational algorithms as a composite whole to gain sociological insight (Berman
& Hirschman, 2018). In this section, we group our review of empirical works
into two aspects of big data applications: how to operationalize core theoretical
constructs and map a big picture of sociocultural structures and trends;
and how to quantify a variable that is hard to measure using survey data,
for the sake of testing theories using conventional regression models. Although
both tasks are big-data-driven and theory-guided, the respective studies
are organized and presented in different ways. This divergence has largely been
ignored in present debates about big data’s application in social science.
Big data and quantitative macrosociology 7
Charting the sociocultural milieu for theorizing
For scholars and researchers determined to systematically examine the sociocul-
tural milieu as a composite whole, big data is an uncontested resource. Almost
all core constructs of macrosociology, such as social system, collective action,
discourse, field, expression, and contagion, lurk in colossal volumes of digital
archives, and many scholars have advocated mobilizing big data to help uncover
and measure sociocultural meaning in digitized and semantic archives (Bail,
2014; DiMaggio, 2015; Frade, 2016; Halford, Pope & Weal, 2012; Halford &
Savage, 2017; Lee & Martin, 2015; Mützel, 2015). For example, a special issue
of the journal Poetics was devoted to applying an array of topic models in
cultural sociology, tracing this tradition back to the content analyses
pioneered in the 1950s (Mohr & Bogdanov, 2013). The essence and strength of
large-scale textual analysis lie in its synthesis of conventional qualitative
methods and novel computational techniques for big data analytics (Bail, 2014;
Nelson, 2019), a synthesis that can be counted on to advance our understanding
of sociocultural processes.
As a result, cultural sociology is among the first sociology subfields to engage
with big data, and it has made substantial progress in harnessing computational
approaches: accessing huge volumes of unstructured data to measure sociological
meaning, strengthening the methodological capacity to empirically develop,
derive, refine, and test sophisticated theories of the social origins of meaning,
and exploring important theoretical constructs. Some have used a
range of topic models to reveal how social position and structure (e.g., gender,
organizations, and identities) work in shaping cognitive frames, discourse, and
social logics in cultural archives, including organizational publications, govern-
mental documents, academic journals, newspapers, and literature (Bail, 2012;
DiMaggio, Nag & Blei, 2013; Jockers & Mimno, 2013; Mohr et al., 2013).
Some have used large book corpora to map the temporal trends of tangible and
intangible sociocultural phenomena and entities over a period of hundreds of
years for a distant reading and comparison (Chen & Yan, 2016a, 2016b, 2018;
Chen, Yan & Zhang, 2017; Chen, Yan et al., 2020; Chen, He et al., 2020;
Guggenheim, 2014; Kozlowski, Taddy & Evans, 2019; Michel et al., 2011).
Others have uncovered the hidden links among cultural products, such as pub-
lished academic articles or music videos on YouTube or Twitter, to explore the
evolution of networks as a whole and to extend relevant theories (Airoldi, Beraldo
& Gandini, 2016; Foster, Rzhetsky & Evans, 2015; Goldenstein & Poschmann,
2019; Rzhetsky et al., 2015; Tangherlini & Leonard, 2013; Tinati et al., 2014).
These studies tend to provide an overview of the social processes of interest,
with the operationalization of theoretical constructs serving to chart the milieu
for theorizing. Sociological theory can be divided into two subsets: concepts
that trace social entities, and relationships that link and structure social
entities. Although theory testing, especially testing the relationship between
two social entities, remains central to quantitative research, big data can
augment this analytical focus and clarify social concepts and structures by also
“figuring out how to structure a mountain of data into meaningful categories of
knowledge” (Goldberg, 2015, p. 3). In this mode of sociological investigation,
sociologists with methodological expertise employ theorized concepts and
structures to direct the exploitation of big data’s richness. In turn, the data
direct further investigation and the process of interpretation and theoretical
derivation, just as Kitchin (2014b, p. 6) proposed: “Many supposed relationships
within data sets can be quickly dismissed as trivial or absurd by domain experts,
with others flagged as deserving more attention.”

Quantifying elusive indicators for theory testing


Two studies using textual analysis tools merit close inspection to show how big
data analysis can help theory testing. One is Jockers and Mimno’s (2013) study of
the themes of some 3,000 nineteenth-century works of fiction from the United
Kingdom and the United States, which used a topic model to reveal the topics of
historic literature. The other is Bail’s (2012) investigation of how fringe
anti-Muslim organizations influenced media discourse and became part of
mainstream media, which used discourse frames in the news media to quantify
certain variables for further theory testing after a distant reading of the
meaning of large volumes of text. In both studies, textual analysis served merely
as an instrument to quantify variables essential to the regression modeling that
formed the primary analysis.
Jockers and Mimno (2013) investigated the relationship between literary themes
and sociodemographic attributes, such as authors’ gender, using an assembled
corpus of 3,279 works of fiction from the United States and Great Britain
(including Ireland, Scotland, and Wales) from 1750 to 1899. They found that once
themes had been identified through topic modeling and assigned to each work,
some themes were dominated by authors of one gender, suggesting that men and
women might have chosen different themes in composing their fiction. For
example, the authors of works categorized under the theme “female fashion” were
mostly women, while the authors of works categorized as “enemies” were mainly
men (the gender ratio of a given theme can be computed by comparing the
proportions of words written by female and male authors that are assigned to
that theme).
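The parenthetical definition of a theme’s gender ratio can be made concrete. The following is a minimal sketch, not Jockers and Mimno’s (2013) code: it assumes each word token already carries a topic assignment and its author’s gender, and the function name and data layout are hypothetical. Normalizing by each gender’s total word count keeps a corpus with more male-authored text from making every theme look male-dominated.

```python
from collections import Counter


def theme_female_share(tokens, topic):
    """Female share of a theme, normalized by each gender's total word count.

    tokens: iterable of (gender, topic_id) pairs, one per word token, where
    gender is "F" or "M" (a hypothetical layout for illustration).
    Returns 0.5 when female and male authors devote the same proportion of
    their words to the theme; values near 1 (or 0) indicate a female- (or
    male-) dominated theme. Assumes both genders appear and the theme occurs.
    """
    totals = Counter()
    in_topic = Counter()
    for gender, topic_id in tokens:
        totals[gender] += 1
        if topic_id == topic:
            in_topic[gender] += 1
    # Proportion of each gender's words that are assigned to this theme.
    share_f = in_topic["F"] / totals["F"]
    share_m = in_topic["M"] / totals["M"]
    return share_f / (share_f + share_m)


# Toy corpus: 3 of 10 female-authored words and 1 of 10 male-authored words
# are assigned to the theme "fashion".
toy = (
    [("F", "fashion")] * 3 + [("F", "other")] * 7
    + [("M", "fashion")] * 1 + [("M", "other")] * 9
)
print(round(theme_female_share(toy, "fashion"), 2))  # 0.75
```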
However, to assert that a given theme has a skewed gender ratio, one needs more
information about the range of proportions of male and female authors that could
arise for this theme, because even if there were no underlying gender difference
in topic use, one would be unlikely to observe an exactly even (50:50) split. In
statistical terms, one needs to test the null hypothesis that there is no gender
distinction by estimating the probability of observing the gender difference
under chance alone. Therefore, having identified a range of topics as themes of
the works of fiction in the corpus, Jockers and Mimno (2013) further used
randomized permutation (shuffling the genders of the real authors to create a
virtual topic without a skewed gender ratio to compare with the original topic)
to show that the observed gender difference in topic use was unlikely to have
arisen by chance. In addition, a bootstrap method was used to calculate
confidence intervals. They even tested whether the topical proportions of any
given text could be used to predict author gender. As noted above, Jockers and
Mimno’s (2013) work stands out because they used the results of big data
analytics (namely, the themes identified by topic models) as a vehicle to
construct quantified variables for testing theories of interest through null
hypothesis testing.
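The permutation and bootstrap logic described above can be sketched as follows. This is an illustration under stated assumptions, not a reconstruction of Jockers and Mimno’s procedure: it uses a simpler statistic (the difference in mean theme proportion between works by women and by men) instead of their word-level ratio, and the function names and toy numbers are hypothetical.

```python
import random
from statistics import mean


def mean_diff(props, genders):
    """Mean theme proportion in female-authored works minus male-authored works."""
    f = [p for p, g in zip(props, genders) if g == "F"]
    m = [p for p, g in zip(props, genders) if g == "M"]
    return mean(f) - mean(m)


def permutation_p_value(props, genders, n_perm=5000, seed=0):
    """Two-sided p-value: share of gender shufflings whose gap is at least
    as large as the observed gap."""
    rng = random.Random(seed)
    observed = abs(mean_diff(props, genders))
    shuffled = list(genders)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between author gender and theme use
        if abs(mean_diff(props, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def bootstrap_ci(props, genders, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean_diff, resampling works within each gender."""
    rng = random.Random(seed)
    f = [p for p, g in zip(props, genders) if g == "F"]
    m = [p for p, g in zip(props, genders) if g == "M"]
    stats = sorted(
        mean(rng.choices(f, k=len(f))) - mean(rng.choices(m, k=len(m)))
        for _ in range(n_boot)
    )
    k = int(round(n_boot * alpha / 2))
    return stats[k], stats[n_boot - 1 - k]


# Hypothetical theme proportions for six works by women and six by men.
props = [0.12, 0.15, 0.11, 0.14, 0.13, 0.16, 0.02, 0.03, 0.01, 0.04, 0.02, 0.03]
genders = ["F"] * 6 + ["M"] * 6
print(permutation_p_value(props, genders), bootstrap_ci(props, genders))
```

Shuffling labels rather than proportions keeps the observed distribution of theme use intact and destroys only its association with gender, which is exactly what the null hypothesis asserts.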
Bail (2012), whose work is more theory-driven, asked whether and how small
associations (fringe civil society organizations) create cultural change to the
same extent that large associations do (as shown by previous studies). Within
this framework of empirical theory testing, Bail (2012) employed data assemblage
to construct measures of the media influence of 120 anti-Muslim fringe
organizations as the dependent variable and measured the distance of their 1,084
news releases from average mainstream media frames. Specifically, he used a
plagiarism-detection tool to compare each of the 1,084 press releases of the
fringe organizations with 50,407 newspaper articles from six national media
sources. The number of words reproduced verbatim or paraphrased in national
media was then used as a proxy for the media influence of anti-Muslim fringe
organizations, which served as the outcome, Y, in the regression model.
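The influence proxy described above rests on counting reproduced words. Bail’s actual plagiarism-detection tool is not specified here, so the following is only a hypothetical stand-in: it counts the words of a press release that fall inside any word n-gram also found in a set of newspaper articles. It captures verbatim reuse only and makes no attempt at paraphrase detection; the tokenizer and the default window length `n=5` are assumptions.

```python
import re


def words(text):
    """Lowercase word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def reproduced_word_count(release, articles, n=5):
    """Count words in `release` lying inside any n-word sequence that is
    reproduced verbatim in at least one of `articles` (verbatim reuse only)."""
    rel = words(release)
    article_ngrams = set()
    for art in articles:
        toks = words(art)
        article_ngrams.update(
            tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)
        )
    covered = [False] * len(rel)
    for i in range(len(rel) - n + 1):
        if tuple(rel[i:i + n]) in article_ngrams:
            for j in range(i, i + n):  # mark every word in the matched window
                covered[j] = True
    return sum(covered)


release = "The group warned that the new center would threaten the neighborhood"
article = "Critics said the new center would threaten the neighborhood, officials replied."
print(reproduced_word_count(release, [article]))  # 7
```

Marking covered positions instead of counting matched n-grams avoids double counting words that belong to several overlapping matches.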
Bail also manually coded the media releases of all civil society organizations
and identified five Muslim-related frames, before measuring X (the extent to which
the voices of anti-Muslim organizations were “fringed” from the voices of
mainstream civil society) by calculating the Euclidean distance between the five
frames in each press release and the average for all other civil society
organizations. After this painstaking textual analysis of assembled data to
construct reliable measures of the dependent variable and the main explanatory
variable, Bail fit a regression model on the 1,084 press releases as observations
to test the hypothesis, controlling for a constellation of factors such as
interorganizational networks, narrowness of mission, and displays of fear or
anger. Like Jockers and Mimno (2013), Bail (2012) positioned textual analysis as
part of the process of testing the hypothesis rather than at the core of the
research as a whole. Unlike Jockers and Mimno (2013), however, Bail (2012) tested
a well-derived, explicit theory and organized his paper in a way familiar to all
quantitative sociologists: he used a negative binomial regression model to see
whether and how the media influence of fringe organizations is shaped by their
position within the discursive field, as well as by various other factors.
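The distance measure for X can be illustrated with a short sketch. It assumes each press release has already been coded into a vector of frame prevalences (five frames in Bail’s coding) and, as a simplifying assumption, takes “the average for all other civil society organizations” to be the mean over all releases from other organizations; the function name and toy vectors are hypothetical.

```python
from math import dist  # Python 3.8+
from statistics import fmean


def fringe_scores(releases):
    """releases: list of (org_id, frame_vector) pairs with equal-length vectors.

    For each release, return the Euclidean distance between its frame vector
    and the mean frame vector of all releases from *other* organizations, so
    an organization is never compared against its own output.
    """
    scores = []
    for org, vec in releases:
        others = [v for o, v in releases if o != org]
        centroid = [fmean(col) for col in zip(*others)]  # column-wise mean
        scores.append(dist(vec, centroid))
    return scores


releases = [
    ("A", (1, 0, 0, 0, 0)),
    ("B", (0, 0, 0, 0, 0)),
    ("C", (0, 0, 0, 0, 0)),
]
print(fringe_scores(releases))  # [1.0, 0.5, 0.5]
```

The quadratic loop is adequate for about a thousand releases; a larger corpus would call for precomputing per-organization sums.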
Some recent studies have adopted this strategy to make the best use of big data
for sociological inquiries addressing core concepts beyond frames and topics,
such as social class, class awareness, socioeconomic interactions, and mobility.
For example, Chen and Yan (2016a) addressed an intriguing proposition that can
be traced as far back as the Marxist theory of class: the plausible association
between macroeconomic conditions, particularly inequality, and public
perceptions of social class. Despite the many arguments about the effects of
economic conditions, few empirical studies have used a formal model to explore
the link between economic conditions and class consciousness or class concern
over a span as long as a century; given the yawning wealth gaps in the United
States over the past 30 years, this is somewhat surprising (Piketty, 2013;
Piketty & Zucman, 2014).
But measuring people’s concerns about social class over a hundred years is not
easy. Survey questionnaires and online information, the common tools of
sociological research, can capture only the social attitudes of contemporary
subjects. To address this, Chen and Yan (2016a) used the Google Books N-gram
corpora to generate an index measuring concerns about class among the American
public between 1900 and 2000 by computing the normalized annual frequency of
class-related words in books published in the United States.
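Normalization matters because far more books were published late in the century than early on, so raw counts would rise even if concern did not. A minimal sketch, assuming yearly occurrence counts for a term and yearly total token counts are available as dictionaries; the per-million scaling and the function name are illustrative choices, not necessarily those of Chen and Yan (2016a).

```python
def normalized_frequency(word_counts, total_counts):
    """Convert raw yearly counts of one term into occurrences per million tokens.

    word_counts: {year: occurrences of the term}
    total_counts: {year: total tokens in the corpus for that year}
    Years with no recorded occurrences count as zero; years with a zero
    total are skipped to avoid division by zero.
    """
    return {
        year: 1e6 * word_counts.get(year, 0) / total
        for year, total in sorted(total_counts.items())
        if total > 0
    }


# More raw hits in 1901, but a lower relative frequency once corpus size is accounted for.
freq = normalized_frequency({1900: 50, 1901: 80}, {1900: 1_000_000, 1901: 2_000_000})
print(freq)  # {1900: 50.0, 1901: 40.0}
```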
Quantifying class concerns over 100 years is straightforward: glean the trends
(time series) of as many class-related words as possible over the twentieth
century from the book corpora, and then use principal components analysis to
construct an overall curve of word appearance that captures the trend shared by
all the class-word curves. Although this exhaustive approach to quantifying class
concerns is not error-proof, it offers sociologists a possible path to measuring
variables that cannot be measured using survey data. With class concerns
measured, Chen and Yan (2016a) used a Granger causality test to check whether the
economic misery index predicts the class-concern index, validating their
theoretical proposition in the traditional way, by testing a null hypothesis.
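The principal-components step can be sketched with NumPy. This assumes a matrix whose columns are the normalized frequency series of individual class-related words (rows are years); standardizing each column first, and orienting the component so the index rises with the typical word, are our illustrative choices rather than details reported in the text.

```python
import numpy as np


def first_pc_index(series):
    """series: array of shape (n_years, n_words); each column is one word's
    normalized frequency series. Returns the first principal-component score
    for each year, i.e. one curve summarizing the trend the words share.
    Assumes no column is constant (its standard deviation would be zero)."""
    X = np.asarray(series, dtype=float)
    # Standardize each word so high-frequency words do not dominate the index.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of vt are the principal axes; the first is the direction of maximal variance.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[0]
    # SVD signs are arbitrary: orient the axis so the index rises with the typical word.
    if loadings.sum() < 0:
        loadings = -loadings
    return Z @ loadings


# Three hypothetical word series that share one upward trend.
t = np.arange(10, dtype=float)
index = first_pc_index(np.column_stack([t, 2 * t + 1, 0.5 * t + 3]))
print(np.round(index, 2))
```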
In this vein, Chen, Yan et al. (2020) revisited the theory of the Werther effect
(copycat suicide) using a similar strategy. Their study is also theory-driven:
while most extant studies have suggested that the Werther effect operates through
news and electronic media, it is unknown whether, in contemporary cultural
contexts, books can play a similarly independent role in suicide suggestion.
This is of theoretical importance because the idea of the Werther effect was
originally derived from a famed late eighteenth-century novel, published at a
time when books were the main conveyor of cultural information, including
suicidal suggestion. To determine whether books, now only one component of a
mass media that includes film, TV, magazines, newspapers, and social media, could
have a similar effect in the twentieth century, Chen, Yan et al. (2020) exploited
the Google Books N-gram corpus to construct an index of suicides via books (ISB)
and examined the potential association between the ISB and actual completed
suicides in the United States between 1950 and 2000. To rule out the possible
influence of other mass media, they constructed corresponding measures using the
New York Times corpus and the Internet Movie Database (IMDb). The results of a
Granger causality test revealed that suicides in non-fiction rather than fiction
books significantly predicted actual suicide rates during the second half of the
twentieth century, even with extensive controls for suicides in other media and
for contextual factors. Through this novel application of massive content
analysis using data of unprecedented size, such big data research contributes to
the extant literature an understanding not only of the important role of media in
shaping suicide contagion but also of the dynamics of social behavior and related
cultural trends that may not be adequately measured using traditional social
science methods.
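Both studies rely on a Granger causality test. The idea is to ask whether past values of X improve the prediction of Y beyond what Y’s own past already provides. The sketch below computes the standard F statistic for that comparison with NumPy least squares; the lag length and simulated data are assumptions, and in practice one would typically use a library routine such as statsmodels’ `grangercausalitytests`, which also reports p-values. Granger causality establishes predictive precedence rather than causation, and the series are assumed to be stationary.

```python
import numpy as np


def granger_f(y, x, lags=2):
    """F statistic for H0: lags of x add no predictive power for y beyond y's own lags.

    Compares a restricted model (y on its own lags) with an unrestricted one
    (y on its own lags plus lags of x), both with an intercept.
    Returns (F, df_num, df_den); compare F with the F(df_num, df_den) critical value.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y) - lags  # usable observations after dropping the first `lags` points
    target = y[lags:]
    own = np.column_stack([y[lags - k:len(y) - k] for k in range(1, lags + 1)])
    other = np.column_stack([x[lags - k:len(x) - k] for k in range(1, lags + 1)])
    const = np.ones((n, 1))

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return resid @ resid

    rss_r = rss(np.hstack([const, own]))
    rss_u = rss(np.hstack([const, own, other]))
    df_num = lags
    df_den = n - (2 * lags + 1)  # observations minus parameters in the unrestricted model
    f_stat = ((rss_r - rss_u) / df_num) / (rss_u / df_den)
    return f_stat, df_num, df_den


# Simulated series in which y follows x with a one-period delay.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.zeros(200)
y[1:] = 0.8 * x[:-1] + 0.2 * rng.normal(size=199)
print(granger_f(y, x, lags=2))
```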
Another work using big data as the source for measuring social indicators is by
Askin and Mauskapf (2017), who exploited data collected by the Whitburn Project,
an online repository containing 26,800 songs from anthologies of Billboard’s Hot
100 charts between 1958 and 2016. After extracting the relevant information from
the Whitburn data to construct measures of song popularity (peak position and
weeks on the Billboard Hot 100) and musical genre crossover, they matched and
assembled these variables with a series of song-feature indicators (e.g.,
typicality, length, artists’ visibility) constructed from another big dataset,
“Echo Nest sonic feature data,” which encompasses feature information on some 30
million songs. They then fit pooled cross-sectional ordered logit and negative
binomial models, using song popularity as Y, the typicality of a song’s sonic
features as X, and other variables as controls. The results show that a song’s
typicality, or perceived proximity to peer songs, is non-linearly associated with
its position and longevity on Billboard’s Hot 100 charts. By painstakingly
assembling variables from multiple big datasets and revealing the relationship
between measures of song popularity and typicality, the authors provided
persuasive empirical support for their theory that a song’s relative position in
musical feature space can significantly influence its market success, enhancing
our understanding of “how content organizes product competition and audience
consumption behavior” (Askin & Mauskapf, 2017, p. 910).
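Typicality, “perceived proximity to peer songs,” can be illustrated as an average similarity in feature space. The sketch below is hypothetical: it treats every other song as a peer and uses cosine similarity on raw feature vectors, whereas Askin and Mauskapf’s (2017) operationalization may define peers, features, and weights differently.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def typicality(features, i):
    """Mean cosine similarity between song i and every other song.
    Higher values mean the song sits closer to its peers in feature space."""
    others = [f for j, f in enumerate(features) if j != i]
    return sum(cosine(features[i], f) for f in others) / len(others)


# Hypothetical two-dimensional sonic features for three songs.
features = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print([typicality(features, i) for i in range(3)])  # [0.5, 0.5, 0.0]
```

Because the reported association is non-linear, a regression using such a score would typically include a squared typicality term alongside the linear one.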
In addition to the aforementioned national- or societal-level analyses, this
strategy has also been used for regional analyses at the city and province
levels. For example, Chen and Yan (2018) quantified the international visibility
of Chinese provinces by extracting instances of provincial names appearing in
English-language books, and fit panel models to support the hypothesis that
visibility matters for foreign direct investment inflows. Li and Yan (2020) used
Baidu search volumes for revolutionary songs as a proxy for regional
revolutionary nostalgia and fit panel models to show whether and how nostalgia as
a local ideology is shaped at the provincial level in China. Likewise, Chen, He
et al. (2020) explored the relationship between exposure to air pollutants and
depression in China through panel data analysis, correlating weekly
provincial-level online search volumes for depression-related symptoms and
medicines with average weekly provincial ambient air quality, as measured by the
concentration of fine particulate matter 2.5 µm or less in diameter (PM2.5),
while controlling for a battery of provincial socioeconomic factors. In much the
same way, He et al. (2018) quantified prefecture-level online searches for
AIDS-related terms and used them as the dependent variable to describe demand for
AIDS medication among the Chinese and the local socioeconomic factors shaping
that demand.
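The panel models in these studies typically include unit fixed effects, so that stable differences between provinces (size, internet penetration, baseline mental health) do not masquerade as effects of the regressor. A minimal sketch of the one-way fixed-effects (“within”) slope for a single regressor follows; the data are invented, and real analyses would add controls, time effects, and appropriate standard errors, usually through a dedicated panel-regression library.

```python
from collections import defaultdict
from statistics import fmean


def within_slope(rows):
    """One-way fixed-effects (within) slope for y on a single regressor x.

    rows: iterable of (unit, y, x), e.g. (province, weekly search volume, weekly PM2.5).
    Demeaning y and x within each unit removes time-invariant unit differences
    before the ordinary least-squares slope is computed.
    """
    rows = list(rows)
    by_unit = defaultdict(list)
    for unit, y, x in rows:
        by_unit[unit].append((y, x))
    means = {
        u: (fmean(y for y, _ in obs), fmean(x for _, x in obs))
        for u, obs in by_unit.items()
    }
    num = den = 0.0
    for unit, y, x in rows:
        my, mx = means[unit]
        num += (x - mx) * (y - my)
        den += (x - mx) ** 2
    return num / den


# Two hypothetical provinces with very different baselines but the same slope.
rows = [("A", 12, 1), ("A", 14, 2), ("A", 16, 3),
        ("B", 70, 10), ("B", 72, 11), ("B", 74, 12)]
print(within_slope(rows))  # 2.0
```

Pooling the same six rows without demeaning gives a slope of about 6.3, because the between-province gap in baselines is then wrongly attributed to x.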
The above big-data-based analyses of social processes and transactions at the
local level can be seen as members of the family of theoretically oriented
quantitative macrosociology. These approaches exploit big data to measure an
elusive aggregate-level attribute and then use it as a key variable to test a
hypothesized social theory at the aggregate level, although the analytic level,
whether province or prefecture, is not as “macro” as the national level of
typical macro-research, as exemplified by Chen and Yan (2016a) and Chen, Yan et
al. (2020).

Toward a theory-guided quantitative macrosociology


Sociology examines how social environment and context affect how people think
and act. Since human society comprises individuals interacting in social
relations, sociological study is naturally conducted at both macro- and
micro-levels, focusing on broad, large-scale features of social processes as well
as on individual or small-group social interactions. Although macro-level study
has always been central to the field, which is deeply rooted in the founding work
of figures such as Émile Durkheim and Max Weber, since the mid-twentieth century
it has been micro-level sociology, along with its pivotal theoretical frameworks
(such as symbolic interactionism) and its statistically rigorous techniques (such
as regression), that has predominated. In this regard, sociology has been an
incompletely dimensioned social science. After all, neither on-the-ground
analysis nor individual samples from social surveys can help sociologists make
powerful macro-level inductions and deductions about social entities larger than
individual lives.
However, quantitative approaches in macrosociology have been hampered by the
difficulty of accessing macro-level data. The lack of data and macrosociology’s
strong theoretical tradition are mutually reinforcing, and the difficulty of
performing data-based tests has left macrosociology at a disadvantage. In this
regard, one contribution of big data to quantitative research is that the
unprecedentedly large temporal range and spatial coverage of novel data provide a
rare opportunity to construct valid measurements of macro-social indicators that
are often impossible to measure using conventional survey methods. However,
because sociological applications of big data still rely on conventional
regression approaches rather than exploring novel techniques (e.g., fancy
visualization or machine-learning algorithms), the potential of this contribution
has been largely unrealized. These applications thus appear to be trapped between
conventional regressions on survey samples and a brand-new paradigm resting fully
on big data, even though their main explanatory or dependent variables are indeed
constructed from novel sources. Put simply, proponents of big data do not see this
kind of application as a genuine big data approach, while followers of
conventional methods regard it as a valid alternative.
We argue that although this type of research can indeed be seen as a hybrid,
mingling conventional regression with new variables derived from big data, it is
not a compromise. Rather, it offers a new method for empirically testing various
hypotheses, especially macro-level theories, which are often hard to examine
statistically. The advantage of big data in these studies is striking, because
sampled data from conventional survey techniques cannot provide valid measures of
core social variables of interest (e.g., people’s class concerns or exposure to
suicide in books), which are necessary for reflecting grand arguments and
classical theories at the macro-level.
What defines a theory-guided quantitative macrosociology? We argue that it is a
type of sociological research that focuses on macro-level social processes and
phenomena and uses quantitative models to statistically test for the potential
social associations and causality suggested by a clearly hypothesized social
theory. Three features define this type of research. First, the aim of this genre
of quantitative macrosociology goes beyond using big data to describe a social
process or to identify a structure or trend lurking within it. Rather,
theory-guided quantitative macrosociology seeks to verify proposed sociological
hypotheses by examining how a social process unfolds and is shaped by an array of
macro-social factors. In this line of research, formal statistical models are fit
either to samples or to the population. For samples, results can be generalized
to the population as long as the samples are representative. For populations, no
statistical inference is needed to generalize the findings. Theory-guided
quantitative macrosociology differs from data-mining approaches in that it builds
on preexisting theories; it must be guided by theory, and it is always framed as
hypothesis testing that answers “how” and “why” questions rather than “what”
questions (Luo et al., 2019; Mayer-Schönberger & Cukier, 2014). As a result, it
does more than chart the structure and trends of the social milieu.
Second, the analytical level of this genre of quantitative macrosociology is one
of aggregation (for example, the global, societal, or national level) and lies
above the individual level. As a result, it deals with themes linking one
aggregate factor to another. For instance, to explore whether and how happiness
in developed countries is associated with exposure to ambient air pollution, a
quantitative microsociology approach would use individual samples from developed
countries, with questionnaires surveying each respondent’s level of happiness,
together with air quality measurements for his or her neighborhood or county. The
model, therefore, should be multilevel, predicting each individual’s subjective
well-being. In contrast, a quantitative macrosociology perspective would focus on
city-level or nation-level correlations, treating cities, nations, or even the
whole developed world as analytical units. When studying a city-level
association, for example, panel data could be built by combining average
indicators of emotional status, extracted from online content for each city using
textual analysis, with average citywide PM2.5 levels from weather information
archived on relevant websites. To assess the panorama, time series of daily
emotional state and air quality data could also be analyzed. Clearly, because the
number of analytical units is limited, aggregated time series and panel data are
the main forms of macro-data.
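The city-level example implies an aggregation step: individual posts, each scored for emotional tone, are collapsed into city-day averages and matched with air-quality readings. A minimal sketch under stated assumptions: sentiment scores are taken as already computed by some textual-analysis tool, PM2.5 values are assumed to arrive keyed by (city, date), and all names and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def city_day_panel(posts, pm25):
    """Aggregate post-level sentiment into a city-day panel matched with PM2.5.

    posts: iterable of (city, date, sentiment_score), one entry per post.
    pm25: {(city, date): average PM2.5 reading}.
    Returns sorted rows (city, date, mean_sentiment, n_posts, pm25) for every
    city-day observed in both sources.
    """
    scores = defaultdict(list)
    for city, date, score in posts:
        scores[(city, date)].append(score)
    panel = []
    for key in sorted(scores):
        if key in pm25:  # keep only city-days present in both sources
            city, date = key
            vals = scores[key]
            panel.append((city, date, fmean(vals), len(vals), pm25[key]))
    return panel


posts = [
    ("Beijing", "2020-01-01", 0.2),
    ("Beijing", "2020-01-01", -0.4),
    ("Shanghai", "2020-01-01", 0.5),
    ("Shanghai", "2020-01-02", 0.1),  # dropped: no PM2.5 reading for this day
]
pm25 = {("Beijing", "2020-01-01"): 150.0, ("Shanghai", "2020-01-01"): 40.0}
for row in city_day_panel(posts, pm25):
    print(row)
```

Keeping the post count alongside each mean lets later models weight or filter city-days that rest on very few posts.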
Third, the key variables constructed from big data for this genre of quantitative
macrosociology cannot be obtained through conventional methods, including sample
surveys or in-depth interviews. Specifically, three types of variables are hard
for conventional methods to access: (1) social indicators of large temporal or
spatial scale, such as public concerns about social class in the United States
over the twentieth century (Chen & Yan, 2016a) or suicides in American books in
the second half of the twentieth century (Chen, Yan et al., 2020); (2) social
indicators at regional levels that are unavailable in regional statistical
publications, such as the international reputation of a Chinese province in the
Western world (Chen, Yan & Zhang, 2017; Chen & Yan, 2018); and (3)
aggregate-level social indicators that people often decline to report in
conventional social surveys, such as ideological preference in a region (Li &
Yan, 2020) or demand for AIDS medication in Chinese provinces (He et al., 2018).
This is especially important for sociological studies in authoritarian states and
developing societies, where key information for social studies is subject to
suppression or lacks official records and administration.
Theory-oriented quantitative macrosociology can be expected to change soci-
ology in fundamental ways, helping sociologists better understand the relation-
ships among macro-social indicators and inspiring new macro-level theories. By
embracing the paradigm of theory testing with the help of statistical inference
and data analytics, macrosociology becomes more empirical and scientific. By
deploying theory-oriented quantitative macrosociology where it can best assure
macro-level robustness and reliability in sociology research, big data applications
become more relevant to and guided by sociological theory. And by introduc-
ing big data–related approaches and tools, as well as underutilized models and
presentation forms such as time series regression, panel analysis, and visualization
tools, theory-oriented quantitative macrosociology makes a vital methodological
contribution to sociology.

Causality, ecological fallacy, and macrosociology rejuvenation

A typical quantitative sociological analysis takes the form of regression on
samples or, more specifically, hypothesis testing using appropriate statistical
models on individual samples from a designed survey. In addition, with the
introduction of counterfactual frameworks and concerns about endogeneity,
quantitative sociologists are turning more frequently to sophisticated methods
such as panel analysis, instrumental variables, propensity score matching,
regression discontinuity, and difference-in-differences (Burrows & Savage, 2014;
Mouw & Verdery, 2012; Tinati et al., 2014; Winship & Morgan, 1999). We believe
that big data, especially big data at the aggregate level, can make three
significant contributions to these approaches and to our understanding of
causality in sociological analysis.
First, the sheer depth and uniqueness of information available through macro-
level big data enable seemingly naïve quantitative vehicles (e.g., descriptive tables,
graphs, maps) to open new paths of sociological inquiry, uncovering hidden social
structures, correlations, and intriguing trends and offering candidate interpreta-
tions. As a result, big data applications—in particular, theory-guided quantitative
macrosociology—have the capacity to uncover and inspire new causality-bearing
sociological inquiries.
Second, robust and scientific explanations, even causal interpretations, can be
carefully derived through a skillful weaving of varied data, recurring simple
statistical patterns (e.g., associational regression), and rich theoretical
awareness, using the symphonic approach described by Halford and Savage (2017),
much like the mode of induction and deduction seen in Piketty (2013), Putnam
(2000), and Wilkinson and Pickett (2011). In this regard, big data analytics in
quantitative sociology can also be causality-oriented and make powerful arguments
about social change by harnessing diverse data and repeating simple correlation
analyses within a clear theoretical framework. In fact, for both data assemblage
and big data, correlation merely “displaces rather than replaces causality, and
the weight of causal claims is shifted from inferential statistics to sociological
concepts and theories which link together recurring motifs into a symphonic
narrative” (Halford & Savage, 2017, p. 8). Taking these two points together, the
strength of big data for quantitative sociology lies in its combination of large
size, macro-level origin, and innate capacity for repurposing each part of its
immense data sources. By commanding panorama-level arguments with big data,
sociologists can anticipate an exciting renewal of the sociological imagination,
which has long been lacking in quantitative analysis.
Third, if macro-level analysis using big data can be integrated with micro-level
analysis using survey data, sociologists can help establish causality by
addressing the long-standing problem of the ecological fallacy, which refers
mainly to the seeming paradox that the relationship between two social indicators
can differ depending on whether it is observed at the micro- or the macro-level.
More precisely, for quantitative sociology, a robust causal claim can only be
achieved by simultaneously examining the potential causal relationship at both
the micro- and macro-levels. This is because causal relations at the individual
level must have consequences at the macro-level, whether through emergence or
transition. For example, to examine whether the role of religiosity in improving
happiness is causal, we must first turn to regression analysis at the
micro-level, using surveyed samples to establish individual-level causality. In
this case, causal-inference methods such as instrumental variables or
difference-in-differences can help identify causality. To reinforce the causal
inference regarding the link between religion and happiness, one can further
resort to big data analysis at the macro-level, associating average city-level
happiness with average city-level religiosity, two aggregate factors that can be
measured with big data analytics, for example by extracting key information from
online platforms. A city-level panel analysis could therefore help establish
causality at the city level. Once the relationship has been causally verified at
both the individual and the city level, we can be more confident about its
robustness. If it cannot be causally verified at both levels, we can dig deeper
to see whether an ecological fallacy is at work, which also helps sociologists
better understand the association being investigated.
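A toy illustration of how a relationship can reverse across levels may make the ecological fallacy concrete. The numbers below are entirely synthetic and do not come from any study; x and y stand in for any two indicators (for example, religiosity and happiness). Within each city the association is perfectly negative, yet across city averages it is perfectly positive.

```python
from math import sqrt
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def within_and_between(groups):
    """groups: {city: [(x, y), ...]} individual-level observations.
    Returns (pooled within-city correlation after demeaning by city,
             correlation between city means)."""
    dx, dy, mx, my = [], [], [], []
    for obs in groups.values():
        gx = fmean(x for x, _ in obs)
        gy = fmean(y for _, y in obs)
        mx.append(gx)
        my.append(gy)
        dx.extend(x - gx for x, _ in obs)  # individual deviations from the city mean
        dy.extend(y - gy for _, y in obs)
    return pearson(dx, dy), pearson(mx, my)


# Synthetic data: negative within each city, positive across city means.
groups = {
    "city_a": [(1, 3), (2, 2), (3, 1)],
    "city_b": [(4, 6), (5, 5), (6, 4)],
    "city_c": [(7, 9), (8, 8), (9, 7)],
}
print(within_and_between(groups))  # (-1.0, 1.0)
```

Running both analyses and comparing their signs is a cheap diagnostic. A disagreement like this one signals that individual-level and aggregate-level mechanisms differ and that neither result should be read as the other.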

Understanding changing China through big data


China has always been an exciting site for sociological inquiry, given its vast
territory and population and its unique political and cultural context. However,
the quality of China’s published statistics is open to serious question.
Statistics on key issues are collected by local bureaus of statistics at
different levels of government, so errors and exaggerations in the data are
aggregated up to the national level. Moreover, the bureaus of statistics tend to
focus on collecting economic data, leaving social and cultural indicators of
interest largely unavailable from official platforms (for instance, no official
statistics are available on local people’s happiness, trust, depression, suicide,
or guanxi networks). Consequently, the lack of reliable statistical data can not
only lead to embarrassing misinterpretations of the social problems of Chinese
society but also hurt the capacity of China’s social elite to obtain an unbiased
view of what is going on in the country (Heimer & Thogerson, 2006).
Existing big data studies of China typically concentrate on one specific area or
topic, such as urban planning, regional development, or scientific advancement.
For example, Li (2017) uses big data and related data mining technologies to review
urban planning in China. Wu and Wang (2020) examine the geographical implications
of mobility, well-being, and development within and across Chinese cities from
the perspective of location-based big data.
In this book, we exploit the richness of big data to investigate social processes
and cultural practices in various areas in China, ranging from traditional focuses
of sociology, such as social stratification, ideological transformation, cultural
discourse, social networks, and social mobility, to expanded spheres of social
transactions and processes, such as public health, online behaviors, and the development
of scientific knowledge. Each chapter highlights an application of theory-guided
quantitative macrosociology that has the potential to reinvigorate the ambitious,
open-minded, and bold research culture that had driven sociology since the
middle decades of the twentieth century.
In particular, Part II investigates how massive, culture-wide content analysis
using data of unprecedented size can help unpack the changing patterns of
social stratification and people's conception of social class. Specifically, this
part includes four chapters. Chapter 2 examines the changing modes of social
stratification and their influence on Chinese public discourse from 1949 to 2008.
Chapter 3 explores growing public concerns about social immobility from
2008 to 2012. Chapter 4 further investigates the relationship between public
concerns over class solidification and individuals' perception of their own class
mobility in contemporary China. Chapter 5 provides the first large-scale portrait
of revolutionary nostalgia among the Chinese, undertaking an empirical analysis
of how the aggregate level of nostalgia is stratified across provinces.
Part III investigates how big data analytics can help explain social
transformations and cultural changes at the macro-level and over long temporal
and broad spatial scales. This part includes four chapters. Chapters 6 and 7 mine the
Google Books N-gram corpus to examine the historical formation of the global fame
of Chinese cities and how such global fame influences the investment decisions
of foreign firms, with spatial and temporal dynamics. Chapter 8 studies whether
tourists' cultural familiarity with a given destination has a significant impact
on inbound tourism to China. Chapter 9 constructs the coauthor
networks in various subfields of the humanities and social sciences in China from
2007 to 2017.
Part IV investigates how big data analytics can help unpack long-term temporal
links between real-world events and aggregate social phenomena, including
public health, community wellness, and gender issues. Specifically, this part
includes four chapters. Chapter 10 examines how environmental risks such as
PM2.5 exposure have affected the local weekly index of suicidal ideation in China.
Chapter 11 uses online query volume data to predict the prevalence and incidence
of HIV/AIDS in China. Employing the same method, Chapter 12 explores the
vaping epidemic and public preferences for e-cigarette products across
provinces in China. Chapter 13 further presents the first representative and
longitudinal portrait of public interest in LGBT-related issues across China from
2009 through 2015.
Overall, we believe that big data applications are of special importance in
studying authoritarian states such as China, since they help researchers access new and
valuable information that is otherwise unavailable or difficult to uncover with
conventional approaches in a constrained and censored context.

Conclusion
We are aware that big data approaches alone, just like statistical methods alone,
will not make the most of these data; our call for greater engagement with both
computational technology and sociological expertise, following Halford and
Savage (2017), is based on this awareness. Yet we also know that neither
sociological imagination nor theoretical awareness alone is sufficient for studying social
processes in this digital era of big data. By engaging quantitative macrosociology
and big data aesthetics, we hope to build a credible and robust voice to advocate
for big data analytics. Meanwhile, theory-guided quantitative macrosociology is
certain to allow sociologists to fulfill the promise of big data for sociology
without compromising the tradition of and commitment to theoretical imagination,
critical thinking, and rigorous methodology.
Although theory-guided quantitative macrosociology has its limits, it also
provides a feasible path for empirical sociologists to use big data to link theory, data,
and computational approaches in a way that is both convenient and familiar,
and for which researchers have been specifically trained. Moreover, we envision
macrosociology as an audacious move into big data and new methods. This
move has ontological and epistemological implications: if we change the source
of information and the tools for analysis, we change the object of knowledge
(Boyd & Crawford, 2012). Such a change makes sociology both more robust
within its social science context and more influential beyond the academy.
Furthermore, we stress that the unfamiliar and unconventional protocols of
data production inherent in big data analytics cannot be excuses for rejecting
big data. Echoing Halford and Savage (2017), we must resist the tendency to be
complacent and conservative, resting on stale late twentieth-century relationships
between data, method, and theory. Rather, we must recognize that big data is a
new gateway to a powerful, multidimensional, and multilevel new integration of
theory, data, and method. It is not the bigness of big data that holds its great-
est potential, but rather the unfolding revolution in sociological thinking and
imagination that is unleashed by big data that holds the most promise for new
understandings of social process and context.
Still, we should note that while we confirm the great potential of big data
applications in macrosociology, big data research in the field is still greatly
hampered by its relative inaccessibility to the very social scientists for whom the
tool would be most useful. Faced with big data analytics' bewildering array of
fast-changing computational technologies and its reliance on non-human
apparatuses, these researchers tend to reject the new approach in favor of more
conventional methods, such as manually coding small-scale archives or performing
regressions. These barriers have limited big data's proliferation to a small circle
of scholars able to navigate big data’s complexities. In this regard, one of the
preconditions for acceptance of big data analytics in sociological research is the
lowering of this threshold of accessibility such that even a student with general
training in sociological methodology could use it. This requires the
availability of more powerful personal computers, more user-friendly compu-
tational software with extensive analytical tools, and more structured or even
second-hand big data. Finding a middle path between adherence to traditional
approaches to sociological research and adoption of the new resource of big data
is critical.
Part II

Mapping public discourse and social stratification

2 Social stratification as a public discourse in China, 1949–2008

Introduction
Since the opening-up reforms of 1978, the vicissitudes of social structure have
introduced numerous challenges and problems, thereby propelling the study of
China's social stratification. During the preliminary stage, researchers focused
on describing and analyzing the objective stratum framework, its structural
characteristics, and its flow mechanisms (Li, 1993). Since the late 1990s, however,
stratum consciousness has become an important area of study. Several scholars
have attempted to further examine the influence of macro-social structure at the
micro-level by reviewing the subjective assessments that individuals or groups make of
their social and economic status (Chen & Fan, 2015). These studies have
thoroughly investigated the changing track of China's social structure, the evolution
of interest relationships, and the underlying structural logic before and after
1978. They have also created a new dimension of research focusing on
social structural change from the perspective of individual cognition.
However, room remains for the study of social stratification from a subjective
perspective. First, due to limited historical data, the current literature on stratum
consciousness lacks a panoramic description of social stratum consciousness over
time and has analyzed only individuals' recent stratum positioning in terms of
market transformation. Second, explanations of the changing form and
mechanisms of stratum consciousness mainly derive from individual factors, such
as objective, relative, and changing socioeconomic status (SES). Although
some recent literature has begun to debate the association between subjective
strata and income inequality (Chen & Fan, 2016), the discussion of macro-factors
is not yet fully fledged. Third, previous studies of stratum consciousness have
investigated how individual citizens understand their own and others'
socioeconomic status but have failed to identify the origins of this cognition and how it
emerged as a discourse defining social structure.
In fact, before and after the 1978 opening-up reforms, the nature of China’s
social structure in public discourse had transformed from “class” to “stratum.”
This change not only related to adjustments in China's political and economic
system but also reflected a power shift, in the discourse of social structure, between
state will and conventional wisdom in the context of systemic transformation.
22 Social stratifcation as a public discourse
Although several political scientists have discussed this issue, its significance for
the study of social stratification has not been fully investigated or verified with
solid evidence.
This chapter extends previous studies on stratum consciousness to overcome
this deficiency by raising two fundamental questions. First, we explore whether
the definition of social stratification in public discourse has shifted from “class”
to “stratum” since 1949, asking from a historical perspective what roles the state
and the public have played in this shift. Second, we ask what the intrinsic
connection is between the macro-political and economic effects of structural reform
and the shift in discourse about social structure. Answering these questions
requires, first, analytical data of sufficient scale that are well represented in time
and space and, second, an explanatory framework adapted to the unique societal
transformation in China. By interrogating big data with this analytic logic, we
answer these questions and offer an explanation based on the Chinese
experience, thereby introducing macro-level cases to the social stratification
literature.

Literature review
During the early stage of stratum consciousness studies, researchers attempted to
depict the overall characteristics of stratum structure through individuals’ cog-
nition of their social status. Many empirical studies, whether from developed
countries in Europe and America or from Eastern Europe and South Asia, indi-
cated that most people have a clear consciousness of their “stratum” (Jackman &
Jackman, 1983; Evans & Kelley, 2004; Shirahase, 2010). When asked about the
potential influence of their social and economic backgrounds, most people tend
to consider themselves as members of the middle class (Evans, Kelley & Kolosi,
1992). However, when studying the stratum consciousness of Chinese people,
domestic scholars found that Chinese stratum positioning is apparently lower
than that of European countries and the United States (Chen & Fan, 2016).
Moreover, there is a greater cognitive discrepancy between people’s objective
social and economic status and their subjective stratum positioning in both urban
and rural areas (Chen & Fan, 2015).
When investigating the mechanisms through which subjective stratum
consciousness forms, researchers have produced solid evidence from three
perspectives. First, an individual's actual social and economic resources decisively
influence their cognition of stratum positioning, as indicated by
objective status indicators such as education, income, and occupation
(Hodge & Treiman, 1968). Next, an individual's stratum positioning is also
affected by subjective factors. For example, a study of Chinese cities indicated
that apart from party membership, educational background, income, housing,
property, and other objective social and economic factors, the sense of equality,
survival anxiety, and social mobility also affect an individual's stratum
identification (Weng, 2010; Chen & Fan, 2015). Finally, some macro-factors, including
income inequality, appear to exert a negative influence on an individual's stratum
identification (Chen & Fan, 2016).
However, the studies mentioned above remain incomplete. First, previous
studies of stratum consciousness have focused mainly on the individual as the unit of analysis.
Although their findings were derived from national surveys, the limitations of
sample surveys make it difficult to generalize them to the public as a whole.
Moreover, most domestic scholars in this area have concentrated on the post-reform
period. Their historical analyses span only one to ten years; therefore, the
stratum consciousness of individuals or of the general public before the reform
cannot be verified. Second, when explaining the
vicissitudes of stratum consciousness, domestic researchers primarily adopt a
micro-level sociological paradigm that emphasizes individual perceptions of
social and economic status, subjective psychology and attitudes, and
comparison with others. However, the repercussions of macro-structural
factors on stratum positioning have been overlooked (Chen & Fan, 2016). Overseas
empirical studies in recent years have indicated that people's
stratum consciousness has profound social and economic origins. Certain
macroeconomic indicators (GNP and the unemployment rate), as well as the degree of
social inequality and the guidance of public opinion, have significant impacts on
individual stratum positioning (Andersen & Curtis, 2012; Curtis, 2015).
Based on this literature review, this study approached subjective social
stratification from a discourse construction perspective. We analyzed the vicissitudes of
definitions of social structure in public discourse from a macro-historical
perspective since the establishment of the People's Republic of China, emphasizing the
significant role that state and public attitudes have played in the formation of
social stratification discourse against the backdrop of institutional transformation.
In this way, we can see how the opening-up reforms have profoundly affected China's
national development.

Theoretical background
The institutional reforms that began in 1978 pushed China into a new era
characterized by dramatic change, and the social structure has diversified rapidly
as a result.
First and foremost, the most significant fruit of China's market-oriented
transformation has been the stable, high-speed growth of China's economy for
more than three decades. Owing to the incentives provided by the macroeconomic
boom and diversified economic structures, occupational status, which stresses the
individual, has gradually become the mechanism of social division. Consequently,
access to social and economic resources has widened, and the stratum structure
has rapidly differentiated since the opening-up reforms. This change has not only
led to distinctions in living standards and lifestyles among strata but also
divided the public in terms of values and emotions. The most typical example is
the sudden emergence of the middle class after the reform (Zhou, 2002).
Second, China’s political discourse has also evolved along with the support for
institutional reform and market-oriented transformation. On the one hand, the
state and political life before the reform were run by a system of political mobiliza-
tion constructed by class discourse, meaning most citizens were involved in political
practice. Economy, culture, ideology, and other realms were all affected by politi-
cal guidance, with “class struggle” at the core (Guo, 2003). However, as election
and community self-governance systems were introduced into China’s rural and
urban communities, members of the public were increasingly drawn into politics
(Hu, 2008). On the other hand, the repercussions of the political restructuring of
public ideology have manifested more in changes in the mainstream guidance of
public opinion. Before 1978, class-related social issues were the foundation of the
general public’s daily life (Zhang, 2004). Although China has embraced a new era
of political restructuring since 1978, the transformation of its ideological system
retained the original authoritative system and cultural resources. The guidance of
public opinion during this period was planned and dynamically regulated, in real
time, in accordance with specific political and economic changes. Consequently,
its direction switched constantly between reform and stability.
Overall, during China's period of social transition, institutional transformation
can be presumed to have radically altered the ways in which social structure is
defined in public discourse, a change that is also closely associated with
macroeconomic development, income inequality, political participation, and the
guidance of public opinion. This study investigated these issues empirically.
Historical big data were analyzed to illustrate the shifting trajectory of two
patterns of social stratification discourse in Chinese society: class and stratum.
We aimed to identify the differing repercussions of state will and public
attitudes on the changing discourse of social structure. On that basis, we analyzed
long-term macro-level time series in the context of China's institutional transformation
to identify causal connections over time and the macro-structural factors
affecting views of social stratification. This was the first time Chinese social
scientists had conducted an econometric regression analysis of this issue
using big data.

Data, variables, and analysis strategies


This study used the Google N-gram Corpora as the data source to analyze the
definition of social structure in public discourse. In Table 2.1, we present the class
and stratum categories, each with 20 related search keywords. Two prerequisites had to
be examined before the specific vocabulary could be determined: (1) whether the
class and stratification lexicons extracted from the Google N-gram Corpora reflect the
public's concerns with these two issues or simply reference a collection of academic
texts pertaining to the political or social sciences; and (2) whether the vocabulary is
representative, in other words, whether a small number of occupations could fully
represent China's social restructuring since the opening-up reforms.
Therefore, we attempted to compile a dedicated vocabulary for this chapter.
To choose the vocabulary, we referenced four books: a professional dictionary
Table 2.1 Statistics on vocabulary about class and stratum in Google N-gram (simplified Chinese), 1949–2008

Keywords of class    Mean      Std. dev.  Coef. var.   Keywords of stratum      Mean    Std. dev.  Coef. var.
Class struggle         9.465    12.220    1.291        Social status            0.965   0.433      0.449
Class oppression       0.320     0.334    1.041        Stratum consciousness    0.000   0.001      1.308
Class status           0.199     0.208    1.045        Social stratification    0.000   0.001      1.956
Class line             0.160     0.156    0.977        Stratum cognition        0.002   0.004      2.521
Class dictatorship     6.947    11.120    1.601        Stratum identification   0.001   0.001      1.331
Anti-revolution       86.380    46.260    0.535        Stratum isolation        0.000   0.000      1.894
Revolution             6.220     6.373    1.025        Stratum conflict         0.001   0.002      2.238
Rectification          7.027     6.308    0.898        Elite class              0.012   0.022      1.835
Left deviation       141.700   103.300    0.729        Middle class             0.035   0.067      1.907
Right deviation        1.154     0.558    0.484        Poverty class            0.015   0.020      1.321
Proletariat            0.533     0.237    0.445        Executive                0.018   0.022      1.212
Working class          0.153     0.152    0.994        Blue collar              0.009   0.011      1.136
The masses             0.497     0.428    0.862        White collar             0.065   0.083      1.269
Leader                 0.585     0.411    0.703        Manager                  4.325   3.648      0.843
Right wing            60.330    34.280    0.568        Public servant           1.419   1.620      1.142
Capitalist             3.552     2.418    0.681        Scholar                  8.338   5.875      0.705
Landlord               4.328     3.730    0.862        Peasant worker           0.678   1.590      2.346
Rich peasant           0.113     0.057    0.502        Entrepreneur             0.000   0.000      1.654
Poor peasant           0.533     0.408    0.765        Private entrepreneur     0.584   1.152      1.974
Middle peasant         0.276     0.168    0.608        Clerk                    1.207   0.585      0.485

Note: Means and standard deviations of the word frequency ratio are multiplied by 10,000 for readability. Coef. var. = coefficient of variation (std. dev./mean).
(A Dictionary of Sociology, edited by John Scott and Gordon Marshall) and three
textbooks (Sociology, by Anthony Giddens; History of Foreign Sociology, third
edition, by Jia Chunzeng; and The Summary of the Famous Works on Western
Sociology, by Xie Lizhong). We also included an important investigative report
on social stratification (Research Report on Social Stratification in Contemporary
China, by Lu Xueyi) and a media source that reflects the official position of the
state and public opinion (People's Daily). In Table 2.1, we present the descriptive
statistics generated by an analysis of the vocabulary about different classes and
strata. As can be seen, some words with native Chinese characteristics (such as
peasant worker) occur much more often in the corpora than professional terms (such as
stratum consciousness), which indicates that the selected vocabulary reflects
public usage rather than the academic bias of professional books.
Next, we must emphasize that the focus of our study was the change in class
and stratum discourse before and since the reform. Therefore, we paid special
attention to vocabulary that indicated dramatic changes in the stratification structure
during those periods. For example, during rural economic reform and urbanization,
peasant workers emerged in large numbers in Chinese society as a historical,
post-reform phenomenon. Moreover, these stratum words were highly
representative, not only because they were frequently and repeatedly used in our four
vocabulary sources (professional dictionary, textbooks, professional investigation
reports, and news press), but also because they captured the basic characteristics

Figure 2.1 Changing trends of social class concern by macroeconomics, income inequality, political participation, and public opinion. LC: Literary References to Class; PDI: Participatory Democracy Index; IO: Index of Public Opinions.
of China's occupations across all fields since the reform. In addition, the PCA
(principal component analysis) of the stratum vocabulary in Table 2.2 and the high
Kaiser–Meyer–Olkin (KMO) values indicated that adding further vocabulary
would not necessarily change the basic conclusion of this chapter.
We adopted the word-frequency method to compare time-series data, that is,
the frequency of selected keywords pertaining to class or stratum in the
sample books relative to the total vocabulary in those same books in each year from
1949 to 2008. The higher the ratio for a given keyword, the
more attention it received in public social stratification discourse. See Figure
2.1 for the word frequency results.
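The word-frequency ratio described above, together with the summary statistics in Table 2.1, can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: it assumes that per-year keyword counts and per-year total word counts have already been extracted from the N-gram data into plain dictionaries, and the function names `frequency_ratio` and `describe` are hypothetical. The chapter does not say whether Table 2.1 uses the sample or the population standard deviation; the sketch assumes the sample version. The coefficient column of Table 2.1 is consistent with std. dev./mean (for example, 12.220/9.465 ≈ 1.291 for “class struggle”).

```python
from statistics import mean, stdev


def frequency_ratio(keyword_counts, total_counts, scale=10_000):
    """Per-year ratio of a keyword's count to all words printed that year.

    keyword_counts, total_counts: dicts mapping year -> count (hypothetical inputs).
    The ratio is multiplied by `scale` (10,000, as in the note to Table 2.1).
    Years with a zero total are skipped to avoid division by zero.
    """
    return {
        year: scale * keyword_counts.get(year, 0) / total
        for year, total in sorted(total_counts.items())
        if total > 0
    }


def describe(series):
    """Return (mean, sample std. dev., coefficient of variation = std / mean)."""
    values = list(series.values())
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1); an assumption
    return m, s, (s / m if m else float("nan"))
```

Keeping the ratio per year, rather than pooling counts across years, is what lets later steps treat each keyword as a 1949–2008 time series.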
Using the Google Books N-gram corpus, we constructed an index of “literary
references to class” (hereafter LC) by tracking the yearly frequencies of a set of
class keywords in the dataset. The analysis results are presented in Table 2.2.
The Kaiser–Meyer–Olkin (KMO) and squared multiple correlation (SMC)
results both indicated that the selected vocabulary data were suitable for PCA.
According to the eigenvalues and the cumulative contribution rate of the

Table 2.2 PCA result of stratum vocabulary

                                   Component 1   Component 2
Eigenvalue                         16.82089      2.03105
Cumulative variance contribution   0.8410        0.9426

Stratum-related vocabulary         Component 1   Component 2   KMO      SMC
Social status                      0.6763         0.5129       0.8009   0.9968
Stratum consciousness              0.9867         0.0289       0.8353   0.9992
Social stratification              0.9288        −0.3595       0.7885   0.9994
Stratum cognition                  0.9136        −0.3800       0.8315   0.9997
Stratum identification             0.9825         0.0197       0.8953   0.9988
Stratum isolation                  0.9121        −0.3615       0.8474   0.9985
Stratum conflict                   0.9110        −0.3728       0.7991   0.9998
Elite class                        0.9800        −0.1888       0.8467   0.9999
Middle class                       0.9361        −0.3417       0.8614   0.9993
Poverty class                      0.9385         0.1363       0.9048   0.9997
Executive                          0.9232         0.3391       0.8211   0.9997
Blue collar                        0.9427         0.2645       0.8796   0.9989
White collar                       0.9794         0.0421       0.8808   0.9999
Manager                            0.8264         0.5191       0.8699   0.9991
Public servant                     0.8594         0.4329       0.8975   0.9967
Scholar                            0.9898         0.1163       0.8981   0.9998
Peasant worker                     0.8858        −0.4202       0.8655   0.9998
Entrepreneur                       0.9783         0.0436       0.9342   0.9997
Private entrepreneur               0.9528        −0.1939       0.8900   0.9997
Clerk                              0.7728         0.4234       0.8648   0.9982
explained variance of the principal components, we extracted two principal
components from the 20 stratum words to synthesize LC.
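The KMO and SMC diagnostics and the two-component PCA summarized in Table 2.2 can be illustrated with NumPy. The sketch below is not the procedure behind Table 2.2; it uses the standard textbook formulas (per-variable KMO from correlations and partial correlations, and SMC = 1 − 1/r^ii from the inverse correlation matrix). It also combines the component scores into one index weighted by each retained component's share of the retained eigenvalues, a common convention that the chapter does not state, so treat that step as an assumption. The input `X` is a hypothetical years × keywords matrix of word-frequency ratios, and the function names are hypothetical.

```python
import numpy as np


def kmo_smc(X):
    """Per-variable KMO and squared multiple correlations (SMC).

    X: array of shape (n_years, n_words) holding word-frequency ratios.
    """
    R = np.corrcoef(X, rowvar=False)            # correlations between words
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    P = -R_inv / np.outer(d, d)                 # partial correlations
    np.fill_diagonal(P, 0.0)
    R_off = R - np.eye(R.shape[0])              # drop the unit diagonal
    r2 = (R_off ** 2).sum(axis=0)
    p2 = (P ** 2).sum(axis=0)
    kmo = r2 / (r2 + p2)
    smc = 1.0 - 1.0 / np.diag(R_inv)
    return kmo, smc


def pca_index(X, n_components=2):
    """PCA on standardized series plus an eigenvalue-weighted composite index."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardize each word
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)                # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    vecs = eigvecs[:, :n_components]
    # Eigenvector signs are arbitrary; flip so each component loads positively overall.
    vecs = vecs * np.where(vecs.sum(axis=0) < 0, -1.0, 1.0)
    loadings = vecs * np.sqrt(eigvals[:n_components])
    scores = Z @ vecs
    weights = eigvals[:n_components] / eigvals[:n_components].sum()  # assumption
    index = scores @ weights
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return eigvals, cumulative, loadings, index
```

Because the PCA is run on a correlation matrix, the eigenvalues sum to the number of words, which is why the first two eigenvalues in Table 2.2 (16.82 and 2.03 out of 20) translate into cumulative contributions of 0.8410 and 0.9426.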
We created an explanatory framework from three perspectives, namely
market-oriented transformation, political participation, and the innovative guidance
of public opinion, and used additional variables to measure them and
construct indicators empirically. To capture market-oriented transformation,
we estimated the overall economic trends from 1978 to 2008 using gross domestic
product (GDP, in USD) data published by the World Bank. To account for price
changes, we converted the data into comparable prices (1978 prices, obtained
with a revised consumer price index) so as to compare aggregate economic
indicators over different periods. This indicator is denoted “GDPcp.”
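Converting to 1978 comparable prices amounts to deflating each year's nominal value by the price index relative to its 1978 level. A minimal sketch, assuming a CPI series (in any base) is available as a dictionary; the function name and the toy figures are hypothetical and are not the World Bank or CPI data used in this chapter.

```python
def to_comparable_prices(nominal, cpi, base_year=1978):
    """Deflate nominal values to base-year prices: value_t * CPI_base / CPI_t.

    nominal, cpi: dicts mapping year -> value (hypothetical inputs).
    """
    base = cpi[base_year]
    return {year: value * base / cpi[year] for year, value in sorted(nominal.items())}


# Toy example: prices rise 10% in 1979, so 10% nominal growth is zero real growth.
print(to_comparable_prices({1978: 100.0, 1979: 110.0}, {1978: 100.0, 1979: 110.0}))
# → {1978: 100.0, 1979: 100.0}
```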
To estimate the income inequality incurred by market-oriented reform, we
applied the mainstream measure, the Gini coefficient. However, the available
Chinese data are incomplete: the official data released by the National Bureau of
Statistics are complete from 2003 to 2015, but data for other years can be found
only sporadically in statistical yearbooks. Therefore, we used the World Income
Inequality Database Version 3.3 to supplement the missing data; the resulting
series is called the GINI index.
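Two parts of this step can be made concrete: the Gini coefficient itself and the rule for filling gaps in the official series. The sketch is illustrative only. `gini` implements the standard rank-based formula for a list of non-negative incomes, whereas the chapter relies on published Gini values rather than computing them from microdata. `splice_gini` assumes that official values take precedence and that one WIID estimate per year has already been selected (WIID often lists several estimates for the same country-year). Both function names and all numbers in the usage are hypothetical placeholders.

```python
def gini(incomes):
    """Gini coefficient via the sorted-rank formula:
    G = 2 * sum_i(i * x_(i)) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n.
    """
    x = sorted(incomes)
    n, total = len(x), sum(x)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive income")
    ranked = sum(i * value for i, value in enumerate(x, start=1))
    return 2 * ranked / (n * total) - (n + 1) / n


def splice_gini(official, supplement, years):
    """Prefer official values; fall back on the supplementary source for missing years.

    Returns the spliced series and a year -> source map for transparency.
    Years missing from both sources are left out rather than imputed.
    """
    series, source = {}, {}
    for year in years:
        if year in official:
            series[year], source[year] = official[year], "NBS"
        elif year in supplement:
            series[year], source[year] = supplement[year], "WIID"
    return series, source
```

For example, `gini([1, 1, 1, 1])` is 0 (perfect equality) and `gini([0, 0, 0, 1])` is 0.75 (one person holds everything among four).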
Next, to estimate the political participation of the Chinese public during
1978–2008, we used the Participatory Democracy Index from Varieties of Democracy
(V-Dem) Version 6.21, hereafter the PDI index. The index ranges from 0 to 1,
with 1 standing for the highest political participation and 0 for the lowest.
Finally, as mentioned above, changes in the national guidance of public opinion
since the opening-up reforms basically revolved around reform and stability.
Therefore, using the “Full Text Retrieval System for the People's Daily,” we
counted the annual number of articles and reports from 1978 to 2008 with
“reform” or “stability” in their titles and took the difference between the two
counts as the index of change in the national guidance of public opinion, namely
IO. A positive value indicates that the official guidance of public opinion for
that year leaned toward reform; a negative value indicates that stability prevailed.
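The IO index reduces to a yearly count difference. A minimal sketch, assuming article titles have already been exported from the People's Daily retrieval system into a year-to-titles dictionary. The Chinese search terms 改革 (reform) and 稳定 (stability) are our assumption about how the title search was specified, a title containing both terms is counted on both sides, and the function name and sample titles are hypothetical.

```python
def io_index(titles_by_year, reform_term="改革", stability_term="稳定"):
    """IO_t = (# titles mentioning reform) - (# titles mentioning stability) in year t."""
    index = {}
    for year, titles in sorted(titles_by_year.items()):
        reform = sum(reform_term in title for title in titles)
        stability = sum(stability_term in title for title in titles)
        index[year] = reform - stability  # > 0: reform-leaning; < 0: stability-leaning
    return index


# Hypothetical toy titles, not actual People's Daily data.
print(io_index({1980: ["深化改革", "改革开放", "保持稳定"], 1990: ["稳定压倒一切", "社会稳定"]}))
# → {1980: 1, 1990: -2}
```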
Figure 2.1 presents the changing historical trend in the “Literary References
to Class” index and the foregoing macro-indices from 1978 to 2008.2 In general,
“Literary References to Class,” economic growth, and the Gini coefficient
all showed steady growth, whereas the curve indicating the change in national
guidance of public opinion fluctuated dramatically. During the 1980s, reports
on reform prevailed in official news media. During the 1990s, however, the
difference between reform-oriented and stability-oriented guidance was
slight, and both began to fluctuate frequently.
The data analysis in this chapter is divided into two parts. The first part selected
the vocabulary about class and stratum from 1949 to 2008 from the Google N-gram
Corpora, estimated the annual word frequencies, and visualized the totals. We
focused on the changes in the two vocabulary types before and after the critical
historical milestone of the opening-up reforms in 1978 to show the vicissitudes
and trajectories of the two concepts of social stratification since 1949. The second
part uses time-series regression to explore the causal connections between changes
in “Literary References to Class” and macro-structural factors after 1978.

Results
(I) Historical vicissitudes of class and stratum in public discourse (1949–2008)
We calculated the total value of the annual word frequency ratio in class- and
stratum-related vocabulary. Both the total of the original word frequencies (Figure 2.2, left)
and the standardized value of the total word frequency (Figure 2.2, right)
indicated that class-related vocabulary rose rapidly from 1949 to 1976 but plunged
soon after, while stratum-related vocabulary grew steadily after 1978. Figure 2.2
(right) shows that by the end of the 1950s, the total proportion of class-related
vocabulary in books was increasing rapidly, reaching a peak in the mid-1970s,
while stratum-related vocabulary bottomed out over the same period. Since the
1980s, the positions of the two vocabularies have reversed. It is noteworthy that in
the twenty-first century, especially since 2002, the attention paid to stratum issues
has jumped.3 Some specific stratum definitions (scholar, peasant worker, executive,
white collar, and public servant) have shown dramatic growth.
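Summing the per-keyword ratios within each vocabulary type and standardizing the totals, as in the two panels of Figure 2.2, can be sketched as follows. The chapter does not specify how the totals were standardized; the sketch assumes an ordinary z-score (subtract the 1949–2008 mean, divide by the sample standard deviation), which puts the class and stratum totals on a common scale. Function names and inputs are hypothetical.

```python
from statistics import mean, stdev


def total_frequency(ratios_by_keyword):
    """Sum per-keyword yearly ratios into one yearly total; missing years count as 0.

    ratios_by_keyword: dict keyword -> dict year -> ratio (hypothetical input).
    """
    years = sorted(set().union(*ratios_by_keyword.values()))
    return {
        year: sum(series.get(year, 0.0) for series in ratios_by_keyword.values())
        for year in years
    }


def standardize(series):
    """z-score each year's total against the whole period (an assumed method)."""
    values = list(series.values())
    m, s = mean(values), stdev(values)
    return {year: (value - m) / s for year, value in series.items()}
```

Applying `total_frequency` separately to the class and stratum keyword sets, then `standardize` to each result, yields two comparable curves of the kind plotted in the right panel.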
These results may answer the first question posed at the beginning of this chapter:
from 1949 to 2008, the definition of social stratification in China's
public discourse transformed significantly from class to stratum, reflecting the
weakening of state will in shaping and controlling the discursive system. The
public's needs and attitudes have become a more important force in the
construction of new discourse. To be specific, definitions of social structure in Chinese
public discourse since 1949 indicate that class-related discourse before the reform
reflected the powerful position of official ideology, while the shift to stratum after
the reform indicated that public attitudes had become the main driver of discourse

Figure 2.2 The total value of the annual word frequency ratio in class- and stratum-related vocabulary. Solid line: class; dotted line: stratum.
construction. Moreover, this transformation has been highly associated with the
critical historical event of the opening-up reforms of 1978. Nevertheless, this
description alone does not constitute an empirical test.
Next, we aimed to verify the rules by which macro-changes functioned.

(II) Causal connection between “Literary References to Class” and macro-structural factors
In this chapter, we used the augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) tests to conduct unit root tests on all variables. The results indicated that LC, GDPcp, GINI, and IO were integrated (non-stationary) time series, whereas PDI was stationary.4 To interpret the Granger causality results, we adopted a different method to calculate PDI. In other words, we endeavored to find the connection between changes in GDPcp, GINI, PDI, IO, and LC. To ensure that the results were stable in a multivariate setting, we adopted a multivariate Granger causality test: when we analyzed the causal connection between “Literary References to Class” and a given variable, the other variables were included in the analysis as control variables.
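The unit root step can be illustrated with a bare-bones augmented Dickey-Fuller regression written in NumPy. This is a sketch, not the chapter's procedure: the lag length (one), the constant-only specification, and the approximate 5% critical value of -2.86 are assumptions, `adf_tstat` is a hypothetical helper, and the Phillips-Perron test is not shown. In practice, statsmodels provides the ADF test as `adfuller` in `statsmodels.tsa.stattools`.

```python
import numpy as np

# Approximate asymptotic 5% critical value of the Dickey-Fuller t-statistic
# for a regression with a constant and no trend (an assumption of this sketch).
DF_CRIT_5PCT = -2.86

def adf_tstat(y, lags=1):
    """t-statistic on gamma in
    dy_t = a + gamma * y_{t-1} + sum_{i=1..lags} b_i * dy_{t-i} + e_t.
    A value below DF_CRIT_5PCT rejects the unit-root (non-stationarity) null."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    target = dy[lags:]                      # start where all lagged differences exist
    cols = [np.ones_like(target), y[lags:-1]]
    for i in range(1, lags + 1):
        cols.append(dy[lags - i:-i])        # i-th lagged difference
    X = np.column_stack(cols)
    beta, _, _, _ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

# Simulated toy series (not the chapter's data).
rng = np.random.default_rng(0)
shocks = rng.normal(size=500)
walk = np.cumsum(shocks)                    # random walk: unit root in levels
level_stat = adf_tstat(walk)
diff_stat = adf_tstat(np.diff(walk))        # first difference: stationary
```

A simulated random walk typically fails to reject the unit root in levels, while its first difference rejects decisively; this is the reason integrated series enter the Granger tests as first differences (the "fd_" prefix in Table 2.3).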
Table 2.3 presents the detailed results of the Granger causality test, which yield five observations:

1) Changes in GDPcp, GINI, and PDI all Granger-cause the time-series changes in LC (p < 0.05). Because these variables entered the model as first differences, growth in GDP over the previous year, increases in income inequality, and rising public political participation could each explain growth in public attention to stratum over the following year.
2) In terms of statistical significance, the influence of income inequality (GINI) on public LC (p < 0.01) was apparently greater than that of macroeconomic development (GDPcp) (p < 0.05).
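A minimal sketch of a Granger causality test with control variables, assuming a single-equation OLS form: regress y on a constant, its own p lags, and p lags of each control (restricted model), then add p lags of x (unrestricted model), and compare residual sums of squares with a Wald statistic. This is not necessarily the chapter's exact estimator; the reported p-values in Table 2.3 are, however, consistent with a chi-square distribution with four degrees of freedom (one per lag). All function names here are hypothetical; statsmodels also offers `grangercausalitytests` in `statsmodels.tsa.stattools`.

```python
import math
import numpy as np

def lag_matrix(series, p):
    """Columns s_{t-1}, ..., s_{t-p} for t = p, ..., T-1."""
    s = np.asarray(series, dtype=float)
    T = len(s)
    return np.column_stack([s[p - k:T - k] for k in range(1, p + 1)])

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X > x); this closed form holds only for even df."""
    if df % 2:
        raise ValueError("closed form requires an even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def granger_wald(y, x, controls=(), p=4):
    """H0: lags 1..p of x do not help predict y, given a constant, p lags of y,
    and p lags of each control series. Returns (Wald chi2, p-value) with df = p."""
    y = np.asarray(y, dtype=float)
    target = y[p:]
    base = [np.ones(len(target)), lag_matrix(y, p)]
    base += [lag_matrix(c, p) for c in controls]
    X_r = np.column_stack(base)                       # restricted: no x lags
    X_u = np.column_stack(base + [lag_matrix(x, p)])  # unrestricted: adds x lags
    rss_r, rss_u = rss(X_r, target), rss(X_u, target)
    dof = len(target) - X_u.shape[1]
    wald = (rss_r - rss_u) / (rss_u / dof)            # p times the usual F statistic
    return wald, chi2_sf_even_df(wald, p)

# Simulated toy data (not the chapter's series): x drives y with a one-period delay.
rng = np.random.default_rng(1)
T = 300
x = rng.normal(size=T)
z = rng.normal(size=T)  # an irrelevant control series
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

stat, pval = granger_wald(y, x, controls=[z], p=4)
```

With these toy data the null is rejected strongly. As a consistency check on the table, `chi2_sf_even_df(11.382, 4)` gives about 0.023 and `chi2_sf_even_df(1.790, 4)` about 0.774, the p-values reported for the fd_GDPcp and fd_IO rows.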

Table 2.3 Granger causality test (1978–2008)

Null hypothesis                               N    Chi2        p-Value
fd_GDPcp is not the Granger cause of fd_LC    31   11.382**    0.023
fd_GINI is not the Granger cause of fd_LC     31   34.596***   0.000
fd_PDI is not the Granger cause of fd_LC      31   26.347***   0.000
fd_IO is not the Granger cause of fd_LC       31   1.790       0.774
fd_LC is not the Granger cause of fd_GDPcp    31   36.731***   0.000
fd_LC is not the Granger cause of fd_GINI     31   20.26***    0.000
fd_LC is not the Granger cause of fd_PDI      31   41.042***   0.000
fd_LC is not the Granger cause of fd_IO       31   13.182***   0.000

Notes: (i) “fd” means first difference. (ii) Based on the information criteria AIC, SBIC, and HQIC, we selected a lag order of four. (iii) *p < 0.1, **p < 0.05, ***p < 0.01.
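The lag-order choice in note (ii) can be illustrated by computing the three information criteria for each candidate VAR lag length on a common sample and picking the minimum. A sketch assuming the textbook forms AIC = ln|Σ| + 2pK²/T, SBIC = ln|Σ| + ln(T)·pK²/T, and HQIC = ln|Σ| + 2ln(ln T)·pK²/T, where Σ is the residual covariance matrix, K the number of variables, and T the common sample size; statistical packages may report versions with extra terms that do not depend on p. The function `var_lag_criteria` and the simulated data are hypothetical.

```python
import numpy as np

def var_lag_criteria(data, max_lag):
    """Return {p: (AIC, SBIC, HQIC)} for VAR(p), p = 1..max_lag.
    Each equation is fit by OLS with a constant, and every p uses the same
    sample (t = max_lag..end) so the criteria are comparable across p."""
    Y_all = np.asarray(data, dtype=float)        # shape (T_total, K)
    T_total, K = Y_all.shape
    Y = Y_all[max_lag:]                          # common estimation sample
    T = Y.shape[0]
    out = {}
    for p in range(1, max_lag + 1):
        lags = [Y_all[max_lag - k:T_total - k] for k in range(1, p + 1)]
        X = np.column_stack([np.ones(T)] + lags)
        B, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
        U = Y - X @ B                            # residuals, shape (T, K)
        _, logdet = np.linalg.slogdet(U.T @ U / T)
        n_par = p * K * K                        # lag coefficients only
        out[p] = (
            logdet + 2.0 * n_par / T,                      # AIC
            logdet + np.log(T) * n_par / T,                # SBIC
            logdet + 2.0 * np.log(np.log(T)) * n_par / T,  # HQIC
        )
    return out

# Simulated toy VAR(2) (not the chapter's data); the second lag matters.
rng = np.random.default_rng(42)
T_total = 400
data = np.zeros((T_total, 2))
for t in range(2, T_total):
    e = rng.normal(size=2)
    data[t, 0] = 0.2 * data[t - 1, 0] - 0.6 * data[t - 2, 0] + e[0]
    data[t, 1] = 0.5 * data[t - 1, 0] + 0.3 * data[t - 1, 1] + e[1]

crit = var_lag_criteria(data, max_lag=6)
best_sbic = min(crit, key=lambda p: crit[p][1])
```

SBIC penalizes extra lags most heavily and AIC least, so when the criteria disagree, AIC tends to favor the longer lag; agreement across all three, as the notes report for the chapter's choice of four lags, makes the selection less sensitive to the criterion used.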