Manual QCA With R - v170407
QCA with R
Authors:
Stefan Wittwer
Florence, April 17
Eva Thomann and Stefan Wittwer April 17
Table of contents
How to (not) understand and use this manual – please read!
1 Introduction to R
1.1 Basics
1.2 Working directory
1.3 Help!
1.4 Packages
1.5 Workspace, operators and objects
1.5.1 Workspace
1.5.2 Operators
1.5.3 Classes of objects
1.6 Working with real data
1.6.1 Open and save datasets
1.6.2 Inspect and describe data
1.6.3 Recoding, renaming and deleting variables
5 XY-plots
References
Infoboxes
Box 1: Update R and Rstudio from time to time
Box 2: Working with the visual GUI interface
Box 3: Rounding sets (and any other variable)
Box 4: A hands-on template code for calibration and its diagnostic
Box 5: Aggregating sets when you have missing values
Box 6: More options for making XY-plots
Box 7: Useful tools for truth table analysis
Box 8: Default settings for logical minimization
Box 9: Identifying untenable assumptions contradicting the statement of necessity
Box 10: Alternative options for flexibly coding and omitting truth table rows
Box 11: Useful tools for interpreting QCA outputs
Box 12: A hands-on template code for Enhanced Standard Analysis
As such, this manual provides neither a full nor a systematic guide to all the relevant packages
(see the respective CRAN package documentation for this), nor does it cover everything that
could be done. Additional packages relevant for QCA (e.g., QCApro and VennDiagram) are
not covered here (see http://www.compasss.org/software.htm). There are often many ways in
which things can be done with R, some perhaps more elegant or parsimonious than the ones
presented here. Given the continuous and rapid development of the QCA methodology and
software, the advice presented here is not set in stone. Commands and functions can change
and expand quickly. However, the packages we use should offer full backwards compatibility.
The manual follows the structure of a QCA analysis and is preceded by a short and selective
introduction to basic features of R, which you should read before getting started. In this manual
we use commands in order to “talk” with R and tell it what we want it to do [1]. In the introduction,
we outline a few basic features of the “language” that R understands. The good news
is that you need to learn neither this whole language nor all the commands by heart to perform
QCA with R. You can simply look up the commands for what you want to do in the respective
chapter of this manual, copy-paste the command, and replace the interchangeable parts of the
command with the respective features of your own research. Throughout the manual, R
commands are listed in grey font colour. The commands are numbered, e.g. c 1. Those parts of
the commands that you need to replace to make the commands fit your own purposes are
marked bold. For example:
[1] The QCA package does offer a graphical user interface (GUI) for performing QCA, see Box 2. In this manual
we have chosen to work with written commands instead, since these commands offer users the full functionality
and flexibility for performing even advanced analytic steps.
This should be self-explanatory; copy-paste the command you need and simply insert the name
of your variable, your outcome, your dataset, etc. Importantly, note that sets, conditions and
outcomes are always denoted with UPPERCASE LETTERS. Make sure you also label them
with uppercase letters in your dataset. This is because some commands interpret sets written in
lowercase letters as the negation of the set. If sets, conditions or outcomes are written in
lowercase letters in this manual, they denote the negation of the set, condition or outcome. Also,
make sure that you perform the analytic steps in sections 6-8 with a dataset that only contains
the calibrated sets. Some of these commands do not work if the variables in the dataset do not
range from 0-1.
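The two checks above can be sketched in a few lines. This is a minimal illustration that assumes a small made-up dataset; the object name mysets and the column names are hypothetical. It converts all set names to uppercase and verifies that every value lies between 0 and 1.

```r
# Hypothetical calibrated dataset; names and values are invented for illustration
mysets <- data.frame(dev = c(0.1, 0.8, 1), stb = c(0, 0.4, 0.95),
                     row.names = c("A", "B", "C"))

# Label all sets with uppercase letters
names(mysets) <- toupper(names(mysets))

# Check that every value ranges from 0 to 1 (TRUE if the data look calibrated)
all_calibrated <- all(mysets >= 0 & mysets <= 1, na.rm = TRUE)

names(mysets)
all_calibrated
```

If all_calibrated is FALSE, the dataset probably still contains raw variables and should be reduced to the calibrated sets first (see 1.6.1).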
This manual is intended as an “open source” working document that is continuously improved
based on new developments and the feedback of users. As such, it is and remains work in
progress. We do our very best to ensure the accuracy of the listed commands, but errors (e.g.
typos in commands) remain possible. The manual is updated at relatively regular intervals (ca.
4 months). Suggestions on how to improve this manual or notifications of errors are more than
welcome – please send them via e-mail to Eva Thomann (escriba[ at ]hotmail.ch). Please note
that we only consider those suggestions that we deem relevant for the analytic steps described
in this manual, and that are formulated in a concrete manner. So far, we would like to
wholeheartedly thank the following colleagues for their useful input:
Adrian Dusa
Nena Oana
Carsten Q. Schneider
Yixian Sun
Koen van der Krieken
What is written in this manual represents the opinion of the authors and not the opinion of those
who helped us with their feedback. All remaining errors are our own responsibility.
The latest version of the manual is available online at http://www.evathomann.com/links/qca-r-manual.
Eva Thomann
Stefan Wittwer Florence, 7 April 2017
1 Introduction to R [2]
1.1 Basics
Performing the operations described in this manual requires downloading the following
software:
On the upper right, the workspace tab shows all the active objects (see 1.5). On the lower right,
the history tab shows a list of commands used so far. The files tab shows all the files and folders
in your default workspace. The plots tab will show all your graphs. The packages tab will list a
series of packages or add-ons needed to run certain processes. For additional info see the help
tab. On the left-hand side, you see the console, which is where you can see output (i.e. the
results of what you do). You can also type commands there. However, you don’t only want to
give instructions to R, but you also want to save these instructions, so that you can repeat them
any time you want, continue your work the next day, check and modify what you already did,
show what you did to colleagues, make everything you did fully replicable, etc. Therefore, you
want to save your commands in a script, which is written with the editor. Always use the editor,
not the console, to enter code. Once code grows, a good and clear editor becomes indispensable
(e.g. to identify errors or to allow comments). To open the editor and start a new script, click
file -> new file -> R script. Now the editor window appears in the upper left part of the interface,
and the console is below the editor. The editor is where you enter your code and
documentation. Save your script frequently: file -> save or save as.
It is advisable to use comments extensively in your script to document what you do and why.
Comments in the script start with # (on Mac: "alt+3"): everything after # in the line is ignored
by R, that is, it is not treated as a command.
c1
#whatever
[2] This introductory chapter is partly based on material from the seminar "Data Analysis with R" by Rudi Farys
and Paul Bauer (University of Bern, WISO) and Chapter 2 of Thiem and Dusa (2012).
If you use several #’s before and after the comment, RStudio recognizes it as a section title:
c2
##### mytitle ######
To navigate to a title, use the tiny bar on the lower end of the editor.
Once you have typed a command, R will not run it until you send the code to the console by
marking it and then pressing "Ctrl + R" (Windows; the Ctrl key is labelled STRG on German
keyboards) or "cmd + Enter" (Mac). Within the RStudio script, you can also just press Ctrl and
Enter to run the command line where your cursor lies. You can also mark those commands you
want to run and press Ctrl and Enter.
New versions of R and Rstudio are issued regularly. Situations can arise in which packages
rely on these new versions, while you still have an older version installed. This may be one
possible reason in case you encounter problems. The easiest way to avoid this is to download
the most recent versions before you get going, if you haven’t used R for a while.
1.2 Working directory
First, check which working directory R currently uses:
c3
getwd()
Then, set your working directory with setwd(""). The easiest way to do this is to click session
-> set working directory -> choose directory. Browse to the file location where you want to
store the relevant files for your analysis. The path to the working directory will then appear in
the console. Copy-paste it into the script (without the >), so you can set the working directory
just by running that line the next time:
c4
setwd("/Users/mypath")
You can also simply copy-paste the path to the location from the header of your windows folder
into the quotation mark of this command. If you do this, remember to replace \ by / or \\ in the
copy-pasted folder location.
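Replacing the backslashes can also be left to R. The following is a minimal sketch; the path is hypothetical, and note that inside R code each backslash of a Windows path has to be typed as \\.

```r
# Hypothetical Windows path as copied from the folder header
# (in R source code, each backslash has to be written as \\)
winpath <- "C:\\Users\\mypath\\data"

# Replace every backslash by a forward slash
rpath <- gsub("\\", "/", winpath, fixed = TRUE)
rpath

# The result could then be passed on (not run here, the folder is invented):
# setwd(rpath)
```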
c5
dir()
1.3 Help!
Especially in the beginning, things often do not work immediately in R. The most common
cause for this is that R is case-sensitive and, more generally, strict about notation: as soon as
something is not typed the way R expects it, R does not recognize it. Check uppercase and
lowercase notation; whether commas, dots and quotation marks are set correctly; and generally
for typos. R uses different quotation marks than Microsoft Word: it requires straight quotation
marks, not the curly ones that Word inserts. If RStudio keeps crashing, check whether changing
the file location, splitting up your code into several smaller scripts, or enlarging the “plots”
window (lower right) helps.
To get help, just put “?” in front of a command. The ?command gives you the relevant
information and examples for a specific command. For example, to get help for the function
getwd():
c6
?getwd
The ?? command gives you a list of possible help sources for some keyword (in case you don't
know the command, but need help on a topic):
c7
??qca
c8
??"descriptive statistics"
R offers excellent online documentation, and a Google search for your problem often helps.
You are never alone with your R problems.
1.4 Packages
If R is the kitchen, packages are the kitchenware. There are packages for everything. For
example, the packages QCA and SetMethods can be used for performing QCA with R. Some
packages are loaded permanently by default (“base” packages) while others must be installed:
c9
install.packages("packagename")
So, after starting RStudio and opening the editor, copy-paste the following command into the
editor, mark it, and press Ctrl and Enter:
c 10
install.packages(c("arm", "car", "gmodels", "Hmisc", "MASS", "memisc", "polycor", "psych",
"reshape", "VIM", "lattice", "XML", "xtable", "foreign", "directlabels", "betareg", "plyr",
"dplyr", "QCA", "SetMethods"), dependencies = TRUE)
Probably a window will pop up, where you need to choose a server to download the packages.
The dependencies option ensures that, if a package depends on the existence of another package,
that other package is installed too.
You only need to install a package once. But every time you want to use it, you have to load it
when starting the R session. Load an installed package (without quotes!):
c 11
library(packagename)
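Because installing is needed only once but loading is needed in every session, it can be handy to check first which packages are still missing on your computer. The following is a sketch under the assumption that you keep your package names in a vector; the object names wanted, installed and to_install are hypothetical, and the list is shortened for illustration.

```r
# Packages you want to use (shortened list for illustration)
wanted <- c("stats", "utils", "QCA", "SetMethods")

# Which of them are not yet installed on this computer?
installed <- rownames(installed.packages())
to_install <- setdiff(wanted, installed)
to_install

# Install only the missing ones (uncomment to run; needs an internet connection)
# if (length(to_install) > 0) install.packages(to_install, dependencies = TRUE)
```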
To work with this manual, you will have to load the following packages:
c 12
library(lattice); library(arm); library(xtable); library(foreign); library(psych);
library(directlabels); library(betareg); library(VIM); library(base); library(plyr);
library(dplyr); library(QCA); library(SetMethods)
c 13
search()
1.5 Workspace, operators and objects
R works with objects. Almost anything can be an object: datasets are objects, and so are
variables. You can even store your results as an object and recall them any time you want.
Data are contained in objects of different size and format (object classes), such as a dataset, a
list of numbers, or only a single number or a name. Functions use the content of an object and
produce results.
1.5.1 Workspace
Objects are stored in the workspace. There (upper right), you can see which data sets are loaded
and which results or other objects you stored. Always keep it clear and well-arranged.
As R is object-oriented, not only functions can be treated as objects, but also results from
operations using the functions can themselves be saved as objects again. To generate an object,
we first give it a name and then use a backward arrow <- to define its content and store it in the
workspace. For example, here we create an object that consists of the phrase “hi there”:
c 14
myobject <- "hi there"
You can always display the content of the object in the console by simply typing its name:
c 15
myobject
But the content of the object can also be numerical, of course (for numbers, you don’t need the
quotation marks):
c 16
myobject <- 37
myobject
R returns: [1] 37
c() is a function which combines its arguments (c stands for "concatenate"). This helps you to
list different elements. This function is often used with R. In the example below, we create an
object that consists of the numbers 1, 2, and 3.
c 17
myobject <- c(1, 2, 3)
You can also list words or phrases, instead of numbers, using quotation marks:
c 18
myobject <- c("hi", "what’s up?", "coffee please.")
The dollar sign (see also 1.6.3) is needed if you want to “tie” an object (e.g. a variable) to
another object (e.g., the dataset). For example, you create a new variable (=the tied object) in
the dataset (the object to which it is tied). Without doing this, the variable will only be in the
workspace but not in the dataset. You have to specify the content of the variable. Here we assign
it a value of NA (“Not Available”, R's code for a missing value):
c 19
mydata$newvar <- NA
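The difference between an object in the workspace and a variable tied to the dataset can be seen in a small sketch. The dataset below is made up for illustration.

```r
# Hypothetical dataset with two cases
mydata <- data.frame(var = c(3, 7), row.names = c("case1", "case2"))

# Without $, the new object lives only in the workspace
newvar <- c(1, 0)
in_dataset_before <- "newvar" %in% names(mydata)
in_dataset_before

# With $, the new variable is stored inside the dataset
mydata$newvar <- NA
names(mydata)
```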
c 20
save(myobject1, myobject2, file="filename" )
c 21
rm(myobject)
c 22
rm(list=ls())
1.5.2 Operators
You can use R much like a calculator. We might be using the following operators:
Arithmetic operators:
Plus: +
Minus: -
Multiplication: *
Division: /
To the power of: ^
Logical operators:
AND: &
OR: |
EQUALS: == (attention: you need a double equal sign!)
NOT: !
DOES NOT EQUAL: !=
SMALLER THAN: <
GREATER THAN: >
SMALLER THAN OR EQUAL TO: <=
GREATER THAN OR EQUAL TO: >=
To use the results of operations for later analysis, store them as an object (see 1.5.1). Otherwise,
the result only appears in the console. For example, here we create an object that is the result
of the operation ((1+3)/7*18)^2:
c 23
myobject <- ((1+3)/7*18)^2
You can also use the operators to compare differing objects or perform operations on them.
a <- (5+5)/2*2
a > 10
a == 10
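1.5.3 Classes of objects
Every object has a class (e.g. numeric, character, logical or data.frame). To find out the class
of an object:
c 24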
class(myobject)
Of course, some commands only work with some object classes. For example, you cannot
calculate the mean of a character object, but only of a numerical (or logical) one.
Logical objects typically take on the values TRUE or FALSE. It is nice to know that you can
use the command as.numeric() to display logical values as numerical values (1 for TRUE, 0 for
FALSE).
c 25
as.numeric(logicalobject)
This enables you to, for example, count the number of cases that fulfil a certain logical
condition, see e.g. section 3.2.
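As a small illustration of such counting, assuming a made-up numeric vector x:

```r
# Hypothetical raw variable
x <- c(2, 9, 6, 4, 8)

# Logical object: which cases have a value greater than 5?
above5 <- x > 5
above5

# TRUE counts as 1 and FALSE as 0, so the sum is the number of such cases
n_above5 <- sum(as.numeric(above5))
n_above5
```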
1.6 Working with real data
1.6.1 Open and save datasets
To import a dataset, make sure the working directory is set correctly (check it with getwd(),
change it with setwd(), see 1.2). Before you start working with a new dataset, you should clear
the workspace so that the software does not “get confused” by leftover objects.
c 26
rm(list=ls())
To load the dataset, we build an object that reads the .csv file:
c 27
mydata <- read.csv("mydata.csv", row.names=1, header = TRUE, sep = ";", dec = ",")
mydata
These options suit Excel versions from countries that use a comma as decimal sign and a
semicolon as field separator (sep = ";", dec = ","), the Excel convention for CSV files in some
Western European locales. If this does not work, your Excel may be formatted differently (e.g.
US version). Try sep = ",", dec = "." instead, or sep = "", dec = "." (sep = "" treats any
whitespace as the separator). If you want to find out how your CSV file is structured, open it in
your working directory with the “Editor” (Windows) or “TextEdit” (Mac).
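You can also inspect the structure of a CSV file from within R. The following sketch writes a tiny semicolon-separated file to a temporary location (the content is invented), looks at its raw lines, and reads it with the matching options.

```r
# Write a small semicolon-separated file with decimal commas to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("case;VAR1;VAR2", "A;0,5;1,25", "B;0,75;2,5"), tmp)

# Look at the raw lines to see which separator and decimal sign are used
readLines(tmp)

# Read the file with the matching options
mydata <- read.csv(tmp, row.names = 1, header = TRUE, sep = ";", dec = ",")
mydata

# VAR1 and VAR2 should now be numeric, not character
str(mydata)
```

If the variables show up as character or if the whole line ends up in one column, the sep and dec options most likely do not match the file.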
You can also read the dataset in other formats, e.g. simply as a table:
c 28
mydata <- read.table ("mydata.csv", sep =";", dec=",", header=TRUE)
For STATA files (package "foreign"), use read.dta; for SPSS, read.spss .
c 29
write.csv2(mydata, "mydata.csv")
write.csv2 uses a comma for the decimal point and a semicolon for the separator. If you want
to save your csv in the US format, use the write.csv instead of write.csv2. This uses "." for the
decimal point and a comma for the separator.
Check
?write.table
You can save calibrated sets in a new dataset, as a subset of your raw dataset (see also sections
1.6.2 and 4.1). Assume, for example, that you only want to save three calibrated sets from your
raw dataset as a new dataset “fuzzydata”. You may want to do this because QCA has some
commands that only work when all data ranges from 0-1. One possibility is using the subset()
command (package: base):
c 30
myfuzzydata <- subset(myrawdata, select = c("MYSET1", "MYSET2", "MYSET3"))
write.csv2(myfuzzydata, "myfuzzydata.csv")
You can also save a subset of your data that fulfils certain conditions – you are incredibly
flexible there, using the logical operators in 1.5.2. Note that if you have a variable consisting of
character vectors (=words rather than numbers), you should use quotation marks for indicating
which value you mean. Say, for example, you want to create a new dataset that contains only
those cases that have the value “Eastern Europe” (=character vector) for the variable “region”,
and that have either a value greater than 0.5 on variable 2 or a value of 0 on variable 3:
c 31
mynewdata <- subset(myolddata, region == "Eastern Europe" & (var2 > 0.5 | var3 == 0))
mynewdata
write.csv2(mynewdata, "mynewdata.csv")
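To see how the combined conditions select cases, here is a sketch with a made-up dataset of four cases (all names and values are hypothetical):

```r
# Hypothetical dataset
myolddata <- data.frame(
  region = c("Eastern Europe", "Eastern Europe", "Western Europe", "Eastern Europe"),
  var2 = c(0.8, 0.2, 0.9, 0.1),
  var3 = c(1, 0, 0, 1),
  row.names = c("A", "B", "C", "D")
)

# Keep Eastern European cases with var2 above 0.5 OR var3 equal to 0
mynewdata <- subset(myolddata, region == "Eastern Europe" & (var2 > 0.5 | var3 == 0))
rownames(mynewdata)
```

Case A is kept because of its value on var2, case B because of its value on var3; case C is in the wrong region and case D fulfils neither of the two alternative conditions.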
Alternatively, you can select columns by their position. Suppose colnames() showed you that
the calibrated sets are columns 8 to 10. You can then tell R to save columns 8 to 10 in a new dataset:
c 32
write.csv2(myrawdata[,8:10], "myfuzzydata.csv")
1.6.2 Inspect and describe data
Much of what is discussed in this section presumes that you have installed and loaded the package “dplyr”.
The View() (attention: “V” as a capital letter) command makes you see your dataset as if you
opened it normally.
c 33
View(mydata)
Attention: you cannot make any changes in the dataset using the cursor!
c 34
length(mydata)
c 35
names(mydata)
You can also use colnames() or rownames() to get the names of the columns (variables) or rows
(cases), respectively:
c 36
colnames(mydata)
c 37
rownames(mydata)
Each column and each row has a number. In the console, R lists the number of the first element
(row or column, depending on which command you chose above) in every new line in square
brackets. This helps you figure out which number a specific case or variable has in the dataset.
Get a first impression of what your dataset looks like (first 6 rows):
c 38
head(mydata)
Look up specific variables in your dataset (here: 2 variables, but it can be less or more, of
course):
c 39
select(mydata, var1, var2)
Apart from select (which only works for variables), you have three basic options to access
elements or subsets of your dataset:
1. By using $
The $ sign means something like “and therein”, for example “variable in mydata”:
c 40
mydata$var
2. By using objectname[rows,columns]
You can use square brackets to specify which part of the dataset you want to see. Before the
comma, you specify the number of the row(s), and after the comma, the column(s). If
unspecified, all rows / columns are displayed. Here, we look up the third variable of mydata:
c 41
mydata[,3]
Now, we look up the value of the first case for the third variable:
c 42
mydata[1,3]
You can do exactly the same thing by specifying the name of the variable in the dataset:
c 43
mydata$var[1]
You can also list several elements you want to see. For example, you can look up what the first,
second and eighth variables look like for rows (cases) number 2-4:
c 44
mydata[2:4, c(1,2,8)]
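The indexing options can be tried out on a small made-up dataset. In this sketch the dataset has only three variables, so the last command picks the first and third variable instead of the first, second and eighth.

```r
# Hypothetical dataset with five cases and three variables
mydata <- data.frame(var1 = 1:5, var2 = c(10, 20, 30, 40, 50),
                     var3 = c(0.1, 0.2, 0.3, 0.4, 0.5),
                     row.names = paste0("case", 1:5))

third_var   <- mydata[, 3]           # third variable, all cases
first_value <- mydata[1, 3]          # first case, third variable
same_value  <- mydata$var3[1]        # the same value, addressed by name
part        <- mydata[2:4, c(1, 3)]  # cases 2-4, first and third variable
part
```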
3. By using subset().
The subset command is extremely useful to identify subsets of your data that fulfil certain
logical conditions. For example, we want to see only those cases that have a value greater
than 5 in the first variable and a value smaller than or equal to 60 in the second variable.
c 45
mynewdata <- subset(mydata, var1 > 5 & var2 <= 60)
mynewdata
You could save these cases in a new dataset:
c 46
write.csv2(mynewdata, "mynewdata.csv")
c 47
mydata[order(mydata$var, decreasing=FALSE), ]
c 48
mydata[order(mydata$var, decreasing=TRUE), ]
c 49
describe(mydata)
c 50
mean(mydata$var)
median(mydata$var)
sd(mydata$var)
Check if there are missings in your dataset or a specific variable (see also section 3.2). R will
give you, for each case, the answer TRUE (is missing) or FALSE (not missing).
c 51
is.na(mydata)
c 52
is.na(mydata$var)
1.6.3 Recoding, renaming and deleting variables
You can attribute values to cases. For example, you want the first case to have a missing value
(NA, ‘Not Available’) for the third variable:
c 53
mydata[1,3] <- NA
The recode function allows you to tell R which variable you want to recode, and according to
what rules you want to attribute (=) which value. Several rules can be combined using
semicolons ; . Rules can be
In the example below, we recoded var into newvar, such that values of 1, 3 and values between
5 and 7 obtain the value 1; values of 4, 5, 8 and 9 obtain value 2; and all other values retain the
same values as the old variable (else=copy).
c 54
mydata$newvar <- recode(mydata$var, "1,3,5:7=1; 4,5,8,9=2; else=copy")
This second option looks a bit more complicated than the first option, but you are more flexible
in the rules you can use. You can create a new variable in the dataset to recode another variable.
For example, we dichotomize a variable: values above 6.33 are recoded into 1, all others into
0. We use $ to ensure that the new variable (object) is tied into the dataset.
c 55
mydata$newvar <- NA
Instead of creating a new variable with missings, you can also create a new variable that equals
the old variable:
c 56
mydata$newvar <- mydata$var
Assign the new variable a value of 0 if the values of the old variable are lower than or equal to
6.33:
c 57
mydata$newvar[mydata$var <= 6.33] <- 0
Then, we assign the value 1 to values of the old variable that are higher than 6.33:
c 58
mydata$newvar[mydata$var > 6.33] <- 1
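The dichotomization steps can be checked on a few made-up values. The sketch below also shows ifelse() from base R as a one-line alternative; this is our illustration, not a command used elsewhere in this manual, and the variable name newvar2 is hypothetical.

```r
# Hypothetical raw variable
mydata <- data.frame(var = c(2.5, 6.33, 6.34, 9))

# Step by step, as described in the text
mydata$newvar <- NA
mydata$newvar[mydata$var <= 6.33] <- 0
mydata$newvar[mydata$var > 6.33] <- 1

# One-line alternative with ifelse(): 1 if above 6.33, otherwise 0
mydata$newvar2 <- ifelse(mydata$var > 6.33, 1, 0)

mydata
```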
Needless to say, you can use all logical operators (see 1.5.2) for this and any values you like.
You can also perform operations with variables to create a new variable. For example, we create
a new variable which divides variable 2 by variable 1.
c 59
mydata$newvar <- mydata$var2/mydata$var1
Always check whether the recoding worked. Compare the old and the new, recoded variable:
c 60
select(mydata, var, newvar)
If you wish to rename a variable (here: rename “var” with the new variable name “newvar”):
c 61
names(mydata)[names(mydata)=="var"] <- "newvar"
or
c 62
mydata <- rename(mydata, newvar = var)
Alternatively, to rename a variable, you can simply create a new variable that equals the old
variable, and then remove the old variable:
c 63
mydata$newvar <- mydata$var
mydata$var <- NULL
And in fact, you can delete any variable any time (attention: you won’t be able to undo this!):
c 64
mydata$var <- NULL
c 65
install.packages(c("arm", "car", "gmodels", "Hmisc", "MASS", "memisc", "polycor", "psych",
"reshape", "VIM", "lattice", "XML", "xtable", "foreign", "directlabels", "betareg", "plyr",
"dplyr", "QCA", "SetMethods"), dependencies = TRUE)
Within the RStudio script, you can also just press Ctrl and Enter to run the command line
where your cursor lies.
Probably a window will pop up, where you need to choose a server to download the packages.
Click file, new file, R script to start writing an R code (upper left side). You can save the script
clicking file, save or file, save as… .
c 66
rm(list = ls())
c 67
library(lattice); library(arm); library(xtable); library(foreign); library(psych);
library(directlabels); library(betareg); library(VIM); library(base); library(plyr);
library(dplyr); library(QCA); library(SetMethods)
Set your working directory (see 1.2). It should be set to the same folder in which you save your
R script. All datasets used should be saved in that folder, too. For example:
c 68
setwd("C:/Users/Eva/Work/Data")
Some people find it complicated to work with command lines; they prefer having a visual
interface where they can click buttons to go through their analysis. In fact, the QCA package
offers this option, using the runGUI() command. The interface is very intuitive and has many
similarities with that of Ragin’s fs/QCA software. You will be able to do a basic QCA and
produce nice graphs with it. However, it is also still in development and therefore does not
yet offer the full functionality we employ in this manual (and according to the QCA manual,
there can still be bugs). To find out more about this, type:
?runGUI
c 69
mydata <- read.csv("mydata.csv", row.names=1, header = TRUE, sep = ";", dec = ",")
c 70
names(mydata)
c 71
rownames(mydata)
c 72
mydata[order(mydata$var, decreasing=FALSE), ]
c 73
select(mydata, var1, var2)
c 74
describe(mydata)
c 75
write.csv2(mydata, "mydata.csv")
Save a subset of your dataset (here: sets 1, 2 and 3; for other options, see 1.6.1):
c 76
mynewdata <- subset(myolddata, select = c("MYSET1", "MYSET2", "MYSET3"))
write.csv2(mynewdata, "mynewdata.csv")
R only recognizes values of “NA” for missing values. If missings are denoted by a different
sign – e.g., “-99”, you can (and have to) convert these into NAs:
c 77
mydata[mydata==-99] <- NA
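A quick sketch with invented data shows what this conversion does and how you can verify it:

```r
# Hypothetical dataset in which missing values are coded as -99
mydata <- data.frame(var1 = c(1, -99, 3), var2 = c(-99, 0.5, 0.7),
                     row.names = c("A", "B", "C"))

# Convert all values of -99 into NA
mydata[mydata == -99] <- NA
mydata

# Count the missing values in the whole dataset
n_missing <- sum(is.na(mydata))
n_missing
```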
QCA cannot attribute cases with missing values to the truth table, so you have to exclude cases
with missing values from the analysis.
You can check whether there are missing observations in your dataset (TRUE = missing,
FALSE = not missing):
c 78
is.na(mydata)
c 79
is.na(mydata$var)
And you can check whether the cases are complete, i.e. do NOT contain missings (TRUE = no
missings, FALSE = has one or several missings):
c 80
complete.cases(mydata)
You can obtain the number of cases with missing values (0) and with no missing values (1):
c 81
nomissings <- as.numeric(complete.cases(mydata))
table(nomissings)
and based on this you can obtain the percentage of cases with (left-hand side of crosstable) and
without (right-hand side) missings:
c 82
prop.table(table(nomissings))
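Applied to a made-up dataset with four cases, these commands behave as follows (a sketch; the object names counts and shares are hypothetical):

```r
# Hypothetical dataset: cases B and C each have one missing value
mydata <- data.frame(var1 = c(1, NA, 3, 4), var2 = c(0.2, 0.5, NA, 0.9),
                     row.names = c("A", "B", "C", "D"))

# 1 = complete case, 0 = case with at least one missing value
nomissings <- as.numeric(complete.cases(mydata))

counts <- table(nomissings)   # number of cases per category
shares <- prop.table(counts)  # share of cases per category
counts
shares
```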
Obtain the number of cases with missings on a specific variable. This time, these will be cases
with value 1 on the object, because we are asking for those cases with the value NA (“is.na”):
c 83
c 84
Identify the names of the cases that have missing values on the variable:
c 85
varmiss <- as.numeric(is.na(mydata$var))
rownames(subset(mydata, varmiss==1))
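For a single variable, counting the missings and naming the affected cases could look like the following sketch. The data are invented, and the object names n_varmiss and missing_cases are hypothetical.

```r
# Hypothetical dataset: cases B and D are missing on the variable
mydata <- data.frame(var = c(0.3, NA, 0.8, NA), row.names = c("A", "B", "C", "D"))

# 1 = missing on var, 0 = not missing
varmiss <- as.numeric(is.na(mydata$var))

# Number of cases with (1) and without (0) a missing value on var
table(varmiss)
n_varmiss <- sum(varmiss)

# Names of the cases with a missing value on var
missing_cases <- rownames(mydata)[varmiss == 1]
missing_cases
```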
You can also check this for several variables simultaneously. The example below presumes that
we first used as.numeric(is.na()) (see above) to identify the cases with missing values for
variables 1, 2 and 3. You can then identify those cases that have missings on one or several
of the variables of interest to you:
c 86
rownames(subset(mydata, var1miss==1 | var2miss==1 | var3miss==1))
If you wish to exclude cases with missings from your dataset, you can simply only include cases
with no missings on any variable in the dataset:
c 87
nomissings <- as.numeric(complete.cases(mydata))
mydata = mydata[nomissings== 1,]
OR you can first identify cases with missings for certain variables (here: variables 1, 2 and 3),
and then include only those cases without missings in the dataset:
c 88
var1miss <- as.numeric(is.na(mydata$var1))
var2miss <- as.numeric(is.na(mydata$var2))
var3miss <- as.numeric(is.na(mydata$var3))
# Filter in a single step: after a first filter, the indicator vectors would no
# longer match the rows of the shortened dataset
mydata <- mydata[var1miss == 0 & var2miss == 0 & var3miss == 0, ]
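A more compact way to reach the same goal is to apply complete.cases() only to the columns of interest. This is a sketch with invented data; the column named other is hypothetical and stands for a variable whose missings you do not care about.

```r
# Hypothetical dataset; "other" is a variable that is irrelevant for the analysis
mydata <- data.frame(var1 = c(1, NA, 3, 4), var2 = c(0.2, 0.5, 0.1, 0.9),
                     var3 = c(1, 1, NA, 0), other = c(NA, 5, 6, 7),
                     row.names = c("A", "B", "C", "D"))

# TRUE for cases without missings on var1, var2 and var3
keep <- complete.cases(mydata[, c("var1", "var2", "var3")])

# Keep only these cases; the missing value on "other" does not matter
mydata <- mydata[keep, ]
rownames(mydata)
```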
Package: QCA; see also section 1.6.3. Say you want to recode values of 4 to values of 3 for a
given variable. You can EITHER create a new variable:
c 89
mydata$varnew <- recode(mydata$var, "4=3; else=copy")
OR you could simply overwrite the existing variable (but then, the old values are lost):
c 90
mydata$var <- recode(mydata$var, "4=3; else=copy")
Ragin (2008b, 2009), Schneider and Wagemann (2012: 32-52, 232-244), Thiem and Dusa
(2012: 27-32, 51-62, 2013b: 89).
4.1 Calibration
In order to decide about the appropriate calibration threshold (section 4.1.3), you want to look
at your data (section 1.6.2), play around with it (section 1.6.3), and visualize different
calibration options and how they affect skewness (section 4.2). The nice thing about R is that
you can very flexibly try out different things and go back and forth between these different
steps. After calibration, you want to save what you did (section 1.6.1). Always label your sets
using uppercase notation.
Simply enter the thresholds for full nonmembership “e” (here: 1), the crossover point “c” (here:
2.5), and full membership “i” (here: 4).
c 91
mydata$MYFUZZYSET <- calibrate(mydata$rawvar, type = "fuzzy", thresholds = "e=1,
c=2.5, i=4", logistic = TRUE)
By default this command uses a logistic function and values of 0.05 and 0.95 as thresholds for
full (non)membership. If you set logistic=FALSE, then a linear function is used for
transforming the raw data into a set (Package: QCA).
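To make the logistic transformation more tangible, here is a minimal base R sketch of a logistic direct calibration. It is an illustration only and not the code of the QCA package: the function name calibrate_sketch and its arguments are hypothetical, and it assumes that the thresholds for full nonmembership and full membership correspond to set memberships of 0.05 and 0.95, with the crossover point at 0.5.

```r
# Sketch of a logistic direct calibration (illustration, not the QCA code).
# Assumption: e maps to 0.05, cross maps to 0.5, i maps to 0.95.
calibrate_sketch <- function(x, e, cross, i, idm = 0.95) {
  target <- log(idm / (1 - idm))   # log-odds assigned to the full-membership anchor
  # deviations from the crossover are scaled separately below and above it
  scale <- ifelse(x < cross, target / (cross - e), target / (i - cross))
  1 / (1 + exp(-(x - cross) * scale))
}

raw <- c(1, 2, 2.5, 3, 4)          # made-up raw values
round(calibrate_sketch(raw, e = 1, cross = 2.5, i = 4), 2)
```

Raw values at the three thresholds end up at 0.05, 0.5 and 0.95, and values in between follow the S-shaped logistic curve.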
You can additionally tell the software to code cases with certain values on the raw variable as
a certain set membership. Here, for example, after calibration we want to code cases with a
value of 3 on a five-point Likert scale (3 = neither agree nor disagree) of the raw variable as
fully out of the (already calibrated) fuzzy set (0.05):
c 92
mydata$MYFUZZYSET[mydata$rawvar == 3] <- 0.05
By default, R calibrates sets with as many decimals as there may be. However, you can round
calibrated sets (here: 2 decimals):
c 93
MYFUZZYSETROUNDED <- round(mydata$MYFUZZYSET, digits = 2)
or you can directly round when calibrating, e.g.:
c 94
mydata$MYFUZZYSET <- round(calibrate(mydata$myrawvar, type = "fuzzy",
thresholds = "e=1, c=2.5, i=4", logistic = TRUE), digits=2)
Note that whether or not you round the sets slightly affects the parameters of fit.
As described in Ragin (2009: 91), especially when we are not dealing with fine-grained interval-
level data, we often resort to multi-value fuzzy sets (e.g., 5-value fuzzy sets or 7-value fuzzy
sets), based on a qualitative classification of observations into fuzzy set membership scores –
in other words, a procedure of coding / recoding raw data. This procedure is often referred to
as “indirect calibration”, but to be precise, it describes only the first step of a two-step,
quantitative procedure which is rarely applied in practice (see also Ragin 2008b; Schneider and
Wagemann 2012: 35ff). When using this procedure, be careful not to assign values of 0.5 to
empirical cases.
In order to code multi-value fuzzy sets, you can assign a fuzzy set membership for specific
values on the raw variable – essentially, recode a variable into a set (see for more details section
1.6.3). Here, for example, we create a four-value fuzzy set (0, 0.3, 0.7, 1) out of a raw variable
that ranges from 0 to 3:
c 95
mydata$MYFUZZYSET <- recode(mydata$rawvar, "0=0; 1=0.3; 2=0.7; 3=1; else=NA")
mydata$MYFUZZYSET
We can also assign certain fuzzy set values to cases that fall within a given range of the raw
variable that you can define as you see fit. In the example here, raw values from 1 to 3 result in
a fuzzy value of 0, raw values greater than 3 but smaller than 5 are assigned a fuzzy value of
0.33, and raw values equal or greater than 5 get a fuzzy value of 1:
c 96
mydata$MYFUZZYSET <- NA
mydata$MYFUZZYSET[(mydata$rawvar >= 1)&(mydata$rawvar <=3)] <- 0
mydata$MYFUZZYSET[(mydata$rawvar >3)&( mydata$rawvar < 5)] <- 0.33
mydata$MYFUZZYSET[mydata$rawvar >= 5] <- 1
mydata$MYFUZZYSET
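To check how the boundaries of such ranges behave, here is a minimal base R sketch with made-up raw values (raw and fuzzy are hypothetical names, not part of the manual's dataset). Wrapping the conditions in which() is a design choice of this sketch, so that missing raw values simply stay NA.

```r
raw <- c(1, 2.5, 3, 3.5, 4.9, 5, 7, NA)   # made-up raw values

fuzzy <- rep(NA_real_, length(raw))       # start with an empty set
fuzzy[which(raw >= 1 & raw <= 3)] <- 0    # 1 to 3 (inclusive)
fuzzy[which(raw > 3 & raw < 5)]   <- 0.33 # strictly between 3 and 5
fuzzy[which(raw >= 5)]            <- 1    # 5 and above

data.frame(raw, fuzzy)
```

Printing raw values and set memberships side by side is a quick way to verify that no value falls between two ranges by accident.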
Depending on what you prefer, you can do this using the recode function or using logical
operations (see 1.6.3 and 3.3). The calibrate() function also offers a subset of the possibilities
described here if you set the option type = "crisp"; see the QCA package manual.
We can calibrate a crisp set by simply indicating the crossover point (here: 50) (package: QCA):
c 97
mydata$MYCRISPSET <- calibrate(mydata$rawvar, type = "crisp", thresholds = 50,
include = TRUE)
If you set include = TRUE, values of 50 will be calibrated as set membership 1. If it is set to
FALSE, values of 50 will be calibrated as set membership 0.
Calibrating crisp sets can also be done by recoding the data, see 4.1.1 (and 1.6.3). Again, we
can either assign concrete raw values to concrete crisp set memberships. For example, we assign
raw values of 1 and 3 a crisp set membership of 0, and raw values of 2, a crisp set value of 1:
c 98
mydata$MYCRISPSET <- recode(mydata$rawvar, "1=0; 3=0; 2=1; else=copy")
Or we can define how specific ranges of the raw values result in crisp set memberships. For
example, we assign raw values between 0 and 2 a crisp set membership 0, and raw values above
2, a crisp set membership 1:
c 99
mydata$MYCRISPSET <- recode(mydata$rawvar, "0:2=0; else=copy")
mydata$MYCRISPSET[mydata$rawvar > 2] <- 1
1. Labelling sets
Future versions of QCA will have a lot of commands that allow you to negate sets using
lowercase notation. In this scenario, whether you use upper- or lowercase letters is really
decisive. To avoid problems and confusion, always use uppercase letters for labelling your sets,
and lowercase letters for their negation.
QCA analyses are often perceived as very complex and hard to understand by outsiders. It
considerably reduces the perceived complexity of QCA results if you use short labels for your
sets. It also makes it easier to fit results into tables and figures. Ask yourself: do I need more
than 1 letter to describe my set, and how many more do I minimally need?
The QCA package does offer data-driven ways to find calibration thresholds. They are not
included in this manual because the resulting sets are very hard to interpret. The most important
analytic choice is that of the threshold that establishes the difference in kind. Whenever possible,
use conceptual and theoretical criteria for the crossover point and avoid using purely empirical
criteria such as descriptive statistics. Avoid using the median as crossover point: it can usually
not be interpreted other than as the set of “cases with values equal to or higher than those of 50% of the
other cases”. Similarly, if you use the sample mean (e.g., for unemployment) as crossover point,
the conceptual meaning of the set is “unemployment above average in the cases observed”. All
this does not mean that empirical criteria are not important for determining calibration
thresholds. In particular, you should avoid overly skewed sets (4.2.2 and 8.1) and empirical
cases on the crossover point (4.2.3).
After calibrating your raw data, save the calibrated sets in a new dataset. Some commands
only work if all variables in the dataset range from 0 to 1. We have seen in
1.6.1 how this works:
c 100
myfuzzydata <- subset(myrawdata, select = c("MYSET1", "MYSET2", "MYSET3"))
write.csv2(myfuzzydata, "myfuzzydata.csv")
During calibration, you want to see how your data is distributed, and know if you have (and
avoid) overly skewed sets and empirical cases with set membership 0.5.
4.2.1 Visualization
You can visualize the calibration with an XY plot, which shows you not only whether
there are cases on the 0.5 threshold, but also how the cases are distributed in the set (see also section
5 on graphs).
A basic way of doing this can be to plot a set against its raw scores and set a horizontal line at
the crossover point, as well as a vertical line at the raw value that indicates the crossover point
(here: 25) (package: graphics).
c 101
plot(mydata$rawvar, mydata$MYSET, pch=18, col="black",
main='MYSET',
xlab=' Raw score ',
ylab=' Fuzzy score ')
abline(h=0.5, col="black")
abline(v= 25, col="black")
In the example below, we plot the calibration of “MYSET”. The crossover point is set at 25,
and indicated by a vertical black line (v= 25); we have also added a horizontal black line at set
membership 0.5 (h=0.5 – optional). In addition, the last two command lines (optional) add two
dotted vertical lines to the graph to indicate two alternative plausible crossover points (18 and
50) that we decided could be tested for robustness. The plot shows us whether, for example,
changing the crossover point from 25 to 50 would change the qualitative set membership of an
empirical case (indicated by a dot in-between the black and the right-hand side dotted line).
c 102
plot(mydata$rawvar, mydata$MYSET, pch=18, col="black",
main='MYSET',
xlab=' Raw score ',
ylab=' Fuzzy score ')
abline(h=0.5, col="black")
abline(v= 25, col="black")
abline(v= 18, col="black", lty="dotted")
abline(v= 50, col="black", lty="dotted")
In the following plot, we only test for one possible alternative crossover point (25 = regular
crossover point, 18 = alternative crossover point). The line for the alternative crossover point is
shaded grey, and we also add the data curve for the alternative calibration using the points
option – again, shaded grey.
c 103
plot(mydata$rawvar, mydata$MYSET, pch=18, col="black",
main=' MYSET ',
xlab=' Raw score ',
ylab=' Fuzzy score ')
points(mydata$rawvar, mydata$MYALTERNATIVESET, pch=18, col="grey",
lty="dotted")
abline(h=0.5, col="black")
abline(v= 25, col="black")
abline(v= 18, col="grey", lty="dotted")
4.2.2 Skewness
To check the skewness of your set, you can identify the number of cases that have set
membership above 0.5:
c 104
skewMYSET <- as.numeric(mydata$MYSET > 0.5)
sum(skewMYSET)
Obtain the proportion of cases with set membership above 0.5 (right-hand side value):
c 105
prop.table(table(skewMYSET))
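A compact way to see what these two commands compute is the following base R sketch with made-up membership scores (MYSET_toy is a hypothetical vector, not a variable of the manual's dataset).

```r
MYSET_toy <- c(0.1, 0.2, 0.6, 0.9, 0.95, 0.3, 0.05, 0.8)  # made-up memberships

above <- MYSET_toy > 0.5   # TRUE for cases that are more in than out of the set
n_above <- sum(above)      # number of such cases
share_above <- mean(above) # their share among all cases

n_above
share_above
```

A share close to 0 or close to 1 would point to a strongly skewed set.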
Identify the names of the cases with set membership above 0.5:
c 106
rownames(subset(mydata, MYSET > 0.5))
You can check the number of cases that have membership of 0.5 in a set:
c 107
checkMYSET <- as.numeric(mydata$MYSET == 0.5)
sum(checkMYSET)
c 108
prop.table(table(checkMYSET))
It should be 0. If it is not, find out which cases have membership 0.5 on a set:
c 109
rownames(subset(mydata, checkMYSET==1))
c 110
rownames(subset(mydata, checkMYSET1==1 | checkMYSET2==1 | checkMYSET3==1))
You can exclude all cases that have values of 0.5 on one of your sets (here: outcome and 3
conditions):
c 111
mydata = mydata[mydata$OUTCOME != 0.5,]
mydata = mydata[mydata$COND1 != 0.5,]
mydata = mydata[mydata$COND2 != 0.5,]
mydata = mydata[mydata$COND3 != 0.5,]
mydata
To cut a long story short, for your convenience, here is a ready-made, hands-on template code
for what you could use as a standard procedure when calibrating a fuzzy set using the direct
calibration method:
c 112
# descriptive statistics
describe(mydata$var)
# visualize calibration
plot(mydata$var, mydata$MYSET, pch=18, col="black",
main='MYSET',
xlab=' Raw score ',
ylab=' Fuzzy score ')
abline(h=0.5, col="black")
abline(v= 25, col="black")
If you create a new set (a negated set, a disjunction or a conjunction, or a combination of these),
you may want to store it as an object (NEWSET <- operation), so that you can use it for further
analysis. You can also (but don’t have to) tie it to the dataset by using the dollar sign
(mydata$NEWSET <- operation), so that it appears as a new variable in the dataset.
You have several options to calculate the cases’ membership in combined sets.
c 113
sol <- compute("myset1*MYSET2 + myset3*MYSET4*myset5 + MYSET6",
data=mydata)
To calculate the cases’ membership in a negated set, you can simply subtract the set from 1:
c 114
mydata$myset <- 1-mydata$MYSET
Note that here we use lowercase notation to label the negated set.
This way, you can also directly negate the set within a different command, without previously
creating the negated set as a separate object. We do this with set 3 in the next example.
You can combine several sets (e.g., SET1*SET2*set3 that together form path 1 of the solution
formula) with the logical AND (which implements the minimum rule):
c 115
path1 <- fuzzyand(mydata$MYSET1, mydata$MYSET2, 1-mydata$MYSET3)
You can combine several sets (here: path1 + path2) with the logical OR using the maximum
rule:
c 116
myunion <- fuzzyor(path1, path2)
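The minimum and maximum rules can also be illustrated with base R alone. The sketch below uses made-up membership scores for three cases (SET1 to SET3 are hypothetical vectors) and the base functions pmin() and pmax(); it is meant to show the logic, not to replace fuzzyand() and fuzzyor().

```r
# Made-up fuzzy-set memberships for three cases
SET1 <- c(0.8, 0.3, 0.6)
SET2 <- c(0.7, 0.9, 0.2)
SET3 <- c(0.1, 0.4, 0.9)

# Logical AND = minimum rule; SET3 is negated on the fly with 1 - SET3
path1_sketch <- pmin(SET1, SET2, 1 - SET3)

# Logical OR = maximum rule
union_sketch <- pmax(SET1, SET2)

path1_sketch
union_sketch
```

Each case's membership in the conjunction is its lowest membership across the component sets; its membership in the union is the highest one.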
You can use both fuzzyand() and fuzzyor() with either sets from your dataset (like
mydata$MYSET1) or sets you created as objects (like path1).
You can also combine these commands in the same expression. For example, here we calculate
membership in “set1*SET2 + set3*SET4*set5 + SET6” (but note: compute() does the same
much faster!):
c 117
fuzzyor(fuzzyand(1 - MYSET1, MYSET2), fuzzyand(1 - MYSET3, MYSET4, 1 -
MYSET5), MYSET6)
You may encounter a situation in which you want to combine different sets using the logical
OR or the logical AND, but one (or several) of the component sets has missing values. In this
case, you may want the composed set to take on the (minimum or maximum) value of the
component set(s) that does not have a missing value. In fact, this way, building composed
sets can be a nice way to reduce sample dropout due to missing data.
The functions fuzzyand() and fuzzyor(), however, will not do that for you. You can use the
functions pmin() (logical AND) and pmax() (logical OR) instead (package: base) and specify
the option na.rm=TRUE:
c 118
c 119
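As a hedged sketch of what such calls could look like, the following example applies pmin() and pmax() with na.rm=TRUE to two made-up component sets with missing values (A and B are hypothetical vectors).

```r
A <- c(0.8, NA, 0.4)   # made-up component set with one missing value
B <- c(0.6, 0.7, NA)   # made-up component set with one missing value

conj <- pmin(A, B, na.rm = TRUE)  # logical AND, ignoring missing components
disj <- pmax(A, B, na.rm = TRUE)  # logical OR, ignoring missing components

conj
disj
```

Where only one component set has a value, the composed set takes that value. A case that is missing on all component sets remains NA.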
If you have calculated a QCA solution using eqmcc() (see section 7) and stored the solution as
an object (e.g. psOUTCOME), then the cases’ membership scores in the different paths of the solution
term are stored in the solution object and can be looked up using
c 120
psOUTCOME$pims
You can select the path you wish to work with and, for example, store it as an object for further
analysis:
c 121
path1 <- psOUTCOME$pims$`COND1*cond2`
You could do this for all paths of the solution term and then calculate membership in the whole
solution term sol:
c 122
sol <- fuzzyor(path1, path2, path3)
Based on all this, you can make XY-plots of truth table rows or of your necessary or sufficient
conditions and the outcome, see section 5.
5 XY-plots
References:
Schneider and Rohlfing (2013), Schneider and Wagemann (2012: 305-312), Thiem and Dusa
(2012: 80-83).
See also section 4.2.1 for visualizing calibration, and 8.1 for visualizing skewness checks.
You have four commands at your disposal to produce xy-plots with R. For QCA, we will usually
work with either xy.plot() (for plotting single conditions or customized predefined sets;
XYplot() from the QCA package equally works) or pimplot() (for plotting the solution term).
See Box 6 for other options for making XY plots.
The xy.plot command has the advantage that it enables us to identify cases in the plot (case.lab
= TRUE):
c 123
xy.plot(mydata$myx, mydata$myy, case.lab = TRUE, labs = rownames(mydata),
necessity=TRUE)
It also automatically integrates dotted lines for the quadrants, and a black diagonal; and it
indicates the parameters of fit (consistency, coverage, PRI or RoN), which can be set to
Footnote 3 (to the heading of section 5): Venn diagrams can be produced with R, but are not
covered in this manual. See https://cran.r-project.org/web/packages/VennDiagram/VennDiagram.pdf.
c 124
xy.plot(mydata$myx, mydata$myy, ylab = "Label of my Y", xlab = "label of my X",
case.lab = TRUE, labs = rownames(mydata))
You can directly plot a solution term that was calculated with the eqmcc() function and stored
as an object, e.g. named as “psOUTCOME” (see section 7).
c 125
pimplot(data=mydata, results=psOUTCOME, outcome = "OUTCOME", intermed=FALSE,
sol=1)
Specify intermed=TRUE if you are using the intermediate solution; the option sol= can be used
to specify which model you want to plot in case of model ambiguities. This is very useful as a
diagnostic tool. It will give you an XY plot of each path of the solution term, and of the whole
solution term, including parameters of fit (consistency, coverage, PRI, Haesebrouck’s (2015)
consistency) and case labels. In order to display the different plots, move the cursor again to
the command and hit enter again. You can also use pimplot() to plot truth table rows, see Box
7, and to plot compound necessary conditions, see section 6.2.
It is useful to know that you can combine several plots in one graph. Just specify the rules before
typing the command for the first plot. For example, here we specify that we want to have 3 rows
of plots, each row displaying 2 plots:
c 126
par(mfrow=c(3, 2))
And then simply list the commands for your 6 plots below that. Note that R will keep applying
this rule until you undo it. If you want to return to a single plot, just specify par(mfrow=c(1, 1))
before running the command for that plot.
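The following sketch shows one way to set and undo such a layout in base R. The plots use random made-up points. The calls pdf(NULL) and dev.off() are only there so that the sketch also runs without a screen; in R Studio you would normally leave them out.

```r
pdf(NULL)                        # null device: nothing is written to disk

old_par <- par(mfrow = c(3, 2))  # 3 rows x 2 columns; par() returns the old settings
for (k in 1:6) {
  plot(runif(10), runif(10), pch = 18, main = paste("Plot", k))
}
layout_used <- par("mfrow")

par(old_par)                     # restore the previous layout (one plot per page)
layout_reset <- par("mfrow")

dev.off()
```

Storing the old settings and restoring them with par(old_par) has the same effect here as typing par(mfrow=c(1, 1)), but it also works if you started from a different layout.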
c 127
plot(mydata$myx, mydata$myy, pch=18, col="black")
You can label the plot (main) and the axes (xlab and ylab):
c 128
plot(mydata$myx, mydata$myy, pch=18, col="black",
main=' Title of my plot ',
xlab=' Label of my X ',
ylab=' Label of my Y ')
You can add a number of features. For example, using xlim and ylim, you can define the
range of the x-axis and/or the y-axis (by default, the axis uses the sample range).
Furthermore, with the abline() command you can add horizontal (h) and vertical (v) lines to the
xy-plot, which can be solid or dotted. In the example below, we define a range for the x-axis
from 0 to 3, and a range for the y-axis from 10 to 150. We add a horizontal, solid line where
Y equals 80, and a vertical, dotted line where X equals 2.75 (for another example see section
4.2.1).
c 129
plot(mydata$myx, mydata$myy, pch=18, xlim=c(0, 3), ylim=c(10, 150), col="black",
main=' Title of my plot ',
xlab=' Label of my X ',
ylab=' Label of my Y ')
abline(h=80, col="black")
abline(v= 2.75, col="black", lty="dotted")
c 130
XYplot(myx, myy, mydata, relation = "nec", mguides = TRUE, jitter = TRUE, xlab =
"label of my x", ylab = "label of my Y", clabels =rownames(mydata))
The great thing about R is that you can directly export the graphs, good-looking and ready-
made. Above the graph (which is displayed in the lower right quadrant of the R Studio
interface), click the option export -> save as image… and then you can specify the size of the
graph as well as its format (JPEG, TIFF, etc.). A disadvantage shared by all these ways of
producing xy-plots with R is that if several cases share the same data point, their case labels
overlap and become illegible. In this case, use the separate Excel template provided in the course
material.
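If you prefer to export a graph with code rather than through the R Studio menu, base R's graphics devices can be used. The sketch below is one possible way to do this with pdf(); the file name and the plotted points are made up, and png(), jpeg() or tiff() can be used in the same open-plot-close pattern where your R installation supports them.

```r
out_file <- file.path(tempdir(), "myplot.pdf")  # hypothetical file name

pdf(out_file, width = 6, height = 6)            # open a file device (size in inches)
plot(c(0.1, 0.4, 0.8), c(0.2, 0.6, 0.9), pch = 18,
     xlab = "Label of my X", ylab = "Label of my Y")
dev.off()                                       # close the device to write the file

file.exists(out_file)
```

The advantage of this approach is that size and format are documented in your script, so the same graph can be reproduced later.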
6 Analysis of necessity
References:
Goertz (2006), Ragin (1987, 2000, 2006), Schneider and Wagemann (2012: 69-76), Thiem
(2016), Thiem and Dusa (2012: 32-38, 62-68, 2013b: 90-91).
If you want to “deductively” test the necessity of single conditions or of theoretically
defined disjunctions representing higher-order constructs, the pof command (package: QCA)
can be used if the option relation = "nec" is set (if it is set to "suf", then the command tests for
the sufficiency of the listed conditions). If you want to test several conditions, first create a list
of the conditions you want to test. For negating conditions or the outcome, use
either a tilde or 1-. This command gives you the consistency, coverage and relevance (RoN) of
necessity for each listed condition (see footnote 4). In the example below, we test the necessity
of conditions 1, 2, and 3 for the negated outcome.
c 131
conds <- subset(mydata, select = c("COND1", "COND2", "COND3"))
pof(conds, ~OUTCOME, mydata, relation = "nec")
You can also use this command for testing the necessity of only one condition:
c 132
pof(COND, ~OUTCOME, mydata, relation = "nec")
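To see what these parameters of fit measure, the following base R sketch computes them by hand for made-up membership scores (X and Y are hypothetical vectors). It assumes the standard formulas for consistency and coverage of necessity and for the relevance of necessity (RoN) as given in Schneider and Wagemann (2012); it is an illustration, not the code of pof().

```r
X <- c(0.9, 0.7, 0.6, 0.3, 0.8)   # made-up condition memberships
Y <- c(0.8, 0.6, 0.7, 0.2, 0.4)   # made-up outcome memberships

overlap <- sum(pmin(X, Y))        # membership shared by X and Y

cons_nec <- overlap / sum(Y)      # consistency of necessity: how far Y is a subset of X
cov_nec  <- overlap / sum(X)      # coverage of necessity
ron      <- sum(1 - X) / sum(1 - pmin(X, Y))  # relevance of necessity (RoN)

round(c(consistency = cons_nec, coverage = cov_nec, RoN = ron), 3)
```

For the negated outcome or a negated condition, replace Y by 1 - Y or X by 1 - X before computing the parameters.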
Footnote 4: Similarly, you can use the QCAfit() command (package: SetMethods), which works for
both necessity and sufficiency; check ?QCAfit and section 7.1.
If you want to test the necessity of the negated conditions for this outcome, you can simply
subtract the conditions from 1. Here we check the necessity of the negated conditions for the
positive outcome:
c 133
pof(1-conds, OUTCOME, mydata, relation = "nec")
You can also use pof() for complex condition sets. You can use either lowercase notation or
tildes to negate conditions. For example, we want to know whether the condition “COND1 +
cond2*COND3” is necessary (<=) for the outcome:
c 134
pof("COND1 + cond2*COND3 <= OUTCOME", data = mydata)
You can also make an XY plot that integrates the parameters of fit. This can be done for
necessity (necessity=TRUE), but also for sufficiency (necessity=FALSE). This has the
advantage that you can make a visual diagnostic of contradictory cases (in the upper left
quadrant) and trivialness.
c 135
xy.plot(mydata$COND, mydata$OUTCOME, necessity=TRUE)
c 136
xy.plot(mydata$COND, mydata$OUTCOME, case.lab = TRUE, labs =
rownames(mydata), necessity=TRUE)
The superSubset() command (package: QCA) “inductively” identifies all supersets of the
outcome: single conditions as well as conjunctions and disjunctions (unions) of sets. So using this
command, you can basically skip the steps described in 6.1. The option incl.cut serves to specify
a minimal consistency threshold that the conditions need to pass (for fuzzy sets, usually 0.9;
typically 1.0 for crisp sets). Using cov.cut, you can also specify a coverage cutoff (here: 0.6),
below which the necessary conditions are deemed trivial and will not appear in the list. Use
a tilde ~ to negate the outcome, e.g. outcome = "~OUTCOME".
c 137
superSubset(mydata, outcome = "OUTCOME",
conditions = "COND1, COND2, COND3",
incl.cut = 0.9, cov.cut = 0.6)
This command will give you a lot of supersets, but to decide whether they can be deemed
necessary, you will still have to 1) check for deviant cases consistency in kind (plot the result),
2) check for empirical relevance / trivialness (Goertz 2006), and 3) identify whether the superset
makes theoretical sense as a necessary condition, that is, whether the sets combined with the
logical OR represent some higher-order concept (see Schneider and Wagemann 2012). Note
also that the different supersets produced by this command are alternatives to each other: for
example, if A*B is necessary, in the output you will find A*B, but also A, and also B. In
summary, this command is useful because it gives you all potential necessary conditions in one
go; but do not use it for mindless data-mining.
If you store the results of superSubset() as an object (here: nec), you can plot all compound
necessary conditions by using pimplot():
c 138
nec <- superSubset(mydata, outcome = "OUTCOME",
conditions = "COND1, COND2, COND3",
incl.cut = 0.9, cov.cut = 0.6)
c 139
pimplot(data=mydata, results=nec, outcome= "OUTCOME", necessity=TRUE)
Use the backward arrow above the plot window to view all plots. The latter also indicate all
parameters of fit (including RoN) and the cases with membership > 0.5 in the outcome are
labelled.
Alternatively, you can also use command c 113 for calculating membership in the compound
condition, and command c 136 for the plot.
7 Analysis of sufficiency
References:
Baumgartner (2015), Baumgartner and Thiem (2015), Ragin (1987, 2000, 2008a, 2009),
Rihoux and Ragin (2009), Schneider and Wagemann (2012: 91-220, 2013, 2015), Thiem and
Dusa (2012: 38-49, 68-80, 2013a: 513-519, 2013b: 91-95).
The sufficiency (consistency, coverage and PRI) of single conditions can be (“deductively”)
tested using the pof() command of the QCA package (just specify relation = "suf"), and
visualized using, for instance, the xy.plot() command (again, specify necessity=FALSE) (see
section 6.1). Alternatively, you can use QCAfit() of the SetMethods package (which
additionally gives you the Haesebrouck consistency and also works for necessity, specify
necessity=TRUE). You can negate the outcome by specifying neg.out=TRUE. If you don’t
want to label your condition, skip cond.lab= "COND".
c 140
QCAfit(mydata$COND, mydata$OUTCOME, cond.lab= "COND", necessity=FALSE,
neg.out=FALSE)
You can also do this for several conditions at once. Just build an object (here: conds) first that
contains your conditions:
c 141
conds <- subset(mydata, select = c("COND1", "COND2", "COND3"))
QCAfit(conds, mydata$OUTCOME, cond.lab= c("COND1", "COND2", "COND3"),
necessity=FALSE, neg.out=FALSE)
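As a hand calculation of the sufficiency parameters, the following base R sketch uses made-up membership scores (X_suf and Y_suf are hypothetical vectors). It assumes the standard formulas for consistency, coverage and PRI of sufficiency as described in Schneider and Wagemann (2012); it is an illustration, not the code of pof() or QCAfit().

```r
X_suf <- c(0.9, 0.7, 0.6, 0.3, 0.8)   # made-up condition memberships
Y_suf <- c(0.8, 0.6, 0.7, 0.2, 0.4)   # made-up outcome memberships

overlap_suf <- sum(pmin(X_suf, Y_suf))             # membership shared by X and Y
both_y_noty <- sum(pmin(X_suf, Y_suf, 1 - Y_suf))  # overlap with Y and not-Y at once

cons_suf <- overlap_suf / sum(X_suf)   # consistency: how far X is a subset of Y
cov_suf  <- overlap_suf / sum(Y_suf)   # coverage: how much of Y is covered by X
pri      <- (overlap_suf - both_y_noty) / (sum(X_suf) - both_y_noty)

round(c(consistency = cons_suf, coverage = cov_suf, PRI = pri), 3)
```

A PRI that is much lower than the consistency signals that the condition is, to a similar degree, also a subset of the negated outcome.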
First you want to figure out where to set the raw consistency threshold. For this you can produce
a basic truth table, with raw consistencies (sorted in descending order) and indicating individual cases
(package: QCA).
c 142
ttOUTCOME <- truthTable(data=mydata, outcome = "OUTCOME",
conditions = "COND1, COND2, COND3",
incl.cut=1.00, sort.by="incl, n", complete=FALSE, show.cases=TRUE)
ttOUTCOME
If you want to analyze the negated outcome, simply set a tilde before the outcome, e.g.,
"~OUTCOME".
The truth table command provides several options. We set the raw consistency threshold using
incl.cut. If you skip this option, then all outcomes are coded 0 except for rows with consistency
1. n.cut is used to specify a frequency threshold (by default: 1). The sort.by option can be set
such that the rows are ordered by raw consistency ("incl"), or by the N ("n"), or by both (as
done here). With decreasing = FALSE, the truth table rows would be ordered in ascending
(instead of descending) order, e.g., by ascending raw consistencies. With the show.cases option,
we can choose whether we want to indicate the single cases contained in a truth table row
(TRUE) or not (FALSE). If the complete option is set to TRUE, then all logical remainders are
also displayed; if FALSE, then only empirically observed truth table rows are displayed. The
command below produces a truth table for the negated outcome, with raw consistency threshold
0.828, frequency threshold 1, sorted by descending raw consistency (and, raw consistency being
equal, by N), showing only empirically observed rows, and showing individual cases.
c 143
ttoutcome <- truthTable(mydata, outcome="~OUTCOME",
conditions = "COND1, COND2, COND3",
incl.cut=0.828, n.cut=1, sort.by="incl, n", decreasing=TRUE,
complete=FALSE, show.cases=TRUE)
ttoutcome
Note that we used lowercase letters here for labelling the truth table object because it analyzes
the negated outcome. You will know that ttOUTCOME is the truth table for the positive
outcome, and ttoutcome, for the negated outcome.
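To illustrate how fuzzy-set cases relate to truth table rows, here is a minimal base R sketch with two made-up conditions and three made-up cases (tt_toy, tt_rows and memb are hypothetical names). It applies the minimum rule to every combination of present and absent conditions; it is not the code of truthTable().

```r
tt_toy <- data.frame(COND1 = c(0.9, 0.2, 0.7),
                     COND2 = c(0.3, 0.8, 0.6),
                     row.names = c("A", "B", "C"))

# All 2^k combinations of absence (0) and presence (1) of the conditions
tt_rows <- expand.grid(COND1 = 0:1, COND2 = 0:1)

# Membership of each case in each row: minimum across conditions,
# using 1 - membership wherever the row requires a condition to be absent
memb <- sapply(seq_len(nrow(tt_rows)), function(r) {
  vals <- sapply(names(tt_toy), function(v) {
    if (tt_rows[r, v] == 1) tt_toy[[v]] else 1 - tt_toy[[v]]
  })
  apply(vals, 1, min)
})
dimnames(memb) <- list(rownames(tt_toy), paste0(tt_rows$COND1, tt_rows$COND2))

# The row in which each case has its highest membership
best_row <- colnames(memb)[apply(memb, 1, which.max)]
memb
best_row
```

As long as no case has a membership of exactly 0.5 in any condition, each case has a membership above 0.5 in exactly one row.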
The eqmcc() command (package: QCA) performs logical minimization of the truth table (i.e.,
the object we created with the truth table command, below: ttOUTCOME). First we calculate
the conservative solution and obtain all details (consistency, raw and unique coverage). We
also want to display the cases (show.cases) contained in the prime implicants (optional). The
use.tilde option can be set to FALSE if you prefer using uppercase and lowercase notation for
sets and their negation; and to TRUE if you prefer to denote negated sets with a tilde. It can be
skipped and then upper-/lowercase notation is used.
c 144
csOUTCOME <- eqmcc(ttOUTCOME, details=TRUE, show.cases=TRUE,
row.dom=TRUE, all.sol=FALSE, use.tilde=FALSE)
csOUTCOME
To calculate the parsimonious solution, we tell the software to include logical remainders
(those with outcome “?”) representing simplifying assumptions.
c 145
psOUTCOME <- eqmcc(ttOUTCOME, include="?", details=TRUE, show.cases=TRUE,
row.dom=TRUE, all.sol=FALSE)
psOUTCOME
Once you see the truth table, in order to find the appropriate raw consistency threshold, you
may want to plot different truth table rows. You can plot truth table rows easily using
pimplot(), including the names of cases with membership > 0.5 in the row and all parameters
of fit (consistency, coverage, PRI, Haesebrouck’s consistency). However, note that this
requires you to already have calculated a solution which you stored as an object (here:
psOUTCOME; you could use any raw consistency to begin with, just to get such a provisional
solution object to work with). Here we plot the truth table rows number 1 and 26. Use the
backward arrow above the plot window in order to inspect all the different plots:
c 146
pimplot(data=mydata, results=psOUTCOME, ttrows=c("1", "26"), outcome=
"OUTCOME")
You can also simply plot all truth table rows above a certain raw consistency level (here:
0.8) at once. The resulting plots have the row number as label of the X axis:
c 147
pimplot(data=mydata, results=psOUTCOME, incl.tt=0.8, outcome= "OUTCOME")
If, for some reason, you do not wish to calculate a solution object yet, you can also build the
sets of different truth table rows (see section 4.3) and plot these rows (see section 5). For your
convenience, here is a repetition on how you can do this. Here we plot the truth table row
number 1, which is COND1*cond2*COND3:
c 148
row1 <- compute("COND1*cond2*COND3", data=mydata)
xy.plot(row1, mydata$OUTCOME, case.lab = TRUE, labs = rownames(mydata))
c 149
ttoutcome$excluded
The row.dom option for logical minimization, if set to TRUE, is used to further eliminate
redundant prime implicants when solving the PI chart, applying the principle of row
dominance: if a prime implicant X covers the same configurations as another prime implicant
Y and at the same time covers other configurations which Y does not cover, then Y is
redundant and eliminated. By setting all.sol=TRUE, you can derive all possible solutions,
irrespective of the number of prime implicants.
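The principle of row dominance can be sketched in a few lines of base R. The prime implicant chart below is made up (three hypothetical prime implicants X, Y and Z covering four configurations), and the code only illustrates the principle; it is not how the QCA package solves the PI chart.

```r
# Made-up prime implicant chart: rows = prime implicants,
# columns = configurations, TRUE = the prime implicant covers the configuration
pic <- rbind(X = c(TRUE,  TRUE,  TRUE,  FALSE),
             Y = c(TRUE,  TRUE,  FALSE, FALSE),
             Z = c(FALSE, FALSE, FALSE, TRUE))
colnames(pic) <- paste0("cfg", 1:4)

# A prime implicant is dominated if another one covers everything it covers
# and at least one configuration more
dominated <- sapply(rownames(pic), function(p) {
  any(sapply(setdiff(rownames(pic), p), function(q) {
    all(pic[q, ] >= pic[p, ]) && any(pic[q, ] > pic[p, ])
  }))
})

dominated
pic[!dominated, , drop = FALSE]   # chart after eliminating dominated rows
```

In this toy chart, Y is dominated by X and would be eliminated, while X and Z remain.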
To obtain a subset of the solution space, set row.dom=TRUE and all.sol=FALSE. This is
the default setting that the QCA package implements if you do not specify these two options
(as done here in this manual). Conversely, for revealing the full extent of model ambiguity,
set row.dom=FALSE and all.sol=TRUE (see Baumgartner 2015 and Baumgartner and
Thiem 2015). The usage of all.sol = TRUE does not represent the opinion of the QCA
package author, where the default option is FALSE.
By presenting the templates using the default options, we do not intend to make a
recommendation. It is good to be aware of the fact that there are often many possible
solutions, and that you have several possibilities how to deal with this.
To derive an intermediate solution using Standard Analysis (Ragin 2008), we may want to
specify directional expectations. Just like with the parsimonious solution, we tell the software
to include simplifying assumptions; but then we specify the directional expectations (dir.exp)
for each condition in the same order as the conditions were listed when creating the truth table
(see section 7.2). In the example, we assume that condition 1 contributes to the outcome when
present (1); condition 2 contributes to the outcome when absent (0); and we have no directional
expectation ("-") for condition 3.
c 150
isOUTCOME <- eqmcc(ttOUTCOME, include = "?", details=TRUE, show.cases=TRUE,
row.dom=TRUE, all.sol=FALSE, dir.exp = "1, 0, -")
isOUTCOME
implausible (the “pregnant man”). You have several possibilities to do this. In the running
text, we discuss two of them; see Box 10 for more options.
We can build a new truth table where we simply tell the software to code those rows with
outcome 0 that display a certain configuration of conditions – we can do this only for logical
remainders, or for all truth table rows, whether they are empirically observed or logical
remainders. This possibility, which we apply in class, is an easy and transparent way if you
think that neither your observations nor your simplifying assumptions should contradict prior
findings and / or logic, and you want to see what you did to the truth table before turning to
logical minimization.
In a first step, we build the truth table (here: for the negated outcome and for 5 conditions) that
also displays the logical remainders (complete=TRUE):
c 151
ettoutcome <- truthTable(mydata, outcome="~OUTCOME",
conditions = "COND1, COND2, COND3, COND4, COND5",
incl.cut=0.8, n.cut=1, sort.by="incl, n", decreasing=TRUE,
complete=TRUE, show.cases=TRUE)
ettoutcome
Now we tell the software which truth table rows we want to exclude. In the example, we want
to exclude the configurations “cond1*COND2 + COND1*COND2*cond3” that were sufficient
for “OUTCOME” and hence, constitute an untenable assumption.
Furthermore, we found that “COND4 + COND5” is necessary for “outcome”. Hence, we want
to exclude the negation of this necessary condition (“cond4*cond5”) from the sufficient
conditions for outcome.
In a next step, you identify all truth table rows that display these configurations (package:
QCA). You can use either lowercase notation or the tilde sign to negate the conditions
(remember: label your conditions with uppercase letters to indicate their presence, already when
calibrating). You have to use the Boolean AND (*) for conjunctions. You can choose if you
want to do this only for logical remainders (remainders = TRUE) or for all truth table rows,
also empirically observed ones (remainders = FALSE).
c 152
rows <- findRows("cond1*COND2 + COND1*COND2*cond3 + cond4*cond5",
ettoutcome, remainders = FALSE)
rows
This gives you the numbers of all the truth table rows that display one of these combinations.
If you have identified a necessary condition, and you don’t want to employ deMorgan’s law
yourself when negating it, R’s deMorgan() command can do that for you (package: QCA).
You can use either the tilde or lowercase notation for negated conditions, so long as you are
consistent. Equally, you can either use the Boolean AND (*) for conjunctions, or you can
skip the *, but be consistent. Obviously, you do not have to do this if you did not find a
necessary condition.
c 153
nec <- deMorgan("COND4 + COND5")
nec
This will give you the untenable assumption, cond4*cond5.
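As a quick sanity check that does not rely on the QCA package, the following base R sketch enumerates all crisp combinations of two conditions and confirms that negating COND4 + COND5 gives the same result as cond4*cond5. The object names (grid, neg_of_or, and_of_negs) are illustrative assumptions and do not appear elsewhere in this manual.

```r
# All crisp-set combinations of two conditions (1 = present, 0 = absent)
grid <- expand.grid(COND4 = c(0, 1), COND5 = c(0, 1))

# Negation of the disjunction: ~(COND4 + COND5)
neg_of_or <- 1 - pmax(grid$COND4, grid$COND5)

# Conjunction of the negations: ~COND4 * ~COND5
and_of_negs <- pmin(1 - grid$COND4, 1 - grid$COND5)

# De Morgan's law: both vectors agree in every row
all(neg_of_or == and_of_negs)
```

The same logic carries over to fuzzy sets, where the logical OR is the maximum, the logical AND is the minimum, and negation is 1 minus the membership score.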
Note that, if you identified several (single or compound) necessary conditions, you have to
assume that they are connected with the logical AND to prevent them from "disappearing" from
the sufficient solution term. With less-than-perfect subset relations, however, their
conjunction may empirically no longer pass the consistency threshold for necessity. So, if
you found that A is necessary for Y and that B+C is necessary for Y, type:
c 154
nec <- deMorgan("A * (B + C)")
nec
After that, you can first code these truth table rows with outcome 0:
c 155
In the new truth table, these truth table rows should now have the outcome 0.
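The recoding step can be pictured with a plain data frame standing in for a truth table. This is only an illustration in base R: toy_tt, its columns and the row numbers are hypothetical, and the object returned by truthTable() has a richer structure than this stand-in.

```r
# Hypothetical stand-in for a truth table with two conditions
toy_tt <- data.frame(
  COND1 = c(0, 0, 1, 1),
  COND2 = c(0, 1, 0, 1),
  OUT   = c("0", "?", "1", "?"),  # "?" marks a logical remainder
  stringsAsFactors = FALSE
)

# Row numbers to exclude (assumed here; in practice they would be the
# numbers identified in the previous step)
rows <- c(2, 4)

# Code the selected rows with outcome 0 and inspect the result
toy_ett <- toy_tt
toy_ett$OUT[rows] <- "0"
toy_ett
```

Working on a copy (toy_ett) keeps the original table available for comparison, which is the transparency advantage mentioned above.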
Now, using the new, coded truth table, we can use the eqmcc command to calculate the
enhanced conservative solution:
c 156
The enhanced parsimonious solution (which, in fact, is one of many possible intermediate
solutions):
c 157
c 158
Sometimes you may want to exclude only empirically observed rows, or only logical
remainders; or you do not wish to apply the same rules to them. For example, you want to 1)
exclude remainders that contain a path of the solution for OUTCOME “cond1*COND2 +
COND1*COND2*cond3”; 2) exclude remainders that contradict the necessary condition for
outcome, “COND4”; and 3) exclude truth table rows number 5 and 6, which contain cases that
you do not want to include in the analysis.
First, you will build the truth table (in the example, again, for the negated outcome):
c 159
ttoutcome <- truthTable(mydata, outcome="~OUTCOME",
conditions = "COND1, COND2, COND3, COND4, COND5",
incl.cut=0.8, n.cut=1, sort.by="incl, n", decreasing=TRUE,
complete=TRUE, show.cases=TRUE)
ttoutcome
Then, you can use the esa() function to specify these coding rules (package: SetMethods).
Below we use the above truth table (oldtt=ttoutcome), and code it into a new enhanced truth
table ettoutcome.
The option nec_cond allows you to simply insert the necessary condition; R automatically
negates it for you without further ado. The option imposs_LR allows you to insert your
untenable assumptions (note: these can be impossible remainders or otherwise contradictory
assumptions; despite its label, all that this option does is to code these logical remainders with
outcome 0). For both options, use the tilde sign for negating conditions (lowercase letters do
not work), and use the * sign for the logical AND. These two options are only applied to logical
remainders, not to empirically observed rows. Finally, the option contrad_rows allows you to
exclude specific empirically observed rows. You will have to identify their numbers first, and
insert the respective truth table row numbers; you cannot apply a “rule” here.
c 160
ettoutcome <- esa(oldtt = ttoutcome, nec_cond = "COND4", imposs_LR =
"~COND1*COND2 + COND1*COND2*~COND3", contrad_rows=c("5", "6"))
ettoutcome
If you have more than one necessary condition, you can enter them like this, for example:
nec_cond = c("COND1", "COND2"). Negated necessary conditions are combined with the
logical OR; that is, it is assumed that the necessary conditions are combined with the logical
AND. (Note, however, that with less-than-perfect subset relations, the conjunction of
two individually necessary conditions may sometimes no longer pass the consistency threshold
for necessity.) Equally, you can enter compound necessary conditions. For example, if your
necessary condition is COND1 + COND2, enter: nec_cond = "COND1 + COND2".
Based on this new truth table, you can then perform logical minimization, see commands c 156,
c 157 and c 158.
Box 10: Alternative options for flexibly coding and omitting truth table rows
c 161
ttoutcome <- truthTable(mydata, outcome="~OUTCOME",
conditions = "COND1, COND2, COND3, COND4, COND5",
incl.cut=0.8, n.cut=1, sort.by="incl, n", decreasing=TRUE,
complete=TRUE, show.cases=TRUE)
ttoutcome
Identify those rows that you wish to omit (see example above):
c 162
rows <- findRows("cond1*COND2 + COND1*COND2*cond3 + cond4*cond5",
ttoutcome, remainders = FALSE)
rows
Again, this can be done for all truth table rows (remainders = FALSE), or only for logical
remainders (remainders = TRUE).
Omit these rows when calculating the enhanced conservative solution:
c 163
ecsoutcome <- eqmcc(ttoutcome, details=TRUE, show.cases=TRUE, row.dom=TRUE,
all.sol=FALSE, omit = rows)
You can also use omit for calculating the enhanced intermediate and the enhanced
parsimonious solution:
c 164
eisoutcome <- eqmcc(ttoutcome, include = "?", details=TRUE, show.cases=TRUE,
row.dom=TRUE, all.sol=FALSE, dir.exp = "0, 1, -", omit = rows)
eisoutcome
c 165
epsoutcome <- eqmcc(ttoutcome, include="?", details=TRUE, show.cases=TRUE,
row.dom=TRUE , all.sol=FALSE, omit = rows)
epsoutcome
c 166
ttoutcome <- truthTable(mydata, outcome="~OUTCOME",
conditions = "COND1, COND2, COND3, COND4, COND5",
incl.cut=0.8, n.cut=1, sort.by="incl, n", decreasing=TRUE,
complete=TRUE, show.cases=TRUE)
ttoutcome
By checking the truth table, you can identify those rows you wish to exclude – let us say
these are rows 8, 13 and 27. Now you have two options.
You can code these rows with outcome 0 in the truth table – which enables you to visually
check the truth table first:
c 167
And then you can calculate the different solution terms based on this new truth table, without
using the omit option; see commands c 156, c 157 and c 158.
If you want to skip this step, you can also directly calculate a new solution that omits these
rows from the truth table during logical minimization. Here we omit rows 8, 13 and 27 and
calculate the enhanced conservative solution:
c 168
ecsoutcome <- eqmcc(ttoutcome, details=TRUE, show.cases=TRUE, row.dom=TRUE,
all.sol=FALSE, omit = c(8, 13, 27))
You can do this for the enhanced intermediate and the enhanced parsimonious solution:
c 169
eisoutcome <- eqmcc(ttoutcome, include = "?", details=TRUE, show.cases=TRUE,
row.dom=TRUE, all.sol=FALSE, dir.exp = "0, 1, -", omit = c(8, 13, 27))
eisoutcome
c 170
epsoutcome <- eqmcc(ttoutcome, include="?", details=TRUE, show.cases=TRUE,
row.dom=TRUE , all.sol=FALSE, omit = c(8, 13, 27))
epsoutcome
With R we can easily identify the simplifying assumptions, that is, all remainders that were
assumed to contribute to the outcome for the parsimonious solution that we calculated above.
c 171
SAOUTCOME <- psOUTCOME$SA
SAOUTCOME
This will give you a table with all remainders that were assumed to be sufficient for the outcome
(for each parsimonious model, if there was ambiguity). If you had several models due to
ambiguity, then you can specify for which model you opt. You should choose the model that
was also the basis for your intermediate solution (the parsimonious model that was used will be
shown in the upper part of the output when you calculate the intermediate solution). For example,
if your intermediate solution is based on the 8th parsimonious solution, then specify:
c 172
psOUTCOME$SA$M08
Now, not all of these simplifying assumptions are also necessarily easy counterfactuals, used
for calculating your intermediate solution (here: eisOUTCOME). The easy counterfactuals can
be obtained as follows:
c 173
ECOUTCOME <- eisOUTCOME$i.sol$C1P1$EC
ECOUTCOME
Here we calculated the easy counterfactuals for the intermediate model 1. If we had several
intermediate solutions, we could also do so, e.g., for model 8:
c 174
ECOUTCOME <- eisOUTCOME$i.sol$C1P8$EC
Having done this, we can check whether all simplifying assumptions are also easy
counterfactuals:
c 175
identical(rownames(ECOUTCOME), rownames(SAOUTCOME))
If this is not the case (FALSE), we can identify which counterfactuals are both simplifying and
easy:
c 176
intersect(rownames(ECOUTCOME), rownames(SAOUTCOME))
This only works if you have chosen the simplifying assumptions and easy counterfactuals of a
specific model.
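To make the logic of these comparisons concrete, here is a small base R sketch with two hypothetical tables whose row names stand for truth table row numbers. SA_toy and EC_toy are invented for illustration and are not output of the QCA package.

```r
# Hypothetical tables whose row names are truth table row numbers
SA_toy <- data.frame(COND1 = c(0, 1, 1), row.names = c("3", "5", "8"))
EC_toy <- data.frame(COND1 = c(0, 1),    row.names = c("3", "8"))

# Are the two sets of rows exactly the same (including their order)?
same_rows <- identical(rownames(EC_toy), rownames(SA_toy))

# Rows that are both simplifying assumptions and easy counterfactuals
both <- intersect(rownames(EC_toy), rownames(SA_toy))

# Simplifying assumptions that are not easy counterfactuals
not_easy <- setdiff(rownames(SA_toy), rownames(EC_toy))
```

Because identical() also compares the order of the row names, setequal() may be the safer check when both tables could list the same rows in a different order.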
With the same command, we could also identify whether some logical remainders were used
both for the calculation of the solution for the positive outcome and for the solution for its
negation (we would first have to calculate the simplifying assumptions SAoutcome and easy
counterfactuals ECoutcome for the negated outcome, of course). We could see whether we
made such untenable assumptions for the parsimonious solution, and which rows are concerned,
by intersecting the simplifying assumptions for the positive outcome with those for the negated
outcome:
c 177
intersect(rownames(SAOUTCOME), rownames(SAoutcome))
If this reveals such contradictory assumptions, we could then check whether we successfully
eliminated this problem for the enhanced intermediate solution, by intersecting the easy
counterfactuals for the positive outcome with those for the negated outcome:
c 178
intersect(rownames(ECOUTCOME), rownames(ECoutcome))
A useful tool is the sop() function (package: QCA). It helps you to minimize any logical
expression into its simplest equivalent logical expression:
c 179
The function factorize() will give you all the possible ways in which elements of the results
can be factored out (package: QCA):
c 180
factorize ("Ac + Abc + bc + abC + ab", snames = "A, C, B")
snames can be useful here to ensure that a certain order of the conditions is preserved; without
specifying snames, the conditions are listed alphabetically.
You can also use the factorize function directly on a solution term, for example:
c 181
csOUTCOME <- eqmcc(ttOUTCOME, details=TRUE, show.cases=TRUE,
row.dom=TRUE, all.sol=FALSE, use.tilde=FALSE)
factorize(csOUTCOME)
c 182
####Analysis of sufficiency#######
# Build the truth table and set consistency threshold
ttOUTCOME <- truthTable(mydata, outcome="OUTCOME",
conditions = "COND1, COND2, COND3",
incl.cut=0.828, n.cut=1, sort.by="incl, n", decreasing=TRUE,
complete=TRUE, show.cases=TRUE)
ttOUTCOME
# Parsimonious solution
psOUTCOME <- eqmcc(ttOUTCOME, include="?", details=TRUE, show.cases=TRUE,
row.dom=TRUE, all.sol=FALSE)
psOUTCOME
# Intersect easy counterfactuals used for intermediate solution for positive and negative
# outcome
intersect(rownames(ECOUTCOME), rownames(ECoutcome))
8 Advanced stuff
Schneider and Wagemann (2012: 244-250) outline how the clustering of cases in particular
intersecting areas of the two diagonals of the XY plot can lead to flawed causal conclusions
(see also Cooper and Glaesser 2011). To check if this is a problem in your analysis, you can
easily produce such plots yourself, indicating the percentage of cases that cluster in each
intersection. Just use the command below: you can, for example, copy-paste the template into a
Word file and search and replace the items marked bold with the features of your own analysis.
First, you will have to calculate the cases’ membership in the particular condition set
(configuration, solution term, path of the solution term… here: “MYSET”) for which you want
to perform the test, see section 4.3. Then you calculate the percentage of cases situated in each
of the four intersecting areas (myN denotes the number of cases).
c 183
A1 <- round(sum(as.numeric((MYSET <= mydata$OUTCOME) &
(MYSET >= (1 - mydata$OUTCOME)))) / (myN / 100), digits = 1)
A2 <- round(sum(as.numeric((MYSET >= mydata$OUTCOME) &
(MYSET >= (1 - mydata$OUTCOME)))) / (myN / 100), digits = 1)
A3 <- round(sum(as.numeric((MYSET >= mydata$OUTCOME) &
(MYSET <= (1 - mydata$OUTCOME)))) / (myN / 100), digits = 1)
A4 <- round(sum(as.numeric((MYSET <= mydata$OUTCOME) &
(MYSET <= (1 - mydata$OUTCOME)))) / (myN / 100), digits = 1)
Then you produce the plot with the two diagonals and integrate these percentages.
c 184
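# Condition set on the x-axis, outcome on the y-axis (matches the axis labels
# and the definitions of A1 to A4)
plot(MYSET, mydata$OUTCOME, pch=18, xlim=c(0, 1), ylim=c(0, 1), col="black",
xaxs="i", yaxs="i",
main='MYSET',
xlab='condition',
ylab='OUTCOME')
# The two diagonals
abline(0, 1, col="black")
abline(1, -1, col="black")
# Percentages: A1 = top, A2 = right, A3 = bottom, A4 = left area
text(x=0.5, y=0.8, labels = A1)
text(x=0.8, y=0.5, labels = A2)
text(x=0.5, y=0.2, labels = A3)
text(x=0.2, y=0.5, labels = A4)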
Note that, if some cases are placed exactly on one of the diagonals, the percentages will add up
to more than 100.
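The following self-contained sketch illustrates this with five invented cases, one of which sits exactly on both diagonals at (0.5, 0.5). The vectors MYSET_toy and OUTCOME_toy and the helper pct() are assumptions made for this illustration; the four conditions are the same as in the template above.

```r
# Hypothetical fuzzy-set memberships for five cases
MYSET_toy   <- c(0.9, 0.25, 0.75, 0.5, 0.125)
OUTCOME_toy <- c(0.6, 0.875, 0.125, 0.5, 0.25)
myN <- length(MYSET_toy)

# Helper: percentage of cases for which a logical condition holds
pct <- function(cond) round(100 * sum(cond) / myN, digits = 1)

A1 <- pct(MYSET_toy <= OUTCOME_toy & MYSET_toy >= 1 - OUTCOME_toy)
A2 <- pct(MYSET_toy >= OUTCOME_toy & MYSET_toy >= 1 - OUTCOME_toy)
A3 <- pct(MYSET_toy >= OUTCOME_toy & MYSET_toy <= 1 - OUTCOME_toy)
A4 <- pct(MYSET_toy <= OUTCOME_toy & MYSET_toy <= 1 - OUTCOME_toy)

# The case at (0.5, 0.5) is counted in all four areas
c(A1 = A1, A2 = A2, A3 = A3, A4 = A4, total = A1 + A2 + A3 + A4)
```

Each area contains one case of its own plus the case on the diagonals, so every percentage is 40 and the total is 160 rather than 100.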
Below you can find ways to carry out several steps of formal theory evaluation with R (see Ragin
1987; Schneider and Wagemann 2012).
To perform the Boolean intersections of the hypotheses, the results, and their negations (Ragin
1987), you can use the commands intersection() and deMorgan() offered by QCA. At the end
of this step, you will have obtained the intersection of your hypotheses T with your solution S
(T*S), the intersection of the theory with what you did not observe (T*s), the intersection of
what you did not expect with what you observed (t*S), and what you neither expected nor
observed (t*s).
Negate a Boolean expression: for example, we negate our hypotheses T: "~A*B + C*A". To
use these objects later in combination with the theory.evaluation function, use the tilde for
negated conditions and use the Boolean AND (*) for conjunctions. The resulting expression “t”
contains those results that we should not observe according to theory:
c 185
t <- deMorgan("~A*B + C*A")
t
You can also use this command to directly negate the solutions you obtained (see 7.2-7.4). In
the example here, the solution has been stored as an object labeled “csOUTCOME”:
c 186
s <- deMorgan(csOUTCOME)
s
In the example, this would give you everything that you did not observe in your results.
You can then intersect different statements (package: QCA). Here you have to insert the full
Boolean expression inside the quotation marks – you cannot work with objects such as solution
terms. Use the snames option to specify the names of the conditions you use.
c 187
This command can only intersect two expressions and not more at once.
In order to figure out whether and how these intersections reflect your hypotheses, you can use
the factorize function to group the intersections, see command c 180.
Schneider and Wagemann (2012: 300-305) show how accounting for coverage can add leverage
to these evaluations. Here we show how you can calculate the number and percentage of cases
in certain intersections (but this is not the coverage measure – see below). Once you have
obtained the relevant intersections, you can calculate the cases’ membership in these
intersections (package: SetMethods). 5
The theory.evaluation() function (package: SetMethods) offers a fast way for calculating the
cases’ membership in the four intersections. However, it will not enable you to check for
membership in the single configurations, and it is mainly useful for being used in combination
with the cases.theory.evaluation() and theory.fit() functions. On its own, it goes like this. It
relies on an object that is your solution as calculated with eqmcc() (here: csOUTCOME), and
you will have to store your hypotheses (here: myset1*MYSET2 + myset3*MYSET4*myset5
+ MYSET6) as an object, here: T.
c 188
T <- "~MYSET1*MYSET2 + ~MYSET3*MYSET4*~MYSET5 + MYSET6"
TE <- theory.evaluation(theory = T, empirics = csOUTCOME, outcome = "OUTCOME",
intermed = FALSE, sol = 1)
TE
Negate sets using the tilde sign, and specify intermed = TRUE if you are using the intermediate
solution. The option sol = 1 can be used to specify which model you are referring to in case of
model ambiguities (e.g., specify sol=2 if you want to use model 2). It will give you a list of the
cases’ membership in the solution, the theory, and the different intersections.
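What such membership scores mean can be sketched in base R with the standard fuzzy-set operators (minimum for AND, 1 minus the score for negation). The vectors theory and solution below are invented memberships, and the object names are assumptions for this illustration, not output of theory.evaluation().

```r
# Hypothetical fuzzy memberships of four cases in theory (T) and solution (S)
theory   <- c(0.8, 0.3, 0.6, 0.1)
solution <- c(0.7, 0.9, 0.2, 0.4)

# Fuzzy AND is the minimum, fuzzy NOT is 1 minus the membership
m_TS <- pmin(theory, solution)          # T*S: expected and observed
m_Ts <- pmin(theory, 1 - solution)      # T*s: expected, not observed
m_tS <- pmin(1 - theory, solution)      # t*S: observed, not expected
m_ts <- pmin(1 - theory, 1 - solution)  # t*s: neither expected nor observed

data.frame(theory, solution, m_TS, m_Ts, m_tS, m_ts)
```

In this toy example, each case has a membership above 0.5 in exactly one of the four intersections.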
5 Each of these intersections may consist of several configurations. Some of these may cover logical remainders
and hence, not represent empirical evidence. The SetMethods package currently does not allow you to check this.
If you need a code for this, contact the author of this manual.
In a next step, you want to identify the cases in these intersections and calculate the percentage
of all cases that display the outcome, and the percentage of cases that display its negation, in
each of the four intersections. For fuzzy sets, this means that the cases’ membership in the set
is higher or smaller than 0.5. This is done using the cases.theory.evaluation() function.
c 189
cases.theory.evaluation(TE)
This will give you a list of the covered cases, including information on whether they are most
or least likely cases for the posited set relation, whether they have membership above or below
0.5 in the outcome, and their percentage relative to the total number of cases (package:
SetMethods).
The SetMethods package also offers you the possibility to calculate the parameters of fit
(consistency, coverage, PRI, Haesebrouck’s consistency) of the different intersections (see
Schneider and Wagemann 2012):
c 190
theory.fit(TE)
Note however that these parameters must be interpreted cautiously, because the cases’
membership in these intersections is typically highly skewed. The parameters of fit will not
necessarily tell you beyond any doubt if an intersection is populated by deviant cases
consistency in kind, for example. Hence, reporting the percentage of cases with membership
above or below 0.5 in the outcome may be an attractive or even necessary complement to the
parameters of fit.
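For orientation, the standard formulas for consistency, coverage and PRI for sufficiency can be written out in a few lines of base R. The membership vectors x and y are invented, and this sketch does not reproduce the output format of theory.fit(); Haesebrouck's consistency is left out.

```r
# Hypothetical fuzzy memberships in an intersection (x) and the outcome (y)
x <- c(0.8, 0.6, 0.9, 0.2, 0.1)
y <- c(0.9, 0.4, 0.7, 0.3, 0.6)

# Consistency of sufficiency: degree to which x is a subset of y
cons_suf <- sum(pmin(x, y)) / sum(x)

# Coverage: how much of y is accounted for by x
cov_suf <- sum(pmin(x, y)) / sum(y)

# PRI: consistency corrected for simultaneous subset relations with y and ~y
overlap <- sum(pmin(x, y, 1 - y))
pri <- (sum(pmin(x, y)) - overlap) / (sum(x) - overlap)

round(c(consistency = cons_suf, coverage = cov_suf, PRI = pri), 3)
```

With only five cases, a single case can move each parameter noticeably, which is one reason to also report how many cases have membership above or below 0.5 in the outcome.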
The SetMethods package allows for a targeted identification of most typical, most deviant and
individually irrelevant cases (see Schneider and Rohlfing 2013) for the analysis of
sufficiency. You need to have the results stored in an object, for example, the parsimonious
solution psOUTCOME and the intermediate solution isoutcome for the negated outcome.
Identify the deviant cases consistency for sufficiency for the parsimonious solution for
OUTCOME. Set intermed=FALSE and neg.out=FALSE:
c 191
cases.suf.dcn(results = psOUTCOME, outcome = "OUTCOME", neg.out=FALSE,
intermed=FALSE, sol=1)
Equally, you could do that for the conservative solution. This will give you a list of the cases
that have membership in the different paths of the solution term, their membership in these,
their membership in the outcome set, and whether they are the most deviant cases (TRUE or
FALSE). The sol= option allows you to specify which model you want to choose, in case of
model ambiguities (e.g., sol=3 would give you model 3).
You can also do this for an intermediate solution. Set intermed=TRUE. In the example here,
the outcome is negated (neg.out=TRUE):
c 192
cases.suf.dcn(results = isoutcome, outcome = "OUTCOME", neg.out=TRUE,
intermed=TRUE, sol=1)
Along the same lines, you can identify the deviant cases for coverage sufficiency, here, for
psOUTCOME:
c 193
cases.suf.dcv(results = psOUTCOME, outcome = "OUTCOME", neg.out=FALSE,
intermed=FALSE, sol=1)
Set intermed=TRUE for an intermediate solution, and neg.out=TRUE for negating the outcome.
Identify the individually irrelevant cases for sufficiency:
c 194
cases.suf.iir(results = psOUTCOME, outcome = "OUTCOME", neg.out=FALSE,
intermed=FALSE, sol=1)
Set intermed=TRUE for an intermediate solution, and neg.out=TRUE for negating the outcome.
Identify the most typical cases for sufficiency:
c 195
cases.suf.typ.most(results = psOUTCOME, outcome = "OUTCOME", neg.out=FALSE,
intermed=FALSE, sol=1)
Set intermed=TRUE for an intermediate solution, and neg.out=TRUE for negating the outcome.
Identify uniquely typical cases for sufficiency (those cases that are not also covered by another
path of the solution term):
c 196
cases.suf.typ.unique(results = psOUTCOME, outcome = "OUTCOME", neg.out=FALSE,
intermed=FALSE, sol=1)
Set intermed=TRUE for an intermediate solution, and neg.out=TRUE for negating the outcome.
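The case types above can be pictured as a four-way classification by membership above or below 0.5 in the solution and in the outcome. The base R sketch below is a deliberate simplification with invented data: the SetMethods functions apply further criteria (for example, ranking cases and working path by path) that are not reproduced here, and the names x_sol, y_out and case_type are assumptions.

```r
# Hypothetical memberships in the solution term (x_sol) and the outcome (y_out)
x_sol <- c(A = 0.8, B = 0.9, C = 0.2, D = 0.3)
y_out <- c(A = 0.9, B = 0.3, C = 0.7, D = 0.1)

# Simplified classification by the 0.5 qualitative anchor
case_type <- ifelse(x_sol > 0.5 & y_out > 0.5, "typical",
             ifelse(x_sol > 0.5 & y_out < 0.5, "deviant consistency",
             ifelse(x_sol < 0.5 & y_out > 0.5, "deviant coverage",
             ifelse(x_sol < 0.5 & y_out < 0.5, "individually irrelevant",
                    NA))))
case_type
```

Cases with a membership of exactly 0.5 in either set receive NA here, since they cannot be assigned to one side of the anchor.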
Most conveniently, the SetMethods package also allows you to match cases for targeted post-
QCA case analysis (process tracing), see Schneider and Rohlfing (2013). For this, you will need
to have your results of sufficiency stored in an object (here: the parsimonious solution
psOUTCOME).
Match deviant cases for coverage sufficiency with individually irrelevant cases:
c 197
matches.suf.dcviir(results = psOUTCOME, outcome = "OUTCOME", neg.out=FALSE,
intermed=FALSE, sol=1)
Set intermed=TRUE for an intermediate solution, and neg.out=TRUE for negating the outcome.
Match typical cases for consistency sufficiency with deviant cases for consistency sufficiency:
c 198
matches.suf.typdcn(results = psOUTCOME, outcome = "OUTCOME", neg.out=FALSE,
intermed=FALSE, sol=1)
Set intermed=TRUE for an intermediate solution, and neg.out=TRUE for negating the outcome.
Match typical cases for consistency and individually irrelevant cases for each path in a sufficient
solution term:
c 199
matches.suf.typiir(results = psOUTCOME, outcome = "OUTCOME", term=1,
neg.out=FALSE, intermed=FALSE, sol=1, max_pairs=5)
Set intermed=TRUE for an intermediate solution, and neg.out=TRUE for negating the outcome.
With the term = option you can specify which path of the solution term you are interested in.
Match typical cases for consistency for each path in a sufficient solution term:
c 200
matches.suf.typtyp(results = psOUTCOME, outcome = "OUTCOME", term=1,
neg.out=FALSE, intermed=FALSE, sol=1, max_pairs=5)
Set intermed=TRUE for an intermediate solution, and neg.out=TRUE for negating the outcome.
With the term = option you can specify which path of the solution term you are interested in.
The SetMethods package allows you to use QCA on panel data or other clustered data. The
theory and formulae underlying these functions are in Garcia-Castro and Ariño (2016). So don’t
believe anyone who tells you that QCA cannot be used on panel data!
The trick is to first calculate a solution (here: isOUTCOME) on the whole dataset. For example:
c 201
isOUTCOME <- eqmcc(ttOUTCOME, include = "?", details=TRUE, show.cases=TRUE,
row.dom=TRUE, all.sol=FALSE, dir.exp = "1, 0, -")
isOUTCOME
Obviously you can also perform ESA first, see section 7.4.
You can then get the pooled, within, and between consistencies for this solution. You simply
have to specify the variable that constitutes your units of analysis (here: COUNTRY) and the
variable that constitutes your clustering units (here: YEAR). If you have model ambiguity, you
could specify which model you want, for example, model 2 would be sol=2.
c 202
cluster.eqmcc(results= isOUTCOME, data=mydata, outcome= "OUTCOME",
unit_id="COUNTRY", cluster_id="YEAR", intermed=TRUE, sol=1)
The pooled consistency indicates the overall consistency observed in the sample when time and
individual effects are not taken into account. The between consistency is a measure of the cross-
sectional consistency for each year t in the panel. The within consistency measures how
consistent the set-subset relationship is across time for each particular case in the sample, in
other words, the longitudinal consistency of the set-subset connection for each individual i in
the panel over time (Garcia-Castro and Ariño 2016).
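The three consistencies can be reproduced by hand on a tiny invented panel, which may help when interpreting the output. This base R sketch uses the usual consistency formula for sufficiency on different subsets of the data; the data frame panel and the helper cons() are assumptions for illustration and do not mirror the output format of cluster.eqmcc().

```r
# Hypothetical panel: two countries observed in two years
panel <- data.frame(
  COUNTRY = c("A", "A", "B", "B"),
  YEAR    = c(2000, 2001, 2000, 2001),
  X       = c(0.8, 0.6, 0.4, 0.9),  # membership in the solution
  Y       = c(0.9, 0.5, 0.7, 0.6)   # membership in the outcome
)

# Consistency of sufficiency for any subset of rows
cons <- function(d) sum(pmin(d$X, d$Y)) / sum(d$X)

pooled_cons  <- cons(panel)                                # all observations
between_cons <- sapply(split(panel, panel$YEAR), cons)     # one value per year
within_cons  <- sapply(split(panel, panel$COUNTRY), cons)  # one value per country

round(c(pooled = pooled_cons, between_cons, within_cons), 3)
```

In this toy panel the relation is perfectly consistent in 2000 but not in 2001, a difference that the pooled value alone would hide.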
References
Baumgartner, M. 2015. Parsimony and Causality. Quality & Quantity 49: 839-856.
Baumgartner, M. & A. Thiem. 2015. Model Ambiguities in Configurational Comparative
Research. Sociological Methods & Research. Advance online publication. DOI:
10.1177/0049124115610351.
Cooper, B. & J. Glaesser. 2011. Paradoxes and pitfalls in using fuzzy set QCA: Illustrations
from a critical review of a study of educational inequality. Sociological Research Online
16(3): 1-18.
Garcia-Castro, R.C. & M.A. Ariño. 2016. A General Approach to Panel Data Set-Theoretic
Research. Journal of Advances in Management Sciences & Information Systems 2: 63-
76.
Dusa, A. 2007. User Manual for the QCA(GUI) Package in R. Journal of Business Research
60(5):576-86.
Goertz, G. 2006. Assessing the Trivialness, Relevance, and Relative Importance of Necessary
or Sufficient Conditions in Social Science. Studies in Comparative International
Development 41(2): 88-109.
Haesebrouck, T. 2015. Pitfalls in QCA’s Consistency Measure. Journal of Comparative
Politics 2:65-80.
Hino, A. 2009. Time-Series QCA. Sociological Theory and Methods 24 (2): 247-265.
Hinterleitner, M., Sager, F. & E. Thomann. 2016. The Politics of External Approval: Explaining
the IMF’s Evaluation of Austerity Programs. European Journal of Political Research.
Advance online publication. DOI: 10.1111/1475-6765.12142.
Maggetti, M. & D. Levi-Faur. 2013. Dealing with Errors in QCA. Political Research Quarterly
66(1): 198-204.
Medzihorsky, J., Oana, I., Quaranta, M. and C.Q. Schneider. 2017. SetMethods: A Package
Companion to "Set-Theoretic Methods for the Social Sciences". R Package Version 2.1.
URL: http://cran.r-project.org/package=SetMethods.
Ragin, C.C. 1987. The Comparative Method: Moving Beyond Qualitative and Quantitative
Strategies. Berkeley and Los Angeles: University of California Press.
Ragin, C.C. 2000. Fuzzy-Set Social Science. Chicago and London: University of Chicago Press.
Ragin, C.C. 2006. Set Relations in Social Research: Evaluating Their Consistency and
Coverage. Political Analysis 14(3): 291-310.
Ragin, C.C. 2008a. Easy Versus Difficult Counterfactuals. Redesigning Social Inquiry: Set
Relations in Social Research. Chicago: University of Chicago Press, chapter 9.
Ragin, C.C. 2008b. Measurement versus calibration: a set-theoretic approach. The Oxford
handbook of political methodology. Oxford Handbooks Online: 174-198.
Ragin, C.C. 2009. Qualitative Comparative Analysis Using Fuzzy Sets (fsQCA).
Configurational Comparative Methods. Qualitative Comparative Analysis (QCA) and
Related Techniques. Los Angeles, London, New Delhi and Singapore: Sage
Publications, 87-121.
Rihoux, B. & C.C. Ragin. Configurational Comparative Methods. Qualitative Comparative
Analysis (QCA) and Related Techniques. Los Angeles, London, New Delhi and