arXiv:2401.16215v1 [cs.LG] 29 Jan 2024
Learning Big Logical Rules by Joining Small Rules
Céline Hocquette1, Andreas Niskanen2, Rolf Morel1, Matti Järvisalo2, Andrew Cropper1
1University of Oxford
2University of Helsinki
[email protected]
Abstract
A major challenge in inductive logic programming
is learning big rules. To address this challenge, we
introduce an approach where we join small rules to
learn big rules. We implement our approach in a
constraint-driven system and use constraint solvers
to efficiently join rules. Our experiments on many
domains, including game playing and drug design,
show that our approach can (i) learn rules with more
than 100 literals, and (ii) drastically outperform ex-
isting approaches in terms of predictive accuracies.
1 Introduction
Zendo is an inductive reasoning game. One player, the teacher,
creates a secret rule that describes structures. The other play-
ers, the students, try to discover the secret rule by building
structures. The teacher marks whether structures follow or
break the rule. The first student to correctly guess the rule wins.
For instance, for the positive examples shown in Figure 1, a
possible rule is “there is a blue piece”.
Figure 1: Two positive Zendo examples.
To use machine learning to play Zendo, we need to learn
explainable rules from a small number of examples. Although
crucial for many real-world problems, many machine-learning
approaches struggle with this type of learning [Cropper et
al., 2022]. Inductive logic programming (ILP) [Muggleton,
1991] is a form of machine learning that can learn explainable
rules from a small number of examples. For instance, for the
examples in Figure 1, an ILP system could learn the rule:
h1 = { zendo(S) ← piece(S,B), blue(B) }
This rule says that the relation zendo holds for the structure S
when there is a blue piece B.
Suppose we also have the three negative examples shown
in Figure 2. Our previous rule incorrectly entails the first and
second negative examples, as they have a blue piece.
Figure 2: Three negative Zendo examples.
To entail all the positive and no negative examples, we need
a bigger rule, such as:
h2 =
{ zendo(S) ← piece(S,B), blue(B),
piece(S,R), red(R),
piece(S,G), green(G)
}
This rule says that the relation zendo holds for a structure
when there is a blue piece, a red piece, and a green piece.
Most ILP approaches can learn small rules, such as h1.
However, many struggle to learn bigger rules, such as h2.¹
This limitation noticeably holds for approaches which pre-
compute all possible rules and search for a subset of them
[Corapi et al., 2011; Law et al., 2014; Kaminski et al., 2019;
Si et al., 2019; Raghothaman et al., 2020; Evans et al., 2021;
Bembenek et al., 2023]. As they precompute all rules, these
approaches struggle to learn rules with more than a few literals.
To address this limitation, we introduce an approach that
learns big rules by joining small rules. The idea is to first
find small rules where each rule entails some positive and
some negative examples. We then search for conjunctions of
these small rules such that each conjunction entails at least
one positive example but no negative examples.
To illustrate our idea, consider our Zendo scenario. We first
search for rules that entail at least one positive example, such
as:
r1 = { zendo1(S) ← piece(S,B), blue(B) }
r2 = { zendo2(S) ← piece(S,R), red(R) }
r3 = { zendo3(S) ← piece(S,G), green(G) }
r4 = { zendo4(S) ← piece(S,Y), yellow(Y) }
Each of these rules entails at least one negative example.
Therefore, we search for a subset of these rules where the
intersection of the logical consequences of the subset entails
at least one positive example and no negative examples. The
¹A rule of size 7 is not especially big, but for readability we do
not use a bigger rule in this example. In our experiments, we show
we can learn similar rules with over 100 literals.

set of rules {r1, r2, r3} satisfies these criteria. We therefore
form a hypothesis from the conjunction of these rules:
h3 = { zendo(S) ← zendo1(S), zendo2(S), zendo3(S) }
The hypothesis h3 entails all the positive but none of the
negative examples and has the same logical consequences
(restricted to zendo/1 atoms) as h2.
The main benefit of our approach is that we can learn rules
with over 100 literals, which existing approaches cannot. Our
approach works because we decompose a learning task into
smaller tasks that can be solved separately. For instance, in-
stead of directly searching for a rule of size 7 to learn h2, we
search for rules of size 3 (r1 to r3) and try to join them to
learn h3. As the search complexity of ILP is usually expo-
nential in the size of the program to be learned, this reduction
can substantially improve learning performance. Moreover,
because we can join small rules to learn big rules of a cer-
tain syntactic form, we can eliminate splittable rules from the
search space. We formally define a splittable rule in Section 4,
but informally the body of a splittable rule can be decomposed
into independent subsets, such as the body of h2 in our Zendo
example.
To explore our idea, we build on learning from failures
(LFF) [Cropper and Morel, 2021], a constraint-driven ILP
approach. We extend the LFF system COMBO [Cropper and
Hocquette, 2023] with a join stage to learn programs with big
rules. We develop a Boolean satisfiability (SAT) [Biere et al.,
2021] approach to find conjunctions in the join stage. We call
our implementation JOINER.
Novelty and Contributions. Our main contribution is the
idea of joining small rules to learn big rules which, as our
experiments on many diverse domains show, can improve
predictive accuracies. Overall, our main contributions are:
1. We introduce an approach which joins small rules to learn
big rules.
2. We implement our approach in JOINER, which learns
optimal (textually minimal) and recursive programs. We
prove the correctness of JOINER (Theorem 1).
3. We experimentally show on several domains, including
game playing, drug design, and string classification, that
our approach can (i) learn rules with more than 100 liter-
als, and (ii) drastically outperform existing approaches in
terms of predictive accuracy.
2 Related Work
Program synthesis. Several approaches build a program
one token at a time using an LSTM [Devlin et al., 2017;
Bunel et al., 2018]. CROSSBEAM [Shi et al., 2022] uses a
neural model to generate programs as the compositions of
seen subprograms. CROSSBEAM is not guaranteed to learn
a solution if one exists and can only use unary and binary
relations. By contrast, JOINER can use relations of any arity.
Rule mining. AMIE+ [Galárraga et al., 2015] is a promi-
nent rule mining approach. In contrast to JOINER, AMIE+
can only use unary and binary relations and struggles to learn
rules with more than 4 literals.
ILP. Top-down ILP systems [Quinlan, 1990; Blockeel and
De Raedt, 1998] specialise rules with refinement operators
[Shapiro, 1983]. Because they learn a single rule at a time
and add a single literal at each step, these systems struggle
to learn recursive and optimal programs. Recent approaches
overcome these issues by formulating the search as a rule
selection problem [Corapi et al., 2011; Evans and Grefenstette,
2018; Kaminski et al., 2019; Si et al., 2019; Raghothaman et
al., 2020; Evans et al., 2021; Bembenek et al., 2023]. These
approaches precompute all possible rules (up to a maximum
rule size) and thus struggle to learn rules with more than a few
literals. By contrast, we avoid enumeration and use constraints
to soundly prune rules. Moreover, our join stage allows us to
learn rules with more than 100 literals.
Many rules. COMBO [Cropper and Hocquette, 2023]
searches for a disjunction of small programs that entails all
the positive examples. COMBO learns optimal and recursive
programs and large programs with many rules. However, it
struggles to learn rules with more than 6 literals. Our approach
builds on COMBO and can learn rules with over 100 literals.
Big rules. Inverse entailment approaches [Muggleton, 1995;
Srinivasan, 2001] can learn big rules by returning bottom
clauses. However, these approaches struggle to learn optimal
and recursive programs and tend to overfit. Ferilli [2016]
specialises rules in a theory revision task. This approach can
learn negated conjunctions but only Datalog preconditions
and not recursive programs. BRUTE [Cropper and Dumancic,
2020] can learn recursive programs with hundreds of literals.
However, in contrast to our approach, BRUTE needs a user-
provided domain-specific loss function, does not learn optimal
programs, and can only use binary relations.
Splitting rules. Costa et al. [2003] split rules into conjunc-
tions of independent goals that can be executed separately to
avoid unnecessary backtracking and thus to improve execution
times. By contrast, we split rules to reduce search complexity.
Costa et al. allow joined rules to share variables, whereas we
prevent joined rules from sharing body-only variables.
3 Problem Setting
We now describe our problem setting. We assume familiarity
with logic programming [Lloyd, 2012] but have stated relevant
notation and definitions in the appendix.
We use the learning from failures (LFF) [Cropper and Morel,
2021] setting. We restate some key definitions [Cropper and
Hocquette, 2023]. A hypothesis is a set of definite clauses with
the least Herbrand model semantics. We use the term program
interchangeably with the term hypothesis. A hypothesis space
E is a set of hypotheses. LFF assumes a language L that
defines hypotheses. A LFF learner uses hypothesis constraints
to restrict the hypothesis space. A hypothesis constraint is a
constraint (a headless rule) expressed in L. A hypothesis h is
consistent with a set of constraints C if, when written in L, h
does not violate any constraint in C. We call E_C the subset of
E consistent with C. We define a LFF input and a solution:
Definition 1 (LFF input). A LFF input is a tuple
(E, B, E, C) where E = (E+, E−) is a pair of sets of ground
atoms denoting positive (E+) and negative (E−) examples, B

is a definite program denoting background knowledge, E is a
hypothesis space, and C is a set of hypothesis constraints.
Definition 2 (Solution). For a LFF input (E, B, E, C), where
E = (E+, E−), a hypothesis h ∈ E_C is a solution when h
entails every example in E+ (∀e ∈ E+, B ∪ h |= e) and no
example in E− (∀e ∈ E−, B ∪ h ̸|= e).
Let cost : E ↦→ N be an arbitrary cost function that measures
the cost of a hypothesis. We define an optimal solution:
Definition 3 (Optimal solution). For a LFF input
(E, B, E, C), a hypothesis h ∈ E_C is optimal when (i) h
is a solution, and (ii) ∀h′ ∈ E_C, where h′ is a solution,
cost(h) ≤ cost(h′).
Our cost function is the number of literals in a hypothesis.
A hypothesis which is not a solution is a failure. For a
hypothesis h, the number of true positives (tp) and false nega-
tives (fn) is the number of positive examples entailed and not
entailed by h respectively. The number of false positives (fp)
is the number of negative examples entailed by h.
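As a minimal sketch, these counts can be computed directly from entailed-example sets (here given explicitly rather than computed by Prolog, as in the actual test stage):

```python
def coverage(entailed, pos, neg):
    """Return (tp, fn, fp) for a hypothesis whose entailed examples are known."""
    tp = len(pos & entailed)   # positive examples entailed
    fn = len(pos - entailed)   # positive examples not entailed
    fp = len(neg & entailed)   # negative examples entailed
    return tp, fn, fp

pos = {"e1", "e2", "e3"}
neg = {"e4", "e5"}
entailed = {"e1", "e2", "e4"}  # examples the hypothesis entails

assert coverage(entailed, pos, neg) == (2, 1, 1)
```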
4 Algorithm
To describe JOINER, we first describe COMBO [Cropper and
Hocquette, 2023], which we build on.
COMBO. COMBO takes as input background knowledge,
positive and negative training examples, and a maximum hy-
pothesis size. COMBO builds a constraint satisfaction problem
(CSP) program c, where each model of c corresponds to a hy-
pothesis (a definite program). In the generate stage, COMBO
searches for a model of c for increasing hypothesis sizes. If no
model is found, COMBO increments the hypothesis size and
resumes the search. If there is a model, COMBO converts it to
a hypothesis h. In the test stage, COMBO uses Prolog to test h
on the training examples. If h is a solution, COMBO returns
it. Otherwise, if h entails at least one positive example and no
negative examples, COMBO saves h as a promising program.
In the combine stage, COMBO searches for a combination
(a union) of promising programs that entails all the positive
examples and is minimal in size. If there is a combination,
COMBO saves it as the best combination so far and updates
the maximum hypothesis size. In the constrain stage, COMBO
uses h to build constraints and adds them to c to prune models
and thus prune the hypothesis space. For instance, if h does
not entail any positive example, COMBO adds a constraint
to eliminate its specialisations as they are guaranteed to not
entail any positive example. COMBO repeats this loop until it
finds an optimal (textually minimal) solution or exhausts the
models of c.
4.1 Joiner
Algorithm 1 shows our JOINER algorithm. JOINER builds on
COMBO and uses a generate, test, join, combine, and constrain
loop to find an optimal solution (Definition 3). JOINER differs
from COMBO by (i) eliminating splittable programs in the
generate stage (line 5), (ii) using a join stage to build big rules
from small rules and saving them as promising programs (line
16), and (iii) using different constraints (line 20). We describe
these differences in turn.
Algorithm 1 JOINER
1   def Joiner(bk, E+, E−, maxsize):
2       cons, to_join, to_combine = {}, {}, {}
3       bestsol, k = None, 1
4       while k ≤ maxsize:
5           h = generate_non_splittable(cons, k)
6           if h == UNSAT:
7               k += 1
8               continue
9           tp, fn, fp = test(E+, E−, bk, h)
10          if fn == 0 and fp == 0:
11              return h
12          elif tp > 0 and fp == 0:
13              to_combine += {h}
14          elif tp > 0 and fp > 0:
15              to_join += {h}
16          to_combine += join(to_join, bestsol, E+, E−, k)
17          combi = combine(to_combine, maxsize, bk, E)
18          if combi != None:
19              bestsol, maxsize = combi, size(combi) - 1
20          cons += constrain(h, tp, fp)
21      return bestsol
4.2 Generate
In our generate stage, we eliminate splittable programs be-
cause we can build them in the join stage. We define a split-
table rule:
Definition 4 (Splittable rule). A rule is splittable if its body
literals can be partitioned into two non-empty sets such that
the body-only variables (a variable in the body of a rule but
not the head) in the literals of one set are disjoint from the
body-only variables in the literals of the other set. A rule is
non-splittable if it is not splittable.
Example 1 (Splittable rule). Consider the rule:
{ zendo(S) ← piece(S,R), red(R), piece(S,B), blue(B) }
This rule is splittable because its body literals can be par-
titioned into two sets {piece(S,R), red(R)} and {piece(S,B),
blue(B)}, with body-only variables {R} and {B} respectively.
Example 2 (Non-splittable rule). Consider the rule:
{ zendo(S) ← piece(S,R), red(R), piece(S,B),
blue(B), contact(R,B)
}
This rule is non-splittable because each body literal contains
the body-only variable R or B and one literal contains both.
We define a splittable program:
Definition 5 (Splittable program). A program is splittable if
and only if it has exactly one rule and this rule is splittable. A
program is non-splittable when it is not splittable.
We use a constraint to prevent the CSP solver from considering
models with splittable programs. The appendix includes our
encoding of this constraint. At a high level, we first identify
connected body-only variables. Two body-only variables A
and B are connected if they are in the same body literal, or
if there exists another body-only variable C such that A and

C are connected and B and C are connected. Our constraint
prunes programs with a single rule which has (i) two body-only
variables that are not connected, or (ii) multiple body literals
and at least one body literal without body-only variables. As
our experiments show, eliminating splittable programs can
substantially improve learning performance.
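The splittability test amounts to a connectivity check over body-only variables. A minimal sketch, under the assumption that a rule is given as a set of head variables plus a list of body literals, each represented by its set of variables:

```python
from itertools import combinations

def is_splittable(head_vars, body):
    """body: list of variable sets, one per body literal.
    A rule is splittable if its body literals can be partitioned into two
    non-empty sets with disjoint body-only variables (Definition 4)."""
    body_only = set().union(*body) - set(head_vars)
    # Union-find over body literals: merge literals sharing a body-only variable.
    parent = list(range(len(body)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in combinations(range(len(body)), 2):
        if (body[i] & body[j]) & body_only:
            parent[find(i)] = find(j)
    components = {find(i) for i in range(len(body))}
    return len(components) > 1

# zendo(S) ← piece(S,R), red(R), piece(S,B), blue(B): splittable
assert is_splittable({"S"}, [{"S", "R"}, {"R"}, {"S", "B"}, {"B"}])
# Adding contact(R,B) connects the two halves: non-splittable
assert not is_splittable({"S"}, [{"S", "R"}, {"R"}, {"S", "B"}, {"B"}, {"R", "B"}])
```

Literals without body-only variables never merge with any other literal, so a rule with several body literals and one such literal is also classified as splittable, matching case (ii) of the constraint.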
4.3 Join
Algorithm 2 shows our join algorithm, which takes as input a
set of programs and their coverage (P), where each program
entails some positive and some negative examples, the best
solution found thus far (bestsol), the positive examples (E+),
the negative examples (E), and a maximum conjunction
size (k). It returns subsets of P, where the intersection of the
logical consequences of the programs in each subset entails at
least one positive example and no negative example. We call
such subsets conjunctions.
We define a conjunction:
Definition 6 (Conjunction). A conjunction is a set of pro-
grams with the same head literal. We call M(p) the least
Herbrand model of the logic program p. The logical conse-
quences of a conjunction c are the intersection of the logical
consequences of the programs in it: M(c) = ∩_{p∈c} M(p). The
cost of a conjunction c is the sum of the costs of the programs
in it: cost(c) = ∑_{p∈c} cost(p).
Our join algorithm has two stages. We first search for con-
junctions that together entail all the positive examples, which
allows us to quickly find a solution (Definition 2). If we have
a solution, we enumerate all remaining conjunctions to guar-
antee optimality (Definition 3). In other words, at each call
of the join stage (line 16 in Algorithm 1), we either run the
incomplete or the complete join stage (line 3 or 5 in Algorithm
2). We describe these two stages.
Incomplete Join Stage
If we do not have a solution, we use a greedy set-covering
algorithm to try to cover all the positive examples. We initially
mark each positive example as uncovered (line 8). We then
search for a conjunction that entails the maximum number
of uncovered positive examples (line 11). We repeat this
loop until we have covered all the positive examples or there
are no more conjunctions. This step allows us to first find
conjunctions with large coverage and quickly build a solution.
However, this solution may not be optimal.
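The incomplete join stage is a standard greedy set cover over conjunction coverages. A minimal sketch, where each candidate conjunction is given with the set of positive examples it entails (finding each best candidate with a MaxSAT solver, as JOINER does, is elided here):

```python
def greedy_cover(candidates, positives):
    """candidates: dict mapping a conjunction name to the positive examples
    it entails (all candidates are assumed to entail no negative example).
    Greedily pick the conjunction covering the most uncovered positives."""
    uncovered, chosen = set(positives), []
    while uncovered:
        best = max(candidates, key=lambda c: len(candidates[c] & uncovered))
        if not candidates[best] & uncovered:
            break  # no conjunction covers a new positive example
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen, uncovered

candidates = {"c1": {"e1", "e2", "e3"}, "c2": {"e3", "e4"}, "c3": {"e4"}}
chosen, uncovered = greedy_cover(candidates, {"e1", "e2", "e3", "e4"})
assert chosen == ["c1", "c2"] and not uncovered
```

As with any greedy set cover, the returned set of conjunctions may be larger than necessary, which is why the complete join stage is still needed for optimality.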
Complete Join Stage
If we have a solution, we find all remaining conjunctions to
ensure we consider all splittable programs and thus ensure
optimality. However, we do not need to find all conjunctions
as some cannot be in an optimal solution. If a conjunction
entails a subset of the positive examples entailed by a strictly
smaller conjunction then it cannot be in an optimal solution:
Proposition 1. Let c1 and c2 be two conjunctions which do not
entail any negative examples, with c1 |= E+_1, c2 |= E+_2,
E+_2 ⊆ E+_1, and size(c1) < size(c2). Then c2 cannot be in
an optimal solution.
The appendix contains a proof for this result. Following this
result, our join stage enumerates conjunctions by increasing
Algorithm 2 Join stage
1   def join(P, bestsol, E+, E−, k):
2       if bestsol == None:
3           return incomplete_join(P, E+, E−)
4       else:
5           return complete_join(P, E+, E−, k)
6
7   def incomplete_join(P, E+, E−):
8       uncovered, conjunctions = E+, {}
9       while uncovered:
10          encoding = buildencoding(P, E+, E−, conjunctions)
11          conj = conj_max_coverage(uncovered, encoding)
12          if not conj:
13              break
14          uncovered -= pos_entailed(conj)
15          conjunctions += {conj}
16      return conjunctions
17
18  def complete_join(P, E+, E−, k):
19      conjunctions = {}
20      while True:
21          encoding = buildencoding(P, E+, E−, conjunctions)
22          encoding += sizeconstraint(k)
23          τ = find_assignment(encoding)
24          if not τ:
25              break
26          while True:
27              assignment = cover_more_pos(encoding, τ)
28              if not assignment:
29                  break
30              τ = assignment
31          conjunctions += {conjunction(τ)}
32      return conjunctions
size. For increasing values of k, we search for all subset-
maximal coverage conjunctions of size k, i.e. conjunctions
whose sets of entailed positive examples are not included in
one another.
Example 3 (Join stage). Consider the positive examples
E+ = {f([a, b, c, d]), f([c, b, d, e])}, the negative examples
E− = {f([c, b]), f([d, b]), f([a, c, d, e])}, and the programs:
p1 = { f(S) ← head(S,a) }
p2 = { f(S) ← last(S,e) }
p3 = { f(S) ← tail(S,T), head(T,b) }
p4 = { f(S) ← head(S,c)
       f(S) ← tail(S,T), f(T) }
p5 = { f(S) ← head(S,d)
       f(S) ← tail(S,T), f(T) }
Each of these programs entails at least one positive and one
negative example. The incomplete join stage first outputs the
conjunction c1 = {p3, p4, p5} as it entails all the positive ex-
amples and no negative example. The complete join stage then
outputs the conjunctions c2 = {p1, p3} and c3 = {p2, p3}.
The other conjunctions are not output because they (i) do
not entail any positive example, or (ii) entail some negative
example, or (iii) are subsumed by c2 or c3.
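The sketch below replays this example in plain Python, writing each program's semantics directly as a predicate over lists (head, last, second element, and membership for the recursive programs) and checking the stated coverages by intersecting entailed sets:

```python
# Each program maps to a predicate over lists (its least Herbrand model
# restricted to f/1, written out directly in Python).
p1 = lambda s: bool(s) and s[0] == "a"     # head(S,a)
p2 = lambda s: bool(s) and s[-1] == "e"    # last(S,e)
p3 = lambda s: len(s) > 1 and s[1] == "b"  # tail(S,T), head(T,b)
p4 = lambda s: "c" in s                    # head c, or recurse on the tail
p5 = lambda s: "d" in s                    # head d, or recurse on the tail

pos = [list("abcd"), list("cbde")]
neg = [list("cb"), list("db"), list("acde")]

def entails(conj, s):
    # A conjunction entails s iff every program in it does (intersection).
    return all(p(s) for p in conj)

c1 = [p3, p4, p5]
assert all(entails(c1, s) for s in pos)      # entails all positives
assert not any(entails(c1, s) for s in neg)  # and no negative

c2, c3 = [p1, p3], [p2, p3]
for c in (c2, c3):
    assert any(entails(c, s) for s in pos)       # at least one positive
    assert not any(entails(c, s) for s in neg)   # no negative
```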

Finding Conjunctions using SAT
To find conjunctions, we use a Boolean satisfiabil-
ity (SAT) [Biere et al., 2021] approach. We build a propo-
sitional encoding (lines 10 and 21) for the join stage as fol-
lows. Let P be the set of input programs. For each pro-
gram h ∈ P, the variable p_h indicates that h is in a con-
junction. For each positive example e ∈ E+, the variable
c_e indicates that the conjunction entails e. The constraint
F+_e = c_e → ∧_{h∈P : B∪h ̸|= e} ¬p_h ensures that if the conjunc-
tion entails e, then every program in the conjunction en-
tails e. For each negative example e ∈ E−, the constraint
F−_e = ∨_{h∈P : B∪h ̸|= e} p_h ensures that at least one of the pro-
grams in the conjunction does not entail e.
Subset-maximal coverage conjunctions. For the complete
join stage, to find all conjunctions of size k with subset-
maximal coverage, we use a SAT solver to enumerate max-
imal satisfiable subsets [Liffiton and Sakallah, 2008] corre-
sponding to the subset-maximal coverage conjunctions on
the following propositional encoding. We build the con-
straint S = ∑_h size(h) · p_h ≤ k to bound the size of the
conjunctions and we encode S as a propositional formula
F_S [Manthey et al., 2014]. To find a conjunction with subset-
maximal coverage, we iteratively call a SAT solver on the
formula F = (∧_{e∈E+} F+_e) ∧ (∧_{e∈E−} F−_e) ∧ F_S. If F has a
satisfying assignment τ (line 23), we form a conjunction
c by including a program h iff τ(p_h) = 1 and update F to
F ∧ (∧_{e∈E+ : B∪c |= e} c_e) ∧ (∨_{e∈E+ : B∪c ̸|= e} c_e) to ensure that sub-
sequent conjunctions cover more examples (line 27). We
iterate until F is unsatisfiable (lines 26 to 30), in which
case c has subset-maximal coverage. To enumerate all con-
junctions, we iteratively call this procedure on the formula
F ∧ (∧_{c∈C} ∨_{e∈E+ : B∪c ̸|= e} c_e), where C is the set of conjunc-
tions found so far.
Maximal coverage conjunctions. For the incomplete join
stage, we use maximum satisfiability (MaxSAT) [Bacchus
et al., 2021] solving to find conjunctions which entail the
maximum number of uncovered positive examples (line 11).
The MaxSAT encoding includes the hard clauses
(∧_{e∈E+} F+_e) ∧ (∧_{e∈E−} F−_e) to ensure correct coverage,
as well as a soft clause (c_e) for each uncovered positive
example e to maximise the number of uncovered examples
that the conjunction entails.
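The clause structure of this encoding can be sketched without a solver. The snippet below builds the F+_e and F−_e constraints exactly as described, on a hypothetical toy instance, and replaces the MaxSAT call with brute-force enumeration over assignments (both the instance and the brute-force search are illustrative assumptions):

```python
from itertools import product

# Hypothetical toy instance: which examples each candidate program
# FAILS to entail.
programs = ["h1", "h2", "h3"]
pos = ["e1", "e2"]
neg = ["n1"]
not_entails = {"h1": {"e2", "n1"}, "h2": {"e1"}, "h3": set()}

def satisfies(assign):
    """Check the hard constraints F+_e and F-_e under a 0/1 assignment
    over the p_h (program selected) and c_e (example entailed) variables."""
    for e in pos:
        # F+_e: if c_e is true, no selected program may fail to entail e.
        if assign["c_" + e] and any(
                assign["p_" + h] for h in programs if e in not_entails[h]):
            return False
    for e in neg:
        # F-_e: some selected program must fail to entail the negative e.
        if not any(assign["p_" + h] for h in programs if e in not_entails[h]):
            return False
    return True

# Brute-force stand-in for the MaxSAT call: maximise the entailed positives.
best = None
names = ["p_" + h for h in programs] + ["c_" + e for e in pos]
for bits in product([0, 1], repeat=len(names)):
    assign = dict(zip(names, bits))
    if not any(assign["p_" + h] for h in programs):
        continue  # a conjunction must contain at least one program
    if satisfies(assign):
        cover = sum(assign["c_" + e] for e in pos)
        if best is None or cover > best[0]:
            best = (cover, {h for h in programs if assign["p_" + h]})

# Only h1 rules out the negative n1; h1 fails e2, so the best cover is {e1}.
assert best == (1, {"h1"})
```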
4.4 Constrain
In the constrain stage (line 20 in Algorithm 1), JOINER uses
two optimally sound constraints to prune the hypothesis space.
If a hypothesis does not entail any positive example, JOINER
prunes all its specialisations, as they cannot be in a conjunction
in an optimal solution:
Proposition 2. Let h1 be a hypothesis that does not entail any
positive example and h2 be a specialisation of h1. Then h2
cannot be in a conjunction in an optimal solution.
If a hypothesis does not entail any negative example, JOINER
prunes all its specialisations, as they cannot be in a conjunction
in an optimal solution:
Proposition 3. Let h1 be a hypothesis that does not entail any
negative example and h2 be a specialisation of h1. Then h2
cannot be in a conjunction in an optimal solution.
The appendix includes proofs for these propositions.
4.5 Correctness
We prove the correctness of JOINER:
Theorem 1. JOINER returns an optimal solution, if one exists.
The proof is in the appendix. To show this result, we show
that (i) JOINER can generate and test every non-splittable
program, (ii) each rule of an optimal solution is equivalent to
the conjunction of non-splittable rules, and (iii) our constraints
(Propositions 1, 2, and 3) never prune optimal solutions.
5 Experiments
To test our claim that our join stage can improve learning
performance, our experiments aim to answer the question:
Q1 Can the join stage improve learning performance?
To answer Q1, we compare learning with and without the join
stage.
To test our claim that eliminating splittable programs in
the generate stage can improve learning performance, our
experiments aim to answer the question:
Q2 Can eliminating splittable programs in the generate stage
improve learning performance?
To answer Q2, we compare learning with and without the
constraint eliminating splittable programs.
To test our claim that JOINER can learn programs with big
splittable rules, our experiments aim to answer the question:
Q3 How well does JOINER scale with the size of splittable
rules?
To answer Q3, we vary the size of rules and evaluate the
performance of JOINER.
Finally, to test our claim that JOINER can outperform other
approaches, our experiments aim to answer the question:
Q4 How well does JOINER compare against other ap-
proaches?
To answer Q4, we compare JOINER against COMBO [Crop-
per and Hocquette, 2023] and ALEPH [Srinivasan, 2001] on
multiple tasks and domains.²
Domains
We consider several domains. The appendix provides addi-
tional information about our domains and tasks.
IGGP. In inductive general game playing (IGGP) [Cropper et
al., 2020], the task is to learn rules from game traces from the
general game playing competition [Genesereth and Björnsson,
2013].
Zendo. Zendo is an inductive game where the goal is to
identify a secret rule that structures must follow [Bramley et
al., 2018; Cropper and Hocquette, 2023].
IMDB. The Internet Movie Database (IMDB) [Mihalkova
et al., 2007] is a relational domain containing relations be-
tween movies, directors, and actors.
Pharmacophores. The task is to identify structures responsi-
ble for the medicinal activity of molecules [Finn et al., 1998].
²We also tried rule selection approaches (Section 2) but precom-
puting every possible rule is infeasible on our datasets.

Strings. The goal is to identify recursive patterns to classify
strings.
1D-ARC. This dataset [Xu et al., 2023] contains visual reason-
ing tasks inspired by the abstract reasoning corpus [Chollet,
2019].
Experimental Setup
We use 60s and 600s timeouts. We repeat each experiment 5
times. We measure predictive accuracies (the proportion of
correct predictions on testing data). For clarity, our figures
only show tasks where the approaches differ. The appendix
contains the detailed results for each task. We use an 8-core 3.2
GHz Apple M1 and a single CPU to run the experiments. We
use the MaxSAT solver UWrMaxSat [Piotrów, 2020] and the
SAT solver CaDiCaL 1.5.3 [Biere et al., 2023] (via PySAT [Ig-
natiev et al., 2018]) in the join stage of JOINER.
Q1. We compare learning with and without the join stage.
To isolate the impact of the join stage, we allow splittable
programs in the generate stage.
Q2. We compare the predictive accuracies of JOINER with
and without the constraint that eliminates splittable programs.
Q3. To evaluate scalability, for increasing values of k, we
generate a task where an optimal solution has size k. We learn
a hypothesis with a single splittable rule. We use a zendo task
similar to the one shown in Section 1 and a string task.
Q4. We provide JOINER and COMBO with identical input.
The only differences are (i) the join stage, and (ii) the elimi-
nation of splittable programs in the generate stage. However,
because it can build conjunctions in the join stage, JOINER
searches a larger hypothesis space than COMBO. ALEPH uses
a different bias than JOINER to define the hypothesis space.
In particular, ALEPH expects a maximum rule size as input.
Therefore, the comparison is less fair and should be viewed as
indicative only.
Reproducibility. The code and experimental data for re-
producing the experiments are provided as supplementary
material and will be made publicly available if the paper is
accepted for publication.
Experimental Results
Q1. Can the Join Stage Improve Performance?
Figure 3 shows that the join stage can drastically improve
predictive accuracies. A McNemar's test confirms (p < 0.01)
that the join stage improves accuracies on 24/42 tasks with a
60s timeout and on 22/42 tasks with a 600s timeout. There is
no significant difference for the other tasks.
The join stage can learn big rules which otherwise cannot
be learned. For instance, for the task pharma1, the join stage
finds a rule of size 17 which has 100% accuracy. By contrast,
without the join stage, no solution is found, resulting in default
accuracy (50%). Similarly, an optimal solution for the task
iggp-rainbow has a single rule of size 19. This rule is splittable
and is the conjunction of 6 small rules. The join stage identifies
this rule in less than 1s as it entails all the positive examples.
By contrast, without the join stage, the system exceeds the
timeout without finding a solution as it needs to search through
the set of all rules up to size 19 to find a solution.
The overhead of the join stage is small. For instance, for
the task scale in the 1D-ARC domain, the join stage takes less
than 1% of the total learning time, yet this stage allows us to
find a perfectly accurate solution with a rule of size 13.
Overall, the results suggest that the answer to Q1 is that the
join stage can substantially improve predictive accuracy.
Figure 3: Predictive accuracy (%) with and without the join stage
with 60s (left) and 600s (right) timeouts.
Q2. Can Eliminating Splittable Programs Improve
Performance?
Figure 4 shows that eliminating splittable programs in the gen-
erate stage can improve learning performance. A McNemar's
test confirms (p < 0.01) that eliminating splittable programs
improves performance on 8/42 tasks with a 60s timeout and
on 11/42 tasks with a 600s timeout. It degrades performance
(p < 0.01) on 1/42 tasks with a 60s timeout. There is no
significant difference for the other tasks.
Eliminating splittable programs from the generate stage
can greatly reduce the number of programs JOINER considers.
For instance, for iggp-rainbow, the hypothesis space contains
1,986,422 rules of size at most 6, but only 212,564 are non-
splittable. Likewise, for string2, when eliminating splittable
programs, JOINER finds a perfectly accurate solution in 133s
(2min13s). By contrast, with splittable programs, JOINER
considers more programs and fails to find a solution within
the 600s timeout, resulting in default accuracy (50%).
Overall, these results suggest that the answer to Q2 is that
eliminating splittable programs from the generate stage can
improve predictive accuracies.
Figure 4: Predictive accuracies (%) with and without generating
splittable programs with 60s (left) and 600s (right) timeouts.
Q3. How Well Does JOINER Scale?
Figure 5 shows that JOINER can learn an almost perfectly
accurate hypothesis with up to 100 literals for both the zendo
and string tasks. By contrast, COMBO and ALEPH struggle to
learn hypotheses with more than 10 literals.

JOINER learns a zendo hypothesis of size k after searching
for programs of size 4. By contrast, COMBO must search for
programs up to size k to find a solution. Similarly, an optimal
solution for the string task is the conjunction of programs
with 6 literals each. By contrast, COMBO must search for
programs up to size k to find a solution. ALEPH struggles to
learn recursive programs and thus struggles on the string task.
Overall, the results suggest that the answer to Q3 is that
JOINER can scale well with the size of rules.
Figure 5: Predictive accuracies (%) when varying the optimal solution
size for zendo (left) and string (right) with a 600s timeout.
Q4. How Does JOINER Compare to Other Approaches?
Table 1 shows the predictive accuracies aggregated over each
domain. It shows JOINER achieves higher accuracies than
COMBO and ALEPH on almost all domains.
Task      ALEPH     COMBO     JOINER
iggp      78 ± 3    86 ± 2    96 ± 1
zendo     100 ± 0   86 ± 3    94 ± 2
pharma    50 ± 0    53 ± 2    98 ± 1
imdb      67 ± 6    100 ± 0   100 ± 0
string    50 ± 0    50 ± 0    100 ± 0
onedarc   51 ± 1    57 ± 2    89 ± 1
Table 1: Aggregated predictive accuracies (%) with a 600s timeout.
Figure 6 shows that JOINER outperforms COMBO. A McNemar's
test confirms (p < 0.01) that JOINER outperforms
COMBO on 27/42 tasks with a 60s timeout and on 26/42 tasks
with a 600s timeout. JOINER and COMBO have similar perfor-
mance on the other tasks.
JOINER can find hypotheses with big rules. For example,
the flip task in the 1D-ARC domain involves reversing the
order of colored pixels in an image. JOINER finds a solution
with two splittable rules of size 9 and 16. By contrast, COMBO
only searches programs of size at most 4 before timing out
and does not learn any program. JOINER can also perform better
when learning non-splittable programs. For instance, JOINER
learns a perfectly accurate solution for iggp-rps and proves that
this solution is optimal in 20s. By contrast, COMBO requires
440s (7min20s) to find the same solution and prove optimality.
Figure 7 shows that JOINER outperforms ALEPH. A McNemar's
test confirms (p < 0.01) that JOINER outperforms
ALEPH on 28/42 tasks with both 60s and 600s timeouts.
ALEPH outperforms (p < 0.01) JOINER on 4/42 tasks with a
60s timeout and on 2/42 tasks with a 600s timeout. JOINER
and ALEPH have similar performance on other tasks.
Figure 6: Predictive accuracies (%) of JOINER versus COMBO with
60s (left) and 600s (right) timeouts.
ALEPH struggles to learn recursive programs and there-
fore does not perform well on the string tasks. JOINER also
consistently surpasses ALEPH on tasks which do not require
recursion. For instance, JOINER achieves 98% average accu-
racy on the pharma tasks while ALEPH has default accuracy
(50%). However, for zendo3, ALEPH achieves better accuracy
(100% vs 79%) than JOINER: an optimal solution for this task is
not splittable, so JOINER exceeds the timeout.
Overall, the results suggest that the answer to Q4 is that
JOINER can outperform other approaches in terms of predic-
tive accuracy.
Figure 7: Predictive accuracies (%) of JOINER versus ALEPH with
60s (left) and 600s (right) timeouts.
6 Conclusions and Limitations
Learning programs with big rules is a major challenge. To ad-
dress this challenge, we introduced an approach which learns
big rules by joining small rules. We implemented our approach
in JOINER, which can learn optimal and recursive programs.
Our experiments on various domains show that JOINER can (i)
learn splittable rules with more than 100 literals, and (ii) out-
perform existing approaches in terms of predictive accuracies.
Limitations
Splittability. Our join stage builds splittable rules. The body
of a splittable rule is split into subsets which do not share body-
only variables. Future work should generalise our approach to
split rules into subsets which may share body-only variables.
Noise. In the join stage, we search for conjunctions which
entail some positive examples and no negative examples. Our
approach does not support noisy examples. Hocquette et al.
[2024] relax the LFF definition based on the minimal descrip-
tion length principle. Future work should combine our ap-
proach with their approach to learn big rules from noisy data.

References
[Bacchus et al., 2021] Fahiem Bacchus, Matti Järvisalo, and
Ruben Martins. Maximum satisfiability. In Armin Biere,
Marijn Heule, Hans van Maaren, and Toby Walsh, editors,
Handbook of Satisfiability - Second Edition, volume 336 of
Frontiers in Artificial Intelligence and Applications, pages
929–991. IOS Press, 2021.
[Bartha and Cheney, 2019] Sándor Bartha and James Cheney.
Towards meta-interpretive learning of programming lan-
guage semantics. In Inductive Logic Programming - 29th
International Conference, ILP 2019, Plovdiv, Bulgaria,
September 3-5, 2019, Proceedings, pages 16–25, 2019.
[Bembenek et al., 2023] Aaron Bembenek, Michael Green-
berg, and Stephen Chong. From SMT to ASP: Solver-based
approaches to solving datalog synthesis-as-rule-selection
problems. Proc. ACM Program. Lang., 7(POPL), January 2023.
[Biere et al., 2021] Armin Biere, Marijn Heule, Hans van
Maaren, and Toby Walsh, editors. Handbook of Satisfi-
ability - Second Edition, volume 336 of FAIA. IOS Press,
2021.
[Biere et al., 2023] Armin Biere, Mathias Fleury, and Flo-
rian Pollitt. CaDiCaL, vivinst, IsaSAT, Gimsatul, Kissat,
and TabularaSAT entering the SAT Competition 2023. In
Proc. SAT Competition 2023: Solver, Benchmark and Proof
Checker Descriptions, volume B-2023-1 of Department of
Computer Science Report Series B, pages 14–15. University
of Helsinki, 2023.
[Blockeel and De Raedt, 1998] Hendrik Blockeel and Luc
De Raedt. Top-down induction of first-order logical deci-
sion trees. Artificial intelligence, 101(1-2):285–297, 1998.
[Bramley et al., 2018] Neil Bramley, Anselm Rothe, Josh
Tenenbaum, Fei Xu, and Todd M. Gureckis. Grounding
compositional hypothesis generation in specific instances.
In Proceedings of the 40th Annual Meeting of the Cognitive
Science Society, CogSci 2018, 2018.
[Bunel et al., 2018] Rudy Bunel, Matthew Hausknecht, Jacob
Devlin, Rishabh Singh, and Pushmeet Kohli. Leveraging
grammar and reinforcement learning for neural program
synthesis. arXiv preprint arXiv:1805.04276, 2018.
[Chollet, 2019] François Chollet. On the measure of intelli-
gence. CoRR, 2019.
[Corapi et al., 2011] Domenico Corapi, Alessandra Russo,
and Emil Lupu. Inductive logic programming in answer
set programming. In Inductive Logic Programming - 21st
International Conference, pages 91–97, 2011.
[Costa et al., 2003] Vítor Santos Costa, Ashwin Srinivasan,
Rui Camacho, Hendrik Blockeel, Bart Demoen, Gerda
Janssens, Jan Struyf, Henk Vandecasteele, and Wim Van
Laer. Query transformations for improving the efficiency
of ILP systems. Journal of Machine Learning Research,
pages 465–491, 2003.
[Cropper and Dumancic, 2020] Andrew Cropper and Sebasti-
jan Dumancic. Learning large logic programs by going
beyond entailment. In Proceedings of the Twenty-Ninth
International Joint Conference on Artificial Intelligence,
IJCAI 2020, pages 2073–2079, 2020.
[Cropper and Hocquette, 2023] Andrew Cropper and Céline
Hocquette. Learning logic programs by combining pro-
grams. In ECAI 2023 - 26th European Conference on
Artificial Intelligence, volume 372, pages 501–508. IOS
Press, 2023.
[Cropper and Morel, 2021] Andrew Cropper and Rolf Morel.
Learning programs by learning from failures. Machine
Learning, 110(4):801–856, 2021.
[Cropper et al., 2020] Andrew Cropper, Richard Evans, and
Mark Law. Inductive general game playing. Machine
Learning, 109(7):1393–1434, 2020.
[Cropper et al., 2022] Andrew Cropper, Sebastijan Du-
mancic, Richard Evans, and Stephen H. Muggleton. In-
ductive logic programming at 30. Machine Learning,
111(1):147–172, 2022.
[Devlin et al., 2017] Jacob Devlin, Jonathan Uesato, Surya
Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, and
Pushmeet Kohli. Robustfill: Neural program learning under
noisy i/o. In International conference on machine learning,
pages 990–998. PMLR, 2017.
[Dumancic et al., 2019] Sebastijan Dumancic, Tias Guns,
Wannes Meert, and Hendrik Blockeel. Learning relational
representations with auto-encoding logic programs. In
Proceedings of the Twenty-Eighth International Joint Con-
ference on Artificial Intelligence, IJCAI 2019, pages 6081–
6087, 2019.
[Evans and Grefenstette, 2018] Richard Evans and Edward
Grefenstette. Learning explanatory rules from noisy data.
Journal of Artificial Intelligence Research, pages 1–64,
2018.
[Evans et al., 2021] Richard Evans, José Hernández-Orallo,
Johannes Welbl, Pushmeet Kohli, and Marek J. Sergot.
Making sense of sensory input. Artificial Intelligence, page
103438, 2021.
[Ferilli, 2016] Stefano Ferilli. Predicate invention-based spe-
cialization in inductive logic programming. Journal of
Intelligent Information Systems, 47(1):33–55, 2016.
[Finn et al., 1998] Paul Finn, Stephen Muggleton, David
Page, and Ashwin Srinivasan. Pharmacophore discovery
using the inductive logic programming system progol. Ma-
chine Learning, 30(2):241–270, 1998.
[Galárraga et al., 2015] Luis Galárraga, Christina Teflioudi,
Katja Hose, and Fabian M. Suchanek. Fast rule mining
in ontological knowledge bases with AMIE+. VLDB J.,
24(6):707–730, 2015.
[Genesereth and Björnsson, 2013] Michael Genesereth and
Yngvi Björnsson. The international general game play-
ing competition. AI Magazine, 34(2):107–107, 2013.
[Hocquette et al., 2024] Céline Hocquette, Andreas Niska-
nen, Matti Järvisalo, and Andrew Cropper. Learning MDL
logic programs from noisy data. In AAAI, 2024.

[Ignatiev et al., 2018] Alexey Ignatiev, Antonio Morgado,
and Joao Marques-Silva. PySAT: A Python toolkit for
prototyping with SAT oracles. In International Conference
on Theory and Applications of Satisfiability Testing, pages
428–437. Springer, 2018.
[Kaminski et al., 2019] Tobias Kaminski, Thomas Eiter, and
Katsumi Inoue. Meta-interpretive learning using hex-
programs. In Proceedings of the Twenty-Eighth Interna-
tional Joint Conference on Artificial Intelligence, IJCAI
2019, pages 6186–6190, 2019.
[Law et al., 2014] Mark Law, Alessandra Russo, and Krysia
Broda. Inductive learning of answer set programs. In
Logics in Artificial Intelligence - 14th European Conference,
JELIA 2014, pages 311–325, 2014.
[Liffiton and Sakallah, 2008] Mark H. Liffiton and Karem A.
Sakallah. Algorithms for computing minimal unsatisfiable
subsets of constraints. J. Autom. Reason., 40(1):1–33, 2008.
[Lloyd, 2012] John W Lloyd. Foundations of logic program-
ming. Springer Science & Business Media, 2012.
[Manthey et al., 2014] Norbert Manthey, Tobias Philipp, and
Peter Steinke. A more compact translation of pseudo-
Boolean constraints into CNF such that generalized arc
consistency is maintained. In Carsten Lutz and Michael
Thielscher, editors, KI 2014, volume 8736 of LNCS, pages
123–134. Springer, 2014.
[Mihalkova et al., 2007] Lilyana Mihalkova, Tuyen N.
Huynh, and Raymond J. Mooney. Mapping and revis-
ing markov logic networks for transfer learning. In
Proceedings of the Twenty-Second AAAI Conference on
Artificial Intelligence, July 22-26, 2007, Vancouver, British
Columbia, Canada, pages 608–614. AAAI Press, 2007.
[Muggleton, 1991] Stephen Muggleton. Inductive logic pro-
gramming. New Generation Computing, 8(4):295–318,
1991.
[Muggleton, 1995] Stephen H. Muggleton. Inverse entailment
and Progol. New Generation Computing, 13(3&4):245–286,
1995.
[Piotrów, 2020] Marek Piotrów. UWrMaxSat: Efficient
solver for MaxSAT and pseudo-Boolean problems. In 2020
IEEE 32nd International Conference on Tools with Artifi-
cial Intelligence (ICTAI), pages 132–136, 2020.
[Quinlan, 1990] J. Ross Quinlan. Learning logical definitions
from relations. Machine Learning, pages 239–266, 1990.
[Raghothaman et al., 2020] Mukund Raghothaman, Jonathan
Mendelson, David Zhao, Mayur Naik, and Bernhard Scholz.
Provenance-guided synthesis of datalog programs. Proc.
ACM Program. Lang., 4(POPL):62:1–62:27, 2020.
[Shapiro, 1983] E.Y. Shapiro. Algorithmic program debug-
ging. MIT Press, 1983.
[Shi et al., 2022] Kensen Shi, Hanjun Dai, Kevin Ellis, and
Charles Sutton. Crossbeam: Learning to search in bottom-
up program synthesis. In International Conference on
Learning Representations, 2022.
[Si et al., 2019] Xujie Si, Mukund Raghothaman, Kihong
Heo, and Mayur Naik. Synthesizing datalog programs using
numerical relaxation. In Proceedings of the Twenty-Eighth
International Joint Conference on Artificial Intelligence,
IJCAI 2019, pages 6117–6124, 2019.
[Srinivasan, 2001] Ashwin Srinivasan. The ALEPH manual.
Machine Learning at the Computing Laboratory, Oxford
University, 2001.
[Tärnlund, 1977] Sten-Åke Tärnlund. Horn clause
computability. BIT, (2):215–226, 1977.
[Xu et al., 2023] Yudong Xu, Wenhao Li, Pashootan
Vaezipoor, Scott Sanner, and Elias B Khalil. LLMs and
the abstraction and reasoning corpus: Successes, failures,
and the importance of object-based representations. arXiv
preprint arXiv:2305.18354, 2023.
A Terminology
We assume familiarity with logic programming [Lloyd, 2012]
but restate some key relevant notation. A variable is a string
of characters starting with an uppercase letter. A predicate
symbol is a string of characters starting with a lowercase letter.
The arity n of a function or predicate symbol is the number
of arguments it takes. An atom is a tuple p(t1, ..., tn), where
p is a predicate of arity n and t1, ..., tn are terms, either
variables or constants. An atom is ground if it contains no
variables. A literal is an atom or the negation of an atom.
A clause is a set of literals. A clausal theory is a set of
clauses. A constraint is a clause without a positive literal. A
definite clause is a clause with exactly one positive literal. A
hypothesis is a set of definite clauses with the least Herbrand
semantics. We use the term program interchangeably with
hypothesis, i.e. a hypothesis is a program. A substitution
θ = {v1/t1, ..., vn/tn} is the simultaneous replacement of
each variable vi by its corresponding term ti. A clause c1
subsumes a clause c2 if and only if there exists a substitution
θ such that c1θ ⊆ c2. A program h1 subsumes a program h2,
denoted h1 ⪯ h2, if and only if ∀c2 ∈ h2, ∃c1 ∈ h1 such that
c1 subsumes c2. A program h1 is a specialisation of a program
h2 if and only if h2 ⪯ h1. A program h1 is a generalisation
of a program h2 if and only if h1 ⪯ h2.
B Generate stage encoding
Our encoding of the generate stage is the same as COMBO
except we disallow splittable rules in hypotheses which are
not recursive and do not contain predicate invention. To do so,
we add the following ASP program to the generate encoding:
:- not pi_or_rec, clause(C), splittable(C).
splittable(C) :-
body_not_head(C,Var1),
body_not_head(C,Var2),
Var1 < Var2,
not path(C,Var1,Var2).
path(C,Var1,Var2) :- path1(C,Var1,Var2).
path(C,Var1,Var2) :-
path1(C,Var1,Var3),
path(C,Var3,Var2).

path1(C,Var1,Var2) :-
body_literal(C,_,_,Vars),
var_member(Var1,Vars),
var_member(Var2,Vars),
Var1 < Var2,
body_not_head(C,Var1),
body_not_head(C,Var2).
%% we also disallow rules with multiple body literals
%% when a body literal only has head variables
:-
not pi_or_rec,
body_literal(C,P1,_,Vars1),
body_literal(C,P2,_,Vars2),
(P1,Vars1) != (P2,Vars2),
head_only_vars(C,Vars1).
head_only_vars(C,Vars) :-
body_literal(C,_,_,Vars),
clause(C),
not has_body_var(C,Vars).
has_body_var(C,Vars) :-
var_member(Var,Vars),
body_not_head(C,Var).
body_not_head(C,V) :-
body_var(C,V),
not head_var(C,V).
C JOINER Correctness
We show the correctness of JOINER. In the rest of this
section, we consider an LFF input tuple (E, B, H, C) with
E = (E+, E−). We assume that a solution always exists:
Assumption 1 (Existence of a solution). We assume there
exists a solution h ∈ H_C.
For terseness, in the rest of this section we always assume
Assumption 1 and, therefore, avoid repeatedly saying “if a
solution exists”.
We follow LFF [Cropper and Morel, 2021] and assume the
background knowledge does not depend on any hypothesis:
Assumption 2 (Background knowledge independence). We
assume that no predicate symbol in the body of a rule in the
background knowledge appears in the head of a rule in a
hypothesis.
For instance, given the following background knowledge we
disallow learning hypotheses for the target relations famous or
friend:
happy(A) ← friend(A,B), famous(B)
As with COMBO [Cropper and Hocquette, 2023], JOINER is
correct when the hypothesis space only contains decidable pro-
grams (such as Datalog programs), i.e. when every program is
guaranteed to terminate:
Assumption 3 (Decidable programs). We assume the hy-
pothesis space only contains decidable programs.
When the hypothesis space contains arbitrary definite pro-
grams, the results do not hold because, due to their Turing
completeness, checking entailment of an arbitrary definite
program is semi-decidable [Tärnlund, 1977], i.e. testing a
program might never terminate.
JOINER follows a generate, test, join, combine, and con-
strain loop. We show the correctness of each of these steps in
turn. We finally use these results to prove the correctness of
JOINER, i.e. that JOINER returns an optimal solution.
C.1 Preliminaries
We introduce definitions used throughout this section. We
define a splittable rule:
Definition 7 (Splittable rule). A rule is splittable if its body
literals can be partitioned into two non-empty sets such that
the body-only variables (a variable in the body of a rule but
not the head) in the literals of one set are disjoint from the
body-only variables in the literals of the other set. A rule is
non-splittable if it is not splittable.
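The connectivity test behind Definition 7 can be sketched in Python. This is an illustrative sketch only, not JOINER's implementation (JOINER performs this check in ASP, as shown in Section B); the representation of literals as (predicate, args) tuples, the convention that uppercase-initial strings are variables, and the helper name is_splittable are our own assumptions.

```python
# Sketch of the splittability test (Definition 7): a rule is splittable
# iff its body literals fall into at least two groups that share no
# body-only variables (variables absent from the head).

def is_splittable(head_vars, body):
    """body: list of (predicate, args) tuples; head_vars: set of head variables."""
    n = len(body)
    if n < 2:
        return False
    # Body-only variables of each literal (uppercase-initial = variable).
    bo = [{v for v in args if v[0].isupper() and v not in head_vars}
          for _, args in body]
    # Union-find over literals: merge literals sharing a body-only variable.
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if bo[i] & bo[j]:
                parent[find(i)] = find(j)
    # Splittable iff the literals form at least two connected components.
    return len({find(i) for i in range(n)}) >= 2

# zendo(S) <- piece(S,B), blue(B): both literals share B, so non-splittable.
print(is_splittable({"S"}, [("piece", ("S", "B")), ("blue", ("B",))]))  # False
# f(A) <- head(A,1), tail(A,B), head(B,3) splits as in Example 4.
print(is_splittable({"A"},
      [("head", ("A", "1")), ("tail", ("A", "B")), ("head", ("B", "3"))]))  # True
```

A literal whose arguments are all head variables or constants has no body-only variables, so it forms its own component; this matches the extra constraint in the encoding of Section B on literals with head-only variables.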
We define a splittable program:
Definition 8 (Splittable program). A program is splittable if
and only if it has exactly one rule and this rule is splittable. A
program is non-splittable when it is not splittable.
The least Herbrand model M(P) of a logic program P is
the set of all ground atomic logical consequences of P. The
least Herbrand model MB(P) = M(P ∪ B) of P given the
logic program B is the set of all ground atomic logical
consequences of P ∪ B. In the following, we assume a program
B denoting background knowledge and concisely write MB(P)
as M(P).
We define a conjunction:
Definition 9 (Conjunction). A conjunction is a set of pro-
grams with the same head literal. The logical consequences of
a conjunction c is the intersection of the logical consequences
of the programs in it: M(c) = ∩p∈cM(p). The cost of a
conjunction c is the sum of the cost of the programs in it:
cost(c) = ∑p∈c cost(p).
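Definition 9 can be read operationally: given the member programs' models and costs, the conjunction's model and cost follow directly. A minimal Python sketch, where models are sets of ground atoms (the function names are our own assumptions):

```python
from functools import reduce

# Sketch of Definition 9: the model of a conjunction is the intersection
# of its members' models; its cost is the sum of their costs.

def conjunction_model(models):
    # M(c) = intersection of M(p) for each program p in c
    return reduce(lambda a, b: a & b, models)

def conjunction_cost(costs):
    # cost(c) = sum of cost(p) for each program p in c
    return sum(costs)

m1 = {"zendo(s1)", "zendo(s2)", "zendo(s3)"}
m2 = {"zendo(s2)", "zendo(s3)", "zendo(s4)"}
print(sorted(conjunction_model([m1, m2])))  # ['zendo(s2)', 'zendo(s3)']
print(conjunction_cost([3, 3]))             # 6
```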
Conjunctions only preserve the semantics with respect to their
head predicate symbols. We, therefore, reason about the least
Herbrand model restricted to a predicate symbol:
Definition 10 (Restricted least Herbrand model). Let P
be a logic program and f be a predicate symbol. Then the
least Herbrand model of P restricted to f is M(P, f) = {a ∈
M(P) | the predicate symbol of a is f}.
We define an operator σ : 2^H → H which maps a conjunc-
tion to a program such that (i) for any conjunction c with
head predicate symbol f, M(c, f) = M(σ(c), f), and (ii) for
any conjunctions c1 and c2 such that size(c1) < size(c2),
size(σ(c1)) < size(σ(c2)). In the following, when we say
that a conjunction c is in a program h, or c ⊆ h, we mean
σ(c) ⊆ h.
For instance, let σ be the operator which, given a con-
junction c = {p1, ..., pn} with head predicate symbol f of
arity a, returns the program obtained by (i) for each program
pi ∈ c, replacing each occurrence of f in pi by a new predi-
cate symbol fi, and (ii) adding the rule f(x1,...,xa) ←
f1(x1,...,xa),...,fn(x1,...,xa) where x1,...,xa are
variables.
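A hypothetical Python sketch of this instance of σ, with programs represented as lists of (head, body) pairs of (predicate, args) literals. Occurrences of f are renamed in heads and bodies alike so that recursion is preserved; the representation and the function name sigma are our own assumptions.

```python
# Sketch of the operator sigma: rename f to f1,...,fn inside each member
# program of the conjunction, then add the joining rule
# f(X1,...,Xa) <- f1(X1,...,Xa),...,fn(X1,...,Xa).

def sigma(conjunction, f, arity):
    args = tuple(f"X{i}" for i in range(1, arity + 1))
    out = []
    for i, program in enumerate(conjunction, start=1):
        fi = f"{f}{i}"
        # Rename every occurrence of f (head or body) to fi.
        rename = lambda lit, fi=fi: (fi, lit[1]) if lit[0] == f else lit
        for head, body in program:
            out.append((rename(head), [rename(l) for l in body]))
    joining = ((f, args),
               [(f"{f}{j}", args) for j in range(1, len(conjunction) + 1)])
    return [joining] + out

# The first conjunction of Example 1:
c1 = [
    [(("zendo", ("S",)), [("piece", ("S", "B")), ("blue", ("B",))])],
    [(("zendo", ("S",)), [("piece", ("S", "R")), ("red", ("R",))])],
]
for head, body in sigma(c1, "zendo", 1):
    print(head, ":-", body)
```

Applied to c1, the sketch produces the joining rule zendo(X1) ← zendo1(X1), zendo2(X1) together with the two renamed programs, mirroring Example 1.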

Example 1. Consider the following conjunction:

c1 = { { zendo(S) ← piece(S,B), blue(B) },
       { zendo(S) ← piece(S,R), red(R) } }

Then σ(c1) is:

{ zendo(S) ← zendo1(S), zendo2(S)
  zendo1(S) ← piece(S,B), blue(B)
  zendo2(S) ← piece(S,R), red(R) }

Consider the following conjunction:

c2 = { { f(S) ← head(S,1)
         f(S) ← tail(S,T), f(T) },
       { f(S) ← head(S,2)
         f(S) ← tail(S,T), f(T) } }

Then σ(c2) is:

{ f(S) ← f1(S), f2(S)
  f1(S) ← head(S,1)
  f1(S) ← tail(S,T), f1(T)
  f2(S) ← head(S,2)
  f2(S) ← tail(S,T), f2(T) }

The generate stage of COMBO generates non-separable pro-
grams. We recall the definition of a separable program:
Definition 11 (Separable program). A program h is sepa-
rable when (i) it has at least two rules, and (ii) no predicate
symbol in the head of a rule in h also appears in the body of a
rule in h. A program is non-separable when it is not separable.
C.2 Generate and Test
We show that JOINER can generate and test every non-
splittable non-separable program:
Proposition 4 (Correctness of the generate and test stages).
JOINER can generate and test every non-splittable non-
separable program.
Proof. Cropper and Hocquette [2023] show (Lemma 1) that
COMBO can generate and test every non-separable program
under Assumption 3. The constraint we add to the gener-
ate encoding (Section B) prunes splittable programs. There-
fore JOINER can generate and test every non-splittable non-
separable program.
C.3 Join
Proposition 4 shows that JOINER can generate and test every
non-splittable non-separable program. However, the generate
stage cannot generate splittable non-separable programs. To
learn a splittable non-separable program, JOINER uses the join
stage to join non-splittable programs into splittable programs.
To show the correctness of this join stage, we first show that the
logical consequences of a rule which body is the conjunction
of the body of two rules r1 and r2 is equal to the intersection
of the logical consequences of r1 and r2:
Lemma 1. Let r1 = (g ← L1), r2 = (g ← L2), and r =
(g ← L1,L2) be three rules, such that the body-only variables
of r1 and r2 are distinct. Then M(r) = M(r1) ∩ M(r2).
Proof. We first show that if a ∈ M(r) then a ∈
M(r1) ∩ M(r2). The rule r specialises r1, which implies that
M(r) ⊆ M(r1) and therefore a ∈ M(r1). Likewise, r spe-
cialises r2, which implies that M(r) ⊆ M(r2) and therefore
a ∈ M(r2). Therefore a ∈ M(r1) ∩ M(r2).
We now show that if a ∈ M(r1) ∩ M(r2) then a ∈ M(r).
Since a ∈ M(r1), then (g ∨ ¬L1) |= a. Similarly, since a ∈
M(r2), then (g ∨ ¬L2) |= a. Then (g ∨ ¬L1) ∧ (g ∨ ¬L2) |= a
by monotonicity, and a ∈ M(r).
With this result, we show that any rule is equivalent to a
conjunction of non-splittable programs:
Lemma 2 (Conjunction of non-splittable programs). Let r
be a rule. Then there exists a conjunction c of non-splittable
programs such that M(r) = M(c).
Proof. We prove the result by induction on the number of
body literals m in the body of r.
For the base case, if m = 1 then r is non-splittable. Let c
be the conjunction {{r}}. Then M(r) = M(c).
For the inductive case, assume the claim holds for rules with
m body literals. Let r be a rule with m+1 body literals. Either
r is (i) non-splittable, or (ii) splittable. For case (i), assume
r is non-splittable. Let c be the conjunction {{r}}. Then
M(r) = M(c). For case (ii), assume r is splittable. Then,
by Definition 7, its body literals can be partitioned into two
non-empty sets L1 and L2 such that the body-only variables
in L1 are disjoint from the body-only variables in L2. Both L1
and L2 have fewer than m literals. Let a be the head literal of
r. Let r1 = (a ← L1) and r2 = (a ← L2). By the induction
hypothesis, there exists a conjunction c1 of non-splittable
programs such that M(r1) = M(c1) and a conjunction c2
of non-splittable programs such that M(r2) = M(c2). Let
c = c1 ∪ c2. Then c is a conjunction because the programs
in c1 and c2 have the same head predicate symbol. From
Lemma 1, M(r) = M(r1) ∩ M(r2) and therefore M(r) =
M(c1) ∩ M(c2) = M(c) which completes the proof.
Example 4 (Conjunction of non-splittable programs). Con-
sider the rule r:
r = { f(A) ← head(A,1),tail(A,B),head(B,3) }
Consider the non-splittable programs:
p1 = { f(A) ← head(A,1) }
p2 = { f(A) ← tail(A,B),head(B,3) }
Then M(r) = M(c) where c is the conjunction c = p1 ∪ p2.
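The induction in Lemma 2 is constructive: group the body literals into connected components under shared body-only variables, and each component yields one non-splittable rule of the conjunction. A hypothetical Python sketch, with literals as (predicate, args) tuples and uppercase-initial strings as variables; the function name decompose is our own assumption.

```python
# Sketch of the decomposition behind Lemma 2: partition body literals into
# connected components linked by shared body-only variables; each component
# is the body of one non-splittable rule in the conjunction.

def decompose(head_vars, body):
    def body_only(lit):
        return {v for v in lit[1] if v[0].isupper() and v not in head_vars}
    groups = []  # list of (component's body-only vars, component's literals)
    for lit in body:
        vars_, merged = body_only(lit), [lit]
        rest = []
        for g_vars, g_lits in groups:
            if g_vars & vars_:          # shares a body-only variable: merge
                vars_ |= g_vars
                merged += g_lits
            else:
                rest.append((g_vars, g_lits))
        groups = rest + [(vars_, merged)]
    return [lits for _, lits in groups]

# Example 4: f(A) <- head(A,1), tail(A,B), head(B,3)
body = [("head", ("A", "1")), ("tail", ("A", "B")), ("head", ("B", "3"))]
print(decompose({"A"}, body))
```

On the rule of Example 4 the sketch yields two components, {head(A,1)} and {tail(A,B), head(B,3)}, i.e. the bodies of p1 and p2.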
We show that any program is equivalent to a conjunction of
non-splittable programs:
Lemma 3 (Conjunction of non-splittable programs). Let
h be a program. Then there exists a conjunction c of non-
splittable programs such that M(h) = M(c).
Proof. h has either (i) at least two rules, or (ii) one rule. For
case (i), if h has at least two rules, then h is non-splittable by
Definition 8. Let c be the conjunction {h}. Then M(h) = M(c). For
case (ii), Lemma 2 shows there exists a conjunction c of non-
splittable programs such that M(h) = M(c).

The join stage takes as input joinable programs. We define a
joinable program:
Definition 12 (Joinable program). A program is joinable
when it (i) is non-splittable, (ii) is non-separable, (iii) entails
at least one positive example, and (iv) entails at least one
negative example.
The combine stage takes as input combinable programs. We
define a combinable program:
Definition 13 (Combinable program). A program is com-
binable when it (i) is non-separable, (ii) entails at least one
positive example, and (iii) entails no negative example.
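Definitions 12 and 13 amount to a simple classification over a candidate program's structural properties and coverage counts. A hypothetical sketch (the flag and function names are our own assumptions):

```python
# Sketch of Definitions 12 and 13: classify a candidate program from
# whether it is splittable/separable and from its coverage counts
# (tp = positive examples entailed, fp = negative examples entailed).

def classify(splittable, separable, tp, fp):
    if not separable and tp > 0 and fp == 0:
        return "combinable"   # Definition 13: goes to the combine stage
    if not splittable and not separable and tp > 0 and fp > 0:
        return "joinable"     # Definition 12: goes to the join stage
    return "neither"

print(classify(False, False, 3, 2))  # joinable
print(classify(True, False, 3, 0))   # combinable
```

Note that a combinable program may be splittable (the join stage produces such programs), whereas a joinable program must be non-splittable.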
We show that any combinable program is equivalent to a con-
junction of non-splittable non-separable programs which each
entail at least one positive example:
Lemma 4 (Conjunction of programs). Let h be a combinable
program. Then there exists a conjunction c of programs such
that M(c) = M(h) and each program in c (i) is non-splittable,
(ii) is non-separable, and (iii) entails at least one positive
example.
Proof. From Lemma 3, there exists a conjunction c of non-
splittable programs such that M(c) = M(h). For contradic-
tion, assume some program hi ∈ c is either (i) separable, or (ii)
does not entail any positive example. For case (i), assume that
hi is separable. Then hi has at least two rules and therefore
cannot be non-splittable, which contradicts our assumption.
For case (ii), assume hi does not entail any positive example.
Then c does not entail any positive example and h does not en-
tail any positive example. Then h is not a combinable program,
which contradicts our assumption. Therefore, each hi ∈ c (i)
is non-splittable, (ii) is non-separable, and (iii) entails at least
one positive example.
The join stage returns non-subsumed conjunctions. A conjunc-
tion is subsumed if it entails a subset of the positive examples
entailed by a strictly smaller conjunction:
Definition 14 (Subsumed conjunction). Let c1 and c2 be
two conjunctions which do not entail any negative examples,
with c1 |= E1+, c2 |= E2+, E2+ ⊆ E1+, and size(c1) < size(c2).
Then c2 is subsumed by c1.
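Definition 14 gives a pruning rule: a conjunction may be discarded when a strictly smaller conjunction covers at least the same positive examples. A hypothetical Python sketch over (coverage, size) summaries of conjunctions that entail no negative examples (the function name non_subsumed is our own assumption):

```python
# Sketch of subsumption pruning (Definition 14): c2 is subsumed by c1 when
# c2's positive coverage is a subset of c1's and c1 is strictly smaller.

def non_subsumed(conjs):
    """conjs: list of (coverage, size). Keep conjunctions not subsumed by another."""
    return [
        (cov, size) for cov, size in conjs
        if not any(cov <= cov2 and size2 < size
                   for cov2, size2 in conjs if (cov2, size2) != (cov, size))
    ]

pool = [(frozenset({"e1", "e2"}), 8),        # covers fewer positives, bigger
        (frozenset({"e1", "e2", "e3"}), 5)]  # covers more positives, smaller
print(non_subsumed(pool))  # only the second conjunction survives
```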
Lemma 5 (Conjunction of joinable programs). Let h be
a non-subsumed splittable combinable program. Then there
exists a conjunction c of joinable programs such that M(c) =
M(h).
Proof. Since h is splittable, h has a single rule whose body
literals can be partitioned into two non-empty sets L1 and L2
such that the body-only variables in L1 are disjoint from the
body-only variables in L2. Let g be the head literal of the
rule in h. Let r1 = (g ← L1) and r2 = (g ← L2). From
Lemma 4, there exist two conjunctions c1 and c2 such that
M(c1) = M(r1) and M(c2) = M(r2) and each program
in c1 ∪ c2 (i) is non-splittable, (ii) is non-separable, and (iii)
entails at least one positive example. Let c = c1 ∪ c2. Then
c is a conjunction because the programs in c1 and c2 have
the same head predicate symbol. From Lemma 1, M(h) =
M(r1) ∩ M(r2) and therefore M(h) = M(c1) ∩ M(c2) =
M(c). Therefore there exists a conjunction c of at least two
programs such that M(c) = M(h) and each program in c (i)
is non-splittable, (ii) is non-separable, and (iii) entails at least
one positive example. Assume some hi ∈ c does not entail any
negative example. Then the conjunction c′ = {hi} does not
entail any negative example. Since c has at least two elements,
then c′ ⊂ c, and c is subsumed by c′, which contradicts our
assumption. Therefore, each hi ∈ c entails some negative
example and is joinable.
We show that the join stage returns all non-subsumed splittable
combinable programs:
Proposition 5 (Correctness of the join stage). Given all
joinable programs, the join stage returns all non-subsumed
splittable combinable programs.
Proof. Assume the opposite, i.e. that there is a non-subsumed
splittable combinable program h not returned by the join stage.
As h is non-subsumed splittable combinable, by Lemma 5,
there exists a conjunction c of joinable programs such that
M(c) = M(h). As h is non-subsumed, c is not subsumed
by another conjunction. The join stage uses a SAT solver to
enumerate maximal satisfiable subsets [Liffiton and Sakallah,
2008] corresponding to the subset-maximal coverage of con-
junctions. As we have all joinable programs, the solver will
find every conjunction of joinable programs not subsumed by
another conjunction. Therefore, the solver finds c and the join
stage returns h, which contradicts our assumption.
C.4 Combine
We first show that a subsumed conjunction cannot be in an
optimal solution:
Proposition 6 (Subsumed conjunction). Let c1 and c2 be two
conjunctions such that c2 is subsumed by c1, h be a solution,
and σ(c2) ⊆ h. Then h cannot be optimal.
Proof. Assume the opposite, i.e. h is an optimal solution. We
show that h′ = (h \ σ(c2)) ∪ σ(c1) is a solution. Since h is a
solution, it does not entail any negative examples. Therefore,
h \ σ(c2) does not entail any negative examples. Since c1 does
not entail any negative examples, then h′ does not entail any
negative examples. Since h is a solution, it entails all positive
examples. Since E2+ ⊆ E1+, h′ also entails all positive exam-
ples. Therefore h′ is a solution. Since size(c1) < size(c2),
then size(h′) < size(h). Then h cannot be optimal, which
contradicts our assumption.
Corollary 1. Let c1 and c2 be two conjunctions which do not
entail any negative examples, with c1 |= E1+, c2 |= E2+,
E2+ ⊆ E1+, and size(c1) < size(c2). Then c2 cannot be in an
optimal solution.
Proof. Follows from Proposition 6.
With this result, we show the correctness of the combine stage:
Proposition 7 (Correctness of the combine stage). Given
all non-subsumed combinable programs, the combine stage
returns an optimal solution.

Proof. Cropper and Hocquette [2023] show that given all com-
binable programs, the combine stage returns an optimal solu-
tion under Assumption 2. Corollary 1 shows that subsumed
conjunctions cannot be in an optimal solution. Therefore, the
combine stage of JOINER returns an optimal solution given all
non-subsumed combinable programs.
C.5 Constrain
In this section, we show that the constrain stage never prunes
optimal solutions from the hypothesis space. JOINER builds
specialisations constraints from non-splittable non-separable
programs that do not entail (a) any positive example, or (b)
any negative example.
We show a result for case (a). If a hypothesis does not entail
any positive example, JOINER prunes all its specialisations, as
they cannot be in a conjunction in an optimal solution:
Proposition 8 (Specialisation constraint when tp=0). Let
h1 be a hypothesis that does not entail any positive example,
h2 be a specialisation of h1, c be a conjunction, h2 ∈ c, h be
a solution, and σ(c) ⊆ h. Then h is not an optimal solution.
Proof. Assume the opposite, i.e. h is an optimal solution. We
show that h′ = h \ σ(c) is a solution. Since h is a solution, it
does not entail any negative example. Then h′ does not entail
any negative example. Since h1 does not entail any positive
example and h2 specialises h1, then h2 does not entail any
positive example. Then c does not entail any positive example.
Since h entails all positive examples and c does not entail any
positive example, then h′ entails all positive examples. Since
h′ entails all positive examples and no negative examples, it is
a solution. Moreover, size(h′) < size(h). Then h cannot be
optimal, which contradicts our assumption.
Corollary 2. Let h1 be a hypothesis that does not entail any
positive example and h2 be a specialisation of h1. Then h2
cannot be in a conjunction in an optimal solution.
Proof. Follows from Proposition 8.
We show a result for case (b). If a hypothesis does not entail
any negative example, JOINER prunes all its specialisations,
as they cannot be in a conjunction in an optimal solution:
Proposition 9 (Specialisation constraint when fp=0). Let
h1 be a hypothesis that does not entail any negative example,
h2 be a specialisation of h1, c be a conjunction, h2 ∈ c, h be
a solution, and σ(c) ⊆ h. Then h is not an optimal solution.
Proof. Assume the opposite, i.e. h is an optimal solution. Let
c′ = (c \ {h2}) ∪ {h1} and h′ = (h \ σ(c)) ∪ σ(c′). We show
that h′ is a solution. Since h1 does not entail any negative
example, then c′ does not entail any negative example. Since
h is a solution, it does not entail any negative example. Then
h′ does not entail any negative example. Since h2 specialises h1,
then h1 entails at least as many positive examples as h2. Then c′ entails
at least as many positive examples as c, and h′ entails at least as many positive
examples as h. Since h entails all positive examples, then
h′ entails all positive examples. Since h′ entails all positive
examples and no negative examples, it is a solution. Since
h2 specialises h1, then size(h1) < size(h2) and size(c′) <
size(c). Therefore, size(h′) < size(h). Then h cannot be
optimal, which contradicts our assumption.
Corollary 3. Let h1 be a hypothesis that does not entail any
negative example and h2 be a specialisation of h1. Then h2
cannot be in a conjunction in an optimal solution.
Proof. Follows from Proposition 9.
We now show that the constrain stage never prunes an optimal
solution:
Proposition 10 (Optimal soundness of the constraints). The
constrain stage of JOINER never prunes an optimal solution.
Proof. JOINER builds constraints from non-splittable non-
separable programs that do not entail (a) any positive example,
or (b) any negative example. For case (a), if a program does
not entail any positive example, JOINER prunes its specialisa-
tions which by Corollary 2 cannot be in a conjunction in an
optimal solution. For case (b), if a program does not entail any
negative example, JOINER prunes its specialisations which by
Corollary 3 cannot be in a conjunction in an optimal solution.
Therefore, JOINER never prunes an optimal solution.
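The two pruning rules can be sketched in a few lines of Python. Here programs are frozensets of literal names, examples are the sets of literals they satisfy, and specialisation is approximated by literal-set inclusion (a simplification of subsumption); the helper names are our own:

```python
def entails_some(program, examples):
    # Toy entailment check: a program (a frozenset of literals) entails an
    # example (a set of literals it satisfies) if every literal holds.
    return any(example >= program for example in examples)

def prune_specialisations(candidates, program, pos, neg):
    """If 'program' entails no positive example (tp = 0) or no negative
    example (fp = 0), discard all of its specialisations: by Corollaries 2
    and 3 they cannot appear in a conjunction in an optimal solution."""
    tp0 = not entails_some(program, pos)
    fp0 = not entails_some(program, neg)
    if not (tp0 or fp0):
        return candidates
    # In this toy model a specialisation contains at least the literals
    # of 'program', i.e. it is a superset.
    return [c for c in candidates if not c >= program]

pos = [{"piece", "blue"}]
neg = [{"piece", "green"}]
h1 = frozenset({"green"})                 # tp = 0: entails no positive example
cands = [frozenset({"green", "small"}), frozenset({"blue"})]
print(prune_specialisations(cands, h1, pos, neg))  # [frozenset({'blue'})]
```

The specialisation of h1 is discarded, while the unrelated candidate survives the constraint.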
C.6 JOINER Correctness
We show the correctness of JOINER without the constrain
stage:
Proposition 11 (Correctness of JOINER without the con-
strain stage). JOINER without the constrain stage returns an
optimal solution.
Proof. Proposition 4 shows that JOINER can generate and test
all non-splittable non-separable programs. These generated
programs include all joinable programs and all non-splittable
combinable programs. Proposition 5 shows that given all
joinable programs, the join stage returns all non-subsumed
splittable combinable programs. Therefore, JOINER builds
all non-subsumed splittable and non-splittable combinable
programs. By Proposition 7, given all non-subsumed combin-
able programs, the combine stage returns an optimal solution.
Therefore, JOINER without the constrain stage returns an opti-
mal solution.
We now show the correctness of JOINER:
Theorem 2 (Correctness of JOINER). JOINER returns an
optimal solution.
Proof. From Proposition 11, JOINER without the constrain
stage returns an optimal solution. From Proposition 10, the
constrain stage never prunes an optimal solution. Therefore,
JOINER with the constrain stage returns an optimal solution.

D Experiments
D.1 Experimental domains
We describe the characteristics of the domains and tasks used
in our experiments. Figure 8 shows example solutions for
some of the tasks.
IGGP. In inductive general game playing (IGGP) [Cropper
et al., 2020] the task is to induce a hypothesis to explain game
traces from the general game playing competition [Genesereth
and Björnsson, 2013]. IGGP is notoriously difficult for ma-
chine learning approaches. The currently best-performing
system can only learn perfect solutions for 40% of the tasks.
Moreover, although seemingly a toy problem, IGGP is rep-
resentative of many real-world problems, such as inducing
semantics of programming languages [Bartha and Cheney,
2019]. We use 11 tasks: minimal decay (md), buttons-next,
buttons-goal, rock-paper-scissors (rps), coins-next, coins-
goal, attrition, centipede, sukoshi, rainbow, and horseshoe.
Zendo. Zendo is an inductive game in which one player,
the Master, creates a rule for structures made of pieces with
varying attributes to follow. The other players, the Students,
try to discover the rule by building and studying structures
which are labelled by the Master as following or breaking the
rule. The first student to correctly state the rule wins. We
learn four increasingly complex rules for structures made of at
most 5 pieces of varying color, size, orientation and position.
Zendo is a challenging game that has attracted much interest
in cognitive science [Bramley et al., 2018].
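For concreteness, a rule such as "there is a blue piece" can be checked with a toy Python encoding of structures (our own illustration, not the relational representation used by the learners):

```python
# A structure is a list of pieces, each piece a dict of attribute values.
def zendo(structure):
    # Analogue of the rule "zendo(S) :- piece(S,B), blue(B)":
    # the structure follows the rule if it contains a blue piece.
    return any(piece["color"] == "blue" for piece in structure)

s1 = [{"color": "blue", "size": "small"}, {"color": "red", "size": "large"}]
s2 = [{"color": "red", "size": "small"}]
print(zendo(s1), zendo(s2))  # True False
```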
IMDB. The real-world IMDB dataset [Mihalkova et al.,
2007] includes relations between movies, actors, directors,
movie genres, and gender. It was created from the Interna-
tional Movie Database (IMDB.com). We learn the
relation workedunder/2, a more complex variant
workedwithsamegender/2, and the disjunction of the two. This dataset is
frequently used [Dumancic et al., 2019].
D.2 Experimental Setup
We measure the mean and standard error of the predictive
accuracy and learning time. We use a 3.8 GHz 8-core Intel
Core i7 with 32GB of RAM. All systems use a single CPU.
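The reported "mean ± standard error" entries follow the usual definition. A small sketch (the trial accuracies below are made up; we assume the conventional s/√n standard error of the mean):

```python
import statistics

def mean_and_stderr(accuracies):
    """Mean and standard error of the mean over per-trial accuracies,
    using the usual s / sqrt(n) definition of standard error."""
    n = len(accuracies)
    mean = statistics.fmean(accuracies)
    stderr = statistics.stdev(accuracies) / n ** 0.5 if n > 1 else 0.0
    return mean, stderr

m, se = mean_and_stderr([100, 100, 96, 100, 94])
print(f"{m:.0f} ± {se:.0f}")  # 98 ± 1
```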
D.3 Experimental Results
We compare JOINER against COMBO and ALEPH, which we
describe below.
COMBO. COMBO uses identical biases to JOINER, so the
comparison is direct, i.e. fair.

ALEPH. ALEPH excels at learning many large non-recursive
rules and should excel at the trains and IGGP tasks.
Although ALEPH can learn recursive programs, it struggles
to do so. JOINER and ALEPH use similar biases, so the
comparison can be considered reasonably fair.
We also considered other ILP systems, notably ILASP
[Law et al., 2014]. However, ILASP builds on ASPAL
and first precomputes every possible rule in the hypothesis space,
which is infeasible for our datasets. For instance, it would
require precomputing 10^15 rules for the coins task. In addition,
ILASP cannot learn Prolog programs, so it is unusable on the
synthesis tasks.
E Experimental results
E.1 Impact of the features
Tables 2 and 3 show the results with 60s and 600s timeouts,
respectively.
Task | COMBO | with join | JOINER splittable
attrition | 67 ± 0 | 75 ± 0 | 67 ± 0
buttons | 91 ± 0 | 100 ± 0 | 100 ± 0
buttons-g | 100 ± 0 | 90 ± 0 | 98 ± 2
centipede | 100 ± 0 | 100 ± 0 | 100 ± 0
coins | 62 ± 0 | 100 ± 0 | 80 ± 0
coins-g | 97 ± 0 | 97 ± 0 | 97 ± 0
horseshoe | 67 ± 0 | 69 ± 8 | 100 ± 0
md | 100 ± 0 | 100 ± 0 | 100 ± 0
rainbow | 50 ± 0 | 100 ± 0 | 100 ± 0
rps | 50 ± 0 | 82 ± 0 | 100 ± 0
sukoshi | 50 ± 0 | 50 ± 0 | 99 ± 0
imdb1 | 100 ± 0 | 100 ± 0 | 100 ± 0
imdb2 | 100 ± 0 | 100 ± 0 | 100 ± 0
imdb3 | 100 ± 0 | 100 ± 0 | 100 ± 0
pharma1 | 50 ± 0 | 56 ± 2 | 57 ± 1
pharma2 | 50 ± 0 | 99 ± 1 | 98 ± 1
pharma3 | 50 ± 0 | 97 ± 1 | 96 ± 1
zendo1 | 94 ± 3 | 99 ± 0 | 99 ± 1
zendo2 | 70 ± 1 | 92 ± 1 | 93 ± 1
zendo3 | 79 ± 1 | 79 ± 1 | 80 ± 1
zendo4 | 93 ± 1 | 99 ± 1 | 99 ± 1
string1 | 50 ± 0 | 53 ± 1 | 53 ± 1
string2 | 50 ± 0 | 53 ± 1 | 53 ± 1
string3 | 50 ± 0 | 51 ± 2 | 100 ± 0
denoising 1c | 52 ± 2 | 100 ± 0 | 100 ± 0
denoising mc | 60 ± 10 | 99 ± 1 | 99 ± 1
fill | 52 ± 2 | 100 ± 0 | 100 ± 0
flip | 50 ± 0 | 85 ± 9 | 85 ± 9
hollow | 55 ± 5 | 55 ± 5 | 55 ± 5
mirror | 57 ± 1 | 61 ± 4 | 61 ± 4
move 1p | 50 ± 0 | 100 ± 0 | 100 ± 0
move 2p | 50 ± 0 | 80 ± 8 | 88 ± 4
move 2p dp | 55 ± 2 | 87 ± 8 | 88 ± 9
move 3p | 50 ± 0 | 62 ± 9 | 64 ± 9
move dp | 55 ± 2 | 65 ± 6 | 62 ± 3
padded fill | 52 ± 2 | 66 ± 3 | 66 ± 3
pcopy 1c | 50 ± 0 | 86 ± 9 | 84 ± 9
pcopy mc | 68 ± 6 | 97 ± 2 | 97 ± 2
recolor cmp | 51 ± 1 | 55 ± 4 | 55 ± 4
recolor cnt | 50 ± 0 | 51 ± 1 | 51 ± 1
recolor oe | 50 ± 0 | 52 ± 2 | 52 ± 2
scale dp | 61 ± 8 | 96 ± 2 | 97 ± 2

Table 2: Predictive accuracies with a 60s timeout. We round accuracies to integer values. The error is standard error.
E.2 Comparison against other systems
Tables 4 and 5 show the results with 60s and 600s timeouts,
respectively.

Listing 1: zendo1
zendo1(A):- piece(A,C),size(C,B),blue(C),small(B),contact(C,D),red(D).

Listing 2: zendo2
zendo2(A):- piece(A,B),piece(A,D),piece(A,C),green(D),red(B),blue(C).
zendo2(A):- piece(A,D),piece(A,B),coord1(B,C),green(D),lhs(B),coord1(D,C).

Listing 3: zendo3
zendo3(A):- piece(A,D),blue(D),coord1(D,B),piece(A,C),coord1(C,B),red(C).
zendo3(A):- piece(A,D),contact(D,C),rhs(D),size(C,B),large(B).
zendo3(A):- piece(A,B),upright(B),contact(B,D),blue(D),size(D,C),large(C).

Listing 4: zendo4
zendo4(A):- piece(A,C),contact(C,B),strange(B),upright(C).
zendo4(A):- piece(A,D),contact(D,C),coord2(C,B),coord2(D,B).
zendo4(A):- piece(A,D),contact(D,C),size(C,B),red(D),medium(B).
zendo4(A):- piece(A,D),blue(D),lhs(D),piece(A,C),size(C,B),small(B).

Listing 5: pharma1
active(A):- atm(A,B),typec(B),bond1(B,C),typeo(C),atm(A,F),types(F),bonddu(F,G),
    typen(G),atm(A,D),typeh(D),bond2(D,E),typedu(E).

Listing 6: pharma2
active(A):- atm(A,B),types(C),bonddu(C,B),atm(A,G),typeh(F),bond2(F,G),
    atm(A,E),typec(D),bond1(D,E).
active(A):- atm(A,B),typena(C),bondv(B,C),atm(A,N),typen(O),bondu(N,O),
    atm(A,F),typedu(G),bondar(G,F).

Listing 7: pharma3
active(A):- atm(A,D),bond2(D,E),typeh(E),atm(A,C),bond1(C,B),typec(B).
active(A):- atm(A,B),typena(C),bondv(B,C),atm(A,N),typen(O),bondu(N,O).

Listing 8: string2
f_1(A):- head(A,B),cm(B).
f_1(A):- tail(A,B),f_1(B).
f_2(A):- head(A,B),cd(B).
f_2(A):- tail(A,B),f_2(B).
f_3(A):- head(A,B),co(B).
f_3(A):- tail(A,B),f_3(B).
f_4(A):- head(A,B),cr(B).
f_4(A):- tail(A,B),f_4(B).
f_5(A):- head(A,B),cu(B).
f_5(A):- tail(A,B),f_5(B).
f(A):- f_1(A),f_2(A),f_3(A),f_4(A),f_5(A).

Figure 8: Example solutions.

Listing 9: minimal decay
next_value(A,B):- c_player(D),c_pressButton(C),c5(B),does(A,D,C).
next_value(A,B):- c_player(C),my_true_value(A,E),does(A,C,D),my_succ(B,E),c_noop(D).

Listing 10: rainbow
terminal(A):- mypos_r3(M),true_color(A,M,L),hue(L),mypos_r6(H),true_color(A,H,I),
    hue(I),mypos_r5(F),true_color(A,F,G),hue(G),mypos_r2(J),
    true_color(A,J,K),hue(K),mypos_r1(D),true_color(A,D,E),
    hue(E),mypos_r4(B),true_color(A,B,C),hue(C).

Listing 11: horseshoe
terminal(A):- int_20(B),true_step(A,B).
terminal(A):- mypos_b(E),mypos_c(G),true_cell(A,E,F),true_cell(A,G,F),mypos_d(D),
    true_cell(A,D,C),true_cell(A,B,C),adjacent(B,D).
terminal(A):- mypos_c(B),mypos_a(H),true_cell(A,H,I),mypos_b(J),true_cell(A,J,I),
    mypos_e(E),true_cell(A,B,D),true_cell(A,E,F),true_cell(A,G,F),
    adjacent(E,G),true_cell(A,C,D),adjacent(C,B).

Listing 12: buttons
next(A,B):- c_p(B),c_c(C),does(A,D,C),my_true(A,B),my_input(D,C).
next(A,B):- my_input(C,E),c_p(D),my_true(A,D),c_b(E),does(A,C,E),c_q(B).
next(A,B):- my_input(C,D),not my_true(A,B),does(A,C,D),c_p(B),c_a(D).
next(A,B):- c_a(C),does(A,D,C),my_true(A,B),c_q(B),my_input(D,C).
next(A,B):- my_input(C,E),c_p(B),my_true(A,D),c_b(E),does(A,C,E),c_q(D).
next(A,B):- c_c(D),my_true(A,C),c_r(B),role(E),does(A,E,D),c_q(C).
next(A,B):- my_true(A,C),my_succ(C,B).
next(A,B):- my_input(C,D),does(A,C,D),my_true(A,B),c_r(B),c_b(D).
next(A,B):- my_input(C,D),does(A,C,D),my_true(A,B),c_r(B),c_a(D).
next(A,B):- my_true(A,E),c_c(C),does(A,D,C),c_q(B),c_r(E),my_input(D,C).

Listing 13: rps
next_score(A,B,C):- does(A,B,E),different(G,B),my_true_score(A,B,F),beats(E,D),
    my_succ(F,C),does(A,G,D).
next_score(A,B,C):- different(G,B),beats(D,F),my_true_score(A,E,C),
    does(A,G,D),does(A,E,F).
next_score(A,B,C):- my_true_score(A,B,C),does(A,B,D),does(A,E,D),different(E,B).

Listing 14: coins
next_cell(A,B,C):- does_jump(A,E,F,D),role(E),different(B,D),
    my_true_cell(A,B,C),different(F,B).
next_cell(A,B,C):- my_pos(E),role(D),c_zerocoins(C),does_jump(A,D,B,E).
next_cell(A,B,C):- role(D),does_jump(A,D,E,B),c_twocoins(C),different(B,E).
next_cell(A,B,C):- does_jump(A,F,E,D),role(F),my_succ(E,B),
    my_true_cell(A,B,C),different(E,D).

Listing 15: coins-goal
goal(A,B,C):- role(B),pos_5(D),my_true_step(A,D),score_100(C).
goal(A,B,C):- c_onecoin(E),my_pos(D),my_true_cell(A,D,E),score_0(C),role(B).
Task | COMBO | with join | JOINER splittable
attrition | 67 ± 0 | 73 ± 2 | 67 ± 0
buttons | 100 ± 0 | 100 ± 0 | 100 ± 0
buttons-g | 100 ± 0 | 100 ± 0 | 96 ± 2
centipede | 100 ± 0 | 100 ± 0 | 100 ± 0
coins | 85 ± 9 | 100 ± 0 | 100 ± 0
coins-g | 97 ± 0 | 97 ± 0 | 97 ± 0
horseshoe | 67 ± 0 | 100 ± 0 | 100 ± 0
md | 100 ± 0 | 100 ± 0 | 100 ± 0
rainbow | 52 ± 0 | 100 ± 0 | 100 ± 0
rps | 77 ± 0 | 100 ± 0 | 100 ± 0
sukoshi | 100 ± 0 | 100 ± 0 | 100 ± 0
imdb1 | 100 ± 0 | 100 ± 0 | 100 ± 0
imdb2 | 100 ± 0 | 100 ± 0 | 100 ± 0
imdb3 | 100 ± 0 | 100 ± 0 | 100 ± 0
pharma1 | 50 ± 0 | 100 ± 0 | 100 ± 0
pharma2 | 50 ± 0 | 98 ± 1 | 97 ± 1
pharma3 | 58 ± 5 | 96 ± 1 | 96 ± 1
zendo1 | 100 ± 0 | 100 ± 0 | 100 ± 0
zendo2 | 72 ± 1 | 93 ± 1 | 100 ± 0
zendo3 | 79 ± 1 | 78 ± 1 | 79 ± 1
zendo4 | 92 ± 1 | 98 ± 1 | 98 ± 1
string1 | 50 ± 0 | 56 ± 1 | 100 ± 0
string2 | 51 ± 0 | 56 ± 1 | 100 ± 0
string3 | 49 ± 0 | 100 ± 0 | 100 ± 0
denoising 1c | 52 ± 2 | 100 ± 0 | 100 ± 0
denoising mc | 60 ± 10 | 99 ± 1 | 97 ± 2
fill | 59 ± 5 | 100 ± 0 | 100 ± 0
flip | 58 ± 8 | 94 ± 4 | 96 ± 4
hollow | 55 ± 5 | 85 ± 10 | 90 ± 6
mirror | 57 ± 1 | 78 ± 10 | 70 ± 3
move 1p | 100 ± 0 | 100 ± 0 | 100 ± 0
move 2p | 50 ± 0 | 81 ± 9 | 91 ± 9
move 2p dp | 55 ± 2 | 89 ± 9 | 100 ± 0
move 3p | 50 ± 0 | 64 ± 9 | 85 ± 3
move dp | 55 ± 2 | 58 ± 2 | 83 ± 5
padded fill | 52 ± 2 | 66 ± 3 | 86 ± 5
pcopy 1c | 50 ± 0 | 84 ± 9 | 92 ± 6
pcopy mc | 68 ± 6 | 97 ± 2 | 95 ± 3
recolor cmp | 51 ± 1 | 58 ± 5 | 68 ± 5
recolor cnt | 50 ± 0 | 61 ± 7 | 80 ± 4
recolor oe | 50 ± 0 | 69 ± 8 | 75 ± 7
scale dp | 61 ± 8 | 96 ± 2 | 100 ± 0

Table 3: Predictive accuracies with a 600s timeout. We round accuracies to integer values. The error is standard error.
Task | ALEPH | COMBO | JOINER
attrition | 50 ± 0 | 67 ± 0 | 67 ± 0
buttons | 50 ± 0 | 91 ± 0 | 100 ± 0
buttons-g | 50 ± 0 | 100 ± 0 | 98 ± 2
centipede | 100 ± 0 | 100 ± 0 | 100 ± 0
coins | 50 ± 0 | 62 ± 0 | 80 ± 0
coins-g | 100 ± 0 | 97 ± 0 | 97 ± 0
horseshoe | 50 ± 0 | 67 ± 0 | 100 ± 0
md | 97 ± 0 | 100 ± 0 | 100 ± 0
rainbow | 50 ± 0 | 50 ± 0 | 100 ± 0
rps | 50 ± 0 | 50 ± 0 | 100 ± 0
sukoshi | 50 ± 0 | 50 ± 0 | 99 ± 0
imdb1 | 60 ± 10 | 100 ± 0 | 100 ± 0
imdb2 | 50 ± 0 | 100 ± 0 | 100 ± 0
imdb3 | 50 ± 0 | 100 ± 0 | 100 ± 0
pharma1 | 50 ± 0 | 50 ± 0 | 57 ± 1
pharma2 | 50 ± 0 | 50 ± 0 | 98 ± 1
pharma3 | 50 ± 0 | 50 ± 0 | 96 ± 1
zendo1 | 100 ± 0 | 94 ± 3 | 99 ± 1
zendo2 | 100 ± 0 | 70 ± 1 | 93 ± 1
zendo3 | 99 ± 0 | 79 ± 1 | 80 ± 1
zendo4 | 99 ± 1 | 93 ± 1 | 99 ± 1
string1 | 50 ± 0 | 50 ± 0 | 53 ± 1
string2 | 50 ± 0 | 50 ± 0 | 53 ± 1
string3 | 50 ± 0 | 50 ± 0 | 100 ± 0
denoising 1c | 50 ± 0 | 52 ± 2 | 100 ± 0
denoising mc | 50 ± 0 | 60 ± 10 | 99 ± 1
fill | 50 ± 0 | 52 ± 2 | 100 ± 0
flip | 50 ± 0 | 50 ± 0 | 85 ± 9
hollow | 50 ± 0 | 55 ± 5 | 55 ± 5
mirror | 50 ± 0 | 57 ± 1 | 61 ± 4
move 1p | 50 ± 0 | 50 ± 0 | 100 ± 0
move 2p | 50 ± 0 | 50 ± 0 | 88 ± 4
move 2p dp | 50 ± 0 | 55 ± 2 | 88 ± 9
move 3p | 50 ± 0 | 50 ± 0 | 64 ± 9
move dp | 50 ± 0 | 55 ± 2 | 62 ± 3
padded fill | 50 ± 0 | 52 ± 2 | 66 ± 3
pcopy 1c | 50 ± 0 | 50 ± 0 | 84 ± 9
pcopy mc | 50 ± 0 | 68 ± 6 | 97 ± 2
recolor cmp | 50 ± 0 | 51 ± 1 | 55 ± 4
recolor cnt | 50 ± 0 | 50 ± 0 | 51 ± 1
recolor oe | 50 ± 0 | 50 ± 0 | 52 ± 2
scale dp | 50 ± 0 | 61 ± 8 | 97 ± 2

Table 4: Predictive accuracies with a 60s timeout. We round accuracies to integer values. The error is standard error.

Task | ALEPH | COMBO | JOINER
attrition | 50 ± 0 | 67 ± 0 | 67 ± 0
buttons | 100 ± 0 | 100 ± 0 | 100 ± 0
buttons-g | 100 ± 0 | 100 ± 0 | 96 ± 2
centipede | 100 ± 0 | 100 ± 0 | 100 ± 0
coins | 50 ± 0 | 85 ± 9 | 100 ± 0
coins-g | 100 ± 0 | 97 ± 0 | 97 ± 0
horseshoe | 65 ± 0 | 67 ± 0 | 100 ± 0
md | 97 ± 0 | 100 ± 0 | 100 ± 0
rainbow | 50 ± 0 | 52 ± 0 | 100 ± 0
rps | 100 ± 0 | 77 ± 0 | 100 ± 0
sukoshi | 50 ± 0 | 100 ± 0 | 100 ± 0
imdb1 | 100 ± 0 | 100 ± 0 | 100 ± 0
imdb2 | 50 ± 0 | 100 ± 0 | 100 ± 0
imdb3 | 50 ± 0 | 100 ± 0 | 100 ± 0
pharma1 | 50 ± 0 | 50 ± 0 | 100 ± 0
pharma2 | 50 ± 0 | 50 ± 0 | 97 ± 1
pharma3 | 50 ± 0 | 58 ± 5 | 96 ± 1
zendo1 | 100 ± 0 | 100 ± 0 | 100 ± 0
zendo2 | 100 ± 0 | 72 ± 1 | 100 ± 0
zendo3 | 100 ± 0 | 79 ± 1 | 79 ± 1
zendo4 | 99 ± 0 | 92 ± 1 | 98 ± 1
string1 | 50 ± 0 | 50 ± 0 | 100 ± 0
string2 | 50 ± 0 | 51 ± 0 | 100 ± 0
string3 | 50 ± 0 | 49 ± 0 | 100 ± 0
denoising 1c | 50 ± 0 | 52 ± 2 | 100 ± 0
denoising mc | 50 ± 0 | 60 ± 10 | 97 ± 2
fill | 50 ± 0 | 59 ± 5 | 100 ± 0
flip | 50 ± 0 | 58 ± 8 | 96 ± 4
hollow | 55 ± 5 | 55 ± 5 | 90 ± 6
mirror | 60 ± 8 | 57 ± 1 | 70 ± 3
move 1p | 50 ± 0 | 100 ± 0 | 100 ± 0
move 2p | 50 ± 0 | 50 ± 0 | 91 ± 9
move 2p dp | 50 ± 0 | 55 ± 2 | 100 ± 0
move 3p | 50 ± 0 | 50 ± 0 | 85 ± 3
move dp | 50 ± 0 | 55 ± 2 | 83 ± 5
padded fill | 50 ± 0 | 52 ± 2 | 86 ± 5
pcopy 1c | 50 ± 0 | 50 ± 0 | 92 ± 6
pcopy mc | 50 ± 0 | 68 ± 6 | 95 ± 3
recolor cmp | 50 ± 0 | 51 ± 1 | 68 ± 5
recolor cnt | 56 ± 6 | 50 ± 0 | 80 ± 4
recolor oe | 50 ± 0 | 50 ± 0 | 75 ± 7
scale dp | 50 ± 0 | 61 ± 8 | 100 ± 0

Table 5: Predictive accuracies with a 600s timeout. We round accuracies to integer values. The error is standard error.