arXiv:2401.16215v1 [cs.LG] 29 Jan 2024
Learning Big Logical Rules by Joining Small Rules
Céline Hocquette1, Andreas Niskanen2, Rolf Morel1, Matti Järvisalo2, Andrew Cropper1
1University of Oxford
2University of Helsinki
[email protected]
Abstract
A major challenge in inductive logic programming
is learning big rules. To address this challenge, we
introduce an approach where we join small rules to
learn big rules. We implement our approach in a
constraint-driven system and use constraint solvers
to efficiently join rules. Our experiments on many
domains, including game playing and drug design,
show that our approach can (i) learn rules with more
than 100 literals, and (ii) drastically outperform ex-
isting approaches in terms of predictive accuracies.
1 Introduction
Zendo is an inductive reasoning game. One player, the teacher,
creates a secret rule that describes structures. The other play-
ers, the students, try to discover the secret rule by building
structures. The teacher marks whether structures follow or
break the rule. The first student to correctly guess the rule wins.
For instance, for the positive examples shown in Figure 1, a
possible rule is “there is a blue piece”.
Figure 1: Two positive Zendo examples.
To use machine learning to play Zendo, we need to learn
explainable rules from a small number of examples. Although
crucial for many real-world problems, many machine-learning
approaches struggle with this type of learning [Cropper et
al., 2022]. Inductive logic programming (ILP) [Muggleton,
1991] is a form of machine learning that can learn explainable
rules from a small number of examples. For instance, for the
examples in Figure 1, an ILP system could learn the rule:
h1 = { zendo(S) ← piece(S,B), blue(B) }
This rule says that the relation zendo holds for the structure S
when there is a blue piece B.
Suppose we also have the three negative examples shown
in Figure 2. Our previous rule incorrectly entails the first and
second negative examples, as they have a blue piece.
Figure 2: Three negative Zendo examples.
To entail all the positive and no negative examples, we need
a bigger rule, such as:
h2 =
{ zendo(S) ← piece(S,B), blue(B),
piece(S,R), red(R),
piece(S,G), green(G)
}
This rule says that the relation zendo holds for a structure
when there is a blue piece, a red piece, and a green piece.
Most ILP approaches can learn small rules, such as h1.
However, many struggle to learn bigger rules, such as h2.¹
This limitation noticeably holds for approaches which pre-
compute all possible rules and search for a subset of them
[Corapi et al., 2011; Law et al., 2014; Kaminski et al., 2019;
Si et al., 2019; Raghothaman et al., 2020; Evans et al., 2021;
Bembenek et al., 2023]. As they precompute all rules, these
approaches struggle to learn rules with more than a few literals.
To address this limitation, we introduce an approach that
learns big rules by joining small rules. The idea is to first
find small rules where each rule entails some positive and
some negative examples. We then search for conjunctions of
these small rules such that each conjunction entails at least
one positive example but no negative examples.
To illustrate our idea, consider our Zendo scenario. We first
search for rules that entail at least one positive example, such
as:
r1 = { zendo1(S) ← piece(S,B), blue(B) }
r2 = { zendo2(S) ← piece(S,R), red(R) }
r3 = { zendo3(S) ← piece(S,G), green(G) }
r4 = { zendo4(S) ← piece(S,Y), yellow(Y) }
Each of these rules entails at least one negative example.
Therefore, we search for a subset of these rules where the
intersection of the logical consequences of the subset entails
at least one positive example and no negative examples. The
¹A rule of size 7 is not especially big, but for readability we do
not use a bigger rule in this example. In our experiments, we show
we can learn similar rules with over 100 literals.

set of rules {r1, r2, r3} satisfies these criteria. We therefore
form a hypothesis from the conjunction of these rules:
h3 = { zendo(S) ← zendo1(S), zendo2(S), zendo3(S) }
The hypothesis h3 entails all the positive but none of the
negative examples and has the same logical consequences
(restricted to zendo/1 atoms) as h2.
The main benefit of our approach is that we can learn rules
with over 100 literals, which existing approaches cannot. Our
approach works because we decompose a learning task into
smaller tasks that can be solved separately. For instance, in-
stead of directly searching for a rule of size 7 to learn h2, we
search for rules of size 3 (r1 to r3) and try to join them to
learn h3. As the search complexity of ILP is usually expo-
nential in the size of the program to be learned, this reduction
can substantially improve learning performance. Moreover,
because we can join small rules to learn big rules of a cer-
tain syntactic form, we can eliminate splittable rules from the
search space. We formally define a splittable rule in Section 4,
but informally the body of a splittable rule can be decomposed
into independent subsets, such as the body of h2 in our Zendo
example.
To explore our idea, we build on learning from failures
(LFF) [Cropper and Morel, 2021], a constraint-driven ILP
approach. We extend the LFF system COMBO [Cropper and
Hocquette, 2023] with a join stage to learn programs with big
rules. We develop a Boolean satisfiability (SAT) [Biere et al.,
2021] approach to find conjunctions in the join stage. We call
our implementation JOINER.
Novelty and Contributions. Our main contribution is the
idea of joining small rules to learn big rules which, as our
experiments on many diverse domains show, can improve
predictive accuracies. Overall, our main contributions are:
1. We introduce an approach which joins small rules to learn
big rules.
2. We implement our approach in JOINER, which learns
optimal (textually minimal) and recursive programs. We
prove the correctness of JOINER (Theorem 1).
3. We experimentally show on several domains, including
game playing, drug design, and string classification, that
our approach can (i) learn rules with more than 100 liter-
als, and (ii) drastically outperform existing approaches in
terms of predictive accuracy.
2 Related Work
Program synthesis. Several approaches build a program
one token at a time using an LSTM [Devlin et al., 2017;
Bunel et al., 2018]. CROSSBEAM [Shi et al., 2022] uses a
neural model to generate programs as the compositions of
seen subprograms. CROSSBEAM is not guaranteed to learn
a solution if one exists and can only use unary and binary
relations. By contrast, JOINER can use relations of any arity.
Rule mining. AMIE+ [Galárraga et al., 2015] is a promi-
nent rule mining approach. In contrast to JOINER, AMIE+
can only use unary and binary relations and struggles to learn
rules with more than 4 literals.
ILP. Top-down ILP systems [Quinlan, 1990; Blockeel and
De Raedt, 1998] specialise rules with refinement operators
[Shapiro, 1983]. Because they learn a single rule at a time
and add a single literal at each step, these systems struggle
to learn recursive and optimal programs. Recent approaches
overcome these issues by formulating the search as a rule
selection problem [Corapi et al., 2011; Evans and Grefenstette,
2018; Kaminski et al., 2019; Si et al., 2019; Raghothaman et
al., 2020; Evans et al., 2021; Bembenek et al., 2023]. These
approaches precompute all possible rules (up to a maximum
rule size) and thus struggle to learn rules with more than a few
literals. By contrast, we avoid enumeration and use constraints
to soundly prune rules. Moreover, our join stage allows us to
learn rules with more than 100 literals.
Many rules. COMBO [Cropper and Hocquette, 2023]
searches for a disjunction of small programs that entails all
the positive examples. COMBO learns optimal and recursive
programs and large programs with many rules. However, it
struggles to learn rules with more than 6 literals. Our approach
builds on COMBO and can learn rules with over 100 literals.
Big rules. Inverse entailment approaches [Muggleton, 1995;
Srinivasan, 2001] can learn big rules by returning bottom
clauses. However, these approaches struggle to learn optimal
and recursive programs and tend to overfit. Ferilli [2016]
specialises rules in a theory revision task. This approach can
learn negated conjunctions but only Datalog preconditions
and not recursive programs. BRUTE [Cropper and Dumancic,
2020] can learn recursive programs with hundreds of literals.
However, in contrast to our approach, BRUTE needs a user-
provided domain-specific loss function, does not learn optimal
programs, and can only use binary relations.
Splitting rules. Costa et al. [2003] split rules into conjunc-
tions of independent goals that can be executed separately to
avoid unnecessary backtracking and thus to improve execution
times. By contrast, we split rules to reduce search complexity.
Costa et al. allow joined rules to share variables, whereas we
prevent joined rules from sharing body-only variables.
3 Problem Setting
We now describe our problem setting. We assume familiarity
with logic programming [Lloyd, 2012] but have stated relevant
notation and definitions in the appendix.
We use the learning from failures (LFF) [Cropper and Morel,
2021] setting. We restate some key definitions [Cropper and
Hocquette, 2023]. A hypothesis is a set of definite clauses with
the least Herbrand model semantics. We use the term program
interchangeably with the term hypothesis. A hypothesis space
E is a set of hypotheses. LFF assumes a language L that
defines hypotheses. A LFF learner uses hypothesis constraints
to restrict the hypothesis space. A hypothesis constraint is a
constraint (a headless rule) expressed in L. A hypothesis h is
consistent with a set of constraints C if, when written in L, h
does not violate any constraint in C. We call E_C the subset of
E consistent with C. We define a LFF input and a solution:
Definition 1 (LFF input). A LFF input is a tuple
(E, B, E, C) where E = (E+, E−) is a pair of sets of ground
atoms denoting positive (E+) and negative (E−) examples, B

is a definite program denoting background knowledge, E is a
hypothesis space, and C is a set of hypothesis constraints.
Definition 2 (Solution). For a LFF input (E, B, E, C), where
E = (E+, E−), a hypothesis h ∈ E_C is a solution when h
entails every example in E+ (∀e ∈ E+, B ∪ h |= e) and no
example in E− (∀e ∈ E−, B ∪ h ̸|= e).
Let cost : E ↦→ N be an arbitrary cost function that measures
the cost of a hypothesis. We define an optimal solution:
Definition 3 (Optimal solution). For a LFF input
(E, B, E, C), a hypothesis h ∈ E_C is optimal when (i) h
is a solution, and (ii) ∀h′ ∈ E_C, where h′ is a solution,
cost(h) ≤ cost(h′).
Our cost function is the number of literals in a hypothesis.
A hypothesis which is not a solution is a failure. For a
hypothesis h, the number of true positives (tp) and false nega-
tives (fn) is the number of positive examples entailed and not
entailed by h respectively. The number of false positives (fp)
is the number of negative examples entailed by h.
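As a minimal sketch, these counts can be computed directly from entailed-example sets (here given explicitly rather than computed by Prolog, as in the actual test stage):

```python
def coverage(entailed, pos, neg):
    """Return (tp, fn, fp) for a hypothesis whose entailed examples are known."""
    tp = len(pos & entailed)   # positive examples entailed
    fn = len(pos - entailed)   # positive examples not entailed
    fp = len(neg & entailed)   # negative examples entailed
    return tp, fn, fp

pos = {"e1", "e2", "e3"}
neg = {"e4", "e5"}
entailed = {"e1", "e2", "e4"}  # examples the hypothesis entails

assert coverage(entailed, pos, neg) == (2, 1, 1)
```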
4 Algorithm
To describe JOINER, we first describe COMBO [Cropper and
Hocquette, 2023], which we build on.
COMBO. COMBO takes as input background knowledge,
positive and negative training examples, and a maximum hy-
pothesis size. COMBO builds a constraint satisfaction problem
(CSP) program c, where each model of c corresponds to a hy-
pothesis (a definite program). In the generate stage, COMBO
searches for a model of c for increasing hypothesis sizes. If no
model is found, COMBO increments the hypothesis size and
resumes the search. If there is a model, COMBO converts it to
a hypothesis h. In the test stage, COMBO uses Prolog to test h
on the training examples. If h is a solution, COMBO returns
it. Otherwise, if h entails at least one positive example and no
negative examples, COMBO saves h as a promising program.
In the combine stage, COMBO searches for a combination
(a union) of promising programs that entails all the positive
examples and is minimal in size. If there is a combination,
COMBO saves it as the best combination so far and updates
the maximum hypothesis size. In the constrain stage, COMBO
uses h to build constraints and adds them to c to prune models
and thus prune the hypothesis space. For instance, if h does
not entail any positive example, COMBO adds a constraint
to eliminate its specialisations as they are guaranteed to not
entail any positive example. COMBO repeats this loop until it
finds an optimal (textually minimal) solution or exhausts the
models of c.
4.1 Joiner
Algorithm 1 shows our JOINER algorithm. JOINER builds on
COMBO and uses a generate, test, join, combine, and constrain
loop to find an optimal solution (Definition 3). JOINER differs
from COMBO by (i) eliminating splittable programs in the
generate stage (line 5), (ii) using a join stage to build big rules
from small rules and saving them as promising programs (line
16), and (iii) using different constraints (line 20). We describe
these differences in turn.
Algorithm 1 JOINER
1   def Joiner(bk, E+, E−, maxsize):
2       cons, to_join, to_combine = {}, {}, {}
3       bestsol, k = None, 1
4       while k ≤ maxsize:
5           h = generate_non_splittable(cons, k)
6           if h == UNSAT:
7               k += 1
8               continue
9           tp, fn, fp = test(E+, E−, bk, h)
10          if fn == 0 and fp == 0:
11              return h
12          elif tp > 0 and fp == 0:
13              to_combine += {h}
14          elif tp > 0 and fp > 0:
15              to_join += {h}
16          to_combine += join(to_join, bestsol, E+, E−, k)
17          combi = combine(to_combine, maxsize, bk, E)
18          if combi != None:
19              bestsol, maxsize = combi, size(combi) - 1
20          cons += constrain(h, tp, fp)
21      return bestsol
4.2 Generate
In our generate stage, we eliminate splittable programs be-
cause we can build them in the join stage. We define a split-
table rule:
Definition 4 (Splittable rule). A rule is splittable if its body
literals can be partitioned into two non-empty sets such that
the body-only variables (a variable in the body of a rule but
not the head) in the literals of one set are disjoint from the
body-only variables in the literals of the other set. A rule is
non-splittable if it is not splittable.
Example 1 (Splittable rule). Consider the rule:
{ zendo(S) ← piece(S,R), red(R), piece(S,B), blue(B) }
This rule is splittable because its body literals can be par-
titioned into two sets {piece(S,R), red(R)} and {piece(S,B),
blue(B)}, with body-only variables {R} and {B} respectively.
Example 2 (Non-splittable rule). Consider the rule:
{ zendo(S) ← piece(S,R), red(R), piece(S,B),
blue(B), contact(R,B)
}
This rule is non-splittable because each body literal contains
the body-only variable R or B and one literal contains both.
We define a splittable program:
Definition 5 (Splittable program). A program is splittable if
and only if it has exactly one rule and this rule is splittable. A
program is non-splittable when it is not splittable.
We use a constraint to prevent the CSP solver from considering
models with splittable programs. The appendix includes our
encoding of this constraint. At a high level, we first identify
connected body-only variables. Two body-only variables A
and B are connected if they are in the same body literal, or
if there exists another body-only variable C such that A and

C are connected and B and C are connected. Our constraint
prunes programs with a single rule which has (i) two body-only
variables that are not connected, or (ii) multiple body literals
and at least one body literal without body-only variables. As
our experiments show, eliminating splittable programs can
substantially improve learning performance.
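The splittability test amounts to a connectivity check over body-only variables. A minimal sketch, under the assumption that a rule is given as a set of head variables plus a list of body literals, each represented by its set of variables:

```python
from itertools import combinations

def is_splittable(head_vars, body):
    """body: list of variable sets, one per body literal.
    A rule is splittable if its body literals can be partitioned into two
    non-empty sets with disjoint body-only variables (Definition 4)."""
    body_only = set().union(*body) - set(head_vars)
    # Union-find over body literals: merge literals sharing a body-only variable.
    parent = list(range(len(body)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in combinations(range(len(body)), 2):
        if (body[i] & body[j]) & body_only:
            parent[find(i)] = find(j)
    components = {find(i) for i in range(len(body))}
    return len(components) > 1

# zendo(S) ← piece(S,R), red(R), piece(S,B), blue(B): splittable
assert is_splittable({"S"}, [{"S", "R"}, {"R"}, {"S", "B"}, {"B"}])
# Adding contact(R,B) connects the two halves: non-splittable
assert not is_splittable({"S"}, [{"S", "R"}, {"R"}, {"S", "B"}, {"B"}, {"R", "B"}])
```

Literals without body-only variables never merge with any other literal, so a rule with several body literals and one such literal is also classified as splittable, matching case (ii) of the constraint.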
4.3 Join
Algorithm 2 shows our join algorithm, which takes as input a
set of programs and their coverage (P), where each program
entails some positive and some negative examples, the best
solution found thus far (bestsol), the positive examples (E+),
the negative examples (E), and a maximum conjunction
size (k). It returns subsets of P, where the intersection of the
logical consequences of the programs in each subset entails at
least one positive example and no negative example. We call
such subsets conjunctions.
We define a conjunction:
Definition 6 (Conjunction). A conjunction is a set of pro-
grams with the same head literal. We call M(p) the least
Herbrand model of the logic program p. The logical conse-
quences of a conjunction c are the intersection of the logical
consequences of the programs in it: M(c) = ∩_{p∈c} M(p). The
cost of a conjunction c is the sum of the costs of the programs
in it: cost(c) = ∑_{p∈c} cost(p).
Our join algorithm has two stages. We first search for con-
junctions that together entail all the positive examples, which
allows us to quickly find a solution (Definition 2). If we have
a solution, we enumerate all remaining conjunctions to guar-
antee optimality (Definition 3). In other words, at each call
of the join stage (line 16 in Algorithm 1), we either run the
incomplete or the complete join stage (line 3 or 5 in Algorithm
2). We describe these two stages.
Incomplete Join Stage
If we do not have a solution, we use a greedy set-covering
algorithm to try to cover all the positive examples. We initially
mark each positive example as uncovered (line 8). We then
search for a conjunction that entails the maximum number
of uncovered positive examples (line 11). We repeat this
loop until we have covered all the positive examples or there
are no more conjunctions. This step allows us to first find
conjunctions with large coverage and quickly build a solution.
However, this solution may not be optimal.
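The incomplete join stage is a standard greedy set cover over conjunction coverages. A minimal sketch, where each candidate conjunction is given with the set of positive examples it entails (finding each best candidate with a MaxSAT solver, as JOINER does, is elided here):

```python
def greedy_cover(candidates, positives):
    """candidates: dict mapping a conjunction name to the positive examples
    it entails (all candidates are assumed to entail no negative example).
    Greedily pick the conjunction covering the most uncovered positives."""
    uncovered, chosen = set(positives), []
    while uncovered:
        best = max(candidates, key=lambda c: len(candidates[c] & uncovered))
        if not candidates[best] & uncovered:
            break  # no conjunction covers a new positive example
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen, uncovered

candidates = {"c1": {"e1", "e2", "e3"}, "c2": {"e3", "e4"}, "c3": {"e4"}}
chosen, uncovered = greedy_cover(candidates, {"e1", "e2", "e3", "e4"})
assert chosen == ["c1", "c2"] and not uncovered
```

As with any greedy set cover, the returned set of conjunctions may be larger than necessary, which is why the complete join stage is still needed for optimality.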
Complete Join Stage
If we have a solution, we find all remaining conjunctions to
ensure we consider all splittable programs and thus ensure
optimality. However, we do not need to find all conjunctions
as some cannot be in an optimal solution. If a conjunction
entails a subset of the positive examples entailed by a strictly
smaller conjunction then it cannot be in an optimal solution:
Proposition 1. Let c1 and c2 be two conjunctions which do not
entail any negative examples, with c1 |= E+_1, c2 |= E+_2,
E+_2 ⊆ E+_1, and size(c1) < size(c2). Then c2 cannot be in
an optimal solution.
The appendix contains a proof for this result. Following this
result, our join stage enumerates conjunctions by increasing
Algorithm 2 Join stage
1   def join(P, bestsol, E+, E−, k):
2       if bestsol == None:
3           return incomplete_join(P, E+, E−)
4       else:
5           return complete_join(P, E+, E−, k)
6
7   def incomplete_join(P, E+, E−):
8       uncovered, conjunctions = E+, {}
9       while uncovered:
10          encoding = buildencoding(P, E+, E−, conjunctions)
11          conj = conj_max_coverage(uncovered, encoding)
12          if not conj:
13              break
14          uncovered -= pos_entailed(conj)
15          conjunctions += {conj}
16      return conjunctions
17
18  def complete_join(P, E+, E−, k):
19      conjunctions = {}
20      while True:
21          encoding = buildencoding(P, E+, E−, conjunctions)
22          encoding += sizeconstraint(k)
23          τ = find_assignment(encoding)
24          if not τ:
25              break
26          while True:
27              assignment = cover_more_pos(encoding, τ)
28              if not assignment:
29                  break
30              τ = assignment
31          conjunctions += {conjunction(τ)}
32      return conjunctions
size. For increasing values of k, we search for all subset-
maximal coverage conjunctions of size k, i.e. conjunctions
whose sets of entailed positive examples are not included in
one another.
Example 3 (Join stage). Consider the positive examples
E+ = {f([a, b, c, d]), f([c, b, d, e])}, the negative examples
E− = {f([c, b]), f([d, b]), f([a, c, d, e])}, and the programs:
p1 = { f(S) ← head(S,a) }
p2 = { f(S) ← last(S,e) }
p3 = { f(S) ← tail(S,T), head(T,b) }
p4 = { f(S) ← head(S,c)
       f(S) ← tail(S,T), f(T) }
p5 = { f(S) ← head(S,d)
       f(S) ← tail(S,T), f(T) }
Each of these programs entails at least one positive and one
negative example. The incomplete join stage first outputs the
conjunction c1 = {p3, p4, p5} as it entails all the positive ex-
amples and no negative example. The complete join stage then
outputs the conjunctions c2 = {p1, p3} and c3 = {p2, p3}.
The other conjunctions are not output because they (i) do
not entail any positive example, or (ii) entail some negative
example, or (iii) are subsumed by c2 or c3.
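The sketch below replays this example in plain Python, writing each program's semantics directly as a predicate over lists (head, last, second element, and membership for the recursive programs) and checking the stated coverages by intersecting entailed sets:

```python
# Each program maps to a predicate over lists (its least Herbrand model
# restricted to f/1, written out directly in Python).
p1 = lambda s: bool(s) and s[0] == "a"     # head(S,a)
p2 = lambda s: bool(s) and s[-1] == "e"    # last(S,e)
p3 = lambda s: len(s) > 1 and s[1] == "b"  # tail(S,T), head(T,b)
p4 = lambda s: "c" in s                    # head c, or recurse on the tail
p5 = lambda s: "d" in s                    # head d, or recurse on the tail

pos = [list("abcd"), list("cbde")]
neg = [list("cb"), list("db"), list("acde")]

def entails(conj, s):
    # A conjunction entails s iff every program in it does (intersection).
    return all(p(s) for p in conj)

c1 = [p3, p4, p5]
assert all(entails(c1, s) for s in pos)      # entails all positives
assert not any(entails(c1, s) for s in neg)  # and no negative

c2, c3 = [p1, p3], [p2, p3]
for c in (c2, c3):
    assert any(entails(c, s) for s in pos)       # at least one positive
    assert not any(entails(c, s) for s in neg)   # no negative
```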

Finding Conjunctions using SAT
To find conjunctions, we use a Boolean satisfiabil-
ity (SAT) [Biere et al., 2021] approach. We build a propo-
sitional encoding (lines 10 and 21) for the join stage as fol-
lows. Let P be the set of input programs. For each pro-
gram h ∈ P, the variable p_h indicates that h is in a con-
junction. For each positive example e ∈ E+, the variable
c_e indicates that the conjunction entails e. The constraint
F+_e = c_e → ∧_{h∈P : B∪h ̸|= e} ¬p_h ensures that if the conjunc-
tion entails e, then every program in the conjunction en-
tails e. For each negative example e ∈ E−, the constraint
F−_e = ∨_{h∈P : B∪h ̸|= e} p_h ensures that at least one of the pro-
grams in the conjunction does not entail e.
Subset-maximal coverage conjunctions. For the complete
join stage, to find all conjunctions of size k with subset-
maximal coverage, we use a SAT solver to enumerate max-
imal satisfiable subsets [Liffiton and Sakallah, 2008] corre-
sponding to the subset-maximal coverage conjunctions on
the following propositional encoding. We build the con-
straint S = ∑_h size(h) · p_h ≤ k to bound the size of the
conjunctions and we encode S as a propositional formula
F_S [Manthey et al., 2014]. To find a conjunction with subset-
maximal coverage, we iteratively call a SAT solver on the
formula F = (∧_{e∈E+} F+_e) ∧ (∧_{e∈E−} F−_e) ∧ F_S. If F has a
satisfying assignment τ (line 23), we form a conjunction
c by including a program h iff τ(p_h) = 1 and update F to
F ∧ (∧_{e∈E+ : B∪c |= e} c_e) ∧ (∨_{e∈E+ : B∪c ̸|= e} c_e) to ensure that sub-
sequent conjunctions cover more examples (line 27). We
iterate until F is unsatisfiable (lines 26 to 30), in which
case c has subset-maximal coverage. To enumerate all con-
junctions, we iteratively call this procedure on the formula
F ∧ (∧_{c∈C} ∨_{e∈E+ : B∪c ̸|= e} c_e), where C is the set of conjunc-
tions found so far.
Maximal coverage conjunctions. For the incomplete join
stage, we use maximum satisfiability (MaxSAT) [Bacchus
et al., 2021] solving to find conjunctions which entail the
maximum number of uncovered positive examples (line 11).
The MaxSAT encoding includes the hard clauses
(∧_{e∈E+} F+_e) ∧ (∧_{e∈E−} F−_e) to ensure correct coverage,
as well as a soft clause (c_e) for each uncovered positive
example e to maximise the number of uncovered examples
that the conjunction entails.
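The clause structure of this encoding can be sketched without a solver. The snippet below builds the F+_e and F−_e constraints exactly as described, on a hypothetical toy instance, and replaces the MaxSAT call with brute-force enumeration over assignments (both the instance and the brute-force search are illustrative assumptions):

```python
from itertools import product

# Hypothetical toy instance: which examples each candidate program
# FAILS to entail.
programs = ["h1", "h2", "h3"]
pos = ["e1", "e2"]
neg = ["n1"]
not_entails = {"h1": {"e2", "n1"}, "h2": {"e1"}, "h3": set()}

def satisfies(assign):
    """Check the hard constraints F+_e and F-_e under a 0/1 assignment
    over the p_h (program selected) and c_e (example entailed) variables."""
    for e in pos:
        # F+_e: if c_e is true, no selected program may fail to entail e.
        if assign["c_" + e] and any(
                assign["p_" + h] for h in programs if e in not_entails[h]):
            return False
    for e in neg:
        # F-_e: some selected program must fail to entail the negative e.
        if not any(assign["p_" + h] for h in programs if e in not_entails[h]):
            return False
    return True

# Brute-force stand-in for the MaxSAT call: maximise the entailed positives.
best = None
names = ["p_" + h for h in programs] + ["c_" + e for e in pos]
for bits in product([0, 1], repeat=len(names)):
    assign = dict(zip(names, bits))
    if not any(assign["p_" + h] for h in programs):
        continue  # a conjunction must contain at least one program
    if satisfies(assign):
        cover = sum(assign["c_" + e] for e in pos)
        if best is None or cover > best[0]:
            best = (cover, {h for h in programs if assign["p_" + h]})

# Only h1 rules out the negative n1; h1 fails e2, so the best cover is {e1}.
assert best == (1, {"h1"})
```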
4.4 Constrain
In the constrain stage (line 20 in Algorithm 1), JOINER uses
two optimally sound constraints to prune the hypothesis space.
If a hypothesis does not entail any positive example, JOINER
prunes all its specialisations, as they cannot be in a conjunction
in an optimal solution:
Proposition 2. Let h1 be a hypothesis that does not entail any
positive example and h2 be a specialisation of h1. Then h2
cannot be in a conjunction in an optimal solution.
If a hypothesis does not entail any negative example, JOINER
prunes all its specialisations, as they cannot be in a conjunction
in an optimal solution:
Proposition 3. Let h1 be a hypothesis that does not entail any
negative example and h2 be a specialisation of h1. Then h2
cannot be in a conjunction in an optimal solution.
The appendix includes proofs for these propositions.
4.5 Correctness
We prove the correctness of JOINER:
Theorem 1. JOINER returns an optimal solution, if one exists.
The proof is in the appendix. To show this result, we show
that (i) JOINER can generate and test every non-splittable
program, (ii) each rule of an optimal solution is equivalent to
the conjunction of non-splittable rules, and (iii) our constraints
(Propositions 1, 2, and 3) never prune optimal solutions.
5 Experiments
To test our claim that our join stage can improve learning
performance, our experiments aim to answer the question:
Q1 Can the join stage improve learning performance?
To answer Q1, we compare learning with and without the join
stage.
To test our claim that eliminating splittable programs in
the generate stage can improve learning performance, our
experiments aim to answer the question:
Q2 Can eliminating splittable programs in the generate stage
improve learning performance?
To answer Q2, we compare learning with and without the
constraint eliminating splittable programs.
To test our claim that JOINER can learn programs with big
splittable rules, our experiments aim to answer the question:
Q3 How well does JOINER scale with the size of splittable
rules?
To answer Q3, we vary the size of rules and evaluate the
performance of JOINER.
Finally, to test our claim that JOINER can outperform other
approaches, our experiments aim to answer the question:
Q4 How well does JOINER compare against other ap-
proaches?
To answer Q4, we compare JOINER against COMBO [Crop-
per and Hocquette, 2023] and ALEPH [Srinivasan, 2001] on
multiple tasks and domains.²
Domains
We consider several domains. The appendix provides addi-
tional information about our domains and tasks.
IGGP. In inductive general game playing (IGGP) [Cropper et
al., 2020], the task is to learn rules from game traces from the
general game playing competition [Genesereth and Björnsson,
2013].
Zendo. Zendo is an inductive game where the goal is to
identify a secret rule that structures must follow [Bramley et
al., 2018; Cropper and Hocquette, 2023].
IMDB. The Internet Movie Database (IMDB) [Mihalkova
et al., 2007] is a relational domain containing relations be-
tween movies, directors, and actors.
Pharmacophores. The task is to identify structures responsi-
ble for the medicinal activity of molecules [Finn et al., 1998].
²We also tried rule selection approaches (Section 2) but precom-
puting every possible rule is infeasible on our datasets.

Strings. The goal is to identify recursive patterns to classify
strings.
1D-ARC. This dataset [Xu et al., 2023] contains visual reason-
ing tasks inspired by the abstract reasoning corpus [Chollet,
2019].
Experimental Setup
We use 60s and 600s timeouts. We repeat each experiment 5
times. We measure predictive accuracies (the proportion of
correct predictions on testing data). For clarity, our figures
only show tasks where the approaches differ. The appendix
contains the detailed results for each task. We use an 8-core 3.2
GHz Apple M1 and a single CPU to run the experiments. We
use the MaxSAT solver UWrMaxSat [Piotrów, 2020] and the
SAT solver CaDiCaL 1.5.3 [Biere et al., 2023] (via PySAT [Ig-
natiev et al., 2018]) in the join stage of JOINER.
Q1. We compare learning with and without the join stage.
To isolate the impact of the join stage, we allow splittable
programs in the generate stage.
Q2. We compare the predictive accuracies of JOINER with
and without the constraint that eliminates splittable programs.
Q3. To evaluate scalability, for increasing values of k, we
generate a task where an optimal solution has size k. We learn
a hypothesis with a single splittable rule. We use a zendo task
similar to the one shown in Section 1 and a string task.
Q4. We provide JOINER and COMBO with identical input.
The only differences are (i) the join stage, and (ii) the elimi-
nation of splittable programs in the generate stage. However,
because it can build conjunctions in the join stage, JOINER
searches a larger hypothesis space than COMBO. ALEPH uses
a different bias than JOINER to define the hypothesis space.
In particular, ALEPH expects a maximum rule size as input.
Therefore, the comparison is less fair and should be viewed as
indicative only.
Reproducibility. The code and experimental data for re-
producing the experiments are provided as supplementary
material and will be made publicly available if the paper is
accepted for publication.
Experimental Results
Q1. Can the Join Stage Improve Performance?
Figure 3 shows that the join stage can drastically improve
predictive accuracies. A McNemar's test confirms (p < 0.01)
that the join stage improves accuracies on 24/42 tasks with a
60s timeout and on 22/42 tasks with a 600s timeout. There is
no significant difference for the other tasks.
The join stage can learn big rules which otherwise cannot
be learned. For instance, for the task pharma1, the join stage
finds a rule of size 17 which has 100% accuracy. By contrast,
without the join stage, no solution is found, resulting in default
accuracy (50%). Similarly, an optimal solution for the task
iggp-rainbow has a single rule of size 19. This rule is splittable
and is the conjunction of 6 small rules. The join stage identifies
this rule in less than 1s as it entails all the positive examples.
By contrast, without the join stage, the system exceeds the
timeout without finding a solution as it needs to search through
the set of all rules up to size 19 to find a solution.
The overhead of the join stage is small. For instance, for
the task scale in the 1D-ARC domain, the join stage takes less
than 1% of the total learning time, yet this stage allows us to
find a perfectly accurate solution with a rule of size 13.
Overall, the results suggest that the answer to Q1 is that the
join stage can substantially improve predictive accuracy.
Figure 3: Predictive accuracy (%) with and without the join stage
with 60s (left) and 600s (right) timeouts.
Q2. Can Eliminating Splittable Programs Improve
Performance?
Figure 4 shows that eliminating splittable programs in the gen-
erate stage can improve learning performance. A McNemar's
test confirms (p < 0.01) that eliminating splittable programs
improves performance on 8/42 tasks with a 60s timeout and
on 11/42 tasks with a 600s timeout. It degrades performance
(p < 0.01) on 1/42 tasks with a 60s timeout. There is no
significant difference for the other tasks.
Eliminating splittable programs from the generate stage
can greatly reduce the number of programs JOINER considers.
For instance, for iggp-rainbow, the hypothesis space contains
1,986,422 rules of size at most 6, but only 212,564 are non-
splittable. Likewise, for string2, when eliminating splittable
programs, JOINER finds a perfectly accurate solution in 133s
(2min13s). By contrast, with splittable programs, JOINER
considers more programs and fails to find a solution within
the 600s timeout, resulting in default accuracy (50%).
Overall, these results suggest that the answer to Q2 is that
eliminating splittable programs from the generate stage can
improve predictive accuracies.
Figure 4: Predictive accuracies (%) with and without generating
splittable programs with 60s (left) and 600s (right) timeouts.
Q3. How Well Does JOINER Scale?
Figure 5 shows that JOINER can learn an almost perfectly
accurate hypothesis with up to 100 literals for both the zendo
and string tasks. By contrast, COMBO and ALEPH struggle to
learn hypotheses with more than 10 literals.

JOINER learns a zendo hypothesis of size k after searching
for programs of size 4. By contrast, COMBO must search for
programs up to size k to find a solution. Similarly, an optimal
solution for the string task is the conjunction of programs
with 6 literals each. By contrast, COMBO must search for
programs up to size k to find a solution. ALEPH struggles to
learn recursive programs and thus struggles on the string task.
Overall, the results suggest that the answer to Q3 is that
JOINER can scale well with the size of rules.
Figure 5: Predictive accuracies (%) when varying the optimal solution
size for zendo (left) and string (right) with a 600s timeout.
Q4. How Does JOINER Compare to Other Approaches?
Table 1 shows the predictive accuracies aggregated over each
domain. It shows JOINER achieves higher accuracies than
COMBO and ALEPH on almost all domains.
Task      ALEPH     COMBO     JOINER
iggp      78 ± 3    86 ± 2    96 ± 1
zendo     100 ± 0   86 ± 3    94 ± 2
pharma    50 ± 0    53 ± 2    98 ± 1
imdb      67 ± 6    100 ± 0   100 ± 0
string    50 ± 0    50 ± 0    100 ± 0
onedarc   51 ± 1    57 ± 2    89 ± 1
Table 1: Aggregated predictive accuracies (%) with a 600s timeout.
Figure 6 shows that JOINER outperforms COMBO. A McNemar's
test confirms (p < 0.01) that JOINER outperforms
COMBO on 27/42 tasks with a 60s timeout and on 26/42 tasks
with a 600s timeout. JOINER and COMBO have similar perfor-
mance on the other tasks.
JOINER can find hypotheses with big rules. For example,
the flip task in the 1D-ARC domain involves reversing the
order of colored pixels in an image. JOINER finds a solution
with two splittable rules of size 9 and 16. By contrast, COMBO
only searches programs of size at most 4 before timing out
and does not learn any program. JOINER can also perform better
when learning non-splittable programs. For instance, JOINER
learns a perfectly accurate solution for iggp-rps and proves that
this solution is optimal in 20s. By contrast, COMBO requires
440s (7min20s) to find the same solution and prove optimality.
Figure 7 shows that JOINER outperforms ALEPH. A McNemar's
test confirms (p < 0.01) that JOINER outperforms
ALEPH on 28/42 tasks with both 60s and 600s timeouts.
ALEPH outperforms (p < 0.01) JOINER on 4/42 tasks with a
60s timeout and on 2/42 tasks with a 600s timeout. JOINER
and ALEPH have similar performance on other tasks.
Figure 6: Predictive accuracies (%) of JOINER versus COMBO with
60s (left) and 600s (right) timeouts.
ALEPH struggles to learn recursive programs and there-
fore does not perform well on the string tasks. JOINER also
consistently surpasses ALEPH on tasks which do not require
recursion. For instance, JOINER achieves 98% average accu-
racy on the pharma tasks while ALEPH has default accuracy
(50%). However, for zendo3, ALEPH achieves better accuracy
(100% vs 79%) than JOINER: an optimal solution for this task is
not splittable, so JOINER exceeds the timeout.
Overall, the results suggest that the answer to Q4 is that
JOINER can outperform other approaches in terms of predic-
tive accuracy.
Figure 7: Predictive accuracies (%) of JOINER versus ALEPH with
60s (left) and 600s (right) timeouts.
6 Conclusions and Limitations
Learning programs with big rules is a major challenge. To ad-
dress this challenge, we introduced an approach which learns
big rules by joining small rules. We implemented our approach
in JOINER, which can learn optimal and recursive programs.
Our experiments on various domains show that JOINER can (i)
learn splittable rules with more than 100 literals, and (ii) out-
perform existing approaches in terms of predictive accuracies.
Limitations
Splittability. Our join stage builds splittable rules. The body
of a splittable rule is split into subsets which do not share body-
only variables. Future work should generalise our approach to
split rules into subsets which may share body-only variables.
Noise. In the join stage, we search for conjunctions which
entail some positive examples and no negative examples. Our
approach does not support noisy examples. Hocquette et al.
[2024] relax the LFF definition based on the minimal descrip-
tion length principle. Future work should combine our ap-
proach with their approach to learn big rules from noisy data.

References
[Bacchus et al., 2021] Fahiem Bacchus, Matti Järvisalo, and
Ruben Martins. Maximum satisfiability. In Armin Biere,
Marijn Heule, Hans van Maaren, and Toby Walsh, editors,
Handbook of Satisfiability - Second Edition, volume 336 of
Frontiers in Artificial Intelligence and Applications, pages
929–991. IOS Press, 2021.
[Bartha and Cheney, 2019] Sándor Bartha and James Cheney.
Towards meta-interpretive learning of programming lan-
guage semantics. In Inductive Logic Programming - 29th
International Conference, ILP 2019, Plovdiv, Bulgaria,
September 3-5, 2019, Proceedings, pages 16–25, 2019.
[Bembenek et al., 2023] Aaron Bembenek, Michael Green-
berg, and Stephen Chong. From SMT to ASP: Solver-based
approaches to solving datalog synthesis-as-rule-selection
problems. Proc. ACM Program. Lang., 7(POPL), January 2023.
[Biere et al., 2021] Armin Biere, Marijn Heule, Hans van
Maaren, and Toby Walsh, editors. Handbook of Satisfi-
ability - Second Edition, volume 336 of FAIA. IOS Press,
2021.
[Biere et al., 2023] Armin Biere, Mathias Fleury, and Flo-
rian Pollitt. CaDiCaL, vivinst, IsaSAT, Gimsatul, Kissat,
and TabularaSAT entering the SAT Competition 2023. In
Proc. SAT Competition 2023: Solver, Benchmark and Proof
Checker Descriptions, volume B-2023-1 of Department of
Computer Science Report Series B, pages 14–15. University
of Helsinki, 2023.
[Blockeel and De Raedt, 1998] Hendrik Blockeel and Luc
De Raedt. Top-down induction of first-order logical deci-
sion trees. Artificial intelligence, 101(1-2):285–297, 1998.
[Bramley et al., 2018] Neil Bramley, Anselm Rothe, Josh
Tenenbaum, Fei Xu, and Todd M. Gureckis. Grounding
compositional hypothesis generation in specific instances.
In Proceedings of the 40th Annual Meeting of the Cognitive
Science Society, CogSci 2018, 2018.
[Bunel et al., 2018] Rudy Bunel, Matthew Hausknecht, Jacob
Devlin, Rishabh Singh, and Pushmeet Kohli. Leveraging
grammar and reinforcement learning for neural program
synthesis. arXiv preprint arXiv:1805.04276, 2018.
[Chollet, 2019] François Chollet. On the measure of intelli-
gence. CoRR, 2019.
[Corapi et al., 2011] Domenico Corapi, Alessandra Russo,
and Emil Lupu. Inductive logic programming in answer
set programming. In Inductive Logic Programming - 21st
International Conference, pages 91–97, 2011.
[Costa et al., 2003] Vítor Santos Costa, Ashwin Srinivasan,
Rui Camacho, Hendrik Blockeel, Bart Demoen, Gerda
Janssens, Jan Struyf, Henk Vandecasteele, and Wim Van
Laer. Query transformations for improving the efficiency
of ILP systems. Journal of Machine Learning Research,
pages 465–491, 2003.
[Cropper and Dumancic, 2020] Andrew Cropper and Sebasti-
jan Dumancic. Learning large logic programs by going
beyond entailment. In Proceedings of the Twenty-Ninth
International Joint Conference on Artificial Intelligence,
IJCAI 2020, pages 2073–2079, 2020.
[Cropper and Hocquette, 2023] Andrew Cropper and Céline
Hocquette. Learning logic programs by combining pro-
grams. In ECAI 2023 - 26th European Conference on
Artificial Intelligence, volume 372, pages 501–508. IOS
Press, 2023.
[Cropper and Morel, 2021] Andrew Cropper and Rolf Morel.
Learning programs by learning from failures. Machine
Learning, 110(4):801–856, 2021.
[Cropper et al., 2020] Andrew Cropper, Richard Evans, and
Mark Law. Inductive general game playing. Machine
Learning, 109(7):1393–1434, 2020.
[Cropper et al., 2022] Andrew Cropper, Sebastijan Du-
mancic, Richard Evans, and Stephen H. Muggleton. In-
ductive logic programming at 30. Machine Learning,
111(1):147–172, 2022.
[Devlin et al., 2017] Jacob Devlin, Jonathan Uesato, Surya
Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, and
Pushmeet Kohli. Robustfill: Neural program learning under
noisy i/o. In International conference on machine learning,
pages 990–998. PMLR, 2017.
[Dumancic et al., 2019] Sebastijan Dumancic, Tias Guns,
Wannes Meert, and Hendrik Blockeel. Learning relational
representations with auto-encoding logic programs. In
Proceedings of the Twenty-Eighth International Joint Con-
ference on Artificial Intelligence, IJCAI 2019, pages 6081–
6087, 2019.
[Evans and Grefenstette, 2018] Richard Evans and Edward
Grefenstette. Learning explanatory rules from noisy data.
Journal of Artificial Intelligence Research, pages 1–64,
2018.
[Evans et al., 2021] Richard Evans, José Hernández-Orallo,
Johannes Welbl, Pushmeet Kohli, and Marek J. Sergot.
Making sense of sensory input. Artificial Intelligence, page
103438, 2021.
[Ferilli, 2016] Stefano Ferilli. Predicate invention-based spe-
cialization in inductive logic programming. Journal of
Intelligent Information Systems, 47(1):33–55, 2016.
[Finn et al., 1998] Paul Finn, Stephen Muggleton, David
Page, and Ashwin Srinivasan. Pharmacophore discovery
using the inductive logic programming system progol. Ma-
chine Learning, 30(2):241–270, 1998.
[Galárraga et al., 2015] Luis Galárraga, Christina Teflioudi,
Katja Hose, and Fabian M. Suchanek. Fast rule mining
in ontological knowledge bases with AMIE+. VLDB J.,
24(6):707–730, 2015.
[Genesereth and Björnsson, 2013] Michael Genesereth and
Yngvi Björnsson. The international general game play-
ing competition. AI Magazine, 34(2):107–107, 2013.
[Hocquette et al., 2024] Céline Hocquette, Andreas Niska-
nen, Matti Järvisalo, and Andrew Cropper. Learning MDL
logic programs from noisy data. In AAAI, 2024.

[Ignatiev et al., 2018] Alexey Ignatiev, Antonio Morgado,
and Joao Marques-Silva. PySAT: A Python toolkit for
prototyping with SAT oracles. In International Conference
on Theory and Applications of Satisfiability Testing, pages
428–437. Springer, 2018.
[Kaminski et al., 2019] Tobias Kaminski, Thomas Eiter, and
Katsumi Inoue. Meta-interpretive learning using hex-
programs. In Proceedings of the Twenty-Eighth Interna-
tional Joint Conference on Artificial Intelligence, IJCAI
2019, pages 6186–6190, 2019.
[Law et al., 2014] Mark Law, Alessandra Russo, and Krysia
Broda. Inductive learning of answer set programs. In
Logics in Artificial Intelligence - 14th European Conference,
JELIA 2014, pages 311–325, 2014.
[Liffiton and Sakallah, 2008] Mark H. Liffiton and Karem A.
Sakallah. Algorithms for computing minimal unsatisfiable
subsets of constraints. J. Autom. Reason., 40(1):1–33, 2008.
[Lloyd, 2012] John W Lloyd. Foundations of logic program-
ming. Springer Science & Business Media, 2012.
[Manthey et al., 2014] Norbert Manthey, Tobias Philipp, and
Peter Steinke. A more compact translation of pseudo-
Boolean constraints into CNF such that generalized arc
consistency is maintained. In Carsten Lutz and Michael
Thielscher, editors, KI 2014, volume 8736 of LNCS, pages
123–134. Springer, 2014.
[Mihalkova et al., 2007] Lilyana Mihalkova, Tuyen N.
Huynh, and Raymond J. Mooney. Mapping and revis-
ing markov logic networks for transfer learning. In
Proceedings of the Twenty-Second AAAI Conference on
Artificial Intelligence, July 22-26, 2007, Vancouver, British
Columbia, Canada, pages 608–614. AAAI Press, 2007.
[Muggleton, 1991] Stephen Muggleton. Inductive logic pro-
gramming. New Generation Computing, 8(4):295–318,
1991.
[Muggleton, 1995] Stephen H. Muggleton. Inverse entailment
and Progol. New Generation Computing, 13(3&4):245–286,
1995.
[Piotrów, 2020] Marek Piotrów. UWrMaxSat: Efficient
solver for MaxSAT and pseudo-Boolean problems. In 2020
IEEE 32nd International Conference on Tools with Artifi-
cial Intelligence (ICTAI), pages 132–136, 2020.
[Quinlan, 1990] J. Ross Quinlan. Learning logical definitions
from relations. Machine Learning, pages 239–266, 1990.
[Raghothaman et al., 2020] Mukund Raghothaman, Jonathan
Mendelson, David Zhao, Mayur Naik, and Bernhard Scholz.
Provenance-guided synthesis of datalog programs. Proc.
ACM Program. Lang., 4(POPL):62:1–62:27, 2020.
[Shapiro, 1983] E.Y. Shapiro. Algorithmic program debug-
ging. MIT Press, 1983.
[Shi et al., 2022] Kensen Shi, Hanjun Dai, Kevin Ellis, and
Charles Sutton. Crossbeam: Learning to search in bottom-
up program synthesis. In International Conference on
Learning Representations, 2022.
[Si et al., 2019] Xujie Si, Mukund Raghothaman, Kihong
Heo, and Mayur Naik. Synthesizing datalog programs using
numerical relaxation. In Proceedings of the Twenty-Eighth
International Joint Conference on Artificial Intelligence,
IJCAI 2019, pages 6117–6124, 2019.
[Srinivasan, 2001] Ashwin Srinivasan. The ALEPH manual.
Machine Learning at the Computing Laboratory, Oxford
University, 2001.
[Tärnlund, 1977] Sten-Åke Tärnlund. Horn clause
computability. BIT, (2):215–226, 1977.
[Xu et al., 2023] Yudong Xu, Wenhao Li, Pashootan
Vaezipoor, Scott Sanner, and Elias B Khalil. LLMs and
the abstraction and reasoning corpus: Successes, failures,
and the importance of object-based representations. arXiv
preprint arXiv:2305.18354, 2023.
A Terminology
We assume familiarity with logic programming [Lloyd, 2012]
but restate some key relevant notation. A variable is a string
of characters starting with an uppercase letter. A predicate
symbol is a string of characters starting with a lowercase letter.
The arity n of a function or predicate symbol is the number
of arguments it takes. An atom is a tuple p(t1, ..., tn), where
p is a predicate of arity n and t1, ..., tn are terms, either
variables or constants. An atom is ground if it contains no
variables. A literal is an atom or the negation of an atom.
A clause is a set of literals. A clausal theory is a set of
clauses. A constraint is a clause without a positive literal. A
definite clause is a clause with exactly one positive literal. A
hypothesis is a set of definite clauses with the least Herbrand
semantics. We use the term program interchangeably with
hypothesis, i.e. a hypothesis is a program. A substitution
θ = {v1/t1, ..., vn/tn} is the simultaneous replacement of
each variable vi by its corresponding term ti. A clause c1
subsumes a clause c2 if and only if there exists a substitution
θ such that c1θ ⊆ c2. A program h1 subsumes a program h2,
denoted h1 ⪯ h2, if and only if ∀c2 ∈ h2, ∃c1 ∈ h1 such that
c1 subsumes c2. A program h1 is a specialisation of a program
h2 if and only if h2 ⪯ h1. A program h1 is a generalisation
of a program h2 if and only if h1 ⪯ h2.
B Generate stage encoding
Our encoding of the generate stage is the same as COMBO
except we disallow splittable rules in hypotheses which are
not recursive and do not contain predicate invention. To do so,
we add the following ASP program to the generate encoding:
:- not pi_or_rec, clause(C), splittable(C).
splittable(C) :-
body_not_head(C,Var1),
body_not_head(C,Var2),
Var1 < Var2,
not path(C,Var1,Var2).
path(C,Var1,Var2) :- path1(C,Var1,Var2).
path(C,Var1,Var2) :-
path1(C,Var1,Var3),
path(C,Var3,Var2).

path1(C,Var1,Var2) :-
body_literal(C,_,_,Vars),
var_member(Var1,Vars),
var_member(Var2,Vars),
Var1 < Var2,
body_not_head(C,Var1),
body_not_head(C,Var2).
%% we also disallow rules with multiple body literals
%% when a body literal only has head variables
:-
not pi_or_rec,
body_literal(C,P1,_,Vars1),
body_literal(C,P2,_,Vars2),
(P1,Vars1) != (P2,Vars2),
head_only_vars(C,Vars1).
head_only_vars(C,Vars) :-
body_literal(C,_,_,Vars),
clause(C),
not has_body_var(C,Vars).
has_body_var(C,Vars) :-
var_member(Var,Vars),
body_not_head(C,Var).
body_not_head(C,V) :-
body_var(C,V),
not head_var(C,V).
C JOINER Correctness
We show the correctness of JOINER. In the rest of this
section, we consider an LFF input tuple (E, B, H, C) with
E = (E+, E−). We assume that a solution always exists:
Assumption 1 (Existence of a solution). We assume there
exists a solution h ∈ H_C.
For terseness, in the rest of this section we always assume
Assumption 1 and, therefore, avoid repeatedly saying “if a
solution exists”.
We follow LFF [Cropper and Morel, 2021] and assume the
background knowledge does not depend on any hypothesis:
Assumption 2 (Background knowledge independence). We
assume that no predicate symbol in the body of a rule in the
background knowledge appears in the head of a rule in a
hypothesis.
For instance, given the following background knowledge we
disallow learning hypotheses for the target relations famous or
friend:
happy(A) ← friend(A,B), famous(B)
As with COMBO [Cropper and Hocquette, 2023], JOINER is
correct when the hypothesis space only contains decidable pro-
grams (such as Datalog programs), i.e. when every program is
guaranteed to terminate:
Assumption 3 (Decidable programs). We assume the hy-
pothesis space only contains decidable programs.
When the hypothesis space contains arbitrary definite pro-
grams, the results do not hold because, due to their Turing
completeness, checking entailment of an arbitrary definite
program is semi-decidable [Tärnlund, 1977], i.e. testing a
program might never terminate.
JOINER follows a generate, test, join, combine, and con-
strain loop. We show the correctness of each of these steps in
turn. We finally use these results to prove the correctness of
JOINER, i.e. that JOINER returns an optimal solution.
C.1 Preliminaries
We introduce definitions used throughout this section. We
define a splittable rule:
Definition 7 (Splittable rule). A rule is splittable if its body
literals can be partitioned into two non-empty sets such that
the body-only variables (a variable in the body of a rule but
not the head) in the literals of one set are disjoint from the
body-only variables in the literals of the other set. A rule is
non-splittable if it is not splittable.
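The connectivity test behind Definition 7 can be sketched in Python. This is an illustrative sketch only, not JOINER's implementation (JOINER performs this check in ASP, as shown in Section B); the representation of literals as (predicate, args) tuples, the convention that uppercase-initial strings are variables, and the helper name is_splittable are our own assumptions.

```python
# Sketch of the splittability test (Definition 7): a rule is splittable
# iff its body literals fall into at least two groups that share no
# body-only variables (variables absent from the head).

def is_splittable(head_vars, body):
    """body: list of (predicate, args) tuples; head_vars: set of head variables."""
    n = len(body)
    if n < 2:
        return False
    # Body-only variables of each literal (uppercase-initial = variable).
    bo = [{v for v in args if v[0].isupper() and v not in head_vars}
          for _, args in body]
    # Union-find over literals: merge literals sharing a body-only variable.
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if bo[i] & bo[j]:
                parent[find(i)] = find(j)
    # Splittable iff the literals form at least two connected components.
    return len({find(i) for i in range(n)}) >= 2

# zendo(S) <- piece(S,B), blue(B): both literals share B, so non-splittable.
print(is_splittable({"S"}, [("piece", ("S", "B")), ("blue", ("B",))]))  # False
# f(A) <- head(A,1), tail(A,B), head(B,3) splits as in Example 4.
print(is_splittable({"A"},
      [("head", ("A", "1")), ("tail", ("A", "B")), ("head", ("B", "3"))]))  # True
```

A literal whose arguments are all head variables or constants has no body-only variables, so it forms its own component; this matches the extra constraint in the encoding of Section B on literals with head-only variables.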
We define a splittable program:
Definition 8 (Splittable program). A program is splittable if
and only if it has exactly one rule and this rule is splittable. A
program is non-splittable when it is not splittable.
The least Herbrand model M(P) of a logic program P is
the set of all ground atomic logical consequences of P. The
least Herbrand model MB(P) = M(P ∪ B) of P given the
logic program B is the set of all ground atomic logical
consequences of P ∪ B. In the following, we assume a program
B denoting background knowledge and concisely write MB(P)
as M(P).
We define a conjunction:
Definition 9 (Conjunction). A conjunction is a set of pro-
grams with the same head literal. The logical consequences of
a conjunction c is the intersection of the logical consequences
of the programs in it: M(c) = ∩p∈cM(p). The cost of a
conjunction c is the sum of the cost of the programs in it:
cost(c) = ∑p∈c cost(p).
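Definition 9 can be read operationally: given the member programs' models and costs, the conjunction's model and cost follow directly. A minimal Python sketch, where models are sets of ground atoms (the function names are our own assumptions):

```python
from functools import reduce

# Sketch of Definition 9: the model of a conjunction is the intersection
# of its members' models; its cost is the sum of their costs.

def conjunction_model(models):
    # M(c) = intersection of M(p) for each program p in c
    return reduce(lambda a, b: a & b, models)

def conjunction_cost(costs):
    # cost(c) = sum of cost(p) for each program p in c
    return sum(costs)

m1 = {"zendo(s1)", "zendo(s2)", "zendo(s3)"}
m2 = {"zendo(s2)", "zendo(s3)", "zendo(s4)"}
print(sorted(conjunction_model([m1, m2])))  # ['zendo(s2)', 'zendo(s3)']
print(conjunction_cost([3, 3]))             # 6
```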
Conjunctions only preserve the semantics with respect to their
head predicate symbols. We, therefore, reason about the least
Herbrand model restricted to a predicate symbol:
Definition 10 (Restricted least Herbrand model). Let P
be a logic program and f be a predicate symbol. Then the
least Herbrand model of P restricted to f is M(P, f) = {a ∈
M(P) | the predicate symbol of a is f}.
We define an operator σ : 2^H → H which maps a conjunc-
tion to a program such that (i) for any conjunction c with
head predicate symbol f, M(c, f) = M(σ(c), f), and (ii) for
any conjunctions c1 and c2 such that size(c1) < size(c2),
size(σ(c1)) < size(σ(c2)). In the following, when we say
that a conjunction c is in a program h, or c ⊆ h, we mean
σ(c) ⊆ h.
For instance, let σ be the operator which, given a con-
junction c = {p1, ..., pn} with head predicate symbol f of
arity a, returns the program obtained by (i) for each program
pi ∈ c, replacing each occurrence of f in pi by a new predi-
cate symbol fi, and (ii) adding the rule f(x1,...,xa) ←
f1(x1,...,xa),...,fn(x1,...,xa) where x1,...,xa are
variables.
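A hypothetical Python sketch of this instance of σ, with programs represented as lists of (head, body) pairs of (predicate, args) literals. Occurrences of f are renamed in heads and bodies alike so that recursion is preserved; the representation and the function name sigma are our own assumptions.

```python
# Sketch of the operator sigma: rename f to f1,...,fn inside each member
# program of the conjunction, then add the joining rule
# f(X1,...,Xa) <- f1(X1,...,Xa),...,fn(X1,...,Xa).

def sigma(conjunction, f, arity):
    args = tuple(f"X{i}" for i in range(1, arity + 1))
    out = []
    for i, program in enumerate(conjunction, start=1):
        fi = f"{f}{i}"
        # Rename every occurrence of f (head or body) to fi.
        rename = lambda lit, fi=fi: (fi, lit[1]) if lit[0] == f else lit
        for head, body in program:
            out.append((rename(head), [rename(l) for l in body]))
    joining = ((f, args),
               [(f"{f}{j}", args) for j in range(1, len(conjunction) + 1)])
    return [joining] + out

# The first conjunction of Example 1:
c1 = [
    [(("zendo", ("S",)), [("piece", ("S", "B")), ("blue", ("B",))])],
    [(("zendo", ("S",)), [("piece", ("S", "R")), ("red", ("R",))])],
]
for head, body in sigma(c1, "zendo", 1):
    print(head, ":-", body)
```

Applied to c1, the sketch produces the joining rule zendo(X1) ← zendo1(X1), zendo2(X1) together with the two renamed programs, mirroring Example 1.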

Example 1. Consider the following conjunction:

c1 = { { zendo(S) ← piece(S,B), blue(B) },
       { zendo(S) ← piece(S,R), red(R) } }

Then σ(c1) is:

{ zendo(S) ← zendo1(S), zendo2(S)
  zendo1(S) ← piece(S,B), blue(B)
  zendo2(S) ← piece(S,R), red(R) }

Consider the following conjunction:

c2 = { { f(S) ← head(S,1)
         f(S) ← tail(S,T), f(T) },
       { f(S) ← head(S,2)
         f(S) ← tail(S,T), f(T) } }

Then σ(c2) is:

{ f(S) ← f1(S), f2(S)
  f1(S) ← head(S,1)
  f1(S) ← tail(S,T), f1(T)
  f2(S) ← head(S,2)
  f2(S) ← tail(S,T), f2(T) }

The generate stage of COMBO generates non-separable pro-
grams. We recall the definition of a separable program:
Definition 11 (Separable program). A program h is sepa-
rable when (i) it has at least two rules, and (ii) no predicate
symbol in the head of a rule in h also appears in the body of a
rule in h. A program is non-separable when it is not separable.
C.2 Generate and Test
We show that JOINER can generate and test every non-
splittable non-separable program:
Proposition 4 (Correctness of the generate and test stages).
JOINER can generate and test every non-splittable non-
separable program.
Proof. Cropper and Hocquette [2023] show (Lemma 1) that
COMBO can generate and test every non-separable program
under Assumption 3. The constraint we add to the gener-
ate encoding (Section B) prunes splittable programs. There-
fore JOINER can generate and test every non-splittable non-
separable program.
C.3 Join
Proposition 4 shows that JOINER can generate and test every
non-splittable non-separable program. However, the generate
stage cannot generate splittable non-separable programs. To
learn a splittable non-separable program, JOINER uses the join
stage to join non-splittable programs into splittable programs.
To show the correctness of this join stage, we first show that the
logical consequences of a rule which body is the conjunction
of the body of two rules r1 and r2 is equal to the intersection
of the logical consequences of r1 and r2:
Lemma 1. Let r1 = (g ← L1), r2 = (g ← L2), and r =
(g ← L1,L2) be three rules, such that the body-only variables
of r1 and r2 are distinct. Then M(r) = M(r1) ∩ M(r2).
Proof. We first show that if a ∈ M(r) then a ∈
M(r1) ∩ M(r2). The rule r specialises r1, which implies that
M(r) ⊆ M(r1) and therefore a ∈ M(r1). Likewise, r spe-
cialises r2, which implies that M(r) ⊆ M(r2) and therefore
a ∈ M(r2). Therefore a ∈ M(r1) ∩ M(r2).
We now show that if a ∈ M(r1) ∩ M(r2) then a ∈ M(r).
Since a ∈ M(r1), then (g ∨ ¬L1) |= a. Similarly, since a ∈
M(r2), then (g ∨ ¬L2) |= a. Then (g ∨ ¬L1) ∧ (g ∨ ¬L2) |= a
by monotonicity, and a ∈ M(r).
With this result, we show that any rule is equivalent to a
conjunction of non-splittable programs:
Lemma 2 (Conjunction of non-splittable programs). Let r
be a rule. Then there exists a conjunction c of non-splittable
programs such that M(r) = M(c).
Proof. We prove the result by induction on the number of
body literals m in the body of r.
For the base case, if m = 1 then r is non-splittable. Let c
be the conjunction {{r}}. Then M(r) = M(c).
For the inductive case, assume the claim holds for rules with
m body literals. Let r be a rule with m+1 body literals. Either
r is (i) non-splittable, or (ii) splittable. For case (i), assume
r is non-splittable. Let c be the conjunction {{r}}. Then
M(r) = M(c). For case (ii), assume r is splittable. Then,
by Definition 7, its body literals can be partitioned into two
non-empty sets L1 and L2 such that the body-only variables
in L1 are disjoint from the body-only variables in L2. Both L1
and L2 have fewer than m literals. Let a be the head literal of
r. Let r1 = (a ← L1) and r2 = (a ← L2). By the induction
hypothesis, there exists a conjunction c1 of non-splittable
programs such that M(r1) = M(c1) and a conjunction c2
of non-splittable programs such that M(r2) = M(c2). Let
c = c1 ∪ c2. Then c is a conjunction because the programs
in c1 and c2 have the same head predicate symbol. From
Lemma 1, M(r) = M(r1) ∩ M(r2) and therefore M(r) =
M(c1) ∩ M(c2) = M(c) which completes the proof.
Example 4 (Conjunction of non-splittable programs). Con-
sider the rule r:
r = { f(A) ← head(A,1),tail(A,B),head(B,3) }
Consider the non-splittable programs:
p1 = { f(A) ← head(A,1) }
p2 = { f(A) ← tail(A,B),head(B,3) }
Then M(r) = M(c) where c is the conjunction c = p1 ∪ p2.
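The induction in Lemma 2 is constructive: group the body literals into connected components under shared body-only variables, and each component yields one non-splittable rule of the conjunction. A hypothetical Python sketch, with literals as (predicate, args) tuples and uppercase-initial strings as variables; the function name decompose is our own assumption.

```python
# Sketch of the decomposition behind Lemma 2: partition body literals into
# connected components linked by shared body-only variables; each component
# is the body of one non-splittable rule in the conjunction.

def decompose(head_vars, body):
    def body_only(lit):
        return {v for v in lit[1] if v[0].isupper() and v not in head_vars}
    groups = []  # list of (component's body-only vars, component's literals)
    for lit in body:
        vars_, merged = body_only(lit), [lit]
        rest = []
        for g_vars, g_lits in groups:
            if g_vars & vars_:          # shares a body-only variable: merge
                vars_ |= g_vars
                merged += g_lits
            else:
                rest.append((g_vars, g_lits))
        groups = rest + [(vars_, merged)]
    return [lits for _, lits in groups]

# Example 4: f(A) <- head(A,1), tail(A,B), head(B,3)
body = [("head", ("A", "1")), ("tail", ("A", "B")), ("head", ("B", "3"))]
print(decompose({"A"}, body))
```

On the rule of Example 4 the sketch yields two components, {head(A,1)} and {tail(A,B), head(B,3)}, i.e. the bodies of p1 and p2.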
We show that any program is equivalent to a conjunction of
non-splittable programs:
Lemma 3 (Conjunction of non-splittable programs). Let
h be a program. Then there exists a conjunction c of non-
splittable programs such that M(h) = M(c).
Proof. h has either (i) at least two rules, or (ii) one rule. For
case (i), if h has at least two rules, then h is non-splittable by
Definition 8. Let c be the conjunction {h}. Then M(h) = M(c). For
case (ii), Lemma 2 shows there exists a conjunction c of non-
splittable programs such that M(h) = M(c).

The join stage takes as input joinable programs. We define a
joinable program:
Definition 12 (Joinable program). A program is joinable
when it (i) is non-splittable, (ii) is non-separable, (iii) entails
at least one positive example, and (iv) entails at least one
negative example.
The combine stage takes as input combinable programs. We
define a combinable program:
Definition 13 (Combinable program). A program is com-
binable when it (i) is non-separable, (ii) entails at least one
positive example, and (iii) entails no negative example.
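Definitions 12 and 13 amount to a simple classification over a candidate program's structural properties and coverage counts. A hypothetical sketch (the flag and function names are our own assumptions):

```python
# Sketch of Definitions 12 and 13: classify a candidate program from
# whether it is splittable/separable and from its coverage counts
# (tp = positive examples entailed, fp = negative examples entailed).

def classify(splittable, separable, tp, fp):
    if not separable and tp > 0 and fp == 0:
        return "combinable"   # Definition 13: goes to the combine stage
    if not splittable and not separable and tp > 0 and fp > 0:
        return "joinable"     # Definition 12: goes to the join stage
    return "neither"

print(classify(False, False, 3, 2))  # joinable
print(classify(True, False, 3, 0))   # combinable
```

Note that a combinable program may be splittable (the join stage produces such programs), whereas a joinable program must be non-splittable.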
We show that any combinable program is equivalent to a con-
junction of non-splittable non-separable programs which each
entail at least one positive example:
Lemma 4 (Conjunction of programs). Let h be a combinable
program. Then there exists a conjunction c of programs such
that M(c) = M(h) and each program in c (i) is non-splittable,
(ii) is non-separable, and (iii) entails at least one positive
example.
Proof. From Lemma 3, there exists a conjunction c of non-
splittable programs such that M(c) = M(h). For contradic-
tion, assume some program hi ∈ c is either (i) separable, or (ii)
does not entail any positive example. For case (i), assume that
hi is separable. Then hi has at least two rules and therefore
cannot be non-splittable, which contradicts our assumption.
For case (ii), assume hi does not entail any positive example.
Then c does not entail any positive example and h does not en-
tail any positive example. Then h is not a combinable program,
which contradicts our assumption. Therefore, each hi ∈ c (i)
is non-splittable, (ii) is non-separable, and (iii) entails at least
one positive example.
The join stage returns non-subsumed conjunctions. A conjunc-
tion is subsumed if it entails a subset of the positive examples
entailed by a strictly smaller conjunction:
Definition 14 (Subsumed conjunction). Let c1 and c2 be
two conjunctions which do not entail any negative examples,
with c1 |= E1+, c2 |= E2+, E2+ ⊆ E1+, and size(c1) < size(c2).
Then c2 is subsumed by c1.
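Definition 14 gives a pruning rule: a conjunction may be discarded when a strictly smaller conjunction covers at least the same positive examples. A hypothetical Python sketch over (coverage, size) summaries of conjunctions that entail no negative examples (the function name non_subsumed is our own assumption):

```python
# Sketch of subsumption pruning (Definition 14): c2 is subsumed by c1 when
# c2's positive coverage is a subset of c1's and c1 is strictly smaller.

def non_subsumed(conjs):
    """conjs: list of (coverage, size). Keep conjunctions not subsumed by another."""
    return [
        (cov, size) for cov, size in conjs
        if not any(cov <= cov2 and size2 < size
                   for cov2, size2 in conjs if (cov2, size2) != (cov, size))
    ]

pool = [(frozenset({"e1", "e2"}), 8),        # covers fewer positives, bigger
        (frozenset({"e1", "e2", "e3"}), 5)]  # covers more positives, smaller
print(non_subsumed(pool))  # only the second conjunction survives
```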
Lemma 5 (Conjunction of joinable programs). Let h be
a non-subsumed splittable combinable program. Then there
exists a conjunction c of joinable programs such that M(c) =
M(h).
Proof. Since h is splittable, h has a single rule whose body
literals can be partitioned into two non-empty sets L1 and L2
such that the body-only variables in L1 are disjoint from the
body-only variables in L2. Let g be the head literal of the
rule in h. Let r1 = (g ← L1) and r2 = (g ← L2). From
Lemma 4, there exist two conjunctions c1 and c2 such that
M(c1) = M(r1) and M(c2) = M(r2) and each program
in c1 ∪ c2 (i) is non-splittable, (ii) is non-separable, and (iii)
entails at least one positive example. Let c = c1 ∪ c2. Then
c is a conjunction because the programs in c1 and c2 have
the same head predicate symbol. From Lemma 1, M(h) =
M(r1) ∩ M(r2) and therefore M(h) = M(c1) ∩ M(c2) =
M(c). Therefore there exists a conjunction c of at least two
programs such that M(c) = M(h) and each program in c (i)
is non-splittable, (ii) is non-separable, and (iii) entails at least
one positive example. Assume some hi ∈ c does not entail any
negative example. Then the conjunction c′ = {hi} does not
entail any negative example. Since c has at least two elements,
then c′ ⊂ c, and c is subsumed by c′, which contradicts our
assumption. Therefore, each hi ∈ c entails some negative
example and is joinable.
We show that the join stage returns all non-subsumed splittable
combinable programs:
Proposition 5 (Correctness of the join stage). Given all
joinable programs, the join stage returns all non-subsumed
splittable combinable programs.
Proof. Assume the opposite, i.e. that there is a non-subsumed
splittable combinable program h not returned by the join stage.
As h is non-subsumed splittable combinable, by Lemma 5,
there exists a conjunction c of joinable programs such that
M(c) = M(h). As h is non-subsumed, c is not subsumed
by another conjunction. The join stage uses a SAT solver to
enumerate maximal satisfiable subsets [Liffiton and Sakallah,
2008] corresponding to the subset-maximal coverage of con-
junctions. As we have all joinable programs, the solver will
find every conjunction of joinable programs not subsumed by
another conjunction. Therefore, the solver finds c and the join
stage returns h, which contradicts our assumption.
C.4 Combine
We first show that a subsumed conjunction cannot be in an
optimal solution:
Proposition 6 (Subsumed conjunction). Let c1 and c2 be two
conjunctions such that c2 is subsumed by c1, h be a solution,
and σ(c2) ⊆ h. Then h cannot be optimal.
Proof. Assume the opposite, i.e. h is an optimal solution. We
show that h′ = (h \ σ(c2)) ∪ σ(c1) is a solution. Since h is a
solution, it does not entail any negative examples. Therefore,
h \ σ(c2) does not entail any negative examples. Since c1 does
not entail any negative examples, then h′ does not entail any
negative examples. Since h is a solution, it entails all positive
examples. Since E2+ ⊆ E1+, h′ also entails all positive exam-
ples. Therefore h′ is a solution. Since size(c1) < size(c2),
then size(h′) < size(h). Then h cannot be optimal, which
contradicts our assumption.
Corollary 1. Let c1 and c2 be two conjunctions which do not
entail any negative examples, with c1 |= E1+, c2 |= E2+,
E2+ ⊆ E1+, and size(c1) < size(c2). Then c2 cannot be in an
optimal solution.
Proof. Follows from Proposition 6.
With this result, we show the correctness of the combine stage:
Proposition 7 (Correctness of the combine stage). Given
all non-subsumed combinable programs, the combine stage
returns an optimal solution.

Proof. Cropper and Hocquette [2023] show that given all com-
binable programs, the combine stage returns an optimal solu-
tion under Assumption 2. Corollary 1 shows that subsumed
conjunctions cannot be in an optimal solution. Therefore, the
combine stage of JOINER returns an optimal solution given all
non-subsumed combinable programs.
C.5 Constrain
In this section, we show that the constrain stage never prunes
optimal solutions from the hypothesis space. JOINER builds
specialisations constraints from non-splittable non-separable
programs that do not entail (a) any positive example, or (b)
any negative example.
We show a result for case (a). If a hypothesis does not entail
any positive example, JOINER prunes all its specialisations, as
they cannot be in a conjunction in an optimal solution:
Proposition 8 (Specialisation constraint when tp=0). Let
h1 be a hypothesis that does not entail any positive example,
h2 be a specialisation of h1, c be a conjunction, h2 ∈ c, h be
a solution, and σ(c) ⊆ h. Then h is not an optimal solution.
Proof. Assume the opposite, i.e. h is an optimal solution. We
show that h′ = h \ σ(c) is a solution. Since h is a solution, it
does not entail any negative example. Then h′ does not entail
any negative example. Since h1 does not entail any positive
example and h2 specialises h1, then h2 does not entail any
positive example. Then c does not entail any positive example.
Since h entails all positive examples and c does not entail any
positive example, then h′ entails all positive examples. Since
h′ entails all positive examples and no negative examples, it is
a solution. Moreover, size(h′) < size(h). Then h cannot be
optimal, which contradicts our assumption.
Corollary 2. Let h1 be a hypothesis that does not entail any
positive example and h2 be a specialisation of h1. Then h2
cannot be in a conjunction in an optimal solution.
Proof. Follows from Proposition 8.
We show a result for case (b). If a hypothesis does not entail
any negative example, JOINER prunes all its specialisations,
as they cannot be in a conjunction in an optimal solution:
Proposition 9 (Specialisation constraint when fp=0). Let
h1 be a hypothesis that does not entail any negative example,
h2 be a specialisation of h1, c be a conjunction, h2 ∈ c, h be
a solution, and σ(c) ⊆ h. Then h is not an optimal solution.
Proof. Assume the opposite, i.e. h is an optimal solution. Let
c′ = (c \ {h2}) ∪ {h1} and h′ = (h \ σ(c)) ∪ σ(c′). We show
that h′ is a solution. Since h1 does not entail any negative
example, then c′ does not entail any negative example. Since
h is a solution, it does not entail any negative example. Then
h′ does not entail any negative example. Since h2 specialises h1,
then h1 entails at least as many positive examples as h2. Then c′ entails
at least as many positive examples as c, and h′ entails at least as many positive
examples as h. Since h entails all positive examples, then
h′ entails all positive examples. Since h′ entails all positive
examples and no negative examples, it is a solution. Since
h2 specialises h1, then size(h1) < size(h2) and size(c′) <
size(c). Therefore, size(h′) < size(h). Then h cannot be
optimal, which contradicts our assumption.
Corollary 3. Let h1 be a hypothesis that does not entail any
negative example and h2 be a specialisation of h1. Then h2
cannot be in a conjunction in an optimal solution.
Proof. Follows from Proposition 9.
We now show that the constrain stage never prunes an optimal
solution:
Proposition 10 (Optimal soundness of the constraints). The
constrain stage of JOINER never prunes an optimal solution.
Proof. JOINER builds constraints from non-splittable non-
separable programs that do not entail (a) any positive example,
or (b) any negative example. For case (a), if a program does
not entail any positive example, JOINER prunes its specialisa-
tions which by Corollary 2 cannot be in a conjunction in an
optimal solution. For case (b), if a program does not entail any
negative example, JOINER prunes its specialisations which by
Corollary 3 cannot be in a conjunction in an optimal solution.
Therefore, JOINER never prunes an optimal solution.
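The two pruning rules can be sketched in a few lines of Python. Here programs are frozensets of literal names, examples are the sets of literals they satisfy, and specialisation is approximated by literal-set inclusion (a simplification of subsumption); the helper names are our own:

```python
def entails_some(program, examples):
    # Toy entailment check: a program (a frozenset of literals) entails an
    # example (a set of literals it satisfies) if every literal holds.
    return any(example >= program for example in examples)

def prune_specialisations(candidates, program, pos, neg):
    """If 'program' entails no positive example (tp = 0) or no negative
    example (fp = 0), discard all of its specialisations: by Corollaries 2
    and 3 they cannot appear in a conjunction in an optimal solution."""
    tp0 = not entails_some(program, pos)
    fp0 = not entails_some(program, neg)
    if not (tp0 or fp0):
        return candidates
    # In this toy model a specialisation contains at least the literals
    # of 'program', i.e. it is a superset.
    return [c for c in candidates if not c >= program]

pos = [{"piece", "blue"}]
neg = [{"piece", "green"}]
h1 = frozenset({"green"})                 # tp = 0: entails no positive example
cands = [frozenset({"green", "small"}), frozenset({"blue"})]
print(prune_specialisations(cands, h1, pos, neg))  # [frozenset({'blue'})]
```

The specialisation of h1 is discarded, while the unrelated candidate survives the constraint.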
C.6 JOINER Correctness
We show the correctness of JOINER without the constrain
stage:
Proposition 11 (Correctness of JOINER without the con-
strain stage). JOINER without the constrain stage returns an
optimal solution.
Proof. Proposition 4 shows that JOINER can generate and test
all non-splittable non-separable programs. These generated
programs include all joinable programs and all non-splittable
combinable programs. Proposition 5 shows that given all
joinable programs, the join stage returns all non-subsumed
splittable combinable programs. Therefore, JOINER builds
all non-subsumed splittable and non-splittable combinable
programs. By Proposition 7, given all non-subsumed combin-
able programs, the combine stage returns an optimal solution.
Therefore, JOINER without the constrain stage returns an opti-
mal solution.
We now show the correctness of JOINER:
Theorem 2 (Correctness of JOINER). JOINER returns an
optimal solution.
Proof. From Proposition 11, JOINER without the constrain
stage returns an optimal solution. From Proposition 10, the
constrain stage never prunes an optimal solution. Therefore,
JOINER with the constrain stage returns an optimal solution.

D Experiments
D.1 Experimental domains
We describe the characteristics of the domains and tasks used
in our experiments. Figure 8 shows example solutions for
some of the tasks.
IGGP. In inductive general game playing (IGGP) [Cropper
et al., 2020] the task is to induce a hypothesis to explain game
traces from the general game playing competition [Genesereth
and Björnsson, 2013]. IGGP is notoriously difficult for ma-
chine learning approaches. The currently best-performing
system can only learn perfect solutions for 40% of the tasks.
Moreover, although seemingly a toy problem, IGGP is rep-
resentative of many real-world problems, such as inducing
semantics of programming languages [Bartha and Cheney,
2019]. We use 11 tasks: minimal decay (md), buttons-next,
buttons-goal, rock-paper-scissors (rps), coins-next, coins-
goal, attrition, centipede, sukoshi, rainbow, and horseshoe.
Zendo. Zendo is an inductive game in which one player,
the Master, creates a rule for structures made of pieces with
varying attributes to follow. The other players, the Students,
try to discover the rule by building and studying structures
which are labelled by the Master as following or breaking the
rule. The first student to correctly state the rule wins. We
learn four increasingly complex rules for structures made of at
most 5 pieces of varying color, size, orientation and position.
Zendo is a challenging game that has attracted much interest
in cognitive science [Bramley et al., 2018].
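For concreteness, a rule such as "there is a blue piece" can be checked with a toy Python encoding of structures (our own illustration, not the relational representation used by the learners):

```python
# A structure is a list of pieces, each piece a dict of attribute values.
def zendo(structure):
    # Analogue of the rule "zendo(S) :- piece(S,B), blue(B)":
    # the structure follows the rule if it contains a blue piece.
    return any(piece["color"] == "blue" for piece in structure)

s1 = [{"color": "blue", "size": "small"}, {"color": "red", "size": "large"}]
s2 = [{"color": "red", "size": "small"}]
print(zendo(s1), zendo(s2))  # True False
```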
IMDB. The real-world IMDB dataset [Mihalkova et al.,
2007] includes relations between movies, actors, directors,
movie genres, and gender. It was created from the Interna-
tional Movie Database (IMDB.com). We learn the
relation workedunder/2, a more complex variant
workedwithsamegender/2, and the disjunction of the two. This dataset is
frequently used [Dumancic et al., 2019].
D.2 Experimental Setup
We measure the mean and standard error of the predictive
accuracy and learning time. We use a 3.8 GHz 8-core Intel
Core i7 with 32GB of RAM. All systems use a single CPU.
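The reported "mean ± standard error" entries follow the usual definition. A small sketch (the trial accuracies below are made up; we assume the conventional s/√n standard error of the mean):

```python
import statistics

def mean_and_stderr(accuracies):
    """Mean and standard error of the mean over per-trial accuracies,
    using the usual s / sqrt(n) definition of standard error."""
    n = len(accuracies)
    mean = statistics.fmean(accuracies)
    stderr = statistics.stdev(accuracies) / n ** 0.5 if n > 1 else 0.0
    return mean, stderr

m, se = mean_and_stderr([100, 100, 96, 100, 94])
print(f"{m:.0f} ± {se:.0f}")  # 98 ± 1
```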
D.3 Experimental Results
We compare JOINER against COMBO and ALEPH, which we
describe below.
COMBO. COMBO uses identical biases to JOINER, so the
comparison is direct, i.e. fair.

ALEPH. ALEPH excels at learning many large non-recursive
rules and should excel at the trains and IGGP tasks.
Although ALEPH can learn recursive programs, it struggles
to do so. JOINER and ALEPH use similar biases, so the
comparison can be considered reasonably fair.
We also considered other ILP systems, notably ILASP
[Law et al., 2014]. However, ILASP builds on ASPAL
and first precomputes every possible rule in the hypothesis space,
which is infeasible for our datasets. For instance, it would
require precomputing 10^15 rules for the coins task. In addition,
ILASP cannot learn Prolog programs, so it is unusable on the
synthesis tasks.
E Experimental results
E.1 Impact of the features
Tables 2 and 3 show the results with 60s and 600s timeouts,
respectively.
Task | COMBO | with join | JOINER splittable
attrition | 67 ± 0 | 75 ± 0 | 67 ± 0
buttons | 91 ± 0 | 100 ± 0 | 100 ± 0
buttons-g | 100 ± 0 | 90 ± 0 | 98 ± 2
centipede | 100 ± 0 | 100 ± 0 | 100 ± 0
coins | 62 ± 0 | 100 ± 0 | 80 ± 0
coins-g | 97 ± 0 | 97 ± 0 | 97 ± 0
horseshoe | 67 ± 0 | 69 ± 8 | 100 ± 0
md | 100 ± 0 | 100 ± 0 | 100 ± 0
rainbow | 50 ± 0 | 100 ± 0 | 100 ± 0
rps | 50 ± 0 | 82 ± 0 | 100 ± 0
sukoshi | 50 ± 0 | 50 ± 0 | 99 ± 0
imdb1 | 100 ± 0 | 100 ± 0 | 100 ± 0
imdb2 | 100 ± 0 | 100 ± 0 | 100 ± 0
imdb3 | 100 ± 0 | 100 ± 0 | 100 ± 0
pharma1 | 50 ± 0 | 56 ± 2 | 57 ± 1
pharma2 | 50 ± 0 | 99 ± 1 | 98 ± 1
pharma3 | 50 ± 0 | 97 ± 1 | 96 ± 1
zendo1 | 94 ± 3 | 99 ± 0 | 99 ± 1
zendo2 | 70 ± 1 | 92 ± 1 | 93 ± 1
zendo3 | 79 ± 1 | 79 ± 1 | 80 ± 1
zendo4 | 93 ± 1 | 99 ± 1 | 99 ± 1
string1 | 50 ± 0 | 53 ± 1 | 53 ± 1
string2 | 50 ± 0 | 53 ± 1 | 53 ± 1
string3 | 50 ± 0 | 51 ± 2 | 100 ± 0
denoising 1c | 52 ± 2 | 100 ± 0 | 100 ± 0
denoising mc | 60 ± 10 | 99 ± 1 | 99 ± 1
fill | 52 ± 2 | 100 ± 0 | 100 ± 0
flip | 50 ± 0 | 85 ± 9 | 85 ± 9
hollow | 55 ± 5 | 55 ± 5 | 55 ± 5
mirror | 57 ± 1 | 61 ± 4 | 61 ± 4
move 1p | 50 ± 0 | 100 ± 0 | 100 ± 0
move 2p | 50 ± 0 | 80 ± 8 | 88 ± 4
move 2p dp | 55 ± 2 | 87 ± 8 | 88 ± 9
move 3p | 50 ± 0 | 62 ± 9 | 64 ± 9
move dp | 55 ± 2 | 65 ± 6 | 62 ± 3
padded fill | 52 ± 2 | 66 ± 3 | 66 ± 3
pcopy 1c | 50 ± 0 | 86 ± 9 | 84 ± 9
pcopy mc | 68 ± 6 | 97 ± 2 | 97 ± 2
recolor cmp | 51 ± 1 | 55 ± 4 | 55 ± 4
recolor cnt | 50 ± 0 | 51 ± 1 | 51 ± 1
recolor oe | 50 ± 0 | 52 ± 2 | 52 ± 2
scale dp | 61 ± 8 | 96 ± 2 | 97 ± 2

Table 2: Predictive accuracies with a 60s timeout. We round accuracies to integer values. The error is standard error.
E.2 Comparison against other systems
Tables 4 and 5 show the results with 60s and 600s timeouts,
respectively.

Listing 1: zendo1
zendo1(A):- piece(A,C),size(C,B),blue(C),small(B),contact(C,D),red(D).

Listing 2: zendo2
zendo2(A):- piece(A,B),piece(A,D),piece(A,C),green(D),red(B),blue(C).
zendo2(A):- piece(A,D),piece(A,B),coord1(B,C),green(D),lhs(B),coord1(D,C).

Listing 3: zendo3
zendo3(A):- piece(A,D),blue(D),coord1(D,B),piece(A,C),coord1(C,B),red(C).
zendo3(A):- piece(A,D),contact(D,C),rhs(D),size(C,B),large(B).
zendo3(A):- piece(A,B),upright(B),contact(B,D),blue(D),size(D,C),large(C).

Listing 4: zendo4
zendo4(A):- piece(A,C),contact(C,B),strange(B),upright(C).
zendo4(A):- piece(A,D),contact(D,C),coord2(C,B),coord2(D,B).
zendo4(A):- piece(A,D),contact(D,C),size(C,B),red(D),medium(B).
zendo4(A):- piece(A,D),blue(D),lhs(D),piece(A,C),size(C,B),small(B).

Listing 5: pharma1
active(A):- atm(A,B),typec(B),bond1(B,C),typeo(C),atm(A,F),types(F),bonddu(F,G),
    typen(G),atm(A,D),typeh(D),bond2(D,E),typedu(E).

Listing 6: pharma2
active(A):- atm(A,B),types(C),bonddu(C,B),atm(A,G),typeh(F),bond2(F,G),
    atm(A,E),typec(D),bond1(D,E).
active(A):- atm(A,B),typena(C),bondv(B,C),atm(A,N),typen(O),bondu(N,O),
    atm(A,F),typedu(G),bondar(G,F).

Listing 7: pharma3
active(A):- atm(A,D),bond2(D,E),typeh(E),atm(A,C),bond1(C,B),typec(B).
active(A):- atm(A,B),typena(C),bondv(B,C),atm(A,N),typen(O),bondu(N,O).

Listing 8: string2
f_1(A):- head(A,B),cm(B).
f_1(A):- tail(A,B),f_1(B).
f_2(A):- head(A,B),cd(B).
f_2(A):- tail(A,B),f_2(B).
f_3(A):- head(A,B),co(B).
f_3(A):- tail(A,B),f_3(B).
f_4(A):- head(A,B),cr(B).
f_4(A):- tail(A,B),f_4(B).
f_5(A):- head(A,B),cu(B).
f_5(A):- tail(A,B),f_5(B).
f(A):- f_1(A),f_2(A),f_3(A),f_4(A),f_5(A).

Figure 8: Example solutions.

Listing 9: minimal decay
next_value(A,B):- c_player(D),c_pressButton(C),c5(B),does(A,D,C).
next_value(A,B):- c_player(C),my_true_value(A,E),does(A,C,D),my_succ(B,E),c_noop(D).

Listing 10: rainbow
terminal(A):- mypos_r3(M),true_color(A,M,L),hue(L),mypos_r6(H),true_color(A,H,I),
    hue(I),mypos_r5(F),true_color(A,F,G),hue(G),mypos_r2(J),
    true_color(A,J,K),hue(K),mypos_r1(D),true_color(A,D,E),
    hue(E),mypos_r4(B),true_color(A,B,C),hue(C).

Listing 11: horseshoe
terminal(A):- int_20(B),true_step(A,B).
terminal(A):- mypos_b(E),mypos_c(G),true_cell(A,E,F),true_cell(A,G,F),mypos_d(D),
    true_cell(A,D,C),true_cell(A,B,C),adjacent(B,D).
terminal(A):- mypos_c(B),mypos_a(H),true_cell(A,H,I),mypos_b(J),true_cell(A,J,I),
    mypos_e(E),true_cell(A,B,D),true_cell(A,E,F),true_cell(A,G,F),
    adjacent(E,G),true_cell(A,C,D),adjacent(C,B).

Listing 12: buttons
next(A,B):- c_p(B),c_c(C),does(A,D,C),my_true(A,B),my_input(D,C).
next(A,B):- my_input(C,E),c_p(D),my_true(A,D),c_b(E),does(A,C,E),c_q(B).
next(A,B):- my_input(C,D),not my_true(A,B),does(A,C,D),c_p(B),c_a(D).
next(A,B):- c_a(C),does(A,D,C),my_true(A,B),c_q(B),my_input(D,C).
next(A,B):- my_input(C,E),c_p(B),my_true(A,D),c_b(E),does(A,C,E),c_q(D).
next(A,B):- c_c(D),my_true(A,C),c_r(B),role(E),does(A,E,D),c_q(C).
next(A,B):- my_true(A,C),my_succ(C,B).
next(A,B):- my_input(C,D),does(A,C,D),my_true(A,B),c_r(B),c_b(D).
next(A,B):- my_input(C,D),does(A,C,D),my_true(A,B),c_r(B),c_a(D).
next(A,B):- my_true(A,E),c_c(C),does(A,D,C),c_q(B),c_r(E),my_input(D,C).

Listing 13: rps
next_score(A,B,C):- does(A,B,E),different(G,B),my_true_score(A,B,F),beats(E,D),
    my_succ(F,C),does(A,G,D).
next_score(A,B,C):- different(G,B),beats(D,F),my_true_score(A,E,C),
    does(A,G,D),does(A,E,F).
next_score(A,B,C):- my_true_score(A,B,C),does(A,B,D),does(A,E,D),different(E,B).

Listing 14: coins
next_cell(A,B,C):- does_jump(A,E,F,D),role(E),different(B,D),
    my_true_cell(A,B,C),different(F,B).
next_cell(A,B,C):- my_pos(E),role(D),c_zerocoins(C),does_jump(A,D,B,E).
next_cell(A,B,C):- role(D),does_jump(A,D,E,B),c_twocoins(C),different(B,E).
next_cell(A,B,C):- does_jump(A,F,E,D),role(F),my_succ(E,B),
    my_true_cell(A,B,C),different(E,D).

Listing 15: coins-goal
goal(A,B,C):- role(B),pos_5(D),my_true_step(A,D),score_100(C).
goal(A,B,C):- c_onecoin(E),my_pos(D),my_true_cell(A,D,E),score_0(C),role(B).
Task | COMBO | with join | JOINER splittable
attrition | 67 ± 0 | 73 ± 2 | 67 ± 0
buttons | 100 ± 0 | 100 ± 0 | 100 ± 0
buttons-g | 100 ± 0 | 100 ± 0 | 96 ± 2
centipede | 100 ± 0 | 100 ± 0 | 100 ± 0
coins | 85 ± 9 | 100 ± 0 | 100 ± 0
coins-g | 97 ± 0 | 97 ± 0 | 97 ± 0
horseshoe | 67 ± 0 | 100 ± 0 | 100 ± 0
md | 100 ± 0 | 100 ± 0 | 100 ± 0
rainbow | 52 ± 0 | 100 ± 0 | 100 ± 0
rps | 77 ± 0 | 100 ± 0 | 100 ± 0
sukoshi | 100 ± 0 | 100 ± 0 | 100 ± 0
imdb1 | 100 ± 0 | 100 ± 0 | 100 ± 0
imdb2 | 100 ± 0 | 100 ± 0 | 100 ± 0
imdb3 | 100 ± 0 | 100 ± 0 | 100 ± 0
pharma1 | 50 ± 0 | 100 ± 0 | 100 ± 0
pharma2 | 50 ± 0 | 98 ± 1 | 97 ± 1
pharma3 | 58 ± 5 | 96 ± 1 | 96 ± 1
zendo1 | 100 ± 0 | 100 ± 0 | 100 ± 0
zendo2 | 72 ± 1 | 93 ± 1 | 100 ± 0
zendo3 | 79 ± 1 | 78 ± 1 | 79 ± 1
zendo4 | 92 ± 1 | 98 ± 1 | 98 ± 1
string1 | 50 ± 0 | 56 ± 1 | 100 ± 0
string2 | 51 ± 0 | 56 ± 1 | 100 ± 0
string3 | 49 ± 0 | 100 ± 0 | 100 ± 0
denoising 1c | 52 ± 2 | 100 ± 0 | 100 ± 0
denoising mc | 60 ± 10 | 99 ± 1 | 97 ± 2
fill | 59 ± 5 | 100 ± 0 | 100 ± 0
flip | 58 ± 8 | 94 ± 4 | 96 ± 4
hollow | 55 ± 5 | 85 ± 10 | 90 ± 6
mirror | 57 ± 1 | 78 ± 10 | 70 ± 3
move 1p | 100 ± 0 | 100 ± 0 | 100 ± 0
move 2p | 50 ± 0 | 81 ± 9 | 91 ± 9
move 2p dp | 55 ± 2 | 89 ± 9 | 100 ± 0
move 3p | 50 ± 0 | 64 ± 9 | 85 ± 3
move dp | 55 ± 2 | 58 ± 2 | 83 ± 5
padded fill | 52 ± 2 | 66 ± 3 | 86 ± 5
pcopy 1c | 50 ± 0 | 84 ± 9 | 92 ± 6
pcopy mc | 68 ± 6 | 97 ± 2 | 95 ± 3
recolor cmp | 51 ± 1 | 58 ± 5 | 68 ± 5
recolor cnt | 50 ± 0 | 61 ± 7 | 80 ± 4
recolor oe | 50 ± 0 | 69 ± 8 | 75 ± 7
scale dp | 61 ± 8 | 96 ± 2 | 100 ± 0

Table 3: Predictive accuracies with a 600s timeout. We round accuracies to integer values. The error is standard error.
Task | ALEPH | COMBO | JOINER
attrition | 50 ± 0 | 67 ± 0 | 67 ± 0
buttons | 50 ± 0 | 91 ± 0 | 100 ± 0
buttons-g | 50 ± 0 | 100 ± 0 | 98 ± 2
centipede | 100 ± 0 | 100 ± 0 | 100 ± 0
coins | 50 ± 0 | 62 ± 0 | 80 ± 0
coins-g | 100 ± 0 | 97 ± 0 | 97 ± 0
horseshoe | 50 ± 0 | 67 ± 0 | 100 ± 0
md | 97 ± 0 | 100 ± 0 | 100 ± 0
rainbow | 50 ± 0 | 50 ± 0 | 100 ± 0
rps | 50 ± 0 | 50 ± 0 | 100 ± 0
sukoshi | 50 ± 0 | 50 ± 0 | 99 ± 0
imdb1 | 60 ± 10 | 100 ± 0 | 100 ± 0
imdb2 | 50 ± 0 | 100 ± 0 | 100 ± 0
imdb3 | 50 ± 0 | 100 ± 0 | 100 ± 0
pharma1 | 50 ± 0 | 50 ± 0 | 57 ± 1
pharma2 | 50 ± 0 | 50 ± 0 | 98 ± 1
pharma3 | 50 ± 0 | 50 ± 0 | 96 ± 1
zendo1 | 100 ± 0 | 94 ± 3 | 99 ± 1
zendo2 | 100 ± 0 | 70 ± 1 | 93 ± 1
zendo3 | 99 ± 0 | 79 ± 1 | 80 ± 1
zendo4 | 99 ± 1 | 93 ± 1 | 99 ± 1
string1 | 50 ± 0 | 50 ± 0 | 53 ± 1
string2 | 50 ± 0 | 50 ± 0 | 53 ± 1
string3 | 50 ± 0 | 50 ± 0 | 100 ± 0
denoising 1c | 50 ± 0 | 52 ± 2 | 100 ± 0
denoising mc | 50 ± 0 | 60 ± 10 | 99 ± 1
fill | 50 ± 0 | 52 ± 2 | 100 ± 0
flip | 50 ± 0 | 50 ± 0 | 85 ± 9
hollow | 50 ± 0 | 55 ± 5 | 55 ± 5
mirror | 50 ± 0 | 57 ± 1 | 61 ± 4
move 1p | 50 ± 0 | 50 ± 0 | 100 ± 0
move 2p | 50 ± 0 | 50 ± 0 | 88 ± 4
move 2p dp | 50 ± 0 | 55 ± 2 | 88 ± 9
move 3p | 50 ± 0 | 50 ± 0 | 64 ± 9
move dp | 50 ± 0 | 55 ± 2 | 62 ± 3
padded fill | 50 ± 0 | 52 ± 2 | 66 ± 3
pcopy 1c | 50 ± 0 | 50 ± 0 | 84 ± 9
pcopy mc | 50 ± 0 | 68 ± 6 | 97 ± 2
recolor cmp | 50 ± 0 | 51 ± 1 | 55 ± 4
recolor cnt | 50 ± 0 | 50 ± 0 | 51 ± 1
recolor oe | 50 ± 0 | 50 ± 0 | 52 ± 2
scale dp | 50 ± 0 | 61 ± 8 | 97 ± 2

Table 4: Predictive accuracies with a 60s timeout. We round accuracies to integer values. The error is standard error.

Task | ALEPH | COMBO | JOINER
attrition | 50 ± 0 | 67 ± 0 | 67 ± 0
buttons | 100 ± 0 | 100 ± 0 | 100 ± 0
buttons-g | 100 ± 0 | 100 ± 0 | 96 ± 2
centipede | 100 ± 0 | 100 ± 0 | 100 ± 0
coins | 50 ± 0 | 85 ± 9 | 100 ± 0
coins-g | 100 ± 0 | 97 ± 0 | 97 ± 0
horseshoe | 65 ± 0 | 67 ± 0 | 100 ± 0
md | 97 ± 0 | 100 ± 0 | 100 ± 0
rainbow | 50 ± 0 | 52 ± 0 | 100 ± 0
rps | 100 ± 0 | 77 ± 0 | 100 ± 0
sukoshi | 50 ± 0 | 100 ± 0 | 100 ± 0
imdb1 | 100 ± 0 | 100 ± 0 | 100 ± 0
imdb2 | 50 ± 0 | 100 ± 0 | 100 ± 0
imdb3 | 50 ± 0 | 100 ± 0 | 100 ± 0
pharma1 | 50 ± 0 | 50 ± 0 | 100 ± 0
pharma2 | 50 ± 0 | 50 ± 0 | 97 ± 1
pharma3 | 50 ± 0 | 58 ± 5 | 96 ± 1
zendo1 | 100 ± 0 | 100 ± 0 | 100 ± 0
zendo2 | 100 ± 0 | 72 ± 1 | 100 ± 0
zendo3 | 100 ± 0 | 79 ± 1 | 79 ± 1
zendo4 | 99 ± 0 | 92 ± 1 | 98 ± 1
string1 | 50 ± 0 | 50 ± 0 | 100 ± 0
string2 | 50 ± 0 | 51 ± 0 | 100 ± 0
string3 | 50 ± 0 | 49 ± 0 | 100 ± 0
denoising 1c | 50 ± 0 | 52 ± 2 | 100 ± 0
denoising mc | 50 ± 0 | 60 ± 10 | 97 ± 2
fill | 50 ± 0 | 59 ± 5 | 100 ± 0
flip | 50 ± 0 | 58 ± 8 | 96 ± 4
hollow | 55 ± 5 | 55 ± 5 | 90 ± 6
mirror | 60 ± 8 | 57 ± 1 | 70 ± 3
move 1p | 50 ± 0 | 100 ± 0 | 100 ± 0
move 2p | 50 ± 0 | 50 ± 0 | 91 ± 9
move 2p dp | 50 ± 0 | 55 ± 2 | 100 ± 0
move 3p | 50 ± 0 | 50 ± 0 | 85 ± 3
move dp | 50 ± 0 | 55 ± 2 | 83 ± 5
padded fill | 50 ± 0 | 52 ± 2 | 86 ± 5
pcopy 1c | 50 ± 0 | 50 ± 0 | 92 ± 6
pcopy mc | 50 ± 0 | 68 ± 6 | 95 ± 3
recolor cmp | 50 ± 0 | 51 ± 1 | 68 ± 5
recolor cnt | 56 ± 6 | 50 ± 0 | 80 ± 4
recolor oe | 50 ± 0 | 50 ± 0 | 75 ± 7
scale dp | 50 ± 0 | 61 ± 8 | 100 ± 0

Table 5: Predictive accuracies with a 600s timeout. We round accuracies to integer values. The error is standard error.