2020 Kuhl
An unbiased molecular approach using 3’UTRs resolves the avian family-level tree of life.
Kuhl H1,2,3, Frankl-Vilches C1, Bakker A1, Mayr G4, Nikolaus G1, Boerno ST2, Klages S2,
Timmermann B2, Gahr M1.
Abstract
Presumably, due to a rapid early diversification, major parts of the higher-level phylogeny of birds
are still resolved controversially in different analyses or are considered unresolvable. To address this
problem, we produced an avian tree of life, which includes molecular sequences of one or several
species of ~ 90% of the currently recognized family-level taxa (429 species, 379 genera) including
all 106 non-passerine and 115 passerine (Passeriformes) family-level taxa. The unconstrained
analyses of noncoding 3-prime untranslated region (3’UTR) sequences and those of coding
sequences yielded different trees. In contrast to the coding sequences, the 3’UTR sequences resulted
in a well-resolved and stable tree topology. The 3’UTR contained, unexpectedly, transcription factor
binding motifs that were specific for different higher-level taxa. In this tree, grebes and flamingos
are the sister clade of all other Neoaves, which are subdivided into five major clades. All non-
passerine taxa were placed with robust statistical support including the long-time enigmatic hoatzin
(Opisthocomiformes), which was found to be the sister taxon of the Caprimulgiformes. The
comparatively late radiation of family-level clades of the songbirds (oscine Passeriformes) contrasts
with the attenuated diversification of non-passeriform taxa since the early Miocene. This correlates
with the evolution of vocal production learning, an important speciation factor, which is ancestral
for songbirds and evolved convergently only in hummingbirds and parrots. Since 3’UTR-based
phylotranscriptomics resolved the avian family-level tree of life, we suggest that this procedure will
also resolve the all-species avian tree of life.
© The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the
original work is properly cited.
Introduction
The phylogeny of birds has been intensively studied during the last 20 years using anatomical and
molecular data. Several recent molecular approaches, based either on genomes of a limited number
of bird families (Jarvis et al. 2014; Suh et al. 2015) or on a large number of bird families, but only a
selection of molecular sequences (Ericson et al. 2006; Hackett et al. 2008; Prum et al. 2015),
delivered important new insights into the avian tree of life, such as the close relationships of
passerines, parrots and falcons. However, these studies also yielded strongly conflicting results or
Results
The non-coding 3’UTR sequences yield a stable molecular tree of avian family-level taxa. The
transcriptomes of 308 species were assembled de novo, clustered, and integrated with publicly
available transcriptomes (n = 80) and orthologous sequences derived from available genomic data
(n = 59 bird species; n = 2 alligator species) and newly generated genomic data (n = 7) (Tab. S1,
Fig. S1). The new genome assemblies provided in this study were sequenced to about 60-fold,
which, although resulting in fragmented genome assemblies (N50 contig size of 10 to 40 kbp, see
Table 2C), were sufficient for whole genome alignment and phylogenetic tree inference.
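For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: the length of the contig (or transcript) at which the cumulative length, summed from the longest sequence downwards, first reaches half of the total assembly length. The function name and the example lengths are hypothetical and are not taken from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or transcript lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to half of the
        # total assembly length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths in bp (illustration only).
example_contigs = [40_000, 25_000, 15_000, 10_000, 5_000, 5_000]
print(n50(example_contigs))
```

A fragmented assembly with many short contigs yields a low N50 even when the total assembled length is large, which is why the statistic is reported alongside sequencing depth.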
We performed several tests to estimate the quality of each de novo transcriptome assembly
(Tab. S2). In summary, the median N50 transcript length of the transcriptomes was 2698 ± 811 bp
(mean ± SD) and the median of complete BUSCO genes (Aves data set) was 53.5% ± 19.4% (mean
± SD). The median number of nucleotides aligned to the reference genome’s 3’UTRs and coding
regions were 7.9 ± 3.6 Mbp and 9.0 ± 3.7 Mbp, respectively (Tab. S2). We found differences
between tissue types used for RNA extraction. Transcriptomes from brain exhibited highest
numbers of nucleotides aligned to 3’UTR and CDS of the reference genome (13.6 Mbp and
Comparison of the 3’UTR based tree and trees based on coding sequences. The relationships of
many higher-level taxa in our 3’UTR tree differed from those of the coding sequence trees (Fig.
S4A-S4C). In particular, the coding trees resulted in unlikely relationships of certain higher-level
taxa and did not support monophyly of several currently recognized higher-level taxa. For example,
in the CODON tree, the Caprimulgiformes were split into distantly related sub-groups, and parrots
(Psittaciformes) were moved away from the Passeriformes, which resulted as the sister taxon of the
mousebirds (Coliiformes) (Fig. S4A). In the CODON, CODON12 and the AAS trees the
Falconiformes and Cariamiformes were moved away from the Passeriformes and Psittaciformes to
form an assemblage of birds of prey, embedded deeply in the phylogeny (Fig. 2, Fig. S4A, S4B). In
the AAS tree, even the Strigiformes were enclosed in the birds of prey assemblage (Fig. S4C). By
contrast, in all recent molecular approaches (e.g. Kimball et al. 2013; Jarvis et al. 2014; Prum et al.
2015) including our 3’UTR tree (Fig. 2, Fig. 3A), Passeriformes, Psittaciformes and Falconiformes
are closely related and part of the taxon Australaves (Ericson 2012). This latter clade obtained
strong support from a previous molecular phylogeny (Suh et al. 2011) and the sister group
relationship of Psittaciformes and Passeriformes also conforms with paleontological data (Mayr G
2017). In the CODON12 and AAS trees, the Coliiformes were the closest relatives of the
Passeriformes and Psittaciformes (Fig. S4B, S4C). Although there is some anatomical support for
such a relationship (Berman and Raikow 1982), the Coliiformes were grouped with the
Trogoniformes and other Higher Landbird clades in a phylogeny that was based on an analysis of a
large number of morphological characters (Livezey and Zusi 2007). The type of sequences (non-
coding versus coding sequences) for tree construction had no impact on the relationships of higher-
level taxa at the base of the trees, i.e. on the basal position of Palaeognathae and Galloanserae (Fig.
S4), even though noncoding and coding trees differed in the interrelationships of some taxa within
In summary, at the level of composition of higher taxa, the CODON tree (Fig. S4A), the
CODON12 tree (Fig. S4B), and the AAS tree (Fig. S4C) differed considerably from the 3’UTR tree
(Fig. 2) and from currently accepted relationships of avian higher-level taxa.
The 3’UTRs contain motifs specific for higher-level and lower-level clades. To identify signals
specific to higher-level clades that are present in 3’UTRs we compared such sequences from the
Caprimulgiformes, Charadriiformes, and selected subclades of the Passeriformes. The analysis of
putative binding sites of RNA-binding proteins and of microRNAs did not show taxon-specific
patterns. However, we found that the presence of putative transcription factor binding sites (TFBS)
differed between higher-level clades, and between clades within the Passeriformes (Fig. 1D): Z-
score analysis of the abundance of TFBS in 3’UTRs of 97 randomly selected transcribed genes
showed high similarity between the closely related estrildid and fringillid songbird families (both in
clade OHC10B), lower similarity between estrildid species and species of basal songbird families
(in clade OHC1-OHC3), and even lower similarity with charadriiform and caprimulgiform species,
as expected from their phylogenetic distance (Fig. 1D; see Supplemental Information for discussion
of oscine higher-level clades (named OHCs)).
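The exact Z-score procedure is not described in this section, so the following is only a sketch of one plausible reading: motif counts per species are standardized to Z-scores, and two species are compared by the correlation of their Z-score profiles. The function names, the use of a Pearson correlation as the similarity measure, and all counts are assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of counts to zero mean and unit (population) SD."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - m) / sd for v in values]


def profile_similarity(counts_a, counts_b):
    """Pearson correlation of two profiles, computed from their Z-scores."""
    za, zb = zscores(counts_a), zscores(counts_b)
    return sum(x * y for x, y in zip(za, zb)) / len(za)


# Hypothetical TFBS motif counts (one entry per motif); not data from the study.
estrildid = [12, 3, 8, 0, 5]
fringillid = [11, 4, 7, 1, 5]
charadriiform = [2, 9, 1, 6, 4]

print(profile_similarity(estrildid, fringillid))
print(profile_similarity(estrildid, charadriiform))
```

With these made-up counts the two songbird profiles correlate more strongly with each other than either does with the charadriiform profile, which mirrors the qualitative pattern reported in Fig. 1D.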
Furthermore, we analysed the pattern of TFBS in detail within family-level taxa of which we
had multiple species belonging to at least three genera. As an example, we present the pattern of
TFBS in the 3’UTR of the gene EMC1 of the Spheniscidae, the penguins (Fig. S7). The presence of
TFBS in that 3’UTR shows a family specific signature (Fig. S7A) as well as genus-specific signatures
for each of the three included genera, that is, Aptenodytes (Fig. S7B), Eudyptes (Fig. S7C) and
The higher-level (order-level) avian tree of life (Fig. 2). The 3’UTR based tree of life resolved the
relationship of all avian orders including the Opisthocomiformes (hoatzins) with good statistical
support. In that phylogeny, extant birds fall into 7 major clades (Fig. 2). Clade 1 represents the
Palaeognathae and Clades 2-7 encompass the Neognathae, which are subdivided into the
Galloanserae (landfowl and waterfowl; Clade 2) and the Neoaves (Clades 3-7) (Fig. 2). Among the
Neoaves, Clade 3 includes the Mirandornithes, the flamingos and grebes, Clade 4 represents the
“Basal Landbirds”, Clade 5 encompasses the “Aquatic and Semiaquatic Birds”, Clade 6 is the
“Higher Landbird Clade”, and Clade 7 represents the Australaves (Ericson 2012; Kimball et al.
2013) (Fig. 2). Four of the 35 order-level relationships were sensitive to the amount of data: if we
decreased the gappiness from 110 to 100 missing samples, the statistical support values of these four
branching points dropped from {99%, 99%, 93%, 89%} support to {88%, 32%, 88%, 83%}
support, but stayed 100% for all other branching points (Fig. 2). In particular, the relationship of the
Strigiformes (SH-aLRT: 88 and 32, respectively; UFBS: 69) requires further attention.
It should be noted that the composition and interrelationships of Clades 3-6 differ
substantially from previous phylogenies (Hackett et al. 2008; Jetz et al. 2012; Jarvis et al. 2014;
Prum et al. 2015) as discussed below for family-level taxa. Thus, trivial names (Basal Landbirds,
Higher Landbirds, Aquatic & Semiaquatic Birds) used in previous publications and in the current
paper comprise different sets of bird order- and family-level taxa. Nevertheless, the interfamilial
relationships within some higher-level subclades were similar between the present study and
previous reports (Prum et al. 2015; Jarvis et al. 2014) (see below).
In the Neoaves (Clades 3-7; Fig. 3A), the Mirandornithes (Clade 3; Fig. 3A), are the sister
taxon of all other taxa, which is in contrast to all previous molecular trees (e.g. Ericson et al. 2006;
Hackett et al. 2008; Jarvis et al. 2014; Prum et al. 2015). In previous works, either a clade
composed of Mirandornithes, Pterocliformes, Mesitornithiformes and Columbiformes (Jarvis et al.
2014) or the Charadriiformes (Prum et al. 2015) were suggested to be the sister group of all other
The avian family-level tree of life. We included all 106 currently recognized non-passerine
families and 90% (115) of the passerine family-level clades, which significantly increased the taxon
sampling compared to previous comprehensive phylogenies (Hackett et al. 2008: 93 non-passerine,
24 passerine family-level clades; Jarvis et al. 2014: 39 non-passerine, 2 passerine family-level
clades; Prum et al. 2015: 91 non-passerine, 31 passerine family-level clades; Oliveros et al. 2019: 5
non-passerine, 125 passerine family-level clades). Since adding families impacts the entire tree, a
phylogeny missing many family-level taxa is unlikely to maintain its higher-level topology, if
further families were included in the phylogenetic analysis. Due to the low amount of sequence data
included in the species-rich study of Hackett et al. (2008) and due to the low number of families in
the sequence-rich study of Jarvis et al. (2014), we restrict the following comparisons mainly to the
Prum et al. (2015) study for non-passerines and to the Oliveros et al. (2019) phylogeny for
passerines. For more detailed considerations of the interfamilial relationships than those presented
below we refer to the Supplemental Discussion.
Within the Palaeognathae (Clade 1) and the Galloanserae (Clade 2), the interrelationships of
the family-level taxa (Fig. 3A) agree with the phylogeny of Prum et al. (2015). Interestingly, among
the Palaeognathae, these relationships differ from those reported on the basis of conserved
noncoding elements and a coalescent inference procedure, in which rheas are the sister group of
cassowaries, emus and kiwis (Sackton et al. 2019). The differences are due to the tree inference
procedure (see Supplementary Information). Furthermore, we confirm the interrelationships of the
seabird subclade of the water bird group of Prum et al. (2015), here informally named Aquatic &
Semiaquatic Birds (Clade 5), but not its sister group relationship to the Caprimulgiformes and
Mirandornithes (Fig. 3A).
A major difference to the higher-level clades recognized before (Prum et al. 2015) concerns
Clade 4 (Fig. 3A) of our tree, the Basal Landbirds, which comprises two subclades. One of these
(Fig. 3A, Clade 4A) includes Charadriiformes (shorebirds and allies) and Gruiformes (cranes and
allies), whereas the other subclade (Fig. 3A, Clade 4B) unites the Musophagiformes (turacos),
Otidiformes (bustards), Mesitornithiformes (mesites), Pterocliformes (sandgrouse), Columbiformes
(doves) and Cuculiformes (cuckoos) on the one hand, and Opisthocomiformes (hoatzins) and
Caprimulgiformes on the other. In the phylogeny of Prum et al. (2015), by contrast, the early
diverging landbirds were subdivided into three higher-level clades and Charadriiformes resulted in a
clade that also contained the aquatic and semiaquatic birds. Furthermore, the interrelationships of
Columbiformes, Cuculiformes, Otidiformes, Musophagiformes, Mesitornithiformes and
Time calibration of the family-level phylogeny. We used DPPDiv (Heath et al. 2012) for time
calibration of our family-level phylogeny. DPPDiv uses the Dirichlet process prior (DPP) model or
the uncorrelated gamma-distributed rates (UGR) model. Although these two models yielded broadly
congruent divergence dates for many clades (difference between models: 2.4 ± 4.7 My [mean ±
SD]), they show various differences in detail and none of the results is entirely congruent with the
fossil record (Tab. S4). In general, the UGR model (Fig. 2 & 3) led to divergence times of families
that showed less conflict with time-calibrated fossil data as compared to the use of the DPP model.
For example, the estimated divergence time of Galliformes and Anseriformes of 62.5 Mya fits well with a
recently reported Mesozoic fossil (66.7 Mya) that is close to the last common ancestor of
Galloanserae (Field et al., 2020) (Tab. S4). Furthermore, for phasianine and odontophorine
Galliformes, the calculated divergence time is 37 Mya, while the earliest record of a galliform
belonging to the clade Odontophorinae + Phasianinae, the taxon Palaeortyx, stems from the early
Oligocene, some 32 Mya (million years ago) (Mayr G 2017; Zelenkov 2019). The divergence dates
of the UGR model also conform with the fossil record of crown group Procellariiformes,
Gruiformes and Accipitriformes, with fossils of the procellariiform Diomedeidae, the gruiform
Rallidae and the accipitriform Pandioninae having been described from the early Oligocene, some
32-34 Mya (Mayr G 2009; Mayr G 2017). For Mirandornithes, by contrast, the calculated
divergence date of 46 Mya for Podicipediformes and Phoenicopteriformes distinctly predates their
earliest known fossils, the earliest fossil Podicipediformes being from the late Oligocene/earliest
Miocene (about 20 Mya), and the earliest Phoenicopteriformes being from the early Oligocene (32
Mya; Mayr G 2017) (Tab. S4). This suggests substantial ghost lineages for both Podicipediformes
and Phoenicopteriformes. However, most calculated branching points come with large confidence
intervals (Fig. 3A, Fig. S6). Nevertheless, the overall disparity of fossil and molecular age
determinations is rather low, ranging between 9 and 11 My, if all fossil data are considered (Tab. S4).
These discrepancies are likely due to (1) the limited fossil record of certain clades (Mayr G
2017), (2) the existence of clades with a single extant species, which does not allow molecular
dating of the diversification of the crown group (van Tuinen et al. 2006), or (3) the limited species
sampling and large confidence intervals of some molecular age determinations.
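The "mean ± SD" difference between the DPP and UGR date estimates reported at the start of this section can be illustrated with a short sketch. It assumes that the difference is taken per node as a signed value (first model minus second model) and summarized with the sample standard deviation; whether the original analysis used signed or absolute differences is not stated here. The function name and the dates are hypothetical.

```python
from statistics import mean, stdev


def summarize_differences(dates_a, dates_b):
    """Return (mean, sample SD) of the per-node differences dates_a - dates_b."""
    if len(dates_a) != len(dates_b):
        raise ValueError("both models must provide a date for every node")
    diffs = [a - b for a, b in zip(dates_a, dates_b)]
    return mean(diffs), stdev(diffs)


# Hypothetical divergence dates (Mya) for the same three nodes under two models.
model_one = [10.0, 20.0, 30.0]
model_two = [8.0, 21.0, 27.0]

avg, sd = summarize_differences(model_one, model_two)
print(f"difference between models: {avg:.1f} ± {sd:.1f} My")
```

A standard deviation larger than the mean, as reported above, indicates that the sign and size of the difference vary considerably from node to node rather than reflecting a uniform offset between the two models.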
The time-calibrated phylogeny (Fig. 2, Fig. 3A & 3B, Fig. S6) shows divergence dates for
Palaeognathae and Neognathae (94 Mya) and Galloanserae and Neoaves (81 Mya) that are much
earlier than those suggested previously (Prum et al. 2015). The divergence dates of the
Mirandornithes and those of the Basal Landbirds and of the Aquatic & Semiaquatic Bird lineages
precede the K-Pg boundary (Fig. 2). The initial divergences within many other neoavian lineages
By including representatives of all non-passerine families and most passerine families, we show (1)
that the molecular tree based on 3’UTRs is, in bioinformatical terms, the most stable tree as
compared to trees computed from coding sequences and (2) that the 3’UTR tree resolves the higher-
level relationships of all included taxa without any ad-hoc assumptions such as the selection of
certain genes (Prum et al. 2015), or the arbitrary combination of coding and non-coding sequences
(Jarvis et al. 2014). (3) The tree-building capacity of 3’UTRs reflects a strong phylogenetic signal,
which might be related to the presence of transcription factor binding site (TFBS) motifs in the
3’UTRs. (4) Our phylogeny suggests that the avian tree of life can be resolved using a moderate
amount of sequencing data derived from transcriptomes. This avoids the need for specialist knowledge for
assembling entire genomes as well as high bioinformatics costs of comparing large numbers of
large genomes. (5) The resulting 3’UTR-tree shows a well-resolved topology including all avian
order-level taxa, while dividing the Neoaves into five major clades that differ from previous
phylogenies. (6) The Mirandornithes (flamingos and grebes) are the sister group of all other Neoaves, and
the hoatzin, a previous phylogenetic enigma, is shown to be closely related to the
Caprimulgiformes. (7) The negative correlation in the temporal diversification of passerine and
non-passerine family-level clades might be due to the vocal learning capacity of oscine passerines
(see Fig. 4).
Are 3’UTRs ideal for molecular tree building? The increased length and the evolution of alternative
3’UTRs, as they are observed in vertebrates, the amount and type of TFBS, as well as protein
binding sites are expected to increase the complexity of species-specific tissue-specific gene
expression regulation (Sandberg et al. 2008; Lianoglou et al. 2013; Cohen et al. 2014; Mayr G
2017; Lee and Mayr 2019; Xu et al. 2019). Here, we demonstrate that 3’UTR-based molecular
trees resolve the avian tree of life with good statistical support throughout. On one hand, the taxon-
specific presence of TFBS in 3’UTRs might just be seen as an indicator of conserved sequences
with yet unknown function. On the other hand, there are increasing observations of a functional role
of transcription factor binding to 3’UTRs for transcriptional and post-transcriptional processes.
Regarding the transcriptional role, a simultaneous binding of transcription factors to 5’UTR and
3’UTR has been shown (Tan Wong et al. 2012; Sun et al. 2018; Jash et al. 2012). This suggests that
transcription factors may mediate intra- and inter-molecular loop interactions that bring promoter
and terminator into physical proximity, which would allow the RNA polymerase to reload onto the
promoter efficiently (Tan Wong et al. 2012; Sun et al. 2018; Jash et al. 2012). Furthermore,
transcription factors bind to RNA; e.g., Wilms’ tumor 1 regulates RNAs through binding to the
Whether the resolution of the avian family-level tree of life is due to particular features of 3’UTRs and
their potential importance for avian speciation, or whether it might also be achieved with other
types of non-coding sequences, is open for discussion. Due to the short length of the 5’UTR and to
the nature of transcription, the number of 5’UTR and of intronic sequences in our data were too few
to allow testing of whether these sequences also contain enough evolutionary signal to properly
resolve the avian tree of life. Ultra-conserved elements appear to yield a well-resolved
phylogenetic tree of passerine family-level taxa (Oliveros et al. 2019), very similar to the passerine
part of our tree. However, it should be noted that the coding sequences, too, delivered a passerine
part of the avian tree that had many similarities with the 3’UTR tree despite the strong differences
concerning the interrelationships of non-passerine orders (data not shown). Furthermore, there are
certain differences between the Oliveros tree (Oliveros et al. 2019) and the passerine part of our 3’UTR
tree, which might be due to the taxa used or the different data types. Thus, it remains to be seen whether
coding sequences, ultra-conserved elements and 3’UTRs converge on the same phylogenetic tree of
passerine families once enough taxa and sequences are included in the tree calculation.
Retroposons are another type of sequence that might resolve the avian phylogeny. However, the
techniques involved in their study require very large genomic data sets. Even though analyses of
Why are trees based on 3’UTR sequences different from those based on coding sequences?
Molecular phylogenetic trees based on bioinformatics tools clearly deliver different trees based on
the function of the sequences used, as has been demonstrated before (Jarvis et al. 2014; Reddy et al.
2017). Trees based on coding sequences (CODON, CODON12, AAS) and non-coding based trees
(3’UTR) are expected to be different due to species-specific selection pressures that favour removal
of single nucleotide mutations of coding DNA as compared to non-coding sequences. In contrast,
mutations in the 3’UTR would only affect certain regulatory elements and in consequence might
affect gene expression in some tissues (Mayr C 2016; Mayr C 2017), but would be unlikely to stop
protein expression body-wide, as might occur in case of mutations in the coding sequences. Thus,
similar selection pressures due to similar environmental conditions might favour convergent
developments in protein coding sequences of distantly related species. Such an example might
concern the birds of prey that were grouped together in one assemblage in the coding sequence trees
(Fig. S4). However, in the case of vocal learning, another rare avian phenotype, which is present in
the hummingbirds, parrots and passerines (Jarvis et al. 2000; Gahr 2000), these three taxa were not
grouped together in the coding trees (Fig. S4).
Molecular sequences and anatomy based trees. Concerning some clades, molecular phylogenies, in
particular those spanning the entire class of birds (e.g. this study; Prum et al. 2015; Jarvis et al.
2014), are substantially different from phylogenies derived from anatomical data (e.g., Livezey and
Zusi 2007). Since the present phylogeny includes all non-passerine families, the differences
between anatomical and molecular trees are unlikely to be due to different taxonomic samplings but
due to the type of data analyzed. Clearly, the relationships of certain higher-level clades in the
molecular phylogenies, such as the relationship of tropicbirds, kagus and sunbitterns and their
Implications of the evolution of vocal learning for the avian tree of life. The peaks of family-level
diversification during the evolution of birds may have been caused by drastic changes of
macroecological niches due to events such as global cooling, the related drop in sea levels and thus
increased connectivity between landmasses and reduced CO2 levels that favoured the spread of
grassland or the desiccation of landmasses (Zachos et al. 2001). Opposite scenarios of
macroecological changes exist for global warming (Zachos et al. 2001). There are, however, no
clear catastrophic or macroecological events except for the progressing global cooling that parallels
the massive passerine radiation in the late Oligocene and the Miocene (Hansen et al. 2013). A recent
paper studying the evolution of the Passeriformes suggested that more complex mechanisms than
temperature change or vacant ecological niches are responsible for passerine radiation events
(Oliveros et al. 2019). Whatever the scenarios may have been, the significant radiation of oscine
passerine family-level taxa since the Miocene strongly contrasts with the subdued diversification of
new non-passerine clades among most arboreal birds and of suboscine passerine family-level taxa
(Fig. 3B, Fig. 4).
The hallmark of songbirds is their singing behavior, which is important for mate choice and
territorial defense. A key-feature of songbird singing behavior is that the songs are learned (Goller
and Shizuka, 2018). Thus, we discuss whether the evolution of vocal learning contributes to the
extraordinary success of songbirds (comprising about half of all avian family-level taxa and species
[ca 4500 species]) and the attenuation of the evolution of non-passerine families in the last twenty
My. Vocal production learning of males occurs in songbird families (suborder oscines of the
Passeriformes), parrot families (Psittaciformes) and in the hummingbird family (Trochilidae of the
Caprimulgiformes) (Baptista and Schuchmann 1990; Cruickshank et al. 1993; Gahr 2000; Jarvis et
al. 2000). The statistical analysis of the distribution of avian families that utter learned vocalizations
showed that vocal production learning evolved three times independently (Fig. S8), as hypothesized
In summary, we suggest that the 3’UTRs contain significant evolutionary signals that result in true
relationships, if used in unbiased phylogenetic tree solving procedures. Therefore, since we
included all non-passerine family-level taxa, the presented phylogenetic tree shows the relationship
of all these taxa, i.e. of all avian orders. The evolutionary timing of the divergences of family-level
taxa suggests that the strong radiation of oscine passerine family-level taxa attenuated the evolution
of new forms of non-passerine taxa in the last twenty My, a process that might involve the evolution
of the vocal learning behaviour of songbirds.
Species and tissue samples (Tab. S1): We produced whole transcriptomes of various tissues,
preferentially brain or blood samples. If possible, brain tissue was used to provide the most complex
transcriptome libraries in terms of number of expressed transcripts. However, for animal protection
reasons, in many cases we used small blood samples or cryopreserved museum specimens (mainly
liver or muscle) that were kindly provided by various collections (Tab. S1).
Different taxonomists recognize somewhat different genera as discrete family-level taxa.
Thus, in 2017, the “Handbook of the Birds of the World [HBW]” recognized 243 families while the
“International Ornithological Union [IOU]” recognized 234 families (Del Hoyo and Collar 2014,
2016; Gill and Donsker, 2017). Especially within passerines, the number and identity of recognized
bird families is currently very dynamic and the number of families has increased continuously over
the last 10 years. In 2013, when we started this study, our avian tree of life would have represented
98% (215 of 220) of all IOU families, which is 92% (215 of 234) of all IOU families recognized in
2017 (Gill and Donsker 2013, 2017). Thus, we are missing certain families, because they were only
recognized after our study began. This is primarily due to a split of previous families into several
RNA preparation and sequencing: Isolation of RNA was carried out using Qiagen RNeasy Mini
Kits (Cat No. 74106) according to the manufacturer’s instructions, including the optional DNase
digestion step, using 20 mg of tissue or 50 μl of blood. Blood samples were processed with Sigma
TRI Reagent BD (T3809) according to the manufacturer’s instructions. The RNA was extracted from
the aqueous phase according to the protocol of the Qiagen RNeasy Mini Kit (74106). The RNA
quality was assessed with the Agilent 2100 Bioanalyzer Instrument (Model G2939A, Agilent
Technologies RNA). Concentrations were measured with the NanoDrop 1000 spectrophotometer (Thermo
Fisher Scientific). About 1 µg of total RNA per sample was used to construct RNA sequencing
libraries using the TruSeq RNA Sample Preparation Kit, v2 (Illumina Inc., San Diego, CA, USA).
The resulting libraries were barcoded and sequenced on Illumina HiSeq 2500 and HiSeq 4000
systems. The sequencing protocol was set to high output mode with paired-end 50 or 75 b reads. We
aimed at an output of 60-100 million reads per sample.
De novo transcript assembly: RNA sequencing short read data was de novo assembled using the
Genomic data sources: To extend our species list, we extracted the putative transcriptomes (i.e.
sequences homologous to those of our sequenced transcriptomes) from published bird genomes
(Tab. S1). Genome assemblies (57 bird species, 2 alligators) were downloaded from
NCBI/ENSEMBL repositories. The seven de novo assembled genomes are available at Dryad
(doi:10.5061/dryad.ngf1vhhpx) (Tab. S1).
Transcript and genome multiple alignments to reference genome: The canary (Serinus canaria)
genome was used as reference genome during the subsequent mapping steps of all transcriptomes
(Fig. S1). To construct pairwise alignments of genomes and transcriptomes we used LAST aligner
version 266 (Kielbasa et al. 2011), as it provides high sensitivity to align even distantly related
genomes and transcriptomes in a computationally efficient manner. The output MAF (multiple
alignment format) was filtered for orthologous alignments using single_cov2 from the
TBA/MULTIZ package (Blanchette et al., 2004) (2-way filtering, ref→query and query→ref). The
pairwise transcriptome/genome alignments were combined into a multiple genome alignment using
MULTIZ. All required steps were run on split parts of the reference genome by custom scripts using
GNU PARALLEL (Tange 2011) to enable the use of multi-threaded CPUs. The final MAF is
available from Dryad (doi:10.5061/dryad.ngf1vhhpx).
Extraction of coding, non-coding and codon-based multiple alignments: We used our annotation
of the canary genome (http://public-genomes-ngs.molgen.mpg.de) to define bed files with
coordinates of the coding, 3’UTR and 5’UTR, intronic and intergenic regions of the genome. The
mafsInRegion tool from the Kent utilities (Kent et al. 2002) was used to extract the different
fractions of the genome into coding/non-coding MAF files according to the bed files. Further
processing of the alignments included removing alignments that did not align with an outgroup
species to remove potential reference bias; here we used the ostrich (Struthio camelus) as a „must
Finding a suitable gap versus data content for the multiple alignments: We filtered the multiple
alignment fasta files to allow only a certain amount of missing data per column. In this regard, we
generated multiple alignments with the following numbers of allowed gaps per alignment column:
10, 20, 40, 60, 80, 90, 100, 110, 120, 140, 160. Alignments for 3’UTR, CODON or AAS with
different gap cut-offs are available from Dryad (doi:10.5061/dryad.ngf1vhhpx). For each alignment
10 trees were calculated using different maximum parsimony starting trees and RAxML (v8.2.4;
Stamatakis, 2014) for fast approximate tree inference (parameter: -f E -m GTRCAT or PROTCAT
for amino acid sequence) and subsequent Nearest Neighbour Interchange (NNI) refinement and SH-
aLRT support calculation (parameter: -f J -m GTRGAMMA or PROTGAMMA for amino acid
sequence). RAxML and other tools for ML tree inference use heuristic methods to infer tree
topologies and are often unable to find the best-fitting tree topology in a single run. We found that
for our dataset 10 replicates were a good trade-off between computational time needed and
probability of finding the best fitting topology when using 3’UTR alignments. To assess tree
topology convergence, 10 trees for each alignment file were compared to each other by calculating
pairwise Robinson-Foulds (RF) distances (RAxML -f r option).
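The column filtering by allowed gaps described in this section can be illustrated with a minimal sketch. It assumes that the alignment is held in memory as a dictionary of equal-length sequences and that the characters '-', 'N', 'n' and '?' count as missing data; the function name filter_columns and these conventions are illustrative assumptions and are not taken from the published pipeline.

```python
from typing import Dict

# Assumption: these characters mark missing data in an alignment column.
GAP_CHARS = frozenset("-Nn?")


def filter_columns(alignment: Dict[str, str], max_gaps: int) -> Dict[str, str]:
    """Keep only alignment columns with at most `max_gaps` missing entries."""
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(seq) for seq in seqs}) > 1:
        raise ValueError("all sequences must have the same length")

    # zip(*seqs) iterates over the alignment column by column.
    keep = [
        i
        for i, column in enumerate(zip(*seqs))
        if sum(char in GAP_CHARS for char in column) <= max_gaps
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


if __name__ == "__main__":
    toy = {"taxon_a": "AC-T", "taxon_b": "A--T", "taxon_c": "ACGT"}
    # Same cut-off idea as in the text, but with toy values.
    for cutoff in (0, 1, 2):
        print(cutoff, filter_columns(toy, cutoff))
```

Running the function once per cut-off value yields one filtered alignment per gap threshold, which mirrors the series of alignments described above.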
Additionally, we computed coalescent consensus trees for the 10 trees per alignment file
(subsets) by ASTRAL-III v5.6.1 (Zhang et al., 2018). We then calculated the Robinson-Foulds
distances of the neighbouring subset coalescent trees (subset1 vs. subset2; subset2 vs. subset3;…).
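The Robinson-Foulds comparisons used here (all pairs among replicate trees, and neighbouring subset trees) can be sketched as follows. This is a simplified stand-alone illustration and not the RAxML implementation: it assumes plain Newick strings with unquoted taxon names, ignores branch lengths and support labels, treats trees as unrooted, and reports the absolute (unnormalized) RF distance. All function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple


def leaf_sets(newick: str) -> Tuple[List[FrozenSet[str]], FrozenSet[str]]:
    """Return the leaf set below every internal node, plus the set of all taxa."""
    s = newick.strip().rstrip(";")
    clades: List[FrozenSet[str]] = []
    pos = 0

    def read_label() -> str:
        # Read a name/support/branch-length token and return the part before ':'.
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos].split(":")[0].strip()

    def node() -> Set[str]:
        nonlocal pos
        if s[pos] == "(":
            pos += 1  # consume "("
            leaves = set(node())
            while s[pos] == ",":
                pos += 1
                leaves |= node()
            pos += 1  # consume ")"
            read_label()  # skip internal label, support value, branch length
            clades.append(frozenset(leaves))
            return leaves
        return {read_label()}

    taxa = frozenset(node())
    return clades, taxa


def bipartitions(newick: str) -> Tuple[Set[FrozenSet[str]], FrozenSet[str]]:
    """Non-trivial splits, each stored as the side that lacks a reference taxon."""
    clades, taxa = leaf_sets(newick)
    ref = min(taxa)
    splits = set()
    for clade in clades:
        side = taxa - clade if ref in clade else clade
        if 1 < len(side) < len(taxa) - 1:
            splits.add(side)
    return splits, taxa


def rf_distance(tree_a: str, tree_b: str) -> int:
    """Absolute RF distance: number of splits found in exactly one of the trees."""
    splits_a, taxa_a = bipartitions(tree_a)
    splits_b, taxa_b = bipartitions(tree_b)
    if taxa_a != taxa_b:
        raise ValueError("trees must contain the same taxa")
    return len(splits_a ^ splits_b)


def pairwise_rf(trees: List[str]) -> List[int]:
    """RF distances for all pairs of replicate trees."""
    return [rf_distance(a, b) for a, b in combinations(trees, 2)]


def neighbouring_rf(trees: List[str]) -> List[int]:
    """RF distances between consecutive trees (tree 1 vs. 2, tree 2 vs. 3, and so on)."""
    return [rf_distance(a, b) for a, b in zip(trees, trees[1:])]
```

Low pairwise distances among replicates indicate that independent searches converge on similar topologies; low neighbouring distances indicate that the topology changes little between adjacent gap cut-offs.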
Final phylogenetic species tree calculations by a coalescent approach: Besides the tree inference
from concatenated alignments, we also tested inferring the species tree using a coalescence
approach. We calculated 5,127 trees for 3’UTRs of distinct gene models using iqtree (version
Time-calibrated phylogenetic tree: We used DPPDiv (Heath et al. 2012) for time calibration of our
family-level phylogeny, applying either the Dirichlet process prior (DPP) model or the uncorrelated
gamma-distributed rates (UGR) model. This tool has the advantage of being able to use parallel
computation. Nevertheless, we had to downscale the included amount of data to an alignment length
of 10,000 - 100,000 nucleotides to finish computation within a reasonable amount of time (several
days to weeks) on high-power computing servers (96 or 192 CPU threads). Calculations were
performed twice using different starting conditions (with/without parameter: -ubl), and we checked
for convergence under both the Dirichlet process prior (DPP) model and the uncorrelated
gamma-distributed rates (UGR) model (parameter: -urg; Heath et al. 2012). The analyses were
run until the linear correlation of the median divergence times between the two runs of the same model
reached R² values larger than 0.99 with a slope of 1.0. Eighteen nodes in the tree were calibrated
with fossil data that had been used previously (Jarvis et al. 2014); the divergence date for the split of
pigeons and Mirandornithes was omitted due to differences between the Jarvis tree (Jarvis et al.
2014) and our 3’UTR tree (Fig. 3A). The time-calibrated tree was visualized with FigTree
(http://tree.bio.ed.ac.uk/software/figtree/).
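The convergence criterion for the two runs of the same clock model can be sketched as an ordinary least-squares fit of the median node ages of one run against the other. The sketch below is a hedged illustration, not part of DPPDiv: the function names are hypothetical, the input is assumed to be two equally ordered lists of median divergence times, and the tolerance on the slope is an assumption, since the text only states a slope of 1.0.

```python
from typing import Sequence, Tuple


def slope_and_r2(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope of y on x and the squared Pearson correlation."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series with at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("series must not be constant")
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def runs_converged(
    run1_medians: Sequence[float],
    run2_medians: Sequence[float],
    min_r2: float = 0.99,      # threshold stated in the text
    slope_tol: float = 0.05,   # assumption: accepted deviation from a slope of 1.0
) -> bool:
    """True if the two runs agree according to the R² and slope criteria."""
    slope, r2 = slope_and_r2(run1_medians, run2_medians)
    return r2 > min_r2 and abs(slope - 1.0) <= slope_tol
```

Checking the slope in addition to R² matters because two runs can be almost perfectly correlated while one yields systematically older or younger ages than the other.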
Extracting 3’UTR sequences from the RNAseq assemblies for the detection of transcription factor
binding sites and transcription factor binding site models: To detect phylogenetically relevant signals in
3’UTRs we compared species of the orders Charadriiformes, Caprimulgiformes and Passeriformes.
For family comparisons within the Passeriformes, we compared species of the Estrildidae, the
Fringillidae and of Basal Oscines; the latter is an artificial group including species of the basal
radiations of oscines (see Tab. 3 for species). We compared the 3’UTRs of 97 randomly selected
genes (Tab. S3) among the species.
We carried out analyses using the Genomatix software suite (Precigen Bioinformatics
Germany GmbH) combining several mining sources (Over-representation of transcription binding
Literature-based analysis of singing: Vocal production learning, abbreviated in this paper as vocal
learning, was considered present in a species if studies had reported imitation of conspecifics,
mimicry of heterospecifics or mimicry of non-bird sounds in that species, or if studies had reported
local dialects in that species. As sources, we studied all available publications as well as various
encyclopaedias, the Handbook of the Birds of the World, the Handbook of Western Palearctic Birds,
the Handbook of Australian, New Zealand and Antarctic Birds, and The Birds of Africa. For the
family-level analysis, we considered vocal learning present in a family if at least one species of that
family fulfilled the criteria above. The family-level taxa and related references to vocal learning are
listed in Tab. S5. To test the association between the phylogenetic tree and the occurrence of vocal
learning in the family-level taxa, we used TreeBreaker (https://github.com/ansariazim/treeBreaker),
an inference procedure based on a Bayesian statistical method (Ansari and Didelot 2016). The
software uses a Bayesian model to deduce whether the phenotype of interest is randomly distributed
on the tips of the tree and to estimate which clades, if any, have a distinct distribution from the rest
of the tree (Ansari et al. 2019).
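The scoring rules for vocal learning stated above (any one of four kinds of evidence at the species level; at least one qualifying species at the family level) can be written down as a short sketch. The record layout, the field names and the function name are hypothetical and serve only to make the rules explicit; the sketch does not reproduce the TreeBreaker analysis.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class SpeciesRecord:
    """Literature evidence for one species (hypothetical layout)."""
    species: str
    family: str
    imitates_conspecifics: bool = False
    mimics_heterospecifics: bool = False
    mimics_non_bird_sounds: bool = False
    has_local_dialects: bool = False

    def is_vocal_learner(self) -> bool:
        # A single reported line of evidence is sufficient.
        return (
            self.imitates_conspecifics
            or self.mimics_heterospecifics
            or self.mimics_non_bird_sounds
            or self.has_local_dialects
        )


def family_level_presence(records: Iterable[SpeciesRecord]) -> Dict[str, bool]:
    """Vocal learning is present in a family if any of its species qualifies."""
    presence: Dict[str, bool] = {}
    for record in records:
        presence[record.family] = (
            presence.get(record.family, False) or record.is_vocal_learner()
        )
    return presence


if __name__ == "__main__":
    # Placeholder names only; these are not data from the study.
    example = [
        SpeciesRecord("species_1", "Family_A", has_local_dialects=True),
        SpeciesRecord("species_2", "Family_A"),
        SpeciesRecord("species_3", "Family_B"),
    ]
    print(family_level_presence(example))
```

The resulting presence/absence value per family is the kind of binary tip phenotype that can then be tested against the tree.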
Supplementary Material
Supplementary Figures
Supplementary Tables
Supplementary Discussion
Acknowledgements
Funding
This work was funded through a grant of the president of the Max Planck Society to M. Gahr. Some
methods applied here were developed in a project funded by the German Research
Foundation (DFG) through an “Eigene Stelle” grant to H. Kuhl (KU 3596/1-1; project number 324050651).
Author contributions
HK: design of bioinformatic pipeline, data processing and analysis, and manuscript writing. CF:
comparative analysis of 3’UTR structure. AB: preparation of all RNAs and DNAs. GM: evaluation
of time-calibration and fossil data. GN: tissue sampling. STB, SK and BT: sequencing. MG:
concept, tissue sampling, meta-analysis of vocal learning, writing of the manuscript.
Data availability
Transcriptome assemblies used in this study have been made available through a Dryad archive
(https://doi.org/10.5061/dryad.ngf1vhhpx). Transcriptome sequencing reads are available under the
BioProject accession number: PRJNA599522.
References
Abascal F, Zardoya R, Telford MJ. 2010. TranslatorX: multiple alignment of nucleotide sequences
Aggerbeck M, Fjeldsa J, Christidis L, Fabre PH, Jonsson KA. 2014. Resolving deep lineage
divergences in core corvoid passerine birds supports a proto-Papuan island origin. Mol Phylogenet
Evol. 70:272-285.
Ansari MA, Aranday-Cortes E, Ip CL, da Silva Filipe A, Lau SH, Bamford C, Bonsall D, Trebes A,
Piazza P, Sreenu V, et al. 2019. Interferon lambda 4 impacts the genetic diversity of hepatitis C
virus. Elife. 8:e42463
Ansari MA, Didelot X. 2016. Bayesian Inference of the Evolution of a Phenotype Distribution on a
Phylogenetic Tree. Genetics. 204:89-98.
Armstrong EA. 1963. A study of bird song. London: Oxford University Press.
Baker AJ, Pereira SL, Paton TA. 2007. Phylogenetic relationships and divergence times of
Charadriiformes genera: multigene evidence for the Cretaceous origin of at least 14 clades of
shorebirds. Biol Lett. 3:205-209.
Baptista LF, Schuchmann KL. 1990. Song Learning in the Anna Hummingbird. Ethol. 84:15-26.
Barker FK, Cibois A, Schikler P, Feinstein J, Cracraft J. 2004. Phylogeny and diversification of the
largest avian radiation. Proc Natl Acad Sci USA 101:11040-11045.
Bates GL. 1930. Handbook of the birds of West Africa. London: Bale Sons and Danielson.
Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AFA, Roskin KM, Baertsch R, Rosenbloom K,
Clawson H, Green ED, et al. 2004. Aligning multiple genomic sequences with the threaded blockset
aligner. Genome Res. 14:708-715.
Brown RP, Yang Z. 2011. Rate variation and estimation of divergence times using strict and relaxed
clocks. BMC Evol Biol. 11:271.
Burgess SJ, Reyna-Llorens I, Stevenson SR, Singh P, Jaeger K, Hibberd JM. 2019. Genome-Wide
Transcription Factor Binding in Leaves from C3 and C4 Grasses. The Plant cell 31: 2297-2314.
Chen A, White ND, Benson RBJ, Braun MJ, Field DJ. 2019. Total-Evidence Framework Reveals
Complex Morphological Evolution in Nightbirds (Strisores). Diversity. 11(9):143.
Chung PJ, Jung H, Choi YD, Kim JK. 2018. Genome-wide analyses of direct target genes of four
rice NAC-domain transcription factors involved in drought tolerance. BMC genomics. 19:40.
Claramunt S, Cracraft J. 2015. A new time tree reveals Earth history's imprint on the evolution of
modern birds. Sci Adv 1:e1501005.
Cohen JE, Lee PR, Fields RD. 2014. Systematic identification of 3'-UTR regulatory elements in
activity-dependent mRNA stability in hippocampal neurons. Philos Trans R Soc Lond B Biol Sci
369(1652):20130509.
Cruickshank AJ, Gautier J-P, Chappuis C. 1993. Vocal mimicry in wild African Grey Parrots
Psittacus erithacus. Ibis. 135:293-299.
Del Hoyo J, Collar NJ. 2014. HBW and BirdLife International Illustrated Checklist of the Birds of
the World. Volume 1: Non-passerines. Barcelona, España: Lynx Edicions.
Del Hoyo J, Collar NJ. 2016. HBW and BirdLife International Illustrated Checklist of the Birds of
the World. Volume 2: Passerines. Barcelona, España: Lynx Edicions.
Ericson PGP, Anderson CL, Britton T, Elzanowski A, Johansson US, Kallersjo M, Ohlson JI,
Parsons TJ, Zuccon D, Mayr G. 2006. Diversification of Neoaves: integration of molecular
sequence data and fossils. Biol Lett. 2:543-547.
Ericson PGP. 2012. Evolution of terrestrial birds in three continents: biogeography and parallel
radiations. J Biogeogr. 39:813–824.
Ericson PGP, Klopfstein S, Irestedt M, Nguyen JMT, Nylander JAA. 2014. Dating the
diversification of the major lineages of Passeriformes (Aves). BMC Evol Biol. 14:8-8.
Ferdous MM, Bao Y, Vinciotti V, Liu X, Wilson P. 2018. Predicting gene expression from genome
wide protein binding profiles. Neurocomputing. 275:1490-1499.
Field DJ, Benito J, Chen A, Jagt JWM, Ksepka DT. 2020. Late Cretaceous neornithine from
Europe illuminates the origins of crown birds. Nature. 579:397-401.
Gahr M. 2000. Neural song control system of hummingbirds: comparison to swifts, vocal learning
(Songbirds) and nonlearning (Suboscines) passerines, and vocal learning (Budgerigars) and
nonlearning (Dove, owl, gull, quail, chicken) nonpasserines. J Comp Neurol. 426:182-196.
Gill F, Donsker D. 2017. IOC World Bird List (version 7.3). International Ornithologists’ Union
(doi: 10.14344/IOC.ML.7.2).
Goller M, Shizuka D. 2018. Evolutionary origins of vocal mimicry in songbirds. Evol Lett. 2:417-
426.
Hackett SJ, Kimball RT, Reddy S, Bowie RC, Braun EL, Braun MJ, Chojnowski JL, Cox WA, Han
K-L, Harshman J. 2008. A phylogenomic study of birds reveals their evolutionary history. Science.
320:1763-1768.
Hansen J, Sato M, Russell G, Kharecha P. 2013. Climate sensitivity, sea level and atmospheric
carbon dioxide. Phil Trans Royal Soc A: Math Phys Eng Sci. 371:20120294.
Heath TA, Holder MT, Huelsenbeck JP. 2012. A Dirichlet process prior for estimating lineage-
specific substitution rates. Mol Biol Evol. 29:939-55.
Ho Sui SJ, Mortimer JR, Arenillas DJ, Brumm J, Walsh CJ, Kennedy BP, Wasserman WW. 2005.
oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed
genes. Nucl Acids Res. 33:3154-3164.
Hooper DM, Price TD. 2015. Rates of karyotypic evolution in Estrildid finches differ between
island and continental clades. Evol. 69:890-903.
Houde P, Braun EL, Narula N, Minjares U, Mirarab S. 2019. Phylogenetic Signal of Indels and the
Neoavian Radiation. Diversity. 11:108.
Houde P, Braun EL, Zhou L. 2020. Deep-Time Demographic Inference Suggests Ecological
Release as Driver of Neoavian Adaptive Radiation. Diversity. 12(4): 164.
Jarvis ED, Mirarab S, Aberer AJ, Li B, Houde P, Li C, Ho SY, Faircloth BC, Nabholz B, Howard
JT, et al. 2014. Whole-genome analyses resolve early branches in the tree of life of modern birds.
Science. 346:1320-1331.
Jarvis ED, Ribeiro S, da Silva ML, Ventura D, Vielliard J, Mello CV. 2000. Behaviourally driven
gene expression reveals song nuclei in hummingbird brain. Nature. 406:628-632.
Jash A, Yun K, Sahoo A, So JS, Im SH. 2012. Looping mediated interaction between the promoter
and 3' UTR regulates type II collagen expression in chondrocytes. PloS one 7:e40828.
Jetz W, Thomas GH, Joy JB, Hartmann K, Mooers AO. 2012. The global diversity of birds in space
and time. Nature. 491:444.
Johansson US, Ekman J, Bowie RC, Halvarsson P, Ohlson JI, Price TD, Ericson PG. 2013. A
complete multilocus species phylogeny of the tits and chickadees. Mol Phylogenet Evol. 69:852-
860.
Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software version 7:
improvements in performance and usability. Mol Biol Evol. 30:772-780.
Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. 2002. The
human genome browser at UCSC. Genome Res 12:996-1006.
Kielbasa SM, Wan R, Sato K, Horton P, Frith MC. 2011. Adaptive seeds tame genomic sequence
comparison. Genome Res. 21:487-493.
Kimball RT, Wang N, Heimer-McGinn V, Ferguson C, Braun EL. 2013. Identifying localized biases
in large datasets: A case study using the Avian Tree of Life. Mol Phylogenet Evol 69:1021–1032.
Kimball RT, Oliveros CH, Wang N, White ND, Barker FK, Field DJ, Ksepka DT, Chesser RT,
Moyle RG, Braun MJ, et al. 2019. A Phylogenomic Supertree of Birds. Diversity. 11:109.
Kozlov AM, Darriba D, Flouri T, Morel B, Stamatakis A. 2019. RAxML-NG: a fast, scalable and
user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics. 35(21):4453-4455.
Ksepka DT, Stidham TA, Williamson TE. 2017. Early Paleocene landbird supports rapid
phylogenetic and morphological diversification of crown birds after the K–Pg mass extinction. Proc
Nat Acad Sci. USA 114:8047-8052.
Kuramoto T, Nishihara H, Watanabe M, Okada N. 2015. Determining the Position of Storks on the
Phylogenetic Tree of Waterbirds by Retroposon Insertion Analysis. Genome Biol Evol. 7:3180-
3189.
Lee SH, Mayr C. 2019. Gain of Additional BIRC3 Protein Functions through 3'-UTR-Mediated
Protein Complex Formation. Molecular Cell. 74(4):701-712. e9. doi: 10.1016/j.molcel.2019.03.006.
Li W, Godzik A. 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or
nucleotide sequences. Bioinformatics. 22:1658-1659.
Lianoglou S, Garg V, Yang JL, Leslie CS, Mayr C. 2013. Ubiquitously transcribed genes use
alternative polyadenylation to achieve tissue-specific expression. Genes Dev. 27:2380-2396.
Liu WC, Wada K, Jarvis ED, Nottebohm F. 2013. Rudimentary substrates for vocal learning in a
suboscine. Nat Commun. 4:2082.
Livezey BC, Zusi RL. 2007. Higher-order phylogeny of modern birds (Theropoda, Aves:
Neornithes) based on comparative anatomy. II. Analysis and discussion. Zool J Linn Soc. 149:1-95.
Lovell PV, Wirthlin M, Wilhelm L, Minx P, Lazar NH, Carbone L, Warren WC, Mello CV. 2014.
Conserved syntenic clusters of protein coding genes are missing in birds. Genome Biol. 15:565.
Mason NA, Burns KJ, Tobias JA, Claramunt S, Seddon N, Derryberry EP. 2017. Song evolution,
speciation, and vocal learning in passerine birds. Evolution. 71:786-796.
Mayr C. 2016. Evolution and biological roles of alternative 3′ UTRs. Trends Cell Biol. 26:227-237.
Mayr G. 2006. The contribution of fossils to the reconstruction of the higher-level phylogeny of
birds. Species, Phylogeny and Evolution 1:59-64.
Mayr G. 2010. Phylogenetic relationships of the paraphyletic ‘caprimulgiform’ birds (nightjars and
allies). J Zool Syst Evol Res. 48:126-137.
Mayr G. 2014. The origins of crown group birds: molecules and fossils. Palaeontol. 57:231-242.
Mayr G. 2017. Avian Evolution: The Fossil Record of Birds and Its Paleobiological Significance.
Chichester: John Wiley & Sons Ltd.
Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R.
2020. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic
Era. Mol Biol Evol. 37(5):1530-1534.
Mirarab S, Reaz R, Bayzid S, Zimmermann T, Swenson MS, Warnow T. 2014. ASTRAL: Genome-
Scale Coalescent-Based Species Tree Estimation. Bioinformatics. 30:i541–i548.
Nottebohm F, Stokes TM, Leonard CM. 1976. Central control of song in the canary, Serinus
canarius. J Comp Neurol. 165:457-486.
Oliveros CH, Field DJ, Ksepka DT, Barker FK, Aleixo A, Andersen MJ, Alstrom P, Benz BW,
Braun EL, Braun MJ, et al. 2019. Earth history and the passerine superradiation. Proc Natl Acad Sci
USA 116:7916-7925.
Pacheco MA, Battistuzzi FU, Lentino M, Aguilar RF, Kumar S, Escalante AA. 2011. Evolution of
modern birds revealed by mitogenomics: timing the radiation and origin of major orders. Mol Biol
Evol. 28(6):1927-1942.
Peña-Hernández R, Marques M, Hilmi K, Zhao T, Saad A, del Rincon SV, Ashworth T, Roy AL,
Emerson BM, Witcher M. 2015. Genome-wide targeting of the epigenetic regulatory protein CTCF
to gene promoters by the transcription factor TFII-I. Proc Natl Acad Sci USA. 112: E677-E686.
Peng Y, Leung HC, Yiu SM, Chin FY. 2012. IDBA-UD: a de novo assembler for single-cell and
metagenomic sequencing data with highly uneven depth. Bioinformatics. 28:1420-1428.
Penrad-Mobayed M, Perrin C, L'Hote D, Contremoulins V, Lepesant JA, Boizet-Bonhoure B,
Poulat F, Baudin X, Veitia RA. 2018. A role for SOX9 in post-transcriptional processes: insights
from the amphibian oocyte. Sci Rep. 8:7191.
Posada D. 2003. Using MODELTEST and PAUP* to select a model of nucleotide substitution.
Current protocols in bioinformatics. New York. John Wiley & Sons.
Prum RO, Berv JS, Dornburg A, Field DJ, Townsend JP, Lemmon EM, Lemmon AR. 2015. A
comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing. Nature.
526:569-573.
Robinson CM, Snyder KT, Creanza N. 2019. Correlated evolution between repertoire size and song
plasticity predicts that sexual selection on song promotes open-ended learning. Elife. 8:e44454.
Robinson FN, Curtis HS. 1996. The Vocal Displays of the Lyrebirds (Menuridae). Emu - Austral
Ornithol. 96:258-275.
Sackton TB, Grayson P, Cloutier A, Hu Z, Liu JS, Wheeler NE, Gardner PP, Clarke JA, Baker AJ,
Clamp M, Edwards SV. 2019. Convergent regulatory evolution and loss of flight in paleognathous
birds. Science 364:74-78.
Sandberg R, Neilson JR, Sarma A, Sharp PA, Burge CB. 2008. Proliferating cells express mRNAs
with shortened 3' untranslated regions and fewer microRNA target sites. Science. 320:1643-1647.
Sibley CG, Ahlquist JE. 1990. Phylogeny and Classification of the Birds. A Study in Molecular
Evolution: Yale University Press.
Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing
genome assembly and annotation completeness with single-copy orthologs. Bioinformatics.
31:3210-12.
Smith BT, McCormack JE, Cuervo AM, Hickerson MJ, Aleixo A, Cadena CD, Perez-Eman J,
Burney CW, Xie X, Harvey MG, et al. 2014. The drivers of tropical speciation. Nature. 515:406-
409.
Sorenson MD, Balakrishnan CN, Payne RB. 2004. Clade-limited colonization in brood parasitic
finches (Vidua spp.). Syst Biol. 53:140-153.
Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large
phylogenies. Bioinformatics. 30:1312-1313.
Suh A, Smeds L, Ellegren H. 2015. The Dynamics of Incomplete Lineage Sorting across the
Ancient Adaptive Radiation of Neoavian Birds. PLOS Biol. 13:e1002224.
Sun X, Wang X, Tang Z, Grivainis M, Kahler D, Yun C, Mita P, Fenyö D, Boeke JD. 2018.
Transcription factor profiling reveals molecular choreography and key regulators of human
retrotransposon expression. Proc Natl Acad Sci USA 115: E5526–E5535.
Tan-Wong SM, Zaugg JB, Camblong J, Xu Z, Zhang DW, Mischo HE, Ansari AZ, Luscombe NM,
Steinmetz LM, Proudfoot NJ. 2012. Gene loops enhance transcriptional directionality. Science.
338:671-675.
Tavaré S. 1986. Some Probabilistic and Statistical Problems in the Analysis of DNA Sequences.
Lect Math Life Sci. 17:57–86
Tuğrul M, Paixão T, Barton NH, Tkačik G. 2015. Dynamics of Transcription Factor Binding Site
Evolution. PLoS Genet.11(11):e1005639.
Van Tuinen M, Butvill DB, Kirsch JA, Hedges SB. 2001. Convergence and divergence in the
evolution of aquatic birds. Proc R Soc B. 268:1345-1350.
Van Tuinen M, Stidham TA, Hadly EA. 2006. Tempo and mode of modern bird evolution observed
with large-scale taxonomic sampling. Hist Biol. 18:209-225.
Vicario DS. 2004. Using learned calls to study sensory-motor integration in songbirds. Ann N Y
Acad Sci. 1016:246-262.
Wirthlin M, Lima NC, Guedes RLM, Soares AE, Almeida LGP, Cavaleiro NP, de Morais GL,
Chaves AV, Howard JT, de Melo Teixeira M. 2018. Parrot genomes and the evolution of heightened
longevity and cognition. Curr Biol. 28:4001-4008. e4007.
Xiong P, Hulsey CD, Meyer A, Franchini P. 2018. Evolutionary divergence of 3' UTRs in cichlid
fishes. BMC Genomics. 19:433.
Xu L, Peng L, Gu T, Yu D, Yao Y-G. 2019. The 3′UTR of human MAVS mRNA contains multiple
regulatory elements for the control of protein expression and subcellular localization. BBA - Gene
Regul Mech. 1862:47-57.
Yin ZT, Zhu F, Lin FB, Jia T, Wang Z, Sun DT, Li GS, Zhang CL, Smith J, Yang N, et al. 2019.
Revisiting avian 'missing' genes from de novo assembled transcripts. BMC Genomics. 20:4.
Zachos J, Pagani M, Sloan L, Thomas E, Billups K. 2001. Trends, rhythms, and aberrations in
global climate 65 Ma to present. Science. 292:686-693.
Zann R. 1990. Song and call learning in wild zebra finches in south-east Australia. Anim Behav.
40:811-828.
Zelenkov N. 2019. Systematic Position of Palaeortyx (Aves, Phasianidae) and Notes on the
Evolution of Phasianidae. J Paleontol. 53:194-202.
Zhang C, Rabiee M, Sayyari E, Mirarab S. 2018. ASTRAL-III: polynomial time species tree
reconstruction from partially resolved gene trees. BMC Bioinformatics. 19:153.
Zharkikh A. 1994. Estimation of evolutionary distances between nucleotide sequences. J Mol Evol
Zuccon D, Prŷs-Jones R, Rasmussen PC, Ericson PGP. 2012. The phylogenetic relationships and
generic limits of finches (Fringillidae). Mol Phylogenet Evol. 62:581-596.
Figure Legends
Fig. 1: Analysis of tree topology congruency for different non-coding and coding data types (A, B,
C) and taxon-specific sequences in 3’UTRs (D). In A, multiple tree inferences using distinct starting
trees and subsequent refinement by NNI (= Nearest Neighbour Interchange) moves resulted in a
better tree topology congruency (lower Robinson-Foulds distance) for 3’UTR trees (UTR = 3’UTRs
of all species; UTR393 = 3’UTRs including only 7 genomes of which no transcriptomes were
Fig. 2: Order-level phylogeny of the birds resulting from the analysis of 3’UTRs of 224 avian
family-level taxa including 379 genera and 429 species (see Fig. 3A & 3B for all families, Fig. S6
for all species). In contrast to all previous phylogenies spanning the entire avian class, the statistical
support values are high throughout, i.e. the approximate likelihood-based measure of branch
support was maximal (SH-aLRT = 100) in most cases, except for four branching points (red
values). If we reduced the number of missing samples (gappiness) from 110 to 100, the support
levels of these four branching points dropped (blue values) while all others remained maximal. In
case of SH-aLRT values below 100, we provide the support values from IQTREE2 ultrafast
bootstrapping (green values). The tree is subdivided into seven higher-level clades, the Palaeognathae,
the Galloanserae, the Mirandornithes, the Basal Landbirds, the Aquatic & Semiaquatic Birds, the
Fig. 3: A family-level phylogeny of birds based on 3’UTR sequences including all (106) non-
passerine (Fig. 3A) and most (115) passerine (Fig. 3B) family-level taxa. For simplicity, each of the
families is represented by one species, listed as the species name, followed by the family name and
the order name. In Fig. 3A, the family-level taxa of the seven higher-level clades, the
Palaeognathae, the Galloanserae, the Mirandornithes, the Basal Landbirds, the Aquatic &
Semiaquatic Birds, the Higher Landbirds, and the Australaves are shown. The higher-level clades
are color-coded as in Fig. 2. Of the Passeriformes (Fig. 3B), the suborders Acanthisitti (New
Zealand wrens), Tyranni (sub-oscines) and Passeri (oscines or songbirds) are indicated and the
Passeri is subdivided into ten oscine-higher clades (OHCs). The tree was calculated by RAxML-ng
using a large concatenated alignment of 3’UTR residues as input (2,584,785 analyzable patterns;
maximum of 100 or 110 missing taxa, termed gappiness). Approximate likelihood-based measures of branch
support delivered maximal values (SH-aLRT = 100) except those shown in red (for 110-gappiness)
and blue (for 100-gappiness). SH-aLRT values are considered quite conservative. In case of
SH-aLRT values below 100, we also provide support values from IQTREE2 ultrafast bootstrapping
(UFBS, green values). In the few cases where SH-aLRT support was below 80 (two for
110-gappiness; seven for 100-gappiness), the UFBS approach still reached good values of support in the
range of 86 – 99. The timing of the branching points was calculated by DPPDiv. The entire tree
including all 429 species is provided in Fig. S6. Error bars are confidence intervals (95%). Time
scale and divergence times are in million years ago. Diagonal bars indicate the part of the tree that
is not scaled in order to reduce the size of the tree and the PDF.
Fig. 4: The diversification of oscine passerine families (red) contrasts with that of suboscine
passerine families (green) and of non-passerine families (blue) after the early Miocene epoch. The