2020 Kuhl

An unbiased molecular approach using 3’UTRs resolves the avian family-level tree of life.

Kuhl H1,2,3, Frankl-Vilches C1, Bakker A1, Mayr G4, Nikolaus G1, Boerno ST2, Klages S2,
Timmermann B2, Gahr M1.

1: Max Planck Institute for Ornithology, Department Behavioural Neurobiology, Seewiesen,

Downloaded from https://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msaa191/5891114 by guest on 18 August 2020


Germany; 2: Max Planck Institute for Molecular Genetics, Sequencing Core facility, Berlin,
Germany; 3: Leibniz-Institute of Freshwater Ecology and Inland Fisheries, Dept. Ecophysiology
and Aquaculture, Berlin, Germany. 4: Senckenberg Research Institute, Frankfurt am Main,
Germany

Corresponding author: Manfred Gahr. [email protected]

Abstract
Presumably, due to a rapid early diversification, major parts of the higher-level phylogeny of birds
are still resolved controversially in different analyses or are considered unresolvable. To address this
problem, we produced an avian tree of life, which includes molecular sequences of one or several
species of ~ 90% of the currently recognized family-level taxa (429 species, 379 genera) including
all 106 for the non-passerines and 115 for the passerines (Passeriformes). The unconstrained
analyses of noncoding 3-prime untranslated region (3’UTR) sequences and those of coding
sequences yielded different trees. In contrast to the coding sequences, the 3’UTR sequences resulted
in a well-resolved and stable tree topology. The 3’UTR contained, unexpectedly, transcription factor
binding motifs that were specific for different higher-level taxa. In this tree, grebes and flamingos
are the sister clade of all other Neoaves, which are subdivided into five major clades. All non-
passerine taxa were placed with robust statistical support including the long-time enigmatic hoatzin
(Opisthocomiformes), which was found to be the sister taxon of the Caprimulgiformes. The
comparatively late radiation of family-level clades of the songbirds (oscine Passeriformes) contrasts
with the attenuated diversification of non-passeriform taxa since the early Miocene. This correlates
with the evolution of vocal production learning, an important speciation factor, which is ancestral
for songbirds and evolved convergently only in hummingbirds and parrots. Since 3’UTR-based
phylotranscriptomics resolved the avian family-level tree of life, we suggest that this procedure will
also resolve the all-species avian tree of life.

© The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.  
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 
(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the 
original work is properly cited. 
Introduction
The phylogeny of birds has been intensively studied during the last 20 years using anatomical and
molecular data. Several recent molecular approaches, based either on genomes of a limited number
of bird families (Jarvis et al. 2014; Suh et al. 2015) or on a large number of bird families, but only a
selection of molecular sequences (Ericson et al. 2006; Hackett et al. 2008; Prum et al. 2015),
delivered important new insights into the avian tree of life, such as the close relationships of
passerines, parrots and falcons. However, these studies also yielded strongly conflicting results or
had low statistical support for a number of neoavian clades (Prum et al. 2015). This was interpreted
either as a result of a hard-to-resolve ancient diversification of modern birds (Jarvis et al. 2014) or
as a result of incomplete lineage sorting (Suh et al. 2015). In particular, these studies did not contain
all non-passerine families. Furthermore, although oscine passerines, the songbirds, constitute the
majority of the extant avian diversity, previous studies aiming to resolve the entire avian tree of
life have only included limited numbers of oscine taxa in their analyses (Prum et al. 2015; Jarvis et
al. 2014). Broader previous molecular approaches to songbird families were either based on few
selected genes and little sequence information (Barker et al. 2004), or included no or few
non-passerine species (Barker et al. 2004; Oliveros et al. 2019).
Here, we present a family-level avian tree of life that is based on transcriptome sequences or
their genomic orthologs and that involved 214 of the 227 families recognized by both the
International Ornithological Union (IOU) (Gill and Donsker 2017) and the Handbook of the Birds
of the World (HBW) (Del Hoyo and Collar 2014, 2016), which is a widely used standard reference
covering all extant avian species (Tab. S1). In particular, our analysis covers all 106 currently
recognized non-passerine families.
Previous molecular approaches to the avian tree of life resulted in important differences
dependent on the usage of coding nuclear genome sequences, non-coding nuclear genome
sequences, a mixture of both, or the use of mitochondrial genomes (e.g. Hackett et al. 2008; Jetz et
al. 2012; Jarvis et al. 2014; Prum et al. 2015; Reddy et al. 2017). Non-coding sequences seem
favourable to resolve the avian tree of life, since increased taxon sampling had a positive impact on
phylogenetic procedures in the case of non-coding sequences, but not in the case of coding
sequence sampling (Reddy et al. 2017; Houde et al. 2019).
By contrast, we based our phylogeny on transcriptomes, which is a novel approach.
Transcriptomes are composed of coding and non-coding sequences; the latter include 3’
untranslated regions (3’UTRs), 5’ untranslated regions (5’UTRs), non-coding RNAs, and,
unspecifically, few intronic and intergenic sequences. From the sequenced RNA, we produced de
novo assembled transcriptomes and mapped them to the backbone of the recently published canary
genome (Frankl-Vilches et al. 2015). We used the canary genome because it is well annotated for
coding sequences and UTR sequences. For some species (representing their respective families), we
used available genomes in order to extract coding and UTR sequences that were homologous to the
gene models of the canary and many other taxa. Thus, we were able to produce phylogenetic trees
from varying amounts of coding or non-coding sequences in order to evaluate at which taxonomic
level (order, family, genus) these trees differ and which might be the least error-prone and
statistically most stable representation of the avian phylogeny.
3’UTRs are located directly downstream of protein-coding DNA sequences and contain cis-
regulatory elements that control mRNA stability, mRNA expression levels, mRNA localization,
protein-protein interactions and diversification of protein function (Mayr C 2016; Mayr C 2017).
New sequencing technologies and genome-wide analysis via ChIP-on-chip and ChIP-seq showed
that up to 5% of DNA-protein binding sites are located within 3’UTRs (Ferdous et al. 2018;
Stergachis et al. 2013; Burges et al. 2019; Peña-Hernandez et al. 2014; Chung et al. 2018). The
increased and variable length of alternative 3’UTRs, as they are observed in vertebrates, and the
amount and types of binding sites for transcription factors and RNA binding proteins are expected
to promote species-specific tissue-specific gene expression (Sandberg et al. 2008; Lianoglou et al.
2013; Cohen et al. 2014; Mayr C 2017; Lee and Mayr 2019; Xu et al. 2019). In relation to the
3’UTR based tree, we present 3’UTR sequences that appear to be higher-level taxon- (“order-”,
“family-”, “genus-”) specific.
In summary, we constructed several family-level phylogenies through an unbiased approach
in which we included molecular sequences derived from transcriptomes or their genomic orthologs
in a concatenation bioinformatics procedure including all non-passerine families. One of these
phylogenies, the 3’UTR-tree, for the first time shows a stable and highly significant relationship
between all avian orders and their respective non-passerine and passerine family-level taxa.

Results

The non-coding 3’UTR sequences yield a stable molecular tree of avian family-level taxa. The
transcriptomes of 308 species were assembled de novo, clustered, and integrated with publicly
available transcriptomes (n = 80) and orthologous sequences derived from available genomic data
(n = 59 bird species; n = 2 alligator species) and newly generated genomic data (n = 7) (Tab. S1,
Fig. S1). The new genome assemblies provided in this study were sequenced to about 60-fold,
which, although resulting in fragmented genome assemblies (N50 contig size of 10 to 40 kbp, see
Table 2C), were sufficient for whole genome alignment and phylogenetic tree inference.
We performed several tests to estimate the quality of each de novo transcriptome assembly
(Tab. S2). In summary, the median N50 transcript length of the transcriptomes was 2698 ± 811 bp
(mean ± SD) and the median of complete BUSCO genes (Aves data set) was 53.5% ± 19.4% (mean
± SD). The median numbers of nucleotides aligned to the reference genome’s 3’UTRs and coding
regions were 7.9 ± 3.6 Mbp and 9.0 ± 3.7 Mbp, respectively (Tab. S2). We found differences
between tissue types used for RNA extraction. Transcriptomes from brain exhibited highest
numbers of nucleotides aligned to 3’UTR and CDS of the reference genome (13.6 Mbp and
14.0 Mbp, respectively) (Fig. S2A; Tab. S2). Miscellaneous tissue types that were termed “body”
were neither brain nor blood samples and mainly represent samples from museum specimens.
Statistically, the transcriptome size of brain tissue was similar to skin tissue, while blood, liver, skin
and muscle tissue had similar transcriptome sizes (see Fig. S2A, Tab. S2D for statistical data).
Since brain samples were scattered across 11 orders and since all but four families (four oscines)
were represented by multiple tissues, it is highly unlikely that tissue type affected the phylogenies.
In particular, due to the use of a gappiness criterion (see below) and the necessary alignment in a
phylogenetically basal species (the ostrich, Struthio camelus), transcripts that are expressed in only
some species or higher-level taxa or are expressed in only certain tissue types were filtered out of
the data set used for the calculation of the phylogenetic trees. The aligned portions of tissue specific
transcriptomes remaining after applying the gappiness criterion of 90 to 110 missing samples were
very similar in size (Tab. S2E; Fig. S2B for statistical data).
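The N50 transcript length reported above as an assembly quality measure can be illustrated with a minimal Python sketch. It uses the common definition of N50 (the length L such that sequences of length L or longer contain at least half of all assembled bases); the function name and the example lengths are hypothetical and are not taken from the study's pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total number of bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical transcript lengths (bp), for illustration only.
example_lengths = [500, 1200, 2500, 3000, 4000]
example_n50 = n50(example_lengths)
```

Because N50 is weighted by sequence length, a few long transcripts can dominate the value, which is one reason why it is usually reported together with a completeness measure such as BUSCO.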
The 452 avian assemblies, which represent 429 species (389 genera) of 221 avian family-
level taxa (214 recognized by IOU) and all 35 currently recognized orders were aligned to the
canary reference genome (Fig. S1) and divided into coding and non-coding sequences (intergenic,
intronic, 5’UTR, 3’UTR). Due to the nature of transcriptomes, 3’UTRs represented more than 90%
of all sequences in our non-coding multiple alignments and provided numbers of sequences
comparable to the codon aligned coding sequences alignments. Thus, we compared molecular trees
based on non-coding sequences (3’UTR), on coding sequences (CODON), on the first and second
nucleotide of a codon removing the highly variable third position of the codons (CODON12), and
based on the corresponding translated amino acid sequence (AAS).
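As an illustration of how the CODON12 and AAS data types relate to a codon-aligned coding sequence, the following Python sketch drops every third codon position and translates codons with the standard genetic code. It is a simplified sketch, not the procedure used in the study: the function names are hypothetical, and the handling of gaps (an all-gap codon becomes "-", any other unknown codon becomes "X") is an assumption.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon12(aligned_cds):
    """Keep the first and second position of each codon, drop the third."""
    return "".join(aligned_cds[i:i + 2] for i in range(0, len(aligned_cds), 3))


def translate(aligned_cds):
    """Translate a codon-aligned sequence into amino acids.

    Assumption: '---' maps to '-', any other unrecognised codon to 'X'.
    Trailing bases that do not form a full codon are ignored.
    """
    protein = []
    usable = len(aligned_cds) - len(aligned_cds) % 3
    for i in range(0, usable, 3):
        codon = aligned_cds[i:i + 3].upper()
        if codon == "---":
            protein.append("-")
        else:
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


# Hypothetical codon-aligned fragment, for illustration only.
fragment = "ATGGCC---TTA"
fragment_codon12 = codon12(fragment)
fragment_aas = translate(fragment)
```

For the hypothetical fragment, `codon12` keeps eight of the twelve alignment columns and `translate` yields four amino acid columns, which shows how the three coding data types differ in length and information content.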
Of these trees, the concatenated 3’UTR alignment delivered the most stable tree topologies
of the included species based upon repeated subsampling of molecular data (different cut-off levels
of allowed-missing data in alignment columns) and repeated tree calculations using different
starting trees (Fig. 1A, 1B). 3’UTR sequences either derived from transcriptomes and few genomes
or derived only from transcriptomes resulted in a much higher congruency of repeated tree
calculations with varying starting trees as compared to CDS (AAS, CODON, CODON12)
sequences (Fig. 1A). Besides separation of different sequence types (Fig. 1A; Fig. S5), we found
that it is highly important to identify a suitable trade-off for allowed-missing data and total
alignment length. This is especially true for transcriptome datasets in which some mRNAs were not
detected or not expressed, so that not allowing gaps would result in losing most of the aligned data.
For repeated tree inferences from alignments with distinct gappiness (defined as: filtering alignment
patterns with a maximum allowed amount of gap characters = gap cut-off), tree convergence
reached an optimum between 90 and 110 missing samples (Fig. 1B, green line). Fewer fluctuations
were observed in that gappiness range (Fig. 1B, red line) when comparing the congruency of
average trees of “neighbouring alignments” [average tree (gap cut-off_{n-1}) versus average tree (gap
cut-off_n)]. Interestingly, the rate of change of average per site likelihood scores between distinct
alignments predicted an optimal gappiness quite well (Fig. 1B, blue line minimum at gap cut-off 100
[100 gappiness]), and it was computationally very efficient to calculate since it required just a single
inference instead of multiple inferences per alignment. Thus, we used a gap cut-off (gappiness) of
at most 100 or 110 missing samples for calculations of phylogenetic trees (see Fig. 2, Fig. 3A,
3B).
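The gap cut-off described above can be sketched as a column filter on a multiple alignment. The Python example below keeps only alignment columns in which the number of samples with a gap character does not exceed the cut-off. It is a minimal sketch under stated assumptions: the alignment is held in a dictionary of equal-length strings, only "-" counts as missing, and the function name is hypothetical. The study itself describes filtering alignment patterns, which may differ in detail.

```python
def filter_columns_by_gap_cutoff(alignment, gap_cutoff, gap_chars="-"):
    """Keep alignment columns with at most `gap_cutoff` missing samples.

    alignment: dict mapping sample name -> aligned sequence (equal lengths).
    gap_cutoff: maximum number of samples allowed to have a gap in a column.
    """
    sequences = list(alignment.values())
    if not sequences:
        return {}
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")
    kept_columns = [
        col for col in range(length)
        if sum(seq[col] in gap_chars for seq in sequences) <= gap_cutoff
    ]
    return {
        name: "".join(seq[col] for col in kept_columns)
        for name, seq in alignment.items()
    }


# Hypothetical four-sample alignment, for illustration only.
toy_alignment = {
    "sample_a": "ACG-T",
    "sample_b": "A-G-T",
    "sample_c": "AC--T",
    "sample_d": "ACGTT",
}
filtered_strict = filter_columns_by_gap_cutoff(toy_alignment, gap_cutoff=0)
filtered_relaxed = filter_columns_by_gap_cutoff(toy_alignment, gap_cutoff=1)
```

In the toy data, a cut-off of 0 retains two columns and a cut-off of 1 retains four, which mirrors the trade-off between allowed-missing data and total alignment length discussed in the text.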
In addition to tree inference from concatenated alignments, we also calculated the species
tree using a coalescence approach (Mirarab et al. 2014) including up to 5,127 trees derived from
gene specific 3’UTR sequences. The coalescent 3’UTR based avian species tree resulted in nine
higher-level clades (Fig. S4D). The order-level and family-level relationships of both bioinformatic
procedures converged for some clades, in particular the Galloanserae and the Australaves (Fig. 2;
Fig. S4D). However, other clades of the coalescent species tree, in particular the relationships of
family-level and order-level taxa comprised in the Mirandornithes (Sangster 2005), the Basal
Landbirds, the Aquatic & Semiaquatic Birds of the concatenated tree (clades 4-6 of Fig. 2) were
disarranged in ways (clades 4-7 of Fig. S4D) that are not compatible with morphological and
molecular evidence. We think that this is due to many very short 3’UTR alignments for single
genes, which fail to generate 3’UTR gene trees of sufficient quality to be used in the coalescent
approach. This shortcoming of the coalescent approach in combination with 3’UTRs might be
solved in the future by statistical binning methods. From these data, we argue that a concatenation
based inference method is better suited than a coalescent approach to resolve the entire avian tree of
life using 3’UTR sequences. The coalescent approach with coding sequence data was not
considered, since it did not deliver a comprehensive tree before (Jarvis et al. 2014).
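To make the contrast with the gene-tree based coalescent approach concrete, the following Python sketch shows the general idea of concatenation: per-gene alignments are joined into one supermatrix, and taxa that lack a gene are padded with gap characters. This is a generic illustration, not the pipeline used in the study; the function name, the dictionary layout and the taxon labels are hypothetical.

```python
def concatenate_alignments(gene_alignments, missing_char="-"):
    """Join per-gene alignments into a single supermatrix.

    gene_alignments: list of dicts, each mapping taxon -> aligned sequence.
    Taxa absent from a gene are filled with `missing_char` for that gene.
    """
    taxa = sorted({taxon for gene in gene_alignments for taxon in gene})
    parts = {taxon: [] for taxon in taxa}
    for gene in gene_alignments:
        lengths = {len(seq) for seq in gene.values()}
        if len(lengths) != 1:
            raise ValueError("each gene alignment needs equal-length sequences")
        gene_length = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(gene.get(taxon, missing_char * gene_length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical 3'UTR alignments for two genes, for illustration only.
gene_1 = {"canary": "ACGT", "ostrich": "ACGA"}
gene_2 = {"canary": "TT", "penguin": "TA"}
supermatrix = concatenate_alignments([gene_1, gene_2])
```

A short single-gene alignment such as `gene_2` carries little signal on its own, whereas in the concatenated matrix its columns contribute to one joint inference. The padding also shows why a missing-data filter becomes important once many genes and taxa are combined.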
In summary, by careful selection of sequence datatype (3’UTR) and content of missing data
(gappiness) we found identical best log-likelihood scoring tree topologies, if using fast approximate
or exhaustive tree inference methods and the concatenation method. The relatively conservative
SH-aLRT branch support (values >80 are typically considered as high support) was maximal (SH-
aLRT = 100) for most of the branches (98.2% in case of 110 gappiness; 97.1% in case of 100
gappiness) and was fully supported by ultrafast bootstrapping (UFBS = 100) for 93.6% of the
branching points (Fig. 3; Fig. 3A, 3B). In the few cases where SH-aLRT support was below 80 (n = 2
[0.5%] for 110 gappiness; n = 7 [1.7%] for 100 gappiness), UFBS still reached good values of
support in the range of 86 – 99 (Fig. 2; Fig. 3A, 3B). Only the branch leading to the Strigiformes in
the Higher Landbirds had weaker support values for both SH-aLRT (88 for 100 gappiness, 32 for
110 gappiness) and UFBS (32). Concerning well-established clades, such as the Charadriiformes
and the Piciformes, our 3’UTR family-level topologies agreed entirely or largely with previous
studies that focused on these clades (e.g. Baker et al. 2007; Winkler 2015). At the lower taxonomic
level, in each case in which several species of a well-known family were sequenced, these species
were correctly assigned to that family (e.g. Tinamidae, Trochilidae, Picidae, Paridae, Estrildidae,
Fringillidae; Fig. S6). Similarly, species of the same genus were in each case correctly assigned to
the expected genus in the 3’UTR tree (e.g. genus Podiceps of the Podicipediformes, Columba of
Columbiformes, Charadrius of Charadriiformes, Falco of Falconiformes, Uraeginthus and
Amadina of Passeriformes; Fig. S6). In all these cases, statistical support was 100%.
We therefore conclude that, given sufficiently large datasets of non-coding 3’UTR sequences
in terms of number of taxa and alignment length (Fig. 1A, 1B), RAxML’s fast approximate method
enables computationally efficient phylogenomics, even for difficult-to-resolve phylogenies such as
the avian tree of life (Fig. 2; Fig. 3A, 3B; Fig. S6).

Comparison of the 3’UTR based tree and trees based on coding sequences. The relationships of
many higher-level taxa in our 3’UTR tree differed from those of the coding sequence trees (Fig.
S4A-S4C). In particular, the coding trees resulted in unlikely relationships of certain higher-level
taxa and did not support monophyly of several currently recognized higher-level taxa. For example,
in the CODON tree, the Caprimulgiformes were split into distantly related sub-groups, and parrots
(Psittaciformes) were moved away from the Passeriformes, which resulted as the sister taxon of the
mousebirds (Coliiformes) (Fig. S4A). In the CODON, CODON12 and the AAS trees the
Falconiformes and Cariamiformes were moved away from the Passeriformes and Psittaciformes to
form an assemblage of birds of prey, embedded deeply in the phylogeny (Fig. 2, Fig. S4A, S4B). In
the AAS tree, even the Strigiformes were enclosed in the birds of prey assemblage (Fig. S4C). By
contrast, in all recent molecular approaches (e.g. Kimball et al. 2013; Jarvis et al. 2014; Prum et al.
2015) including our 3’UTR tree (Fig. 2, Fig. 3A), Passeriformes, Psittaciformes and Falconiformes
are closely related and part of the taxon Australaves (Ericson 2012). This latter clade obtained
strong support from a previous molecular phylogeny (Suh et al. 2011) and the sister group
relationship of Psittaciformes and Passeriformes also conforms with paleontological data (Mayr G
2017). In the CODON12 and AAS tree, the Coliiformes were the closest relatives of the
Passeriformes and Psittaciformes (Fig. S4B, S4C). Although there is some anatomical support for
such a relationship (Berman and Raikow 1982), the Coliiformes were grouped with the
Trogoniformes and other Higher Landbird clades in a phylogeny that was based on an analysis of a
large number of morphological characters (Livezey and Zusi 2007). The type of sequences (non-
coding versus coding sequences) for tree construction had no impact on the relationships of higher-
level taxa at the base of the trees, i.e. on the basal position of Palaeognathae and Galloanserae (Fig.
S4), even though noncoding and coding trees differed in the interrelationships of some taxa within
the Palaeognathae (data not shown).
To analyze the influence of mixing 3’UTR and CDS on the resulting tree topology, we used mixed
sequence data that contained defined amounts of coding and 3’UTR sequences. These data resulted in
topologies that differ from both pure CDS and pure 3’UTR topologies (Fig. 1C).
Interestingly, adding relatively low amounts (e.g. 20%) of 3’UTR to CDS had a strong impact on
the resulting topologies, while adding low amounts of CDS (e.g. 20%) to the 3’UTR had much
lower impact on the resulting tree (Fig. 1C). 3’UTR sequences seem to contain a stronger
phylogenetic signal than CDS. Thus, mixing the percentage of 3’UTR and CDS in evolutionary
models (Jarvis et al. 2014) is an arbitrary procedure, since there is no linear relation between the
amount of either sequence type and the 100% model for 3’UTRs and CDS, respectively (Fig. 1C).
The difference of our 3’UTR tree as compared to trees derived from combined coding and
noncoding sequence data is due to sequence type and taxa sampling (Fig. S5). These results agree
with the conclusions drawn from a re-analysis of previous molecular phylogenies (Jarvis et al. 2014;
Prum et al. 2015) by Reddy et al. (2017).

In summary, at the level of composition of higher taxa, the CODON tree (Fig. S4A), the
CODON12 tree (Fig. S4B), and the AAS tree (Fig. S4C) differed considerably from the 3’UTR tree
(Fig. 2) and from currently accepted relationships of avian higher-level taxa.

The 3’UTRs contain motifs specific for higher-level and lower-level clades. To identify signals
specific to higher-level clades that are present in 3’UTRs we compared such sequences from the
Caprimulgiformes, Charadriiformes, and selected subclades of the Passeriformes. The analysis of
putative binding sites of RNA binding proteins and of micro RNAs did not show taxon-specific
patterns. However, we found that the presence of putative transcription factor binding sites (TFBS)
differed between higher-level clades, and between clades within the Passeriformes (Fig. 1D): Z-
score analysis of the abundance of TFBS in 3’UTRs of 97 randomly selected transcribed genes
showed high similarity between the closely related estrildid and fringillid songbird families (both in
clade OHC10B), lower similarity between estrildid species and species of basal songbird families
(in clade OHC1-OHC3), and even lower similarity with charadriiform and caprimulgiform species,
as expected from their phylogenetic distance (Fig. 1D; see Supplemental Information for discussion
of oscine higher-level clades (named OHCs)).
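The Z-score comparison of TFBS abundance can be illustrated with a small Python sketch. The text does not specify the exact computation, so everything here is an assumption: motif counts are standardized per motif across taxa (population standard deviation), and two taxa are then compared by the Pearson correlation of their Z-score profiles. All names and counts are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and SD 1 (all zeros if SD is 0)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def zscore_profiles(counts_by_taxon):
    """Return per-taxon profiles of motif Z-scores computed across taxa."""
    taxa = list(counts_by_taxon)
    n_motifs = len(counts_by_taxon[taxa[0]])
    profiles = {taxon: [] for taxon in taxa}
    for motif in range(n_motifs):
        column = [counts_by_taxon[taxon][motif] for taxon in taxa]
        for taxon, z in zip(taxa, zscores(column)):
            profiles[taxon].append(z)
    return profiles


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if undefined)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


# Hypothetical TFBS counts for three motifs in three taxa.
tfbs_counts = {
    "taxon_a": [10, 2, 5],
    "taxon_b": [9, 3, 5],
    "taxon_c": [1, 8, 5],
}
profiles = zscore_profiles(tfbs_counts)
similarity_ab = pearson(profiles["taxon_a"], profiles["taxon_b"])
similarity_ac = pearson(profiles["taxon_a"], profiles["taxon_c"])
```

In this toy data set, the two taxa with similar motif counts obtain a high profile similarity, whereas the taxon with a different motif composition obtains a negative one. Other similarity measures, such as Euclidean distance between profiles, would be equally plausible choices.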
Furthermore, we analysed the pattern of TFBS in detail within family-level taxa of which we
had multiple species belonging to at least three genera. As an example, we present the pattern of
TFBS in the 3’UTR of the gene EMC1 of the Spheniscidae, the penguins (Fig. S7). The presence of
TFBS in that 3’UTR shows a family-specific signature (Fig. S7A) as well as genus-specific signatures
for each of the three included genera, that is, Aptenodytes (Fig. S7B), Eudyptes (Fig. S7C) and
Pygoscelis (Fig. S7D). The combinatorial pattern of the TFBS distinguished the genera.
Comparable results were obtained for songbird families (Estrildidae, Fringillidae) of which we had
multiple species of three genera (data not shown). Obviously, a complete representation of all 19
penguin species would be desirable to further solidify that result. Although we did not analyze the
TFBS in the 3’UTRs of all taxa included in our study, the analysis nevertheless suggests that the
presence of the TFBS is taxon-specific at the order, family and genus level. These taxon-specific
sequences likely represent the evolutionary signals extracted by our bioinformatic procedure used
to construct the 3’UTR avian tree of life.

The higher-level (order-level) avian tree of life (Fig. 2). The 3’UTR based tree of life resolved the
relationship of all avian orders including the Opisthocomiformes (hoatzins) with good statistical
support. In that phylogeny, extant birds fall into 7 major clades (Fig. 2). Clade 1 represents the
Palaeognathae and Clades 2-7 encompass the Neognathae, which are subdivided into the
Galloanserae (landfowl and waterfowl; Clade 2) and the Neoaves (Clades 3-7) (Fig. 2). Among the
Neoaves, Clade 3 includes the Mirandornithes, the flamingos and grebes, Clade 4 represents the
“Basal Landbirds”, Clade 5 encompasses the “Aquatic and Semiaquatic Birds”, Clade 6 is the
“Higher Landbird Clade”, and Clade 7 represents the Australaves (Ericson 2012; Kimball et al.
2013) (Fig. 2). Four of the 35 order-level relationships were sensitive to the amount of data: if we
decreased the gappiness from 110 to 100 missing samples, the statistical support values of these four
branching points dropped from {99%, 99%, 93%, 89%} support to {88%, 32%, 88%, 83%}
support, but stayed 100% for all other branching points (Fig. 2). In particular, the relationship of the
Strigiformes (SH-aLRT: 88 and 32, respectively; UFBS: 69) requires further attention.
It should be noted that the composition and interrelationships of Clades 3-6 differ
substantially from previous phylogenies (Hackett et al. 2008; Jetz et al. 2012; Jarvis et al. 2014;
Prum et al. 2015) as discussed below for family-level taxa. Thus, trivial names (Basal Landbirds,
Higher Landbirds, Aquatic & Semiaquatic Birds) used in previous publications and in the current
paper comprise different sets of bird order- and family-level taxa. Nevertheless, the interfamilial
relationships within some higher-level subclades were similar between the present study and
previous reports (Prum et al. 2015; Jarvis et al. 2014) (see below).
In the Neoaves (Clades 3-7; Fig. 3A), the Mirandornithes (Clade 3; Fig. 3A), are the sister
taxon of all other taxa, which is in contrast to all previous molecular trees (e.g. Ericson et al. 2006;
Hackett et al. 2008; Jarvis et al. 2014; Prum et al. 2015). In previous works, either a clade
composed of Mirandornithes, Pterocliformes, Mesitornithiformes and Columbiformes (Jarvis et al.
2014) or the Charadriiformes (Prum et al. 2015) were suggested to be the sister group of all other
Neoaves. A close relationship between Phoenicopteriformes (flamingos) and the Podicipediformes
(grebes) is also supported by morphological data (Mayr et al. 2004).

The avian family-level tree of life. We included all 106 currently recognized non-passerine
families and 90% (115) of the passerine family-level clades, which significantly increased the taxon
sampling compared to previous comprehensive phylogenies (Hackett et al. 2008: 93 non-passerine,
24 passerine family-level clades; Jarvis et al. 2014: 39 non-passerine, 2 passerine family-level
clades; Prum et al. 2015: 91 non-passerine, 31 passerine family-level clades; Oliveros et al. 2019: 5
non-passerine, 125 passerine family-level clades). Since adding families impacts the entire tree, a
phylogeny missing many family-level taxa is unlikely to maintain its higher-level topology, if
further families were included in the phylogenetic analysis. Due to the low amount of sequence data
included in the species-rich study of Hackett et al. (2008) and due to the low number of families in
the sequence-rich study of Jarvis et al. (2014), we restrict the following comparisons mainly to the
Prum et al. (2015) study for non-passerines and to the Oliveros et al. (2019) phylogeny for
passerines. For more detailed considerations of the interfamilial relationships than those presented
below we refer to the Supplemental Discussion.
Within the Palaeognathae (Clade 1) and the Galloanserae (Clade 2), the interrelationships of
the family-level taxa (Fig. 3A) agree with the phylogeny of Prum et al. (2015). Interestingly, among
the Palaeognathae, these relationships differ from those reported on the basis of conserved
noncoding elements and a coalescent inference procedure, in which rheas are the sister group of
cassowaries, emus and kiwis (Sackton et al. 2019). The differences are due to the tree inference
procedure (see Supplementary Information). Furthermore, we confirm the interrelationships of the
seabird subclade of the water bird group of Prum et al. (2015), here informally named Aquatic &
Semiaquatic Birds (Clade 5), but not its sister group relationship to the Caprimulgiformes and
Mirandornithes (Fig. 3A).
A major difference from the higher-level clades recognized before (Prum et al. 2015) concerns
Clade 4 (Fig. 3A) of our tree, the Basal Landbirds, which comprises two subclades. One of these
(Fig. 3A, Clade 4A) includes Charadriiformes (shorebirds and allies) and Gruiformes (cranes and
allies), whereas the other subclade (Fig. 3A, Clade 4B) unites the Musophagiformes (turacos),
Otidiformes (bustards), Mesitornithiformes (mesites), Pterocliformes (sandgrouse), Columbiformes
(doves) and Cuculiformes (cuckoos) on the one hand, and Opisthocomiformes (hoatzins) and
Caprimulgiformes on the other. In the phylogeny of Prum et al. (2015), by contrast, the early
diverging landbirds were subdivided in three higher-level clades and Charadriiformes resulted in a
clade that also contained the aquatic and semiaquatic birds. Furthermore, the interrelationships of
Columbiformes, Cuculiformes, Otidiformes, Musophagiformes, Mesitornithiformes and
Pterocliformes within the Basal Landbirds differ from previous works (Hackett et al. 2008; Prum et
al. 2015). In further contrast to the results of Prum et al. (2015), the Caprimulgiformes are not the
sister group of all other Neoaves, but are the sister group of the Opisthocomiformes (hoatzin) in our
study. Previous works suggested closest relationship of the hoatzin with varying groups, such as
Pelecaniformes (Hackett et al. 2008), Gruiformes and Charadriiformes (Jarvis et al. 2014), or placed
it in an isolated clade somewhere between aquatic and semiaquatic birds and birds of prey (Prum et
al. 2015).
The interrelationships of the taxa of our Higher Landbirds (Clade 6) agree with the tree
topology of Prum et al. (2015) and correspond to the Afroaves group of Jarvis et al. (2014) –
although the latter study missed many of the included families. Clade 7 (Fig. 3A) consists of the
Australaves (Suh et al. 2011; Hackett et al. 2008) that include the seriemas (Cariamiformes), the
falcons (Falconiformes), the parrots (Psittaciformes) and the passerines (Passeriformes), which is
also consistent with previous phylogenetic results (Hackett et al. 2008; Suh et al. 2011, Jarvis et al.
2014; Prum et al. 2015).
In the Passeriformes, the 92-103 studied (the count depends on the classification of IOU or
HBW, see Methods) family-level taxa of songbirds (Oscines) fall into ten higher-level clades (Fig.
3B; Fig. S6; Supplemental Information). The earliest divergence (Fig. 3) is represented by the
lyrebirds (Menuridae) and the evolutionarily youngest clades include most taxa previously
summarized as the Passerida (Sibley and Ahlquist 1990; Barker et al. 2004). The present passerine
phylogeny differs widely from all previous ones in the composition and interrelationships of the
oscine higher-level clades (OHCs; Fig. 3) (e.g. Barker et al. 2004; Aggerbeck et al. 2014), with the
exception of the recent study of Oliveros et al. (2019), which was based on ultra-conserved
molecular elements. Since the Oliveros-tree of oscine family-level taxa (Oliveros et al. 2019) and
that part of our tree are very similar, these trees might converge on the true phylogeny of songbird
families. The minor differences between the passerine part of the present tree and the Oliveros-tree
might be due to the use of different genera. Despite low SH-aLRT support values, the corvid clades
had high UFBS support (85-99) (Fig. 3B); the future addition of further genera, in particular of the
corvid lineages, might solve the ambiguities between the Oliveros-tree and the present tree. The
strong differences to other passerine trees (e.g. Barker et al. 2004; Jetz et al. 2012; Aggerbeck et al.
2014; Claramunt and Cracraft 2015) are likely due to their lower sampling of families, the overall
number of sequences analysed, and/or the use of mitochondrial and nuclear coding sequences.
Further details of interfamilial relationships of oscine families are discussed in the Supplemental
Information. Here we just mention that our data support the removal of the taxon Hylia from the
Scotocercidae into its own family-level taxon Hyliidae (Bates 1930; Fregin et al. 2012) and that the
split of Erythrocercidae, Scotocercidae and Cettiidae altogether needs reconsideration (Fig. S3,
Fig. S6 and Supplemental Information).

Time calibration of the family-level phylogeny. We used DPPDiv (Heath et al. 2012) for time
calibration of our family-level phylogeny. DPPDiv uses the Dirichlet process prior (DPP) model or
the uncorrelated gamma-distributed rates (UGR) model. Although these two models yielded broadly
congruent divergence dates for many clades (difference between models: 2.4 ± 4.7 My [mean ±
SD]), they show various differences in detail and none of the results is entirely congruent with the
fossil record (Tab. S4). In general, the UGR model (Fig. 2 & 3) led to divergence times of families
that showed less conflict with time-calibrated fossil data as compared to the use of the DPP model.
E.g., the estimated divergence time of Galliformes and Anseriformes of 62.5 Mya fits well with a
recently reported Mesozoic fossil (66.7 Mya) that is close to the last common ancestor of
Galloanserae (Field et al., 2020) (Tab. S4). Furthermore, for phasianine and odontophorine
Galliformes, the calculated divergence time is 37 Mya, while the earliest record of a galliform
belonging to the clade Odontophorinae + Phasianinae, the taxon Palaeortyx, stems from the early
Oligocene, some 32 Mya (Million years ago) (Mayr G 2017; Zelenkov 2019). The divergence dates
of the UGR model also conform with the fossil record of crown group Procellariiformes,
Gruiformes and Accipitriformes, with fossils of the procellariiform Diomedeidae, the gruiform
Rallidae and the accipitriform Pandioninae having been described from the early Oligocene, some
32-34 Mya (Mayr G 2009; Mayr G 2017). For Mirandornithes, by contrast, the calculated
divergence date of 46 Mya for Podicipediformes and Phoenicopteriformes distinctly predates their
earliest known fossils, the earliest fossil Podicipediformes being from the late Oligocene/earliest
Miocene (about 20 Mya), and the earliest Phoenicopteriformes being from the early Oligocene (32
Mya; Mayr G 2017) (Tab. S4). This suggests substantial ghost lineages for both Podicipediformes
and Phoenicopteriformes. However, most calculated branching points come with large confidence
intervals (Fig. 3A, Fig. S6). Nevertheless, the overall disparity of fossil and molecular age
determinations is rather low, being 9-11 My, if all fossil data are considered (Tab. S4).
These discrepancies are likely due to (1) the limited fossil record of certain clades (Mayr G
2017), (2) the existence of clades with a single extant species, which does not allow molecular
dating of the diversification of the crown group (van Tuinen et al. 2006), or (3) the limited species
sampling and large confidence intervals of some molecular age determinations.
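The comparisons in this paragraph reduce to subtracting the oldest fossil age from the molecular divergence estimate. The following minimal Python sketch uses only the values quoted above; the dictionary layout and the helper name `gap_my` are illustrative assumptions and are not part of the DPPDiv analysis. A positive gap means that the molecular date predates the earliest fossil, i.e. an implied ghost lineage.

```python
# Molecular divergence estimates (UGR model) and oldest fossil ages, in Mya,
# as quoted in the text. The (molecular, fossil) tuple layout is an assumption.
ESTIMATES = {
    "Galliformes/Anseriformes": (62.5, 66.7),
    "Odontophorinae/Phasianinae": (37.0, 32.0),
    "Podicipediformes": (46.0, 20.0),
    "Phoenicopteriformes": (46.0, 32.0),
}


def gap_my(molecular_mya, fossil_mya):
    """Molecular minus fossil age in My (positive = implied ghost lineage)."""
    return round(molecular_mya - fossil_mya, 1)


gaps = {clade: gap_my(mol, fos) for clade, (mol, fos) in ESTIMATES.items()}

for clade, gap in gaps.items():
    print(f"{clade}: {gap:+.1f} My")
```

For the grebes and flamingos the sketch gives gaps of 26 and 14 My, respectively, which are the ghost lineages referred to above.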
The time-calibrated phylogeny (Fig. 2, Fig. 3A & 3B, Fig. S6) shows divergence dates for
Palaeognathae and Neognathae (94 Mya) and Galloanserae and Neoaves (81 Mya) that are much
earlier than those suggested previously (Prum et al. 2015). The divergence dates of the
Mirandornithes and those of the Basal Landbirds and of the Aquatic & Semiaquatic Bird lineages
precede the K-Pg boundary (Fig. 2). The initial divergences within many other neoavian lineages
occurred about ten My (Million years) after the K-Pg boundary, in the Eocene. Although an early
Cenozoic neoavian radiation is strongly supported by fossil data (Ksepka et al. 2017; Mayr G 2014,
2017), the confidence intervals of our molecular divergence times (Fig. 3A) do not exclude the
possibility that some Neoaves lineages evolved before the K-Pg mass extinction as suggested before
in molecular phylogenies (Houde et al., 2020; Suh et al., 2015). This is also suggested by the
occurrence in the earliest Cenozoic of stem group representatives of various only distantly related
and deeply neoavian taxa, such as penguins and owls (Mayr G 2014). During the Oligocene epoch,
a second major diversification event occurred, which concerned both non-passerine and passerine
families (50 families of 12 orders), the highest diversification rate of new family-level clades (3.0
for non-passerine and 2.0 for passerine family-level clades/My) taking place between 35-25 Mya
during the Rupelian and Chattian stages of the Oligocene (Fig. 4). This pattern of divergence times
is reminiscent of that previously suggested by van Tuinen et al. (2006).
The third major diversification event concerned mainly passerine family-level clades, having
a peak 25-15 Mya in the Aquitanian and Burdigalian stages of the early Miocene (1.6 for non-
passerine, 7.1 for passerine families/My) (Fig. 4). This passerine diversification during the early
Miocene (71 new family-level passerine clades; for diversification times see Fig. 3, Fig. S6)
contrasts with the diversification of non-passerine taxa (16 new family-level clades; for
diversification times see Fig. 3, Fig. S6) (Fig. 4). Since the middle Miocene (15 Mya), 22 extant
passerine family-level clades (all of the oscine passerines), but only two extant non-passerine
family-level clades, both of the Charadriiformes, evolved. The diversification times of oscine
family-level taxa agree with those estimated by Oliveros et al. (2019), but are much later than those
calculated in previous studies using coding sequences (Ericson et al. 2014). Thus, although our
phylogeny includes only passerine clades that survived until today, there may have been a strong
negative impact of the passerine radiation on the evolution of new clades in most other higher-level
(ordinal) bird taxa. Alternatively, non-passerine family-level taxa might have radiated earlier than
the oscine families and achieved optimal family-level diversity before the Miocene. Remarkably,
however, family-level taxa that underwent speciation during the last ten My are as common among
oscine passerines as they are among non-passerines: Of family-level taxa that were represented with
more than one species in our study, 54% (22 of 41) of the non-passerine and 58% (21 of 36) of the
passerine clades underwent considerable diversification (Fig. S6).
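The rates and percentages quoted in the preceding paragraphs follow from simple ratios. The short Python sketch below recomputes them from the counts given in the text, assuming that each diversification window spans ten My; the function name `rate_per_my` is a hypothetical helper introduced for illustration.

```python
def rate_per_my(new_clades, start_mya, end_mya):
    """New family-level clades per million years within a time window."""
    return new_clades / (start_mya - end_mya)


# Early Miocene window (25-15 Mya), clade counts as quoted in the text.
passerine_rate = rate_per_my(71, 25, 15)
non_passerine_rate = rate_per_my(16, 25, 15)

# Oligocene window (35-25 Mya): the quoted rates of 3.0 and 2.0 clades/My
# over ten My correspond to the quoted total of 50 families.
oligocene_total = (3.0 + 2.0) * (35 - 25)

# Share of multi-species family-level taxa that diversified in the last ten My.
non_passerine_share = round(100 * 22 / 41)
passerine_share = round(100 * 21 / 36)

print(passerine_rate, non_passerine_rate, oligocene_total)
print(non_passerine_share, passerine_share)
```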
The overall reduced number of family-level taxa during the last ten My (Fig. 3, Fig. 4) may
be due to the fact that this comparatively short time interval did not allow for the accumulation of
much morphological disparity; many family-level clades that evolved during that interval were
distinguished based on molecular differences.

Discussion

By including representatives of all non-passerine families and most passerine families, we show (1)
that the molecular tree based on 3’UTRs is, in bioinformatical terms, the most stable tree as
compared to trees computed from coding sequences and (2) that the 3’UTR tree resolves the higher-
level relationships of all included taxa without any ad-hoc assumptions such as the selection of
certain genes (Prum et al. 2015), or the arbitrary combination of coding and non-coding sequences
(Jarvis et al. 2014). (3) The tree-building capacity of 3’UTRs reflects a strong phylogenetic signal,
which might be related to the presence of transcription factor binding site (TFBS) motifs in the
3’UTRs. (4) Our phylogeny suggests that the avian tree of life can be resolved using a moderate
amount of sequencing data derived from transcriptomes. This avoids specialist knowledge for
assembling entire genomes as well as high bioinformatics costs of comparing large numbers of
large genomes. (5) The resulting 3’UTR-tree shows a well-resolved topology including all avian
order-level taxa, while dividing the Neoaves into five major clades that differ from previous
phylogenies. (6) The Mirandornithes (flamingos and grebes) are the sister group of all Neoaves, and
the hoatzin, a previous phylogenetic enigma, is shown to be closely related to the
Caprimulgiformes. (7) The negative correlation in the temporal diversification of passerine and
non-passerine family-level clades might be due to the vocal learning capacity of oscine passerines
(see Fig. 4).

Are 3’UTRs ideal for molecular tree building? The increased length and the evolution of alternative
3’UTRs, as they are observed in vertebrates, the amount and type of TFBS, as well as protein
binding sites are expected to increase the complexity of species-specific tissue-specific gene
expression regulation (Sandberg et al. 2008; Lianoglou et al. 2013; Cohen et al. 2014; Mayr G
2017; Lee and Mayr 2019; Xu et al. 2019). Here, we demonstrate that 3’UTR-based molecular
trees resolve the avian tree of life with good statistical support throughout. On the one hand, the
taxon-specific presence of TFBS in 3’UTRs might just be seen as an indicator of conserved sequences
with yet unknown function. On the other hand, there are increasing observations of a functional role
of transcription factor binding to 3’UTRs for transcriptional and post-transcriptional processes.
Regarding the transcriptional role, a simultaneous binding of transcription factors to 5’UTR and
3’UTR has been shown (Tan Wong et al. 2012; Sun et al. 2018; Jash et al. 2012). This suggests that
transcription factors may mediate intra- and inter-molecular loop interactions that bring promoter
and terminator into close proximity, which would allow the RNA polymerase to reload onto the
promoter efficiently (Tan Wong et al. 2012; Sun et al. 2018; Jash et al. 2012). Furthermore,
transcription factors bind to RNA; e.g., Wilms’ tumor 1 regulates RNAs through binding to the
3’UTR (Bharathavikru et al. 2017) and binding of SOX9 to 3’UTR was associated with post-
transcriptional processes (Penrad-Mobayed et al. 2018).
Tuğrul et al. (2015) used a biophysical model for directional selection on gene expression to
estimate the rates of gain and loss of TFBSs. They showed that multiple TFBSs can evolve
simultaneously allowing a biophysical cooperativity between transcription factors, and that the
presence of pre-sites for transcription factor binding would facilitate the gain of TFBS. Here, we
show the first evidence that links the enrichment of TFBS or TF pre-sites to systematic differences
between taxa. Thus, even if there is little difference in the repertoire of protein-coding genes
between species, the evolutionary divergence of 3’UTRs is suggested to be an exceptionally
important mechanism for rapid evolution, such as the speciation of cichlid fishes, through increased
regulatory complexity of area-specific gene expression (Xiong et al. 2018). Such a scenario might
be present as well in birds. In relation to this, most genes thought to be songbird-specific (Lovell et
al. 2014) or parrot-specific (Wirthlin et al. 2018) are detectable in improved genomes by including
extensive transcriptome data (Yin et al. 2019). Thus, in contrast to the common repertoire of avian
genes, our data show strong differences in 3’UTR sequences between avian orders, between
families within orders, and between genera of a family (Fig.1D). Since these sequence differences
concern the presence of TFBS motifs, 3’UTRs contain, potentially, an evolutionary signal of
speciation.

Whether the resolution of the avian family-level tree of life is due to particular features of 3’UTRs and
their potential importance for avian speciation, or whether it might also be achieved with other
types of non-coding sequences, is open for discussion. Due to the short length of the 5’UTR and to
the nature of transcription, the number of 5’UTR and of intronic sequences in our data were too few
to allow testing of whether these sequences also contain enough evolutionary signal to properly
resolve the avian tree of life. Ultra-conserved elements appear to yield a well-resolved
phylogenetic tree of passerine family-level taxa (Oliveros et al. 2019), very similar to the passerine
part of our tree. However, it should be noted that the coding sequences, too, delivered a passerine
part of the avian tree that had many similarities with the 3’UTR tree despite the strong differences
concerning the interrelationships of non-passerine orders (data not shown). Furthermore, there are
certain differences between the Oliveros-tree (Oliveros et al. 2019) and the passerine part of our 3’UTR
tree, which might be due to the taxa used or the different data types. Thus, it remains to be seen whether
coding sequences, ultra-conserved elements and 3’UTRs converge on the same phylogenetic tree of
passerine families, provided that enough taxa and sequences are included in the tree calculation.
Retroposons are another type of sequence that might resolve the avian phylogeny. However, the
techniques involved in their study require very large genomic data sets. Even though analyses of
retroposon insertions of bird genomes provided important new insights (Suh et al. 2011), they could
not fully resolve the avian phylogenetic tree (Suh et al. 2015).
Interestingly, phylogenetic trees derived from nuclear or mitochondrial sequences share many
similarities with the tree derived from non-coding sequences within certain orders
and within families, while nuclear and mitochondrial data types seem to fail for data sets spanning
many orders (Pacheco et al. 2011; Jarvis et al. 2014; Prum et al. 2015). Thus, multiple types of
molecular sequences including mitochondrial sequences or nuclear sequences might resolve taxon
relationships locally (e.g. within families), while the global resolution of the avian tree of life might
require particular non-coding sequences such as 3’UTRs (this study) and ultra-conserved elements
(Oliveros et al. 2019).

Why are trees based on 3’UTR sequences different from those based on coding sequences?
Molecular phylogenetic trees based on bioinformatics tools clearly deliver different trees based on
the function of the sequences used, as has been demonstrated before (Jarvis et al. 2014; Reddy et al.
2017). Trees based on coding sequences (CODON, CODON12, AAS) and non-coding based trees
(3’UTR) are expected to be different due to species-specific selection pressures that favour removal
of single nucleotide mutations of coding DNA as compared to non-coding sequences. In contrast,
mutations in the 3’UTR would only affect certain regulatory elements and in consequence might
affect gene expression in some tissues (Mayr C 2016; Mayr C 2017), but would be unlikely to stop
protein expression body-wide, as might occur in case of mutations in the coding sequences. Thus,
similar selection pressures due to similar environmental conditions might favour convergent
developments in protein coding sequences of distantly related species. Such an example might
concern the birds of prey that were grouped together in one assemblage in the coding sequence trees
(Fig. S4). However, in the case of vocal learning, another rare avian phenotype, which is present in
the hummingbirds, parrots and passerines (Jarvis et al. 2000; Gahr 2000), these three taxa were not
grouped together in the coding trees (Fig. S4).

Molecular sequences and anatomy-based trees. Concerning some clades, molecular phylogenies, in
particular those spanning the entire class of birds (e.g. this study; Prum et al. 2015; Jarvis et al.
2014), are substantially different from phylogenies derived from anatomical data (e.g., Livezey and
Zusi 2007). Since the present phylogeny includes all non-passerine families, the differences
between anatomical and molecular trees are unlikely to be due to different taxonomic samplings but
due to the type of data analyzed. Clearly, the relationships of certain higher-level clades in the
molecular phylogenies, such as the relationship of tropicbirds, kagus and sunbitterns and their
proximity to the Aquatic & Semiaquatic Birds, are unexpected and call for new ontogenetic and
morphological studies in order to assess the anatomical plausibility of these findings. In some cases,
other surprising clades derived from analyses of molecular data have already been confirmed by
morphological data, such as the sister group relationship between the morphologically and
behaviorally very disparate grebes and flamingos (Van Tuinen et al. 2001; Mayr G, 2004). More
recently, it was also hypothesized that the plesiomorphic presence of a large lacrimal bone may
support the basal position of the Caprimulgidae within Caprimulgiformes, with this bone being
reduced in other Caprimulgiformes (Chen et al. 2019; contra Mayr G, 2010). Much future
anatomical work is, however, needed for an improved integrative understanding of avian phylogeny
beyond insights derived from molecular sequence data.

Implications of the evolution of vocal learning for the avian tree of life. The peaks of family-level
diversification during the evolution of birds may have been caused by drastic changes of
macroecological niches due to events such as global cooling, the related drop in sea levels and thus
increased connectivity between landmasses and reduced CO2 levels that favoured the spread of
grassland or the desiccation of landmasses (Zachos et al. 2001). Opposite scenarios of
macroecological changes exist for global warming (Zachos et al. 2001). There are, however, no
clear catastrophic or macroecological events except for the progressing global cooling that parallels
the massive passerine radiation in the late Oligocene and the Miocene (Hansen et al. 2013). A recent
paper studying the evolution of the Passeriformes suggested that more complex mechanisms than
temperature change or vacant ecological niches are responsible for passerine radiation events
(Oliveros et al. 2019). Whatever the scenarios may have been, the significant radiation of oscine
passerine family-level taxa since the Miocene strongly contrasts with the subdued diversification of
new non-passerine clades among most arboreal birds and of suboscine passerine family-level taxa
(Fig. 3B, Fig. 4).
The hallmark of songbirds is their singing behavior, which is important for mate choice and
territorial defense. A key-feature of songbird singing behavior is that the songs are learned (Goller
and Shizuka, 2018). Thus, we discuss whether the evolution of vocal learning contributes to the
extraordinary success of songbirds (comprising about half of all avian family-level taxa and species
[ca 4500 species]) and the attenuation of the evolution of non-passerine families in the last twenty
My. Vocal production learning of males occurs in songbird families (suborder oscines of the
Passeriformes), parrot families (Psittaciformes) and in the hummingbird family (Trochilidae of the
Caprimulgiformes) (Baptista and Schuchmann 1990; Cruickshank et al. 1993; Gahr 2000; Jarvis et
al. 2000). The statistical analysis of the distribution of avian families that utter learned vocalizations
showed that vocal production learning evolved three times independently (Fig. S8), as hypothesized
previously (Jarvis et al. 2014). The alternative scenario, that vocal learning evolved twice (in the
hummingbirds and in the common ancestor of parrots and passerines) and was lost twice (in the New
Zealand wrens and in the suboscine passerines), is not favoured by the statistical models (Fig. S8).
Among all family-level taxa of songbirds, the songs of males are generally assumed to be
learned whereas short vocalizations, the calls, are generally thought to be innate (Vicario 2004). If
we combine data from the wild (mainly based on anecdotal observations and the observation of
vocal dialects) and from laboratory conditions, species of 70 families show vocal production
learning (Tab. S5). Whether vocal learning occurs in all songbird families needs to be assessed,
since calls might also be learned in some species (Zann 1990) and the distinction between songs and
calls is a species-specific problem. This is particularly relevant for the about 20 family-level taxa
that are made up of only a few species of which the vocalizations are not well-known, such as the
Rhagologidae or Scotocercidae. Nevertheless, even if we assumed that all oscine family-level taxa
that are currently not known to exhibit vocal learning indeed lack this capacity, the statistical
analysis supported the hypothesis that vocal learning evolved just once among the passerines with
the emergence of the oscines (Fig. S8). In connection with this ancestral origin of song learning
among oscines, males of both the phylogenetically basal lyrebirds and scrubbirds do learn or are
even superb song imitators (Robinson and Curtis 1996; Armstrong 1963).
Vocal (song and call) production learning of songbirds requires the development of the so-
called song control system (Nottebohm et al. 1976), a neural circuit that orchestrates the movements
of the syrinx. The ancestral status of vocal learning suggests the presence of the song control system
in all songbird families. Indeed, the anatomical identification of parts of the song control system in
species of 43 songbird family-level taxa including fairywrens and gerygones (OHC3) at the base of
and true finches (OHC10) at the top of oscine diversification suggests the ancestral evolution of the
song control system among oscines (Tab. S6). Because we found the song system in the brain of
each songbird species that we could study (Tab. S6; Gahr M, unpublished data), we also expect to
find it in lyrebirds and scrubbirds, at the base of the songbird clade. Whether the evolution of
precursors of the oscine forebrain song control system had already occurred in suboscines (Liu et al.
2013) or whether some suboscine genera convergently developed forebrain song control areas
requires further comparative study.
The negative correlation between the evolution of new songbird families and those of new
suboscine passerine and non-passerine families during the last twenty My suggests a faster
speciation and exploitation of ecological niches of oscine species. This might be due to vocal
learning that is ancestral in oscine passerines (Fig. S8). A comparison of oscine and suboscine
South-American taxa showed that evolutionary bursts in rates of speciation and song evolution
coincide (Mason et al. 2017). However, overall rates of vocal evolution are higher among taxa with
learned songs as compared to taxa with innate songs (Mason et al. 2017). Furthermore, sexual
selection of song promotes the capacity of adult song learning (Robinson et al. 2019). Since mate
choice and territoriality are highly dependent on vocal displays in songbirds, vocal production
learning is likely a key innovation for fast speciation and the macroevolutionary pattern of species
richness of the oscines. Vicariance and dispersal, too, likely play a major role in the evolution of
Neotropical avifauna (Smith et al. 2014). Therefore, we suggest that the combination of vocal
learning and dispersal behaviors of songbirds allowed songbirds to attenuate the evolution of non-
vocal learning taxa competing for the same ecological niches ever since songbirds emigrated out of
Australia in the Miocene some 20 Mya (Claramunt and Cracraft 2015; Oliveros et al. 2019).

In summary, we suggest that the 3’UTRs contain significant evolutionary signals that result in true
relationships, if used in unbiased phylogenetic tree solving procedures. Therefore, since we
included all non-passerine family-level taxa, the presented phylogenetic tree shows the relationship
of all these taxa, i.e. of all avian orders. The evolutionary timing of the divergences of family-level
taxa suggests that the strong radiation of oscine passerine family-level taxa attenuated the evolution
of new forms of non-passerine taxa in the last twenty My, a process that might involve the evolution
of the vocal learning behaviour of songbirds.

Material and Methods.

Species and tissue samples (Tab. S1): We produced whole transcriptomes of various tissues,
preferentially brain or blood samples. If possible, brain tissue was used to provide the most complex
transcriptome libraries in terms of number of expressed transcripts. However, for animal protection
reasons, in many cases we used small blood samples or cryopreserved museum specimens (mainly
liver or muscle) that were kindly provided by various collections (Tab. S1).
Different taxonomists recognize somewhat different genera as discrete family-level taxa.
Thus, in 2017, the “Handbook of the Birds of the World [HBW]” recognized 243 families while the
“International Ornithological Union [IOU]” recognized 234 families (Del Hoyo and Collar 2014,
2016; Gill and Donsker, 2017). Especially within passerines, the number and identity of recognized
bird families is currently very dynamic and the number of families has increased continuously over
the last 10 years. In 2013, when we started this study, our avian tree of life would have represented
98% (215 of 220) of all IOU families, which is 92% (215 of 234) of all IOU families recognized in
2017 (Gill and Donsker 2013, 2017). Thus, we are missing certain families, because they were only
recognized after our study began. This is primarily due to a split of previous families into several
new families, most with just one or very few species (species number of the missing families: 2 ± 1
[mean ± SD] species). We use the definitions of bird “families” and bird “orders” according to the
IOU and HBW lists but use the terms “family-level” and “order-level” to hint at the arbitrary nature
of these higher-level clades, which is reflected in the ever-changing number of avian families
recognized by the above mentioned standard references.
We sequenced RNA of 308 bird species and included published transcriptomes of 80 bird
species as well as 3’UTR sequences extracted from genomes of 66 species (64 bird and 2 alligator
species), of which 7 bird species were sequenced by us and the others were publicly available (Tab.
S1). From 27 of these 429 species, we had either two transcriptomes or a transcriptome and a
genome derived sequence (Tab. S1). Of 9 family-level taxa we had only genome-derived data
available for the bioinformatics analysis (the 7 MPIO sequenced genomes, Acanthisittidae, and
Mesitornithidae; Tab. S1). In total, in the construction of the molecular trees we included RNA
sequences, or their orthologous genome-derived sequences of 429 bird species comprising all avian
orders (Tab. S1). Thus, if we consider family-level taxa recognized by the different nomenclatures,
we studied between 209 and 221 bird family-level taxa: 209 of 227 recognized by both IOU and
HBW, 214 of 234 recognized by IOU, 215 of 243 recognized by HBW, and 220 of 250 recognized
by either IOU or HBW. Furthermore, we suggest one additional family not recognized by either
HBW or IOU, the Hyliidae, first suggested by Bates (1930). In the shown family-level trees (Fig. 3,
Fig. S6), all 221 potential families are labelled as such.
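The coverage figures in this section are plain ratios of sampled to recognized family-level taxa. As a quick check, the Python sketch below recomputes them from the counts quoted above; the dictionary keys and the helper `percent` are illustrative names, not part of the study.

```python
# (sampled, recognized) family-level taxa per checklist, as quoted in the text.
COVERAGE = {
    "IOU and HBW": (209, 227),
    "IOU": (214, 234),
    "HBW": (215, 243),
    "IOU or HBW": (220, 250),
}


def percent(sampled, recognized):
    """Coverage as a whole-number percentage."""
    return round(100 * sampled / recognized)


coverage_pct = {name: percent(s, r) for name, (s, r) in COVERAGE.items()}

for name, pct in coverage_pct.items():
    print(f"{name}: {pct}%")

# The two IOU percentages quoted for the 2013 and 2017 lists.
print(percent(215, 220), percent(215, 234))
```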

RNA preparation and sequencing: Isolation of RNA was carried out using Qiagen RNeasy Mini
Kits (Cat No. 74106) according to the manufacturer’s instructions, including the optional DNase
digestion step, using 20 mg of tissue or 50 μl of blood. Blood samples were processed with Sigma
TRI Reagent BD (T3809) according to the manufacturer’s instructions. The RNA was extracted from
the aqueous phase according to the protocol of the Qiagen RNeasy Mini Kit (74106). The RNA
quality was assessed with the Agilent 2100 Bioanalyzer Instrument (Model G2939A, Agilent
Technologies RNA). Concentrations were measured with the NanoDrop 1000 spectrophotometer (Thermo
Fisher Scientific). About 1 µg of total RNA per sample was used to construct RNA sequencing
libraries using the TruSeq RNA Sample Preparation Kit, v2 (Illumina Inc., San Diego, CA, USA).
The resulting libraries were barcoded and analyzed on Illumina HiSeq 2500 and HiSeq 4000
systems. The sequencing protocol was set to high output mode with paired-end reads of 50 or 75 bases. We
aimed at an output of 60-100 million reads per sample.

De novo transcript assembly: RNA sequencing short read data was de novo assembled using the
IDBA transcriptome assembler version 1.1 (Peng et al. 2012). We used default parameters for the
transcriptome assembly. Assembled transcripts were clustered using cd-hit-est (Fu et al. 2012; Li
and Godzik 2006) to filter for the longest assembled transcript of a cluster of alternatively
spliced/assembled transcripts. The qualities of transcriptomes were measured by basic statistics
(N50 transcript length, total assembled transcript length) as well as BUSCO (gene numbers; default
transcriptome parameters using the aves_odb9 gene set; Simao et al. 2015) and by counting
nucleotide matches of 3’UTR and of CDS with the canary reference genome. The reads of all
sequenced species have been provided to the Sequence Read Archive of NCBI. Transcriptome de
novo assemblies are available at Dryad (doi:10.5061/dryad.ngf1vhhpx).
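Two of the steps named above, keeping the longest transcript per cluster and reporting the N50 transcript length, can be illustrated with a minimal Python sketch. It assumes that cluster membership is already known (in the study it comes from the clustering tool), and the function names `n50` and `longest_per_cluster` are hypothetical helpers rather than parts of any assembler.

```python
def n50(lengths):
    """Length L such that transcripts of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no transcript lengths given")


def longest_per_cluster(clusters):
    """Keep the longest sequence of each cluster (dict: cluster id -> sequences)."""
    return {cid: max(seqs, key=len) for cid, seqs in clusters.items()}


# Toy input: two clusters of alternatively assembled transcripts.
clusters = {"c1": ["ACGT", "ACGTAC"], "c2": ["AC"]}
representatives = longest_per_cluster(clusters)
print(representatives)
print(n50([100, 200, 300, 400]))
```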

Genomic data sources: To extend our species list, we extracted the putative transcriptomes (i.e.
sequences homologous to those of our sequenced transcriptomes) from published bird genomes
(Tab. S1). Genome assemblies (57 bird species, 2 alligators) were downloaded from
NCBI/ENSEMBL repositories. The seven de novo assembled genomes are available at Dryad
(doi:10.5061/dryad.ngf1vhhpx) (Tab. S1).

Transcript and genome multiple alignments to reference genome: The canary (Serinus canaria)
genome was used as reference genome during the subsequent mapping steps of all transcriptomes
(Fig. S1). To construct pairwise alignments of genomes and transcriptomes we used LAST aligner
version 266 (Kielbasa et al. 2011), as it provides high sensitivity to align even distantly related
genomes and transcriptomes in a computationally effective manner. Output MAF (multiple
alignment format) was filtered for orthologous alignments using single_cov2 from the
TBA/MULTIZ package (Blanchette et al., 2004) (2-way filtering, ref→query and query→ref). The
pairwise transcriptome/genome alignments were combined into a multiple genome alignment using
MULTIZ. All required steps were run on split parts of the reference genome by custom scripts using
GNU PARALLEL (Tange 2011) to enable the use of multi-threaded CPUs. The final MAF is
available from Dryad (doi:10.5061/dryad.ngf1vhhpx).
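
The idea behind the 2-way filtering can be sketched on toy data. The sketch below is a simplified greedy illustration and not the algorithm implemented in single_cov2; the block layout (dictionaries with "ref", "qry" and "score" fields) and all coordinates are assumptions made for this example.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def single_coverage(blocks, side):
    """Visit blocks by descending score and keep a block only if it does not
    overlap an already kept block on the given side ('ref' or 'qry')."""
    kept = []
    for block in sorted(blocks, key=lambda b: b["score"], reverse=True):
        name, start, end = block[side]
        clash = any(
            k[side][0] == name and overlaps(start, end, k[side][1], k[side][2])
            for k in kept
        )
        if not clash:
            kept.append(block)
    return kept


def two_way_filter(blocks):
    """Apply the filter first on reference coordinates, then on query coordinates."""
    return single_coverage(single_coverage(blocks, "ref"), "qry")


# Hypothetical alignment blocks: (sequence name, start, end) on each side.
toy_blocks = [
    {"id": "A", "ref": ("chr1", 0, 100), "qry": ("t1", 0, 100), "score": 90},
    {"id": "B", "ref": ("chr1", 50, 150), "qry": ("t2", 0, 100), "score": 40},
    {"id": "C", "ref": ("chr2", 0, 100), "qry": ("t1", 20, 120), "score": 60},
    {"id": "D", "ref": ("chr3", 0, 50), "qry": ("t3", 0, 50), "score": 10},
]
print([b["id"] for b in two_way_filter(toy_blocks)])
```

In this toy case block B is removed because it overlaps the better-scoring block A on the reference, and block C is removed because it overlaps A on the query, so that every retained region is covered only once in both directions.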
Extraction of coding, non-coding and codon-based multiple alignments: We used our annotation
of the canary genome (http://public-genomes-ngs.molgen.mpg.de) to define bed files with
coordinates of the coding, 3’UTR and 5’UTR, intronic and intergenic regions of the genome. The
mafsInRegion tool from the Kent utilities (Kent et al. 2002) was used to extract the different
fractions of the genome into coding/non-coding MAF files according to the bed files. Further
processing of the alignments included removing alignments that did not align with an outgroup
species to remove potential reference bias; here we used the ostrich (Struthio camelus) as a "must
match". Both the coding and non-coding multiple alignments were written into a concatenated
multiple fasta alignment file using mafToFa (Kent utilities) followed by custom scripts for
concatenation and adding "-" characters for missing data to the sequence.
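
The concatenation step, including the padding of missing data with "-" characters, can be sketched as follows. This is a minimal illustration and not our custom script; the function names, species labels and sequences are hypothetical.

```python
def concatenate_blocks(blocks, species):
    """Concatenate aligned blocks per species; a species that is absent from a
    block is padded with '-' over the full width of that block."""
    parts = {name: [] for name in species}
    for block in blocks:
        width = len(next(iter(block.values())))
        if any(len(seq) != width for seq in block.values()):
            raise ValueError("all rows of a block must have the same length")
        for name in species:
            parts[name].append(block.get(name, "-" * width))
    return {name: "".join(chunks) for name, chunks in parts.items()}


def to_fasta(alignment):
    """Write the concatenated alignment as multiple fasta text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in alignment.items())


# Hypothetical alignment blocks; zebra_finch is missing from the first block.
species = ["canary", "ostrich", "zebra_finch"]
blocks = [
    {"canary": "ACGT", "ostrich": "ACGA"},
    {"canary": "GG", "ostrich": "GA", "zebra_finch": "G-"},
]
concatenated = concatenate_blocks(blocks, species)
print(to_fasta(concatenated))
```

Padding every absent species keeps all rows at the same length, which is required for the column-wise filtering and tree inference steps that follow.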
To generate a codon-based alignment, we extracted coding exons for each gene from the
coding MAF file, using bed files defining the coding exons in the canary genome for plus and
minus strands separately. Exons of each gene were concatenated. Afterwards, minus-strand gene
sequences were reverse complemented by SEQTK. The coding gene sequences were aligned by
BLAT (Kent 2002) against their corresponding canary protein sequence to identify and remove
frameshifts by custom scripts. The final multiple codon alignment of gene sequences was performed
by TranslatorX (Abascal et al. 2010) choosing MAFFT for multiple alignment (Katoh and Standley
2013). All codon aligned genes were concatenated into a large alignment which was used to
translate codons into a multiple amino acid alignment or to extract files containing codon positions
(1, 2 or 1+2).
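
Extracting codon positions (1, 2 or 1+2) from an in-frame codon alignment can be sketched as below. The function name and the example sequence are hypothetical; the sketch assumes that the sequence length is a multiple of three.

```python
def codon_positions(seq, positions):
    """Return the characters at the requested codon positions (1-based: 1, 2, 3)
    of an in-frame codon-aligned sequence."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    offsets = sorted(p - 1 for p in positions)
    if any(o not in (0, 1, 2) for o in offsets):
        raise ValueError("codon positions must be 1, 2 or 3")
    return "".join(seq[i + o] for i in range(0, len(seq), 3) for o in offsets)


# Hypothetical codon-aligned sequence: ATG GCC --- TAA
example = "ATGGCC---TAA"
print(codon_positions(example, (1,)))
print(codon_positions(example, (2,)))
print(codon_positions(example, (1, 2)))
```

Because gaps introduced by a codon-aware aligner span whole codons, they are carried through unchanged and the reduced alignments stay column-consistent across species.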

Finding a suitable gap versus data content for the multiple alignments: We filtered the multiple
alignment fasta files to allow only a certain amount of missing data per column. In this regard, we
generated multiple alignments with the following numbers of allowed gaps per alignment column:
10, 20, 40, 60, 80, 90, 100, 110, 120, 140, 160. Alignments for 3’UTR, CODON or AAS with
different gap cut-offs are available from Dryad (doi:10.5061/dryad.ngf1vhhpx). For each alignment
10 trees were calculated using different maximum parsimony starting trees and RAxML (v8.2.4;
Stamatakis 2014) for fast approximate tree inference (parameter: -f E -m GTRCAT or PROTCAT
for amino acid sequence) and subsequent Nearest Neighbour Interchange (NNI) refinement and SH-
aLRT support calculation (parameter: -f J -m GTRGAMMA or PROTGAMMA for amino acid
sequence). RAxML and other tools for ML tree inference use heuristic methods to infer tree
topologies and often fail to find the best-fitting tree topology in a single run. We found that
for our dataset 10 replicates were a good trade-off between the computational time needed and the
probability of finding the best-fitting topology when using 3’UTR alignments. To assess tree
topology convergence, the 10 trees for each alignment file were compared to each other by calculating
pairwise Robinson-Foulds (RF) distances (RAxML -f r option).
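
The column filter by allowed number of gaps can be sketched as follows. This is a minimal illustration under the assumption that only the "-" character marks missing data; the function name and the toy alignment are hypothetical.

```python
def filter_columns(alignment, max_gaps):
    """Keep only alignment columns in which at most max_gaps sequences carry '-'."""
    names = list(alignment)
    rows = [alignment[name] for name in names]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all sequences must have the same length")
    keep = [i for i, column in enumerate(zip(*rows)) if column.count("-") <= max_gaps]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Hypothetical three-species alignment; gaps per column are 0, 1, 2 and 0.
toy = {"sp1": "AC-T", "sp2": "A--T", "sp3": "ACGT"}
for cutoff in (0, 1, 2):
    print(cutoff, filter_columns(toy, cutoff))
```

A low cut-off yields a short but densely populated alignment, whereas a high cut-off retains more columns at the price of more missing data, which is the trade-off explored with the series of cut-offs listed above.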
Additionally, we computed coalescent consensus trees for the 10 trees per alignment file
(subsets) by ASTRAL-III v5.6.1 (Zhang et al. 2018). We then calculated the Robinson-Foulds
distances of the neighbouring subset coalescent trees (subset1 vs. subset2; subset2 vs. subset3;…).
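
For readers unfamiliar with the Robinson-Foulds distance, a minimal sketch is given here. It is not the RAxML implementation: trees are written as nested tuples of taxon names (an assumption made to avoid a Newick parser), and the distance is counted as the number of non-trivial bipartitions found in only one of the two unrooted trees.

```python
def leaves(tree):
    """Return the set of taxon names below a node (a string is a leaf)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions of a tree, each represented by the
    side that does not contain a fixed anchor taxon."""
    all_taxa = leaves(tree)
    anchor = min(all_taxa)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        side = all_taxa - clade if anchor in clade else clade
        # Non-trivial: at least two taxa on both sides of the split.
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of bipartitions present in exactly one of the two trees."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must contain the same taxa")
    return len(splits(tree_a) ^ splits(tree_b))


# Hypothetical five-taxon trees.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t3 = ((("A", "D"), "B"), ("C", "E"))
print(rf_distance(t1, t1), rf_distance(t1, t2), rf_distance(t1, t3))
```

A distance of zero between replicate trees indicates identical topologies; for fully resolved unrooted trees with n taxa the maximum is 2(n - 3).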

Phylogenetic tree calculations by concatenated alignments: Computing time is a major bottleneck
in large phylogenomic projects. Replicates of our large datasets with millions of aligned bases for
more than 400 species can so far only be computed efficiently (as described above) using fast
approximate methods, with NNI optimization and SH-aLRT calculation under the
GTRGAMMA model instead of a standard bootstrapping method to calculate branch supports. To
further improve the trees derived from the fast method, candidate trees (derived from alignments
with a gap cut-off of 100 or 110 and exhibiting the best log-likelihood scores after NNI
optimization under GTRGAMMA) were subjected to a thorough optimization using RAxML-NG
(Kozlov et al. 2019). After this final optimization, topological changes were zero for 3’UTR,
underlining that the fast method was equivalent to the exhaustive method in the case of 3’UTR. Few
changes were observed for CODON, CODON and AAS trees (17, 14 and 6 of 451 splits changed,
respectively), underlining that the phylogenetic signal is more difficult to resolve in these cases. In
Fig. 2, Fig. 3 and Fig. S6 we show statistical support values for gappiness 100 (current data set of
429 species) and for gappiness 110 (a previous data set of 427 species).
To corroborate the SH-aLRT branch supports of the 3’UTR tree by another method, we calculated
1000 ultrafast bootstrap (UFBS) trees using IQ-TREE 2 (Minh et al. 2020). We considered all
branches as highly supported if the SH-aLRT values reached the maximum value (100). For
branches not meeting this criterion, SH-aLRT and UFBS values are shown in the trees (Fig. 2, Fig. 3A,
3B).
To test whether evolutionary models other than GTR (Tavaré 1986) would fit our
alignments better, we split the concatenated alignment into chunks of 10 kbp and ran the IQ-TREE model
test on each chunk. For the 3’UTR, the best-fitting model for the majority of chunks was the GTR
model (51.2%), followed by the TVM (18.6%) (Posada 2003) and SYM (18.2%) (Zharkikh 1994)
models (Fig. S6). For CDS, the SYM and GTR models were the best models for nearly equal
fractions of chunks (34.8% and 33.5%, respectively) (Fig. S6). We refined the best-scoring tree
topology obtained from CDS and the GTR model by using the RAxML-NG exhaustive tree topology
search with the SYM model, which did not change the tree topology.
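
The chunk-wise model comparison can be sketched in two small steps: defining the 10 kbp chunks and tallying which model fits best for what share of chunks. The sketch below is illustrative only; the function names are hypothetical and the list of per-chunk best models is invented, since the model test itself is run by an external program.

```python
from collections import Counter


def chunk_ranges(alignment_length, chunk_size=10_000):
    """Return half-open (start, end) column ranges covering the alignment."""
    return [
        (start, min(start + chunk_size, alignment_length))
        for start in range(0, alignment_length, chunk_size)
    ]


def model_shares(best_models):
    """Percentage of chunks for which each model was the best-fitting one."""
    counts = Counter(best_models)
    total = sum(counts.values())
    return {model: round(100 * n / total, 1) for model, n in counts.most_common()}


print(chunk_ranges(25_000))

# Hypothetical best-fitting model per chunk, as reported by a model test.
toy_results = ["GTR", "GTR", "TVM", "SYM"]
print(model_shares(toy_results))
```

Testing chunks rather than the full concatenated alignment shows how consistently one model is preferred along the alignment instead of returning a single overall choice.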
Depicted are trees that either contain all species (Fig. S6) or, for clarity, only one species per
family (Fig. 3A & 3B) or just order-level taxa (Fig. 2, Fig. S4). The trees not showing all species
were prepared using FigTree (http://tree.bio.ed.ac.uk/software/figtree/) and jsTree
(http://www.jstree.com/) from the all-species-trees. The depicted statistical values (Fig. 3A & 3B;
Fig. S6) are derived from the all-species-trees.

Final phylogenetic species tree calculations by a coalescent approach: Besides the tree inference
from concatenated alignments, we also tested inferring the species tree using a coalescence
approach. We calculated 5,127 trees for 3’UTRs of distinct gene models using iqtree (version
1.6.12; parameters: -alrt 1000 -m GTR+R4+FO). These trees were used to compute a species tree
by ASTRAL (version 5.6.1; default parameters). We calculated coalescent trees from different
numbers of input gene trees (252, 471, 862, 1600, 3200 and 5127), which were sorted by the
amount of input data from which they were derived (highest first). Thus, we had a lower number of
higher-quality gene trees from longer 3’UTR alignments and a higher number of lower-quality trees
from shorter 3’UTR alignments. We observed the highest similarities (Robinson-Foulds: 97.8%)
between coalescent species trees calculated from 471, 862 and 1600 input gene trees. Increasing
numbers of gene trees in the calculations (> 1600) reduced the similarity between repeated tree
calculations (data not shown).
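
The selection of gene tree subsets ranked by the amount of underlying data can be sketched as follows. The record layout (a name and a count of aligned bases per gene tree) and the toy values are assumptions for this illustration; only the subset sizes are taken from the text.

```python
SUBSET_SIZES = [252, 471, 862, 1600, 3200, 5127]  # subset sizes given in the text


def top_gene_trees(gene_trees, n):
    """Return the n gene trees derived from the largest amount of aligned data."""
    ranked = sorted(gene_trees, key=lambda g: g["aligned_bases"], reverse=True)
    return ranked[:n]


def nested_subsets(gene_trees, sizes):
    """Build one subset per requested size; smaller subsets are contained in larger ones."""
    return {n: top_gene_trees(gene_trees, n) for n in sizes}


# Hypothetical gene tree records, for illustration only.
toy_trees = [
    {"gene": "g1", "aligned_bases": 900},
    {"gene": "g2", "aligned_bases": 4200},
    {"gene": "g3", "aligned_bases": 150},
    {"gene": "g4", "aligned_bases": 2600},
]
for size, subset in nested_subsets(toy_trees, [2, 4]).items():
    print(size, [g["gene"] for g in subset])
```

Each subset would then be passed as a separate input to the species tree program, so that the effect of adding gene trees from progressively shorter alignments can be followed.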

Time-calibrated phylogenetic tree: We used DPPDiv (Heath et al. 2012) for time calibration of our
family-level phylogeny, using either the Dirichlet process prior (DPP) model or the uncorrelated
gamma-distributed rates (UGR) model. This tool has the advantage of being able to use parallel
computation. Nevertheless, we had to downscale the included amount of data to an alignment length
of 10,000 - 100,000 nucleotides to finish computation within a reasonable amount of time (several
days to weeks) on high-power computing servers (96 or 192 CPU threads). Calculations were
performed twice using different starting conditions (with/without parameter: -ubl), and we checked
for convergence for both the DPP model and the UGR model (parameter: -urg) (Heath et al. 2012).
The analyses were run until the linear correlation of the median divergence times between the two
runs of the same model reached R² values larger than 0.99 and a slope of 1.0. Eighteen nodes in the
tree were calibrated with fossil data that had also been used before (Jarvis et al. 2014); the divergence
date for the split of pigeons and Mirandornithes was omitted due to differences between the Jarvis tree
(Jarvis et al. 2014) and our 3’UTR tree (Fig. 3A). The time-calibrated tree was visualized with FigTree
(http://tree.bio.ed.ac.uk/software/figtree/).
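
The convergence criterion (R² larger than 0.99 and a slope of 1.0 between the median divergence times of two runs) can be checked with an ordinary least-squares fit, sketched below. The tolerance on the slope is an assumption of this sketch, as the text only states the target value, and the divergence times are invented.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope and coefficient of determination (R^2) of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared


def runs_converged(run1, run2, min_r2=0.99, slope_tol=0.05):
    """True if the two runs agree; slope_tol is an assumed tolerance around 1.0."""
    slope, r_squared = linear_fit(run1, run2)
    return r_squared > min_r2 and abs(slope - 1.0) <= slope_tol


# Hypothetical median node ages (million years) from two independent runs.
run_a = [10.0, 20.0, 40.0, 60.0]
run_b = [10.5, 19.5, 40.5, 59.5]
run_c = [10.0, 40.0, 20.0, 60.0]
print(runs_converged(run_a, run_b))
print(runs_converged(run_a, run_c))
```

Requiring both a high R² and a slope near 1.0 guards against two runs that are strongly correlated but systematically shifted towards older or younger ages.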

Extracting 3’UTR sequences from the RNAseq assemblies for the detection of transcription factor
binding sites and transcription factor binding site models: To detect phylogenetically relevant signals in
3’UTRs, we compared species of the orders Charadriiformes, Caprimulgiformes and Passeriformes.
For family comparisons within the Passeriformes, we compared species of the Estrildidae, the
Fringillidae and of Basal Oscines; the latter is an artificial group including species of the basal
radiations of oscines (see Tab. 3 for species). We compared the 3’UTRs of 97 randomly selected
genes (Tab. S3) among the species.
We carried out analyses using the Genomatix software suite (Precigen Bioinformatics
Germany GmbH) combining several mining sources (Over-representation of transcription binding
sites and MatInspector tools). In order to determine the occurrence of transcription factor binding
sites (TFBS), we searched binding elements in the extracted 3’UTRs using the Over-Represented
TFBS tool (MatBase genomatix definition, Genomatix), which selects the presence of TFBS within
the input sequences, generates statistics of single TFBS and calculates Z-scores of the
representation of TFBS based on the TFBS abundances in the whole zebra finch genomic sequence
(Ho Sui et al. 2005). The TFBS occurrences were calculated with MatInspector. The Z-Score
correlation graph (Fig. 1D) was produced using the Estrildidae as a reference.
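
The notion of a Z-score of over-representation relative to a genomic background can be illustrated with a generic normal approximation to the binomial distribution. This is an assumption made for illustration: the exact formula used by the Genomatix tool is not given here and may differ, and all numbers below are invented.

```python
import math


def overrepresentation_z(observed, n_positions, background_rate):
    """Z-score of an observed TFBS count against a binomial expectation.

    observed        : number of matches found in the input sequences
    n_positions     : number of positions searched in the input sequences
    background_rate : matches per position in the background (0 < rate < 1)
    """
    if not 0 < background_rate < 1:
        raise ValueError("background_rate must lie between 0 and 1")
    expected = n_positions * background_rate
    std_dev = math.sqrt(n_positions * background_rate * (1 - background_rate))
    return (observed - expected) / std_dev


# Hypothetical values: 30 matches in 10,000 searched positions, background 0.002.
z = overrepresentation_z(observed=30, n_positions=10_000, background_rate=0.002)
print(round(z, 2))
```

A positive score indicates that a binding site occurs more often in the input 3’UTRs than expected from its genome-wide frequency, a negative score that it occurs less often.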
To investigate the pattern of TFBS of genera and families, we pursued deeper analysis of
3’UTRs using the FrameWorker-Genomatix tool suite (Precigen Bioinformatics Germany GmbH).
FrameWorker calculates the most complex models of TFBS that are common to sequences of the
included species. Models are defined as all TFBS that occur in the same order and in a certain
distance range in all (or a subset of) the input sequences (Cartharius et al. 2005). As an example, we
analyzed the TFBS within the family Spheniscidae (Fig. S7). For a genus specific model of TFBS,
we compared species of the genera Aptenodytes, Eudyptes and Pygoscelis using the EMC1 3’UTR
sequences. EMC1 codes for subunit 1 of the endoplasmic reticulum membrane protein complex.
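
The definition of a model (TFBS occurring in the same order and within a certain distance range in all input sequences) can be illustrated in strongly simplified form. The sketch below considers only pairs of sites and is not the FrameWorker algorithm; the site names, positions and distance limits are invented.

```python
from itertools import combinations


def ordered_pairs(sites, min_dist, max_dist):
    """Return (upstream, downstream) site pairs whose start positions are
    between min_dist and max_dist apart. sites is a list of (position, name)."""
    pairs = set()
    for (pos_a, name_a), (pos_b, name_b) in combinations(sorted(sites), 2):
        if min_dist <= pos_b - pos_a <= max_dist:
            pairs.add((name_a, name_b))
    return pairs


def common_models(site_lists, min_dist=5, max_dist=200):
    """Pairs that occur, in the same order and distance range, in every sequence."""
    sets = [ordered_pairs(sites, min_dist, max_dist) for sites in site_lists]
    return set.intersection(*sets) if sets else set()


# Hypothetical TFBS annotations of one 3'UTR in three species.
toy_sites = [
    [(10, "X"), (60, "Y"), (300, "Z")],
    [(15, "X"), (80, "Y"), (120, "Z")],
    [(5, "Y"), (40, "X"), (90, "Y")],
]
print(common_models(toy_sites))
```

Only the pair X followed by Y is shared by all three toy sequences, which corresponds to the idea of a model that is common to all input sequences of a group.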

Literature-based analysis of singing: Vocal production learning, abbreviated in this paper as vocal
learning, was considered present in a species, if studies had reported imitation of conspecifics,
mimicry of heterospecifics or mimicry of non-bird sounds in that species, or if studies had reported
local dialects in that species. As sources, we studied all available publications as well as various
encyclopaedias, the Handbook of the Birds of the World, the Handbook of Western Palearctic Birds,
the Handbook of Australian, New Zealand and Antarctic Birds, and The Birds of Africa. For the
family level analysis, we considered vocal learning as present in a family, if at least one species of a
family fulfilled the criteria above. The family-level taxa and related references to vocal learning are
listed in Tab. S5. To test the association between the phylogenetic tree and the occurrence of vocal
learning in the family-level taxa, we used TreeBreaker (https://github.com/ansariazim/treeBreaker),
an inference procedure based on a Bayesian statistical method (Ansari and Didelot 2016). The
software uses a Bayesian model to deduce whether the phenotype of interest is randomly distributed
on the tips of the tree and to estimate which clades, if any, have a distinct distribution from the rest
of the tree (Ansari et al. 2019).
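
The scoring rule for vocal learning (present in a species if any criterion is reported, and present in a family if at least one species qualifies) can be written down as a short sketch. The field names and the example records are hypothetical and do not reproduce Tab. S5.

```python
CRITERIA = ("imitation", "heterospecific_mimicry", "non_bird_mimicry", "dialects")


def species_is_vocal_learner(evidence):
    """A species counts as a vocal learner if any criterion has been reported."""
    return any(evidence.get(criterion, False) for criterion in CRITERIA)


def families_with_vocal_learning(records):
    """Map each family to True if at least one of its species is a vocal learner."""
    result = {}
    for family, evidence in records:
        result[family] = result.get(family, False) or species_is_vocal_learner(evidence)
    return result


# Hypothetical species records: (family, reported evidence).
toy_records = [
    ("Family A", {"dialects": True}),
    ("Family A", {}),
    ("Family B", {}),
    ("Family B", {"imitation": False}),
]
print(families_with_vocal_learning(toy_records))
```

The resulting binary family-level trait is the kind of tip state that can be mapped onto the tree for the association test.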

Supplementary Material
Supplementary Figures
Supplementary Tables
Supplementary Discussion

Acknowledgements
We thank Drs. Albertine Leitão, Christoph Gahr and Leo Joseph for valuable comments on the
manuscript and Dr. Leitão for help with statistics. We thank Sami Domisch and Enrico
Willenbücher (Leibniz-Institute of Freshwater Ecology and Inland Fisheries) for providing
additional compute server capacities. We thank the following institutions for donating tissues:
Alpenzoo Innsbruck, Austria; Australian National Wildlife Collection, CSIRO, Canberra, Australia;
Collection Kai Clausen, Germany; Collection Ludger Bremehr, Verl, Germany; Johannes-
Gutenberg-Universität Mainz, Germany; Landesmuseum Oldenburg, Germany; Louisiana State
University, Museum of Natural Science, Baton Rouge, USA; Museum für Naturkunde, Leibniz-
Institut für Evolutions- und Biodiversitätsforschung, Germany; Museo Nacional de Historia
Natural, Chile; Museo Argentino De Ciencias Naturales, Argentina; Museum Victoria Melbourne,
Australia; National Avian Research Center, Seihan, United Arab Emirates; Phillip Island Nature
Parks, Cowes, Australia; Tierpark Hagenbeck Hamburg, Germany; Tierpark Hellabrunn München,
Germany; Tierpark Berlin-Friedrichsfelde, Germany; Technical University of Munich, Germany;
Federal University of Para, Brazil; University of Giessen, Germany; University of Gdansk, Poland;
Université de La Réunion, Sainte Clotilde, France; Université Paris-Sud, Orsay, France; University
of Vienna, Austria; Wageningen University and Research, The Netherlands; University of
Washington, Burke Museum, Seattle, USA; Vogelpark Walsrode, Germany; Washington State
University, Pullman, USA; Wilhelma Zoologisch Botanischer Garten Stuttgart, Germany;
Zoological Museum, Lomonosov Moscow State University, Russia; Zoologischer Garten Berlin,
Germany; Zoologischer Garten Köln, Germany; Zoologischer Garten Wuppertal, Germany; Zoo
Heidelberg, Germany; Zoo Zürich, Switzerland.

Funding
This work was funded through a grant of the president of the Max Planck Society to M. Gahr. Some
methods applied here were developed for a project of H. Kuhl, funded by the German Research
Foundation (DFG) “Eigene Stelle” grant to Heiner Kuhl: KU 3596/1-1; project number 324050651.

Author contributions
HK: design of bioinformatic pipeline, data processing and analysis, and manuscript writing. CF:
Comparative analysis of 3’UTR structure. AB: Preparation of all RNAs and DNAs. GM: evaluation
of time-calibration and fossil data. GN: tissue sampling. STB, SK and BT: Sequencing. MG:
concept, tissue sampling, meta-analysis of vocal learning, writing of the manuscript.

Data availability
Transcriptome assemblies used in this study have been made available as a Dryad archive
(https://doi.org/10.5061/dryad.ngf1vhhpx). Transcriptome sequencing reads are available under the
BioProject accession number: PRJNA599522.

References
Abascal F, Zardoya R, Telford MJ. 2010. TranslatorX: multiple alignment of nucleotide sequences
guided by amino acid translations. Nucleic Acids Res. 38:W7-13.

Aggerbeck M, Fjeldsa J, Christidis L, Fabre PH, Jonsson KA. 2014. Resolving deep lineage
divergences in core corvoid passerine birds supports a proto-Papuan island origin. Mol Phylogenet
Evol. 70:272-285.

Ansari MA, Aranday-Cortes E, Ip CL, da Silva Filipe A, Lau SH, Bamford C, Bonsall D, Trebes A,
Piazza P, Sreenu V, et al. 2019. Interferon lambda 4 impacts the genetic diversity of hepatitis C
virus. Elife. 8:e42463

Ansari MA, Didelot X. 2016. Bayesian Inference of the Evolution of a Phenotype Distribution on a
Phylogenetic Tree. Genetics. 204:89-98.

Armstrong EA. 1963. A study of bird song. London: Oxford University Press.

Arnaiz-Villena A, Ruiz-del-Valle V, Gomez-Prieto P, Reguera R, Parga Lozano CH, Serrano-Vela J.
2009. Estrildinae Finches (Aves, Passeriformes) from Africa, South Asia and Australia: a Molecular
Phylogeographic Study. Open Ornithol J. 2:29-36.

Baker AJ, Pereira SL, Paton TA. 2007. Phylogenetic relationships and divergence times of
Charadriiformes genera: multigene evidence for the Cretaceous origin of at least 14 clades of
shorebirds. Biol Lett. 3:205-209.

Baptista LF, Schuchmann KL. 1990. Song Learning in the Anna Hummingbird. Ethol. 84:15-26.

Barker FK, Cibois A, Schikler P, Feinstein J, Cracraft J. 2004. Phylogeny and diversification of the
largest avian radiation. Proc Natl Acad Sci USA 101:11040-11045.

Bates GL. 1930. Handbook of the birds of West Africa. London: Bale Sons and Danielson.

Bharathavikru R, Dudnakova T, Aitken S, Slight J, Artibani M, Hohenstein P, Tollervey D, Hastie
N. 2017. Transcription factor Wilms' tumor 1 regulates developmental RNAs through 3' UTR
interaction. Genes Dev. 31:347-352.

Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AFA, Roskin KM, Baertsch R, Rosenbloom K,
Clawson H, Green ED, et al. 2004. Aligning multiple genomic sequences with the threaded blockset
aligner. Genome Res. 14:708-715.

Brown RP, Yang Z. 2011. Rate variation and estimation of divergence times using strict and relaxed
clocks. BMC Evol Biol. 11:271.
Burgess SJ, Reyna-Llorens I, Stevenson SR, Singh P, Jaeger K, Hibberd JM. 2019. Genome-Wide
Transcription Factor Binding in Leaves from C3 and C4 Grasses. The Plant cell 31: 2297-2314.

Cartharius K, Frech K, Grote K, Klocke B, Haltmeier M, Klingenhoff A, Frisch M, Bayerlein M,
Werner T. 2005. MatInspector and beyond: promoter analysis based on transcription factor binding
sites. Bioinformatics. 21:2933-2942.

Chen A, White ND, Benson RBJ, Braun MJ, Field DJ. 2019. Total-Evidence Framework Reveals
Complex Morphological Evolution in Nightbirds (Strisores). Diversity. 11(9):143.

Chesser R, Have J. 2007. On the phylogenetic position of the scrub-birds (Passeriformes: Menurae:
Atrichornithidae) of Australia. J Ornithol. 148:471-476.

Chung PJ, Jung H, Choi YD, Kim JK. 2018. Genome-wide analyses of direct target genes of four
rice NAC-domain transcription factors involved in drought tolerance. BMC genomics. 19:40.

Claramunt S, Cracraft J. 2015. A new time tree reveals Earth history's imprint on the evolution of
modern birds. Sci Adv 1:e1501005.

Cohen JE, Lee PR, Fields RD. 2014. Systematic identification of 3'-UTR regulatory elements in
activity-dependent mRNA stability in hippocampal neurons. Philos Trans R Soc Lond B Biol Sci
369(1652):20130509.

Cruickshank AJ, Gautier J-P, Chappuis C. 1993. Vocal mimicry in wild African Grey Parrots
Psittacus erithacus. Ibis. 135:293-299.

Del Hoyo J, Collar NJ. 2014. HBW and BirdLife International Illustrated Checklist of the Birds of
the World. Volume 1: Non-passerines. Barcelona, España: Lynx Edicions.

Del Hoyo J, Collar NJ. 2016. HBW and BirdLife International Illustrated Checklist of the Birds of
the World. Volume 2: Passerines. Barcelona, España: Lynx Edicions.

Ericson PGP, Anderson CL, Britton T, Elzanowski A, Johansson US, Kallersjo M, Ohlson JI,
Parsons TJ, Zuccon D, Mayr G. 2006. Diversification of Neoaves: integration of molecular
sequence data and fossils. Biol Lett. 2:543-547.

Ericson PGP. 2012. Evolution of terrestrial birds in three continents: biogeography and parallel
radiations. J Biogeogr. 39:813–824.

Ericson PGP, Klopfstein S, Irestedt M, Nguyen JMT, Nylander JAA. 2014. Dating the
diversification of the major lineages of Passeriformes (Aves). BMC Evol Biol. 14:8-8.

Ferdous MM, Bao Y, Vinciotti V, Liu X, Wilson P. 2018. Predicting gene expression from genome
wide protein binding profiles. Neurocomputing. 275:1490-1499.

Field DJ, Benito J, Chen A, Jagt JWM, Ksepka DT. 2020. Late Cretaceous neornithine from
Europe illuminates the origins of crown birds. Nature. 579:397-401.

Frankl-Vilches C, Kuhl H, Werber M, Klages S, Kerick M, Bakker A, de Oliveira EHC, Reusch C,
Capuano F, Vowinckel J, et al. 2015. Using the canary genome to decipher the evolution of
hormone-sensitive gene regulation in seasonal singing birds. Genome Biol. 16:1-25.
Fregin S, Haase M, Olsson U, Alström P. 2012. New insights into family relationships within the
avian superfamily Sylvioidea (Passeriformes) based on seven molecular markers. BMC Evol Biol.
12:157.

Fu L, Niu B, Zhu Z, Wu S, Li W. 2012. CD-HIT: accelerated for clustering the next-generation
sequencing data. Bioinformatics. 28:3150-3152.

Gahr M. 2000. Neural song control system of hummingbirds: comparison to swifts, vocal learning
(Songbirds) and nonlearning (Suboscines) passerines, and vocal learning (Budgerigars) and
nonlearning (Dove, owl, gull, quail, chicken) nonpasserines. J Comp Neurol. 426:182-196.

Gill F, Donsker D. 2013. IOC World Bird List (version 3.3). International Ornithologists’ Union (doi:
10.14344/IOC.ML.3.5).

Gill F, Donsker D. 2017. IOC World Bird List (version 7.3). International Ornithologists’ Union
(doi: 10.14344/IOC.ML.7.2).

Goller M, Shizuka D. 2018. Evolutionary origins of vocal mimicry in songbirds. Evol Lett. 2:417-
426.

Hackett SJ, Kimball RT, Reddy S, Bowie RC, Braun EL, Braun MJ, Chojnowski JL, Cox WA, Han
K-L, Harshman J. 2008. A phylogenomic study of birds reveals their evolutionary history. Science.
320:1763-1768.

Hansen J, Sato M, Russell G, Kharecha P. 2013. Climate sensitivity, sea level and atmospheric
carbon dioxide. Phil Trans Royal Soc A: Math Phys Eng Sci. 371:20120294.

Heath TA, Holder MT, Huelsenbeck JP. 2012. A Dirichlet process prior for estimating lineage-
specific substitution rates. Mol Biol Evol. 29:939-55.

Ho Sui SJ, Mortimer JR, Arenillas DJ, Brumm J, Walsh CJ, Kennedy BP, Wasserman WW. 2005.
oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed
genes. Nucl Acids Res. 33:3154-3164.

Hooper DM, Price TD. 2015. Rates of karyotypic evolution in Estrildid finches differ between
island and continental clades. Evol. 69:890-903.

Houde P, Braun EL, Narula N, Minjares U, Mirarab S. 2019. Phylogenetic Signal of Indels and the
Neoavian Radiation. Diversity. 11:108.

Houde P, Braun EL, Zhou L. 2020. Deep-Time Demographic Inference Suggests Ecological
Release as Driver of Neoavian Adaptive Radiation. Diversity. 12(4): 164.

Jarvis ED, Mirarab S, Aberer AJ, Li B, Houde P, Li C, Ho SY, Faircloth BC, Nabholz B, Howard
JT, et al. 2014. Whole-genome analyses resolve early branches in the tree of life of modern birds.
Science. 346:1320-1331.

Jarvis ED, Ribeiro S, da Silva ML, Ventura D, Vielliard J, Mello CV. 2000. Behaviourally driven
gene expression reveals song nuclei in hummingbird brain. Nature. 406:628-632.

Jash A, Yun K, Sahoo A, So JS, Im SH. 2012. Looping mediated interaction between the promoter
and 3' UTR regulates type II collagen expression in chondrocytes. PloS one 7:e40828.

Jetz W, Thomas GH, Joy JB, Hartmann K, Mooers AO. 2012. The global diversity of birds in space
and time. Nature. 491:444.

Johansson US, Ekman J, Bowie RC, Halvarsson P, Ohlson JI, Price TD, Ericson PG. 2013. A
complete multilocus species phylogeny of the tits and chickadees. Mol Phylogenet Evol. 69:852-
860.

Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software version 7:
improvements in performance and usability. Mol Biol Evol. 30:772-780.

Kent WJ. 2002. BLAT--the BLAST-like alignment tool. Genome Res 12:656-664.

Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. 2002. The
human genome browser at UCSC. Genome Res 12:996-1006.

Kielbasa SM, Wan R, Sato K, Horton P, Frith MC. 2011. Adaptive seeds tame genomic sequence
comparison. Genome Res. 21:487-493.

Kimball RT, Wang N, Heimer-McGinn V, Ferguson C, Braun EL. 2013. Identifying localized biases
in large datasets: A case study using the Avian Tree of Life. Mol Phylogenet Evol 69:1021–1032.

Kimball RT, Oliveros CH, Wang N, White ND, Barker FK, Field DJ, Ksepka DT, Chesser RT,
Moyle RG, Braun MJ, et al. 2019. A Phylogenomic Supertree of Birds. Diversity. 11:109.

Kozlov AM, Darriba D, Flouri T, Morel B, Stamatakis A. 2019. RAxML-NG: a fast, scalable and
user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics. 35(21):4453–4455.

Ksepka DT, Stidham TA, Williamson TE. 2017. Early Paleocene landbird supports rapid
phylogenetic and morphological diversification of crown birds after the K–Pg mass extinction. Proc
Nat Acad Sci. USA 114:8047-8052.

Kuramoto T, Nishihara H, Watanabe M, Okada N. 2015. Determining the Position of Storks on the
Phylogenetic Tree of Waterbirds by Retroposon Insertion Analysis. Genome Biol Evol. 7:3180-
3189.

Lee SH, Mayr C. 2019. Gain of Additional BIRC3 Protein Functions through 3'-UTR-Mediated
Protein Complex Formation. Mol Cell. 74(4):701-712.e9. doi: 10.1016/j.molcel.2019.03.006.

Li W, Godzik A. 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or
nucleotide sequences. Bioinformatics. 22:1658-1659.

Lianoglou S, Garg V, Yang JL, Leslie CS, Mayr C. 2013. Ubiquitously transcribed genes use
alternative polyadenylation to achieve tissue-specific expression. Genes Dev. 27:2380-2396.

Liu WC, Wada K, Jarvis ED, Nottebohm F. 2013. Rudimentary substrates for vocal learning in a
suboscine. Nat Commun. 4:2082.

Livezey BC, Zusi RL. 2007. Higher-order phylogeny of modern birds (Theropoda, Aves:
Neornithes) based on comparative anatomy. II. Analysis and discussion. Zool J Linn Soc. 149:1-95.
Lovell PV, Wirthlin M, Wilhelm L, Minx P, Lazar NH, Carbone L, Warren WC, Mello CV. 2014.
Conserved syntenic clusters of protein coding genes are missing in birds. Genome Biol. 15:565.

Mason NA, Burns KJ, Tobias JA, Claramunt S, Seddon N, Derryberry EP. 2017. Song evolution,
speciation, and vocal learning in passerine birds. Evolution. 71:786-796.

Mayr C. 2016. Evolution and biological roles of alternative 3′ UTRs. Trends Cell Biol. 26:227-237.

Mayr C. 2017. Regulation by 3′-untranslated regions. Annu Rev Genet. 51:171-194.

Mayr G. 2004. Morphological evidence for sister group relationship between flamingos (Aves:
Phoenicopteridae) and grebes (Podicipedidae). Zool J Linn Soc. 140:157-169.

Mayr G. 2006. The contribution of fossils to the reconstruction of the higher-level phylogeny of
birds. Species, Phylogeny and Evolution 1:59-64.

Mayr G. 2009. Paleogene fossil birds. Heidelberg: Springer.

Mayr G. 2010. Phylogenetic relationships of the paraphyletic ‘caprimulgiform’ birds (nightjars and
allies). J Zool Syst Evol Res. 48:126-137.

Mayr G. 2014. The origins of crown group birds: molecules and fossils. Palaeontol. 57:231-242.

Mayr G. 2017. Avian Evolution: The Fossil Record of Birds and Its Paleobiological Significance.
Chichester: John Wiley & Sons Ltd.

Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R.
2020. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic
Era. Mol Biol Evol. 37(5):1530-1534.

Mirarab S, Reaz R, Bayzid S, Zimmermann T, Swenson MS, Warnow T. 2014. ASTRAL: Genome-
Scale Coalescent-Based Species Tree Estimation. Bioinformatics. 30:i541–i548.

Nottebohm F, Stokes TM, Leonard CM. 1976. Central control of song in the canary, Serinus
canarius. J Comp Neurol. 165:457-486.

Oliveros CH, Field DJ, Ksepka DT, Barker FK, Aleixo A, Andersen MJ, Alstrom P, Benz BW,
Braun EL, Braun MJ, et al. 2019. Earth history and the passerine superradiation. Proc Natl Acad Sci
USA 116:7916-7925.

Pacheco MA, Battistuzzi FU, Lentino M, Aguilar RF, Kumar S, Escalante AA. 2011. Evolution of
modern birds revealed by mitogenomics: timing the radiation and origin of major orders. Mol Biol
Evol. 28(6):1927-1942.

Peña-Hernández R, Marques M, Hilmi K, Zhao T, Saad A, del Rincon SV, Ashworth T, Roy AL,
Emerson BM, Witcher M. 2015. Genome-wide targeting of the epigenetic regulatory protein CTCF
to gene promoters by the transcription factor TFII-I. Proc Natl Acad Sci USA. 112: E677-E686.

Peng Y, Leung HC, Yiu SM, Chin FY. 2012. IDBA-UD: a de novo assembler for single-cell and
metagenomic sequencing data with highly uneven depth. Bioinformatics. 28:1420-1428.
Penrad-Mobayed M, Perrin C, L'Hote D, Contremoulins V, Lepesant JA, Boizet-Bonhoure B,
Poulat F, Baudin X, Veitia RA. 2018. A role for SOX9 in post-transcriptional processes: insights
from the amphibian oocyte. Sci Rep. 8:7191.

Posada D. 2003. Using MODELTEST and PAUP* to select a model of nucleotide substitution.
Current protocols in bioinformatics. New York. John Wiley & Sons.

Prum RO, Berv JS, Dornburg A, Field DJ, Townsend JP, Lemmon EM, Lemmon AR. 2015. A
comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing. Nature.
526:569-573.

Reddy S, Kimball RT, Pandey A, Hosner PA, Braun MJ, Hackett SJ, Han KL, Harshman J,
Huddleston CJ, Kingston S, et al. 2017. Why Do Phylogenomic Data Sets Yield Conflicting Trees?
Data Type Influences the Avian Tree of Life more than Taxon Sampling. Syst Biol. 66:857-879.

Robinson CM, Snyder KT, Creanza N. 2019. Correlated evolution between repertoire size and song
plasticity predicts that sexual selection on song promotes open-ended learning. Elife. 8:e44454.

Robinson FN, Curtis HS. 1996. The Vocal Displays of the Lyrebirds (Menuridae). Emu - Austral
Ornithol. 96:258-275.

Sackton TB, Grayson P, Cloutier A, Hu Z, Liu JS, Wheeler NE, Gardner PP, Clarke JA, Baker AJ,
Clamp M, Edwards SV. 2019. Convergent regulatory evolution and loss of flight in paleognathous
birds. Science 364:74-78.

Sandberg R, Neilson JR, Sarma A, Sharp PA, Burge CB. 2008. Proliferating cells express mRNAs
with shortened 3' untranslated regions and fewer microRNA target sites. Science. 320:1643-1647.

Sangster G. 2005. A name for the flamingo-grebe clade. Ibis. 147:612–615.

Sibley CG, Ahlquist JE. 1990. Phylogeny and Classification of the Birds. A Study in Molecular
Evolution: Yale University Press.

Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing
genome assembly and annotation completeness with single-copy orthologs. Bioinformatics.
31:3210-12.

Smith BT, McCormack JE, Cuervo AM, Hickerson MJ, Aleixo A, Cadena CD, Perez-Eman J,
Burney CW, Xie X, Harvey MG, et al. 2014. The drivers of tropical speciation. Nature. 515:406-
409.

Sorenson MD, Balakrishnan CN, Payne RB. 2004. Clade-limited colonization in brood parasitic
finches (Vidua spp.). Syst Biol. 53:140-153.

Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large
phylogenies. Bioinformatics. 30:1312-1313.

Stergachis AB, Haugen E, Shafer A, Fu W, Vernot B, Reynolds A, Raubitschek A, Ziegler S,
LeProust EM, Akey JM, Stamatoyannopoulos JA. 2013. Exonic transcription factor binding directs
codon choice and affects protein evolution. Science. 342:1367-1372.

Suh A, Paus M, Kiefmann M, Churakov G, Franke FA, Brosius J, Kriegs JO, Schmitz J. 2011.
Mesozoic retroposons reveal parrots as the closest living relatives of passerine birds. Nature Comm.
2:443.

Suh A, Smeds L, Ellegren H. 2015. The Dynamics of Incomplete Lineage Sorting across the
Ancient Adaptive Radiation of Neoavian Birds. PLOS Biol. 13:e1002224.

Sun X, Wang X, Tang Z, Grivainis M, Kahler D, Yun C, Mita P, Fenyö D, Boeke JD. 2018.
Transcription factor profiling reveals molecular choreography and key regulators of human
retrotransposon expression. Proc Natl Acad Sci USA. 115:E5526-E5535.

Tange O. 2011. GNU Parallel: the command-line power tool. The USENIX Magazine. 36:42-47.

Tan-Wong SM, Zaugg JB, Camblong J, Xu Z, Zhang DW, Mischo HE, Ansari AZ, Luscombe NM,
Steinmetz LM, Proudfoot NJ. 2012. Gene loops enhance transcriptional directionality. Science.
338:671-675.

Tavaré S. 1986. Some Probabilistic and Statistical Problems in the Analysis of DNA Sequences.
Lect Math Life Sci. 17:57-86.

Tuğrul M, Paixão T, Barton NH, Tkačik G. 2015. Dynamics of Transcription Factor Binding Site
Evolution. PLoS Genet. 11:e1005639.

Van Tuinen M, Butvill DB, Kirsch JA, Hedges SB. 2001. Convergence and divergence in the
evolution of aquatic birds. Proc R Soc B. 268:1345-1350.

Van Tuinen M, Stidham TA, Hadly EA. 2006. Tempo and mode of modern bird evolution observed
with large-scale taxonomic sampling. Hist Biol. 18:209-225.

Vicario DS. 2004. Using learned calls to study sensory-motor integration in songbirds. Ann N Y
Acad Sci. 1016:246-262.

Winkler H. 2015. Phylogeny, biogeography and systematics. Developments in woodpecker biology
36:7-35.

Wirthlin M, Lima NC, Guedes RLM, Soares AE, Almeida LGP, Cavaleiro NP, de Morais GL,
Chaves AV, Howard JT, de Melo Teixeira M. 2018. Parrot genomes and the evolution of heightened
longevity and cognition. Curr Biol. 28:4001-4008.e4007.

Xiong P, Hulsey CD, Meyer A, Franchini P. 2018. Evolutionary divergence of 3' UTRs in cichlid
fishes. BMC Genomics. 19:433.

Xu L, Peng L, Gu T, Yu D, Yao Y-G. 2019. The 3′UTR of human MAVS mRNA contains multiple
regulatory elements for the control of protein expression and subcellular localization. BBA - Gene
Regul Mech. 1862:47-57.

Yin ZT, Zhu F, Lin FB, Jia T, Wang Z, Sun DT, Li GS, Zhang CL, Smith J, Yang N, et al. 2019.
Revisiting avian 'missing' genes from de novo assembled transcripts. BMC Genomics. 20:4.

Zachos J, Pagani M, Sloan L, Thomas E, Billups K. 2001. Trends, rhythms, and aberrations in
global climate 65 Ma to present. Science. 292:686-693.

Zann R. 1990. Song and call learning in wild zebra finches in south-east Australia. Anim Behav.
40:811-828.

Zelenkov N. 2019. Systematic Position of Palaeortyx (Aves, Phasianidae) and Notes on the
Evolution of Phasianidae. Paleontol J. 53:194-202.

Zhang C, Rabiee M, Sayyari E, Mirarab S. 2018. ASTRAL-III: polynomial time species tree
reconstruction from partially resolved gene trees. BMC Bioinformatics. 19:153.

Zharkikh A. 1994. Estimation of evolutionary distances between nucleotide sequences. J Mol Evol.
39:315-329.

Zuccon D, Prŷs-Jones R, Rasmussen PC, Ericson PGP. 2012. The phylogenetic relationships and
generic limits of finches (Fringillidae). Mol Phylogenet Evol. 62:581-596.

Figure Legends

Fig. 1: Analysis of tree topology congruency for different non-coding and coding data types (A, B,
C) and taxon-specific sequences in 3’UTRs (D). In A, multiple tree inferences using distinct starting
trees and subsequent refinement by NNI (Nearest Neighbour Interchange) moves resulted in better
tree topology congruency (lower Robinson-Foulds distance) for 3’UTR trees (UTR = 3’UTRs of all
species; UTR393 = 3’UTRs including only 7 genomes for which no transcriptomes were available)
than for trees calculated from similar amounts of coding sequence data (CDN = codons of all
species; CDN12 = codon positions 1 and 2 only, all species; AAS = amino acid sequences, all
species). Trees were inferred with the RAxML fast mode (-f E) under the GTRCAT (or
PROTCATJTTF) model, without or with NNI improvement under GTRGAMMA (or
PROTGAMMAJTTF; RAxML -f J). In B, we compared the rate of change of the average per-site
likelihood (blue) with the tree topology convergence (red; average Robinson-Foulds distance of 10
trees) and with the convergence of average trees from neighbouring data points (green;
Robinson-Foulds distance, e.g. average tree n compared to average tree n+1). The rate of change of
the average per-site likelihood depends on the amount of missing data allowed in the alignments. It
can be computed quickly (a single inference per alignment), whereas tree topology convergence
requires multiple inferences, and it predicts an optimal number of allowed gaps per column in
3’UTR multiple sequence alignments of about 100 missing species per pattern. In C, the influence of
mixing 3’UTR and CDS (coding sequences) on the resulting tree topology is shown. Adding
relatively small amounts of 3’UTR to CDS already had a strong impact on the resulting tree
topologies (red line), while adding small amounts of CDS to 3’UTR had a much lower impact on the
resulting tree (blue line). Note that both curves differ from the diagonal. In D, the 3’UTRs of avian
genes contain evolutionary signals that distinguish order- and family-level taxa. The similarity in
the presence of transcription factor binding site (TFBS) motifs in the 3’UTRs of species decreases
with increasing evolutionary distance between avian families. Shown are correlations (Z-values) of
the abundance of TFBS in 3’UTRs of 97 randomly selected genes expressed in the passerine family
Estrildidae versus Fringillidae, versus Basal Oscine families, versus family-level taxa of the order
Charadriiformes, and versus the order Caprimulgiformes. The correlation of TFBS abundance
between Charadriiformes and Caprimulgiformes (not shown) is R² = 0.694. For the list of analyzed
genes and species see Tab. S3.
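
The Robinson-Foulds distance used in Fig. 1A-C counts the bipartitions (splits) that occur in one
tree but not in the other. The following Python sketch illustrates the measure only and is not the
pipeline used in this study: it assumes trees given as nested tuples of taxon names, and the
function names (leaf_set, bipartitions, robinson_foulds, mean_pairwise_rf) are hypothetical.

```python
from itertools import combinations


def leaf_set(tree):
    """Return the set of taxon names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial splits induced by the internal edges of a tree."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)  # fixed taxon used to pick one side of each split
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - below
        if len(below) > 1 and len(other) > 1:
            # Store the side without the anchor, so that rooted and unrooted
            # encodings of the same topology yield identical splits.
            splits.add(other if anchor in below else below)
        return below

    visit(tree)
    return splits


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must contain the same taxa")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


def mean_pairwise_rf(trees):
    """Average Robinson-Foulds distance over all pairs of trees."""
    pairs = list(combinations(trees, 2))
    return sum(robinson_foulds(a, b) for a, b in pairs) / len(pairs)


# Toy five-taxon trees (placeholder names, not data from this study).
t1 = (("A", "B"), ("C", "D"), "E")
t2 = (("A", "C"), ("B", "D"), "E")
t3 = (("A", "B"), ("C", "E"), "D")
print(robinson_foulds(t1, t2))       # 4: no split is shared
print(robinson_foulds(t1, t3))       # 2: the split AB|CDE is shared
print(mean_pairwise_rf([t1, t2, t3]))
```

A lower average distance among replicate inferences indicates a more stable topology, which is the
sense in which congruency is compared across data types in the legend above.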

Fig. 2: Order-level phylogeny of the birds resulting from the analysis of 3’UTRs of 224 avian
family-level taxa including 379 genera and 429 species (see Fig. 3A & 3B for all families, Fig. S6
for all species). In contrast to all previous phylogenies spanning the entire avian class, the statistical
support values are high throughout, i.e. the approximate likelihood-based measure of branch
support was maximal (SH-aLRT = 100) in most cases, except for four branching points (red
values). If we reduced the number of missing samples (gappiness) from 110 to 100, the support
levels of these four branching points dropped (blue values) while all others remained maximal. In
cases of SH-aLRT values below 100, we provide the support values from IQTREE2 ultrafast
bootstrapping (green values). The tree is subdivided into seven higher-level clades: the
Palaeognathae, the Galloanserae, the Mirandornithes, the Basal Landbirds, the Aquatic &
Semiaquatic Birds, the Higher Landbirds, and the Australaves. Particular colors indicate each of the
seven avian higher-level clades in all phylogenetic trees of the study. Thus, trivial names (Basal
Landbirds, Higher Landbirds, Aquatic & Semiaquatic Birds) used in previous publications and in
the current paper comprise different sets of bird order- and family-level taxa. Note that the hoatzin
(Opisthocomiformes) resulted as the sister group of the Caprimulgiformes and that the flamingos
(Phoenicopteriformes) and grebes (Podicipediformes) form the Mirandornithes, the sister group of
all other Neoaves in our analysis. Black numbers at the nodes are the calculated divergence times of
the order-level taxa in million years ago (Mya). Most of the extant order-level taxa evolved in the
Paleocene, the other two during the early Eocene, and some lineages likely diverged before the
K-Pg boundary (66 Mya). For illustration purposes, the branch lengths are not scaled. Bird
pictures are reproduced with permission of Lynx Edition.

Fig. 3: A family-level phylogeny of birds based on 3’UTR sequences including all (106) non-
passerine (Fig. 3A) and most (115) passerine (Fig. 3B) family-level taxa. For simplicity, each of the
families is represented by one species, listed as the species name, followed by the family name and
the order name. In Fig. 3A, the family-level taxa of the seven higher-level clades, the
Palaeognathae, the Galloanserae, the Mirandornithes, the Basal Landbirds, the Aquatic &
Semiaquatic Birds, the Higher Landbirds, and the Australaves are shown. The higher-level clades
are color-coded as in Fig. 2. Of the Passeriformes (Fig. 3B), the suborders Acanthisitti (New
Zealand wrens), Tyranni (sub-oscines) and Passeri (oscines or songbirds) are indicated and the
Passeri are subdivided into ten oscine-higher clades (OHCs). The tree was calculated by RAxML-ng
using a large concatenated alignment of 3’UTR residues as input (2,584,785 analyzable patterns,
maximum 100 or 110 missing taxa (gappiness)). Approximate likelihood-based measures of branch
support delivered maximal values (SH-aLRT = 100) except those shown in red (for 110-gappiness)
and blue (for 100-gappiness). SH-aLRT values are considered quite conservative. In cases of SH-
aLRT values below 100, we also provide support values from IQTREE2 ultrafast bootstrapping
(UFBS, green values). In the few cases where SH-aLRT support was below 80 (two for 110-
gappiness; seven for 100-gappiness), the UFBS approach still reached good values of support in the
range of 86-99. The timing of the branching points was calculated by DPPDiv. The entire tree
including all 429 species is provided in Fig. S6. Error bars are confidence intervals (95%). Time
scale and divergence times are in million years ago. Diagonal bars indicate the part of the tree that
is not scaled in order to reduce the size of the tree and the PDF.
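
The "gappiness" threshold in the legends of Fig. 2 and Fig. 3 limits how many taxa may be missing
in an alignment column. The following Python sketch shows one way such a filter could look. It is
not the code used in this study: it assumes that the alignment is a dictionary of equal-length
sequences, that the characters "-", "?" and "N" mark missing data, and the function name
filter_columns is hypothetical.

```python
def filter_columns(alignment, max_missing, missing_chars="-?Nn"):
    """Keep only the alignment columns in which at most `max_missing` taxa
    have a missing character (assumed here to be gap, '?' or 'N')."""
    if not alignment:
        return {}
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same length")

    # Column indices that satisfy the gappiness threshold.
    keep = [
        i for i in range(length)
        if sum(seq[i] in missing_chars for seq in seqs) <= max_missing
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Toy alignment with three placeholder taxa (not data from this study).
toy = {"sp1": "ACG-", "sp2": "A-G-", "sp3": "AC--"}
print(filter_columns(toy, max_missing=1))
# {'sp1': 'ACG', 'sp2': 'A-G', 'sp3': 'AC-'}
```

With 429 species, a threshold of 100 or 110 would correspond to max_missing=100 or
max_missing=110 in this sketch.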

Fig. 4: The diversification of oscine passerine families (red) contrasts with that of suboscine
passerine families (green) and of non-passerine families (blue) after the early Miocene epoch. The
numbers of new family-level taxa per million years (My) were calculated from the family-level
phylogeny in intervals of 5 My. After the K-Pg boundary (66 Mya), during the Paleocene
and early Eocene, most neognath order-level taxa emerged with a rather steady rate of new family-
level taxa per My (“1”). During the Oligocene epoch, a major diversification event occurred (“2”),
which concerned both non-passerine and passerine family-level taxa (50 families of 12 orders); the
highest rate of new family-level clades (3.0 non-passerine and 2.0 passerine family-level clades /
My) occurred between 35 and 25 Mya, during the Rupelian and Chattian stages. A third
major diversification event (“3”) concerned mainly passerine family-level taxa and peaked 25-15
Mya, in the Aquitanian and Burdigalian stages of the early Miocene (1.6 non-passerine, 7.1
passerine families / My). Since the Miocene, the radiation of oscine family-level taxa contrasts
with the diversification rates of non-oscine passerine (New Zealand wrens and sub-oscines)
and non-passerine families. Arrows indicate the calculated emergence of family-level taxa that
evolved vocal learning, the parrots (a), the passerines (b) and the hummingbirds (c). The divergence
times of family-level clades were calculated with DPPDiv applying the uncorrelated gamma-
distributed rate model (see Fig. 3A & 3B and Fig. S6).
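
The rates in Fig. 4 are counts of new family-level taxa per 5 My interval, divided by the interval
length. The following Python sketch illustrates this arithmetic only. It assumes that the input is
a plain list of family origin times in Mya, the example values are invented placeholders rather
than estimates from this study, and the function name families_per_my is hypothetical.

```python
from collections import Counter


def families_per_my(origin_times_mya, bin_width=5.0):
    """Return {bin_start_mya: new families per My} for bins of `bin_width` My.

    A bin starting at s covers origin times t with s <= t < s + bin_width.
    """
    counts = Counter()
    for t in origin_times_mya:
        if t < 0:
            raise ValueError("origin times must be non-negative (in Mya)")
        start = int(t // bin_width) * bin_width
        counts[start] += 1
    # Convert counts per bin into a rate per million years.
    return {start: n / bin_width for start, n in sorted(counts.items())}


# Invented origin times (Mya), for illustration only.
example_times = [27.5, 26.0, 31.0, 12.0]
print(families_per_my(example_times))
# {10.0: 0.2, 25.0: 0.4, 30.0: 0.2}
```

Computing the rates separately for oscine, suboscine and non-passerine families would give the
three curves that the legend compares.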