From Arabia to Iberia: A Y chromosome prospective

July 13, 2017 | Author: Rene Herrera | Categories: Genetics, Gene Flow, Spain, Middle East, Humans, Human Migration, Haplotypes, Gene



From Arabia to Iberia: A Y chromosome prospective
María Regueiro, Ralph Garcia-Bertrand, Karima Fadhlaoui-Zid, Joseph Álvarez, Rene J. Herrera

PII: S0378-1119(15)00190-0
DOI: 10.1016/j.gene.2015.02.042
Reference: GENE 40291

To appear in: Gene

Received date: 5 January 2015
Revised date: 8 February 2015
Accepted date: 15 February 2015

Please cite this article as: Regueiro, María, Garcia-Bertrand, Ralph, Fadhlaoui-Zid, Karima, Álvarez, Joseph, Herrera, Rene J., From Arabia to Iberia: A Y chromosome prospective, Gene (2015), doi: 10.1016/j.gene.2015.02.042

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT From Arabia to Iberia: a Y chromosome prospective

María Regueiro†,1, Ralph Garcia-Bertrand†,2,*, Karima Fadhlaoui-Zid3, Joseph Álvarez1, Rene J Herrera1

1 Department of Molecular and Human Genetics, College of Medicine, Florida International University, Miami, FL 33199, USA
2 Biology Department, Colorado College, Colorado Springs, CO 80903, USA
3 Laboratoire de Génétique, Immunologie et Pathologies Humaines, Faculté des Sciences de Tunis, Campus Universitaire El Manar II, Université El Manar, Tunis, Tunisia

* Corresponding author
† These three authors contributed equally to the manuscript

Running head: Arabian, N African, Iberian Y chromosome relationships

Keywords: phylogenetic relationships, Y haplogroups E and J, Y-STR, Muslim migration of Iberia

Address for correspondence and reprints: Dr. Ralph Garcia-Bertrand Biology Department Colorado College 14 East Cache La Poudre Street Colorado Springs, CO 80903-3294 Phone: (719) 389-6402 Fax: (719) 389-6940 E-mail: [email protected]

Abstract

At different times during recent human evolution, northern Africa has served as a conduit for migrations from the Arabian Peninsula. Previous researchers have investigated whether the Strait of Gibraltar served as a conduit of migration from North Africa to Iberia. We now revisit this issue and theorize that, although the Strait of Gibraltar, at the west end of this corridor, has acted as a barrier to human dispersal into Southwest Europe, it has not provided an absolute seal to gene flow. To test this hypothesis, we use the spatial frequency distributions, STR diversity and expansion time estimates of Y chromosome haplogroups J1-P58 and E-M81 to investigate the genetic imprints left by the Arabian and Berber expansions into the Iberian Peninsula, respectively. The data generated indicate that Arabian and Berber genetic markers are present in Iberia. We present evidence suggesting that Iberia received gene flow from Northwest Africa both during and prior to the Islamic colonization of 711 A.D. Interestingly, the highest frequencies of Arabian and Berber markers are found not in southern Spain, where Islam remained the longest and was culturally most influential, but in Northwest Iberia, specifically Galicia. We propose that the relocation of Moriscos to the north during the Reconquista, the migration of crypto-Muslims seeking refuge in a more lenient society and/or geographically more extensive pre-Islamic incursions may explain the higher frequencies and older time estimates of mutations in the north of the Peninsula. These scenarios are congruent with the higher diversities of some diagnostic markers observed in Northwest Iberia.

Introduction

Western Europe, including the Iberian Peninsula, was first populated by humans relatively recently, in the mid Upper Paleolithic, 35,000-40,000 years ago (ya) (Cavalli-Sforza et al., 1994). Although Europe is genetically homogeneous overall in comparison with other biogeographical regions (Lao et al., 2008), Iberia in particular has been a recipient as well as a reservoir of human diversity. During the last Ice Age (18,000-80,000 ya), for example, the Peninsula became a human refugium as glaciers advanced to cover most of Europe, encapsulating human populations and genetic diversity in what is now Portugal and Spain (Richards et al., 2000; Torroni et al., 2000). When the ice sheet began to retreat, about 15,000 ya, after the Last Glacial Maximum (LGM), this sanctuary was one of the sources that participated in the repopulation of Europe. Starting in the late Paleolithic, a number of archaeological sites begin to signal the impact of specific prehistoric cultures of diverse origins on the Iberian Peninsula. The most significant groups include the Tartessians (Koch, 2013), of unknown origin, who appeared during the 9th century before the Common Era (BCE) (Marcos Garcia, 1987; Almagro-Gorbea, 2004); Proto-Indo-Europeans from the steppes of Eastern Europe (Mallory et al., 1997) or the Eastern Mediterranean islands (Fernandez et al., 2014) early in the Neolithic (7,500-5,500 BCE); Iberians from the Eastern Mediterranean or North Africa in the 6th century BCE (Sanmartí, 2005); and Celts from Central Europe in about 450 BCE (Judice Gamito, 1994). In more recent historical times, Phoenicians from the Middle East, Greeks, Romans, Germanic tribes from Central Europe, Vikings from Scandinavia, Arabs from the Persian Gulf region and Berbers from Northwest Africa, as well as the Roma from India, have contributed to the Iberian genetic make-up (de Hoz, 1982; Gieben, 1991). Several of the above-mentioned migrations are clearly reflected in the collage of cultures, linguistic affinities, music and architecture on display in the various regions of Iberia today, yet it remains to be ascertained to what extent this extreme cultural diversity is reflected in the gene pools of its extant populations. Gene flow from North Africa to Iberia has been demonstrated in a number of previous publications (Rando et al., 1998; Pereira et al., 2005; Plaza et al., 2013; Santos et al., 2014), and higher frequencies of some North African Y-chromosome lineages have been observed in northern Iberia (Adams et al., 2008).

By the late 7th century CE, Arabian and Bedouin forces coming from the Umayyad capital of Damascus had reached the far west of North Africa (the Maghreb). In 711 CE, a Berber-speaking army under Arabian suzerainty crossed over into the Iberian Peninsula and, within four years, had captured almost the entire Peninsula, with the exception of Asturias, the northern Basque country, Cantabria, Galicia and most of the Pyrenees in the north, which remained largely unoccupied. Arabian and Berber forces then remained in control of most of the Peninsula for more than five centuries, followed by a gradual withdrawal toward the southern region of Andalusia driven by the reconquest by Christian forces (La Reconquista) (Harvey, 2005). By the end of the 15th century, after almost 800 years of occupation, Islamic political control in Iberia ended with the fall of the enclave of Granada in January 1492 (Domínguez Ortiz and Vincent, 1979). Although the Christian reconquest put an end to Muslim control in Iberia, people of the Islamic faith were allowed to remain in what is now Spain and Portugal under the terms of surrender at Granada in 1492. In fact, it was not until the mid-1520s that Muslims were forced to convert to Christianity or be expelled from Iberia, and it was not until 1614, more than a century after the fall of Granada, that a final ultimatum was given to the last Muslim residents of al-Andalus (the Arabic name given to the region) (Barkai, 1984). Undoubtedly, during the 800 years of Islamic occupation, some degree of bidirectional gene flow between the two communities occurred as interreligious marriages and conversions took place. Further, during the century immediately following the fall of Granada, the relocation of Muslims to the north of Spain and of Christians to the vacant lands of Andalusia, in the south, added further complexity to this major bidirectional exodus. One region affected by this influx of Muslims forced to relocate out of al-Andalus was the northwest corner of the Peninsula, a mountainous and sparsely populated area known today as Galicia (Harvey, 2005).

It is thought that approximately half of the Muslim residents of al-Andalus chose to convert (Jayyusi, 1994; O'Shea, 2006). Of those who converted, some became practicing Christians and adopted new names. Others, the so-called crypto-Muslims, were baptized and professed to be Christians but in fact continued practicing the old religion and/or cultural elements of Islam in secret (Stem, 1964; Guettat, 1980; Barkai, 1984; Jayyusi, 1994; Bahrami, 1995; Menocal et al., 2000). Considering the wide geographical extent and long duration of the Muslim dominion, as well as the policy and magnitude of conversions, it is reasonable to expect that the Islamic dominion had a profound impact on the genetic make-up of Iberia. This contention is supported by a recent study by Botigué and colleagues (Botigué et al., 2013), which argues that most of the introgression from North Africa occurred during the past 300 years. Therefore, we hypothesize that the Islamic presence in Iberia affected the genetic constitution of populations in the Peninsula and that, owing to the policies of relocation of people after the fall of Granada, Arab and Berber markers are not uniformly distributed within the Iberian Peninsula.

Previous studies have highlighted the genetic characteristics and similarities of the Middle East and northeastern Africa, as well as the clear genetic differentiation between Northwest Africa and both sub-Saharan Africa and Europe, including Iberia. Traditionally, studies assessing the role played by the Strait of Gibraltar have tended to indicate that this short but treacherous 14 km stretch of water acted more as a barrier halting bidirectional migration (Cavalli-Sforza et al., 1994), yet more recent work suggests that high rates of gene flow have occurred since pre-Neolithic times (Currat et al., 2010).

Genetic studies indicate that the contemporary populations of Spain are not uniform. The Basques, for example, are characterized by several high-frequency genetic markers, including Y haplogroup R1b and its derivative R1b1b2 at 87.1% (the highest in Western Europe) (Alonso et al., 2005; Balaresque et al., 2010) and the Rh-negative and blood group O alleles at 35% and 55%, respectively (Cavalli-Sforza et al., 1994; Capelli et al., 2009), but they exhibit one of the lowest frequencies in Iberia (2%) of the Berber marker E1b1b1b1a-M81 on the Y chromosome (Flores et al., 2004). M81 originated in North Africa about 5,600 ya and is thought to signal migrations connected with the Islamic dispersals (Cruciani et al., 2004) and possibly with the Roman and Carthaginian expansions. Flores and collaborators (2004) found this marker at its highest frequencies in Malaga (11.5%), Galicia (10.5%) and Cantabria (8.6%). Other investigators have reported commensurate levels of M81 in Andalusia (Semino et al., 2004) and Catalonia (Adams et al., 2008), yet reduced levels or no M81 in other regions of Spain (Flores et al., 2004; Semino et al., 2004).

E1b1b1c-M123 is another mutation exhibiting geographical partitioning within Iberia. It is thought to have originated in the Near East or Anatolia in the early Neolithic, and its dispersal into Europe, the Levant and North Africa closely mirrors the dissemination of farming during the Neolithic (Semino et al., 2004). Like M81, M123 is seen at its highest frequencies within Iberia in the extreme northwest region of Galicia (5.2%) (Adams et al., 2008), a level equal to that in Tunisia (5.2%) and higher than that in Algeria (3.1%) (Arredi et al., 2004). Much lower levels, or no detection, have been reported in various regions of central and northern Iberia (Flores et al., 2004; Adams et al., 2008). The low frequencies of M123 in Western Europe outside Iberia suggest that it may have entered the Peninsula from North Africa across the Mediterranean Sea rather than overland through continental Europe. Phoenician trading, which flourished from 1200 to 300 BCE, could have contributed to the distribution of this marker and its introduction into Iberia. E1b1b1a-M78 is another Y chromosome marker that partitions non-uniformly within Iberia, with Asturias, in the northwest of the Peninsula (just east of Galicia), exhibiting the highest frequency (10.0%); southern Spain and southern Portugal possess only 3.2% and 4.1% of this marker, respectively (Cruciani et al., 2007). M78 is thought to have originated in Northeast Africa, specifically in what is now Egypt or Libya, about 17,000-20,000 ya (Cruciani et al., 2007). M78 may represent a signature of the Phoenician dominion that spread across the Mediterranean. Its presence in Iberia dates back to at least the early Neolithic, since 7,000-year-old funerary remains of individuals carrying this mutation were discovered in a cave in Catalonia, northeastern Spain (Lacan et al., 2011).

The Y chromosome marker J1a2b-P58, which originated in the Middle East approximately 10,000 ya, is associated with the expansion of Semitic herder-hunters into the Arabian Peninsula (Chiaroni et al., 2010). Dispersals carrying different J1 subhaplogroup markers have been reported to have entered North Africa in historic times (Semino et al., 2004); specifically, P58 is tied to the Arabization of Northwest Africa (Ennafaa et al., 2011). Although J1 subhaplogroups are highly abundant in Arabia, where they reach levels of 40%-75%, in Iberia they have only been detected in Andalusia, in southern Spain, and at a minimal frequency of 1.1% (Semino et al., 2004).

In terms of mtDNA, the North African-specific haplogroup U6, dated to about 50,000 ya, has only been observed in the northwest of the Peninsula, with Galicia (2.2%) and northern Portugal (4.3%) exhibiting the highest frequencies (González et al., 2003). Based on this distribution and the higher diversity value in this area (0.014 ± 0.001) compared with North Africa (0.006 ± 0.001), it was suggested that historic events such as the Muslim occupation cannot alone account for the presence of U6 in Iberia and that migrations dating from pre-Neolithic times are also responsible (González et al., 2003). The other maternal lineages that signal North African and sub-Saharan gene flow belong to macro-haplogroup L. The presence of L1 in the Peninsula is attributed to a number of Middle Eastern and Arabic migrations into North Africa driven by the Phoenician and Arabian occupations, respectively (Cerezo et al., 2012). The highest frequencies of this haplogroup are seen in Cordoba in southern Spain (8.30%) (Casas et al., 2006; Hernandez et al., 2014) and Galicia (3.70%) (Achilli et al., 2007), while the lowest are observed in the Basque Country (0.64%) and Andalusia (1.75%) (Achilli et al., 2007).

A recurrent and compelling theme in many of the above-mentioned Y-specific and mtDNA studies is the relatively higher frequency of Arabian and Berber markers in the extreme northwest of Iberia, especially in Galicia, a region only very briefly occupied by the Islamic army, which retreated south in 739 A.D. In the present study, we focus on the Islamic dispersal into Iberia, which left a profound imprint on the sciences, arts, architecture, music and language. Our working hypothesis is that the Islamic occupation of Iberia also affected the genetic constitution of populations in the Peninsula and that, owing to the relocation of Christian converts, Arabic and Berber genetic markers are not uniformly distributed within Iberia. To test this hypothesis, we explore the genetic diversity of a number of key present-day populations from the Arabian Peninsula, northern Africa and Iberia, representing the putative regions traversed by migrations from Arabia to Iberia. We assessed the genetic diversity of the J1a2b-P58 and E1b1b1b1a-M81 haplogroups and their derivative lineages at high resolution and genotyped 17 Y-STR loci in individuals carrying informative SNP mutations.

Materials and Methods


Sample Collection

A total of 1233 male individuals from Tunisia (Sfax, n=56; Béja, n=72), Morocco (Fes, n=108), Spain (Extremadura, n=63; Andalusia, n=167; Galicia, n=164), Qatar (n=72), Oman (n=118), Yemen (n=61), Egypt (n=148), UAE (n=163) and Bahrain (n=41) were analyzed for Y chromosome haplogroups E and J. Genealogical history was recorded from each donor for at least two generations in order to establish regional ancestry. Samples were procured with informed consent. DNA was extracted from blood using the standard phenol-chloroform method, ethanol-precipitated as described previously and stored at −80 °C (Antunez-de-Mayolo et al., 2002). The number of individuals and the geographic location of each genotyped population are provided in Table 1.

Y-chromosome Haplotyping

A total of eight Y-chromosome biallelic markers within haplogroup J1a2b-P58 and four within E1b1b1b1a-M81 (Karafet et al., 2008) were hierarchically genotyped by standard methods (Gayden et al., 2008). The phylogenetic relationships of these markers and the haplogroups they define are shown in Figure 1. Haplogroup designation follows the International Society of Genetic Genealogy (2014).
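Hierarchical genotyping means that a sample is typed for a downstream marker only after it has been shown to be derived at the upstream marker that defines the parent clade. The Python sketch below illustrates only that descent logic. Its trees are a hypothetical partial subset built from the markers named in this paper (not the full set of eight J and four E markers). The `Clade` class and `assign_haplogroup` function are illustrative names, not part of any published tool.

```python
from dataclasses import dataclass, field


@dataclass
class Clade:
    """One node of a (partial) Y-SNP tree: clade label plus its defining marker."""
    name: str
    marker: str
    children: list = field(default_factory=list)


# Hypothetical partial trees using only markers named in the text.
J_TREE = Clade("J1a2b", "P58", [
    Clade("J1a2b1", "L92.1"),
    Clade("J1a2b2", "L147.1", [
        Clade("J1a2b2a", "L222.2", [Clade("J1a2b2a1", "L65.2")]),
    ]),
])
E_TREE = Clade("E1b1b1b1a", "M81", [Clade("E1b1b1b1a1", "M183")])


def assign_haplogroup(clade, genotype, typed=None):
    """Walk the tree top-down, typing a subclade marker only if the parent is derived.

    genotype maps marker -> True (derived) or False (ancestral); markers missing
    from it are treated as ancestral. `typed` records the markers actually tested.
    Returns e.g. "J1a2b2*-L147.1" (a paragroup), or None if ancestral at the root.
    """
    if typed is None:
        typed = []
    typed.append(clade.marker)
    if not genotype.get(clade.marker, False):
        return None
    for child in clade.children:
        label = assign_haplogroup(child, genotype, typed)
        if label is not None:
            return label
    # Derived here but at none of the tested subclades: paragroup gets an asterisk.
    star = "*" if clade.children else ""
    return f"{clade.name}{star}-{clade.marker}"


typed_markers = []
print(assign_haplogroup(J_TREE, {"P58": True, "L147.1": True}, typed_markers))
print(typed_markers)
```

Because subclade markers are typed only after the parent is found to be derived, an individual who is ancestral at P58 needs a single assay. This saving is the practical point of the hierarchical design.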

DNA samples belonging to Y-haplogroups J1a2b-P58 and E1b1b1b1a-M81 were also typed for 17 Y-STR loci (DYS19, DYS385a/b, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, GATA H4) using the AmpFlSTR® Yfiler kit (Applied Biosystems, Foster City, CA). PCR was performed as described by the manufacturer, and the resulting amplicons were separated on an ABI Prism 3130xl Genetic Analyzer. GeneMapper® software v3.2 was employed to determine fragment sizes, and alleles were designated through comparison with the allelic ladder supplied by the manufacturer.

Data Analyses

Median-joining (MJ) networks (Bandelt et al., 1999) based on the Y-STR profiles of individuals carrying haplogroups E1b1b1b1a-M81, J1a2b-P58 and their derivatives were constructed with the NETWORK 4.5.1.6 software package available at www.fluxusengineering.com. Networks were generated using the MJ algorithm, with the microsatellite loci weighted in inverse proportion to their variances. To reduce network complexity, all diagrams were post-processed using the Maximum Parsimony (MP) calculation. Genetic distances (Rst) were calculated with Arlequin v3.5 (Excoffier and Lischer, 2010) and displayed in a Multidimensional Scaling (MDS) graph using the STATISTICA 8 package (http://www.statsoft.com). Genetic distances were estimated at the level of haplotypes based on the Y-STR loci.
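The Rst distances above were computed in Arlequin. To illustrate what an Rst-type distance measures, the sketch below implements Slatkin's simple form R_ST = (S̄ − S_W)/S̄. Here S̄ is the mean squared difference in repeat number between all pairs of haplotypes in the pooled sample, and S_W is the mean of the within-sample values. This is a simplified stand-in, not Arlequin's AMOVA-based estimator. Summing squared differences over loci is an assumption, and the populations and haplotypes are hypothetical.

```python
from itertools import combinations


def squared_steps(h1, h2):
    """Sum over loci of the squared difference in repeat number."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))


def mean_pairwise(haps):
    """Mean squared-step distance over all unordered pairs (needs >= 2 haplotypes)."""
    pairs = list(combinations(haps, 2))
    return sum(squared_steps(a, b) for a, b in pairs) / len(pairs)


def pairwise_rst(pop_a, pop_b):
    """Slatkin-style R_ST = (S_bar - S_w) / S_bar for two samples of haplotypes."""
    s_bar = mean_pairwise(pop_a + pop_b)                     # pooled sample
    s_w = (mean_pairwise(pop_a) + mean_pairwise(pop_b)) / 2  # mean within-sample
    return 0.0 if s_bar == 0 else (s_bar - s_w) / s_bar


def rst_matrix(pops):
    """Symmetric matrix of pairwise R_ST values, with zeros on the diagonal."""
    names = list(pops)
    matrix = [[0.0 if a == b else pairwise_rst(pops[a], pops[b]) for b in names]
              for a in names]
    return names, matrix


# Hypothetical three-locus haplotypes (repeat counts) for three toy populations.
pops = {
    "PopA": [(14, 13, 17), (14, 13, 16), (15, 13, 17)],
    "PopB": [(14, 13, 17), (15, 13, 16), (14, 12, 17)],
    "PopC": [(17, 11, 20), (18, 11, 19), (17, 12, 20)],
}
names, rst = rst_matrix(pops)
for name, row in zip(names, rst):
    print(name, [round(x, 3) for x in row])
```

In this toy example, PopA and PopB give a slightly negative estimate. This happens when between-sample differences are no larger than within-sample ones, and such values are commonly set to zero before plotting.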

All phylogenetic analyses were performed using the 15 Y-STR loci (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, GATA H4) common to all the collections listed in Table 1. DYS385 was excluded from the network, time-estimate and haplotype-diversity calculations because it is not possible to discriminate between the DYS385a and DYS385b loci with the Yfiler kit. In addition, the DYS389I allele size was subtracted from DYS389II for all analyses. See Supplementary Table 1 in the online edition for the 17-locus Y-STR haplotypes of samples derived for the above-mentioned haplogroups.

Average gene diversity (GD), based on 15 Y-STR loci, was computed according to Nei (1987). The age of microsatellite variation within each haplogroup was estimated as described by Zhivotovsky et al. (2004) and modified according to Sengupta et al. (2006). Two Y-STR mutation rates were used: the evolutionary rate of 6.9 × 10⁻⁴ (Zhivotovsky et al., 2004) and the genealogical rate of 2.5 × 10⁻³ (Goedbloed et al., 2009), with generation times of 25 and 35 years, respectively. Because of the limitations and assumptions associated with current calibrations of Y-STR mutation rates (Zhivotovsky et al., 2004; Goedbloed et al., 2009; Ballantyne et al., 2010; Ravid-Amir and Rosset, 2010; Burgarella and Navascues, 2011; Busby et al., 2012), the dates generated in this study should only be taken as relative estimates. As indicated by Ballantyne et al. (2010) and Busby et al. (2012), mutation rates differ considerably among Y-STR loci; as a result, direct comparisons among studies employing different STR markers are difficult. In addition, time estimates based on Y-STR diversity can be inflated by a number of factors, including multiple migrations from different source populations. The time estimates must therefore be viewed as upper bounds. However, the relative values generated in the present investigation are useful for comparisons among the populations examined in this study. Because the diversity within a particular lineage should reflect its relative age, intra-haplogroup diversity (mean microsatellite variance, Vp) (Kayser et al., 2001) was also estimated across the 15 loci.

Contour geographical representations of haplogroup frequencies and gene diversities were generated with Surfer version 12 (Golden Software Inc., Cold Spring Harbor, NY, USA, http://www.goldensoftware.com) following the kriging procedure (Relethford, 2008). Contour maps allow genetic distributions to be visualized on geographical maps: contours based on the abundance and frequencies of genetic markers are plotted on the appropriate maps, and values at locations without data points are estimated from the existing data. These maps are particularly useful for representing clinal differences as a function of geographic distance.
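The diversity and age statistics described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline. GD uses Nei's unbiased per-locus formula n/(n−1)(1 − Σp²), averaged over loci. Vp is taken as the per-locus sample variance in repeat number, averaged over loci. The age is the simplified ratio Vp/w (in generations) multiplied by the generation time, using the rates and generation times given in the text. The full procedure of Zhivotovsky et al. (2004), as modified by Sengupta et al. (2006), includes further details (for example, standard errors) that are not reproduced here. The haplotypes are hypothetical.

```python
from collections import Counter
from statistics import mean, variance

# (mutation rate, years per generation), as listed in the text.
EVOLUTIONARY = (6.9e-4, 25)
GENEALOGICAL = (2.5e-3, 35)


def subtract_dys389i(hap):
    """Return a copy with DYS389I subtracted from DYS389II, as described above."""
    out = dict(hap)
    out["DYS389II"] -= out["DYS389I"]
    return out


def nei_gene_diversity(alleles):
    """Unbiased gene diversity at one locus: n/(n-1) * (1 - sum p_i^2)."""
    n = len(alleles)
    freqs = [c / n for c in Counter(alleles).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def average_gd(haps, loci):
    """GD averaged over loci."""
    return mean(nei_gene_diversity([h[locus] for h in haps]) for locus in loci)


def mean_variance(haps, loci):
    """Vp: repeat-count sample variance at each locus, averaged over loci."""
    return mean(variance([h[locus] for h in haps]) for locus in loci)


def age_in_years(vp, rate, years_per_generation):
    """Simplified variance-based age (assumption): generations = Vp / rate."""
    return vp / rate * years_per_generation


# Hypothetical four-locus haplotypes for one population within one haplogroup.
raw = [
    {"DYS19": 14, "DYS389I": 13, "DYS389II": 30, "DYS390": 23},
    {"DYS19": 14, "DYS389I": 13, "DYS389II": 29, "DYS390": 23},
    {"DYS19": 15, "DYS389I": 12, "DYS389II": 29, "DYS390": 24},
    {"DYS19": 14, "DYS389I": 13, "DYS389II": 30, "DYS390": 22},
]
loci = ["DYS19", "DYS389I", "DYS389II", "DYS390"]
haps = [subtract_dys389i(h) for h in raw]

gd = average_gd(haps, loci)
vp = mean_variance(haps, loci)
print(f"GD = {gd:.4f}, Vp = {vp:.4f}")
for label, (rate, gen) in [("evolutionary", EVOLUTIONARY), ("genealogical", GENEALOGICAL)]:
    print(label, round(age_in_years(vp, rate, gen)), "years")
```

With the same Vp, the evolutionary rate yields an older date than the genealogical rate. This illustrates why the text treats the resulting dates as relative estimates rather than absolute ones.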

Results

Distribution of Haplogroups J and E

We examined a total of 1,233 individuals using high-resolution SNP analysis involving four and eight biallelic markers within haplogroups E1b1b1b1a-M81 and J1a2b-P58, respectively. Figure 1 illustrates the phylogenetic relationships for both haplogroups and their frequencies in the genotyped populations from the Arabian Peninsula, North Africa and Spain. A total of 271 Y chromosomes belonged to J1a2b-P58 and 136 to E1b1b1b1a-M81, representing five J1a2b sub-haplogroups (J1a2b*, J1a2b1, J1a2b2*, J1a2b2a*, J1a2b2a1) and two E1b1b1b1a-M81 sub-haplogroups (E1b1b1b1a* and E1b1b1b1a1*).

J1a2b2a*-L222.2 is the only J1a2b lineage observed in the Northwest African (NA) populations of Tunisia and Morocco. In this region, this mutation displays frequencies of 25% in Sfax, 15% in Béja and 17% in Morocco. However, no individuals with this sub-haplogroup were found in Spain, where only paragroup J1a2b2*-L147.1 is detected. The highest frequency of L222.2 is seen in Qatar (39%), followed by Oman (12%), Egypt and Bahrain (5%), and UAE and Yemen (2%). Conversely, the L147.1 mutation ranges from 59% in Yemen to 2% in Bahrain. Notably, J1a2b2a1-L65.2 is observed at low frequencies in Qatar (7%), Bahrain and Yemen (2%) but reaches 21% in UAE. Haplogroup J1a2b1-L92.1 was detected in only two populations, at low frequencies: UAE (4%) and Oman (3%). Undefined paragroup J1a2b*-P58 is found at low frequencies in Yemen (6%), Bahrain (5%), Oman (2%) and Qatar (1%).

The E1b1b1b1a-M81 haplogroup is absent in the Near East (NE: Qatar, Oman, Bahrain, UAE and Yemen) and Northeast Africa (Egypt) but is observed in the Northwest African and Spanish populations. Haplogroup E1b1b1b1a1*-M183 represents 100% of the E-M81 lineage in Béja, Morocco and Sfax (population frequencies of 55%, 53% and 27%, respectively), whereas in the Spanish populations five samples were genotyped as undefined E1b1b1b1a*-M81 and 19 as E1b1b1b1a1*-M183, most of them in Galicia. Frequencies and diversities of the most informative haplogroups (J1-M267, J1a2b-P58, J1a2b2a-L222.2 and E1b1b1b1a-M81), together with previously published data from the literature, were used to generate contour maps (Figures 2a-d and Supplementary Figures 2a-d). Haplogroup J1-M267 exhibits high frequencies in Yemen (Figure 2a), but the highest diversities are found in the Levant (Figure 2b). Supplementary Figure 2a illustrates the levels of the J1a2b-P58 haplogroup, which shows focal points of high frequency in the southern Arabian Peninsula and in Tunisia. Haplogroup J1a2b2a-L222.2, on the other hand, is found at high frequency (Supplementary Figure 2c) and diversity (Supplementary Figure 2d) in Tunisia, whereas in Egypt its diversity is high relative to its low frequency (Supplementary Figures 2d and 2c, respectively). Although certain Tunisian populations exhibit low frequencies (Figure 2c) and diversity (Figure 2d) of haplogroup E1b1b1b1a-M81, others display high values. These differences in frequency and diversity within such a small geographical area indicate genetic heterogeneity among these Tunisian populations, possibly reflecting various levels of Berber ancestry. In Lebanon and southern Spain, the diversity of this haplogroup is high, although its frequencies are comparatively low.
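The contour maps referred to above were produced with Surfer's kriging, whose settings are not reproduced here. The sketch below shows the core of ordinary kriging: it solves for weights that sum to 1 and minimize the estimation variance under an assumed variogram model. The exponential variogram and its sill, range and nugget values are arbitrary assumptions. Coordinates are treated as planar x, y values; latitude/longitude data would need projecting first. The sites and frequencies are hypothetical.

```python
import math


def exponential_variogram(h, sill=1.0, rng=10.0, nugget=0.0):
    """Assumed model: gamma(h) = nugget + sill * (1 - exp(-3h / range)), gamma(0) = 0."""
    if h == 0:
        return 0.0
    return nugget + sill * (1.0 - math.exp(-3.0 * h / rng))


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(points, values, target, variogram=exponential_variogram):
    """Estimate the value at `target` from sampled (x, y) points."""
    n = len(points)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Variogram-form kriging system; the extra row/column is the Lagrange
    # multiplier that forces the weights to sum to 1 (unbiasedness).
    a = [[variogram(dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(dist(p, target)) for p in points] + [1.0]
    weights = solve_linear(a, b)[:n]
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical sampling sites (planar x, y) and haplogroup frequencies.
sites = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 2.0)]
freqs = [0.25, 0.15, 0.17, 0.05, 0.20]

# Estimate a coarse grid, as a contouring program would before drawing isolines.
for y in (0.0, 5.0, 10.0):
    row = [ordinary_kriging(sites, freqs, (x, y)) for x in (0.0, 5.0, 10.0)]
    print(" ".join(f"{v:.3f}" for v in row))
```

With a zero nugget, ordinary kriging reproduces the observed value at each sampled site and interpolates smoothly between sites. The choice of variogram model controls how quickly the influence of a site decays with distance.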

Haplogroup J1a2b-P58 and derivatives

Supplementary Table 1 provides the 17-locus Y-STR haplotypes for the 271 J1a2b-P58-derived individuals. At 15-locus Y-STR resolution, the highest microsatellite diversity within haplogroup J1a2b-P58 is observed in Egypt (0.3539 ± 0.0599) and Oman (0.3408 ± 0.0655), followed by Yemen (0.3164 ± 0.0698) and Spain (0.2873 ± 0.0705) (Table 1) (Figure 2d). Notably, although the frequency of P58 is low in southern Spain (Supplementary Figure 2a), its diversity is relatively high (Supplementary Figure 2b). Table 1 presents the age of Y-STR variation associated with each lineage for every population examined. The results indicate that, for haplogroup J1a2b-P58, the ages of the Y-STR variation in the four above-mentioned populations (Egypt, Oman, Yemen and Spain) are comparable (4.73 ± 1.02 kya, 5.31 ± 1.18 kya, 3.82 ± 0.90 kya and 4.75 ± 2.01 kya, respectively) (Table 1), even though these populations are geographically distant from each other.

The phylogenetic relationships of individuals within haplogroup J1-M267, based on 15-locus Y-STR haplotypes, are illustrated in the Median Joining network displayed in Supplementary Figure 3a. Network analysis examines the genetic relationships among individuals and shows how they partition relative to other samples from the same or different populations. Samples are represented as circles, color-coded according to the population they belong to (each population takes a different color or shade), and circle size reflects the number of persons exhibiting a given haplotype; the smallest circles indicate singletons. Because the samples are color-coded by population, their distribution within the network also provides information on the phylogenetic relationships among groups. The program draws lines connecting the individuals into a network. These lines connect the most closely related individuals, and the length of each line is directly proportional to the number of mutational steps separating them. Networks represent lineages of sequentially related individuals, with the samples at the terminals of each offshoot being the most genetically differentiated. Networks in which individuals from specific populations partition into particular branches suggest limited gene flow among those populations or groups, whereas a lack of compartmentalization indicates gene flow among the populations in the projection.
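The network logic described above (identical haplotypes collapsed into one circle sized by count, and links whose length reflects mutational steps) can be sketched as follows. This is a simplification. The MJ algorithm in NETWORK also infers median vectors and combines minimum spanning trees, whereas this sketch builds a single minimum spanning network with Prim's algorithm. Step distance assumes single-step repeat changes. The optional inverse-variance locus weighting only mirrors the idea in the Methods and is not NETWORK's exact weighting scheme. The haplotypes are hypothetical.

```python
from collections import Counter
from statistics import pvariance


def inverse_variance_weights(haps):
    """Per-locus weight 1/variance; monomorphic loci (variance 0) get weight 1."""
    weights = []
    for locus_values in zip(*haps):
        v = pvariance(locus_values)
        weights.append(1.0 / v if v > 0 else 1.0)
    return weights


def step_distance(h1, h2, weights):
    """Weighted count of repeat-unit differences (single-step mutation assumption)."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, h1, h2))


def minimum_spanning_network(samples, weighted=False):
    """Collapse identical haplotypes into nodes, then link nodes with Prim's algorithm.

    Returns (nodes, edges): nodes is a list of (haplotype, count), count being the
    circle size in a network drawing; edges is a list of (i, j, distance).
    """
    nodes = list(Counter(samples).items())
    haps = [h for h, _ in nodes]
    weights = inverse_variance_weights(samples) if weighted else [1.0] * len(haps[0])
    in_tree, edges = {0}, []
    while len(in_tree) < len(haps):
        best = None
        for i in sorted(in_tree):
            for j in range(len(haps)):
                if j in in_tree:
                    continue
                d = step_distance(haps[i], haps[j], weights)
                if best is None or d < best[2]:
                    best = (i, j, d)
        edges.append(best)
        in_tree.add(best[1])
    return nodes, edges


# Hypothetical three-locus haplotypes; the first one is carried by two men.
samples = [(14, 13, 23), (14, 13, 23), (14, 13, 24), (15, 13, 24), (14, 12, 23)]
nodes, edges = minimum_spanning_network(samples)
for i, j, d in edges:
    print(nodes[i][0], "--", nodes[j][0], f"({d:g} steps)")
```

Population membership could be attached to each sample and carried through to the nodes to color them. This is what lets a drawn network show whether populations segregate into separate branches.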

The 15-locus modal J1-M267 haplotype (DYS19*14, DYS389I*13, DYS389II*30, DYS390*23, DYS391*11, DYS392*11, DYS393*12, DYS437*14, DYS438*10, DYS439*11, DYS448*20, DYS456*14, DYS458*21, DYS635*21, GATA H4*11) is shared by 54 individuals from Algeria, Egypt, Morocco, Qatar, UAE, Yemen and Tunisia (Sfax, Béja, Sousse, Cosmopolitans from Tunis and Andalusians from Zaghouan).

Although these populations span an extensive geographical area, including the Arabian Peninsula, Northeast Africa and Northwest Africa, individuals with this modal haplotype were not observed in Spain. Notably, the Berbers from Sened do not possess this modal haplotype, possibly a reflection of their distinct non-Arabian ancestry. The Median Joining networks of J1a2b2*-L147.1 and J1a2b2a*-L222.2 males are presented in Supplementary Figures 3b and 3c, respectively.

In order to explore the genetic similarities among populations, an MDS analysis based on Rst distances from individuals belonging to the J1a2b-P58 lineage (15 Y-STR loci resolution) was performed (Supplementary Figure 4a). The MDS plot (stress value = 0.10740, R² value = 0.94308) reveals a loosely associated arrangement of populations occupying most of the graph. The Galician collection sits by itself in the lower left portion of the graph. In this projection, the second dimension separates not only the two Tunisian populations (Béja and Sfax) but also the two Spanish collections (Andalusia and Galicia), yet each pair groups together in the first dimension. Similarly, MDS plots based on J1a2b2*-L147.1 (stress value = 0.00974, R² value = 0.99950) (Supplementary Figure 4b) and J1a2b2a*-L222.2 (stress value = 0.06546, R² value = 0.98942) (Supplementary Figure 4c) individuals show scattered distributions of populations from Arabia, North Africa and Iberia with no obvious geographic partitioning.

Coalescence time estimates based on Y-STR diversity, as well as gene diversity values and haplotype variance associated with haplogroup J1a2b-P58 and its derivatives, are reported in Table 1. The oldest 15-loci Y-STR coalescence dates for the P58 mutation are observed in Oman (5.31 ± 1.18 kya), Egypt (4.73 ± 1.02 kya) and Spain (4.75 ± 2.01 kya). Interestingly, ages for the J1a2b-P58 and J1a2b2*-L147.1 haplogroups are approximately equivalent in the Spanish, Qatar, Oman, Yemen, Egypt and UAE collections (Table 1). The coalescence time estimate for sub-haplogroup J1a2b2a*-L222.2 is older in Egypt (2.10 ± 0.72 kya) than in Qatar (1.51 ± 0.68 kya), Oman (1.20 ± 0.56 kya), Morocco (1.81 ± 0.49 kya) and Tunisia (1.76 ± 0.42 kya). Frequency and variance contour maps of the most informative haplogroups, J1-M267 (Figures 2a and 2b, respectively), J1a2b-P58 (Supplementary Figures 2a and 2b, respectively) and J1a2b2a-L222.2 (Supplementary Figures 2c and 2d, respectively), are provided.
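Rst, an analogue of Fst for microsatellites, measures differentiation from differences in repeat number rather than from allele identity. As a minimal sketch, assuming a simplified unweighted estimator, Rst between two populations can be taken as (S_T - S_W) / S_T, where S_T is the mean squared repeat difference (summed over loci) among all pairs of haplotypes in the pooled sample and S_W is the mean of the same quantity within each population. Programs such as Arlequin compute Rst-type distances within an AMOVA framework with sample-size corrections, so this sketch will not reproduce published values; the function names and toy haplotypes are hypothetical. A matrix of such pairwise values is the kind of input that an MDS analysis then projects into two dimensions.

```python
from itertools import combinations


def squared_repeat_distance(h1, h2):
    """Sum over loci of squared repeat-count differences."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))


def mean_pairwise(haplotypes):
    """Mean squared repeat distance over all unordered pairs (needs >= 2 haplotypes)."""
    pairs = list(combinations(haplotypes, 2))
    return sum(squared_repeat_distance(a, b) for a, b in pairs) / len(pairs)


def pairwise_rst(pop_a, pop_b):
    """Simplified, unweighted Rst between two samples of haplotypes."""
    s_within = (mean_pairwise(pop_a) + mean_pairwise(pop_b)) / 2
    s_total = mean_pairwise(pop_a + pop_b)
    return (s_total - s_within) / s_total


# Hypothetical two-locus haplotypes for two populations
pop_1 = [(10, 14), (11, 14)]
pop_2 = [(20, 14), (21, 14)]
print(round(pairwise_rst(pop_1, pop_2), 3))  # prints 0.985
```

With this unweighted form, two undifferentiated samples can yield a slightly negative estimate, because the pooled comparison includes identical pairs drawn across samples.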

Haplogroup E1b1b1b1a-M81 and derivatives

The haplotypes of 136 individuals belonging to the E1b1b1b1a-M81 haplogroup are available in Supplementary Table 1. Overall, 117 distinct haplotypes were identified at the 17 Y-STR loci resolution, including 36 in Béja, 11 in Sfax, 46 in Morocco and 24 in Spain. No haplotypes are shared between the North African and Spanish populations. However, some haplotypes are shared within Northwest Africa, particularly between Béja and Sfax (six haplotypes) and between Béja and Morocco (three haplotypes). This haplotype sharing (or lack thereof) is reflected in the genetic diversity estimates, the highest of which is observed in Spain (0.3428 ± 0.0493), particularly in the Andalusian population (0.4667 ± 0.0451), followed by Tunisia (0.2884 ± 0.0559) and Morocco (0.2227 ± 0.0457) (Table 1). All M81 Y chromosomes in Tunisia and Morocco were also derived for the M183 marker. Conversely, five M81 chromosomes in Spain were not derived for M183 and remained undefined (Fig. 1). As noted above for haplogroups J1-M267 (Figures 2a and 2b, respectively) and J1a2b-P58 (Supplementary Figures 2a and 2b, respectively), although the abundance of E1b1b1b1a-M81 individuals in Iberia is minimal (Figure 2c), their diversity is disproportionately high (Figure 2d). In fact, the diversity is higher in Iberia than in Northwest Africa. In contrast, both frequency and diversity are comparably high in Tunisia. It is likely that the high diversity values for these three haplogroups in Iberia result from multiple migrations into the Peninsula at various times and/or from different places.
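The diversity values and haplotype-sharing counts above can be illustrated with a short sketch. It assumes Nei's (1987) unbiased gene diversity, H = n/(n - 1) (1 - Σ p_i²), where p_i is the frequency of the i-th allele or haplotype among n chromosomes. This section does not restate whether Table 1 applies the formula to whole haplotypes or averages it over loci, so the sketch offers both. The function names and toy data are hypothetical.

```python
from collections import Counter
from statistics import mean


def gene_diversity(items):
    """Nei's unbiased gene diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(items)
    if n < 2:
        raise ValueError("at least two chromosomes are required")
    counts = Counter(items)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_p2)


def mean_locus_diversity(haplotypes):
    """Average of per-locus gene diversity across all loci."""
    return mean(gene_diversity(alleles) for alleles in zip(*haplotypes))


def shared_haplotypes(pop_a, pop_b):
    """Distinct haplotypes observed in both populations."""
    return set(pop_a) & set(pop_b)


# Hypothetical two-locus haplotypes for two populations
pop_x = [(13, 14), (13, 14), (13, 15), (14, 14)]
pop_y = [(13, 14), (13, 16)]
print(round(gene_diversity(pop_x), 4))        # haplotype-level diversity
print(round(mean_locus_diversity(pop_x), 4))  # locus-averaged diversity
print(shared_haplotypes(pop_x, pop_y))        # haplotypes in common
```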

The phylogenetic relationships based on Y-STR haplotypes of all individuals contained in haplogroup E1b1b1b1a-M81 are illustrated in the Median Joining network displayed in Supplementary Figure 3d. The 10-loci modal E-M81 haplotype was assessed as DYS19*13, DYS389I*14, DYS389II*30, DYS390*24, DYS391*9, DYS392*11, DYS393*13, DYS437*14, DYS438*10, DYS439*10 and is shared by 165 individuals from the Tunisian populations of Béja, Sfax, Sousse, Zaghouan and the cosmopolitan population of Tunis, the Tunisian Berbers from Chenini–Douiret, Jradou and Sened, Algeria, the Tuareg from Libya, Morocco, Portugal, and the Spanish populations of Andalusia, Aragon, Baleares, Castile, Catalonia, Galicia and Valencia. Given that all M81 Y chromosomes in Tunisia and Morocco and almost all in Spain (with the exception of five M81 chromosomes) were also derived for the M183 mutation (haplogroup E1b1b1b1a1), an MJ network based on the M183 individuals was generated at the 15-loci resolution and is presented in Supplementary Figure 3e. The Moroccan population shares 15-loci haplotypes with all the other populations: two haplotypes in common with Béja, two with Galicia, one with Sfax and one with Andalusia. This extensive sharing of 15-loci haplotypes (Supplementary Figure 3e) signals a genetic continuity involving Northwest Africa and northern and southern Spain. As with J1a2b-P58 and its derivatives, the MDS plots (15 Y-STR loci resolution) based on E1b1b1b1a-M81 (Supplementary Figure 4d) and E1b1b1b1a1-M183 (Supplementary Figure 4e) individuals show no partitioning among the populations analyzed. In both graphs, the populations are distributed randomly with respect to geographic origin. Age estimates of microsatellite variation for haplogroup E1b1b1b1a-M81, obtained with the method described in Zhivotovsky et al. (2004) as modified by Sengupta et al. (2006) and using evolutionary and genealogical mutation rates per population/area, are provided in Table 1. Overall, the oldest expansion time is found in Spain (5.01 ± 1.24 kya), more specifically in Andalusia (7.07 ± 1.40 kya). These values may be inflated by multiple distinct migrations into the Iberian Peninsula at different times. The second oldest date is seen in Tunisia (Béja, 4.22 ± 1.45 kya). Clines of M81 frequency and diversity are illustrated in contour maps (Figures 2c and 2d, respectively).
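The variance-based dating referred to above can be sketched as follows. This is a simplified illustration in the spirit of Zhivotovsky et al. (2004), not a reproduction of the modified estimator of Sengupta et al. (2006). It takes the within-sample variance of repeat counts at each locus, averages over loci, divides by a per-generation mutation rate and multiplies by a generation length. The rate and the 25-year generation are inputs that the caller must choose; the 6.9 × 10^-4 value below is the effective rate often quoted from Zhivotovsky et al. (2004) and is used here only as an illustrative assumption. The function names and toy haplotypes are hypothetical, and standard errors, which the published estimates include, are not computed.

```python
from statistics import mean, pvariance


def mean_locus_variance(haplotypes):
    """Average over loci of the population variance in repeat counts."""
    return mean(pvariance(alleles) for alleles in zip(*haplotypes))


def str_age_years(haplotypes, rate_per_generation, generation_years=25):
    """Variance-based age: (mean variance / mutation rate) generations, in years."""
    generations = mean_locus_variance(haplotypes) / rate_per_generation
    return generations * generation_years


# Hypothetical two-locus haplotypes; the rate is an illustrative assumption
sample = [(13, 14), (13, 15), (14, 14), (13, 14)]
age = str_age_years(sample, rate_per_generation=6.9e-4)
print(f"{age / 1000:.2f} kya")  # prints 6.79 kya
```

Switching between an evolutionary and a genealogical rate changes only the `rate_per_generation` argument; because the age scales inversely with the rate, a faster rate yields a proportionally younger estimate.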

Discussion

E-M81 and E-M183

Phylogeographic analysis of the E-M81 haplogroup and its derivatives in Mediterranean populations has shown these lineages to be particularly frequent in Berber-speaking groups from Northwest Africa (100% in the Berber group from Chenini–Douiret and Jradou, Tunisia; 80% in Mozabite Berbers from Algeria; 76% in Saharawis from Western Sahara; 65–73% in Berbers from Morocco) (Arredi et al., 2004; Cruciani et al., 2004; Semino et al., 2004; Robino et al., 2008; Fadhlaoui-Zid et al., 2011), while their frequencies decline sharply towards the east (5% in North Egypt) (Cruciani et al., 2004) as Berber groups dwindle (Sengupta et al., 2006). In Europe, E-M81 is mostly found in the Iberian Peninsula (Adams et al., 2008), with frequencies reaching 5% in Portugal (Beleza et al., 2006; Capelli et al., 2009), 9% in Galicia, Spain (Flores et al., 2004; Adams et al., 2008), 10% in western Andalusia and Northwest Castile (Adams et al., 2008), and 13% in Cantabria (north central Spain) (Capelli et al., 2009). E-M81 is also observed in Italy and France (Cruciani et al., 2004; Capelli et al., 2009). Within the E-M81 cluster, the associated Maghrebin haplotype DYS19*13, DYS389I*14, DYS389II*30, DYS390*24, DYS391*9, DYS392*11, DYS393*13 occurs at a frequency of 42.6% (present in 58 out of 136 M81 Y chromosomes from Northwest Africa and Iberia genotyped in the present study).

This haplotype is present in Tunisia (21 chromosomes), Morocco (29 chromosomes) and Spain (1 Andalusian and 7 Galician samples), corroborating our haplogroup frequency data. The presence of this modal haplotype among the people of these territories indicates genetic affinities and continuity based on microsatellite markers as well.

The presence of the M81 haplogroup in Iberia could be explained by the recent influx of Berber troops and migrants recruited by Arabians during the Islamic expansion into the Peninsula that started in 711 AD (Bosch et al., 2001; Adams et al., 2008; Capelli et al., 2009) and possibly by older pre-Muslim migrations from Northwest Africa (Arnaiz-Villena et al., 1999; Maca-Meyer et al., 2003; Goncalves et al., 2005; Alvarez et al., 2009; Cerezo et al., 2012). If the Islamic occupation of Iberia was indeed a major contributing factor to the presence of moderate levels of M81 in the Peninsula, how can we explain the higher levels of this mutation and its M183 derivative in Galicia (northwest Spain) and Cantabria (north central Spain) compared to Andalusia (south Spain)? One possible explanation for the unexpected frequency distribution of these Berber markers is the history of enforced northward and westward relocations of Moriscos (Muslims who converted to Christianity) following the War of the Alpujarras (1567-1571) (Harvey 2005). Other demographic events that could explain the higher Berber signals in northern Spain include earlier northward displacements of Muslims, which may have resulted in their gradual assimilation into mainstream Spanish society during the 300 or so years of the Reconquista, as Christian forces pushed south toward the Kingdom of Granada. It is also possible that subsequent migrations of Europeans into southern Spain diluted the signal there, while this was not the case in the north. If any of these explanations is correct, and these frequencies represent historically recent gene flow events, we would expect microsatellite variation within M81 in Iberia to be lower than in North Africa. In our North African populations, all E-M81 chromosomes also carry the more recent downstream M183 mutation. Since the age of STR variation within M183 in the Spanish populations (3.26 ± 1.20 kya) is comparable to that of the Northwest African populations of Tunisia (3.98 ± 1.18 kya) and Morocco (2.65 ± 0.81 kya), it is possible that Iberia received migrants from different Northwest African source populations at different times, possibly prior to the expansion of Islam. For example, pre-Islamic gene flow from Northwest Africa into Iberia could have been driven, in historical times, by the well-documented Phoenician and Roman commerce involving the two regions.
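The modal-haplotype figures quoted above (58 of 136 M81 chromosomes, 42.6%) can be checked, and recomputed from raw genotypes, with a sketch like the following. The helper names and the toy sample are hypothetical; matching on the seven Maghrebin loci only means restricting each haplotype to those loci before counting.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Most frequent haplotype in a sample (ties resolved by first occurrence)."""
    return Counter(haplotypes).most_common(1)[0][0]


def carriers_by_population(samples, target):
    """Count chromosomes per population whose haplotype equals `target`.

    `samples` is a list of (population, haplotype) pairs.
    """
    return Counter(pop for pop, hap in samples if hap == target)


# Check the arithmetic using the counts reported in the text
reported = {"Tunisia": 21, "Morocco": 29, "Andalusia": 1, "Galicia": 7}
carriers = sum(reported.values())
print(f"{carriers}/136 = {carriers / 136:.1%}")  # prints 58/136 = 42.6%

# Hypothetical two-locus toy data for the helper functions
toy = [("A", (13, 14)), ("A", (13, 14)), ("B", (13, 14)), ("B", (14, 14))]
modal = modal_haplotype([hap for _, hap in toy])
print(modal, dict(carriers_by_population(toy, modal)))
```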

J1a2b-P58 and derivatives

The Fertile Crescent region has been considered the most probable place of origin (Cinnioğlu et al., 2004) of haplogroup J and its two lineages, J1-M267 and J2-M172. J2-M172 is the more abundant and more widely distributed of the two over Europe, especially along the Mediterranean basin (Semino et al., 2004), where it parallels the demic diffusion of Neolithic farmers (Underhill et al., 2001; Semino et al., 2004) or, more recently, the Phoenician and other historical, mainly maritime, expansions (Hammer et al., 2000; Di Giacomo et al., 2003; Zalloua et al., 2008). Previous studies on J1-M267 (Di Giacomo et al., 2003; Arredi et al., 2004; Luis et al., 2004; Semino et al., 2004; Cadenas et al., 2008; Zalloua et al., 2008) have found it at high frequencies among Arabic-speaking populations of the Middle East, conventionally interpreted as reflecting the spread of Islam during the first millennium CE (Nebel et al., 2002). In addition, it has been proposed that the expansion of hunter-gatherers at the end of the late Pleistocene and the movements of forager-herders after the mid-Holocene across Arabia and the Sahara were climate-driven events that contributed to the spread of this mutation (Tofanelli et al., 2009). The locale of highest J1* frequency, along with the high Y-STR variance of J1e (currently J1a2b-P58), has suggested to some that J1 originated in eastern Anatolia (Chiaroni et al., 2010). J1e lineages might have been involved in episodes of pastoralist diffusion into arid habitats coinciding with the spread of Semitic-speaking populations (Zalloua et al., 2008).

It is worth mentioning that the J1a2b-P58 ages in Oman and Egypt (5.31 ± 1.18 kya and 4.73 ± 1.02 kya, respectively) are about twice as old as those obtained for Yemen, Qatar, UAE and Bahrain (Table 1). These estimates agree with the estimated age of the Semitic languages, 5.75 kya (Kitchen et al., 2009). The similar variances of this clade in Egypt (0.354) and Oman (f) do not allow us to discriminate between Northeast Africa and the Near East as the origin of this lineage. However, the lineage reaches significantly higher frequencies in Yemen (69%) and Qatar (57%), which may reflect its genesis there or the effects of isolation and genetic drift. The network analyses based on individuals under subhaplogroups J1-M267, J1a2b2*-L147.1 and J1a2b2a*-L222.2 exhibit no substructure, with samples from geographically distant regions distributed throughout the branches. Time estimates for the L147.1 mutation range from 5.06 ± 2.88 to 4.75 ± 2.01 kya in Iberia and from 5.82 ± 1.83 to 3.11 ± 0.85 kya in Arabia, while the L222.2 ages range from 2.03 ± 0.52 to 1.55 ± 0.50 kya in Northwest Africa and from 2.10 ± 0.72 to 2.10 ± 0.56 kya in Arabia. Within each subhaplogroup, the age estimates are comparable across populations spanning wide geographical areas. The same random distribution within the networks is observed for the Berber E1b1b1b1a*-M81 and E1b1b1b1a1*-M183 subhaplogroups. Similarly, the MDS plots for J1a2b-P58 and E1b1b1b1a*-M81 and their derivatives at the 10- and 15-loci resolutions do not partition the populations according to their geographical location. This random distribution in the network and MDS analyses, together with the comparable age estimates of distantly located populations, may indicate dynamic movements of people, possibly driven by the cultural/religious amalgamation of Islam across the entire geographical expanse of its dominion from Arabia to Iberia.
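The stress and R² values quoted for the MDS plots summarize how faithfully a two-dimensional configuration reproduces the input distances. As a sketch, assuming Kruskal's stress-1 in its metric form (disparities equal to the input dissimilarities) and R² as the squared Pearson correlation between input dissimilarities and fitted distances, both can be computed as below. The program used in the study may define stress differently (for example on squared distances or with monotone-transformed disparities), so the numbers need not match; the function names and the three-point example are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def stress1_and_r2(dissim, coords):
    """Metric Kruskal stress-1 and R^2 for a configuration.

    dissim: dict mapping (i, j) with i < j to the input dissimilarity.
    coords: list of point coordinates, one per population.
    """
    keys = list(combinations(range(len(coords)), 2))
    fitted = [math.dist(coords[i], coords[j]) for i, j in keys]
    observed = [dissim[k] for k in keys]
    stress = math.sqrt(
        sum((d - o) ** 2 for d, o in zip(fitted, observed))
        / sum(d ** 2 for d in fitted)
    )
    return stress, pearson(observed, fitted) ** 2


# Hypothetical three-population example whose distances embed exactly in 2-D
coords = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
dissim = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}
print(stress1_and_r2(dissim, coords))
```

A perfect embedding gives a stress of zero and an R² of one; larger stress and smaller R² indicate that the plotted arrangement distorts the underlying distances.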

J1a2b2a1-L65.2 is a recent mutation (1.12 ± 0.38 kya in UAE) under L222.2, with an interesting geographical distribution. This mutation likely occurred after the Islamic dispersal into northern Africa had started. In UAE, the frequency of L65.2 individuals is 20.86% (30 out of 163 samples). The abundance of L65.2 drops dramatically to 7.32%, 2.44% and 1.62% in the nearby territories of Qatar, Bahrain and Yemen, respectively. Apparently, this mutation originated within the small area currently occupied by UAE. It is remarkable that this mutation reached such a high frequency in approximately 1,100 years, which amounts to about 50 generations. It is possible that the Islamic family structure, often comprising multiple wives, facilitated the rapid spread of L65.2 chromosomes.

Pre-Islamic migrations from Northwest Africa

In addition to the comparable microsatellite variation within M81 in Iberia and Northwest Africa mentioned above, other lines of evidence suggesting a pre-Islamic penetration of Arabian elements into Iberia include the recent age and unique distribution of the L222.2 mutation. J1a2b2a-L222.2 is limited to the Arabian Peninsula and North Africa; in Northwest Africa, its highest frequencies are found in the Sfax population of Tunisia (25.0%), Béja, Tunisia (15.3%) and Morocco (16.7%). This mutation was not detected in the Iberian Peninsula in the present study. The genealogical ages for this mutation range from 2.03 ± 0.52 kya in the population of Béja, Tunisia, to 1.20 ± 0.56 kya in Oman, Arabia. Although these dates are approximations, they indicate a recent mutational event, possibly just prior to the Muslim conquest of northern Africa and the Islamic incursion into Iberia. The higher frequency of L222.2 in Qatar (39.0%) may reflect its genesis in the Arabian region and a subsequent westward clinal dispersal. Therefore, just as the comparable STR diversity under the M81 mutation in Iberia and North Africa may indicate multiple, possibly pre-Islamic, migrations of North Africans into Iberia, the absence of L222.2 in Spain is congruent with gene flow across the Strait of Gibraltar prior to the L222.2 mutation, or with genetic drift that eliminated the mutation.

Furthermore, the presence of the J1a2b2*-L147.1 subhaplogroup in Iberia (time estimates of 4.75 ± 2.01 kya and 5.06 ± 2.88 kya for the entire territory of Spain and for Galicia, respectively) and in Arabia (ranging from 3.11 ± 0.85 to 5.82 ± 1.83 kya), at comparable and relatively old ages, may suggest that the dispersals that carried this marker into Iberia predated the Islamic invasion of 711 A.D. The higher gene diversity value for J1a2b2*-L147.1 in Egypt (0.3539 ± 0.0599) may indicate a Northeast African origin for the L147.1 mutation. In Iberia, the M81 mutation has an age of 7.0 ± 1.40 kya in Andalusia, 3.26 ± 1.46 kya in Galicia and 5.01 ± 1.24 kya in Spain in general. The time estimates for M183 are younger, ranging from 2.61 ± 0.88 kya to 3.34 ± 1.47 kya for the same three regions. While paragroup M81* is found only in Iberia, M183 is detected in both Northwest Africa and Iberia. Both markers predate the Muslim movement into Europe. A plausible explanation for these dichotomies in the ages and geographic distributions of M81 and M183 is the occurrence of at least two different Berber migrations into Iberia from Northwest Africa. Along these lines, it is interesting to note that, in Iberia, the diversity levels for the M81, J1 and J1a2b mutations are comparatively higher than their frequency values when compared to North Africa and Arabia (Figures 2a-d and Figures 2g-h). In fact, the diversity levels for these three mutations in Iberia are higher than in certain areas of Northwestern Africa (i.e., Morocco, Algiers and Tunisia). These haplogroups did not originate in Iberia, and genetic drift events are not likely to be responsible for the low frequencies relative to the diversity values seen in the Iberian Peninsula. Therefore, the most parsimonious explanation is a number of migrations from Northwest Africa at various times and possibly from different sources. In this type of scenario, genetic diversity could be elevated relative to the frequency of individuals. This explanation is congruent with the relative time estimates and geographic distribution data for the M81, M183, J1, J1a2b and L222.2 mutations discussed above.
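Statements that two STR-based ages are "comparable" can be made more explicit with a simple check. As a minimal sketch, assuming that each ± value is a standard error and that the two estimates are independent and approximately normal (assumptions not stated in the text), the difference can be expressed as z = (t1 - t2) / √(se1² + se2²) with a two-sided normal p-value. The example uses the L147.1 estimates quoted above for Spain (4.75 ± 2.01 kya) and the oldest Arabian value (5.82 ± 1.83 kya); the function name is hypothetical. With uncertainties this wide, a non-significant difference shows only that the data cannot separate the two ages, not that the ages are equal.

```python
import math


def compare_ages(t1, se1, t2, se2):
    """z statistic and two-sided normal p-value for the difference of two ages."""
    z = (t1 - t2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p


# L147.1 ages in kya: Spain versus the oldest Arabian estimate
z, p = compare_ages(4.75, 2.01, 5.82, 1.83)
print(f"z = {z:.2f}, p = {p:.2f}")
```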

It is noteworthy that two undefined subhaplogroups with very similar time estimates, E1b1b1b1a*-M81 and J1a2b2*-L147.1, lineages usually associated with Berber and Arabian populations, respectively, are absent in Northwest Africa but present in Iberia. In the case of J1a2b2*-L147.1, the mutation originated in the Arabian Peninsula approximately 5,600 years ago, where it still reaches 59.02% in Yemen, only to disappear in Northwest Africa and then reappear in Iberia. E1b1b1b1a*-M81 is a 5,750-year-old Berber marker from Northwest Africa. It is possible that the disappearance of these two recent subhaplogroups from Northwest Africa is the result of regional depopulation leading to a bottleneck event and the loss of M81* and L147.1* chromosomes. In addition to our data, a number of previous studies based on different marker systems, including contemporary and ancient DNA, suggest pre-Islamic gene flow from Northwest Africa to Iberia (Arnaiz-Villena et al., 1999; Maca-Meyer et al., 2003; Goncalves et al., 2005; Alvarez et al., 2009; Currat et al., 2010; Lacan et al., 2011). Archeological evidence also points to early (Mesolithic to Neolithic) maritime colonization of Iberia by farmers (Zilhão, 2001). In more recent historical times, the Mediterranean was a highly contested region and a venue for commerce for a number of ancient empires. Specifically, the Western Mediterranean was colonized by Phoenicians, Carthaginians, Greeks and Romans at time periods that predate the Muslim invasion of Iberia. The Romans, for example, were notorious for incorporating personnel from all over their dominion into their armies. These empires controlled extensive regions of North Africa, the Near East and Iberia. It is likely that the occupation of, and commerce among, all of these regions contributed to gene flow from the Levant and Northwest Africa into Iberia.

Conclusion

The presence of the E1b1b1b1a-M81 and E1b1b1b1a1-M183 mutations in Spain represents signatures of Berber gene flow from Northwestern Africa. Similarly, haplogroup J1a2b-P58 and its derivatives represent genetic signals from the Arabian Peninsula. Our data, based on these haplogroups as well as on 15 Y-STR loci under them, are compatible with multiple migrations from Northwest Africa, including the Islamic occupation of Spain that started in 711 A.D. Interestingly, the frequencies of both Arabic and Berber markers are higher at the extreme northwest of the Iberian Peninsula and not, as might be expected, in Andalusia in southern Spain, the last stronghold of Islam in Western Europe. We propose that the relocation of converts during the Reconquista, the migration of crypto-Muslims to safe locations and/or pre-Islamic movements to the north of Iberia may explain the higher frequencies and older time estimates in the northwest of the Peninsula.

Acknowledgments

The authors express their appreciation to the DNA donors who made this study possible.

References

Achilli A, Olivieri A, Pala M, Metspalu E, Fornarino S, Battaglia V, Accetturo M, Kutuev I, Khusnutdinova E, Pennarun E et al. 2007. Mitochondrial DNA variation of modern Tuscans supports the near eastern origin of Etruscans. Am J Hum Genet 80(4): 759-768.

Adams SM, Bosch E, Balaresque PL, Ballereau SJ, Lee AC, Arroyo E, Lopez-Parra AM, Aler M, Grifo MS, Brion M et al. 2008. The genetic legacy of religious diversity and intolerance: paternal lineages of Christians, Jews, and Muslims in the Iberian Peninsula. Am J Hum Genet 83(6): 725-736.

Almagro-Gorbea M. 2004. Inscripciones y grafitos tartesicos de la necropolis orientalizante de Medellin. Palaeohispanica 4: 13-44.

Alonso S, Flores C, Cabrera V, Alonso A, Martín P, Albarrán C, Izagirre N, de la Rúa C, García O. 2005. The place of the Basques in the European Y-chromosome diversity landscape. Eur J Hum Genet 13: 1293-1302.

Alvarez L, Santos C, Montiel R, Caeiro B, Baali A, Dugoujon JM, Aluja MP. 2009. Y-chromosome variation in South Iberia: insights into the North African contribution. Am J Hum Biol 21: 407-409.

Antunez-de-Mayolo G, Antunez-de-Mayolo A, Antunez-de-Mayolo P, Papiha SS, Hammer M, Yunis JJ, Yunis EJ, Damodaran C, Martinez de Pancorbo M, Caeiro JL et al. 2002. Phylogenetics of worldwide human populations as determined by polymorphic Alu insertions. Electrophoresis 23: 3346-3356.

Arnaiz-Villena A, Martinez-Laso J, Alonso-Garcia J. 1999. Iberia: population genetics, anthropology, and linguistics. Hum Biol 71(5): 725-743.

Arredi B, Poloni ES, Paracchini S, Zerjal T, Fathallah DM, Makrelouf M, Pascali VL, Novelletto A, Tyler-Smith C. 2004. A predominantly neolithic origin for Y-chromosomal DNA variation in North Africa. Am J Hum Genet 75(2): 338-345.

Bahrami B. 1995. The Persistence of Andalusian Identity in Rabbat, Morocco. Department of Anthropology, University of Pennsylvania.

Balaresque P, Bowden GR, Adams SM, Leung HY, King TE, Rosser ZH, Goodwin J, Moisan JP, Richard C, Millward A et al. 2010. A predominantly neolithic origin for European paternal lineages. PLoS Biol 8(1): e1000285.

Ballantyne KN, Goedbloed M, Fang R, Schaap O, Lao O, Wollstein A, Choi Y, van Duijn K, Vermeulen M, Brauer S et al. 2010. Mutability of Y-chromosomal microsatellites: rates, characteristics, molecular bases, and forensic implications. Am J Hum Genet 87(3): 341-353.

Bandelt HJ, Forster P, Rohl A. 1999. Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol 16(1): 37-48.

Barkai R. 1984. Cristianos y musulmanes en la España medieval: el enemigo en el espejo. Rialp, Madrid.

Beleza S, Gusmao L, Lopes A, Alves C, Gomes I, Giouzeli M, Calafell F, Carracedo A, Amorim A. 2006. Micro-phylogeographic and demographic history of Portuguese male lineages. Ann Hum Genet 70(2): 181-194.

Bosch E, Calafell F, Comas D, Oefner PJ, Underhill PA, Bertranpetit J. 2001. High-resolution analysis of human Y-chromosome variation shows a sharp discontinuity and limited gene flow between northwestern Africa and the Iberian Peninsula. Am J Hum Genet 68(4): 1019-1029.

Botigue LR, Henn BM, Gravel S, Maples BK, Gignoux CR, Corona E, Atzmon G, Burns E, Ostrer H, Flores C et al. 2013. Gene flow from North Africa contributes to differential human genetic diversity in southern Europe. Proc Natl Acad Sci USA 110(29): 11791-11796.

Burgarella C, Navascues M. 2011. Mutation rate estimates for 110 Y-chromosome STRs combining population and father-son pair data. Eur J Hum Genet 19(1): 70-75.

Busby GB, Brisighelli F, Sanchez-Diz P, Ramos-Luis E, Martinez-Cadenas C, Thomas MG, Bradley DG, Gusmao L, Winney B, Bodmer W et al. 2012. The peopling of Europe and the cautionary tale of Y chromosome lineage R-M269. Proc Biol Sci 279: 884-892.

Cadenas AM, Zhivotovsky LA, Cavalli-Sforza LL, Underhill PA, Herrera RJ. 2008. Y-chromosome diversity characterizes the Gulf of Oman. Eur J Hum Genet 16(3): 374-386.

Capelli C, Onofri V, Brisighelli F, Boschi I, Scarnicci F, Masullo M, Ferri G, Tofanelli S, Tagliabracci A, Gusmao L et al. 2009. Moors and Saracens in Europe: estimating the medieval North African male legacy in southern Europe. Eur J Hum Genet 17(6): 848-852.

Casas MJ, Hagelberg E, Fregel R, Larruga JM, Gonzalez AM. 2006. Human mitochondrial DNA diversity in an archaeological site in al-Andalus: genetic impact of migrations from North Africa in medieval Spain. Am J Phys Anthropol 131(4): 539-551.

Cavalli-Sforza LL, Menozzi P, Piazza A. 1994. The History and Geography of Human Genes. Princeton University Press, Princeton.

Cerezo M, Achilli A, Olivieri A, Perego UA, Gómez-Carballa A, Brisighelli F, Lancioni H, Woodward SR, López-Soto M, Carracedo A et al. 2012. Reconstructing ancient mitochondrial DNA links between Africa and Europe. Genome Res 22: 821-826.

Chiaroni J, King RJ, Myres NM, Henn BM, Ducourneau A, Mitchell MJ, Boetsch G, Sheikha I, Lin A, Nik-Ahd M et al. 2010. The emergence of Y-chromosome haplogroup J1e among Arabic-speaking populations. Eur J Hum Genet 18: 348-353.

Cinnioğlu C, King R, Kivisild T, Kalfoglu E, Atasoy S, Cavalleri GL, Lillie AS, Roseman CC, Lin AA, Prince K et al. 2004. Excavating Y-chromosome haplotype strata in Anatolia. Hum Genet 114(2): 127-148.

Cruciani F, La Fratta R, Santolamazza P, Sellitto D, Pascone R, Moral P, Watson E, Guida V, Colomb EB, Zaharova B et al. 2004. Phylogeographic analysis of haplogroup E3b (E-M215) Y chromosomes reveals multiple migratory events within and out of Africa. Am J Hum Genet 74: 1014-1022.

Cruciani F, La Fratta R, Trombetta B, Santolamazza P, Sellitto D, Colomb EB, Dugoujon JM, Crivellaro F, Benincasa T, Pascone R et al. 2007. Tracing past human male movements in northern/eastern Africa and western Eurasia: new clues from Y-chromosomal haplogroups E-M78 and J-M12. Mol Biol Evol 24(6): 1300-1311.

Currat M, Poloni ES, Sanchez-Mazas A. 2010. Human genetic differentiation across the Strait of Gibraltar. BMC Evol Biol 10: 237-255.

de Hoz J. 1982. Crónica de linguística y epigrafia de la Península Ibérica. Zephyrus 34(5): 295-311.

Di Giacomo F, Luca F, Anagnou N, Ciavarella G, Corbo RM, Cresta M, Cucci F, Di Stasi L, Agostiano V, Giparaki M et al. 2003. Clinal patterns of human Y chromosomal diversity in continental Italy and Greece are dominated by drift and founder effects. Mol Phylogenet Evol 28(3): 387-395.

Domínguez Ortiz A, Vincent B. 1979. Historia de los moriscos: vida y tragedia de una minoría. Revista de Occidente, Madrid.

Ennafaa H, Fregel R, Khodjet-El-Khil H, Gonzalez AM, Mahmoudi HA, Cabrera VM, Larruga JM, Benammar-Elgaaied A. 2011. Mitochondrial DNA and Y-chromosome microstructure in Tunisia. J Hum Genet 56(10): 734-741.

Fadhlaoui-Zid K, Martinez-Cruz B, Khodjet-el-khil H, Mendizabal I, Benammar-Elgaaied A, Comas D. 2011. Genetic structure of Tunisian ethnic groups revealed by paternal lineages. Am J Phys Anthropol 146(2): 271-280.

Fernández E, Pérez-Pérez A, Gamba C, Prats E, Cuesta P, Anfruns J, Molist M, Arroyo-Pardo E, Turbón D. 2014. Ancient DNA Analysis of 8000 B.C. Near Eastern Farmers Supports an Early Neolithic Pioneer Maritime Colonization of Mainland Europe through Cyprus and the Aegean Islands. PLoS Genet 10(6): e1004401.

Flores C, Maca-Meyer N, Gonzalez AM, Oefner PJ, Shen P, Perez JA, Rojas A, Larruga JM, Underhill PA. 2004. Reduced genetic structure of the Iberian peninsula revealed by Y-chromosome analysis: implications for population demography. Eur J Hum Genet 12(10): 855-863.

Gayden T, Regueiro M, Martinez L, Cadenas AM, Herrera RJ. 2008. Human Y-chromosome haplotyping by allele-specific polymerase chain reaction. Electrophoresis 29(11): 2419-2423.

Gieben JC. 1991. Greeks and Phoenicians in southwest Iberia - Who were the first? Aspects of archaeological and epigraphic evidence. In: Fossey JM (ed), Proceedings of the First International Congress of the Hellenic Diaspora: from Antiquity to Modern Times. Amsterdam: 81-101.

Goedbloed M, Vermeulen M, Fang RN, Lembring M, Wollstein A, Ballantyne K, Lao O, Brauer S, Kruger C, Roewer L et al. 2009. Comprehensive mutation analysis of 17 Y-chromosomal short tandem repeat polymorphisms included in the AmpFlSTR Yfiler PCR amplification kit. Int J Legal Med 123(6): 471-482.

Goncalves R, Freitas A, Branco M, Rosa A, Fernandes AT, Zhivotovsky LA, Underhill PA, Kivisild T, Brehm A. 2005. Y-chromosome lineages from Portugal, Madeira and Acores record elements of Sephardim and Berber ancestry. Ann Hum Genet 69(4): 443-454.

González AM, Brehm A, Pérez JA, Maca-Meyer N, Flores C, Cabrera VM. 2003. Mitochondrial DNA affinities at the Atlantic fringe of Europe. Am J Phys Anthropol 120: 391-404.

Guettat M. 1980. La Musique classique du Maghreb. Sindbad Press, Paris.

Hammer MF, Redd AJ, Wood ET, Bonner MR, Jarjanazi H, Karafet TS, Santachiara-Benerecetti S, Oppenheim A, Jobling MA, Jenkins T et al. 2000. Jewish and Middle Eastern non-Jewish populations share a common pool of Y-chromosome biallelic haplotypes. Proc Natl Acad Sci USA 97: 6769-6774.

Harvey LP. 2005. Muslims in Spain, 1500 to 1614. University of Chicago Press, Chicago.

Hernández CL, Reales G, Dugoujon J-M, Novelletto A, Rodríguez JN, Cuesta P, Calderón R. 2014. Human maternal heritage in Andalusia (Spain): its composition reveals high internal complexity and distinctive influences of mtDNA haplogroups U6 and L in the western and eastern side of region. BMC Genet 15(1): 11.

Jayyusi SK. 1994. The legacy of Muslim Spain. Brill, Leiden; Boston.

Judice Gamito T. 1994. Les Celtes et le Portugal. Aqvitania 12: 415-430.

Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL, Hammer MF. 2008. New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18(5): 830-838.

Kayser M, Krawczak M, Excoffier L, Dieltjes P, Corach D, Pascali V, Gehrig C, Bernini LF, Jespersen J, Bakker E et al. 2001. An extensive analysis of Y-chromosomal microsatellite haplotypes in globally dispersed human populations. Am J Hum Genet 68(4): 990-1018.

Kitchen A, Ehret C, Assefa S, Mulligan CJ. 2009. Bayesian phylogenetic analysis of Semitic languages identifies an Early Bronze Age origin of Semitic in the Near East. Proc Biol Sci 276(1668): 2703-2710.

Koch JT. 2013. Paradigm Shift? Interpreting Tartessian as Celtic. In: Cunliffe B, Koch JT, editors. Celtic from the West: Alternative Perspectives from Archaeology, Genetics, Language and Literature. Oxbow Books, Oxford, UK. p 185-303.


Lacan M, Keyser C, Ricaut FX, Brucato N, Tarrús J, Bosch A, Guilaine J, Crubézy E, Ludes B. 2011. Ancient DNA suggests the leading role played by men in the Neolithic dissemination. Proc Natl Acad Sci USA 108: 18255-18259.

Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D et al. 2008. Correlation between genetic and geographic structure in Europe. Curr Biol 18: 1-8.

Luis JR, Rowold DJ, Regueiro M, Caeiro B, Cinnioğlu C, Roseman C, Underhill PA, Cavalli-Sforza LL, Herrera RJ. 2004. The Levant versus the Horn of Africa: evidence for bidirectional corridors of human migrations. Am J Hum Genet 74(3): 532-544.

Maca-Meyer N, Gonzalez AM, Pestano J, Flores C, Larruga JM, Cabrera VM. 2003. Mitochondrial DNA transit between West Asia and North Africa inferred from U6 phylogeography. BMC Genet 4: 15-25.

Mallory JP, Blench R, Spriggs M. 1997. The homelands of the Indo-Europeans. Archaeology and Language 1: 93-121.

Marcos Garcia M. 1987. El signario tartessico. IV coloque lengua y cultura prerromana. Veleia 2(3): 275-284.

Menocal MR, Scheindlin RP, Sells MA. 2000. The literature of Al-Andalus. In: The Cambridge History of Arabic Literature. Cambridge University Press, New York.

Nebel A, Filon D, Brinkmann B, Majumder PP, Faerman M, Oppenheim A. 2002. The Y chromosome pool of Jews as part of the genetic landscape of the Middle East. Am J Hum Genet 69: 1095-1112.

Nei M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.

O'Shea S. 2006. Sea of faith: Islam and Christianity in the medieval Mediterranean world. Walker, New York.

Plaza S, Calafell F, Helal A, Bouzerna N, Lefranc G et al. 2003. Joining the Pillars of Hercules: mtDNA sequences show multidirectional gene flow in the western Mediterranean. Ann Hum Genet 67: 312-328.

Pereira L, Cunha C, Alves C, Amorim A. 2005. African female heritage in Iberia: a reassessment of mtDNA lineage distribution in present times. Hum Biol 77: 213-229.


Rando JC, Pinto F, Gonzalez AM et al. 1998. Mitochondrial DNA analysis of northwest African populations reveals genetic exchanges with European, Near-Eastern, and sub-Saharan populations. Ann Hum Genet 62: 531-550.

Ravid-Amir O, Rosset S. 2010. Maximum likelihood estimation of locus-specific mutation rates in Y-chromosome short tandem repeats. Bioinformatics 26(18): i440-i445.

Relethford JH. 2008. Geostatistics and spatial analysis in biological anthropology. Am J Phys Anthropol 136(1): 1-10.

Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T et al. 2000. Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67: 1251-1276.

Robino C, Crobu F, Di Gaetano C, Bekada A, Benhamamouch S, Cerutti N, Piazza A, Inturri S, Torre C. 2008. Analysis of Y-chromosomal SNP haplogroups and STR haplotypes in an Algerian population sample. Int J Legal Med 122(3): 251-255.

Sanmartí J. 2005. La conformación del mundo ibérico septentrional. Palaeohispanica 5: 333-358.

Santos C, Fregel R, Cabrera VM, Álvarez L, Larruga JM, Ramos A, López MA, Aluja MP, González AM. 2014. Mitochondrial DNA and Y-chromosome structure at the Mediterranean and Atlantic façades of the Iberian Peninsula. Am J Hum Biol 26: 130-141.

Semino O, Magri C, Benuzzi G, Lin AA, Al-Zahery N, Battaglia V, Maccioni L, Triantaphyllidis C, Shen P, Oefner PJ et al. 2004. Origin, diffusion, and differentiation of Y-chromosome haplogroups E and J: inferences on the neolithization of Europe and later migratory events in the Mediterranean area. Am J Hum Genet 74(5): 1023-1034.

Sengupta S, Zhivotovsky LA, King R, Mehdi SQ, Edmonds CA, Chow CE, Lin AA, Mitra M, Sil SK, Ramesh A et al. 2006. Polarity and temporality of high-resolution Y-chromosome distributions in India identify both indigenous and exogenous expansions and reveal minor genetic influence of Central Asian pastoralists. Am J Hum Genet 78: 202-221.

Stern SM. 1964. Les chansons mozarabes: les vers finaux (kharjas) en espagnol dans les muwashshahs arabes et hébreux. Oxford University Press, Oxford.

Tofanelli S, Ferri G, Bulayeva K, Caciagli L, Onofri V, Taglioli L, Bulayev O, Boschi I, Alu M, Berti A et al. 2009. J1-M267 Y lineage marks climate-driven pre-historical human displacements. Eur J Hum Genet 17(11): 1520-1524.

Torroni A, Richards M, Macaulay V, Forster P, Villems R, Nørby S, Savontaus ML, Huoponen K, Scozzari R, Bandelt HJ. 2000. mtDNA haplogroups and frequency patterns in Europe. Am J Hum Genet 66: 1173-1177.

Underhill PA, Passarino G, Lin AA, Shen P, Mirazón Lahr M, Foley RA, Oefner PJ, Cavalli-Sforza LL. 2001. The phylogeography of Y chromosome binary haplotypes and the origins of modern human populations. Ann Hum Genet 65(1): 43-62.

Zalloua PA, Platt DE, El Sibai M, Khalife J, Makhoul N, Haber M, Xue Y, Izaabel H, Bosch E, Adams SM et al. 2008. Identifying genetic traces of historical expansions: Phoenician footprints in the Mediterranean. Am J Hum Genet 83(5): 633-642.

Zhivotovsky LA, Underhill PA, Cinnioğlu C, Kayser M, Morar B, Kivisild T, Scozzari R, Cruciani F, Destro-Bisol G, Spedini G et al. 2004. The effective mutation rate at Y chromosome short tandem repeats, with application to human population-divergence time. Am J Hum Genet 74: 50-61.


Zilhão J. 2001. Radiocarbon evidence for maritime pioneer colonization at the origins of farming in west Mediterranean Europe. Proc Natl Acad Sci USA 98: 14180-14185.


Table 1. Populations analyzed, gene diversity, mean variance (Vp), and time estimates (TE) using the method of Zhivotovsky et al. (2004). Times are in kya ± SD.

(a) Dates generated from evolutionary mutation rates as in Zhivotovsky et al. (2004).

(b) Dates generated from genealogical mutation rates as in Goedbloed et al. (2009).

15 Y-STR loci used; DYS385a/b not included in the calculation.

* M267 derived but not typed for P58.


Figure 1


Figure 2a


Figure 2b


Figure 2c


Figure 2d


Figure 3a. MDS HG J-P58 (Stress = 0.10740; RSQ = 0.94308)


Figure 3b. MDS HG J-L147 (Stress = 0.00974; RSQ = 0.99950)


Figure 3c. MDS HG J-L222 (Stress = 0.06546; RSQ = 0.98942)


Figure 3d. MDS HG E-M81 (Stress = 0.00833; RSQ = 0.99915)


Figure 3e. MDS HG E-M183 (Stress = 0.00833; RSQ = 0.99915)


Abbreviation List

STR: Short Tandem Repeat
YA: Years Ago
LGM: Last Glacial Maximum
BCE: Before Common Era
mtDNA: Mitochondrial DNA
AD: Anno Domini
SNP: Single Nucleotide Polymorphism
MJ: Median Joining
MDS: Multidimensional Scaling
GD: Gene Diversity
MP: Maximum Parsimony

Highlights:

- The M81 and M183 mutations in Spain represent signatures of Berber gene flow.
- The P58 mutation and its derivatives represent genetic signals from Arabia.
- Our data are compatible with multiple migrations from Northwest Africa, including the Islamic occupation.
- Frequencies of both Arabic and Berber markers are higher in the extreme Northwest of Iberia than in the South of Spain.
- Relocation of converts and/or pre-Islamic dispersals to North Iberia may explain the higher frequencies compared to South Iberia.
