Copyright © 2001 The American Society of Human Genetics. All rights reserved.
The American Journal of Human Genetics, Volume 68, Issue 3, 723-737, 1 March 2001
doi:10.1086/318785
Agnar Helgason1,3, Eileen Hickey2, Sara Goodacre2, Vidar Bosnes4, Kári Stefánsson3, Ryk Ward1 and Bryan Sykes2
1 Institute of Biological Anthropology, University of Oxford, Oxford
2 Institute of Molecular Medicine, University of Oxford, Oxford
3 Institute of deCODE Genetics Inc., Reykjavik
4 Department of Immunology and Transfusion Medicine, Ullevål Hospital, Oslo
Address for correspondence and reprints: Agnar Helgason, deCODE Genetics, Lynghals 1, 101 Reykjavik, Iceland
The period from the late 8th century through the early 11th century is commonly referred to in European history as the Viking Age. Raiding, trading, and settling, the Vikings expanded east, south, and west from Scandinavia: invading the Baltic region and Russia from Sweden; invading England, France, and as far south as Spain from Denmark; and invading Ireland, northern England, Scotland, and the North Atlantic islands from Norway (Jones, 1984; Collins, 1991). In the westward spread, around 780 a.d., Shetland and Orkney were the first of the North Atlantic islands to be colonized by the emerging Vikings (Davies, 1999), but, following the Scottish coastline, the Vikings soon reached the Western Isles. By 800 a.d., the Vikings had reached the Faroe Islands, and, some 60 years later, Iceland was discovered. Whereas Orkney and the Western Isles are known to have had thriving pre-Norse settlements of Picts and Gaels, respectively, Shetland was less densely populated, and the Faroes and Iceland are both held to have been largely uninhabited at the time of the Norse settlements (Berry and Muir, 1975; Davies, 1999).
Iceland was the last of the islands to be settled in this rapid expansion of Norse peoples, and historical sources clearly indicate that this episode of colonization in the North Atlantic involved the greatest movement of people (Rafnson, 1999). Iceland is still inhabited by Norse-speaking descendants of the first settlers, some of whom are thought to have originated from the British Isles. The legacy of the Viking period is less clear for Orkney, the Western Isles, and the Isle of Skye. Archaeological, linguistic, and historical evidence all indicate that Viking activities had the most far-reaching effect in Orkney, where the indigenous Pictish population may have been entirely replaced by Norse settlers (Graham-Campbell and Batey, 1998). The apparent dominance of the Norse language and material culture in Orkney and Shetland contrasts with evidence from the Western Isles, where the coexistence of Vikings and the indigenous Gaelic population led to general bilingualism and a greater level of cultural amalgam (Graham-Campbell and Batey, 1998). Scotland regained control over the Western Isles after the battle of Largs in 1263, whereas Orkney and Shetland remained under Norse control until 1469 (Boyce et al., 1973; Berry and Muir, 1975; Clegg et al., 1985). In both cases, the return of Scottish rule led to a period of repopulation from the mainland and the dominance of Scottish language and culture (although Norn, the Norse dialect of Orkney and Shetland, survived until the 19th century). Figure 1 shows the geographic location of the North Atlantic islands, the sailing routes of the Vikings, and the areas of Norse influence in the North Atlantic region.
A number of previous studies have attempted to shed light on the genetic affinities of the North Atlantic islanders using classical allozyme genetic markers, but their results have been difficult to interpret. Most studies have focused on the Icelanders, with the aim of calculating the contributions to admixture of the Norse and Gaelic ancestral populations. Estimates have varied considerably, from a 93%–98% Gaelic contribution (Thompson, 1973) to an 86% Norse contribution (Wijsman, 1984). More recent analyses of mtDNA and Y-chromosome variation in the Icelanders suggest that a majority of the female settlers may have originated from the British Isles, whereas approximately 80% of male settlers were Scandinavian (Helgason et al., 2000b, 2000c). Fewer studies have dealt with the other island populations in the North Atlantic. An analysis of classical allozyme markers in a Western Isles population from Lewis found that allele frequencies showed substantial differences from neighboring European populations (Clegg et al., 1985). Natural selection or genetic drift in the Lewis gene pool or gene flow among the other European populations were suggested as possible causes of these differences. In similar studies of Orkney and Shetland, Roberts (1985, 1990) reported that both island populations diverged considerably in allele frequencies from neighboring populations. Although not ruling out selection or drift as potential causes for this divergence, Roberts concluded that the islanders of Orkney and Shetland most likely represented remnants of an aboriginal gene pool that had changed on the British mainland because of later population movements. None of these studies of allozyme variation in the Scottish Islands reported estimates of Norse admixture.
In this study, we examined mtDNA control-region sequences in the North Atlantic island populations of Orkney, the Western Isles, the Isle of Skye, and Iceland, and we compared them to those observed in the rest of the British Isles, Scandinavia, and other regions of Europe. The primary aims were to assess the relative magnitude of diversity and levels of Gaelic and Scandinavian admixture in the mtDNA pools of the North Atlantic island populations and individuals from the northwest coast of Scotland. mtDNA lineages sampled from contemporary populations provide us with direct links to matrilineal ancestors from the Viking Age and thus enable us to examine the extent to which Scandinavian females were involved in Norse settlements on the North Atlantic islands. Many historians believe that the Norse expansion of the Viking Age was primarily a male enterprise (Clover, 1988). If this were the case, one would not expect to find close links between mtDNA lineages found in the North Atlantic island populations and Scandinavia. On the basis of available historical, archaeological, and linguistic information (Graham-Campbell and Batey, 1998; Corráin, 1999; Davies, 1999), we would expect that the largest proportion of mtDNA lineages inherited from Norse matrilineal ancestors would be found (in descending order of magnitude) in Iceland, Orkney, the Western Isles, the Isle of Skye, and the coastal region of northwest Scotland. To achieve these aims, we sequenced the mtDNA hypervariable segment 1 (HVS1) from 1,664 individuals from the Scottish islands, the Scottish mainland, England, and Norway. These new data were added to an existing data set of 3,444 Eurasian HVS1 sequences, for a detailed study of mtDNA variation in the North Atlantic region.
DNA from 891 individuals from all regions of mainland Scotland, 181 from the Western Isles, 49 from the Isle of Skye, and 142 individuals of English matrilineal descent (representing most regions of England) was extracted from blood collected at Blood Transfusion Service donor sessions throughout Scotland. Information about the birth place of the maternal grandmother was sought from each individual. Similarly, DNA from 323 Norwegians was extracted from blood collected at donor sessions at Ullevål hospital in Oslo. Again, the birth place of the maternal grandmother was recorded for each individual, showing the samples to be representative of all geographic areas of Norway. In all cases, individuals gave informed consent. Samples from 78 Orkney Islanders were kindly supplied by Dr. J. Bodmer (see Bodmer et al., 1996). The data produced for this study were deposited in GenBank and are available on request from the corresponding author.
The comparative data set from Europe consisted of 3,444 mtDNA HVS1 sequences from the following populations and sources: Iceland (394 from Helgason et al., 2000b, 20 from the Mitochondrial DNA Concordance, 14 from Richards et al., 1996, and 39 from Sajantila et al., 1995), Ireland (23 from the Mitochondrial DNA Concordance and 105 from Richards et al., 2000), Orkney (74 from the Mitochondrial DNA Concordance), the Western Isles (16 from the Mitochondrial DNA Concordance), England and Wales (160 from Richards et al., 1996, 29 from the Mitochondrial DNA Concordance, and 97 from Piercy et al., 1993), Finland (74 from Kittles et al., 1999, 23 from Pult et al., 1994, and 29 from Richards et al., 1996), Estonia (26 from Sajantila et al., 1995), Austria (99 from Parson et al., 1998 and 16 from Handt et al., 1994), France (10 from the Mitochondrial DNA Concordance and 50 from Rousselet and Mangin, 1998), Italy (49 from Francalacci et al., 1996 and 68 from Stenico et al., 1996), Germany (151 from Richards et al., 1996, 67 from Hofmann et al., 1997, 200 from Lutz et al., 1999, and 109 from Pfeiffer et al., 1999), Denmark (15 from the Mitochondrial DNA Concordance and 31 from Richards et al., 1996), Sweden (28 from Kittles et al., 1999 and 32 from Sajantila et al., 1996), Norway (216 from Opdal et al., 1998), Saami (61 from Delghandi et al., 1998 and 115 from Sajantila et al., 1995), European Russia (103 from Orekhov et al., 1999 and 112 from Sajantila et al., 1995), Switzerland (76 from Pult et al., 1994), Spain (132 from Corte-Real et al., 1996, 18 from Pinto et al., 1996, 92 from Salas et al., 1998, 11 from Handt et al., 1998, and 45 from Bertranpetit et al., 1995), Portugal (54 from Corte-Real et al., 1996), Bulgaria (30 from Calafell et al., 1996), and Turkey (27 from Calafell et al., 1996 and 45 from Comas et al., 1996). Data obtained from the Mitochondrial DNA Concordance Web site originate from Miller's (1996) analysis of archival blood samples from populations in the North Atlantic region. Of all 253 sequences from Orkney, the Western Isles, Ireland, Denmark, England, France, and Iceland from Miller (1996), 66 had ambiguous sites (probably because of blood degradation) and were excluded from further analysis.
To obtain equivalent sample sizes for comparative analysis, we combined sequences from geographically proximate populations into the following groups: England/Wales, Spain/Portugal, Austria/Switzerland, France/Italy, Finland/Estonia, and Bulgaria/Turkey. In addition to individuals from Iceland, Orkney, the Western Isles, and the Isle of Skye, Norse and Gaelic admixture was estimated in 91 individuals from the coastal region of north and northwest Scotland (mainly Wester Ross, Caithness, and Sutherland). The term Gaelic is used throughout as a convenient label for the combined populations of mainland Scotland and Ireland and should not be interpreted in a strict linguistic sense.
HVS1 was amplified for the 1,664 new samples from the British Isles and Norway, as described elsewhere (Richards et al., 1996), and was sequenced at the University of Florida DNA Sequencing Core Laboratory, by use of ABI Prism Dye Terminator cycle-sequencing protocols developed by Applied Biosystems. The fluorescently labeled extension products were analyzed on an Applied Biosystems Model 373 Stretch DNA sequencer or on a 377 DNA sequencer (Perkin-Elmer). In most cases, sequences were obtained between sites 16010 and 16400. The mtDNA site numbers referred to in this study are those of Anderson et al. (1981). To maximize the number of sequences available for analysis from the North Atlantic region, all analyses were restricted to the 235 nucleotides between positions 16090 and 16324. Recent studies have indicated that European mtDNA pools contain an extensive array of different mtDNA lineages and that large sample sizes are required to capture a representative sample of this variation (Pfeiffer et al., 1999; Helgason et al., 2000b; Richards et al., 2000). The maximization of sample size at the cost of sequence length is further justified by the observation that 82% of the total polymorphism found in sequences available between sites 16010 and 16400 is contained within the 235-bp segment we have analyzed.
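Gene diversity was estimated as
$$GD = \frac{N}{N-1}\left(1 - \sum_{i=1}^{K} x_i^{2}\right),$$
where $N$ is the sample size, $K$ is the number of lineages, and $x_i$ is the sample frequency of the $i$th lineage.
Mean pairwise differences between sequences (θπ) were calculated as
$$\theta_\pi = \frac{2}{N(N-1)} \sum_{i<j} d_{ij},$$
where $d_{ij}$ is the number of nucleotide differences between sequences $i$ and $j$. The parameter θk was estimated from Ewens's sampling formula (Ewens, 1972) as the value that satisfies
$$E(K) = \sum_{i=0}^{N-1} \frac{\theta_k}{\theta_k + i}.$$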
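The two diversity statistics can be illustrated with a minimal Python sketch. It assumes the standard unbiased form of gene diversity, N/(N-1) multiplied by one minus the sum of squared lineage frequencies, and takes θπ as the plain average of site differences over all pairs of aligned sequences. The function names and the toy sequences are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def gene_diversity(sequences):
    """Unbiased gene diversity: N/(N-1) * (1 - sum of squared lineage frequencies)."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(sequences)  # identical sequences form one lineage
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def mean_pairwise_differences(sequences):
    """Average number of differing sites over all pairs of aligned sequences."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    total = sum(
        sum(a != b for a, b in zip(s, t))
        for s, t in combinations(sequences, 2)
    )
    return total / (n * (n - 1) / 2)


# Hypothetical toy alignment: three lineages among four sequences.
toy = ["AAAA", "AAAT", "AAAT", "TTTT"]
print(gene_diversity(toy), mean_pairwise_differences(toy))
```

Identical sequences are grouped into one lineage before frequencies are computed, so the sketch treats a lineage simply as a distinct HVS1 haplotype.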
The two parameters, θπ and θk, use different aspects of the genetic data to estimate the population mutation parameter 2Nfeμ, where Nfe represents the female effective population size and μ the mutation rate. Helgason et al. (2000b) noted that θk and θπ exhibit quite divergent values for mtDNA control-region sequences in European populations and that θk appears to provide a better reflection of European population sizes during the rapid expansions of the last few centuries. Because our focus is on historical events taking place during the past 1,300 years, we take θk as the more reliable estimator of Nfe, whereas θπ is taken simply to represent the average mutational divergence observed between a population's mtDNA sequences. Under conditions of neutrality, constant population size, and the infinite-alleles mutation model, and under the assumption that the mutation rate is equal in all populations, differences in θk will reflect differences in Nfe (Ewens, 1972). Departures from these assumptions complicate the direct estimation of Nfe using Ewens's sampling formula. However, given that such departures are more or less equivalent for all populations, θk can be considered an effective relative indicator of Nfe.
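As an illustration of how θk can be obtained from a sample, the sketch below solves Ewens's relation between the expected number of lineages and θ, namely E(K) equal to the sum of θ/(θ+i) for i from 0 to N-1, by bisection. This is a minimal sketch under the assumption that θk is the value of θ at which E(K) equals the observed K; the function names, search bounds, and iteration count are hypothetical choices, not details given in the text.

```python
def expected_lineages(theta, n):
    """Expected number of distinct lineages in a sample of n under Ewens's formula."""
    return sum(theta / (theta + i) for i in range(n))


def estimate_theta_k(k, n, lo=1e-6, hi=1e6, iterations=200):
    """Find theta such that expected_lineages(theta, n) equals the observed k.

    expected_lineages increases monotonically with theta, so bisection applies.
    Requires 1 < k < n; with k == n no finite solution exists.
    """
    if not 1 < k < n:
        raise ValueError("need 1 < k < n")
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if expected_lineages(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# N and K for the Saami and the Isle of Skye as listed in Table 1.
print(round(estimate_theta_k(30, 176), 2))
print(round(estimate_theta_k(23, 49), 2))
```

Applied to the N and K values that Table 1 lists for the Saami and the Isle of Skye, the solver should return values close to the tabulated θk, which offers a quick consistency check on the assumed relation.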
Recent studies indicate that European populations contain a large number of mtDNA lineages, of which many will remain unsampled, even when sample sizes are >400 (Pfeiffer et al., 1999; Helgason et al., 2000b). Sampling saturation of mtDNA lineages in European populations was assessed using a method based on the above sampling formula by Ewens (1972). Under the assumption of a steady-state distribution of alleles, this formula predicts that the relationship between increasing sample size and the number of new lineages encountered will be one of diminishing returns. We use the observed θk values to estimate the point at which incremental increases of 10 in sample size yield less than one new lineage (see Helgason et al., 2000a, 2000b). The resultant sample sizes represent the points at which the rate of lineage detection is equal for all populations. The ratio of these theoretical sample sizes to the actual sample size for each population can serve as an indicator of relative sampling saturation.
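The diminishing-returns criterion can be sketched in Python as follows. For a given θk, the function searches for the smallest sample size at which 10 additional sequences are expected to add fewer than one new lineage under Ewens's formula. The stepping convention (testing every n from 1 upward) and the function names are assumptions, so the resulting sizes need not reproduce the sampling ratios in Table 1 exactly.

```python
def expected_gain(theta, n, step=10):
    """Expected number of new lineages when the sample grows from n to n + step."""
    return sum(theta / (theta + i) for i in range(n, n + step))


def saturation_sample_size(theta, step=10, threshold=1.0, max_n=1_000_000):
    """Smallest n at which `step` extra sequences yield fewer than `threshold` new lineages."""
    for n in range(1, max_n):
        if expected_gain(theta, n, step) < threshold:
            return n
    raise ValueError("no saturation point found below max_n")


# theta_k values for the Saami and Iceland as listed in Table 1.
for name, theta in [("Saami", 10.15), ("Iceland", 47.76)]:
    print(name, saturation_sample_size(theta))
```

Because the expected gain falls as n grows, populations with larger θk need proportionally larger samples before they reach the same rate of lineage detection.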
Much of the evolutionary change in the mtDNA pools of European populations during the past 1,300 years and before will have been in the form of lineage redistribution, both within populations, caused by drift, and between populations, caused by migration. Thus, measures of genetic distance based on lineage frequencies may provide important information about the relationships of the populations in the North Atlantic region. To reduce the statistical noise that sampling variance and missing (or unsampled) lineages introduce into frequency-based genetic distances, we collapsed lineages into a smaller number of phylogenetically resolved subclusters. The basic phylogenetic structure of Eurasian mtDNA lineages has been revealed by a number of recent studies (Richards et al., 1998; Macaulay et al., 1999; Helgason et al., 2000b). The skeleton structure is shown in figure 2, along with a number of well-resolved subclusters and known diagnostic substitutions. The maximum amount of character information available for each sequence was used to reduce the full data set of 1,128 lineages to the 42 haplogroups or subclusters identified in figure 2 (excluding the superhaplogroups R, L, and M). Only 35 lineages (46 sequences) could not be assigned to these Eurasian lineage clusters, and almost all of these are rare occurrences of lineages belonging either to African superhaplogroup L or Asian superhaplogroup M. These 35 lineages were assigned to a synthetic subcluster designated “other.” Most of the haplogroups and subclusters shown in figure 2 are now well established in the literature. The following subclusters were identified through the construction of phylogenetic networks using mtDNA HVS1 sequences from populations in the North Atlantic region: H1, H3, H4, H5, H8, K1, K2, K2a, K2b, T2, and T3. Median-joining networks (Bandelt et al., 1999) were generated for the common European haplogroups (H, I, J, K, V, and U5) using the mtDNA HVS1 data set described above. On the basis of these networks, new subclusters were designated when a cluster of lineages emanating from a founder lineage was observed in more than one population. Lineages not assigned to new subclusters retained their basic haplogroup label (see Richards et al., 2000, for a similar approach). Until whole-genome sequencing of mtDNA becomes the norm, allowing the construction of unambiguous phylogenetic trees (see Ingman et al., 2000), any such schemes will necessarily remain arbitrary and a source of debate and confusion (Simoni et al., 2000b; Torroni et al., 2000). In the interim, we must rely on heuristic subclusters, such as those defined in the present study.
Genetic distances between populations based on these subcluster frequencies were calculated using the f distance, which is based on the chord genetic distance introduced by Cavalli-Sforza and Edwards (1967). The resultant matrix of genetic distances between populations was represented in two-dimensional space by means of a multidimensional scaling (MDS) analysis, using the SPSS software package.
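The text does not spell out the exact form of the f distance, so the sketch below shows only the underlying chord distance of Cavalli-Sforza and Edwards in its common form, (2/π)·sqrt(2·(1 − Σ sqrt(p_i·q_i))), computed from two vectors of subcluster frequencies. The function name and the example frequencies are hypothetical.

```python
import math


def chord_distance(p, q):
    """Cavalli-Sforza and Edwards chord distance between two frequency vectors.

    p and q must list frequencies (each summing to 1) for the same categories.
    """
    if len(p) != len(q):
        raise ValueError("frequency vectors must have equal length")
    cos_theta = sum(math.sqrt(a * b) for a, b in zip(p, q))
    cos_theta = min(1.0, cos_theta)  # guard against rounding slightly above 1
    return (2.0 / math.pi) * math.sqrt(2.0 * (1.0 - cos_theta))


# Hypothetical subcluster frequencies for two populations (not from Table 3).
pop_a = [0.40, 0.35, 0.25]
pop_b = [0.30, 0.30, 0.40]
print(chord_distance(pop_a, pop_b))
```

A matrix of such pairwise distances is the kind of input that an MDS procedure would then project into two dimensions.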
Using the full sequences between sites 16090 and 16324, we estimated the mutational divergence of the North Atlantic mtDNA pools from other European populations using the index ρ. The ρ index is defined as the average number of substitutions between the sequences of one population and the closest founder sequences observed in another population (Forster et al., 1996), and it effectively summarizes the overlap between one mtDNA pool and a potential source mtDNA pool. Unlike an analysis of molecular variance (AMOVA) distance, which summarizes the average mutational distance between all pairs of sequences from two populations (Excoffier et al., 1992), ρ is insensitive to the fact that the divergence between European populations, as measured in mutations at the mtDNA locus, is small relative to the overall mutational time-depth of the European mtDNA phylogeny (see Richards et al., 1998; Simoni et al., 2000a). Thus, an AMOVA analysis shows that <2% of the variance in mutational divergence between all pairs of European mtDNA HVS1 sequences is accounted for by their distribution among different populations (Helgason et al., 2000b).
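The definition of ρ given above can be sketched as a short Python function: for each sequence in the target population, find the smallest number of substitutions to any sequence in the putative source population, then average these minima. This is a simplified illustration that treats every source sequence as a candidate founder and measures divergence as the Hamming distance between aligned sequences; the function names and toy data are hypothetical.

```python
def hamming(a, b):
    """Number of differing sites between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def rho(target, source):
    """Average distance from each target sequence to its closest source sequence."""
    if not target or not source:
        raise ValueError("both samples must be non-empty")
    return sum(min(hamming(t, s) for s in source) for t in target) / len(target)


# Hypothetical toy data: distances to the closest source sequence are 0, 1, and 2.
island = ["AAAA", "AAAT", "AATT"]
mainland = ["AAAA", "TTTT"]
print(rho(island, mainland))
```

Note that ρ is not symmetric: swapping the target and source samples generally gives a different value, which is why the text speaks of divergence from a potential source pool.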
To estimate the level of Scandinavian ancestry in the island populations of the North Atlantic and the coastal population of northwest Scotland, we employ a heuristic approach to estimate the admixture proportion that best fits the observed lineage distribution in the admixed and parental populations (see Helgason et al., 2000c). This estimator, designated mρ, is obtained as follows. Given a prior probability of admixture, η, the probability that a randomly chosen lineage observed in the admixed sample is derived from the first source population is given by ηp1/{ηp1+(1-η)p2}, where p1 and p2 are the frequencies of this lineage in the two source populations. For a given value of η, 10,000 random samples are used to obtain the mean and 95% confidence interval for the admixture estimate mρ, conditioned on the prior value of η. The fit to the data is evaluated by Σ(Nmρ-Nη)²/(Nη), where N is the size of the admixed sample. The best-fitting model is found by an iterative search over the line 0≤mρ≤1 in successively smaller intervals around the best-fitting value for mρ.
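The logic of this estimator can be sketched in Python under two simplifying assumptions that are not part of the described method: the 10,000 random samples are replaced by the expected value of mρ for each η (so no confidence interval is produced), and the iterative interval-narrowing search is replaced by a single fixed grid over η. The function names and the toy lineage frequencies are hypothetical.

```python
def prob_from_source1(eta, p1, p2):
    """Probability that a lineage derives from source 1, given prior eta."""
    denom = eta * p1 + (1.0 - eta) * p2
    if denom == 0:
        raise ValueError("lineage has zero frequency in both source populations")
    return eta * p1 / denom


def expected_admixture(eta, lineages):
    """Expected m_rho: mean assignment probability over the admixed sample.

    `lineages` holds one (p1, p2) pair per sequence in the admixed sample.
    """
    return sum(prob_from_source1(eta, p1, p2) for p1, p2 in lineages) / len(lineages)


def best_fit_eta(lineages, grid=99):
    """Grid search for the eta that minimizes (N*m_rho - N*eta)^2 / (N*eta)."""
    n = len(lineages)
    best_eta, best_fit = None, float("inf")
    for i in range(1, grid + 1):
        eta = i / (grid + 1)
        m_rho = expected_admixture(eta, lineages)
        fit = (n * m_rho - n * eta) ** 2 / (n * eta)
        if fit < best_fit:
            best_eta, best_fit = eta, fit
    return best_eta


# Hypothetical sample: two lineages found only in source 1, two only in source 2.
sample = [(0.2, 0.0), (0.2, 0.0), (0.0, 0.2), (0.0, 0.2)]
print(best_fit_eta(sample))
```

In this toy sample half of the lineages can only have come from each source, so the best-fitting prior is the one that matches an admixture proportion of one half.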
To use the information provided by private lineages in the admixed populations, the putative founder lineage(s) for a private lineage is defined as the lineage in the source population(s) that differs by the smallest number of substitutions according to a matrix of mutational distances between lineages. If more than one lineage in the source population(s) meets this criterion as a putative founder lineage (as in the case of a tie), their frequencies are summed to derive p1 and p2, to calculate the conditional probability of origin. We note that the only way to unambiguously determine the genuine founder lineages is by means of the true phylogeny of sequences generated from a sample of all lineages from the populations in question. However, because unambiguous phylogenies cannot be constructed from HVS1 sequence data and because it is likely that many lineages have yet to be sampled from the mtDNA pools of European populations, our method provides an objective heuristic approach to identifying founder lineages that avoids subjective choice of founder lineages “by hand.”
Table 1 presents summary statistics for the 5,108 mtDNA HVS1 sequences used in this study, with populations placed in descending order by θk values. As was found elsewhere (Helgason et al., 2000b), θk seems to better reflect current and historical population sizes than does θπ. In most cases, values of θk and θπ differ by more than one order of magnitude and are not significantly correlated. As expected, the North Atlantic island populations exhibit relatively small values of θk, which is indicative of small effective population sizes for females (Nfe). In most respects, the populations from the Western Isles and Orkney exhibit similar levels of genetic diversity to those of the Icelanders. Interestingly, the Irish sample also exhibits very low values of gene diversity and θπ. The Saami and the islanders of Skye have by far the smallest Nfe.
Table 1 Summary Statistics for HVS1 Sequences from European Populations

| Population | N | K | S | GD | θk | θπ | Private Lineages (%) | Sampling Ratio |
|---|---|---|---|---|---|---|---|---|
| France/Italy | 248 | 158 | 97 | .963 | 186.42 | 4.23 | 60.76 | .147 |
| Germany | 527 | 234 | 99 | .97 | 160.68 | 3.70 | 50.00 | .361 |
| Scandinavia | 645 | 243 | 108 | .937 | 141.36 | 3.52 | 44.44 | .504 |
| England/Wales | 429 | 183 | 91 | .934 | 120.18 | 3.35 | 44.81 | .394 |
| Scotland | 891 | 250 | 102 | .956 | 115.11 | 3.73 | 46.00 | .849 |
| Spain/Portugal | 352 | 154 | 95 | .935 | 103.85 | 3.26 | 45.45 | .371 |
| Bulgaria/Turkey | 102 | 71 | 70 | .977 | 102.25 | 4.34 | 56.34 | .110 |
| Austria/Switzerland | 187 | 93 | 70 | .958 | 72.84 | 3.55 | 37.63 | .279 |
| European Russia | 215 | 90 | 59 | .934 | 57.69 | 3.44 | 32.22 | .406 |
| Western Isles | 197 | 79 | 53 | .968 | 48.43 | 3.75 | 27.85 | .438 |
| Iceland | 467 | 114 | 67 | .966 | 47.76 | 3.96 | 42.98 | 1.061 |
| Orkney Islands | 152 | 67 | 55 | .946 | 45.24 | 3.37 | 27.94 | .362 |
| Ireland | 128 | 61 | 50 | .922 | 45.05 | 2.87 | 29.51 | .305 |
| Finland/Estonia | 202 | 75 | 59 | .949 | 42.74 | 3.49 | 33.33 | .505 |
| Isle of Skye | 49 | 23 | 27 | .935 | 16.30 | 3.70 | 21.74 | .306 |
| Saami | 176 | 30 | 30 | .808 | 10.15 | 3.21 | 46.67 | 1.760 |

Note: N = sample size; K = no. of lineages; S = no. of variable sites; GD = gene diversity.
The sampling saturation ratio varies considerably among the populations included in this study and indicates that the Saami, Icelanders, and Scots are the most extensively sampled populations for mtDNA variation. In contrast, France/Italy and Bulgaria/Turkey appear to be the least-sampled regions included in this study.
The proportion of private lineages also varies considerably among populations (21.7%–60.7%). In general, the proportion of private lineages sampled from geographically proximate populations should increase as a function of θk, as this parameter reflects the probability of new lineages arising by mutation. Each new lineage will be private to the population in which it appeared until carried through female migration into another population's mtDNA pool. As is indicated in figure 3, there is a strong correlation between θk and the proportion of private lineages (r=.77; P<.001). A few samples have a relative excess or scarcity of private lineages; of these, the Iceland, Saami, Bulgaria/Turkey, Isle of Skye, Western Isles, and Ireland samples fall outside the 95% confidence interval for the regression line. An excess of private lineages could have at least three basic causes. The first, where isolation has hindered the migrational flow of new lineages to and from neighboring populations, is perhaps typified by the Icelanders. The second is exemplified by the population sample from Bulgaria/Turkey: considerable gene flow has taken place from regions not included in the comparative database that was used to estimate the proportion of private lineages (in this case, from Asia and Africa). This also is exemplified by the Saami, whose mtDNA pool includes a number of lineages from Asian haplogroups. The third cause would be an excess of private lineages in populations that have been sampled more extensively than others included in the comparison. The relatively thorough sampling of the Icelanders may account partially for their excess of private lineages.
A relative scarcity of private lineages could be indicative of either a very high level of emigration (where few lineages remain private for long, because of rapid outward gene flow to neighboring populations) or immigration (where new lineages arriving into the population would increase θk but not the proportion of private lineages). It is interesting to note that, contrary to the excess of private lineages observed in the Icelanders, the populations of Orkney, the Western Isles, and the Isle of Skye all exhibit a relative scarcity of private lineages. As the Icelanders and Scottish islanders share the same source mtDNA pools, this difference may reflect the well-recorded extensive migration from the Scottish islands to the mainland during the past two centuries, as opposed to the isolation of the Icelanders.
Table 2 shows the pattern of lineage sharing between the North Atlantic island populations and the Scandinavian and Gaelic source mtDNA pools. It is notable that all populations share a higher percentage of their lineages exclusively with the Gaelic source populations. The Icelanders have the highest percentage of lineages that are found in neither source mtDNA pool (again, an indication of greater isolation), and the islanders of Skye have the lowest. As might be expected, the Icelanders have the lowest proportion of lineages shared exclusively with Gaels, and the islanders of Skye have the highest. More surprising is the observation that the islanders of Skye and Orkney share a greater proportion of their lineages with Scandinavians than do the Icelanders. However, if only lineages shared exclusively with either of the two source populations are examined, Iceland (0.38) and Orkney (0.35) are revealed as having the closest relationship to Scandinavia (see Table 2).
Table 2 Pattern of Lineage Sharing between North Atlantic Islands and Source Populations

| Population | K (a) | Lineages shared with Gaels only (%) (b) | Lineages shared with Scandinavians only (%) (b) | Lineages shared with both (%) | Lineages shared with neither (%) |
|---|---|---|---|---|---|
| Iceland | 114 | 11.4 (.62) | 7.0 (.38) | 26.3 | 55.3 |
| Orkney | 68 | 16.2 (.65) | 8.8 (.35) | 38.2 | 36.8 |
| Western Isles | 79 | 16.5 (.81) | 3.8 (.19) | 35.4 | 44.3 |
| Isle of Skye | 23 | 30.4 (.78) | 8.7 (.22) | 39.1 | 21.7 |
| NW Scottish coast | 91 | 24.2 (.88) | 3.3 (.12) | 39.6 | 33.0 |
| Scottish Islands | 138 | 18.8 (.74) | 6.5 (.26) | 27.5 | 47.1 |
| North Atlantic islands | 214 | 14.0 (.67) | 7.0 (.33) | 21.0 | 57.9 |

(a) K = number of distinct lineages.
(b) The number in parentheses represents the proportion of lineages shared exclusively with either the Gaels or Scandinavians out of the total number of exclusively shared lineages.
Figure 4 shows ρ distances between island populations of the North Atlantic and other European groups. Note that individuals from the northwest coast of Scotland are treated as a separate population. The ρ distances are largely consistent with expectations based on historical and archaeological records. The Gaels are closest in all cases, followed by a cluster of Scandinavians, other North Atlantic islanders, inhabitants of England and Wales, Germans, and the remaining European populations. The ρ distance to the Gaels is smallest from the Isle of Skye and the northwest coast of Scotland and is greatest for the Icelanders. The least difference between Scandinavians and Gaels is observed for Iceland and then, in descending order, for Orkney, the Western Isles, the northwest coast of Scotland, and the Isle of Skye. Interpreted as a rough indicator of Norse admixture, the relative differences between ρ distances to Gaels and Scandinavians accord with historical evidence of the differential impact of Norse settlement on each of the North Atlantic island populations. However, according to these results, it appears that the Gaelic contribution to the Icelandic mtDNA pool may have been at least as large as that from Scandinavia.
The small ρ distances to the Germans in all five cases are surprising, as there are no known accounts of recent female gene flow from Germany into the North Atlantic region. This may be accounted for by Germany's central position in Europe and by the fact that many ancient population movements into the British Isles and Scandinavia originated from or passed through this territory (Collins, 1991; Davies, 1999). In this case, ancient German ancestral links to both the Norse and Gaelic mtDNA pools would account for the low ρ distances to the admixed North Atlantic island populations. It may also be that ρ values shown in figure 4 are influenced by sample size. ρ distances to putative source populations that have not been adequately sampled will tend to be overestimated (Helgason et al., 2000b; Richards et al., 2000). However, Table 1 suggests that, for example, England/Wales, Finland/Estonia, and European Russia have a higher sampling saturation than Germany, and yet the latter two are consistently more distant from the North Atlantic island populations than is Germany. We thus conclude that ρ values are likely to reflect actual relationships between the mtDNA pools of these populations. Moreover, even when an effect caused by the sample size of putative source populations is assumed, this should influence equally ρ distances to the North Atlantic island populations. Thus, the varying configuration of ρ distances would still provide valuable information about the population histories of the North Atlantic island populations.
A more complete picture of population relationships in the North Atlantic region and the rest of Europe can be obtained from a multidimensional scaling (MDS) plot of genetic distances based on the frequencies of lineage clusters (fig. 5). Table 3 presents the frequencies of lineage clusters in the populations and groups used in this study (the phylogenetic relationships of these lineage clusters are shown in fig. 2). We have interpreted this plot as showing three nonoverlapping geographic groups of populations: those of the North Atlantic, south and central Europe, and the Baltic region. The North Atlantic island populations clearly have the closest links to the mtDNA pools of the British Isles and Scandinavia. The wide dispersion of the island populations in the top-right area of the diagram agrees with their small effective population size of females and the concomitant effect of genetic drift.
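As an illustration of how a two-dimensional plot such as figure 5 can be derived from a matrix of pairwise distances, the following is a minimal sketch of classical (Torgerson) MDS. It rests on several assumptions: the text does not state which MDS variant or which genetic distance measure was used, so the classical variant is assumed here, the function names (`double_center`, `top_eigenpair`, `classical_mds`) are hypothetical, and the example distances are placeholders rather than values from this study. Power iteration is used only to keep the sketch free of external libraries; a numerical eigensolver would be the usual choice.

```python
import math
import random


def double_center(dist):
    """Return B = -1/2 * J D^2 J, the inner-product matrix implied by the distances."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpair(mat, rng, iterations=1000):
    """Power iteration: dominant (largest absolute) eigenvalue and unit eigenvector."""
    n = len(mat)
    vec = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iterations):
        nxt = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        vec = [v / norm for v in nxt]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector.
    value = sum(vec[i] * sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n))
    return value, vec


def classical_mds(dist, dims=2, seed=0):
    """Embed n objects in `dims` dimensions from a symmetric n x n distance matrix."""
    rng = random.Random(seed)
    b = double_center(dist)
    n = len(b)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        value, vec = top_eigenpair(b, rng)
        # Negative eigenvalues (non-Euclidean distances) contribute no real axis.
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i][k] = scale * vec[i]
        # Deflate so the next pass finds the next eigenpair.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return coords


if __name__ == "__main__":
    # Placeholder distances between four hypothetical populations (not data from the paper).
    labels = ["P1", "P2", "P3", "P4"]
    placeholder = [[0.0, 1.0, 2.0, 2.5],
                   [1.0, 0.0, 1.5, 2.0],
                   [2.0, 1.5, 0.0, 1.0],
                   [2.5, 2.0, 1.0, 0.0]]
    for label, point in zip(labels, classical_mds(placeholder)):
        print(label, [round(c, 3) for c in point])
```

The orientation of the resulting axes is arbitrary, so only the relative positions of the populations carry information, which is why groupings in such a plot are read as clusters rather than as positions along named axes.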
Table 3. Haplogroup and Subcluster Frequencies for European Populations (frequency for each population, %)

| Haplogroup | Austria/Switzerland (N=187) | European Russia (N=215) | Finland/Estonia (N=202) | France/Italy (N=248) | Germany (N=527) | Iceland (N=467) | Ireland (N=128) | Orkney (N=152) | Scandinavia (N=645) | Scotland (N=891) | Bulgaria/Turkey (N=102) | Spain/Portugal (N=352) | England/Wales (N=429) | Western Isles/Isle of Skye (N=246) | Saami (N=176) | ||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | .16 | 0 | 0 | .85 | .23 | .41 | 0 | ||
| B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | .11 | 0 | 0 | 0 | 0 | 0 | ||
| C | 0 | 1.86 | .50 | .40 | .19 | .43 | 0 | 0 | 0 | 0 | 1.96 | 1.14 | 0 | 0 | 0 | ||
| D | 0 | 1.86 | 0 | .40 | .38 | 0 | 0 | 0 | .16 | 0 | 4.90 | .28 | 0 | 0 | 5.11 | ||
| H | 47.06 | 33.49 | 36.63 | 45.16 | 38.33 | 28.27 | 41.41 | 40.79 | 39.69 | 38.38 | 31.37 | 50.28 | 40.79 | 27.24 | 1.70 | ||
| H1 | 3.74 | 5.58 | 2.97 | 2.82 | 4.36 | 8.35 | 2.34 | 6.58 | 3.41 | 3.03 | 1.96 | 2.56 | 4.43 | 1.22 | 1.14 | ||
| H3 | .53 | 0 | 0 | .40 | 1.14 | .43 | 1.56 | 0 | .62 | .56 | 0 | 0 | .93 | 1.22 | 0 | ||
| H4 | 1.60 | 0 | 1.49 | 1.61 | 1.52 | 9.64 | 2.34 | .66 | 2.33 | 1.23 | 3.92 | 3.41 | 3.03 | 3.25 | 0 | ||
| H5 | 1.07 | 0 | 0 | 2.82 | .76 | .43 | 0 | 0 | .62 | .11 | 0 | 1.99 | 1.63 | 0 | 0 | ||
| H8 | 0 | 3.26 | 2.97 | .81 | 2.85 | .43 | 0 | 2.63 | 1.86 | 2.36 | .98 | .28 | 1.40 | 1.63 | 2.84 | ||
| I | 2.14 | 1.40 | 2.48 | .81 | 2.28 | 4.71 | 2.34 | 3.29 | 1.86 | 4.38 | 1.96 | .57 | 3.03 | 6.50 | 0 | ||
| J | 5.35 | 6.51 | 4.95 | 2.42 | 6.83 | 6.85 | 11.72 | 7.89 | 6.82 | 8.64 | 5.88 | 3.69 | 10.72 | 10.57 | 0 | ||
| J1 | 1.60 | 0 | .50 | 0 | .76 | 0 | 0 | 0 | .47 | .56 | 4.90 | .28 | .47 | 1.22 | 0 | ||
| J1a | 3.21 | 0 | .50 | .40 | 1.33 | .43 | .78 | 0 | 1.55 | .45 | 0 | .57 | 1.63 | 0 | 0 | ||
| J1b | 0 | .47 | 0 | .40 | 0 | 0 | 0 | 0 | 0 | .11 | .98 | .57 | 0 | 0 | 0 | ||
| J1b1 | 0 | 0 | 0 | .40 | .19 | 5.57 | .78 | 1.97 | 1.40 | 3.48 | 0 | 0 | 1.40 | 1.22 | 0 | ||
| J2 | 0 | .93 | .99 | 2.42 | .19 | 1.28 | .78 | 0 | 0 | 1.12 | 2.94 | .85 | .23 | 1.63 | 0 | ||
| K | 2.14 | 1.86 | 2.48 | 4.44 | 5.69 | 4.93 | 5.47 | 5.26 | 4.03 | 3.70 | 4.90 | 3.13 | 5.13 | 8.54 | 0 | ||
| K1 | 0 | 0 | 0 | 0 | .19 | 2.36 | .78 | 1.32 | .47 | .79 | 0 | 0 | .23 | .81 | 0 | ||
| K2 | 5.88 | .93 | 0 | 1.61 | .76 | .43 | .78 | 0 | .16 | 1.01 | .98 | 1.14 | .47 | .41 | 0 | ||
| K2a | 0 | 0 | 0 | 0 | 0 | 0 | .78 | 0 | .31 | .56 | 0 | 0 | 0 | 1.22 | 0 | ||
| K2b | 1.07 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | .56 | 0 | .28 | .23 | 2.44 | 0 | ||
| T | 4.81 | 4.65 | 5.45 | 10.89 | 6.26 | 2.14 | 7.03 | 2.63 | 6.51 | 7.63 | 4.90 | 4.55 | 5.36 | 8.94 | 0 | ||
| T1 | 1.07 | 3.72 | 1.49 | 3.23 | 2.47 | .43 | 2.34 | 3.29 | 1.40 | 2.24 | 4.90 | 1.14 | 2.10 | 3.25 | 0 | ||
| T2 | .53 | 1.86 | 0 | .40 | .19 | 2.57 | 0 | 0 | .93 | .22 | 0 | .28 | .23 | .41 | 0 | ||
| T3 | 0 | 0 | 0 | 0 | .19 | 4.93 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
| U | 0 | 0 | 0 | 0 | .19 | .21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
| U1 | .53 | .47 | .50 | 1.21 | .38 | 0 | 0 | 0 | .16 | 0 | 0 | .28 | 0 | 2.03 | 0 | ||
| U2 | 1.07 | 0 | .99 | .81 | .57 | 0 | .78 | 0 | .16 | .79 | 0 | .28 | .70 | .41 | 0 | ||
| U3 | .53 | .47 | 0 | .81 | 1.14 | 0 | 0 | 0 | .78 | 1.23 | 5.88 | .57 | .70 | 2.44 | 0 | ||
| U4 | 3.74 | 8.37 | 3.47 | 2.02 | 2.66 | 2.14 | 2.34 | .66 | 2.17 | 2.47 | 3.92 | 1.99 | 1.63 | .41 | 0 | ||
| U5 | .53 | 0 | 0 | 1.21 | .19 | .43 | .78 | 2.63 | 1.71 | 1.12 | .98 | .57 | .93 | 2.44 | 0 | ||
| U5a | 5.88 | 7.91 | 6.44 | 2.82 | 4.93 | 5.57 | 4.69 | 5.92 | 6.82 | 5.05 | .98 | 4.55 | 3.50 | 4.88 | .57 | ||
| U5a1 | .53 | .93 | 1.49 | 0 | .19 | 0 | 0 | 1.32 | .47 | 0 | 0 | 0 | .70 | 0 | .57 | ||
| U5b | .53 | 2.33 | 6.44 | 1.21 | 3.80 | 3.43 | .78 | 1.97 | 1.71 | 1.01 | 0 | .85 | 1.40 | .81 | 1.70 | ||
| U5b1 | 0 | 2.79 | 2.97 | 0 | 0 | 0 | 0 | 0 | 2.33 | 0 | 0 | 0 | 0 | 0 | 42.61 | ||
| U6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.42 | 0 | 0 | 0 | ||
| U7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | .11 | 0 | 0 | 0 | 0 | 0 | ||
| V | 2.67 | 4.19 | 6.44 | 2.82 | 5.12 | 1.71 | 7.03 | 1.32 | 5.74 | 4.26 | 0 | 5.97 | 3.73 | 2.03 | 39.77 | ||
| W | 1.60 | 2.33 | 5.45 | .81 | 2.09 | .21 | 2.34 | 1.97 | 1.55 | .90 | 3.92 | 1.99 | 1.63 | .41 | .57 | ||
| X | .53 | 0 | 1.49 | 2.02 | .76 | 1.50 | 0 | 7.24 | .62 | 1.68 | 3.92 | 1.70 | .93 | 2.03 | 0 | ||
| Z | 0 | .47 | 0 | 0 | 0 | .21 | 0 | 0 | .62 | 0 | 0 | 0 | 0 | 0 | 3.41 | ||
| Other | 0 | 1.40 | .99 | 2.42 | 1.14 | 0 | 0 | .66 | .47 | .11 | 2.94 | 1.99 | .47 | 0 | 0 | ||
We performed a Mantel test in order to assess the strength and significance of the apparent geographic structure observed in figure 5. Geographic distances were calculated as geodesic distances, using the following coordinates for each population: Austria/Switzerland (15°12′E, 48°42′N), European Russia (35°42′E, 57°0′N), Finland/Estonia (25°12′E, 60°6′N), France/Italy (7°24′E, 44°6′N), Germany (10°12′E, 51°0′N), Iceland (18°24′W, 64°42′N), Ireland (7°42′W, 53°23′N), Orkney (2°54′W, 59°17′N), Scandinavia (11°18′E, 59°30′N), Scotland (4°18′W, 56°30′N), Bulgaria/Turkey (29°48′E, 39°17′N), Spain/Portugal (30°W, 39°42′N), England/Wales (2°6′W, 52°42′N), and the Western Isles/Isle of Skye (7°6′W, 57°17′N). The product moment correlation between genetic and geographic distances for all the groups in figure 5 was r=0.717, and, of 10,000 random permutations of the distance matrices, none yielded values ⩾0.717. When the North Atlantic island populations are omitted from the matrices used in the Mantel test, we obtain r=0.713, with the same high degree of significance.
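The procedure described above can be sketched as follows, under stated assumptions. The geodesic distance is approximated here by the haversine great-circle formula on a sphere with a mean radius of 6,371 km; the text does not specify the formula or the radius that was used. The Mantel test is implemented as a one-sided permutation test in which the rows and columns of one matrix are jointly permuted and the proportion of permutations with a correlation at least as large as the observed one is reported. All function names are hypothetical.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; not stated in the text


def dms(degrees, minutes, hemisphere):
    """Convert degrees and minutes plus a hemisphere letter to signed decimal degrees."""
    sign = -1.0 if hemisphere in ("W", "S") else 1.0
    return sign * (degrees + minutes / 60.0)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pearson(x, y):
    """Product moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Off-diagonal upper-triangle entries after relabeling rows/columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(genetic, geographic, permutations=10000, seed=0):
    """Return (observed r, proportion of permutations with r >= observed r)."""
    n = len(genetic)
    identity = list(range(n))
    geo_vec = upper_triangle(geographic, identity)
    r_obs = pearson(upper_triangle(genetic, identity), geo_vec)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute population labels of the genetic matrix only
        if pearson(upper_triangle(genetic, perm), geo_vec) >= r_obs:
            hits += 1
    return r_obs, hits / permutations


if __name__ == "__main__":
    # Coordinates for Iceland and Scotland as listed in the text.
    iceland = (dms(18, 24, "W"), dms(64, 42, "N"))
    scotland = (dms(4, 18, "W"), dms(56, 30, "N"))
    print(round(haversine_km(*iceland, *scotland), 1), "km")
```

Permuting whole rows and columns together, rather than shuffling individual matrix entries, preserves the dependence among distances that share a population, which is the reason a Mantel test is used instead of an ordinary significance test for a correlation coefficient.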
In this section, we apply the heuristic approach to estimation of the relative contributions of the Gaelic and Scandinavian source populations to the mtDNA pools of the island and coastal populations of the North Atlantic. Table 4 shows the estimated ancestral proportions for each of the five admixed populations, along with 95% confidence intervals.