The Bronze Age and Early Iron Age Peoples of Eastern Central Asia

edited by

Victor H. Mair

Volume Two

Genetics and Physical Anthropology, Metallurgy, Textiles, Geography and Climatology, History, and Mythology and Ethnology

The Institute for the Study of Man in collaboration with

The University of Pennsylvania Museum Publications

Institute for the Study of Man Inc. 1133 13th St. N.W., Washington D.C. 20005 in collaboration with The University of Pennsylvania Museum Publications

33rd and Spruce Streets Philadelphia, PA 19104

Copyright © 1998, Institute for the Study of Man Inc. All rights reserved

including the right of reproducing in whole or in part in any form.

Journal of Indo-European Studies Monograph Number Twenty-Six in two volumes

Manufactured in the United States of America

Production services by Clark Riley clark_riley@qmail.bs.jhu.edu

Library of Congress 98-070337

ISBN 0-941694-63-1

Genetics and Physical Anthropology


DNA Analysis on Ancient Desiccated Corpses from Xinjiang (China): Further Results

Paolo Francalacci Università di Sassari

There are two different approaches to the study of the biological history of human populations: physical anthropology and population genetics. The former studies the remains of past populations: it allows diachronic studies, but it is often limited by the scarcity of the fossil record and by the lack of knowledge about the mode of inheritance of the macroscopic characters (such as skeletal and dental features) considered. The latter is based on well-known models of genetic transmission and can be carried out on a large number of individuals, but it has to interpret a historical process, evolution, relying only on the present-day situation. An emerging field, called “molecular archeology”, tries to fill the gap between the two methodologies, analyzing ancient samples (as physical anthropology does) at the molecular level (as in genetic analysis).

Early works extracted and characterized ancient proteins, while more recent studies involve the analysis of ancient DNA. This exciting new possibility was opened up by recent advances in molecular biology, and in particular by the development of the technique known as the Polymerase Chain Reaction (PCR) (Saiki et al. 1985). This technology, which is a kind of in vitro cloning, allows the enzymatic amplification, in billions of copies, of an informative DNA sequence selected by the researcher, starting from a very small amount of DNA. It is necessary to define two primers (short sequences of nucleotides) flanking the region of interest, which direct the subsequent amplification with extreme specificity. Because of this high stringency, the presence of very different sequences, such as bacterial and fungal DNA molecules, does not influence the reaction, since the two primers permit the researcher to pick out selected human sequences even amidst a huge number of exogenous DNA molecules. Neither the small amount of nucleic acid nor its lack of physical integrity as extremely long molecules is an insurmountable problem, although both do affect the efficiency of the reaction. Since ancient DNA is usually present in very low amounts, degraded, and mixed with nucleic acids from microorganisms, the PCR technique, widely employed in many other fields of Molecular Biology, is the methodology of choice (virtually the only one) in the ancient DNA field.
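The scale of amplification mentioned above follows from simple doubling arithmetic. The sketch below is an idealized illustration and not a model taken from the chapter: the function name `pcr_copies`, the per-cycle `efficiency` parameter, and the cycle counts are all assumptions chosen for the example.

```python
def pcr_copies(initial_templates: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the target by (1 + efficiency).

    efficiency = 1.0 means perfect doubling. Degraded or inhibited extracts
    would be expected to behave as if the efficiency were lower (assumption).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_templates * (1.0 + efficiency) ** cycles


# Ten surviving template molecules, 30 cycles, perfect doubling.
print(f"{pcr_copies(10, 30):.3e}")
# Same start, but only 70% of templates copied per cycle (hypothetical value).
print(f"{pcr_copies(10, 30, efficiency=0.7):.3e}")
```

Under perfect doubling a single template passes one billion copies after 30 cycles (2^30 is about 1.07 billion), which is why even a handful of surviving molecules can in principle be analyzed, and equally why a handful of contaminating modern molecules can dominate the product.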

Unfortunately, the application of PCR to ancient remains is not as straightforward as to modern samples. In fact, researchers dealing with ancient molecules must face several additional problems that are of little or no importance when working with fresh tissues. The most noticeable are the poor preservation of nucleic acids in ancient material, which are often degraded to very short fragments of a few hundred base pairs (so that only short sequences can be successfully retrieved), the presence of inhibitory factors in extracts from ancient tissues, and possible contamination by modern human DNA. The last problem is felt particularly when studying anthropological remains because the subject and the object of the research coincide: both are human beings.

In this regard, the very advantage of PCR, i.e., its sensitivity and specificity, is also a disadvantage for ancient DNA studies. This paradox is due to the fact that there are many possible sources (dead skin cells from the hands, dandruff, saliva, sweat, blood) that contain enough DNA to contaminate a precious ancient sample. The more a specimen has been handled in a museum, the less likely it is to be free from modern human molecules. These molecules, even if present in very low concentration, are obviously in better condition than the ancient ones, and they will be preferentially amplified by the enzyme during PCR. In fact, the kinetics of the reaction are faster with undamaged molecules, and if intact DNA is present, it is very unlikely that any ancient molecule will be amplified. Obviously, the laboratory environment must be kept as clean as possible, but the phases prior to laboratory analysis (excavation and museum storage), usually not controlled by the analyst, are crucial. For all these reasons, even though from a technical point of view the analysis of ancient DNA can be done in any Molecular Biology laboratory, since no special equipment is required and the analytical methods used are directly derived from those applied to fresh samples, it is necessary to exercise additional care and caution in extracting and amplifying DNA from ancient material.

Because of these difficulties, it is appropriate to focus attention on a region of the DNA with a suitable ratio between length and informativeness, the ideal being a short region that is highly polymorphic. In addition, a DNA molecule present in a large number of copies per cell has an increased chance that at least one copy will survive through the years. Fortunately, a region of DNA with these features does exist, and it is not in the chromosomes, but in the genome of the cellular organelles called mitochondria.

The mitochondria were formerly independent organisms, similar to bacteria, that entered into symbiosis with the first nucleated cells at the very beginning of the evolution of life on Earth. The early cell provided them with nutrition and protection, while the mitochondria in exchange took over the management of energy. All higher organisms possess many mitochondria per cell, which still maintain a sort of autonomy, with a separate duplication cycle and their own particular DNA. During the course of evolution, the mitochondrial genome (mtDNA) has been simplified by the migration of many of its genes to the nuclear chromosomes, and at present only a few genes, carried by a very small number of base pairs (bp), remain (human mtDNA has 37 genes in 16,569 bp, compared to tens of thousands of genes in 3 billion base pairs for the nuclear genome). These genes play an important role in cellular metabolism and they are tightly packed in the genome, with no gaps except for a region of about 1,000 bases with no special genetic expression, called the “control region” (it has a regulatory function for mtDNA duplication), which contains two domains of 400 bp each that can vary almost freely in their sequence without consequences for the functionality of the organelle. To be of evolutionary interest, a mutation (in the simplest case just a change in one of the 4 bases constituting the DNA sequence, for example a Guanine instead of an Adenine, or a Thymine instead of a Cytosine) should be neutral, or, in other words, not influenced by the environment, since, otherwise, two unrelated populations living in similar environments could present similar genetic features simply because of convergence (such as the dark color of the skin in tropical Africa and in Oceania). This is the case for two short portions of the control region, called hypervariable segments I and II, which presumably show the highest mutation rate of the whole genome (the first being twice as high as the second). Even within a homogeneous population it is rather difficult to find individuals who are identical over the entirety of the hypervariable regions (excluding relatives on the maternal side, see below), and there is almost no sharing among different populations (obviously some overlap can be observed if we consider short portions of the control region).

Other interesting characteristics, besides its simplicity, abundance, and variability, account for the attention paid to this molecule by evolutionary geneticists. The mtDNA undergoes no recombination and it is transmitted only through the maternal lineage, since the mitochondria present in the spermatozoon do not enter the egg and only those of the egg are present in the offspring: both features drastically simplify the study of its evolution.

For all these reasons, the hypervariable control regions of mtDNA (especially region I) are by far the most often investigated in ancient DNA studies, and the data which they yield can be compared with relevant data derived from modern populations from all around the world. The high mutation rate of this region is at once an advantage, because it allows for fast differentiation among individuals and populations, and a difficulty, because of the possible occurrence of the same mutation in independent lineages, which confuses the reconstruction of the evolution of the mitochondrial types in a given population. In addition, the huge variability induces a background noise of many rare and often conflicting mutations that sometimes hampers adequate statistical treatment of the data. Apparently, not all the mutations are equally important, and my current work, in collaboration with Antonio Torroni, a geneticist at the University of Rome, aims to identify, among the changes in the control region, those of phylogenetic interest (Torroni et al., 1996, in press). The coding portions of the mtDNA are more stable, and a change there is likely to occur only once in the evolutionary history of a population. Our preliminary results show that the parallel analysis of both the sequences of the control region and the various point mutations in the coding areas allows us to define groups of lineages with changes in the control region that are population-specific, at least at a continental level.

It is apparent that no evolutionary interpretation of the DNA extracted from ancient samples can leave out of consideration the nature and the extent of the variability of mtDNA in modern populations. A more precise phylogenetic affiliation of the desiccated corpses from Xinjiang, the object of this study, can be established only when the variability patterns of mtDNA are fully understood and the polymorphic sequences from modern individuals from Xinjiang (presently being studied by Du Ruofu of the Institute of Genetics of the Chinese Academy of Sciences in Beijing) and from other regions of Eurasia are known.

Samples of desiccated tissue were drawn from several naturally mummified individuals, dated to 3,200 BP, from the graveyard at Qizilchoqa near Qumul (Hami), in eastern Xinjiang (far northwest China), and from the Museum of Archeology in Urumchi. Altogether, 25 specimens from 11 individuals were collected, but up to now only 5 samples belonging to 2 individuals are available for analysis. Every effort has been made to keep these samples free from modern human DNA contamination. The samples were taken while wearing disposable rubber gloves to avoid skin contact and a mask to prevent contamination from saliva when speaking or breathing. The tissue was cut with disposable sterile scalpels. The gloves and the tools were changed when sampling a new individual to prevent cross-contamination among ancient corpses. The specimens, about 1-2 grams of desiccated tissue of different types (muscle, skin, bone, etc.) and from various parts of the body, were stored in sterile plastic tubes, immediately labelled and sealed, to avoid the growth of microorganisms. The least exposed parts of the body, such as the inner thighs or underarms, were selected, with the aim of analyzing tissue that had undergone limited handling. In some cases, especially at the necropolis where many mummies were unearthed and reburied shortly after, it was possible to draw bone and soft tissue from below the woollen clothes, which ensured protection from handling in those places.

More than one sample (from 2 to 4) was drawn from each individual. This represents the best indirect control of the authenticity of the results. In fact, while all the specimens from the same mummy should yield identical sequences, those from different individuals should reflect the biological variability of the human species.
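The indirect control described above can be expressed as a simple consistency check. The following is a minimal sketch under stated assumptions: the function name, the data layout, and the short sequences are invented for illustration, and every individual is assumed to have at least one sequenced specimen.

```python
from itertools import combinations


def check_authenticity(samples: dict[str, list[str]]) -> dict[str, bool]:
    """samples maps an individual's label to the sequences read from its specimens.

    replicates_agree   : every individual yields one single sequence
    individuals_differ : no two individuals share the same first-read sequence
    """
    replicates_agree = all(len(set(seqs)) == 1 for seqs in samples.values())
    first_read = {name: seqs[0] for name, seqs in samples.items()}
    individuals_differ = all(
        first_read[a] != first_read[b] for a, b in combinations(first_read, 2)
    )
    return {
        "replicates_agree": replicates_agree,
        "individuals_differ": individuals_differ,
    }


# Invented toy reads, not real data.
toy = {
    "individual_1": ["ACGTTACC", "ACGTTACC"],
    "individual_2": ["ACGTCACC", "ACGTCACC", "ACGTCACC"],
}
print(check_authenticity(toy))
```

A failed `replicates_agree` flag points to contamination or damage in at least one specimen. A failed `individuals_differ` flag is only a warning sign, since maternal relatives can legitimately share a sequence, but identical results across many unrelated corpses would suggest a common modern contaminant.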

For the same purpose of maintaining the purity of the sample, strict control of the laboratory environment was exercised by working in dedicated areas and, whenever possible, by using disposable tools and glassware to minimize the possibility of contamination from previous use. As a further precaution, possible exogenous DNA was inactivated by prolonged exposure of all reagents, glassware, and instrumentation to UV light (wavelength 254 nm).

Some aliquots from the two individuals whose tissue was available for the research were processed to obtain nucleic acids for genetic investigations. About 2 grams of tissue were finely powdered under liquid nitrogen, and aliquots of 0.5 gram were transferred to sterile 50 ml tubes and submitted to different extraction methods according to the tissue type: for soft tissue we followed the methods of Paabo (1990) and Hoss & Paabo (1993), while for bones we also applied the method of Perrson (1992).

The first method (Paabo, 1990) is directly derived from the most frequently used technique for extracting DNA in Molecular Biology. The sample is processed in an appropriate buffer and treated with phenol and chloroform to get rid of proteins and other organic and inorganic compounds, and the nucleic acids, after a purification step, are precipitated with ethyl alcohol (to avoid the loss of DNA molecules, the precipitation step was replaced in some cases by concentration with disposable microconcentrators). The majority of the nucleic acids obtained were of low molecular weight, with an average length of 200 to 500 base pairs. A small fraction of higher molecular weight DNA was also observed. As mentioned above, this finding is quite common in the case of DNA extraction from ancient tissue, since most of the nucleic acids are degraded. The larger DNA is not necessarily related to the presence of longer ancient molecules, but can be due to the occurrence of DNA from microorganisms (bacteria, fungi). In spite of the good yield, the subsequent use of DNA extracted with this method is sometimes hampered by the coprecipitation of some inhibitory factor of unknown origin, whose presence in the mummy sample was indicated by a blue fluorescence under UV light that is clearly distinguishable from the orange color shown by the nucleic acids.

To circumvent this problem, two other protocols have been devised. That of Perrson (1992) is based on the well-known affinity for DNA shown by hydroxyapatite, the mineral component of bone. The DNA is eluted from this matrix with phosphate buffer and, adequately purified, is precipitated with a specific carrier, such as spermidine. The second protocol (Hoss and Paabo, 1993) exploits the affinity of a silica (or diatomite) suspension for nucleic acids, enhanced by the presence of guanidine thiocyanate. This method gave positive results only when applied to bone powder from mummy sample #2, from which a fragment of rib was available, while no DNA could be recovered from the soft tissue of the same individual.

The ancient DNA extracted was submitted to enzymatic amplification (PCR), having as targets a portion of hypervariable region I of the mtDNA and other short fragments in the coding part of the mitochondrial genome (far more stable than the control region) that include point mutations of populational interest. Because of the condition of the samples, not all the amplifications attempted for the different targets were successful, but it was possible to obtain some informative amplifications (both directly and by applying a repair protocol described in Francalacci & Warburton, 1992) from independent extractions from one individual (mummy #2, from Qizilchoqa). All the PCR amplifications were carried out in parallel with blank controls of both the extraction and PCR reagents.

The amplified products carrying the point mutations referable to haplogroups T, H, and M and to the -10,394 DdeI site (Torroni et al., 1996, in press) were examined by restriction enzyme analysis, while the 148 bp portion of the hypervariable region was sequenced both manually and automatically. The results are compared with the complete human mitochondrial DNA, whose 16,569 nucleotides (or bases) were fully sequenced by Anderson et al. (1981). The base numbering and order conventionally refer to this sample, also known as “the Cambridge sequence”, from the place where it was described.

The sequencing analysis of the amplified hypervariable region I, from bp 16,254 to bp 16,400, yielded a sequence identical to the Cambridge reference at all bases but one: at position 16,278, a Thymine in the ancient sample substitutes for a Cytosine.
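Reporting a change by its position in the reference numbering, as done here, amounts to a position-by-position comparison of two aligned sequences. The sketch below illustrates the bookkeeping only: the function name is hypothetical, and the two 12-base strings are invented toy data, not real mitochondrial sequence.

```python
def differences_from_reference(
    reference: str, sample: str, start: int
) -> list[tuple[int, str, str]]:
    """List (position, reference_base, sample_base) for every mismatch.

    `start` is the reference-numbering position of the first base, so the
    reported positions follow the numbering of the reference sequence.
    Both strings must already be aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (start + offset, ref_base, smp_base)
        for offset, (ref_base, smp_base) in enumerate(zip(reference, sample))
        if ref_base != smp_base
    ]


# Invented toy strings whose first base is given position 16,270.
toy_reference = "ACTTGACCCACC"
toy_sample = "ACTTGACCTACC"
for position, ref_base, smp_base in differences_from_reference(
    toy_reference, toy_sample, 16270
):
    print(f"{position} {ref_base}->{smp_base}")
```

The toy pair was built to differ at a single base so that the output has the same form as the result reported in the text (one C to T change). Real data would also require handling insertions, deletions, and ambiguous base calls, which this sketch ignores.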

We can attempt to reconstruct phylogenetic relationships from sequences by analyzing the pairwise differences among individuals and by tracking the diffusion of a given change. It should be clearly pointed out that the Cambridge sequence of the hypervariable region represents neither the “normal” one (since the majority of the changes are non-pathological) nor the most common one in human beings, but simply that of a particular individual of European origin who happened to be the first person analyzed. For this reason, European samples show a small number of mutations (or, better, changes, since it is not possible to know, in the presence of a base variation with respect to the reference, which nucleotide was the original in mankind and which was the mutant), while, obviously, more distant populations show a higher number of differences in the sequence. European lineages presenting one nucleotide change with respect to the reference in the 148 bp considered here can be found in about half of the cases (Bertranpetit et al., 1995; Di Rienzo and Wilson, 1991; Francalacci et al., 1996; Piercy et al., 1993), whereas this frequency drops to 15% in Indian (Mountain et al., 1995) and Asian populations (Horai and Hayasaka, 1990) and down to almost zero in Africa (Vigilant et al., 1989). From this perspective, the sequence amplified from the ancient Xinjiang corpse is more likely related to continental European lineages. The nucleotide change found in the ancient sample, the 16,278 Thymine, however, is quite common world-wide. Because of its high frequency in Africa, approaching 90% in the African population of the Herero (Vigilant, 1990; Stoneking et al., 1991), it can be considered ancestral, and it is not surprising to find it in many distant populations. Similar lineages can be found in a wide area ranging from the Basque country at the western corner of Eurasia to all of Asia proper and the Americas (Bertranpetit et al., 1995; Ward et al., 1991). However, we should take into account the possibility of new independent mutations at this position and, more importantly, the association of the 16,278 T with other mutations in the coding region, which may differ according to the ethnic group studied.
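The pairwise-difference approach mentioned at the start of this discussion can be sketched in a few lines. This is an illustration under assumptions, not the procedure used in the cited studies: the function names and the 8-base sequences are invented, and the sequences are taken to be already aligned.

```python
from itertools import combinations


def pairwise_differences(sequences: dict[str, str]) -> dict[tuple[str, str], int]:
    """Count mismatching positions for every pair of aligned sequences."""
    if len({len(seq) for seq in sequences.values()}) > 1:
        raise ValueError("all sequences must be aligned to the same length")
    return {
        (a, b): sum(x != y for x, y in zip(sequences[a], sequences[b]))
        for a, b in combinations(sequences, 2)
    }


def mean_pairwise_difference(sequences: dict[str, str]) -> float:
    """Average number of differences over all pairs (needs two or more sequences)."""
    counts = pairwise_differences(sequences)
    if not counts:
        raise ValueError("at least two sequences are required")
    return sum(counts.values()) / len(counts)


# Invented toy lineages, not real data.
toy = {"reference": "ACGTACGT", "lineage_1": "ACGTACGA", "lineage_2": "ACGAACGA"}
print(pairwise_differences(toy))
print(mean_pairwise_difference(toy))
```

Counting raw mismatches treats every position as equally informative. As the text notes, recurrent mutation at fast-changing positions can make two unrelated lineages look close, so such counts are a starting point rather than a phylogeny.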

As previously mentioned, the coding region, because of its physiological importance in cellular energy management, is less prone to neutral mutations than the control region, and the occurrence of both back mutations and independent parallel mutations at the same position in two different lineages is unlikely. For this reason, two individuals carrying identical changes in the coding region are assumed to derive from a common ancestor. It is possible to group different mitochondrial lineages sharing the same change in the coding region into families called “haplogroups”. In addition, relevant changes in the control region and those in the coding region can be associated almost one-to-one (Torroni et al., 1996, in press). In Europe nine different haplogroups (H, I, K, J, T, U, V, W, and X) are present, and the loss of a DdeI restriction site at position 10,394 is found in all the European haplogroups but I, K and J. When combined, the nine haplogroups encompass virtually all the mitochondrial variants observed in Europeans (about 98% in Swedish, Finnish and Tuscan samples), while the overlap between European and non-European mtDNA variants is extremely limited (Torroni et al., 1996, in press). Haplogroup H is the most common in Europe, including about 40% of the mitochondrial lineages from different European populations such as Swedes, Finns, Tuscans, Corsicans and Sardinians (Morelli, 1996), and it has been observed in only 3 out of 1,175 non-Caucasoid subjects (Torroni et al., 1996, in press).

Other continents show specific haplogroups. For example, more than 70% of Africans belong to haplogroup L (Chen et al., 1995), while Asians can be grouped in haplogroups A, B, F, and M, the last being the most frequent (about 55% of all East Asians and Siberians) (Torroni et al., 1994; Chen et al., 1995). Native Americans belong to four haplogroups of Asian origin (thus pre-dating the colonization of the Americas): A, B, and two sub-haplogroups of M (C and D) (Ward et al., 1991).

The result of the restriction analysis shows that the sample under study belongs to haplogroup H, since it lacks the restriction sites AluI at 7,025 and DdeI at 10,394, while it yielded negative results for attribution to haplogroups T and M. It is worth noting that the C-T transition in the control region at 16,278 is compatible with the attribution of this sample to haplogroup H.
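The logic of scoring a restriction site as present or absent can be imitated in software. The sketch below is a simplification under stated assumptions: the recognition sequences used (AGCT for AluI, CTNAG for DdeI) are the standard specificities of those enzymes, but the function names and the short fragments are invented, and each fragment is assumed to contain at most the one site of interest. In the laboratory the call is made from the band pattern of the digested product, not from a known sequence.

```python
import re

# Recognition sequences written as regular expressions (N = any base).
RECOGNITION = {
    "AluI": "AGCT",
    "DdeI": "CT[ACGT]AG",
}


def has_site(fragment: str, enzyme: str) -> bool:
    """True if the enzyme's recognition sequence occurs anywhere in the fragment."""
    return re.search(RECOGNITION[enzyme], fragment.upper()) is not None


def compatible_with_h(alu_7025_present: bool, dde_10394_present: bool) -> bool:
    """Following the text: haplogroup H lacks both AluI 7,025 and DdeI 10,394."""
    return not alu_7025_present and not dde_10394_present


# Invented toy fragments standing in for the two amplified regions.
toy_alu_region = "GGTAGCCTCA"   # no AGCT, so the AluI site is scored absent
toy_dde_region = "AACTGAATT"    # no CTNAG, so the DdeI site is scored absent
print(
    compatible_with_h(
        has_site(toy_alu_region, "AluI"), has_site(toy_dde_region, "DdeI")
    )
)
```

Only the criterion for haplogroup H is encoded, because it is the only one the text spells out. The defining sites for T and M are not given here and are therefore left out of the sketch.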

These preliminary results are in agreement with a possible European origin of the ancient Xinjiang corpses, but further research, especially focused on other continental-specific point mutations in the coding areas of the mitochondrial genome, and extended to other individuals, is still needed to define with more precision their phylogenetic relationships.

The parallelism between biological and linguistic evolution was first noted by Charles Darwin, who pointed out that the mechanism at the base of the differentiation of languages (diffusion and subsequent isolation) is the same as that which works in the evolution of living beings. However, linguistic transmission is not only vertical (from parents to offspring), as in the case of the transmission of genes, but also horizontal (learning from neighbors). A single individual or an entire people can replace their language in a relatively short time, while obviously this cannot be done for genes. This can explain the incongruities found when comparing the linguistic affiliation of a population with its genetic pattern: for example, Sardinians are presumably of non-Indo-European origin, but their language is clearly neo-Latin, while, on the other hand, Hungarians are genetically similar to their central European neighbors, but their language is profoundly different. Nevertheless, in most cases, the correlation between the tree drawn from the genetic distances and that based on the linguistic families is strong, as shown by Cavalli-Sforza et al. (1988; 1994). In addition, language is not affected by natural selection. From this point of view, a phonetic or semantic change in a language can be considered as a neutral allele (evolutionarily important, as mentioned above) that is transmitted without environmental influence. For these reasons, population genetics can help to find correlations among language families and, on the other hand, linguistic similarities can be indicative of a phylogenetic relationship.

The physical appearance of the individuals studied here shows Caucasian features, suggesting a relationship with the Tokharians, an Indo-European people who lived in the area during historical times. The documentation of the early presence of Caucasian people in Northwest China (and the knowledge of their closer affinities with either European or Indo-Iranian modern populations) could make an important contribution to the debated question of the spread of Indo-European languages. It should, however, be pointed out that genetic analysis of the desiccated corpses can shed light only on the origin of their mitochondrial lineage, while the long process of physical and cultural evolution of the ancient Xinjiang people is far more complicated than a fragment of DNA can reveal, and it can be understood only through an integrated view of genetic, linguistic, historical, archeological and anthropological records.


Many thanks are due to Dr. Peter Underhill (Stanford University) for performing the automated sequencing. This work was supported by the Alfred P. Sloan Foundation and by MURST 60% and CNR “Biological Archive” grants.


Anderson S., Bankier A. T., Barrell B. G., De Bruijn M. H., Coulson A. R., Sanger F., Schreier P. H., Smith A. J. H., Staden R. and Young G. 1981 “Sequence and organization of the human mitochondrial genome.” Nature, 290:457-465.

Bertranpetit J., Sala J., Calafell F., Underhill P. A., Moral P. and Comas D. 1995 “Human mitochondrial DNA variation at the origin of Basques.” Ann. Hum. Genet., 59:63-81.

Cavalli-Sforza L. L., Menozzi P. and Piazza A. 1994 History and Geography of Human Genes. Princeton: Princeton University Press.

Cavalli-Sforza L. L., Piazza A., Menozzi P. and Mountain J. L. 1988 “Reconstruction of human evolution: bringing together genetic, archaeological and linguistic data.” Proc. Natl. Acad. Sci. USA, 85:6002-6006.

Chen Y.-S., Torroni A., Excoffier L., Santachiara-Benerecetti A. S. and Wallace D. C. 1995 “Analysis of mtDNA variation in African populations reveals the most ancient of all human continent-specific haplogroups.” Am. J. Hum. Genet., 57:133-149.

Di Rienzo A. and Wilson A. C. 1991 “Branching pattern in the evolutionary tree for human mitochondrial DNA.” Proc. Natl. Acad. Sci. USA, 88:1597-1601.

Francalacci P. and Warburton P. E. 1992 “Pre-amplification without primers (Pre-PCR): a method to extend ancient molecules.” Ancient DNA Newsletter, 2:10-11.

Francalacci P., Bertranpetit J., Calafell F. and Underhill P. A. 1996 “Sequence diversity of the control region of mitochondrial DNA in Tuscany and its implications for the peopling of Europe.” Am. J. Phys. Anthropol., 100:443-460.

Horai S. and Hayasaka K. 1990 “Intraspecific nucleotide sequence differences in the major noncoding region of human mitochondrial DNA.” Am. J. Hum. Genet., 46:828-842.

Hoss M. and Paabo S. 1993 “DNA extraction from Pleistocene bones by a silica-based purification method.” Nucleic Acids Research, 21:3913-3914.

Morelli L. 1996 “Analisi della variabilità del genoma mitocondriale umano in Europa.” Unpublished Ph.D. Thesis, University of Sassari.

Mountain J. L., Hebert J. M., Bhattacharyya S., Underhill P. A., Ottolenghi C., Gadgil M. and Cavalli-Sforza L. L. 1995 “Demographic history of India and mitochondrial DNA sequence diversity.” Am. J. Hum. Genet., 56:979-992.

Paabo S. 1990 “Amplifying ancient DNA.” In: PCR protocols. A guide to methods and applications. Innis M., Gelfand D., Sninsky J., White T. Eds. New York: Academic Press. Pp. 159-166.

Perrson P. 1992 “A method to recover DNA from ancient bones.” Ancient DNA Newsletter, 1:25-27.

Piercy R., Sullivan K. M., Benson N. and Gill P. 1993 “The application of mitochondrial DNA typing to the study of white Caucasian genetic identification.” Int. J. Leg. Med., 106:85-90.

Saiki R. K., Scharf S., Faloona F. A., Mullis K. B., Horn G. T., Erlich H. A. and Arnheim N. 1985 “Enzymatic amplification of beta-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia.” Science, 230:1350-1354.

Stoneking M., Hedgecock D., Higuchi R. G., Vigilant L. and Erlich H. A. 1991 “Population variation of human mtDNA control region sequences detected by enzymatic amplification and sequence-specific oligonucleotide probes.” Am. J. Hum. Genet., 48:370-382.

Torroni A., Huoponen K., Francalacci P., Petrozzi M., Morelli L., Scozzari R., Obinu D., Savontaus M. L. and Wallace D. C. 1996, in press “Classification of European mitochondrial DNA from an analysis of 3 European populations.” Genetics.

Torroni A., Miller J. A., Moore L. G., Zamudio S., Zhuang J., Droma T. and Wallace D. C. 1994 “Mitochondrial DNA analysis in Tibet. Implication for the origin of Tibetan population and its adaptation to high altitude.” Am. J. Phys. Anthropol., 93:189-199.

Vigilant L., Pennington R., Harpending H., Kocher T. D. and Wilson A. C. 1989 “Mitochondrial DNA sequences in single hairs from a southern African population.” Proc. Natl. Acad. Sci. USA, 86:9350-9354.

Vigilant L. 1990 “Control region sequences from African populations and the evolution of human mitochondrial DNA.” PhD thesis, University of California at Berkeley.

Ward R. H., Frazier B. L., Dew-Jager K. and Paabo S. 1991 “Extensive mitochondrial diversity within a single Amerindian tribe.” Proc. Natl. Acad. Sci. USA, 88:8720-8724.


The Uyghurs, a Mongoloid-Caucasoid Mixed Population: Genetic Evidence and Estimates of Caucasian Admixture in the Peoples Living in Northwest China

Tongmao Zhao Laboratory of Immunogenetics, NIAID Twinbrook II Facility, National Institute of Health

The Chinese nation consists of 56 nationalities and is thought to have originated from two genetically distinct subgroups of the ancient Mongoloid population. However, the possible existence of racial admixture in certain nationalities has been suggested. Some Caucasoid genes have been found among the Uyghur, Kazak, Dongxiang, and Hui nationalities living in northwest China. Two race-specific genetic markers, the immunoglobulin allotype Gm f;b haplotype and the Rh blood group cde haplotype, were used to examine the presence of Mongoloid-Caucasoid mixed ancestry and to estimate the amount of Caucasian admixture (m) in the Chinese nationalities. Assuming that the Mongols and the peoples of Middle Asia are the parental populations, the m values based on Gm and Rh can be estimated as 54% and 55% in Uyghurs, 34% and 36% in Kazaks, and 14% and 13% in Huis, respectively. In Dongxiangs the m is 26% on the basis of Gm.

The high proportion of Caucasoid genes found in the Uyghur cohort suggests that this population may be derived from an ancient mixed population formed between a Caucasoid-derived population and a North Mongoloid population. One of the possible sources of Caucasoid genes is the peoples who came from Middle Asia by way of the Silk Road. The precise time at which admixture occurred and the direction of gene flow cannot be determined by this data analysis. If the admixture occurred 2,000 - 3,500 years ago, the admixture rate per generation among Uyghurs would be estimated at 0.55% - 0.97%.
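The abstract does not say how the per-generation rate was derived. One standard model that reproduces the quoted range is constant gene flow, in which a fraction k of the gene pool is replaced in each of n generations, so that the accumulated admixture is 1 - (1 - k)^n. The sketch below rests on two assumptions that are not stated in the text: a total admixture of m = 0.54 (the Gm-based Uyghur estimate) and a generation length of 25 years. The function name is also hypothetical.

```python
def per_generation_rate(
    total_admixture: float, years: float, generation_years: float = 25.0
) -> float:
    """Solve 1 - (1 - k)**n = total_admixture for k, with n = years / generation_years.

    The 25-year generation length is an assumption of this sketch.
    """
    if not 0.0 <= total_admixture < 1.0:
        raise ValueError("total_admixture must be in [0, 1)")
    generations = years / generation_years
    return 1.0 - (1.0 - total_admixture) ** (1.0 / generations)


for years in (2000, 3500):
    rate = per_generation_rate(0.54, years)
    print(f"{years} years: {100 * rate:.2f}% per generation")
```

With these inputs the model gives about 0.97% per generation for 2,000 years and about 0.55% for 3,500 years, matching the range in the abstract. This agreement suggests, but does not prove, that a calculation of this kind lies behind the published figures.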


Since the classification of ABO blood group antigens at the beginning of this century, several hundred human genetic markers, including blood groups, proteins, human leukocyte antigens, and DNA polymorphisms, have been determined at the level of either gene products or DNA using immunological or biochemical methods. The distributions of these genetically polymorphic markers in different ethnic groups and geographic regions have been widely investigated (Mourant et al. 1976, Roychoudhury et al. 1988, Cavalli-Sforza et al. 1994). One of the few genetic markers that can be used to characterize different racial groups is the human immunoglobulin Gm allotype. The close linkage of the immunoglobulin heavy chain genes results in the inheritance of fixed combinations of Gm factors as haplotypes. Gm haplotypes are very stable, and crossing over is extremely rare. Some haplotypes are characteristic of a particular race or population. For example, Gm f;b is thought to be a Caucasoid haplotype; Gm a;bst characterizes Mongoloid populations; Gm a;bc3z and Gm a;bc3 are found only in people of African descent. This has made the Gm allotype a useful tool in anthropological studies, especially in the identification of ethnic groups and in the detection of gene drift and migration in ancient times (Johnson et al. 1977, Matsumoto et al. 1982, Steinberg et al. 1981).

The Chinese nation consists of the Han nationality and 55 minorities. Han is the major Chinese nationality, comprising approximately 93.3% of the total population. Although the modern Chinese nation is thought to have originated from two genetically distinct subgroups of an ancient Mongoloid population (Zhao 1987, Zhao et al. 1987), the Caucasoid Gm f;b haplotype was found among the Uyghur, Kazak, Dongxiang, and Hui nationalities living in northwest China (Zhao et al. 1983, 1987, 1989).

The existence of the Caucasoid Gm f;b haplotype among Uyghurs was reported for the first time in 1983 (Zhao et al. 1983). This finding was confirmed by a comprehensive investigation (Zhao et al. 1989). This paper tests the hypothesis that the current Uyghurs may be derived from an ancient mixed population formed between a Caucasoid-derived population and a North Mongoloid population. The amount of Caucasian admixture (m) among Uyghurs and other minorities living in northwest China has been estimated.

Materials and Methods

Data selection. Data derived using a variety of polymorphic genetic markers (table 1) were subjected to analysis of racial admixture. Gm factors are genetic markers of immunoglobulin molecules. Rh, Kell, Lutheran, and Diego are red cell blood group antigens. Gm haplotype frequencies in the Chinese (Zhao et al. 1987, 1989) and in other ethnic groups (Johnson et al. 1977, Matsumoto et al. 1982, Steinberg et al. 1981) were selected for use. Gene frequencies of red cell blood groups were taken from a previous publication (Zhao 1987). Additional data used for comparisons in this paper were obtained from Cavalli-Sforza’s well-known work (Cavalli-Sforza et al. 1994). The demographic characteristics, gene frequencies, and localities of the populations used in this study are shown in table 2, table 3 and figure 1, respectively. The Chinese populations, independent of nationality, are divided into two genetically distinct groups for this study. The geographical boundary between North Chinese and South Chinese is near the latitude of 30° N, roughly along the Yangtze River (Zhao et al. 1987).

Figure 1. Map of China showing the localities of the populations studied for estimation of Caucasian admixture

Table 1. Polymorphic genetic markers used in this study

Name of locus             Symbol    Chromosome location   Alleles used
Immunoglobulin GM1;GM3    IGHG1G3   14q32.33              za;g, zax;g, za;bst, fa;b, f;b
Rhesus blood group        RH        1p36.2-34             cde
Kell blood group          KEL       7q33                  K
Lutheran blood group      LU        19q12-13              A
Diego blood group         DI        17q12-21              A

Table 2. List of the populations studied in this paper

Nationality   Population (1981)*   Main areas of habitation   Linguistic family   Linguistic subfamily
Uyghur        5,957,112            Xinjiang                   Altaic              Turkic
Kazak         907,582              Xinjiang                   Altaic              Turkic
Dongxiang     279,397              Ningxia                    Altaic              Mongolian
Mongol        3,411,657            Inner Mongolia             Altaic              Mongolian
Hui           7,219,352            Ningxia                    Sino-Tibetan        Sinitic
Han           936,703,824          Whole country              Sino-Tibetan        Sinitic

* From: State Statistical Bureau of the People’s Republic of China (1982)

Table 3. Gene and haplotype frequencies (×10³) in ethnic groups or geographic regions

Locus    Allele  Uyghur  Kazak  Dongxiang  Mongol  Hui (1)*  Hui (2)  North Chinese  South Chinese  Iranian  Near East  Europe  Uralic
IGHG1G3  za;g    293     345    346        524     374       382      393            176            203      237        183     177
         zax;g   72      141    151        202     274       220      179            81             79       29         79      70
         za;bst  139     156    190        108     147       130      162            135            56       104        12      9
         fa;b    124     123    135        166     108       191      266            607            52       0          10      6
         f;b     372     236    178        0       98        77       0              0              689      600        727     746
RH       cde     174     130    **         45      75        **       38             7              280      285        362     367
LU       A       28      7      **         0       **        **       4              0              14       27         26      19
KEL      K       36      [?]    **         4       **        **       2              0              29       49         42      30
DI       A       39      **     **         34      **        **       57             38             2        10         1       0

* Two Hui populations were tested: Hui (1) living in Xinjiang; Hui (2) living in Ningxia. ** No data are available.

Genetic analysis. Gene frequencies and haplotype frequencies were calculated using maximum likelihood estimation. Genetic distances were computed using Nei’s formula (Nei 1972). Phylogenetic trees were constructed by the unweighted pair-group clustering method (Sneath and Sokal 1973). The Bernstein formula for determining the extent of racial admixture (m) in a hybrid population was applied (Cavalli-Sforza and Bodmer 1971). The formula is m = (P_H - P_B) / (P_A - P_B), where P_A and P_B are the frequencies in the parent populations, and P_H is the frequency in the hybrid population. Estimation of the admixture rate per generation was carried out using the formula m(n) = 1 - (1 - m)^n (Cavalli-Sforza et al. 1994), where m(n) represents the proportion of admixture after n generations.
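As a rough check on the per-generation formula, it can be inverted to recover a constant rate per generation from a total admixture proportion. The sketch below is illustrative only: the function name and the generation length of 25 years are assumptions, not values stated in the text. To keep the two meanings of m apart, the code calls the total admixture `total` and the per-generation rate `rate`.

```python
def per_generation_rate(total, generations):
    """Invert total = 1 - (1 - rate)**n to get the per-generation rate."""
    if not 0.0 <= total < 1.0:
        raise ValueError("total admixture must lie in [0, 1)")
    if generations <= 0:
        raise ValueError("number of generations must be positive")
    return 1.0 - (1.0 - total) ** (1.0 / generations)


YEARS_PER_GENERATION = 25  # assumption; not stated in the text
TOTAL_ADMIXTURE = 0.54     # Uyghur estimate based on the Gm f;b haplotype

for years in (2000, 3500):
    n = years / YEARS_PER_GENERATION
    rate = per_generation_rate(TOTAL_ADMIXTURE, n)
    print(f"{years} years, {n:.0f} generations: {100 * rate:.2f}% per generation")
```

Under these assumptions the sketch gives about 0.97% per generation for 2,000 years and about 0.55% for 3,500 years, which agrees with the range of 0.55% - 0.97% quoted for Uyghurs.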

Results and Discussion

The distribution of the Caucasoid Gm f;b haplotype in the Chinese is restricted to portions of northwest China. China is a country of many nationalities. The 55 minorities, comprising only 6.7% of the total population, inhabit 50% to 60% of the country’s area. Most minorities are located in border regions. In a large-scale investigation (Zhao et al. 1989), a total of 9,560 individuals from 74 Chinese geographical populations were phenotyped for Gm factors. These populations are derived from 24 nationalities which comprise 96.6% of the total population of China. The Caucasoid Gm f;b haplotype was found among only four nationalities (table 3). The Uyghur and Kazak nationalities living in Xinjiang have higher Gm f;b frequencies (0.372 and 0.236 respectively) than the Dongxiang nationality living in Gansu and the Huis living in Ningxia (0.178 and 0.098 respectively).

Estimation of extent of Caucasian admixture. The true admixture of two genetically distinct parental populations will generate a new population, and all genes should give the same estimate of m (Cavalli-Sforza et al. 1994). In order to check the hypothesis of admixture, the Gm f;b haplotype and the Rh blood group cde haplotype were chosen as two markers to estimate the m value. The RH locus is composed of two structural genes, RHD and RHCE. Individuals can be subdivided into “Rh positive” and “Rh negative” according to the presence or absence of the major Rh D antigen on the red cell surface. The majority of Rh negative Caucasians lack the RHD gene. In Australia and Japan, some Rh negative people may arise by a partial deletion or nonsense mutation of the RHD gene (Daniels 1995). There are wide racial differences in the frequencies of the RH gene complex. About 15% of Caucasoid people are Rh negative with genotype cde/cde (Race and Sanger 1975), but only 0.1% - 0.4% of the people in most Chinese nationalities exhibit this genotype (Zhao 1987).

Several combinations of parental populations were examined for the extent of Caucasian admixture among Uyghurs, Kazaks, Dongxiangs and Huis (data not shown). The most similar m values based on Gm f;b and Rh cde haplotype frequencies were observed when Mongols and Iranians were taken as the pair of parental populations (table 4). If they were the true parental populations, the amount of Caucasian admixture among Uyghurs was as high as 54%; among Kazaks it was 34%. Lower levels of Caucasian admixture were observed among Dongxiangs (26%) and Huis (11% - 14%).


Table 4. Estimated Caucasian admixture (m) in Uyghurs, Kazaks, Dongxiangs and Huis using Gm f;b and Rh cde haplotypes for Bernstein’s estimate of m

              Parental population*
              Iranian          Near East        Europe           Uralic
Nationality   Gm f;b   cde     Gm f;b   cde     Gm f;b   cde     Gm f;b   cde
Uyghur        0.540    0.549   0.620    0.538   0.512    0.407   0.499    0.401
Kazak         0.343    0.362   0.393    0.354   0.325    0.268   0.316    0.264
Hui (1)       0.142    0.128   0.163    0.125   0.135    0.095   0.131    0.093
Hui (2)       0.112    **      0.128    **      0.106    **      0.103    **
Dongxiang     0.258    **      0.297    **      0.245    **      0.239    **

* Assuming that the Mongols are the other parental population. ** No data are available.
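The entries of table 4 can be recomputed from the frequencies in table 3 with Bernstein's formula. The following is a minimal sketch, not the author's own procedure: the function names and the dictionary layout are hypothetical, and the "gap" between the two marker estimates is simply one convenient way to express how well Gm f;b and Rh cde agree for a given candidate parental population.

```python
# Frequencies (x10^3) copied from table 3. Mongols are taken as the
# Mongoloid parental population, as in table 4.
GM_FB = {"Uyghur": 372, "Kazak": 236, "Hui (1)": 98, "Mongol": 0,
         "Iranian": 689, "Near East": 600, "Europe": 727, "Uralic": 746}
RH_CDE = {"Uyghur": 174, "Kazak": 130, "Hui (1)": 75, "Mongol": 45,
          "Iranian": 280, "Near East": 285, "Europe": 362, "Uralic": 367}


def bernstein_m(p_hybrid, p_a, p_b):
    """Bernstein's estimate m = (P_H - P_B) / (P_A - P_B)."""
    if p_a == p_b:
        raise ValueError("parental frequencies must differ")
    return (p_hybrid - p_b) / (p_a - p_b)


def marker_gap(hybrid, caucasoid_parent, mongoloid_parent="Mongol"):
    """Return (m from Gm f;b, m from Rh cde, absolute difference)."""
    m_gm = bernstein_m(GM_FB[hybrid], GM_FB[caucasoid_parent],
                       GM_FB[mongoloid_parent])
    m_rh = bernstein_m(RH_CDE[hybrid], RH_CDE[caucasoid_parent],
                       RH_CDE[mongoloid_parent])
    return m_gm, m_rh, abs(m_gm - m_rh)


for hybrid in ("Uyghur", "Kazak", "Hui (1)"):
    for parent in ("Iranian", "Near East", "Europe", "Uralic"):
        m_gm, m_rh, gap = marker_gap(hybrid, parent)
        print(f"{hybrid:8s} {parent:10s} Gm f;b {m_gm:.3f}  cde {m_rh:.3f}  gap {gap:.3f}")
```

For each of the three populations with data on both markers, the gap is smallest when Iranians are used as the Caucasoid parental population, which is the pattern the text describes.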

Phylogenetic tree analysis. On the basis of Gm haplotype frequencies and the computed genetic distances, the phylogenetic tree of the Chinese nation can be divided into two clusters: South Chinese and a cluster consisting of two subgroups (fig. 2A). Uyghurs, Kazaks, and Dongxiangs belong to one subgroup; the North Chinese, Mongols and Huis form the other subgroup. However, when Caucasoid-derived populations were taken into account, Uyghurs, Kazaks, and Dongxiangs were clustered together with populations of Iranians, English, Finns and Uralians. This result is reasonable, as there are relatively high amounts of Caucasian admixture among these nationalities. Huis have a low level of Caucasian admixture and remain with the North Chinese and Mongol subgroup. The tree pattern (fig. 2B) matches the features of a mixed population, which has a shorter branch than the parental populations from which it originates (Cavalli-Sforza et al. 1994). A similar pattern was observed in the tree for three major ethnic groups (fig. 3).
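To illustrate the kind of distance on which such trees are built, the sketch below computes Nei's (1972) standard genetic distance, D = -ln(I), from the Gm haplotype frequencies in table 3 for a subset of the populations. It is a single-locus illustration under stated assumptions: the function name and data layout are hypothetical, the clustering step is not included, and the output is not claimed to reproduce the distances or trees used in this paper.

```python
import math

# Gm haplotype frequencies (x10^3) from table 3, in the order
# za;g, zax;g, za;bst, fa;b, f;b.
GM = {
    "Uyghur":        (293,  72, 139, 124, 372),
    "Kazak":         (345, 141, 156, 123, 236),
    "Dongxiang":     (346, 151, 190, 135, 178),
    "Mongol":        (524, 202, 108, 166,   0),
    "North Chinese": (393, 179, 162, 266,   0),
    "South Chinese": (176,  81, 135, 607,   0),
}


def nei_distance(x, y):
    """Nei's standard genetic distance for one locus: D = -ln(I).

    I = J_xy / sqrt(J_x * J_y), where J_x and J_y are the sums of squared
    frequencies in each population and J_xy is the sum of their products.
    I is unchanged by rescaling, so the x10^3 values can be used directly.
    """
    j_xy = sum(a * b for a, b in zip(x, y))
    j_x = sum(a * a for a in x)
    j_y = sum(b * b for b in y)
    identity = j_xy / math.sqrt(j_x * j_y)
    return -math.log(identity)


names = list(GM)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a:13s} - {b:13s}  D = {nei_distance(GM[a], GM[b]):.3f}")
```

On these inputs the Uyghur-Kazak distance comes out far smaller than the Uyghur-South Chinese distance. A full analysis would average over several loci before applying the unweighted pair-group method.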




Figure 2. Tree constructed on the basis of Gm haplotype frequencies. (A) Tree for 6 Chinese nationalities. (B) Tree for 11 ethnic groups.


* From: State Statistical Bureau of the