The Kalash represent an enigmatic isolated population of Indo-European speakers who have been living for centuries in the Hindu Kush mountain ranges of present-day Pakistan. Previous studies of Y chromosome and mitochondrial DNA markers provided no support for their claimed Greek descent following Alexander III of Macedon’s invasion of this region, and analysis of autosomal loci provided evidence of a strong genetic bottleneck. To understand their origins and demography further, we genotyped 23 unrelated Kalash samples on the Illumina HumanOmni2.5M-8 BeadChip and sequenced one male individual at high coverage on an Illumina HiSeq 2000.

Comparison with published data from ancient hunter-gatherers and European farmers showed that the Kalash share genetic drift with the Paleolithic Siberian hunter-gatherers and might represent an extremely drifted ancient northern Eurasian population that also contributed to European and Near Eastern ancestry. Since the split from other South Asian populations, the Kalash have maintained a low long-term effective population size (2,319–2,603) and experienced no detectable gene flow from their geographic neighbors in Pakistan or from other extant Eurasian populations. The mean time of divergence between the Kalash and other populations currently residing in this region was estimated to be 11,800 (95% confidence interval = 10,600−12,600) years ago, and thus they represent present-day descendants of some of the earliest migrants into the Indian sub-continent from West Asia.


Human populations show subtle allele-frequency differences that lead to geographical structure, and available methods thus allow individuals to be clustered according to genetic information into groups that correspond to geographical regions. In an early worldwide survey of this kind, division into five clusters unsurprisingly identified (1) Africans, (2) a widespread group including Europeans, Middle Easterners, and South Asians, (3) East Asians, (4) Oceanians, and (5) Native Americans. However, division into six groups led to a more surprising finding: the sixth group consisted of a single population, the Kalash [1]. The Kalash are an isolated South Asian population of Indo-European speakers residing in the Hindu Kush mountain valleys in northwest Pakistan, near the Afghan frontier. With a reported census size of 5,000 individuals, they represent a religious minority with unique and rich cultural traditions. DNA samples from the Kalash have been distributed as part of the cell-line panel from the Foundation Jean Dausset’s Human Genome Diversity Project and Centre d’Etude du Polymorphisme Humain (HGDP-CEPH) for over a decade and have formed part of several genetic analyses [2]. Analyses of uni-parental (Y chromosome and mitochondrial) DNA markers characterized the Kalash as a small population that had undergone a population bottleneck during their recent migration to their present-day abode [3, 4]. This was confirmed by the study of genome-wide autosomal SNPs, which highlighted a strong pattern of genetic drift in this population [5]. A recent exploration of admixture at fine scales suggested that a major admixture event between the Kalash and present-day western Eurasians occurred between 990 and 210 BCE and related this to Alexander’s invasion of the Indian sub-continent in 327–326 BCE [6], although no evidence of such admixture was detected by an analysis of Y chromosome and autosomal short tandem repeat (STR) variation in the Kalash [7, 8].

To further investigate the Kalash population’s demographic history and origins, we genotyped additional unrelated Kalash samples on the Illumina bead chip and sequenced one male individual at high coverage. Our aim was to assess whether the Kalash were a recent or an ancient isolate and to characterize the extent of genetic isolation and admixture, if any, with extant or archaic humans, and thus to better understand the reasons for their unique position in worldwide comparisons.