The Taiwan Biobank Provides Insights into the Health and History of Han Chinese Subjects


Over the last two decades, several large, population-based biobanks have been set up to collect blood and other biospecimens, along with a standard set of clinical data, in order to study the genetic basis of various disorders. A common focus of the early population-based biobanks has been to identify genetic variants associated with disease. Many disease-causing mutations are rare and population-specific, which has motivated the development of biobanks around the world1. The Taiwan Biobank (TWB) was established in 2012 and consists of a cohort of 200,000 individuals from the general Taiwanese population with no cancer diagnosis at the time of enrollment. Most Taiwanese (over 99%) are Han Chinese whose ancestors immigrated from different provinces of China; a minority are Taiwanese aboriginals. In this study, Wei et al. present genotyping results from the first 103,106 participants of the TWB, together with whole-genome sequencing (WGS) data from a subset of them. This is the largest publicly available genetic database of individuals with East Asian ancestry2.

Demographic and health-related survey data for 103,106 individuals, together with WGS data (1492 individuals), genotyping data (27,737 typed on the TWBv1 custom array and 75,369 on the TWBv2 array, with 1463 typed on both), and high-resolution allele typing of six HLA genes (1101 individuals), were obtained from the Taiwan Biobank with approval from the relevant ethics committees. Additional WGS data from 64 individuals were obtained from the Pan-Asian Population Genomics Initiative and the Taiwan Han Chinese Sequence Database. The authors also performed principal component analysis, novel allele analysis, HLA type prediction and ABO blood type imputation on the study subjects.

The authors demonstrate that ABO blood types and HLA types can be accurately inferred from an inexpensive commercial SNP array. They found that 21.2% of the population are carriers of known mutations responsible for recessive genetic diseases, 4.7% carry known mutations causing autosomal dominant diseases, and 3.1% carry known variants conferring cancer susceptibility. Further, 87.3% of the population carry variants that alter their ability to metabolize commonly prescribed drugs or that put them at risk of severe adverse drug reactions (ADRs). This information is valuable for both clinicians and patients. If imputed HLA genotypes were available in patients' medical records, physicians could confidently prescribe a drug to patients who lack the HLA alleles associated with ADRs to that drug, and choose alternative medications for patients who carry them2 (for example, HLA-B*15:02 and carbamazepine-induced Stevens–Johnson syndrome3). Additionally, the population allele frequencies of several pathogenic variants for autosomal dominant diseases were higher than those predicted by disease prevalence, probably because of incomplete penetrance or previously undiagnosed cases with milder clinical symptoms.
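The reasoning behind this last inference can be made explicit with a simple sketch. It assumes Hardy–Weinberg proportions, a single dominant pathogenic allele with frequency q, penetrance f, and no other cause of the disease; these are assumptions of this illustration, not of the study. The expected carrier frequency and disease prevalence K are then:

```latex
\begin{aligned}
\Pr(\text{carrier}) &= 2q(1-q) + q^{2} = 1 - (1-q)^{2} \approx 2q \quad (q \ll 1),\\
K &= f\,(2q - q^{2}) \approx 2fq \quad\Longrightarrow\quad f \approx \frac{K}{2q}.
\end{aligned}
```

If the observed allele frequency q is high enough that 2q greatly exceeds the observed prevalence K, the implied penetrance f is well below 1, which is consistent with incomplete penetrance or with mild, undiagnosed cases that lower the observed K.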

In summary, the authors generated a large reference panel that greatly improved the imputation accuracy of SNP genotyping data and designed a custom SNP array optimized for genetic studies in the Han Chinese population, which accounts for 19% of the world's population. Furthermore, they obtained genetic test results for thousands of known risk variants while simultaneously collecting genetic profiles that can be used to assess the risk of common diseases and in future genome-wide association studies (GWAS), both of which have great clinical value. Overall, this study shows that comprehensive genetic testing in a population setting can serve as a model for precision health management.


According to the authors, “The Taiwan Biobank was created in part to catalyze future medical genetics studies in Taiwan, and the sample size of individuals with dense SNP array data in the TWB (n = 103,106) is several times larger than from comparable Biobanks in Japan and China4,5. In addition, our generation of a large reference panel and development of a custom SNP array makes the resulting TWB genotype data much more valuable than comparable studies that rely on existing European-biased SNP arrays and reference panels for genotyping and imputation6. In particular, the Taiwan Biobank array includes thousands of Mendelian disease mutations and known pathogenic variants. So, we can cheaply and efficiently conduct thousands of genetic tests on the participants while simultaneously collecting genetic profiles that can be used for PRS calculations for common diseases and future GWAS.”


  1. Tennessen, J. A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012).
  2. Wei, C.-Y. et al. Genetic profiles of 103,106 individuals in the Taiwan Biobank provide insights into the health and history of Han Chinese. npj Genom. Med. 6, 10 (2021).
  3. Phillips, E. J. et al. Clinical pharmacogenetics implementation consortium guideline for HLA genotype and use of carbamazepine and oxcarbazepine: 2017 update. Clin. Pharmacol. Ther. 103, 574–581 (2018).
  4. Chen, Z. et al. China Kadoorie Biobank of 0.5 million people: survey methods, baseline characteristics and long-term follow-up. Int. J. Epidemiol. 40, 1652–1666 (2011).
  5. Kuriyama, S. et al. The Tohoku Medical Megabank Project: design and mission. J. Epidemiol. 26, 493–511 (2016).
  6. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).