The global annual output value of Chinese herbal medicines has exceeded one trillion yuan, and the industry depends on the sustainable development and efficient utilization of medicinal plant resources. However, these resources are still predominantly wild: of the more than 28 000 medicinal plant species recorded worldwide, only about 400 have been domesticated and cultivated. Indiscriminate introduction and mixed germplasm in cultivation have led to inconsistent quality of Chinese medicinal materials, which compromises the safety and efficacy of clinical medication. In addition, active compounds occur at low concentrations in their source plants and are difficult to synthesize chemically, so drug development based on them severely threatens the sustainability of medicinal plant resources. To address the declining quality of medicinal materials and the low content of active compounds, the team led by Academician Chen Shilin proposed herbgenomics and initiated the “1K Herb Genomes Project”. The project aims to complete whole-genome sequencing and analysis of 1 000 medicinal plants and to systematically reveal the biosynthesis, regulation, and transport mechanisms of active compounds. On this basis, synthetic biology can enable efficient production of natural drugs, and genome-assisted breeding can deliver high-yielding, high-quality new varieties of medicinal materials, providing full-chain scientific and technological support for the sustainable use of Chinese medicinal resources (Fig. 1). Based on the development trends and application demands of herbgenomics, future research on the exploration and innovative utilization of medicinal plant resources is expected to become more systematic and in-depth in the following areas.
(1) Whole-genome sequencing of medicinal plants. Explaining the molecular mechanisms that shape the quality of Chinese medicinal materials, and the molecular basis of authentic medicinal materials in particular, is an important foundation for the high-quality development of the Chinese medicinal materials industry. However, the lack of clear genomic information for medicinal plants severely limits research in frontier fields such as molecular genetics. Medicinal plant genomes are highly heterozygous and rich in repeat sequences, so genome sequencing and assembly have progressed more slowly than in model species and crops. In recent years, longer sequencing reads, lower costs, and the iterative optimization of bioinformatic algorithms have allowed whole-genome sequencing of medicinal plants to leap from “draft” to “chromosome level”, and assembly of T2T-level (telomere-to-telomere, gap-free) genomes has become possible. The herbgenomics research team has released genome maps of representative source plants of Chinese medicinal materials, including Panax ginseng, Artemisia annua, Coptis chinensis, Andrographis paniculata, Perilla frutescens, Lonicera japonica, Salvia miltiorrhiza, Gardenia jasminoides, Menispermum dauricum, Phellodendron amurense, and Stephania japonica. To date, according to incomplete statistics, genomes of more than 440 medicinal plant species have been sequenced. These massive datasets lay a solid foundation for the development of the molecular genetics of medicinal plants. As the “1K Herb Genomes Project” advances, elucidating the genetic basis of authentic medicinal materials, the hubs of germplasm-environment interactions, and “genome-metabolite-pharmacological effect” association networks will be key directions in which whole-genome sequencing research is extended.
These studies will help improve the utilization efficiency of medicinal plant resources and accelerate both the elucidation of biosynthetic pathways of active compounds and the construction of synthetic biology systems. They will also guide the screening and demonstration planting of high-quality germplasm resources and promote the transition of traditional Chinese medicine resources from conventional collection to efficient, green, and sustainable development.
(2) Biosynthesis of active compounds in traditional Chinese medicine. As natural products with complex and diverse structures, the active compounds of traditional Chinese medicine are not only the material basis of its efficacy but also a key source for innovative drug research and development. Their content and accumulation directly determine the quality attributes of Chinese medicinal materials. A systematic analysis of their biosynthetic pathways and regulatory mechanisms therefore has important theoretical value and practical significance for explaining how the quality of medicinal materials is formed and for guiding the breeding of superior varieties. However, this research faces multiple challenges: the biosynthetic pathways are often long and complex, they involve many types of catalytic enzymes, and gene functions are highly redundant, which makes key genes difficult to identify and mechanisms difficult to resolve, so progress has been relatively slow.
In recent years, the biosynthetic pathways of some active compounds of major medicinal value have attracted wide attention from the international academic community, and research strategies based on multi-omics integration have achieved a series of breakthroughs. For instance, in 2024 the journal Science reported progress on the biosynthetic pathway of the anticancer drug paclitaxel (Taxol, a typical diterpenoid) in Taxus chinensis var. mairei. By constructing a high-quality genome map of this species and integrating transcriptomic, metabolomic, and other multi-omics data, Yan Jianbin's team and collaborating teams systematically revealed how the oxetane ring of paclitaxel is formed; this work was selected as one of China's top ten advances in life sciences. Chen Shilin's team has deciphered the key enzymes and catalytic mechanisms in the biosynthesis of berberine, sinomenine, cepharanthine, and other benzylisoquinoline alkaloids, and these results were selected as one of the top ten academic advances in traditional Chinese medicine. In addition, the biosynthesis of representative active compounds such as artemisinin (antimalarial), ginsenosides (immune regulation), vinblastine (anticancer), aescin (anti-inflammatory), astragaloside (immune enhancement), andrographolide (anti-inflammatory), crocin (antidepressant), and tanshinone (cardio-cerebrovascular protection) has also been systematically elucidated. Because active compounds are often restricted to particular species and the genomic resources of most medicinal plants remain incomplete, existing studies still have significant gaps. Future work urgently needs to focus on scientific questions such as the formation of key structural groups and enzyme catalytic mechanisms, and to systematically resolve the biosynthetic networks of more active compounds. Elucidating these biosynthetic mechanisms will provide large-scale data support and technical reserves for innovation in synthetic biology and for the breeding of superior varieties of medicinal materials.
(3) Metabolic engineering of active compounds from traditional Chinese medicine. Clinical demand for natural drugs derived from traditional Chinese medicine, such as artemisinin and paclitaxel, is enormous, but their generally low content in the source plants and the complexity of chemical synthesis routes pose severe challenges for traditional supply. Green biomanufacturing based on host systems such as microorganisms or tobacco reconstructs the metabolic pathways of these active compounds and has become an effective way to reduce resource dependency and environmental pressure. For example, artemisinic acid (25.0 g·L⁻¹), gastrodin (2.1 g·L⁻¹), salidroside (7.5 g·L⁻¹), and ginsenoside (1.0 g·L⁻¹) have been produced efficiently by heterologous synthesis in microorganisms, showing prospects for industrial production. However, for most active compounds whose biosynthetic pathways have been elucidated, green biomanufacturing is still at an early stage. Current research is mostly limited to heterologous de novo synthesis in microbial or tobacco (Nicotiana benthamiana) systems and commonly suffers from low yield and poor efficiency; for example, the synthesis of berberine, vinblastine, and cannabinoids remains at the laboratory stage, with titers at the μg·L⁻¹ or mg·L⁻¹ level. The underlying causes are multiple and interacting: the large number of genes in the pathways, high catalytic promiscuity and low catalytic efficiency of the enzymes, poor metabolic compatibility of chassis cells, and insufficient optimization of fermentation processes. To break through these bottlenecks, the catalytic mechanisms of key enzymes need to be analyzed systematically to clarify the structural basis of stereoselectivity and catalytic efficiency.
Rational design and engineering of key enzymes can then improve catalytic performance. Integrating gene editing, modular pathway assembly, and dynamic regulation can optimize synthesis pathways and balance metabolic flux, and strengthening the robustness of chassis cells can enable efficient heterologous biomanufacturing of active compounds and promote the sustainable development of the traditional Chinese medicine resource industry.
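To make the gap between industrial-prospect and laboratory-stage titers concrete, the short Python sketch below converts the g·L⁻¹ titers quoted above into fold differences relative to a reference titer. The 1 mg·L⁻¹ reference is an assumed placeholder chosen only to represent the "mg·L⁻¹ level" mentioned in the text; it is not a measured value for any particular compound.

```python
# Titers quoted in the text, in grams per litre.
reported_g_per_l = {
    "artemisinic acid": 25.0,
    "salidroside": 7.5,
    "gastrodin": 2.1,
    "ginsenoside": 1.0,
}

# Assumed placeholder for a laboratory-stage titer (1 mg/L); illustrative only.
LAB_STAGE_G_PER_L = 1e-3


def fold_gap(titer_g_per_l, reference_g_per_l=LAB_STAGE_G_PER_L):
    """Return how many times larger a titer is than the reference titer."""
    return titer_g_per_l / reference_g_per_l


gaps = {name: fold_gap(titer) for name, titer in reported_g_per_l.items()}

# Print compounds from the largest gap to the smallest.
for name, gap in sorted(gaps.items(), key=lambda item: -item[1]):
    print(f"{name}: about {gap:,.0f}-fold above a 1 mg/L titer")
```

Under this assumption the quoted titers sit three to four orders of magnitude above a mg·L⁻¹-level process, which is the scale of improvement that enzyme engineering and pathway optimization would need to deliver.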
(4) Molecular marker-assisted breeding of superior new varieties of medicinal plants. Breeding of medicinal plants differs from that of crops and economic or horticultural plants: target traits center on distinctive indicators such as the content of active compounds, and most species are perennial, so the long cycles and low efficiency of traditional breeding are especially pronounced. To address these challenges, an integrated strategy that combines herbgenomics, germplasm resource evaluation, and population genome resequencing can be employed. By identifying molecular markers associated with the synthesis and regulation of active compounds and with plant development, breeders can efficiently select high-yield, highly resistant, high-content varieties, which in turn supports standardized cultivation and industrial-scale promotion. For example, DNA marker-assisted breeding combined with systematic selection produced the first disease-resistant variety of notoginseng (Panax notoginseng), “Miaoxiang Kangqi No. 1”, in which the incidence of root rot in two-year-old and three-year-old plants decreased by 43.6% and 62.9%, respectively. Analysis of the haplotype-resolved genome of Artemisia annua, which revealed the evolutionary advantage underlying artemisinin synthesis, supported the selection of “Yanqing No. 1”, with an artemisinin content of 2.11%. Based on a fingerprint of population-specific SNP markers in Perilla frutescens, “Zhongyan Feisu No. 1” was bred as a high-yielding, highly resistant variety that tolerates poor soils and can be used for both leaf and seed. However, many key traits in medicinal plants are jointly controlled by a large number of minor-effect quantitative trait loci (QTLs), and the biosynthesis of active compounds involves complex metabolic regulatory networks, so traditional QTL mapping struggles to resolve their genetic basis fully.
Genome-wide selection breeding can in theory break through the bottleneck of improving multiple traits together, but its practical application is still limited because the genetic backgrounds of medicinal plants are insufficiently resolved and research on metabolic pathways and regulatory mechanisms lags behind. By constructing high-quality genome databases and maps of population genetic variation, the “1K Herb Genomes Project” will provide key data support for genome-wide selection breeding and improve the efficiency of medicinal plant breeding.
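The idea behind genome-wide (genomic) selection can be sketched in a few lines of Python: instead of mapping individual QTLs, all markers are fitted at once with shrinkage, and untested candidates are ranked by their predicted values. The example below is a minimal illustration on simulated data, assuming ridge regression as the prediction model; the population size, marker number, effect sizes, penalty `lam`, and the function names are all hypothetical choices for illustration and do not come from any medicinal plant dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated population: genotypes coded as 0/1/2 copies of the alternative allele.
n_plants, n_markers = 600, 200
genotypes = rng.integers(0, 3, size=(n_plants, n_markers)).astype(float)

# Many small marker effects, mimicking a polygenic trait such as metabolite content.
true_effects = rng.normal(0.0, 0.05, size=n_markers)
genetic_value = genotypes @ true_effects
# Environmental noise with the same spread as the genetic values (heritability near 0.5).
phenotype = genetic_value + rng.normal(0.0, genetic_value.std(), size=n_plants)

train = np.arange(0, 500)         # phenotyped and genotyped reference plants
candidates = np.arange(500, 600)  # genotyped-only selection candidates


def fit_marker_effects(X, y, lam):
    """Ridge regression on centred data: solve (X'X + lam*I) b = X'y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    effects = np.linalg.solve(A, Xc.T @ (y - y_mean))
    return effects, x_mean, y_mean


def predict(X, effects, x_mean, y_mean):
    """Predict trait values from genotypes using the fitted marker effects."""
    return (X - x_mean) @ effects + y_mean


effects, x_mean, y_mean = fit_marker_effects(
    genotypes[train], phenotype[train], lam=100.0
)
predicted = predict(genotypes[candidates], effects, x_mean, y_mean)

# Accuracy: correlation between predictions and the simulated true genetic values.
accuracy = np.corrcoef(predicted, genetic_value[candidates])[0, 1]
# Select the ten candidates with the highest predicted values.
top = candidates[np.argsort(predicted)[-10:]]

print(f"prediction accuracy: {accuracy:.2f}")
print(f"mean genetic value, selected vs all candidates: "
      f"{genetic_value[top].mean():.2f} vs {genetic_value[candidates].mean():.2f}")
```

No single marker in this simulation has a large effect, yet the summed predictions are still informative for ranking candidates. In practice the accuracy depends heavily on the size and relatedness of the reference population, which is where high-quality genomes and population variation maps become the limiting resource.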
With the deep integration of artificial intelligence, omics big data, and deep learning, the development and utilization of medicinal plant resources are entering a new paradigm of intelligence and precision. To address bottlenecks such as the very low content of active compounds in source plants and inefficient chemical extraction, the herbgenomics team led by Prof. Chen Shilin constructed the world's first hundred-million-scale library of natural diverse components encoded by medicinal herb genes. Deep learning algorithms are integrated to infer the biosynthetic pathways and intermediate structures of natural products, which has significantly accelerated the discovery of innovative drugs. With protein structure prediction models such as AlphaFold, the three-dimensional conformations of key enzymes in the biosynthesis of active compounds can be predicted and their potential catalytic mechanisms inferred; combined with graph neural networks that analyze the topology of metabolic networks, this allows key enzyme components that efficiently catalyze the biosynthesis of active compounds to be mined precisely. In biomanufacturing, the “Design-Build-Test-Learn (DBTL)” cycle supports efficient engineering of microbial chassis cells and targeted synthesis of active compounds from traditional Chinese medicine. In the future, integrating deep learning with population genomics data is expected to yield precise and efficient genome-wide selection breeding platforms for medicinal plants, which can guide molecular marker-assisted breeding and the screening of superior new germplasm.
The innovative practice and deep integration of artificial intelligence, omics big data, and deep learning are expected to transform the development of medicinal plant resources from a traditional experience-dependent model into a new research and development paradigm driven by data and intelligence.
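The “Design-Build-Test-Learn” cycle mentioned above can be illustrated with a toy loop in Python. This is only a schematic sketch: the "strain" is a tuple of expression levels for a hypothetical four-gene pathway, the `measure_titer` function is a made-up stand-in for building and testing a strain, and the hidden optimum is an invented ground truth used solely so that the loop has something to find.

```python
import random

random.seed(1)

LEVELS = (0, 1, 2)             # low / medium / high expression (assumed encoding)
N_GENES = 4                    # hypothetical pathway length
HIDDEN_OPTIMUM = (2, 1, 2, 0)  # toy ground truth, unknown to the "experimenter"


def measure_titer(design):
    """Stand-in for Build + Test: a made-up titer score with measurement noise."""
    mismatch = sum(abs(a - b) for a, b in zip(design, HIDDEN_OPTIMUM))
    return max(0.0, 10.0 - 1.5 * mismatch + random.gauss(0.0, 0.3))


def design_candidates(parent, n):
    """Design: change the expression level of one randomly chosen gene per candidate."""
    out = []
    for _ in range(n):
        child = list(parent)
        i = random.randrange(N_GENES)
        child[i] = random.choice([lv for lv in LEVELS if lv != child[i]])
        out.append(tuple(child))
    return out


best = (1, 1, 1, 1)
best_score = measure_titer(best)
history = [best_score]

for cycle in range(6):
    candidates = design_candidates(best, n=8)               # Design
    results = [(measure_titer(c), c) for c in candidates]   # Build + Test
    top_score, top_design = max(results)                    # Learn: pick the best result
    if top_score > best_score:                              # keep it only if it improves
        best, best_score = top_design, top_score
    history.append(best_score)

print("best design:", best)
print("best score per cycle:", [round(s, 2) for s in history])
```

Real DBTL platforms replace the greedy "Learn" step with statistical or machine-learning models trained on all accumulated measurements, but the loop structure is the same: each round of testing narrows the next round of designs.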
Topic editors: HU Yanbo, SUN Jingjue.
Suggested further reading
[1] CHEN S L,SONG J Y,SUN C,et al.Herbal genomics: examining the biology of traditional medicines[J].Science,2015,347(6219):S27-S29.
[2] SUN W,XU Z C,SONG C,et al.Herbgenomics:decipher molecular genetics of medicinal plants[J].Innovation,2022,3(6):100322.
[3] XU J,CHU Y,LIAO B S,et al.Panax ginseng genome examination for ginsenoside biosynthesis[J].GigaScience,2017,6(11):1-15.
[4] LIAO B S,SHEN X F,XIANG L,et al.Allele-aware chromosome-level genome assembly of Artemisia annua reveals the correlation between ADS expansion and artemisinin yield[J].Molecular Plant,2022,15(8):1310-1328.
[5] LIU Y F,WANG B,SHU S H,et al.Analysis of the Coptis chinensis genome reveals the diversification of protoberberine-type alkaloids[J].Nature Communications,2021,12(1):3276.
[6] SUN W,LENG L,YIN Q G,et al.The genome of the medicinal plant Andrographis paniculata provides insight into the biosynthesis of the bioactive diterpenoid neoandrographolide[J].The Plant Journal,2019,97(5):841-857.
[7] ZHANG Y J,SHEN Q,LENG L,et al.Incipient diploidization of the medicinal plant Perilla within 10,000 years[J].Nature Communications,2021,12(1):5508.
[8] PU X D,LI Z,TIAN Y,et al.The honeysuckle genome provides insight into the molecular mechanism of carotenoid metabolism underlying dynamic flower coloration[J].New Phytologist,2020,227(3):930-943.
[9] XU H B,SONG J Y,LUO H M,et al.Analysis of the genome sequence of the medicinal plant Salvia miltiorrhiza[J].Molecular Plant,2016,9(6):949-952.
[10] XU Z C,PU X D,GAO R R,et al.Tandem gene duplications drive divergent evolution of caffeine and crocin biosynthetic pathways in plants[J].BMC Biology,2020,18(1):63.
[11] AN Z J,GAO R R,CHEN S S,et al.Lineage-specific CYP80 expansion and benzylisoquinoline alkaloid diversity in early-diverging eudicots[J].Advanced Science,2024,11(19):e2309990.
[12] XU Z C,TIAN Y,WANG J,et al.Convergent evolution of berberine biosynthesis[J].Science Advances,2024,10(48):eads3596.
[13] LENG L,XU Z C,HONG B X,et al.Cepharanthine analogs mining and genomes of Stephania accelerate anti-coronavirus drug discovery[J].Nature Communications,2024,15(1):1537.
[14] JIANG B,GAO L,WANG H J,et al.Characterization and heterologous reconstitution of Taxus biosynthetic enzymes leading to baccatin Ⅲ[J].Science,2024,383(6683):622-629.
[15] PADDON C J,WESTFALL P J,PITERA D J,et al.High-level semi-synthetic production of the potent antimalarial artemisinin[J].Nature,2013,496(7446):528-532.
[16] WEI G F,ZHANG G Z,LI M Z,et al.Panax notoginseng:panoramagram of phytochemical and pharmacological properties,biosynthesis,and regulation and production of ginsenosides[J].Horticulture Research,2024,11(8):uhae170.
[17] ZHANG J,HANSEN L G,GUDICH O,et al.A microbial supply chain for production of the anti-cancer drug vinblastine[J].Nature,2022,609(7926):341-347.
[18] SUN W,YIN Q G,WAN H H,et al.Characterization of the horse chestnut genome reveals the evolution of aescin and aesculin biosynthesis[J].Nature Communications,2023,14(1):6470.
[19] XU B Y,HUANG J P,PENG G Q,et al.Total biosynthesis of the medicinal triterpenoid saponin astragalosides[J].Nature Plants,2024,10(11):1826-1837.
[20] PU X D,HE C N,YANG Y,et al.In vivo production of five crocins in the engineered Escherichia coli[J].ACS Synthetic Biology,2020,9(5):1160-1168.
[21] MA Y,CUI G H,CHEN T,et al.Expansion within the CYP71D subfamily drives the heterocyclization of tanshinones synthesis in Salvia miltiorrhiza[J].Nature Communications,2021,12(1):685.
[22] YIN H,HU T D,ZHUANG Y B,et al.Metabolic engineering of Saccharomyces cerevisiae for high-level production of gastrodin from glucose[J].Microbial Cell Factories,2020,19(1):218.
[23] ZENG W Z,WANG H J,CHEN J B,et al.Engineering Escherichia coli for efficient de novo synthesis of salidroside[J].Journal of Agricultural and Food Chemistry,2024,72(51):28369-28377.
[24] WANG D,WANG J H,SHI Y S,et al.Elucidation of the complete biosynthetic pathway of the main triterpene glycosylation products of Panax notoginseng using a synthetic biology platform[J].Metabolic Engineering,2020,61:131-140.