[ad_1]
Quotation: Ladner JT, Sahl JW (2023) In direction of a post-pandemic future for international pathogen genome sequencing. PLoS Biol 21(8):
e3002225.
https://doi.org/10.1371/journal.pbio.3002225
Printed: August 1, 2023
Copyright: © 2023 Ladner, Sahl. That is an open entry article distributed below the phrases of the Inventive Commons Attribution License, which allows unrestricted use, distribution, and replica in any medium, supplied the unique creator and supply are credited.
Information Availability: Information used for the technology of Fig 3 can be found in Supporting Data. SARS-CoV-2 sequence counts have been obtained from the EpiCoV “World by month” obtain (www.epicov.org). Influenza virus sequence counts have been obtained from https://gisaid.org/influenza-subtypes-dashboard/.
Funding: The creator(s) acquired no particular funding for this work.
Competing pursuits: The authors have declared that no competing pursuits exist.
Abbreviations:
AMR,
antimicrobial resistant; COVID-19,
Coronavirus Illness 2019; HIV,
human immunodeficiency virus; INSDC,
Worldwide Nucleotide Sequence Database Collaboration; LMIC,
low-and-middle-income nation; SARS-CoV-2,
Extreme Acute Respiratory Syndrome Coronavirus 2
Introduction
Lower than a century in the past, the general public well being influence of infectious illness was thought to have largely been resolved. By the Nineteen Sixties, we had an in depth understanding of the varied microbes that trigger infectious illness: viruses, micro organism, and fungi. We additionally knew how these pathogens unfold and had made extraordinary progress in the direction of the prevention and therapy of infectious illness by way of the event and use of antibiotics and vaccines, in addition to societal modifications associated to non-public hygiene and sanitation [1]. What we didn’t totally admire on the time, nevertheless, was the unimaginable range of human pathogens, their capability for speedy evolution, and the dynamic nature of interactions between pathogens and their hosts. Mixed, these elements have considerably difficult our makes an attempt to mitigate the impacts of infectious illness.
One of many main causes for that is the continued emergence of latest pathogens, in addition to the reemergence of identified pathogens in several types and/or locations. H1N1 influenza virus, human immunodeficiency virus (HIV), and Extreme Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) have all emerged comparatively just lately by way of zoonotic transmission from animals to people. Now we have additionally repeatedly seen identified pathogens reemerge in types which might be troublesome or not possible to deal with with accessible medicine. For instance, our widespread use of antibiotics has chosen for brand spanking new, multidrug-resistant strains of many micro organism, together with Staphylococcus aureus, Escherichia coli, and Pseudomonas aeruginosa. One more reason it has been difficult to mitigate the general public well being influence of infectious illness is that not all pathogens are simply managed with current approaches. Regardless of our early successes utilizing vaccines to cease the unfold of viruses like variola virus and poliovirus, and micro organism like Bordetella pertussis and Clostridium tetani, different pathogens have been way more troublesome to manage utilizing vaccines; for instance, because of the co-circulation of a number of serotypes and the existence of nonhuman or environmental reservoirs. Now we have additionally made nice progress within the improvement of antiviral therapeutics, however in lots of instances their effectiveness is dependent upon speedy and particular prognosis, which stays a problem. Societal modifications like will increase in inhabitants measurement and density, environmental degradation, and will increase within the frequency of long-distance journey, have raised the chance of zoonotic transmission and made it simpler for pathogens to unfold inside populations and around the globe. As well as, we proceed to wrestle with public acceptance of current interventions, which may severely restrict their utility.
Fortuitously, we’ve additionally continued to develop new instruments which might be permitting us to arrange for and reply to infectious illness outbreaks in additional focused methods, one in all them being pathogen genome sequencing [2]. A pathogen’s nucleic acid genome (DNA or RNA) accommodates the entire info wanted for its correct improvement and performance. Subsequently, genome sequences can educate us concerning the biology of pathogens, and so they additionally function distinctive barcodes for pathogen identification and monitoring. We will now routinely and cost-effectively generate full-length genome sequences in close to actual time, even for pathogens with bigger genome sizes, like micro organism and fungi. Utilizing these sequences, we will diagnose infectious ailments, be taught concerning the dynamics of pathogen unfold, and make knowledgeable, patient-level therapy selections.
On this Essay, we talk about the speedy rise of pathogen genome sequencing, starting within the 2000s after which accelerating with the emergence and international unfold of SARS-CoV-2 in 2019. We begin with a dialogue of the technological advances that enabled routine pathogen genome sequencing, then describe the varied makes use of of pathogen genomic info for understanding and combating infectious illness, in addition to a number of of the vital advances on this discipline that have been pushed by the SARS-CoV-2 pandemic and finish with a dialogue of the wants and future challenges for pathogen sequencing.
Enabling routine pathogen sequencing
The utility of genetic knowledge for monitoring and understanding pathogens has been acknowledged for a number of a long time, however routine, full-length genome sequencing has solely turn out to be doable throughout the final roughly 10 years because of a number of vital technological advances (Fig 1). With out query, crucial of those advances was the event of high-throughput (aka “next-generation”) DNA sequencing. A number of approaches for high-throughput sequencing got here to market across the identical time (2005 to 2007; e.g., 454 [3], Solexa [4], Illumina [5]) and so they all enabled, for the primary time, massively parallel sequencing of various swimming pools of nucleic acids. These applied sciences enabled genome sequencing by considerably decreasing the per base price of DNA sequencing and offering an environment friendly strategy for sequencing DNA in a nonspecific method (i.e., not using predefined priming websites). Over time, incremental enhancements in a few of these preliminary applied sciences (e.g., Illumina’s sequencing by synthesis [6]) have resulted in progressively longer reads, increased throughput, and decrease price. In the meantime, a number of new, single molecule sequencing approaches have additionally been launched (e.g., Oxford Nanopore Applied sciences [7]) and these have considerably elevated learn size (1,000s versus 100s of bases per learn), thus facilitating the meeting of bigger genomes, whereas additionally reducing the fee and measurement of the sequencing devices, thus rising the accessibility and portability of high-throughput sequencing.
Fig 1. Advances which have enabled routine sequencing of pathogen genomes.
Timeline (proper) features a choose variety of associated expertise launch/publication dates, with colours linking every occasion to one in all 3 normal classes of development (left). HTS, high-throughput sequencing. Created with BioRender.com.
A second, associated advance has concerned capability constructing for the usage of high-throughput applied sciences. Though these applied sciences initially debuted almost 2 a long time in the past, the sequencing {hardware} and the experience for working the devices and deciphering the outcomes have been initially concentrated inside a small variety of labs and nearly completely inside a handful of high-income nations. In distinction, infectious illness outbreaks are a world concern, and lots of the acknowledged sizzling spots for rising infectious ailments are throughout the World South. This preliminary discordance between the provision and want for high-throughput applied sciences difficult well timed genomic responses to infectious illness outbreaks, such because the Ebola virus epidemic in West Africa in 2013 to 2016 [8]. Over the intervening years, nevertheless, international entry to high-throughput sequencing for outbreak responses has grown immensely (although not equally) as a consequence of a mix of reducing prices for sequencing {hardware} and reagents, devoted efforts from worldwide companies and native governments to construct sequencing capability in low-and-middle-income nations (LMICs), the discharge of open supply, freely accessible software program packages and internet sources for the evaluation and interpretation of pathogen genomes (e.g., BEAST [9], Galaxy [10], NextStrain [11], CZ ID [12]) and the incidence of a number of outbreaks of worldwide concern, together with the Coronavirus Illness 2019 (COVID-19) pandemic. Importantly, there has additionally been a gentle migration of sequencing experience from business and academia into the general public well being laboratories that function the primary line response to outbreaks of infectious illness.
One other essential set of advances enabled the enrichment of pathogen-derived nucleic acids from complicated samples. Though high-throughput sequencing can ship full pathogen genomes with out focused enrichment, that is usually not price efficient as a result of pathogen-derived nucleic acids are sometimes current at very low abundance inside related samples (e.g., blood, feces, saliva, soil, and air filters), and conventional enrichment approaches involving laboratory culturing are time consuming and depending on the presence of a adequate variety of infectious particles. Subsequently, novel approaches for enrichment have been wanted to allow routine pathogen genome sequencing inside clinically related time frames. Essentially the most profitable approaches fall into 2 classes: depletion of nontarget nucleic acids, like host ribosomal RNA, which are sometimes probably the most considerable RNAs inside scientific samples [13], or particular enrichment of pathogen-derived nucleic acids. Two major strategies have been profitable for pathogen genome enrichment: selective amplification by way of PCR and probe-based hybrid-capture (Fig 1). Complete-genome amplification has been used for many years to review RNA viruses like influenza A [14] and HIV [15], and the potential for combining whole-genome amplification with high-throughput sequencing was initially demonstrated with these identical viruses [16,17]. In subsequent years, tiled amplicon sequencing has been utilized to all kinds of pathogens, and whereas a lot of the preliminary strategies targeted on a small variety of giant amplicons (roughly 1,000 to three,000 nt), lots of the newer strategies use extremely multiplexed swimming pools of primers that generate brief amplicons (roughly 400 nt) and due to this fact can amplify pathogen genomes even inside degraded samples with low titers (e.g., RNA “jackhammering” [18], Primal Scheme [19]). One of these enrichment is comparatively low cost and easy to arrange, however the primer panels are additionally extremely particular for a selected pathogen and the strategy is just not simply scalable to bigger genomes, reminiscent of dsDNA viruses, micro organism, and fungi. In distinction, probe-based, hybrid-capture strategies can concurrently enrich nucleic acids from a number of distinct pathogens and throughout full genomes of even the most important infectious brokers [20,21]. Nonetheless, this methodology is dearer due largely to the price of synthesizing the oligonucleotides (i.e., probes) used for selective seize.
How pathogen genomes are used
Along with these vital technological advances, pathogen genome sequencing has additionally risen to prominence because of the many distinctive ways in which pathogen genomes will help us to grasp and management the unfold of infectious illness. The functions for pathogen genome sequences can usually be assigned to not less than one in all 3 broad classes, and right here, we’ll talk about a number of distinguished examples from every: (1) the identification and characterization of infectious brokers; (2) monitoring the motion and evolution of pathogens by way of house and time; and (3) informing therapies and interventions (Fig 2).
The genome serves because the hereditary materials for all types of life, and as such, every pathogen’s genome encodes a singular set of directions that may be exploited for unambiguous identification, particularly when sequenced in its entirety. In distinction, earlier strategies for pathogen identification have been usually primarily based on oblique measures of the genetic code (e.g., phenotypes in tradition, complement fixation, restriction fragment size polymorphisms by pulsed-field gel electrophoresis) or small items of the genome (e.g., multi-locus sequence typing). These strategies are sometimes time intensive, require distinct reagents/approaches for various teams of pathogens, and may typically result in ambiguous or deceptive diagnoses. Subsequently, full-genome sequencing has emerged as a robust strategy for rapidly figuring out the causative agent of an infectious illness, and it may be utilized in a way that’s largely agnostic to the character of the pathogen (i.e., metagenomics). For instance, in 2013 metagenomic sequencing was used to diagnose a younger affected person with neuroleptospirosis, thus enabling acceptable intervention with intravenous antibiotics, even supposing conventional scientific assays for infectious ailments have been all unfavourable [22]. Equally, in 2014 high-throughput sequencing was used to definitively establish Ebola virus as the reason for a illness outbreak in Guinea [23]. Previous to this, Ebola virus had not been noticed exterior of some nations in Central Africa.
Genome sequences can be used to reconstruct chains of transmission, and due to this fact, genomic analyses can inform public well being initiatives targeted on minimizing the unfold of infectious illness. As a result of genomes function the hereditary materials, any genomic mutations or rearrangements will probably be inherited from dad or mum to offspring and variants that come up inside one an infection will be transmitted to a brand new host. Which means that instances from the identical outbreak/transmission chain are anticipated to be brought on by genetically similar or very comparable pathogens and that genetic divergence between infection-derived genomes will probably be correlated to epidemiological distance. For instance, whole-genome sequences have turn out to be instrumental for investigations of bacterial foodborne illness outbreaks by way of initiatives reminiscent of PulseNet [24,25] and GenomeTrakr [26,27]. By offering better pressure decision than conventional approaches (e.g., pulsed-field gel electrophoresis), whole-genome sequences can extra precisely establish instances linked to the identical outbreak and pinpoint the preliminary supply of contamination, thus facilitating focused remediation [28,29]. Equally, whole-genome sequencing has improved public well being interventions for tuberculosis by extra precisely figuring out current human-to-human transmission occasions [30]. Genome sequences have additionally been used to reconstruct HIV-1 transmission networks to allow focused public well being interventions [31] and have even performed an vital function in confirming atypical modes of transmission, just like the sexual transmission of each Ebola [32] and Zika [33] viruses. Together with conventional epidemiological investigations, the technology of almost similar virus genomes from semen samples from the male companions and blood samples from the feminine companions made sexual transmission the more than likely situation in each instances.
Inside the genomes of pathogens, mutations additionally are likely to accumulate at a broadly common price by way of time. That is generally known as a molecular clock, which can be utilized to estimate dates for vital outbreak-related occasions. Even earlier than high-throughput sequencing enabled routine pathogen sequencing, virus genomes (generated by way of PCR and Sanger sequencing) have been used to assist perceive the origin of the 2009 swine flu pandemic. Molecular clock evaluation demonstrated that the pandemic pressure circulated undetected for a number of months in people and several other years in swine, thus indicating the necessity for extra systematic surveillance for novel influenza viruses [34]. Lately, genome sequencing and molecular courting analyses have turn out to be a routine a part of outbreak investigations and have make clear the emergence of many viruses, together with MERS coronavirus [35], Ebola virus [36], HIV-1 [37,38], and Zika virus [39]. Molecular clock analyses have additionally been used to grasp the evolutionary histories and geographical unfold of bacterial pathogens, though there will be problems as a consequence of excessive ranges of recombination [40] and distinct life-history phases with totally different charges of evolution (e.g., spore-forming micro organism) [41]. For instance, genomes generated utilizing high-throughput sequencing have been used to grasp the traditional origins of Mycobacterium tuberculosis [42,43], in addition to the current origins of epidemic clones of multidrug-resistant S. aureus [44].
Pathogen genome sequencing additionally performs an vital function within the up to date design (and redesign) of diagnostics and vaccines. Lots of our present diagnostics are primarily based on the detection of pathogen genomes, and the sensitivity of those diagnostics is dependent upon sequence complementarity between the goal pathogen and the assay’s primers/probes, whereas specificity is dependent upon a scarcity of complementarity with off-target, close to neighbors. For pathogens with excessive mutation charges, like viruses, it’s essential to observe genome range by way of house [45] and over time [46] to take care of an excellent match between pathogen and diagnostic. For instance, a number of business SARS-CoV-2 diagnostics have misplaced sensitivity over time (i.e., began producing false negatives) as a consequence of evolution of the virus [47,48]. For pathogens with bigger genomes and versatile gene content material, like micro organism, it’s essential to establish genomic targets which might be extremely conserved and particular to the pathogenic strains of curiosity [49]. For instance, detection of the biothreat agent, Francisella tularensis, has been stricken by false optimistic detection as a consequence of a scarcity of genomic understanding of unculturable, but associated environmental species [50]. Equally, for a vaccine to be protecting, there should be an excellent match between the antigens included within the vaccine and people expressed by the circulating type of the pathogen. Complete-genome sequencing is a routine a part of influenza virus surveillance, used to observe each the evolution of identified strains and the emergence of latest reassortants, and annually’s vaccine pressure is chosen primarily based on these genome sequences [51]. The event of bacterial vaccines can be aided by genomic sequencing, as regional variation in strains might have an effect on the selection of acceptable antigens. For instance, colonization elements in enterotoxigenic E. coli are various, simply detectable by whole-genome sequencing, and are the main elements of some ETEC vaccines [52], guided by the regional dominance of particular genotypes.
Genome sequencing can be used to observe the continued evolution of pathogens for escape from current therapeutics and to tell the design of latest therapies. Antibiotics are our major device for combating bacterial infections, however the tempo of antibiotic discovery has slowed significantly and antibiotic-resistant strains of micro organism are rising at alarming charges. Complete-genome sequencing can be utilized to precisely predict antimicrobial resistance profiles from sequence knowledge for a lot of micro organism [53], together with M. tuberculosis [54]. As software program to carry out bacterial genome-wide affiliation research, powered by machine studying algorithms, turn out to be extra highly effective, genome sequencing will symbolize an vital device for monitoring resistance on the inhabitants stage [55] and informing patient-level therapy selections [56]. Genome sequencing has additionally turn out to be a essential element within the improvement of one of the vital promising alternate options to antibiotics: bacteriophage remedy. Excessive-throughput sequencing is used to display bacteriophage genomes for deleterious markers (e.g., toxins) and to detect contamination inside laboratory shares [57]. For a few years now, genome sequencing has additionally been a really useful element of the WHO’s technique for stopping and monitoring drug resistance in HIV [58], and in recent times, there was a concerted effort to transition to the usage of high-throughput sequencing for HIV surveillance as a result of it might detect drug-resistant variants current at low frequency inside an contaminated particular person [59].
Pandemic-driven advances
With the technical foundations and broad utility already established, the general public well being and analysis communities have been effectively positioned to quickly apply pathogen genome sequencing to assist perceive and reply to the COVID-19 pandemic that started late in 2019. For instance, throughout the very first weeks of the outbreak, unbiased high-throughput sequencing was used to establish and characterize the novel coronavirus that will finally be named SARS-CoV-2 [60]. These preliminary genome sequences have been publicly launched and so they allowed for the speedy improvement of focused diagnostics and vaccines [61]. In addition they enabled the design of nucleic acid enrichment methods particular for SARS-CoV-2 (e.g., tiled amplicon primer units) [62], which facilitated routine genome sequencing straight from scientific samples.
Finally, pathogen genomic surveillance was applied at an unprecedented scale in response to the COVID-19 pandemic. In actual fact, as of Might 9, 2023, 15,532,821 SARS-CoV-2 genome sequences had been submitted to the GISAID database (Fig 3). That is a number of orders of magnitude increased than the variety of genomes generated in response to earlier outbreaks brought on by rising viruses (e.g., roughly 2,000 Ebola virus sequences from West Africa from 2013 to 2016; lower than 1,000 Zika virus sequences from the Americas from 2015 to 2016), and it has even surpassed the overall variety of accessible influenza virus genomes (<1 M), for which genomic surveillance applications have existed for greater than a decade (Fig 3). The variety of contributing sequencing services has additionally been unprecedented. As of Might 11, 2023, 222 totally different nations/territories and >5,700 “submitting labs” have contributed SARS-CoV-2 genomes to GISAID, and lots of the sequences of biggest consequence for the general public well being response have been generated by labs within the World South [63,64]. Though capability for high-throughput sequencing was already on the rise previous to the emergence of SARS-CoV-2, the pandemic led to appreciable funding in sequencing services and genomic surveillance, and SARS-CoV-2 genomes have been utilized in quite a lot of methods, together with: (1) to grasp the origin of the pandemic [65]; (2) to reconstruct transmission chains [66,67]; (3) to observe the emergence of latest variants [63,64,68]; (4) to design and redesign diagnostics and vaccines [69–71]; and (5) to make knowledgeable affected person therapy selections (e.g., which monoclonal antibody therapeutics are more likely to be efficient) [72–74].
In response to the COVID-19 pandemic, there was a proliferation of software program instruments geared toward facilitating speedy interpretation and dialogue of pathogen genome knowledge. For instance, pangolin [75] and nextclade [76] each permit customers to rapidly assign SARS-CoV-2 genomes to lineages utilizing a dynamic and non-stigmatizing nomenclature, thus offering a constant and exact vocabulary for the dialogue of SARS-CoV-2 genomes [77]. Equally, web-based “dashboards” have rapidly turn out to be indispensable instruments which have helped to resolve 2 vital challenges in genomic surveillance: (1) the real-time evaluation of genomic knowledge; and (2) the speedy and widespread dissemination of outcomes. For pathogen genomes to be of use throughout an energetic outbreak, sequences should be analyzed quickly and outcomes should be communicated to the big variety of teams and people making public well being selections. Via the usage of automated workflows, SARS-CoV-2-focused dashboards like NextStrain’s ncov [78], CoV-Spectrum [79], COG-UK-ME [80], and plenty of others have facilitated steady, real-time evaluation of virus genomes all through the pandemic. They’ve additionally helped democratize entry to genomic epidemiology. Not solely did these dashboards facilitate the actual time sharing of outcomes, but in addition due to the interactive nature of many of those web sites, they permit customers to parse the accessible knowledge in custom-made methods, even when they don’t have experience in genomics, and with a minimal funding of time. In lots of instances, the code base underlying these dashboards can also be open supply and contributions from the neighborhood are welcome. This strategy not solely enhances transparency, but in addition facilitates the difference of those sources to different pathogens, outbreaks, and functions.
The unprecedented magnitude of the COVID-19 pandemic (and the related sequencing response) has additionally pushed the event of novel instruments and strategies targeted explicitly on the evaluation and visualization of very giant datasets (i.e., containing thousands and thousands of sequences). Whereas extraordinarily highly effective, the instruments that existed in the beginning of the pandemic (e.g., NextStrain, BEAST) have been designed to course of datasets with, at most, a number of thousand genome sequences. For an intensely sequenced pathogen like SARS-CoV-2, which means that datasets must be considerably downsampled previous to evaluation. And whereas there are instruments that facilitate downsampling in ways in which goal to reduce bias [11,81], downsampling is just not acceptable for all functions and its influence is often not rigorously evaluated [82]. One instance of a novel device that has facilitated complete phylogenetic analyses for SARS-CoV-2 is UShER [83]. Slightly than following the standard strategy for constructing phylogenies, which begins from scratch every time new knowledge is acquired, UShER provides new sequences to current timber, and it does so rapidly and with excessive accuracy. Not solely is that this strategy effectively suited to energetic outbreaks, the place new sequences are being generated recurrently, but it surely additionally scales effectively and due to this fact can add new knowledge to timber containing thousands and thousands of sequences inside an actionable timeframe [84]. One other vital device for enabling complete phylogenetics of SARS-CoV-2 is Taxonium [85], which is optimized for visualizing and exploring timber that include thousands and thousands of sequences. And though SARS-CoV-2 was the impetus for the event of those instruments, they don’t seem to be SARS-CoV-2 particular. Each have already been utilized to different high-priority pathogens, and instruments like these are more likely to turn out to be extra broadly wanted as the extent of pathogen sequencing continues to extend.
One other main problem throughout the response to international well being emergencies is facilitating communication and knowledge sharing between the various related teams producing and utilizing pathogen genome sequences (e.g., public well being labs, educational analysis teams, biotechnology corporations, governments, and media). On the nationwide stage, initiatives just like the CDC’s SPHERES (SARS-CoV-2 Sequencing for Public Well being Emergency Response, Epidemiology and Surveillance) symbolize a significant advance over earlier outbreak responses. SPHERES has utilized fashionable software program instruments (e.g., Zoom and Slack) to facilitate common and energetic discussions between various stakeholders from throughout the USA [86]. Inside the UK, the COVID-19 Genomics UK (COG-UK) Consortium went even additional by not solely facilitating dialogue, but in addition really making a centralized system for quickly amassing, processing, and sharing SARS-CoV-2 genome sequences, together with related pattern metadata [87]. This technique, powered by the CLIMB-COVID compute infrastructure [88], was capable of leverage a distributed community of clinics and sequencing services to offer a unified view of the pandemic on the nationwide stage [89].
Lastly, the COVID-19 pandemic has renewed curiosity in the usage of wastewater sampling for pathogen surveillance, which, when mixed with genome sequencing, gives a passive but highly effective strategy for monitoring the emergence of latest viruses and variants. Pathogen surveillance in wastewater dates again to the Nineteen Forties, the place poliovirus was detected from sewage in New Haven, Connecticut and New York Metropolis [90]. Wastewater sampling for SARS-CoV-2 surveillance gained consideration as a consequence of its complete and unbiased detection functionality [91] and up to date work has broadened into the detection of influenza virus [92], monkeypox virus [93], and antimicrobial resistance genes [94]. Wastewater surveillance has additionally just lately been used once more to trace poliovirus, this time figuring out circulation in a number of non-endemic areas, with the ensuing sequences implicating strains from the replication-competent oral poliovirus vaccine [95–97]. One of many challenges for high-resolution surveillance, the place the detection of particular mutations is required for genomic epidemiology, is the presence of blended genotypes. Nonetheless, current work means that the deconvolution of associated viruses is feasible as a consequence of informatics advances made throughout the SARS-CoV-2 pandemic [98]. Wastewater can also be a gorgeous sampling matrix for the early identification of rising pathogens as it’s impartial of voluntary testing campaigns and can be utilized as a neighborhood forecasting device [99]. The problem of wastewater surveillance for brand spanking new pathogens is that deep metagenomic sequencing is required for novel discovery efforts. As sequencing turns into cheaper or new enrichment approaches turn out to be possible, routine metagenomic surveillance of wastewater samples could also be doable to observe for the emergence of novel viral, bacterial, and fungal pathogens.
The way forward for knowledge sharing
Pathogen genome sequences have rapidly turn out to be an indispensable a part of how we put together for and reply to infectious illness outbreaks, however the good thing about these sequences for public well being is very depending on well timed and equitable sharing of information [100]. Previous to the COVID-19 pandemic, most pathogen genomes have been shared by way of a member of the Worldwide Nucleotide Sequence Database Collaboration (INSDC), which is a group of repositories (DDBJ, ENA, and NCBI) that share a typical coverage of free and unrestricted knowledge use [101]. In lots of respects, this represents an excellent system for sharing outbreak-related knowledge as a result of it ensures that the accessible sequences can be utilized as broadly as doable, each for analysis and business functions (e.g., the event of diagnostics and vaccines). Unrestricted knowledge sharing by way of INSDC repositories has additionally enabled the event of many vital knowledge evaluation sources for pathogens (e.g., the NIAID’s Bioinformatics Useful resource Facilities, together with the Los Alamos HIV sequence database and the Bacterial and Viral Bioinformatics Useful resource Heart, which just lately built-in PATRIC, IRD, and ViPR [102]). These sources have facilitated discoveries associated to pathogen genomes by way of knowledgeable curation and annotation of uncooked sequences submitted to INSDC repositories.
Nonetheless, the INSDC’s strategy solely works within the context of public well being if knowledge producers are comfy importing their sequences in actual time, which usually means previous to any in depth evaluation or publication. Sadly, the INSDC’s knowledge use coverage is just not capable of present any protections for knowledge producers with regard to attribution and/or necessities for collaboration. Subsequently, many knowledge producers are hesitant to add knowledge instantly to the INSDC, fearing that they could get scooped by others utilizing their very own knowledge. GISAID was launched as an alternative choice to the INSDC mannequin, one which explicitly protects the pursuits of information producers by requiring that customers adhere to a database entry settlement [103]. GISAID is run by way of an impartial, nonprofit that was initially established to facilitate the sharing of influenza genomes [61], however, with the onset of the COVID-19 pandemic, GISAID expanded its scope to incorporate SARS-CoV-2 (EpiCoV) (Fig 1).
On account of these protections, in addition to a streamlined submission system, GISAID was broadly embraced by the worldwide neighborhood throughout the COVID-19 pandemic, and is especially in style with knowledge producers in LMICs who might not have the sources to investigate and publish their knowledge as rapidly as teams in high-income nations [104]. In lots of respects, GISAID additionally seems effectively primed to additional develop sooner or later. Nonetheless, a number of current controversies have imperiled the belief that GISAID has labored so laborious to determine [105–107], and it’s clear that substantial modifications are wanted with regard to the transparency of GISAID governance. GISAID has additionally did not ship on an preliminary promise to serve solely as a brief repository, with knowledge finally transferred to the INSDC [103]. In actual fact, there’s presently no direct mechanism for transferring sequences from GISAID to the INSDC. In consequence, many viral genome sequences have successfully turn out to be siloed in a database that forestalls knowledge sharing with unregistered customers, and due to this fact, these sequences can’t be built-in into current bioinformatics sources that brazenly share curated sequence datasets (see above).
As we glance to the long run, our wants with regard to knowledge sharing are fairly clear, although it’s much less clear precisely how these wants will probably be met. First, we have to do every part we will to encourage speedy knowledge sharing, and this must embody protections for the pursuits of the info suppliers. Second, the rules for knowledge entry should be clear and pretty enforced, and there should be an official course of for interesting selections that end result within the lack of entry. Third, there should be a streamlined course of for transitioning knowledge from a restricted repository to at least one that enables unrestricted knowledge use. The necessity to defend the pursuits of information suppliers is actual, however it’s not indefinite. As soon as the suppliers have revealed on their knowledge, it ought to turn out to be freely accessible for extra use. All of those wants might feasibly be met by way of cooperation between GISAID, the INSDC, and the broader neighborhood of stakeholders (i.e., funding companies, knowledge suppliers, and knowledge customers). Nonetheless, if such cooperation doesn’t materialize, then we might have new options that may meet the entire necessities wanted for seamless and equitable incorporation of pathogen genome sequencing into our international public well being response to each epidemic and endemic pathogens [100].
The way forward for genomic surveillance
The SARS-CoV-2 pandemic led to the event of thrilling new methods, knowledge sharing platforms, and analytical instruments, but it surely additionally highlighted vital points, gaps, and inequities that, if addressed appropriately, might enhance future genomic surveillance efforts and higher put together us for the following public well being emergency. For instance, huge emergency investments facilitated the event of sequencing infrastructure that has allowed for the mass manufacturing, submission, and evaluation of pathogen genomes, however as this funding in SARS-CoV-2 sequencing wanes (Fig 3), we are actually confronted with the problem of sustaining this infrastructure within the absence of a public well being emergency. Fortuitously, most of this infrastructure is versatile sufficient to be utilized to many various pathogens of concern, and plenty of infectious ailments have been uncared for over the previous a number of years because the world’s consideration has been drawn to SARS-CoV-2. Subsequently, the important thing to sustaining our current advances seemingly lies in a pivot away from a sole concentrate on SARS-CoV-2 and in the direction of a extra inclusive scope [108]. For instance, throughout the SARS-CoV-2 pandemic, antimicrobial resistant (AMR) micro organism misplaced focus, however proceed to pose a considerable public well being menace [109]. Genomic surveillance for a lot of endemic viruses is presently effectively under optimum ranges [110], and our capability to effectively diagnose fungal infections and predict antifungal resistance is severely restricted [111]. By pivoting to a extra inclusive strategy to genomic surveillance, together with viral, bacterial, and fungal targets, and probably using multiplex detection and sequencing methods (Field 1), we will broadly enhance public well being and keep current capability. If infrastructure is just not supported and knowledge sharing pipelines aren’t maintained, a whole rebuild will probably be wanted for the following pandemic, which is able to drastically improve response time.
Field 1. Precedence areas for future funding
1. Open supply software program improvement and upkeep.
- To understand the complete potential of pathogen genomes for bettering public well being, we’d like software program that’s correct, simple to make use of, freely accessible, and capable of rapidly ship actionable outcomes for ever-expanding datasets.
- Regardless of many current advances on this space, a scarcity of acceptable software program stays a barrier for broader implementation of pathogen genome sequencing in public well being responses, particularly for functions exterior of the monitoring of rising viruses [112].
Particular wants: New instruments to fill gaps and streamline workflows with a precedence on interoperability; continued upkeep of current, high-impact instruments (in any other case they’ll rapidly lose their worth).
2. Multiplex detection and sequencing methods.
- To broaden the utility of genome sequencing for public well being, will probably be vital to spend money on approaches which might be able to detecting and characterizing a number of pathogens concurrently.
- If we proceed to concentrate on “singleplex” methods, our effort will stay closely biased towards solely the very best precedence pathogens.
Particular wants: Broader implementation of diagnostic assays (e.g., CRISPR-based nucleic acid detection methods [113]) and sequencing methods (e.g., probe-based hybrid seize [114,115]) that may concurrently detect/characterize a number of pathogens with a single set of reagents.
3. Price-effective enrichment of huge/various targets.
- Focused nucleic acid enrichment methods are essential for facilitating pathogen genome sequencing straight from scientific samples.
- Choices are restricted, and infrequently not cost-effective, when assays want to focus on a considerable amount of sequence range in a single assay, e.g., for multiplex enrichment protocols (see above) or whole-genome sequencing of pathogens with giant genomes, like micro organism, which is turning into more and more vital with the adoption of tradition impartial diagnostic checks [29].
Particular wants: Methods that may enrich a big number of nucleic acid targets with a single set of reagents, whereas remaining reasonably priced sufficient for routine implementation.
4. Understanding the optimum stage of sequencing.
- As we glance to the long run, will probably be vital to transition from a perspective of “the extra the higher,” to at least one that fastidiously considers the required stage of genome sequencing [116] and optimum sampling methods [82] for addressing probably the most essential wants for our public well being response.
Particular wants: Quantitative frameworks for evaluating the influence of various approaches and ranges of funding in sequencing (e.g., [116,117]); clearly outlined targets for the function of pathogen genomics in making ready for and responding to public well being threats.
5. Implementation of passive, long-term surveillance applications.
- Passive sampling strategies, reminiscent of wastewater surveillance, will probably be essential to observe for the presence of novel pathogens or variants.
- We must always broaden current applications (e.g., implement wastewater sampling for arriving plane and cruise ships [118], make the most of Biowatch program infrastructure for airborne pathogen detection [119]) and be sure that these applications will be constantly operated, over lengthy intervals of time.
Particular wants: Standardized sampling and evaluation protocols for the detection of particular pathogens; buy-in from funding companies in addition to shut collaboration between federal displays and native laboratory response networks.
Moreover, regardless of substantial will increase in international sequencing and evaluation capability during the last a number of years, vital disparities stay that undermine outbreak preparedness at each native and worldwide scales [108,120,121]. Through the pandemic, a lot of the genomic knowledge was generated in high-income nations (Fig 3), however many variants of concern emerged from LMICs [120,122]. Moreover, new pathogens can emerge from anyplace and rapidly unfold across the globe. Subsequently, our future genomic surveillance technique should contain increasing capability in LMICs. It will seemingly require will increase in native investments for public well being initiatives [108,121,123], in addition to continued assist by way of worldwide public–personal partnerships, such because the Africa Pathogen Genomics Initiative. Fortuitously, there are numerous current regional facilities of excellence and assist networks that may assist, not solely to determine new sequencing facilities, but in addition to offer the continued assist wanted to maintain and develop these applications [123–125].
Trying ahead, it can even be vital to fastidiously contemplate the restrictions of genome sequencing, which is able to assist us focus our efforts in methods that may optimize the return on funding for public well being. Regardless of unprecedented sequencing efforts throughout the pandemic, we nonetheless sequenced a small fraction of the overall variety of SARS-CoV-2 infections and the turnaround time between pattern assortment and genome submission was usually >3 weeks [120]. Subsequently, in observe, genome sequences weren’t informative for lots of the most time-sensitive public well being selections, just like the implementation of border closures; by the point a brand new variant of concern was recognized, it was seemingly already geographically widespread. Technological advances are more likely to lower sequencing turnaround instances sooner or later [126], however we’d nonetheless must be sequencing a really giant variety of samples every day to detect a brand new variant with excessive chance previous to substantial neighborhood unfold [116]. Additionally it is difficult to deduce the practical penalties of mutations from genome sequences in isolation. Slightly, it’s the change in prevalence over time that’s strongest for the identification of variants of concern [63,127]. Subsequently, pathogen sequencing is more than likely to be helpful for addressing questions that may stay related over longer timescales (e.g., forecasting future surges in instances, redesigning diagnostics and vaccines, choosing probably the most acceptable therapy regimens).
Though we’ve benefitted in some ways from genome sequencing throughout the SARS-CoV-2 pandemic, additionally it is true that, in some respects, there have been diminishing returns on funding as increasingly more instances have been sequenced, particularly from comparable places and time limits. At one excessive, we have been capable of notice a number of advantages with only a single genome sequence, together with the preliminary identification of the causal agent of COVID-19 and the knowledge wanted to provoke the design of vaccines and diagnostics. On the different finish of the spectrum, is the usage of pathogen genomes to establish and observe the unfold of latest variants. On this case, extra genomes means earlier detection of latest variants and extra correct estimation of variant frequencies [116]. There are additionally cases wherein comparable info might seemingly have been obtained from sub-genomic analyses. For instance, a lot of the mutations which have been proven to influence SARS-CoV-2 infectivity and immune evasion are situated within the Spike glycoprotein gene. This protein can also be the one antigen contained in most vaccines presently in use in opposition to SARS-CoV-2. By focusing our sequencing efforts on high-priority genomic areas, just like the SARS-CoV-2 Spike, we might be able to lower prices per goal pathogen whereas sustaining most (although not all) of the utility of the generated sequences [128]. Subsequently, as we put together for future outbreaks, we have to fastidiously contemplate the optimum sequencing effort that may guarantee a stability between the related prices and the ensuing advantages (Field 1). This is not going to solely require the institution of quantitative frameworks for evaluating the influence of various investments in sequencing (e.g., [116,117]), but in addition a clearly outlined set of targets for the function of pathogen genomics in making ready for and responding to public well being threats.
Given the large development within the measurement of the sequencing neighborhood and the necessity for speedy turnaround of information, we additionally face vital challenges relating to workflow standardization, high quality assurance, and the dissemination of outcomes. Standardization will at all times be a problem when 100s to 1,000s of teams are concurrently contributing to a discipline of research. Nonetheless, standardization tends to come up organically each time high-quality sources are supplied which might be freed from cost, simple to make use of, and don’t require any lack of knowledge possession. Nice examples throughout the SARS-CoV-2 pandemic embody the ARTIC Community primers for genome amplification [62], the Pangolin software program for lineage naming [75], and the NextStrain platform for phylogenetic evaluation [11]. You will need to proceed to spend money on efforts like these, as they should be actively maintained to stay related, and ought to be expanded to cowl different high-priority pathogens (Field 1). For instance, to maintain tempo with virus evolution, a number of variations of the ARTIC amplicon panel needed to be developed over the course of the pandemic to deal with the dropout of genomic areas as a consequence of primer mismatch [129]. We additionally want new software program pipelines tailor-made particularly for evaluation of pathogens with bigger, extra complicated genomes, like micro organism, fungi, and even some dsDNA viruses (Field 1) [112,130]. And eventually, we should spend money on sturdy, automated protocols that may facilitate sequence curation in a sustainable means to make sure knowledge high quality and due to this fact additionally the standard of downstream interpretations.
Over the past couple a long time, technological advances have enabled the routine sequencing of pathogen genomes. Mixed with a rising and extremely engaged neighborhood of scientists, this has revolutionized the way in which we research and reply to outbreaks of infectious illness, and as we transition out of a interval dominated by the emergency response to the COVID-19 pandemic, we’re effectively positioned to broadly apply the advantages of routine genome sequencing to the complete range of human pathogens.
[ad_2]