July bioRxiv Bioinformatics Highlights

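Last month, bioRxiv posted a record-breaking 2,600 new preprints, and roughly another 1,000 manuscripts received updated versions. These numbers show the research community that bioRxiv, the most mature preprint platform in biology, keeps growing in popularity internationally. Meanwhile, medRxiv, the medical preprint platform that launched only in June, posted its 100th preprint at the end of last month. Like bioRxiv, medRxiv says it will reach agreements with academic journals that let researchers transfer a medRxiv preprint directly when formally submitting to a peer-reviewed journal. So far, the following journals have signed on: JCO Clinical Cancer Informatics, JCO Precision Oncology, and Genetics in Medicine. Compared with the well-established bioRxiv, medRxiv still lags far behind in both visibility and operations, but its managers will presumably roll out more author-friendly measures quickly and build medRxiv into an important platform for medical research.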

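July is also the summer movie season. On the 12th, The Lion King (2019) opened first in mainland China, stirring childhood memories for countless viewers. By coincidence, a preprint posted on bioRxiv in July reports the latest genome sequencing results for the African lion, as if biologists were paying their own tribute to the classic. Fittingly, bioRxiv also posted genome sequencing results for the Siberian tiger last month. Perhaps most surprising of all, the first author of both papers is the same person: Ellie Armstrong, a postdoc at Stanford University in the United States! Let's take a look.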

1. Lion and tiger genomes

1.1 [Genomics] The African lion genome

Long live the king: chromosome-level assembly of the lion (Panthera leo) using linked-read, Hi-C, and long read data (CC-BY-NC-ND 4.0)

The lion (Panthera leo) is one of the most popular and iconic feline species on the planet, yet in spite of its popularity, the last century has seen massive declines for lion populations worldwide. Genomic resources for endangered species represent an important way forward for the field of conservation, enabling high-resolution studies of demography, disease, and population dynamics. Here, we present a chromosome-level assembly for the captive African lion from the Exotic Feline Rescue Center as a resource for current and subsequent genetic work of the sole social species of the Panthera clade. Our assembly is composed of 10x Genomics Chromium data, Dovetail Hi-C, and Oxford Nanopore long-read data. Synteny is highly conserved between the lion, other Panthera genomes, and the domestic cat. We find variability in the length and levels of homozygosity across the genomes of the lion sequenced here and other previous published resequence data, indicating contrasting histories of recent and ancient small population sizes and/or inbreeding. Demographic analyses reveal similar histories across all individuals except the Asiatic lion, which shows a more rapid decline in population size. This high-quality genome will greatly aid in the continuing research and conservation efforts for the lion.

Figure S1 Circos plot of alignments between tiger (left) and lion (right) chromosomes. Colors represent different chromosomes with bottom chromosome (shown in dark brown) representing A1.

1.2 [Genomics] Sequencing 65 tiger genomes to examine the roles of genetic drift and natural selection in tiger evolution

Recent evolutionary history of tigers highlights contrasting roles of genetic drift and selection

Tigers are among the most charismatic of endangered species, yet little is known about their evolutionary history. We sequenced 65 individual genomes representing extant tiger geographic range. We found strong genetic differentiation between putative tiger subspecies, divergence within the last 10,000 years, and demographic histories dominated by population bottlenecks. Indian tigers have substantial genetic variation and substructure stemming from population isolation and intense recent bottlenecks here. Despite high genetic diversity across India, individual tigers host longer runs of homozygosity, potentially suggesting recent inbreeding here. Amur tiger genomes revealed the strongest signals of selection and over-representation of gene ontology categories potentially involved in metabolic adaptation to cold. Novel insights highlight the antiquity of northeast Indian tigers. Our results demonstrate recent evolution, with differential isolation, selection and drift in extant tiger populations, providing insights for conservation and future survival.
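Both abstracts above refer to runs of homozygosity (ROH). As a rough illustration of the idea only, and not the pipeline used in either preprint, the sketch below scans a sorted list of genotyped sites and reports stretches of consecutive homozygous calls that span at least a minimum length. The function name, the input layout, and the toy coordinates are all assumptions made for this example; real ROH callers also tolerate genotyping errors and account for marker density.

```python
def runs_of_homozygosity(sites, min_length_bp):
    """Find runs of consecutive homozygous sites on one chromosome.

    sites: list of (position, is_heterozygous) tuples sorted by position
           (hypothetical input layout for this sketch).
    Returns a list of (start, end) positions for runs spanning
    at least min_length_bp base pairs.
    """
    runs = []
    start = end = None
    for position, is_het in sites:
        if is_het:
            # A heterozygous call closes the current run, if any.
            if start is not None and end - start >= min_length_bp:
                runs.append((start, end))
            start = end = None
        else:
            if start is None:
                start = position
            end = position
    # Close a run that extends to the last site.
    if start is not None and end - start >= min_length_bp:
        runs.append((start, end))
    return runs


# Made-up coordinates, purely for illustration.
toy_sites = [
    (100, False), (5_000, False), (12_000, False), (13_000, True),
    (14_000, False), (15_000, False), (40_000, False), (41_000, True),
]
print(runs_of_homozygosity(toy_sites, min_length_bp=10_000))
```

Longer runs generally point to more recent shared ancestry between the two parental haplotypes, which is why the abstracts read long ROH as a possible sign of recent inbreeding.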

2. [Bioinformatics] PromethION, 11 human genomes, 9 days, 63x coverage, 42 kb read N50: all made possible by Shasta, a brand-new de novo assembly tool

Efficient de novo assembly of eleven human genomes using PromethION sequencing and a novel nanopore toolkit (CC-BY 4.0)

Present workflows for producing human genome assemblies from long-read technologies have cost and production time bottlenecks that prohibit efficient scaling to large cohorts. We demonstrate an optimized PromethION nanopore sequencing method for eleven human genomes. The sequencing, performed on one machine in nine days, achieved an average 63x coverage, 42 Kb read N50, 90% median read identity and 6.5x coverage in 100 Kb+ reads using just three flow cells per sample. To assemble these data we introduce new computational tools: Shasta - a de novo long read assembler, and MarginPolish & HELEN - a suite of nanopore assembly polishing algorithms. On a single commercial compute node Shasta can produce a complete human genome assembly in under six hours, and MarginPolish & HELEN can polish the result in just over a day, achieving 99.9% identity (QV30) for haploid samples from nanopore reads alone. We evaluate assembly performance for diploid, haploid and trio-binned human samples in terms of accuracy, cost, and time and demonstrate improvements relative to current state-of-the-art methods in all areas. We further show that addition of proximity ligation (Hi-C) sequencing yields near chromosome-level scaffolds for all eleven genomes.
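Two of the figures quoted in this abstract follow standard definitions that are easy to check by hand: the read N50, and the Phred-scaled quality value (QV) that corresponds to a given identity. The sketch below is a minimal illustration of those definitions; the function names and the toy read lengths are made up for this example and are not taken from the Shasta toolkit.

```python
import math


def n50(lengths):
    """Return the N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    if not lengths:
        raise ValueError("need at least one read length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def identity_to_qv(identity):
    """Convert a fractional identity (e.g. 0.999) to a Phred-scaled QV,
    using QV = -10 * log10(error rate)."""
    error = 1.0 - identity
    if error <= 0:
        return math.inf
    return -10.0 * math.log10(error)


# Made-up read lengths, purely for illustration.
toy_read_lengths = [10_000, 20_000, 30_000, 40_000, 50_000]
print(n50(toy_read_lengths))
print(round(identity_to_qv(0.999)))
```

Under this definition, 99.9% identity means one error per 1,000 bases, which is how the abstract arrives at QV30.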

3. [Evolution] Bielefeld University, Germany: how has the Arabidopsis genome changed after 25 years in suspension cell culture?

25 years of propagation in suspension cell culture results in substantial alterations of the Arabidopsis thaliana genome (CC-BY 4.0)

Arabidopsis thaliana is one of the best studied plant model organisms. Besides cultivation in greenhouses, cells of this plant can also be propagated in suspension cell culture. At7 is one such cell line that has been established about 25 years ago. Here we report the sequencing and the analysis of the At7 genome. Large scale duplications and deletions compared to the Col-0 reference sequence were detected. The number of deletions exceeds the number of insertions thus indicating that a haploid genome size reduction is ongoing. Patterns of small sequence variants differ from the ones observed between A. thaliana accessions e.g. the number of single nucleotide variants matches the number of insertions/deletions. RNA-Seq analysis reveals that disrupted alleles are less frequent in the transcriptome than the native ones.

4. [Genomics] 1997 to 2019: 100,000 genomes, 100,000 bacteria

What can we learn from over 100,000 Escherichia coli genomes? (CC-BY-NC-ND 4.0)

The explosion of microbial genome sequences in public databases allows for large-scale population studies of model organisms, such as Escherichia coli. We have examined more than one hundred-thousand E. coli and Shigella genomes. After removing outliers, genomes were classified into two broad clusters based on a semi-automated Mash analysis, which distinguished 14 distinct phylotypes, graphically illustrated by Cytoscape. From a set of more than ten-thousand good quality E. coli and Shigella genomes from GenBank, we find roughly 2,700 gene families in the E. coli species core, and more than 135,000 gene families in the E. coli pan-genome. Based on a set of 2,613 single-copy core proteins taken from one representative genome per phylotype, we constructed a robust phylogenetic tree. This is the largest E. coli genome dataset analyzed to date, and provides valuable insight into the population structure of the species.
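The core genome and pan-genome counts in this abstract come down to set operations over gene families: the core is what every genome shares, and the pan-genome is everything seen in any genome. The sketch below shows that idea on toy data. The function name, genome names, and family labels are hypothetical, and the abstract does not say how the authors defined core membership, so the strict intersection used here is an assumption (large studies often relax it to "present in nearly all genomes").

```python
def core_and_pan(genomes):
    """Compute a strict core genome and the pan-genome.

    genomes: dict mapping a genome name to an iterable of the gene
             families it carries (hypothetical input layout).
    Returns (core, pan) as two sets of gene family names.
    """
    family_sets = [set(families) for families in genomes.values()]
    if not family_sets:
        return set(), set()
    core = set.intersection(*family_sets)  # families found in every genome
    pan = set.union(*family_sets)          # families found in any genome
    return core, pan


# Made-up genomes and gene families, purely for illustration.
toy_genomes = {
    "genome_A": {"famA", "famB", "famC"},
    "genome_B": {"famA", "famB", "famD"},
    "genome_C": {"famA", "famB", "famE"},
}
core, pan = core_and_pan(toy_genomes)
print(sorted(core), len(pan))
```

As more genomes are added, the core can only shrink or stay the same while the pan-genome can only grow, which fits the gap the abstract reports between roughly 2,700 core families and more than 135,000 pan-genome families.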

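5. [Omics] Dekker lab at the University of Massachusetts Medical School develops a new technique for studying chromosome compartmentalization with Hi-C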

Compartment-dependent chromatin interaction dynamics revealed by liquid chromatin Hi-C (CC-BY-NC-ND 4.0)

Chromosomes are folded so that active and inactive chromatin domains are spatially segregated. Compartmentalization is thought to occur through polymer phase/microphase separation mediated by interactions between loci of similar type. The nature and dynamics of these interactions are not known. We developed liquid chromatin Hi-C to map the stability of associations between loci. Before fixation and Hi-C, chromosomes are fragmented removing the strong polymeric constraint to enable detection of intrinsic locus-locus interaction stabilities. Compartmentalization is stable when fragments are over 10-25 kb. Fragmenting chromatin into pieces smaller than 6 kb leads to gradual loss of genome organization. Dissolution kinetics of chromatin interactions vary for different chromatin domains. Lamin-associated domains are most stable, while interactions among speckle and polycomb-associated loci are more dynamic. Cohesin-mediated loops dissolve after fragmentation, possibly because cohesin rings slide off nearby DNA ends. Liquid chromatin Hi-C provides a genome-wide view of chromosome interaction dynamics.

6. [Genomics] UC Davis researchers use GWAS to reveal how the environment shapes the genome of Mexican maize

Single-gene resolution of locally adaptive genetic variation in Mexican maize (CC-BY-NC 4.0)

Threats to crop production due to climate change are one of the greatest challenges facing plant breeders today. While considerable adaptive variation exists in traditional landraces, natural populations of crop wild relatives, and ex situ germplasm collections, separating adaptive alleles from linked deleterious variants that impact agronomic traits is challenging and has limited the utility of these diverse germplasm resources. Modern genome editing techniques such as CRISPR offer a potential solution by targeting specific alleles for transfer to new backgrounds, but such methods require a higher degree of precision than traditional mapping approaches can achieve. Here we present a high-resolution genome-wide association analysis to identify loci exhibiting adaptive patterns in a large panel of more than 4500 traditional maize landraces representing the breadth of genetic diversity of maize in Mexico. We evaluate associations between genotype and plant performance in 13 common gardens across a range of environments, identifying hundreds of candidate genes underlying genotype by environment interaction. We further identify genetic associations with environment across Mexico and show that such loci are associated with variation in yield and flowering time in our field trials and predict performance in independent drought trials. Our results indicate that the variation necessary to adapt crops to changing climate exists in traditional landraces that have been subject to ongoing environmental adaptation and can be identified by both phenotypic and environmental association.

7. [Bioinformatics] Scientists at the University of Vienna, Austria, develop new software for detecting adaptive introgression

VolcanoFinder: genomic scans for adaptive introgression (CC-BY-NC-ND 4.0)

The process by which beneficial alleles are introduced into a species from a closely-related species is termed adaptive introgression. We present an analytically-tractable model for the effects of adaptive introgression on non-adaptive genetic variation in the genomic region surrounding the beneficial allele. The result we describe is a characteristic volcano-shaped pattern of increased variability that arises around the positively-selected site, and we introduce an open-source method VolcanoFinder to detect this signal in genomic data. Importantly, VolcanoFinder is a population-genetic likelihood-based approach, rather than a comparative-genomic approach, and can therefore probe genomic variation data from a single population for footprints of adaptive introgression, even from a priori unknown and possibly extinct donor species.

8. [Bioinformatics] Transcriptome assembler StringTie arrives in version 2.0 (now handles long reads)

Transcriptome assembly from long-read RNA-seq alignments with StringTie2 (CC-BY 4.0)

RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new computational methods to handle the high error rate of long-read sequencing technology, which previous assemblers could not tolerate. It also offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of assemblies. On 33 short-read datasets from humans and two plant species, StringTie2 is 47.3% more precise and 3.9% more sensitive than Scallop. On multiple long read datasets, StringTie2 on average correctly assembles 8.3 and 2.6 times as many transcripts as FLAIR and Traphlor, respectively, with substantially higher precision. StringTie2 is also faster and has a smaller memory footprint than all comparable tools.

9. [Omics] Kun Zhang at UC San Diego: splint oligos enable joint ATAC-seq and RNA-seq profiling of the same single cell

Linking transcriptome and chromatin accessibility in nanoliter droplets for single-cell sequencing

Linked profiling of transcriptome and chromatin accessibility from single cells can provide unprecedented insights into cellular status. Here we developed a droplet-based Single-Nucleus chromatin Accessibility and mRNA Expression sequencing (SNARE-seq) assay, that we used to profile neonatal and adult mouse cerebral cortices. To demonstrate the strength of single-cell dual-omics profiling, we reconstructed transcriptome and epigenetic landscapes of cell types, uncovered lineage-specific accessible sites, and connected dynamics of promoter accessibility with transcription during neurogenesis.

10. [Omics] A pick from medRxiv

Early detection of molecular disease progression by whole-genome circulating tumor DNA in advanced solid tumors (CC-BY-ND 4.0)

Purpose: Treatment response assessment for patients with advanced solid tumors is complex and existing methods of assessment require greater precision for early disease assessment. Current guidelines rely on imaging, which has limitations such as the long time required before treatment effectiveness can be determined. Serial changes in whole-genome (WG) circulating tumor DNA (ctDNA) were used to detect disease progression early in the treatment course. Methods: 97 patients with advanced cancer were enrolled, and blood was collected before and after initiation of a new treatment. Plasma cell-free DNA libraries were prepared for either WG or WG bisulfite sequencing. Longitudinal changes in the fraction of ctDNA were quantified to identify molecular progression or response in a binary manner. Study endpoints were agreement with first follow-up imaging (FUI) and stratification of progression-free survival (PFS). Results: Patients with early molecular progression had shorter PFS (n=14; median 62d) compared to others (n=78; median 263d, HR 12.6 [95% confidence interval 5.8-27.3], log-rank P<10^-10, 5 excluded from analysis). All cases with molecular progression were confirmed by FUI and molecular progression preceded FUI by a median of 40d. Sensitivity for the assay in identifying clinical progression was 54%, median 24d into treatment and specificity was 100%. Conclusions: Molecular progression, based on ctDNA data, detected disease progression for cases on treatment with high specificity approximately 6 weeks before follow-up imaging. This technology may enable early course change to a potentially effective therapy, avoiding side effects and cost associated with cycles of ineffective treatment.
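The sensitivity and specificity quoted in this abstract follow the usual confusion-matrix definitions, with follow-up imaging as the reference. The sketch below shows how the two numbers are derived from paired calls. The function name and the four toy patients are invented for illustration and are not data from the study.

```python
def sensitivity_specificity(calls):
    """Compute sensitivity and specificity from paired binary calls.

    calls: iterable of (assay_says_progression, reference_says_progression)
           booleans, one pair per patient (hypothetical input layout).
    Returns (sensitivity, specificity); a value is NaN when its
    denominator is zero.
    """
    tp = fp = tn = fn = 0
    for predicted, actual in calls:
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Four made-up patients, NOT data from the preprint.
toy_calls = [(True, True), (False, True), (False, False), (False, False)]
print(sensitivity_specificity(toy_calls))
```

A specificity of 100% means no false positives: every patient the assay flagged as progressing was also found to be progressing on imaging, which matches the abstract's statement that all molecular progression calls were confirmed by FUI.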
