Nature Podcast: The Encyclopedia of DNA Elements Project

skysun000001 > "Science, Technology, Health, Wellness, Elder Care, Sports, Environmental Protection"

2020.08.09


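It's time for the weekly Nature Podcast again! Welcome to this week's science stories, brought to you by Shamini Bundell and Anand Jagatia. This episode's excerpt discusses the Encyclopedia of DNA Elements (ENCODE) project. Head to iTunes or your preferred podcast platform to download the full episode and catch up on the week's research news anytime, anywhere.

Transcript: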

Host: Shamini Bundell

This week sees the publication in Nature of a huge number of papers relating to the third iteration of ENCODE, a project aiming to identify all the regions in the human genome involved in gene regulation. To find out more about ENCODE, Anand Jagatia has been speaking to one of the researchers who’s been involved in the project from the start.

Interviewer: Anand Jagatia

The human genome is a code 3 billion base pairs long that contains the instructions for our cells to make proteins and, ultimately, us. In 2003, scientists successfully completed the Human Genome Project, managing to sequence every single one of these pairs. But it turns out that was just the beginning. It was like we had the blueprints for how to make a human but no idea how to interpret them. What did this vast series of As, Ts, Cs and Gs actually encode? Researchers now think it contains between 21,000 and 25,000 genes for proteins, but there’s also a huge amount going on in the rest of the genome. For one thing, lots of it doesn’t code for protein at all. It codes for sections of RNA that have a function in and of themselves, and there are at least 1 million stretches of DNA in the genome, known as switches or elements, that are involved in turning genes on or off or changing their levels of expression. Picking up where the Human Genome Project left off, ENCODE, which stands for Encyclopaedia of DNA Elements, is an ambitious project to identify all of these elements and how they affect gene expression. This week, several papers on the third iteration of the project – ENCODE 3 – have been published in Nature. Rick Myers from the HudsonAlpha Institute for Biotechnology in the US has been part of the project since it began. He explained to me why these areas of the genome are of such interest.

Interviewee: Rick Myers

You don’t get to be a liver cell or a neuron by expressing the entire genome. You have some sets of the genes being activated and others being inactivated, and the hope was to identify these switches and what turns those switches on and off in different cell types. So, when ENCODE started in 2003, it was a pilot project really looking at only 1% of the genome. The second phase of ENCODE was to do that on a genome-wide level, and then ENCODE 3, which we’re talking about now, greatly, greatly expanded that, so that we’re not only looking genome-wide; we’re looking at a lot of different cell types and starting to learn a whole lot more detail of this atlas, which is essentially like a collection of maps, but now putting these maps together so that they make sense.

Interviewer: Anand Jagatia

So, can you give us a sense of the scale of this? How many of these switches have you identified in ENCODE 3?

Interviewee: Rick Myers

So, now, with ENCODE 3 we’ve identified almost a million of these switches in the human genome. There may well be more of them. These are the ones we’ve identified in a few dozen different cell types. As an aside, we also, during this phase, did a mouse ENCODE project. There are several hundred thousand identified in the mouse, and having the interplay between those two organisms’ datasets has been really helpful for interpreting the human genome, for instance.

Interviewer: Anand Jagatia

And part of ENCODE 3 was trying to figure out how these million or so switches affect gene expression when bound by different molecules in the cell, which could be proteins or even RNA. So, what are some of the molecules that you were looking at?

Interviewee: Rick Myers

A big push was on the DNA-binding proteins that are called transcription factors – proteins that bind to DNA or bind to other proteins bound to DNA and turn genes on or off or determine the levels of the genes in different cell types and at different times during development. And there are a lot of them: 1,600 of these, which means that we put a lot of our genome, and of the energy that goes into making cells, into controlling when and where all the genes get expressed. In addition to what we call transcription factors, there are other more general DNA-binding proteins that are called chromatin regulators. They play a role in what the whole genome looks like in a particular cell at any time, in terms of opening up regions of the genome for transcription or helping to keep them closed. So, that was another really important part of ENCODE because they bind to many, many more places in the genome than do transcription factors.

Interviewer: Anand Jagatia

So, what form does all of this data actually take?

Interviewee: Rick Myers

So, ENCODE 3 is the first time we actually generated the encyclopaedia, which is freely available to everyone. It has all the annotations that ENCODE has generated to date – genes, the switches, transcriptomes, epigenetics and many different cell types – and it’s organised and meant to be easy to use. So, computational biologists and many creative scientists helped to build tools to take these very large complex datasets and huge numbers of different contexts and get out what you want to look at.

Interviewer: Anand Jagatia

I mean, looking back, how do these kinds of datasets and tools that you’ve used to build this encyclopaedia compare to what you were using back in the 90s when the Human Genome Project was set up?

Interviewee: Rick Myers

It’s fun for me, at least, to look back on the history of this. When we started the Human Genome Project in 1990, the goal was to figure out one person’s or one composite human genome sequence. The truth is we really didn’t know how we were going to do it. The technology was pretty crude back then. In 1990 the internet didn’t exist, or at least we weren’t using it yet, and we were copying data onto floppy disks and providing that to people as much as we could. Of course, in the subsequent 30 years, we’ve had enormous increases in computational ability, and thank goodness we have, because the amount of data we have is millions of times more than what we had in 1990.

Interviewer: Anand Jagatia

In lots of ways, this is basic science, really. You’re trying to annotate the genome to figure out what these different elements are and what they do and how they affect gene expression, but scientists are using the data, and there are practical applications too. Have you got any examples you can share?

Interviewee: Rick Myers

Yes, so one of them is a severe gastrointestinal disorder in babies. The cause was unknown, and researchers used ENCODE data to identify particular switches that control the expression of a gene that was suspected to be involved in this terrible disorder in babies. They tested the region and, sure enough, it was involved in regulating the gene in the digestive system, and that then actually identified the cause of this disease, which is also probably related to many other similar diseases in children and even some adults. And that example is one of many where being able to understand how the gene is expressed and how the regulation of the gene is controlled has helped researchers to understand these disorders and even work their way towards not just diagnosis and prognosis but treatments.

Interviewer: Anand Jagatia

So, ENCODE 3 is now available in the form of this encyclopaedia, but things don’t stop there. What’s happening with the next phase of the project?

Interviewee: Rick Myers

ENCODE 4 is well underway. The goals in ENCODE 4 are greatly expanding the data collection, a lot more cell types are being included, and we’re really working towards analysing all 1,600 transcription factors and all the chromatin marks. Those are some of the major goals. One of the really important parts about ENCODE 4 is actually integrating all of these data types. You don’t have one little element and one protein controlling the expression of a gene. You have a massive group of components that interact to give you that specificity of cell type, when it’s going to be expressed and when it happens during development.

Host: Shamini Bundell

That was Rick Myers from the HudsonAlpha Institute for Biotechnology talking to Anand Jagatia.

Nature ENCODE 3 collection:

A collection of research articles and related content describing the Encyclopedia of DNA Elements, its datasets and tools.
