
28th General Assembly of Teratec
October 10, 2019 - CEA's TGCC

Digital challenge for Genomics in medical sciences

Daniel Verwaerde then welcomed Jean-Marc Grognet, Genopole's Chief Executive Officer. "Jean-Marc Grognet is a great scientist. A biologist by training, he spent a good part of his career at the CEA before taking over the management of the Genopole in 2017. We have since strengthened our ties and concluded that a partnership between Genopole and Teratec would benefit both our organizations, as digital technology plays an increasingly important role in decoding the genome. That is what he wanted to remind us of."

“You introduced me as a biologist. I will therefore step out of my comfort zone by talking to you about digital technology and the benefits of this rapprochement between Genopole and Teratec, and by explaining why exploring genomics as a whole creates a renewed need for digital technology” Jean-Marc Grognet began.

"The Genopole is a vision that was developed 20 years ago to create a bio-cluster, i.e. an innovation campus that brings together the knowledge triangle (higher education and academic research actors with industrialists) around a focused theme of great importance (genomics) in the tightest possible physical area (Evry, Corbeil, Courcouronnes). And we also put something essential in the middle of the triangle, the patient.
There are two reasons for this. The first is historical: the AFM-Telethon, which fights neuromuscular diseases, had its first laboratories on the Evry site. The second is that the Centre Hospitalier Sud-Francilien (CHSF), the largest hospital in Ile-de-France outside Paris with 1,000 beds and 3,000 carers, is at the heart of our territory" explains Jean-Marc Grognet.

5,600 professionals around Genomics

Twenty years later, the Genopole innovation campus is a true success, with 5,600 people involved, including 2,400 direct jobs. They work in 16 academic laboratories under the supervision of INSERM, CNRS, CEA, and the Evry Paris Saclay and Paris-Sud Universities (1,000 people), as well as in 96 labeled companies (1,500 people) and on 29 advanced technological platforms.


These platforms give a laboratory or a company access to state-of-the-art equipment, and to the technical skills needed to operate it, that it could not necessarily afford on its own. Because the equipment is shared among all stakeholders on each site, it also facilitates the exchange of knowledge. For example, the Ecole des Mines de Paris has a materials laboratory in Evry equipped with an electron microscope, which is now accessible to Genopole member laboratories and companies for biological applications.

There are Grandes Ecoles as well: École nationale supérieure d'informatique pour l'industrie et l'entreprise (ENSIIE); Telecom Sud Paris; Institut Mines-Télécom Business School (IMT BS).

Finally, it is an economic success, since Genopole-related companies raise an average of 70 to 80 million euros per year. In the latest operation to date, Ynsect, a specialist in insect proteins created 5 years ago at the Genopole, has just raised $125 million.

Today the Genopole comprises 5 campuses, which should soon be joined by 2 others that will also house production units in addition to laboratories.

Genopole’s DNA « is » DNA

“For the past century, genomics has been a way of answering the very simple question of how traits are inherited in nature, and of explaining the noted exceptions. Why is it that the child of a blue-eyed couple may not have blue eyes? It was also known that some diseases had a genetic component, which genomics was trying to explain. The answer was found in the DNA molecule located in the nucleus of our cells”.



Measuring about 2 m on average, DNA is a very long chain, and that length determines the amount of information the molecule carries. In humans, the DNA in the nucleus of each cell is organized into 23 pairs of chromosomes, where it is associated with proteins. The DNA molecule is a sequence of 4 types of molecular motifs (or bases) that together encode all the information carried by the cell: 3 billion base pairs!

"The total DNA of an organism represents its genome. Almost all of the 70,000 billion cells that make up the human body have the same DNA. Deciphering the genome means acquiring these 3 billion pieces of information, or 3 x 10^9."

A single erroneous base in the DNA sequence is enough to alter the properties of the proteins produced, leading to hereditary anomalies or genetic diseases.

 

Sequencing the human genome

Until the 1990s we tried to understand these mechanisms, but to do so the genome first had to be sequenced. If we took the molecule and read it base by base, at a rate of 1 second per base, it would take about 100 years to read an individual's entire genome!
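
The order of magnitude quoted above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 3 billion bases and the rate of 1 second per base come from the article, while the function name and the 365.25-day year are illustrative assumptions.

```python
# Rough check of the "about 100 years" figure quoted in the article.
GENOME_BASES = 3_000_000_000           # 3 billion base pairs (from the article)
SECONDS_PER_BASE = 1.0                 # reading rate assumed in the article
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: average calendar year


def years_to_read(bases: int = GENOME_BASES,
                  seconds_per_base: float = SECONDS_PER_BASE) -> float:
    """Years needed to read `bases` bases one after another."""
    return bases * seconds_per_base / SECONDS_PER_YEAR


print(f"{years_to_read():.0f} years")  # prints "95 years"
```

The result, roughly 95 years of uninterrupted reading, is consistent with the "about 100 years" given in the presentation.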

Hence the use of massively parallel processing: the DNA is cut into a multitude of single-stranded fragments. High-speed sequencers executing complex protocols read the sequence of bases of each fragment. The fragments read are then reassembled by computer analysis: the computer reconstructs the genomes and stores them in large databases.
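
To give an idea of what "reassembled by computer analysis" means, here is a toy sketch in Python. It is not the method used by sequencing centers: it is a naive greedy merge that repeatedly joins the two fragments with the longest overlap, and it assumes error-free reads that all come from the same strand. The function names and the example reads are hypothetical.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(fragments: list[str]) -> str:
    """Greedily merge fragments by longest overlap (toy example only)."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    # Drop fragments entirely contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    if not frags:
        return ""
    while len(frags) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Join the best pair, keeping the overlapping part only once.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)
    return frags[0]


reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]  # hypothetical, unordered reads
print(assemble(reads))  # prints "ATGCGTACGTTAGC"
```

Real assemblers must also cope with sequencing errors, repeated regions and billions of reads, which is where the need for high-performance computing comes from.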

To carry out the first sequencing of the human genome, the Human Genome Project was set up in 1990. About ten laboratories around the world each dedicated themselves to sequencing a chromosome. For France, the sequence of chromosome 14 was revealed in 2001 by Genoscope (CEA laboratory located within the Genopole). The sequencing of the entire human genome was completed in 2003.

The next step is to find the meaning of this message by locating genes in the genome: specific sequences of bases, about 22,000 of them in human beings.

From 100 M$ to 100 $ in 20 years’ time

This sequencing has a cost. While the first sequencing by the Human Genome Project can be estimated at $100 million, the cost then followed a curve similar to Moore's Law, falling by a factor of about 2 every 18 months until 2007. Sequencing then cost about $10 million. “That's when the American company Illumina found a groundbreaking method to massively parallelize sequencing, which drastically reduced the cost to $1,000 per genome today”.

This means that genome sequencing is now available at a price similar to that of a complex biological analysis, which is becoming acceptable in conventional medical practice. The question is no longer whether we will reach $100 one day, but how soon: in 3 years, in 6 years?
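
The rate of "a factor of about 2 every 18 months" can be turned into a duration with a short Python sketch. The halving period and the dollar amounts are those quoted in the article; the function name is illustrative, and applying the same rate to the step from $1,000 to $100 is purely a hypothetical extrapolation, not a forecast made in the presentation.

```python
import math

HALVING_PERIOD_YEARS = 1.5  # "a factor of about 2 every 18 months"


def years_to_reach(start_cost: float, target_cost: float,
                   halving_period: float = HALVING_PERIOD_YEARS) -> float:
    """Years for a cost to fall from start_cost to target_cost
    if it is divided by 2 every `halving_period` years."""
    halvings = math.log2(start_cost / target_cost)
    return halvings * halving_period


# From $100 million to $10 million, as described in the article.
print(round(years_to_reach(100e6, 10e6), 1))   # prints 5.0
# Hypothetical: from $1,000 to $100 if the same rate applied.
print(round(years_to_reach(1_000, 100), 1))    # prints 5.0
```

A tenfold drop takes about 5 years at that rate, which falls within the 3 to 6 year range raised in the question above. By the same reasoning, falling from $100 million to $1,000 would have taken about 25 years, which illustrates how much faster the parallel sequencing methods moved.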

In addition to the drop in cost, turnaround time has also dropped drastically. The latest sequencing machines process 48 human genomes in parallel in 44 hours. “With molecular technologies, robotics, computing and Artificial Intelligence, biology is undergoing a methodological revolution. We are entering the era of Big Data and sequencing for all”. This opens the way to many new applications.
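
As a rough illustration of what 48 genomes in 44 hours represents, the following Python sketch converts the figures into a time per genome and an equivalent reading rate. It is a simplification: it counts each of the 3 billion bases only once, whereas a real sequencer reads every region several times over, so the raw machine throughput is higher. The function names are illustrative assumptions.

```python
GENOME_BASES = 3_000_000_000  # 3 billion base pairs (from the article)
GENOMES_PER_RUN = 48          # genomes processed in parallel (from the article)
RUN_HOURS = 44                # duration of one run (from the article)


def hours_per_genome(genomes: int = GENOMES_PER_RUN,
                     hours: float = RUN_HOURS) -> float:
    """Average machine time spent per genome."""
    return hours / genomes


def bases_per_second(genomes: int = GENOMES_PER_RUN,
                     hours: float = RUN_HOURS,
                     bases: int = GENOME_BASES) -> float:
    """Equivalent reading rate, counting each base once."""
    return genomes * bases / (hours * 3600)


print(f"{hours_per_genome() * 60:.0f} minutes per genome")
print(f"{bases_per_second():,.0f} bases per second")
```

Under these assumptions a genome takes less than an hour of machine time on average, at an equivalent rate of roughly 900,000 bases per second.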

The first is exploring the diversity of the living world by better understanding living species: this is genomics. The second is discovering, through their DNA, complex ecosystems formed by a whole community of organisms: this is metagenomics. The third is understanding the mechanisms of living organisms by determining the function of the genes identified in genomes: this is functional genomics. Finally, we will better understand human beings and their health: this is medical genomics.

Genomics calls for the comeback of great explorers

Genomics also allows us to see what we did not know was there. For example, the sequencing of the hundreds of thousands of water samples that the Tara Oceans expedition took from all the world's seas identified 117 million genes, more than half of which were previously unknown. We can thus identify species that we had never seen before! Similarly, sequencing has completely changed our view of the intestinal microbiota. From a few species of bacteria identified 40 years ago, the count has grown to several hundred today, and many more remain to be discovered!


Tailoring the medical approach

Sequencing each individual's genome will revolutionize medical practice by personalizing diagnosis and treatment. But for this to be effective, the information contained in the genome must be readable and interpretable in real time, hence the growing need for computing power and performance. “The idea is to be able to process heterogeneous data flows more and more quickly using algorithms, particularly Artificial Intelligence and bioinformatics tools, in a context of increasingly interdisciplinary medical practice, in order to make the right diagnosis”.

Analyzing a tumor's genetic characteristics and knowing its genomic rearrangements can give physicians valuable information to guide treatment. To do this, it will be necessary to work upstream on the acquisition, interpretation, integration and presentation of data, so as not to be overwhelmed by massive and useless information.

This genomic, or personalized, medicine is divided into two main branches. First, genome analysis will make it possible to detect precisely an individual's predispositions to pathologies from their genetic identity card, and to practice preventive medicine by giving behavioral advice or even prescribing preventive treatments to reduce the risks.

Genome analysis will also make it possible to detect rare diseases by making the right diagnosis the first time. In the case of a proven disease, it will make it possible to evaluate different treatments and choose the one that will be most effective and fastest, with the fewest side effects: this is curative medicine. “In the last two years or so in oncology, it has become possible not simply to treat a kidney or digestive pathology, but to target a genetic mutation with a dedicated drug regardless of the location of the tumor. This is a radical change in the vision of treatment”.

All this is at the heart of the major France Médecine Génomique 2025 plan, partly born at Genopole. The plan aims to provide France with a large number of platforms dedicated to sequencing. These will send the information they obtain to a Data Analyzer Collector (CAD), which will process and interpret it for each patient and feed a research centre, Crefix, which will validate procedures and devices.

Genomics and Big Data

Unlike many Big Data applications where we process some information on a very large number of individuals, in genomic medicine we process a very large number of descriptors (genetic code, mutations...) on a very small number of individuals.

“It is estimated that we will have 10^21 bytes of information to process per individual, to be multiplied by several hundred thousand patients per year. In addition to the processing power, this will also require very large storage resources of about 10 ExaBytes per year”.

This also raises questions. Acquisition is no longer an issue, but the data must remain confidential and yet be attached to a single medical file in order to be useful. How do we guarantee this confidentiality? To whom do these data belong: to the patient, to the doctor who prescribed the analysis, to the sequencing center, or to the community and therefore to the State? Who will be responsible for keeping them, and for how long, given that today's information could be useful to treat a pathology in 30 years' time? Who will interpret them, and who will reinterpret them as science evolves?

No genomics without digital technology

"This is why we have moved closer to Teratec. As you have understood, the medicine of the future will be based on genomics, which is why we need ever more efficient computing technology with secure processing of very large volumes of data, while guaranteeing their preservation over the very long term. These are concerns at the heart of Teratec's activities. We are also planning to create the world's first ''digital genomics'' institute to meet these needs, because our territory has some of the most powerful sequencing centers operating in Europe as well as structures such as Teratec, with very strong skills in high-performance computing (HPC)" concluded Jean-Marc Grognet.

Following a question from Christian Saguez, Jean-Marc Grognet clarified that studying the genome would not address everything in personalized medicine. “Genomics alone will not provide the answer to everything. Over the past two decades, it has been discovered that not everything lies in the genome. Contrary to what we thought, our genome is not immutable: it can undergo changes through our life experience and our exposure to external factors (ionizing radiation, chemicals...), and these changes could be transmitted to later generations”.

This goes against the dogma inherited from the Lamarck/Darwin debate, in which Lamarck defended the inheritance of acquired characteristics while Darwin held that heredity is immutable. “We thought Lamarck was wrong, but now we realize that the answer is much less clear-cut. It has been shown, for example, that a mouse can transmit the fear resulting from a trauma to its offspring and to their offspring. There is therefore a transmission of acquired characteristics. It was also discovered that viruses could integrate into the genome and thus be transmitted to a subsequent generation. In addition to genomics, tomorrow's medicine will therefore incorporate other parameters such as life experience and exposure to external factors. This implies the processing of multiple data, both personal and collective”.

Following a question from Karim Azoum on the use of Machine Learning and AI in Genopole's work, Jean-Marc Grognet clarified: “I don't know if we can talk about Artificial Intelligence, but we do have a whole activity devoted to mapping the interaction networks between genes. This part is extremely important because the interactions of genes within networks play a role in pathologies. Other institutions are also working on the prediction of pathologies”.
