
TERATEC 2018 Forum
Workshop 7 - Wednesday, June 20 from 14h00 to 17h30

Deep Learning by doing

Towards recognizing the world's flora
Alexis JOLY, research scientist at INRIA, scientific lead of the Pl@ntNet platform

Abstract: Automated identification of plants and animals has improved considerably in the last few years, in particular thanks to recent advances in deep learning. In 2017, a challenge on 10,000 plant species (PlantCLEF) resulted in impressive performance, with accuracy values reaching 90%. Another challenge (iNat-2017@CVPR), spanning over 8000 categories of plants, animals, and fungi, also reported impressive results, with accuracy values higher than 81%. One of the most popular plant identification applications, Pl@ntNet, now covers 15K plant species. It has millions of users all over the world and already has a strong societal impact in several domains, including education, landscape management and agriculture. The big challenge now is to train such systems at the scale of the world's biodiversity. The natural world is heavily imbalanced: some species are much more abundant and easier to photograph than others. Consequently, the performance of automated identification tools is also highly imbalanced. They perform well on average, but many species belonging to the long tail of the data distribution are poorly recognized or not recognized at all. This is particularly true for the kingdom Plantae, which spans hundreds of thousands of species, and even more so for insects, which span millions of species. In this talk, we will report our first investigations towards training a convolutional neural network at the scale of the world's flora. To that end, we built a training set of about 10 million images illustrating 275K species. Training a convolutional neural network on such a large dataset can take up to several months on a single node equipped with four recent GPUs. Moreover, selecting the best-performing architecture and optimizing the hyper-parameters often requires training several such networks.
Overall, this becomes a highly compute-intensive task that necessarily has to be distributed on a large HPC infrastructure. So far, we have experimented with two such infrastructures, with varying degrees of success. The first was the OpenPOWER prototype "Ouessant", part of the technology survey initiative from GENCI, hosted at IDRIS and composed of 12 IBM Minsky nodes, each equipped with 4 NVIDIA P100 GPUs. The deep learning framework tested on it was CAFFE, extended with the Distributed Deep Learning (DDL) library developed by IBM and available within the PowerAI stack. The second infrastructure is a partition of the BULL Sequana X1000 supercomputer "Joliot-Curie", purchased by GENCI, hosted at CEA and composed of 1656 nodes, each equipped with two 24-core Intel Skylake 8168 processors at 2.7 GHz. The deep learning framework used on that machine was Intel's fork of CAFFE, which is dedicated to improving performance on CPUs. In this talk, we will report on our experience with these two platforms and present the latest progress made at the time of the presentation.
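The remark in the abstract that identification tools "perform well on average" while long-tail species go unrecognized can be illustrated with a small sketch. This is not taken from the talk: the function name, the species labels and the toy data are hypothetical, chosen only to contrast overall accuracy with the mean of per-species accuracies.

```python
from collections import Counter


def overall_and_per_species_accuracy(y_true, y_pred):
    """Return (overall accuracy, mean of per-species accuracies)."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    overall = sum(correct.values()) / len(y_true)
    # Each species counts equally here, however few images it has.
    per_species = sum(correct[s] / n for s, n in total.items()) / len(total)
    return overall, per_species


# Hypothetical toy data: one abundant species and two rare ones.
y_true = ["common"] * 96 + ["rare_a"] * 2 + ["rare_b"] * 2
# A degenerate classifier that always answers with the abundant species.
y_pred = ["common"] * 100

overall, per_species = overall_and_per_species_accuracy(y_true, y_pred)
print(f"overall accuracy:          {overall:.2f}")
print(f"mean per-species accuracy: {per_species:.2f}")
```

On this toy data the overall accuracy is high because the abundant species dominates the test set, while the per-species mean is much lower since both rare species are never recognized.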

Biography: Alexis Joly is a computer scientist at Inria working on multimedia information retrieval challenges, with related interests in representation learning, computer vision and data management. He received his PhD degree in Computer Science in 2005 from the University of La Rochelle. He served on the steering boards of several European projects (CHORUS+ coordination action, MUSCLE Network of Excellence, VITALAS & GLOCAL Integrated Projects) and took part in many national initiatives related to audiovisual archives, user-generated web content and biodiversity informatics. Since 2011, he has been co-leader of the Pl@ntNet project, which develops a platform with millions of users dedicated to automated plant identification and monitoring. Since 2014, he has been the PI of the LifeCLEF international research platform dedicated to the computer-assisted identification of living organisms (involving tens of research groups worldwide). Recently, he co-edited a Springer book on Multimedia Tools and Applications for Environmental & Biodiversity Informatics (involving about 50 contributors from all over the world). More generally, he regularly serves on scientific program and organising committees for international journals (Ecological Informatics, TPAMI, Trans. on Multimedia, CVIU, MTAP) and conferences (ACM Multimedia, ACM ICMR, CVPR, CLEF). He has co-authored a large number of scientific publications in these venues.

 

Register now and get your badge here

  • TERATEC Forum is strictly reserved for professionals.
  • Participation in the exhibition, conferences and workshops is free (subject to seat availability).
  • Online registration is mandatory to attend the exhibition, conferences or workshops.
  • As the Vigipirate security plan has been raised to its highest level, you must register online in advance and bring an identity card in order to participate in the TERATEC Forum.
  • The badge is free of charge and gives you access to all TERATEC Forum events.

For any other information regarding the workshops, please contact:

Jean-Pascal JEGU
Tel : +33 (0)9 70 65 02 10
jean-pascal.jegu@teratec.fr
Campus TERATEC
2, rue de la Piquetterie
91680 BRUYERES-LE-CHATEL
France


 
