


JRA 1 (SYNTHESYS 3) – Moving from physical to digital collections


The aim of the Joint Research Activity (JRA) is to improve the quality of, and increase access to, digital collections and data held in natural history (NH) institutions’ virtual collections.

 

Objective 1: Automated data collection from digital images includes the following tasks:

-       Automatic processing (segmentation) of digital images: research and development of edge detection technology to locate and classify multiple regions of interest within images of NH specimens.

-       Automatic metadata capture: Develop software that will automatically identify properties of an image.

-       High Resolution 3D colour image acquisition: Complementary approaches (colour surface scanning, photogrammetry) will be developed in order to provide complete information (3D and colour data) of the specimen.

 

Objective 2: New methods for 3D digitisation of NH collections includes the following tasks:

-       Research on different 3D techniques: The size, shape and the different structured surfaces of specimens make it necessary to adapt the process of digitisation and to develop different standards by selecting exemplary object classes with which to optimise the process of 3D imaging. The aim is to create digital 3D objects viewable from every angle and with high depth of focus by using stacking techniques.

-       Micro-Computed Tomography (Micro-CT) for NH collections: Develop protocols and workflows for the rapid digitisation of collections (sample preparation, scanning parameters, model creation).

 

Objective 3: Crowdsourcing metadata enrichment of digital images includes the following tasks:

-       Research into crowdsourcing methodologies for NH collections: Identify which digital image data are most appropriate for use with crowdsourcing.

-       Development of website to allow crowdsourcing data capture: Engage with existing crowdsourcing sites and use the results of the pilot study (NA 3 task 1.4) to develop an online mechanism for allowing the public to engage with biodiversity research.

 

Objective 4: Access and management of an integrated European digital collection includes the following tasks:

-       Feasibility research on a “digitise on demand” (DoD) service for European NH Institutions: In order to validate the market for a DoD service, feasibility research will be completed to identify the barriers to adopting a DoD approach across Participants.

-       Open Access to captured data: Public-facing, Open Access portals will be utilised to publish the resulting SYNTHESYS content (images and metadata).

Obj 1: Automated data collection from digital images
Obj 2: New methods for 3D digitisation of NH collections
Obj 3: Crowdsourcing metadata enrichment of digital images
Obj 4: Access and management of an integrated European digital collection (with NA2)

 

Obj 1: Automated data collection from digital images

Participants: RBGE (Lead), NHM, RBGK, MNHN, CSIC, BGBM, MfN, HNHM, RBINS, RMCA, NMP

Task 1.1. Automatic processing (segmentation) of digital images

Research and develop edge detection technology to locate and classify multiple regions of interest within images of NH specimens. Using the principle that pixels in a segment are similar with respect to some characteristic or computed property (e.g. colour, intensity, or texture), develop a method to semi-automatically detect, crop and classify these regions of interest such that they can be subject to appropriate additional processing.
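The segmentation principle described in this task (pixels in a segment share a property such as intensity) can be illustrated with a minimal sketch. It assumes a greyscale image held as a list of rows, dark specimens on a light background, and a fixed threshold. It uses simple thresholding with connected-component labelling rather than true edge detection, and it is not the method developed by the project; the function name and parameters are hypothetical.

```python
from collections import deque


def find_regions(gray, threshold=128, min_pixels=4):
    """Return bounding boxes (top, left, bottom, right) of dark regions.

    `gray` is a list of rows of 0-255 intensities. Pixels darker than
    `threshold` are treated as foreground (an assumption: dark specimens
    on a light background). Regions smaller than `min_pixels` are
    discarded as noise.
    """
    height, width = len(gray), len(gray[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or gray[y][x] >= threshold:
                continue
            # Flood-fill one 4-connected segment of similar (dark) pixels.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right, count = y, x, y, x, 0
            while queue:
                cy, cx = queue.popleft()
                count += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and gray[ny][nx] < threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if count >= min_pixels:
                boxes.append((top, left, bottom, right))
    return boxes
```

Each returned box could then be cropped and passed on for classification or label processing. Real specimen images would need colour or texture cues and an adaptive threshold rather than the fixed value assumed here.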

Task 1.2. Automatic metadata capture

Develop software that will automatically identify properties of an image. These data “facets” will be captured without human intervention and provide categories of information that allow Users to search and browse virtual collections more effectively.

Specimen label data will be processed with Optical Character Recognition (OCR) software to extract the text string, and methods will be researched to improve the accuracy of OCR on handwritten labels. OCR-extracted text from handwritten labels will need further processing and validation, for example via crowdsourcing methodologies (Obj 3).

Task 1.3. High Resolution 3D colour image acquisition

Complementary approaches (colour surface scanning, photogrammetry) will be developed in order to provide complete information (3D and colour data) of the specimen. Collaborate with existing European projects such as 3DCOFORM whose focus is on Cultural Heritage digitisation. This task will develop 3DCOFORM outputs to enable their use with NH specimens.

 

Obj 2: New methods for 3D digitisation of NH collections

Participants: MfN (Lead), NHM, RBGE, NHMW, HCMR, BGBM

Task 2.1. Research on different 3D techniques

The size, shape and the different structured surfaces of specimens make it necessary to adapt the process of digitisation and to develop different standards by selecting exemplary object classes with which to optimise the process of 3D imaging. The aim is to create digital 3D objects viewable from every angle and with high depth of focus by using stacking techniques. The resultant 3D images will show all relevant details necessary for determination of the specimen, and it will be possible to zoom into every part of the specimen in every image. It is anticipated that some taxon groups or specimens will not fit the exemplary object classes, and a determination might not be possible from one 3D scan; high-resolution images attached to the 3D model to show special details (e.g. microscopic pictures of copulatory organs) will make the resultant new virtual collection a multimedia object. The results of this task will feed into the NA2 handbook of best practice and standards for 3D imaging of type specimens (Task 1.2).

Task 2.2 Micro-Computed Tomography (Micro-CT) for NH collections

Develop protocols and workflows for the rapid digitisation of collections (sample preparation, scanning parameters, model creation). The resulting models will be displayed and disseminated through a web-based framework which will allow the user to manipulate the 3D tomograms through a series of online tools that will be created.

 

Obj 3: Crowdsourcing metadata enrichment of digital images

Participants: VIZZ (Lead), NHM, RBGE, MNHN, CSIC, MfN, NHMW, NMP, VU

Task 3.1. Research into crowdsourcing methodologies for NH collections

Identify which digital image data are most appropriate for use with crowdsourcing. This work will draw on experience with other citizen science crowdsourcing efforts, such as the Zooniverse (www.zooniverse.org) project.

Work will focus on 1) the potential for crowdsourcing transcription of handwritten materials (e.g. specimen labels, catalogue cards, letters and diaries), which contain a vast and untapped wealth of historical information about the distribution, identity and origin of NH specimens, and 2) image-based identification of unidentified specimens by expert communities. The goal will be to develop a specification that supports these functions on a website that hosts NH crowdsourcing projects. This specification will include a mechanism to integrate output with existing social media sites, maximising the reach to interested parties.

Task 3.2 Development of website to allow crowdsourcing data capture

Engage with existing crowdsourcing sites and use the results of the pilot study (NA 3 task 1.4) to develop an online mechanism for allowing the public to engage with biodiversity research. As part of this work, map crowdsourced label data to Darwin Core data standard fields, ensuring that the collected data map to existing NH collections data management systems. Once these integration mechanisms exist, the resultant website will be a sustainable source of volunteers that NH institutions can engage with after the life of the project. SYNTHESYS3 will offer LifeWatch the technology developed to use as a basis for its own crowdsourcing projects. The crowdsourcing website will be monitored and User engagement tracked. This will be used to further improve User uptake. Recommendations will be made on the organisational embedding and sustainability of the website into Consortium partners’ subsequent workflows.
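The Darwin Core mapping step in this task can be sketched as a simple lookup from transcription form fields to Darwin Core term names. The form field names on the left and the function name are hypothetical, chosen only for illustration; the term names on the right are standard Darwin Core terms. This is a minimal sketch, not the mapping the project produced.

```python
# Hypothetical crowdsourcing form fields (keys) mapped to standard
# Darwin Core term names (values). The form field names are assumptions.
FORM_TO_DWC = {
    "species": "scientificName",
    "collector": "recordedBy",
    "date_collected": "eventDate",
    "place": "locality",
    "country": "country",
    "specimen_number": "catalogNumber",
}


def to_darwin_core(transcription):
    """Split one transcription into a Darwin Core record and leftovers.

    Returns (record, unmapped): `record` holds values keyed by Darwin
    Core term; `unmapped` keeps fields with no known mapping so that
    nothing a volunteer typed is silently dropped.
    """
    record, unmapped = {}, {}
    for field, value in transcription.items():
        value = value.strip()
        if not value:
            continue  # skip fields the volunteer left blank
        term = FORM_TO_DWC.get(field)
        if term:
            record[term] = value
        else:
            unmapped[field] = value
    return record, unmapped
```

Keeping unmapped fields separate, rather than discarding them, makes it easier to review free-text notes before import into a collections management system.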

 

Obj 4: Access and management of an integrated European digital collection

Participants: NHM (Lead), RBGE, UCPH, CSIC, NCB, NMP

Task 4.1. Feasibility research on a “digitise on demand” (DoD) service for European NH Institutions

In order to validate the market for a DoD service, feasibility research will be completed to identify the barriers to adopting a DoD approach across Participants. Specific activities include establishing the technical DoD infrastructure for a detailed market validation; conducting market validation with potential adopters; collecting data on the feasibility of the service; and preparing a deployment plan for the future of the DoD network. Research will require careful costing of all activities, including provision for a pay-as-you-go service to help prioritise activities. Requests for digitisation will need to be carefully matched to appropriate technologies through an automated system that incorporates the best practice guide produced in NA3 (task 1.3). Development of a DoD service has the potential to offer access to NH specimens across Europe in a highly scalable manner that can be used either to digitise all material for select groups or to fill gaps in a particular collection.

Task 4.2. Open Access to captured data

Public-facing, Open Access portals will be utilised to publish the resulting SYNTHESYS content (images and metadata). Initially the data will be made available via GBIF and supplied to the EU-funded LifeWatch project. SYNTHESYS will also develop the protocols to ensure that this output can be accessed by the EU-funded Europeana portal and the international Encyclopedia of Life project. Open licensing and comprehensive dissemination are essential to ensure that all audiences are aware of, and able to access, the NH images and metadata.

Edge detection technology, "Inselect" (Del 4.1, August 2014)

Open source software was developed that can recognise, process and annotate images containing multiple specimens (e.g. whole-drawer scans of pinned insects or slide arrays). A workshop was held in September 2014 to develop a specification and produce a functional software prototype called Inselect, which is open source and openly accessible via GitHub.

Read about Inselect in more detail in the workshop presentation and summary presentation. In conjunction with NA3 activities, Inselect staff also delivered a series of training presentations and workshops at locations such as Stockholm, Ottawa and the Smithsonian.

Optimal Automated Metadata Capture (Del 4.2, August 2015)

This aspect of the JRA focused on the development of software that is able to automatically identify properties of an image without human intervention, and capture easily searchable information that can be integrated into virtual Natural History Collections.

This research was divided into four aspects:

  1. Review of the development of tools and workflows which incorporate automatic or semi-automatic metadata capture using Optical Character Recognition (OCR).
  2. Review of the development of Natural Language Processing (NLP) for parsing OCR text into Darwin Core fields.
  3. Review of Handwritten Text Recognition (HTR) and (semi-)automatic specimen image classification.
  4. Review of automatic capture of characters including colour and shape, as well as EXIF data.

You can read the executive summary and full report here.

An additional report "Metadata extraction from digital images using machine learning" can be read here.
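The second review area above, parsing OCR text into Darwin Core fields, can be illustrated with a small rule-based sketch. It uses regular expressions as a simple stand-in for NLP and is not taken from the reports. The patterns, the assumed label conventions (ISO-style dates, a "leg." or "coll." collector prefix, a letter-plus-digits catalogue number) and the function name are all assumptions for illustration.

```python
import re

# Illustrative patterns only; real labels vary far more than this.
# Keys are standard Darwin Core term names; each pattern has one
# capturing group holding the value to extract.
PATTERNS = {
    # Assumes dates were written or normalised as YYYY-MM-DD.
    "eventDate": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    # Assumes the collector follows a "leg." or "coll." prefix.
    "recordedBy": re.compile(r"\b(?:leg|coll)\.\s*([^\n,;]+)", re.IGNORECASE),
    # Assumes catalogue numbers look like 2-5 capitals plus 4+ digits.
    "catalogNumber": re.compile(r"\b([A-Z]{2,5}\d{4,})\b"),
}


def parse_label(ocr_text):
    """Return a dict of Darwin Core terms found in one label's OCR text."""
    record = {}
    for term, pattern in PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            record[term] = match.group(1).strip()
    return record
```

Terms with no match are simply omitted, so a downstream validation step (for example crowdsourced review) can see which fields still need human attention.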

Research into Crowdsourcing metadata enrichment methodologies for Natural History Collections (Del 4.3, August 2015)

Multiple strands of research within SYNTHESYS3 investigated how crowdsourcing can be harnessed to help release data from digitised natural history specimens, primarily through transcription of label information.

Rather than developing a new crowdsourcing platform, the SYNTHESYS3 consortium adapted to the quickly-changing landscape and used Notes from Nature, hosted by Zooniverse, to pilot two major crowdsourcing transcription projects based on digitised specimens at NHM London: Miniature Lives Magnified, and Miniature Fossils Magnified.

A full report, including discussion on engagement of crowdsourcing participants and use of social media, can be found here.

Part of this work researched engaging the public in crowdsourced metadata enrichment of digitised natural history collections and the report "Tapping into the Power of the Crowd" can be read here.

An additional report on the data quality of the first series of crowdsourcing projects includes recommendations on how to build more effective projects. The report "Data quality of crowdsourced label transcription: Miniature Lives Magnified" can be read in full here.

Protocols for optimal MicroCT (Del 4.4, August 2016)

A handbook outlining best practice for optimal use of micro-computed tomography (micro-CT), specifically targeted towards imaging of Natural History specimens, was developed by researchers from HCMR (Greece), MNHN (Paris), and NHM (London). The full handbook can be read here.

This research also resulted in a publication:

Keklikoglou K, Faulwetter S, Chatzinikolaou E, Michalakis N, Filiopoulou I, Minadakis N, Panteri E, Perantinos G, Gougousis A, Arvanitidis C (2016) Micro-CTvlab: A web based virtual gallery of biological specimens using X-ray microtomography (micro-CT). Biodiversity Data Journal 4: e8740. https://doi.org/10.3897/BDJ.4.e8740

Report on feasibility of a digitisation on demand service for natural history collections (Del 4.5, August 2017)

This research examined the feasibility of implementing a "Digitisation on Demand" (DoD) model within Natural History institutions, allowing users to request digital copies of specimens or specimen data as part of an integrated service model. A survey, distributed to SYNTHESYS3 partners and other NH institutions, assessed the current readiness of European NH infrastructure to implement such a service, as well as the current extent of digitisation of collections. This research also examined potential barriers to developing DoD, and investigated whether priorities for digitisation can be based on crowdfunding initiatives. Case studies on digitisation of fossil brachiopods by the Naturhistoriska riksmuseet in Stockholm, and of label data in the Munchenberg Herbarium, were also carried out as pilots.

The full report can be read here.

A report on digitisation using automated file renaming and image processing can be read in full here.