Combining image-based serology and machine learning to screen for SARS-CoV-2

Published June 12, 2024

Making infectious disease data available

This focus on SARS-CoV-2 and diseases in general makes this study a fantastic example of the kind of data that the Horizon Europe-funded BY-COVID project was designed to make accessible and usable.

"Within the BY-COVID project, I'm in charge of bringing COVID-19 and infectious diseases related image datasets into public repositories. I support individual dataset owners in curating and depositing their data."
Isabel Kemmer, Euro-BioImaging's FAIR Image Data Steward

As a result of Isabel's work in the BY-COVID project, the complete image dataset of the mini-immunofluorescence assay is now shared with the public (https://www.ebi.ac.uk/biostudies/BioImages/studies/S-BIAD1076). Since April 1, 2024, all of the raw images and segmentation masks have been available for reuse at the BioImage Archive, a free, public online repository that stores and disseminates biological images captured through any imaging modality. “I was immediately excited when my colleague Aastha came to my office one day and told me she’d attended a talk by Lassi Paavolainen about their immunofluorescence assay. I knew right away that this rich dataset would be really valuable if it were publicly available,” explains Isabel Kemmer. “I was all the more delighted that Lassi was also on board with sharing the data, so we worked together on the submission.” The collaboration was very much appreciated and made it possible for the data to be reused. As Lassi Paavolainen states, “I would not have deposited the data without the involvement of Euro-BioImaging.”

The data behind the assay

With a total size of over 12 TB and more than 650,000 individual files, the dataset is large. In fact, measured by number of files, it was the fourth largest dataset on the BioImage Archive at the time of release. These files comprise all the raw images from the study and the segmentation masks that underlie the machine learning model. Lassi and Isabel planned the submission of this large dataset together with the BioImage Archive team. Isabel comments, “It was great that Lassi's team already had their dataset well organized, so we could spend more time on other aspects of FAIR data instead of looking for missing files.” Human resources are often the limiting factor in making data truly FAIR, even more so when no data steward is involved, because the time required to share data without a robust, automated pipeline is often underestimated.

“The most time-consuming step in data deposition was designing the deposition and then putting all metadata together. Isabel's biggest help was initially filling the metadata documents.”
Lassi Paavolainen, Academy Research Fellow at the Institute for Molecular Medicine Finland

The metadata for the images is described according to the REMBI guidelines (REcommended Metadata for Biological Images). REMBI defines eight categories of metadata and suggests subcategories of information to include within each one. These cover everything from information about the study as a whole and the people involved, to specifications of the biological sample and specimen, as well as details about the imaging method, acquisition, and analysis parameters.

Since 2022, the BioImage Archive has implemented the REMBI schema for its submissions, and depositors are asked to provide this information. Beyond that, most of the metadata about the individual images is contained in the so-called file list, a table listing all deposited files along with detailed metadata. In the case of the presented assay, this includes information on the well identities, antigens, antibodies, treatments and much more.
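To illustrate why file-level metadata matters for reuse, here is a minimal sketch of how a tab-separated file list could be filtered to pick out a subset of images. The column names ("Files", "well", "antigen", "antibody"), the file paths and the values are hypothetical and chosen only for illustration; they are not taken from the actual deposition.

```python
import csv
import io

# Hypothetical excerpt of a file list. Column names, paths and values
# are illustrative assumptions, not the real S-BIAD1076 file list.
FILE_LIST_TSV = (
    "Files\twell\tantigen\tantibody\n"
    "plate1/A01_s1.tif\tA01\tS\tIgG\n"
    "plate1/A02_s1.tif\tA02\tN\tIgG\n"
    "plate1/A03_s1.tif\tA03\tS\tIgA\n"
)


def select_files(tsv_text, **criteria):
    """Return the paths of files whose metadata match every column=value pair."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row["Files"]
        for row in reader
        if all(row.get(column) == value for column, value in criteria.items())
    ]


# Select only the images for one antigen/antibody combination.
print(select_files(FILE_LIST_TSV, antigen="S", antibody="IgG"))
```

Because every file carries its own metadata row, a subset like this can be selected without opening a single image, which is what makes the individual files reusable as separate entities.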

"The more detailed this information is at the individual file level, the more you can reuse the files as separate entities and the dataset as a whole"
Isabel Kemmer, Euro-BioImaging's FAIR Image Data Steward

Designed for artificial intelligence

In addition to the images, the current deposition also includes the segmentation masks for cells and nuclei. Not only does this increase the reproducibility of the published study, but for some reuse cases, these segmentation masks may be even more relevant than the images themselves. For training and developing new machine-learning models, high-quality annotations on images are key. However, access to such data is often hindered by the lack of standards for data sharing. To solve this issue, the imaging community has developed the MIFA recommendations, based on a workshop initiated by the Horizon Europe-funded AI4Life project.

MIFA provides guidelines for Metadata, Incentives, Formats, and Accessibility for bioimage annotations, with the goal of accelerating the development of AI tools for bioimage analysis by facilitating access to high-quality training data. Providing metadata according to MIFA is now part of submitting annotated datasets to the BioImage Archive. Isabel says: “We were one of the first datasets to use the new annotations feature when we submitted our data. It was great working with the BioImage Archive team to beta test this feature.” In addition, it is now possible to make a stand-alone submission where annotations have been created for an existing dataset.

The current serology dataset does not stop at the segmentation masks: it also includes cross-links to materials on GitHub and Zenodo. GitHub provides access to the code of the pipeline used for training, prediction, and evaluation of the machine learning models, while Zenodo holds the extracted single-cell features for training and testing machine learning models. This makes it easy to reproduce the study and allows for mutual findability and reusability, a further step towards not only FAIR bioimage data but FAIR publications.

What’s next

“We are currently discussing extending the available annotations to include cases where human experts have validated the data,” says Isabel. In line with FAIR principles, the BioImage Archive allows for versioning of different sets of annotations. In this way, the provenance of the data is preserved by keeping track of previous versions. “I'd love to keep working with Lassi’s team and possibly create depositions for follow-up data from Lassi’s lab,” concludes Isabel, adding, “I’m also really excited to take on more projects beyond the BY-COVID project with a wider range of topics.” One step at a time, the team at Euro-BioImaging is working hard to make the world of bioimaging more FAIR every day.
