Combining image-based serology and machine learning to screen for SARS-CoV-2
From our Euro-BioImaging Finnish Advanced Microscopy Node, Lassi Paavolainen and colleagues have recently published their work on a mini-immunofluorescence assay to test for SARS-CoV-2 antibodies in patient blood samples. This publication is not only an advance in machine-learning applications for low-biosafety disease testing, but is also at the forefront of FAIR data sharing. Mobilized as part of the BY-COVID project, the entire dataset containing raw images and segmentation masks is now available for anyone to view and reuse on the BioImage Archive.
In the recently published article ‘Image-based and machine learning-guided multiplexed serology test for SARS-CoV-2’, the international and multidisciplinary team around Lassi Paavolainen, Academy Research Fellow at the Institute for Molecular Medicine Finland, Finnish Advanced Microscopy Node of Euro-BioImaging, describes a miniaturized immunofluorescence assay for measuring antibody response in patient blood samples. The assay employs a custom neural network-based image analysis pipeline to simultaneously measure different immunoglobulins against different viral antigens, which are expressed through transfection. This approach allows the assay to be run in an automated, high-throughput manner while still being carried out in a low-biosafety environment. The method was developed and validated using SARS-CoV-2 as a model pathogen, but is broadly applicable to other pathogens and can be rapidly adapted to emerging ones.
Making infectious disease data available
This focus on SARS-CoV-2 and infectious diseases in general makes this study a fantastic example of the data that the Horizon Europe-funded BY-COVID project was designed to make accessible and usable.
"Within the BY-COVID project, I'm in charge of bringing COVID-19 and infectious diseases related image datasets into public repositories. I support individual dataset owners in curating and depositing their data."
Isabel Kemmer, Euro-BioImaging's FAIR Image Data Steward
As a result of Isabel's work in the BY-COVID project, the complete image dataset of the mini-immunofluorescence assay is now shared with the public (https://www.ebi.ac.uk/biostudies/BioImages/studies/S-BIAD1076). Since April 1, 2024, all of the raw images as well as the segmentation masks have been available for reuse at the BioImage Archive, a free and public online repository that stores and disseminates biological images captured through any imaging modality. “I was immediately excited when my colleague Aastha came to my office one day and told me she’d attended a talk by Lassi Paavolainen about their immunofluorescence assay. I knew right away that this rich dataset would be really valuable if it were publicly available,” explains Isabel Kemmer. “I was all the more delighted that Lassi was also on board with sharing the data, so we worked together on the submission.” The collaboration was very much appreciated and made it possible for the data to be reusable. As Lassi Paavolainen states, “I would not have deposited the data without the involvement of Euro-BioImaging.”
The data behind the assay
With a total size of over 12 TB and more than 650,000 individual files, the dataset is considerably large. In fact, in terms of number of files, it was the fourth largest dataset on the BioImage Archive at the time of release. These files comprise all the raw images from the study and the segmentation masks that underlie the machine learning model. Together with the BioImage Archive team, Lassi and Isabel planned the submission of this large dataset. Isabel comments, “It was great that Lassi's team already had their dataset well organized, so we could spend more time on other aspects of FAIR data instead of looking for missing files.” Indeed, human resources are often the limiting factor in making data truly FAIR, even more so when a data steward is not involved, as the time required to share data without a robust and automated pipeline is often underestimated.
“The most time consuming step in data deposition was designing the deposition and then putting all metadata together. Isabel's biggest help was initially filling the metadata documents.”
Lassi Paavolainen, Academy Research Fellow at the Institute for Molecular Medicine Finland
The metadata for the images is described according to the REMBI guidelines (REcommended Metadata for Biological Images). REMBI defines eight categories of metadata and suggests subcategories of information to include within each one. These cover everything from information about the study as a whole and the people involved, to specifications of the biological sample and specimen, as well as details about the imaging method, acquisition, and analysis parameters.
Since 2022, the BioImage Archive has used the REMBI schema for its submissions, and depositors are asked to provide this information. Beyond this, most of the metadata about the individual images is contained in the so-called file list, a table listing all deposited files along with detailed metadata. In the case of the presented assay, this includes information on the well identities, antigens, antibodies, treatments and much more.
"The more detailed this information is at the individual file level, the more you can reuse the files as separate entities and the dataset as a whole"
Isabel Kemmer, Euro-BioImaging's FAIR Image Data Steward
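To make the value of file-level metadata more concrete, here is a minimal sketch of how a file list could be used to pick out a subset of images. It assumes a tab-separated table with one row per file; the column names (`Files`, `well`, `antigen`, `antibody`), the file paths, and the values are hypothetical and are not taken from the actual deposition.

```python
import csv
import io

# Hypothetical excerpt of a file list; the column names and values are
# illustrative assumptions, not copied from the actual deposition.
EXAMPLE_FILE_LIST = (
    "Files\twell\tantigen\tantibody\n"
    "plate1/A01_raw.tif\tA01\tS\tIgG\n"
    "plate1/A02_raw.tif\tA02\tN\tIgG\n"
    "plate1/A03_raw.tif\tA03\tS\tIgA\n"
)


def select_files(file_list_text, **criteria):
    """Return paths of files whose metadata matches every column=value pair."""
    reader = csv.DictReader(io.StringIO(file_list_text), delimiter="\t")
    return [
        row["Files"]
        for row in reader
        if all(row.get(column) == value for column, value in criteria.items())
    ]


# Select only the files annotated with one antigen and one antibody class.
print(select_files(EXAMPLE_FILE_LIST, antigen="S", antibody="IgG"))
```

Because every row carries its own metadata, a reuser could in principle select just the files relevant to a question without downloading or opening the rest of the dataset.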
Designed for artificial intelligence
In addition to the images, the current deposition also includes the segmentation masks for cells and nuclei. Not only does this increase the reproducibility of the published study, but for some reuse cases, these segmentation masks may be even more relevant than the images themselves. For training and developing new machine-learning models, high-quality annotations on images are key. However, access to such data is often hindered by the lack of standards for data sharing. To solve this issue, the MIFA recommendations have been developed by the imaging community, based on a workshop initiated by the Horizon Europe-funded AI4Life project.
MIFA provides guidelines for Metadata, Incentives, Formats, and Accessibility for bioimage annotations with the goal of accelerating the development of AI tools for bioimage analysis by facilitating access to high-quality training data. Annotated datasets submitted to the BioImage Archive can now be described with metadata according to MIFA. Isabel says: “We were one of the first datasets to use the new annotations feature when we submitted our data. It was great working with the BioImage Archive team to beta test this feature.” In addition, it is now possible to make a stand-alone submission where annotations have been created for an existing dataset.
The current serology dataset does not stop at the segmentation masks, but also includes cross-links to materials on GitHub and Zenodo. GitHub provides access to the code of the pipeline used for training, prediction, and evaluation of the machine learning models, while Zenodo holds the extracted single-cell features for training and testing machine learning models. This makes it easy to reproduce the study and allows for mutual findability and reusability, a further step towards not only FAIR bioimage data but FAIR publications.
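As a rough illustration of why segmentation masks are useful as annotations, the sketch below counts the objects in a toy mask and measures their areas. It assumes a label-image encoding in which 0 is background and each positive integer identifies one cell or nucleus; whether the deposited masks use this encoding is an assumption, and the toy data and the `object_areas` helper are hypothetical.

```python
from collections import Counter

# Toy label mask: 0 is background, each positive integer marks one object.
# Whether the deposited masks use this encoding is an assumption here.
TOY_MASK = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 2],
    [3, 3, 0, 0, 2],
]


def object_areas(mask):
    """Map each object label to its area in pixels, ignoring background (0)."""
    counts = Counter(label for row in mask for label in row)
    counts.pop(0, None)  # drop the background entry if present
    return dict(counts)


areas = object_areas(TOY_MASK)
print(len(areas), "objects with areas", areas)
```

Per-object information of this kind is what a model developer would typically need as ground truth when training or evaluating a segmentation method.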
What’s next
"We are currently discussing extending the available annotations to include cases where human experts have validated the data," says Isabel. In line with FAIR principles, the BioImage Archive allows for versioning of different sets of annotations. In this way, the provenance of the data is preserved by keeping track of previous versions. "I'd love to keep working with Lassi’s team and possibly create depositions for follow-up data from Lassi’s lab," concludes Isabel, adding, "I’m also really excited to take on more projects beyond the BY-COVID project with a wider range of topics." One step at a time, the team at Euro-BioImaging is working hard to make the world of bioimaging more FAIR every day.