Recursion released a preprint on applying deep-learning-driven analysis of cellular morphology to develop a scalable “phenomics” platform. The preprint demonstrates the capabilities of Recursion’s platform to model complex immune biology and screen for new therapeutics.

INTRODUCTION

In severe cases, COVID-19 ignites a chronic state of inflammation, culminating in acute respiratory distress syndrome (ARDS).

A subset of patients with severe COVID-19 pneumonia progress to ARDS, from which recovery with existing options such as mechanical ventilation is often unachievable. At present, over 690,000 people have died worldwide as a result of COVID-19 complications.

Recursion, a digital biology company industrializing drug discovery, set out to use its immunology platform to model late-stage COVID-19 and screen for therapeutics that can treat it. A high-dimensional cellular model of severe, inflammatory COVID-19 was treated with a library of approved drugs to find compounds that can relieve the damaging, hyper-inflammatory state. The experiments produced 70,384 images and 409 GB of data. Together with the earlier release of the viral infection dataset, RxRx19b represents the only morphological dataset for COVID-19 ARDS.

THE BIOLOGY

Patient-informed morphological dataset

Recursion modeled the cytokine storm associated with late-stage COVID-19 in endothelial cells by applying cocktails of circulating proteins that mirror those found in severe COVID-19 patients and in healthy controls. Cocktails were prepared to mimic the concentrations of circulating soluble factors measured in healthy and COVID-19-infected patients (Liu et al., 2020). Each cocktail was applied to vascular endothelial cells (HUVEC), and the resulting morphological changes were observed. The resulting high-dimensional phenotype was screened against 1,856 FDA-approved drugs and tool compounds, including benchmark compounds currently being clinically evaluated for COVID-19.

RxRx19b is the first morphological dataset representing inflammatory effects and potential treatments in the context of COVID-19 ARDS. Through RxRx19b, researchers in the scientific community have access to both the images and the corresponding deep learning embeddings, which they can analyze or apply to their own experiments. The embeddings are 128-dimensional vectors, one per image, and come from Recursion’s internal model, which was trained on additional cell types and perturbation modalities not released here or elsewhere. We provide these embeddings so that researchers without significant compute resources can still explore this data and uncover insights from it. Researchers can use the data to further demonstrate how high-content imaging can be used for compound efficacy screening, and we hope new insights can be derived from this dataset.

THE DATA

RxRx19b consists of 70,384 fluorescence microscopy images and their deep learning embeddings. Each image is 2048x2048 pixels with 6 channels (2048x2048x6).
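For a rough sense of scale, the raw pixel volume can be estimated from these figures. The sketch below assumes one byte per pixel per channel (the images are distributed as 8-bit PNGs) and that each of the 70,384 images carries all six channels. Both are assumptions about how the image count is defined, and the function name is illustrative only.

```python
def uncompressed_bytes(n_images: int, height: int, width: int,
                       channels: int, bytes_per_sample: int = 1) -> int:
    """Raw pixel payload in bytes, ignoring file headers and compression."""
    return n_images * height * width * channels * bytes_per_sample


# Figures quoted for RxRx19b; 8-bit samples are assumed (1 byte each).
per_image = uncompressed_bytes(1, 2048, 2048, 6)
total = uncompressed_bytes(70_384, 2048, 2048, 6)

print(f"per image: {per_image / 2**20:.0f} MiB")  # 24 MiB
print(f"total:     {total / 1e12:.2f} TB")         # 1.77 TB
```

Under these assumptions the raw pixels come to about 1.77 TB, several times the roughly 409 GB compressed size of the dataset.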

RxRx19b is the first public dataset that demonstrates the rescue of morphological effects of the COVID-19-associated cytokine storm. Results and conclusions drawn from the in vitro experiments and from targeted, hypothesis-driven research will contribute to the growing body of scientific data in the fight against COVID-19.
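One simple way to think about "rescue" in embedding space is to project each treated well onto the axis running from the cytokine-storm condition to the healthy condition. The sketch below illustrates that idea only; it is not Recursion's scoring method, and the function names and toy 3-dimensional vectors (standing in for the 128-dimensional embeddings) are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def rescue_score(treated: Vector, storm: Vector, healthy: Vector) -> float:
    """Scalar projection of a treated embedding onto the storm->healthy axis.

    0.0 means the treated well sits at the storm centroid and 1.0 means it
    sits at the healthy centroid, measured along this one axis only.
    """
    axis = [h - s for h, s in zip(healthy, storm)]
    offset = [t - s for t, s in zip(treated, storm)]
    norm_sq = sum(a * a for a in axis)
    if norm_sq == 0:
        raise ValueError("storm and healthy centroids coincide")
    return sum(o * a for o, a in zip(offset, axis)) / norm_sq


# Toy data: real embeddings would be 128-dimensional.
storm_centroid = mean_vector([[0.0, 0.0, 1.0], [0.0, 0.0, 3.0]])    # [0, 0, 2]
healthy_centroid = mean_vector([[4.0, 0.0, 2.0], [4.0, 0.0, 2.0]])  # [4, 0, 2]
partly_rescued = [3.0, 1.0, 2.0]

score = rescue_score(partly_rescued, storm_centroid, healthy_centroid)
print(score)  # 0.75
```

A projection like this ignores movement away from the axis, so a compound that shifts cells toward some third, unrelated phenotype could still score well; it is best read as a first-pass filter.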

The RxRx19b dataset is closely related to datasets previously released by Recursion, although there are some key differences. For ease of comparison and understanding, we provide the following table highlighting the primary differences:

| Dataset | RxRx1 | RxRx2 | RxRx19a | RxRx19b | RxRx3 |
|---|---|---|---|---|---|
| Release Date | June 2019 | August 2020 | April 2020 | August 2020 | January 2023 |
| Cell Types | HUVEC, RPE, U2OS, HepG2 | HUVEC | HRCE, Vero | HUVEC | HUVEC |
| Stains (Channels) | Hoechst, ConA, Phalloidin, Syto14, MitoTracker, WGA | Hoechst, ConA, Phalloidin, Syto14, MitoTracker, WGA | Hoechst, ConA, Phalloidin, Syto14, WGA | Hoechst, ConA, Phalloidin, Syto14, MitoTracker, WGA | Hoechst, ConA, Phalloidin, Syto14, MitoTracker, WGA |
| Plate Density | 384-well | 1536-well | 1536-well | 1536-well | 1536-well |
| Imaging Sites per Well | 2 | 4 | 4 | 1 | 1 |
| Perturbations Evaluated | 1,138 siRNAs | 434 soluble factors at 6 concentrations | 1,672 small molecules at 6+ concentrations; three viral conditions (active virus, irradiated, mock) | 1,856 small molecules at 4-6 concentrations in three COVID-19-associated cytokine storm conditions (severe storm, healthy, and no cytokines) | 17,063 CRISPR/Cas9-mediated gene knockouts; 1,674 compounds at 8 concentrations each |
| Total Number of Images | 125,510 | 131,953 | 305,520 | 70,384 | ~2.2M |
| Image Dimension | 512x512x6 | 1024x1024x6 | 1024x1024x5 | 2048x2048x6 | 2048x2048x6 |
| Compressed Dataset Size | ~46GB | ~185GB | ~450GB | ~409GB | ~83,100GB |

LICENSE

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

Note that this license applies only to the RxRx19b dataset, not RxRx1.

DOWNLOAD

The dataset is available in three different parts so you can download only the part(s) that you are interested in. The deep learning embeddings provide an easy way to explore the dataset without downloading the images.


Metadata

A CSV containing the experiment design, e.g. what cell type and treatment are in each well. The schema is provided in the README.
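A minimal sketch of reading the metadata with Python's standard `csv` module and counting wells per condition. The column names (`well_id`, `cell_type`, `disease_condition`, `treatment`) and the sample rows are hypothetical placeholders; the actual schema is the one given in the README.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the metadata CSV; the real column
# names are defined in the dataset README and may differ.
SAMPLE_METADATA = """well_id,cell_type,disease_condition,treatment
P1_A01,HUVEC,severe storm,compound-x
P1_A02,HUVEC,severe storm,
P1_A03,HUVEC,healthy,
P1_A04,HUVEC,no cytokines,
"""


def count_by_column(csv_file, column: str) -> Counter:
    """Count rows per distinct value of `column` in a CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


counts = count_by_column(io.StringIO(SAMPLE_METADATA), "disease_condition")
print(counts)
```

For the downloaded file, pass `open(path, newline="")` in place of the `io.StringIO` object and substitute the column name documented in the README.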

Deep Learning Embeddings

A large CSV file containing the deep learning embedding for each image.
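Because the embeddings file is large, it can help to stream it row by row rather than load it all at once. The sketch below assumes, hypothetically, a header row, an image identifier in the first column, and the embedding values in the remaining columns; check the README for the actual layout. The toy example uses 4 dimensions in place of 128.

```python
import csv
import io
from typing import Iterator, List, TextIO, Tuple


def iter_embeddings(csv_file: TextIO, dim: int) -> Iterator[Tuple[str, List[float]]]:
    """Yield (image_id, vector) pairs one row at a time.

    Assumes a header row, an identifier in the first column, and `dim`
    float values after it (a hypothetical layout).
    """
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row if present
    for row_no, row in enumerate(reader, start=1):
        if not row:
            continue  # tolerate blank lines
        image_id, values = row[0], [float(v) for v in row[1:]]
        if len(values) != dim:
            raise ValueError(
                f"data row {row_no}: expected {dim} values, got {len(values)}"
            )
        yield image_id, values


# Toy file with 4-dimensional vectors; use dim=128 for the real embeddings.
sample = io.StringIO("image_id,f0,f1,f2,f3\nimg-1,0.1,0.2,0.3,0.4\nimg-2,1,2,3,4\n")
embeddings = dict(iter_embeddings(sample, dim=4))
print(len(embeddings), embeddings["img-2"])  # 2 [1.0, 2.0, 3.0, 4.0]
```

Validating the vector length on every row catches a mismatched column layout early, before any downstream analysis runs on misaligned values.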

Images

70,384 8-bit PNG 2048x2048 images. The directory structure is explained in the README.
