Recursion released a preprint on applying deep-learning-driven analysis of cellular morphology to develop a scalable “phenomics” platform. The preprint demonstrates the capabilities of Recursion’s platform to model complex immune biology and screen for new therapeutics.

INTRODUCTION

Cellular response and signaling within the immune microenvironment

High-dimensional morphological profiling captures cellular changes in response to external stimuli such as cytokines, chemokines, growth factors, antibodies and a wide range of other perturbations. Recursion built a comprehensive library of 434 immune stimulants and embarked on a large-scale effort to profile cellular response, both to demonstrate that cellular images are sufficient to cluster immune perturbations by function and to rapidly employ these states in high-throughput drug screening applications.


Since 2013, Recursion has been generating the industry’s largest fully-relatable dataset of biological images representing human disease biology and pharmaceutical chemistry.

THE BIOLOGY

Morphological analysis of cellular response to immune perturbation

Cells constantly sense their microenvironment to maintain homeostasis and respond to infection and malignancy. In response, cells communicate locally and systemically with a highly diverse array of secreted factors. Both underfunctioning and overfunctioning abnormalities in this process drive a wide range of diseases, and intervention in this process has yielded some of the most successful drugs available. While these processes are typically studied in isolation, a sufficiently high-throughput and high-dimensional approach unlocks the potential for accurate modeling of the immune system and rapid deployment of phenotypic screening programs. Recursion highlights its work in immunity in a recent preprint in which four primary human cell types were treated with the same comprehensive immune stimulant library. The images and high-dimensional embeddings from one of these cell types (HUVEC) are available here.

The immune stimulant library

The library components are annotated by general function in the preprint, which serves as a guide for expected high-dimensional similarity (for example, type-I IFNs generate similar morphology).

THE DATA

RxRx2 consists of 131,953 fluorescence microscopy images and their deep learning embeddings. Each image is 1024x1024x6.

RxRx2 demonstrates both the great variety of morphological effects soluble factors have on HUVEC cells and the consistency of these effects within groups of similar function. Through RxRx2, researchers in the scientific community have access to both the images and the corresponding deep learning embeddings to analyze or apply to their own experimentation. The embeddings are 128-dimensional vectors, one per image, and come from Recursion’s internal model, which was trained on additional cell types and perturbation modalities not otherwise released. We provide these embeddings so that researchers without significant compute resources can still explore the data and uncover insights. Scientific researchers can use the data to further demonstrate how high-content imaging can be used to screen immune responses and identify functionally similar factor groups.
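As a rough illustration of how per-image embeddings might be compared, the sketch below averages the vectors for each treatment and scores pairs of treatments by cosine similarity. The vectors are tiny made-up stand-ins (4 dimensions rather than 128), the treatment names are placeholders, and the helper functions are illustrative assumptions rather than part of any Recursion tooling or the method used in the preprint.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional stand-ins for the 128-dimensional embeddings (made-up values).
toy_embeddings = {
    "factor_A": [[1.0, 0.0, 2.0, 0.0], [1.0, 0.0, 2.0, 0.0]],
    "factor_B": [[2.0, 0.0, 4.0, 0.0]],  # points in the same direction as factor_A
    "factor_C": [[0.0, 3.0, 0.0, 1.0]],  # orthogonal to factor_A
}

# One averaged profile per treatment, then pairwise comparison.
profiles = {name: mean_vector(vs) for name, vs in toy_embeddings.items()}
sim_ab = cosine_similarity(profiles["factor_A"], profiles["factor_B"])
sim_ac = cosine_similarity(profiles["factor_A"], profiles["factor_C"])
```

With real data, treatments of similar function would be expected to score closer to 1 than unrelated ones, although any normalization or batch correction applied beforehand is a separate choice this sketch does not address.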


The RxRx2 dataset is closely related to datasets previously released by Recursion, although there are some key differences. For ease of comparison and understanding, we provide the following table highlighting the primary differences:

| Dataset | RxRx1 | RxRx2 | RxRx19a | RxRx19b | RxRx3 |
|---|---|---|---|---|---|
| Release Date | June 2019 | August 2020 | April 2020 | August 2020 | January 2023 |
| Cell Types | HUVEC, RPE, U2OS, HepG2 | HUVEC | HRCE, Vero | HUVEC | HUVEC |
| Stains (Channels) | Hoechst, ConA, Phalloidin, Syto14, MitoTracker, WGA | Hoechst, ConA, Phalloidin, Syto14, MitoTracker, WGA | Hoechst, ConA, Phalloidin, Syto14, WGA | Hoechst, ConA, Phalloidin, Syto14, MitoTracker, WGA | Hoechst, ConA, Phalloidin, Syto14, MitoTracker, WGA |
| Plate Density | 384-well | 1536-well | 1536-well | 1536-well | 1536-well |
| Imaging Sites per Well | 2 | 4 | 4 | 1 | 1 |
| Perturbations Evaluated | 1,138 siRNAs | 434 soluble factors at 6 concentrations | 1,672 small molecules at 6+ concentrations; three viral conditions (active virus, irradiated, mock) | 1,856 small molecules at 4-6 concentrations in three COVID-19-associated cytokine storm conditions (severe storm, healthy, and no cytokines) | 17,063 CRISPR/Cas9-mediated gene knockouts; 1,674 compounds at 8 concentrations each |
| Total Number of Images | 125,510 | 131,953 | 305,520 | 70,384 | ~2.2M |
| Image Dimension | 512x512x6 | 1024x1024x6 | 1024x1024x5 | 2048x2048x6 | 2048x2048x6 |
| Compressed Dataset Size | ~46GB | ~185GB | ~450GB | ~409GB | ~83,100GB |

DOWNLOAD

The dataset is available in three different parts so you can download only the part(s) that you are interested in. The deep learning embeddings provide an easy way to explore the dataset without downloading the images.


Metadata

A CSV containing the experiment design, e.g. what cell type and treatment are in each well. The schema is provided in the README.

Deep Learning Embeddings

A large CSV file containing the deep learning embedding for each image.
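A minimal sketch of reading such a file with Python's standard `csv` module follows. The column layout is an assumption made purely for illustration: an identifier column named `site_id` followed by one column per embedding dimension. The actual header should be taken from the README, and the sample rows below are made up (3 features instead of 128).

```python
import csv
import io

def load_embeddings(handle, id_column="site_id"):
    """Read rows into {id: [float, ...]}; every non-id column is treated as a feature.

    `id_column` is a hypothetical name; pass whatever the real header uses.
    """
    reader = csv.DictReader(handle)
    feature_columns = [c for c in reader.fieldnames if c != id_column]
    return {
        row[id_column]: [float(row[c]) for c in feature_columns]
        for row in reader
    }

# Made-up two-row sample standing in for the real file.
sample = io.StringIO(
    "site_id,feature_0,feature_1,feature_2\n"
    "well_A_s1,0.5,-1.25,2.0\n"
    "well_A_s2,0.0,3.5,-0.75\n"
)
embeddings = load_embeddings(sample)
```

For the downloaded file, the same function can be given a handle opened with `open(path, newline="")`. Because the file is large, reading it row by row as above avoids any dependency beyond the standard library, at the cost of holding all vectors in memory as Python lists.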

Images

131,953 8-bit PNG 1024x1024x6 images. The directory structure is explained in the README.
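To get a feel for the scale, the arithmetic below estimates the raw (uncompressed) pixel volume implied by these figures. It assumes one byte per pixel per channel, as 8-bit images suggest, and ignores PNG compression and file overhead, so it is an upper-bound sketch rather than the actual download size.

```python
IMAGE_COUNT = 131_953                    # number of images stated above
HEIGHT, WIDTH, CHANNELS = 1024, 1024, 6  # image dimensions stated above
BYTES_PER_SAMPLE = 1                     # assumption: 8-bit pixels, one byte each

bytes_per_image = HEIGHT * WIDTH * CHANNELS * BYTES_PER_SAMPLE
total_bytes = bytes_per_image * IMAGE_COUNT
total_gib = total_bytes / 2**30          # convert bytes to GiB

print(f"{bytes_per_image:,} bytes per image")
print(f"{total_gib:,.0f} GiB uncompressed")
```

This comes to about 6 MiB per image and roughly 773 GiB in total before compression, which is one reason the embeddings are a practical starting point for exploration.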
