NeurIPS 2019 competition coming soon: CellSignal: Disentangling biological signal from experimental noise in cellular images.

Building Maps of Biology and Chemistry

At Recursion, we build maps of biology and chemistry to explore uncharted areas of disease biology, unravel its complexity, and industrialize drug discovery. Just as a map helps to navigate the physical world, our maps are designed to help us understand as much as we can about the connectedness of human biology so we can navigate the path to new medicines more efficiently.

Our maps are built from high-dimensional, image-based data generated in-house. We conduct up to 2.2 million experiments every week in our highly automated labs. There, we use deep learning models to compute high-dimensional representations (embeddings) of billions of images of human cells that have been perturbed by CRISPR/Cas9-mediated gene knockouts, compounds, or other reagents. These representations can be compared and contrasted to predict trillions of relationships across biology and chemistry, even without physically testing every possible combination. Recursion's Maps and associated applications help navigate complex biology and chemistry by revealing relationships across genes and chemical compounds.
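The comparison step can be illustrated with a small sketch. It assumes each image already has a fixed-length embedding, averages the image embeddings of each perturbation into one profile, and ranks perturbation pairs by cosine similarity. The perturbation names, the toy 3-dimensional vectors, mean aggregation, and cosine similarity are all illustrative assumptions, not a description of Recursion's actual pipeline.

```python
import math
from collections import defaultdict


def mean_profile(vectors):
    """Average a list of equal-length embedding vectors into one profile."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def build_profiles(image_embeddings):
    """Group (perturbation, embedding) pairs and average them per perturbation."""
    groups = defaultdict(list)
    for perturbation, emb in image_embeddings:
        groups[perturbation].append(emb)
    return {p: mean_profile(vs) for p, vs in groups.items()}


def rank_relationships(profiles):
    """Return every perturbation pair, sorted by descending cosine similarity."""
    names = sorted(profiles)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            pairs.append((a, b, cosine(profiles[a], profiles[b])))
    return sorted(pairs, key=lambda t: t[2], reverse=True)


# Hypothetical toy embeddings: two images per perturbation, 3 dimensions each.
# Real embeddings are much higher-dimensional; these names are made up.
images = [
    ("GENE_A_knockout", [1.0, 0.1, 0.0]),
    ("GENE_A_knockout", [0.9, 0.0, 0.1]),
    ("COMPOUND_X", [0.95, 0.05, 0.05]),
    ("COMPOUND_X", [1.0, 0.0, 0.0]),
    ("GENE_B_knockout", [0.0, 1.0, 0.2]),
    ("GENE_B_knockout", [0.1, 0.9, 0.0]),
]

profiles = build_profiles(images)
for a, b, sim in rank_relationships(profiles):
    print(f"{a} ~ {b}: {sim:.3f}")
```

Here COMPOUND_X and GENE_A_knockout rank as the most similar pair because their averaged profiles point in nearly the same direction. The sketch omits any correction for experimental noise such as plate or batch effects, which a real analysis would need to handle.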

RxRx3 is a publicly available map of biology that represents a small subset (less than 1%) of Recursion’s total dataset. MolRec™️ is a simple demo of an application that can be built on this type of map.

17,063 genes profiled*, spanning CRISPR knockouts of most of the human genome

2.2M images of HUVEC cells, with the associated deep learning (DL) embeddings of each image also included

1,674 known chemical entities at 8 concentrations each: FDA-approved and commercially available bioactive compounds, plus tens of thousands of control images

<1% of Recursion’s total dataset

*Approximately 16,000 of these genes are anonymized, which lets people explore and learn from this large dataset while protecting Recursion’s business interests. Recursion may de-anonymize genes in the dataset in the future.

THE POWER OF DATASET RELEASES

Progress in machine learning is punctuated by seminal dataset releases. Perhaps the most famous of these is ImageNet, which helped usher in the next generation of computer vision models. Fei-Fei Li, creator of ImageNet, set out with the goal to “...map out the entire world of objects” so that models would be trained on realistic data. Just as ImageNet mapped out the world of objects, RxRx3 and the broader RxRx.ai dataset family are mapping out biological and chemical space.

RxRx3 is one of the largest collections of cellular screening data, if not the largest, and as far as we know, the largest generated consistently in a single process at a single site. Our goal is to enable the next generation of machine learning methods on this kind of data and to foster research, methods development, and collaboration.

Comparison with Other Computer Vision Datasets

Dataset, year: size

RxRx3, 2023: 2.2M
JUMP-CP, 2023: 823,438
Waymo Open Dataset, 2018: ~105,000
nuScenes, 2018: 1,000
ImageNet (21k), 2009: 14M
COCO, 2014: 330,000

The RxRx3 dataset is closely related to datasets previously released by Recursion, although there are some key differences. For ease of comparison and understanding, we provide the following table highlighting the primary differences:

Release Date: June 2019; August 2020; April 2020; August 2020; January 2023

Cell Types: HUVEC, RPE, U2OS, HepG2; HUVEC; HRCE, Vero; HUVEC

Stains (Channels): Hoechst, ConA, Phalloidin, Syto14, MitoTracker, WGA; Hoechst, ConA, Phalloidin, Syto14, WGA; Hoechst, ConA, Phalloidin, Syto14, MitoTracker, WGA
Plate Density: 384-well; 1536-well

Imaging Sites per Well

Perturbations Evaluated: 1,138 siRNAs; 434 soluble factors at 6 concentrations; 1,672 small molecules at 6+ concentrations, plus three viral conditions (active virus, irradiated, mock); 1,856 small molecules at 4-6 concentrations in three COVID-19-associated cytokine storm conditions (severe storm, healthy, and no cytokines); 17,063 CRISPR/Cas9-mediated gene knockouts and 1,674 compounds at 8 concentrations each

Total Number of Images: 125,510; 131,953; 305,520; 70,384; ~2.2M

Image Dimension (pixels x pixels x channels): 512x512x6; 1024x1024x6; 1024x1024x5; 2048x2048x6

Compressed Dataset Size: ~46GB; ~185GB; ~450GB; ~409GB; ~83,100GB

License: CC-BY-NC-SA; CC-BY; CC-BY-NC-SA


CITATION AND LICENSE

This work is licensed under Recursion's Non-Commercial End User License Agreement.

Please use the following format to cite this dataset as a whole:

We used the RxRx3 dataset (Fay et al. (2023). RxRx3: Phenomics Map of Biology. bioRxiv 2023.02.07.527350), available from Recursion at rxrx.ai/rxrx3.

DOWNLOAD

Download links for RxRx3 are currently only available in the MolRec application.

You can view the README for the dataset here.

You can read a preprint about RxRx3 on bioRxiv here.

