At Recursion, we build maps of biology and chemistry to explore uncharted areas of disease biology, unravel its complexity, and industrialize drug discovery. Just as a map helps to navigate the physical world, our maps are designed to help us understand as much as we can about the connectedness of human biology so we can navigate the path to new medicines more efficiently.
Our maps are built using image-based, high-dimensional data generated in-house. We conduct up to 2.2 million experiments every week in our highly automated labs, and we use deep learning models to embed billions of images of human cells that have been manipulated by CRISPR/Cas9-mediated gene knockouts, compounds, or other reagents into high-dimensional representations. These representations can be compared and contrasted to predict trillions of relationships across biology and chemistry, even without physically testing every possible combination. Recursion's Maps and associated applications help navigate complex biology and chemistry by revealing relationships across genes and chemical compounds.
RxRx3 is a publicly available map of biology that represents a small subset (less than 1%) of Recursion’s total dataset. MolRec™️ is a simple demo of the kind of application that can be built on this type of map.
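As a rough illustration of how image embeddings from a map like this might be compared, the sketch below averages per-image embeddings for each perturbation and then scores every pair of perturbations by cosine similarity. Everything in it is an assumption made for illustration: the toy three-dimensional vectors, the labels (`gene_A`, `gene_B`, `compound_X`), the mean aggregation, and the choice of cosine similarity. It is not a description of Recursion's actual pipeline or of the RxRx3 file layout.

```python
import numpy as np


def aggregate_by_perturbation(embeddings, labels):
    """Average the per-image embeddings that share a perturbation label."""
    labels = np.asarray(labels)
    names = sorted(set(labels.tolist()))
    profiles = np.stack(
        [embeddings[labels == name].mean(axis=0) for name in names]
    )
    return names, profiles


def cosine_similarity_matrix(profiles):
    """Return the pairwise cosine similarity between profile rows."""
    norms = np.linalg.norm(profiles, axis=1, keepdims=True)
    unit = profiles / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


# Hypothetical toy data: six image embeddings, two per perturbation.
embeddings = np.array([
    [1.0, 0.0, 0.0],  # gene_A knockout, image 1
    [0.9, 0.1, 0.0],  # gene_A knockout, image 2
    [0.8, 0.2, 0.1],  # compound_X, image 1
    [1.0, 0.1, 0.0],  # compound_X, image 2
    [0.0, 0.0, 1.0],  # gene_B knockout, image 1
    [0.1, 0.0, 0.9],  # gene_B knockout, image 2
])
labels = ["gene_A", "gene_A", "compound_X", "compound_X", "gene_B", "gene_B"]

names, profiles = aggregate_by_perturbation(embeddings, labels)
sim = cosine_similarity_matrix(profiles)

# Print each pair of perturbations with its similarity score.
for i, a in enumerate(names):
    for j in range(i + 1, len(names)):
        print(f"{a} vs {names[j]}: {sim[i, j]:.3f}")
```

In this toy example, `compound_X` scores much closer to `gene_A` than to `gene_B`, which is the kind of gene-compound relationship a map is meant to surface. Real embeddings would be far higher-dimensional and would typically need normalization and batch correction before such a comparison is meaningful.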
spanning CRISPR knockouts of most of the human genome
associated deep learning (DL) embeddings of each image also included
FDA-approved and commercially available bioactive compounds at 8 concentrations, and tens of thousands of control images
*Approximately 16,000 of these genes are anonymized in the dataset, enabling people to explore and learn from this massive dataset while protecting Recursion’s business interests. Recursion may de-anonymize genes in this dataset in the future.
Progress in machine learning is punctuated by seminal dataset releases. Perhaps the most famous of these is ImageNet, which helped usher in the next generation of computer vision models. Fei-Fei Li, creator of ImageNet, set out to “...map out the entire world of objects” so that models would be trained on realistic data. Just as ImageNet mapped out the world of objects, RxRx3 and the broader RxRx.ai dataset family are mapping out biological and chemical space.
RxRx3 is one of, if not the, largest collections of cellular screening data and, as far as we know, the largest generated consistently in a single process at a single site. Our goal is to enable the next generation of machine learning methodologies on these data and to foster research, methods development, and collaboration.
The RxRx3 dataset is closely related to datasets previously released by Recursion, although there are some key differences. For ease of comparison and understanding, we provide the following table highlighting the primary differences:
This work is licensed under Recursion's Non-Commercial End User License Agreement.
Please use the following format to cite this dataset as a whole:
We used the RxRx3 dataset (Fay et al. (2023). RxRx3: Phenomics Map of Biology. bioRxiv 2023.02.07.527350), available from Recursion at rxrx.ai/rxrx3.