INTRODUCTION

To accompany OpenPhenom-S/16, Recursion is releasing the RxRx3-core dataset, a challenge dataset in phenomics optimized for the research community. RxRx3-core includes labeled images of 735 genetic knockouts and 1,674 small-molecule perturbations drawn from the RxRx3 dataset, image embeddings computed with OpenPhenom-S/16, and associations between the included small molecules and genes.

The RxRx3-core dataset contains 6-channel Cell Painting images and associated embeddings from 222,601 wells but is less than 18 GB, small enough to download and work with on a single machine. Learn more about RxRx3-core in Recursion's pre-print published at the LMRL Workshop at ICLR 2025.

Mapping the mechanisms by which drugs exert their actions is an important challenge in advancing the use of high-dimensional biological data like phenomics. We are excited to release the first dataset of this scale probing concentration-response along with a benchmark and model to enable the research community to rapidly advance this space.

Loading the RxRx3-core image dataset

from datasets import load_dataset
rxrx3_core = load_dataset("recursionpharma/rxrx3-core")

Loading OpenPhenom-S/16 embeddings and metadata for RxRx3-core

from huggingface_hub import hf_hub_download
import pandas as pd

# Download the metadata CSV and the embeddings parquet from the dataset repo
file_path_metadata = hf_hub_download(
    "recursionpharma/rxrx3-core",
    filename="metadata_rxrx3_core.csv",
    repo_type="dataset",
)
file_path_embs = hf_hub_download(
    "recursionpharma/rxrx3-core",
    filename="OpenPhenom_rxrx3_core_embeddings.parquet",
    repo_type="dataset",
)

open_phenom_embeddings = pd.read_parquet(file_path_embs)
rxrx3_core_metadata = pd.read_csv(file_path_metadata)
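
Once both tables are loaded, a common next step is to attach the metadata to the embeddings and average the wells that share a perturbation. The sketch below illustrates that idea on tiny synthetic tables rather than on the real files. The column names `well_id` and `perturbation`, and the layout of one numeric column per embedding dimension, are assumptions made for illustration only; inspect `.columns` on the real DataFrames and adapt the names before applying this to RxRx3-core.

```python
import pandas as pd


def mean_embedding_per_label(embeddings, metadata, key="well_id", label="perturbation"):
    """Join per-well embeddings to metadata and average them per label.

    `key` and `label` are hypothetical column names, not confirmed
    against the real RxRx3-core files.
    """
    # Keep only wells present in both tables
    merged = metadata[[key, label]].merge(embeddings, on=key, how="inner")
    # Assumes every non-key column in `embeddings` is one embedding dimension
    feature_cols = [c for c in embeddings.columns if c != key]
    return merged.groupby(label)[feature_cols].mean()


# Synthetic stand-ins for the real tables (values are made up)
toy_metadata = pd.DataFrame(
    {"well_id": ["w1", "w2", "w3", "w4"], "perturbation": ["A", "A", "B", "B"]}
)
toy_embeddings = pd.DataFrame(
    {
        "well_id": ["w1", "w2", "w3", "w4"],
        "f0": [1.0, 3.0, 0.0, 2.0],
        "f1": [0.0, 2.0, 4.0, 4.0],
    }
)

toy_profiles = mean_embedding_per_label(toy_embeddings, toy_metadata)
print(toy_profiles)
```

The result has one row per perturbation label and one column per embedding dimension. If the real parquet file stores each embedding as a single array-valued column instead, the feature selection step would need to change accordingly.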

Benchmarking code for evaluating biological relationship recall and compound-gene activity using this dataset is available in the EFAAR benchmarking repo.
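
To give a feel for what a relationship recall metric measures, here is a simplified sketch. It is not the EFAAR implementation. It assumes one aggregated embedding per perturbation, scores every pair by cosine similarity, and counts a known relationship as recalled when its similarity lands in the extreme tails of the distribution over all pairs. The function names, the toy data, and the 5% tail threshold are illustrative assumptions.

```python
import numpy as np


def cosine_similarity_matrix(profiles):
    """Pairwise cosine similarity between the rows of `profiles`."""
    x = np.asarray(profiles, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # guard against zero-length rows
    return unit @ unit.T


def relationship_recall(profiles, names, known_pairs, tail=0.05):
    """Fraction of known pairs whose similarity falls in either tail.

    `tail` is the fraction of all pairwise similarities treated as
    extreme on each side (an illustrative default, not a fixed standard).
    """
    sims = cosine_similarity_matrix(profiles)
    upper = np.triu_indices(len(names), k=1)  # each unordered pair once
    low, high = np.quantile(sims[upper], [tail, 1.0 - tail])
    index = {name: i for i, name in enumerate(names)}

    hits, total = 0, 0
    for a, b in known_pairs:
        if a not in index or b not in index:
            continue  # skip pairs that are not in the dataset
        total += 1
        s = sims[index[a], index[b]]
        if s <= low or s >= high:
            hits += 1
    return hits / total if total else float("nan")


# Toy example with made-up profiles and relationships
toy_names = ["g1", "g2", "g3", "g4"]
toy_vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
toy_known = [("g1", "g2"), ("g1", "g3")]

toy_recall = relationship_recall(toy_vectors, toy_names, toy_known)
print(toy_recall)
```

In the toy data, `g1` and `g2` have identical profiles and so sit in the upper tail, while `g1` and `g3` are orthogonal and do not, so one of the two known pairs is recalled. Counting both tails reflects the idea that strongly opposite phenotypes can be as informative as strongly similar ones.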
