To accompany OpenPhenom-S/16, Recursion is releasing the RxRx3-core dataset, a challenge dataset in phenomics optimized for the research community. RxRx3-core includes labeled images of 735 genetic knockouts and 1,674 small-molecule perturbations drawn from the RxRx3 dataset, image embeddings computed with OpenPhenom-S/16, and associations between the included small molecules and genes.
The RxRx3-core dataset contains 6-channel Cell Painting images and associated embeddings from 222,601 wells, yet it is less than 18 GB in size, which makes it readily accessible to the research community. Learn more about RxRx3-core in Recursion's pre-print published at the LMRL Workshop at ICLR 2025.
Mapping the mechanisms by which drugs exert their actions is an important challenge in advancing the use of high-dimensional biological data like phenomics. We are excited to release the first dataset of this scale probing concentration-response along with a benchmark and model to enable the research community to rapidly advance this space.
# Load the RxRx3-core image dataset from the Hugging Face Hub
# (requires the `datasets` package).
from datasets import load_dataset
rxrx3_core = load_dataset("recursionpharma/rxrx3-core")
# Download the metadata and the precomputed OpenPhenom embeddings,
# then read each file into a pandas DataFrame.
from huggingface_hub import hf_hub_download
import pandas as pd
file_path_metadata = hf_hub_download(
    "recursionpharma/rxrx3-core",
    filename="metadata_rxrx3_core.csv",
    repo_type="dataset",
)
file_path_embs = hf_hub_download(
    "recursionpharma/rxrx3-core",
    filename="OpenPhenom_rxrx3_core_embeddings.parquet",
    repo_type="dataset",
)
open_phenom_embeddings = pd.read_parquet(file_path_embs)
rxrx3_core_metadata = pd.read_csv(file_path_metadata)
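The embeddings and the metadata arrive as two separate tables, so most analyses begin by joining them. The following is a minimal sketch of that join, not part of the official release. It assumes both tables share a well identifier column, called `well_id` here; that name and the toy column names (`perturbation`, `emb_0`, `emb_1`) are assumptions, so inspect `open_phenom_embeddings.columns` and `rxrx3_core_metadata.columns` before relying on them. The toy frames stand in for the real files so the example runs offline.

```python
import pandas as pd

def join_embeddings_to_metadata(embeddings, metadata, key="well_id"):
    """Inner-join embeddings with metadata on a shared key column.

    `key` is an assumed column name; it is not confirmed by the dataset card.
    """
    for name, frame in (("embeddings", embeddings), ("metadata", metadata)):
        if key not in frame.columns:
            raise KeyError(f"{name} has no column {key!r}; found {list(frame.columns)}")
    # validate="one_to_one" makes pandas raise if either table repeats a key.
    return metadata.merge(embeddings, on=key, how="inner", validate="one_to_one")

# Toy stand-ins for the real tables (hypothetical columns and values).
toy_metadata = pd.DataFrame(
    {"well_id": ["w1", "w2", "w3"], "perturbation": ["geneA", "geneB", "cmpdX"]}
)
toy_embeddings = pd.DataFrame(
    {"well_id": ["w1", "w2", "w3"], "emb_0": [0.1, 0.2, 0.3], "emb_1": [1.0, 0.5, 0.0]}
)
toy_joined = join_embeddings_to_metadata(toy_embeddings, toy_metadata)
print(toy_joined.shape)  # (3, 4)
```

With the real data, the call would be `join_embeddings_to_metadata(open_phenom_embeddings, rxrx3_core_metadata, key=...)`, with `key` set to whichever identifier column the two files actually share.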
Benchmarking code for evaluating biological relationship recall and compound-gene activity using this dataset is available in the EFAAR benchmarking repo.
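To give a feel for what a relationship-recall score measures, here is a minimal sketch of one common formulation: compute the cosine similarity between every pair of perturbation embeddings, then count how many known related pairs land in the extreme tails of that similarity distribution. This is an illustration under stated assumptions and not the EFAAR implementation; the function names, the `tail` parameter, and the toy vectors are all hypothetical, and the EFAAR repo remains the reference for the actual benchmark.

```python
import numpy as np

def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T

def relationship_recall(embeddings, labels, known_pairs, tail=0.05):
    """Fraction of known pairs whose similarity lies in the top or bottom
    `tail` quantile of all pairwise similarities (illustrative only)."""
    sims = cosine_similarity_matrix(embeddings)
    index = {label: i for i, label in enumerate(labels)}
    # Use each unordered pair once, excluding self-similarity.
    all_sims = sims[np.triu_indices(len(labels), k=1)]
    low, high = np.quantile(all_sims, [tail, 1.0 - tail])
    hits, scored = 0, 0
    for a, b in known_pairs:
        if a not in index or b not in index:
            continue  # skip pairs with a perturbation missing from the data
        scored += 1
        s = sims[index[a], index[b]]
        if s <= low or s >= high:
            hits += 1
    return hits / scored if scored else float("nan")

# Toy example: five hypothetical perturbations with 2-D embeddings.
toy_labels = ["A", "B", "C", "D", "E"]
toy_vectors = [[1, 0], [2, 0], [0, 1], [-1, 0], [1, 1]]
toy_known = [("A", "B"), ("A", "C"), ("A", "Z")]  # "Z" is absent and skipped
toy_recall = relationship_recall(toy_vectors, toy_labels, toy_known, tail=0.1)
print(toy_recall)  # 0.5
```

In the toy data, A and B point in the same direction, so their similarity of 1 falls in the upper tail and counts as recalled, while A and C are orthogonal and do not. In practice, well-level embeddings would first be aggregated to one vector per perturbation before scoring.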