To accompany OpenPhenom-S/16, Recursion is releasing the RxRx3-core dataset, a challenge dataset in phenomics optimized for the research community. RxRx3-core includes labeled images of 735 genetic knockouts and 1,674 small-molecule perturbations drawn from the RxRx3 dataset, image embeddings computed with OpenPhenom-S/16, and associations between the included small molecules and genes. The dataset contains 6-channel Cell Painting images and associated embeddings from 222,601 wells, yet it is less than 18 GB in size, which makes it practical for the research community to download and work with.
Mapping the mechanisms by which drugs exert their actions is an important challenge in advancing the use of high-dimensional biological data like phenomics. We are excited to release the first dataset of this scale probing concentration-response along with a benchmark and model to enable the research community to rapidly advance this space.
from datasets import load_dataset
rxrx3_core = load_dataset("recursionpharma/rxrx3-core")
from huggingface_hub import hf_hub_download
import pandas as pd
file_path_metadata = hf_hub_download("recursionpharma/rxrx3-core", filename="metadata_rxrx3_core.csv", repo_type="dataset")
file_path_embs = hf_hub_download("recursionpharma/rxrx3-core", filename="OpenPhenom_rxrx3_core_embeddings.parquet", repo_type="dataset")
open_phenom_embeddings = pd.read_parquet(file_path_embs)
rxrx3_core_metadata = pd.read_csv(file_path_metadata)
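Once both tables are loaded, a common next step is to attach the per-well metadata to the per-well embeddings. The following is a minimal sketch of that join, not part of the official loading code. It assumes both tables share a key column, here called `well_id`; that column name, the `attach_metadata` helper, and the toy tables are illustrative assumptions, so check the actual column names of `open_phenom_embeddings` and `rxrx3_core_metadata` before using it.

```python
import pandas as pd


def attach_metadata(embeddings: pd.DataFrame, metadata: pd.DataFrame,
                    key: str = "well_id") -> pd.DataFrame:
    """Left-join per-well metadata onto per-well embeddings via a shared key.

    The default key name "well_id" is an assumption, not a documented schema.
    """
    for name, frame in (("embeddings", embeddings), ("metadata", metadata)):
        if key not in frame.columns:
            raise KeyError(f"{key!r} not found in {name} columns")
    # validate="one_to_one" makes pandas raise if either table repeats a key,
    # which guards against silently duplicating wells in the result.
    return embeddings.merge(metadata, on=key, how="left", validate="one_to_one")


# Tiny synthetic stand-ins (hypothetical values) so the sketch runs offline.
toy_embeddings = pd.DataFrame(
    {"well_id": ["w1", "w2", "w3"], "feature_0": [0.1, 0.2, 0.3]}
)
toy_metadata = pd.DataFrame(
    {"well_id": ["w3", "w1", "w2"],
     "perturbation_type": ["CRISPR", "COMPOUND", "CRISPR"]}
)
toy_joined = attach_metadata(toy_embeddings, toy_metadata)
```

With the real tables, the equivalent call would be `attach_metadata(open_phenom_embeddings, rxrx3_core_metadata)`, provided the assumed key column exists in both. A left join keeps every embedding row and preserves the row order of the embeddings table.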
Benchmarking code for evaluating biological relationship recall and compound-gene activity using this dataset is available in the EFAAR benchmarking repo.
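To give a feel for what a relationship-recall metric measures, here is a simplified illustration. It is not the EFAAR implementation. It assumes one embedding vector per perturbation, scores every pair by cosine similarity, and counts a known relationship as recalled when its similarity falls in either extreme tail of the distribution over all pairs. The function names, the 5% tail default, and the toy vectors are assumptions chosen for this sketch.

```python
import numpy as np


def cosine_similarity_matrix(x: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # avoid division by zero
    return unit @ unit.T


def relationship_recall(x: np.ndarray, known_pairs, tail: float = 0.05) -> float:
    """Fraction of known pairs whose similarity lies in the lowest or highest
    `tail` fraction of all pairwise similarities."""
    sims = cosine_similarity_matrix(x)
    # Upper triangle without the diagonal: each unordered pair counted once.
    all_pairs = sims[np.triu_indices(len(x), k=1)]
    low, high = np.quantile(all_pairs, [tail, 1.0 - tail])
    hits = sum(1 for i, j in known_pairs
               if sims[i, j] <= low or sims[i, j] >= high)
    return hits / len(known_pairs)


# Hypothetical 2-D "embeddings" for four perturbations.
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
toy_recall = relationship_recall(toy, known_pairs=[(0, 1), (0, 2)])
```

Both tails are used because strongly anti-correlated phenotypes can indicate a biological relationship just as strongly correlated ones can. For results comparable to published numbers, use the EFAAR benchmarking repo itself rather than this sketch.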