INTRODUCTION

To accompany OpenPhenom-S/16, Recursion is releasing the RxRx3-core dataset, a challenge dataset in phenomics sized for the research community. RxRx3-core includes labeled images of 735 genetic knockouts and 1,674 small-molecule perturbations drawn from the RxRx3 dataset, image embeddings computed with OpenPhenom-S/16, and associations between the included small molecules and genes. The dataset contains 6-channel Cell Painting images and associated embeddings from 222,601 wells, yet it is less than 18 GB, which keeps it readily accessible.

Mapping the mechanisms by which drugs exert their actions is an important challenge in advancing the use of high-dimensional biological data like phenomics. We are excited to release the first dataset of this scale probing concentration-response, along with a benchmark and model, to enable the research community to advance this space rapidly.

Loading the RxRx3-core image dataset

from datasets import load_dataset
rxrx3_core = load_dataset("recursionpharma/rxrx3-core")

Loading OpenPhenom-S/16 embeddings and metadata for RxRx3-core

from huggingface_hub import hf_hub_download
import pandas as pd

file_path_metadata = hf_hub_download(
    "recursionpharma/rxrx3-core",
    filename="metadata_rxrx3_core.csv",
    repo_type="dataset",
)
file_path_embs = hf_hub_download(
    "recursionpharma/rxrx3-core",
    filename="OpenPhenom_rxrx3_core_embeddings.parquet",
    repo_type="dataset",
)

open_phenom_embeddings = pd.read_parquet(file_path_embs)
rxrx3_core_metadata = pd.read_csv(file_path_metadata)
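
Once both tables are loaded, a common next step is to attach each embedding to its well-level metadata. The sketch below shows one way to do that with a pandas merge. It assumes the two tables share a key column, hypothetically named well_id here; the column names in the released files may differ, so inspect open_phenom_embeddings.columns and rxrx3_core_metadata.columns first. The helper name and the toy frames are illustrative and not part of the release.

```python
import pandas as pd


def join_embeddings_with_metadata(embeddings, metadata, key="well_id"):
    """Inner-join embeddings to metadata on a shared key column.

    `key` is an assumed column name; pass the real one for your files.
    """
    for name, frame in (("embeddings", embeddings), ("metadata", metadata)):
        if key not in frame.columns:
            raise KeyError(f"column {key!r} not found in {name}")
    # validate="one_to_one" raises if a key is duplicated on either side,
    # which guards against silently multiplying rows.
    return metadata.merge(embeddings, on=key, how="inner", validate="one_to_one")


# Toy stand-ins for the real tables (values are made up for illustration).
toy_metadata = pd.DataFrame(
    {"well_id": ["w1", "w2", "w3"], "perturbation": ["geneA", "geneA", "cmpdX"]}
)
toy_embeddings = pd.DataFrame(
    {"well_id": ["w1", "w2", "w3"], "emb_0": [0.1, 0.2, 0.3], "emb_1": [1.0, 0.9, 0.8]}
)

joined = join_embeddings_with_metadata(toy_embeddings, toy_metadata)
```

With the real data, the call would look like join_embeddings_with_metadata(open_phenom_embeddings, rxrx3_core_metadata, key=...), with key set to whichever column the two files actually share.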

Benchmarking code for evaluating biological relationship recall and compound-gene activity using this dataset is available in the EFAAR benchmarking repo.
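
To give a feel for the kind of computation such a benchmark involves, the sketch below averages well-level embeddings per perturbation and compares perturbations by cosine similarity. This is a simplified illustration under stated assumptions, not the EFAAR benchmarking code: the function name, the mean aggregation, and the toy data are all hypothetical choices made for this example.

```python
import numpy as np
import pandas as pd


def perturbation_cosine_similarity(features, labels):
    """Average embeddings per perturbation, then return pairwise cosine similarity.

    features: DataFrame of numeric embedding columns, one row per well.
    labels:   Series of perturbation names aligned row-by-row with `features`.
    """
    # Mean-aggregate wells that share a perturbation label (sorted by label).
    aggregated = features.groupby(labels.to_numpy()).mean()
    values = aggregated.to_numpy(dtype=float)
    # Normalize rows to unit length; leave all-zero rows unchanged.
    norms = np.linalg.norm(values, axis=1, keepdims=True)
    unit = values / np.where(norms == 0, 1.0, norms)
    similarity = unit @ unit.T
    return pd.DataFrame(similarity, index=aggregated.index, columns=aggregated.index)


# Toy embeddings (made up): cmpdX points the same way as geneA, geneB is orthogonal.
toy_features = pd.DataFrame(
    {"emb_0": [1.0, 1.0, 0.0, 2.0], "emb_1": [0.0, 0.0, 1.0, 0.0]}
)
toy_labels = pd.Series(["geneA", "geneA", "geneB", "cmpdX"])

similarity = perturbation_cosine_similarity(toy_features, toy_labels)
```

In this toy case, similarity.loc["cmpdX", "geneA"] is 1 and similarity.loc["geneA", "geneB"] is 0. A real evaluation would compare such similarities against annotated gene-gene or compound-gene relationships, which is what the benchmarking repo provides.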
