Biologically Accurate 3D Cell and Nuclear Segmentation at Scale via Combining Training Assay and Iterative Deep Learning Approaches
Deep neural networks have been widely used for segmentation in microscopy images and have achieved great success on problems too difficult to tackle with traditional image processing techniques. Regardless of the deep learning model (e.g., U-Net, Mask R-CNN, StarDist), the most accurate segmentation is in general still achieved by training on large sets of images with target segmentations (usually referred to as ground truth). These ground truths are commonly created by manual annotation of the pixels or voxels in images. For 3D images this annotation process is extremely time-consuming, and the annotated shapes lack spatial smoothness, especially when the shape has complex morphology. More importantly, the manual annotation may differ significantly from the biologically correct ground truth. Segmentations obtained from models trained with such ground truths will be problematic for biological research where absolute accuracy matters. In this work, we will present two methods: (1) the Training Assay and (2) iterative deep learning. The Training Assay approach is a general computation-experiment co-design concept that helps create more biologically correct segmentations. Iterative deep learning is a workflow introduced in the Allen Cell Structure Segmenter, specifically designed to build training data without extensive manual annotation of segmentation targets and with only very limited human intervention. We combined the iterative deep learning and Training Assay approaches, together with additional auxiliary algorithms (e.g., mitotic daughter cell pair detection), into a workflow that segments, with high accuracy, all instances of cells and nuclei in 3D microscopy images of tightly packed human induced pluripotent stem cells at scale. This segmentation workflow created ~220,000 single-cell images of 25 different cell lines in the Allen Cell Image Data Collection (based on ~18,000 field-of-view z-stacks), thus overcoming a fundamental challenge to performing image-based single-cell analysis at scale.
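To illustrate the flavor of the iterative deep learning idea, below is a minimal sketch of one possible curation loop: a classic (non-deep-learning) segmentation seeds the first round, the most plausible results are curated with minimal human effort, and each round's curated predictions become the training data for the next round. All function names and the stand-in implementations here are hypothetical placeholders for illustration only, not the Allen Cell Structure Segmenter API.

```python
# Minimal sketch of an iterative deep-learning curation loop (illustrative only).
# classic_segmentation, curate, and train_and_predict are hypothetical stand-ins.

import numpy as np

def classic_segmentation(image: np.ndarray) -> np.ndarray:
    """Stand-in for a classic (non-DL) 3D segmentation workflow: simple global threshold."""
    return (image > image.mean() + image.std()).astype(np.uint8)

def curate(segmentations, keep_fraction=0.5):
    """Stand-in for human sorting/merging: keep the 'best' half, here ranked by foreground size."""
    ranked = sorted(segmentations, key=lambda s: s.sum(), reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_fraction))]

def train_and_predict(training_masks, images):
    """Stand-in for training a 3D model on curated masks and re-predicting all images.

    A real implementation would train, e.g., a 3D U-Net here; this toy version just
    matches the average foreground fraction of the curated masks.
    """
    foreground_fraction = np.mean([m.mean() for m in training_masks])
    return [
        (img > np.quantile(img, 1.0 - foreground_fraction)).astype(np.uint8)
        for img in images
    ]

# Iterative loop: classic results seed round 1; each round's curated predictions
# become the training data for the next round, with minimal human intervention.
images = [np.random.rand(16, 64, 64) for _ in range(8)]   # toy 3D z-stacks
predictions = [classic_segmentation(img) for img in images]
for round_idx in range(3):
    training_set = curate(predictions)
    predictions = train_and_predict(training_set, images)
```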
Building Computational Transfer Functions on 3D Light Microscopy Images: From a General Deep Learning Toolkit to Biology-driven Validation
Cell and developmental biologists face the difficult task of identifying an optimal, balanced set of microscopy settings for their specific experiment. They must choose the microscope modality, magnification, resolution settings, laser power, etc. that permit collection of the desired data, something made even more difficult if the desired data involve live imaging. Reducing the types of compromises that have to be made for these experiments permits entirely new types of datasets to be collected and analyzed. For example, if we could computationally transform the images in a long time-lapse movie of a large field of view (FOV) at low magnification/resolution into images with resolution comparable to enhanced-resolution microscopy images, this would permit analysis of a large colony of cells over a long time at high resolution. Deep learning methods have been developed to achieve such transformations between microscopy images, including image restoration, resolution enhancement, and denoising, but mostly for 2D images. Collecting high-quality 3D training data is challenging, as it requires pairs of images of identical samples representing the source and target of the transfer. In this work, we will present our open-source Transfer Functions toolkit, composed of two key parts: (1) a 3D registration workflow to align the training image pairs computationally and (2) a general deep learning framework based on the Conditional Generative Adversarial Network (cGAN), including an optional new Auto-Align module for improving image-pair alignment accuracy when computational alignment alone is not sufficient. We also present several approaches for quantitative, application-specific, biology-driven validation of the prediction results. Since a prediction will never be identical to the real target image, this type of validation is crucial for determining whether predicted images generated by deep learning models such as those in the Transfer Functions toolkit can be used for appropriate biological interpretation.
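As a rough, illustrative sketch of how a cGAN-based transfer function can be trained on registered 3D image pairs, the example below shows a single pix2pix-style generator/discriminator update in PyTorch. The toy architectures, hyperparameters, and tensor shapes are assumptions for illustration only, not the actual Transfer Functions toolkit implementation or its Auto-Align module.

```python
# Minimal sketch of one 3D conditional GAN (cGAN) training step on a registered
# (source, target) image pair. Architectures and hyperparameters are illustrative.

import torch
import torch.nn as nn

class Generator3D(nn.Module):
    """Toy 3D network mapping a source z-stack to a predicted target z-stack."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 1, 3, padding=1),
        )
    def forward(self, x):
        return self.net(x)

class Discriminator3D(nn.Module):
    """Toy patch discriminator conditioned on the source image via channel concatenation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(2, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(16, 1, 4, stride=2, padding=1),
        )
    def forward(self, source, target):
        return self.net(torch.cat([source, target], dim=1))

G, D = Generator3D(), Discriminator3D()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
adv_loss, l1_loss = nn.BCEWithLogitsLoss(), nn.L1Loss()

source = torch.rand(1, 1, 16, 64, 64)   # toy registered low-resolution stack
target = torch.rand(1, 1, 16, 64, 64)   # toy registered high-resolution stack

# Discriminator step: real (source, target) pairs vs. fake (source, G(source)) pairs.
fake = G(source).detach()
pred_real, pred_fake = D(source, target), D(source, fake)
d_loss = (adv_loss(pred_real, torch.ones_like(pred_real))
          + adv_loss(pred_fake, torch.zeros_like(pred_fake)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool the discriminator while keeping the prediction close to the target (L1).
fake = G(source)
pred_fake = D(source, fake)
g_loss = adv_loss(pred_fake, torch.ones_like(pred_fake)) + 100.0 * l1_loss(fake, target)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

In practice the generator would typically be a 3D U-Net-style network, training would iterate over many registered image-pair patches, and the predicted stacks would then be assessed with the application-specific, biology-driven validation described above rather than by visual similarity alone.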