Professor and Chair, UCLA, Los Angeles, California, United States
Introduction: Directed evolution and high-throughput screening can drive the discovery of novel proteins and cell-based biological products. Advances in machine learning (ML) and computational design are reshaping this landscape, enabling the design of de novo protein sequences by learning the language of biology. Yet the field is still in its infancy. Training predictive models and mapping complex fitness landscapes require large volumes of data from function-based profiling of mutational space via high-throughput screening. Laboratory automation tools built for efficient, function-based screening, and capable of reporting biologically relevant functional metrics, can bridge this gap. By generating large amounts of high-quality data, such tools can transform biological discovery at scale and improve our ability to engineer proteins with precision. A key milestone of this research is the ability to comprehensively map the sequence–function relationships that define biosensor performance. We introduce a high-throughput deep mutational screening workflow based on lab-on-a-particle technology that systematically evaluates variant libraries to generate functional fitness landscapes. The workflow facilitates the discovery of many high-performing mutants and provides insight into the underlying design rules, paving the way for rational biosensor optimization and broader applications in synthetic biology, diagnostics, and therapeutic monitoring.
Materials and Methods: To do this, we leverage PicoShells, a high-throughput, microparticle-based screening platform designed for single-cell encapsulation and efficient screening of large libraries (Fig. 1a). Their semi-permeable shell allows exchange of nutrients and reagents while retaining large proteins and DNA, maintaining a direct link between genotype and phenotype (Fig. 1b). PicoShells integrate into existing screening workflows, supporting multi-step processes, compartmentalized storage, and exposure of cells to assay conditions, which makes them well suited to function-based sorting by flow cytometry (Fig. 1c, 1d). Traditional automation tools such as microwell plates screen only 10³–10⁴ variants per week and consume substantial lab space and resources in larger campaigns. In contrast, PicoShells can screen millions of clones in a single day at hundreds of clones per second with a smaller footprint, dramatically accelerating discovery and enabling quantitative assay readouts in biologically relevant contexts. We applied PicoShells combined with flow cytometry to explore the functional diversity of GCaMP, a widely used protein biosensor for calcium (Fig. 1e). We hypothesized that saturation mutagenesis of the linker regions would yield a functionally diverse library, increasing the likelihood of identifying rare, high-performance mutations. By mutating the linkers connecting the binding domains to the circularly permuted GFP, we generated a library of over a million variants, spanning non-functional sensors to highly responsive biosensors with minimal background and a broad dynamic range. After expressing these variants in E. coli and encapsulating them in PicoShells, we applied a three-stage sorting workflow that sequentially screened for brightness, dynamic response, and sensor reversibility under different calcium conditions (Fig. 1f).
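The sequential gating logic of the three-stage sort can be illustrated with a small in-silico sketch. This is not the authors' sorting software: the per-shell readings (`ShellReading`), the three gate thresholds, and the washout-based reversibility metric are hypothetical assumptions chosen for illustration. In practice each stage would be a separate flow-cytometry sort under its own calcium condition, not a filter over pre-recorded values.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ShellReading:
    """Hypothetical per-PicoShell fluorescence readings (arbitrary units)."""
    variant_id: str
    f_low_ca: float   # baseline fluorescence in a low-calcium buffer (F0)
    f_high_ca: float  # fluorescence after calcium addition
    f_washout: float  # fluorescence after returning to low calcium


def dynamic_range(r: ShellReading) -> float:
    """Calcium response as dF/F0 = (F_high - F_low) / F_low."""
    return (r.f_high_ca - r.f_low_ca) / r.f_low_ca


def three_stage_sort(
    shells: Iterable[ShellReading],
    min_brightness: float = 1000.0,  # hypothetical stage-1 gate
    min_dff: float = 5.0,            # hypothetical stage-2 gate
    max_residual: float = 0.2,       # hypothetical stage-3 gate
) -> List[ShellReading]:
    # Stage 1 (brightness): keep shells that are bright in high calcium.
    stage1 = [r for r in shells if r.f_high_ca >= min_brightness]
    # Stage 2 (dynamic response): keep shells with a large calcium-induced change.
    stage2 = [r for r in stage1 if r.f_low_ca > 0 and dynamic_range(r) >= min_dff]
    # Stage 3 (reversibility): keep shells whose signal returns close to baseline,
    # i.e. the fraction of the response remaining after washout is small.
    return [
        r for r in stage2
        if (r.f_washout - r.f_low_ca) / (r.f_high_ca - r.f_low_ca) <= max_residual
    ]


# Toy readings (invented for illustration, not measured data).
readings = [
    ShellReading("wt_like", f_low_ca=400, f_high_ca=1600, f_washout=450),      # dF/F0 = 3
    ShellReading("dim", f_low_ca=50, f_high_ca=500, f_washout=60),             # too dim
    ShellReading("responsive", f_low_ca=200, f_high_ca=2400, f_washout=250),   # passes all
    ShellReading("sticky", f_low_ca=200, f_high_ca=2400, f_washout=1500),      # not reversible
]
print([r.variant_id for r in three_stage_sort(readings)])  # ['responsive']
```

Ordering the gates from cheapest to most specific mirrors the funnel described above: each stage only needs to examine the survivors of the previous one.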
Results, Conclusions, and Discussions: Through high-throughput sorting of over one million colonies, we identified biosensor variants with up to a 10-fold improvement in dynamic response relative to the original GCaMP (Fig. 1g–j). Functional screening revealed distinct amino acid motifs associated with favorable sensor properties, including high brightness, large dynamic response, low background fluorescence, high sensitivity, and rapid reversibility (Fig. 1i). Variants were classified into functionally distinct groups, including non-fluorescent, calcium-insensitive, and highly responsive sensors, and visualized with t-SNE-like projections to produce a detailed sequence-to-function map (Fig. 2). Sequence logos derived from each cluster revealed the key amino acid residues and compositions associated with specific functional outcomes. We further calculated log₂ enrichment scores for each variant and assessed pairwise co-occurrence patterns across enriched and depleted libraries, creating a comprehensive database of profiled mutations. This enabled classification of mutations as functionally effective or ineffective in driving biosensor response. The resulting large-scale dataset can support training machine learning models to predict biosensor performance and guide future engineering through active learning. Our results demonstrate that PicoShells are a powerful laboratory automation tool for scalable, function-based screening, enabling rapid discovery and optimization of synthetic protein biosensors. This approach lays the foundation for data-driven, predictive biosensor design and offers broad utility across synthetic biology, diagnostics, and therapeutic development.
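The enrichment and pairwise analysis can be sketched as follows. The abstract does not state the exact formulas, so this sketch assumes a common convention: log₂ of a variant's read frequency after sorting divided by its frequency before sorting, with a pseudocount. It also assumes that a mutation is "effective" when the mean enrichment of the variants carrying it exceeds a threshold, and that "pairwise patterns" means co-occurrence counts of mutation pairs in enriched versus depleted variants. Variants are modeled as hypothetical strings of the randomized linker residues; the function names, thresholds, and toy counts are illustrative only.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Mutation = Tuple[int, str]  # hypothetical feature: (linker position, amino acid)


def log2_enrichment(pre: Counter, post: Counter, pseudocount: float = 0.5) -> Dict[str, float]:
    """log2(frequency after sorting / frequency before sorting), with a pseudocount."""
    variants = set(pre) | set(post)
    n_pre = sum(pre.values()) + pseudocount * len(variants)
    n_post = sum(post.values()) + pseudocount * len(variants)
    return {
        v: math.log2(((post[v] + pseudocount) / n_post) / ((pre[v] + pseudocount) / n_pre))
        for v in variants
    }


def mutations(variant: str) -> List[Mutation]:
    """Treat each residue of a randomized linker string as a (position, residue) feature."""
    return list(enumerate(variant))


def classify_mutations(scores: Dict[str, float], threshold: float = 1.0) -> Dict[Mutation, str]:
    """Average enrichment over variants carrying each mutation; 'effective' above threshold."""
    per_mut: Dict[Mutation, List[float]] = defaultdict(list)
    for v, s in scores.items():
        for m in mutations(v):
            per_mut[m].append(s)
    return {
        m: "effective" if sum(vals) / len(vals) >= threshold else "ineffective"
        for m, vals in per_mut.items()
    }


def pairwise_counts(
    scores: Dict[str, float], threshold: float = 1.0
) -> Dict[Tuple[Mutation, Mutation], Tuple[int, int]]:
    """Count co-occurrence of each mutation pair in (enriched, depleted) variants."""
    counts: Dict[Tuple[Mutation, Mutation], List[int]] = defaultdict(lambda: [0, 0])
    for v, s in scores.items():
        if s >= threshold:
            idx = 0  # enriched
        elif s <= -threshold:
            idx = 1  # depleted
        else:
            continue  # neutral variants are ignored
        for pair in combinations(sorted(mutations(v)), 2):
            counts[pair][idx] += 1
    return {pair: (c[0], c[1]) for pair, c in counts.items()}


# Toy read counts before and after sorting (invented, not measured data).
pre = Counter({"PE": 100, "LK": 100, "GG": 100})
post = Counter({"PE": 400, "LK": 100, "GG": 10})
scores = log2_enrichment(pre, post)
labels = classify_mutations(scores)
pairs = pairwise_counts(scores)
print(labels[(0, "P")], pairs[((0, "P"), (1, "E"))])  # effective (1, 0)
```

The pseudocount keeps variants that drop out of the sorted population from producing log₂(0); in a real library with many variants per mutation, the per-mutation average pools evidence across genetic backgrounds.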
Acknowledgements: This work was supported by the NSF Division of Biological Infrastructure, award #2337423.