In Vivo CRISPR Screens - Analysis Walkthrough with Explanations
In recent years, in vivo CRISPR screens have become a powerful tool for the unbiased discovery of putative immunotherapy targets. The basic concept is that differing levels of immune pressure in mouse tumor models will selectively enrich or deplete specific genome edits, which are introduced into the cancer cells by delivering a pool of CRISPR sgRNAs. This pool of guides constitutes a CRISPR screen library: the set of guides that introduces these edits into the cancer cell line to be engrafted. If the edits are expected to disrupt the function of the target genes, the genes are considered “knocked out” and the screen is called a CRISPRko screen. Variations exist, such as CRISPR “activation” (CRISPRa) screens, in which transcriptional activators are recruited to the promoter/transcription start site of the target genes, enhancing expression.

Many in vivo CRISPR screens have been reported in the literature; for this walkthrough I will focus on this one. The CRISPR screens in this paper were conducted primarily by Rob Manguso and published in Nature in 2017. They are notable both for their novelty at the time of publication and for their identification of Ptpn2 as a key immunotherapy target.
From a bioinformatics perspective, the analysis of a CRISPR screen can be broken down into a number of steps:
- Data collection - obtaining the raw sequencing data, often in BCL or FASTQ format.
- PoolQ - generating a matrix of counts per barcode and sample.
- Preprocessing - evaluating initial statistics as quality control measures and calculating log-fold-changes (LFCs).
- Significance calculations - applying the hypergeometric test or STARS to LFC data.
These steps reflect standard practice in the Manguso lab at the Broad Institute as of March 2026, but they are flexible and a variety of tools is available; a review of some approaches can be found here (the most notable tool being MAGeCK). That said, in this tutorial I will approach the analysis from the ground up, relying only on basic Python libraries such as pandas, in order to demonstrate a complete analysis pipeline.
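To make the shape of the preprocessing step concrete before diving in, here is a minimal sketch of going from a counts matrix to guide-level and gene-level LFCs with pandas. Everything in it is an assumption for illustration: the counts are made-up toy values, the sample names `control` and `treated` and the guide naming scheme `gene_gN` are hypothetical, and the `log2(reads per million + 1)` normalization is one common convention rather than necessarily the exact one used later in this walkthrough.

```python
import numpy as np
import pandas as pd

# Toy counts matrix: rows are sgRNA barcodes, columns are samples.
# All values and names are invented for illustration only.
counts = pd.DataFrame(
    {
        "control": [100, 200, 300, 400],
        "treated": [400, 200, 150, 250],
    },
    index=["geneA_g1", "geneA_g2", "geneB_g1", "geneB_g2"],
)


def log_normalize(counts: pd.DataFrame, pseudocount: float = 1.0) -> pd.DataFrame:
    """Scale each sample to reads per million, then take log2(x + pseudocount)."""
    rpm = counts / counts.sum(axis=0) * 1e6
    return np.log2(rpm + pseudocount)


lognorm = log_normalize(counts)

# Guide-level LFC: difference of log-normalized counts between conditions.
lfc = lognorm["treated"] - lognorm["control"]

# Gene-level summary: mean LFC across the guides targeting each gene.
# The gene label is parsed from the (hypothetical) guide name "gene_gN".
gene_lfc = lfc.groupby(lambda guide: guide.split("_")[0]).mean()

print(lfc.round(3))
print(gene_lfc.round(3))
```

Normalizing to reads per million before taking logs keeps samples with different sequencing depths comparable, and the pseudocount avoids taking the log of zero for guides that drop out entirely.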
Step 1: Data Collection
In this first step, I will collect data for one of the central screens in the paper, whose results are presented in Figure 1.