Application

Multimodal

Application example for multimodal data using SIGNAL-seq, which produces separate sequencing libraries for the RNA and protein modalities via combinatorial indexing. Protein levels are captured with DNA-barcoded antibodies and transcripts through mRNA reverse transcription. Both modalities are processed into distinct FASTQ files but share a common barcode pattern: the forward read encodes the feature identity, and the reverse read encodes the single-cell identity and the Unique Molecular Identifier (UMI).

Download the SRA files for both modalities and extract the paired-end FASTQ reads:

# Download the SRA runs: SRR28056728 (protein) and SRR28056729 (RNA)
wget "https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR28056728/SRR28056728" -O SRR28056728.sra
wget "https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR28056729/SRR28056729" -O SRR28056729.sra

# Extract paired-end FASTQ files
fastq-dump --split-files --gzip SRR28056728.sra
fastq-dump --split-files --gzip SRR28056729.sra
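Before configuring ESGI, it can help to confirm that each run was split into two files with the same number of reads. The bash sketch below assumes gzip is available; count_reads and check_pair are hypothetical helper names, not part of ESGI or the SRA Toolkit.

```shell
# Hypothetical helpers (not part of ESGI) for checking downloaded runs.

# Count reads in a gzipped FASTQ file: each record spans exactly 4 lines.
count_reads() {
  local lines
  lines=$(gzip -cd "$1" | wc -l)
  echo $(( lines / 4 ))
}

# Compare the read counts of <run>_1.fastq.gz and <run>_2.fastq.gz.
# Correctly split paired-end files contain the same number of reads.
check_pair() {
  local run=$1 n1 n2
  n1=$(count_reads "${run}_1.fastq.gz")
  n2=$(count_reads "${run}_2.fastq.gz")
  if [ "$n1" -eq "$n2" ]; then
    echo "${run}: ${n1} read pairs"
  else
    echo "${run}: read counts differ (${n1} vs ${n2})" >&2
    return 1
  fi
}
```

For example, running check_pair SRR28056728 and then check_pair SRR28056729 in the download directory reports the number of read pairs per run, or fails if the two files of a run disagree.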

Set up:

To process both modalities, separate ESGI-initialization files must be created. Both modalities share a pattern structure consisting of eight elements, but they differ in how the feature identity is encoded in the forward read.

Forward read: positional element 0

The forward read contains the first pattern element and encodes the feature identity.

  • Protein modality: The first element is a DNA barcode and is followed by a poly-A tail, causing a sequence overlap between the forward and reverse reads.
  • RNA modality: The first element is the genomic transcript sequence. The forward read terminates at the end of the RNA sequence, resulting in no overlap with the reverse read.

Read transition: positional element 1

Both modalities include a read-separator element between the forward and reverse reads: [*] in the protein pattern and [-] in the RNA pattern.

Reverse read: positional elements 2-7

For both modalities, the reverse read contains the remaining six pattern elements encoding the single-cell identity and UMI.

Element index   Type               Encoding
2, 4, 6         Barcode element    Single-cell identities
3, 5            Constant element   Anchors or linkers
7               Random element     Unique Molecular Identifier (UMI)

The patterns for both modalities are written as eight bracket-enclosed substrings. Each bracket represents one pattern element and contains either a comma-separated list of the possible barcodes for that position or the name of a file holding that list (for example, BC1.txt); [nX] denotes n random bases. In these patterns, the constant elements have been replaced with 22 and 30 random bases, respectively, to bypass any alignment constraints. This can be useful when strict mapping is unnecessary or when the sequencing quality is suboptimal.

Example of pattern_PROTEIN.txt file:

PROTEIN:[Antibodies.txt][*][BC1.txt][22X][BC2.txt][30X][BC2.txt][10X]

Example of pattern_RNA.txt file:

DNA:[DNA][-][BC1.txt][22X][BC2.txt][30X][BC2.txt][10X]

File name        Element length (bases)   Number of sequences   Example sequences
Antibodies.txt   15                       23                    AAGGCAGACGGTGCA,GGCTGCGCACCGCCT,CGTCCTAGGACATAT
BC1.txt          8                        96                    TTACGAGT,TATCGTTT,CGAGGTAA
BC2.txt          8                        96                    ATCACGTT,CGATGTTT,TTAGGCAT
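To see how the positional indexes used in this section map onto a pattern line, the bash sketch below prints each bracket-enclosed element with its 0-based index. list_elements is a hypothetical helper, and it assumes that a pattern line starts with a name followed by a colon, as in the examples above.

```shell
# Hypothetical helper (not part of ESGI): list the bracket-enclosed
# elements of a pattern line with their 0-based positional indexes.
list_elements() {
  local line=${1#*:}   # drop the pattern name before the first colon
  # grep -o prints each [...] group on its own line; awk prepends NR-1.
  printf '%s\n' "$line" | grep -o '\[[^]]*]' | awk '{ print NR - 1, $0 }'
}
```

Called on the protein pattern, it lists [Antibodies.txt] as element 0, [*] as element 1 and [10X] as element 7, which are the indexes referenced by the mismatch lists and configuration files.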

Example of a barcode-aligned sequence for the protein modality:

[Antibodies.txt] [BC1.txt]           [22X]          [BC2.txt]              [30X]              [BC2.txt]   [10X]  
AGACAGTGATGTCCG  CCGATCCC   ATCCACGTGCTTGAGACTGTGG  TTAGGCAT  GTGGCCGATGTTTCGCATCGGCGTACGACT  TAACGCTG  TAAAGGAAGT
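The alignment above can be reproduced by cutting the concatenated example sequence into consecutive pieces of the element lengths 15, 8, 22, 8, 30, 8 and 10 bases. This is only an illustration of the layout under the assumption of fixed element lengths: split_by_lengths is a hypothetical helper, it performs no barcode matching or mismatch handling, and it is not how ESGI itself assigns elements.

```shell
# Hypothetical helper (not part of ESGI): cut a sequence into consecutive
# elements of the given fixed lengths, printing one element per line.
split_by_lengths() {
  local seq=$1 pos=0 len
  shift
  for len in "$@"; do
    printf '%s\n' "${seq:pos:len}"   # bash substring: offset pos, length len
    pos=$(( pos + len ))
  done
}
```

Applied to the seven sequences above joined into one string, with the lengths 15 8 22 8 30 8 10, it prints the same seven pieces, one per line.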

The maximum number of allowed mismatches must be defined as a comma-separated list of integers, where each value applies to the pattern element at the same position. For the protein modality, the feature-encoding antibody barcode (element 0) tolerates one mismatch, whereas the RNA modality allows none for its genomic element. Both modalities accept one mismatch for the single-cell barcode elements (2, 4 and 6). The read separator and the random elements are set to zero; for the random elements, the tolerance is not used.

Example of mismatches_PROTEIN.txt file:

1,0,1,0,1,0,1,0

Example of mismatches_RNA.txt file:

0,0,1,0,1,0,1,0
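Because every value in a mismatch list is matched to a pattern element by position, a quick consistency check is to compare the number of bracket-enclosed elements with the number of integers. The bash sketch below does this; check_mismatches is a hypothetical helper and assumes one pattern per line with a name prefix such as PROTEIN:.

```shell
# Hypothetical check (not part of ESGI): a mismatch list needs exactly one
# integer per pattern element, listed in pattern order.
check_mismatches() {
  local pattern=${1#*:} mismatches=$2 n_elem n_mm
  # Count bracket-enclosed elements after the "NAME:" prefix.
  n_elem=$(printf '%s\n' "$pattern" | grep -o '\[[^]]*]' | wc -l | tr -d ' ')
  # Count comma-separated non-negative integers.
  n_mm=$(printf '%s\n' "$mismatches" | tr ',' '\n' | grep -c '^[0-9][0-9]*$')
  if [ "$n_elem" -eq "$n_mm" ]; then
    echo "ok: ${n_elem} elements, ${n_mm} mismatch values"
  else
    echo "error: ${n_elem} elements, ${n_mm} mismatch values" >&2
    return 1
  fi
}
```

For example, check_mismatches "$(cat pattern_PROTEIN.txt)" "$(cat mismatches_PROTEIN.txt)" reports 8 elements and 8 mismatch values for the files shown above.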

All structural parameters are now defined, so we can create the ESGI-initialization files for both modalities: myExperiment_PROTEIN.ini and myExperiment_RNA.ini.

Both configuration files use the element indexes for feature identities, single-cell IDs, and UMIs from the table, together with the pattern and mismatch files defined above.

myExperiment_PROTEIN.ini:

Path_data = "/path/to/raw_data"
# Includes FASTQ files of forward and reverse reads

Path_background_data = "/path/to/background_data"
# .txt files for barcode pattern, mismatches, annotation

Path_output = "/path/to/output"

# Forward and reverse reads:
forward="${Path_data}/SRR28056728_1.fastq.gz"
reverse="${Path_data}/SRR28056728_2.fastq.gz"

pattern="${Path_background_data}/pattern_PROTEIN.txt"
mismatches="${Path_background_data}/mismatches_PROTEIN.txt"

# Indexing for elements encoding: feature, single-cell ID and UMI:
FEATURE_ID=0
SC_ID=2,4,6
UMI_ID=7

FEATURE_NAMES="${Path_background_data}/antibody_names.txt"
ANNOTATION_IDs="${Path_background_data}/BC1.txt"

threads=10
prefix=MYEXPERIMENT
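The index parameters must point at elements that exist in the pattern. The bash sketch below checks that each index in a comma-separated value such as SC_ID=2,4,6 lies within the eight elements of the protein pattern. check_indexes is a hypothetical helper; it does not apply to a FEATURE_ID that refers to a STAR-appended transcript, which by design lies one position past the last element.

```shell
# Hypothetical check (not part of ESGI): every index in a comma-separated
# list must address an element of a pattern with n_elements elements.
check_indexes() {
  local n_elements=$1 id
  # ${2//,/ } turns "2,4,6" into "2 4 6" for word splitting.
  for id in ${2//,/ }; do
    if [ "$id" -lt 0 ] || [ "$id" -ge "$n_elements" ]; then
      echo "index ${id} is outside 0..$(( n_elements - 1 ))" >&2
      return 1
    fi
  done
}
```

For the values in myExperiment_PROTEIN.ini, check_indexes 8 0 && check_indexes 8 2,4,6 && check_indexes 8 7 succeeds.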

For the RNA modality, genomic sequences are aligned to the human reference genome (GRCh38) using the STAR aligner.

First, download the GRCh38 primary assembly and the corresponding GENCODE annotation files.

mkdir GRCh38
cd GRCh38

wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_43/GRCh38.primary_assembly.genome.fa.gz
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_43/gencode.v43.annotation.gtf.gz
gunzip *.gz

Next, generate the STAR genome index:

# STAR expects the genome directory to exist before index generation
mkdir -p GRCh38_STAR_index
STAR --runThreadN 70 \
    --runMode genomeGenerate \
    --genomeDir GRCh38_STAR_index \
    --genomeFastaFiles GRCh38.primary_assembly.genome.fa \
    --sjdbGTFfile gencode.v43.annotation.gtf \
    --sjdbOverhang 73
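The value passed to --sjdbOverhang depends on the read length: the STAR manual recommends read length minus 1. The bash sketch below estimates it from the longest read among the first 100,000 records of a FASTQ file. max_read_length and sjdb_overhang are hypothetical helpers, and sampling only the first records is a shortcut that assumes those reads are representative.

```shell
# Hypothetical helpers (not part of ESGI or STAR).
# Longest sequence among the first 100,000 records (400,000 lines);
# sequence lines are every 4th line, starting at line 2.
max_read_length() {
  gzip -cd "$1" | head -n 400000 |
    awk 'BEGIN { max = 0 } NR % 4 == 2 && length($0) > max { max = length($0) } END { print max }'
}

# Recommended --sjdbOverhang: read length - 1.
sjdb_overhang() {
  echo $(( $(max_read_length "$1") - 1 ))
}
```

Since the genomic sequence of the RNA modality lies in the forward read, sjdb_overhang SRR28056729_1.fastq.gz is a reasonable input for this estimate.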

To integrate the reference genome with ESGI, the initialization file requires an additional parameter defining the directory path to the STAR genome index.

During processing, STAR appends the annotated transcript to the remaining demultiplexed barcode elements. The feature identity (FEATURE_ID) is therefore placed at the end of the pattern and is assigned the last pattern element index + 1, i.e. FEATURE_ID=8 for the eight-element RNA pattern.
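The FEATURE_ID rule described above can be expressed as a count: with indexes starting at 0, the last pattern element index + 1 equals the number of pattern elements. The bash sketch below computes it from a pattern line; appended_feature_id is a hypothetical helper that assumes a "NAME:" prefix on the pattern line.

```shell
# Hypothetical helper (not part of ESGI): index of a STAR-appended feature,
# i.e. the number of bracket-enclosed elements in the pattern line.
appended_feature_id() {
  printf '%s\n' "${1#*:}" | grep -o '\[[^]]*]' | wc -l | tr -d ' '
}
```

For the RNA pattern DNA:[DNA][-][BC1.txt][22X][BC2.txt][30X][BC2.txt][10X] it returns 8, the FEATURE_ID used in myExperiment_RNA.ini.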

myExperiment_RNA.ini:

Path_data = "/path/to/raw_data"
# Includes FASTQ files of forward and reverse reads

Path_background_data = "/path/to/background_data"
# .txt files for barcode pattern, mismatches, annotation

Path_output = "/path/to/output"

# Forward and reverse reads:
forward="${Path_data}/SRR28056729_1.fastq.gz"
reverse="${Path_data}/SRR28056729_2.fastq.gz"

pattern="${Path_background_data}/pattern_RNA.txt"
mismatches="${Path_background_data}/mismatches_RNA.txt"

# Indexing for elements encoding: feature, single-cell ID and UMI:
FEATURE_ID=8
SC_ID=2,4,6
UMI_ID=7

genomeDir="/path/to/GRCh38_STAR_index/"

threads=10
prefix=MYEXPERIMENT

Once the configuration files are ready, you can initiate the process for each modality by running ESGI from the terminal.

For the protein modality, execute the following command:

./bin/esgi myExperiment_PROTEIN.ini

For the RNA modality, run this command:

./bin/esgi myExperiment_RNA.ini
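Both commands can also be run in sequence from a small wrapper that writes one log file per modality and stops at the first failure. This is a sketch: run_modalities, the ESGI_BIN variable and the esgi_<MODALITY>.log file names are hypothetical, and it assumes ESGI returns a non-zero exit status when a run fails.

```shell
# Hypothetical wrapper (not part of ESGI): run ESGI once per modality,
# writing stdout and stderr of each run to its own log file.
# ESGI_BIN defaults to ./bin/esgi, the path used in the commands above.
run_modalities() {
  local esgi=${ESGI_BIN:-./bin/esgi} modality
  for modality in PROTEIN RNA; do
    echo "Running ESGI for ${modality}"
    # Assumes ESGI exits with a non-zero status on failure.
    if ! "$esgi" "myExperiment_${modality}.ini" > "esgi_${modality}.log" 2>&1; then
      echo "ESGI failed for ${modality}; see esgi_${modality}.log" >&2
      return 1
    fi
  done
}
```

Run from the directory containing bin/ and both .ini files, run_modalities executes the protein run first and the RNA run second.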

Multipattern

Application example for multipattern data using scIDseq. This technology quantifies intracellular protein abundances with antibodies conjugated to unique DNA barcodes. It uses a multipattern design in which staggers of varying lengths, combined with barcode elements, encode the protein identity.

Set up:

ESGI can demultiplex reads for multiple barcode patterns simultaneously. This dataset includes eight distinct barcode patterns defined by stagger lengths ranging from one to eight bases. To keep the total sequence length constant, the length of the final positional element varies inversely with the stagger length.

The barcode patterns are contained entirely within the forward read, and each consists of six pattern elements. The table below lists the element indexes and what each element encodes.

Element index   Type               Encoding
0               Constant element   Stagger
1               Random element     Unique Molecular Identifier (UMI)
2               Barcode element    Feature identities
3, 5            Constant element   Linkers
4               Barcode element    Well-plate position

The multipattern design combines pattern-specific and shared barcode elements:

  • Feature identity (element 2): each pattern has a unique stagger (length and sequence) and is associated with a unique set of barcode sequences; together, they encode the protein identity.
  • Well-plate position (element 4): a universal barcode set, shared by all patterns, ensures consistent assignment of well-plate positions.

Each pattern is assigned a unique name, and its pattern elements are written as a series of bracket-enclosed substrings. Each set of brackets contains a comma-separated list of all possible barcodes for that position. In this example, the constant elements defining the linkers are replaced with random bases to bypass any alignment constraints.

Example of patterns.txt file:

PATTERN_1:[][15X][Ab1_barcodes.txt][20X][wellbarcodes.txt][7X]
PATTERN_2:[][15X][Ab2_barcodes.txt][20X][wellbarcodes.txt][6X]
PATTERN_3:[][15X][Ab3_barcodes.txt][20X][wellbarcodes.txt][5X]
PATTERN_4:[][15X][Ab4_barcodes.txt][20X][wellbarcodes.txt][4X]
PATTERN_5:[][15X][Ab5_barcodes.txt][20X][wellbarcodes.txt][3X]
PATTERN_6:[][15X][Ab6_barcodes.txt][20X][wellbarcodes.txt][2X]
PATTERN_7:[][15X][Ab7_barcodes.txt][20X][wellbarcodes.txt][1X]
PATTERN_8:[][15X][Ab8_barcodes.txt][20X][wellbarcodes.txt][1X]
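Because the stagger sits before the 15-base UMI, the feature barcode starts at a different read position in each pattern: at stagger length + 15 (0-based). The bash sketch below cuts it out of a forward read given the stagger length. extract_feature is a hypothetical helper; the barcode length is passed in because the lengths of the Ab*_barcodes.txt sequences are not listed here, and mismatches are ignored, so it only illustrates the offset arithmetic.

```shell
# Hypothetical helper (not part of ESGI): cut the feature barcode out of a
# forward read laid out as [stagger][15-base UMI][feature barcode]...
# Arguments: read sequence, stagger length, feature-barcode length.
extract_feature() {
  local seq=$1 stagger=$2 barcode_len=$3 umi_len=15
  # The barcode starts right after the stagger and the UMI (0-based offset).
  printf '%s\n' "${seq:$(( stagger + umi_len )):barcode_len}"
}
```

Assuming, for instance, a three-base stagger, the barcode starts at offset 18; with a one-base stagger it starts at offset 16.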

For each pattern, the maximum number of allowed mismatches per element is defined as a comma-separated list, one line per pattern in the same order as in patterns.txt. Each integer in the list corresponds to a specific positional element.

Example of mismatches.txt file:

0,0,1,0,1,0
0,0,1,0,1,0
1,0,1,0,1,0
1,0,1,0,1,0
1,0,1,0,1,0
1,0,1,0,1,0
1,0,1,0,1,0
1,0,1,0,1,0
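Since each line of mismatches.txt belongs to one pattern, the two files can be checked side by side. The bash sketch below pairs line k of patterns.txt with line k of mismatches.txt and reports every pattern whose element count differs from its number of mismatch values. check_pattern_file is a hypothetical helper, and the line-by-line pairing is an assumption about how the files correspond.

```shell
# Hypothetical check (not part of ESGI): pair line k of the pattern file
# with line k of the mismatch file and compare element and value counts.
check_pattern_file() {
  local patterns=$1 mismatches=$2 status=0 pattern mm name n_elem n_mm
  while IFS=$'\t' read -r pattern mm; do
    name=${pattern%%:*}
    n_elem=$(printf '%s\n' "${pattern#*:}" | grep -o '\[[^]]*]' | wc -l | tr -d ' ')
    n_mm=$(printf '%s\n' "$mm" | tr ',' '\n' | grep -c '^[0-9][0-9]*$')
    if [ "$n_elem" -ne "$n_mm" ]; then
      echo "${name}: ${n_elem} elements but ${n_mm} mismatch values" >&2
      status=1
    fi
  done < <(paste "$patterns" "$mismatches")   # tab-joined line pairs
  return "$status"
}
```

For the files shown above, check_pattern_file patterns.txt mismatches.txt finds six elements and six values on every line and returns success.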

To combine the raw forward reads with the pattern and mismatch information, we create the ESGI-initialization file myExperiment.ini. The configuration uses the element indexes from the table for the feature identity, well position, and UMI, together with the pattern and mismatch files defined above.

Path_data = "/path/to/raw_data"
# Includes FASTQ files of forward reads for all plates

Path_background_data = "/path/to/background_data"
# .txt files for barcode pattern, mismatches, annotation

Path_output = "/path/to/output"

# Forward read:
forward="${Path_data}/plate.fastq.gz"

pattern="${Path_background_data}/patterns.txt"
mismatches="${Path_background_data}/mismatches.txt"

# Indexing for elements encoding: feature, well-plate position (SC_ID) and UMI:
FEATURE_ID=2
SC_ID=4
UMI_ID=1

threads=10
prefix=MYEXPERIMENT

Execute ESGI by running the following command in your terminal:

./bin/esgi myExperiment.ini

Spatial

Application example for spatial data using Multiplexed Deterministic Barcoding in Tissue (xDBiT). This technology uses microfluidic-based deterministic barcoding with DNA oligonucleotides to encode transcriptomes alongside their spatial coordinates for multiple tissue sections in parallel.

Set up:

The xDBiT barcode pattern consists of seven pattern elements. The data is generated as two independent forward reads: the first captures the transcript, and the second encodes the UMI and the spatial (x,y) coordinates.

Forward read I: positional element 0

The read contains the first pattern element, a genomic DNA sequence encoding the transcript.

Read transition: positional element 1

The read separator [-] marks a discrete transition between the two forward reads, which do not overlap. By including [-] in the pattern and enabling the independent flag (independent=1 in the initialization file), the tool treats both reads as two distinct sequences, each in the 5'→3' direction.

Forward read II: positional elements 2-6

The read contains the remaining five pattern elements, encoding the (x,y) spatial coordinates and UMI.

Element index   Type               Encoding
0               Genomic sequence   Transcript identity
2               Random element     UMI
3, 5            Barcode element    (x,y) spatial coordinates
4, 6            Constant element   Linkers or anchors

The barcode pattern is represented as seven bracket-enclosed sequence substrings. Each bracket corresponds to a positional element and contains a comma-separated list of possible barcodes for that position. The constant elements have been replaced with 30 random bases to bypass any alignment constraints.

Example of patterns.txt file:

SPATIAL:[RNA][-][10X][coordinate_barcode.txt][30X][coordinate_barcode.txt][30X]

The coordinate_barcode.txt file defines the (x,y) spatial coordinates using an 8x12 matrix of 96 unique 8-base barcode sequences; the same file is used for both coordinate elements (3 and 5).
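The stated properties of coordinate_barcode.txt (96 unique sequences of 8 bases) can be checked before running ESGI. check_barcodes is a hypothetical bash helper; it accepts barcodes separated by commas or newlines, which is an assumption about the file layout.

```shell
# Hypothetical helpers (not part of ESGI).
# One barcode per line, whether the file uses commas or newlines;
# spaces, carriage returns and empty lines are dropped.
barcode_lines() {
  tr ',' '\n' < "$1" | tr -d ' \r' | grep -v '^$'
}

# Require exactly expected_n barcodes, all unique, all expected_len long.
check_barcodes() {
  local file=$1 expected_n=$2 expected_len=$3 n_total n_unique n_bad
  n_total=$(barcode_lines "$file" | wc -l | tr -d ' ')
  n_unique=$(barcode_lines "$file" | sort -u | wc -l | tr -d ' ')
  n_bad=$(barcode_lines "$file" | awk -v len="$expected_len" 'length($0) != len' | wc -l | tr -d ' ')
  if [ "$n_total" -eq "$expected_n" ] && [ "$n_unique" -eq "$n_total" ] && [ "$n_bad" -eq 0 ]; then
    echo "ok: ${n_total} unique ${expected_len}-base barcodes"
  else
    echo "error: ${n_total} barcodes, ${n_unique} unique, ${n_bad} of wrong length" >&2
    return 1
  fi
}
```

For this dataset the call would be check_barcodes coordinate_barcode.txt 96 8; the same call applies to BC1.txt and BC2.txt from the multimodal example, which also list 96 barcodes of 8 bases.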

Example of mismatches.txt file:

0,0,0,1,0,1,0

Mouse reference genome:

The first pattern element contains the transcript, which is aligned to the mouse reference genome (GRCm38) using the STAR aligner. Follow the instructions below to download the GRCm38 primary assembly and the corresponding GENCODE annotation files.

mkdir GRCm38
cd GRCm38

wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M25/GRCm38.primary_assembly.genome.fa.gz
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M25/gencode.vM25.annotation.gtf.gz
gunzip *.gz

Next, generate the STAR genome index:

# STAR expects the genome directory to exist before index generation
mkdir -p GRCm38_STAR_index
STAR --runThreadN 70 \
    --runMode genomeGenerate \
    --genomeDir GRCm38_STAR_index \
    --genomeFastaFiles GRCm38.primary_assembly.genome.fa \
    --sjdbGTFfile gencode.vM25.annotation.gtf \
    --sjdbOverhang 73

The ESGI-initialization file requires the directory path to the STAR genome index and all the pattern details described above.

Because STAR appends the annotated transcript to the remaining set of demultiplexed barcode elements, the feature identity (FEATURE_ID) is assigned an index equal to the final pattern element index + 1, i.e. FEATURE_ID=7 for this seven-element pattern.

myExperiment.ini:

Path_data = "/path/to/raw_data"
# Includes FASTQ files of the two forward reads

Path_background_data = "/path/to/background_data"
# .txt files for barcode pattern, mismatches, annotation

Path_output = "/path/to/output"

# Forward reads I and II (read II is given under the reverse key):
forward="${Path_data}/SRR20073555_1.fastq.gz"
reverse="${Path_data}/SRR20073555_2.fastq.gz"

pattern="${Path_background_data}/patterns.txt"
mismatches="${Path_background_data}/mismatches.txt"

# Indexing for elements encoding: feature, single-cell ID and UMI:
FEATURE_ID=7
SC_ID=3,5

ANNOTATION_IDs=3,5
ANNOTATION_NAMES="${Path_background_data}/xAnnotation.txt","${Path_background_data}/yAnnotation.txt"

genomeDir="/path/to/GRCm38_STAR_index/"
feature=GN

UMI_ID=2
hamming=1

independent=1

threads=10
prefix=MYEXPERIMENT

Finally, execute ESGI using the command line below:

./bin/esgi myExperiment.ini

References

  1. Opzoomer, J. W. et al. SIGNAL-seq: Multimodal Single-cell Inter- and Intra-cellular Signalling Analysis (2024).
  2. Krämer, N. et al. Cell-state specific drug-responses are associated with differences in signaling network wiring. Mol. Cell. Proteomics 101529 (2026).
  3. Wirth, J., Huber, N., Yin, K. et al. Spatial transcriptomics using multiplexed deterministic barcoding in tissue. Nat. Commun. 14, 1523 (2023).