{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/pachterlab/seqspec/blob/devel/docs/UNIFORM.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "---\n",
        "title: Using seqspec\n",
        "date: 2024-09-07\n",
        "authors:\n",
        "  - name: A. Sina Booeshaghi\n",
        "---"
      ],
      "metadata": {
        "id": "hgPw9rQUgbxm"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "`seqspec` enables uniform preprocessing of sequencing reads.\n",
        "\n",
        "# Single-cell preprocessing\n",
        "Single-cell data preprocessing is the procedure where\n",
        "\n",
        "1. Sequencing reads are aligned to a reference\n",
        "2. Barcodes errors are corrected\n",
        "3. UMIs/reads are counted\n",
        "\n",
        "The goal is to produce a count matrix, where rows are cells or samples and columns are biological features such as genes, proteins, or genomic regions.\n",
        "\n",
        "There are many tools that perform single-cell RNA-sequencing preprocessing. For this tutorial we will use `kb-python` (which uses `kallisto` and `bustools`), `STARsolo`, `simpleaf` with `seqspec` to perform alignment and quantification. `kb_python` uses `kallisto` to perform read alignment and `bustools` to perform and barcode correction and UMI counting. `STARsolo` performs performs whole genome alignment and barcode error correction. Like `kb-python`, `simpleaf` uses two separate tools under the hood: `salmon` to perform read alignment and `alevin-fry` to perform barcode error correction and UMI counting.\n",
        "\n",
        "Throughout this tutorial we will use the `dogmaseq-dig` dataset which is a multimodal assay (RNA/ATAC/PROTEIN/TAG). The `seqspec` for this dataset can be found here\n"
      ],
      "metadata": {
        "id": "U8WfUXXNgf44"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Install tools"
      ],
      "metadata": {
        "id": "h7sDNjCpgq0_"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "To understand how each tool works, please review their code and manuscript:\n",
        "\n",
        "| Tool | Code link | Manuscript link | Purpose |\n",
        "|------|-----------|-----------------|---------|\n",
        "| seqspec | [GitHub](https://github.com/pachterlab/seqspec) | [doi](https://doi.org/10.1093/bioinformatics/btae168) | Identify and extract elements in reads |\n",
        "| kb-python | [GitHub](https://github.com/pachterlab/kb_python) | [doi](https://doi.org/10.1101/2023.11.21.568164) | Perform read alignment, error correction, and counting |\n",
        "| gget | [GitHub](https://github.com/pachterlab/gget) | [doi](https://doi.org/10.1093/bioinformatics/btac836) | Fetch species-specific references |\n",
        "| kallisto | [GitHub](https://github.com/pachterlab/kallisto) | [doi](https://doi.org/10.1038/nbt.3519) | Perform read alignment (used in kb-python) |\n",
        "| bustools | [GitHub](https://github.com/BUStools/bustools) | [doi](https://doi.org/10.1038/s41587-021-00870-2) | Perform barcode error correction and UMI counting (used in kb-python) |\n",
        "| BUS file | [GitHub](https://github.com/BUStools/BUS-format) | [doi](https://doi.org/10.1093/bioinformatics/btz279) | Store barocdes, umis, and read alignments (used in kb-python) |\n",
        "| STARsolo | [GitHub](https://github.com/alexdobin/STAR/) | [doi](https://doi.org/10.1101/2021.05.05.442755) | Perform read alignment, error correction, and counting |\n",
        "| simpleaf | [GitHub](https://github.com/COMBINE-lab/simpleaf) | [doi](https://doi.org/10.1093/bioinformatics/btad614) | Perform read alignment, error correction, and counting |"
      ],
      "metadata": {
        "id": "TN5UvCwzhGh8"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Install kb-python, seqspec, gget\n",
        "! pip install --quiet kb-python gget > /dev/null 2>&1  # installing kb-python autoinstalls kallisto and bustools\n",
        "! pip install --quiet git+https://github.com/pachterlab/seqspec@devel > /dev/null 2>&1\n",
        "\n",
        "# Verify installations\n",
        "! seqspec --version\n",
        "! kb --version\n",
        "! gget --version\n",
        "\n",
        "# Install STARsolo and verify installation\n",
        "! wget --quiet --show-progress https://github.com/alexdobin/STAR/archive/2.7.11b.tar.gz\n",
        "! tar -xzf 2.7.11b.tar.gz > /dev/null 2>&1\n",
        "! mv /content/STAR-2.7.11b/bin/Linux_x86_64/STAR /usr/bin\n",
        "! STAR --version\n",
        "\n",
        "# Install alevin-fry, simpleaf and verify installation\n",
        "! curl -S --proto '=https' --tlsv1.2 -LsSf https://github.com/COMBINE-lab/alevin-fry/releases/download/v0.10.0/alevin-fry-installer.sh | sh > /dev/null 2>&1\n",
        "! $HOME/.cargo/bin/alevin-fry --version\n",
        "\n",
        "! curl -S --proto '=https' --tlsv1.2 -LsSf https://github.com/COMBINE-lab/simpleaf/releases/download/v0.17.2/simpleaf-installer.sh | sh > /dev/null 2>&1\n",
        "%env ALEVIN_FRY_HOME=\"$HOME/.cargo/bin/alevin-fry\"\n",
        "! $HOME/.cargo/bin/simpleaf --version\n",
        "\n",
        "!wget --quiet --show-progress https://github.com/jqlang/jq/releases/download/jq-1.7.1/jq-linux-amd64\n",
        "!chmod +x jq-linux-amd64\n",
        "!mv jq-linux-amd64 /usr/bin/jq"
      ],
      "metadata": {
        "id": "4EWVWe1hgpAG",
        "outputId": "b8a9391b-71ec-47c3-b141-0e94e915d651",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "execution_count": 2,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "seqspec 0.2.0\n",
            "usage: kb [-h] [--list] <CMD> ...\n",
            "\n",
            "kb_python 0.28.2\n",
            "\n",
            "positional arguments:\n",
            "  <CMD>\n",
            "    info      Display package and citation information\n",
            "    compile   Compile `kallisto` and `bustools` binaries from source\n",
            "    ref       Build a kallisto index and transcript-to-gene mapping\n",
            "    count     Generate count matrices from a set of single-cell FASTQ files\n",
            "\n",
            "options:\n",
            "  -h, --help  Show this help message and exit\n",
            "  --list      Display list of supported single-cell technologies\n",
            "gget version: 0.28.6\n",
            "2.7.11b.tar.gz          [        <=>         ]  11.89M  7.50MB/s    in 1.6s    \n",
            "2.7.11b\n",
            "alevin-fry 0.10.0\n",
            "env: ALEVIN_FRY_HOME=\"$HOME/.cargo/bin/alevin-fry\"\n",
            "simpleaf 0.17.2\n",
            "jq-linux-amd64      100%[===================>]   2.21M  --.-KB/s    in 0.02s   \n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Download `seqspec` for the `dogmaseq-dig` data"
      ],
      "metadata": {
        "id": "ccdcR8Nlj9XJ"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "! wget --quiet --show-progress https://raw.githubusercontent.com/pachterlab/seqspec/devel/examples/specs/dogmaseq-dig/spec.yaml"
      ],
      "metadata": {
        "id": "hoDCIDwhkEKU",
        "outputId": "b167779c-568a-4682-f537-4fbe2b3f939b",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\rspec.yaml             0%[                    ]       0  --.-KB/s               \rspec.yaml           100%[===================>]  14.09K  --.-KB/s    in 0s      \n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "! seqspec print spec.yaml"
      ],
      "metadata": {
        "id": "RIEiCVzNom8_",
        "outputId": "fe89372f-9236-4d64-88ef-c5960c407993",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "execution_count": 4,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "                                                                  ┌─'ghost_protein_truseq_read1:0'\n",
            "                                                                  ├─'protein_truseq_read1:33'\n",
            "                                                                  ├─'protein_cell_bc:16'\n",
            "                                 ┌─protein────────────────────────┤\n",
            "                                 │                                ├─'protein_umi:12'\n",
            "                                 │                                ├─'protein_seq:15'\n",
            "                                 │                                └─'protein_truseq_read2:34'\n",
            "                                 │                                ┌─'tag_truseq_read1:33'\n",
            "                                 │                                ├─'tag_cell_bc:16'\n",
            "                                 ├─tag────────────────────────────┼─'tag_umi:12'\n",
            "                                 │                                ├─'tag_seq:15'\n",
            "─────────────────────────────────┤                                └─'tag_truseq_read2:34'\n",
            "                                 │                                ┌─'rna_truseq_read1:33'\n",
            "                                 │                                ├─'rna_cell_bc:16'\n",
            "                                 ├─rna────────────────────────────┼─'rna_umi:12'\n",
            "                                 │                                ├─'cdna:102'\n",
            "                                 │                                └─'rna_truseq_read2:34'\n",
            "                                 │                                ┌─'atac_truseq_read1:33'\n",
            "                                 │                                ├─'gDNA:100'\n",
            "                                 └─atac───────────────────────────┼─'atac_truseq_read2:34'\n",
            "                                                                  ├─'spacer:8'\n",
            "                                                                  └─'atac_cell_bc:16'\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "! seqspec file -m rna -f json -s file -k all spec.yaml"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "8AUi7mH31BiL",
        "outputId": "010c4942-27e2-4623-b7c1-86504e2ef4df"
      },
      "execution_count": 12,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[\n",
            "    {\n",
            "        \"file_id\": \"rna_R1_SRR18677638.fastq.gz\",\n",
            "        \"filename\": \"rna_R1_SRR18677638.fastq.gz\",\n",
            "        \"filetype\": \"fastq\",\n",
            "        \"filesize\": 18499436,\n",
            "        \"url\": \"https://github.com/pachterlab/seqspec/raw/devel/examples/specs/dogmaseq-dig/fastqs/rna_R1_SRR18677638.fastq.gz\",\n",
            "        \"urltype\": \"https\",\n",
            "        \"md5\": \"7eb15a70da9b729b5a87e30b6596b641\"\n",
            "    },\n",
            "    {\n",
            "        \"file_id\": \"rna_R2_SRR18677638.fastq.gz\",\n",
            "        \"filename\": \"rna_R2_SRR18677638.fastq.gz\",\n",
            "        \"filetype\": \"fastq\",\n",
            "        \"filesize\": 45812569,\n",
            "        \"url\": \"https://github.com/pachterlab/seqspec/raw/devel/examples/specs/dogmaseq-dig/fastqs/rna_R2_SRR18677638.fastq.gz\",\n",
            "        \"urltype\": \"https\",\n",
            "        \"md5\": \"5e6915770e50f72e462e5b2575089c66\"\n",
            "    },\n",
            "    {\n",
            "        \"file_id\": \"RNA-737K-arc-v1.txt\",\n",
            "        \"filename\": \"RNA-737K-arc-v1.txt\",\n",
            "        \"filetype\": \"txt\",\n",
            "        \"filesize\": 2142553,\n",
            "        \"url\": \"https://github.com/pachterlab/qcbc/raw/main/tests/10xMOME/RNA-737K-arc-v1.txt.gz\",\n",
            "        \"urltype\": \"https\",\n",
            "        \"md5\": \"a88cd21e801ae6f9a7d9a48b67ccf693\"\n",
            "    }\n",
            "]\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "! seqspec file -m rna -f json -s file -k all spec.yaml     | jq '.[].url' | xargs wget --continue --quiet --show-progress\n",
        "! seqspec file -m atac -f json -s file -k all spec.yaml    | jq '.[].url' | xargs wget --continue --quiet --show-progress\n",
        "! seqspec file -m tag -f json -s file -k all spec.yaml     | jq '.[].url' | xargs wget --continue --quiet --show-progress\n",
        "! seqspec file -m protein -f json -s file -k all spec.yaml | jq '.[].url' | xargs wget --continue --quiet --show-progress"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "c1ZjLfnb1EL2",
        "outputId": "d87fb519-e67d-448e-df2b-2ca5b74e7452"
      },
      "execution_count": 13,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "rna_R1_SRR18677638. 100%[===================>]  17.64M  --.-KB/s    in 0.07s   \n",
            "rna_R2_SRR18677638. 100%[===================>]  43.69M   270MB/s    in 0.2s    \n",
            "RNA-737K-arc-v1.txt 100%[===================>]   2.04M  --.-KB/s    in 0.03s   \n",
            "atac_R1_SRR18677642 100%[===================>]  38.33M   165MB/s    in 0.2s    \n",
            "atac_R2_SRR18677642 100%[===================>]  20.01M  --.-KB/s    in 0.1s    \n",
            "atac_R3_SRR18677642 100%[===================>]  34.88M  --.-KB/s    in 0.1s    \n",
            "ATA-737K-arc-v1.txt 100%[===================>]   2.35M  --.-KB/s    in 0.03s   \n",
            "tag_R1_SRR18677640. 100%[===================>]  17.20M  --.-KB/s    in 0.07s   \n",
            "tag_R2_SRR18677640. 100%[===================>]   7.13M  --.-KB/s    in 0.05s   \n",
            "RNA-737K-arc-v1.txt 100%[===================>]   2.04M  --.-KB/s    in 0.03s   \n",
            "tag_feature_barcode 100%[===================>]     208  --.-KB/s    in 0s      \n",
            "protein_R1_SRR18677 100%[===================>]  17.33M  --.-KB/s    in 0.1s    \n",
            "protein_R2_SRR18677 100%[===================>]   8.98M  --.-KB/s    in 0.05s   \n",
            "RNA-737K-arc-v1.txt 100%[===================>]   2.04M  --.-KB/s    in 0.03s   \n",
            "protein_feature_bar 100%[===================>]   4.55K  --.-KB/s    in 0s      \n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "! gunzip *.txt.gz"
      ],
      "metadata": {
        "id": "SmUey5ga1xnB"
      },
      "execution_count": 14,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Single-cell/nuclei RNAseq quantification"
      ],
      "metadata": {
        "id": "zhQsQD42giYi"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "### `kb-python (kallisto bustools)`"
      ],
      "metadata": {
        "id": "joQ-Vzgagi_9"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# seqspec commands to get onlist, technology string, and file\n",
        "! seqspec index -m rna -t kb -s file spec.yaml\n",
        "! seqspec file -m rna -s region -k filename spec.yaml\n",
        "! seqspec file -m rna -s read -f paired -k filename spec.yaml  | tr \"\\t\\n\" \"  \""
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "UmwOAvSu2hik",
        "outputId": "a9851d77-485e-419d-e1a7-8ebdc48cda07"
      },
      "execution_count": 31,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "0,0,16:0,16,28:1,0,102\n",
            "RNA-737K-arc-v1.txt\n",
            "rna_R1_SRR18677638.fastq.gz rna_R2_SRR18677638.fastq.gz "
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "lIYdn1woOS1n"
      },
      "outputs": [],
      "source": [
        "# standard reference\n",
        "! kb ref \\\n",
        "-i index.idx \\\n",
        "-g t2g.txt \\\n",
        "-f1 transcriptome.fa \\\n",
        "$(gget ref --ftp -w dna,gtf homo_sapiens)\n",
        "\n",
        "# standard quantification\n",
        "! kb count \\\n",
        "--h5ad \\\n",
        "-t 16 \\\n",
        "-m 32G \\\n",
        "-i index.idx \\\n",
        "-g t2g.txt \\\n",
        "-o kb_out \\\n",
        "-x $(seqspec index -m rna -t kb -s file spec.yaml) \\\n",
        "-w $(seqspec file -m rna -s region -k filename spec.yaml) \\\n",
        "$(seqspec file -m rna -s read -f paired -k filename spec.yaml  | tr \"\\t\\n\" \"  \")"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# spliced, unspliced, ambiguous reference\n",
        "! kb ref \\\n",
        "--workflow nac \\\n",
        "-i index.idx \\\n",
        "-g t2g.txt \\\n",
        "-f1 spl.fa \\\n",
        "-f2 unspl.fa \\\n",
        "-c1 spl.t2c.txt \\\n",
        "-c2 unspl.t2c.txt \\\n",
        "$(gget ref --ftp -w dna,gtf homo_sapiens)\n",
        "\n",
        "# spliced, unspliced, ambiguous quantification\n",
        "! kb count \\\n",
        "--h5ad \\\n",
        "--workflow=nac \\\n",
        "-t 32 \\\n",
        "-m 64G \\\n",
        "-i index.idx \\\n",
        "-g t2g.txt \\\n",
        "-c1 spl.t2c.txt \\\n",
        "-c2 unspl.t2c.txt \\\n",
        "-o kb_out_nac \\\n",
        "-x $(seqspec index -m rna -t kb -s file spec.yaml) \\\n",
        "-w $(seqspec file -m rna -s region -k filename spec.yaml) \\\n",
        "$(seqspec file -m rna -s read -f paired -k filename spec.yaml  | tr \"\\t\\n\" \"  \")"
      ],
      "metadata": {
        "id": "TpfdS7oS4bFy"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "### `STARsolo`"
      ],
      "metadata": {
        "id": "ZGYN-d6245Jq"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# download reference\n",
        "## todo\n",
        "\n",
        "# run quantification\n",
        "! star \\\n",
        "--soloFeatures Gene \\\n",
        "--genomeDir index \\\n",
        "--soloType Droplet \\\n",
        "--soloCBwhitelist \\\n",
        "$(seqspec file -m rna -s region -k filename spec.yaml) \\\n",
        "$(seqspec index -m rna -t starsolo -s file spec.yaml) \\\n",
        "--readFilesIn $(seqspec file -m rna -s read -f paired -k filename spec.yaml  | tr \"\\t\\n\" \"  \")"
      ],
      "metadata": {
        "id": "QWN3QgbA4-vo"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "! seqspec file -m rna -s read -f paired -k filename spec.yaml | awk '{print \"-1 \"$1\" -2 \"$2}'"
      ],
      "metadata": {
        "id": "4BIrMTgH58Tk",
        "outputId": "5454b71b-03fa-4da2-b2a3-d111a576ec1e",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "execution_count": 33,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "-1 rna_R1_SRR18677638.fastq.gz -2 rna_R2_SRR18677638.fastq.gz\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### `simpleaf`"
      ],
      "metadata": {
        "id": "K6D0Gyn45ZpL"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "! mkdir -p simpleaf_ref\n",
        "\n",
        "# Download reference genome and gene annotations\n",
        "! wget -qO- https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCh38-2020-A.tar.gz | tar xzf - --strip-components=1 -C ./simpleaf_ref\n",
        "\n",
        "# simpleaf index\n",
        "! simpleaf index \\\n",
        "--output ./out \\\n",
        "--fasta ./simpleaf_ref/fasta/genome.fa \\\n",
        "--gtf ./simpleaf_ref/genes/genes.gtf \\\n",
        "--rlen 91 \\\n",
        "--threads 16 \\\n",
        "--use-piscem  # remove this if missing piscem\n",
        "\n",
        "! simpleaf quant \\\n",
        "-r cr-like \\\n",
        "-i simpleaf_ref/ \\\n",
        "-m t2g.txt \\\n",
        "-c $(seqspec index -m rna -t simpleaf -s file spec.yaml) \\\n",
        "-o out/ -x $w \\\n",
        "$(seqspec file -m rna -s read -f paired -k filename spec.yaml | awk '{print \"-1 \"$1\" -2 \"$2}')"
      ],
      "metadata": {
        "id": "n-EtzzQU5cTX"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Single-cell/nuclei TAG quantification"
      ],
      "metadata": {
        "id": "OGc3z4a96dxL"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "### `kb-python (kallisto bustools)`"
      ],
      "metadata": {
        "id": "Ts3Nst9W6eni"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# build alignment reference\n",
        "kb ref \\\n",
        "--workflow kite \\\n",
        "-i index.idx \\\n",
        "-g t2g.txt \\\n",
        "-f1 transcriptome.fa \\\n",
        "tag_feature_barcodes.txt\n",
        "\n",
        "w=$(seqspec onlist -m tag -o onlist.txt -s region-type -i barcode spec.yaml)\n",
        "x=$(seqspec index -m tag -t kb -s file spec.yaml)\n",
        "f=$(seqspec file -m tag -s read -f paired -k url spec.yaml  | tr \"\\t\\n\" \"  \")\n",
        "\n",
        "# perform alignment, error correction, and counting\n",
        "kb count \\\n",
        "--workflow kite \\\n",
        "-i index.idx \\\n",
        "-g t2g.txt \\\n",
        "-x $x \\\n",
        "-w $w \\\n",
        "-o out --h5ad -t 2 \\\n",
        "$f"
      ],
      "metadata": {
        "id": "xGhxxFDI6ePO"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Single-cell/nuclei PROTEIN quantification"
      ],
      "metadata": {
        "id": "S9gJjEVE6ltZ"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "### `kb-python` (kallisto bustools)"
      ],
      "metadata": {
        "id": "Tfp3h6y06ob7"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# build alignment reference\n",
        "kb ref \\\n",
        "--workflow kite \\\n",
        "-i index.idx \\\n",
        "-g t2g.txt \\\n",
        "-f1 transcriptome.fa \\\n",
        "protein_feature_barcodes.txt\n",
        "\n",
        "w=$(seqspec onlist -m protein -o onlist.txt -s region-type -i barcode spec.yaml)\n",
        "x=$(seqspec index -m protein -t kb -s file spec.yaml)\n",
        "f=$(seqspec file -m protein -s read -f paired -k url spec.yaml  | tr \"\\t\\n\" \"  \")\n",
        "\n",
        "# perform alignment, error correction, and counting\n",
        "kb count \\\n",
        "--workflow kite \\\n",
        "-i index.idx \\\n",
        "-g t2g.txt \\\n",
        "-x $x \\\n",
        "-w $w \\\n",
        "-o out --h5ad -t 2 \\\n",
        "$f"
      ],
      "metadata": {
        "id": "YCkH0Qq76n9H"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Single-cell/nuclei CRISPR quantification"
      ],
      "metadata": {
        "id": "g3mrKO3j6v2E"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Note that single-cell CRISPR guide RNAs can be quantified in the same way as TAG and PROTEIN data. Simply supply the guide RNA barcode file as the “feature barcodes” file."
      ],
      "metadata": {
        "id": "X2Da0_y86yPz"
      }
    },
    {
      "cell_type": "code",
      "source": [],
      "metadata": {
        "id": "kupLer4S6wZW"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Single-cell/nuclei ATAC quantification"
      ],
      "metadata": {
        "id": "olhjtMy660Iy"
      }
    },
    {
      "cell_type": "code",
      "source": [],
      "metadata": {
        "id": "ThaYUsgv60oO"
      },
      "execution_count": null,
      "outputs": []
    }
  ],
  "metadata": {
    "colab": {
      "name": "scratchpad",
      "provenance": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}