sleuth_prep | R Documentation |
A sleuth is a group of kallistos. Borrowing this terminology, a 'sleuth' object stores a group of kallisto results and can then operate on them while accounting for covariates, sequencing depth, and technical and biological variance.
sleuth_prep(sample_to_covariates, full_model = NULL, target_mapping = NULL, aggregation_column = NULL, num_cores = max(1L, parallel::detectCores() - 1L), ...)
sample_to_covariates |
a |
full_model |
an R |
target_mapping |
a |
aggregation_column |
a string of the column name in |
num_cores |
an integer of the number of computer cores mclapply should use to speed up sleuth preparation |
... |
any of several other arguments that can be used as advanced options for sleuth preparation. See details. |
This method takes a list of samples with kallisto results and returns a sleuth object with the defined normalization of the data across samples (default is the DESeq method; see norm_factors), followed by the defined transformation of the data (default is log(x + 0.5)).
This also collects all of the bootstraps for the modeling done using sleuth_fit. This function also takes several advanced options that can be used to customize your analysis.
Here are the advanced options for sleuth_prep:
Extra arguments related to Bootstrap Summarizing:
extra_bootstrap_summary: if TRUE, compute extra summary statistics for estimated counts. This is not necessary for typical analyses; it is only needed for certain plots (e.g. plot_bootstrap). Default is FALSE.
read_bootstrap_tpm: if TRUE, read the TPM bootstraps and compute summary statistics on them. This is not necessary for typical analyses; it is only needed for some plots (e.g. plot_bootstrap) and if TPM values are used for sleuth_fit. Default is FALSE.
max_bootstrap: the maximum number of bootstrap values to read for each transcript. Setting this lower than the total number of bootstraps available will save some time, but will likely decrease the accuracy of the estimation of the inferential noise.
Advanced Options for Filtering:
filter_fun: the function to use when filtering. This function will be applied to the raw counts on a row-wise basis, meaning that each feature will be considered individually. The default is to filter out any features that do not have at least 5 estimated counts in at least 47% of the samples (see basic_filter for more information). If the preferred filtering method requires a matrix-wide transformation or otherwise needs to consider multiple features simultaneously instead of independently, please consider using filter_target_id below.
filter_target_id: character vector of target_ids to filter using methods that can't be implemented using filter_fun. If non-NULL, this will override filter_fun.
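To make the row-wise behaviour concrete, here is a minimal base-R sketch of a filter with the same shape as the default described above. The function name keep_feature, its parameter names, and the toy matrix are illustrative assumptions, not sleuth's own code; the sketch also assumes that a filter takes one row of counts and returns a single logical.

```r
# Hypothetical filter: keep a feature when at least `min_prop` of the samples
# have `min_reads` or more estimated counts.
keep_feature <- function(row, min_reads = 5, min_prop = 0.47) {
  mean(row >= min_reads) >= min_prop
}

# Toy matrix of estimated counts: rows are features, columns are samples.
counts <- matrix(
  c(10, 8, 0, 7,
     0, 1, 0, 6,
     5, 5, 5, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("tx1", "tx2", "tx3"), paste0("s", 1:4))
)

# Apply the filter row by row, so each feature is judged independently.
passes <- apply(counts, 1, keep_feature)

# IDs of the features that pass the filter.
kept_ids <- names(passes)[passes]
```

Because each row is evaluated on its own, a rule that needs the whole matrix (for example, ranking features against each other) cannot be written this way, which is the case filter_target_id is meant for.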
Advanced Options for the Normalization Step: (NOTE: Be sure you know what you're doing before you use these options)
normalize: logical indicating whether normalization and other steps should be performed. If this is set to FALSE, bootstraps will not be read and the data will not be transformed. This should only be set to FALSE if one desires to do a quick check of the raw data. The default is TRUE.
norm_fun_counts: a function to perform between-sample normalization on the estimated counts. The default is the DESeq method. See norm_factors for details.
norm_fun_tpm: a function to perform between-sample normalization on the TPM. The default is the DESeq method. See norm_factors for details.
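As an illustration of what "the DESeq method" refers to, the sketch below computes median-of-ratios size factors in base R. It is a sketch of the general technique under simplifying assumptions (features with a zero in any sample are dropped), not a copy of norm_factors; the function name median_ratio_factors and the toy data are hypothetical.

```r
# Hypothetical median-of-ratios size factors for a features x samples matrix.
median_ratio_factors <- function(mat) {
  # Use only features that are nonzero in every sample, so logs are finite.
  nonzero <- apply(mat, 1, function(row) all(row > 0))
  log_mat <- log(mat[nonzero, , drop = FALSE])

  # Log of each feature's geometric mean across samples.
  log_geo_means <- rowMeans(log_mat)

  # Per-sample median of the log ratios to the geometric mean.
  exp(apply(log_mat - log_geo_means, 2, median))
}

# Toy data: sample s2 was sequenced exactly twice as deeply as s1.
counts <- cbind(s1 = c(10, 20, 30), s2 = c(20, 40, 60))
size_factors <- median_ratio_factors(counts)

# Dividing each column by its size factor puts samples on a common scale.
normalized <- sweep(counts, 2, size_factors, "/")
```

In this toy case the two size factors differ by a factor of two, and the normalized columns coincide.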
Advanced Options for the Transformation Step: (NOTE: Be sure you know what you're doing before you use these options)
transform_fun_counts: the transformation that should be applied to the normalized counts. Default is 'log(x+0.5)' (i.e. natural log with 0.5 offset).
transform_fun_tpm: the transformation that should be applied to the TPM values. Default is 'x' (i.e. the identity function / no transformation).
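The transformations above can be written as ordinary R functions. The sketch below defines the documented default for counts, plus a log2 variant as one possible alternative; the names log_offset and log2_offset are hypothetical, and the commented-out call assumes an s2c data frame like the one in the example for this page.

```r
# The documented default for counts: natural log with a 0.5 offset.
log_offset <- function(x) log(x + 0.5)

# Hypothetical alternative: log2 with the same offset.
log2_offset <- function(x) log2(x + 0.5)

# The offset keeps zero counts finite after transformation.
norm_counts <- c(0, 1.5, 99.5)
transformed <- log_offset(norm_counts)
transformed_log2 <- log2_offset(norm_counts)

# Assumed usage (not run here, requires sleuth and kallisto output):
# so <- sleuth_prep(s2c, ~genotype + drug, transform_fun_counts = log2_offset)
```

Changing the transformation changes the scale on which downstream estimates are expressed, so effect sizes from a log2 fit and a natural-log fit are not directly comparable.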
Advanced Options for Gene Aggregation:
gene_mode: Set this to TRUE to get the old counts-aggregation method for doing gene-level analysis. This requires aggregation_column to be set. If TRUE, this will override the p-value aggregation mode, but will allow for gene-centric modeling, plotting, and results.
a sleuth object containing all kallisto samples, metadata, and summary statistics
sleuth_fit to fit a model, sleuth_wt or sleuth_lrt to perform hypothesis testing
# Assume we have run kallisto on a set of samples, and have two treatments,
# genotype and drug.
colnames(s2c)
# [1] "sample" "genotype" "drug" "path"
so <- sleuth_prep(s2c, ~genotype + drug)
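The example assumes that s2c already exists. As a rough sketch, a data frame with those four columns could be assembled as follows; the sample names, factor levels, and the kallisto_out directory are made up for illustration, and the path column is assumed here to hold each sample's kallisto output directory.

```r
# Hypothetical sample sheet with one row per kallisto run.
s2c <- data.frame(
  sample   = c("wt_ctrl", "wt_drug", "mut_ctrl", "mut_drug"),
  genotype = c("wt", "wt", "mut", "mut"),
  drug     = c("ctrl", "treated", "ctrl", "treated"),
  stringsAsFactors = FALSE
)

# Assumed layout: one output directory per sample under "kallisto_out".
s2c$path <- file.path("kallisto_out", s2c$sample)

sample_columns <- colnames(s2c)
```

Deriving path from sample keeps the two columns in step row by row, which helps avoid pairing a sample's covariates with another sample's results.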