This function automatically processes geNomad output files by detecting sample names from the directory structure and optionally integrates CheckV quality assessment results.
pre_genomad(
  genomad_out_dir = "",
  checkV_out_dir = NULL,
  provirus = TRUE,
  filter = TRUE,
  checkV_out_prefix = NULL,
  min_length = 1000,
  min_completeness = 50
)
genomad_out_dir: Character. Path to the geNomad output directory. This directory should contain sample-specific subdirectories matching the pattern "*.contigs_summary".
checkV_out_dir: Character. Optional path to the CheckV output directory. If provided, the CheckV quality summary is integrated into the results. Default is NULL.
provirus: Logical. Whether to identify and separate provirus sequences. Default is TRUE.
filter: Logical. Whether to apply quality filtering to viral sequences. Default is TRUE.
checkV_out_prefix: Character. Optional prefix to remove from CheckV contig IDs. Default is NULL.
min_length: Numeric. Minimum sequence length for filtering. Default is 1000.
min_completeness: Numeric. Minimum completeness score for CheckV filtering. Default is 50.
An object of class "virus_res" containing four components:
- Detected sample name (accessed as $sample)
- Integrated data frame with geNomad and optional CheckV results
- Gene-level annotations from geNomad
- Filtered high-quality viral sequences
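The integration and filtering steps can be illustrated with a minimal sketch on toy data. This is not the internal implementation of pre_genomad(): the data frames, the "S1_" prefix, and the inclusive (>=) thresholds are assumptions made for illustration. The column names seq_name, length, contig_id, and completeness follow the usual geNomad and CheckV summary tables, but are also assumed here.

```r
# Toy stand-in for a geNomad virus summary (assumed columns).
genomad <- data.frame(
  seq_name = c("contig_1", "contig_2", "contig_3"),
  length   = c(5200, 800, 12000),
  stringsAsFactors = FALSE
)

# Toy stand-in for a CheckV quality summary; IDs carry a hypothetical "S1_" prefix.
checkv <- data.frame(
  contig_id    = c("S1_contig_1", "S1_contig_2", "S1_contig_3"),
  completeness = c(85, 95, 30),
  stringsAsFactors = FALSE
)

checkV_out_prefix <- "S1_"
min_length        <- 1000
min_completeness  <- 50

# Strip the prefix so CheckV IDs match the geNomad sequence names.
has_prefix <- startsWith(checkv$contig_id, checkV_out_prefix)
checkv$contig_id[has_prefix] <- substring(
  checkv$contig_id[has_prefix],
  nchar(checkV_out_prefix) + 1
)

# Left-join the CheckV columns onto the geNomad table.
merged <- merge(genomad, checkv,
                by.x = "seq_name", by.y = "contig_id", all.x = TRUE)

# Keep sequences that pass both thresholds (inclusive comparison is an assumption).
keep <- merged$length >= min_length &
  !is.na(merged$completeness) &
  merged$completeness >= min_completeness
filtered <- merged[keep, , drop = FALSE]

print(filtered$seq_name)
```

In this toy case only contig_1 passes: contig_2 is shorter than min_length, and contig_3 falls below min_completeness.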
The function automatically detects sample names by searching for directories with the pattern "*.contigs_summary" within the genomad_out_dir. It then extracts the sample name by removing the ".contigs_summary" suffix.
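The detection logic described above can be approximated in base R as follows. This is a sketch under stated assumptions, not the package's actual code: the temporary directory and the sample name "sampleA" are hypothetical, and only immediate subdirectories are searched.

```r
# Build a throwaway directory that mimics the expected layout.
genomad_out_dir <- file.path(tempdir(), "genomad_out_demo")
dir.create(file.path(genomad_out_dir, "sampleA.contigs_summary"),
           recursive = TRUE, showWarnings = FALSE)

# List immediate subdirectories and keep those ending in ".contigs_summary".
subdirs <- list.dirs(genomad_out_dir, full.names = FALSE, recursive = FALSE)
summary_dirs <- grep("\\.contigs_summary$", subdirs, value = TRUE)

# Remove the suffix to recover the sample name.
sample_names <- sub("\\.contigs_summary$", "", summary_dirs)
print(sample_names)
```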
if (FALSE) { # \dontrun{
# Basic usage - sample name will be automatically detected
virus_results <- pre_genomad(genomad_out_dir = "~/Documents/R/Lung_virome/data/genomad_out2/")
# Access the detected sample name
sample_name <- virus_results$sample
print(paste("Detected sample:", sample_name))
} # }