Reformat Taxonomic Lineage using taxonkit

taxonkit_reformat(
  file_path,
  delimiter = NULL,
  add_prefix = FALSE,
  prefix_kingdom = "K__",
  prefix_phylum = "p__",
  prefix_class = "c__",
  prefix_order = "o__",
  prefix_family = "f__",
  prefix_genus = "g__",
  prefix_species = "s__",
  prefix_subspecies = "t__",
  prefix_strain = "T__",
  fill_miss_rank = FALSE,
  format_string = "",
  miss_rank_repl_prefix = "unclassified ",
  miss_rank_repl = "",
  miss_taxid_repl = "",
  output_ambiguous_result = FALSE,
  lineage_field = 2,
  taxid_field = NULL,
  pseudo_strain = FALSE,
  trim = FALSE,
  text = FALSE,
  data_dir = NULL
)

Arguments

file_path

The path to the input file with taxonomic lineages, or the input text itself when text = TRUE.

delimiter

The field delimiter in the input lineage (taxonkit default: ";").

add_prefix

Logical, indicating whether to add prefixes for all ranks (default: FALSE).

prefix_kingdom

The prefix for kingdom, used along with --add-prefix (default: "K__").

prefix_phylum

The prefix for phylum, used along with --add-prefix (default: "p__").

prefix_class

The prefix for class, used along with --add-prefix (default: "c__").

prefix_order

The prefix for order, used along with --add-prefix (default: "o__").

prefix_family

The prefix for family, used along with --add-prefix (default: "f__").

prefix_genus

The prefix for genus, used along with --add-prefix (default: "g__").

prefix_species

The prefix for species, used along with --add-prefix (default: "s__").

prefix_subspecies

The prefix for subspecies, used along with --add-prefix (default: "t__").

prefix_strain

The prefix for strain, used along with --add-prefix (default: "T__").

fill_miss_rank

Logical, indicating whether to fill missing rank with lineage information of the next higher rank (default: FALSE).

format_string

The output format string with placeholders for each rank, e.g. "{k};{p};{c};{o};{f};{g};{s}".

miss_rank_repl_prefix

The prefix for estimated taxon level for missing rank (default: "unclassified ").

miss_rank_repl

The replacement string for missing rank.

miss_taxid_repl

The replacement string for missing taxid.

output_ambiguous_result

Logical, indicating whether to output one of the ambiguous results (default: FALSE).

lineage_field

The field index of lineage. Input data should be tab-separated (default: 2).

taxid_field

The field index of taxid. Input data should be tab-separated. It overrides lineage_field (taxonkit's -i/--lineage-field).

pseudo_strain

Logical, indicating whether to use the node with lowest rank as strain name (default: FALSE).

trim

Logical, indicating whether to skip filling missing ranks lower than the current rank (default: FALSE).

text

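Logical, indicating whether file_path is the input text itself rather than a path to a file (default: FALSE).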

data_dir

The directory containing nodes.dmp and names.dmp (default: "~/.taxonkit").

Value

A character vector containing the reformatted taxonomic lineages.

Examples

if (FALSE) {
# Use taxid: column 1 of the input file holds the taxids
taxids2 <- system.file("extdata/taxids2.txt", package = "pctax")
reformatted_lineages <- taxonkit_reformat(taxids2,
  add_prefix = TRUE, taxid_field = 1, fill_miss_rank = TRUE
)
reformatted_lineages
# Split each line on tabs, then split the lineage column (V2) into ranks
taxonomy <- strsplit2(reformatted_lineages, "\t")
taxonomy <- strsplit2(taxonomy$V2, ";")

# Use lineage result: pass the text output of taxonkit_lineage() straight in
taxonkit_lineage("9606\n63221", show_name = TRUE, show_rank = TRUE, text = TRUE) %>%
  taxonkit_reformat(text = TRUE)
}
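
The returned character vector can also be turned into a rank table with base R only. The following is a minimal sketch, not part of the package: the input line is a hypothetical example of one reformatted record (taxid, a tab, then a seven-rank lineage), and the column names are chosen here for illustration. It assumes every lineage contains exactly seven ";"-separated fields.

```r
# Hypothetical example of one reformatted line (taxid, tab, lineage);
# real output depends on the taxonkit database and the arguments used.
reformatted <- c(
  "9606\tk__Eukaryota;p__Chordata;c__Mammalia;o__Primates;f__Hominidae;g__Homo;s__Homo sapiens"
)

# Split each line on the tab into the taxid and the lineage string.
fields <- strsplit(reformatted, "\t", fixed = TRUE)
taxid <- vapply(fields, `[`, character(1), 1)
lineage <- vapply(fields, `[`, character(1), 2)

# Split each lineage on ";" and bind the ranks row-wise into a matrix.
# This assumes every lineage has exactly seven fields.
ranks <- do.call(rbind, strsplit(lineage, ";", fixed = TRUE))
colnames(ranks) <- c(
  "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"
)

# Combine the taxids and ranks into one data frame.
taxonomy <- data.frame(taxid = taxid, ranks, stringsAsFactors = FALSE)
taxonomy
```

If a custom format_string or delimiter is used, the number of fields and the split character in this sketch would need to be adjusted to match.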