Create an arrow schema from a hub's tasks.json config file, for use when opening an arrow dataset.

Usage

create_hub_schema(
  config_tasks,
  partitions = list(model_id = arrow::utf8()),
  output_type_id_datatype = c("auto", "character", "double", "integer", "logical",
    "Date"),
  r_schema = FALSE
)

Arguments

config_tasks

a list version of the contents of a hub's tasks.json config file, created using the function read_config().

partitions

a named list specifying the arrow data types of any partitioning columns. Defaults to list(model_id = arrow::utf8()).

output_type_id_datatype

character string. One of "auto", "character", "double", "integer", "logical", "Date". Defaults to "auto", meaning that the data type of output_type_id is determined automatically from the tasks.json config file. Any other value overrides automatic determination. Note that attempting to coerce output_type_id to an incompatible data type (e.g. coercing to "double" when the data contains "character" values) will likely result in an error or unexpected behaviour, so use with care.

r_schema

Logical. If FALSE (default), return an arrow::schema() object. If TRUE, return a character vector of R data types.

Value

an arrow schema object that can be used to define column data types when opening model output data. If r_schema = TRUE, a character vector of R data types instead.

Examples

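# Locate the example hub bundled with hubUtils
hub_path <- system.file("testhubs/simple", package = "hubUtils")
# Read the hub's tasks.json config into a list
config_tasks <- read_config(hub_path, "tasks")
# Build an arrow schema from the config
schema <- create_hub_schema(config_tasks)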
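
The remaining arguments can be exercised in the same way. The following is a minimal sketch rather than part of the official examples: it uses only the arguments documented above, and it assumes that the package providing create_hub_schema() and read_config() is already attached and that the arrow package is installed. The object names schema_chr, r_types and schema_part are illustrative only.

```r
# Assumption: the package exporting read_config() and create_hub_schema()
# has been attached, and the bundled "simple" test hub is available.
hub_path <- system.file("testhubs/simple", package = "hubUtils")
config_tasks <- read_config(hub_path, "tasks")

# Override automatic detection and force output_type_id to character.
schema_chr <- create_hub_schema(
  config_tasks,
  output_type_id_datatype = "character"
)

# Return a character vector of R data types instead of an arrow schema.
r_types <- create_hub_schema(config_tasks, r_schema = TRUE)

# Declare the partitioning column explicitly (this repeats the default).
schema_part <- create_hub_schema(
  config_tasks,
  partitions = list(model_id = arrow::utf8())
)
```

Forcing "character" is the most permissive override, since values of the other listed types can generally be represented as strings; the other overrides are more likely to hit the coercion problems described under output_type_id_datatype.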