[Experimental] This function is experimental, and may change. Any feedback would be greatly appreciated - https://github.com/TomCodd/NutritionTools/issues

The Data Imputer is an interactive function used to find suitable imputations for food items, either from the same dataset (the default) or from a different dataset. By default, the output is printed to the console and saved to a .txt file containing the lines of code needed to implement the imputations that the user has picked. Setting code_output to FALSE instead returns a data.frame with the changes implemented.

Usage

Data_Imputer(
  df,
  receiver_title_column,
  receiver_title,
  receiver_desc_column,
  receiver_exclude_terms = c(),
  receiver_id_column,
  missing_nutrient_column,
  water_column = "WATERg",
  comment_col = "comments",
  donor_fct_column = "Source",
  donor_df = df,
  donor_id_column = receiver_id_column,
  donor_search_column = receiver_desc_column,
  donor_search_terms = c(),
  extra_info_columns = c(),
  exclude_receiver_terms = TRUE,
  donor_search_collapse = c(","),
  Assume_continue = FALSE,
  term_search = "AND",
  water_balance = TRUE,
  code_output = TRUE,
  txt_output = TRUE,
  round_imputed_figure = TRUE
)

Arguments

df

Required - the data.frame which contains the food items with missing values.

receiver_title_column

Required - The name of the column in df which contains food groups or titles.

receiver_title

Required - The name of the food group or title you wish to examine - must be an item in the receiver_title_column column.

receiver_desc_column

Required - The name of the column within df that contains detailed food names.

receiver_exclude_terms

Optional - default: c() - Here you can enter the words you would like to exclude from a match search - e.g. if searching for replacement values for 'goat liver' in 'animal offals', you might want to exclude 'cow' and focus on results from other animals, such as sheep.

receiver_id_column

Required - The name of the column within df that contains the ID numbers of the food items.

missing_nutrient_column

Required - The name of the column within df AND donor_df that contains the nutrient you are trying to impute values for.

water_column

Required - default: 'WATERg' - The name of the column within df AND donor_df that contains water values, in grams per 100g.

comment_col

Required - default: 'comments' - The name of the column within df that contains comments.

donor_fct_column

Required - default: 'Source' - The name of the column within df that contains the source Food Composition Table the food items are from.

donor_df

Required - default: df - The data.frame that you are looking to get fill-in values from. By default, df itself is used as the donor, as this function was developed to run over large multi-FCT composite tables.

donor_id_column

Required - default: receiver_id_column - The name of the column within donor_df that contains the ID numbers of the food items. The default is suitable when using df as the donor and receiver.

donor_search_column

Required - default: receiver_desc_column - The name of the column within donor_df that contains detailed food names. The default is suitable when using df as the donor and receiver.

donor_search_terms

Required - The search terms you would like to use to find suitable imputation values. These are added to the search terms generated from the food descriptions of the receiver items.

extra_info_columns

Optional - default: c() - The names of columns present in both data.frames (df AND donor_df) whose contents you would like to see when deciding which items should be used for imputation.

exclude_receiver_terms

Required - default: TRUE - Either TRUE or FALSE. The donor search terms by default are generated from the food descriptions found in receiver_desc_column. If TRUE, then key words present in receiver_title will be excluded from these search terms. For example, if a food item is in the food group 'Goat, Offal' and its food description is 'Goat, liver, raw', then setting this option to TRUE means the search terms used will be 'liver, raw'; if set to FALSE, the full 'Goat, liver, raw' will be used.
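The effect of exclude_receiver_terms can be illustrated with a minimal sketch. The helper strip_title_terms() below is hypothetical and is not part of NutritionTools; it assumes comma-separated terms and case-insensitive matching, which may differ from what Data_Imputer does internally.

```r
# Hypothetical helper (not a NutritionTools function): drop any term of a
# food description that also appears in the food group title.
strip_title_terms <- function(desc, title, sep = ",") {
  # Split a string on the separator and trim surrounding whitespace
  split_terms <- function(x) {
    trimws(strsplit(x, sep, fixed = TRUE)[[1]])
  }
  desc_terms <- split_terms(desc)
  title_terms <- tolower(split_terms(title))
  # Keep only the description terms that are absent from the title
  kept <- desc_terms[!tolower(desc_terms) %in% title_terms]
  paste(kept, collapse = ", ")
}

strip_title_terms("Goat, liver, raw", "Goat, Offal")
```

With the example from the description above, the sketch keeps 'liver, raw' and drops 'Goat'.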

donor_search_collapse

Optional - default: c(",") - The string used to separate the search terms. If the search terms are 'Goat, liver, raw', then using the default ',' means that the function will return items that match 'goat' and 'liver' and 'raw'. If no separator is given, only items that match the entire string 'Goat, liver, raw' will be returned.

Assume_continue

Required - default: FALSE - Either TRUE or FALSE. There are several checks throughout the process to double-check inputs. If set to TRUE, this setting skips them, assuming the inputs are correct.

term_search

Required - default: "AND" - Either "AND" or "OR". Decides whether the imputation value search should find items which match all the search terms at once ("AND") or any one of them ("OR").
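The interaction between donor_search_collapse and term_search can be sketched as follows. The function match_terms() and its arguments are hypothetical names chosen for illustration, and the case-insensitive substring matching is an assumption rather than the package's confirmed behaviour.

```r
# Hypothetical helper (not a NutritionTools function): return the
# descriptions that match the search terms under "AND" or "OR" logic.
match_terms <- function(descriptions, search, collapse = ",", mode = "AND") {
  if (length(collapse) == 0) {
    # No separator: the whole search string is treated as a single term
    terms <- search
  } else {
    terms <- trimws(strsplit(search, collapse, fixed = TRUE)[[1]])
  }
  # One logical column per term, one row per description
  hits <- vapply(
    terms,
    function(term) grepl(tolower(term), tolower(descriptions), fixed = TRUE),
    logical(length(descriptions))
  )
  hits <- matrix(hits, nrow = length(descriptions))
  if (mode == "AND") {
    keep <- rowSums(hits) == length(terms)
  } else {
    keep <- rowSums(hits) > 0
  }
  descriptions[keep]
}

foods <- c("Goat, liver, raw", "Goat, meat, raw", "Sheep, liver, boiled")
match_terms(foods, "goat, liver", mode = "AND")
match_terms(foods, "goat, liver", mode = "OR")
match_terms(foods, "Goat, liver", collapse = c())
```

In this sketch, "AND" narrows the candidates to items containing every term, while "OR" widens the search to anything containing at least one term, which is useful when few close matches exist.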

water_balance

Required - default: TRUE - Either TRUE or FALSE. If TRUE then the function will water-balance the values.
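This page does not state the formula used for water balancing. As a rough illustration only, the sketch below applies the dry-matter scaling commonly described in FAO/INFOODS guidance, where the donor value is rescaled by the ratio of non-water content in the receiver and donor items. The function name water_balance_value() and the example figures are hypothetical, and Data_Imputer may calculate this differently.

```r
# Hypothetical helper (not a NutritionTools function): rescale a donor
# nutrient value to the receiver item's water content.
# All water values are in grams per 100 g.
water_balance_value <- function(donor_value, donor_water, receiver_water) {
  donor_value * (100 - receiver_water) / (100 - donor_water)
}

# Illustrative figures only: a donor item with 70 g water and a value of 50,
# imputed into a receiver item with 64 g water.
water_balance_value(donor_value = 50, donor_water = 70, receiver_water = 64)
```

Because the receiver in this illustration is drier than the donor, the balanced value is higher than the donor value.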

code_output

Required - default: TRUE - Either TRUE or FALSE. Decides whether the output should be pre-written code to be inserted just above where this function was called (by default), or a data.frame with the changes made (if set to FALSE).

txt_output

Required - default: TRUE - Either TRUE or FALSE. If using code_output, then this option writes the generated code to a .txt file and saves it in your working directory.

round_imputed_figure

Required - default: TRUE - Either TRUE or FALSE. Decide whether the imputed values should be rounded to 2 decimal places.

Value

Either code that applies the imputations (if code_output is set to TRUE, as it is by default), or an altered data.frame with the imputations applied.

Examples

# Unfortunately, due to the function's interactive nature, these examples
# cannot be run from within the package help - please copy them, uncomment
# them, and run them manually.
#
# First we'll run through a demonstration of the Data_Imputer imputing from
# within the same dataset. Because this is the default setting, fewer inputs
# are needed.
#
# The dataset can be viewed using View(KE18_subset_modified)
#
# The dataset in question is missing some VITB12mcg values for 'Lamb liver,
# raw' and 'Lamb, liver, boiled (without salt)'. However, within the same
# dataset are some goat values which could be suitable imputation values.
#
# Data_Imputer(
#    df = KE18_subset_modified,
#    receiver_title_column = "food_group",
#    receiver_title = "MEAT, POULTRY AND EGGS",
#    receiver_desc_column = "food_desc",
#    receiver_exclude_terms = c("lean", "blood"), #We don't need to see any
#    #of the 'lean' or 'blood' results
#    receiver_id_column = "fdc_id",
#    term_search = "OR",
#    missing_nutrient_column = "VITB12mcg",
#    donor_search_terms = c("goat"),
#    water_column = "WATERg",
#    comment_col = "comments",
#    donor_fct_column = "source_fct"
#  )


# In this example we'll impute values from a different data.frame - the West
# Africa FCT subset, WA19_subset. This can be viewed using View(WA19_subset).
# We also want to look at some extra columns when choosing an item, so we've
# added two columns to the extra_info_columns option.

#  Data_Imputer(
#    df = KE18_subset_modified,
#    receiver_title_column = "food_group",
#    receiver_title = "MEAT, POULTRY AND EGGS",
#    receiver_desc_column = "food_desc",
#    receiver_exclude_terms = c("lean", "blood"), #We don't need to see any
#    #of the 'lean' or 'blood' results
#    receiver_id_column = "fdc_id",
#    missing_nutrient_column = "VITB12mcg",
#    donor_search_terms = c("goat"),
#    water_column = "WATERg",
#    comment_col = "comments",
#    donor_fct_column = "source_fct",
#    donor_df = WA19_subset,
#    donor_id_column = "fdc_id",
#    term_search = "OR",
#    donor_search_column = "food_desc",
#    extra_info_columns = c("PROCNTg", "CHOAVLDFg")
#  )