This function is experimental, and may change. Any feedback would be greatly appreciated - https://github.com/TomCodd/NutritionTools/issues
The Data Imputer is an interactive function used to find suitable imputations for food items, either from the same dataset (the default) or from a different dataset. By default, the output is printed to the console and saved to a .txt file containing the lines of code needed to implement the imputations that the user has picked; this can be changed so that a data.frame with the changes implemented is returned instead.
Usage
Data_Imputer(
  df,
  receiver_title_column,
  receiver_title,
  receiver_desc_column,
  receiver_exclude_terms = c(),
  receiver_id_column,
  missing_nutrient_column,
  water_column = "WATERg",
  comment_col = "comments",
  donor_fct_column = "Source",
  donor_df = df,
  donor_id_column = receiver_id_column,
  donor_search_column = receiver_desc_column,
  donor_search_terms = c(),
  extra_info_columns = c(),
  exclude_receiver_terms = TRUE,
  donor_search_collapse = c(","),
  Assume_continue = FALSE,
  term_search = "AND",
  water_balance = TRUE,
  code_output = TRUE,
  txt_output = TRUE,
  round_imputed_figure = TRUE
)
Arguments
- df
Required - the data.frame which contains the food items with missing values.
- receiver_title_column
Required - The name of the column in df which contains food groups or titles.
- receiver_title
Required - The name of the food group or title you wish to examine - must be an item in the receiver_title_column column.
- receiver_desc_column
Required - The name of the column within df that contains detailed food names.
- receiver_exclude_terms
Optional - default: c() - Here you can enter the words you would like to exclude from a match search - e.g. if searching for replacement values for 'goat liver' in 'animal offals', you might want to exclude 'cow' and focus on results from other animals, such as sheep.
- receiver_id_column
Required - The name of the column within df that contains the ID numbers of the food items.
- missing_nutrient_column
Required - The name of the column within df AND donor_df that contains the nutrient you are trying to impute values for.
- water_column
Required - default: 'WATERg' - The name of the column within df AND donor_df that contains water values, in grams per 100g.
- comment_col
Required - default: 'comments' - The name of the column within df that contains comments.
- donor_fct_column
Required - default: 'Source' - The name of the column within df that contains the source Food Composition Table the food items are from.
- donor_df
Required - default: df - The name of the data.frame that you are looking to get fill-in values from. By default df itself is used as the donor, as this function was developed to run over large multi-FCT composite tables.
- donor_id_column
Required - default: receiver_id_column - The name of the column within donor_df that contains the ID numbers of the food items. The default is suitable when using df as the donor and receiver.
- donor_search_column
Required - default: receiver_desc_column - The name of the column within donor_df that contains detailed food names. The default is suitable when using df as the donor and receiver.
- donor_search_terms
Required - The search terms to use when looking for suitable imputation values. These are added to the food descriptions of the receiver items.
- extra_info_columns
Optional - The names of columns present in both data.frames (df AND donor_df) whose contents you would like to see when deciding which items should be used for imputation.
- exclude_receiver_terms
Required - default: TRUE - Either TRUE or FALSE. The donor search terms by default are generated from the food descriptions found in receiver_desc_column. If TRUE then key words present in receiver_title will be excluded from these search terms. For example, if a food item is in a food group 'Goat, Offal', and the food description is 'Goat, liver, raw': if this option is set to TRUE then the search terms used will be 'liver, raw'; if set to FALSE then the full 'Goat, liver, raw' will be used.
- donor_search_collapse
Optional - default: c(",") - The string used to separate the search terms. If the search terms are 'Goat, liver, raw', then using the default ',' will mean that the function will return items that match 'goat' and 'liver' and 'raw'. If no separator is used, only items that match the entire string 'Goat, liver, raw' will be returned.
- Assume_continue
Required - default: FALSE - Either TRUE or FALSE. There are several checks throughout the process to double-check inputs. If set to TRUE, this setting skips them, assuming the inputs are correct.
- term_search
Required - default: "AND" - Either "AND" or "OR". Decides whether the imputation value search should find items which match all the search terms at once ("AND") or any one of them ("OR").
- water_balance
Required - default: TRUE - Either TRUE or FALSE. If TRUE then the function will water-balance the values.
- code_output
Required - default: TRUE - Either TRUE or FALSE. Decides whether the output should be pre-written code to be inserted just above where this function was called (by default), or a data.frame with the changes made (if set to FALSE).
- txt_output
Required - default: TRUE - Either TRUE or FALSE. If using code_output, then this option writes the generated code to a .txt file and saves it in your working directory.
- round_imputed_figure
Required - default: TRUE - Either TRUE or FALSE. Decides whether the imputed values should be rounded to 2 decimal places.
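The way exclude_receiver_terms, donor_search_collapse and term_search interact can be easier to follow with a small sketch. The base R code below is only an illustration of the behaviour described in the arguments above, not the package's actual implementation; the helper names strip_title_terms and match_terms and the sample descriptions are hypothetical.

```r
# Hypothetical helpers for illustration only; they are NOT part of
# NutritionTools and may differ from what Data_Imputer does internally.

# Drop the terms found in the food group title from a food description,
# as described for exclude_receiver_terms = TRUE.
strip_title_terms <- function(description, title, collapse = ",") {
  split_terms <- function(x) trimws(strsplit(x, collapse, fixed = TRUE)[[1]])
  desc_terms <- split_terms(description)
  title_terms <- split_terms(title)
  kept <- desc_terms[!tolower(desc_terms) %in% tolower(title_terms)]
  paste(kept, collapse = paste0(collapse, " "))
}

# Flag which donor descriptions match the search string, either on all
# terms ("AND") or on at least one term ("OR").
match_terms <- function(descriptions, search_string, collapse = ",",
                        term_search = "AND") {
  if (length(collapse) == 0 || !nzchar(collapse)) {
    # No separator: the whole string is treated as a single term
    terms <- tolower(search_string)
  } else {
    terms <- tolower(trimws(strsplit(search_string, collapse, fixed = TRUE)[[1]]))
    terms <- terms[nzchar(terms)]
  }
  # One column per term: does each description contain that term?
  hits <- vapply(terms, function(term) {
    grepl(term, tolower(descriptions), fixed = TRUE)
  }, logical(length(descriptions)))
  hits <- matrix(hits, nrow = length(descriptions))
  if (term_search == "AND") {
    rowSums(hits) == length(terms)
  } else {
    rowSums(hits) > 0
  }
}

# Made-up donor descriptions
donor_desc <- c("Goat, liver, raw", "Sheep, liver, raw",
                "Goat, meat, raw", "Beef, kidney, boiled")

search_string <- strip_title_terms("Goat, liver, raw", "Goat, Offal")
search_string                                               # "liver, raw"
match_terms(donor_desc, search_string, term_search = "AND") # TRUE TRUE FALSE FALSE
match_terms(donor_desc, search_string, term_search = "OR")  # TRUE TRUE TRUE FALSE
```

In this sketch "AND" keeps only the two liver items, while "OR" also returns the raw goat meat because it matches 'raw'.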
Value
Either code that applies the imputations (if code_output is set to TRUE, as it is by default), or an altered data.frame with the imputations applied.
Examples
# Unfortunately, due to the function's interactive nature, these examples cannot
# be run from within the package help - please copy them, uncomment them, and
# run them manually.
#
# First we'll run through a demonstration of the Data_Imputer imputing from
# within the same dataset. Because this is the default setting, fewer inputs
# are needed.
#
# The dataset can be viewed using View(KE18_subset_modified)
#
# The dataset in question is missing some VITB12mcg values for 'Lamb liver,
# raw' and 'Lamb, liver, boiled (without salt)'. However, within the same
# dataset are some goat values which could be good imputation values.
#
# Data_Imputer(
# df = KE18_subset_modified,
# receiver_title_column = "food_group",
# receiver_title = "MEAT, POULTRY AND EGGS",
# receiver_desc_column = "food_desc",
# receiver_exclude_terms = c("lean", "blood"), #We don't need to see any
# #of the 'lean' or 'blood' results
# receiver_id_column = "fdc_id",
# term_search = "OR",
# missing_nutrient_column = "VITB12mcg",
# donor_search_terms = c("goat"),
# water_column = "WATERg",
# comment_col = "comments",
# donor_fct_column = "source_fct"
# )
#
# In this example we'll impute values from a different data.frame - the West
# Africa FCT subset, WA19_subset. This can be viewed using View(WA19_subset).
# We also want to look at some extra columns when choosing an item, so we've
# added two columns to the extra_info_columns option.
#
# Data_Imputer(
# df = KE18_subset_modified,
# receiver_title_column = "food_group",
# receiver_title = "MEAT, POULTRY AND EGGS",
# receiver_desc_column = "food_desc",
# receiver_exclude_terms = c("lean", "blood"), #We don't need to see any
# #of the 'lean' or 'blood' results
# receiver_id_column = "fdc_id",
# missing_nutrient_column = "VITB12mcg",
# donor_search_terms = c("goat"),
# water_column = "WATERg",
# comment_col = "comments",
# donor_fct_column = "source_fct",
# donor_df = WA19_subset,
# donor_id_column = "fdc_id",
# term_search = "OR",
# donor_search_column = "food_desc",
# extra_info_columns = c("PROCNTg", "CHOAVLDFg")
# )
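
Both examples above leave water_balance and round_imputed_figure at their default of TRUE. This page does not spell out the calculation the function uses, so the sketch below assumes the conventional moisture adjustment, in which a donor value per 100 g is scaled by the ratio of dry matter in the receiver and donor foods. The helper name water_balance_value and the figures are hypothetical, and the package's own calculation may differ.

```r
# Hypothetical helper for illustration only; NOT part of NutritionTools.
# Assumes the conventional moisture adjustment: scale the donor value by
# the ratio of dry matter (100 g minus water) in the receiver and donor.
water_balance_value <- function(donor_value, donor_water, receiver_water,
                                round_result = TRUE) {
  adjusted <- donor_value * (100 - receiver_water) / (100 - donor_water)
  # Mirrors the documented rounding to 2 decimal places
  if (round_result) round(adjusted, 2) else adjusted
}

# Made-up figures: donor value of 60 per 100 g at 70 g water,
# receiver item at 76 g water
water_balance_value(donor_value = 60, donor_water = 70, receiver_water = 76)
# 60 * (100 - 76) / (100 - 70) = 48
```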