Combines nutrients or variables that are spread across multiple columns into a single new column, new_var, according to a user-set hierarchy. var1_column is the primary variable and takes priority. Where var1_column has no value (a blank or NA), the value from var2_column is used instead. Any remaining gaps are filled from var3_column, then var4_column, then var5_column, and finally var6_column. var3_column to var6_column are optional; var1_column and var2_column are required. Comments can also be added to record the origin of each value.
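
The fallback hierarchy can be illustrated with a minimal base R sketch. This is not the package's implementation: first_available() and toy_df are hypothetical names used only for illustration, and the sketch assumes that blanks are empty strings and that the columns hold numeric values.

```r
# Hypothetical helper (not part of the package): for each row, take the first
# non-missing value across the given columns, in priority order.
first_available <- function(df, columns) {
  result <- rep(NA_real_, nrow(df))
  for (column in columns) {
    candidate <- df[[column]]
    candidate[candidate %in% ""] <- NA   # assumption: blanks are empty strings
    candidate <- as.numeric(candidate)
    gaps <- is.na(result)                # rows that still have no value
    result[gaps] <- candidate[gaps]
  }
  result
}

toy_df <- data.frame(
  FATg   = c(21, NA, NA, NA),
  FAT_g  = c(20.9, 12, NA, NA),
  FATCEg = c(NA, NA, 16, NA)
)

toy_df$FAT_g_combined <- first_available(toy_df, c("FATg", "FAT_g", "FATCEg"))
toy_df$FAT_g_combined
#> [1] 21 12 16 NA
```

Each row keeps the value from the highest-priority column that has one; a row stays NA only when every listed column is missing.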

Usage

nutri_combiner(
  df,
  var1_column,
  var2_column,
  var3_column,
  var4_column,
  var5_column,
  var6_column,
  new_var,
  fill_missing = FALSE,
  comment = TRUE,
  comment_col = "comments"
)

Arguments

df

Required - the data.frame the data is currently stored in.

var1_column

Required - The column name of the primary variable to pull values from. This should be the variable you most want to use.

var2_column

Required - The column name of the secondary variable to pull values from. This should be the variable you most want to use, if you can't use var1_column.

var3_column

Optional - The column name of the tertiary variable to pull values from. This should be the variable you most want to use, if you can't use var1_column or var2_column.

var4_column

Optional - The column name of the fourth most appropriate variable to pull values from. This should be the next most appropriate variable after the ones selected for var1_column, var2_column, and var3_column.

var5_column

Optional - The column name of the fifth most appropriate variable to pull values from, after the columns selected for var1_column to var4_column.

var6_column

Optional - The column name of the sixth variable. This should be the least appropriate variable to use, as it will only be used if a value cannot be found using var1_column to var5_column.

new_var

Required - The name of the new column that will be created by combining the variable columns. It is recommended to use the nutrient's INFOODS Tagname, followed by the units - e.g. Thiamine in milligrams would be 'THIAmg'. The suffix '_combined' is automatically attached to the inputted name.

fill_missing

Optional - default: FALSE - TRUE or FALSE. If set to TRUE, nutri_combiner checks for missing columns (inputs that don't match a column name in df). Instead of throwing an error as it normally would, the function drops the invalid entries and fills the hierarchy, in order, from the remaining valid column names.

comment

Optional - default: TRUE - TRUE or FALSE. If comment is TRUE (the default), running the function adds a comment describing the source of each new_var value to comment_col. If comment = TRUE and the comment_col column does not exist in df, it is created.

comment_col

Optional - default: 'comments' - The name of the column that holds the metadata comments for each food item. Not required if comment is set to FALSE. If comment is TRUE and comment_col is not found in df, a column with that name is created.
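
The fill_missing behaviour described above can be sketched in base R as follows. keep_valid_columns() is a hypothetical helper written for illustration; the package's own checks and messages may differ.

```r
# Hypothetical helper (not the package's code): keep only the requested
# columns that exist in df, preserving their priority order.
keep_valid_columns <- function(df, columns, fill_missing = FALSE) {
  missing_columns <- setdiff(columns, names(df))
  if (length(missing_columns) > 0 && !fill_missing) {
    stop("Columns not present in df: ", paste(missing_columns, collapse = ", "))
  }
  columns[columns %in% names(df)]
}

toy_df <- data.frame(FATg = c(21, NA), FAT_g = c(20.9, 12), FATCEg = c(NA, NA))
requested <- c("FATg", "FAT_g", "NonExistant_Fat_Value", "FATCEg")

valid <- keep_valid_columns(toy_df, requested, fill_missing = TRUE)
valid
#> [1] "FATg"   "FAT_g"  "FATCEg"
```

With fill_missing = TRUE the invalid name is dropped and FATCEg moves up to third place; with the default FALSE the same call stops with an error.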

Value

The original data.frame with a new _combined column and, if comment = TRUE, the comment column (created if it did not already exist) with a comment recording the source of each row's value.
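
The way comments are attached can be approximated with a short base R sketch. It is based on the example output on this page, where existing comments appear to be kept and joined to the new note with "; ", while blank or NA comments are replaced by the note alone. append_comment() is a hypothetical helper, not a function from the package.

```r
# Hypothetical helper (not from the package): append a note to an existing
# comment, or use the note alone when the existing comment is blank or NA.
append_comment <- function(existing, note) {
  existing <- as.character(existing)
  empty <- is.na(existing) | existing == ""
  ifelse(empty, note, paste(existing, note, sep = "; "))
}

comments <- c("", "These are imaginary food items", NA)
notes <- c(
  "FAT_g_combined equal to FATg",
  "FAT_g_combined equal to FAT_g",
  "No suitable value for FAT_g_combined found"
)

append_comment(comments, notes)
#> [1] "FAT_g_combined equal to FATg"
#> [2] "These are imaginary food items; FAT_g_combined equal to FAT_g"
#> [3] "No suitable value for FAT_g_combined found"
```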

Examples

# An example data.frame has been created to give an example of using the
# nutri_combiner to combine FAT values.
breakfast_df <- breakfast_df[, c("food_code", "food_name", "FATg", "FAT_g",
                                 "FATCEg", "comments")]
breakfast_df
#>    food_code      food_name FATg FAT_g FATCEg                       comments
#> 1      F0001          Bacon   21  20.9     NA                               
#> 2      F0002          Beans   NA  12.0     NA These are imaginary food items
#> 3      F0003          Toast   NA    NA     NA                           <NA>
#> 4      F0004       Mushroom   NA    NA   16.0 With imaginary nutrient values
#> 5      F0005           Eggs   11  10.9     NA                               
#> 6      F0006         Tomato   NA  33.0   33.0                     And blanks
#> 7      F0007        Sausage   13  12.1     NA                           <NA>
#> 8      F0008         Butter   16  16.1   15.9      To test different outputs
#> 9      F0009    Brown Sauce   NA    NA     NA                               
#> 10     F0010 Tomato Ketchup   NA    NA     NA                  And scenarios

# We start with a data.frame containing multiple patchy values for fat. Ideally
# we would like to combine these into a single 'combined' column with as few gaps
# as possible. We would like to use FATg as the main value, and then fill in with
# FAT_g as a second choice, and then FATCEg as a last resort. We would like the
# new column to be called 'FAT_g_combined'.
#
# In this case, the following nutri_combiner input would be used:

Fat_combined_results <- nutri_combiner(
  breakfast_df,
  "FATg",
  "FAT_g",
  "FATCEg",
  new_var = "FAT_g")
#> ---------------------------
#> 
#> Breakdown of values used:
#> 
#>             FAT_g_combined equal to FATCEg 
#>                                          1 
#>              FAT_g_combined equal to FAT_g 
#>                                          2 
#>               FAT_g_combined equal to FATg 
#>                                          4 
#> No suitable value for FAT_g_combined found 
#>                                          3 
#> 
#> ---------------------------

Fat_combined_results
#>    food_code      food_name FATg FAT_g FATCEg
#> 1      F0001          Bacon   21  20.9     NA
#> 2      F0002          Beans   NA  12.0     NA
#> 3      F0003          Toast   NA    NA     NA
#> 4      F0004       Mushroom   NA    NA   16.0
#> 5      F0005           Eggs   11  10.9     NA
#> 6      F0006         Tomato   NA  33.0   33.0
#> 7      F0007        Sausage   13  12.1     NA
#> 8      F0008         Butter   16  16.1   15.9
#> 9      F0009    Brown Sauce   NA    NA     NA
#> 10     F0010 Tomato Ketchup   NA    NA     NA
#>                                                          comments
#> 1                                    FAT_g_combined equal to FATg
#> 2   These are imaginary food items; FAT_g_combined equal to FAT_g
#> 3                      No suitable value for FAT_g_combined found
#> 4  With imaginary nutrient values; FAT_g_combined equal to FATCEg
#> 5                                    FAT_g_combined equal to FATg
#> 6                       And blanks; FAT_g_combined equal to FAT_g
#> 7                                    FAT_g_combined equal to FATg
#> 8         To test different outputs; FAT_g_combined equal to FATg
#> 9                      No suitable value for FAT_g_combined found
#> 10      And scenarios; No suitable value for FAT_g_combined found
#>    FAT_g_combined
#> 1              21
#> 2              12
#> 3              NA
#> 4              16
#> 5              11
#> 6              33
#> 7              13
#> 8              16
#> 9              NA
#> 10             NA

# Note how the values are filled in according to the priority order - with
# a note added to the comments column showing the origins for each.

# As an example of the fill_missing argument, see what happens when the
# function is run with a column name that does not exist in the data.frame:

Fat_combined_results_2 <- nutri_combiner(
  breakfast_df,
  "FATg",
  "FAT_g",
  "NonExistant_Fat_Value",
  "FATCEg",
  new_var = "FAT_g",
  fill_missing = TRUE)
#> Error - the following columns are not present in df. nutri-combiner will attempt to shift variables to fill the gap in the heirachy, if present.
#> NonExistant_Fat_Value
#> ---------------------------
#> 
#> Breakdown of values used:
#> 
#>             FAT_g_combined equal to FATCEg 
#>                                          1 
#>              FAT_g_combined equal to FAT_g 
#>                                          2 
#>               FAT_g_combined equal to FATg 
#>                                          4 
#> No suitable value for FAT_g_combined found 
#>                                          3 
#> 
#> ---------------------------

Fat_combined_results_2
#>    food_code      food_name FATg FAT_g FATCEg
#> 1      F0001          Bacon   21  20.9     NA
#> 2      F0002          Beans   NA  12.0     NA
#> 3      F0003          Toast   NA    NA     NA
#> 4      F0004       Mushroom   NA    NA   16.0
#> 5      F0005           Eggs   11  10.9     NA
#> 6      F0006         Tomato   NA  33.0   33.0
#> 7      F0007        Sausage   13  12.1     NA
#> 8      F0008         Butter   16  16.1   15.9
#> 9      F0009    Brown Sauce   NA    NA     NA
#> 10     F0010 Tomato Ketchup   NA    NA     NA
#>                                                          comments
#> 1                                    FAT_g_combined equal to FATg
#> 2   These are imaginary food items; FAT_g_combined equal to FAT_g
#> 3                      No suitable value for FAT_g_combined found
#> 4  With imaginary nutrient values; FAT_g_combined equal to FATCEg
#> 5                                    FAT_g_combined equal to FATg
#> 6                       And blanks; FAT_g_combined equal to FAT_g
#> 7                                    FAT_g_combined equal to FATg
#> 8         To test different outputs; FAT_g_combined equal to FATg
#> 9                      No suitable value for FAT_g_combined found
#> 10      And scenarios; No suitable value for FAT_g_combined found
#>    FAT_g_combined
#> 1              21
#> 2              12
#> 3              NA
#> 4              16
#> 5              11
#> 6              33
#> 7              13
#> 8              16
#> 9              NA
#> 10             NA

# Note how the invalid column is dropped and the remaining columns shift up
# the hierarchy, with a notice printed to say so.