Combines nutrients or variables that are spread out over multiple columns into a single new column (new_var, with the suffix _combined), according to a user-set hierarchy. var1_column is the primary variable and has the highest priority. Where var1_column has no value (i.e. it is blank or NA), the value from var2_column is used instead. If there is still no value, var3_column is used, then var4_column, then var5_column and finally var6_column. Note that var3_column to var6_column are optional, but var1_column and var2_column must be supplied. A comment recording the origin of each value can also be added to a comments column.
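The fill-in rule above behaves like a row-wise "coalesce" across the listed columns: each row takes its value from the highest-priority column that has one. The sketch below illustrates that rule in base R. It is not the package's implementation; the helpers is_missing() and priority_fill() are hypothetical, and treating both NA and empty strings as "no value" is an assumption based on the description of blanks above.

```r
# Hypothetical helpers illustrating the documented priority rule;
# they are not part of the package and not its actual implementation.
is_missing <- function(x) {
  # Treat NA, and blank strings in character columns, as "no value".
  if (is.character(x)) is.na(x) | trimws(x) == "" else is.na(x)
}

priority_fill <- function(df, cols) {
  value  <- rep(NA, nrow(df))
  origin <- rep(NA_character_, nrow(df))
  # Walk the columns from highest to lowest priority.
  for (col in cols) {
    x <- df[[col]]
    # Fill only rows that are still empty and have a usable value here.
    take <- is.na(origin) & !is_missing(x)
    value[take]  <- x[take]
    origin[take] <- col
  }
  list(value = value, origin = origin)
}

# FAT columns copied from the printed breakfast_df in the Examples section.
fat <- data.frame(
  FATg   = c(21, NA, NA, NA, 11, NA, 13, 16, NA, NA),
  FAT_g  = c(20.9, 12.0, NA, NA, 10.9, 33.0, 12.1, 16.1, NA, NA),
  FATCEg = c(NA, NA, NA, 16.0, NA, 33.0, NA, 15.9, NA, NA)
)
res <- priority_fill(fat, c("FATg", "FAT_g", "FATCEg"))
res$value   # 21 12 NA 16 11 33 13 16 NA NA
res$origin  # column each value came from (NA where none was found)
```

The loop visits columns in priority order and only writes to rows that are still empty, so a lower-priority column can never overwrite a value taken from a higher-priority one.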
Usage
nutri_combiner(
df,
var1_column,
var2_column,
var3_column,
var4_column,
var5_column,
var6_column,
new_var,
fill_missing = FALSE,
comment = TRUE,
comment_col = "comments"
)
Arguments
- df
Required - the data.frame the data is currently stored in.
- var1_column
Required - The column name of the primary variable to pull values from. This should be the variable you most want to use.
- var2_column
Required - The column name of the secondary variable to pull values from. This should be the variable you most want to use if you can't use var1_column.
- var3_column
Optional - The column name of the tertiary variable to pull values from. This should be the variable you most want to use if you can't use var1_column or var2_column.
- var4_column
Optional - The column name of the fourth most appropriate variable to pull values from. This should be the next most appropriate variable after the ones selected for var1_column, var2_column and var3_column.
- var5_column
Optional - The column name of the fifth most appropriate variable to pull values from, after the columns selected for var1_column to var4_column.
- var6_column
Optional - The column name of the sixth variable. This should be the least appropriate variable to use, as it will only be used if a value cannot be found using var1_column to var5_column.
- new_var
Required - The name of the new column that will be created by combining the variable columns. It is recommended to use the nutrient's INFOODS Tagname, followed by the units, e.g. Thiamine in milligrams would be 'THIAmg'. The suffix '_combined' is automatically appended to the name given (e.g. new_var = "FAT_g" creates the column FAT_g_combined).
- fill_missing
Optional - default: FALSE - TRUE or FALSE. If set to TRUE, nutri_combiner checks for missing columns (inputs that don't match a column name in df). Instead of throwing an error, as it normally would, it drops the invalid names and fills the hierarchy, in the original order, from the remaining valid columns.
- comment
Optional - default: TRUE - TRUE or FALSE. If comment is TRUE (the default), a comment describing the origin of each value in the new _combined column is added to comment_col.
- comment_col
Optional - default: 'comments' - The name of the column containing the metadata comments for each food item. Not required if comment = FALSE. If comment = TRUE and no column with this name is found in df, a new column with this name is created.
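The fill_missing behaviour can be pictured as a validation step that runs before any values are combined: invalid names are dropped, and the remaining columns keep their relative order, so lower-priority columns move up the hierarchy. A minimal base-R sketch, assuming a hypothetical helper shift_valid_columns(); its name and message wording are illustrative assumptions, not the package's code. It also shows how the output column name is formed from new_var.

```r
# Hypothetical helper sketching the documented fill_missing behaviour;
# the function name and message wording are assumptions, not package code.
shift_valid_columns <- function(df, cols, fill_missing = FALSE) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    if (!fill_missing) {
      stop("Columns not present in df: ", paste(missing_cols, collapse = ", "))
    }
    message("Dropping columns not present in df: ",
            paste(missing_cols, collapse = ", "))
    # Drop the invalid names; the rest keep their relative priority order.
    cols <- cols[cols %in% names(df)]
  }
  cols
}

toy_df <- data.frame(FATg = 21, FAT_g = 20.9, FATCEg = NA)
cols <- shift_valid_columns(
  toy_df,
  c("FATg", "FAT_g", "NonExistant_Fat_Value", "FATCEg"),
  fill_missing = TRUE
)
cols  # "FATg" "FAT_g" "FATCEg": FATCEg moves up to third place

# The output column name is new_var with "_combined" appended.
new_var <- "FAT_g"
paste0(new_var, "_combined")  # "FAT_g_combined"
```

With fill_missing = FALSE the same call stops with an error, matching the default behaviour described for that argument.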
Value
The original data.frame with a new column named new_var plus the suffix _combined. If comment = TRUE, a comment describing the origin of each value is also added to comment_col, which is created if it does not already exist.
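The comments shown in the Examples follow a simple pattern: an existing comment is kept and the new note is appended after "; ", while blank or NA comments are replaced by the note alone. The sketch below reproduces that pattern in base R. The helpers append_comment() and add_source_comments() are hypothetical, and the note wording is copied from the example output rather than taken from the package source.

```r
# Hypothetical helpers mirroring the comment format seen in the Examples:
# an existing comment is kept and the new note is appended after "; ".
append_comment <- function(existing, note) {
  empty <- is.na(existing) | trimws(existing) == ""
  ifelse(empty, note, paste0(existing, "; ", note))
}

add_source_comments <- function(df, origin, new_col, comment_col = "comments") {
  # Create the comment column if it is not already in df.
  if (!comment_col %in% names(df)) df[[comment_col]] <- NA_character_
  note <- ifelse(is.na(origin),
                 paste("No suitable value for", new_col, "found"),
                 paste(new_col, "equal to", origin))
  df[[comment_col]] <- append_comment(df[[comment_col]], note)
  df
}

toy <- data.frame(
  comments = c("", "These are imaginary food items", NA),
  stringsAsFactors = FALSE
)
out <- add_source_comments(toy, c("FATg", "FAT_g", NA), "FAT_g_combined")
out$comments
```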
Examples
# An example data.frame has been created to give an example of using the
# nutri_combiner to combine FAT values.
breakfast_df <- breakfast_df[, c("food_code", "food_name", "FATg", "FAT_g",
"FATCEg", "comments")]
breakfast_df
#> food_code food_name FATg FAT_g FATCEg comments
#> 1 F0001 Bacon 21 20.9 NA
#> 2 F0002 Beans NA 12.0 NA These are imaginary food items
#> 3 F0003 Toast NA NA NA <NA>
#> 4 F0004 Mushroom NA NA 16.0 With imaginary nutrient values
#> 5 F0005 Eggs 11 10.9 NA
#> 6 F0006 Tomato NA 33.0 33.0 And blanks
#> 7 F0007 Sausage 13 12.1 NA <NA>
#> 8 F0008 Butter 16 16.1 15.9 To test different outputs
#> 9 F0009 Brown Sauce NA NA NA
#> 10 F0010 Tomato Ketchup NA NA NA And scenarios
# We start with a data.frame containing multiple patchy values for fat. Ideally
# we would like to combine these into a single 'combined' column with as few gaps
# as possible. We would like to use FATg as the main value, and then fill in with
# FAT_g as a second choice, and then FATCEg as a last resort. We would like the
# new column to be called 'FAT_g_combined'.
#
# In this case, the following nutri_combiner input would be used:
Fat_combined_results <- nutri_combiner(
breakfast_df,
"FATg",
"FAT_g",
"FATCEg",
new_var = "FAT_g")
#> ---------------------------
#>
#> Breakdown of values used:
#>
#> FAT_g_combined equal to FATCEg
#> 1
#> FAT_g_combined equal to FAT_g
#> 2
#> FAT_g_combined equal to FATg
#> 4
#> No suitable value for FAT_g_combined found
#> 3
#>
#> ---------------------------
Fat_combined_results
#> food_code food_name FATg FAT_g FATCEg
#> 1 F0001 Bacon 21 20.9 NA
#> 2 F0002 Beans NA 12.0 NA
#> 3 F0003 Toast NA NA NA
#> 4 F0004 Mushroom NA NA 16.0
#> 5 F0005 Eggs 11 10.9 NA
#> 6 F0006 Tomato NA 33.0 33.0
#> 7 F0007 Sausage 13 12.1 NA
#> 8 F0008 Butter 16 16.1 15.9
#> 9 F0009 Brown Sauce NA NA NA
#> 10 F0010 Tomato Ketchup NA NA NA
#> comments
#> 1 FAT_g_combined equal to FATg
#> 2 These are imaginary food items; FAT_g_combined equal to FAT_g
#> 3 No suitable value for FAT_g_combined found
#> 4 With imaginary nutrient values; FAT_g_combined equal to FATCEg
#> 5 FAT_g_combined equal to FATg
#> 6 And blanks; FAT_g_combined equal to FAT_g
#> 7 FAT_g_combined equal to FATg
#> 8 To test different outputs; FAT_g_combined equal to FATg
#> 9 No suitable value for FAT_g_combined found
#> 10 And scenarios; No suitable value for FAT_g_combined found
#> FAT_g_combined
#> 1 21
#> 2 12
#> 3 NA
#> 4 16
#> 5 11
#> 6 33
#> 7 13
#> 8 16
#> 9 NA
#> 10 NA
# Note how the values are filled in according to the priority order, with
# a note added to the comments column showing the origin of each value.
# As an example of the fill_missing parameter, see what happens when the
# function is run with a column name that does not exist in the data.frame:
Fat_combined_results_2 <- nutri_combiner(
breakfast_df,
"FATg",
"FAT_g",
"NonExistant_Fat_Value",
"FATCEg",
new_var = "FAT_g",
fill_missing = TRUE)
#> Error - the following columns are not present in df. nutri-combiner will attempt to shift variables to fill the gap in the heirachy, if present.
#> NonExistant_Fat_Value
#> ---------------------------
#>
#> Breakdown of values used:
#>
#> FAT_g_combined equal to FATCEg
#> 1
#> FAT_g_combined equal to FAT_g
#> 2
#> FAT_g_combined equal to FATg
#> 4
#> No suitable value for FAT_g_combined found
#> 3
#>
#> ---------------------------
Fat_combined_results_2
#> food_code food_name FATg FAT_g FATCEg
#> 1 F0001 Bacon 21 20.9 NA
#> 2 F0002 Beans NA 12.0 NA
#> 3 F0003 Toast NA NA NA
#> 4 F0004 Mushroom NA NA 16.0
#> 5 F0005 Eggs 11 10.9 NA
#> 6 F0006 Tomato NA 33.0 33.0
#> 7 F0007 Sausage 13 12.1 NA
#> 8 F0008 Butter 16 16.1 15.9
#> 9 F0009 Brown Sauce NA NA NA
#> 10 F0010 Tomato Ketchup NA NA NA
#> comments
#> 1 FAT_g_combined equal to FATg
#> 2 These are imaginary food items; FAT_g_combined equal to FAT_g
#> 3 No suitable value for FAT_g_combined found
#> 4 With imaginary nutrient values; FAT_g_combined equal to FATCEg
#> 5 FAT_g_combined equal to FATg
#> 6 And blanks; FAT_g_combined equal to FAT_g
#> 7 FAT_g_combined equal to FATg
#> 8 To test different outputs; FAT_g_combined equal to FATg
#> 9 No suitable value for FAT_g_combined found
#> 10 And scenarios; No suitable value for FAT_g_combined found
#> FAT_g_combined
#> 1 21
#> 2 12
#> 3 NA
#> 4 16
#> 5 11
#> 6 33
#> 7 13
#> 8 16
#> 9 NA
#> 10 NA
# The invalid column is reported and dropped, FATCEg moves up to fill its place
# in the hierarchy, and the result matches the first example.
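The "Breakdown of values used" summary printed by nutri_combiner is a tally of which column each combined value came from. As a rough illustration, the counts from the example can be reproduced in base R with table(). The data are the FAT columns printed above; the tallying approach is an assumption about how such a summary could be computed, not the package's code, and it checks only for NA because these columns are numeric.

```r
# FAT columns copied from the printed breakfast_df above.
fat <- data.frame(
  FATg   = c(21, NA, NA, NA, 11, NA, 13, 16, NA, NA),
  FAT_g  = c(20.9, 12.0, NA, NA, 10.9, 33.0, 12.1, 16.1, NA, NA),
  FATCEg = c(NA, NA, NA, 16.0, NA, 33.0, NA, 15.9, NA, NA)
)
priority <- c("FATg", "FAT_g", "FATCEg")

# TRUE where a column holds a value, one row per food item.
available <- !is.na(as.matrix(fat[priority]))
# Position of the highest-priority available column per row (NA if none).
first_hit <- apply(available, 1, function(r) if (any(r)) which(r)[1] else NA_integer_)
origin <- ifelse(is.na(first_hit), "none", priority[first_hit])

# Tally the sources, keeping the priority order and any zero counts.
breakdown <- table(factor(origin, levels = c(priority, "none")))
breakdown  # FATg 4, FAT_g 2, FATCEg 1, none 3
```

Using factor() with explicit levels keeps every source in the table even if it supplied no values, which mirrors the fixed layout of the printed breakdown.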