A tool to test the integrity of a decimal system in a dataframe
Source: R/DecimalSystemCheck.R
Decimal_System_Checker.Rd
This function takes a dataframe and the names of the two to four columns that make up the decimal system within that dataframe. For each row, it checks every decimal ID against the other decimal IDs in that row and flags any inconsistencies. Inconsistencies are reported either as console messages or in an error report dataframe.
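The kind of row-wise check described above can be sketched in base R. This is only an illustration under the assumption that a lower-level code is consistent when its leading dot-separated segments equal those of the higher-level code; `segments_match` is a hypothetical helper, not the package's implementation.

```r
# Hypothetical helper (not part of the package): compare the dot-separated
# segments of a child code with those of its parent, position by position.
segments_match <- function(parent, child) {
  p  <- strsplit(parent, ".", fixed = TRUE)[[1]]
  ch <- strsplit(child, ".", fixed = TRUE)[[1]]
  # The child must be longer than the parent and repeat all of its segments
  length(ch) > length(p) && all(ch[seq_along(p)] == p)
}

ok_pair      <- segments_match("01", "01.02")             # TRUE
bad_first    <- segments_match("01", "02.01")             # FALSE: first segment differs
bad_second   <- segments_match("02.02", "02.01.3443.02")  # FALSE: second segment differs
```

The two failing pairs correspond to the sort of mismatch reported in the example output further down the page.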
Arguments
- df
Required - The data frame containing the decimal system.
- first
Required - The first column of the decimal system - the most basic item ID; e.g. 01.
- second
Required - The second column of the decimal system - the first subdivision from the base ID; e.g. 01.005.
- third
Optional - The third column of the decimal system - the second subdivision from the base ID and the first from the second ID; e.g. 01.005.03.
- fourth
Optional - The fourth column of the decimal system - the third subdivision from the base ID, the second subdivision from the second ID, and the first subdivision from the third ID; e.g. 01.005.03.01.
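To make the relationship between the four columns concrete, the sketch below splits a full code into its segments and rebuilds the ID expected at each level. `levels_of` is a hypothetical helper introduced only for illustration; it is not exported by the package.

```r
# Hypothetical helper: rebuild the ID at every level from a full decimal code.
levels_of <- function(code) {
  parts <- strsplit(code, ".", fixed = TRUE)[[1]]
  vapply(
    seq_along(parts),
    function(i) paste(parts[seq_len(i)], collapse = "."),
    character(1)
  )
}

ids <- levels_of("01.005.03.01")
ids
# [1] "01"           "01.005"       "01.005.03"    "01.005.03.01"
```

In a consistent row, the values passed as first, second, third and fourth would follow this pattern, each one extending the previous by a single segment.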
Examples
#Two examples will be covered - one that results in the output error table,
#another that produces the output messages only (not recommended for large
#dataframes).
#First, we must create a test dataframe:
test_df <- data.frame(
  c("Merlot", "pinot grigio", "Chateauneuf-du-Pape", "Tokaji",
    "Champagne", "Sauvignon Blanc", "Chardonnay", "Malbec"),
  c("01", "01", "01", "01", "02", "02", "02", "02"),
  c("02.01", "01.01", "01.02", "01.02", "02.01", "02.01", "02.02", "02.02"),
  c("02.01.0111", "01.01.0131", "01.02.0001", "01.02.2031",
    "02.01.1001", "02.01.1001", "02.02.3443", "02.03.4341"),
  c("02.01.0111.01", "01.01.0131.04", "01.02.0001.01", "01.02.2031.03",
    "02.01.1001.06", "02.01.1001.06", "02.01.3443.02", "02.02.4341.03")
)
#Then we should rename the columns of the dataframe:
colnames(test_df) <- c("Wine names", "ID1", "ID2", "ID3", "ID4")
#This first call runs the checker on the dataframe and assigns the result to
#an output variable. As well as printing a message to the console whenever an
#error is found, it saves all the error reports to a dataframe.
output_test <- Decimal_System_Checker(test_df, first = "ID1", second =
"ID2", third = "ID3", fourth = "ID4")
#> Tertiary decimal level used
#>
#> Quaternary decimal level used
#>
#> duplicate codes found in fourth level: 02.01.1001.06
#> [1] "02.01.0111.01"
#> The first part of the secondary code (02) does not match the primary code (01); 02.01 vs. 01.
#> [1] "02.01.0111.01"
#> The first part of the tertiary code (02) does not match the primary code (01); 02.01.0111 vs. 01.
#> [1] "02.01.0111.01"
#> The first part of the quaternary code (02) does not match the primary code (01); 02.01.0111.01 vs. 01.
#> [1] "02.01.3443.02"
#> The second part of the quaternary code (01) does not match the secondary code (02); 02.01.3443.02 vs. 02.02.
#> [1] "02.02.4341.03"
#> The second part of the tertiary code (03) does not match the secondary code (02); 02.03.4341 vs. 02.02.
#However, if we only want to get the readouts and not have an error
#dataframe to refer back to, then the code can be run like so:
Decimal_System_Checker(test_df, first = "ID1", second = "ID2", third =
"ID3", fourth = "ID4")
#> Tertiary decimal level used
#>
#> Quaternary decimal level used
#>
#> duplicate codes found in fourth level: 02.01.1001.06
#> [1] "02.01.0111.01"
#> The first part of the secondary code (02) does not match the primary code (01); 02.01 vs. 01.
#> [1] "02.01.0111.01"
#> The first part of the tertiary code (02) does not match the primary code (01); 02.01.0111 vs. 01.
#> [1] "02.01.0111.01"
#> The first part of the quaternary code (02) does not match the primary code (01); 02.01.0111.01 vs. 01.
#> [1] "02.01.3443.02"
#> The second part of the quaternary code (01) does not match the secondary code (02); 02.01.3443.02 vs. 02.02.
#> [1] "02.02.4341.03"
#> The second part of the tertiary code (03) does not match the secondary code (02); 02.03.4341 vs. 02.02.
#> primary code secondary code tertiary code quaternary code
#> 1 01 02.01 02.01.0111 02.01.0111.01
#> 5 02 02.01 02.01.1001 02.01.1001.06
#> 6 02 02.01 02.01.1001 02.01.1001.06
#> 7 02 02.02 02.02.3443 02.01.3443.02
#> 8 02 02.02 02.03.4341 02.02.4341.03
#> error
#> 1 The first part of the secondary code (02) does not match the primary code (01); 02.01 vs. 01. - The first part of the tertiary code (02) does not match the primary code (01); 02.01.0111 vs. 01. - The first part of the quaternary code (02) does not match the primary code (01); 02.01.0111.01 vs. 01.
#> 5 duplicate codes found in fourth level: 02.01.1001.06
#> 6 duplicate codes found in fourth level: 02.01.1001.06
#> 7 The second part of the quaternary code (01) does not match the secondary code (02); 02.01.3443.02 vs. 02.02.
#> 8 The second part of the tertiary code (03) does not match the secondary code (02); 02.03.4341 vs. 02.02.
#This produces the same error printouts as the previous run, but the error
#report dataframe is only printed to the console, not saved to a variable.
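The duplicate message in the output above can be reproduced independently with base R. The sketch below is not the package's code; it uses `duplicated()` on the fourth-level codes from the test dataframe and assumes that every copy of a repeated code should be flagged.

```r
# Fourth-level codes copied from the test dataframe above
fourth <- c("02.01.0111.01", "01.01.0131.04", "01.02.0001.01",
            "01.02.2031.03", "02.01.1001.06", "02.01.1001.06",
            "02.01.3443.02", "02.02.4341.03")

# duplicated() marks only the later repeats, so scan from both ends
# to flag every row that shares a code with another row.
is_dup <- duplicated(fourth) | duplicated(fourth, fromLast = TRUE)

dup_rows  <- which(is_dup)          # 5 6
dup_codes <- unique(fourth[is_dup]) # "02.01.1001.06"
```

Rows 5 and 6 are the same rows that carry the duplicate message in the error report.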