Title: | A Value Replacement Utility |
---|---|
Description: | Updates values within csv format data files using a custom, User-built csv format lookup file. Based on 'data.table' package. |
Authors: | Bandur Dragos [aut, cre] |
Maintainer: | Bandur Dragos <[email protected]> |
License: | GPL-3 |
Version: | 1.0.2 |
Built: | 2025-02-13 05:13:36 UTC |
Source: | https://github.com/cran/replacer |
User-facing function that processes a list of data file/lookup file pairs. Within each pair, the data file is listed first, followed by its associated lookup file.
bReplace(dir, x, save = TRUE, msgs = FALSE)
dir |
Quoted character string of length 1 giving the path to the directory containing the data and associated lookup files, written with either forward slashes or doubled backslashes and no trailing slash (e.g. "C:/path/to/directory"). |
x |
List of character vectors each of length 2 containing full names of the data file and the associated lookup file, as described in replaceVals. |
save |
Logical, default TRUE: save results to directory; FALSE: display only. |
msgs |
Logical, default FALSE: suppress messages. TRUE: print a named list containing messages specific to each run. |
A named list containing the updated data and the count tables of multiple replacements. When save = TRUE, the updated csv files are also saved to dir.
In the examples, leave the save argument set to FALSE. Otherwise, copy the entire content of the "extdata" folder found in the installed package root into a directory on your machine, and use the absolute path to that directory as the dir argument.
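Before launching a long batch, it can help to confirm that x has the shape described in the arguments above. The following is a minimal base R sketch and is not part of the package: check_pairs is a hypothetical helper that verifies every element is a character vector of length 2 and reports whether the named files are present in dir.

```r
# Hypothetical helper (not exported by the package): checks the shape of x
# and reports whether each listed file exists in dir.
check_pairs <- function(dir, x) {
  stopifnot(is.character(dir), length(dir) == 1L, is.list(x))
  ok_shape <- vapply(x, function(p) is.character(p) && length(p) == 2L,
                     logical(1))
  if (!all(ok_shape)) {
    stop("every element of x must be a character vector of length 2")
  }
  data_f   <- vapply(x, `[`, character(1), 1L)  # first name: data file
  lookup_f <- vapply(x, `[`, character(1), 2L)  # second name: lookup file
  data.frame(
    data         = data_f,
    lookup       = lookup_f,
    data_found   = file.exists(file.path(dir, data_f)),
    lookup_found = file.exists(file.path(dir, lookup_f)),
    stringsAsFactors = FALSE
  )
}

# Usage on a throwaway directory holding two empty files
d <- tempfile("pairs")
dir.create(d)
invisible(file.create(file.path(d, c("data.csv", "lookup.csv"))))
check_pairs(d, list(c("data.csv", "lookup.csv"),
                    c("data_id.csv", "lookup.csv")))
```

A pair flagged FALSE in data_found or lookup_found points to a misspelled file name or a wrong dir.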
if (interactive()) {
  # A list of data/lookup names:
  fs = list(c('data.csv', 'lookup.csv'),
            c("data_unique.csv", "lookup_unique.csv"),
            c('data_id.csv', 'lookupNA.csv'),
            c('data_id.csv', 'lookupDUP.csv'),
            c('chile.csv', 'chile_nadup.csv'),
            c('data_id.csv', 'lookup_id.csv'),
            c('data_id.csv', 'lookup_idsimple.csv'),
            c('chile.csv', 'chile_id.csv'))
  ## Not run:
  dir = system.file("extdata", package = "replacer")
  bReplace(dir, fs, save = FALSE, msgs = TRUE)
}
This helper prevents the error raised by fcoalesce when attempting to coalesce two vectors of different data types (double/integer).
con2fcoales(u, z)
u, z |
Vectors of equal length and of different data types (e.g. double and integer). Missing values are accepted. |
A vector of type double with the same length as the arguments.
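The idea behind this helper can be sketched as follows. This is an illustration only, assuming the data.table package is installed; it is not the package's actual implementation, and coalesce_as_double is a hypothetical name. Both vectors are coerced to double before being passed to data.table::fcoalesce, so the two inputs always share one type.

```r
library(data.table)

# Hypothetical stand-in: coerce both inputs to double, then coalesce.
coalesce_as_double <- function(u, z) {
  stopifnot(length(u) == length(z))
  fcoalesce(as.double(u), as.double(z))
}

u <- c(1.5, NA, 3)     # double, with a missing value
z <- c(10L, 20L, NA)   # integer, with a missing value
res <- coalesce_as_double(u, z)
res          # first non-missing value per position
typeof(res)  # "double"
```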
The function sends the prepared data.tables to sReplace, receives updated data, displays a list of updated data and of counts of multiple replacements and saves updated data to disk (see Details).
replaceVals(dir, ..., save = TRUE)
dir |
Quoted character string of length 1 giving the path to the directory containing the data and associated lookup files, written with either forward slashes or doubled backslashes and no trailing slash (e.g. "C:/path/to/directory"). |
... |
Not used when the file names are "data.csv" and "lookup.csv". Otherwise, the custom names including the file extension, within quotation marks, such as "<data_name>.csv", "<lookup_name>.csv", entered in this order. |
save |
Logical, default TRUE: save results to dir. FALSE: display only. See Note below. |
The workflow:
The function reads the data/lookup pair, converts each file to the "data.table" class, performs conformance checks on the associated lookup, and removes uninvolved data columns and non-standard lookup columns. Upon return from sReplace, it restructures the updated result into the original format, saves the updated data to dir, and displays a one-run named list containing the updated data along with counts of the requested replacements of duplicated and/or missing values.
The function displays messages and comments regarding the internal workflow. Reading these messages/comments is the recommended first troubleshooting step, since they are specific to each file pair and request type. To suppress messages, wrap the function call in suppressMessages. The vignette contains definitions of terms.
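When the messages need to be kept for later inspection instead of being read on the console, they can be collected with base R condition handling. The sketch below is a generic pattern, not a feature of the package: capture_messages is a hypothetical wrapper, and the braces at the end stand in for a real call such as replaceVals(dir, save = FALSE).

```r
# Hypothetical wrapper: evaluates an expression, collects the text of every
# message it emits, and returns both the value and the messages.
capture_messages <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    message = function(m) {
      msgs <<- c(msgs, conditionMessage(m))  # store the message text
      invokeRestart("muffleMessage")         # keep it off the console
    }
  )
  list(value = value, messages = msgs)
}

# Stand-in for a call such as replaceVals(dir, save = FALSE)
out <- capture_messages({
  message("checking lookup")
  message("2 replacements requested")
  42
})
out$messages
```

Unlike suppressMessages, this keeps the text of each message, so it can be searched or logged per file pair.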
A named list containing updated data and multiple replacement counts. Also, a csv file saved in the same directory, under the name updated_<data_name>using<lookup_name>.csv.
In the examples, leave the save argument set to FALSE. Otherwise, copy the entire content of the "extdata" folder found in the installed package root into a directory on your machine, and use the absolute path to that directory as the dir argument.
## Not run: datasets with default names "data.csv", "lookup.csv" located in *dir*
if (interactive()) {
  dir = system.file("extdata", package = "replacer")
  replaceVals(dir, save = FALSE)
  ## no messages (not recommended!)
  suppressMessages(replaceVals(dir, save = FALSE))
}
The function is not intended for direct use. Once called by replaceVals, it first checks whether the lookup contains an index. Depending on the result of this check, the function follows one branch of a decision tree (see Details).
sReplace(x, y0, uv)
x, y0 |
Data.tables. |
uv |
Character vector or list of same length as x, containing unique names of involved columns in data. |
The function starts by checking the presence of a User-made index in lookup.
The function calls the helper whichDups to find the duplicated values in data. It also looks for missing values set for multiple replacements and for possible splits on missing data. In the case of mixed simple/multiple requests, the function splits lookup into at most 3 subsets: one for simple replacements, for which it creates an internal index; one for multiple replacements of duplicated values, for which it also creates an internal index; and one for multiple replacements of missing values, for which an internal index is not necessary.
The internal index contains the row numbers corresponding to all the elements of the distinct subsets of duplicated values found within each involved data column. The function loops data.table::set() over this index to perform the replacements on these columns.
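The looping of data.table::set() over row numbers can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's code: the data, the requests list and its fields col, old and new are hypothetical, and the row index is computed on the fly with which().

```r
library(data.table)

dt <- data.table(
  city = c("Lima", "Quito", "Lima", "Lima"),
  code = c("A", "B", "B", "C")
)

# Hypothetical requests: column name, duplicated value to replace, new value.
requests <- list(
  list(col = "city", old = "Lima", new = "LIM"),
  list(col = "code", old = "B",    new = "Z")
)

n_replaced <- integer(0)
for (r in requests) {
  # row numbers of every occurrence of the duplicated value
  idx <- which(dt[[r$col]] == r$old)
  # replace by reference, touching only those rows of that column
  set(dt, i = idx, j = r$col, value = r$new)
  n_replaced[paste(r$col, r$old, sep = ":")] <- length(idx)
}

dt
n_replaced  # count of replacements per request
```

Because set() modifies the table by reference, no copy of the data is made inside the loop.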
As mentioned above, no index is created for multiple replacements of missing values, as there is only one generic value per data column. The missing values data subset is then reshaped, and the columns are coalesced (see the data.table manual) with the corresponding data columns, for each generic value entered in lookup.
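Filling every missing value of a column with one generic value can be sketched with data.table::fcoalesce. The example is a simplified illustration that skips the reshaping step; the data and the generic list are hypothetical, and each generic value is assumed to have the same type as its column.

```r
library(data.table)

dt <- data.table(
  region = c("North", NA, "South", NA),
  pop    = c(10, NA, 30, 40)
)

# Hypothetical generic values, one per involved column (types match the columns).
generic <- list(region = "Unknown", pop = 0)

# Count the missing values before they are filled.
n_missing <- vapply(names(generic),
                    function(col) sum(is.na(dt[[col]])),
                    integer(1))

for (col in names(generic)) {
  # fcoalesce keeps existing values and fills NA with the generic value
  set(dt, j = col, value = fcoalesce(dt[[col]], generic[[col]]))
}

dt
n_missing
```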
As stated above, simple replacements of unique values without a User-made index are possible. Once the internal index is created, the subset is reshaped and joined with the data on the index, and the corresponding columns are coalesced.
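A join on a row index followed by a coalesce can be sketched as below. The shape of the internal index and of the reshaped lookup is an assumption made for illustration: idx and new_score are hypothetical column names, and the lookup is shown already reshaped to one row per indexed data row.

```r
library(data.table)

dt <- data.table(score = c(10, 20, 30, 40))
dt[, idx := .I]  # row-number index (assumed form of the internal index)

# Hypothetical reshaped lookup: one row per indexed data row.
lk <- data.table(idx = c(2L, 4L), new_score = c(25, NA))

# Update join on idx: take the lookup value where present,
# otherwise keep the existing data value.
dt[lk, on = "idx", score := fcoalesce(i.new_score, score)]

dt[, idx := NULL]  # drop the helper index again
dt
```

Rows absent from the lookup are untouched, and a missing replacement value leaves the original value in place.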
The function subsets the lookup using the special index values 0 and/or NA (or empty). At most 3 subsets of lookup are formed, as above. The replacement process is similar to the process used for an absent index, with the difference that simple replacements already have a User-made index.
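Splitting a lookup on the special index values can be sketched as follows. The column name index, the other column and the values are hypothetical; which request type each subset carries is defined by the package and its vignette and is not assumed here.

```r
library(data.table)

# Hypothetical lookup; the column names and values are assumptions.
lk <- data.table(
  index = c(3L, 0L, NA, 7L, 0L),
  city  = c("LIM", "X", "Unknown", "UIO", "Y")
)

# At most three subsets: ordinary index values, index 0, index NA.
by_index <- list(
  index_rows = lk[!is.na(index) & index != 0L],
  index_zero = lk[!is.na(index) & index == 0L],
  index_na   = lk[is.na(index)]
)

vapply(by_index, nrow, integer(1))  # rows per subset
```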
Following the decision tree described above, the function calls the utility's helpers and functions imported from the data.table package to process all lookup requests in a single run.
A named list containing updated involved columns in x, count of multiple replacements of duplicated values (if requested), count of multiple replacements of missing values (if requested).
The function finds duplicated values in each column of the data file. Although not intended for direct use, it can be applied to a data file once converted into "data.table" class.
whichDups(x)
x |
A data.table. |
A named character vector. Data columns containing distinct sets of duplicated values have their names indexed.
if (interactive()) {
  dir = system.file('extdata', package = 'replacer')
  setwd(dir)
  x = data.table::fread('data.csv', na.strings = c(NA_character_, ''))
  whichDups(x)
}
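For readers who want to see what finding duplicated values per column involves, the base R sketch below lists, for each column, the distinct values that occur more than once. It is not whichDups and does not reproduce its named character vector output; dup_values is a hypothetical helper, and ignoring missing values is an assumption of this sketch.

```r
# Hypothetical helper: distinct duplicated values in each column.
# Works on a data.frame or a data.table, since both are lists of columns.
dup_values <- function(x) {
  lapply(x, function(col) {
    col <- col[!is.na(col)]       # missing values are ignored here
    unique(col[duplicated(col)])  # values seen at least twice
  })
}

x <- data.frame(
  a = c("p", "q", "p", NA, NA),
  b = c(1, 2, 3, 4, 5),
  c = c("u", "u", "v", "v", "u"),
  stringsAsFactors = FALSE
)

dups <- dup_values(x)
dups
```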