create_object_list: creates a list of objects used in all butterfly functions
Source: R/create_object_list.R
create_object_list.Rd
Arguments
- df_current
data.frame, the newest/current version of dataset x.
- df_previous
data.frame, the previous version of the dataset, for example x - t1.
- datetime_variable
string, which variable to use as unique ID to join df_current and df_previous. Usually a "datetime" variable.
- ...
Other waldo::compare() arguments can be supplied here, such as tolerance or max_diffs. See ?waldo::compare() for a full list.
Value
A list containing three elements: a boolean (butterfly_status), where TRUE indicates no changes to previous data and FALSE indicates unexpected changes; a data frame of the current data without new rows (df_current_without_new_row); and a data frame of new rows only (df_current_new_rows).
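As a rough sketch of how the returned list might be used downstream, the snippet below builds a list by hand with the same three element names shown in the Examples output, so it runs without the package installed. The branching logic and the `rows_to_append` name are illustrative assumptions, not part of butterfly.

```r
# Hand-built stand-in with the same element names as the returned list
# (values copied from the printed output in the Examples section).
butterfly_object_list <- list(
  butterfly_status = TRUE,
  df_current_without_new_row = data.frame(
    time = as.Date(c("2024-01-01", "2023-12-01", "2023-11-01")),
    count = c(22, 55, 11)
  ),
  df_current_new_rows = data.frame(
    time = as.Date("2024-02-01"),
    count = 17
  )
)

if (isTRUE(butterfly_object_list$butterfly_status)) {
  # Previous data is unchanged, so only the new rows need handling.
  rows_to_append <- butterfly_object_list$df_current_new_rows
} else {
  # Hypothetical policy: halt when previously seen rows have changed.
  stop("Previous data has changed; inspect the comparison before appending.")
}
```

Checking `butterfly_status` with `isTRUE()` keeps the branch safe if the element is ever missing or `NA`.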
Details
This function matches two data frame objects by their unique identifier (usually "time" or "datetime" in a time series).
It informs the user of new (unmatched) rows which have appeared, and then returns a waldo::compare() call to give a detailed breakdown of changes.
The main assumption is that df_current and df_previous are newer and older versions of the same data, and that the datetime_variable variable name always remains the same. Elsewhere, new columns can appear, and these will be returned in the report.
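The matching step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's actual implementation: the helper name `split_new_rows()` is hypothetical, the toy data is reconstructed from the printed output in the Examples section, and `all.equal()` stands in for `waldo::compare()` so that the sketch has no dependencies.

```r
# Hypothetical helper, for illustration only: split df_current into rows whose
# ID already exists in df_previous and rows that are new.
split_new_rows <- function(df_current, df_previous, datetime_variable) {
  is_new <- !(df_current[[datetime_variable]] %in% df_previous[[datetime_variable]])
  list(
    df_current_without_new_row = df_current[!is_new, , drop = FALSE],
    df_current_new_rows = df_current[is_new, , drop = FALSE]
  )
}

# Toy data shaped like the butterflycount example
january <- data.frame(
  time = as.Date(c("2024-01-01", "2023-12-01", "2023-11-01")),
  count = c(22, 55, 11)
)
february <- rbind(
  data.frame(time = as.Date("2024-02-01"), count = 17),
  january
)

parts <- split_new_rows(february, january, "time")

# Stand-in for waldo::compare(): TRUE when the matched rows are unchanged.
matched <- parts$df_current_without_new_row
rownames(matched) <- NULL  # subsetting keeps the original row names; reset them
status <- isTRUE(all.equal(matched, january))
```

Only rows whose ID is present in both data frames are compared, which is why rows that are genuinely new do not count as changes.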
Examples
butterfly_object_list <- butterfly::create_object_list(
  butterflycount$february, # New or current dataset
  butterflycount$january, # Previous version you are comparing to
  datetime_variable = "time" # Unique ID variable they have in common
)
#> The following rows are new in 'butterflycount$february':
#> time count
#> 1 2024-02-01 17
#> ✔ And there are no differences with previous data.
butterfly_object_list
#> $butterfly_status
#> [1] TRUE
#>
#> $df_current_without_new_row
#> time count
#> 1 2024-01-01 22
#> 2 2023-12-01 55
#> 3 2023-11-01 11
#>
#> $df_current_new_rows
#> time count
#> 1 2024-02-01 17
#>
# You can pass other `waldo::compare()` options such as tolerance here
butterfly_object_list <- butterfly::create_object_list(
  butterflycount$march, # New or current dataset
  butterflycount$february, # Previous version you are comparing it to
  datetime_variable = "time", # Unique ID variable they have in common
  tolerance = 2
)
#> The following rows are new in 'butterflycount$march':
#> time count
#> 1 2024-03-01 23
#> ✔ And there are no differences with previous data.
butterfly_object_list
#> $butterfly_status
#> [1] TRUE
#>
#> $df_current_without_new_row
#> time count
#> 1 2024-02-01 17
#> 2 2024-01-01 22
#> 3 2023-12-01 55
#> 4 2023-11-01 18
#>
#> $df_current_new_rows
#> time count
#> 1 2024-03-01 23
#>