This function matches two data frames by their unique identifier (usually "time" or "datetime" in a time series) and returns a new data frame. Matched rows that contain changes from the previous data are dropped, and new rows (if present) are included.
Arguments
- df_current
data.frame, the newest/current version of dataset x.
- df_previous
data.frame, the old version of dataset x, for example x - t1.
- datetime_variable
string, which variable to use as unique ID to join df_current and df_previous. Usually a "datetime" variable.
- include_new
boolean, should new rows be included? Default is TRUE.
Value
A data frame which contains only the rows of df_current that have not changed from df_previous, plus any new rows when include_new is TRUE. Also returns a waldo object as in loupe().
Examples
# Dropping matched rows which contain changes, and returning unchanged rows
df_released <- butterfly::release(
  butterflycount$march, # This is your new or current dataset
  butterflycount$february, # This is the previous version you are comparing it to
  datetime_variable = "time", # This is the unique ID variable they have in common
  include_new = TRUE # Whether to include new rows or not, default is TRUE
)
#> The following rows are new in 'df_current':
#> time count
#> 1 2024-03-01 23
#>
#> ℹ The following values have changes from the previous data.
#> old vs new
#> count
#> old[1, ] 17
#> old[2, ] 22
#> old[3, ] 55
#> - old[4, ] 18
#> + new[4, ] 11
#>
#> `old$count`: 17 22 55 18
#> `new$count`: 17 22 55 11
#>
#> ℹ These will be dropped, but new rows are included.
df_released
#> time count
#> 1 2024-03-01 23
#> 2 2024-02-01 17
#> 3 2024-01-01 22
#> 4 2023-12-01 55
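
The matching behaviour described above can be illustrated with a minimal base-R sketch. This is not the butterfly implementation: `release_sketch()` is a hypothetical helper written for this page, the toy data frames are made up, and the sketch neither prints the comparison messages nor returns a waldo object. It only shows the row selection: match on the ID column, drop matched rows whose values differ, and optionally keep rows that are new.

```r
# Hypothetical helper for illustration only; not part of the butterfly package.
release_sketch <- function(df_current, df_previous, datetime_variable,
                           include_new = TRUE) {
  ids_current <- df_current[[datetime_variable]]
  ids_previous <- df_previous[[datetime_variable]]

  # Rows of df_current whose ID also appears in df_previous
  is_matched <- ids_current %in% ids_previous
  matched <- df_current[is_matched, , drop = FALSE]

  # Line up the previous rows with the matched current rows by ID
  prev_idx <- match(matched[[datetime_variable]], ids_previous)
  prev_aligned <- df_previous[prev_idx, , drop = FALSE]

  # A matched row counts as unchanged when every shared column is equal
  unchanged <- rep(TRUE, nrow(matched))
  for (col in intersect(names(df_current), names(df_previous))) {
    same <- matched[[col]] == prev_aligned[[col]]
    both_na <- is.na(matched[[col]]) & is.na(prev_aligned[[col]])
    same <- ifelse(is.na(same), both_na, same) # treat NA vs NA as equal
    unchanged <- unchanged & same
  }

  kept_rows <- matched[unchanged, , drop = FALSE]
  if (include_new) {
    # New rows are those with an ID that df_previous does not contain
    kept_rows <- rbind(df_current[!is_matched, , drop = FALSE], kept_rows)
  }
  kept_rows
}

# Toy data, made up for this sketch (not the butterflycount dataset)
previous <- data.frame(
  time = c("2024-02-01", "2024-01-01", "2023-12-01"),
  count = c(10, 20, 30),
  stringsAsFactors = FALSE
)
current <- data.frame(
  time = c("2024-03-01", "2024-02-01", "2024-01-01", "2023-12-01"),
  count = c(5, 10, 20, 99), # the 2023-12-01 value differs from `previous`
  stringsAsFactors = FALSE
)

kept <- release_sketch(current, previous, datetime_variable = "time")
no_new <- release_sketch(current, previous, "time", include_new = FALSE)
```

In this sketch `kept` holds the new 2024-03-01 row plus the two unchanged rows, while `no_new` holds only the two unchanged rows; the changed 2023-12-01 row is dropped in both cases.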