This function creates a list of objects that is used by loupe(), catch() and release().

Usage

create_object_list(df_current, df_previous, datetime_variable)

Arguments

df_current

data.frame, the newest/current version of dataset x.

df_previous

data.frame, the older version of the dataset, for example x - t1.

datetime_variable

string, the name of the variable to use as the unique ID when joining df_current and df_previous. Usually a "datetime" variable.

Value

A list containing: a boolean, where TRUE indicates no changes to previous data and FALSE indicates unexpected changes; a dataframe of the current data without new rows; and a dataframe of the new rows only.

Details

This function matches two dataframe objects by their unique identifier (usually "time" or "datetime" in a timeseries).

It informs the user of new (unmatched) rows which have appeared, and then returns a waldo::compare() call to give a detailed breakdown of changes.

The main assumption is that df_current and df_previous are newer and older versions of the same data, and that the datetime_variable variable name always remains the same. Elsewhere, new columns can appear, and these will be returned in the report.
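
The matching step described above can be illustrated with a minimal base R sketch. This is not the package's implementation; the toy data frames and the names is_matched, matched_rows, new_rows, previous_rows and unchanged are hypothetical, and all.equal() stands in for the more detailed waldo::compare() report.

```r
# Hypothetical toy data; not the package's bundled butterflycount dataset
df_previous <- data.frame(
  time = as.Date(c("2024-01-01", "2023-12-01")),
  count = c(22, 55)
)
df_current <- data.frame(
  time = as.Date(c("2024-02-01", "2024-01-01", "2023-12-01")),
  count = c(17, 22, 55)
)
datetime_variable <- "time"

# Split df_current into rows whose ID also exists in df_previous and rows that are new
is_matched <- df_current[[datetime_variable]] %in% df_previous[[datetime_variable]]
matched_rows <- df_current[is_matched, , drop = FALSE]
new_rows <- df_current[!is_matched, , drop = FALSE]

# Sort both sides by the ID and reset row names so that only values are compared
matched_rows <- matched_rows[order(matched_rows[[datetime_variable]]), , drop = FALSE]
previous_rows <- df_previous[order(df_previous[[datetime_variable]]), , drop = FALSE]
rownames(matched_rows) <- NULL
rownames(previous_rows) <- NULL

# TRUE when the previously seen rows are unchanged in the current data
unchanged <- isTRUE(all.equal(matched_rows, previous_rows))
```

Sorting by the ID before comparing means a reordered but otherwise identical dataset is not flagged as changed; whether the package treats row order this way is an assumption of this sketch.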

Examples

butterfly_object_list <- butterfly::create_object_list(
  butterflycount$february, # This is your new or current dataset
  butterflycount$january, # This is the previous version you are comparing it to
  datetime_variable = "time" # This is the unique ID variable they have in common
)
#> The following rows are new in 'butterflycount$february': 
#>         time count
#> 1 2024-02-01    17
#>  And there are no differences with previous data.

butterfly_object_list
#> $butterfly_status
#> [1] TRUE
#> 
#> $df_current_without_new_row
#>         time count
#> 1 2024-01-01    22
#> 2 2023-12-01    55
#> 3 2023-11-01    11
#> 
#> $df_current_new_rows
#>         time count
#> 1 2024-02-01    17
#>
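
The returned list could then be used to branch on whether previous data has changed. The sketch below builds a stand-in list by hand that mirrors the element names printed above, so it runs without the package installed; the time column is stored as character here for simplicity, and n_new and status_message are hypothetical names.

```r
# Hand-built stand-in mirroring the structure printed above; in practice this
# list would come from butterfly::create_object_list()
butterfly_object_list <- list(
  butterfly_status = TRUE,
  df_current_without_new_row = data.frame(
    time = c("2024-01-01", "2023-12-01", "2023-11-01"),
    count = c(22, 55, 11)
  ),
  df_current_new_rows = data.frame(time = "2024-02-01", count = 17)
)

# Branch on the status flag: TRUE means previous data is unchanged
if (isTRUE(butterfly_object_list$butterfly_status)) {
  n_new <- nrow(butterfly_object_list$df_current_new_rows)
  status_message <- paste("Previous data unchanged;", n_new, "new row(s) found.")
} else {
  status_message <- "Previous data has changed; inspect the differences before continuing."
}
message(status_message)
```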