A loupe is a simple, small magnification device used to examine small details more closely.

Usage

loupe(df_current, df_previous, datetime_variable)

Arguments

df_current

data.frame, the newest/current version of dataset x.

df_previous

data.frame, the previous version of dataset x, for example x - t1.

datetime_variable

string, which variable to use as unique ID to join df_current and df_previous. Usually a "datetime" variable.

Value

A boolean where TRUE indicates no changes to previous data and FALSE indicates unexpected changes.

Details

This function is intended to aid in the verification of continually updating timeseries data, where we expect new values but want to ensure that previous values remain unchanged.

This function matches two dataframe objects by their unique identifier (usually "time" or "datetime" in a timeseries).

It informs the user of new (unmatched) rows which have appeared, and then uses waldo::compare() to give a detailed breakdown of changes.

The main assumption is that df_current and df_previous are newer and older versions of the same data, and that the datetime_variable variable name always remains the same. Elsewhere, new columns can appear, and these will be returned in the report.

The underlying functionality is handled by create_object_list().
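To make the matching step described above more concrete, here is a rough base R sketch of the idea. It is not the package's implementation: sketch_loupe() and the example data frames are hypothetical names invented for illustration, and all.equal() stands in for the waldo::compare() report, so it only gives a yes/no answer (and tolerates tiny numeric differences).

```r
# Hypothetical helper, NOT part of butterfly: a base R approximation of the check.
sketch_loupe <- function(df_current, df_previous, datetime_variable) {
  # Rows whose ID is absent from the previous data are treated as new
  is_new <- !(df_current[[datetime_variable]] %in% df_previous[[datetime_variable]])
  new_rows <- df_current[is_new, , drop = FALSE]
  matched <- df_current[!is_new, , drop = FALSE]

  # Sort both sides by the ID and keep only the columns they share
  shared <- intersect(names(matched), names(df_previous))
  matched <- matched[order(matched[[datetime_variable]]), shared, drop = FALSE]
  previous <- df_previous[order(df_previous[[datetime_variable]]), shared, drop = FALSE]
  rownames(matched) <- NULL
  rownames(previous) <- NULL

  if (nrow(new_rows) > 0) {
    message("New rows in 'df_current': ", nrow(new_rows))
  }

  # TRUE only if the matched rows equal the previous data
  # (all.equal() applies a small numeric tolerance; an assumption of this sketch)
  isTRUE(all.equal(matched, previous))
}

# Made-up example data, loosely modelled on a monthly count series
january <- data.frame(
  time = as.Date(c("2023-11-01", "2023-12-01", "2024-01-01")),
  count = c(22, 55, 18)
)
february <- rbind(
  data.frame(time = as.Date("2024-02-01"), count = 17),
  january
)

# Same as february, but with one previously published value altered
february_edited <- february
february_edited$count[february_edited$time == as.Date("2024-01-01")] <- 11

sketch_loupe(february, january, "time")
sketch_loupe(february_edited, january, "time")
```

The first call compares data where only a new row was added, while the second compares data where an earlier value was modified. Unlike loupe(), this sketch does not say which values changed; that detail is what the waldo::compare() report provides.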

Examples

# Checking two dataframes for changes, and returning TRUE (no changes) or FALSE (changes)
# This example contains no differences with previous data
butterfly::loupe(
  butterflycount$february, # This is your new or current dataset
  butterflycount$january, # This is the previous version you are comparing it to
  datetime_variable = "time" # This is the unique ID variable they have in common
)
#> The following rows are new in 'df_current': 
#>         time count
#> 1 2024-02-01    17
#>  And there are no differences with previous data.
#> [1] TRUE

# This example does contain differences with previous data
butterfly::loupe(
  butterflycount$march,
  butterflycount$february,
  datetime_variable = "time"
)
#> The following rows are new in 'df_current': 
#>         time count
#> 1 2024-03-01    23
#> 
#>  The following values have changes from the previous data.
#> old vs new
#>            count
#>   old[1, ]    17
#>   old[2, ]    22
#>   old[3, ]    55
#> - old[4, ]    18
#> + new[4, ]    11
#> 
#> `old$count`: 17 22 55 18
#> `new$count`: 17 22 55 11
#> [1] FALSE