---
title: "Introduction to PowerSDI"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Introduction to PowerSDI}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
bibliography: bibliography.bib
---
# Introduction
{PowerSDI} calculates the Standardized Precipitation (SPI; @mckee1993) and Standardized Precipitation-Evapotranspiration (SPEI; @VicenteSerrano2010) indices using gridded-data from the NASA-POWER project.
The package includes functions (`ScientSDI()`, `Reference()`, `Accuracy()` and `PlotData()`), which evaluate how well the drought indices meet their conceptual assumptions.
The `OperatSDI()` function allows users to download NASA-POWER data for specific monitoring periods.
The package adopts a quasi-weekly basic time scale (`quart.month`), allowing for four index calculations per month: days 1 to 7, days 8 to 14, days 15 to 21, and days 22 to 28, 29, 30, or 31 depending on the month.
For instance, if the time scale is set to 4 (`TS = 4`), it corresponds to a moving window with a 1-month length that is calculated four times each month.
If `TS = 48`, the time scale corresponds to a moving window with a 12-month length that is calculated four times each month.
{PowerSDI} uses functions from the {nasapower} [@Sparks2018; @Sparks2023] and {Lmom} [@Hosking2023] packages.
# Basic Instructions: Getting Started
Load the library in your R session.
```r
library("PowerSDI")
```
## Using PlotData() to visually inspect NASA-POWER data
The `PlotData()` function allows a visual inspection of the indices' inputs (precipitation and potential evapotranspiration) obtained from the NASA-POWER project.
These inputs (in millimetres) are show at the 1-quart.month time scale.
## Example 1: Inspecting indices' input for Campinas-SP, Brazil
```r
PlotData(
lon = -47.3,
lat = -22.65,
start.date = "1991-01-01",
end.date = "2022-12-31"
)
#> Just a sec. Downloading NASA POWER data and calculating the other parameters.
```
## ScientSDI(): Replacing suspicious data and calculating the distributions' parameters
The first step of the SPI and SPEI algorithms is to calculate the cumulative probabilities of their input variables [@Guttman1999].
The `ScientSDI()` function estimates the parameters of the gamma, generalized extreme value (GEV), or generalized logistic distributions (GLO) through the L-moments method. This function also allows users to replace suspicious values from the data sample.
## Example 2: Applying the ScientSDI() function for Campinas-SP, Brazil
The plots generated by the `PlotData()` function for Campinas revealed one suspicious rain value larger than 250 mm.
Since this suspicious record represents less than 1% of the data sample, we replaced it with (`RainUplim = 250`) before calculating the parameters of the gamma and GEV distributions. The PE data showed no suspicious values.
```r
Campinas <- ScientSDI(
lon = -47.3,
lat = -22.65,
start.date = "1991-01-01",
end.date = "2022-12-31",
distr = "GEV",
TS = 4,
Good = "no",
RainUplim = 250,
RainLowlim = NULL,
PEUplim = NULL,
PELowlim = NULL
)
#> Removed row(s) above limit: 420 for rain
#> The calculations started on: 1991-01-01
```
Check the SDI values.
```r
head(Campinas$SDI)
#> Year Month quart.month Rain PE.Harg PE.PM PPE.Harg PPE.PM SPI
#> 1 1991 1 4 244.61 170.7373 146.4035 73.87270 98.20646 0.09109635
#> 2 1991 2 5 314.52 162.4491 135.7954 152.07087 178.72457 1.03838634
#> 3 1991 2 6 323.98 155.9019 131.5357 168.07813 192.44427 1.33262036
#> 4 1991 2 7 300.04 151.1668 128.4917 148.87323 171.54828 1.01676070
#> 5 1991 2 8 209.40 135.6084 115.1231 73.79158 94.27689 0.47371145
#> 6 1991 3 9 240.75 128.9954 109.1543 111.75463 131.59571 1.08669958
#> SPEI.Harg SPEI.PM Categ.SPI Categ.SPEI.Harg Categ.SPEI.PM
#> 1 -0.1936137 -0.1557081 Normal Normal Normal
#> 2 0.8203645 0.8938564 mod.wet Normal Normal
#> 3 1.3080700 1.4026983 mod.wet mod.wet mod.wet
#> 4 0.9191998 0.9590826 mod.wet Normal Normal
#> 5 0.4402641 0.5283875 Normal Normal Normal
#> 6 1.0332707 1.1195075 mod.wet mod.wet mod.wet
```
Check the DistPar
```r
head(Campinas$DistPar)
#> lon lat quart.month alfa.rain beta.rain probzero.rain loc.harg sc.harg
#> [1,] -47.3 -22.65 1 7.408528 29.87982 0 13.73670 74.65810
#> [2,] -47.3 -22.65 2 8.764616 27.03333 0 34.70564 77.84755
#> [3,] -47.3 -22.65 3 9.446275 24.96878 0 36.18941 71.86870
#> [4,] -47.3 -22.65 4 9.797904 25.09108 0 60.80832 88.85291
#> [5,] -47.3 -22.65 5 7.515294 30.45793 0 50.58665 98.79377
#> [6,] -47.3 -22.65 6 8.519209 25.71193 0 40.44269 87.16465
#> sh.harg loc.pm sc.pm sh.pm TS
#> [1,] 0.04643763 35.10085 77.23558 0.05326288 4
#> [2,] 0.10306408 54.98155 82.17268 0.12652166 4
#> [3,] 0.05582726 54.32006 75.73398 0.05512823 4
#> [4,] 0.35711219 80.30166 96.80879 0.39667736 4
#> [5,] 0.51834901 69.37979 106.06754 0.58681673 4
#> [6,] 0.42728104 55.63610 91.23542 0.44648066 4
```
## Accuracy(): Verifying how well NASA-POWER data actually represent real-world/observed data
A basic assumption related to remote sensing data is that they really represent the "real-world" conditions.
Thus, the `Accuracy()` function calculates the following measures of accuracy: the absolute mean error (AME), root-mean-square-error (RMSE), original (dorig), modified (dmod), refined (dref) Willmott's indices of agreement, and Pearson determination coefficient (R2).
## Example 3: Applying the Accuracy function to compare observed (obs) and NASA-POWER PE data in Campinas-SP, Brazil
In the example below, `data("ObsEst")` contains pairs of observed and estimated potential evapotranspiration (PE) data.
The observed and estimated data were obtained from a weather station in Campinas and from the NASA-POWER project, respectively.
PE was estimated using the Hargreaves & Samani method [@Hargreaves1985].
```r
data("ObsEst")
Compare_PE_Cps <- Accuracy(obs_est = ObsEst, conf.int = "No")
Compare_PE_Cps
#> AME RMSE dorig dmod dref RQuad
#> 2.470223 3.144231 0.9718557 0.8386579 0.838266 0.9197235
```
## OperatSDI(): Generating routine operational NASA-SPI and NASA-SPEI estimates
Considering that the parameters of the gamma (SPI) and GEV (SPEI) distributions have already been estimated by the `ScientSDI()` function, we can now use the `OperatSDI()` function to calculate these two indices on a routine basis.
## Example 4: Applying the OperatSDI function to calculate the SPI and SPEI in Campinas-SP, Brazil
The OperatSDI function enables users to download NASA-POWER data only for the period they intend to monitor.
```r
parms <- Campinas$DistPar # Obtained in Example 2
SDI <- OperatSDI(
lon = -47.3,
lat = -22.65,
start.date = "2023-08-01",
end.date = "2023-08-07",
parms = Campinas$DistPar,
TS = 4
)
#> Calculating...
#> Considering the selected `TS`,4, the calculations started on: 2023-06-22
SDI
#> Lon Lat Year Month quart.month Rain PE PPE SPI SPEI
#> 1 -47.3 -22.65 2023 7 27 15.67 83.24573 -67.57573 -0.17758163 -0.3648784
#> 2 -47.3 -22.65 2023 7 28 16.54 90.45608 -73.91608 -0.09934836 -0.2155000
#> 3 -47.3 -22.65 2023 8 29 16.50 98.82854 -82.32854 -0.23425126 -0.5376825
#> Categ.SPI Categ.SPEI
#> 1 Normal Normal
#> 2 Normal Normal
#> 3 Normal Normal
```
# Not So Basic Instructions:
## The ScientSDI() function: Removing suspicious data and assessing the indices' conceptual assumptions
Standardized indices, both SPI and SPEI are expected to provide spatially consistent interpretations of their values [@Guttman1999].
Therefore, their frequency distributions are expected to always approach the standard normal distribution regardless of the region, season, or time scale at which the indices are calculated [@Wu2006; @Stagge2015; @Blain2017].
In this context, the ScientSDI function calculates two normality-checking procedures described by @Wu2006 and @Stagge2015.
Additionally, the calculation algorithm of the SPI and SPEI relies on fitting a parametric distribution to their input data.
Therefore, the `ScientSDI()` function also applies the widely-used Lilliefors [@Lilliefors1967] and Anderson-Darling [@Anderson1954] goodness-of-fit tests to the gamma and GEV/GLO distributions.
The function `ScientSDI()` allows users to select significance levels ranging from 5 to 10% to run these tests.
Additionally, the SPEI algorithm often uses the generalized extreme value (GEV) or the generalized logistic (GLO) [@VicenteSerrano2010; Begueria2013; @Stagge2015; @Stagge2015a; @VicenteSerrano2015; @Blain2017].
A review of these studies suggests that the performance of these two distributions for calculating the SPEI tends to be similar to each other over most of the range of the index possible values (e.g. -2.0:2:0).
However, these studies also found significant differences between the two probability functions in the lower and upper tails [@VicenteSerrano2015].
In this context, the `ScientSDI()` function also allows the users to choose between these two models when calculating the SPEI.
## Example 5.1: Applying the ScientSDI() function with the GEV and verifying conceptual assumptions (Campinas-SP).
```r
Campinas.GEV <- ScientSDI(
lon = -47.3,
lat = -22.65,
start.date = "1991-01-01",
end.date = "2022-12-31",
distr = "GEV",
TS = 4,
Good = "yes",
sig.level = 0.95,
RainUplim = 250,
RainLowlim = NULL,
PEUplim = NULL,
PELowlim = NULL
)
#> Removed row(s) above limit: 420 for rain
#> The calculations started on: 1991-01-01
```
Check the goodness of fit.
```r
head(Campinas.GEV$GoodFit)
#> Lili.Rain Crit Lili.PPEHarg Crit Lili.PPEPM Crit AD.Rain
#> [1,] 0.11611448 0.1396603 0.09643114 0.1222336 0.10632197 0.1236653 0.6390502
#> [2,] 0.11600471 0.1428267 0.09792190 0.1253470 0.08309344 0.1267864 0.9182668
#> [3,] 0.13690470 0.1413462 0.13726836 0.1223576 0.13620839 0.1265062 0.8535305
#> [4,] 0.12338364 0.1381886 0.12333484 0.1208748 0.11852563 0.1225573 0.7133716
#> [5,] 0.05656565 0.1435980 0.06200571 0.1226508 0.05552784 0.1203243 0.1413624
#> [6,] 0.07813975 0.1403786 0.08737362 0.1228580 0.09892327 0.1225442 0.6182675
#> Crit AD.PPEHarg Crit AD.PPEPM Crit
#> [1,] 0.7332467 0.3436237 0.5310983 0.3620798 0.5272992
#> [2,] 0.7060077 0.3130080 0.5854438 0.2911684 0.6386024
#> [3,] 0.7261569 0.5908702 0.5463382 0.6880857 0.5640758
#> [4,] 0.7292376 0.6114124 0.5115284 0.6220234 0.5193580
#> [5,] 0.7369261 0.1526510 0.5074809 0.1144797 0.5235433
#> [6,] 0.7463400 0.2623157 0.5169554 0.3154090 0.5154911
```
Check the normality.
```r
head(Campinas.GEV$Normality)
#> SPI.Shap SPI.Shap.p SPI.AbsMed
#> [1,] "0.967900296734503" "0.463209059642815" "0.0997964641569041"
#> [2,] "0.96619340372341" "0.420858893940406" "0.0104781137427224"
#> [3,] "0.975582768775712" "0.682405906920205" "0.129821941048217"
#> [4,] "0.953414578207946" "0.180061525951205" "0.168690442662682"
#> [5,] "0.930008418348121" "0.0391836955173206" "0.291796230478462"
#> [6,] "0.940736018465501" "0.0785562689884901" "0.198234408496805"
#> SPEI.Harg.Shap SPEI.Harg.Shap.p SPEI.Harg.AbsMed
#> [1,] "0.968381434480606" "0.475655103439624" "0.146211508037505"
#> [2,] "0.965983791350213" "0.415857065519545" "0.0430601773978751"
#> [3,] "0.977751926965489" "0.747676355214095" "0.146518713888501"
#> [4,] "0.978980071731148" "0.76943094784405" "0.000945228380311781"
#> [5,] "0.966177284178725" "0.401102999039477" "0.116531689396188"
#> [6,] "0.948926963831835" "0.134350899702926" "0.0873165947811334"
#> SPEI.PM.Shap SPEI.PM.Shap.p SPEI.PM.AbsMed SPI.testI
#> [1,] "0.968798285250651" "0.486612320910051" "0.165875549466575" "Normal"
#> [2,] "0.975597247176717" "0.682843863812409" "0.109628810903115" "Normal"
#> [3,] "0.984353877822483" "0.918729743222738" "0.122398459123283" "Normal"
#> [4,] "0.975156717722638" "0.651747489859029" "0.0378055330733986" "Normal"
#> [5,] "0.967804350100931" "0.441114432321209" "0.106759663113951" "NoNormal"
#> [6,] "0.944436532965484" "0.100097027435938" "0.116250636028372" "NoNormal"
#> SPEI.Harg.testI SPEI.PM.testI SPI.testII SPEI.Harg.testII SPEI.PM.testII
#> [1,] "Normal" "Normal" "Normal" "Normal" "Normal"
#> [2,] "Normal" "Normal" "Normal" "Normal" "Normal"
#> [3,] "Normal" "Normal" "Normal" "Normal" "Normal"
#> [4,] "Normal" "Normal" "Normal" "Normal" "Normal"
#> [5,] "Normal" "Normal" "NoNormal" "Normal" "Normal"
#> [6,] "Normal" "Normal" "Normal" "Normal" "Normal"
```
## Example 5.2: Applying the ScientSDI() function with the GLO and verifying conceptual assumptions (Campinas-SP).
```r
Campinas.GLO <- ScientSDI(
lon = -47.3,
lat = -22.65,
start.date = "1991-01-01",
end.date = "2022-12-31",
distr = "GLO",
TS = 4,
Good = "yes",
sig.level = 0.95,
RainUplim = 250,
RainLowlim = NULL,
PEUplim = NULL,
PELowlim = NULL
)
#> Removed row(s) above limit: 420 for rain
#> The calculations started on: 1991-01-01
```
Check the goodness of fit.
```r
head(Campinas.GLO$GoodFit)
#> Lili.Rain Crit Lili.PPEHarg Crit Lili.PPEPM Crit AD.Rain
#> [1,] 0.11611448 0.1408241 0.07793223 0.1267743 0.08361417 0.1294131 0.6390502
#> [2,] 0.11600471 0.1438986 0.07754312 0.1248404 0.07863811 0.1290235 0.9182668
#> [3,] 0.13690470 0.1394402 0.11096348 0.1283290 0.10719062 0.1275217 0.8535305
#> [4,] 0.12338364 0.1390644 0.10560516 0.1236858 0.10132682 0.1254306 0.7133716
#> [5,] 0.05656565 0.1378510 0.05057530 0.1234080 0.04670884 0.1240447 0.1413624
#> [6,] 0.07813975 0.1377996 0.11117785 0.1267045 0.12181073 0.1251562 0.6182675
#> Crit AD.PPEHarg Crit AD.PPEPM Crit
#> [1,] 0.7021159 0.2188857 0.5549677 0.21990399 0.5609734
#> [2,] 0.7345871 0.2930398 0.5376209 0.27915703 0.5696080
#> [3,] 0.7112916 0.4200223 0.5585982 0.51359779 0.5502796
#> [4,] 0.7054471 0.5535407 0.5338313 0.57487863 0.5582377
#> [5,] 0.7249042 0.1123251 0.5474743 0.09516567 0.5540126
#> [6,] 0.7072738 0.4195567 0.5615291 0.50640990 0.5446173
```
Check the normality.
```r
head(Campinas.GLO$GoodFit)
#> Lili.Rain Crit Lili.PPEHarg Crit Lili.PPEPM Crit AD.Rain
#> [1,] 0.11611448 0.1408241 0.07793223 0.1267743 0.08361417 0.1294131 0.6390502
#> [2,] 0.11600471 0.1438986 0.07754312 0.1248404 0.07863811 0.1290235 0.9182668
#> [3,] 0.13690470 0.1394402 0.11096348 0.1283290 0.10719062 0.1275217 0.8535305
#> [4,] 0.12338364 0.1390644 0.10560516 0.1236858 0.10132682 0.1254306 0.7133716
#> [5,] 0.05656565 0.1378510 0.05057530 0.1234080 0.04670884 0.1240447 0.1413624
#> [6,] 0.07813975 0.1377996 0.11117785 0.1267045 0.12181073 0.1251562 0.6182675
#> Crit AD.PPEHarg Crit AD.PPEPM Crit
#> [1,] 0.7021159 0.2188857 0.5549677 0.21990399 0.5609734
#> [2,] 0.7345871 0.2930398 0.5376209 0.27915703 0.5696080
#> [3,] 0.7112916 0.4200223 0.5585982 0.51359779 0.5502796
#> [4,] 0.7054471 0.5535407 0.5338313 0.57487863 0.5582377
#> [5,] 0.7249042 0.1123251 0.5474743 0.09516567 0.5540126
#> [6,] 0.7072738 0.4195567 0.5615291 0.50640990 0.5446173
```
## The Accuracy() function: Applying the Accuracy function and calculating confidence intervals.
The `Accuracy()` function may also provide confidence intervals (CI) for AME, RMSE, dorig, dmod and dref.
As emphasized by @Willmott1985, the CI specifies a range of values within which these statistics are expected to vary by chance.
Consequently, users can interpret the magnitude of the CI as an indicator of the reliability of the estimated values for the comparison metrics @Willmott1985.
The function allows users to select between 90 and 95% CIs.
## Example 6: Applying the Accuracy function and calculating confidence intervals (Campinas-SP).
```r
data("ObsEst")
Compare_PE_Cps_withCI <-
Accuracy(obs_est = ObsEst,
conf.int = "yes",
sig.level = 0.95)
#> Just a sec
Compare_PE_Cps_withCI
#> AMECIinf AME AMECIsup RMSECIinf RMSE RMSECIsup dorig_CIinf dorig
#> 2.467782 2.470223 2.473673 3.140299 3.144231 3.148564 0.9717026 0.9718557
#> dorigCIsup dmodCIinf dmod dmodCIsup dref_CIinf dref dref_CIsup RQuad_CIinf
#> 0.9718942 0.8382366 0.8386579 0.8387874 0.8377858 0.838266 0.838368 0.9194093
#> RQuad RQuad_CIsup
#> 0.9197235 0.919914
```
# References