There is no noise applied to those covariates `a` and `b`, moderate noise on the raw data `z`, so the two additional effects should be recover-able by a statistical model.
@@ -465,8 +463,8 @@ It might make sense to incorporate categorical variables into the binning.
 With binning comes the immediate question, also part of step 3:
 **what do we actually quantify as "difference"?**

-Conventionally, *variance* (VAR) is the mean squared difference of observed values (or a subset of observations, e.g. in a group or bin) from their mean.
-It is implemented in R with the `var` function (note that R implements the "sample variance", i.e. the formula normalizing by `n - 1` for [Bessel's correction](https://en.wikipedia.org/wiki/Bessel%27s_correction)).
+Conventionally, *variance* (VAR) is the mean squared difference of observed values (or a subset of observations, e.g. in a group or bin) from their mean.
+It is implemented in R with the `var` function (note that R implements the "sample variance", i.e. the formula normalizing by `n - 1` for [Bessel's correction](https://en.wikipedia.org/wiki/Bessel%27s_correction)).
 However, applying this formula for variograms is ~~wrong~~ unconventional!

 In *variograms*, the mean is replaced by a given point on the landscape (we want to look at differences from that focus point), and then we iterate over adjacent points.
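To make the `n - 1` normalization mentioned above concrete, here is a minimal sketch on made-up numbers. The vector `z` and all variable names are hypothetical and not taken from the tutorial; the last line only hints at the variogram idea of measuring squared differences from a focus point instead of from the mean.

```r
# hypothetical toy observations (not the tutorial's data)
z <- c(2.1, 3.4, 1.9, 4.2, 2.8)
n <- length(z)

# "population" variance: mean squared difference from the mean (divides by n)
var_population <- mean((z - mean(z))^2)

# sample variance as returned by var(): divides by n - 1 (Bessel's correction)
var_sample <- var(z)

# the two differ exactly by the factor n / (n - 1)
all.equal(var_sample, var_population * n / (n - 1))

# variogram flavour: squared differences from a focus point (here the first value)
sq_diff_focus <- (z[-1] - z[1])^2
```

For large `n` the two variances converge, so the choice matters mostly for small groups or sparsely filled bins.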
@@ -479,7 +477,7 @@ Personally, I find "constant between samples" a bit fishy: is it "constant betwe
 If something is "independent of location", why bother computing a spatial interpolation?
 The answer lies somewhere between (among?) the very exact maths hidden in the literature.

-And, anyways, if the assumption holds, we get a neat formula for semivariance ("Method-of-Moments Estimator" according to Cressie 1993, 69, eqn. 2.4.2), which goes back to Matheron (1962).
+And, anyways, if the assumption holds, we get a neat formula for semivariance ("Method-of-Moments Estimator" according to [Cressie, 1993, p. 69](#ref-Cressie1993), eqn. 2.4.2), which goes back to Matheron ([1962](#ref-Matheron1962)).

 We define the **semivariance** \\(\gamma\\):

@@ -597,7 +595,7 @@ There are some general convenience wrappers for classical regression in R, thoug
 However, [base-r `optim` does all we need](https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/optim) (*sometimes*).
 There are [other libraries](https://cran.r-project.org/web/views/Optimization.html).

-We choose between the well-known "Nelder-Mead" optimization algorithm (Nelder and Mead 1965), or the more versatile ["L-BFGS-B"](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html) (Zhu et al. 1997; Byrd et al. 1995).
+We choose between the well-known "Nelder-Mead" optimization algorithm ([Nelder & Mead, 1965](#ref-nelder1965)), or the more versatile ["L-BFGS-B"](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html) ([Byrd *et al.*, 1995](#ref-byrd1995); [Zhu *et al.*, 1997](#ref-zhu1997)).
 They are interchangeable to some degree, yet the latter allows defining logical parameter boundaries to facilitate optimization convergence.
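As a rough illustration of that choice, the sketch below fits a straight line with both methods of base-R `optim`. The toy data, the `mse` objective, the start values and the bounds are all assumptions made up for this example; they are not the tutorial's own regression wrapper.

```r
set.seed(42)

# hypothetical toy data: a straight line with moderate noise
x <- seq(0, 10, length.out = 50)
y <- 0.5 + 2 * x + rnorm(length(x), sd = 0.5)

# objective: mean square error of a linear model, par = c(intercept, slope)
mse <- function(par) mean((y - (par[1] + par[2] * x))^2)

# Nelder-Mead: derivative-free, no parameter bounds
fit_nm <- optim(par = c(0, 1), fn = mse, method = "Nelder-Mead")

# L-BFGS-B: accepts box constraints via `lower` and `upper`
fit_lb <- optim(par = c(0, 1), fn = mse, method = "L-BFGS-B",
                lower = c(-10, 0), upper = c(10, 10))

# compare the estimates and the convergence codes (0 means success)
rbind(nelder_mead = fit_nm$par, l_bfgs_b = fit_lb$par)
c(fit_nm$convergence, fit_lb$convergence)
```

On a well-behaved problem like this one, both methods should land on nearly the same parameters; the bounds become useful when a parameter must stay in a meaningful interval (for example a non-negative range).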
 - The regression fits the data more or less well, quantified by the mean square error (`mse`).
 - Optimizer did converge (`convergence 0`, [see "convergence" here](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/optim.html)), which should not be overrated (the regression might still be irrelevant).
-- Parameters can be measured, in this case intercept (\\(0.43\\)) and slope (\\(4\times 10^{-4}\\)).
+- Parameters can be measured, in this case intercept (\\(0.43\\)) and slope (\\(0.0004\\)).

 We can do better, of course.

@@ -745,8 +743,8 @@ alt="Figure 8: Although it is inverted, scaled, y-shifted, and centered on zero
 <figcaption>Although it is inverted, scaled, y-shifted, and centered on zero, you will certainly recognize our beloved bell-curve: the Gaussian. Note that we will only use the right half to proceed.</figcaption><br>

 I still have a hard time associating anything maths-related with the words `nugget` and `sill`: they could equally well be some ancient Greek letters spelled out in a non-Greek way, such as `σίγμα`.
-Historically, they stem from what I think were the earliest applications of variogram-like analysis, as my colleague Hans Van Calster confirmed to me when reviewing this tutorial:
-> nugget comes from "gold" nugget in mining. In sampling gold, the chances of finding a nugget of gold from adjacent locations may differ a lot - hence they have a large "nugget" effect (large differences at very small distances).
+Historically, they stem from what I think were the earliest applications of variogram-like analysis, as my colleague Hans Van Calster confirmed to me when reviewing this tutorial:
+\> nugget comes from "gold" nugget in mining. In sampling gold, the chances of finding a nugget of gold from adjacent locations may differ a lot - hence they have a large "nugget" effect (large differences at very small distances).
 We have to accept that they are frequently encountered in the variogram literature.

 - The `nugget` is the value our function takes at the zero intercept, i.e. baseline variance, i.e. the lowest difference we can get (often defined by measurement uncertainty).
 I initially had trouble fitting this function, because I simplified (leaving out `nugget` and `nu`); the version above is flexible enough to fit our variogram.
 Note that the function is not defined at zero, which is why I filter `NA`.
-Sigma (\\(\sigma\\)) is related to the turning point of the Matérn function.
 The Matérn implementation does not allow decreasing or oscillating semivariance (sometimes seen in real data), but on the other hand decreasing semivariance would gravely violate Tobler's observation.

+Note that there are different definitions of the Matérn range and the parameter `sigma`.
+Make sure you know which range your toolbox of choice is reporting.
+Here and below, I will report the range that is determined by `sigma` in the above function definition.
+That sigma (\\(\sigma\\)) is related to the turning point of the Matérn function.
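To illustrate why the reported range depends on the definition, here is a sketch comparing two common scalings of the Matérn correlation. The functions `matern_sigma` and `matern_rho` and the parameter values are hypothetical and are not the tutorial's own function definition; they only differ in how the distance `h` is scaled before it enters the Bessel term.

```r
# Matérn correlation with distance scaled directly by sigma (assumed form)
matern_sigma <- function(h, sigma, nu) {
  x <- h / sigma
  2^(1 - nu) / gamma(nu) * x^nu * besselK(x, nu)
}

# alternative convention: distance scaled by sqrt(2 * nu) / rho
matern_rho <- function(h, rho, nu) {
  x <- sqrt(2 * nu) * h / rho
  2^(1 - nu) / gamma(nu) * x^nu * besselK(x, nu)
}

h <- seq(0.1, 5, by = 0.1)  # start above zero, where the expression is defined
sigma <- 1.2
nu <- 1.5

# identical curve, different "range" number: rho = sqrt(2 * nu) * sigma
rho <- sqrt(2 * nu) * sigma
all.equal(matern_sigma(h, sigma, nu), matern_rho(h, rho, nu))
c(sigma = sigma, rho = rho)
```

Both calls describe the same curve, yet one toolbox would report `sigma` and the other `rho` as "the range", which is why the numbers are only comparable once the convention is known.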
+
 Any regression function demands a specific plot function:

```r
@@ -1476,17 +1478,17 @@ Thank you for reading!

 # References

-Byrd, Richard H., Peihuang Lu, Jorge Nocedal, and Ciyou Zhu. 1995. "A Limited Memory Algorithm for Bound Constrained Optimization." *SIAM Journal on Scientific Computing* 16 (5): 1190--1208. <https://doi.org/10.1137/0916069>.
+Byrd R.H., Lu P., Nocedal J. & Zhu C. (1995). A Limited Memory Algorithm for Bound Constrained Optimization. SIAM Journal on Scientific Computing 16 (5): 1190--1208. <https://doi.org/10.1137/0916069>.

-Cressie, Noel. 1993. *Statistics for Spatial Data*. John Wiley & Sons.
+Cressie N. (1993). Statistics for spatial data. John Wiley & Sons.

-Lindgren, Finn, Håvard Rue, and Johan Lindström. 2011. "An Explicit Link Between Gaussian Fields and Gaussian Markov Random Fields: The Stochastic Partial Differential Equation Approach." *Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 73 (4): 423--98. <https://doi.org/10.1111/j.1467-9868.2011.00777.x>.
+Lindgren F., Rue H. & Lindström J. (2011). An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (4): 423--498. <https://doi.org/10.1111/j.1467-9868.2011.00777.x>.

-Matheron, Georges. 1962. *Traité de Géostatistique Appliquée*. Memoires du Bureau de Recherches Geologiques et Minieres, Editions Technip, Paris.
+Matheron G. (1962). Traité de géostatistique appliquée. No. 14. Memoires du Bureau de Recherches Geologiques et Minieres, Editions Technip, Paris.

-Nelder, J. A., and R. Mead. 1965. "A Simplex Method for Function Minimization." *The Computer Journal* 7 (4): 308--13. <https://doi.org/10.1093/comjnl/7.4.308>.
+Nelder J.A. & Mead R. (1965). A Simplex Method for Function Minimization. The Computer Journal 7 (4): 308--313. <https://doi.org/10.1093/comjnl/7.4.308>.

-Zhu, Ciyou, Richard H. Byrd, Peihuang Lu, and Jorge Nocedal. 1997. "Algorithm 778: L-BFGS-B: Fortran Subroutines for Large-Scale Bound-Constrained Optimization." *ACM Transactions on Mathematical Software* 23 (4): 550--60. <https://doi.org/10.1145/279232.279236>.
+Zhu C., Byrd R.H., Lu P. & Nocedal J. (1997). Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Transactions on Mathematical Software 23 (4): 550--560. <https://doi.org/10.1145/279232.279236>.
 I initially had trouble fitting this function, because I simplified (leaving out `nugget` and `nu`); the version above is flexible enough to fit our variogram.
 Note that the function is not defined at zero, which is why I filter `NA`.
-Sigma ($\sigma$) is related to the turning point of the Matérn function.
 The Matérn implementation does not allow decreasing or oscillating semivariance (sometimes seen in real data), but on the other hand decreasing semivariance would gravely violate Tobler's observation.

+Note that there are different definitions of the Matérn range and the parameter `sigma`.
+Make sure you know which range your toolbox of choice is reporting.
+Here and below, I will report the range that is determined by `sigma` in the above function definition.
+That sigma ($\sigma$) is related to the turning point of the Matérn function.
+

 Any regression function demands a specific plot function: