Chapter 13 Sampling designs and estimation in forest inventory
In Chapters 11 and 12, we focused on inference using sample data collected under an SRS design. In subsequent chapters, we step beyond SRS and introduce several additional probability sampling designs commonly used in forestry. Recall that a sampling design is a set of rules for selecting units from a population and for using the resulting data to generate statistically valid estimates of parameters of interest. Our focus is on four designs that are treated in detail in the chapters that follow: 1) systematic, 2) stratified, 3) cluster, and 4) multistage sampling. These designs differ in how they configure and select observations and in the estimators they prescribe. In most cases, a design is chosen to improve efficiency from a statistical, logistical, or combined standpoint.
In addition to these sampling designs, we also introduce estimation methods that incorporate auxiliary information. These methods—such as ratio and regression estimation—are not sampling designs themselves. Instead, they provide ways to improve precision by combining the sample data with information from additional variables available for all units in the population. These methods appear repeatedly in later chapters and are used in conjunction with multiple designs.
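To make the idea concrete, here is a minimal sketch of ratio estimation for a population mean. The data values, the object names, and the known auxiliary mean `mu_x` are all hypothetical and chosen only for illustration; the estimators themselves, including their variances, are developed properly in later chapters.

```r
# Hypothetical sample of n = 5 units (values are made up for illustration)
y <- c(12, 18, 25, 30, 41)  # variable of interest measured on each sampled unit
x <- c(10, 15, 20, 26, 34)  # auxiliary variable measured on the same units

# Assumed known population mean of the auxiliary variable
mu_x <- 22

# Estimated ratio of y to x from the sample
R_hat <- mean(y) / mean(x)

# Ratio estimate of the population mean of y:
# scale the known auxiliary mean by the estimated ratio
y_bar_ratio <- R_hat * mu_x

R_hat        # 1.2
y_bar_ratio  # 26.4
```

Here the sample mean of `y` is 25.2, but because the sampled units have a slightly smaller mean `x` (21) than the population (22), the ratio estimate adjusts the estimate upward to 26.4.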
As discussed in Section 12.1, forest inventory typically relies on an areal sampling frame, where sampling locations are identified by placing points within the frame’s spatial extent. Under plot sampling, a location might mark the center of a circular plot, a corner of a rectangular plot, or an anchor point for more complex plot shapes. Under point sampling, a location might serve as the point from which a forester projects the discerning angle to identify measurement trees. Unless stated otherwise, the sampling designs developed in subsequent chapters can be used with either plot or point sampling—that is, the design governs the selection of sampling locations and associated estimators, while a separate set of rules determines how trees are selected around each location.
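The separation between selecting locations and selecting trees can be sketched in a few lines of base R. Everything below is an illustrative assumption rather than part of any design presented later: the rectangular frame, the tree distances and diameters, the 8 m plot radius, and the metric basal area factor of 4 m²/ha. The point sampling rule uses the standard metric limiting distance, DBH (cm) divided by twice the square root of the basal area factor.

```r
set.seed(1)

# Step 1 (the design): select sampling locations within a hypothetical
# 200 m x 100 m rectangular areal frame, here by simple random sampling
n <- 5
locs <- data.frame(x = runif(n, 0, 200), y = runif(n, 0, 100))

# Step 2 (the tree selection rule): hypothetical trees near one location,
# with horizontal distance from the location (m) and DBH (cm)
trees <- data.frame(
  dist_m = c(3.2, 7.5, 9.8, 12.1),
  dbh_cm = c(18, 25, 42, 40)
)

# Plot sampling: a tree is measured if it falls within a fixed radius
plot_radius_m <- 8
in_plot <- trees$dist_m <= plot_radius_m

# Point sampling: a tree is measured if it is within its own limiting
# distance, which grows with DBH (metric BAF in m^2/ha)
baf <- 4
limiting_dist_m <- trees$dbh_cm / (2 * sqrt(baf))
in_point <- trees$dist_m <= limiting_dist_m

in_plot   # TRUE  TRUE FALSE FALSE
in_point  # TRUE FALSE  TRUE FALSE
```

The same locations in `locs` could be paired with either rule; only the set of measurement trees around each location changes.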
As in other mensuration texts (Kershaw et al. 2016; Burkhart, Avery, and Bullock 2018), we present the estimators associated with each design. The text by Iles (2003) is especially valuable for readers seeking practical guidance: it expands on these methods, discusses their implementation in the field, and provides real-world perspectives and advancements. Our added contribution is to show how these estimators can be implemented using tidyverse tools and other R packages, while also building intuition for their application. We emphasize computing workflows that support efficient, repeatable analysis, and we carry these workflows forward through the design-specific chapters that follow.
13.1 A gentle introduction to sf
Because sampling in forestry is inherently spatial, it’s helpful to work with data structures that explicitly represent geographic location and geometry. In R, the primary tool for handling spatial vector data is the sf package.
If you’ve used GIS software before, many of the underlying concepts will feel familiar. Spatial vector data come in three main types: points (e.g., sampling locations), lines (e.g., transects or travel paths), and polygons (e.g., stands, management units, or forest extents). The sf package represents each of these as a special kind of data frame with an attached geometry column that stores the spatial information.
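As a minimal sketch of the three geometry types, the code below builds a few points, one line, and one polygon by hand. The coordinates, the `plot_id` values, and the choice of EPSG:32616 (UTM zone 16N, units in meters) are hypothetical; in practice these objects would usually be read from a file rather than typed in.

```r
library(sf)

# Points: two hypothetical sampling locations
pts <- st_sfc(st_point(c(100, 100)), st_point(c(300, 200)), crs = 32616)

# Line: a hypothetical transect from (0, 0) to (400, 300)
transect <- st_sfc(st_linestring(rbind(c(0, 0), c(400, 300))), crs = 32616)

# Polygon: a hypothetical 400 m x 300 m stand; the ring must close on
# its starting vertex
stand <- st_sfc(
  st_polygon(list(rbind(c(0, 0), c(400, 0), c(400, 300), c(0, 300), c(0, 0)))),
  crs = 32616
)

# An sf object is a data frame with an attached geometry column
plots <- st_sf(plot_id = c("A", "B"), geometry = pts)

class(plots)             # includes both "sf" and "data.frame"
st_geometry_type(plots)  # POINT, POINT
st_length(transect)      # 500 m
st_area(stand)           # 120000 m^2
```

Printing `plots` shows the attribute column alongside the geometry column, together with the bounding box and coordinate reference system.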
You can work with sf objects much like regular data frames, using familiar tidyverse tools such as filter(), select(), and mutate(). Throughout the chapters that follow, we’ll introduce key sf data structures and functions as needed. For additional background, see the sf documentation at https://r-spatial.github.io/sf or the package vignette via vignette("sf1").
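The short sketch below illustrates these verbs applied to an sf object. The plot identifiers, basal area values, and coordinates are hypothetical. Note that the geometry column is "sticky": it stays attached even when select() does not name it.

```r
library(sf)
library(dplyr)

# Hypothetical plot locations with a basal area attribute (m^2/ha)
plots <- st_sf(
  plot_id = c("A", "B", "C"),
  ba_m2_ha = c(18, 27, 33),
  geometry = st_sfc(
    st_point(c(100, 100)),
    st_point(c(300, 200)),
    st_point(c(250, 50)),
    crs = 32616  # assumed projected CRS (UTM zone 16N)
  )
)

dense <- plots %>%
  filter(ba_m2_ha > 20) %>%                 # keep plots B and C
  mutate(ba_ft2_ac = ba_m2_ha * 4.356) %>%  # convert m^2/ha to ft^2/ac
  select(plot_id, ba_ft2_ac)                # geometry is retained automatically

names(dense)  # "plot_id" "ba_ft2_ac" "geometry"
class(dense)  # still an sf object
```

If a plain data frame is needed, for example before writing a summary table, st_drop_geometry() removes the geometry column.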