This book is a work in progress and is Open Review. We want your feedback to make the book better for you and other readers. To add your annotation, select some text and then click the on the pop-up menu. To see the annotations of others, click the in the upper right hand corner of the page

Chapter 12 Estimating forest parameters

Building on statistical concepts developed in Chapter 11, this chapter covers the basic steps and considerations needed to estimate forest parameters from a sample. The chapter closes with a section that embeds forest parameter estimation within a highly efficient tidyverse workflow, leveraging dplyr and tidyr functions covered in Chapters 7 and 8. We generalize and expand on these concepts and programming tools to accommodate more advanced forestry applications in Chapter 13.

12.1 Sampling units for forestry applications

Collecting a sample used to estimate forest parameters is typically accomplished in three steps.117 First, sample observation locations, selected using some random mechanism, are placed within the forest. Second, a rule is used to select measurement trees around each location. Third, for each location, tree measurements are expanded to the desired per unit area basis and summarized to represent one observation within the sample. The most common rules for selecting measurement trees are plot sampling and point sampling.

Plot sampling is applied in many forestry applications, particularly in long-term monitoring efforts. A fixed-area plot is positioned at each location and trees in the plot boundary are measured. A location can define the center for a circular plot, corner for a rectangular plot, or position for more complex plot shapes and configurations such as plot clusters or nested plots of different sizes. Under the plot sampling rule, all trees in the population have equal probability of being measured.

Point sampling is also a very common approach to tree selection and a popular method used to cruise timber. Under point sampling, the probability of selecting a tree in the population for measurement is conditional on some function of its size. For this reason, the approach is often called probability proportional to size (PPS) sampling. Depending on the survey objectives, compared with plot sampling, point sampling can offer greater efficiency from a field effort and statistical standpoint. In general, there are gains in efficiency when trees are preferentially selected for measurement based on a size attribute related to the parameter of interest. For example, if we’re interested in estimating volume, it’s more efficient to select measurement trees proportional to their basal area (because basal area and volume are strongly related). However, if we’re interested in estimating tree density (i.e., number of trees per unit area) or growth rates, plot sampling would be the preferred approach.

The next section connects forestry applications with concepts from Chapter 11. Specifically, we introduce the areal sampling frame, which takes the place of the list sampling frame introduced in Section 11.2.2. For the applications covered here, the areal sampling frame is used for both plot and point sampling. Then, because it’s conceptually easier, we introduce plot sampling, expansion factors, boundary effects, and other related topics. This is followed by point sampling and how expansion factors and other sampling considerations change under this tree selection rule.

12.2 Areal sampling frame

Recall from Section 11.2.2 that SRS and related probability sampling methods require a sampling frame from which the sample is selected. In the examples presented in the previous chapter, the sampling frame is a list of all population units from which sampling units are randomly selected. For instance, in the Harvard Forest biomass analysis described in Section 11.3.5, the sampling frame was constructed by overlaying a grid of equal-sized squares across the forest extent (Figure 11.3). This grid delineated the \(N\) population units, and the sampling frame list was defined by the unit indices from 1 through \(N\). Sampling units were then selected by drawing \(n\) random integers within that range. This is a valid and intuitive approach to sample selection.

Geographic information systems (GIS) or R’s spatial data packages (see, e.g., Pebesma and Bivand (2023); Brunsdon and Comber (2019)) can be used to grid, or tessellate, the geographic extent of a population using equal-area, non-overlapping shapes such as rectangles as in the Harvard Forest example. Creating such grids is straightforward when you’re at a computer. However, logistical challenges, such as locating sampling unit boundaries and measuring trees within selected units in the field, often make these approaches impractical in forestry applications.

A more common approach in forestry is to use an areal sampling frame. This frame allows for the selection of any point location defined by geographic coordinates (e.g., longitude and latitude) within the defined extent of the population. This extent is typically delineated by one or more polygons representing the geographic boundary. Sampling locations are then identified by placing points at random within the defined area.

Clearly, selecting sampling units from a list sampling frame is different from selecting a set of points that define sampling locations from the infinite number of points within an area. Fortunately, much of the statistical theory we rely on to yield valid inference is shared between list and areal sampling frames. However, using an areal sampling frame brings additional survey design, data collection, and analysis considerations, which are covered in this and subsequent chapters.

Using an areal sampling frame for SRS introduces both statistical and logistical challenges. When sampling locations are randomly distributed, there’s a real possibility that sampling plots will spatially overlap, particularly when using fixed-area plot sampling, if locations fall too close together. This can result in the same trees being measured at multiple locations, which may introduce spatial redundancy in the sample and reduce statistical efficiency, especially when variables of interest exhibit spatial autocorrelation. While this doesn’t violate the assumptions of SRS or the validity of estimators, it can lead to less informative samples. Logistically, it can also be inefficient to navigate to a random set of widely dispersed locations in the field, especially in forested or rugged terrain. To address these issues, forest inventories often rely on systematic sampling designs in which sampling locations are spaced evenly across the landscape. To preserve the properties of design-based inference, the systematic grid is positioned using a random mechanism. See Figure 1.2 for illustration and Section 13.2 on Systematic Random Sampling for further details.

Despite these practical considerations, we continue to use SRS in this chapter for both plot and point sampling because the concepts introduced, such as defining sampling units, expanding tree-level measurements, and summarizing observations, are foundational and independent of the specific sampling design. Moreover, several of the more complex sampling designs introduced in Chapter 13 adopt or extend estimators originally developed under SRS. The ideas we establish here provide the conceptual grounding needed to understand and implement more advanced forest sampling strategies that follow.

12.2.1 Finite population correction

While we emphasized the role of the finite population correction (FPC) in Section 11.3, it’s rarely applied in forest inventory practice. The FPC is relevant when sampling is conducted from a list sampling frame, where each sampling unit is discrete and identifiable. The correction reduces the standard error of the mean because, once selected, a unit is no longer eligible for reselection.

In contrast, when an areal sampling frame is used, the number of possible sampling locations is effectively infinite, and there is no predefined list of distinct units. Instead, sampling locations can occur anywhere within the geographic area that defines the population. Because of this, the conditions required for applying the FPC do not hold, and the correction is not used when estimating standard errors for samples selected from an areal sampling frame.

12.3 Plot sampling

As noted at the beginning of this chapter, plot sampling is a common approach for collecting forest inventory data used to estimate population parameters. Fixed-area plots, often circular or rectangular, are positioned at each sampling location selected using an areal sampling frame. Then all trees within the plot boundary, belonging to the population of interest, are measured. The criteria for when a tree is “in” the plot are typically defined in the field measurement protocol—for example, a standing tree is measured if the center of its stem at DBH is within the plot boundary.

The survey objectives define the population(s) of interest. For example, in a timber cruise, we might measure only marketable trees—live stems of a certain DBH, species, and log quality. In an Emerald Ash Borer (EAB) forest health assessment, we might measure live and dead ash species. In a regeneration survey, we might measure seedlings and saplings of desirable species.

In practice, we’re almost always interested in estimating parameters for multiple populations. Sometimes the frequency of trees in different populations varies across an area. For example, say we’re interested in assessing regeneration potential and timber volume in a naturally regenerated, uneven-aged stand. These two stand characteristics, regeneration and overstory, can be viewed as separate populations. Such stands typically have more seedlings and saplings than larger overstory trees on a per unit area basis. So, an ideal large plot size for overstory trees would likely result in excessive time spent measuring regeneration. Conversely, an ideal small plot for regeneration would include too few overstory trees to accurately estimate timber volume with a feasible number of plots.

The solution is to co-locate plots sized appropriately for each population of interest, often by nesting smaller plots within larger ones. This might look like a 1/1000-th acre regeneration plot nested within a 1/20-th acre overstory plot. Smaller nested plots are often referred to as subplots.

When using plot sampling, all trees have equal probability of selection for measurement. More specifically, a tree’s selection probability equals the plot area on which it’s measured divided by the geographic extent of the population. These ideas are further developed in Section 12.3.1 and illustrated in Section 12.3.6. They’re key to expanding tree measurements to a per unit area basis (e.g., per acre or hectare) and computing plot-level summaries that represent one observation within the sample.

12.3.1 Expansion factors and estimates

In most forestry applications, estimates are expressed on a standard per unit area basis (i.e., acre or hectare). This means that at some point in the analysis we need to scale tree measurements to the desired per unit area basis. For example, recall in Section 11.1.5 we were interested in estimating the average biomass (Mg/ha) on the Harvard Forest. However, because our sampling units (i.e., plots) were 0.25 ha, we needed to first multiply the total tree biomass measured on each 0.25 ha unit by 4 (i.e., 1/0.25) to express them on the desired per ha basis. Here, the value 4 is referred to as an expansion factor. An expansion factor is a value used to scale measurements expressed on one area basis to another area basis. In the Harvard Forest analysis, the individual tree measurements were summarized to the 0.25 ha plot in advance and the expansion factor was simply applied to the plot-level summaries to arrive at the desired per ha expression.

Two perspectives of measurement tree selection using a fixed-area circular plot. (a) Plot-centered, where the circle identifies the plot boundary. (b) Tree-centered, where circles identify each tree's inclusion zone with solid and dashed lines corresponding to measurement and non-measurement trees, respectively.

FIGURE 12.1: Two perspectives of measurement tree selection using a fixed-area circular plot. (a) Plot-centered, where the circle identifies the plot boundary. (b) Tree-centered, where circles identify each tree’s inclusion zone with solid and dashed lines corresponding to measurement and non-measurement trees, respectively.

To better understand expansion factors, it’s instructive to consider two different perspectives for measurement tree selection. In the preceding section, our description of measurement tree selection takes a plot-centered perspective, i.e., as illustrated in Figure 12.1(a), a plot boundary is established at a sampling location, then those trees in the plot are measured. Alternatively, we can take a tree-centered perspective where an imaginary plot, identical in shape and size to the plot intended for the sampling location, is established using each tree’s location. The tree’s imaginary plot is called its inclusion zone and is illustrated in Figure 12.1(b).118 Then, all trees with inclusion zones containing the sampling location are measured. Notice in Figure 12.1 that both the plot-centered and tree-centered perspectives result in the same set of measurement trees.

The \(j\)-th tree’s measurements are scaled to the per unit area using the tree expansion factor, or tree factor (TF) for short, defined as \[\begin{equation} \text{TF}_j = \frac{\text{unit area}}{\text{inclusion zone area}_j}. \tag{12.1} \end{equation}\] In most applications, the unit area numerator is 43,560 (ft\(^2\)/ac) or 10,000 (m\(^2\)/ha), and the denominator is the \(j\)-th tree’s inclusion zone area in ft\(^2\) or m\(^2\). Regardless of the measurement system or chosen base units, the TF is the number of trees per unit area the given measurement tree represents.

As illustrated in Figure 12.1, under plot sampling, a tree’s inclusion zone equals the plot area used to sample the population to which the tree belongs. Hence, following (12.1), all trees measured on a given plot size will have the same TF.

We’re careful to note “population” in the preceding paragraph because it’s often the case that different populations are measured using different plot sizes within the same survey effort. For example, following the description of overstory and regeneration plots in Section 12.3, a tree measured on the 1/20-th acre (0.05 ac or 2178 ft\(^2\)) fixed-area overstory plot will have a 0.05 acre inclusion zone and represent \(\text{TF} = 43560/2178 = 20\) trees per acre. Similarly, a seedling or sapling measured on the 1/1000-th acre (0.001 ac or 43.56 ft\(^2\))119 subplot will have a 0.001 acre inclusion zone and represent \(\text{TF} = 43560/43.56 = 1000\) trees per acre.

The per unit area expansion factor for the \(j\)-th tree’s continuous or binary variable measurement (e.g., basal area, volume, biomass, logs, live/dead) is computed using \[\begin{equation} y_j \left(\text{units}/\text{unit area}\right) = y_j \left(\text{units}/\text{tree}\right) \cdot \text{TF}_j \left(\text{trees}/\text{unit area}\right), \tag{12.2} \end{equation}\] where \(y_j\) represents the variable measurement in the given units.

For example, say the \(j\)-th tree measured on a 0.05 acre plot had a volume \(v_j\) equal to 10.4 ft\(^3\). Given the tree’s TF\(_j\) equals 20 trees per acre, its expanded volume is \[\begin{align*} v_j \left(\text{ft}^3/\text{acre}\right) &= v_j \left(\text{ft}^3/\text{tree}\right)\cdot \text{TF}_j \left(\text{trees}/\text{acre}\right)\\ &= 10.4 \cdot 20 \\ &= 208, \end{align*}\] with each term’s units added in parentheses.

Following (12.2), a general formula to expand the \(j\)-th tree’s DBH to basal area (BA) per unit area is \[\begin{equation} \text{BA}_j \cdot \text{TF}_j = c\cdot\text{DBH}^2_j \cdot \text{TF}_j, \tag{12.3} \end{equation}\] where \(c\) = 0.005454 if DBH is measured in inches and BA is expressed in ft\(^2\), or \(c\) = 0.00007854 if DBH is measured in cm and BA is expressed in m\(^2\).120

So, for example, if the \(j\)-th tree measured on a 0.05-acre plot had a DBH of 10 inches, its basal area per acre is \[\begin{align*} \text{BA}_j \left(\text{ft}^2/\text{acre}\right) &= \left(c\cdot \text{DBH}^2\right) \left(\text{ft}^2/\text{tree}\right)\cdot \text{TF}_j \left(\text{trees}/\text{acre}\right)\\ &= 0.005454 \cdot 10^2 \cdot 20 \\ &= 10.908. \end{align*}\]

For a given sampling location, the estimate for number of trees per unit area is the sum of the measurement trees’ TF, i.e., \(\sum^m_{j=1}\text{TF}_j\), where \(m\) is the number of measurement trees. For plot sampling, this summation reduces to \(m\!\cdot\!\text{TF}\), where TF is the constant tree expansion factor for all trees measured on the plot.121

Similarly, a sampling location’s per unit area estimate for other variables is obtained by summing the per unit area contributions of its measured trees. Following the examples above, a plot’s estimates of volume and basal area per unit area are given by \(\sum^m_{j=1}v_j \cdot \text{TF}_j\) and \(\sum^m_{j=1}c \cdot \text{DBH}_j^2 \cdot \text{TF}_j\), respectively. We refer to the process of expanding and summing tree measurements within each plot as the plot-level summary.

Taking the tree-centered perspective has several advantages. First, as we’ve just demonstrated, it aids in introducing tree expansion factors. Second, from a practical computing standpoint, it leads to a single analytical workflow developed in Section 12.6 that yields population parameter estimates for plot and point sampling data. Third, it helps make clear the connection between forest sampling based on an areal sampling frame and broader survey sampling theory (see, e.g., Grosenbaugh (1958), Oderwald (1981), Roesch, Green, and Scott (1993), and Gregoire and Valentine (2007)). Fourth, as explored in Section 12.3.3, it helps us more easily recognize and understand statistical issues that can arise when implementing survey designs, along with possible remedies.

12.3.2 Sample-based estimates

For \(n\) sampling locations, the corresponding \(n\) per unit area plot-level summaries of a given variable (see Section 12.3.1) serve as the sample observations used to estimate population parameters via the estimators described in Section 11.3. Following the notation introduced in Section 11.3, the \(n\) plot-level summaries are denoted as \(y_i\), for \(i = 1, 2, \ldots, n\), and are used to compute the sample mean, standard error of the mean, and corresponding confidence intervals as given in Equations (11.15), (11.21), and (11.31), respectively.

Estimates for population totals, e.g., total basal area or total volume, are derived from the per unit area estimates by multiplying the sample mean by the total area of the forest. This is consistent with the approach in Section 11.3.3, where population size \(N\) is replaced by forest area \(A\), expressed in the same units as the per unit area estimates.

Computation of estimates for several variables is illustrated in Section 12.3.6.

12.3.3 Boundary effects

The validity of plot and point sampling requires that the probability of selecting a tree for measurement is proportional to the tree’s inclusion zone. If any portion of a tree’s inclusion zone falls outside the forest boundary, then that tree has a lower than expected probability of selection. This reduced probability can result in biased estimates, referred to as boundary slopover bias. This bias can, and should, be corrected in the field using methods found in most forest measurement books (see, e.g., Burkhart, Avery, and Bullock (2018), Kershaw et al. (2016), and Gregoire and Valentine (2007) for an in-depth tour of methods proposed to correct boundary slopover bias).

Below, we review the mirage and walkthrough methods. Both are easy to implement and common in practice. They effectively correct for those trees with inclusion zones that extend beyond the boundary. We introduce the methods using circular plots, but they can also be applied for non-circular plots.

The mirage method was proposed by Schmid (1969).122 The method can be applied when the boundary near the plot in question is straight (or made up of straight segments with well-defined corners) and you are able to work outside the boundary. The method is implemented using the following steps.

  1. Establish the plot center. Measure all trees that fall inside the plot and are also inside the boundary. If any portion of the plot is over the boundary, then implement Step 2.
  2. Determine the bearing with the shortest distance \(d\) from the plot center to the boundary. Establish a “mirage” (i.e., temporary) plot center along the bearing at distance \(d\) outside the boundary. From that mirage plot center, measure all trees that fall inside the mirage plot and also inside the boundary. Add these mirage plot measurement trees to the measurement trees recorded in Step 1.

Figure 12.2 shows three fixed-area circular plots with increasing amounts of boundary overlap. Following Step 2 above, the mirage sampling locations are established at a distance \(d\) outside the boundary. The shaded regions within each sampling location plot show how many times each measurement tree should be recorded. Notice that trees falling within the overlap between the sampling location plot and the mirage plot are recorded twice.

Three fixed-area circular plots with increasing boundary overlap. Boundary indicated by the dashed line. Mirage plots outlined with dotted line.

FIGURE 12.2: Three fixed-area circular plots with increasing boundary overlap. Boundary indicated by the dashed line. Mirage plots outlined with dotted line.

Consider Figure 12.3 to build some intuition for how the mirage method works. Like Figure 12.1, it shows measurement tree selection from both a plot- and tree-centered perspective. Figure 12.3(a) shows a plot with boundary overlap and two measurement trees. Following the mirage method, Tree 1 is recorded twice because it falls within the region where the mirage plot overlaps the sampling location plot. Tree 2 is recorded only once because it doesn’t fall within the overlap. Double counting Tree 1 effectively corrects for the fact that its inclusion zone is reduced by the portion extending beyond the boundary.

Given this notion, consider the corresponding tree-centered view in Figure 12.3(b), where a mirage tree inclusion zone has been established for each measurement tree. For a given tree, notice the inclusion zone area outside the boundary is equal to the mirage inclusion zone area inside the boundary—effectively correcting the inclusion zone area reduction, or making it whole.

It might help to think of the portion of a tree’s inclusion zone area that extends beyond the boundary as being folded back onto itself along the boundary. The sampling location falls within Tree 1’s inclusion zone and inclusion zone correction region; hence, that tree is recorded twice. For Tree 2, the sampling location falls within its inclusion zone but not its inclusion zone correction region; hence, it’s recorded only once.

Plot-centered and tree-centered view of the mirage method using two example trees.

FIGURE 12.3: Plot-centered and tree-centered view of the mirage method using two example trees.

The walkthrough method was developed by Ducey, Gove, and Valentine (2004). This method follows the same basic principles as the mirage method but is more general and works well even if the boundary near the plot in question is not straight, or if you are not able to work outside the boundary. The method is implemented using the following steps.

  1. Establish the plot center. Measure all trees that fall inside the plot and are also inside the boundary.
  2. Identify each tree in the plot that is closer to the boundary than to the plot center. For each of these trees, take the bearing from the plot center to the tree and also measure the distance \(d\) from the plot center to the tree. Then, following this bearing, walk from the tree for distance \(d\). The point at distance \(d\) from the tree (i.e., where you stop walking) is called the “walkthrough point.” If the walkthrough point is outside the boundary, then the tree is recorded twice; otherwise, the tree is recorded only once. It doesn’t matter how many times you cross the boundary on the way to the walkthrough point—it’s whether the walkthrough point is inside or outside the boundary that determines if the tree is recorded once or twice. It’s only necessary to walk to the walkthrough point if it’s not obvious whether the point is inside the boundary.

Consider the example in Figure 12.4. Here, a fixed-area circular plot center falls near a boundary. Six trees fall within the plot and boundary. Each tree’s walkthrough point is also identified in the figure. In practice, only Trees 1, 2, 5, and 6 would likely be considered using the walkthrough method, since Trees 3 and 4 are clearly closer to the plot center than to the boundary. Trees 1 and 6 have walkthrough points outside the boundary, so they’re recorded twice. The other trees have walkthrough points within the boundary and are recorded only once.

Fixed-area circular plot indicated by the solid line that overlaps a boundary indicated by the dashed line. Six trees fall in the plot. Arrows indicate direction and distance to walkthrough points denoted with asterisked tree number.

FIGURE 12.4: Fixed-area circular plot indicated by the solid line that overlaps a boundary indicated by the dashed line. Six trees fall in the plot. Arrows indicate direction and distance to walkthrough points denoted with asterisked tree number.

12.3.4 Slope corrections

Forest parameters are reported on a horizontal land area basis (e.g., tons per hectare or volume per acre). Horizontal areas are delineated on the horizontal plane. The land surface, with all its interesting terrain features, can be projected onto the horizontal plane; however, this projection results in some distortion that we must consider. The key issue here is that a fixed-area plot established on a slope (i.e., oblique plane) has a smaller area when projected onto the horizontal plane. Hence, tree factors computed for horizontal plot areas should not be applied to trees measured on sloped plots. Plots that fall on a slope require slope correction prior to use in estimation.

Let’s build some intuition about the need for slope correction. Say we’re using a fixed-area circular plot with radius \(R\) to conduct our cruise. If this circular plot is established on an oblique plane with slope \(\alpha\), the plot shape when projected onto the horizontal plane becomes an ellipse with major radius \(R\), minor radius \(R^\ast = \cos(\alpha)R\), and area \(\pi RR^\ast\) (which is smaller than \(\pi R^2\)). This relationship is illustrated in Figure 12.5(a), where the upper sloped plot area is a circle with radius \(R\) that becomes an ellipse when projected down onto the horizontal plane. Importantly, the area of this projected ellipse is smaller than \(\pi R^2\)—hence, applying the tree factor computed assuming an area of \(\pi R^2\) will cause downward bias in resulting estimates.

Illustration of how fixed-area plot dimensions change when projected between the horizontal (lower) and oblique (upper) planes. (a) Plot area when projecting a circular plot on an oblique plane onto the horizontal plane. (b,c) Examples used to find the oblique plane plot’s radius and critical distance given \(R\) and slope angle \(\alpha\).

FIGURE 12.5: Illustration of how fixed-area plot dimensions change when projected between the horizontal (lower) and oblique (upper) planes. (a) Plot area when projecting a circular plot on an oblique plane onto the horizontal plane. (b,c) Examples used to find the oblique plane plot’s radius and critical distance given \(R\) and slope angle \(\alpha\).

There are a few approaches for slope correction. The most general is to ensure that all measurements are made using horizontal distance, not the distance along the oblique plane. For example, if you had a circular plot that falls on an oblique plane with slope \(\alpha\), the plot shape becomes an ellipse with minor axis radius \(R\) perpendicular to the slope, major axis radius \(C = R/\cos(\alpha)\) parallel with the slope, and area \(\pi R C\). Once projected to the horizontal plane, this elliptical plot has the desired plot area of \(\pi R^2\) (i.e., \(C = R\) when \(\alpha = 0\)). This projection and associated dimensions are illustrated in Figure 12.5(b).

In practice, we don’t actually lay out an elliptical slope plot—that would be painful. Rather, we check if each tree around the sloped plot center is within the corresponding horizontal plot radius. If a tree is within the horizontal plot radius, then it’s a measurement tree.

Checking if a tree is “in” or “out” of the elliptical plot is straightforward once you recognize that it amounts to computing a right triangle’s hypotenuse length with known angle \(\alpha\) and adjacent side \(R\), then comparing the hypotenuse length to the distance from the plot center to the tree in question. We’ll call this hypotenuse \(C\), the tree’s critical distance. If the slope distance from the plot center to the tree is smaller than the tree’s critical distance, then it’s a measurement tree.

A tree’s critical distance is determined by the slope between the plot center and the tree’s location. The \(j\)-th tree has critical distance \(C_j = R/\cos(\alpha_j)\), where \(\alpha_j\) is the slope from the plot center to the center of the tree’s stem (slope measured parallel to the ground at the tree’s stem center). This slope can be found using a clinometer or laser rangefinder that is able to calculate slope.

For example, consider Figure 12.5(c), and say we’d like to compute a critical distance for a tree positioned along the direction indicated by line \(C_1\). Using a clinometer, you find the slope \(\alpha_1\) is 30 degrees. The cruise protocol calls for a circular plot with \(R = 26.33\) (ft) (the horizontal distance radius of a 1/20-th acre plot). The tree’s critical distance is then \(C_1 = R/\cos(\alpha_1) = 26.33/\cos(30) = 30.4\) (ft).123 So, the tree must be within \(30.4\) (ft) of the plot center (measured along the slope) to be a measurement tree.

Now say the next tree is positioned along the \(C_2\) line. Using a clinometer, you find the slope \(\alpha_2\) is 40 degrees. The tree’s critical distance is \(C_2 = R/\cos(\alpha_2) = 26.33/\cos(40) = 34.37\) (ft).

The slope correction approach described above is general and works even when the slope is not constant across the plot area. Many modern laser rangefinders report horizontal distance and hence automatically correct for slope. Using such a device, simply measure the horizontal distance from the plot center to each possible measurement tree. Those trees with horizontal distance (to the tree center) less than \(R\) are measurement trees. These results will match the time- and math-intensive approach of finding each tree’s critical distance. This approach is also the most commonly used in national forest inventories and similar inventories where accuracy and repeatability are key.

There are two other approaches to slope correction often seen in practice.

  1. Radius adjustment uses a larger circular plot on the slope that, once projected to the horizontal plane, equals \(\pi R^2\) (Bryan 1956).

  2. Tree factor adjustment uses a circular plot with radius \(R\) on the slope but computes a plot-specific tree factor that accounts for the resulting smaller plot area when projected onto the horizontal plane (Beers 1969).

Both approaches require a measurement that captures the overall plot slope.

To apply the radius adjustment approach, establish a circular plot on the slope with radius \(R^\ast = R/\sqrt{cos(\alpha)}\). This circular plot defined by radius \(R^\ast\) on the slope will be an ellipse with area \(\pi R^2\) when projected onto the horizontal plane. This works because the inclusion zone area of the projected ellipse equals that of the intended plot on the horizontal plane. This approach might be a good option for a timber cruise as it allows you to keep a constant radius, which greatly simplifies identifying measurement trees using a tape from the plot center. Foresters using this approach often rely on a table that provides a scaling factor for \(R\) given different slope ranges.

For example, if your cruise protocol calls for a circular plot with \(R=26.33\) (ft) (horizontal distance) and the plot falls on a \(\alpha = 30\) (degree) slope, the adjusted radius is \(R^\ast = R/\sqrt{\cos(\alpha)} = 28.3\) (ft) or using the equivalent scaling factor \(R^\ast = r^\ast R\) where \(r^\ast = 1/\sqrt{cos(\alpha)} = 1.07\). To create your own scaling factor table to take into the field, compute \(r^\ast\) for a range of slopes (see Exercise 12.12).

To apply the tree factor adjustment approach, establish a circular plot on the slope with radius \(R\) as if it were on the horizontal plane. Because the area of this slope plot is smaller than \(\pi R^2\) when projected onto the horizontal plane, the tree factor for measurement trees needs to increase accordingly. Specifically, the tree factor for measurement trees on the slope plot is \(TF^\ast = TF/\cos(\alpha)\), where \(TF = \text{unit area}/(\pi R^2)\) (following (12.1)) or equivalently \(TF^\ast= \text{unit area}/(\pi \cos(\alpha) R^2)\), where the denominator is the area of the horizontally projected ellipse, i.e., the blue area in Figure 12.5(a).

This approach is also easy to implement, but it requires that you record all plot slopes so you can compute corresponding tree factors after the cruise. As an example, say your horizontal plot radius is \(R=26.33\) (ft), which results in a \(TF = 43560 / (\pi 26.33^2) = 20\). The tree factor for trees measured on the oblique plot with a slope of 30 (degrees) and \(R=26.33\) (ft) is \(TF^\ast = TF/\cos(\alpha) = 23.09401\).

12.3.5 Effect of plot size on variance

Plot size, i.e., the area covered by the sampling unit, affects variance and, in turn, inferences that involve variance such as confidence intervals and sample size calculations. For many variables measured in forestry applications, the variance of measurements collected on small plots is greater than the variance of measurements collected on larger plots. For example, the variance in biomass measurements (Mg/ha) taken on 0.25 ha plots will be larger than the variance in biomass measurements (Mg/ha) on 0.5 ha plots.

Large plots tend to have smaller variance because they average over small-scale horizontal forest structure caused by factors such as competition, harvesting, windthrow, disease, or fire. The effect of plot size on variance diminishes as stand structure becomes more uniform. For example, we expect a plantation to have a weaker plot size to variance relationship compared with structurally complex mixed-species uneven-aged stands.

Freese (1962) offers the following equation to approximate the relationship between plot size and variance. \[\begin{equation} s_2^2 = s_1^2 \sqrt{\frac{a_1}{a_2}}, \tag{12.4} \end{equation}\] where plot size \(a_1\) has variance \(s^2_1\) and plot size \(a_2\) has variance \(s^2_2\). One can replace the variances in (12.4) with their corresponding coefficients of variation (i.e., \(CV_1 = s^2_1/\bar{y}_1\) and \(CV_2 = s^2_2/\bar{y}_2\)).

For example, following (12.4), if the variance of pulpwood volume per acre on 1/4 acre plots is \(s_1^2 = 50\), the variance in pulpwood volume per acre on a 1/10 acre plot is approximately \[\begin{align*} s_2^2 &= 50\sqrt{\frac{0.25}{0.1}}\\ & = 79.06. \end{align*}\]

Although plot size is often chosen based on experience or an existing field sampling protocol, the relationship between plot size and expected variance can provide an approach to select plot size and sample size based on allowable error and cost constraints. Cost comes into play here because “time is money”—to reach the same standard error, there are trade-offs between sampling more small plots versus fewer large plots.

For example, if travel time between many small plots is large relative to the time needed to take measurements on fewer large plots, one would favor a sampling design where fewer large plots are selected.

12.3.6 Illustration

Figure 12.6 shows a toy dataset that we’ll use to illustrate estimation via plot sampling. The figure shows a forest area (1.32 ac) that delineates tree populations of interest. The forest is divided into two stands (Stand 1 is 0.64 ac and Stand 2 is 0.68 ac), and within each a distinction is made between overstory and regeneration trees. This partitioning allows us to define at most four populations: Stand 1 overstory trees, Stand 1 regeneration trees, Stand 2 overstory trees, and Stand 2 regeneration trees.

These are simulated data, meaning we created them, and as a result we have measurements on all trees, i.e., a census. These measurements allow us to compute the population parameter values of interest that are given in Table 12.1. For instructional purposes, we can compare these parameter values to their corresponding estimates—something we can’t do when sampling a real population.124

Toy dataset used to illustrate sample-based estimation.

FIGURE 12.6: Toy dataset used to illustrate sample-based estimation.

TABLE 12.1: Parameters for the population shown in Figure 12.6.
Overstory
Regeneration
Stand Trees/ac Basal area (ft\(^2\)/ac) Volume (ft\(^3\)/ac) Trees/ac
1 31.965 26.364 739.74 319.65
2 34.150 13.018 299.98 341.50

The sample comprises trees measured on six overstory plots with accompanying regeneration subplots. Overstory and regeneration plots are circular fixed-areas with 24 ft and 6.8 ft radii, respectively. Plot sampling locations were selected at random (i.e., SRS) from each stand’s areal sampling frame and serve as the overstory plot centers. Regeneration subplots were located 12 ft east of overstory plot centers.125

On overstory plots, species, DBH (in), and height (ft) for all trees with DBH \(\ge\) 2 in were measured. Additionally, volume (ft\(^3\)) was estimated for measurement trees using allometric equations provided in Honer (1967). On regeneration subplots, the number of trees by species with height greater than 2 ft and DBH \(<\) 2 in were recorded.

A sample from the populations shown in Figure 12.6. The sample comprises three fixed-area circular overstory plots randomly located within each stand. Regeneration subplots are nested within overstory plots.

FIGURE 12.7: A sample from the populations shown in Figure 12.6. The sample comprises three fixed-area circular overstory plots randomly located within each stand. Regeneration subplots are nested within overstory plots.

This illustration steps through the calculations used to estimate number of trees, basal area, and volume per acre, as well as their stand totals. To keep the focus on these steps, we’ll initially limit our analysis to the overstory plot data in Stand 1. Later in Section 12.6, we develop an efficient workflow for both stands and apply it to estimate parameters for both the overstory and regeneration layers.

Table 12.2 provides overstory tree measurements for the three Stand 1 plots. Notice in Figure 12.7, there are no overstory trees on Plot 2 in Stand 1. It’s critical that the absence of trees at a sampling location be included in subsequent estimates—meaning a plot with no trees is part of the sample, reflects a characteristic of the population, and needs to be included as a zero when computing population parameter estimates.

We include a line for Plot 2 in Table 12.2 with zero DBH and volume values to remind us to include these values in subsequent computations.

TABLE 12.2: Stand 1 overstory tree measurements for inventory data shown in Figure 12.7. Zero values indicate plots where no overstory trees were measured.
Plot DBH (in) Volume (ft\(^3\))
1 11.3 17.8
1 9.8 14.5
1 10.7 17.9
2 0.0 0.0
3 14.8 33.6
3 15.4 36.6
3 13.1 28.9

We divide the estimation process into two steps.126 First, compute plot-level summaries for each variable. The plot-level summaries comprise each plot’s expanded and summarized tree measurements expressed on a per unit area basis (these are the sample observations). Second, use the plot-level summaries to compute the desired population parameter estimates.

Computing the plot-level summaries begins with calculating the TF used by all overstory trees measured on the 24 ft radius circular fixed-area plot. Following (12.1), the TF is \[\begin{align*} \text{TF} \left(\text{trees}/\text{ac}\right) &= \frac{\text{Unit area} \left(\text{ft}^2/\text{ac}\right)}{\text{Inclusion zone area} \left(\text{ft}^2/\text{tree}\right)}\\[0.5ex] &= \frac{43560}{\pi\cdot R^2} = \frac{43560}{\pi\cdot 24^2}\\ &= 24.07219, \end{align*}\] where \(R\) is the plot radius. This tree factor tells us that each tree measured on an overstory plot represents 24.07 trees per acre.

Next, we compute trees per acre for each of the \(n\)=3 plots. Referring to Table 12.2 to get the number of trees on each plot, the plot-level trees per acre are \[\begin{align*} \text{Plot 1:}&\; \sum^3_{j=1}\text{TF}_j = 3\cdot \text{TF} = 72.21656 \left(\text{trees}/\text{ac}\right),\\ \text{Plot 2:}&\; \text{TF}\cdot 0 = 0\left(\text{trees}/\text{ac}\right),\\ \text{Plot 3:}&\; \sum^3_{j=1}\text{TF}_j = 3\cdot \text{TF} = 72.21656 \left(\text{trees}/\text{ac}\right), \end{align*}\] where \(j\) is the tree index. Notice in the calculations above, because the TF is a constant we pull it out of the summation and drop its subscript.

Using DBH measurements in Table 12.2 and following (12.3), the plot-level basal area per acre is \[\begin{align*} \text{Plot 1:}&\; \sum^3_{j=1}c\!\cdot\!\text{DBH}^2_j\!\cdot\!\text{TF}_j = c\cdot (11.3^2\!+\!9.8^2\!+\!10.7^2)\!\cdot\!\text{TF}\\[-0.5em] &\qquad\qquad\qquad\quad\,\, = 44.4048 \left(\text{ft}^2/\text{ac}\right),\\ \text{Plot 2:}&\; \text{TF}\!\cdot\!0 = 0 \left(\text{ft}^2/\text{ac}\right),\\ \text{Plot 3:}&\; \sum^3_{j=1}c\!\cdot\!\text{DBH}^2_j\!\cdot\!\text{TF}_j = c\cdot\!(14.8^2\!+\!15.4^2\!+\!13.1^2)\!\cdot\!\text{TF}\\[-0.5em] &\qquad\qquad\qquad\quad\,\, = 82.42499 \left(\text{ft}^2/\text{ac}\right), \end{align*}\] where \(c = 0.005454\) and \(DBH_j\) is the \(j\)-th tree’s DBH. Like TF, the constant \(c\) can be pulled out of the summation.

Using volume measurements in Table 12.2 and following (12.2), the plot-level volume per acre is \[\begin{align*} \text{Plot 1:}&\; \sum^3_{j=1}v_j\!\cdot\!\text{TF}_j = (17.8\!+\!14.5\!+\!17.9)\!\cdot\!\text{TF}\\[-0.5em] &\qquad\qquad\;\,\,\, = 1208.42369 \left(\text{ft}^3/\text{ac}\right),\\ \text{Plot 2:}&\; \text{TF}\!\cdot\!0 = 0\left(\text{ft}^3/\text{ac}\right),\\ \text{Plot 3:}&\; \sum^3_{j=1}v_j\!\cdot\!\text{TF}_j = (33.6\!+\!36.6\!+\!28.9)\!\cdot\!\text{TF}\\[-0.5em] &\qquad\qquad\;\,\,\, = 2385.55355 \left(\text{ft}^3/\text{ac}\right), \end{align*}\] where \(v_j\) is the \(j\)-th tree’s volume.

For reference, we collected the plot-level summaries of number of trees, basal area, and volume per acre computed above into Table 12.3. Given the \(n\)=3 sample observations in Table 12.3, you’re back in familiar territory for computing SRS means, standard deviations, standard errors, and confidence intervals—all covered in Section 11.3.

TABLE 12.3: TABLE 12.4: Stand 1 overstory plot-level summary for inventory data shown in Figure 12.7.
Plot Trees/ac Basal area (ft\(^2\)/ac) Volume (ft\(^3\)/ac)
1 72.2166 44.4048 1208.4237
2 0 0 0
3 72.2166 82.425 2385.5535

The code below computes each variable’s mean (11.15), standard error of the mean (11.21), and 90% confidence interval. Recall as discussed in Section 12.2.1, when we select sampling locations using an areal sampling frame, we don’t apply the FPC when computing the standard error of the mean—this is an important distinction.

n <- 3 # Sample size.

t <- qt(p = 1 - 0.1 / 2, df = n - 1) # t-value for 90% CI.

# Trees per acre.
trees_per_ac <- c(72.2166, 0, 72.2166)
y_bar_trees <- mean(trees_per_ac)
s_y_bar_trees <- sd(trees_per_ac) / sqrt(n)
ci_trees <- c(y_bar_trees - t * s_y_bar_trees,
              y_bar_trees + t * s_y_bar_trees)

y_bar_trees # Mean number of trees per acre.
#> [1] 48.144
ci_trees # Confidence interval for mean number of trees per acre. 
#> [1] -22.146 118.435
# Basal area per acre.
ba_per_ac <- c(44.4048, 0, 82.425)
y_bar_ba <- mean(ba_per_ac)
s_y_bar_ba <- sd(ba_per_ac) / sqrt(n)
ci_ba <- c(y_bar_ba - t * s_y_bar_ba, 
           y_bar_ba + t * s_y_bar_ba)

y_bar_ba # Mean basal area per acre.
#> [1] 42.277
ci_ba # Confidence interval for basal area per acre. 
#> [1] -27.271 111.824
# Volume per acre.
vol_per_ac <- c(1208.4237, 0, 2385.5535)
y_bar_vol <- mean(vol_per_ac)
s_y_bar_vol <- sd(vol_per_ac) / sqrt(n)
ci_vol <- c(y_bar_vol - t * s_y_bar_vol, 
            y_bar_vol + t * s_y_bar_vol)

y_bar_vol # Mean volume per acre.
#> [1] 1198
ci_vol # Confidence interval for volume per acre.
#> [1] -812.91 3208.90

The mean and associated confidence interval for each variable computed in the code above are collected in Table 12.5.

TABLE 12.5: TABLE 12.6: Stand 1 overstory mean estimates along with associated confidence intervals (CI) for inventory data shown in Figure 12.7 and recorded in Table 12.2.
\(\bar{y}\) 90% CI
Trees/ac 48.144 (-22.146, 118.435)
Basal area (ft\(^2\)/ac) 42.277 (-27.271, 111.824)
Volume (ft\(^3\)/ac) 1197.992 (-812.913, 3208.898)

Stand 1 total and associated confidence interval for each variable can be computed using (11.22) and (11.25). These estimators for the total are modified by replacing population size \(N\) with stand area in acres \(A\) = 0.64 (as demonstrated in the Harvard Forest biomass analysis toward the end of Section 11.3.5).

Total estimates are given in Table 12.7. Importantly, notice that obtaining totals and associated confidence intervals is as easy as scaling mean per acre estimates given in Table 12.5 by \(A\)—a simple and direct computation.

TABLE 12.7: TABLE 12.8: Stand 1 overstory total estimates along with associated confidence intervals (CI) for inventory data shown in Figure 12.7 and recorded in Table 12.2.
\(\hat{t}\) 90% CI
Trees 30.812 (-14.173, 75.798)
Basal area (ft\(^2\)) 27.057 (-17.454, 71.568)
Volume (ft\(^3\)) 766.715 (-520.264, 2053.695)

Notice the lower confidence interval bound for a few of the estimated mean and total parameters is negative. This should seem odd—after all, quantities like tree density, basal area, and volume are strictly positive. How can we have negative trees? Is such a thing possible? No. Is such a thing statistically possible? Yes.

This apparent contradiction arises from the limitations of normal approximation methods in small samples or high-variance settings. For further discussion on how to interpret such results—and why they should prompt careful reflection and defensive programming practices—see Section 12.5.

12.4 Point sampling

Like plot sampling, introduced in Section 12.3, point sampling is a rule for selecting measurement trees around a sampling location. The Austrian forester Walter Bitterlich introduced point sampling in the forestry literature under the German name “Winkelsählprobe,” which translates into English as “angle count sampling” (Bitterlich 1952). Bitterlich named his method angle count sampling because basal area per unit area can be estimated by simply counting trees selected using an angle from a sampling location (Bitterlich 1952, 1984).

Bitterlich’s ingenious method is most commonly used in timber cruises because of its ease of application, time efficiency, and flexibility to meet a range of inferential objectives. The time efficiency and flexibility come from the fact that you can match information collection effort with the desired level of inference. For example, if you’re only interested in estimating mean basal area per unit area, then a simple count (i.e., tally) of measurement trees across sampling locations is all that’s needed. If you’d like to estimate a confidence interval for mean basal area per unit area, simply keep track of the number of trees tallied at each sampling location. Estimating other variables such as volume, biomass, or trees per unit area typically requires additional measurements such as DBH. However, in some special cases, you can even estimate volume or similar variables from a tally (see Section 12.4.6).

Shortly after Bitterlich’s 1952 publication, the American forester Lewis R. Grosenbaugh helped popularize the method, which he coined point sampling, among American foresters (Grosenbaugh 1952; Grosenbaugh and Stover 1957).127 Grosenbaugh (1958) generalized point sampling to estimate tree variables beyond basal area, e.g., volume, biomass, and density, and connected the method to existing probability proportional to size (PPS) sample survey theory. A few years later, Palley and Horwitz (1961) further detailed point sampling’s theoretical underpinnings by providing statistical proofs that mean and variance estimators are unbiased for several common sampling designs.

Under point sampling, a tree’s inclusion zone area—and hence its selection probability—is a function of a chosen characteristic. Here, the characteristic that determines a tree’s selection probability is typically related to its size, e.g., DBH or height. This is why point sampling is a PPS method.

Timber cruises are typically conducted to estimate stand value and, for most stands, timber value is concentrated in large diameter trees. From a time efficiency standpoint, it’s logical that we spend our valuable field time measuring those trees that contribute most to parameters of interest, e.g., timber volume with associated standard error. This is the primary motivation and result of point sampling: when selection probability scales with tree size, large trees are measured more frequently than small trees.

This section focuses on horizontal point sampling, which uses a tree’s basal area to determine its selection probability. It’s called horizontal point sampling because the forester projects the discerning angle horizontally from the sampling location toward each tree. Related sampling methods such as vertical point sampling, horizontal line sampling, and vertical line sampling are summarized in Kershaw et al. (2016). We focus on horizontal point sampling because it’s the most common in forest inventory and because its general concepts are transferable to related sampling methods.

Point sampling example with three trees around a sampling location. Trees are identified using filled circles with diameter equal to the tree's DBH. The orange cross indicates the sampling location. (a) Fixed and known angle $\theta$ is projected from the sampling location and used to identify measurement (3), borderline (2), and non-measurement (1) trees. (b) Unfilled circles identify each tree's inclusion zone with $R_1$, $R_2$ and $R_3$ being the inclusion zone radius for Trees 1, 2, and 3, respectively. Inclusion zones delineated with solid lines identify measurement and borderline trees. Inclusion zones delineated with dashed lines identify non-measurement trees.

FIGURE 12.8: Point sampling example with three trees around a sampling location. Trees are identified using filled circles with diameter equal to the tree’s DBH. The orange cross indicates the sampling location. (a) Fixed and known angle \(\theta\) is projected from the sampling location and used to identify measurement (3), borderline (2), and non-measurement (1) trees. (b) Unfilled circles identify each tree’s inclusion zone with \(R_1\), \(R_2\) and \(R_3\) being the inclusion zone radius for Trees 1, 2, and 3, respectively. Inclusion zones delineated with solid lines identify measurement and borderline trees. Inclusion zones delineated with dashed lines identify non-measurement trees.

Figure 12.8(a) illustrates a horizontal point sampling measurement tree selection rule. From a sampling location, the forester projects a known and fixed angle \(\theta\) toward each tree’s bole at breast height. Given a tree’s DBH and distance from the sampling location (i.e., the angle’s vertex), the angle may appear:

  1. wider than the tree bole, in which case the tree is not measured;
  2. the same width as the tree bole (i.e., the angle’s rays are tangent to the bole), in which case the tree is considered a borderline tree and a measurement determination will take some more effort, as we’ll describe below;
  3. narrower than the tree bole, in which case the tree is measured.

If we apply this rule to the three trees in Figure 12.8(a), Tree 1 is not measured, Tree 2 is a borderline tree, and Tree 3 is measured. Here again, depending on the inferential objectives, “measured” might imply that you simply tally (i.e., add the tree to a tree count) or take detailed measurements of interest (e.g., DBH, height).

The angle gauge, relascope, and wedge prism are the most common tools used in practice to project the angle \(\theta\) from the sampling location to discern a tree’s measurement status (i.e., is the tree “in” or “out” of the tally). The angle gauge consists of a small piece of metal of known width, held a known distance from the forester’s eye, which is positioned over the sampling location. The relascope, developed by Walter Bitterlich specifically for point sampling, is a sophisticated and multipurpose tool. The forester sights through the relascope’s eyepiece with their eye positioned over the sampling location. The relascope automatically corrects for slope (see Section 12.4.4) and allows the user to choose among various angles to maximize cruise efficiency (see Section 12.4.3). The wedge prism, introduced by Bruce (1955), is a ground and calibrated prism that refracts light at the desired angle. Unlike the angle gauge and relascope, the wedge prism—not the forester’s eye—is held over the sampling location. See Burkhart, Avery, and Bullock (2018) or similar practical guides to learn how these tools are used.

Let’s begin building some intuition about point sampling by connecting measurement tree selection using an angle with a tree’s inclusion zone as developed in Section 12.3. Recall that under plot sampling, a tree is selected for measurement if its inclusion zone contains a sampling location (see, e.g., Figure 12.1(b)). Also, under plot sampling, all trees have the same inclusion zone area that is equal to the plot area used in the survey. Because all trees have the same inclusion zone, they have equal probability of selection for measurement and, following (12.1), the same tree factor.

Under horizontal point sampling, a tree’s inclusion zone area—and hence its selection probability—is a function of its DBH. This notion is illustrated in Figure 12.8(b), where a tree’s circular inclusion zone area, defined by its radius (\(R\)), is proportional to its DBH. As in plot sampling, a tree is selected for measurement if its inclusion zone contains a sampling location. However, unlike plot sampling, when a tree’s inclusion zone is a function of its DBH, its probability of selection is proportional to its DBH.

A tree’s inclusion zone radius (\(R\)) is called its limiting distance. This is because if the distance between the sampling location and the tree center is less than or equal to the tree’s limiting distance, then the tree is measured. If the distance between the sampling location and the tree center is greater than the tree’s limiting distance, then the tree is not measured.

This relationship can be seen in Figure 12.8(b). Here, in particular, notice the borderline tree’s limiting distance, \(R_2\), equals the distance between the sampling location and the tree’s center. Figure 12.9 provides a closer look at a borderline tree. At the borderline, the tree is the maximum distance from the sampling location and still measured. Notice in Figure 12.9, if the tree’s DBH is smaller or the tree is farther from the sampling location, the angle would be wider than the tree—hence, it would not qualify for measurement.

Horizontal point sampling example of a borderline tree.

FIGURE 12.9: Horizontal point sampling example of a borderline tree.

In practice, when projecting the angle from the sampling location, if there’s uncertainty about a tree’s status (i.e., is it a measurement tree or not), then additional measurements are needed to make a correct determination. This is done by measuring the tree’s DBH, computing its limiting distance, and checking that distance against the distance from the sampling location to the tree’s center.

Even without competing brush or other factors that impair line of sight, it’s often difficult—even for experienced foresters—to identify borderline trees using an angle gauge, prism, or relascope. See, for example, work by Kim Iles and Fall (1988), who assessed professional cruisers’ skill in discerning borderline trees using a wedge prism.128

Incorrect status determination can introduce bias, so it’s worth the extra time to conduct the limiting distance calculation and measurement needed for a correct determination.

For a fixed angle \(\theta\), the ratio between a tree’s DBH and its inclusion zone radius \(R\) is a constant \(k\). When tree DBH is in inches and plot radius \(R\) is in feet, the constant \(k\) in feet is \[\begin{equation} k = \frac{\text{DBH (in)}}{12 R \text{ (ft)}}. \tag{12.5} \end{equation}\] Similarly, when DBH is in centimeters and plot radius \(R\) is in meters, the constant \(k\) in meters is \[\begin{equation} k = \frac{\text{DBH (cm)}}{100 R \text{ (m)}}. \tag{12.6} \end{equation}\] Regardless of measurement system, the constant \(k = 2 \sin\left(\frac{\theta}{2}\right)\).

If you know \(k\) (which is a function of the angle used) and you measure a tree’s DBH, then you compute its limiting distance in feet as \[\begin{equation} R = \frac{\text{DBH (in)}}{12 k \text{ (ft)} } \tag{12.7} \end{equation}\] or in meters as \[\begin{equation} R = \frac{\text{DBH (cm)}}{100 k \text{ (m)} }. \tag{12.8} \end{equation}\]

12.4.1 Expansion factors and per unit area estimates

Recall from Section 12.3.1, the tree factor is the number of trees per unit area a given measurement tree represents. For the \(j\)-th measurement tree, (12.1) defines the tree factor as \[\begin{equation} \text{TF}_j = \frac{\text{unit area}}{\text{inclusion zone area}_j}, \tag{12.9} \end{equation}\] where, depending on your measurement system, the unit area numerator is 43,560 (ft\(^2\)/ac) or 10,000 (m\(^2\)/ha), and the denominator is the \(j\)-th tree’s inclusion zone area in ft\(^2\) or m\(^2\).129

For horizontal point sampling, the inclusion zone is circular with area equal to \(\pi R_j^2\). Because the \(j\)-th tree’s inclusion zone radius \(R_j\) depends on the tree’s DBH, its tree factor is DBH dependent.130

Using radius \(R_j\) defined in (12.7), the TF for the English system is \[\begin{equation*} \text{TF}_j = \frac{43560}{\pi R^2_j} = \frac{43560}{\pi \left(\text{DBH}_j/(12k) \right)^2} = \frac{43560k^2}{ \left(\pi/144\right)\text{DBH}^2_j}, \end{equation*}\] then scaling the numerator and denominator by 1/4 gives us the more revealing and convenient \[\begin{equation} \text{TF}_j = \frac{(1/4)43560k^2}{(1/4)\left(\pi/144\right)\text{DBH}^2_j} = \frac{10890k^2}{0.005454 \text{DBH}^2_j} = \frac{10890k^2}{\text{BA}_j}. \tag{12.10} \end{equation}\] where BA\(_j\) is the \(j\)-th tree’s basal area.

Similarly, using the definition of \(R_j\) from (12.8), the TF for the metric system is \[\begin{equation*} \text{TF}_j = \frac{10000}{\pi R^2_j} = \frac{10000}{\pi \left(\text{DBH}_j/(100k) \right)^2} = \frac{10000k^2}{ \left(\pi/10000\right)\text{DBH}^2_j}, \end{equation*}\] then scaling the numerator and denominator by 1/4 gives us \[\begin{equation} \text{TF}_j = \frac{(1/4)10000k^2}{(1/4)\left(\pi/10000\right)\text{DBH}^2_j} = \frac{2500k^2}{0.00007854 \text{DBH}^2_j} = \frac{2500k^2}{\text{BA}_j}. \tag{12.11} \end{equation}\]

As developed in Section 12.3.1, (12.2) defines the per unit area expansion factor for the \(j\)-th tree’s continuous or binary variable measurement (e.g., basal area, volume, biomass, logs, live/dead) as \[\begin{equation} y_j \left(\text{units}/\text{unit area}\right) = y_j \left(\text{units}/\text{tree}\right) \cdot \text{TF}_j \left(\text{trees}/\text{unit area}\right), \tag{12.12} \end{equation}\] where \(y_j\) represents the variable measurement in the given units.

It’s instructive to first consider a tree’s basal area expansion, which is called the basal area factor (BAF). Following (12.9) and (12.10), the English system BAF (with units included in parentheses) is \[\begin{align} \text{BAF}_j \left(\text{ft}^2/\text{acre}\right) &= \text{BA}_j \left(\text{ft}^2/\text{tree}\right) \cdot \text{TF}_j \left(\text{trees}/\text{acre}\right)\nonumber\\ &= \cancel{\text{BA}_j} \cdot \left(\frac{10890k^2}{\cancel{\text{BA}_j}}\right)\nonumber\\ &= 10890k^2. \tag{12.13} \end{align}\] and following (12.9) and (12.11), the metric system BAF is \[\begin{align} \text{BAF}_j \left(\text{m}^2/\text{ha}\right) &= \text{BA}_j \left(\text{m}^2/\text{tree}\right) \cdot \text{TF}_j \left(\text{trees}/\text{ha}\right)\nonumber\\ &= \cancel{\text{BA}_j} \cdot \left(\frac{2500k^2}{\cancel{\text{BA}_j}}\right)\nonumber\\ &= 2500k^2. \tag{12.14} \end{align}\]

Notice in (12.13) and (12.14), BAF depends only on the constant \(k\). As discussed earlier, \(k\) is determined by the angle \(\theta\) used to conduct the cruise.131 Therefore, given \(\theta\), the basal area per unit area represented by each measurement tree is constant. This is a remarkable result! It means that regardless of the measurement tree’s DBH, that tree represents the chosen BAF’s basal area per unit area.

By extension, this means you don’t need to measure a tree’s DBH to determine its contribution to estimating basal area per unit area—all you need to know is whether the tree is a measurement tree or not (i.e., is the tree “in” or “out”)!

For example, say your chosen angle \(\theta\) results in a BAF = 10 (ft\(^2\)/acre). From the result above, we know that each measurement tree (regardless of its DBH) represents 10 ft\(^2\) of basal area per acre. If your chosen angle \(\theta\) results in a BAF = 4 (m\(^2\)/ha), then each measurement tree represents 4 m\(^2\) of basal area per hectare (again, regardless of its DBH).

Expanding variables other than basal area requires the tree’s tree factor, which means you need to measure its DBH and compute its basal area. The \(j\)-th tree’s tree factor is computed using (12.10) or (12.11), or the mathematically equivalent expression based on the chosen BAF: \[\begin{equation} \text{TF}_j = \frac{\text{BAF}}{\text{BA}_j}. \tag{12.15} \end{equation}\]

For example, say you’re using a BAF = 10 (ft\(^2\)/acre) and the \(j\)-th measurement tree has a DBH of 16 in and volume of 45.8 (ft\(^3\)).132 Calculating the \(j\)-th tree’s volume per acre is a two-step process.

First, compute the tree’s tree factor using (12.15) as follows. \[\begin{align*} \text{TF}_j \left(\text{trees}/\text{acre}\right) &= \frac{\text{BAF} \left(\text{ft}^2/\text{acre}\right)}{\text{BA}_j \left(\text{ft}^2/\text{tree}\right)}\\ &= \frac{10}{0.005454\cdot 16^2}\\ &= 7.16217. \end{align*}\]

Then, scale the tree’s volume by its tree factor. \[\begin{align*} v_j \left(\text{ft}^3/\text{acre}\right) &= v_j \left(\text{ft}^3/\text{tree}\right)\cdot \text{TF}_j \left(\text{trees}/\text{acre}\right)\\ &= 45.8 \cdot 7.16217 \\ &= 328. \end{align*}\]

So, that one measurement tree represents a volume of 328 ft\(^3\)/acre.

For a given sampling location (i.e., sampling point), the estimated basal area per unit area is the number of measurement trees times the BAF, i.e., \(m \cdot \text{BAF}\) where \(m\) is the number of measurement trees. The estimated number of trees per unit area is the sum of the measurement trees’ tree factors, i.e., \(\sum^m_{j=1}\text{TF}_j\). Similarly, the sampling location’s per unit area estimate for other variables is the sum of the given variable’s per unit area tree measurements. Following the examples above, a sampling location’s volume per unit area estimate is \(\sum^m_{j=1}v_j\cdot \text{TF}_j\).

We refer to the process of expanding and summing tree measurements for each sampling location as the point-level summary.133

12.4.2 Sample-based estimates

For \(n\) sampling locations, the corresponding \(n\) per unit area point-level summaries of a given variable (see Section 12.4.1) are the sample observations (i.e., the \(y_i\) for \(i = 1, 2,\ldots, n\)) used to estimate population parameters via estimators provided in Section 11.3.

Under point sampling, however, there is one exception to following the sample-based estimation steps laid out in Section 11.3. As we mentioned at the beginning of Section 12.4, point sampling is unique because it allows you to match information collection effort with the desired level of inference. Under point sampling, the minimum data collection effort is called a continuous tally, which means a count of measurement trees is kept across the \(n\) sampling locations (no additional information is recorded—not even how many measurement trees were observed at each sampling location).

At the end of a continuous tally cruise, you have the total number of measurement trees \(m\), which is used to compute the mean basal area per unit area estimate as \[\begin{equation} \bar{y} = \frac{m \cdot \text{BAF}}{n}. \tag{12.16} \end{equation}\]

If you’d like to estimate a confidence interval for mean basal area per unit area, then keep track of the number of trees tallied at each sampling location and compute the point-level basal area per unit area. Recognize these values as your \(y_i\)s and follow steps in Section 11.3.

Following Section 12.4.1, all variables except basal area require tree factors to compute their point-level summaries.134 Again, following notation in Section 11.3, these \(n\) point-level summaries are your \(y_i\)s used to compute the sample mean, standard error of the mean, and subsequent confidence intervals following (11.15), (11.21), and (11.31), respectively.

Estimates for totals, e.g., total basal area or volume in the cruised population, follow from the per unit area estimates as described in Section 11.3.3, with the simple change of replacing the population size \(N\) with the forest area \(A\) that is in the same units as the per unit area means.

Computation of estimates for basal area and other variables is illustrated in Section 12.3.6.

12.4.3 Selecting a basal area factor and computing limiting distances

In practice, a horizontal point sampling cruise is described in terms of the BAF, not the angle \(\theta\) that determines the BAF. For example, you might say “the cruise was conducted using a 40 BAF angle gauge (English units),” or perhaps “a 3 BAF wedge prism (metric units).”

While there are no specific rules for selecting a BAF, in practice you should select one that yields an average of 4 to 8 measurement trees per point. BAFs that yield average tree counts outside this range are not wrong—they’ll just be less efficient.

Given a desired average number of trees per point and a rough estimate of the stand’s basal area per unit area (e.g., obtained via prior experience with similar stands or from a quick preliminary cruise of a few points), select the BAF instrument (e.g., angle gauge or wedge prism) you have that’s closest to the value computed using \[\begin{equation} \frac{\text{Stand basal area per unit area}}{\text{Desired tree count per point}}. \tag{12.17} \end{equation}\]

For example, say you’d like an average of 7 measurement trees per point. Looking at the stand, you come up with a rough basal area of 150 ft\(^2\)/ac. Then, using (12.17), a reasonable BAF to cruise the stand would be \(150/7 = 21.43\), so select a BAF of 20 (which is a standard instrument size).

Based on the guidance above or an established survey protocol, a single BAF is typically selected for each population in a stand. To keep roughly the same number of measurement trees per point, you might feel tempted to change the BAF from point to point—don’t do this. It can lead to errors and bias, see, e.g., Wensel, Levitan, and Barber (1980), Hans T. Schreuder, Schreiner, and Max (1981), and Kim Iles and Wilson (1988).

You can, however, use a different BAF for different populations within a stand, especially if the populations differ substantially in their average DBH and/or basal area. For example, you might choose one BAF for large diameter sawtimber and a separate BAF for small diameter pulpwood. Of course, in such settings, it’s critical to record the BAF used for each population.

Given a BAF, you can find the constant \(k\), the tree’s inclusion zone radius (\(R\)) given its DBH, and the \(R/\text{DBH}\) ratio called the horizontal distance multiplier (HDM), which is useful for quickly figuring out a tree’s limiting distance. For the English system, these are \[\begin{align} k \left(\text{ft}\right) &= \sqrt{\frac{\text{BAF}}{10890}},\\ R \left(\text{ft}\right) &= \frac{\text{DBH}}{12k},\\ \text{HDM} \left(\frac{\text{ft}}{\text{in}}\right) &= \frac{R}{\text{DBH}} = \frac{1}{12k}, \tag{12.18} \end{align}\] and for the metric system these are \[\begin{align} k \left(\text{m}\right) &= \sqrt{\frac{\text{BAF}}{2500}},\\ R \left(\text{m}\right) &= \frac{\text{DBH}}{100k},\\ \text{HDM} \left(\frac{\text{m}}{\text{cm}}\right) &= \frac{R}{\text{DBH}} = \frac{1}{100k}. \tag{12.19} \end{align}\]

The HDM is a convenient value to have with you in the field to determine a tree’s limiting distance. As noted previously, due to competing vegetation or other complicating factors, it might be difficult to discern if a given tree is a measurement tree. In such cases, you’ll need to compare the tree’s limiting distance to the distance between the sampling location and the tree’s center.

A tree’s limiting distance is \[\begin{equation} \text{HDM}\cdot \text{DBH}, \tag{12.20} \end{equation}\] where the HDM comes from (12.18) for DBH (in) or (12.19) for DBH (cm).

For example, say you’re conducting a cruise using a metric BAF = 4 (i.e., \(k\) = 0.04) and the tree in question has a DBH of 30 cm. Then its limiting distance is \(\text{HDM} \cdot \text{DBH} = \left(1/(100 \cdot 0.04)\right) \cdot 30 = 7.5\) m.

While any angle \(\theta\) could be used for a cruise, we generally select ones that provide convenient BAFs to work with (i.e., whole numbers). A few such BAFs and their corresponding angle, constant \(k\), and HDM are provided in Tables 12.9 and 12.10 for the English and metric systems, respectively.135

TABLE 12.9: Commonly used English system basal area factors (BAFs) and associated angle, constant, and horizontal distance multiplier (HDM).
English units
BAF (ft\(^2\)/acre) Angle (minutes) Constant \(k\) (ft) HDM (ft/inch)
5 73.664 0.02143 3.89
10 104.178 0.03030 2.75
15 127.594 0.03711 2.25
20 147.336 0.04285 1.94
25 164.730 0.04791 1.74
30 180.456 0.05249 1.59
35 194.918 0.05669 1.47
40 208.380 0.06061 1.38
50 232.985 0.06776 1.23
60 255.232 0.07423 1.12
TABLE 12.10: Commonly used metric system basal area factors (BAFs) and associated angle, constant, and horizontal distance multiplier (HDM).
Metric units
BAF (m\(^2\)/ha) Angle (minutes) Constant \(k\) (m) HDM (m/cm)
1 68.756 0.02000 0.500
2 97.237 0.02828 0.354
3 119.093 0.03464 0.289
4 137.519 0.04000 0.250
5 153.754 0.04472 0.224
6 168.431 0.04899 0.204
7 181.930 0.05292 0.189
8 194.494 0.05657 0.177
9 206.296 0.06000 0.167
10 217.458 0.06325 0.158

12.4.4 Boundary overlap and slope corrections

Like plot sampling, point sampling requires corrections for boundary and slope effects. The need for these corrections is the same as those for plot sampling, namely, selection probability is lower for trees that have their inclusion zone partially outside the forest boundary, and horizontal distance, not slope distance, should be used to determine if a point center falls within a tree’s inclusion zone. The main difference between boundary overlap and slope correction approaches for plot sampling versus horizontal point sampling is that under horizontal point sampling a tree’s inclusion zone is a function of its DBH (whereas under plot sampling, a tree’s inclusion zone and hence selection probability is a function of plot area).

Under horizontal point sampling, if a measurement tree has any portion of its inclusion zone outside the forest boundary, then some approach to correcting the boundary effect should be used. In such cases, either the mirage or walkthrough method introduced in Section 12.3.3 can be applied. An in-depth look at these and other viable correction methods is presented in Kershaw et al. (2016) and Gregoire and Valentine (2007).

As in plot sampling, measurements taken to identify measurement trees are assumed to be on the horizontal plane. The relascope automatically corrects for slope while sighting possible measurement trees, so no additional effort is required to apply the correction. Slope correction using a wedge prism can be done by rotating the prism around the line of sight to match the angle of slope you are sighting along, see, e.g., USFS (1996) or a similar field methods guide.

When using an angle gauge or other devices without a straightforward slope correction, you can convert horizontal limiting distance to slope limiting distance. This calculation is the same as that performed in Section 12.3.3 to compute \(C_1\) and \(C_2\) in Figure 12.5(c), but with plot radius \(R\) replaced by the \(j\)-th tree’s limiting distance \(R_j = \text{HDM} \cdot \text{DBH}_j\), following (12.20).

12.4.5 Illustration

This illustration uses the toy forest population described in Section 12.3.6 and shown in Figure 12.6. Here, point sampling locations are placed at the fixed-area plot centers used previously to illustrate plot sampling in 12.3.6. The analysis presented in this section is the point sampling equivalent to the plot sampling analysis presented in Section 12.3.6.

Recall from the description in Section 12.3.6, these sampling locations were selected using SRS from each stand’s areal sampling frame.

Our aim is to step through the point sampling calculations to estimate per-acre basal area, number of trees, and volume, along with corresponding stand totals. A highly efficient workflow for these and additional estimates is developed in Section 12.6 and applied to data from both stands.

We chose to use an English BAF 20 to select measurement trees at each of the six sampling locations. These sampling locations and their associated measurement trees are shown in Figure 12.10.

Toy forest inventory dataset comprising two stands and three sampling locations randomly located within each stand. An English BAF 20 was used to identify measurement trees around each sampling location. Each tree's inclusion zone is added for illustration.

FIGURE 12.10: Toy forest inventory dataset comprising two stands and three sampling locations randomly located within each stand. An English BAF 20 was used to identify measurement trees around each sampling location. Each tree’s inclusion zone is added for illustration.

We limit our analysis to overstory plot data in Stand 1. Table 12.11 provides overstory tree measurements for the three Stand 1 points. Notice in Figure 12.10 there are no overstory trees on Point 2 in Stand 1. As mentioned before, it’s critical that absence of trees at a sampling location is included in subsequent estimates—meaning a sampling location with no trees is part of the sample, reflects a characteristic of the population, and needs to be included as a zero when computing population parameter estimates. We include a line for Point 2 in Table 12.11 with zero DBH and volume values to remind us to include these values in subsequent computations.

TABLE 12.11: Stand 1 overstory tree measurements for inventory data shown in Figure 12.10. Zero values indicate plots where no overstory trees were measured.
Point DBH (in) Volume (ft\(^3\))
1 10.7 17.9
1 9.8 14.5
1 11.3 17.8
2 0.0 0.0
3 13.1 28.9
3 14.8 33.6
3 15.4 36.6

Let’s begin by assuming this was a continuous tally cruise, which, recall from Section 12.4.2, means the forester only kept track of the total number of measurement trees, \(m\), across the entire cruise. Looking at Figure 12.10, or by counting the non-zero rows in Table 12.11, we can see there were \(m = 6\) measurement trees across the \(n = 3\) sampling locations in Stand 1. Then, following (12.16), the estimate for basal area per acre is \[\begin{equation*} \bar{y} = \frac{m \cdot \text{BAF}}{n} = \frac{6 \cdot 20}{3} = 40 \left(\text{ft}^2/\text{acre}\right). \end{equation*}\]

Given Stand 1’s area is \(A\) = 0.64 acres, and following (11.22), with the slight modification of replacing \(N\) with \(A\), the continuous tally estimate for total basal area for Stand 1 is \(A\bar{y} = A(m\cdot \text{BAF})/n = 0.64\cdot 40 = 26\) (ft\(^2\)).

If you want a standard error to accompany the mean or total basal area estimates, then keep track of how many measurement trees were tallied at each sampling location and compute the basal area per acre point-summaries as illustrated below.

Just like estimation using plot sampling data illustrated in Section 12.3.6, we divide the estimation process into two steps. First, compute point-level summaries for each variable. The point-level summaries comprise each point’s expanded and summarized tree measurements expressed on a per unit area basis (these are the sample observations). Second, use the point-level summaries to compute the desired population parameter estimates.

Using the measurement tree counts at each sampling location, \(m_1\) = 3, \(m_2\) = 0, and \(m_3\) = 3, and following the discussion at the end of Section 12.4.1, point-level basal area per acre summaries are \[\begin{alignat*}{2} &\text{Point 1:}&&\; m_1\!\cdot\!\text{BAF} = 3\!\cdot\!20 = 60 \left(\text{ft}^2/\text{ac}\right),\\ &\text{Point 2:}&&\; m_2\!\cdot\!\text{BAF} = 0\!\cdot\!20 = 0 \left(\text{ft}^2/\text{ac}\right),\\ &\text{Point 3:}&&\; m_3\!\cdot\!\text{BAF} = 3\!\cdot\!20 = 60 \left(\text{ft}^2/\text{ac}\right). \end{alignat*}\]

To compute the point-level summaries for all other variables, we include each measurement tree’s tree factor from (12.15). Using (12.15) and referring to Table 12.11 for each measurement tree’s DBH to compute its basal area, point-level trees per acre are \[\begin{alignat*}{2} \text{Point 1:} &\; \sum^3_{j=1}\text{TF}_j = \sum^3_{j=1}\frac{\text{BAF}}{c\cdot\text{DBH}^2_j} &&= \frac{20}{c\cdot 10.7^2} + \frac{20}{c\cdot 9.8^2} + \frac{20}{c\cdot 11.3^2}\\ & &&= 32.02929 + 38.18235 + 28.71825\\ & &&= 98.92989 \left(\text{trees}/\text{ac}\right),\\[1pt] \text{Point 2:} & &&= 0 \left(\text{trees}/\text{ac}\right),\\[1pt] \text{Point 3:} &\; \sum^3_{j=1}\text{TF}_j = \sum^3_{j=1}\frac{\text{BAF}}{c\cdot\text{DBH}^2_j} &&= \frac{20}{c\cdot 13.1^2} + \frac{20}{c\cdot 14.8^2} + \frac{20}{c\cdot 15.4^2}\\ & &&= 21.36841 + 16.74139 + 15.46228\\ & &&= 53.57208 \left(\text{trees}/\text{ac}\right), \end{alignat*}\] where \(c\) = 0.005454.

Using (12.2) and the volume measurements in Table 12.11 along with the tree factors computed above, point-level volume per acre is \[\begin{alignat*}{2} \text{Point 1:} &\; \sum^3_{j=1}v_j\cdot \text{TF}_j &&= 17.9\cdot32.02929 + 14.5\cdot38.18235 + 17.8\cdot28.71825\\ & &&= 573.32428 + 553.64415 + 511.18485\\ & &&= 1638.15329 \left(\text{ft}^3/\text{ac}\right),\\[1pt] \text{Point 2:} & &&= 0 \left(\text{ft}^3/\text{ac}\right),\\[1pt] \text{Point 3:} &\; \sum^3_{j=1}v_j\cdot \text{TF}_j &&= 28.9\cdot21.36841 + 33.6\cdot16.74139 + 36.6\cdot15.46228\\ & &&= 617.54714 + 562.5106 + 565.9193\\ & &&= 1745.97704 \left(\text{ft}^3/\text{ac}\right), \end{alignat*}\] where \(v_j\) is the \(j\)-th tree’s volume.

For reference, we collected the point-level summaries of basal area, number of trees, and volume per acre computed above into Table 12.12. Given the \(n\) = 3 sample observations in Table 12.12, you’re back in familiar territory for computing SRS means, standard deviations, standard errors, and confidence intervals covered in Section 11.3.

TABLE 12.12: TABLE 12.13: Stand 1 overstory point-level summary for inventory data shown in Figure 12.10.
Point Basal area (ft\(^2\)/ac) Trees/ac Volume (ft\(^3\)/ac)
1 60 98.9299 1638.1533
2 0 0 0
3 60 53.5721 1745.977

The code below computes each variable’s mean (11.15), standard error of the mean (11.21), and 90% confidence interval. Again, as discussed in Section 12.2.1, when we select sampling locations using an areal sampling frame, we don’t apply the FPC when calculating the standard error of the mean.

n <- 3 # Sample size.

t <- qt(p = 1 - 0.1 / 2, df = n - 1) # t-value for 90% CI.

# Basal area per acre.
ba_per_ac <- c(60, 0, 60)
y_bar_ba <- mean(ba_per_ac)
s_y_bar_ba <- sd(ba_per_ac) / sqrt(n)
ci_ba <- c(y_bar_ba - t*s_y_bar_ba, 
           y_bar_ba + t * s_y_bar_ba)

y_bar_ba #Mean basal area per acre.
#> [1] 40
ci_ba #Confidence interval for basal area per acre. 
#> [1] -18.4  98.4
# Trees per acre.
trees_per_ac <- c(98.9299, 0, 53.5721)
y_bar_trees <- mean(trees_per_ac)
s_y_bar_trees <- sd(trees_per_ac) / sqrt(n)
ci_trees <- c(y_bar_trees - t * s_y_bar_trees,
              y_bar_trees + t * s_y_bar_trees)

y_bar_trees #Mean number of trees per acre.
#> [1] 50.834
ci_trees #Confidence interval for mean number of trees per acre. 
#> [1] -32.652 134.320
# Volume per acre.
vol_per_ac <- c(1638.1533, 0, 1745.977)
y_bar_vol <- mean(vol_per_ac)
s_y_bar_vol <- sd(vol_per_ac) / sqrt(n)
ci_vol <- c(y_bar_vol - t * s_y_bar_vol, 
            y_bar_vol + t * s_y_bar_vol)

y_bar_vol #Mean volume per acre.
#> [1] 1128
ci_vol #Confidence interval for volume per acre.
#> [1] -521.4 2777.5

The mean and associated confidence interval for each variable computed in the code above are collected into Table 12.14.

TABLE 12.14: TABLE 12.15: Stand 1 point sampling overstory mean estimates along with associated confidence interval (CI) for inventory data shown in Figure 12.7 and recorded in Table 12.11.
\(\bar{y}\) 90% CI
Trees/ac 50.834 (-32.652, 134.32)
Basal area (ft\(^2\)/ac) 40.000 (-18.4, 98.4)
Volume (ft\(^3\)/ac) 1128.043 (-521.398, 2777.485)

Stand 1 total and associated confidence interval for each variable can be computed using (11.22) and (11.25). These estimators for the total are modified by replacing population size \(N\) with stand area in acres \(A\) = 0.64. Total estimates are given in Table 12.16. Importantly, notice that obtaining totals and associated confidence intervals is as easy as scaling mean per acre estimates given in Table 12.14 by \(A\).

TABLE 12.16: TABLE 12.17: Stand 1 point sampling overstory total estimates along with associated confidence interval (CI) for inventory data shown in Figure 12.7 and recorded in Table 12.11.
\(\hat{t}\) 90% CI
Trees 32.534 (-11.776, 62.976)
Basal area (ft\(^2\)) 25.600 (-20.898, 85.965)
Volume (ft\(^3\)) 721.948 (-333.695, 1777.59)

Take a moment to compare point and plot sampling estimates in Tables 12.5 and 12.7 versus Tables 12.14 and 12.16. We expect some differences because different rules are used to select measurement trees, and tree expansion factors are computed differently. In practice, point and plot sampling will yield different estimates, although generally they should be comparable. As in the plot sampling example, a few of the confidence intervals include negative lower bounds for strictly positive parameters. This occurs here due to small sample size and large variance, and reflects the limitations of symmetric approximation methods. See Section 12.5 for further discussion.

12.4.6 Per unit area estimates without measuring DBH

A key advantage to horizontal point sampling is that basal area per unit area can be estimated without measuring DBH. However, as we saw in the preceding sections, the tree factor (and hence DBH measurement) is needed to obtain estimates for other variables. There is an approach to obtaining per unit area estimates for variables that scale linearly with basal area across some measure of tree height, without measuring DBH. In such cases, a tree height measurement replaces the DBH measurement. This approach is viable when tree height is easier to obtain than DBH.

Depending on the variable of interest, the tree height measurement can take a variety of forms, e.g., total height, height to a given stem diameter, or stem segment length such as a 16 ft sawlog or 2 m bolt. The approach described in this section is often referred to as tally by height.

Variables associated with tree volume and weight often scale linearly with basal area across height. In some cases, such relationships are well described by the constant form factor equation \[\begin{equation} y_j = \alpha \text{DBH}^2_j h_j, \tag{12.21} \end{equation}\] where, for the \(j\)-th tree, \(y_j\) is the variable of interest, \(\alpha\) is a constant, and \(h_j\) is some measure of height.

Dividing both sides of (12.21) by basal area yields the following \[\begin{align} \frac{y_j}{\text{BA}_j} &= \frac{\alpha \text{DBH}^2_j h_j}{BA_j}\nonumber\\ &= \frac{\alpha \cancel{\text{DBH}^2_j} h_j}{c\cancel{\text{DBH}^2_j}}\nonumber\\ &= \beta h_j, \tag{12.22} \end{align}\] where \(c\) is the basal area constant and \(\beta = \alpha/c\). Importantly, notice in (12.22) the \(y_j\) to \(\text{BA}_j\) ratio (left side) is a function of height \(\beta h_j\) (right side). This means that if we have a tree’s height measurement (\(h_j\)) and a suitable value for \(\beta\), then we can estimate the \(y_j\) to \(\text{BA}_j\) ratio. In the forestry literature, if \(y_j\) is a measure of volume, then the volume to basal area ratio (i.e., \(y_j/\text{BA}_j\)) is called VBAR\(_j\) (pronounced “v bar”). Similarly, if \(y_j\) is a measure of weight, the ratio is called WBAR\(_j\) (pronounced “w bar”).

We typically want a per unit area estimate for \(y_j\) (i.e., \(y_j\) value expanded to per acre or hectare basis), not the \(y_j\) to \(\text{BA}_j\) ratio per unit area. To get the desired per unit area estimate, multiplying both sides of (12.22) by the BAF yields the following \[\begin{equation} \frac{\text{BAF}\cdot y_j}{\text{BA}_j} = \text{BAF}\beta h_j, \end{equation}\] which can be expressed as \[\begin{equation} y_j \cdot \text{TF}_j = \text{BAF}\beta h_j, \tag{12.23} \end{equation}\] because the tree expansion factor \(\text{TF}_j = \text{BAF}/\text{BA}_j\) (12.15). Hence, (12.23) shows you can obtain the desired per unit area estimate for \(y_j\) (i.e., \(y_j \cdot \text{TF}_j\)) given your chosen BAF, a value for \(\beta\), and a tree height measurement (via \(\text{BAF}\beta h_j\)).

For example, say you’re interested in estimating lumber volume in board-feet (bd ft) using the International 1/4 log rule per acre.136 Based on prior information, you choose a constant form factor equation that measures trees in 16 ft logs. Following (12.23), the \(j\)-th measurement tree’s volume per acre is \[\begin{align} v_j \cdot \text{TF}_j \left(\frac{\text{bd ft}}{\text{acre}}\right) &= \text{BAF} \left(\frac{\text{ft}^2}{\text{acre}}\right)\cdot \beta \left(\frac{\frac{\text{bd ft}}{\text{ft}^2}}{\text{logs}}\right)\cdot h_j \left(\text{logs}\right)\nonumber\\ &= \text{BAF} \left(\frac{\text{ft}^2}{\text{acre}}\right)\cdot \beta h_j \left(\frac{\text{bd ft}}{\text{ft}^2}\right)\nonumber\\ &= \text{BAF} \beta h_j \tag{12.24} \end{align}\]

Say you’re given a \(\beta\) value of 55.41 and you conduct your cruise using an English 20 BAF prism. If the \(j\)-th measurement tree held \(h_j = 2\) 16 ft logs, then following (12.24), that tree represents \(\text{BAF} \beta h_j = 20 \cdot 55.41 \cdot 2 = 2216.4\) bd ft per acre.

As described toward the end of Section 12.4.1, the area-expanded estimate \(y_j \cdot \text{TF}_j\) is combined with those from other measurement trees at a given sampling location to arrive at the point-level summary.

Continuing our example, say there are \(m_i\) measurement trees at the \(i\)-th sampling location and the total number of logs for those trees was \(h_i = \sum^{m_i}_{j = 1} h_{i,j}\). The point-level summary for the \(i\)-th location equals \(v_i \cdot \text{TF}_i = 20 \cdot 55.41 \cdot h_i\). Point-level summaries from \(n\) sampling locations are used to estimate population parameters following steps in Section 12.4.2.

The tally by height approach produces reasonable estimates only if the following conditions hold:

  1. The linear relationship described in (12.22) holds across the range of anticipated height values.
  2. Variability in the variable to basal area ratio is small for a given height increment.

The linear relationship, described by the slope constant \(\beta\), is highly dependent on species, local environment, and stand characteristics, e.g., age, density, disturbance history. As a result, a \(\beta\) calibrated for one species and stand might not be appropriate for a different species or stand.

Let’s take a closer look at how one might estimate a \(\beta\) for a given variable, species, and geographic region. Table 12.18 holds balsam fir (Abies balsamea) volume values published by Young (1957). Table entries, broken out by DBH and height class, summarize volume (ft\(^3\)) measurements taken on 270 trees in central Maine.

TABLE 12.18: Volume in cubic feet to a 4-inch top inside bark for balsam fir (Abies balsamea) in central Maine (Young 1957). Cells are colored by volume value.
Total height (ft)
DBH (in) 30 40 50 60 70
5 1.40 1.91 2.45
6 2.14 2.83 3.60
7 3.07 4.09 5.00 6.26
8 4.20 5.60 6.82 8.42 9.85
9 5.56 7.18 8.77 10.75 12.53
10 8.90 10.90 13.33 15.56
11 10.64 13.22 16.05 18.77
12 15.53 18.78 21.83
13 18.12 21.75 25.27
14 20.87 24.76 28.66
15 28.18 32.20

Table 12.19 holds volume to basal area ratios (VBARs) computed using volume and DBH from Table 12.18. For example, the VBAR in Table 12.19 for a 5 inch DBH and 30 foot tree is 10.27 (ft\(^3\)/ft\(^{2}\)), which, working from values in Table 12.18, is 1.40 (ft\(^3\)) divided by \(c \cdot \text{DBH}^2 = 0.005454 \cdot 5^2 = 0.13635\) (ft\(^2\)).137

TABLE 12.19: Volume to basal area ratios (VBARs) (ft\(^3\)/ft\(^2\)) for balsam fir (Abies balsamea) in central Maine computed using Table 12.18. Cells are colored by VBAR value.
Total height (ft)
DBH (in) 30 40 50 60 70
5 10.27 14.01 17.97
6 10.90 14.41 18.34
7 11.49 15.30 18.71 23.42
8 12.03 16.04 19.54 24.12 28.22
9 12.59 16.25 19.85 24.33 28.36
10 16.32 19.99 24.44 28.53
11 16.12 20.03 24.32 28.44
12 19.77 23.91 27.80
13 19.66 23.60 27.42
14 19.52 23.16 26.81
15 22.96 26.24

Table 12.19 cell color makes apparent the small variability among VBARs within a given 10 ft height class. Recall this is one data characteristic needed for the tally by height approach to produce reasonable estimates. The other data characteristic is a linear relationship between VBARs across the range of anticipated height values. Figure 12.11 is a scatter plot created from values in Table 12.19. Imposed on the scatter plot is a “best fit” line with intercept fixed at zero and slope \(\beta\).138 Visual inspection of the point scatter and line suggests a strong linear relationship exists across the height classes. The line’s slope \(\beta\) = 0.393 is used in (12.23) and would yield reasonable tally by height volume per acre estimates for balsam fir in central Maine.

Scatter plot of Table 12.19 volume to basal area ratios (VBARs) for balsam fir (Abies balsamea) in central Maine. Point colors correspond to VBAR values in Table 12.19.

FIGURE 12.11: Scatter plot of Table 12.19 volume to basal area ratios (VBARs) for balsam fir (Abies balsamea) in central Maine. Point colors correspond to VBAR values in Table 12.19.

12.5 Negative confidence interval bounds and defensive coding

In the illustrations for plot and point sampling (Sections 12.3.6 and 12.4.5, respectively) you may have noticed 90% confidence intervals for some estimates include negative lower bounds. This can happen when using normal approximations to construct confidence intervals for forest parameters like stem density, basal area, or volume—quantities that are strictly positive. When sample sizes are small or variances are large, as in the toy example here (\(n = 3\)), confidence intervals may extend into the negative range, even though such values are not meaningful for parameters that must be zero or greater.

While this issue may arise in small-sample or high-variance settings, it’s uncommon in operational forest inventory, where sample sizes are typically larger and variability lower. Still, when it does occur, it’s important to recognize the issue and consider appropriate alternatives. A simple option is to truncate the lower bound at zero, though this can distort the nominal confidence level. A better approach is to apply a transformation before constructing the interval—for example, a log transformation—which ensures that back-transformed bounds remain strictly positive. Bootstrap methods are also useful for constructing skew-aware intervals that better reflect the shape of the sampling distribution; see, e.g., , , and for further discussion.

Encounters with unexpected results—like a confidence interval suggesting an impossible value—are not just statistical curiosities. They serve as reminders to slow down and examine your analysis workflow with care. This kind of careful checking falls under the broader practice of defensive programming—a mindset in which you assume that things can go wrong and proactively code against silent failures. In statistical computing, this means not only validating inputs and intermediate results, but also verifying that outputs are consistent with the theory and the context of the problem. A negative lower confidence bound for a strictly positive parameter isn’t just a numerical oddity—it’s a prompt to reexamine your assumptions, approximations, and code. Thoughtful debugging isn’t an afterthought; it’s a critical part of doing reproducible and trustworthy data science.

Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?

— Brian W. Kernighan

Debugging is indeed difficult—especially in analysis workflows where silent failures can lead to plausible but incorrect results. One of the most effective strategies, as Kernighan also noted, is to pair deliberate reasoning with simple tools that make your workflow more transparent.

The most effective debugging tool is still careful thought, coupled with judiciously placed print statements.

— Kernighan (1978)

12.6 Workflows for forest inventory data

12.6.1 Plot sampling illustration

Here we use dplyr functions to repeat and expand on the analysis presented in Section 12.3.6. Recall that the analysis used a toy dataset to illustrate plot sampling estimation methods. As described in Section 12.3.6 and illustrated in Figure 12.7, the dataset comprises two stands, each with three overstory and nested regeneration plots. The overstory and regeneration plots are circular fixed areas with 24 ft and 6.8 ft radii, respectively. Plot sampling locations were selected using SRS from each stand’s areal sampling frame and serve as the overstory plot centers.

The code below reads in the data collected on the inventory plots and prints its structure.

stands <- read_csv("datasets/two_stands_plot_data.csv")
stands %>% glimpse()
#> Rows: 30
#> Columns: 8
#> $ stand_id        <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ plot_id         <dbl> 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3…
#> $ tree_type       <chr> "Overstory", "Overstory", "Oversto…
#> $ tree_count      <dbl> 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 2…
#> $ scientific_name <chr> "Abies balsamea", "Pinus strobus",…
#> $ DBH_in          <dbl> 11.3, 9.8, 10.7, NA, NA, 0.0, NA, …
#> $ height_ft       <dbl> 60.2, 61.1, 63.6, NA, NA, 0.0, NA,…
#> $ vol_cu_ft       <dbl> 17.8, 14.5, 17.9, NA, NA, 0.0, NA,…

Column definitions for stands are as follows.

  1. stand_id stand number within which the plot falls, takes values 1 or 2.
  2. plot_id plot number within the stand on which the measurements are recorded.
  3. tree_type type of plot on which the tree was measured, takes values Overstory or Regen.
  4. tree_count number of trees measured on the plot that have the same values for the next four columns, or zero if no trees occurred on the plot.
  5. species measurement tree species.
  6. DBH_in measurement tree DBH (in).
  7. height_ft measurement tree height (ft).
  8. vol_cu_ft measurement tree volume (ft\(^3\)).

There are a few things to notice about these data (run stands %>% print(n = nrow(.)) to see all the data).

  1. Plot numbers are unique within the stand, meaning both stand and plot must be specified to identify a plot’s set of trees.
  2. Because we’re only interested in regeneration density by species for this survey, trees on regeneration plots have NA values for their DBH, height, and volume.
  3. As you might have noticed in Figure 12.7, there are no overstory trees on Plot 2 in Stand 1. It’s critical that the absence of trees at a sampling location be included in subsequent estimates—meaning a plot with no trees is part of the sample, reflects a characteristic of the population, and needs to be included as a zero when computing population parameter estimates. Here, we chose to record plots with no trees using a tree_count value of zero.
  4. In addition to tracking plots with zero measurement trees, the tree_count column records the number of trees on a given plot that share the same species and measurements on other variables. This allows each row in the data file to represent more than one tree. It’s common in timber cruises to record DBH and number of logs to the nearest whole number, in which case there are often several overstory trees on a plot with counts greater than one. Also, including a tree count variable allows us to accommodate double-count boundary trees identified using the mirage or walkthrough method described in Section 12.3.3, minimize the number of data rows needed to record the inventory data, and build an efficient analysis workflow.

To compare results with those presented in Section 12.3.6, we derive estimates for overstory per-acre number of trees, basal area, and volume, along with corresponding stand totals. Also, following Section 12.3.6, the estimation process is done in two steps. Step 1, compute plot-level summaries for each variable. The plot-level summaries comprise each plot’s expanded and summarized tree measurements expressed on a per unit area basis. Step 2, use the plot-level summaries from Step 1 to compute the desired population parameter estimates.

The first bit of code below makes a new tibble (o_stands) that includes only rows corresponding to overstory tree measurements and a column (TF) that holds their tree factors. As you’ll see in the subsequent piped workflow, it’s convenient to have a tree factor column pre-computed.139 The tree factor equation used in mutate() follows (12.1).

The second bit of code below creates the plot-level per-acre summary for each variable. Importantly, notice we group by stand_id and plot_id so the subsequent summarize() works on trees in each plot—yielding plot-level summaries. The equations used to compute these plot-level summaries within summarize() follow (12.2) and parallel steps in Section 12.3.6. The only difference here is we scale by tree_count to accommodate how the data were recorded (i.e., tree_count records the number of trees with common species, DBH, height, and volume, or the absence of trees on a plot).

# Select overstory trees and add a TF trees/ac column.
o_stands <- stands %>% 
  filter(tree_type == "Overstory") %>%
  mutate(TF = 43560 / (pi * 24^2))

# Step 1, compute per unit area plot-level summaries.
plot_summary <- o_stands %>% 
  group_by(stand_id, plot_id) %>% 
  summarize(trees_per_ac = sum(tree_count * TF),
            ba_per_ac = sum(tree_count * TF * 0.005454 * DBH_in^2),
            vol_per_ac = sum(tree_count * TF * vol_cu_ft),
            .groups = "drop_last")

plot_summary
#> # A tibble: 6 × 5
#> # Groups:   stand_id [2]
#>   stand_id plot_id trees_per_ac ba_per_ac vol_per_ac
#>      <dbl>   <dbl>        <dbl>     <dbl>      <dbl>
#> 1        1       1         72.2      44.4      1208.
#> 2        1       2          0         0           0 
#> 3        1       3         72.2      82.4      2386.
#> 4        2       1         72.2      22.2       479.
#> 5        2       2         72.2      28.9       669.
#> 6        2       3         96.3      33.2       749.

The plot_summary values for stand_id 1 match those computed in Section 12.3.6 and displayed in Table 12.3.

Importantly, the summarize() in Step 1 above uses the default .groups = "drop_last" rule, so the resulting plot_summary is grouped by stand_id and is ready to be summarized by stand in Step 2 below.

The code below uses the plot-level summaries (plot_summary) to compute the desired populations’ parameter estimates for each variable.140 In the summarize(), we first compute stand-specific sample size n and \(t\) value for a 90% confidence interval to parallel the analysis in Section 12.3.6, then mean, standard error, and confidence interval bounds for the three variables (trees_per_ac, ba_per_ac, and vol_per_ac). The resulting per acre estimates and associated confidence intervals for Stand 1 and 2 are held in stand_estimates row 1 and 2, respectively. Below, we pass stand_estimates to glimpse() so the output fits on the page. From this printed output, you can see stand_id 1 column values match those computed in Section 12.3.6 and presented in Table 12.5. Values in the second column printed below correspond to estimates for stand_id 2.

# Step 2, compute per unit area estimates for each stand.
stand_estimates <- plot_summary %>% 
  summarize(n = n(), # Sample size.
            t = qt(p = 1 - 0.1 / 2, df = n - 1), # t-value for 90% CI.
            y_bar_trees = mean(trees_per_ac),
            s_y_bar_trees = sd(trees_per_ac) / sqrt(n),
            ci_lower_trees = y_bar_trees - t * s_y_bar_trees,
            ci_upper_trees = y_bar_trees + t * s_y_bar_trees,
            y_bar_ba = mean(ba_per_ac),
            s_y_bar_ba = sd(ba_per_ac) / sqrt(n),
            ci_lower_ba = y_bar_ba - t * s_y_bar_ba,
            ci_upper_ba = y_bar_ba + t * s_y_bar_ba,
            y_bar_vol = mean(vol_per_ac),
            s_y_bar_vol = sd(vol_per_ac) / sqrt(n),
            ci_lower_vol = y_bar_vol - t * s_y_bar_vol,
            ci_upper_vol = y_bar_vol + t * s_y_bar_vol)

stand_estimates %>% glimpse()
#> Rows: 2
#> Columns: 15
#> $ stand_id       <dbl> 1, 2
#> $ n              <int> 3, 3
#> $ t              <dbl> 2.92, 2.92
#> $ y_bar_trees    <dbl> 48.144, 80.241
#> $ s_y_bar_trees  <dbl> 24.0722, 8.0241
#> $ ci_lower_trees <dbl> -22.146, 56.810
#> $ ci_upper_trees <dbl> 118.43, 103.67
#> $ y_bar_ba       <dbl> 42.277, 28.109
#> $ s_y_bar_ba     <dbl> 23.8178, 3.2138
#> $ ci_lower_ba    <dbl> -27.271, 18.725
#> $ ci_upper_ba    <dbl> 111.824, 37.493
#> $ y_bar_vol      <dbl> 1198.0, 632.3
#> $ s_y_bar_vol    <dbl> 688.670, 79.987
#> $ ci_lower_vol   <dbl> -812.91, 398.73
#> $ ci_upper_vol   <dbl> 3208.90, 865.86

The workflow above is good, but let’s make it less verbose and more flexible. Notice in the summarize() above, the same operations are applied to each variable. Specifically, say trees_per_ac, ba_per_ac, or vol_per_ac is generically represented with x, then the operations to compute the mean and its standard error are mean(x) and sd(x)/sqrt(n), respectively. As illustrated in the code below, application of these common operations to each variable allows us to use across() (Section 7.11) to simplify the code.

In the summarize() below, the call to across() computes the mean, standard error of the mean, and confidence bounds for each variable. To simplify the code, we use a custom function called se() to compute the standard error of the mean for each variable (writing custom functions is covered in Chapter 5). The call to across() also uses the anonymous function lambda syntax described in Section 7.11, i.e., the tilde ~ and .x syntax. Providing useful names for each resulting statistic is done via the .names argument (see ?dplyr::across for details).

# Function to compute the standard error of the mean.
se <- function(y, n){
  sd(y) / sqrt(n)
}

# Alternative Step 2, compute per unit area estimates for each stand.
plot_summary %>% 
 summarize(
   n = n(), # Sample size.
   t = qt(p = 1 - 0.1 / 2, df = n - 1), # t-value for 90% CI.
   across(.cols = c("trees_per_ac", "ba_per_ac", "vol_per_ac"), 
          .fns = list(y_bar = mean,
                      s_y_bar = ~ se(.x, n),
                      ci_lower = ~ mean(.x) - t * se(.x, n),
                      ci_upper = ~ mean(.x) + t * se(.x, n)
                      ),
          .names = "{.fn}_{.col}")) %>% 
  glimpse
#> Rows: 2
#> Columns: 15
#> $ stand_id              <dbl> 1, 2
#> $ n                     <int> 3, 3
#> $ t                     <dbl> 2.92, 2.92
#> $ y_bar_trees_per_ac    <dbl> 48.144, 80.241
#> $ s_y_bar_trees_per_ac  <dbl> 24.0722, 8.0241
#> $ ci_lower_trees_per_ac <dbl> -22.146, 56.810
#> $ ci_upper_trees_per_ac <dbl> 118.43, 103.67
#> $ y_bar_ba_per_ac       <dbl> 42.277, 28.109
#> $ s_y_bar_ba_per_ac     <dbl> 23.8178, 3.2138
#> $ ci_lower_ba_per_ac    <dbl> -27.271, 18.725
#> $ ci_upper_ba_per_ac    <dbl> 111.824, 37.493
#> $ y_bar_vol_per_ac      <dbl> 1198.0, 632.3
#> $ s_y_bar_vol_per_ac    <dbl> 688.670, 79.987
#> $ ci_lower_vol_per_ac   <dbl> -812.91, 398.73
#> $ ci_upper_vol_per_ac   <dbl> 3208.90, 865.86

We finish this section by extending our workflow to estimate parameters for both overstory and regeneration populations. Importantly, the only thing we need to do is include those trees measured on the smaller regeneration plots (i.e., 6.8 ft radius plots) and their associated tree expansion factor; the rest of the workflow stays pretty much the same.

Recall the tree_type column indicates if a tree was measured on the overstory or regeneration plot using Overstory and Regen. values, respectively. Using (12.1), the code below illustrates if_else() and case_when() within mutate() to add tree_type specific tree factors (TF) to stands. Both approaches result in the same TF; we include both simply for demonstration. The last line of code below is included to double check that tree factors are correctly assigned to each tree type (again, following from Section 12.3.1, one overstory tree represents 24.1 trees per acre and one regeneration tree represents 300 trees per acre).

# Add TF trees/ac column for each tree type (plot size).
stands <- stands %>% mutate(TF = if_else(tree_type == "Overstory", 
                                         43560 / (pi * 24^2),
                                         43560 / (pi * 6.8^2))
                            )

# Or equivalently. 
stands <- stands %>% 
 mutate(TF = case_when(tree_type == "Overstory" ~ 43560 / (pi * 24^2),
                       tree_type == "Regen." ~ 43560 / (pi * 6.8^2))
         )

# Double check TF is correct for each tree type (plot size).
stands %>% group_by(tree_type) %>% distinct(TF)
#> # A tibble: 2 × 2
#> # Groups:   tree_type [2]
#>   tree_type    TF
#>   <chr>     <dbl>
#> 1 Overstory  24.1
#> 2 Regen.    300.

Now, given our tree_type-specific expansion factor in TF, we can move to the familiar two-step estimation process. Step 1: create plot-level summaries. Step 2: summarize at the stand level using the plot-level results. The code below is identical to what we used for the overstory-only analysis above, except we add tree_type as the first grouping column to produce separate overstory and regeneration plot summaries.

# Step 1, compute per unit area plot-level summaries.
plot_summary <- stands %>%
  group_by(tree_type, stand_id, plot_id) %>% 
  summarize(trees_per_ac = sum(tree_count * TF),
            ba_per_ac = sum(tree_count * TF * 0.005454 * DBH_in^2),
            vol_per_ac = sum(tree_count * TF * vol_cu_ft),
            .groups = "drop_last") # Keep groups tree_type and stand_id.

plot_summary
#> # A tibble: 12 × 6
#> # Groups:   tree_type, stand_id [4]
#>    tree_type stand_id plot_id trees_per_ac ba_per_ac
#>    <chr>        <dbl>   <dbl>        <dbl>     <dbl>
#>  1 Overstory        1       1         72.2      44.4
#>  2 Overstory        1       2          0         0  
#>  3 Overstory        1       3         72.2      82.4
#>  4 Overstory        2       1         72.2      22.2
#>  5 Overstory        2       2         72.2      28.9
#>  6 Overstory        2       3         96.3      33.2
#>  7 Regen.           1       1        600.       NA  
#>  8 Regen.           1       2        600.       NA  
#>  9 Regen.           1       3       1199.       NA  
#> 10 Regen.           2       1        900.       NA  
#> 11 Regen.           2       2        900.       NA  
#> 12 Regen.           2       3       1199.       NA  
#> # ℹ 1 more variable: vol_per_ac <dbl>

Recall basal area and volume were not recorded for trees measured on regeneration plots, hence the NA values in the regeneration rows for ba_per_ac and vol_per_ac.

Stand-level estimates can now be generated by passing plot_summary to summarize(). We could use the previous overstory analysis code unchanged; however, for brevity, below we opt to compute only stand density estimates trees_per_ac for the regeneration and overstory populations.

# Step 2, compute per unit area estimates.
stand_estimates <- plot_summary %>% 
  summarize(n = n(), # Sample size.
            t = qt(p = 1 - 0.1 / 2, df = n - 1), # t-value for 90% CI.
            y_bar = mean(trees_per_ac),
            s_y_bar = sd(trees_per_ac) / sqrt(n),
            ci_lower = y_bar - t * s_y_bar,
            ci_upper = y_bar + t * s_y_bar,
            .groups = "drop") # Grouping is no longer needed.

# Print per acre estimates we care about (trees/acre).
stand_estimates %>% 
  select(stand_id, tree_type, y_bar, ci_lower, ci_upper)
#> # A tibble: 4 × 5
#>   stand_id tree_type  y_bar ci_lower ci_upper
#>      <dbl> <chr>      <dbl>    <dbl>    <dbl>
#> 1        1 Overstory   48.1    -22.1     118.
#> 2        2 Overstory   80.2     56.8     104.
#> 3        1 Regen.     800.     216.     1383.
#> 4        2 Regen.    1000.     708.     1291.

Notice above the output includes estimates for all four populations: Stand 1 overstory, Stand 2 overstory, Stand 1 regeneration, and Stand 2 regeneration.

Above, we focused on per unit area estimates; however, we’re typically also interested in estimating totals and associated confidence intervals. Totals are computed using (11.22) and (11.25). In our current setting, these estimators for the total are modified by replacing the population size \(N\) with the stand area in acres \(A\). There are a few points in the workflow where you can add the stand specific areas, e.g., to the plot-level summaries or to the stand-level estimates. Below, we add them to the plot-level summaries and work up both the stand-level means and associated totals for trees per acre.

Begin by adding the stand specific areas to plot_summary using the case_when() function introduced in Section 7.6.

# Add a stand area column.
plot_summary <- plot_summary %>% 
  mutate(A_ac = case_when(stand_id == 1 ~ 0.64, 
                          stand_id == 2 ~ 0.68))

# Make sure A_ac is added as desired.
plot_summary %>% 
  select(stand_id, A_ac) %>%
  print(n = nrow(.))
#> Adding missing grouping variables: `tree_type`
#> # A tibble: 12 × 3
#> # Groups:   tree_type, stand_id [4]
#>    tree_type stand_id  A_ac
#>    <chr>        <dbl> <dbl>
#>  1 Overstory        1  0.64
#>  2 Overstory        1  0.64
#>  3 Overstory        1  0.64
#>  4 Overstory        2  0.68
#>  5 Overstory        2  0.68
#>  6 Overstory        2  0.68
#>  7 Regen.           1  0.64
#>  8 Regen.           1  0.64
#>  9 Regen.           1  0.64
#> 10 Regen.           2  0.68
#> 11 Regen.           2  0.68
#> 12 Regen.           2  0.68

Recall that obtaining totals and corresponding confidence intervals is as simple as scaling the parameter mean estimates by their stand-specific areas, A_ac. Below, we take advantage of the stand_id grouping and scale the per-unit-area estimates by their respective acreages within the summarize() step. That is, the code below is identical to that used to create stand_estimates above—we simply scale y_bar and s_y_bar by A_ac.

# Step 2, compute total estimates.
stand_estimates <- plot_summary %>% 
  summarize(n = n(), # Sample size.
            t = qt(p = 1 - 0.1 / 2, df = n - 1), # t-value for 90% CI.
            y_total = first(A_ac) * mean(trees_per_ac),
            s_y_total = first(A_ac) * sd(trees_per_ac) / sqrt(n), 
            ci_lower = y_total - t * s_y_total,
            ci_upper = y_total + t * s_y_total,
            .groups = "drop") # Grouping is no longer needed.

# Print total estimates we care about (total trees).
stand_estimates %>% 
  select(tree_type, stand_id, y_total, ci_lower, ci_upper)
#> # A tibble: 4 × 5
#>   tree_type stand_id y_total ci_lower ci_upper
#>   <chr>        <dbl>   <dbl>    <dbl>    <dbl>
#> 1 Overstory        1    30.8    -14.2     75.8
#> 2 Overstory        2    54.6     38.6     70.5
#> 3 Regen.           1   512.     138.     885. 
#> 4 Regen.           2   680.     481.     878.

Notice the Stand 1 Overstory estimate for total trees matches \(\hat{t}\) given in Table 12.7.

Importantly, in the code above, the mean and its standard error are scaled by first(A_ac) rather than just A_ac. This is because A_ac is a group-specific vector of length three (one for each plot), with each element being the same value (i.e., the acreage of the stand within which the given plot falls). Try running the above code without first() and you’ll see each population estimate is repeated three times—this is because A_ac is a vector of length three and summarize()’s behavior is to repeat the calculation for each element in the vector. Upon running this code, you’ll also receive a warning, which should be your first indication something went wrong. As discussed in Section 7.13, within summarize(), function calls that result in a mix of scalars and vectors is a very common error and one that’s often noticed when you end up with a warning and more output rows than anticipated.

As illustrated in this section, we’re almost always interested in estimates for multiple parameters, and often for multiple populations. In practice, measurements on multiple variables (e.g., volume, basal area, density) are commonly collected using the same sampling design, which means the same estimators can be applied to each variable using a few lines of well-written code.

12.6.2 Point sampling illustration

Here we repeat the point sampling illustration presented in 12.4.5 using a tidyverse workflow. The way in which the tree factor is computed is the only major change from the plot sampling workflow covered in Section 12.3.6.

Recall from Section 12.4.5, an English BAF 20 was used to select overstory measurement trees at three sampling locations within each stand. Those measurements are read into o_stands below, and the BAF and TF columns are added. An if_else() is used when adding the tree factor column to avoid a TF value of Inf for those points with no measurement trees (i.e., in our data, when tree_count == 0, BAF_in is NA).141 Given a measurement tree’s DBH, its TF is computed using (12.15).

BAF <- 20

o_stands <- read_csv("datasets/two_stands_point_data.csv")

o_stands <- o_stands %>% 
  mutate(BAF = BAF,
         TF = if_else(tree_count == 0,
                      0,
                      BAF/(0.005454*DBH_in^2)))

# Take a look at the new BAF and TF columns.
o_stands %>% 
  select(stand_id, point_id, tree_count, BAF, DBH_in, TF)
#> # A tibble: 13 × 6
#>    stand_id point_id tree_count   BAF DBH_in    TF
#>       <dbl>    <dbl>      <dbl> <dbl>  <dbl> <dbl>
#>  1        1        1          1    20   10.7  32.0
#>  2        1        1          1    20    9.8  38.2
#>  3        1        1          1    20   11.3  28.7
#>  4        1        2          0    20    0     0  
#>  5        1        3          1    20   13.1  21.4
#>  6        1        3          1    20   14.8  16.7
#>  7        1        3          1    20   15.4  15.5
#>  8        2        1          0    20    0     0  
#>  9        2        2          1    20    8.8  47.4
#> 10        2        2          1    20    9.3  42.4
#> 11        2        3          1    20    8.8  47.4
#> 12        2        3          1    20    6.9  77.0
#> 13        2        3          1    20    8.2  54.5

In the code below, equations used to compute these point- and stand-level summaries parallel steps in Section 12.4.5. The only difference here is we scale by tree_count to accommodate how the data were recorded (i.e., tree_count records the number of trees with common species, DBH, height, volume, or the absence of trees on a plot). As demonstrated toward the end of Section 12.6.1, we use across() to simplify the stand-level summary.

# Step 1, compute per unit area point-level summaries.
point_summary <- o_stands %>% 
  group_by(stand_id, point_id) %>% 
  summarize(ba_per_ac = sum(tree_count * BAF),
            trees_per_ac = sum(tree_count * TF),
            vol_per_ac = sum(tree_count * TF * vol_cu_ft),
            .groups = "drop_last")

point_summary
#> # A tibble: 6 × 5
#> # Groups:   stand_id [2]
#>   stand_id point_id ba_per_ac trees_per_ac vol_per_ac
#>      <dbl>    <dbl>     <dbl>        <dbl>      <dbl>
#> 1        1        1        60         98.9      1638.
#> 2        1        2         0          0           0 
#> 3        1        3        60         53.6      1746.
#> 4        2        1         0          0           0 
#> 5        2        2        40         89.8       948.
#> 6        2        3        60        179.       1345.
# Step 2, compute per unit area estimates for each stand.
point_summary %>% 
 summarize(
   n = n(), # Sample size.
   t = qt(p = 1 - 0.1 / 2, df = n - 1), # t-value for 90% CI.
   across(.cols = c("ba_per_ac", "trees_per_ac", "vol_per_ac"), 
          .fns = list(y_bar = mean,
                      s_y_bar = ~ sd(.x)/sqrt(n),
                      ci_lower = ~ mean(.x) - t * sd(.x)/sqrt(n),
                      ci_upper = ~ mean(.x) + t * sd(.x)/sqrt(n)
                      ),
          .names = "{.fn}_{.col}")) %>% 
  glimpse
#> Rows: 2
#> Columns: 15
#> $ stand_id              <dbl> 1, 2
#> $ n                     <int> 3, 3
#> $ t                     <dbl> 2.92, 2.92
#> $ y_bar_ba_per_ac       <dbl> 40.000, 33.333
#> $ s_y_bar_ba_per_ac     <dbl> 20.000, 17.638
#> $ ci_lower_ba_per_ac    <dbl> -18.40, -18.17
#> $ ci_upper_ba_per_ac    <dbl> 98.400, 84.837
#> $ y_bar_trees_per_ac    <dbl> 50.834, 89.555
#> $ s_y_bar_trees_per_ac  <dbl> 28.591, 51.648
#> $ ci_lower_trees_per_ac <dbl> -32.652, -61.256
#> $ ci_upper_trees_per_ac <dbl> 134.32, 240.36
#> $ y_bar_vol_per_ac      <dbl> 1128.04, 764.32
#> $ s_y_bar_vol_per_ac    <dbl> 564.88, 398.99
#> $ ci_lower_vol_per_ac   <dbl> -521.40, -400.72
#> $ ci_upper_vol_per_ac   <dbl> 2777.5, 1929.4

Point-level summaries (point_summary) for Stand 1 match those given in Table 12.12, and stand-level summaries for Stand 1 match those estimates given in Table 12.14.

12.6.3 Stand and stock table estimates

Stand and stock tables were introduced in Section 8.3 with a focus on building tables using pre-computed stand-level estimates. This section is the prequel to Section 8.3. Here, we cover what’s needed to generate estimates used in stand and stock tables from individual tree measurements collected via plot or point sampling.

Recall from Section 8.3 the stand table summarizes a quantitative discrete variable (e.g., stem count) grouped by one or more categorical variables (e.g., size class and species). Similarly, the stock table summarizes a quantitative continuous variable (e.g., volume or biomass) grouped by one or more categorical variables (e.g., size class and species).

In forestry applications, as detailed and illustrated in Section 8.3, stand tables typically summarize the average number of trees per unit area (e.g., acre or hectare) within an area of interest (e.g., stand, management unit, or forest) by DBH class and species. Similarly, stock tables typically summarize the average wood volume or weight per unit area within an area of interest by DBH class and species. DBH classes are DBH intervals with widths chosen to communicate situation-specific information.

The key to generating estimates for stand and stock tables is recognizing that each row and column combination defines a unique population. For example, in each of Tables 8.2 and 8.3, there are nine separate populations defined by species and DBH class combinations. Our task is to generate an estimate for each population. The estimates come from the estimators defined in Chapter 11 and illustrated for plot and point sampling in Sections 12.3.6 and 12.4.5, respectively. The only tricky part in this process is partitioning the inventory data into the respective populations prior to applying the estimators—a task made simple using tidyverse functions as illustrated in the remainder of this section.

Let’s build a stand and stock table using the plot sampling toy dataset introduced in Section 12.3.6. We’ll focus on Stand 1 overstory trees and define our populations using species and those DBH classes used in Section 8.3.

We begin by reading in the stand data, filtering for Stand 1 overstory trees, and selecting only the columns needed to build the stand and stock tables.

stands <- read_csv("datasets/two_stands_plot_data.csv")

o_stand <- stands %>% 
  filter(stand_id == 1, tree_type == "Overstory") %>%
  select(plot_id, tree_count, scientific_name, DBH_in, vol_cu_ft)

o_stand
#> # A tibble: 7 × 5
#>   plot_id tree_count scientific_name   DBH_in vol_cu_ft
#>     <dbl>      <dbl> <chr>              <dbl>     <dbl>
#> 1       1          1 Abies balsamea      11.3      17.8
#> 2       1          1 Pinus strobus        9.8      14.5
#> 3       1          1 Pinus strobus       10.7      17.9
#> 4       2          0 <NA>                 0         0  
#> 5       3          1 Betula papyrifera   14.8      33.6
#> 6       3          1 Betula papyrifera   15.4      36.6
#> 7       3          1 Pinus strobus       13.1      28.9

Recall there were no trees on Plot 2 in Stand 1, and we preserve that zero count using the tree_count variable, as seen in the output above. In the subsequent bit of code, we convert plot_id to a factor (this ensures Plot 2 is preserved as an absent factor level) and remove the row with tree_count == 0.

# Make plot_id a factor then remove plots with zero trees.
o_stand <- o_stand %>% 
  mutate(plot_id = as.factor(plot_id)) %>% 
  filter(tree_count != 0)

Next, we add zeros into the dataset where species and DBH class combinations (which define a population) were not observed. As noted earlier, a plot with no trees is part of the sample, reflects a characteristic of the population, and must be included as a zero when computing population parameter estimates.

The code below adds the DBH class column, then uses complete() to add zeros for those plots where a species and DBH class combination was not observed (i.e., making implicit zeros explicit, as described in Section 8.3).

# Add the DBH class column.
o_stand <- o_stand %>%
  mutate(DBH_4in = cut_width(DBH_in, width=4))

# Make implicit zeros explicit for species and DBH class 
# combinations not observed on plots.
o_stand <- o_stand %>% 
  complete(plot_id, scientific_name, DBH_4in, 
           fill = list(tree_count = 0, vol_cu_ft = 0))

o_stand
#> # A tibble: 28 × 6
#>    plot_id scientific_name   DBH_4in tree_count DBH_in
#>    <fct>   <chr>             <fct>        <dbl>  <dbl>
#>  1 1       Abies balsamea    [6,10]           0   NA  
#>  2 1       Abies balsamea    (10,14]          1   11.3
#>  3 1       Abies balsamea    (14,18]          0   NA  
#>  4 1       Betula papyrifera [6,10]           0   NA  
#>  5 1       Betula papyrifera (10,14]          0   NA  
#>  6 1       Betula papyrifera (14,18]          0   NA  
#>  7 1       Pinus strobus     [6,10]           1    9.8
#>  8 1       Pinus strobus     (10,14]          1   10.7
#>  9 1       Pinus strobus     (14,18]          0   NA  
#> 10 2       Abies balsamea    [6,10]           0   NA  
#> # ℹ 18 more rows
#> # ℹ 1 more variable: vol_cu_ft <dbl>

Now o_stand is ready for the workflow that generates stand-level estimates for trees per acre and volume per acre. Importantly, the workflow used to generate these estimates is identical to the one followed in Section 12.6.1, except here we group by species and DBH class to partition the data into the nine populations.

# Add a TF trees/ac column.
o_stand <- o_stand %>% 
  mutate(TF = 43560 / (pi * 24^2))

# Step 1, compute per unit area plot-level summaries.
plot_summary <- o_stand %>% 
  group_by(scientific_name, DBH_4in, plot_id) %>% 
  summarize(trees_per_ac = sum(tree_count * TF),
            vol_per_ac = sum(tree_count * TF * vol_cu_ft),
            .groups = "drop_last")

# Step 2, compute per unit area estimates for each stand.
stand_estimates <- plot_summary %>% 
  summarize(y_bar_trees = mean(trees_per_ac),
            y_bar_vol = mean(vol_per_ac))

stand_estimates 
#> # A tibble: 9 × 4
#> # Groups:   scientific_name [3]
#>   scientific_name   DBH_4in y_bar_trees y_bar_vol
#>   <chr>             <fct>         <dbl>     <dbl>
#> 1 Abies balsamea    [6,10]         0           0 
#> 2 Abies balsamea    (10,14]        8.02      143.
#> 3 Abies balsamea    (14,18]        0           0 
#> 4 Betula papyrifera [6,10]         0           0 
#> 5 Betula papyrifera (10,14]        0           0 
#> 6 Betula papyrifera (14,18]       16.0       563.
#> 7 Pinus strobus     [6,10]         8.02      116.
#> 8 Pinus strobus     (10,14]       16.0       376.
#> 9 Pinus strobus     (14,18]        0           0

stand_estimates holds the stand table trees per acre and stock table volume per acre. From here, two calls to pivot_wider() are all that’s needed to build the respective tables. Code to add row and column totals is provided toward the end of Section 8.3.

# Stand table.
stand_estimates %>% 
  pivot_wider(id_cols = scientific_name, 
              names_from = DBH_4in, 
              values_from = y_bar_trees)
#> # A tibble: 3 × 4
#> # Groups:   scientific_name [3]
#>   scientific_name   `[6,10]` `(10,14]` `(14,18]`
#>   <chr>                <dbl>     <dbl>     <dbl>
#> 1 Abies balsamea        0         8.02       0  
#> 2 Betula papyrifera     0         0         16.0
#> 3 Pinus strobus         8.02     16.0        0
# Stock table.
stand_estimates %>% 
  pivot_wider(id_cols = scientific_name, 
              names_from = DBH_4in, 
              values_from = y_bar_vol)
#> # A tibble: 3 × 4
#> # Groups:   scientific_name [3]
#>   scientific_name   `[6,10]` `(10,14]` `(14,18]`
#>   <chr>                <dbl>     <dbl>     <dbl>
#> 1 Abies balsamea          0       143.        0 
#> 2 Betula papyrifera       0         0       563.
#> 3 Pinus strobus         116.      376.        0

12.7 Summary

This chapter connected general statistical survey sampling topics covered in Chapter 11 with forestry applications. Key to this connection is the areal sampling frame used to select sampling locations and rules—plot sampling and point sampling—used to select measurement trees at each sampling location. We introduced the tree-centered perspective as a means to better understand tree inclusion zones, the measurement tree selection process, probability of selection, and tree factors. The notion of a tree inclusion zone was also helpful for understanding boundary slopover bias and the need for slope correction. The mirage and walkthrough methods were introduced for correcting boundary slopover bias. Several different approaches for correcting slope effects were also presented.

Much of our focus was on calculating and applying tree factors for plot- and point-level summaries and how these summaries are used to estimate population parameters. We presented all topics in this chapter assuming samples were collected using SRS. Building on this foundation, the next chapter develops several more practical and efficient sampling designs and estimators.

Section 12.6 stepped through how to build efficient tidyverse workflows for estimation using samples collected via plot and point sampling. Once mastered, dplyr, tidyr, and related packages provide tools for computation on groups (e.g., via group_by()), and enable the creation of simple, elegant, and reusable code for even the most complex estimation tasks.

12.8 Exercises

Exercise 12.1 What is the definition of a tree factor?

Exercise 12.2 For a 1/20-th acre circular inventory plot, answer the following questions:

  1. What is the area of the plot in square feet?
  2. What is the radius of the plot, in feet?
  3. What is the tree factor for a tree measured on this plot?

Exercise 12.3 For a circular inventory plot with an area of 300 m\(^2\), answer the following questions:

  1. What is the radius of the plot, in meters?
  2. What fraction of a hectare does this plot cover?
  3. What is the tree factor for a tree measured on this plot?

Exercise 12.4 What is the relationship between BAF and a tree’s inclusion zone area? How does this relationship affect the number of “in” (i.e., measurement) trees at a given point? No math or code needed to answer this question.

Exercise 12.5 What is the horizontal distance multiplier (HDM) and how is it used?

Exercise 12.6 Given an English 20 BAF and using the HDMs provided in Table 12.9, determine whether the following trees are “in” or “out”—that is, whether they should be measured or not.

  1. 10-inch DBH tree, 20 feet from the point.
  2. 10-inch DBH tree, 19 feet from the point.
  3. 60-inch DBH tree, 110 feet from the point.
  4. 20-inch DBH tree, 40 feet from the point.

Exercise 12.7 Given a metric 4 BAF and using the HDMs provided in Table 12.10, determine whether the following trees are “in” or “out”—that is, whether they should be measured or not.

  1. 25-cm DBH tree, 6 meters from the point.
  2. 25-cm DBH tree, 7 meters from the point.
  3. 150-cm DBH tree, 35 meters from the point.
  4. 50-cm DBH tree, 84 meters from the point.
TABLE 12.20: TABLE 12.21: Cruise data for \(n\)=2 plots. Measurements were taken on fixed-area circles plots with a 9.77 (m) radius.
Plot Tree DBH (cm) Volume (m\(^3\))
1 1 41 4
1 2 58 10
1 3 30 2
2 1 46 6
2 2 71 15

Exercise 12.8 Using the cruise data given in Table 12.20, answer the following questions using the three measurement trees on Plot 1. Include the units as a comment for each answer.

  1. What is each measurement tree’s tree factor?
  2. Compute the plot-level trees per hectare.
  3. Compute each tree’s BA.
  4. Compute each tree’s BA per hectare.
  5. Compute the plot-level basal area per hectare.
  6. Compute each tree’s volume per hectare.
  7. Compute the plot-level volume per hectare.

Exercise 12.9 Recognize the three plot-level summaries computed in Exercise 12.8(b,e,g) (i.e., per hectare basal area, number of trees, and volume) represent one set of observations from one plot. Such plot-level summaries must be computed for each of \(n\) plots. The \(n\) plot-level summaries are the sample from which you compute the mean, standard error of the mean, and confidence interval forest-level estimates.

Using the cruise data given in Table 12.20, answer the following questions. Include the units as a comment for each answer.

  1. Compute the plot-level summary for basal area per hectare, trees per hectare, and volume per hectare (hint, you already did this for Plot 1 in Exercise 12.8(b,e,g)). You will end up with six values (i.e., basal area, number of trees, and volume for each of the two plots).

  2. Use your plot-level summaries computed in Exercise 12.9(a) to compute a forest-level estimate for per hectare basal area, number of trees, and volume. Your forest-level estimate for each is the mean (\(\bar{y}\)) over your plot-level summaries.

  3. Use your plot-level summaries computed in Exercise 12.9(a) to compute the standard error of the mean (\(s_{\bar{y}}\)) for your forest-level estimate of per hectare basal area, number of trees, and volume computed in Exercise 12.9(b). These standard errors can be used to compute a confidence interval for each forest-level estimate.

TABLE 12.22: TABLE 12.23: Point sampling cruise data for \(n\)=2 points with measurement trees idenfied using an English 20 BAF.
Point Tree DBH (in) Volume (ft\(^3\))
1 1 16 46
1 2 23 103
1 3 12 23
2 1 18 60
2 2 28 159

Exercise 12.10 Using the cruise data given in Table 12.22, answer the following questions using the three measurement trees at Point 1. Include the units as a comment for each answer.

  1. What is the basal area per acre that each tree represents? No calculations or code needed here.

  2. Compute the point-level basal area per acre.

  3. Compute each tree’s BA.

  4. Using BA from 12.10(c) and equation (12.15), compute each tree’s tree factor.

  5. Compute the point-level trees per acre.

  6. Compute each tree’s volume per acre.

  7. Compute the point-level volume per acre.

Exercise 12.11 Recognize the three point-level summaries computed in Exercise 12.10(b,e,g) (i.e., per acre basal area, number of trees, and volume) represent one set of observations from one point. Such point-level summaries must be computed for each of \(n\) points. The \(n\) point-level summaries are the sample from which you compute the mean, standard error of the mean, and confidence interval forest-level estimates.

Using the cruise data given in Table 12.22, answer the following questions. Include the units as a comment.

  1. Compute the point-level summary for basal area per acre, trees per acre, and volume per acre (hint, you already did this for Point 1 in Exercise 12.10(b,e,g)). You will end up with six values (i.e., basal area, number of trees, and volume for each of the two points).

  2. Use your point-level summaries computed in Exercise 12.11(a) to compute a forest-level estimate for per acre basal area, number of trees, and volume. Your forest-level estimate for each is the mean (\(\bar{y}\)) over your point-level summaries.

  3. Use your point-level summaries computed in Exercise 12.11(a) to compute the standard error of the mean (\(s_{\bar{y}}\)) for your forest-level estimate of per acre basal area, number of trees, and volume computed in Exercise 12.11(b). These standard errors can be used to compute a confidence interval for each forest-level estimate.

Exercise 12.12 Toward the end of Section 12.3.4, we outlined the radius adjustment approach to slope correction for fixed-area circular plots. Table 12.24 holds slope measured in percent and degrees, and the associated radius scaling factor \(r^\ast\). As described in Section 12.3.4, the scaling factor \(r^\ast\) is multiplied by the horizontal plot radius \(R\) (i.e., plot radius used in the cruise) to give the adjusted plot radius \(R^\ast\) (i.e., plot radius used on the slope).

Your task is to write code to reproduce \(r^\ast\) values in Table 12.24. Recall R’s trigonometric functions assume angles are in radians, so we also provide several utility functions for converting among slope degrees, percents, and radians. Given the sequence of slopes (either percentages or degrees), your code to compute \(r^\ast\) should be a simple R math expression.

Include your code and the resulting \(r^\ast\) values as output.

# Some useful angle conversion functions.
r2d <- function(r) {r * 180 / pi}       # Radian to degree.
d2r <- function(d) {d * pi / 180}       # Degree to radian.
p2r <- function(p) {atan(p / 100)}      # Percent to radian.
p2d <- function(p) {r2d(atan(p / 100))} # Percent to degree.

# To get you started, we've set up the vectors of slopes below.
p_slope <- seq(from = 0, to = 150, by = 10) # slope percent.
d_slope <- p2d(p_slope) # slope degrees.
r_slope <- p2r(p_slope) # slope radians.
TABLE 12.24: TABLE 12.25: Slope and scaling factor \(r^\ast\) used to implement radius adjustment approach to slope correction for fixed-area circular plots
Slope (%) Slope (\(^{\circ}\)) r\(^\ast\) Slope (%) Slope (\(^{\circ}\)) r\(^\ast\)
0 0 1 80 38.7 1.132
10 5.71 1.002 90 42.0 1.160
20 11.3 1.010 100 45 1.189
30 16.7 1.022 110 47.7 1.219
40 21.8 1.038 120 50.2 1.250
50 26.6 1.057 130 52.4 1.281
60 31.0 1.080 140 54.5 1.312
70 35.0 1.105 150 56.3 1.343

Exercise 12.13 Say you conducted a cruise using 1/20-th acre circular plots. Table 12.26 holds information about the slope on a few plots and distances from plot center to possible measurement trees. Your task is to use the radius adjustment slope correction approach to fill in missing information in Table 12.26. Specifically, given the slope, determine the scale factor (\(r^\ast\)) from Table 12.24. Then compute the slope adjusted plot radius (\(R^\ast\)) and make a determination about tree measurement status, i.e., is the tree “in” or “out” of the plot.

Hint, you’ll first need to compute the plot radius for the 1/20-th acre plot. The first two rows of Table 12.26 are filled in for you.

TABLE 12.26: TABLE 12.27: Trees measured on different plot slopes. Distance from the plot center to the tree’s center is provided and status is whether the tree is in or out of the 1/20-th acre plot.
Tree Slope (%) \(r^\ast\) \(R^\ast\) Distance (ft) Status
1 30 1.022 26.91 26.5 In
2 60 1.080 28.44 29.0 Out
3 110 30.0
4 40 27.0
5 90 31.0

The next set of exercises use the Penobscot Experimental Forest (PEF) inventory data introduced in Section 1.2.4. These data were collected as part of several long-term USDA Forest Service silvicultural studies described in Kenefic et al. (2015). The data comprise repeated measurements on fixed-area permanent sample plots (PSPs) located within management units (MUs), see Section 1.4 and Figure 1.5. The code below reads the PEF tree measurements into pef_trees and then takes a glimpse at the data.

pef_trees <- read_csv("datasets/PEF/PEF_trees.csv")
pef_trees %>% glimpse()
#> Rows: 316,837
#> Columns: 10
#> $ MU               <chr> "12", "12", "12", "12", "12"…
#> $ plot             <dbl> 11, 11, 11, 11, 11, 11, 11, …
#> $ inv              <dbl> 17, 17, 17, 17, 17, 17, 17, …
#> $ year             <dbl> 2014, 2014, 2014, 2014, 2014…
#> $ month            <dbl> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6…
#> $ PEF_species_code <dbl> 4, 4, 7, 15, 6, 15, 16, 1, 1…
#> $ DBH_in           <dbl> 6.9, 6.4, 6.0, 6.2, 16.8, 5.…
#> $ AGB_lbs          <dbl> 219.693, 184.341, 135.754, 2…
#> $ exp_ac           <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
#> $ study            <chr> "CMS", "CMS", "CMS", "CMS", …

Rows in pef_trees correspond to measurement trees and columns are defined as follows.

  1. MU management unit within which the tree was sampled. Management unit boundaries are delineated using orange polygons in Figure 1.5. MUs receive different silvicultural treatments and belong to one of two studies (see the study column definition below).
  2. plot permanent sample plot number. Plot number is unique within MU, but not across MUs (i.e., the same plot number could occur in multiple MUs).
  3. inv sequential inventory number. For a given MU and plot combination, the smallest and largest inv values correspond to the earliest and most current measurement, respectively. An inventory is a complete measurement of all plots within the given MU.
  4. year year of measurement.
  5. month month of measurement.
  6. PEF_species_code species code used on the PEF. Common and scientific names for each code are given in “datasets/PEF/PEF_species_codes.csv”.
  7. DBH_in tree DBH in inches.
  8. AGB_lbs tree above ground biomass (AGB) in pounds computed using DBH and species specific equations defined in Jenkins et al. (2002).
  9. exp_ac tree expansion factor (TF) in trees per acre. As described in Section 12.3.1, exp_ac is the inverse of the plot area upon which the tree was measured.
  10. study the silvicultural study to which the tree belongs. MUs, and hence trees, belong to either the Compartment Management Study (CMS) or Management Intensity Demonstration (MID).
Penobscot Experimental Forest permanent sample plot (PSP) nested design and tree DBH measurement protocol.

FIGURE 12.12: Penobscot Experimental Forest permanent sample plot (PSP) nested design and tree DBH measurement protocol.

Each plot depicted in Figure 1.5 was measured using the design shown in Figure 12.12.142 As noted in Section 12.3, and illustrated using the toy inventory dataset in Section 12.6, a nested plot design that uses smaller plot size to sample smaller diameter trees reduces the time needed to conduct an inventory.

The pef_trees subset created in Exercises 12.14 and 12.15 is used in Exercises 12.16 through 12.18. For several exercises, we provide the dplyr workflow output and it’s your job to write code to reproduce the output. As always, use the pipe operator when possible.

Exercise 12.14 Consider the code below used to subset pef_trees to form pef_current_trees, which includes only the most current measurement in each plot, then answer the following questions. Hint, review the MU, plot, and inv column definitions.

pef_trees_current <- pef_trees %>% group_by(MU, plot) %>% 
  filter(inv == max(inv)) %>% ungroup()
  1. Why are MU and plot in group_by()?
  2. How does the grouped filter(inv == max(inv)) result in each MU’s most current inventory?
  3. How is the tibble output from filter() grouped (i.e., before it’s piped to ungroup())?
  4. What is the effect of ungroup() at the end of the piped workflow?
  5. Why might it be a good idea to remove the grouping structure on pef_current_trees (i.e., why did we feel the need to add ungroup() to the workflow)?

Exercise 12.15 Subset pef_trees_current, created in Exercise 12.14, to include only trees in the MID study and name your resulting tibble mid_trees. Hint, following the column definitions, you’ll use study == "MID" in your filter() call. Your mid_trees should match the output below.

mid_trees %>% print()
#> # A tibble: 2,496 × 10
#>    MU     plot   inv  year month PEF_species_code
#>    <chr> <dbl> <dbl> <dbl> <dbl>            <dbl>
#>  1 90       11    25  2011     5                4
#>  2 90       11    25  2011     5                4
#>  3 90       11    25  2011     5                4
#>  4 90       11    25  2011     5                6
#>  5 90       11    25  2011     5                6
#>  6 90       11    25  2011     5                4
#>  7 90       11    25  2011     5                4
#>  8 90       11    25  2011     5                6
#>  9 90       11    25  2011     5                6
#> 10 90       11    25  2011     5                6
#> # ℹ 2,486 more rows
#> # ℹ 4 more variables: DBH_in <dbl>, AGB_lbs <dbl>,
#> #   exp_ac <dbl>, study <chr>

To provide some context for subsequent exercises, we used the tree measurements held in mid_trees to create Figure 12.13. The subsequent exercises explore these data and generate plot- and MU-level biomass estimates.

Penobscot Experimental Forest (PEF) management units (MUs) that are part of the Management Intensity Demonstration (MID) study are delineated in orange and labeled. Other PEF MUs are shaded gray. MID MUs are positioned near the middle of the PEF extent shown in Figure 1.5. MID permanent sample plot locations are colored by the most recent inventory’s total biomass per acre estimate of overstory trees.

FIGURE 12.13: Penobscot Experimental Forest (PEF) management units (MUs) that are part of the Management Intensity Demonstration (MID) study are delineated in orange and labeled. Other PEF MUs are shaded gray. MID MUs are positioned near the middle of the PEF extent shown in Figure 1.5. MID permanent sample plot locations are colored by the most recent inventory’s total biomass per acre estimate of overstory trees.

Exercise 12.16 Using mid_trees, write code to reproduce the answer to each question below.

  1. How many trees were measured across all plots? Hint, we used nrow().

    #> [1] 2496
  2. What are the distinct MU values? Hint, consider distinct().

    #> # A tibble: 6 × 1
    #>   MU   
    #>   <chr>
    #> 1 90   
    #> 2 91   
    #> 3 92   
    #> 4 93A  
    #> 5 93B  
    #> 6 93C
  3. What year was each MU inventoried? Hint, our code uses group_by(), summarize(), and first() to create the inventory_year output column.

    #> # A tibble: 6 × 2
    #>   MU    inventory_year
    #>   <chr>          <dbl>
    #> 1 90              2011
    #> 2 91              2011
    #> 3 92              2011
    #> 4 93A             2010
    #> 5 93B             2010
    #> 6 93C             2010
  4. How many plots in each MU? Hint, our code uses group_by(), summarize(), and n_distinct() to create the n_plots output column.

    #> # A tibble: 6 × 2
    #>   MU    n_plots
    #>   <chr>   <int>
    #> 1 90          6
    #> 2 91          6
    #> 3 92          6
    #> 4 93A         2
    #> 5 93B         2
    #> 6 93C         2
  5. How many unique plots were measured across all MUs? Hint, our code uses summarize() and n_distinct() to create the n_plot output column.

    #> # A tibble: 1 × 1
    #>   n_plots
    #>     <int>
    #> 1      24
  6. Which three plots have the most trees measured? Hint, our code uses group_by(), summarize() with the argument .groups = "drop", and slice_max() with the argument n = 3.

    #> # A tibble: 3 × 3
    #>   MU     plot n_trees
    #>   <chr> <dbl>   <int>
    #> 1 93C      12     214
    #> 2 93B      12     199
    #> 3 92       11     160

Exercise 12.17 As noted in the column definitions, exp_ac is the tree expansion factor (TF) as defined in Section 12.3.1. Use mid_trees to answer the following questions about the expansion factor.

  1. Print the distinct values of exp_ac. Are they consistent with the plot areas given in Figure 12.12?

  2. What is the interpretation of the exp_ac values? What determines a tree’s expansion factor?

  3. For each exp_ac value, what are the minimum and maximum DBH values? Write the code to reproduce our output below. Hint, we used group_by() and summarize() to create the min_DBH_in and max_DBH_in output columns. Are the DBH ranges for each expansion factor consistent with the sampling protocol given in Figure 12.12?

    #> # A tibble: 3 × 3
    #>   exp_ac min_DBH_in max_DBH_in
    #>    <dbl>      <dbl>      <dbl>
    #> 1      5        4.5       31.1
    #> 2     20        2.5        4.4
    #> 3     50        0.5        2.4

Exercise 12.18 This next series of questions results in plot-level summaries then MU-level estimates for number of trees per acre, basal area, and above ground biomass (AGB). We’ll limit our estimates to overstory trees in mid_trees (i.e., trees measured on the 1/5-th ac plot depecited in Figure 12.12).

After question (a) below, that filters only overstory trees, this exercise follows the estimation steps laid out in Section 12.6; however, it breaks Step 1 into two parts. The first part, question (b), expands each tree’s variable value to the per unit area basis. The second part, question (c), sums each tree’s values computed in question (b) to arrive at the plot-level summaries. In practice, we can combine these two parts, as we did in Section 12.6.

  1. Filter mid_trees to include only those trees measured on the 1/5-th ac plots and call the resulting tibble mid_overstory_trees. Your mid_overstory_trees should match the output below. Use mid_overstory_trees for questions b-e.

    mid_overstory_trees %>% glimpse()
    #> Rows: 1,366
    #> Columns: 10
    #> $ MU               <chr> "90", "90", "90", "90", "90"…
    #> $ plot             <dbl> 11, 11, 11, 11, 11, 11, 11, …
    #> $ inv              <dbl> 25, 25, 25, 25, 25, 25, 25, …
    #> $ year             <dbl> 2011, 2011, 2011, 2011, 2011…
    #> $ month            <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
    #> $ PEF_species_code <dbl> 6, 6, 4, 6, 6, 4, 6, 4, 4, 4…
    #> $ DBH_in           <dbl> 13.9, 9.3, 16.1, 7.1, 17.6, …
    #> $ AGB_lbs          <dbl> 1207.12, 445.32, 1585.07, 22…
    #> $ exp_ac           <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
    #> $ study            <chr> "MID", "MID", "MID", "MID", …
  2. Add the following new columns to mid_overstory_trees: 1) trees_per_ac is the number of trees per acre each tree represents; 2) BA_sqft_per_ac is the basal area (ft\(^2\)/ac) each tree represents; 3) AGB_tons_per_ac is the above ground biomass (tons/ac) each tree represents. Note you’ll need to scale AGB_lbs by 1/2000 to convert from pounds to tons. Hint, our code uses mutate().

    # Take a glimpse of only a few columns so they fit on the page.
    mid_overstory_trees %>% 
      select(MU, plot, trees_per_ac, 
             BA_sqft_per_ac, AGB_tons_per_ac) %>%
      glimpse() 
    #> Rows: 1,366
    #> Columns: 5
    #> $ MU              <chr> "90", "90", "90", "90", "90",…
    #> $ plot            <dbl> 11, 11, 11, 11, 11, 11, 11, 1…
    #> $ trees_per_ac    <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,…
    #> $ BA_sqft_per_ac  <dbl> 5.2688, 2.3586, 7.0687, 1.374…
    #> $ AGB_tons_per_ac <dbl> 3.01781, 1.11329, 3.96267, 0.…
  3. Compute the plot-level summaries by summing each plot’s values for trees_per_ac, BA_sqft_per_ac, and AGB_tons_per_ac. Call the resulting tibble plot_summary. Hint, our code uses group_by() and summarize().143

    plot_summary %>% glimpse()
    #> Rows: 24
    #> Columns: 5
    #> Groups: MU [6]
    #> $ MU              <chr> "90", "90", "90", "90", "90",…
    #> $ plot            <dbl> 11, 12, 21, 31, 51, 52, 11, 2…
    #> $ trees_per_ac    <dbl> 150, 155, 160, 250, 225, 175,…
    #> $ BA_sqft_per_ac  <dbl> 109.483, 114.765, 98.615, 78.…
    #> $ AGB_tons_per_ac <dbl> 58.337, 62.491, 53.709, 37.32…
  4. Use the plot-level summaries for MUs 90, 91, and 92 in plot_summary from question (c) to estimate MU-level mean, standard error of the mean, and 95% confidence intervals for the trees_per_ac, BA_sqft_per_ac, and AGB_tons_per_ac variables. This is Step 2 in the estimation process described in Section 12.6.

    In your calculations, use the standard error given in (11.21). Call your resulting tibble MU_estimates. Hint, our code uses only filter() and summarize().

    MU_estimates %>% glimpse()
    #> Rows: 3
    #> Columns: 15
    #> $ MU             <chr> "90", "91", "92"
    #> $ n              <int> 6, 6, 6
    #> $ t              <dbl> 2.5706, 2.5706, 2.5706
    #> $ y_bar_trees    <dbl> 185.83, 263.33, 247.50
    #> $ s_y_bar_trees  <dbl> 17.001, 36.370, 22.721
    #> $ ci_lower_trees <dbl> 142.13, 169.84, 189.09
    #> $ ci_upper_trees <dbl> 229.54, 356.83, 305.91
    #> $ y_bar_BA       <dbl> 117.412, 138.905, 78.939
    #> $ s_y_bar_BA     <dbl> 12.1158, 22.9370, 7.3206
    #> $ ci_lower_BA    <dbl> 86.267, 79.943, 60.121
    #> $ ci_upper_BA    <dbl> 148.556, 197.866, 97.757
    #> $ y_bar_AGB      <dbl> 63.425, 73.671, 36.802
    #> $ s_y_bar_AGB    <dbl> 7.5995, 13.0970, 3.7803
    #> $ ci_lower_AGB   <dbl> 43.889, 40.004, 27.084
    #> $ ci_upper_AGB   <dbl> 82.960, 107.338, 46.519
  5. What was the sample size used to compute your MU-level estimates in question (d)? Do you think these sample sizes are large enough to provide robust estimates?

  6. Use the MU-level estimates in MU_estimates from question (d) to compute MU-level total with associated 95% confidence intervals for AGB.144 MU 90, 91, and 92 have an area of 9.92, 9.76, and 9.92 acres, respectively. Break your workflow into three steps. First, make a tibble called mu_acres with columns MU and A_ac, where A_ac holds the acres for each MU. Second, join mu_acres to MU_estimates.145 Third, use mutate() or transmute() on MU_estiamtes to create a tibble that holds each variable’s estimated total and associated confidence intervals. The output from these three steps is given below.

    Step 1: Create the mu_acres tibble.

    mu_acres %>% print()
    #> # A tibble: 3 × 2
    #>   MU     A_ac
    #>   <chr> <dbl>
    #> 1 90     9.92
    #> 2 91     9.76
    #> 3 92     9.92

    Step 2: Join mu_acres to MU_estimates.

    MU_estimates %>% select(MU, A_ac) %>% print()
    #> # A tibble: 3 × 2
    #>   MU     A_ac
    #>   <chr> <dbl>
    #> 1 90     9.92
    #> 2 91     9.76
    #> 3 92     9.92

    Step 3: Use MU_estimates and transmute() to estimate each variable’s total with associated confidence interval. Name the resulting tibble MU_tot_estimates. Hint, scale the per unit area estimates by their respective MU acres.

    MU_tot_estimates %>% glimpse()
    #> Rows: 3
    #> Columns: 4
    #> $ MU           <chr> "90", "91", "92"
    #> $ total_AGB    <dbl> 629.17, 719.03, 365.07
    #> $ ci_lower_AGB <dbl> 435.38, 390.44, 268.68
    #> $ ci_upper_AGB <dbl> 822.96, 1047.62, 461.47

References

Beers, Thomas W. 1969. Slope Correction in Horizontal Point Sampling.” Journal of Forestry 67 (3): 188–92. https://doi.org/10.1093/jof/67.3.188.
———. 1977. Practical Correction of Boundary Overlap.” Southern Journal of Applied Forestry 1 (1): 16–18. https://doi.org/10.1093/sjaf/1.1.16.
Bitterlich, W. 1952. “Die Winkelzählprobe.” Forstwissenschaftliches Centralblatt 71 (7): 215–25. https://doi.org/10.1007/BF01821439.
———. 1984. The Relascope Idea. Relative Measurements in Forestry. Farnham Royal, UK: Commonwealth Agricultural Bureaux.
Bruce, David. 1955. A New Way to Look at Trees.” Journal of Forestry 53 (3): 163–67. https://doi.org/10.1093/jof/53.3.163.
Brunsdon, Chris, and Lex Comber. 2019. An Introduction to R for Spatial Analysis and Mapping (Spatial Analytics and GIS) Second Edition. SAGE Publications Ltd.
Bryan, Mackay B. 1956. A Simplified Method of Correcting for Slope on Circular Sample Plots.” Journal of Forestry 54 (7): 442–45. https://doi.org/10.1093/jof/54.7.442.
Burkhart, H. E., T. E. Avery, and B. P. Bullock. 2018. Forest Measurements: Sixth Edition. Waveland Press.
Ducey, Mark J., Jeffrey H. Gove, and Harry T. Valentine. 2004. “A Walkthrough Solution to the Boundary Overlap Problem.” Forest Science 50: 427–35.
Freese, F. 1962. Elementary Forest Sampling. Agriculture Handbook. U.S. Department of Agriculture, Forest Service.
Gregoire, T. G., and H. T. Valentine. 2007. Sampling Strategies for Natural Resources and the Environment. Chapman & Hall/CRC Applied Environmental Statistics. Taylor & Francis.
Grosenbaugh, L. R. 1952. “Plotless Timber Estimates—New, Fast, Easy.” Journal of Forestry 50: 32–37.
———. 1958. “Point-Sampling and Line-Sampling Probability Theory, Geometric Implications, Synthesis.” USDA Forest Service, Southern Forest Experiment Station, Occasional Paper 160.
Grosenbaugh, L. R., and W. S. Stover. 1957. Point-Sampling Compared with Plot-Sampling in Southeast Texas.” Forest Science 3 (1): 2–14.
Honer, T. G. 1967. “Standard Volume Tables and Merchantable Conversion Factors for the Commercial Tree Species of Central and Eastern Canada.” Can. Dept. Forestry Rural Devel., For. Mgmt. Res. And Serv. Inst. Info. Rep. FMR-X-5.
Iles, Kim, and Mike Fall. 1988. “Can an Angle Gauge Really Evaluate "Borderline Trees" Accurately in Variable Plot Sampling?” Canadian Journal of Forest Research 18 (6): 776–83. https://doi.org/10.1139/x88-118.
Iles, Kim, and William H. Wilson. 1988. “Changing Angle Gauges in Variable Plot Sampling: Is There a Bias Under Ordinary Conditions?” Canadian Journal of Forest Research 18 (6): 770–75. https://doi.org/10.1139/x88-117.
Jenkins, J. C., D. C. Chojnacky, L. S. Heath, and R. A. Birdsey. 2002. “Comprehensive Database of Diameter-Based Biomass Regressions for North American Tree Species.” Gen. Tech. Rep. NE-319. Newtown Square, PA: U.S. Department of Agriculture, Forest Service, Northeastern Research Station.
Kenefic, Laura S., Nicole S. Rogers, Joshua J. Puhlick, Justin D. Waskiewicz, and John C. Brissette. 2015. “Overstory Tree and Regeneration Data from the "Silvicultural Effects on Composition, Structure, and Growth" Study at Penobscot Experimental Forest. 2nd Edition. Fort Collins, CO: Forest Service Research Data Archive.”
Kershaw, J. A., M. J. Ducey, T. W. Beers, and B. Husch. 2016. Forest Mensuration. Wiley.
Oderwald, Richard G. 1981. Point and Plot Sampling–The Relationship.” Journal of Forestry 79 (6): 377–78.
Palley, Marshall N., and Leah G. Horwitz. 1961. Properties of Some Random and Systematic Point Sampling Estimators.” Forest Science 7 (1): 52–65.
Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With applications in R. Chapman and Hall/CRC. https://doi.org/10.1201/9780429459016.
Roesch, Francis A., Edwin J. Green, and Charles T. Scott. 1993. “An Alternative View of Forest Sampling.” Survey Methodology, 199–204.
Schmid, Paul. 1969. “Stichproben Am Waldrand” 45 (3): 234–303.
Schreuder, Hans T., David S. Schreiner, and Timothy A. Max. 1981. Ensuring an Adequate Sample at Each Location in Point Sampling.” Forest Science 27 (3): 567–73. https://doi.org/10.1093/forestscience/27.3.567.
USDA. 2006. National Forest Log Scaling Handbook. FSH; 2409.11 U.S. Department of Agriculture, Forest Service.
USFS. 1996. Forest Service Handbook, Washington : FSH 2409.12, Timber Cruising Handbook. Washington, DC: The Forest Service, 1996-.
Waskiewicz, J. D., L. S. Kenefic, N. S. Rogers, J. J. Puhlick, J. C. Brissette, and R. J. Dionne. 2015. “Sampling and Measurement Protocols for Long-Term Silvicultural Studies on the Penobscot Experimental Forest.” Gen. Tech. Rep. NRS-147. Newtown Square, PA: U.S. Department of Agriculture, Forest Service, Northern Research Station. 32 p.
Wensel, Lee C., Jack Levitan, and Klaus Barber. 1980. Selection of Basal Area Factor in Point Sampling.” Journal of Forestry 78 (2): 83–84. https://doi.org/10.1093/jof/78.2.83.
Young, H. E. 1957. Additional Volume Tables for Maine. Maine Agricultural Experiment Station Miscellaneous Publication 627.

  1. We use forest generically to represent a bounded land area holding the population of interest.↩︎

  2. The plot defining each tree’s inclusion zone is created by replicating the plot intended for the sampling location, then rotating the plot 180\(^\circ\) and positioning it at the tree’s location, see Chapter 7 in Gregoire and Valentine (2007) for further explanation and examples. The inclusion zone rotation is not apparent in Figure 12.1(b) because the circular plot is symmetric about the sampling location.↩︎

  3. Often referred to as a milacre plot.↩︎

  4. Calculations for the constant \(c\) are given in Section 5.3.1.↩︎

  5. You might ask, “If all trees are measured on the same plot area, why complicate life with this \(j\) subscript for tree-specific TF?” There are a few good reasons. (1) As noted earlier, it’s common to have more than one plot size in a given inventory, e.g., a plot for overstory trees and a nested subplot for regeneration. So you’ll have trees in your dataset with different TFs. (2) As developed later in Section 12.4, under point sampling a tree’s inclusion zone depends on its size (e.g., its basal area), so the TF differs by tree. (3) Using a tree-specific TF supports an efficient workflow developed in Section 12.6 that works for both plot and point sampling.↩︎

  6. Paul Schmid published his original mirage method development in German. Later, an English publication by Beers (1977) highlighted Schmid’s method and proposed an extension for settings where the mirage method could not be implemented.↩︎

  7. R’s trigonometric functions assume angles are in radians, not degrees. To reproduce this example in R using cos(), first convert the angle from degrees to radians with the d2r() function: d2r <- function(d) { d * pi / 180 }. So for the \(C_1\) example: 26.33/cos(d2r(30)). Other useful angle conversion functions are listed in Exercise 12.12.↩︎

  8. Like the Harvard Forest census dataset, it’s useful to learn about different estimators and associated computations using a census because it allows you to check your resulting estimates against the truth. Clearly, this is never a luxury we have when sampling a real population.↩︎

  9. Because a lot of activity occurs at overstory plot centers, e.g., holding the “dumb” end of a tape measure used to check if trees are within the plot radius and establishing markers if they’re permanent plots, we typically locate the regeneration subplot away from this high-traffic area to avoid trampling seedlings.↩︎

  10. Get used to these steps—they’re a recurring theme in forest inventory data analysis.↩︎

  11. Bitterlich’s angle count method is also known as Bitterlich sampling, prism sampling, plot-less sampling, variable radius plot sampling, variable plot sampling, and plotless cruising.↩︎

  12. Interestingly, they found that even though errors were made by professional cruisers in calling borderline trees as “in” or “out,” the errors tended to cancel over the entire cruise. Those errors that didn’t cancel out (i.e., bias) were typically due to inaccurate slope corrections and line of sight issues. Inexperienced cruisers tended to make more non-canceling errors. The authors strongly recommend measuring all potential borderline trees.↩︎

  13. While any area unit can be used, we focus on acres and hectares because they’re most commonly used in forestry applications.↩︎

  14. Trees with the same DBH will have the same tree factor. For time efficiency, it’s common to assign trees to coarse DBH classes, e.g., using 2 in increments, and one might compute the TF for each DBH class. However, we’ll generally assume each tree’s DBH is measured with enough precision to yield a tree-specific TF.↩︎

  15. Remember you choose the angle \(\theta\), or perhaps it’s chosen for you and specified in the survey protocol.↩︎

  16. Notice we dropped the \(j\) subscript on BAF. This is because a single BAF is typically used for a given population—more on this topic in Section 12.4.3.↩︎

  17. The point-level summary is the point sampling analog to plot sampling’s plot-level summary introduced in Section 12.4.1.↩︎

  18. There are a few exceptions to obtaining point-level summaries without tree factors—see Section 12.4.6.↩︎

  19. Such tables are useful to have with you in the field so you can quickly compute a tree’s limiting distance given its DBH and HDM.↩︎

  20. International 1/4-inch log rule estimates board feet from a log, accounting for stem taper, wood removed from the outside of the log to square it off, and saw kerf (or the loss of wood as sawdust), see, e.g., USDA (2006).↩︎

  21. Check your understanding by computing other VBARs in Table 12.19 using values in Table 12.18.↩︎

  22. The line’s slope is found using the least-squares regression method with a zero intercept constraint, implemented using the lm() function; see Section ?? for more details.↩︎

  23. This is especially true if trees are measured on different sized plots or under point sampling where the tree factor depends on tree-specific characteristics like DBH.↩︎

  24. Notice again that plot_summary is already grouped by stand_id, so the subsequent call to summarize() produces stand-level summaries, as desired.↩︎

  25. Because the same BAF was used for both stands, we could have simply referenced the BAF variable defined outside o_stands. However, it’s common to have a stand-specific BAF, in which case you’d want to work from a BAF column in your dataset. You could use case_when() to assign stand-specific BAF values.↩︎

  26. This design, detailed in Waskiewicz et al. (2015), was only consistently applied to recent PEF data. If you dig into the pef_trees data, you’ll find that prior to 2000 many tree measurements don’t follow the DBH cutoffs given in Figure 12.12.↩︎

  27. As an aside, plot color in Figure 12.13 corresponds to the AGB_tons_per_ac value in your resulting plot_summary tibble.↩︎

  28. We could also compute the totals for number of trees and basal area; however, in practice we typically talk about these variables on a per unit area basis.↩︎

  29. We used case_when() to accomplish these first two steps in Section 12.6.1. We’re using the join approach here for demonstration—it’s a better approach in practice when you’re dealing with many units that need values assigned.↩︎

Want to know when the book is for sale? Enter your email so we can let you know.