Algorithm 3A25 - Space Time Statistics of Level 2 PR Products

Objective of the Algorithm

To calculate various statistics over a month from the level 2 PR output products.  Four types of statistics are calculated:
 
   a. probabilities of occurrence (count values)
   b. means and standard deviations
   c. histograms
   d. correlation coefficients
 
The standard space scale is a 5 degree by 5 degree latitude x longitude cell.  A subset of the products, however, is also produced over 0.5 degree x 0.5 degree cells.

A description of the HDF output variables from 3a-25 can be found in Volume 4 - levels 2 and 3 file specifications provided by the TRMM Science Data and Information System (TSDIS).  The document is available at:   http://tsdis02.nascom.nasa.gov/tsdis/Documents/ICSVol4.pdf.

Processing Procedure:
 
The basic steps in the procedure are:

i. read in data (scan by scan) from 2a-21, 2a-23, 2a-25 and 1c-21

ii. adjust the numbering conventions so that Zm, Zt and R are aligned properly; this is done by using the anchor point of binEllipsoid in 1c-21 and the corresponding
bin ellipsoid of 2a-25 which, by convention, is the 80th element of Zt

iii. find the coarse and fine resolution boxes to which each of the 49 observations belongs.  Note that a single scan is composed of 49 observations, each at a different incidence angle.

            (coarse resolution boxes are 5 degree x 5 degree cells)
            (fine resolution boxes are 0.5 degree x 0.5 degree cells)

iv. resample Zm, Zt and R from the range direction onto the vertical

v.  update the various statistics

vi. if a month transition occurs within the granule, write the HDF output file and reinitialize the intermediate files
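
Step iii above can be sketched as follows.  This is a minimal illustration only: the grid origin (lat_min, lon_min) and the function name box_index are assumptions made for the example and are not taken from the 3a-25 specification, which should be consulted for the actual grid limits.

```python
import math

def box_index(lat, lon, cell_deg, lat_min, lon_min=-180.0):
    """Return (i, j) = (latitude index, longitude index) of the cell
    containing (lat, lon).  lat_min and lon_min are assumed grid origins."""
    i = math.floor((lat - lat_min) / cell_deg)
    # Wrap longitude into [0, 360) relative to the assumed origin.
    j = math.floor(((lon - lon_min) % 360.0) / cell_deg)
    return i, j

# One observation mapped to both resolutions (lat_min = -40 is hypothetical).
coarse = box_index(12.3, -45.6, 5.0, lat_min=-40.0)   # 5 x 5 degree box
fine = box_index(12.3, -45.6, 0.5, lat_min=-40.0)     # 0.5 x 0.5 degree box
```

Each of the 49 observations in a scan would be passed through such a lookup twice, once per resolution, before the statistics for the two boxes are updated.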

Comments and Issues:
 
i.  With the exception of one quantity, all statistics in 3a-25 are computed only when rain is judged in 1c-21 to be 'certain'.  This means that when rain is judged in 1c-21 to be 'possible', the observation is treated as a 'no-rain' observation.  The one exception to this rule is the near-surface rain rate.  For this quantity, the statistics (mean, standard deviation and histogram) are computed for the combined 'rain-possible' and 'rain-certain' cases as well as for the usual 'rain-certain' case alone.
 
The near-surface rain rate statistics computed under 'rain-possible' and 'rain-certain' conditions are:

-    Low resolution products (5 x 5 degrees x 1 month)

    surfRainAllPix1(i,j):  total counts of 'rain-possible' and 'rain-certain' at (latitude, longitude) box = (i,j)
    surfRainAllMean1(i,j): mean rain rate (mm/h), given rain is present
    surfRainAllDev1(i,j):  standard deviation of the rain rate (mm/h), given rain is present
    surfRainAllH(i,j,30):  histogram classified into 30 bins

-    High Resolution products (0.5 x 0.5 x 1 month)

    surfRainAllPix2(i,j):  total counts of 'rain-possible' and 'rain-certain' at (latitude, longitude) box = (i,j)
    surfRainAllMean2(i,j): mean rain rate (mm/h), given rain is present
    surfRainAllDev2(i,j):  standard deviation of the rain rate (mm/h), given rain is present

The statistics of near-surface rain rate computed only under 'rain-certain' conditions are denoted by:

-    Low resolution products (5 x 5 degrees x 1 month)

    surfRainPix1(i,j):  total counts of 'rain-certain' at (latitude, longitude) box = (i,j)
    surfRainMean1(i,j): mean rain rate (mm/h), given rain is present
    surfRainDev1(i,j):  standard deviation of the rain rate (mm/h), given rain is present
    surfRainH(i,j,30):  histogram classified into 30 bins

-    High Resolution products (0.5 x 0.5 x 1 month)

    surfRainPix2(i,j):  total counts of 'rain-certain' at (latitude, longitude) box = (i,j)
    surfRainMean2(i,j): mean rain rate (mm/h), given rain is present
    surfRainDev2(i,j):  standard deviation of the rain rate (mm/h), given rain is present

Because the 'rain-possible' cases are dominated by noise, the probability of false alarm is high; the 'rain-certain' statistics should therefore be considered more representative of the TRMM radar data.
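
The two parallel sets of near-surface statistics can be sketched as a pair of running accumulators for one (i,j) box.  This is an illustration under stated assumptions: the flag labels 'certain', 'possible' and 'none' are hypothetical stand-ins for the actual minEchoFlag encoding, the class and function names are invented for the example, and the population form of the standard deviation is assumed since the text does not say which form 3a-25 uses.

```python
import math

class RainStats:
    """Running count, mean and standard deviation for one (i,j) box."""
    def __init__(self):
        self.n = 0        # number of observations accumulated
        self.s = 0.0      # sum of rain rates (mm/h)
        self.ss = 0.0     # sum of squared rain rates

    def add(self, rate):
        self.n += 1
        self.s += rate
        self.ss += rate * rate

    def mean(self):
        return self.s / self.n if self.n else float("nan")

    def std(self):
        # Population standard deviation (an assumption, see lead-in).
        if self.n == 0:
            return float("nan")
        m = self.mean()
        return math.sqrt(max(self.ss / self.n - m * m, 0.0))

def update(flag, rate, all_stats, certain_stats):
    """Route one near-surface rain rate to the two sets of statistics."""
    if flag in ("possible", "certain"):
        all_stats.add(rate)        # surfRainAll* style products
    if flag == "certain":
        certain_stats.add(rate)    # surfRain* style products

all_stats, certain_stats = RainStats(), RainStats()
for flag, rate in [("certain", 2.0), ("possible", 1.0),
                   ("none", 0.0), ("certain", 4.0)]:
    update(flag, rate, all_stats, certain_stats)
```

In this toy sequence the combined set accumulates three observations and the 'rain-certain' set two, so the two means differ even though both are computed from the same scans.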

ii.  There are 2 definitions of zeta and nubf (from 2a-25).  In both cases the original definitions of these quantities are used; i.e., the first element of the array.

iii.  The height levels are being defined relative to the ellipsoid and not the local surface.   This may cause difficulties in the interpretation of the statistics over some land areas at the lower height levels because the level can be below the local surface.  In these cases, the rain rate is always set to some flag value and is not counted in the statistics.  On the other hand, ttlPix1 (or ttlPix2), the total number of valid observations at the low (high) resolution averaging box, will be incremented so that the
observations 'below the surface' will be counted as 'no-rain' events. This will introduce a negative bias into the mean rain rate at the (lat,long) box in question.

iv. Missing data scans are checked by monitoring the scanStatus flags in 1c-21.  If these flags indicate a missing scan, no processing is done for that scan.  Checks for individual missing variables are not done explicitly, however.
 
v. There are several subtle, interrelated issues regarding the definitions of rain and no-rain and how these definitions affect the statistics.  For most of the output products from level 2, numbers that represent a physical quantity (non-flagged values) are output only if the minEchoFlag variable in 1c-21 is set to 'rain-certain'.  However, an important category of products (Zt and rain rate from 2a-25 and Zm from 1c-21) is also output under 'rain-possible' conditions.  With the exception noted above (in comment i.), only those products for which rain detection is classified as 'certain' are included in the statistics (that is, the statistics conditioned on rain being present).  Although some rain events will be missed, the advantage of this selection is that the set of products should be self-consistent.

vi. The quantity 'minEchoFlag' (from 1b-21 and 1c-21) provides information on the presence/absence of rain along each of the 49 angle bins that comprise the cross-track scan.  To test whether rain is present at a particular range bin or height above the ellipsoid, a threshold value must be used. Presently, this threshold is dBZt > 0.01 dB so that if minEchoFlag indicates the certainty of rain along the beam and if dBZt > 0.01 dB  at a particular range bin or height level, then the data  (e.g., rain rate, dBZm, dBZt, etc) are used in the calculation of the statistics (mean and standard deviation).

A difficulty arises in defining the histograms for the rain rates.  The lowest histogram bin for dBZt and dBZm extends from 0.01 dB to 12 dB; each subsequent bin is 2 dB wide, so that the bin boundaries are 14 dB, 16 dB,..., 70 dB.  Since the Z-R relationship that is used in 2a-25 can change depending on the storm type and vertical structure, and because the histogram bins must be fixed, the bins for the quantity 10 log R (where R is the rain rate in mm/h) are determined from the nominal relationship Z = 200 R^1.6 or, in dB:

             dBR = 0.625 dBZ - 14.38 .

For example, the dBZ histogram bin from 12 dB to 14 dB corresponds to the rain rate histogram bin from -6.88 dB to -5.63 dB.  The lowest dBR value (the lower boundary of the first bin) is 0.625 * 0.01 - 14.38 = -14.37 dB.  It is possible, however, for dBR to be less than this because the actual Z-R relationship used in 2a-25 differs from the nominal relationship.  In order to count all non-zero rain rates (under 'rain-certain' conditions), the lower boundary of the first dBR histogram bin is set to -20 dB rather than -14.37 dB.  This ensures that the number of data points categorized in the rain rate histogram is equal to the number of data points used in the calculation of the mean and standard deviation of this quantity.
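
The bin boundaries described above can be generated as in the following sketch.  The function names are invented for the example, and the treatment of values that fall exactly on a boundary (assigned to the upper bin) is an assumption, since the text does not state which side is inclusive.

```python
import bisect

def dbz_bin_edges():
    """31 boundaries (30 bins): 0.01, 12, 14, ..., 70 dB."""
    return [0.01] + [12.0 + 2.0 * k for k in range(30)]

def dbz_to_dbr(dbz):
    """Nominal Z = 200 R^1.6 expressed in dB."""
    return 0.625 * dbz - 14.38

def dbr_bin_edges(first_lower=-20.0):
    """Rain rate boundaries mapped from the dBZ boundaries, with the
    lowest boundary extended to first_lower (-20 dB in the text)."""
    edges = [dbz_to_dbr(e) for e in dbz_bin_edges()]
    edges[0] = first_lower
    return edges

def bin_index(value, edges):
    """Index of the bin containing value, or None if out of range.
    A value on a boundary goes to the upper bin (an assumption)."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect.bisect_right(edges, value) - 1

edges = dbr_bin_edges()
```

With these edges, a rain rate of -15 dBR, which would fall below the nominal first boundary, is still counted in bin 0 rather than being dropped from the histogram.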

vii. There are 3 types of rain rates that are defined in 3a-25.
 
The first is a 'near-surface' rain rate that is obtained from the range bin closest to the surface which is not corrupted by the surface clutter.  Two sets of products are computed from these data:  the first set of statistics uses only those rain rates for which rain is classified as 'certain'; the second set uses those rain rates for which rain is classified either as 'possible' or 'certain'.
 
The second type of rain rate is the path-averaged rain rate calculated by summing the values from the storm top (first gate where rain is detected) to the last gate (gate nearest to the surface uncontaminated by the surface clutter) and dividing by the number of gates in the interval.

The third type of rain rate is that at a fixed height above the ellipsoid (2, 4, 6, 10 and 15 km).  For an arbitrary incidence angle there will be several range gates that intersect the height: to estimate dBZm, dBZt and rain rate at that height, a Gaussian weighting is done in dB space for the reflectivity factors and in linear space for the rain rates.  This resampling lowers the minimum detectable threshold which, in turn, affects the histogram counts in the 2 lowest bins.  In other words, the histogram counts at the lowest 2 bins will generally be larger for the height-profiled quantities than for the 'near-surface' or 'path-averaged' quantities.
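
The second and third rain rate definitions can be sketched as follows.  This is a rough illustration, not the 3a-25 implementation: the function names are invented, gate indices are assumed to increase toward the surface, and the Gaussian width sigma_km is a hypothetical parameter because the text does not give the weighting width or the rule for selecting the intersecting gates.

```python
import math

def path_averaged_rain_rate(rates, top_gate, last_gate):
    """Mean rain rate from the storm-top gate to the last clutter-free
    gate, inclusive.  Gate index is assumed to increase toward the surface."""
    if top_gate > last_gate:
        raise ValueError("storm top must not lie below the last clutter-free gate")
    gates = rates[top_gate:last_gate + 1]
    return sum(gates) / len(gates)

def gaussian_resample(heights_km, values, target_km, sigma_km):
    """Gaussian-weighted average of the gate values around target_km.
    Pass dBZ values to average in dB space, or rain rates in mm/h to
    average in linear space."""
    weights = [math.exp(-0.5 * ((h - target_km) / sigma_km) ** 2)
               for h in heights_km]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Path average over gates 2..4 of a toy profile (gate 5 is cluttered).
rates = [0.0, 0.0, 1.0, 2.0, 3.0, 9.9]
path_avg = path_averaged_rain_rate(rates, top_gate=2, last_gate=4)

# Three gates straddling the 2 km level (heights above the ellipsoid).
heights = [1.75, 2.0, 2.25]
dbz_at_2km = gaussian_resample(heights, [20.0, 30.0, 40.0], 2.0, sigma_km=0.25)
rain_at_2km = gaussian_resample(heights, [1.0, 2.0, 3.0], 2.0, sigma_km=0.25)
```

Because the weighted average combines several gates, a weak echo in one gate can be pulled above or below the single-gate detection threshold, which is consistent with the change in the lowest histogram bins noted above.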