Introduction to Global Precipitation Algorithms and Data Sets

George J. Huffman
21 January 2013, rev. 27 November 2017

How Do I Choose a Data Set?

In the modern age the good news is that there is a rich selection of precipitation products from which data users can choose. However, this relative plethora of products is also a problem for users, requiring them to determine which product(s) to use. This document is intended to assist in this process by providing information and pointers to additional information on the algorithms that are used to compute precipitation estimates and the data sets that provide the estimates. The focus here is on data sets that provide

  • global (or nearly global) coverage, with
  • multi-year periods of record, that are
  • publicly available, and
  • primarily constructed from observations and observational retrievals.

The data tables are designed to let the user filter through the various data sets quickly, since data set resolution, period of record, regional coverage, latency, and format are major considerations. The data set provider's attitude toward user support can be gauged by the completeness and friendliness of its documentation.

1. Background

The observational data that feed into these precipitation data sets come from observing systems both at the Earth's surface and on satellites. Most of the data are from passive satellite observations. Thermal infrared (IR) data are routinely provided by geosynchronous Earth orbit (geo) satellites, which allow very frequent imaging. IR sensors primarily "see" cloud tops, so the correlation to precipitation at fine scales is weak. However, averaging over time/space scales of a day and 2.5°×2.5° latitude/longitude (or more) significantly improves the quality of the estimates (Arkin and Meisner 1987). In contrast, observations at microwave frequencies (roughly 10-200 GHz) detect the profiles of hydrometeors in the atmosphere, which is physically more direct than the IR relation. Unfortunately, microwave sensors have flown only on low-Earth-orbit (leo) satellites, each of which typically provides no more than two snapshots of a given location per day. Such sparse sampling requires the use of multiple satellites to achieve fine-scale temporal coverage. Even if the user needs only five-day, monthly, or longer time averages, the short time scales of precipitation processes require multiple satellites to avoid significant sampling error. Finally, at high latitudes estimates from both IR and microwave sensors tend to be unreliable. This is a topic of current research, but more approximate schemes based on satellite estimates of temperature, humidity, and cloud cover are already in use (Adler et al. 2003).
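As a minimal illustration of the space/time averaging described above, the Python sketch below coarsens a fine-resolution grid of estimates by averaging non-overlapping blocks of cells (for example, 0.25° cells into 2.5° boxes with a factor of 10) and averages a sequence of grids over time. The function names are hypothetical, grids are assumed to be plain lists of rows, and missing values are assumed to be marked None. The sketch uses an unweighted mean within each block, ignoring the small change in cell area with latitude inside a block.

```python
# Sketch only: hypothetical helpers for averaging gridded precipitation
# estimates in space (block averaging) and time. Grids are lists of rows;
# None marks a missing value.


def block_average(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2-D grid.

    Uses an unweighted mean of the valid cells in each block; a block with
    no valid cells is returned as None.
    """
    nrows, ncols = len(grid), len(grid[0])
    if nrows % factor or ncols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for i0 in range(0, nrows, factor):
        row = []
        for j0 in range(0, ncols, factor):
            vals = [grid[i][j]
                    for i in range(i0, i0 + factor)
                    for j in range(j0, j0 + factor)
                    if grid[i][j] is not None]
            row.append(sum(vals) / len(vals) if vals else None)
        coarse.append(row)
    return coarse


def time_average(grids):
    """Cell-by-cell mean of a sequence of same-shaped grids, skipping None."""
    nrows, ncols = len(grids[0]), len(grids[0][0])
    out = []
    for i in range(nrows):
        row = []
        for j in range(ncols):
            vals = [g[i][j] for g in grids if g[i][j] is not None]
            row.append(sum(vals) / len(vals) if vals else None)
        out.append(row)
    return out
```

For example, eight 3-hourly 0.25° grids could be reduced to one daily 2.5° grid with block_average(time_average(grids), 10).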

At the surface, the only observations that meet the requirements for spatial coverage and length of record are precipitation gauges (popularly referred to as "rain gauges"), and even they suffer large gaps, principally over the oceans and in sparsely settled land regions. Nonetheless, gauge data are used both as input to combination data sets and in separate gauge analyses because they are the closest thing we have to actual measurements, with appropriate caution in areas of sparse coverage.

In the next two sections the algorithms and data sets are introduced. The listings on which these sections are based have been assembled, and are maintained, by the International Precipitation Working Group (IPWG). The IPWG's parent body is the Coordinating Group for Meteorological Satellites, which is the World Meteorological Organization group focused on the operational use of meteorological satellites; the IPWG is charged with advancing precipitation science and promoting the use of precipitation data. The IPWG home page is

2. Algorithms

Algorithms are the computational procedures used to convert satellite and surface data into precipitation estimates. For satellite data, the first tier of algorithms converts the channel data from each individual satellite into precipitation estimates. In very general terms, this means taking advantage of the characteristics of the various individual data sources discussed in the previous section. Readers should consult the algorithm pages on the IPWG web site or recent reviews (Kidd and Huffman 2011, Tapiador et al. 2012) for details of how this is done, but a few overarching comments are in order. One important implication of IR's dependence on cloud tops is that no first-principles IR precipitation estimate is possible; every IR algorithm is calibrated with some other source of precipitation data. In some cases this calibration is static (for example, the GOES Precipitation Index, GPI): it is computed for a particular developmental data set and then applied without further modification. In other cases the calibration is dynamic (for example, the Threshold-Matched Precipitation Index, TMPI, used in the GPCP One-Degree Daily (1DD) product), being recomputed on a schedule prescribed in the algorithm.
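To make the static/dynamic distinction concrete, the sketch below implements a GPI-style fixed calibration in its commonly cited form (3 mm/h assigned to the fraction of IR pixels colder than 235 K in a grid box, following Arkin and Meisner 1987) and a simplified dynamic calibration in the spirit of threshold matching: the brightness-temperature threshold is chosen so that the cold-pixel fraction matches the raining fraction in coincident microwave estimates, and the rate is chosen so that the mean IR estimate matches the mean microwave rate. The dynamic part is an illustrative assumption, not the operational TMPI code, and the function names are hypothetical.

```python
# Sketch only: a static GPI-style calibration and a simplified dynamic
# (threshold-matched) calibration. Function names are hypothetical.

GPI_THRESHOLD_K = 235.0  # cold-cloud brightness temperature threshold (K)
GPI_RATE_MM_H = 3.0      # rain rate assigned to cold-cloud pixels (mm/h)


def gpi_rate(tb_pixels, threshold=GPI_THRESHOLD_K, rate=GPI_RATE_MM_H):
    """Box-average rain rate: rate times the fraction of pixels colder than threshold."""
    cold = sum(1 for tb in tb_pixels if tb < threshold)
    return rate * cold / len(tb_pixels)


def threshold_matched_calibration(tb_pixels, mw_rates):
    """Simplified dynamic calibration from coincident IR and microwave samples.

    Chooses a threshold so that the cold-pixel fraction equals the fraction of
    raining microwave samples, then a rate so that the mean IR estimate equals
    the mean microwave rate. Ties in tb_pixels can make the match inexact.
    """
    if len(tb_pixels) != len(mw_rates) or not tb_pixels:
        raise ValueError("need equal-length, non-empty coincident samples")
    n_rain = sum(1 for r in mw_rates if r > 0)
    if n_rain == 0:
        # No rain in the microwave data: a threshold at the coldest pixel
        # flags nothing, and the rate is zero.
        return min(tb_pixels), 0.0
    # Place the threshold just above the n_rain-th coldest pixel.
    threshold = sorted(tb_pixels)[n_rain - 1] + 1e-6
    # rate * (n_rain / n) then equals sum(mw_rates) / n, the microwave mean.
    rate = sum(mw_rates) / n_rain
    return threshold, rate
```

In a dynamic scheme the calibration would be recomputed on the algorithm's schedule as new coincident data arrive, and the resulting threshold and rate passed to gpi_rate in place of the fixed 235 K and 3 mm/h.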

The precipitation signal in microwave data actually results from two physical effects, emission and scattering. Emission, which dominates at frequencies below 37 GHz, arises primarily from microwave energy emitted by the surface, atmospheric water vapor, and liquid cloud and hydrometeor drops. Scattering, which dominates at frequencies above 37 GHz, results from solid hydrometeors scattering upwelling microwave energy out of the line of sight of the satellite instrument. Solid hydrometeors tend to be confined to the upper levels of clouds, so the scattering signal is less sensitive to surface precipitation than the emission signal. Also, snow, ice, and frost on the ground are highly effective at scattering, so such surfaces prevent useful scattering-based precipitation estimates. Both emission and scattering are useful over ocean, but only scattering is useful over land, because the land's high emissivity and strong heterogeneity defeat current-generation emission algorithms. Accordingly, microwave estimates over land tend to be considered less reliable. Developing snow algorithms is a major focus of current research, including in the Global Precipitation Measurement (GPM) mission. Coastal areas and regions with complex terrain also pose particular challenges for microwave-based algorithms.
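The qualitative rules in this paragraph can be summarized as a small decision function. The sketch below is illustrative only: it treats 37 GHz as the dividing frequency and, as an assumption, assigns a channel at exactly 37 GHz to the scattering side (in practice channels near 37 GHz respond to both effects). The surface categories and function names are hypothetical, and coastal and complex-terrain cases are not modeled.

```python
# Sketch only: which microwave precipitation signals a retrieval could use
# over different surfaces, following the qualitative rules in the text.
from enum import Enum

DIVIDING_FREQ_GHZ = 37.0  # emission dominates below, scattering above


class Surface(Enum):
    OCEAN = "ocean"
    LAND = "land"
    SNOW_ICE = "snow_ice"  # snow, ice, or frost on the ground


def dominant_signal(freq_ghz):
    """Emission below the dividing frequency; scattering at or above it (assumed)."""
    return "emission" if freq_ghz < DIVIDING_FREQ_GHZ else "scattering"


def usable_signals(surface):
    """Signals that can support a precipitation estimate over this surface."""
    if surface is Surface.OCEAN:
        return {"emission", "scattering"}
    if surface is Surface.LAND:
        # High, heterogeneous land emissivity defeats emission retrievals.
        return {"scattering"}
    # Snow, ice, and frost scatter strongly, masking the precipitation signal.
    return set()


def usable_channels(channels_ghz, surface):
    """Keep the channels whose dominant signal is usable over this surface."""
    signals = usable_signals(surface)
    return [f for f in channels_ghz if dominant_signal(f) in signals]
```

For instance, usable_channels([10.65, 19.35, 85.5], Surface.LAND) keeps only the 85.5 GHz channel, while the same call with Surface.OCEAN keeps all three.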

Analyses of precipitation gauge data can be considered single-sensor algorithms as well. Assembling, quality-controlling, and archiving the necessary data records from the international collection of weather stations is a major barrier to creating such data sets. Another issue is that the effects of wind, evaporation, and side-wetting, usually resulting in an undercatch of precipitation, should be considered in the analysis. Wind losses are particularly significant in snow events (Legates 1987; Sevruk 1989).
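Gauge-undercatch corrections are commonly written as a multiplicative wind-loss factor applied to the measured amount plus estimated wetting and evaporation losses, with separate factors for liquid and solid precipitation because wind losses are much larger for snow. The sketch below applies a correction of that general form. The function names are hypothetical, and the correction factors must come from a published correction model or the data provider, not from this example.

```python
# Sketch only: a gauge-undercatch correction of the general form
#   corrected = k * (measured + wetting_loss + evaporation_loss),
# with k >= 1 the wind-loss factor. All names are hypothetical; real
# factors depend on gauge type, exposure, wind speed, and phase.


def correct_undercatch(measured_mm, wind_factor,
                       wetting_loss_mm=0.0, evaporation_loss_mm=0.0):
    """Return the undercatch-corrected amount for one precipitation phase."""
    if wind_factor < 1.0:
        raise ValueError("wind factor should be >= 1 (undercatch)")
    if min(measured_mm, wetting_loss_mm, evaporation_loss_mm) < 0:
        raise ValueError("amounts and losses must be non-negative")
    return wind_factor * (measured_mm + wetting_loss_mm + evaporation_loss_mm)


def correct_rain_and_snow(rain_mm, snow_mm, k_rain, k_snow):
    """Correct liquid and solid parts separately; the snow factor k_snow
    is typically much larger than k_rain because of wind losses."""
    return correct_undercatch(rain_mm, k_rain) + correct_undercatch(snow_mm, k_snow)
```

With purely illustrative factors of 1.0 for rain and 1.5 for snow, 4 mm of each would be corrected to 10 mm in total, showing how strongly the snow factor can dominate the adjustment.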

A second tier of algorithms uses various combinations of satellite sensors. Note well that, except for the TRMM Combined Instrument (TCI) algorithm, which uses TRMM Precipitation Radar (PR) and TRMM Microwave Imager (TMI) channel data, all of the combination schemes combine precipitation products from the various individual sensors. The general goal is to provide denser time/space coverage with higher quality than any single-sensor algorithm can provide. Although the combination schemes seek to emphasize strengths and minimize weaknesses in the individual-sensor estimates, many of the shortcomings found in combination algorithm results are directly traceable to errors in the individual-sensor estimates.
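One common way to merge several estimates for the same grid box and time is to weight each by the inverse of its error variance, so that more reliable inputs count for more. The sketch below shows that technique in generic form. It assumes each input arrives with an error-variance estimate and that the input errors are independent; it is not presented as the method of any particular data set in the IPWG listings, and the function name is hypothetical.

```python
# Sketch only: inverse-error-variance weighting of several precipitation
# estimates for one grid box and time. Assumes independent errors.


def combine_inverse_variance(estimates):
    """Combine (value, error_variance) pairs.

    Entries with a missing value (None) or non-positive variance are skipped.
    Returns (combined_value, combined_variance), or None if nothing is usable.
    """
    valid = [(v, var) for v, var in estimates
             if v is not None and var is not None and var > 0]
    if not valid:
        return None
    weights = [1.0 / var for _, var in valid]
    total_weight = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, valid)) / total_weight
    # With independent errors, the combined error variance is 1 / sum(1/var).
    return value, 1.0 / total_weight
```

The weighting only redistributes trust among the inputs: if every input shares the same bias, the combined value inherits it, which is why combination results remain traceable to errors in the individual-sensor estimates.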

The IPWG seeks algorithm summaries both from developers whose work results in the data sets discussed in the next section and from developers whose algorithms are not represented there. The latter group might provide an early look at possible future directions for quasi-operational data sets. Of course, not every data set in the IPWG listings has a corresponding algorithm description. The algorithm descriptions are posted at

3. Data Sets

The actual precipitation data sets that are considered global (or nearly global), multi-year, and publicly available are summarized in four tables posted by the IPWG. These tables are organized along the lines discussed above to help users more easily compare products:
  • Table 1: Combination datasets with gauge data; data sets that are produced by combining input data from several sensor types, including satellite sensors and precipitation gauges.
  • Table 2: Satellite combination datasets; data sets that are produced by combining input data from several satellite sensor types.
  • Table 3: Single-source datasets; data sets from a single satellite sensor type.
  • Table 4: Precipitation gauge analyses; data sets from precipitation gauge data.

The goal is to provide the user with a basic listing of the data sets' characteristics, namely the algorithm, input data, space/time scales, area coverage/start date, update frequency (how often the data set is produced), and latency (how long after observation time the data set is produced), to facilitate a quick comparison, together with a URL (preferably) or an e-mail address for accessing the data. Most novice users are likely to be best served by the data sets in Table 1; users for whom near-real-time access is important will find such data sets in Table 2.
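The quick comparison described above amounts to filtering a catalog on a few characteristics. The sketch below shows one way a user might encode some of the table columns and screen entries. The record fields and the select function are hypothetical, and real values must be taken from the IPWG tables themselves.

```python
# Sketch only: screening a catalog of data set characteristics. The record
# fields and select() are hypothetical; fill entries from the IPWG tables.
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DatasetEntry:
    name: str
    table: int          # 1-4, following the IPWG table grouping
    grid_deg: float     # grid spacing (degrees)
    time_step_h: float  # time resolution (hours)
    start: date         # start of the period of record
    latency_h: float    # hours after observation time that data appear


def select(entries: List[DatasetEntry], max_grid_deg: float,
           max_time_step_h: float, start_by: date,
           max_latency_h: Optional[float] = None) -> List[DatasetEntry]:
    """Entries at least as fine as requested, starting by start_by, and
    (optionally) available within max_latency_h; sorted by table, then grid."""
    hits = [e for e in entries
            if e.grid_deg <= max_grid_deg
            and e.time_step_h <= max_time_step_h
            and e.start <= start_by
            and (max_latency_h is None or e.latency_h <= max_latency_h)]
    return sorted(hits, key=lambda e: (e.table, e.grid_deg))
```

Sorting by table number first mirrors the advice above: gauge-adjusted combinations (Table 1) come first, with near-real-time satellite combinations (Table 2) surfacing when a latency limit is set.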

All of these data sets, including the precipitation gauge analyses, are provided on regular spatial grids, meaning they provide area averages. All of the datasets employ a uniform latitude/longitude grid at the present time. Thus, users who wish to have spatial averages over defined regions, such as a country or watershed, must compute these quantities themselves. However, several data centers are researching the possibility of routinely providing averages over defined regions, most likely countries and specific watersheds. In the time dimension there is more diversity in whether the data represent snapshots in time or explicit time averages. However, all the datasets organize the data as a time sequence of maps; time sequences at individual locations must be constructed by the user. Some data archive sites provide simple data display tools; users should consult the sites' documentation pages.
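Because regional averages and point time series are left to the user, the sketch below shows one way to compute both from maps on a uniform latitude/longitude grid. On such a grid the area of a cell is proportional to the cosine of its center latitude, so that cosine is used as the averaging weight. The function names are hypothetical; grids are assumed to be lists of rows ordered by latitude, the region is a simple latitude/longitude box (a country or watershed would need a mask), and longitude wrap-around at the date line is not handled.

```python
# Sketch only: area-weighted regional mean and point time series from
# maps on a uniform latitude/longitude grid. Names are hypothetical.
import math


def cell_centers(first_center, spacing, n):
    """Cell-center coordinates along one axis of a uniform grid."""
    return [first_center + k * spacing for k in range(n)]


def regional_mean(grid, lats, lons, lat_min, lat_max, lon_min, lon_max):
    """cos(latitude)-weighted mean over cells whose centers lie in the box.

    On a uniform lat/lon grid, cell area is proportional to the cosine of
    the center latitude. Missing cells (None) are skipped; returns None if
    no valid cell falls in the box. Longitude wrap-around is not handled.
    """
    num = den = 0.0
    for i, lat in enumerate(lats):
        if not lat_min <= lat <= lat_max:
            continue
        w = math.cos(math.radians(lat))
        for j, lon in enumerate(lons):
            if lon_min <= lon <= lon_max and grid[i][j] is not None:
                num += w * grid[i][j]
                den += w
    return num / den if den > 0 else None


def point_series(maps, lats, lons, lat, lon):
    """Values at the grid cell nearest (lat, lon) from a time sequence of maps."""
    i = min(range(len(lats)), key=lambda k: abs(lats[k] - lat))
    j = min(range(len(lons)), key=lambda k: abs(lons[k] - lon))
    return [m[i][j] for m in maps]
```

For a global 1° grid, for example, the axes would be cell_centers(-89.5, 1.0, 180) and cell_centers(-179.5, 1.0, 360); applying regional_mean to each map in a time sequence yields a regional time series.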

In addition to the data set tables, the IPWG provides a listing of "products", which in some cases duplicates entries in the data set tables. However, this listing also references more general web sites, some of which provide value-added services, such as online analysis capabilities for some subset of the precipitation data sets.

4. Contacts

We anticipate that most questions about the algorithms and data sets are properly directed to the relevant experts. Questions about this summary and the listings may be directed to the IPWG webmaster (Vincenzo Levizzani) or to the document custodian (George Huffman).

5. References

Adler, R.F., G.J. Huffman, A. Chang, R. Ferraro, P. Xie, J. Janowiak, B. Rudolf, U. Schneider, S. Curtis, D. Bolvin, A. Gruber, J. Susskind, P. Arkin, E.J. Nelkin, 2003: The Version 2 Global Precipitation Climatology Project (GPCP) Monthly Precipitation Analysis (1979-Present). J. Hydrometeor., 4, 1147-1167.

Arkin, P.A., B.N. Meisner, 1987: The Relationship between Large-Scale Convective Rainfall and Cold Cloud over the Western Hemisphere during 1982-1984. Mon. Wea. Rev., 115, 51-74.

Kidd, C., G.J. Huffman, 2011: Global Precipitation Measurement. Meteor. Appl., 18, 334-353, doi:10.1002/met.284.

Legates, D.R., 1987: A Climatology of Global Precipitation. Publications in Climatology, 40, Univ. of Delaware, 85 pp.

Sevruk, B., 1989: Reliability of Precipitation Measurements. Proc. WMO/IAHS/ETH Workshop on Precipitation Measurements, St. Moritz, Switzerland, World Meteor. Org., 13-19.

Tapiador, F.J., F.J. Turk, W. Petersen, A.Y. Hou, E. Garcia-Ortega, L.T. Machado, C.F. Angelis, P. Salio, C. Kidd, G.J. Huffman, M. de Castro, 2012: Global Precipitation Measurement: Methods, Datasets and Applications. Atmos. Res., 104-105, 70-97, doi:10.1016/j.atmosres.2011.10.021.