
Global Mapping International


Populated Places Database

Overview

Global Mapping International and the JESUS Film Project of Campus Crusade for Christ are developing a database of populated places (cities, towns, and villages) that combines the strengths, and minimizes the weaknesses, of the best freely available populated-place data sets. The resulting database includes records of approximately 2.3 million distinct places, suitable for use in mapping systems and in database systems that record the locations of persons, institutions, and events. The method allows the database to be updated as new sources, or new versions of existing sources, become available. Because the data preserves all alternate names of each place from the original sources, along with phonetic renderings of every name, it is particularly suitable for systems that must look up places that have several names or variant spellings.

Intended Uses

Currently anticipated uses for the data include:

    • Site and project location lookups for WorldMap.org's partners site, which allows ministries to track the locations of their sites and activities online
    • Improved map labeling and population-based map symbols for users of the Global Ministry Mapping System and other Geographic Information Systems (GIS)
    • Possible adoption of the unique place codes as a standard within the Harvest Information System
    • A wide variety of secular and ministry applications where it is desirable to locate churches, projects, or people based on a place name

Sources

The database includes the following data sources:

GeoNet Names Server (GNS)
  Advantages:
    • Many places (over 2 million)
    • Many alternate names (nearly 3 million)
    • Names transliterated consistently according to a published standard
  Disadvantages:
    • No coverage of the U.S.
    • Many records with poor coordinate precision (1 arc-minute)
    • Uneven depth of coverage across countries and across areas within countries
    • The USBGN standard transliteration is not the currently preferred local method in some areas

USGS GNIS
  Advantages:
    • Comprehensive for the U.S., and so complementary to GNS
    • Good populations for U.S. places (2000 Census)
    • Definitive names (USBGN standard)
  Disadvantages:
    • U.S. only

Global Gazetteer
  Advantages:
    • Populations for many places (over 100,000)
    • More comprehensive than GNS for a few countries
    • Names rendered more in keeping with local preferences in some areas (including some names in local scripts)
  Disadvantages:
    • Poor coordinate precision
    • Some coordinates missing (particularly for urban agglomerations)

DCW/VMAP1 Populated Place Points (available in VPF format via NGA Raster Roam)
  Advantages:
    • Very good coordinate accuracy
    • Consistent worldwide coverage
  Disadvantages:
    • Omits secondary cities in urban areas
    • Poor name rendering
    • No population data

DCW/VMAP1 Built-Up Area Polygon Centroids (available in VPF format via NGA Raster Roam)
  Advantages:
    • Consistent worldwide coverage
  Disadvantages:
    • Omits secondary cities in urban areas
    • Built-up area polygons frequently include places other than the one named
    • Poor name rendering
    • No population data

Instituto Nacional de Estadística, Geografía e Informática (INEGI) (available from CIESIN)
  Advantages:
    • Populations for a large number of places
    • Many alternate names
  Disadvantages:
    • Mexico only
    • No diacritics in names

Habitats Project (1994)
  Advantages:
    • Provides populations and alternate names for some places missed by other data sets
    • Good coverage of places and their relationships in urban areas
  Disadvantages:
    • Poor coordinate accuracy in some areas
    • Older populations (mostly early 1990s)

Method

The output data table is initially empty. For each distinct place in each source, all alternate names of nearby places already in the output table are compared phonetically with all alternate names of the place under consideration. If no match is found, the place is added to the output table. If a match is found, the existing entry in the output table is updated with any "better" information (name, coordinates, or population) from the new source. More detail is given under Detailed Method below.
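The matching loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's code: the fixed 10 km search radius, the hypothetical `phonetic_key` function (a crude consonant-skeleton stand-in for the modified Double Metaphone algorithm the project actually uses), and the rule that "better" data simply means filling in a missing population are all assumptions made for the example.

```python
import math
import re
import unicodedata
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Place:
    names: List[str]                  # all alternate names from one source
    lat: float
    lon: float
    population: Optional[int] = None


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres, assuming a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, a)))


def phonetic_key(name: str) -> str:
    """Hypothetical stand-in for Double Metaphone: fold to ASCII, keep letters,
    drop vowels after the first letter, collapse repeated letters."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    letters = re.sub(r"[^A-Z]", "", ascii_name.upper())
    if not letters:
        return ""
    key = letters[0] + re.sub(r"[AEIOUY]", "", letters[1:])
    return re.sub(r"(.)\1+", r"\1", key)


def merge_source(output: List[Place], source: List[Place],
                 radius_km: float = 10.0) -> List[Place]:
    """Add each place from `source` to `output` unless a phonetic match lies nearby."""
    for place in source:
        keys = {phonetic_key(n) for n in place.names} - {""}
        match = None
        for existing in output:
            near = haversine_km(place.lat, place.lon, existing.lat, existing.lon) <= radius_km
            if near and keys & {phonetic_key(n) for n in existing.names}:
                match = existing
                break
        if match is None:
            output.append(Place(list(place.names), place.lat, place.lon, place.population))
        else:
            # Assumed "better data" rule: keep all names, fill a missing population.
            match.names.extend(n for n in place.names if n not in match.names)
            if match.population is None:
                match.population = place.population
    return output
```

Because every new place is compared against the whole output table, this naive version is quadratic in the number of places; a run over millions of records would need a spatial index so that only nearby entries are compared.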

Status of Work

A first run over all of the data was completed in January 2007 and required 8 weeks of continuous processing. The results were close to what we wanted, but too many places were merged together in cases where a person looking at the data would judge the names too dissimilar, or the places too far apart, to be the same place. As of April 2007 we are:

    • revising the matching algorithm to improve phonetic discrimination
    • decreasing the average radius searched for matches, making it dependent on the population of the place and the accuracy of the source
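A search radius that depends on the population of the place and the accuracy of the source might be expressed as in the following sketch. The formula and every constant in it are hypothetical: it assumes a settlement's extent grows roughly with the square root of its population, adds the positional uncertainty of the source (treating one arc-minute as about 1.852 km, the length of a minute of latitude), and clamps the result to an illustrative range.

```python
import math
from typing import Optional

KM_PER_ARC_MINUTE = 1.852  # one minute of latitude is one nautical mile


def search_radius_km(population: Optional[int], precision_arcmin: float,
                     min_km: float = 2.0, max_km: float = 30.0) -> float:
    """Hypothetical radius: settlement extent plus source coordinate uncertainty."""
    # Assumed extent: 1 km per 10,000 people under a square-root law;
    # an unknown (or zero) population contributes nothing.
    extent = math.sqrt(population) / 100.0 if population else 0.0
    uncertainty = precision_arcmin * KM_PER_ARC_MINUTE
    return max(min_km, min(max_km, extent + uncertainty))
```

Under these assumptions a village of 10,000 from a source with 1 arc-minute precision (such as many GNS records) gets a radius of about 2.85 km, while a city of 1,000,000 from a highly accurate source gets about 10 km.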

We currently expect that results will be available in the second half of 2007.

Limitations

While the process described here draws each component of the result (name, coordinates, and population) from the source considered most reliable, many places have data from only a single source, so for those places the result can be no more reliable than that one source.

Contacts

Contact at Global Mapping International or at JESUS Film Project for additional information on this project.

Donations

This project has been funded to date by contributions to Global Mapping International and the JESUS Film Project. Your contribution will help the continuing work of updating and improving this resource.

Detailed Method

The general method used to merge the various sources is as follows:

    • Decompose each data source into entries in two tables:
      • places containing for each (supposedly) unique place in each source:
        • A unique ID composed of a source identifier and a place ID within the original source
        • The preferred name of the place for GIS labeling purposes (chosen from among the names provided in the source)
        • Place and urban agglomeration populations
      • names containing for each alternate name of each entry in places:
        • Place identifier
        • Name in Unicode
        • Name in diacritic-stripped plain ASCII
        • A "normalized" name, with punctuation removed and common words (e.g. the various language versions of "a", "the", "saint", "city", etc.) either removed or converted to a standard abbreviation
        • Primary and secondary phonetic equivalents of the name using a modified version of Philips' Double Metaphone algorithm.
    • Populate a third table gis with merged, composite records (similar in structure to the places table, but with source identifiers for each component), according to the following general rules for each entry in places:
      • Search for gis table entries that lie within a stated distance and share at least one Double Metaphone code with the current place, comparing all of their corresponding entries in the names table.
      • If zero matching entries are found, copy the places record to the gis table
      • If exactly one entry is found, update each component of that gis record (code, name, coordinates, population) for which the current places record has better data
      • If more than one entry is found, merge the existing gis records according to the above rules until a single entry remains, then update that record with the new data as above.
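The ASCII and normalized name columns of the names table could be derived roughly as in the sketch below. The word lists are a small hypothetical sample (the project's actual lists of articles, "saint" variants, and generic words are not given here), the `source:id` string is a hypothetical format for the source-plus-original-ID identifier, and the phonetic columns are left out because the modified Double Metaphone algorithm is not reproduced in this document.

```python
import re
import unicodedata
from typing import Dict, List

# Hypothetical samples of the word lists; real lists would be much longer.
DROP_WORDS = {"the", "el", "la", "le", "city", "ciudad"}
ABBREVIATIONS = {"saint": "st", "sankt": "st", "san": "st", "santo": "st",
                 "sao": "st", "sainte": "ste", "santa": "sta"}


def to_ascii(name: str) -> str:
    """Strip diacritics: decompose (NFKD), then drop non-ASCII code points."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def normalize(name: str) -> str:
    """Lower-case, replace punctuation with spaces, drop or abbreviate common words."""
    words = re.sub(r"[^a-z0-9]+", " ", to_ascii(name).lower()).split()
    kept = [ABBREVIATIONS.get(w, w) for w in words if w not in DROP_WORDS]
    return " ".join(kept)


def name_rows(source: str, source_id: str, names: List[str]) -> List[Dict[str, str]]:
    """Build one names-table row per alternate name of a single place."""
    place_id = f"{source}:{source_id}"  # hypothetical ID format
    return [{"place_id": place_id, "unicode": n, "ascii": to_ascii(n),
             "normalized": normalize(n)} for n in names]
```

With these lists, "Saint-Étienne" normalizes to "st etienne" and "São Paulo" to "st paulo", so variants that differ only in diacritics, punctuation, or the language of "saint" produce the same normalized form.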
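The zero, one, and many-match rules for the gis table can be sketched as below. The sketch assumes, hypothetically, that "better" data is decided by a per-source reliability rank for each component (lower rank wins, unknown sources rank last), and that the matching gis entries have already been found by the distance and Double Metaphone test. The rank values are illustrative rather than the project's actual ordering, and the place code component is omitted for brevity. Each component is stored together with the identifier of the source it came from.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical reliability ranks per component (lower rank = more reliable).
RANK: Dict[str, Dict[str, int]] = {
    "name": {"GNIS": 0, "GNS": 1, "VMAP1": 3},
    "coordinates": {"VMAP1": 0, "GNIS": 1, "GNS": 2},
    "population": {"GNIS": 0, "INEGI": 1, "GlobalGazetteer": 2},
}

Parts = Dict[str, Tuple[object, str]]   # component -> (value, source identifier)


@dataclass
class GisRecord:
    parts: Parts = field(default_factory=dict)


def is_better(component: str, new_source: str, old_source: str) -> bool:
    ranks = RANK.get(component, {})
    return ranks.get(new_source, 99) < ranks.get(old_source, 99)


def absorb(target: GisRecord, parts: Parts) -> None:
    """Take each component that is missing or comes from a better-ranked source."""
    for comp, (value, source) in parts.items():
        if value is None:
            continue
        current = target.parts.get(comp)
        if current is None or current[0] is None or is_better(comp, source, current[1]):
            target.parts[comp] = (value, source)


def add_place(gis: List[GisRecord], parts: Parts, matches: List[int]) -> None:
    """Apply the zero/one/many rules; `matches` are indices of matching gis entries."""
    if not matches:                                  # zero: copy the record
        gis.append(GisRecord(dict(parts)))
        return
    keep = gis[matches[0]]
    for i in sorted(matches[1:], reverse=True):      # many: fold extras into one
        absorb(keep, gis[i].parts)
        del gis[i]
    absorb(keep, parts)                              # then update with new data
```

Deleting the extra records in descending index order keeps the remaining indices valid while the merge proceeds.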