HIS Registry of Geography

Date: 2002-06-23
Status: Draft Standard. This is only a preliminary draft that is still under development.
Abstract:

Documents the Registry of Geography (ROG) for the Harvest Information System (HIS). This registry defines the standardized codes used for identifying geopolitical areas of the world.

Editor: Bill Dickson, Global Mapping International (mailto:bill@gmi.org)

Table of contents

  1. Overview
  2. Code tables
  3. Other tables
  4. Change management
  5. Change history
  6. Known Limitations
  7. Distribution

1. Overview

The function of the Registry of Geography (ROG) is to document a set of standardized codes used for identifying geopolitical areas. The present code covers three levels of political geography: World (Level 0), Countries and Areas of Special Sovereignty (Level 1), and first-level administrative divisions within countries (Level 2). Codes at Levels 1 and 2 are an extension of the codes used by the United States National Imagery and Mapping Agency (NIMA), generally referred to as Federal Information Processing Standard (FIPS) 10-4 and subsequent changes. The base code set used here is obtained by direct download from NIMA and contains codes which have not yet been published in the formal FIPS 10-4 standard. The registry includes cross-references to other coding and naming systems, including the ISO 3166 standard for country names. The registry is designed to be extensible to lower levels of political subdivisions within countries and to accommodate intermediate levels of political subdivisions.

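The registry contains a code table, two supplementary tables, and a change history table:

  ROG_Political (code table)
  ROG_PoliticalXref (supplementary table)
  ROG_Sources (supplementary table)
  ROG_PoliticalChangeHistory (change history table)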

The tables relate to each other as indicated in the following diagram:

It is anticipated that other HIS registries will use these codes preferentially in describing the geographic locations of the entities they describe.

2. Code tables

The registry contains just one code table: ROG_Political.

ROG_Political: Code for Geopolitical Regions

The domain of the categories for this code set is the politically-defined land areas of the world. Codes of variable length using printable ASCII characters are defined at different levels of political geography. At present, the level 0 (World) code is one character long; level 1 codes (countries and areas of special sovereignty) are two characters long; and level 2 codes (first-level political subdivisions within countries) are four characters long, with the first two characters being the code of the country to which the subdivision belongs. Our intent, wherever reasonably possible, is to continue this pattern to lower levels of political geography, adding two characters at each level up to a maximum code length of 16 characters.

We recognize that situations could arise where more political subdivisions exist at a particular level than can be encoded with two additional characters. The tilde (~) character appearing anywhere in a code is therefore reserved to indicate that the code beyond the tilde departs from the standard two-characters-per-level hierarchical scheme.

At levels 1 and 2, the majority of codes are derived from the NIMA/FIPS 10-4 coding system, which uses two upper-case letters for the code at level 1 and suffixes two additional digits or upper-case letters at level 2 (ADM1 in the NIMA feature designation system). Codes assigned by GMI to supplement the NIMA codes will be recognizable because the portions of the hierarchical code created by GMI use characters not used by NIMA, i.e. lower-case letters or other printable ASCII characters.
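The code structure described above can be illustrated with a short Python sketch. It is not part of the registry: the function names parse_code and assigned_by_gmi, the decision to reject the space character, and the sample code AEx1 are assumptions made purely for illustration.

```python
import string

MAX_CODE_LENGTH = 16
UPPER = set(string.ascii_uppercase)
UPPER_OR_DIGIT = UPPER | set(string.digits)


def parse_code(code):
    """Split a code into two-character segments, one per level.

    Returns the level implied by the code length (None when a tilde makes
    the length unreliable), the segments, and any non-standard tail.
    """
    if not 1 <= len(code) <= MAX_CODE_LENGTH:
        raise ValueError("code must be 1 to 16 characters long")
    # Assumption: "printable ASCII" is taken to exclude the space character.
    if any(not ("!" <= ch <= "~") for ch in code):
        raise ValueError("code must use printable, non-space ASCII characters")

    head, tilde, tail = code.partition("~")
    if len(head) == 1 and not tilde:
        # A single character is the level 0 (World) code.
        return {"level": 0, "segments": [head], "tail": None}
    if len(head) % 2 != 0:
        raise ValueError("standard part of the code must have even length")

    segments = [head[i:i + 2] for i in range(0, len(head), 2)]
    level = None if tilde else len(segments)
    return {"level": level, "segments": segments, "tail": tail if tilde else None}


def assigned_by_gmi(code):
    """Guess whether a level 1 or level 2 code contains a GMI-assigned part."""
    parsed = parse_code(code)
    if parsed["level"] not in (1, 2):
        raise ValueError("only defined for level 1 and level 2 codes")
    if not set(parsed["segments"][0]) <= UPPER:
        return True
    if parsed["level"] == 2:
        return not set(parsed["segments"][1]) <= UPPER_OR_DIGIT
    return False


print(parse_code("AE01"))       # level 2, segments AE and 01
print(assigned_by_gmi("AEx1"))  # hypothetical GMI-style code
```

Because a tilde signals a departure from the two-characters-per-level pattern, the sketch reports the level as unknown for such codes instead of guessing it from the length.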

The code table for ROG_Political contains the following columns:

Column Format Description
Level decimal(2,1) The level of the geopolitical region in a geopolitical hierarchy, with 0.0 representing the World, 1.0 representing countries and areas of special sovereignty, 2.0 representing first-level administrative divisions within countries, etc. Non-integer values may be used in future versions of the standard to indicate intermediate levels. As an example, the NIMA code we adopt uses counties as the first-level political divisions of the United Kingdom. A higher level of political subdivision, with entities such as England, Scotland and Wales, exists within the United Kingdom. Were it to prove desirable to add codes for these entities, they might be assigned a Level of 1.5.
Code char(16) The HIS code for the geopolitical region.
Status char(1) A one-letter code with one of three possible values for the status of the geopolitical region code:
  C Current NIMA. The code is believed to be current in the NIMA code set.
  H Current HIS. A current code assigned by the HIS steward for a currently-valid geopolitical region not represented in the NIMA code set.
  R Retired. A code for a geopolitical region that is not currently in existence, but may validly appear in historical data. Note: Retired codes are not currently included in the standard, but may be added in the future if the HIS steward receives requests for codes for historically-valid geopolitical entities.
Name varchar(128) A transliteration of the region name represented in standard ASCII with all diacritics removed. Some symbols may be included, such as apostrophes, exclamation points, or other ASCII characters, to represent speech sounds not present in English. In the case of countries, the name used is the shortest conventional English name of the country.
NameSrc char(8) Source from which the region names are derived; see ROG_Sources for values.
NameD varchar(254) A transliteration of the region name into Latin characters with diacritics. In general, these are intended to represent the local pronunciation of the name more accurately. See the note under Status above for names of Retired codes. Where the source of the information is indicated as NIMA, the relationship between the romanized name and the local alphabet is documented in Romanization Systems and Roman-Script Spelling Conventions, item TBGNROMANGUIDE, available from the U.S. Geological Survey at http://mac.usgs.gov/mac/isb/pubs/forms/nimapl.html.
NameDChar char(1) The character encoding used to represent the diacritical form of the name. Two encodings are currently used. "I" indicates ISO 8859-1 single-byte Latin 1 characters (equivalent for practical purposes to the Microsoft Windows ANSI code page 1252 character encoding); "H" indicates HTML-encoded text, potentially with Unicode characters encoded in hexadecimal in accordance with HTML 4.0 specifications. In general, the character set indicated is the one necessary to represent all data from a particular source, so some records from a source indicated as, for example, HTML-encoded Unicode may not contain any Unicode values. Where diacritical names originate from NIMA, translation from the original NIMA 8-bit code sets to Unicode has been done in accordance with http://164.214.2.59/gns/html/regions.pdf. Many fonts claiming to be Unicode compatible, and particularly those provided by Microsoft at the time of writing, do not include all of the Unicode characters required (particularly applied diacritics) to render NIMA names translated into Unicode according to NIMA specifications. A font that works well for this purpose is distributed by Thesaurus Indogermanischer Text- und Sprachmaterialien (TITUS) and may be downloaded at http://titus.fkidg1.uni-frankfurt.de/unicode/tituut.asp. The TITUS Cyberbit Basic font is a TrueType font for Microsoft Windows, and may only be used for non-commercial purposes.
NIMA_Reg integer For NIMA-derived names, the region code identifying the original regional character set from which the HTML-encoded Unicode name was derived.

The SQL statement for creating this table is as follows:

CREATE TABLE ROG_Political (
  Level     decimal(2,1) NOT NULL,
  Code      char(16) NOT NULL,
  Status    char(1) NOT NULL,
  Name      varchar(128) NOT NULL,
  NameSrc   char(8) NOT NULL,
  NameD     varchar(254) NOT NULL,
  NameDChar char(1) NOT NULL,
  NIMA_Reg  integer)

For instance, the entries for the United Arab Emirates appear as follows:

Level Code Status CodeSrc  Name                 NameSrc  NameD                          NameDChar NIMA_Reg
===== ==== ====== ======== ==================== ======== ============================== ========= ========
1.0   AE   C      NIMA_C_L United Arab Emirates NIMA_C_L United Arab Emirates           I         0
2.0   AE01 C      NIMA_A_L Abu Zaby             NIMA_A_L Abū Z̧aby                       H         3
2.0   AE02 C      NIMA_A_L `Ajman               NIMA_A_L ‘Ajmān                         H         3
2.0   AE03 C      NIMA_A_L Dubayy               NIMA_A_L Dubayy                         H         3
2.0   AE04 C      NIMA_A_L Al Fujayrah          NIMA_A_L Al Fujayrah                    H         3
2.0   AE05 C      NIMA_A_L Ra's al Khaymah      NIMA_A_L Ra's al Khaymah                H         3
2.0   AE06 C      NIMA_A_L Ash Shariqah         NIMA_A_L Ash Shāriqah                   H         3
2.0   AE07 C      NIMA_A_L Umm al Qaywayn       NIMA_A_L Umm al Qaywayn                 H         3

If you are viewing this document in a Unicode-capable browser and have the "TITUS Cyberbit Basic" font installed, you should see the correct rendering (with a bar over the u and a hook under the Z) of the Unicode string for Abu Zaby here: Abū Z̧aby. If you have only the standard Arial Unicode MS font installed, you will probably see one or more missing diacritics.
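The two NameDChar encodings can be turned into ordinary Unicode strings with a few lines of Python. This is a sketch under stated assumptions: it assumes the NameD value is available as raw bytes, and that "H" values carry hexadecimal character references of the form &#x016B;. The function name decode_named and the sample byte strings are hypothetical, not taken from the distributed data.

```python
import html


def decode_named(raw, name_d_char):
    """Decode a raw NameD value according to its NameDChar flag."""
    if name_d_char == "I":
        # ISO 8859-1: every byte maps directly to one Unicode code point.
        return raw.decode("iso-8859-1")
    if name_d_char == "H":
        # HTML-encoded text: plain ASCII with character references.
        return html.unescape(raw.decode("ascii"))
    raise ValueError("unknown NameDChar value: %r" % name_d_char)


# Hypothetical stored forms of two names used in this document.
abu_zaby = decode_named(b"Ab&#x016B; Z&#x0327;aby", "H")
reunion = decode_named(b"R\xe9union", "I")
print(abu_zaby, reunion)
```

Note that the hook under the Z is a combining character (U+0327), so the decoded string for Abu Zaby is one code point longer than it appears on screen.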

3. Other tables

The registry contains two supplementary tables.

ROG_PoliticalXref: ROG Political Cross-Reference Index

This supplementary table offers a cross-reference between the HIS codes and names for geopolitical regions and alternate codes and names.

The ROG Cross-Reference Index contains the following columns:

Column Format Description
Code char(16) HIS geopolitical region code.
CodeRel char(1) A one-character code that indicates the relationship between the geopolitical area represented by the HIS code and the most nearly corresponding code(s) in another system:
  < The HIS code describes a smaller area than the alternate code. There are probably additional records in this table relating other HIS entities to the same alternate code entity.
  = The HIS code describes substantially the same area as the alternate code.
  > The HIS code describes a larger area than the alternate code. There are probably additional records in this table relating other alternate code entities to the same HIS entity.
AltCode varchar(45) A code representing a region that at least partially overlaps, and might potentially be of use in encoding, the area designated by the corresponding HIS code. If the record represents an alternate name for a region from a source which does not associate the name with a code, the code field will be left empty.
Source char(8) Source for this record.
NameType char(2) The type of name represented by this record. In many cases, sources do not clearly distinguish name types; in such cases, the name type assigned is likely to be based on the Steward's best assessment of the predominant name type of the source, rather than an individual assessment of each record.
  C Conventional. A name as conventionally used by English speakers.
  CL Conventional, Long. The long-form name, where a source provides both a long-form and short-form conventional name.
  CS Conventional, Short. The short-form name, where a source provides both a long-form and short-form conventional name.
  N Native. The name of the location transliterated from a local or national language.
  V Variant or Alternate. Variant spelling, alternate name, former name, etc.
  D Dagger (not verified). Name not verified from a recent local official source.
Name varchar(128) A transliteration of the region name represented in standard ASCII with all diacritics removed. Some symbols may be included, such as apostrophes, exclamation points, or other ASCII characters, to represent speech sounds not present in English.
NameD varchar(254) A transliteration of the region name into Latin characters with diacritics. See the field of the same name in the ROG_Political table for notes on character encoding and fonts.
NameDChar char(1) The character encoding used to represent the diacritical form of the name. See the field of the same name in the ROG_Political table for permitted values.
NIMA_Reg integer For NIMA-derived names, the region code identifying the original regional character set from which the HTML-encoded Unicode name was derived.

The SQL statement for creating this table is as follows:

CREATE TABLE ROG_PoliticalXref (
  Code      char(16) NOT NULL,
  CodeRel   char(1) NOT NULL,
  AltCode   varchar(45),
  Source    char(8) NOT NULL,
  NameType  char(2) NOT NULL,
  Name      varchar(128) NOT NULL,
  NameD     varchar(254) NOT NULL,
  NameDChar char(1) NOT NULL,
  NIMA_Reg  integer)

The following shows the entries in the name index for Tromelin Island, an entity which has an assigned HIS code but is considered part of Réunion in the ISO coding scheme:

Code CodeRel AltCode Source NameType Name            NameD           NameDChar NIMA_Reg
TE   <       RE      ISO_A2 CS       Reunion         Réunion         I         0
TE   <       REU     ISO_A3 CS       Reunion         Réunion         I         0
TE   =       TE      NIMA   C        Tromelin Island Tromelin Island H         1 
TE   =       TE      NIMA   N        Ile Tromelin    Île Tromelin    H         1

We see that TE also has two alternate names originating from the NIMA names table, one a conventional English name and one the local (French) name. Each of these is represented in both diacritic-stripped and diacritical form. Both Name and NameD fields will be populated even if the name contains no characters with diacritics.
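As an illustration of how the cross-reference index might be queried, the following Python sketch loads the four Tromelin Island rows shown above into an in-memory SQLite database and looks up the HIS codes related to an ISO code. The helper name his_codes_for is an assumption, SQLite stands in for whatever database a site actually uses, and the NameD values are stored already decoded for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ROG_PoliticalXref (
      Code      char(16) NOT NULL,
      CodeRel   char(1) NOT NULL,
      AltCode   varchar(45),
      Source    char(8) NOT NULL,
      NameType  char(2) NOT NULL,
      Name      varchar(128) NOT NULL,
      NameD     varchar(254) NOT NULL,
      NameDChar char(1) NOT NULL,
      NIMA_Reg  integer)
""")

# The example rows from the table above.
rows = [
    ("TE", "<", "RE", "ISO_A2", "CS", "Reunion", "Réunion", "I", 0),
    ("TE", "<", "REU", "ISO_A3", "CS", "Reunion", "Réunion", "I", 0),
    ("TE", "=", "TE", "NIMA", "C", "Tromelin Island", "Tromelin Island", "H", 1),
    ("TE", "=", "TE", "NIMA", "N", "Ile Tromelin", "Île Tromelin", "H", 1),
]
conn.executemany(
    "INSERT INTO ROG_PoliticalXref VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
)


def his_codes_for(alt_code, source):
    """Return (HIS code, CodeRel) pairs for a code from another system."""
    cur = conn.execute(
        "SELECT DISTINCT Code, CodeRel FROM ROG_PoliticalXref "
        "WHERE AltCode = ? AND Source = ? ORDER BY Code",
        (alt_code, source),
    )
    return cur.fetchall()


print(his_codes_for("RE", "ISO_A2"))
```

With only these four rows loaded, the lookup for ISO code RE returns the single pair ("TE", "<"), meaning that TE covers a smaller area than RE. In the full table, other HIS codes related to RE would be expected to appear as well.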

ROG_Sources: Sources for Geographic Names and Codes

This supplementary table identifies the sources of names and codes used in other tables of the ROG.

Column Format Description
Source char(8) Source Designation code
Description varchar(128) Description of source
URL varchar(254) A URL, if available, from which one can obtain further information (and possibly updated data) pertaining to a source.
Notes varchar(254) Notes about the source

The SQL statement for creating this table is as follows:

CREATE TABLE ROG_Sources (
   Source  char(8) NOT NULL,
   Description varchar(128) NOT NULL,
   URL varchar(254),
   Notes varchar(254))

The following shows the entries in the ROG_Sources table for three typical sources. The Notes field is word-wrapped for presentation here, but is continuous in the data:

Source   Description                                   URL                                            Notes                          
-------- -----------                                   ----------------------                         -----
HIS      Codes/Names assigned by HIS Steward           http://www.gmi.org/HIS
NIMA_C_L NIMA GeoNet Names Server Country Lookup table http://gnpswww.nima.mil/geonames/GNS/index.jsp From URL indicated, choose
                                                                                                      Country Codes under
                                                                                                      "Lookup Tables", enter blank
                                                                                                      form.
NIMA_A_L NIMA GeoNet Names Server ADM1 Lookup table    http://gnpswww.nima.mil/geonames/GNS/index.jsp From URL indicated, choose
                                                                                                      "ADM1 Codes" under "Lookup 
                                                                                                      Tables", enter blank form. 
                                                                                                      Note that names are in
                                                                                                      proprietary NIMA code sets.

4. Change management

This section defines the process that the registry steward will follow to maintain the registry.

Governing philosophy

Additions or changes will be made to the registry when the submitter can demonstrate that a code is needed to describe a validly existing (either present or historical) geographic region not represented in the existing code, or that the name(s) associated with a code are in error. In keeping with the philosophy of having a code represent a specific geographic region, codes will not be changed simply to have a better mnemonic relationship to the current name of a region.

How to make a change request

If you believe any of the information in the registry is in error, or wish to have additional codes assigned, send your proposed change by e-mail to ROG_Steward@gmi.org, by fax to +1-719-548-7459, or by mail to ROG Steward, Global Mapping International, PO Box 63719, Colorado Springs CO 80962-3719, USA. Be sure to report the source of your information and the relationship of the regions for which codes are requested to regions already coded in the HIS system. If at all possible, particularly when requesting assignment of codes for a major rearrangement of political subdivisions or for a level of subdivisions not previously assigned codes, provide a URL to an authoritative site giving preferred local names of the new subdivisions in Latin characters (with diacritics if used locally). Sites including maps of the regions are strongly preferred, as they will clearly show the spatial relationship with existing regions. When submitting requests by mail, please include, where possible, a printed list of the region names for which codes are requested, a floppy disk or CD-ROM with the same information, and an authoritative map showing the boundaries of those regions.

If you are requesting new code assignments at the country or first-level political subdivision level, please check the appropriate lookup table on the NIMA GeoNet Names Server at http://gnpswww.nima.mil/geonames/GNS/index.jsp to see whether codes have already been assigned and whether you believe the names to be accurate. If NIMA's codes and names are correct, simply use the NIMA codes and notify us that the change has occurred.

How change requests are processed

In straightforward cases (i.e. situations where the names of regions for which codes are requested are in machine-readable form, documented by authoritative sources, and undisputed), GMI will generally provide new codes within three weeks. Additional time will be required if extensive research, verification, manual key entry, or character code conversion is required; in these cases we will notify you within three weeks of your submission of the expected time required for code assignment.

How updates are made

The registry will be updated and posted whenever requested changes are made. We will also update the HIS standard to conform with NIMA standards whenever we become aware of changes at a country level, or at least once per year. The most recent version will always be available for download from this page.

What to do in the meantime

The plus symbol (+) will never be assigned as part of a code at any level, and is reserved for local use. Thus, if you need a code for second-level political subdivisions within an existing first-level subdivision XX01, you could encode the first one XX01+A and increment the final character as needed to code all new areas.
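The interim scheme described above might be automated along the following lines. This is only a sketch: the function names local_codes and is_local_code are hypothetical, and limiting the suffix to the letters A to Z is an assumption, since the text says only that the final character is incremented as needed.

```python
import string

MAX_CODE_LENGTH = 16


def local_codes(parent, count):
    """Generate interim local-use codes such as XX01+A, XX01+B, ..."""
    suffixes = string.ascii_uppercase  # assumption: A to Z only
    if count > len(suffixes):
        raise ValueError("this sketch supports at most 26 local codes")
    if len(parent) + 2 > MAX_CODE_LENGTH:
        raise ValueError("resulting code would exceed 16 characters")
    return [parent + "+" + suffix for suffix in suffixes[:count]]


def is_local_code(code):
    """True when a code contains '+', which is never officially assigned."""
    return "+" in code


print(local_codes("XX01", 3))
```

Since the plus symbol never appears in an officially assigned code, a check such as is_local_code is enough to find the records that need recoding once official codes have been issued.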

5. Change history

The registry contains just one change history table.

ROG_PoliticalChangeHistory

All changes to ROG_Political are reported in ROG_PoliticalChangeHistory. This table is cumulative, listing all changes to successive versions of the registry. The table has the following four columns:

Code char(16) The code that is affected by the change reported in this record.
Type char(1) A one-letter code indicating the type of change. There are four possible values:
  C Created. The geopolitical region code is newly added to the HIS standard.
  E Extended. Included in the definition for conformance with HIS registry standards. Our intent is that this Type will never be used for geographic data, since by intention codes will always be retired if the land area represented changes.
  R Retired. The geopolitical region code has been retired and should no longer be used in a database of current political entities. Retired codes may potentially be used for coding historical data.
  U Updated. There has been no change to the code or its meaning, but other information in the code table entry or cross-reference information has changed.
Date char(10) The date the change was released in a new version of the registry. Dates are expressed as 8 digits with hyphens separating the parts of the date, i.e. YYYY-MM-DD.
Description varchar(255) Describes the change. In the case of R changes, it also describes what a user should do to fix existing data that uses the now retired code.

Note that there is not a change type for the case of narrowing the meaning of a code, such as when the region denoted by one code is split into two regions. In such a case, the original code is retired, and two new codes are added. In this way, the user of the code set is assured that once a code has been used to tag an item of data, it will continue to be the right code to use for as long as the code remains an active member of the code set.

The SQL statement for creating the change history table is as follows:

CREATE TABLE ROG_PoliticalChangeHistory (
   Code         char(16) NOT NULL,
   Type         char(1) NOT NULL,
   Date         char(10) NOT NULL,
   Description  varchar(255) )

The change history table holds the cumulative list of all changes that have ever been made to the registry. Thus it may be queried to learn the complete history of a given code, or to learn all the changes that have been made since a given date. For instance, the following SQL query would be used to find out what changes have occurred since the beginning of 2002:

SELECT * FROM ROG_PoliticalChangeHistory WHERE Date >= '2002-01-01'

For a site that has used ROG_Political codes in its own database, an important use of the change history table is to discover codes used in its data that are now obsolete and thus need to be changed. These will be only the codes that have been retired. Thus a full list of all data records needing to be changed can be found by doing a JOIN on the change history table. For instance, if the column named code in MyTable holds an ROG_Political code, then the following SQL statement will select all records that need to be changed due to changes to the code set since the beginning of 2002:

SELECT * FROM MyTable AS M
JOIN ROG_PoliticalChangeHistory AS C ON M.code = C.Code
WHERE C.Type = 'R' AND C.Date >= '2002-01-01'

Note that the Description field of the joined result set will describe what needs to be done to bring the offending code up-to-date.
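The retired-code check can be tried end to end with a small Python script using an in-memory SQLite database. Everything in the sample data is hypothetical: the codes ZZ01, ZZ02 and ZZ09, the dates, the descriptions, and the layout of MyTable are invented for illustration and do not come from the registry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ROG_PoliticalChangeHistory (
       Code         char(16) NOT NULL,
       Type         char(1) NOT NULL,
       Date         char(10) NOT NULL,
       Description  varchar(255));
    CREATE TABLE MyTable (id integer, code char(16));
""")

# Hypothetical change history: one code retired in 2002, one in 2001.
conn.executemany(
    "INSERT INTO ROG_PoliticalChangeHistory VALUES (?, ?, ?, ?)",
    [
        ("ZZ01", "R", "2002-03-15", "Hypothetical: split; recode as ZZ02 or ZZ03."),
        ("ZZ02", "C", "2002-03-15", "Hypothetical: created from part of ZZ01."),
        ("ZZ09", "R", "2001-06-01", "Hypothetical: retired before 2002."),
    ],
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "ZZ01"), (2, "ZZ02"), (3, "ZZ09")],
)

# The date is bound as a string. An unquoted 2002-01-01 would be read by
# SQL as the arithmetic expression 2002 - 1 - 1.
stale = conn.execute(
    """
    SELECT M.id, M.code, C.Description
    FROM MyTable AS M
    JOIN ROG_PoliticalChangeHistory AS C ON M.code = C.Code
    WHERE C.Type = 'R' AND C.Date >= ?
    ORDER BY M.id
    """,
    ("2002-01-01",),
).fetchall()

for record_id, code, description in stale:
    print(record_id, code, description)
```

Because dates are stored as YYYY-MM-DD text, plain string comparison orders them chronologically, which is why the query works without a dedicated date type. Only record 1 is reported here: ZZ02 was created rather than retired, and ZZ09 was retired before the cutoff.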

6. Known Limitations

The following limitations are known in the current code set, and are expected to be addressed in the future:

7. Distribution

A complete distribution of the registry includes:

The following versions are available: