| Date: | 2002-06-23 |
|---|---|
| Status: | Draft Standard. This is only a preliminary draft that is still under development. |
| Abstract: | Documents the Registry of Geography (ROG) for the Harvest Information System (HIS). This registry defines the standardized codes used for identifying geopolitical areas of the world. |
| Editor: | Bill Dickson, Global Mapping International (bill@gmi.org) |
The function of the Registry of Geography (ROG) is to document a set of standardized codes used for identifying geopolitical areas. The present code covers three levels of political geography: World (Level 0), Countries and Areas of Special Sovereignty (Level 1), and first-level administrative divisions within countries (Level 2). Codes at Levels 1 and 2 are an extension of the codes used by the United States National Imagery and Mapping Agency (NIMA), generally referred to as Federal Information Processing Standard (FIPS) 10-4 and subsequent changes. The base code set used here is obtained by direct download from NIMA and contains codes which have not yet been published in the formal FIPS 10-4 standard. The registry includes cross-references to other coding and naming systems, including the ISO 3166 standard for country names. The registry is designed to be extensible to lower levels of political subdivisions within countries and to accommodate intermediate levels of political subdivisions.
The registry contains a code table, two supplementary tables, and a change history table:
ROG_Political: A code in this table represents a specific, politically-defined land area. This is an important distinction: many other codes, including the commonly used ISO codes used in the Internet domain naming system, are codes for names of political regions. In such codes, major changes in the land area of a political unit may not be reflected by a code change, while a change in only the name of a region may artificially produce a code change. In recent history, such changes have resulted in the same code referring, over time, to both the former West Germany and the present, unified Germany (a much larger entity), or, in another case, to both the former Yugoslavia and the current remainder state of Serbia and Montenegro (a much smaller entity). The HIS code, and its underlying NIMA/FIPS code, intend to change codes if, and only if, there are substantive changes in the land area represented by the code.
ROG_PoliticalXref: Provides a cross-reference to alternate names and codes for geopolitical areas. The cross-reference provides a means for describing less than exact correspondence between different coding systems.
ROG_Sources: Identifies underlying sources for code and cross-reference information.
ROG_PoliticalChangeHistory: Documents the history of changes to the code set for geopolitical regions.
The tables relate to each other as indicated in the following diagram:

It is anticipated that other HIS registries will use these codes preferentially in describing the geographic locations of the entities they describe.
The registry contains just one code table: ROG_Political.
The domain of the categories for this code set is the politically-defined land areas of the world. Codes of variable length using printable ASCII characters are defined at different levels of political geography. At present, the level 0 (World) code is one character long, level 1 (countries and areas of special sovereignty) codes are two characters long, and level 2 (first-level political subdivisions within countries) codes are four characters long, with the first two characters being the code for the country of which the first-level subdivision is a part. Our intent, wherever reasonably possible, is to continue this pattern to lower levels of political geography, adding two characters at each level to a maximum code length of 16 characters. We recognize that situations could arise where more political subdivisions exist at a particular level than can be encoded with two additional characters. The tilde (~) character appearing anywhere in a code is reserved to indicate that the code beyond the tilde departs from the standard two-characters-per-level hierarchical scheme.
At levels 1 and 2, the majority of codes are derived from the NIMA/FIPS 10-4 coding system, which uses two upper-case letters for the code at level 1 and suffixes two additional digits or upper-case letters at level 2 (first-level political subdivisions of level 1 entities, or ADM1 in the NIMA feature designation system). Codes assigned by GMI to supplement the NIMA codes will be recognizable because the portions of the hierarchical code created by GMI use characters not used by NIMA, i.e. lower-case letters or other printable ASCII characters.
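The length-based scheme described above can be sketched in a few lines of Python. This is only an illustration, not part of the registry: the function name `describe_code` and the returned keys are hypothetical, and the check for non-NIMA characters assumes that NIMA-assigned portions contain only upper-case letters and digits. Intermediate (non-integer) levels cannot be derived from code length and are not handled.

```python
import string

# Assumption: NIMA-assigned portions use only upper-case letters and digits.
NIMA_CHARS = set(string.ascii_uppercase + string.digits)


def describe_code(code):
    """Split a code into per-level segments under the standard scheme."""
    if not 1 <= len(code) <= 16:
        raise ValueError("a code is 1 to 16 characters long")
    # Anything after a tilde departs from the two-characters-per-level scheme.
    standard, tilde, remainder = code.partition("~")
    if len(standard) == 1:
        segments = [standard]  # the one-character level 0 (World) code
        level = 0
    elif len(standard) % 2 == 0:
        # Two characters per level: "AE01" -> ["AE", "01"]
        segments = [standard[i:i + 2] for i in range(0, len(standard), 2)]
        level = len(segments)
    else:
        raise ValueError("standard part must be 1 or an even number of characters")
    return {
        "segments": segments,
        # The level is not derivable from length once a tilde is present.
        "level": None if tilde else level,
        "nonstandard": remainder if tilde else None,
        # Segments containing characters outside the assumed NIMA repertoire.
        "non_nima": [s for s in segments if not set(s) <= NIMA_CHARS],
    }
```

For example, `describe_code("AE01")` reports two segments, `AE` and `01`, at level 2, while a hypothetical code such as `AEx1` would have its second segment flagged as not NIMA-assigned.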
The code table for ROG_Political contains the following columns:
| Column | Format | Description |
|---|---|---|
| Level | decimal(2,1) | The level of the geopolitical region in a geopolitical hierarchy, with 0.0 representing the World, 1.0 representing countries and areas of special sovereignty, 2.0 representing first-level administrative divisions within countries, etc. Non-integer values may be used in future versions of the standard to indicate intermediate levels. As an example, the NIMA code we adopt uses counties as the first-level political divisions of the United Kingdom. A higher level of political subdivision, with entities such as England, Scotland and Wales, exists within the United Kingdom. Were it to prove desirable to add codes for these entities, they might be assigned a Level of 1.5. |
| Code | char(16) | The HIS code for the geopolitical region. |
| Status | char(1) | A one-letter code with one of three possible values for the status of the geopolitical region code. |
| Name | varchar(128) | A transliteration of the region name represented in standard ASCII with all diacritics removed. Some symbols, such as apostrophes, exclamation points, or other ASCII characters, may be included to represent speech sounds not present in English. In the case of countries, the name used is the shortest conventional English name of the country. |
| NameSrc | char(8) | Source from which the region names are derived; see ROG_Sources for values. |
| NameD | varchar(254) | A transliteration of the region name into Latin characters with diacritics. In general, these are intended to more accurately represent the local pronunciation of the name. See note under Status below for names of Retired codes. Where the source of the information is indicated as NIMA, the relationship between the romanized name and the local alphabet is documented in Romanization Systems and Roman-Script Spelling Conventions, item TBGNROMANGUIDE, available from the U.S. Geological Survey at http://mac.usgs.gov/mac/isb/pubs/forms/nimapl.html. |
| NameDChar | char(1) | The character encoding used to represent the diacritical form of the name. Two encodings are currently used. "I" indicates ISO 8859-1 single-byte Latin 1 characters (equivalent for practical purposes to the Microsoft Windows ANSI code page 1252 character encoding); "H" indicates HTML-encoded text, potentially with Unicode characters encoded in hexadecimal in accordance with HTML 4.0 specifications. In general, the character set indicated is the one necessary to represent all data from a particular source, so some records from a source indicated as, for example, HTML-encoded Unicode may not contain any Unicode values. Where diacritical names originate from NIMA, translation from the original NIMA 8-bit code sets to Unicode has been done in accordance with http://164.214.2.59/gns/html/regions.pdf. Many fonts claiming to be Unicode compatible, and particularly those provided by Microsoft at the time this is written, do not include all of the Unicode characters required (particularly applied diacritics) to render NIMA names translated into Unicode according to NIMA specifications. A font that works well for this purpose is distributed by Thesaurus Indogermanischer Text- und Sprachmaterialien (TITUS) and may be downloaded at http://titus.fkidg1.uni-frankfurt.de/unicode/tituut.asp. The TITUS Cyberbit Basic font is a TrueType font for Microsoft Windows, and may only be used for non-commercial purposes. |
| NIMA_Reg | integer | For NIMA-derived names, the region code identifying the original regional character set from which the HTML-encoded Unicode name was derived. |
The SQL statement for creating this table is as follows:
CREATE TABLE ROG_Political (
  Level decimal(2,1) NOT NULL,
  Code char(16) NOT NULL,
  Status char(1) NOT NULL,
  Name varchar(128) NOT NULL,
  NameSrc char(8) NOT NULL,
  NameD varchar(254) NOT NULL,
  NameDChar char(1) NOT NULL,
  NIMA_Reg integer
)
For instance, the entries for the United Arab Emirates appear as follows:
| Level | Code | Status | CodeSrc | Name | NameSrc | NameD | NameDChar | NIMA_Reg |
|---|---|---|---|---|---|---|---|---|
| 1.0 | AE | C | NIMA_C_L | United Arab Emirates | NIMA_C_L | United Arab Emirates | I | 0 |
| 2.0 | AE01 | C | NIMA_A_L | Abu Zaby | NIMA_A_L | Abū Z̧aby | H | 3 |
| 2.0 | AE02 | C | NIMA_A_L | `Ajman | NIMA_A_L | ‘Ajmān | H | 3 |
| 2.0 | AE03 | C | NIMA_A_L | Dubayy | NIMA_A_L | Dubayy | H | 3 |
| 2.0 | AE04 | C | NIMA_A_L | Al Fujayrah | NIMA_A_L | Al Fujayrah | H | 3 |
| 2.0 | AE05 | C | NIMA_A_L | Ra's al Khaymah | NIMA_A_L | Ra's al Khaymah | H | 3 |
| 2.0 | AE06 | C | NIMA_A_L | Ash Shariqah | NIMA_A_L | Ash Shāriqah | H | 3 |
| 2.0 | AE07 | C | NIMA_A_L | Umm al Qaywayn | NIMA_A_L | Umm al Qaywayn | H | 3 |
If you are viewing this document in a Unicode-capable browser and have the "TITUS Cyberbit Basic" font installed, you should see the correct rendering (with a bar over the u and a hook under the Z) of the Unicode string for Abu Zaby here: Abū Z̧aby. If you have only the standard Arial Unicode MS font installed, you will probably have one or more missing diacritics.
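As a rough illustration of how the NameDChar flag and the relationship between NameD and Name might be handled in software, the following Python sketch decodes a NameD value and derives a diacritic-free form. The function names `decode_named` and `strip_diacritics` are hypothetical, and the exact character references used in the distributed data are an assumption. The stripped result only approximates the registry's Name field, which may use other ASCII symbols (for example, a backquote in `Ajman).

```python
import html
import unicodedata


def decode_named(raw, named_char):
    """Return a NameD value as a Python string, given its NameDChar flag."""
    if named_char == "H":
        # HTML-encoded text: resolve character references such as &#x016B;
        return html.unescape(raw)
    if named_char == "I":
        # ISO 8859-1: decode raw bytes; pass through an already decoded string
        return raw.decode("iso-8859-1") if isinstance(raw, bytes) else raw
    raise ValueError("unknown NameDChar value: %r" % named_char)


def strip_diacritics(name):
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Assumed HTML-encoded form of the NameD value for AE01
abu_zaby = decode_named("Ab&#x016B; Z&#x0327;aby", "H")
print(strip_diacritics(abu_zaby))
```

Decomposition is used here because a name may mix precomposed letters (u with macron) and separately applied combining marks (Z followed by a combining cedilla); normalizing first lets both be handled the same way.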
The registry contains two supplementary tables.
The ROG_PoliticalXref supplementary table offers a cross-reference between the HIS codes and names for geopolitical regions and alternate codes and names.
The ROG Cross-Reference Index contains the following columns:
| Column | Format | Description |
|---|---|---|
| Code | char(16) | HIS geopolitical region code. |
| CodeRel | char(1) | A one-character code that indicates the relationship between the geopolitical area represented by the HIS code and the most nearly corresponding code(s) in another system. |
| AltCode | varchar(45) | A code representing a region that at least partially overlaps, and might potentially be of use in encoding, the area designated by the corresponding HIS code. If the record represents an alternate name for a region from a source which does not associate the name with a code, the code field will be left empty. |
| Source | char(8) | Source for this record. |
| NameType | char(2) | The type of name represented by this record. In many cases, sources do not clearly distinguish name types; in such cases, the name type assigned is likely to be based on the Steward's best assessment of the predominant name type of the source, rather than an individual assessment of each record. |
| Name | varchar(128) | A transliteration of the region name represented in standard ASCII with all diacritics removed. Some symbols, such as apostrophes, exclamation points, or other ASCII characters, may be included to represent speech sounds not present in English. |
| NameD | varchar(254) | A transliteration of the region name into Latin characters with diacritics. See the notes for the field of the same name in the ROG_Political table regarding character encoding and fonts. |
| NameDChar | char(1) | The character encoding used to represent the diacritical form of the name. See the notes for the field of the same name in the ROG_Political table for permitted values. |
| NIMA_Reg | integer | For NIMA-derived names, the region code identifying the original regional character set from which the HTML-encoded Unicode name was derived. |
The SQL statement for creating this table is as follows:
CREATE TABLE ROG_PoliticalXref (
  Code char(16) NOT NULL,
  CodeRel char(1) NOT NULL,
  AltCode varchar(45),
  Source char(8) NOT NULL,
  NameType char(2) NOT NULL,
  Name varchar(128) NOT NULL,
  NameD varchar(254) NOT NULL,
  NameDChar char(1) NOT NULL,
  NIMA_Reg integer
)
The following shows the entries in the name index for Tromelin Island, an entity with an assigned HIS code, but which is considered part of Réunion in the ISO coding scheme:
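| Code | CodeRel | AltCode | Source | NameType | Name | NameD | NameDChar | NIMA_Reg |
|---|---|---|---|---|---|---|---|---|
| TE | < | RE | ISO_A2 | CS | Reunion | Réunion | I | 0 |
| TE | < | REU | ISO_A3 | CS | Reunion | Réunion | I | 0 |
| TE | = | TE | NIMA | C | Tromelin Island | Tromelin Island | H | 1 |
| TE | = | TE | NIMA | N | Ile Tromelin | Île Tromelin | H | 1 |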
We see that TE also has two alternate names originating from the NIMA names table, one a conventional English name and one the local (French) name. Each of these is represented in both diacritic-stripped and diacritical form. Both Name and NameD fields will be populated even if the name contains no characters with diacritics.
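A lookup against the cross-reference table might be sketched as follows, using Python's built-in `sqlite3` module with an in-memory database loaded with the four Tromelin Island rows shown above. The helper name `alternate_codes` is hypothetical, and SQLite is used only for convenience; any SQL database would serve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ROG_PoliticalXref (
        Code char(16) NOT NULL, CodeRel char(1) NOT NULL, AltCode varchar(45),
        Source char(8) NOT NULL, NameType char(2) NOT NULL,
        Name varchar(128) NOT NULL, NameD varchar(254) NOT NULL,
        NameDChar char(1) NOT NULL, NIMA_Reg integer)
""")
# The four sample rows for Tromelin Island (HIS code TE)
conn.executemany(
    "INSERT INTO ROG_PoliticalXref VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("TE", "<", "RE", "ISO_A2", "CS", "Reunion", "Réunion", "I", 0),
        ("TE", "<", "REU", "ISO_A3", "CS", "Reunion", "Réunion", "I", 0),
        ("TE", "=", "TE", "NIMA", "C", "Tromelin Island", "Tromelin Island", "H", 1),
        ("TE", "=", "TE", "NIMA", "N", "Ile Tromelin", "Île Tromelin", "H", 1),
    ],
)


def alternate_codes(conn, his_code, source):
    """Return (CodeRel, AltCode, Name) rows for one HIS code and one source."""
    cur = conn.execute(
        "SELECT CodeRel, AltCode, Name FROM ROG_PoliticalXref "
        "WHERE Code = ? AND Source = ? ORDER BY AltCode, Name",
        (his_code, source),
    )
    return cur.fetchall()


print(alternate_codes(conn, "TE", "ISO_A2"))
```

Returning CodeRel alongside AltCode matters here: the ISO code RE is not an exact equivalent of TE, so a caller converting between systems needs the relationship code to know that the correspondence is inexact.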
The ROG_Sources supplementary table identifies the sources of names and codes used in other tables of the ROG.
| Column | Format | Description |
|---|---|---|
| Source | char(8) | Source designation code. |
| Description | varchar(128) | Description of the source. |
| URL | varchar(254) | A URL, if available, from which one can obtain further information (and possibly updated data) pertaining to a source. |
| Notes | varchar(254) | Notes about the source. |
The SQL statement for creating this table is as follows:
CREATE TABLE ROG_Sources (
  Source char(8) NOT NULL,
  Description varchar(128) NOT NULL,
  URL varchar(254),
  Notes varchar(254)
)
The following shows the entries in the sources table for three typical sources:
| Source | Description | URL | Notes |
|---|---|---|---|
| HIS | Codes/Names assigned by HIS Steward | http://www.gmi.org/HIS | |
| NIMA_C_L | NIMA GeoNet Names Server Country Lookup table | http://gnpswww.nima.mil/geonames/GNS/index.jsp | From URL indicated, choose Country Codes under "Lookup Tables", enter blank form. |
| NIMA_A_L | NIMA GeoNet Names Server ADM1 Lookup table | http://gnpswww.nima.mil/geonames/GNS/index.jsp | From URL indicated, choose "ADM1 Codes" under "Lookup Tables", enter blank form. Note that names are in proprietary NIMA code sets. |
This section defines the process that the registry steward will follow to maintain the registry.
Additions or changes will be made to the registry when the submitter can demonstrate that a code is needed to describe a validly existing (either present or historical) geographic region not represented in the existing code, or that the name(s) associated with a code are in error. In keeping with the philosophy of having a code represent a specific geographic region, codes will not be changed simply to have a better mnemonic relationship to the current name of a region.
If you believe any of the information in the registry is in error, or wish to have additional codes assigned, send your proposed change by e-mail to ROG_Steward@gmi.org, by fax to +1-719-548-7459, or by mail to ROG Steward, Global Mapping International, 15435 Gleneagle Dr Ste 201, Colorado Springs CO 80921, USA. Be sure to report the source of your information and the relationship of the regions for which codes are requested to regions already coded in the HIS system. If at all possible, particularly when requesting assignment of codes for a major rearrangement of political subdivisions or a level of subdivisions not previously assigned codes, provide a URL to an authoritative site giving preferred local names of the new subdivisions in Latin characters (with diacritics if used locally). Sites including maps of the regions are strongly preferred, as they will clearly show the spatial relationship with existing regions. When submitting requests for codes by mail, please include, if at all possible, a printed form of the region names for which codes are requested, a floppy disk or CD-ROM with the same information, and an authoritative map showing the boundaries of the regions for which codes are requested.
If you are requesting new code assignments at a country or first-level political subdivision level, please check the appropriate lookup table on the NIMA GeoNet Names Server at http://gnpswww.nima.mil/geonames/GNS/index.jsp to see if codes have already been assigned and if you believe the names to be accurate. If NIMA's codes and names are correct, simply use the NIMA codes and notify us that the change has occurred.
In straightforward cases (i.e. situations where the names of regions for which codes are requested are in machine-readable form, documented by authoritative sources, and undisputed), GMI will generally provide new codes within three weeks. Additional time will be required if extensive research, verification, manual key entry, or character code conversion is required; in these cases we will notify you within three weeks of your submission of the expected time required for code assignment.
The registry will be updated and posted whenever requested changes are made. We will also update the HIS standard to conform with NIMA standards whenever we become aware of changes at a country level, or at least once per year. The most recent version will always be available for download from this page.
The plus symbol (+) will never be assigned as part of a code at any level, and is reserved for local use. Thus, if you need a code for second-level political subdivisions within an existing first-level subdivision XX01, you could encode the first one XX01+A and increment the final character as needed to code all new areas.
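The local-use convention just described could be automated along the following lines. This is a sketch only: the function name `local_codes` is hypothetical, the use of upper-case letters A to Z as the incrementing character follows the XX01+A example but is otherwise an assumption, and the 16-character check assumes local codes will be stored in a char(16) column like registry codes.

```python
import string


def local_codes(parent, count, alphabet=string.ascii_uppercase):
    """Generate local-use codes such as XX01+A, XX01+B, ... under a parent code."""
    if "+" in parent:
        raise ValueError("parent must be a registry code, which never contains '+'")
    if count > len(alphabet):
        raise ValueError("not enough characters in alphabet for %d codes" % count)
    codes = [parent + "+" + ch for ch in alphabet[:count]]
    if any(len(code) > 16 for code in codes):
        raise ValueError("local code would exceed 16 characters")
    return codes


print(local_codes("XX01", 3))
```

Because the plus symbol is never assigned by the registry, codes produced this way cannot collide with any present or future registry code.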
The registry contains just one change history table.
All changes to ROG_Political are reported in ROG_PoliticalChangeHistory. This table is cumulative, listing all changes to successive versions of the registry. The table has the following four columns:
| Column | Format | Description |
|---|---|---|
| Code | char(16) | The code that is affected by the change reported in this record. |
| Type | char(1) | A one-letter code indicating the type of change. There are four possible values. |
| Date | char(10) | The date the change was released in a new version of the registry. Dates are expressed as 8 digits with hyphens to separate the parts of the date, in the form YYYY-MM-DD. |
| Description | varchar(255) | Describes the change. In the case of R changes, it also describes what a user should do to fix existing data that uses the now retired code. |
Note that there is not a change type for the case of narrowing the meaning of a code, such as when the region denoted by one code is split into two regions. In such a case, the original code is retired, and two new codes are added. In this way, the user of the code set is assured that once a code has been used to tag an item of data, it will continue to be the right code to use for as long as the code remains an active member of the code set.
The SQL statement for creating the change history table is as follows:
CREATE TABLE ROG_PoliticalChangeHistory (
  Code varchar(16) NOT NULL,
  Type char(1) NOT NULL,
  Date char(10) NOT NULL,
  Description varchar(255)
)
The change history table holds the cumulative list of all changes that have ever been made to the registry. Thus it may be queried to learn the complete history of a given code, or to learn all the changes that have been made since a given date. For instance, the following SQL query would be used to find out what changes have occurred since the beginning of 2002:
SELECT * FROM ROG_PoliticalChangeHistory WHERE Date >= '2002-01-01'
For a site that has used ROG_Political codes in its own database, an important use of the change history table is to discover codes used in its data that are now obsolete and thus need to be changed. These will be only the codes that have been retired. Thus a full list of all data records needing to be changed can be found by doing a JOIN on the change history table. For instance, if the column named code in MyTable holds an ROG_Political code, then the following SQL statement will select all records that need to be changed due to changes to the code set since the beginning of 2002:
SELECT * FROM MyTable AS M
JOIN ROG_PoliticalChangeHistory AS C ON M.code = C.Code
WHERE C.Type = 'R' AND C.Date >= '2002-01-01'
Note that the Description field of the joined result set will describe what needs to be done to bring the offending code up-to-date.
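The retired-code check can be tried end to end with Python's built-in `sqlite3` module. In this sketch the change history rows, the codes XX01 and XX09, and the layout of MyTable are all hypothetical, invented only to exercise the query; they are not taken from the registry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ROG_PoliticalChangeHistory (
        Code varchar(16) NOT NULL, Type char(1) NOT NULL,
        Date char(10) NOT NULL, Description varchar(255));
    CREATE TABLE MyTable (id integer, code char(16));
""")
# Hypothetical change records: one retirement in 2002, one before 2002
conn.executemany(
    "INSERT INTO ROG_PoliticalChangeHistory VALUES (?, ?, ?, ?)",
    [
        ("XX01", "R", "2002-03-15", "Hypothetical: XX01 split; recode as XX02 or XX03"),
        ("XX09", "R", "2001-06-01", "Hypothetical: retired before 2002"),
    ],
)
# Hypothetical local data tagged with registry codes
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "XX01"), (2, "XX09"), (3, "AE01")],
)


def records_to_fix(conn, since):
    """Return local rows whose code was retired on or after the given date."""
    cur = conn.execute(
        "SELECT M.id, M.code, C.Description FROM MyTable AS M "
        "JOIN ROG_PoliticalChangeHistory AS C ON M.code = C.Code "
        "WHERE C.Type = 'R' AND C.Date >= ? ORDER BY M.id",
        (since,),
    )
    return cur.fetchall()


print(records_to_fix(conn, "2002-01-01"))
```

The date is passed as a string parameter because the Date column holds text; strings in YYYY-MM-DD form sort in chronological order, so an ordinary string comparison selects the right rows.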
The following limitations are known in the current code set, and are expected to be addressed in the future:
A complete distribution of the registry includes:
The following versions are available: