
Code Tables

Coding method for the HIS Registry of Geography.


The registry contains just one code table: ROG_Political.

ROG_Political: Code for Geopolitical Regions

The domain of this code set is the politically defined land areas of the world. Codes are of variable length, use printable ASCII characters, and are defined at different levels of political geography. At present:

  • the level 0 (World) code is one character long
  • level 1 (countries and areas of special sovereignty) codes are two characters long
  • level 2 (first-level political subdivisions within countries) codes are four characters long, with the first two characters being the code of the country to which the subdivision belongs

Our intent, wherever reasonably possible, is to continue this pattern to lower levels of political geography, adding two characters at each level up to a maximum code length of 16 characters. We recognize that more political subdivisions could exist at a particular level than can be encoded with two additional characters. The tilde (~) character is therefore reserved: when it appears anywhere in a code, it indicates that the code beyond the tilde departs from the standard two-characters-per-level hierarchical scheme.
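The length rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names code_level and parent_code are hypothetical and not part of the registry, codes are assumed to have had any char(16) padding stripped, and a code containing a tilde is treated as having no level derivable from its length.

```python
from typing import Optional

MAX_CODE_LENGTH = 16
ESCAPE = "~"  # reserved: marks a departure from the two-characters-per-level scheme


def code_level(code: str) -> Optional[int]:
    """Return the level implied by the code's length, or None when the code
    contains a tilde and its length is therefore not a reliable guide."""
    if not code or len(code) > MAX_CODE_LENGTH:
        raise ValueError(f"code must be 1 to {MAX_CODE_LENGTH} characters: {code!r}")
    if ESCAPE in code:
        return None
    if len(code) == 1:
        return 0  # the World
    if len(code) % 2 != 0:
        raise ValueError(f"standard codes above level 0 have even length: {code!r}")
    return len(code) // 2


def parent_code(code: str) -> Optional[str]:
    """Return the code one level up, or None when it cannot be derived from
    the code alone (level 0, level 1, or a tilde code)."""
    level = code_level(code)
    if level is None or level < 2:
        return None
    return code[:-2]  # drop the two characters added at this level
```

With the codes used in this document, code_level("AE") gives 1, code_level("AE01") gives 2, and parent_code("AE01") gives "AE".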

 

At levels 1 and 2, most codes are derived from the NIMA/FIPS 10-4 coding system. That system uses two upper-case letters for the level 1 code (countries and areas of special sovereignty) and appends two digits or upper-case letters at level 2 (first-level political subdivisions of level 1 entities, or ADM1 in the NIMA feature designation system). Codes assigned by GMI to supplement the NIMA codes are recognizable because the portions of the hierarchical code created by GMI use characters that NIMA does not use, i.e. lower-case letters or other printable ASCII characters.
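As a rough illustration of that convention, the following Python sketch reports which portions of a level 1 or level 2 code contain characters outside the NIMA repertoire. The function name supplemental_portions is hypothetical, the sketch handles only standard two- and four-character codes, and the non-NIMA codes in the usage note are invented for the example.

```python
import string
from typing import List

# Character repertoires described above for NIMA/FIPS 10-4 codes.
NIMA_LEVEL1_CHARS = set(string.ascii_uppercase)
NIMA_LEVEL2_CHARS = set(string.ascii_uppercase + string.digits)


def supplemental_portions(code: str) -> List[int]:
    """Return the levels (1 and/or 2) whose two-character portion uses a
    character NIMA does not use, suggesting a GMI-assigned portion."""
    if "~" in code or len(code) not in (2, 4):
        raise ValueError(f"expected a standard level 1 or level 2 code: {code!r}")
    levels = []
    if not set(code[:2]) <= NIMA_LEVEL1_CHARS:
        levels.append(1)
    if len(code) == 4 and not set(code[2:]) <= NIMA_LEVEL2_CHARS:
        levels.append(2)
    return levels
```

For the NIMA-derived code "AE01" the result is an empty list. A made-up code such as "AEa1" would give [2], because the lower-case letter falls in the level 2 portion.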

 

The code table for ROG_Political contains the following columns:

Column Format Description
Level decimal(2,1) The level of the geopolitical region in a geopolitical hierarchy, with 0.0 representing the World, 1.0 representing countries and areas of special sovereignty, 2.0 representing first-level administrative divisions within countries, etc. Non-integer values may be used in future versions of the standard to indicate intermediate levels. For example, the NIMA code set we adopt uses counties as the first-level political divisions of the United Kingdom. A higher level of political subdivision, with entities such as England, Scotland and Wales, exists within the United Kingdom. Were it to prove desirable to add codes for these entities, they might be assigned a Level of 1.5.
Code char(16)  The HIS code for the geopolitical region.
Status char(1)

 A one-letter code with one of three possible values for the status of the geopolitical region code.

C Current NIMA. The code is believed to be current in the NIMA code set.
H Current HIS. A current code assigned by the HIS steward for a currently valid geopolitical region not represented in the NIMA code set.
R Retired. A code for a geopolitical region that is not currently in existence, but may validly appear in historical data. Note: Retired codes are not currently included in the standard, but may be added in the future if the HIS steward receives requests for codes for historically valid geopolitical entities.
Name varchar(128)  A transliteration of the region name represented in standard ASCII with all diacritics removed. Some symbols, such as apostrophes, exclamation points, or other ASCII characters, may be included to represent speech sounds not present in English. In the case of countries, the name used is the shortest conventional English name of the country.
NameSrc char(8)  Source from which the region names are derived; see ROG_Sources for values.
NameD varchar(254)  A transliteration of the region name into Latin characters with diacritics. In general, these are intended to represent the local pronunciation of the name more accurately. See the note under Status above for names of Retired codes. Where the source of the information is indicated as NIMA, the relationship between the romanized name and the local alphabet is documented in Romanization Systems and Roman-Script Spelling Conventions, item TBGNROMANGUIDE, available from the U.S. Geological Survey at http://mac.usgs.gov/mac/isb/pubs/forms/nimapl.html.
NameDChar char(1)

 The character encoding used to represent the diacritical form of the name. Two encodings are currently used. "I" indicates ISO 8859-1 single-byte Latin 1 characters (equivalent for practical purposes to the Microsoft Windows ANSI code page 1252 character encoding); "H" indicates HTML-encoded text, potentially with Unicode characters encoded in hexadecimal in accordance with HTML 4.0 specifications.

 

In general, character sets indicated are those necessary to represent all data from a particular source, so some records from a source indicated as, for example, HTML-encoded Unicode, may not contain any Unicode values. Where diacritical names originate from NIMA, translation from the original NIMA 8-bit code sets to Unicode has been done in accordance with http://164.214.2.59/gns/html/regions.pdf. Many fonts claiming to be Unicode compatible, and particularly those provided by Microsoft at the time this is written, do not include all of the Unicode characters required (particularly applied diacritics) to render NIMA names translated into Unicode according to NIMA specifications.

 

A font that works well for this purpose is distributed by Thesaurus Indogermanischer Text- und Sprachmaterialien (TITUS) and may be downloaded at http://titus.fkidg1.uni-frankfurt.de/unicode/tituut.asp. The TITUS Cyberbit Basic font is a TrueType font for Microsoft Windows, and may only be used for non-commercial purposes.

NIMA_Reg integer  For NIMA-derived names, the region code identifying the original regional character set from which the HTML-encoded Unicode name was derived.
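To make the two NameDChar encodings concrete, here is a minimal Python sketch of how a client might turn a stored NameD value into a Unicode string. It assumes the value arrives as raw bytes and that "H" values are ASCII text carrying HTML character references; the function name decode_named and the sample byte strings are hypothetical.

```python
import html


def decode_named(raw: bytes, encoding_flag: str) -> str:
    """Decode a stored NameD value according to its NameDChar flag."""
    if encoding_flag == "I":
        # ISO 8859-1: every byte maps directly to one Latin 1 character.
        return raw.decode("iso-8859-1")
    if encoding_flag == "H":
        # HTML-encoded text: resolve references such as &#x16B; to characters.
        return html.unescape(raw.decode("ascii"))
    raise ValueError(f"unknown NameDChar value: {encoding_flag!r}")
```

For example, if Abu Zaby were stored as b"Ab&#x16B; Z&#x327;aby" with flag "H" (an assumed storage form), decode_named would return the string with U+016B and a combining cedilla U+0327 after the Z.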

 

The SQL statement for creating this table is as follows:

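CREATE TABLE ROG_Political (
    Level     decimal(2,1) NOT NULL,  -- hierarchy level, e.g. 1.0 or 2.0
    Code      char(16)     NOT NULL,  -- HIS code for the region
    Status    char(1)      NOT NULL,  -- C, H, or R
    Name      varchar(128) NOT NULL,  -- ASCII name, diacritics removed
    NameSrc   char(8)      NOT NULL,  -- source of the names; see ROG_Sources
    NameD     varchar(254) NOT NULL,  -- name with diacritics
    NameDChar char(1)      NOT NULL,  -- encoding of NameD: I or H
    NIMA_Reg  integer                 -- NIMA regional character set code
)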

For instance, the entries for the United Arab Emirates appear as follows:

Level Code Status CodeSrc  Name                 NameSrc  NameD                          NameDChar NIMA_Reg
===== ==== ====== ======== ==================== ======== ============================== ========= ========
1.0   AE   C      NIMA_C_L United Arab Emirates NIMA_C_L United Arab Emirates           I         0
2.0   AE01 C      NIMA_A_L Abu Zaby             NIMA_A_L Abū Z̧aby                       H         3
2.0   AE02 C      NIMA_A_L `Ajman               NIMA_A_L ‘Ajmān                         H         3
2.0   AE03 C      NIMA_A_L Dubayy               NIMA_A_L Dubayy                         H         3
2.0   AE04 C      NIMA_A_L Al Fujayrah          NIMA_A_L Al Fujayrah                    H         3
2.0   AE05 C      NIMA_A_L Ra's al Khaymah      NIMA_A_L Ra's al Khaymah                H         3
2.0   AE06 C      NIMA_A_L Ash Shariqah         NIMA_A_L Ash Shāriqah                   H         3
2.0   AE07 C      NIMA_A_L Umm al Qaywayn       NIMA_A_L Umm al Qaywayn                 H         3
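Because each code begins with the code of its parent, the subdivisions of a region can be found by prefix. The following Python sketch uses the standard sqlite3 module with a cut-down version of the table and a few of the rows above. The in-memory database, the reduced column list, and the function name subdivisions are assumptions made for the example, not part of the registry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ROG_Political (
        Level decimal(2,1) NOT NULL,
        Code char(16) NOT NULL,
        Status char(1) NOT NULL,
        Name varchar(128) NOT NULL
    )
    """
)
rows = [
    (1.0, "AE", "C", "United Arab Emirates"),
    (2.0, "AE01", "C", "Abu Zaby"),
    (2.0, "AE03", "C", "Dubayy"),
    (2.0, "AE04", "C", "Al Fujayrah"),
]
conn.executemany("INSERT INTO ROG_Political VALUES (?, ?, ?, ?)", rows)


def subdivisions(conn, parent):
    """Return (Code, Name) pairs one level below the given parent code."""
    # substr() is used rather than LIKE because codes may contain any printable
    # ASCII character, including the LIKE wildcards % and _.
    cur = conn.execute(
        "SELECT Code, Name FROM ROG_Political "
        "WHERE substr(Code, 1, ?) = ? AND length(Code) = ? "
        "ORDER BY Code",
        (len(parent), parent, len(parent) + 2),
    )
    return cur.fetchall()
```

Calling subdivisions(conn, "AE") on this sample returns the three emirate rows and leaves out the country row itself. A database that pads char(16) values with spaces would need the codes trimmed first.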

If you are viewing this document in a Unicode-capable browser and have the "TITUS Cyberbit Basic" font installed, you should see the correct rendering (with a bar over the u and a hook under the Z) of the Unicode string for Abu Zaby here: Abū Z̧aby. If you have only the standard Arial Unicode MS font installed, one or more diacritics will probably be missing.
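One way to see why some fonts fail is to list the applied (combining) diacritics in a name, since those require the font to position a separate mark on the base letter. The Python sketch below uses the standard unicodedata module. The code points given for the Abu Zaby string are an assumption about how it is encoded (a precomposed u with macron, and a Z followed by a combining cedilla), and the function name combining_marks is hypothetical.

```python
import unicodedata
from typing import List


def combining_marks(text: str) -> List[str]:
    """Return the Unicode names of the combining characters in text."""
    return [unicodedata.name(ch) for ch in text if unicodedata.combining(ch)]


# Assumed encoding of the diacritical name for Abu Zaby.
name = "Ab\u016b Z\u0327aby"

marks = combining_marks(name)

# NFC normalization composes base letter and mark when a single precomposed
# character exists; comparing before and after shows whether any did.
unchanged_by_nfc = unicodedata.normalize("NFC", name) == name
```

Under that assumption, marks contains only COMBINING CEDILLA, and normalization leaves the string unchanged because Unicode has no single character for Z with cedilla. The mark therefore has to be rendered as an applied diacritic.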