Compute Visual Similarity of Candidate Top-Level Domains
This pre-production test version of the algorithm is intended to provide a visual similarity
score between a candidate generic top-level domain (gTLD) and existing TLDs (gTLDs
and ccTLDs) or a Reserved Name, or between any two strings. The algorithm compares both the
uppercase and the lowercase versions of the strings and displays the higher of the two scores.
Any string yielding a similarity score above 30% is listed.
Do not include dots (‘.’) in the test strings; they are not removed and will distort the
calculated similarity scores.
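The reporting rule described above (score both letter cases, keep the higher score, list only results above 30%) can be sketched as follows. This is a minimal illustration, not the SWORD algorithm: the scorer `placeholder_score` is a stand-in built on Python's `difflib`, and the names `displayed_score` and `assess` are hypothetical. Rejecting dots outright is a choice made for this sketch; the tool itself accepts them and returns distorted scores.

```python
from difflib import SequenceMatcher

THRESHOLD = 30.0  # percent; the page states that only scores above 30% are shown


def placeholder_score(a: str, b: str) -> float:
    """Stand-in scorer (difflib ratio as a percentage). NOT the SWORD algorithm."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def displayed_score(a: str, b: str, score=placeholder_score) -> float:
    """Score the uppercase and the lowercase forms and keep the higher result."""
    return max(score(a.upper(), b.upper()), score(a.lower(), b.lower()))


def assess(candidate: str, existing: list[str], score=placeholder_score):
    """Return (name, score) pairs scoring above THRESHOLD, highest first."""
    if "." in candidate:
        # Assumption of this sketch: refuse dots instead of scoring them.
        raise ValueError("omit dots from test strings")
    hits = [(name, displayed_score(candidate, name, score)) for name in existing]
    hits = [pair for pair in hits if pair[1] > THRESHOLD]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(assess("con", ["com", "org", "net"]))
```

Passing the scorer as a parameter keeps the case handling and the threshold separate from whatever similarity measure is actually used.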
Assess candidate string: Compare a candidate to existing TLDs and reserved names.
Assess two strings: Compare two strings to each other.
Background
A Generic Names Supporting Organization (GNSO) policy recommendation for the introduction
of New gTLDs is that “Strings must not be confusingly similar to an existing top-level
domain or a Reserved Name.” The algorithm is being developed as a tool to help implement
this recommendation. A panel of examiners will determine whether strings are visually
confusingly similar to one another, and an algorithmic tool will aid them in their
decision-making process.
This web page demonstrates the algorithm developed by SWORD to provide an open, objective,
and predictable mechanism for assessing the degree of visual similarity between TLD strings.
This site is being used for testing and evaluation purposes and ICANN reserves the right to
use any data entered on the site to improve the algorithm and ICANN's new gTLD program. In
this test phase, the algorithm will only display results above 30%.
The algorithm uses proprietary software to perform a series of mathematical calculations to
assess the visual similarity between strings based upon the following parameters:
- length of the strings;
- number of similar letters within sequences of two or more letters;
- number of similar letters not in sequence;
- number of dissimilar letters;
- length of common prefixes and suffixes if greater than one.
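To make the listed parameters concrete, the sketch below extracts one plausible reading of each from a pair of strings. It is an assumption-laden illustration: the actual software is proprietary, the way it weights or combines these values is not published, and here "similar" is narrowed to "identical" letters, whereas the real tool also accounts for visually similar characters. The function name `similarity_features` and the dictionary keys are hypothetical.

```python
from difflib import SequenceMatcher
from os.path import commonprefix


def similarity_features(a: str, b: str) -> dict:
    """Extract illustrative counts for the parameters listed above."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # Matching blocks are runs of identical letters found in both strings.
    blocks = [m for m in matcher.get_matching_blocks() if m.size > 0]
    in_sequence = sum(m.size for m in blocks if m.size >= 2)  # runs of two or more
    isolated = sum(m.size for m in blocks if m.size == 1)     # single matches
    prefix = len(commonprefix([a, b]))
    suffix = len(commonprefix([a[::-1], b[::-1]]))
    return {
        "length_a": len(a),
        "length_b": len(b),
        "similar_in_sequence": in_sequence,
        "similar_not_in_sequence": isolated,
        # Assumption: letters of the longer string left unmatched.
        "dissimilar": max(len(a), len(b)) - in_sequence - isolated,
        # Prefixes and suffixes count only if longer than one letter.
        "common_prefix": prefix if prefix > 1 else 0,
        "common_suffix": suffix if suffix > 1 else 0,
    }


if __name__ == "__main__":
    print(similarity_features("google", "goggle"))
```

For "google" and "goggle" this reports five letters matched in sequence ("go" and "gle"), one dissimilar letter, a common prefix of length 2, and a common suffix of length 3.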
Additionally, SWORD has developed an image recognition program (a pixel-by-pixel comparison) designed to assess the visual similarity between two characters (e.g., the capital letter “O” and a zero “0” have a high visual similarity score). This scoring feature is a component of the algorithm’s results.
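The idea of a pixel-by-pixel character comparison can be illustrated with a toy example. Everything here is an assumption made for illustration: the 5x5 bitmaps are drawn by hand and are not taken from any font, and the measure (the share of pixel positions on which two bitmaps agree) is only one simple possibility, not SWORD's image recognition program.

```python
# Hand-drawn 5x5 bitmaps ('#' = ink, '.' = background). Illustrative only.
GLYPHS = {
    "O": [".###.",
          "#...#",
          "#...#",
          "#...#",
          ".###."],
    "0": [".###.",
          "#..##",
          "#.#.#",
          "##..#",
          ".###."],
    "L": ["#....",
          "#....",
          "#....",
          "#....",
          "#####"],
}


def pixel_similarity(glyph_a: list[str], glyph_b: list[str]) -> float:
    """Fraction of pixel positions on which two same-sized bitmaps agree."""
    if len(glyph_a) != len(glyph_b) or any(
        len(ra) != len(rb) for ra, rb in zip(glyph_a, glyph_b)
    ):
        raise ValueError("bitmaps must have identical dimensions")
    total = sum(len(row) for row in glyph_a)
    same = sum(
        pa == pb
        for ra, rb in zip(glyph_a, glyph_b)
        for pa, pb in zip(ra, rb)
    )
    return same / total


if __name__ == "__main__":
    print(pixel_similarity(GLYPHS["O"], GLYPHS["0"]))  # 22 of 25 pixels agree
    print(pixel_similarity(GLYPHS["O"], GLYPHS["L"]))  # 16 of 25 pixels agree
```

With these bitmaps, "O" and "0" agree on 22 of 25 pixels, while "O" and "L" agree on only 16. Counting matching background pixels inflates every score, so a practical measure would more likely compare ink pixels only and use glyphs rendered from real fonts.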
This pre-production version of the algorithm supports the most common characters in Arabic, Chinese, Cyrillic, Devanagari, Greek, Japanese, Korean and Latin. It can also compare cross-script strings that belong to the same family of scripts. For example, Chinese and Japanese belong to the East-Asian script family and can be compared. Similarly, Latin and Greek belong to the European script family and can be compared. The tool will not compare scripts from different families, because they are considered so visually different that the likelihood of confusion is effectively zero. This version of the algorithm does not validate candidate strings for compliance with IDNA protocols.
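The script-family rule can be sketched as a pre-check that runs before any scoring. This is a rough approximation under stated assumptions: Python's standard library has no Unicode script property, so the script is guessed from the first word of each character's Unicode name; only the two families named in the text are mapped; and every other script (Cyrillic, Korean Hangul, Arabic, Devanagari) is treated as its own family because the text does not say where it belongs. The names `FAMILY_OF_SCRIPT`, `script_families` and `comparable` are hypothetical.

```python
import unicodedata

# Only the groupings stated in the text. Japanese is covered by kana plus
# CJK ideographs, which it shares with Chinese.
FAMILY_OF_SCRIPT = {
    "LATIN": "European",
    "GREEK": "European",
    "CJK": "East-Asian",
    "HIRAGANA": "East-Asian",
    "KATAKANA": "East-Asian",
}


def script_families(text: str) -> set:
    """Return the script families of the letters in text."""
    families = set()
    for ch in text:
        if not ch.isalpha():
            continue  # digits and hyphens carry no script information here
        script = unicodedata.name(ch, "UNKNOWN").split()[0]
        # Assumption: an unmapped script forms a family of its own.
        families.add(FAMILY_OF_SCRIPT.get(script, script))
    return families


def comparable(a: str, b: str) -> bool:
    """True when all letters of both strings fall into one script family."""
    return len(script_families(a) | script_families(b)) == 1


if __name__ == "__main__":
    print(comparable("abc", "αβγ"))      # Latin vs Greek
    print(comparable("abc", "中文"))      # Latin vs Chinese
    print(comparable("中文", "ひらがな"))  # Chinese vs Japanese kana
```

Under these assumptions the Latin/Greek and Chinese/Japanese pairs are comparable, while the Latin/Chinese pair is not and would be skipped.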
The algorithm computes a visual similarity score – it is not meant to consider phonetic similarity. For example, "fish", "phish", and "fiche" sound alike, but are visually distinct and unlikely to be confused.
About ICANN
The Internet Corporation for Assigned Names and Numbers (ICANN) is a not-for-profit, multi-stakeholder, international organization that has responsibility for Internet Protocol (IP) address space allocation, protocol identifier assignment, generic (gTLD) and country code (ccTLD) top-level domain name system management, and root server system management functions.
ICANN’s mission is to coordinate, at the overall level, the global Internet's systems of unique identifiers, and in particular to ensure the stable and secure operation of these systems. It coordinates policy development reasonably and appropriately related to these technical functions, consistent with ICANN’s core values.
Information about ICANN is available at
www.icann.org
About SWORD
SWORD is a specialist international IT services and products company offering consultancy
and integration services. As a well-known specialist in the area of verbal search algorithms,
SWORD has deployed its proprietary search algorithm in more than 30 patent and trademark
offices throughout the world.
Information about SWORD is available at
www.sword-group.com