
Brand Monitor: Typo generation FAQ

Posted on May 23, 2019

Why should I monitor misspellings for my domain name?

Brand Alert API and Brand Alert Monitor provide a toolkit for searching newly registered or recently removed domain names by a substring.

However, checking for the brand name alone is often not enough, since look-alike domain names may differ from it only slightly.

Registering such look-alikes is called typosquatting: registering and using a domain name that resembles the victim's but contains intentional typos.

Misspelled domain names pose a potential risk to both domain owners and end users, since they can be used for brandjacking, redirecting traffic to competitors, distributing harmful content, and so on.

How many misspellings can you generate?

The number of possible typos depends on the length of the search term: the longer the word, the more misspellings can be generated from it.

However, finding every possible typo would be a demanding task, and it is usually not worth searching through every possible combination of letters.

As of now, the number of misspellings is limited to 1,000 per search term.

If for some reason you need more variations, please contact us.

How do you generate typos for a domain?

Our service supports the following rules for misspelling generation:

Bitsquatting;
Homoglyph substitution;
Wildcard substitution;
Natural language differences;
Common misspellings dictionary;
Word splitting;
Letter mistypes.

What is bitsquatting?

Bitsquatting is a form of typosquatting that mostly matters for machine-to-machine interaction. The general idea is to flip one or several bits in the binary representation of the domain name.

If the memory of a device handling the request is faulty, a bit of the domain name may be flipped at any of the underlying network layers, and the machine ends up connecting to a different, potentially malicious host.

For example, if the search term is google, its bit representation is:

Character:  g         o         o         g         l         e
Bits:       01100111  01101111  01101111  01100111  01101100  01100101
Bit index:  0-7       8-15      16-23     24-31     32-39     40-47

Let's flip bit 7 (the last bit of the first character) from 1 to 0:

Character:  f         o         o         g         l         e
Bits:       01100110  01101111  01101111  01100111  01101100  01100101
Bit index:  0-7       8-15      16-23     24-31     32-39     40-47

The resulting search term is foogle.

The same mechanics can be applied to any bit, or even to several bits at once.
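The single-bit case can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the service's implementation: it assumes a plain ASCII label, flips one bit at a time (not several), and keeps only results made of valid hostname characters. The code counts bits from the least significant end, so the "bit 7" of a character in the tables above corresponds to `bit = 0` here. The name `bitsquat_variants` is hypothetical.

```python
import string

# Characters allowed in a hostname label under the LDH rule (letters, digits, hyphen).
ALLOWED = set(string.ascii_lowercase + string.digits + "-")


def bitsquat_variants(label: str) -> set[str]:
    """Return every label that differs from `label` by exactly one flipped bit."""
    variants = set()
    for i, ch in enumerate(label):
        for bit in range(8):
            # bit 0 is the least significant bit, i.e. the article's "bit 7" of a character.
            # DNS names are case-insensitive, so an upper/lower-case flip is not a new name.
            flipped = chr(ord(ch) ^ (1 << bit)).lower()
            if flipped == ch or flipped not in ALLOWED:
                continue
            candidate = label[:i] + flipped + label[i + 1:]
            # A label may not start or end with a hyphen.
            if not candidate.startswith("-") and not candidate.endswith("-"):
                variants.add(candidate)
    return variants


print(sorted(bitsquat_variants("google")))
```

Flipping several bits at once, as mentioned above, would mean iterating over combinations of bit positions instead of single positions, which grows the candidate set quickly.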

What is homoglyph substitution?

Different natural and artificial languages contain characters that look very similar or even identical, e.g. the Latin "C" and the Cyrillic "С".

Another example is the Latin "G" and the phonetic symbol "ɢ" (small capital G). Such characters are called homoglyphs.

Despite their visual similarity, they are different characters in Unicode.

This means a fraudster can register a domain name containing such look-alike characters and redirect traffic to a malicious resource.

Such an attack is called an IDN homograph attack.

To prevent such attacks, it is best to monitor all possible homoglyph combinations of your domain name.
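A minimal sketch of generating homoglyph combinations, assuming a small hypothetical `HOMOGLYPHS` table of Latin-to-Cyrillic look-alikes (a real table would cover many more scripts and characters). It tries every combination of substitutions and prints each variant next to its ASCII-compatible Punycode form (prefixed `xn--`), which is how such names appear in DNS. Python's built-in `idna` codec implements IDNA 2003, so edge cases may differ from registries that follow IDNA 2008.

```python
import itertools

# Hypothetical sample of Latin -> Cyrillic look-alikes; a production table would be far larger.
HOMOGLYPHS = {
    "a": ["\u0430"],  # Cyrillic small a
    "c": ["\u0441"],  # Cyrillic small es
    "e": ["\u0435"],  # Cyrillic small ie
    "o": ["\u043e"],  # Cyrillic small o
    "p": ["\u0440"],  # Cyrillic small er
    "x": ["\u0445"],  # Cyrillic small ha
    "y": ["\u0443"],  # Cyrillic small u
}


def homoglyph_variants(label: str) -> set[str]:
    """Return every mix of original and look-alike characters, except the original."""
    # For each position, the choices are the character itself plus its look-alikes.
    choices = [[ch] + HOMOGLYPHS.get(ch, []) for ch in label]
    variants = {"".join(combo) for combo in itertools.product(*choices)}
    variants.discard(label)
    return variants


for variant in sorted(homoglyph_variants("google")):
    # The built-in "idna" codec converts a Unicode label to its xn-- ASCII form.
    print(variant, variant.encode("idna").decode("ascii"))
```

Each substitutable character doubles the number of combinations, so a production generator would need a cap such as the 1,000-variant limit mentioned above.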

What is the common misspellings dictionary?

It is a dictionary of real-world misspellings. A fairly comprehensive one is available at wikipedia.org.

We check a search term against this list to generate possible typos for it.
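A sketch of the dictionary lookup, assuming the list has already been loaded into a mapping from a correct word to its known misspellings. `COMMON_MISSPELLINGS` below is a hypothetical three-entry sample, not the actual Wikipedia list, and `dictionary_typos` is an illustrative name. Each occurrence of a dictionary word is replaced one at a time, so every variant contains exactly one misspelling.

```python
# A tiny hypothetical sample: correct spelling -> known real-world misspellings.
COMMON_MISSPELLINGS = {
    "receive": ["recieve"],
    "separate": ["seperate", "seprate"],
    "definitely": ["definately", "definitly"],
}


def dictionary_typos(term: str) -> set[str]:
    """Swap each dictionary word found in `term` for each of its listed misspellings."""
    variants = set()
    for correct, misspellings in COMMON_MISSPELLINGS.items():
        start = term.find(correct)
        while start != -1:  # handle every occurrence, one substitution at a time
            end = start + len(correct)
            for wrong in misspellings:
                variants.add(term[:start] + wrong + term[end:])
            start = term.find(correct, start + 1)
    return variants


print(sorted(dictionary_typos("separatebills")))  # ['seperatebills', 'sepratebills']
```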

What are natural language differences?

People in different regions may spell and pronounce the same words differently. For example, common differences between British and American English include:

behaviour → behavior
catalogue → catalog
centre → center
etc.

We generate all possible misspelled domain names based on the most common of these differences.
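The same idea can be sketched for regional spellings. The hypothetical `SPELLING_GROUPS` table lists interchangeable British/American spellings (a real list would be much longer). The code finds every known spelling in the term, preferring the longest match so that "catalogue" is not read as "catalog" followed by "ue", and then emits every combination of alternatives.

```python
import itertools
import re

# Hypothetical sample of interchangeable British/American spellings.
SPELLING_GROUPS = [
    ("behaviour", "behavior"),
    ("catalogue", "catalog"),
    ("centre", "center"),
    ("colour", "color"),
]

# Map every known spelling to the group of equivalent spellings it belongs to.
_GROUP_OF = {spelling: group for group in SPELLING_GROUPS for spelling in group}
# Longest alternatives first, so "catalogue" wins over its prefix "catalog".
_PATTERN = re.compile("|".join(sorted(map(re.escape, _GROUP_OF), key=len, reverse=True)))


def regional_variants(term: str) -> set[str]:
    """Return every spelling combination of the regional words found in `term`."""
    parts = []  # alternating literal text and tuples of interchangeable spellings
    last = 0
    for match in _PATTERN.finditer(term):
        parts.append((term[last:match.start()],))
        parts.append(_GROUP_OF[match.group()])
        last = match.end()
    parts.append((term[last:],))
    variants = {"".join(choice) for choice in itertools.product(*parts)}
    variants.discard(term)
    return variants


print(sorted(regional_variants("colourcatalogue")))
# ['colorcatalog', 'colorcatalogue', 'colourcatalog']
```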

What is word splitting?

Sometimes domain names consist of several concatenated words, e.g. thelongestdomainnameever.com.

One typosquatting technique for such a domain name is to split it into words and join them with hyphens ("-").

We use a pre-trained model to detect the original words in a character sequence and then join them using every possible combination of hyphens.

For instance, for the domain name thelongestdomainnameever.com, we get:

the-longest-domain-name-ever.com
thelongest-domainname-ever.com
the-longestdomainnameever.com
etc.

What are letter mistypes?

The most natural misspellings are typing mistakes in one or two letters.

For instance, a letter of the source domain name can be replaced with, or combined with, one of its keyboard neighbors.

Such domain names may look very similar to the original ones but can lead to a harmful resource.

We use the following letter mistype rules:

Letter repetition - duplicate a letter of the original word, e.g. google.com → gooogle.com or ggoogle.com
Letter replacement - replace a letter with one of its keyboard neighbors, e.g. google.com → toogle.com or boogle.com
Letter addition - insert a keyboard neighbor before or after a letter, e.g. google.com → gfoogle.com
Letter reversal - swap two adjacent letters, e.g. google.com → goolge.com, googel.com
Letter omission - skip a letter, e.g. google.com → gogle.com, googl.com
Vowel swapping - replace a vowel with another vowel, e.g. google.com → gaogle.com, guogle.com
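The rules above can be sketched as single-edit generators applied at every position of the label. This assumes a US QWERTY keyboard approximated by three staggered rows and a lowercase ASCII label without its TLD (append ".com" or similar afterwards); `build_neighbors` and `mistype_variants` are hypothetical names, not the service's API.

```python
# Approximate QWERTY layout; each row sits about half a key to the right of the row above.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
VOWELS = "aeiou"


def build_neighbors() -> dict[str, set[str]]:
    """Map each letter to the letters physically adjacent to it on the keyboard."""
    neighbors = {}
    for r, row in enumerate(QWERTY_ROWS):
        for i, ch in enumerate(row):
            near = set()
            # Same row: left and right. Row above: columns i and i+1. Row below: i-1 and i.
            for rr, cols in ((r, (i - 1, i + 1)), (r - 1, (i, i + 1)), (r + 1, (i - 1, i))):
                if 0 <= rr < len(QWERTY_ROWS):
                    near.update(QWERTY_ROWS[rr][c] for c in cols if 0 <= c < len(QWERTY_ROWS[rr]))
            neighbors[ch] = near
    return neighbors


NEIGHBORS = build_neighbors()


def mistype_variants(label: str) -> set[str]:
    """Apply each single-edit mistype rule at every position of `label`."""
    variants = set()
    for i, ch in enumerate(label):
        head, tail = label[:i], label[i + 1:]
        variants.add(head + ch + ch + tail)           # repetition: ggoogle
        variants.add(head + tail)                     # omission: gogle
        for n in NEIGHBORS.get(ch, ()):
            variants.add(head + n + tail)             # replacement: toogle
            variants.add(head + n + ch + tail)        # addition before the letter
            variants.add(head + ch + n + tail)        # addition after the letter: gfoogle
        if ch in VOWELS:
            variants.update(head + v + tail for v in VOWELS if v != ch)  # vowel swap: gaogle
        if tail:
            variants.add(head + tail[0] + ch + tail[1:])  # reversal: goolge
    variants.discard(label)  # swapping two identical letters reproduces the original
    return variants


print(sorted(v + ".com" for v in mistype_variants("google"))[:10])
```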
