Homoglyphs

Homoglyphs lives! This is a fork of the original project, which was maintained by orsinium. The library is widely used for handling homoglyphs in Python.

Homoglyphs is a Python library for getting homoglyphs and converting them to ASCII.

Features

It's a smarter version of confusable_homoglyphs:

Autodetects the category, or lets you choose it manually (aliases from ISO 15924).
Loads only the needed alphabets into memory, automatically or manually.
Converts to ASCII.
More configurable.
More stable.

Installation

pip install homoglyphs_fork

Usage

The best way to explain something is to show how it works, so let's look at some real usage.

Importing:

import homoglyphs_fork as hg

Languages

# detect
hg.Languages.detect('w')
# {'pl', 'da', 'nl', 'fi', 'cz', 'sr', 'pt', 'it', 'en', 'es', 'sk', 'de', 'fr', 'ro'}
hg.Languages.detect('т')
# {'mk', 'ru', 'be', 'bg', 'sr'}
hg.Languages.detect('.')
# set()

# get alphabet for languages
hg.Languages.get_alphabet(['ru'])
# {'в', 'Ё', 'К', 'Т', ..., 'Р', 'З', 'Э'}

# get all languages
hg.Languages.get_all()
# {'nl', 'lt', ..., 'de', 'mk'}
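
The detection calls above can be pictured as a membership test against per-language alphabets. The following is only a minimal sketch under that assumption, not the library's implementation: `TOY_ALPHABETS` and `detect_languages` are hypothetical names, and the three alphabets are deliberately tiny and lowercase-only.

```python
# Hypothetical, hand-built alphabets; the real library ships far larger tables.
RUSSIAN = {chr(code) for code in range(0x430, 0x450)} | {"\u0451"}  # а..я plus ё

TOY_ALPHABETS = {
    "en": set("abcdefghijklmnopqrstuvwxyz"),
    "ru": RUSSIAN,
    # Bulgarian uses the same letters minus ы (U+044B), э (U+044D) and ё (U+0451).
    "bg": RUSSIAN - {"\u044b", "\u044d", "\u0451"},
}


def detect_languages(char, alphabets=TOY_ALPHABETS):
    """Return the codes of every language whose alphabet contains `char`."""
    return {code for code, letters in alphabets.items() if char in letters}


print(detect_languages("w"))       # only the toy English alphabet contains it
print(detect_languages("\u0442"))  # Cyrillic "т" is in both toy Cyrillic alphabets
print(detect_languages("."))       # punctuation belongs to no alphabet
```

A character shared by several alphabets yields several language codes, which is why a single letter rarely pins down one language.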

Homoglyphs

Get homoglyphs:

# get homoglyphs (latin alphabet initialized by default)
hg.Homoglyphs().get_combinations('q')
# ['q', '𝐪', '𝑞', '𝒒', '𝓆', '𝓺', '𝔮', '𝕢', '𝖖', '𝗊', '𝗾', '𝘲', '𝙦', '𝚚']

Alphabet loading:

# load alphabet on init by categories
homoglyphs = hg.Homoglyphs(categories=('LATIN', 'COMMON', 'CYRILLIC'))  # alphabet loaded here
homoglyphs.get_combinations('гы')
# ['rы', 'гы', 'ꭇы', 'ꭈы', '𝐫ы', '𝑟ы', '𝒓ы', '𝓇ы', '𝓻ы', '𝔯ы', '𝕣ы', '𝖗ы', '𝗋ы', '𝗿ы', '𝘳ы', '𝙧ы', '𝚛ы']

# load alphabet on init by languages
homoglyphs = hg.Homoglyphs(languages={'ru', 'en'})  # alphabet will be loaded here
homoglyphs.get_combinations('гы')
# ['rы', 'гы']

# set the alphabet manually on init (English and Russian letters)
homoglyphs = hg.Homoglyphs(alphabet='abc абс')
homoglyphs.get_combinations('с')
# ['c', 'с']

# load alphabet on demand
homoglyphs = hg.Homoglyphs(languages={'en'}, strategy=hg.STRATEGY_LOAD)
# ^ alphabet will be loaded here for "en" language
homoglyphs.get_combinations('гы')
# ^ alphabet will be loaded here for "ru" language
# ['rы', 'гы']
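
To see why a larger loaded alphabet gives more combinations, it can help to think of each character as a set of look-alikes filtered by the alphabet, with the results joined position by position. This is a rough sketch under that assumption only: `TOY_GROUPS` and `toy_combinations` are hypothetical, and the single confusable group is hand-picked for illustration.

```python
from itertools import product

# Hypothetical confusable group: characters in one set look alike.
TOY_GROUPS = [
    {"r", "\u0433", "\U0001d42b"},  # Latin r, Cyrillic г, mathematical bold r
]


def toy_combinations(text, alphabet, groups=TOY_GROUPS):
    """Return every spelling of `text` built from look-alikes found in `alphabet`."""
    per_char = []
    for char in text:
        variants = {char}
        for group in groups:
            if char in group:
                # Keep only the look-alikes that the loaded alphabet allows.
                variants |= group & alphabet
        per_char.append(sorted(variants))
    # The cartesian product joins one variant per position.
    return ["".join(combo) for combo in product(*per_char)]


word = "\u0433\u044b"  # Cyrillic "гы"
narrow = set("r") | {"\u0433", "\u044b"}
wide = narrow | {"\U0001d42b"}
print(toy_combinations(word, narrow))  # two spellings
print(toy_combinations(word, wide))    # three spellings
```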

You can combine categories, languages, alphabet and any strategies as you want. The strategies specify how to handle any characters not already loaded:

STRATEGY_LOAD: load category for this character
STRATEGY_IGNORE: add character to result
STRATEGY_REMOVE: remove character from result
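
The three strategies listed above can be sketched as a single loop over the input. This is an illustration of the described behaviour, not the library's code: `ToyStrategy`, `filter_text` and the `loader` callback are hypothetical names introduced here.

```python
from enum import Enum


class ToyStrategy(Enum):
    """Hypothetical stand-ins for STRATEGY_LOAD, STRATEGY_IGNORE, STRATEGY_REMOVE."""
    LOAD = "load"
    IGNORE = "ignore"
    REMOVE = "remove"


def filter_text(text, alphabet, strategy, loader=None):
    """Apply `strategy` to every character of `text` that is not in `alphabet`.

    With LOAD, `loader(char)` must return the set of characters in the
    character's category; `alphabet` is extended in place.
    """
    kept = []
    for char in text:
        if char in alphabet:
            kept.append(char)
        elif strategy is ToyStrategy.LOAD:
            alphabet.update(loader(char))  # pull in the whole category
            alphabet.add(char)
            kept.append(char)
        elif strategy is ToyStrategy.IGNORE:
            kept.append(char)  # pass the unknown character through unchanged
        # ToyStrategy.REMOVE: drop the character by appending nothing
    return "".join(kept)


def cyrillic_loader(char):
    """Toy loader: always returns the lowercase Cyrillic letters а..я."""
    return {chr(code) for code in range(0x430, 0x450)}


text = "ab\u0434"  # "ab" followed by Cyrillic "д"
print(filter_text(text, set("abc"), ToyStrategy.IGNORE))
print(filter_text(text, set("abc"), ToyStrategy.REMOVE))
```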

Converting glyphs to ASCII chars

homoglyphs = hg.Homoglyphs(languages={'en'}, strategy=hg.STRATEGY_LOAD)

# convert
homoglyphs.to_ascii('ТЕСТ')
# ['TECT']
homoglyphs.to_ascii('ХР123.')  # this is cyrillic "х" and "р"
# ['XP123.', 'XPI23.', 'XPl23.']

# by default, a string containing chars that can't be converted is ignored
homoglyphs.to_ascii('лол')
# []

# you can set ascii_strategy to remove unconverted non-ASCII chars from the result
homoglyphs = hg.Homoglyphs(
    languages={'en'},
    strategy=hg.STRATEGY_LOAD,
    ascii_strategy=hg.STRATEGY_REMOVE,
)
homoglyphs.to_ascii('лол')
# ['o']

# you can also set the range of allowed char codes for ASCII (0-128 by default):
homoglyphs = hg.Homoglyphs(
    languages={'en'},
    strategy=hg.STRATEGY_LOAD,
    ascii_strategy=hg.STRATEGY_REMOVE,
    ascii_range=range(ord('a'), ord('z')),
)
homoglyphs.to_ascii('ХР123.')
# ['l']
homoglyphs.to_ascii('хр123.')
# ['xpl']
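
The outputs above, including the multiple results for a single input, are consistent with a simple model: each character keeps only the look-alikes whose code falls inside the allowed range, and the per-character options are then combined. The sketch below assumes that model; `TOY_CONFUSABLES` and `toy_to_ascii` are hypothetical, and the table holds just the few entries needed for the example.

```python
from itertools import product

# Hypothetical confusables table: character -> look-alike characters.
TOY_CONFUSABLES = {
    "\u0425": ["X"],   # Cyrillic Х
    "\u0420": ["P"],   # Cyrillic Р
    "\u043e": ["o"],   # Cyrillic о
    "1": ["I", "l"],
}


def toy_to_ascii(text, table=TOY_CONFUSABLES, ascii_range=range(128),
                 remove_unconvertible=False):
    """Return every rendering of `text` that uses only codes in `ascii_range`."""
    options = []
    for char in text:
        candidates = [c for c in [char, *table.get(char, [])]
                      if ord(c) in ascii_range]
        if not candidates:
            if remove_unconvertible:
                continue  # drop the character, keep converting the rest
            return []     # one unconvertible character rejects the whole string
        options.append(candidates)
    return ["".join(combo) for combo in product(*options)]


print(toy_to_ascii("\u0425\u0420123."))
print(toy_to_ascii("\u043b\u043e\u043b", remove_unconvertible=True))
```

In this model the digit "1" contributes three options ("1", "I", "l"), which is where the three results for the first call come from.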

The Fork

To help with the transition I have:

Moved the main branch
Enabled Issues

I am looking to:

Contributors

With thanks to:

@wesinator
@clydejallorina

