Name Description Size
__init__.py Charset-Normalizer ~~~~~~~~~~~~~~ The Real First Universal Charset Detector. A library that helps you read text from an unknown charset encoding. Motivated by chardet, This package is trying to resolve the issue by taking a new approach. All IANA character set names for which the Python core library provides codecs are supported. Basic usage: >>> from charset_normalizer import from_bytes >>> results = from_bytes('Bсеки човек има право на образование. Oбразованието!'.encode('utf_8')) >>> best_guess = results.best() >>> str(best_guess) 'Bсеки човек има право на образование. Oбразованието!' Others methods and usages are available - see the full documentation at <https://github.com/Ousret/charset_normalizer>. :copyright: (c) 2021 by Ahmed TAHRI :license: MIT, see LICENSE for more details. 1577
api.py Given a raw bytes sequence, return the best possibles charset usable to render str objects. If there is no results, it is a strong indicator that the source is binary/not text. By default, the process will extract 5 blocks of 512o each to assess the mess and coherence of a given sequence. And will give up a particular code page after 20% of measured mess. Those criteria are customizable at will. The preemptive behavior DOES NOT replace the traditional detection workflow, it prioritize a particular code page but never take it for granted. Can improve the performance. You may want to focus your attention to some code page or/and not others, use cp_isolation and cp_exclusion for that purpose. This function will strip the SIG in the payload/sequence every time except on UTF-16, UTF-32. By default the library does not setup any handler other than the NullHandler, if you choose to set the 'explain' toggle to True it will alter the logger configuration to add a StreamHandler that is suitable for debugging. Custom logging format and handler can be set manually. 21097
assets
cd.py Return associated unicode ranges in a single byte code page. 12554
cli
constant.py 19101
legacy.py chardet legacy method Detect the encoding of the given byte string. It should be mostly backward-compatible. Encoding name will match Chardet own writing whenever possible. (Not on encoding name unsupported by it) This function is deprecated and should be used to migrate your project easily, consult the documentation for further information. Not planned for removal. :param byte_str: The byte sequence to examine. :param should_rename_legacy: Should we rename legacy encodings to their more modern equivalents? 2071
md.py Base abstract class used for mess detection plugins. All detectors MUST extend and implement given methods. 18682
models.py Implemented to make sorted available upon CharsetMatches items. 11492
py.typed 0
utils.py Retrieve the Unicode range official name from a single character. 11578
version.py Expose version 79