Name Description Size
__init__.py Detect the encoding of the given byte string. :param byte_str: The byte sequence to examine. :type byte_str: ``bytes`` or ``bytearray`` 3271
big5freq.py 31254
big5prober.py 1757
chardistribution.py Reset the analyser and clear any state. 9411
charsetgroupprober.py 3839
charsetprober.py We define three types of bytes: alphabet: English letters [a-zA-Z] international: international characters [\x80-\xFF] marker: everything else [^a-zA-Z\x80-\xFF] The input buffer can be thought of as a series of words delimited by markers. This function filters the buffer down to the words that contain at least one international character. All contiguous sequences of markers are replaced by a single ASCII space character. This filter applies to all scripts which do not use English characters. 5110
cli
codingstatemachine.py A state machine to verify a byte sequence for a particular encoding. For each byte the detector receives, it will feed that byte to every active state machine available, one byte at a time. The state machine changes its state based on its previous state and the byte it receives. There are 3 states in a state machine that are of interest to an auto-detector: START state: This is the state to start with, or the state reached once a legal byte sequence (i.e. a valid code point) for a character has been identified. ME state: This indicates that the state machine identified a byte sequence that is specific to the charset it is designed for and that there is no other possible encoding which can contain this byte sequence. This will lead to an immediate positive answer for the detector. ERROR state: This indicates the state machine identified an illegal byte sequence for that encoding. This will lead to an immediate negative answer for this encoding. The detector will exclude this encoding from consideration from here on. 3590
compat.py 1200
cp949prober.py 1855
enums.py All of the Enums that are used throughout the chardet package. :author: Dan Blanchard (dan.blanchard@gmail.com) 1661
escprober.py This CharSetProber uses a "code scheme" approach for detecting encodings, whereby easily recognizable escape or shift sequences are relied on to identify these encodings. 3950
escsm.py 10510
eucjpprober.py 3749
euckrfreq.py 13546
euckrprober.py 1748
euctwfreq.py 31621
euctwprober.py 1747
gb2312freq.py 20715
gb2312prober.py 1754
hebrewprober.py 13838
jisfreq.py 25777
jpcntx.py 19643
langbulgarianmodel.py 105685
langgreekmodel.py 99559
langhebrewmodel.py 98764
langhungarianmodel.py 102486
langrussianmodel.py 131168
langthaimodel.py 103300
langturkishmodel.py 95934
latin1prober.py 5370
mbcharsetprober.py MultiByteCharSetProber 3413
mbcsgroupprober.py 2012
mbcssm.py 25481
metadata
sbcharsetprober.py 6136
sbcsgroupprober.py 4309
sjisprober.py 3774
universaldetector.py Module containing the UniversalDetector detector class, which is the primary class a user of ``chardet`` should use. :author: Mark Pilgrim (initial port to Python) :author: Shy Shalom (original C code) :author: Dan Blanchard (major refactoring for 3.0) :author: Ian Cordasco 12503
utf8prober.py 2766
version.py This module exists only to simplify retrieving the version number of chardet from within setup.py and from chardet subpackages. :author: Dan Blanchard (dan.blanchard@gmail.com) 242
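
The word filter described in the charsetprober.py entry can be sketched roughly as follows. This is a minimal illustration built only from that description, not the code in charsetprober.py; the function name and the two regular expressions are assumptions made for the example, and details such as trailing-space handling may differ from the real module.

```python
import re

# Assumed patterns, derived from the byte classes in the description:
# a "word" is a run of alphabet or international bytes, and every other
# byte is a marker that separates words.
_WORD = re.compile(rb"[a-zA-Z\x80-\xFF]+")
_INTERNATIONAL = re.compile(rb"[\x80-\xFF]")


def filter_international_words(buf):
    """Keep only words containing at least one byte in \\x80-\\xFF.

    Runs of marker bytes between kept words collapse to a single space.
    """
    words = _WORD.findall(bytes(buf))
    kept = [word for word in words if _INTERNATIONAL.search(word)]
    return b" ".join(kept)
```

For instance, given `b"hello caf\xe9, na\xefve 123 world"`, the sketch keeps only the two words that contain a high byte and joins them with one space.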
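
The three states described in the codingstatemachine.py entry can be illustrated with a toy machine. The sketch below is not the transition table from escsm.py or mbcssm.py; it is a hand-written assumption that recognizes only the ISO-2022-JP escape sequences `ESC $ B` and `ESC $ @`, and the names `State`, `next_state`, and `run` are hypothetical.

```python
from enum import IntEnum


class State(IntEnum):
    START = 0       # initial state, or a legal sequence was just completed
    ERROR = 1       # illegal byte for this encoding: rule the encoding out
    ITS_ME = 2      # sequence unique to this encoding: immediate positive
    ESC_SEEN = 3    # intermediate: saw ESC
    ESC_DOLLAR = 4  # intermediate: saw ESC $


def next_state(state, byte):
    """Return the new state given the previous state and one byte."""
    if state in (State.ERROR, State.ITS_ME):
        return state  # terminal states never change
    if byte >= 0x80:
        return State.ERROR  # this toy encoding is 7-bit only
    if state == State.START:
        return State.ESC_SEEN if byte == 0x1B else State.START
    if state == State.ESC_SEEN:
        return State.ESC_DOLLAR if byte == ord("$") else State.START
    # state == State.ESC_DOLLAR
    return State.ITS_ME if byte in b"@B" else State.START


def run(data):
    """Feed bytes one at a time; stop early on a decisive state."""
    state = State.START
    for byte in data:
        state = next_state(state, byte)
        if state in (State.ERROR, State.ITS_ME):
            break
    return state
```

A detector built along these lines would run one such machine per candidate encoding, drop any machine that reaches `ERROR`, and report the encoding of the first machine that reaches `ITS_ME`.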