pyzor.digest

class pyzor.digest.DataDigester(msg, spec=None)

Bases: object

The major workhorse class.

atomic_num_lines = 4
digest
classmethod digest_payloads(msg)
email_ptrn = <compiled regular expression>
handle_atomic(lines)

Digest every line of the message.

handle_line(line)
handle_pieced(lines, spec)

Digest only the portions of the message selected by the spec.

longstr_ptrn = <compiled regular expression>
min_line_length = 8
classmethod normalize(s)
static normalize_html_part(s)
classmethod should_handle_line(s)
unwanted_txt_repl = ''
url_ptrn = <compiled regular expression>
value
ws_ptrn = <compiled regular expression>
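
The split between handle_atomic (digest everything) and handle_pieced (digest according to the spec) can be illustrated with a minimal, standalone sketch. It is not pyzor's implementation. The threshold mirrors atomic_num_lines above, but the spec format (pairs of percentage offset and line count), the example spec values, the use of SHA-1, and the name sketch_digest are all assumptions made for illustration.

```python
import hashlib

# Mirrors atomic_num_lines from the listing above.
ATOMIC_NUM_LINES = 4
# Assumed spec format (not stated in this reference):
# (percentage offset into the message, number of lines to take).
EXAMPLE_SPEC = [(20, 3), (60, 3)]


def sketch_digest(lines, spec=EXAMPLE_SPEC):
    """Hypothetical helper: hash all lines, or only the slices in spec."""
    hasher = hashlib.sha1()  # hash choice is an assumption
    if len(lines) <= ATOMIC_NUM_LINES:
        # Short message: every line contributes to the digest.
        selected = lines
    else:
        # Longer message: only the slices named by the spec contribute.
        selected = []
        for offset_pct, length in spec:
            start = len(lines) * offset_pct // 100
            selected.extend(lines[start:start + length])
    for line in selected:
        hasher.update(line.encode("utf-8"))
    return hasher.hexdigest()
```

Under these assumptions, a change to a line outside the selected slices leaves the digest of a long message unchanged, while any change to a short message alters it.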
class pyzor.digest.HTMLStripper(collector)

Bases: HTMLParser.HTMLParser

Strip all tags from the HTML.

handle_data(data)

Keep track of the data.
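
The collector pattern described for HTMLStripper can be sketched as follows. This is an illustrative stand-in, not pyzor's code: it uses Python 3's html.parser module rather than the Python 2 HTMLParser module named above, and the names TextCollector and strip_tags, as well as the choice to skip whitespace-only text, are assumptions.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical stand-in: appends non-empty text nodes to a list."""

    def __init__(self, collector):
        super().__init__()
        self.collector = collector

    def handle_data(self, data):
        # The parser calls this for text between tags; the tags themselves
        # are reported through other callbacks and are ignored here.
        text = data.strip()
        if text:
            self.collector.append(text)


def strip_tags(html):
    """Hypothetical helper: return the text content joined by spaces."""
    collected = []
    parser = TextCollector(collected)
    parser.feed(html)
    parser.close()
    return " ".join(collected)
```

For example, strip_tags("<p>Hello <b>world</b></p>") yields the text with the markup removed, because only handle_data contributes to the collector.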

class pyzor.digest.PrintingDataDigester(msg, spec=None)

Bases: pyzor.digest.DataDigester

Extends DataDigester: prints out what we’re digesting.

handle_line(line)
pyzor.digest.get_digest(msg)