pyzor.digest

Handles the digesting of messages.

class pyzor.digest.DataDigester(msg, spec=None)

Bases: object

The major workhorse class.

atomic_num_lines = 4
digest
classmethod digest_payloads(msg)
email_ptrn = <_sre.SRE_Pattern object>
handle_atomic(lines)

Digest every line of the message.
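
As a rough illustration of the "digest everything" case, the sketch below feeds every line into a single hash. It does not use pyzor itself. The helper name `digest_atomic`, the choice of SHA-1, and the reading of `atomic_num_lines` as a threshold at or below which a message is treated atomically are all assumptions made for this example.

```python
import hashlib

ATOMIC_NUM_LINES = 4  # mirrors the atomic_num_lines attribute listed above


def digest_atomic(lines):
    """Hash every line, in order (hypothetical helper, not pyzor's code)."""
    h = hashlib.sha1()  # assumption: the hash algorithm is not stated above
    for line in lines:
        h.update(line)
    return h.hexdigest()


short_message = [b"Buycheapwatches", b"Clickherenow"]
# Assumption: messages with this few lines are digested whole.
is_atomic = len(short_message) <= ATOMIC_NUM_LINES
atomic_value = digest_atomic(short_message)
```

Because nothing is skipped, any change to any line of a short message changes the resulting digest.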

handle_line(line)
handle_pieced(lines, spec)

Digest the lines according to the spec.

longstr_ptrn = <_sre.SRE_Pattern object>
min_line_length = 8
classmethod normalize(s)
static normalize_html_part(s)
classmethod should_handle_line(s)
unwanted_txt_repl = ''
url_ptrn = <_sre.SRE_Pattern object>
value
ws_ptrn = <_sre.SRE_Pattern object>

class pyzor.digest.HTMLStripper(collector)

Bases: HTMLParser.HTMLParser

Strip all tags from the HTML.

handle_data(data)

Keep track of the data.
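
A minimal sketch of the tag-stripping idea, written against Python 3's `html.parser.HTMLParser` rather than the Python 2 `HTMLParser.HTMLParser` base listed above. The class name `TextCollector` and the assumption that `collector` is a list receiving the text fragments are illustrative only. The real `HTMLStripper` may behave differently.

```python
from html.parser import HTMLParser  # Python 3 location of HTMLParser


class TextCollector(HTMLParser):
    """Append text nodes to a caller-supplied list, dropping tags."""

    def __init__(self, collector):
        super().__init__()
        self.collector = collector  # assumption: a plain list

    def handle_data(self, data):
        # Keep only non-empty text found between tags.
        data = data.strip()
        if data:
            self.collector.append(data)


collected = []
parser = TextCollector(collected)
parser.feed("<html><body><p>Hello <b>world</b></p></body></html>")
parser.close()
```

The parser calls `handle_data` once per run of text between tags, so `collected` ends up holding only the visible text.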

handle_endtag(tag)
handle_starttag(tag, attrs)

class pyzor.digest.PrintingDataDigester(msg, spec=None)

Bases: pyzor.digest.DataDigester

Extends DataDigester to print each line as it is digested.

handle_line(line)
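
The override pattern can be sketched with a small stand-in hierarchy. `MiniDigester` and `MiniPrintingDigester` are hypothetical classes written for this example, and the assumption that `handle_line` simply feeds the line to a hash is not confirmed by the listing above.

```python
import hashlib


class MiniDigester:
    """Hypothetical stand-in for DataDigester, reduced to handle_line."""

    def __init__(self):
        self.digest = hashlib.sha1()

    def handle_line(self, line):
        self.digest.update(line)


class MiniPrintingDigester(MiniDigester):
    """Print each line, then digest it exactly as the parent would."""

    def handle_line(self, line):
        print(line.decode("utf8"))
        super().handle_line(line)


plain = MiniDigester()
printing = MiniPrintingDigester()
for line in [b"firstline", b"secondline"]:
    plain.handle_line(line)
    printing.handle_line(line)
```

Because the subclass only adds output before delegating to the parent, both objects produce the same digest, which makes this kind of subclass useful for debugging what is being hashed.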