
e14n/activityspam

This is an experimental server for filtering Activity Streams (http://activitystrea.ms/) data for spam.

Apache 2.0 license.

It more or less follows the filtering approach from Paul Graham's "A Plan for Spam", but also makes pseudo-tokens for Activity Streams fields. So an activity like this:

     { id: "urn:uuid:7e4ed55a-2b99-48c8-a274-42819b2ddd39",
       url: "http://example.net/status/35",
       published: "2011-09-23T10:49:00Z",
       actor: { displayName: "John Smith",
                id: "urn:uuid:bff0ecdd-a944-4d92-aed3-d6af8f13d610",
                url: "http://example.net/status/johnsmith" },
       verb: "post",
       object: { id: "urn:uuid:81e43564-c66f-40c5-878b-733275229521",
                 type: "note",
                 content: "<a href='http://example.com/viagra-spam'>Buy Viagra Now!</a>" } }

Would tokenize as:

      id=urn:uuid:7e4ed55a-2b99-48c8-a274-42819b2ddd39
      url=http://example.net/status/35
      published=2011-09-23T10:49:00Z
      actor.displayName=John-Smith
      actor.id=urn:uuid:bff0ecdd-a944-4d92-aed3-d6af8f13d610
      actor.url=http://example.net/status/johnsmith
      verb=post
      object.id=urn:uuid:81e43564-c66f-40c5-878b-733275229521
      object.type=note
      a
      href
      http://example.com/viagra-spam
      Buy
      Viagra
      Now
      a
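
The README does not include code for this step. A minimal JavaScript sketch of one way to produce the token list above follows. It assumes that `content` is the only free-text field, that whitespace in other field values is replaced with hyphens, and that a simple URL-or-word regular expression is good enough for splitting HTML. The names `tokenize` and `tokenizeText` are hypothetical, not the project's actual API.

```javascript
// Hypothetical sketch of the pseudo-token idea: structured fields become
// "path=value" tokens; free text (assumed here to be only `content`) is
// split into URL and word tokens.
const URL_OR_WORD = /https?:\/\/[^\s'"<>]+|[A-Za-z0-9]+/g;

// Split free text (including HTML markup) into URL and word tokens.
function tokenizeText(text) {
  return text.match(URL_OR_WORD) || [];
}

// Recursively flatten an activity, prefixing nested keys with their
// parent path (e.g. "actor.displayName").
function tokenize(obj, prefix = "") {
  const tokens = [];
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object") {
      tokens.push(...tokenize(value, path));
    } else if (key === "content") {
      tokens.push(...tokenizeText(String(value)));
    } else {
      // Replace whitespace so each field value stays a single token.
      tokens.push(`${path}=${String(value).replace(/\s+/g, "-")}`);
    }
  }
  return tokens;
}

const activity = {
  id: "urn:uuid:7e4ed55a-2b99-48c8-a274-42819b2ddd39",
  url: "http://example.net/status/35",
  published: "2011-09-23T10:49:00Z",
  actor: {
    displayName: "John Smith",
    id: "urn:uuid:bff0ecdd-a944-4d92-aed3-d6af8f13d610",
    url: "http://example.net/status/johnsmith",
  },
  verb: "post",
  object: {
    id: "urn:uuid:81e43564-c66f-40c5-878b-733275229521",
    type: "note",
    content: "<a href='http://example.com/viagra-spam'>Buy Viagra Now!</a>",
  },
};

console.log(tokenize(activity).join("\n"));
```

Because tags such as `a` and `href` survive as ordinary tokens, markup itself can carry spam signal, as in the list above.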

It may also be worth extracting the domains of URLs as tokens of their own (example.com and example.net here).
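
A minimal sketch of the domain idea, assuming domains are added as extra `domain=<host>` pseudo-tokens (a hypothetical naming) taken from any http(s) URL that appears in an existing token. It uses the WHATWG `URL` class, which is a global in Node.js. `domainTokens` is an illustrative name only.

```javascript
// Hypothetical sketch: derive extra "domain=<host>" pseudo-tokens from any
// http(s) URL that appears in an existing token.
function domainTokens(tokens) {
  const domains = new Set();
  for (const token of tokens) {
    const match = token.match(/https?:\/\/[^\s'"<>]+/);
    if (!match) continue;
    try {
      domains.add(new URL(match[0]).hostname);
    } catch (err) {
      // Ignore malformed URLs instead of rejecting the whole activity.
    }
  }
  return [...domains].map((host) => `domain=${host}`);
}

const sampleTokens = [
  "url=http://example.net/status/35",
  "actor.url=http://example.net/status/johnsmith",
  "verb=post",
  "http://example.com/viagra-spam",
  "Buy",
];
console.log(domainTokens(sampleTokens));
```

Deduplicating with a `Set` keeps a domain that shows up in several fields from being emitted more than once. Whether repeated domains should instead count more heavily is a design choice this sketch does not settle.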

About

Bayesian spam filter for activitystrea.ms data
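
For the Bayesian part, a sketch of the per-token probability and combination rule described in "A Plan for Spam" follows. Ham counts are doubled, tokens seen fewer than five times get a default of 0.4, probabilities are clamped to [0.01, 0.99], and the 15 tokens farthest from 0.5 are combined. This is not taken from the activityspam source, which may differ. The function names, the `Map`-based count tables, and the toy counts are assumptions for illustration.

```javascript
// Hypothetical Graham-style scoring sketch. Counts are Maps from token to
// number of occurrences in the spam or ham corpus; nSpam and nHam (corpus
// message counts) must both be > 0.
function tokenProbability(token, spamCounts, hamCounts, nSpam, nHam) {
  const g = 2 * (hamCounts.get(token) || 0); // doubled to reduce false positives
  const b = spamCounts.get(token) || 0;
  if (g + b < 5) return 0.4; // too rarely seen to trust: use the default
  const spamFreq = Math.min(1, b / nSpam);
  const hamFreq = Math.min(1, g / nHam);
  const p = spamFreq / (hamFreq + spamFreq);
  return Math.max(0.01, Math.min(0.99, p));
}

// Combine the 15 most "interesting" distinct tokens (farthest from 0.5).
function spamProbability(tokens, probOf) {
  const probs = [...new Set(tokens)].map(probOf);
  probs.sort((x, y) => Math.abs(y - 0.5) - Math.abs(x - 0.5));
  let prod = 1;
  let inv = 1;
  for (const p of probs.slice(0, 15)) {
    prod *= p;
    inv *= 1 - p;
  }
  return prod / (prod + inv);
}

// Usage with toy counts (hypothetical, not real training data).
const spamCounts = new Map([["Viagra", 10], ["verb=post", 5]]);
const hamCounts = new Map([["verb=post", 5]]);
const probOf = (t) => tokenProbability(t, spamCounts, hamCounts, 10, 10);
const score = spamProbability(["Viagra", "verb=post", "Buy"], probOf);
console.log(score.toFixed(3));
```

Pseudo-tokens such as `verb=post` go through the same scoring as ordinary words, so structured fields and free text share one model.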
