Multiple tag classification

Hi all,

I wonder how hard it would be to extend POPFile to support multi-tagging, that is, the ability to assign multiple tags to each processed message.

This would make POPFile usable as a general classification engine for text messages (e.g. it could also be used in batch mode through the XMLRPC interface, if I understood it correctly).

For example, I might want to classify a text by topic, language, length, and priority.

Would this be possible, and how hard would it be to change POPFile to support it?

Regards.

  • Message #1243

    I had a look at the code and the DB structure, and this seems quite feasible.

    For example, this could be done by adding a "categories" table to the DB, adding a reference to it in the buckets and magnets tables, and running the Bayes classifier once for each category.
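    A minimal sketch of that proposal in Python with sqlite3, to show how little the classification loop itself would need to change. The schema is hypothetical and heavily simplified (it is not POPFile's real schema); the `matrix` word-count table and the helpers `add_category`, `train`, `classify` and `scalar` are illustrative assumptions. Scoring is plain multinomial naive Bayes with add-one smoothing and uniform bucket priors, not POPFile's exact formula, and magnets are reduced to substring matches.

```python
import math
import sqlite3

# Hypothetical, simplified schema (not POPFile's real one). The new pieces are the
# categories table and the category_id references in buckets and magnets.
SCHEMA = """
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE buckets (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES categories(id),
    UNIQUE (category_id, name)
);
CREATE TABLE magnets (
    id INTEGER PRIMARY KEY,
    bucket_id INTEGER NOT NULL REFERENCES buckets(id),
    category_id INTEGER NOT NULL REFERENCES categories(id),
    value TEXT NOT NULL
);
CREATE TABLE matrix (
    bucket_id INTEGER NOT NULL REFERENCES buckets(id),
    word TEXT NOT NULL,
    times INTEGER NOT NULL,
    PRIMARY KEY (bucket_id, word)
);
"""


def scalar(conn, sql, args):
    """Return the first column of the first row, or None if there is no row."""
    row = conn.execute(sql, args).fetchone()
    return row[0] if row else None


def add_category(conn, name, bucket_names):
    """Create a category and its buckets; return {bucket name: bucket id}."""
    cat_id = conn.execute("INSERT INTO categories (name) VALUES (?)", (name,)).lastrowid
    return {b: conn.execute("INSERT INTO buckets (name, category_id) VALUES (?, ?)",
                            (b, cat_id)).lastrowid for b in bucket_names}


def train(conn, bucket_id, text):
    """Add the words of a message to one bucket's word counts."""
    for word in text.lower().split():
        conn.execute("INSERT OR IGNORE INTO matrix VALUES (?, ?, 0)", (bucket_id, word))
        conn.execute("UPDATE matrix SET times = times + 1 WHERE bucket_id = ? AND word = ?",
                     (bucket_id, word))


def classify(conn, text):
    """Run the classifier once per category; return {category name: bucket name}."""
    words = text.lower().split()
    result = {}
    for cat_id, cat_name in conn.execute("SELECT id, name FROM categories ORDER BY id").fetchall():
        # A matching magnet decides this category outright (simplified: substring match).
        magnet = scalar(conn, "SELECT b.name FROM magnets m JOIN buckets b ON b.id = m.bucket_id "
                              "WHERE m.category_id = ? AND instr(?, m.value) > 0", (cat_id, text))
        if magnet is not None:
            result[cat_name] = magnet
            continue
        # Distinct words seen in this category, used for add-one smoothing.
        vocab = scalar(conn, "SELECT COUNT(DISTINCT word) FROM matrix JOIN buckets b "
                             "ON b.id = matrix.bucket_id WHERE b.category_id = ?", (cat_id,))
        scores = {}
        for bucket_id, bucket_name in conn.execute(
                "SELECT id, name FROM buckets WHERE category_id = ? ORDER BY id", (cat_id,)).fetchall():
            total = scalar(conn, "SELECT COALESCE(SUM(times), 0) FROM matrix WHERE bucket_id = ?",
                           (bucket_id,))
            # Sum of log P(word | bucket); the extra +1 avoids dividing by zero before training.
            scores[bucket_name] = sum(
                math.log(((scalar(conn, "SELECT times FROM matrix WHERE bucket_id = ? AND word = ?",
                                  (bucket_id, w)) or 0) + 1) / (total + vocab + 1))
                for w in words)
        result[cat_name] = max(scores, key=scores.get) if scores else None
    return result


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
topic = add_category(conn, "topic", ["work", "personal"])
lang = add_category(conn, "language", ["english", "italian"])
train(conn, topic["work"], "meeting agenda project deadline report")
train(conn, topic["personal"], "dinner birthday party weekend family")
train(conn, lang["english"], "the meeting is on the weekend")
train(conn, lang["italian"], "la riunione è il fine settimana")
print(classify(conn, "project meeting on the weekend"))  # {'topic': 'work', 'language': 'english'}
```

    Since each category only competes among its own buckets, adding another dimension such as priority would just be one more `add_category` call, with no change to the loop.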

    The same result could perhaps be achieved by chaining several POPFile instances, each with its own DB and configuration, but that looks like overkill and rather brittle.
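    To make the chained alternative concrete, here is a Python sketch in which each `Stage` stands in for one independent POPFile instance with its own word counts: the message text leaves one stage and enters the next, and each stage adds its own header. The `Stage` class, `run_chain`, and the `X-Category-*` header names are hypothetical; scoring is simplified naive Bayes with add-one smoothing, not POPFile's own.

```python
import math
from collections import Counter
from email import message_from_string


class Stage:
    """One independent classifier, standing in for one POPFile instance with its own DB."""

    def __init__(self, name, training):
        # training: {bucket: [sample texts]}; each stage keeps its own, separate word counts.
        self.name = name
        self.counts = {b: Counter(w for t in texts for w in t.lower().split())
                       for b, texts in training.items()}
        self.vocab = len(set().union(*self.counts.values()))

    def classify(self, words):
        def score(bucket):
            c = self.counts[bucket]
            total = sum(c.values())
            # Add-one smoothing; the extra +1 avoids dividing by zero for an empty bucket.
            return sum(math.log((c[w] + 1) / (total + self.vocab + 1)) for w in words)
        return max(self.counts, key=score)

    def process(self, raw):
        """Parse the message, add this stage's header, and pass the text on."""
        msg = message_from_string(raw)
        # Only the body is classified, so headers added by earlier stages are ignored.
        msg[f"X-Category-{self.name}"] = self.classify(msg.get_payload().lower().split())
        return msg.as_string()


def run_chain(stages, raw):
    """Feed each stage's output into the next, like chained proxies."""
    for stage in stages:
        raw = stage.process(raw)
    return raw


chain = [
    Stage("Topic", {"work": ["meeting agenda project deadline"],
                    "personal": ["dinner birthday party family"]}),
    Stage("Language", {"english": ["the meeting is on monday"],
                       "italian": ["la riunione è lunedì"]}),
]
out = message_from_string(run_chain(chain, "Subject: hi\n\nthe project meeting is on monday\n"))
print(out["X-Category-Topic"], out["X-Category-Language"])  # work english
```

    A classifier that also tokenizes headers could be influenced by the tags added upstream, which is one concrete source of the brittleness mentioned above.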

    What do you guys think of this?

    Greetings.

    • Message #1244

      This idea pops up every once in a while. The problem is that, so far, no one has bothered to actually try it.

      Of course, to really try it and see how it works, you would need not only the time but also data (emails) that can be broken down into categories. The chained-POPFile-instances approach you describe would be ideal for such preliminary tests, because you wouldn't have to change a single line of code.