Why does POPFile only classify individual words, not phrases?

Classifying on phrases, as the CRM114 project does, seems like a good idea, but in practice it is too slow to justify the theoretical increase in accuracy. Research on multi-word classifiers (not just POPFile's) has shown that the tradeoff isn't worth it: storage, memory and processing-time requirements go up exponentially with phrase length, and scalability drops accordingly. Perhaps in the future this will be considered “the best way”, but for now it isn't viable for real-world use. Note: The project leaders of CRM114 have even admitted as much.
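
To give a rough feel for the storage side of this tradeoff, here is a minimal Python sketch. It is not POPFile's or CRM114's code; the function names (count_features, max_possible_features) and the sample text are made up for illustration, and a "phrase" is assumed to mean a run of adjacent words. It counts how many distinct entries a classifier would have to store when it tracks single words only versus phrases of up to two or three words.

```python
def count_features(text, max_len):
    """Count distinct word sequences of length 1..max_len found in text."""
    words = text.lower().split()
    features = set()
    for n in range(1, max_len + 1):
        # Slide a window of n adjacent words across the text.
        for i in range(len(words) - n + 1):
            features.add(tuple(words[i:i + n]))
    return len(features)


def max_possible_features(vocab_size, max_len):
    """Upper bound on distinct phrases: V + V^2 + ... + V^max_len."""
    return sum(vocab_size ** n for n in range(1, max_len + 1))


# Hypothetical sample message, 11 words drawn from a 5-word vocabulary.
sample = "buy cheap pills now buy cheap watches now buy pills cheap"

for max_len in (1, 2, 3):
    seen = count_features(sample, max_len)
    bound = max_possible_features(5, max_len)
    print(f"phrases up to {max_len} word(s): {seen} stored, {bound} possible")
```

Even on this tiny sample the number of stored entries grows from 5 to 13 to 22 as the phrase length goes from one to three words, while the number of possible entries grows from 5 to 30 to 155. With a real mail corpus and a vocabulary of many thousands of words, the same effect is far more pronounced.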

faq/phrasesonly.txt · Last modified: 2008/02/08 19:49 by

The content of this wiki is protected by the GNU Free Documentation License