Ticket #35 (new enhancement)

Opened 13 years ago

Last modified 13 years ago

Parse XML Document Formats

Reported by: texasfett
Assigned to:
Priority: normal
Milestone: 2.0.0
Component: unknown
Version: svn
Severity: normal
Keywords:


With MS Office now defaulting to XML-based documents, we should take advantage of that to read keywords out of the document for better classification. Although XML is just structured text, OOXML and OpenDocument files are zipped, so mostly we just need to unzip the file first. I think our parser should already handle XML reasonably well.
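
A rough sketch of the unzip step, using only the Python standard library. It is an illustration, not the project's parser: the function name extract_words and the CONTENT_PARTS table are made up for this example. It assumes the main text lives in word/document.xml for OOXML word-processing files and in content.xml for OpenDocument files, and it ignores spreadsheets, presentations, and the separate metadata parts.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main content part inside each container. Hypothetical lookup table for this
# sketch: word/document.xml (OOXML word processing), content.xml (OpenDocument).
CONTENT_PARTS = ("word/document.xml", "content.xml")


def extract_words(source):
    """Return the words found in the main XML part of a zipped document.

    `source` is a file path or a binary file-like object.
    Returns an empty list if no known content part is present.
    """
    with zipfile.ZipFile(source) as archive:
        names = set(archive.namelist())
        for part in CONTENT_PARTS:
            if part in names:
                root = ET.fromstring(archive.read(part))
                # itertext() yields the text of every element and drops the tags.
                return " ".join(root.itertext()).split()
    return []
```

Calling extract_words on a path such as "report.docx" (a placeholder file name) would give a flat word list that could be handed to whatever does the classification. Adjacent text runs that are not separated by whitespace in the XML may end up glued together, so a real implementation would probably want to walk paragraphs explicitly.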