[encoding] french accent

Hello,
Looking at the words classified by POPFile, I can't find any French words with accents, such as "méthode" or "école".
When I try to add such a word, I get the error "Ignored words can only contain alphanumeric, ., _, -, or @ characters".
Looking at the code in
http://getpopfile.org/changeset/449
I'm not sure why this code was removed. It's a very common and effective way of handling accented words.
If POPFile can handle Japanese, it should be able to handle European languages ;)
Am I missing an option?
Best regards.

  • Message #1231

    When I try to add this word I get an error "Ignored words can only contain alphanumeric, ., _, -, or @ characters"

    I was able to add "école" to the list of ignored words.

    Here is an extract from the list BEFORE I added the word:

    d 	date, dec, del, dfn, did, dir, div, does, doing, done 
    e 	edt, edu, embed, encoding, esmtp, est, etc 
    f 	feb, fig, font, for, form, frame, frameset, fri, from
    

    and here is how the list looked after I added the word (by pasting école into the 'Add word' box):

    d 	date, dec, del, dfn, did, dir, div, does, doing, done 
    é 	école 
    e 	edt, edu, embed, encoding, esmtp, est, etc 
    f 	feb, fig, font, for, form, frame, frameset, fri, from
    

    I used POPFile 1.1.1 on a Windows 7 system to do this test.

    Which version of POPFile are you using?

    When I look at changeset:449 I'm not sure why this code has been removed

    That change was made just over 7 years ago so I cannot answer your question at the moment (some investigation will be needed!)

    Brian

    • Message #1233

      Which version of POPFile are you using?

      I should have asked an extra question: Which operating system are you using?

      When I tried to add "école" to the list of stopwords in POPFile 1.0.1 running on Ubuntu 9.04 (32-bit) I got the same error message you reported. Using POPFile 1.0.1 on Windows 7 (64-bit) I was able to add "école" to the list of stopwords.

      The Windows and Ubuntu versions of POPFile 1.0.1 use the same code to check if the new stopword is valid but they seem to get different results. ticket:141 has been raised for this problem.

      There may be some other problems with the way the Windows version handles accented characters.

      changeset:449 is just one of a large number of changes that went into the 0.18.0 release (see the Release Notes for details). At present I'm not sure how POPFile is supposed to be handling accented characters. All I know for sure is that something is wrong somewhere because the Windows version is doing something different.

      Brian

      • Message #1234

        The Windows and Ubuntu versions of POPFile 1.0.1 use the same code to check if the new stopword is valid but they seem to get different results.

        This is because POPFile uses the system locale.

        http://perldoc.perl.org/perllocale.html#USING-LOCALES

        Here's a sample Perl script to examine which characters can be used in stopwords:

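        #!/usr/bin/perl
        
        use strict;
        use warnings;
        # Make character classes and sorting follow the current locale
        use locale;
        use POSIX qw(locale_h);
        
        # Show the LC_CTYPE locale currently in effect
        print "Current locale: ", setlocale( LC_CTYPE ), "\n";
        # List every single-byte character (0..255) that matches the stopword character class
        print +(sort grep /[[:alpha:]\-_\.\@0-9]/, map { chr } 0..255), "\n";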
        

        Result on Windows (Japanese)

        Current locale: Japanese_Japan.932
        ゙゚-.@_0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZァアィイゥウェエォオカキクケコサシスセソタチッツテトナニヌネノハヒフヘホマミムメモャヤュユョヨラリルレロワヲンー
        

        Result on Mac OS X (Japanese, with the locale changed manually)

        Current locale: ja_JP.eucJP
        -.0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
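
        The results above differ because the character class is evaluated under the system locale. As a rough sketch only (this is not POPFile's actual code, and the helper name is_valid_stopword_utf8 is made up for illustration), a check that does not depend on the locale could decode the input and match Unicode properties instead. It assumes the word arrives as UTF-8 bytes and that "alphanumeric" should mean any Unicode letter or digit:

```perl
#!/usr/bin/perl

use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper, not part of POPFile: returns 1 if the UTF-8 encoded
# byte string consists only of letters, digits, ".", "_", "-" or "@".
sub is_valid_stopword_utf8 {
    my ($bytes) = @_;
    return 0 unless defined $bytes;

    # Decode strictly; malformed UTF-8 makes decode() die, so trap that.
    my $word = eval { decode( 'UTF-8', $bytes, Encode::FB_CROAK() ) };
    return 0 unless defined $word;

    # \p{L} and \p{N} follow Unicode rules regardless of the current locale.
    return $word =~ /\A[-\p{L}\p{N}._\@]+\z/ ? 1 : 0;
}

# "\xC3\xA9" is the UTF-8 encoding of "é"; a lone "\xE9" is Latin-1, not UTF-8.
for my $candidate ( "\xC3\xA9cole", "m\xC3\xA9thode", "esmtp", "two words", "\xE9cole" ) {
    my $verdict = is_valid_stopword_utf8($candidate) ? "accepted" : "rejected";
    print "$verdict\n";
}
```

        With this approach the answer should be the same on Windows, Ubuntu and Mac OS X, because neither decode() nor \p{L} consults LC_CTYPE. Input in another encoding (for example Latin-1 or EUC-JP) would need to be decoded with that encoding first.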
        

        Naoki

        • Message #1378

          Hi,
          Is there a way to make POPFile correctly decode special characters such as àéè? All words with accented characters are decoded incorrectly and then ignored. I am using POPFile 1.1.1 under Mac OS X 10.6.

          Thanks

          Paolo