    NLTK support: Fix passing of multiple corpora identifiers (#460) · 4212e063
    Ed Morley authored
    * NLTK support: Update test to use multiple corpora
    
    So that the incorrect handling of multiple IDs seen in #444 would
    have been caught.
    
    Also switches to some of the smaller corpora, to reduce time spent
    downloading during tests (see sizes on http://www.nltk.org/nltk_data/).
    
    * NLTK support: Fix passing of multiple corpora identifiers
    
    As part of fixing the shellcheck warnings in #438, double quotes had
    been placed around `$nltk_packages` passed to the `nltk.downloader`,
    which causes multiple identifiers to be treated as though they were
    a single identifier containing spaces.
    
    The docs for the shellcheck warning in question recommend using arrays
    if the intended behaviour really is to split on spaces:
    https://github.com/koalaman/shellcheck/wiki/SC2086#exceptions
    
    As such, `readarray` has been used, which is present in bash >=4.
    The `[*]` array form is used in the log message, to prevent shellcheck
    warning SC2145, whereas `[@]` is used when passed to `nltk.downloader`
    to ensure the array elements are unpacked as required.
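
    The difference between the two array forms can be sketched as
    follows. This is a minimal illustration rather than the buildpack's
    actual script: the corpora identifiers, the `nltk_file_contents`
    variable and the `count_args` helper are all assumptions made up
    for the example.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the contents of nltk.txt (one identifier per line).
nltk_file_contents=$'city_database\nstopwords'

# readarray (bash >= 4) stores one array element per input line;
# -t strips the trailing newline from each element.
readarray -t nltk_packages <<< "$nltk_file_contents"

# "[*]" joins all elements into a single word, which suits a log message.
log_line="Downloading NLTK packages: ${nltk_packages[*]}"

# Hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

# "[@]" expands to one argument per element; "[*]" expands to one argument.
unpacked=$(count_args "${nltk_packages[@]}")
joined=$(count_args "${nltk_packages[*]}")

echo "$log_line"
echo "unpacked=$unpacked joined=$joined"
```

    With two identifiers, the quoted `[@]` form yields two arguments,
    while the quoted `[*]` form yields one argument containing a space,
    which is the same shape of input that the quoted `$nltk_packages`
    string produced.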
    
    Note: Both before and after this fix, using anything but unix line
    endings in `nltk.txt` will also cause breakage.
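
    To illustrate why other line endings break things, the sketch below
    feeds CRLF-terminated input to `readarray`. The input data and
    variable names are hypothetical, and the stripping step at the end
    is only one possible workaround, not something this commit does.

```shell
#!/usr/bin/env bash
# Hypothetical nltk.txt contents saved with Windows (CRLF) line endings.
readarray -t packages < <(printf 'city_database\r\nstopwords\r\n')

# -t removes only the "\n", so each element keeps a trailing carriage
# return and no longer matches the intended identifier.
first_raw="${packages[0]}"

# One possible workaround (an assumption, not part of the fix above):
# strip a trailing "\r" from every element.
cleaned=("${packages[@]%$'\r'}")

printf 'raw length: %d, cleaned length: %d\n' "${#first_raw}" "${#cleaned[0]}"
```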
Name
hooks
collectstatic
cryptography
eggpath-fix
eggpath-fix2
gdal
geo-libs
mercurial
nltk
pip-install
pip-uninstall
pipenv
pipenv-python-version
pylibmc
python