GitHub - pjwilliams/nematus-recon-scripts

Latest commit: 100865e, Feb 25, 2019, by pjwilliams: "Merge pull request #1 from bricksdont/master" (5 commits)

Name                          Last commit message                                            Last commit date
README                        combine-nbest-rcost.py: add --pick-tcosts and --pick-rcosts    Jul 21, 2017
combine-nbest-rcost.py        combine-nbest-rcost.py: add --pick-tcosts and --pick-rcosts    Jul 21, 2017
normalize-nbest.py            Initial commit                                                 Jul 13, 2017
recover-1best-alignment.py    fix recover alignment                                          Feb 25, 2019
rerank-recon.py               Initial commit                                                 Jul 13, 2017

Scripts for use with output from the Nematus reconstructor (currently in the
'reconstruction' branch of https://github.com/pjwilliams/nematus).

The basic workflow is as follows:

  TRAIN
  1. Train Nematus model as normal
  2. Continue training with reconstructor

  TEST
  1.  Generate unnormalized n-best list using model from training step 1 or 2
  2.  Rescore with reconstructor (generates separate reconstruction costs file)
  3.  Combine n-best and reconstruction costs files into extended n-best list
  4.  Normalize n-best list scores (alpha parameter controls length penalty)
  5.  Rerank to get new 1-best (lambda parameter controls reconstructor weight)

--------------------------------------------------------------------------------
TRAIN
--------------------------------------------------------------------------------
1. Train Nematus model as normal

2. Continue training with reconstructor by adding the following options:

      --use_reconstructor \
      --patience 100000 \

  A high patience value is necessary because introducing reconstruction
  costs generally makes the overall cost worse.  If the combined cost never
  drops below the best cost from step 1, then a low patience value would
  cause premature early stopping.

  Optionally, the following will periodically generate a 1-best reconstruction
  of the input (which can be evaluated against the input to measure
  reconstruction quality):

      --valid_reconstruction_freq 10000 \
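
The reason for the large patience value can be illustrated with a simplified
patience counter.  This is a sketch only, not the code Nematus uses: the
function name, the cost values and the exact stopping rule are assumptions
made for illustration.

```python
def stops_early(costs, patience, best_so_far=float("inf")):
    """Return the index of the validation at which training would stop,
    or None if it never stops (hypothetical, simplified stopping rule)."""
    bad_count = 0
    for i, cost in enumerate(costs):
        if cost < best_so_far:
            # New best cost: remember it and reset the counter.
            best_so_far = cost
            bad_count = 0
        else:
            # No improvement on the best cost seen so far.
            bad_count += 1
            if bad_count > patience:
                return i
    return None


# Made-up numbers: the best cost from training step 1 is 40.0, and the
# combined costs in step 2 improve steadily but stay above it.
step2_costs = [55.0, 52.0, 50.0, 49.0, 48.5]
print(stops_early(step2_costs, patience=3, best_so_far=40.0))       # 3
print(stops_early(step2_costs, patience=100000, best_so_far=40.0))  # None
```

With the small patience value, training stops at the fourth validation even
though the combined cost is still falling.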

--------------------------------------------------------------------------------
TEST
--------------------------------------------------------------------------------
1. Generate unnormalized n-best list using model from training step 1 or 2

    THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=gpu0 \
    python $nematus_home/nematus/translate.py \
        -m <MODELS> \
        -i <SRC_BPE> \
        -o <N_BEST_BASE> \
        -k <K> \
        --suppress-unk \
        --n-best

2. Rescore with reconstructor (generates separate reconstruction costs file)

    THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=gpu0,on_unused_input=warn \
    python $nematus_home/nematus/rescore.py \
        -m <MODELS> \
        -s <SRC_BPE> \
        -i <N_BEST_BASE> \
        -o <N_BEST_RESCORED> \
        -b 80 \
        --reconstruction_cost_file <RCOSTS>

3. Combine n-best and reconstruction costs files into extended n-best list:

    ./combine-nbest-rcost.py <N_BEST_BASE> <RCOSTS> \
        --pick-tcosts 0 \
        > <NBEST_COMBINED>

    The --pick-tcosts option specifies a list of scores (translation costs) that
    should be copied from <N_BEST_BASE>.  By default, all scores are copied.
    Similarly, the --pick-rcosts option specifies which reconstruction costs
    should be copied from <RCOSTS>.
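
The combination step can be pictured roughly as follows.  This is a sketch
under stated assumptions, not the contents of combine-nbest-rcost.py: it
assumes n-best lines of the form "id ||| hypothesis ||| scores", a costs file
with one line of space-separated costs per hypothesis in the same order, and
zero-based indices for the pick lists.  The function and parameter names are
hypothetical.

```python
def combine(nbest_lines, rcost_lines, pick_tcosts=None, pick_rcosts=None):
    """Append reconstruction costs to the score field of each n-best line.

    pick_tcosts / pick_rcosts are optional lists of (assumed zero-based)
    indices; None means that all costs are copied.
    """
    if len(nbest_lines) != len(rcost_lines):
        raise ValueError("n-best list and costs file differ in length")
    combined = []
    for nbest_line, rcost_line in zip(nbest_lines, rcost_lines):
        # Assumed format: "id ||| hypothesis ||| score score ..."
        sent_id, hypothesis, scores = [f.strip() for f in nbest_line.split("|||")]
        tcosts = scores.split()
        rcosts = rcost_line.split()
        if pick_tcosts is not None:
            tcosts = [tcosts[i] for i in pick_tcosts]
        if pick_rcosts is not None:
            rcosts = [rcosts[i] for i in pick_rcosts]
        combined.append(" ||| ".join([sent_id, hypothesis, " ".join(tcosts + rcosts)]))
    return combined


# Made-up example: keep only the first translation cost, copy all
# reconstruction costs.
print(combine(["0 ||| a b ||| 1.5 2.5"], ["0.7"], pick_tcosts=[0]))
# ['0 ||| a b ||| 1.5 0.7']
```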

4.  Normalize n-best list scores (alpha parameter controls length penalty):

    ./normalize-nbest.py <SRC_BPE> <ALPHA> \
        < <NBEST_COMBINED> \
        > <NBEST_NORMALIZED>
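
As a rough illustration of how an alpha parameter can control a length
penalty, the sketch below divides a cost by length raised to the power alpha.
This is an assumed form of the normalization, and the function name is
hypothetical; which length (source or target) is applied to which cost in
normalize-nbest.py is not stated here, so check the script itself.

```python
def length_normalize(cost, length, alpha):
    """Divide a cost by length**alpha (assumed form of the length penalty)."""
    if length <= 0:
        raise ValueError("length must be positive")
    return cost / (length ** alpha)


# Made-up numbers: a cost of 12.0 for a 4-token sentence.
print(length_normalize(12.0, 4, 0.0))  # 12.0 (alpha = 0: no normalization)
print(length_normalize(12.0, 4, 0.5))  # 6.0
print(length_normalize(12.0, 4, 1.0))  # 3.0  (alpha = 1: per-token cost)
```

Under this form, larger alpha values reduce the cost of long sentences more
strongly, which makes longer hypotheses more competitive during reranking.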

5.  Rerank to get new 1-best (lambda parameter controls reconstructor weight)

    ./rerank-recon.py inf <LAMBDA> \
        < <NBEST_NORMALIZED> \
        > <1BEST_BPE>
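
The role of lambda can be sketched as a weighted sum of the two costs.  This
is an illustration under assumptions, not the contents of rerank-recon.py: it
assumes each n-best line has the form "id ||| hypothesis ||| tcost rcost",
that lower costs are better, and that the combined cost is
tcost + lambda * rcost.  The function name is hypothetical.

```python
def rerank(nbest_lines, lam):
    """Return, for each sentence id, the hypothesis with the lowest
    combined cost tcost + lam * rcost (assumed scoring rule)."""
    best = {}   # sentence id -> (combined cost, hypothesis)
    order = []  # sentence ids in order of first appearance
    for line in nbest_lines:
        sent_id, hypothesis, scores = [f.strip() for f in line.split("|||")]
        tcost, rcost = (float(s) for s in scores.split()[:2])
        total = tcost + lam * rcost
        if sent_id not in best:
            order.append(sent_id)
            best[sent_id] = (total, hypothesis)
        elif total < best[sent_id][0]:
            best[sent_id] = (total, hypothesis)
    return [best[sent_id][1] for sent_id in order]


# Made-up n-best list with two hypotheses for sentence 0.
nbest = [
    "0 ||| a ||| 1.0 5.0",
    "0 ||| b ||| 1.2 1.0",
    "1 ||| c ||| 0.5 0.5",
]
print(rerank(nbest, 0.0))  # ['a', 'c']: translation cost alone prefers "a"
print(rerank(nbest, 1.0))  # ['b', 'c']: reconstruction cost tips it to "b"
```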