This repository contains the code for the paper *"Flex Tape Can't Fix That": Bias and Misinformation in Edited Language Models*, accepted to EMNLP 2024 (preprint). Experimental code is mainly in the `experiments` directory, data preprocessing code is in `dsets`, and evaluation code is in `eval`. This repo was cloned and modified from MEMIT, and some instructions have been duplicated from there.
We recommend `conda` for managing Python, CUDA, and PyTorch; `pip` is used for everything else. To get started, simply install `conda` and run:

```bash
CONDA_HOME=$CONDA_HOME ./scripts/setup_conda.sh
```

`$CONDA_HOME` should be the path to your `conda` installation, e.g., `~/miniconda3`.
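For example, if your `conda` installation lives at `~/miniconda3` (the illustrative path above; adjust it to your setup), the invocation is:

```bash
# Point the setup script at an existing conda installation (illustrative path).
CONDA_HOME=~/miniconda3 ./scripts/setup_conda.sh
```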
`experiments/evaluate.py` can be used to evaluate any method in `baselines/`. For example:

```bash
python3 -m experiments.evaluate \
    --alg_name=MEMIT \
    --model_name=EleutherAI/gpt-j-6B \
    --hparams_fname=EleutherAI_gpt-j-6B.json \
    --num_edits=10000 \
    --use_cache
```
Results from each run are stored at `results/<model_name>/<method_name>/run_<run_id>` in the following format:

```
results/
|__ MODEL_NAME/
    |__ MEMIT/
        |__ run_<run_id>/
            |__ params.json
            |__ case_0.json
            |__ case_1.json
            |__ ...
            |__ case_10000.json
```
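As a quick sanity check, you can count how many per-case result files a run produced. The model, method, and run directory names below are only illustrative; substitute the path of your own run:

```bash
# Count per-case result files for one run (illustrative path; adjust to your run).
ls results/EleutherAI_gpt-j-6B/MEMIT/run_000/case_*.json | wc -l
```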
To run the GPT-J-based cross-subject experiments described in the paper, we've created a shell script that can be invoked as follows:

```bash
bash eval.sh [DS_NAME]
```

where `[DS_NAME]` is one of the properties or property pairs among the files that begin with `seesaw_cf_` in `data/`, e.g., `P101`, `P103`, `P101_P21`, etc.
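For instance, to run the cross-subject experiment for the single property `P101`:

```bash
bash eval.sh P101
```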
To run cross-property experiments based on GPT-J, run:

```bash
bash eval_pair.sh
```
For Llama-based experiments, we have two analogous shell scripts, which can be run in succession via a single script:

```bash
bash llama_script.sh
```

To run a different Llama-based model, open `job_script.sh` and change the model name to your desired model (it should be the model's HuggingFace identifier). Additionally, open `util/globals.py` and add a shorthand for your new model to the `MODEL_DICT` variable.
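For example, to use `meta-llama/Llama-2-13b-hf` (a hypothetical choice for illustration), you would set that identifier as the model name in `job_script.sh` and add an entry such as `"meta-llama/Llama-2-13b-hf": "llama13b"` to `MODEL_DICT`; the shorthand `llama13b` is just an example and can be any string you like.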
For Mistral-based experiments, we have an analogous script:

```bash
bash mistral_script.sh
```

Similarly, change the model name in `mistral_script.sh` to the HuggingFace identifier of your desired model, and add a shorthand for that identifier to `MODEL_DICT` in `util/globals.py`.
There are also several evaluation scripts for the results produced by these experiments, found in the `eval/completion_evaluation` directory; they can also be run all together through shell scripts in that directory. Specifically, to evaluate single-property completions along the axis of gender, run:

```bash
cd eval/completion_evaluation
bash gender_eval.sh [MODEL_NAME] [METHOD_NAME]
```

To evaluate single-property completions by race and geographic origin (since both are based on ethnic groups, they are run in one script), run:

```bash
cd eval/completion_evaluation
bash race_eval.sh [MODEL_NAME] [METHOD_NAME]
```
In both cases, `MODEL_NAME` is a shorthand for the name of the model, and `METHOD_NAME` is the editing method being evaluated: `FT`, `MEND`, or `MEMIT`. The conversions from full model name to shorthand for the five models included in our paper are as follows:

```python
MODEL_DICT = {"EleutherAI/gpt-j-6B": "gptj",
              "meta-llama/Llama-2-7b-hf": "llama",
              "meta-llama/Llama-2-7b-chat-hf": "llamac",
              "mistralai/Mistral-7B-Instruct-v0.2": "mistral",
              "mistralai/Mistral-7B-v0.1": "mistralb"}
```
For example, to evaluate Llama2-chat completions edited with `MEMIT` by race, you would run:

```bash
cd eval/completion_evaluation
bash race_eval.sh llamac MEMIT
```
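Analogously, to evaluate GPT-J completions edited with `MEND` along the axis of gender (any other shorthand/method pair from the table above works the same way):

```bash
cd eval/completion_evaluation
bash gender_eval.sh gptj MEND
```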
To evaluate cross-property completions, run:

```bash
cd eval
python pair_single.py [MODEL_NAME] [METHOD_NAME]
```

where `MODEL_NAME` and `METHOD_NAME` are defined as above.
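For instance, with the same model and method as the race example above:

```bash
cd eval
python pair_single.py llamac MEMIT
```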
We also provide a script to compute *t*-scores for significance testing of our results:

```bash
cd eval
python ttests.py [MODEL_NAME] [METHOD_NAME]
```
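For example, again using the `llamac` shorthand and `MEMIT`:

```bash
cd eval
python ttests.py llamac MEMIT
```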
Finally, if you're curious about the agreement metrics for our long-form text annotations, feel free to run:

```bash
cd eval
python agreement.py
```
If you use this code, please cite the paper:

```bibtex
@inproceedings{halevy2024flex,
  title     = {"Flex Tape Can't Fix That": Bias and Misinformation in Edited Language Models},
  author    = {Karina Halevy and Anna Sotnikova and Badr AlKhamissi and Syrielle Montariol and Antoine Bosselut},
  editor    = {Yaser Al-Onaizan and Mohit Bansal and Vivian Chen},
  booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
  month     = nov,
  year      = {2024},
  address   = {Miami, Florida},
  publisher = {Association for Computational Linguistics},
}
```