Conversation
This is really great; I especially love the thorough documentation 💯
I just have two comments.
First: it would be good to have an example of how to use these in practice with a transformer model or something. But that could potentially be in a blog post or GitHub Discussion.
Second: these classes don't have any attributes, so it seems like they should just be functions.
Thanks for the comment!

1) After playing around with the Transformers the past couple of days, I think I'm going to spend next week turning all my metric and algorithm implementations into wrappers for Transformers and TextFieldEmbedders. For example, to use the bias mitigation algorithms effectively with BERT, you have to fine-tune with the bias mitigation layer inserted after the base word embedding weights. It would be nice to have a wrapper that handles this, plus examples of how to use the wrapper; a rough sketch of the idea follows below.

2) Yup, I agree. My initial thought was to make them differentiable, but that doesn't make much sense given that these metrics aren't particularly smooth.
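A minimal sketch of what such a wrapper might look like. The name `BiasMitigatedEmbedder` and the module structure are hypothetical, not part of this PR; a real wrapper around a `TextFieldEmbedder` would also need to handle AllenNLP's input dictionaries:

```python
import torch
import torch.nn as nn

class BiasMitigatedEmbedder(nn.Module):
    """Hypothetical wrapper: inserts a bias mitigation layer after the
    base embedding weights, so the combination can be fine-tuned end to end."""

    def __init__(self, base_embedder: nn.Module, bias_mitigator: nn.Module):
        super().__init__()
        self.base_embedder = base_embedder
        self.bias_mitigator = bias_mitigator

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # (batch_size, seq_len, embedding_dim)
        embeddings = self.base_embedder(token_ids)
        # Apply the mitigation layer to the raw embeddings before they
        # flow into the rest of the model.
        return self.bias_mitigator(embeddings)
```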
Sorry, I changed my mind regarding this. I would prefer to keep them as classes right now, in case I want to make them …
The code looks fine. How do you envision the use of the bias metrics? For instance, right now, the NLI score is essentially just measuring the accuracy for a specific label, for a single batch of predictions. Is this something that we may want to evaluate over multiple batches?
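For illustration, here is a minimal sketch of what accumulating such a per-label score across batches could look like, in the spirit of AllenNLP's `Metric` API (`__call__` / `get_metric` / `reset`). The class and its signature are hypothetical, not the PR's implementation:

```python
import torch

class LabelAccuracyOverBatches:
    """Hypothetical sketch: accumulates the fraction of predictions
    assigned a given label across many batches."""

    def __init__(self, label_index: int):
        self.label_index = label_index
        self._match_count = 0.0
        self._total_count = 0.0

    def __call__(self, logits: torch.Tensor) -> None:
        # logits: (batch_size, num_labels); update running counts.
        predictions = logits.argmax(dim=-1)
        self._match_count += (predictions == self.label_index).sum().item()
        self._total_count += predictions.numel()

    def get_metric(self, reset: bool = False) -> float:
        score = self._match_count / max(self._total_count, 1.0)
        if reset:
            self._match_count = 0.0
            self._total_count = 0.0
        return score
```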
```python
# Similarity (dot product) of each attribute embedding to the mean
# embedding of each target set.
AB_sim_m = torch.matmul(attribute_embeddings, mean_target_embedding1)
AB_sim_f = torch.matmul(attribute_embeddings, mean_target_embedding2)

# ECT score: Spearman correlation between the two similarity vectors.
return self.spearman_correlation(AB_sim_m, AB_sim_f)
```
Is there a reason to not use the scipy.stats implementation here?
I would like to keep everything differentiable if possible. I'm considering making the bias metrics …
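For context, a minimal sketch of the trade-off: Spearman correlation can be computed in pure torch as the Pearson correlation of ranks (ignoring ties), but `argsort` returns integer indices, so gradients do not flow through the ranking step; this is the smoothness issue mentioned above.

```python
import torch
from scipy import stats

def spearman_torch(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Ranks via double argsort (no tie handling). Note: argsort returns
    # integer indices, so this step is not differentiable.
    x_rank = x.argsort().argsort().float()
    y_rank = y.argsort().argsort().float()
    # Pearson correlation of the ranks equals Spearman correlation
    # when there are no ties.
    x_c = x_rank - x_rank.mean()
    y_c = y_rank - y_rank.mean()
    return (x_c * y_c).sum() / (x_c.norm() * y_c.norm())

x = torch.randn(100)
y = x + 0.5 * torch.randn(100)
rho_torch = spearman_torch(x, y)
rho_scipy, _ = stats.spearmanr(x.numpy(), y.numpy())
# The two values agree up to floating-point error when there are no ties.
```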
LGTM.
* finished WEAT
* finished bias metrics
* updated CHANGELOG
* fixed GPU issue
* expanded NLI to include more NLI scores and work in batched and distributed settings
* removed evaluate bias mitigation command from this PR

Co-authored-by: Arjun Subramonian <[email protected]>
Co-authored-by: Akshita Bhagia <[email protected]>
Additions proposed in this pull request: WordEmbeddingAssociationTest, EmbeddingCoherenceTest, and NaturalLanguageInference.
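For reference, a minimal sketch of the standard WEAT effect size from Caliskan et al. (2017) that WordEmbeddingAssociationTest is based on; the PR's exact API may differ:

```python
import torch
import torch.nn.functional as F

def weat_effect_size(X: torch.Tensor, Y: torch.Tensor,
                     A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """X, Y: target word embeddings; A, B: attribute word embeddings,
    each of shape (num_words, embedding_dim)."""

    def association(W: torch.Tensor) -> torch.Tensor:
        # s(w, A, B) = mean cos(w, a) - mean cos(w, b) for each w in W.
        sim_a = F.cosine_similarity(W.unsqueeze(1), A.unsqueeze(0), dim=-1).mean(dim=1)
        sim_b = F.cosine_similarity(W.unsqueeze(1), B.unsqueeze(0), dim=-1).mean(dim=1)
        return sim_a - sim_b

    s_x, s_y = association(X), association(Y)
    # Effect size: difference of mean associations, normalized by the
    # pooled standard deviation over all target words.
    pooled = torch.cat([s_x, s_y])
    return (s_x.mean() - s_y.mean()) / pooled.std()
```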