
Implement Hadamard Gaussian noise #2481

Open · TobyBoyne wants to merge 11 commits into main

Conversation

TobyBoyne

Addresses #877 and #2416. Implements noise that is homoskedastic in the inputs but task-dependent.

Existing multi-task noises (https://docs.gpytorch.ai/en/latest/likelihoods.html#multi-dimensional-likelihoods) only work when all tasks are observed for each input. In the 'Hadamard' setting, where each input corresponds to exactly one task, no existing implementation supports a different noise for each task.

See the updated Hadamard Multitask GP Regression notebook for usage.
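For readers without the notebook to hand, here is a minimal sketch of the intended usage. It is hedged: the model definition follows the existing Hadamard tutorial, HadamardGaussianLikelihood is the class added by this PR, and the exact call signatures may still change as the PR evolves.

import torch
import gpytorch

# Hadamard setting: each input is observed under exactly one task.
train_x = torch.rand(100, 1)
train_i = torch.randint(0, 2, (100, 1))  # one integer task index per input
train_y = torch.sin(6 * train_x).squeeze(-1) + 0.1 * torch.randn(100)

class HadamardGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_inputs, train_targets, likelihood):
        super().__init__(train_inputs, train_targets, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.RBFKernel()
        self.task_covar_module = gpytorch.kernels.IndexKernel(num_tasks=2, rank=1)

    def forward(self, x, i):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        covar_i = self.task_covar_module(i)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x.mul(covar_i))

# One learned noise level per task, shared across all inputs of that task.
likelihood = gpytorch.likelihoods.HadamardGaussianLikelihood(num_tasks=2)
model = HadamardGPModel((train_x, train_i), train_y, likelihood)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

output = model(train_x, train_i)         # task indices go in as the last argument
loss = -mll(output, train_y, [train_i])  # note the list, per the discussion below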

Unit tests, documentation, type hinting will all also be provided soon!

@TobyBoyne (Author)

Two areas that likely need improving:

  • The implementation of _shaped_noise_covar probably needs some work. I'm not sure if there is some linear_operator magic to make this neater?
  • The syntax can be a little questionable. For example, I don't like that the task indices need to be wrapped in a list in mll(output, full_train_y, [full_train_i]). It is also not obvious that the implementation requires the task indices to be passed as the last argument to the model, since they are accessed via params[0][-1]. At least some type checking is probably required here; see the sketch below.
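One option, as a hedged sketch (validate_task_idcs is a hypothetical helper name, and, as discussed further down the thread, there is no way to guarantee the tensor really contains task indices, only that it is plausibly shaped):

import torch

def validate_task_idcs(task_idcs: torch.Tensor, num_tasks: int) -> None:
    # Hypothetical guard: checks only what is checkable (type, dtype, shape, range).
    if not torch.is_tensor(task_idcs) or task_idcs.dtype != torch.long:
        raise TypeError("Expected a LongTensor of task indices as the last model argument.")
    if task_idcs.ndim < 2 or task_idcs.shape[-1] != 1:
        raise ValueError(f"Expected task indices of shape (..., n, 1), got {tuple(task_idcs.shape)}.")
    if task_idcs.min() < 0 or task_idcs.max() >= num_tasks:
        raise ValueError(f"Task indices must lie in [0, {num_tasks - 1}].")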

Also, I should credit @jpfolch for help with this implementation!

Comment on lines 20 to 22
num_of_tasks : number of tasks in the multi output GP
noise_prior : any prior you want to put on the noise
noise_constraint : constraint to put on the noise
Collaborator

Suggested change:
- num_of_tasks : number of tasks in the multi output GP
- noise_prior : any prior you want to put on the noise
- noise_constraint : constraint to put on the noise
+ num_of_tasks: Number of tasks in the multi-output GP.
+ noise_prior: Prior for the noise.
+ noise_constraint: Constraint on the noise value.

Collaborator

batch_shape is missing here

Comment on lines 27 to 30
num_tasks,
noise_prior=None,
noise_constraint=None,
batch_shape=torch.Size(),
Collaborator

Can you add type annotations here?
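For reference, a hedged sketch of what the annotated signature could look like; the Prior and Interval types are assumptions based on GPyTorch's usual conventions, not taken from the PR:

from typing import Optional

import torch
from gpytorch.constraints import Interval
from gpytorch.priors import Prior

def __init__(
    self,
    num_tasks: int,
    noise_prior: Optional[Prior] = None,
    noise_constraint: Optional[Interval] = None,
    batch_shape: torch.Size = torch.Size(),
) -> None:
    ...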

def __init__(
self,
num_tasks,
noise_prior=None,
Collaborator

So we can only define a single prior for all tasks here - I think that's ok but probably worth mentioning this limitation in the docstring.

Author

Would it not be possible to have a multi-dimensional prior on your task noises? I had assumed that this would work by default with the current implementation, but I will confirm that this behaviour works as intended.

# assuming: from gpytorch.priors import LogNormalPrior
noise_prior=LogNormalPrior(
    loc=torch.tensor([0., 1.]),
    scale=torch.tensor([1., 1.]),
)

Collaborator

Hmm, you're right, that should work with a multi-dimensional prior (but not with different types of priors on different tasks).


def _shaped_noise_covar(self, base_shape: torch.Size, *params: Any, **kwargs: Any):
# params contains task indexes
task_idxs = params[0][-1]
Collaborator

Language nit

Suggested change:
- task_idxs = params[0][-1]
+ task_idcs = params[0][-1]

Collaborator

As you said, some runtime typechecking to make sure these are indeed the task indices would probably be good here.

Collaborator

Also, if you could introduce some comments with the shapes of the intermediate tensors/matrices in this function, that would be helpful.
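To illustrate, a hedged sketch of such shape comments over the snippets quoted in this thread, assuming n data points, t tasks, task indices of shape (n, 1), and a per-task base noise covariance of shape (t, n, n); the construction of noise_base_covar_matrix and all_tasks is assumed, not taken from the PR:

def _shaped_noise_covar(self, base_shape: torch.Size, *params: Any, **kwargs: Any):
    task_idcs = params[0][-1]                  # (n, 1): one integer task index per input
    noise_base_covar_matrix = ...              # (t, n, n): one diagonal noise block per task (assumed)
    all_tasks = torch.arange(self.num_tasks).unsqueeze(-1)  # (t, 1) (construction assumed)
    diag = torch.eq(all_tasks, task_idcs.mT)   # (t, 1) vs (1, n) broadcasts to (t, n)
    mask = DiagLinearOperator(diag)            # (t, n, n): selects each task's own inputs
    return (noise_base_covar_matrix @ mask).sum(dim=-3)  # sum over the task dim: (n, n)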

Author

Is there a "canonical" way to check that these are the task indices? I will check shape and dtype, but I wasn't sure if there was a way to guarantee that task_idcs contains what we want.

Collaborator

I don't think there is, unfortunately

*params: Any,
**kwargs: Any,
) -> base_distributions.Normal:
noise = self._shaped_noise_covar(function_samples.shape, *params, **kwargs).diag()
Collaborator

It's a bit unfortunate that (IIUC) self._shaped_noise_covar always returns the full covariance but you're calling diag() on it here; ideally we'd have a way to compute just the diagonal.

Author

From my understanding, if self._shaped_noise_covar returns a LinearOperator, then calling .diagonal() only computes the diagonal (as in _MultiTaskGaussianLikelihoodBase.forward()). I'm not quite sure whether my _shaped_noise_covar function returns a LinearOperator that shows this behaviour. I'll look into MaskedLinearOperator as you mentioned below.

Collaborator

> I'm not quite sure whether my _shaped_noise_covar function returns a LinearOperator that shows this behaviour

I was worried since you're using .sum(dim=-3) in self._shaped_noise_covar; will that return a LinearOperator in your case?

Author

I have checked, and yes, the function does return a LinearOperator, since summing a LinearOperator returns another LinearOperator here.
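A quick way to confirm this behaviour, as a hedged sketch against the public linear_operator API (the shapes are illustrative):

import torch
from linear_operator.operators import DiagLinearOperator

diag = torch.rand(3, 5)          # (t, n) batch of diagonals
op = DiagLinearOperator(diag)    # LinearOperator of shape (t, n, n)
summed = op.sum(dim=-3)          # still a LinearOperator, of shape (n, n)
print(type(summed))              # a LinearOperator subclass, not a dense Tensor
print(summed.diagonal())         # evaluates the diagonal without densifying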


def marginal(self, function_dist: MultivariateNormal, *params: Any, **kwargs: Any) -> MultivariateNormal:
mean, covar = function_dist.mean, function_dist.lazy_covariance_matrix
noise_covar = self._shaped_noise_covar(mean.shape, *params, **kwargs).squeeze(0)
Collaborator

Why is the .squeeze(0) necessary here? Does it work for general batch shapes?

Author

The squeeze here is due to the shape of MultitaskHomoskedasticNoise.forward. It seems to add one more dimension than I would expect, as the output is (1, num_tasks, num_data, num_data). I have changed this to .squeeze(-4) and moved it to where it is actually relevant, for clarity.

import torch
from gpytorch.likelihoods.noise_models import HomoskedasticNoise, MultitaskHomoskedasticNoise

base_shape = torch.Size([40])
HomoskedasticNoise()(shape=base_shape).shape                      # (40, 40)
MultitaskHomoskedasticNoise(num_tasks=2)(shape=base_shape).shape  # (1, 2, 40, 40)

Comment on lines 64 to 66
diag = torch.eq(all_tasks, task_idxs.mT)
mask = DiagLinearOperator(diag)
return (noise_base_covar_matrix @ mask).sum(dim=-3)
Collaborator

Could you use a MaskedLinearOperator here? Not sure if that would necessarily be better (it seems like it wouldn't help with the diag comment below because of this).

Author

I don't think it can be used here. In my code, diag, the diagonal of the mask matrix, has shape (num_tasks, num_data), whereas MaskedLinearOperator appears to only allow a mask of shape (num_data,).

This could feasibly be rewritten as a comprehension, summing over num_tasks different MaskedLinearOperators, but I don't think that would help the readability of the code, which already returns a LinearOperator.

@@ -29,7 +29,22 @@
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"outputs": [
Collaborator

Ideally we could avoid emitting this unrelated and unnecessary warning.

@@ -108,7 +123,7 @@
" \n",
" return gpytorch.distributions.MultivariateNormal(mean_x, covar)\n",
"\n",
"likelihood = gpytorch.likelihoods.GaussianLikelihood()\n",
"likelihood = gpytorch.likelihoods.HadamardGaussianLikelihood(num_tasks=2)\n",
Collaborator

Rather than changing the tutorial to only use task-specific noises, it would be good to instead show both cases and discuss the pros and cons of the two approaches in the tutorial. Ideally you could reproduce the output you shared in pytorch/botorch#2765 (comment).

We could also try to highlight a potential failure mode: in the very low data regime and with uniform noise levels across tasks, estimating task-specific noises results in worse performance due to the less parsimonious model.

@TobyBoyne (Author)

Thanks for the thorough review, @Balandat! I've left a couple of comments and will update the PR soon to reflect your suggestions.

@TobyBoyne (Author) commented Mar 14, 2025

Hi @Balandat, I've pushed some changes that should address some of the concerns raised:

  • Added tests to check behaviour in batched settings
  • Rearranged notebook to discuss pros and cons of task-specific noise in appendix
  • Added the requested type hints and shape comments

I'm not sure why the readthedocs check is failing - seems to be having trouble importing matplotlib just for my notebook?

Hope you have a lovely weekend! Please let me know if there are any other changes to be made.

@TobyBoyne TobyBoyne requested a review from Balandat March 18, 2025 19:38
@Balandat (Collaborator) left a comment

Thanks, the code looks great. The notebook has no outputs included, though? Can you re-run it and make sure that it includes the figures, so that users can see the plots without having to run the notebook themselves (and so they render on GitHub)?

> I'm not sure why the readthedocs check is failing - seems to be having trouble importing matplotlib just for my notebook?

Not sure what this is; it could be transient. Let's just try again, maybe it'll work on another push.
