Implement Hadamard Gaussian noise #2481
base: main
Conversation
Two areas that likely need improving:
Also, I should credit @jpfolch for help with this implementation!
num_of_tasks : number of tasks in the multi output GP
noise_prior : any prior you want to put on the noise
noise_constraint : constraint to put on the noise
Suggested change:
- num_of_tasks : number of tasks in the multi output GP
- noise_prior : any prior you want to put on the noise
- noise_constraint : constraint to put on the noise
+ num_of_tasks: Number of tasks in the multi-output GP.
+ noise_prior: Prior for the noise.
+ noise_constraint: Constraint on the noise value.
batch_shape is missing here
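Following the suggested docstring style above, the missing entry could read something like this (wording is illustrative):

batch_shape: The batch shape of the learned noise parameter, defaults to torch.Size().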
num_tasks,
noise_prior=None,
noise_constraint=None,
batch_shape=torch.Size(),
can you add type annotations here?
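For reference, a sketch of what the annotated signature might look like - the Prior and Interval types are the standard gpytorch ones, and the defaults shown are assumptions based on the current diff:

from typing import Optional

import torch
from gpytorch.constraints import Interval
from gpytorch.priors import Prior

def __init__(
    self,
    num_tasks: int,
    noise_prior: Optional[Prior] = None,
    noise_constraint: Optional[Interval] = None,
    batch_shape: torch.Size = torch.Size(),
) -> None:
    ...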
def __init__(
    self,
    num_tasks,
    noise_prior=None,
So we can only define a single prior for all tasks here - I think that's ok but probably worth mentioning this limitation in the docstring.
Would it not be possible to have a multi-dimensional prior on your task noises? I had assumed that this would work by default with the current implementation, but I will confirm this behaviour works as intended.
noise_prior=LogNormalPrior(
loc=torch.tensor([0., 1.]),
scale=torch.tensor([1., 1.])
)
hmm you're right, that should work with a multi-dimensional prior (but not with different types of priors on different tasks)
def _shaped_noise_covar(self, base_shape: torch.Size, *params: Any, **kwargs: Any):
    # params contains task indexes
    task_idxs = params[0][-1]
Language nit
Suggested change:
- task_idxs = params[0][-1]
+ task_idcs = params[0][-1]
As you said, some runtime typechecking to make sure these are indeed the task indices would probably be good here.
Also, if you can introduce some comments with the shapes of the intermediate tensors / matrices in this function that would be helpful.
Is there a "canonical" way to check that these are the task indices? I will check shape and dtype, but I wasn't sure if there was a way to guarantee that task_idcs contains what we want.
I don't think there is, unfortunately
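For what it's worth, a best-effort sketch of such a check - the self.num_tasks attribute and the error messages are illustrative, and this can only validate dtype, shape, and value range, not semantics:

# Best-effort validation of the task indices extracted from params.
if task_idcs.dtype not in (torch.int32, torch.int64):
    raise ValueError(f"Expected integer task indices, got dtype {task_idcs.dtype}.")
if task_idcs.shape[-1] != 1:
    raise ValueError(f"Expected task indices of shape (..., num_data, 1), got {task_idcs.shape}.")
if task_idcs.min() < 0 or task_idcs.max() >= self.num_tasks:
    raise ValueError("Task indices must lie in [0, num_tasks).")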
    *params: Any,
    **kwargs: Any,
) -> base_distributions.Normal:
    noise = self._shaped_noise_covar(function_samples.shape, *params, **kwargs).diag()
It's a bit unfortunate that IIUC self._shaped_noise_covar always returns the full covariance but that you're calling diag() on it here - ideally we can have a way to just compute the diagonal here.
From my understanding, if self._shaped_noise_covar returns a LinearOperator, then calling .diagonal should only calculate the diagonal (as in _MultiTaskGaussianLikelihoodBase.forward()). I'm not quite sure if my _shaped_noise_covar function does return a correct LinearOperator that shows this behaviour. I'll look into MaskedLinearOperator as you mentioned below.
I'm not quite sure if my _shaped_noise_covar function does return a correct LinearOperator that shows this behaviour

I was worried since you're using .sum(dim=-3) in self._shaped_noise_covar, will that return a LinearOperator in your case?
I have checked, and yes - the function does return a LinearOperator, since summing a LinearOperator will return another LinearOperator here.
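A quick way to sanity-check this outside the likelihood - the shapes are illustrative, with DiagLinearOperator standing in for the per-task noise operators:

import torch
from linear_operator.operators import DiagLinearOperator, LinearOperator

# A batch of 2 per-task diagonal operators over 40 data points: shape (2, 40, 40).
batch_op = DiagLinearOperator(torch.rand(2, 40))
summed = batch_op.sum(dim=-3)  # collapses the task batch dim -> shape (40, 40)

print(isinstance(summed, LinearOperator))  # True -- still lazy, not a dense tensor
print(summed.diagonal().shape)             # torch.Size([40]) -- only the diagonal is evaluated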
def marginal(self, function_dist: MultivariateNormal, *params: Any, **kwargs: Any) -> MultivariateNormal:
    mean, covar = function_dist.mean, function_dist.lazy_covariance_matrix
    noise_covar = self._shaped_noise_covar(mean.shape, *params, **kwargs).squeeze(0)
Why is the .squeeze(0) necessary here? Does it work for general batch shapes?
The squeeze here is due to the shape of MultitaskHomoskedasticNoise.forward. It seems to add one more dimension than I would expect, as the output is (1, num_tasks, num_data, num_data). I have changed this to .squeeze(-4), and moved it to where it is actually relevant for clarity.

base_shape = torch.Size([40])
HomoskedasticNoise()(shape=base_shape).shape  # (40, 40)
MultitaskHomoskedasticNoise(num_tasks=2)(shape=base_shape).shape  # (1, 2, 40, 40)
# all_tasks: (num_tasks, 1); task_idxs.mT: (1, num_data)
# diag: (num_tasks, num_data) boolean mask -- row t marks the inputs belonging to task t
diag = torch.eq(all_tasks, task_idxs.mT)
# mask: (num_tasks, num_data, num_data) batch of 0/1 diagonal operators
mask = DiagLinearOperator(diag)
# the batched product keeps only each task's own noise entries; summing over
# the task dimension collapses to a single (num_data, num_data) operator
return (noise_base_covar_matrix @ mask).sum(dim=-3)
Could you use a MaskedLinearOperator here? Not sure if that would necessarily be better (seems like it wouldn't help with the diag comment below b/c of this).
I don't think it can be used here. In my code, diag, which is the diagonal of the mask matrix, has shape (num_tasks, num_data). However, it looks like MaskedLinearOperator would only allow for a mask of shape (num_data,).

This could feasibly be rewritten as a comprehension, summing over num_tasks different MaskedLinearOperators, but I don't think it would help with the readability of the code, which does already return a LinearOperator.
@@ -29,7 +29,22 @@
 "cell_type": "code",
 "execution_count": 1,
 "metadata": {},
-"outputs": [],
+"outputs": [
ideally we could avoid emitting this unrelated & unnecessary warning
@@ -108,7 +123,7 @@
 " \n",
 " return gpytorch.distributions.MultivariateNormal(mean_x, covar)\n",
 "\n",
-"likelihood = gpytorch.likelihoods.GaussianLikelihood()\n",
+"likelihood = gpytorch.likelihoods.HadamardGaussianLikelihood(num_tasks=2)\n",
Rather than changing the tutorial to only use task-specific noises, it would be good to instead show both cases and discuss the pros and cons of both approaches in the tutorial. Ideally you could produce the output you shared in pytorch/botorch#2765 (comment).
We could also try to highlight a potential failure mode where, in the very low data regime and with uniform noise levels across tasks, estimating task-specific noises results in worse performance due to the less parsimonious model.
Co-authored-by: Max Balandat <[email protected]>
Thanks for the thorough review, @Balandat! I've left a couple of comments and will update the PR soon to reflect your suggestions.
Hi @Balandat, I've pushed some changes that should address some of the concerns raised:
I'm not sure why the readthedocs check is failing - seems to be having trouble importing matplotlib just for my notebook?

Hope you have a lovely weekend! Please let me know if there are any other changes to be made.
Thanks, code looks great. The notebook has no outputs included though? Can you re-run this and make sure that it includes the figures so that users can see the plots without having to run them themselves (and so they render on GitHub)?
I'm not sure why the readthedocs check is failing - seems to be having trouble importing matplotlib just for my notebook?
Not sure what this is - could be transient, let's just try again maybe it'll work on another push.
Addresses #877, #2416. Implements noise that is homoskedastic in inputs, but task-dependent.
Existing multi-task noises (https://docs.gpytorch.ai/en/latest/likelihoods.html#multi-dimensional-likelihoods) only work where all tasks are observed for each input. In the 'Hadamard' setting, where each input corresponds to exactly one task, there is no existing implementation that supports a different noise for each task.
See the updated Hadamard Multitask GP Regression notebook for usage. Unit tests, documentation, and type hinting will all also be provided soon!
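In the meantime, a minimal usage sketch in the Hadamard setting - the class name and num_tasks argument are from this PR, while the way the task indices reach the likelihood is an assumption based on _shaped_noise_covar reading them from params[0][-1]:

import torch
import gpytorch

train_x = torch.rand(40, 1)
train_i = torch.randint(0, 2, (40, 1))  # exactly one task index per input
train_y = torch.sin(6 * train_x).squeeze(-1) + 0.1 * torch.randn(40)

# One learned noise level per task, homoskedastic in the inputs.
likelihood = gpytorch.likelihoods.HadamardGaussianLikelihood(num_tasks=2)

# The task indices are passed alongside the inputs so the likelihood can pick
# the matching per-task noise, e.g. when evaluating the marginal log likelihood:
#   output = model(train_x, train_i)
#   loss = -mll(output, train_y, [train_x, train_i])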