Assistance with Predictions Using PLNmodels 1.2.1 #139

Open
ivangalvan opened this issue Oct 4, 2024 · 1 comment

@ivangalvan

Hello,

I am encountering differences in predictions when using the predict() function compared to the myPLN$fitted object. Here is a small reproducible example:

library(PLNmodels)
data(trichoptera)
trichoptera <- prepare_data(trichoptera$Abundance, trichoptera$Covariate)

myPLN <- PLN(Abundance ~ 1, data = trichoptera)

new_predictions = predict(myPLN,newdata = trichoptera)

plot(myPLN$fitted,new_predictions)

I would like to make predictions on both training and test sets and ensure that I am correctly using the predict() function for the test set. Could you please assist me with this issue? I am currently using PLNmodels_1.2.1.

Thank you for your help.

Best regards,
Ivan

@mahendra-mariadassou mahendra-mariadassou self-assigned this Oct 7, 2024
@jchiquet
Member

Hi @ivangalvan ,

(@mahendra-mariadassou, please correct or complete my answer if needed)

Sorry it took us so long to answer.

Here is a short reminder of the mathematical rationale behind the fitted and predict functions for the PLN model.

The fitted value $\hat{Y}$ returned by myPLN$fitted is the approximate (variational) conditional expectation $\tilde{\mathbb{E}}(Y)$ under the model, based on the estimated model parameters and on the variational parameters $M, S$ used to approximate $\mathbb{E}$. It depends on the current values of $Y, X$, so we get:

$$ \tilde{\mathbb{E}}(Y) = \exp(O + X \hat{B} + \hat{M} + \hat{S}^2) $$

When we predict from new data (i.e. new $X_{\text{new}}$), the predict(, newdata) function returns

$$ \tilde{\mathbb{E}}(Y_{\text{new}}) = \exp(O + X_{\text{new}} \hat{B}) $$

Indeed, the part of the variability that comes from the response is unknown, because we do not observe $Y_{\text{new}}$.

So the discrepancy you observed is the part of the variability explained by the term $M + S^2$, which you cannot assume to know on new data. In the trichoptera dataset, a couple of points show much more variability than the others, so they are not well explained by the mean term $O + XB$ alone:

myPLN <- PLN(Abundance ~ 1, data = trichoptera)
plot(myPLN$fitted, predict(myPLN,newdata = trichoptera, type = "response"))
abline(0,1)
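To see which sites drive the gap rather than reading it off the plot, here is a minimal sketch that reuses only the calls already shown in this thread. The names `log_ratio` and `site_gap` are illustrative, not part of PLNmodels, and the summary statistic (mean absolute log-ratio per site) is just one arbitrary choice.

```r
library(PLNmodels)

data(trichoptera)
trichoptera <- prepare_data(trichoptera$Abundance, trichoptera$Covariate)
myPLN <- PLN(Abundance ~ 1, data = trichoptera)

# Fitted values use the variational parameters; predictions without
# responses rely on the covariates (and offsets) only.
fitted_values <- myPLN$fitted
pred_response <- predict(myPLN, newdata = trichoptera, type = "response")

# Log-ratio between the two, one entry per site and species.
log_ratio <- log(fitted_values) - log(pred_response)

# Illustrative per-site summary: mean absolute log-ratio across species.
site_gap <- rowMeans(abs(log_ratio))

# Sites with the largest gap are the ones least well explained by the mean term.
head(sort(site_gap, decreasing = TRUE))
```

Sites at the top of this list should correspond to the points that sit far from the diagonal in the plot.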

Image

If you supply the responses along with the new data, we recover a perfect match: the outputs of fitted and predict agree, since we can now estimate the additional part of the variance from $Y_{\text{new}}$ to get $M + S^2$:

plot(myPLN$fitted, predict(myPLN, trichoptera$Abundance, newdata = trichoptera, type = "response"))
abline(0,1)

Image

But anyway, to answer your question: in a train/test context, since the data are supposed to be relatively homogeneous, you are using it the right way. You just forgot the type = "response" argument; without it, predict returns the link, that is $O + XB$, not $\exp(O + XB)$. In this case, we compute

$$ \tilde{\mathbb{E}}(Y_{\text{new}}) = \exp(O + X_{\text{new}} \hat{B} + \text{diag}(\hat{\Sigma})),$$

where we try to estimate the missing part of the variance, $S^2$, from the model parameter $\Sigma$.

And if your train/test folds are homogeneous enough, the additional part coming from $M$ should be close to $0$.
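As a rough sketch of the train/test workflow described above: the 70/30 split, the seed, and the names `train_idx`, `train`, `test`, `myPLN_train`, `pred_train` and `pred_test` are assumptions made for illustration; only PLN(), prepare_data() and predict() with newdata and type come from the thread.

```r
library(PLNmodels)

data(trichoptera)
trichoptera <- prepare_data(trichoptera$Abundance, trichoptera$Covariate)

# Hypothetical 70/30 split; the seed and the proportion are arbitrary.
set.seed(42)
n <- nrow(trichoptera)
train_idx <- sample(n, size = round(0.7 * n))
train <- trichoptera[train_idx, ]
test  <- trichoptera[-train_idx, ]

# Fit on the training rows only.
myPLN_train <- PLN(Abundance ~ 1, data = train)

# Predictions on the count scale for both sets, without passing responses.
pred_train <- predict(myPLN_train, newdata = train, type = "response")
pred_test  <- predict(myPLN_train, newdata = test,  type = "response")

# Compare held-out predictions with the observed counts (log1p handles zeros).
plot(log1p(test$Abundance), log1p(pred_test))
abline(0, 1)
```

With an intercept-only formula every test site receives the same predicted profile, so this mostly checks the calling pattern; adding covariates to the formula is what makes the held-out comparison informative.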

Hope that makes sense.

@jchiquet jchiquet reopened this Mar 25, 2025
jchiquet added a commit that referenced this issue Mar 25, 2025