Assistance with Predictions Using PLNmodels 1.2.1 #139

ivangalvan · 2024-10-04T14:02:20Z

Hello,

I am encountering differences in predictions when using the predict() function compared to the myPLN$fitted object. Here is a small reproducible example:

library(PLNmodels)
data(trichoptera)
trichoptera <- prepare_data(trichoptera$Abundance, trichoptera$Covariate)

myPLN <- PLN(Abundance ~ 1, data = trichoptera)

new_predictions = predict(myPLN,newdata = trichoptera)

plot(myPLN$fitted,new_predictions)

I would like to make predictions on both training and test sets and ensure that I am correctly using the predict() function for the test set. Could you please assist me with this issue? I am currently using PLNmodels_1.2.1.

Thank you for your help.

Best regards,
Ivan

jchiquet · 2025-03-25T10:36:57Z

Hi @ivangalvan ,

(please @mahendra-mariadassou correct or complete my answer if needed)

Sorry it took us so long to answer.

Here is a short reminder of the mathematical rationale behind the fitted and predict functions for the PLN model.

The fitted value $\hat{Y}$ returned by myPLN$fitted is the approximate (variational) conditional expectation $\tilde{\mathbb{E}}(Y)$, computed from the estimated model parameters and from the variational parameters $M, S$ that are used to approximate $\mathbb{E}$. It depends on the observed values of $Y, X$. So we get:

$$ \tilde{\mathbb{E}}(Y) = \exp(O + X \hat{B} + \hat{M} + \hat{S}^2) $$

When we predict from new data (i.e., new $X_{\text{new}}$), the predict(, newdata) function returns

$$ \tilde{\mathbb{E}}(Y_{\text{new}}) = \exp(O + X_{\text{new}} \hat{B}) $$

Indeed, part of the variability coming from the response is unknown, because we do not observe $Y_{\text{new}}$.

So the discrepancy that you observed is the part of the variability explained by the term $M + S^2$, which you cannot assume to know on new data. In the trichoptera dataset, a couple of points show a lot of variability compared to the others, so they are not well explained by the mean term $O + XB$ alone:

myPLN <- PLN(Abundance ~ 1, data = trichoptera)
plot(myPLN$fitted, predict(myPLN,newdata = trichoptera, type = "response"))
abline(0,1)
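
As a toy illustration of the gap between the two formulas above, here is a minimal base-R sketch. All numbers are made up, it does not call PLNmodels, and the names fitted_like and predict_like are hypothetical; it simply evaluates the two expressions as they are written in this reply.

```r
# Toy illustration with made-up numbers (NOT PLNmodels output).
set.seed(42)
n <- 5  # number of sites
p <- 3  # number of species

O <- matrix(0, n, p)                        # offsets
X <- matrix(1, n, 1)                        # intercept-only design matrix
B <- matrix(c(0.5, 1.0, 1.5), 1, p)         # hypothetical regression coefficients
M <- matrix(rnorm(n * p, sd = 0.3), n, p)   # hypothetical variational means
S <- matrix(0.2, n, p)                      # hypothetical variational standard deviations

# Expression used for the fitted values (needs M and S, hence the observed Y)
fitted_like <- exp(O + X %*% B + M + S^2)

# Expression used when only new covariates are available
predict_like <- exp(O + X %*% B)

# The multiplicative gap between the two is exactly exp(M + S^2)
ratio <- fitted_like / predict_like
print(max(abs(ratio - exp(M + S^2))))
```

Sites where M is far from zero are the ones that move away from the diagonal in the plot above.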

If you give the response on top of the new data, we recover a perfect estimation (the outputs of fitted and predict match, since we are now able to estimate the additional part of the variance from $Y_{\text{new}}$ to get $M + S^2$):

plot(myPLN$fitted, predict(myPLN, trichoptera$Abundance, newdata = trichoptera, type = "response"))
abline(0,1)

But anyway, to answer your question: in a train/test context, since the data is supposed to be relatively homogeneous, you are using it the right way. You just forgot the type = "response" argument (otherwise predict sends back the link, that is $O + XB$, not $\exp(O + XB)$). In this case, we use

$$ \tilde{\mathbb{E}}(Y_{\text{new}}) = \exp(O + X_{\text{new}} \hat{B} + \text{diag}(\hat{\Sigma})),$$

where we try to estimate the missing part of the variance $S^2$ from the model parameter $\Sigma$.

And if your test/train folds are homogeneous enough, the additional part coming from $M$ should be close to $0$.
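
A minimal sketch of such a train/test workflow, assuming a random 80/20 split of the trichoptera sites. The split, the seed, and the names train_idx, train, test, fitted_train and pred_test are illustrative assumptions; only calls already used in this thread (prepare_data, PLN, $fitted, predict with newdata and type) are relied on.

```r
library(PLNmodels)
data(trichoptera)
trichoptera <- prepare_data(trichoptera$Abundance, trichoptera$Covariate)

# Hypothetical random split: roughly 80% of the sites for training
set.seed(123)
n <- nrow(trichoptera)
train_idx <- sample(n, size = floor(0.8 * n))
train <- trichoptera[train_idx, ]
test  <- trichoptera[-train_idx, ]

# Fit on the training sites only
fit <- PLN(Abundance ~ 1, data = train)

# Training set: fitted values, which use the variational parameters
fitted_train <- fit$fitted

# Test set: counts are treated as unobserved, so predict on the response scale
pred_test <- predict(fit, newdata = test, type = "response")

# Compare observed and predicted counts on the held-out sites (log1p handles zeros)
plot(log1p(test$Abundance), log1p(pred_test))
abline(0, 1)
```

With an intercept-only model every held-out site gets the same predicted abundance profile, so covariates in the formula are what make test-set predictions site-specific.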

Hope that makes sense.

mahendra-mariadassou self-assigned this Oct 7, 2024

jchiquet closed this as completed Mar 25, 2025

jchiquet reopened this Mar 25, 2025

jchiquet added a commit that referenced this issue Mar 25, 2025

fix in predict related to #139

30a6552
