Question about Dual-Stream Information Exchange in MP-SENet #9

Open
EuiYeonKim opened this issue Feb 14, 2025 · 5 comments

Comments

@EuiYeonKim

Hello,

I’m really impressed by your approach to directly estimating phase, and I truly appreciate the great work you’ve been consistently publishing.

I wanted to ask about your previous paper, MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra. In that work, was there a specific reason why you didn’t adopt PHASEN’s dual-stream information exchange mechanism?

Looking forward to your thoughts!

Best regards,

@yxlu-0102
Owner

Thank you for your interest in our previous works.

In fact, MP-SENet does not employ a dual-stream structure: the magnitude and phase branches share the encoder and Transformer blocks and only diverge at the decoders.
We believe the magnitude and phase information is already integrated in these shared blocks, so no additional information-exchange mechanism is needed.
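
As a rough illustration of this data flow, here is a toy sketch in plain Python. It is not the MP-SENet code: every function name and every arithmetic "layer" is a hypothetical stand-in for a learned module, and the two-output `atan2` phase head is an assumption chosen for illustration. The only point is that both decoders read the same hidden representation, which already mixes magnitude and phase.

```python
import math

# Toy stand-ins for learned layers; all names and formulas are hypothetical.

def shared_trunk(mag, pha):
    """Stand-in for the shared encoder + Transformer blocks.

    Magnitude and phase enter together, so every hidden feature
    already depends on both of them.
    """
    return [(m * math.cos(p), m * math.sin(p), m) for m, p in zip(mag, pha)]

def magnitude_decoder(hidden, noisy_mag):
    """Masking-style head: a bounded gain per bin scales the noisy magnitude."""
    out = []
    for (a, b, c), m in zip(hidden, noisy_mag):
        gain = 1.0 / (1.0 + math.exp(-(a + b + c)))  # sigmoid, in (0, 1)
        out.append(gain * m)
    return out

def phase_decoder(hidden):
    """Phase head: two outputs per bin, combined by atan2 into a wrapped phase."""
    out = []
    for a, b, c in hidden:
        pseudo_real = a + 0.1 * c
        pseudo_imag = b
        out.append(math.atan2(pseudo_imag, pseudo_real))
    return out

def enhance(noisy_mag, noisy_pha):
    hidden = shared_trunk(noisy_mag, noisy_pha)  # one shared representation
    # The decoders run in parallel and never talk to each other.
    return magnitude_decoder(hidden, noisy_mag), phase_decoder(hidden)

noisy_mag = [1.0, 0.5, 2.0]
noisy_pha = [0.0, math.pi / 2, -math.pi / 3]
enh_mag, enh_pha = enhance(noisy_mag, noisy_pha)
```

Because `hidden` is computed once from both inputs, neither decoder needs a cross-stream connection to see the other quantity.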

@EuiYeonKim
Author

Thank you for your response.
If I understand correctly, magnitude and phase information has already been exchanged sufficiently in the earlier shared blocks, so the parallel decoders do not need to exchange information afterwards.
Your answer was very helpful. I will now close this issue.

@EuiYeonKim
Author

Oh, and one more question! Would your AP-BWE model work well for a parallel decoder-only speech enhancement task, similar to PHASEN?

@yxlu-0102
Owner

Sorry for the delayed reply. I believe the AP-BWE framework can handle the SE task simply by changing the magnitude stream to a masking-based or mapping-based architecture.

However, I don't think its SE capabilities would be as strong, because AP-BWE does not employ Transformers to capture long-term dependencies, which are important for handling the time-varying noise in noisy signals.
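
To make the masking versus mapping distinction concrete, here is a minimal sketch in plain Python. It is not taken from AP-BWE or MP-SENet: the function names, the sigmoid mask, and the softplus output are assumptions chosen for illustration, and `net_out` stands in for whatever the magnitude stream would actually predict.

```python
import math

def masking_magnitude(noisy_mag, mask_logits):
    """Masking: predict a bounded gain per bin and scale the noisy magnitude."""
    return [m / (1.0 + math.exp(-z)) for m, z in zip(noisy_mag, mask_logits)]

def mapping_magnitude(raw_outputs):
    """Mapping: predict the clean magnitude directly.

    Softplus is used here only to keep the output non-negative.
    """
    return [math.log1p(math.exp(y)) for y in raw_outputs]

noisy_mag = [1.0, 0.2, 3.0]
net_out = [2.0, -2.0, 0.0]  # hypothetical network outputs

masked = masking_magnitude(noisy_mag, net_out)  # depends on the noisy input
mapped = mapping_magnitude(net_out)             # ignores the noisy input
```

With a sigmoid mask the output can never exceed the noisy magnitude, whereas a mapping head is free to produce any non-negative value.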

@EuiYeonKim
Author

Thank you for the detailed and helpful response!

I just have one last question. As far as I understand, you used a ConvNeXt block as the backbone, and converted the original 2D model into a 1D version.
Could you please explain the reasoning behind this design choice?
