[enhancement](cloud) Prohibit changing deployment mode #40764
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
TPC-H: Total hot run time: 43530 ms
TPC-DS: Total hot run time: 195235 ms
ClickBench: Total hot run time: 30.91 s
TeamCity be ut coverage result:
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
clang-tidy review says "All clean, LGTM! 👍"
run buildall
LGTM
8a3408d
run buildall
clang-tidy review says "All clean, LGTM! 👍"
PR approved by at least one committer and no changes requested.
LGTM
## Proposed changes

At present, the storage-compute separated mode and the storage-compute integrated mode cannot be converted to each other. However, if a user insists on mixing the two, nothing at the code level prevents it. The following scenarios can occur:

| Case | The node has been in a Cloud cluster before | The node has been in a Local cluster before | The node has never been in any cluster |
| -- | -- | -- | -- |
| Add BE to local cluster | Added successfully, but the error `invalid cluster id. ignore.` occurs. No negative impact on the original two clusters. | Added successfully, but the error `invalid cluster id. ignore.` occurs. No negative impact on the original two clusters. | If cloud configuration has not been added, it works normally.<br />If cloud configuration has been added, it cannot start normally. |
| Add FE to local cluster | Added successfully, but the error `Socket is closed by peer.` occurs. No negative impact on the original two clusters. | Added successfully, but the error `Socket is closed by peer.` occurs. No negative impact on the original two clusters. | If cloud configuration has not been added, it works normally.<br />If cloud configuration has been added, it cannot start normally. |
| Add BE to cloud cluster | Added successfully, but the error `invalid cluster id. ignore.` occurs. No negative impact on the original two clusters. | Added successfully, but the error `invalid cluster id. ignore.` occurs. No negative impact on the original two clusters. | If cloud configuration has not been added, the BE runs successfully, but an error occurs when executing an insert.<br />If cloud configuration has been added, it works normally. |
| Add FE to cloud cluster | Added successfully, but the error `Socket is closed by peer.` occurs. No negative impact on the original two clusters. | Added successfully, but the error `Socket is closed by peer.` occurs. No negative impact on the original two clusters. | If cloud configuration has not been added, the FE hangs with the error `Unknown meta module: cloudWarmUpJob.`<br />If cloud configuration has been added, it works normally. |

----

| Case | Situation |
| --------------------------------------------- | ------------------------------------------------------------ |
| BE in Local cluster adds cloud config items | Hangs |
| FE in Local cluster adds cloud config items | Hangs |
| BE in Cloud cluster removes cloud config items | Runs successfully, but errors occur on query or insert |
| FE in Cloud cluster removes cloud config items | Service goes down |

In this PR, I check Doris' deployment mode. If the deployment mode is modified later, the service shuts down and gives a clear error message.

----

## Proposed changes (translated from the Chinese version)

At present, the storage-compute separated and storage-compute integrated modes cannot be converted to each other. In most cases, deployments of the two modes should not get mixed up, but it cannot be ruled out that some users add the wrong node by mistake. Another possibility is that a user accidentally deletes the cloud-related configuration (for example, by copying configuration from elsewhere over the current configuration), which causes the node to start in local mode.

The situations for different nodes in different clusters:

| Case | This node was previously in another Cloud cluster | This node was previously in another Local cluster | This node has never been added to any cluster |
| :-------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- |
| Add BE to a Local cluster | Can be added, but the heartbeat reports `invalid cluster id. ignore.` Normal use of the original two clusters is not affected. | Can be added, but the heartbeat reports `invalid cluster id. ignore.` Normal use of the original two clusters is not affected. | If no cloud-related configuration has been added, it works normally.<br />If cloud-related configuration has been added, it starts with the cloud logic and cannot start normally. |
| Add FE to a Local cluster | Can be added, but the heartbeat reports `Socket is closed by peer.` Normal use of the original two FEs is not affected. | Can be added, but the heartbeat reports `Socket is closed by peer.` Normal use of the original two FEs is not affected. | If no cloud-related configuration has been added, it works normally.<br />If cloud-related configuration has been added, it starts with the cloud logic and cannot start normally. |
| Add BE to a Cloud cluster | Can be added, but the heartbeat reports `invalid cluster id. ignore.` Normal use of the original two clusters is not affected. | Can be added, but the heartbeat reports `invalid cluster id. ignore.` Normal use of the original two clusters is not affected. | If no cloud-related configuration has been added, it can be added successfully, but operations such as insert report errors, and it may even cause an existing healthy BE to core dump.<br />If cloud-related configuration has been added, it works normally. |
| Add FE to a Cloud cluster | Can be added, but the heartbeat reports `Socket is closed by peer.` Normal use of the original two FEs is not affected. | Can be added, but the heartbeat reports `Socket is closed by peer.` Normal use of the original two FEs is not affected. | If no cloud-related configuration has been added: when the node has not joined the cloud cluster, it reports `failed to get local fe's type, sleep 5 s, try again.`; when it has joined the cloud cluster, reading metadata fails with `Unknown meta module: cloudWarmUpJob.` and the node hangs.<br />If cloud-related configuration has been added, it works normally. |

----

| Case | Symptom |
| :--------------------------- | :--------------------------------------------------- |
| BE in a Local cluster adds cloud configuration | Starts with the cloud logic, and startup hangs |
| FE in a Local cluster adds cloud configuration | Starts with the cloud logic, and startup hangs |
| BE in a Cloud cluster removes cloud configuration | Starts normally, but queries and loads report errors |
| FE in a Cloud cluster removes cloud configuration | Repeatedly logs `get version from meta service failed`, then goes down |

In these situations, a node that switches between cloud and local mode should fail fast and inform the user.

---------

Co-authored-by: yagagagaga <[email protected]>
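The fail-fast behavior described in the proposed changes can be illustrated with a minimal sketch. This is not the code from this PR: the marker file name `deploy_mode`, the `cloud_config` key, and every function name below are hypothetical, and the real FE and BE may record and detect the mode quite differently. The idea shown is only that a node records its mode on first start and refuses to start if a later start derives a different mode from its configuration.

```python
from pathlib import Path

# Hypothetical marker file name; not taken from the Doris code base.
MODE_FILE = "deploy_mode"


class DeploymentModeChangedError(RuntimeError):
    """Raised when a node starts in a different mode than the one it recorded."""


def current_mode(config: dict) -> str:
    # Assumption for illustration: a node is "cloud" when a cloud-related
    # config item is present, otherwise "local".
    return "cloud" if config.get("cloud_config") else "local"


def check_deployment_mode(meta_dir: Path, config: dict) -> str:
    """Record the mode on first start; fail fast if it differs on later starts."""
    mode = current_mode(config)
    marker = meta_dir / MODE_FILE
    if not marker.exists():
        # First start in this directory: remember the mode.
        marker.write_text(mode)
        return mode
    recorded = marker.read_text().strip()
    if recorded != mode:
        # Stop immediately with a clear message instead of hanging
        # or failing later during queries or loads.
        raise DeploymentModeChangedError(
            f"deployment mode changed from {recorded!r} to {mode!r}; "
            "switching between cloud and local mode is not supported"
        )
    return mode
```

In this sketch, a node first started without cloud configuration records `local`; if cloud configuration is added afterwards, the next call raises `DeploymentModeChangedError` rather than letting startup hang.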
…0764 (#43891)

Cherry-picked from #40764

Co-authored-by: yagagagaga <[email protected]>