Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(manager/indexer): pruning fixes #1147

Merged
merged 33 commits into from
Nov 7, 2024
Merged

fix(manager/indexer): pruning fixes #1147

merged 33 commits into from
Nov 7, 2024

Conversation

srene
Copy link
Contributor

@srene srene commented Oct 18, 2024

PR Standards

Opening a pull request should be able to meet the following requirements

--

PR naming convention: https://hackmd.io/@nZpxHZ0CT7O5ngTp0TP9mg/HJP_jrm7A


Close #1145

<-- Briefly describe the content of this pull request -->

For Author:

  • Targeted PR against correct branch
  • included the correct type prefix in the PR title
  • Linked to Github issue with discussion and accepted design
  • Targets only one github issue
  • Wrote unit and integration tests
  • All CI checks have passed
  • Added relevant godoc comments

For Reviewer:

  • confirmed the correct type prefix in the PR title
  • Reviewers assigned
  • confirmed all author checklist items have been addressed

After reviewer approval:

  • In case targets main branch, PR should be squashed and merged.
  • In case PR targets a release branch, PR should be rebased.

@srene srene self-assigned this Oct 18, 2024
@srene srene requested a review from a team as a code owner October 18, 2024 11:22
@srene srene marked this pull request as draft October 18, 2024 11:22
@srene srene marked this pull request as ready for review October 18, 2024 13:56
Copy link
Contributor

@omritoptix omritoptix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still in the middle of the review

block/pruning.go Outdated
// PruneBlocks prune all block related data from dymint store up to (but not including) retainHeight. It returns the number of blocks pruned, used for testing.
func (m *Manager) PruneBlocks(retainHeight uint64) (uint64, error) {
// PruneBlocks prune all block related data from dymint store up to (but not including) retainHeight.
func (m *Manager) PruneBlocks(retainHeight uint64) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you can DRY it out and remove the else part also. check this (didn't test it):

func (m *Manager) PruneBlocks(retainHeight uint64) {
	nextSubmissionHeight := m.NextHeightToSubmit()
	if m.IsProposer() && nextSubmissionHeight < retainHeight {
		m.logger.Debug("cannot prune blocks before they have been submitted. using height last submitted height for pruning", "retain_height", retainHeight, "height_to_submit", nextSubmissionHeight)
		retainHeight = nextSubmissionHeight
	}

	m.pruneStore("blocksync", func(from, to uint64) (uint64, error) {
		return m.P2PClient.RemoveBlocks(context.Background(), from, to)
	}, &m.State.BaseBlocksyncHeight, retainHeight)
	m.pruneStore("indexer", m.IndexerService.Prune, &m.State.BaseIndexerHeight, retainHeight)
	m.pruneStore("dymint", func(from, to uint64) (uint64, error) {
		return m.Store.PruneStore(from, to, m.logger)
	}, &m.State.BaseHeight, retainHeight)

	if _, err := m.Store.SaveState(m.State, nil); err != nil {
		m.logger.Error("save state", "err", err)
	}
}

func (m *Manager) pruneStore(storeName string, pruneFunc func(uint64, uint64) (uint64, error), baseHeight *uint64, retainHeight uint64) {
	pruned, err := pruneFunc(*baseHeight, retainHeight)
	if err != nil {
		m.logger.Error(fmt.Sprintf("pruning %s store", storeName), "retain_height", retainHeight, "err", err)
		return
	}
	m.logger.Debug(fmt.Sprintf("%s store pruned", storeName), "from", *baseHeight, "to", retainHeight, "pruned", pruned)
	*baseHeight = retainHeight
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added inline func for logging

block/pruning.go Outdated
}

m.State.BaseHeight = retainHeight
// store state with base heights updates
_, err = m.Store.SaveState(m.State, nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think saving the state in the background could lead on a race condition on the state as you may save incomplete state which was changed by the applyBlock.

The m.State should only be ever accessed in either thread-safe way or by the applyBlock.

I see no reason to save pruning data on the state but on an ad-hoc object

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

baseheight moved to store outside state

pruned := uint64(0)
tobeFlushed := uint64(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tobe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed

// first all events are pruned associated to the same height
prunedEvents, err := txi.pruneEvents(h, batch)
if err != nil {
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we log here at least something so will know we had problem pruning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@srene srene force-pushed the srene/1145-fix-pruning branch from 691f18a to e6d28c7 Compare October 22, 2024 16:17
@danwt
Copy link
Contributor

danwt commented Oct 22, 2024

build n test failing sir

@srene srene marked this pull request as draft October 23, 2024 07:46
@srene srene force-pushed the srene/1145-fix-pruning branch 3 times, most recently from a27927b to 75e0ffe Compare October 30, 2024 08:51
@srene srene marked this pull request as ready for review October 30, 2024 20:31
@srene srene requested a review from omritoptix October 30, 2024 20:32
block/manager.go Outdated
@@ -101,6 +101,9 @@ type Manager struct {

// validates all non-finalized state updates from settlement, checking there is consistency between DA and P2P blocks, and the information in the state update.
SettlementValidator *SettlementValidator

// BaseHeight is the height of the first block we have in store after pruning.
BaseHeight uint64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

store/pruning.go Outdated
}

prunedBlocks, err := s.pruneBlocks(from, to, logger)
prunedBlocks, err = s.pruneBlocks(from, to, logger)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we e.g get erorr here, than we still save the to as base height. (i.e some stores succeeded and some not).
not sure if critical for now but probably worth opening an issue on it otherwise you have stores which are partially pruned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i actually refactored the func to prune all indexes for every height iteration, so in case it fails it will be more likely all failed at the same height, although the solution not perfect. i created issue for this #1196

@@ -325,8 +325,12 @@ func (c *Client) GenesisChunked(context context.Context, id uint) (*ctypes.Resul
func (c *Client) BlockchainInfo(ctx context.Context, minHeight, maxHeight int64) (*ctypes.ResultBlockchainInfo, error) {
const limit int64 = 20

minHeight, maxHeight, err := filterMinMax(
int64(c.node.BlockManager.State.BaseHeight),
baseHeight, err := c.node.BlockManager.Store.LoadBaseHeight()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently we only set base height when pruning. it means if we never set pruning, will never have base height and we won't be able to serve this method.

when was base height saved before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set to 1 in case baseheight not found

_, err := is.blockIdxr.Prune(from, to, is.Logger)
func (is *IndexerService) Prune(to uint64, s store.Store) (uint64, error) {
// load indexer base height
indexerBaseHeight, err := s.LoadIndexerBaseHeight()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems that if we fail to loa the height we just log it and the indexerBaseHeight will be set to 0.
this means tha the pruning functions will be called with from 0 (e.g blockIndexer) (and may unncessarily iterate tons of indexes.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what is the issue here, if not found it means it was never set and therefore it needs to start from 0 because it was never pruned...


if to <= from {
return fmt.Errorf("to height must be greater than from height: to: %d: from: %d: %w", to, from, gerrc.ErrInvalidArgument)
from, err := c.store.LoadBlockSyncBaseHeight()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as my previous. from will be set to 0 which will cause very high iteration.

@srene srene force-pushed the srene/1145-fix-pruning branch from e92551b to 9f47fed Compare November 4, 2024 16:46
@srene srene requested a review from omritoptix November 4, 2024 16:48
@srene srene marked this pull request as draft November 5, 2024 07:23
@srene srene force-pushed the srene/1145-fix-pruning branch from 2510be5 to 4ee4cfb Compare November 5, 2024 12:06
@srene srene marked this pull request as ready for review November 5, 2024 12:30
@omritoptix omritoptix merged commit e5e8857 into main Nov 7, 2024
5 of 6 checks passed
@omritoptix omritoptix deleted the srene/1145-fix-pruning branch November 7, 2024 09:23
srene added a commit that referenced this pull request Nov 10, 2024
srene added a commit that referenced this pull request Nov 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possible memory leak while pruning
3 participants