Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loggerd: add unit tests and fix identified sync issues #34840

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

deanlee
Copy link
Contributor

@deanlee deanlee commented Mar 11, 2025

resolve: #34839

This PR adds unit tests specifically aimed at resolving synchronization issues between encoderd and loggerd. While existing tests thoroughly cover logger rotation, these new test cases address the previously untested logic for synchronizing encoder segment numbers with logger segment numbers.

Key Changes:

  1. Refactored the code to make unit testing feasible. Due to the original design, it was nearly impossible to unit test the synchronization logic. To enable testing, LoggerdState and RemoteEncoder classes were moved to loggerd.h.
  2. The synchronization logic has been encapsulated into a new method RemoteEncoder::syncSegment, which can now be unit tested.

Future Improvements:
As part of future work, loggerd.h can be split to separate configuration and logic. For instance, renaming loggerd.h to encoder_config.h and limiting loggerd.h to only contain declarations used by loggerd.cc. However, since this would introduce substantial changes, it will be addressed in a follow-up PR.

@deanlee
Copy link
Contributor Author

deanlee commented Mar 11, 2025

Unit tests revealed three flawed assumptions in the encoder-logger synchronization logic, leading to synchronization failures and continuous queue growth, which ultimately exhausted memory (these issues will be fixed later in this PR.):

  1. Incorrect Assumption: The encoder’s first packet always starts at segment 0
    Problem: The code assumes the encoder’s first packet is aligned with segment 0, resulting in incorrect segment offset calculations.
    Location:

    re.encoderd_segment_offset = idx.getSegmentNum();

    Fix: Dynamically adjust the encoder segment offset based on the actual segment number:

    re.encoderd_segment_offset = idx.getSegmentNum() - s.logger.segment();

  2. Incorrect Assumption: loggerd's segment always starts at 0 when receiving the encoder stream

    Problem: The logic assumes that loggerd starts at segment 0 when it begins receiving the encoder stream, which causes segment tracking to be misaligned.
    Location:

    name.c_str(), idx.getSegmentNum(), s->logger.segment(), re.encoderd_segment_offset);
    // free the message, it's useless. this should never happen
    // actually, this can happen if you restart encoderd
    re.encoderd_segment_offset = -s->logger.segment();

    Fix: Adjust the offset by referencing loggerd's current segment:

    re.encoderd_segment_offset = idx.getSegmentNum() - s->logger.segment();

  3. Incorrect Assumption: Encoder segment numbers always increment sequentially

    Problem: The code assumes the encoder’s segment numbers increment sequentially by 1 (e.g., 5 to 6). If the encoder skips segments (e.g., 5 to 7), this creates a gap larger than 1 between the adjusted encoder segment and loggerd's segment, causing synchronization to fail.
    Location:

    } else if (offset_segment_num > s->logger.segment()) {
    // encoderd packet has a newer segment, this means encoderd has rolled over
    if (!re.marked_ready_to_rotate) {
    re.marked_ready_to_rotate = true;
    ++s->ready_to_rotate;
    LOGD("rotate %d -> %d ready %d/%d for %s",
    s->logger.segment(), offset_segment_num,
    s->ready_to_rotate.load(), s->max_waiting, name.c_str());
    }
    // queue up all the new segment messages, they go in after the rotate
    re.q.push_back(msg);

    Fix: Update the logic to account for non-sequential segment jumps and maintain synchronization.

@deanlee deanlee changed the title loggerd: add unit tests to address sync issues loggerd: add unit tests and fix identified sync issues Mar 11, 2025
@deanlee deanlee marked this pull request as ready for review March 12, 2025 02:18
@deanlee deanlee force-pushed the loggerd_unit_tests_for_sync branch 2 times, most recently from 9351c2f to fcc7a0c Compare March 12, 2025 11:10
@deanlee deanlee force-pushed the loggerd_unit_tests_for_sync branch from fcc7a0c to 492a959 Compare March 12, 2025 12:16
@adeebshihadeh adeebshihadeh added this to the 0.9.9 milestone Mar 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

loggerd: define behavior when encoders get out of sync
2 participants