Fixed members-only content being leaked in excerpts #434

Merged
sagzy merged 9 commits into main from fix/excerpt-contains-gated-content
Mar 27, 2025

Conversation

sagzy
Contributor

@sagzy sagzy commented Mar 25, 2025

closes https://linear.app/ghost/issue/AP-943/members-only-content-is-leaked-in-excerpt

  • When a post's visibility is set to members-only, paid-members-only, or specific-tiers, we remove the gated content before federation.
  • However, the post excerpt still exposed gated content. This is now fixed by regenerating the excerpt from the public content.


coderabbitai bot commented Mar 25, 2025

Walkthrough

The changes add a custom_excerpt property to the GhostPost interface in the Post entity. The createArticleFromGhostPost method now uses this property, choosing the excerpt based on whether custom_excerpt is null or an empty string. A new static method, regenerateExcerpt, is added to the ContentPreparer class; it generates an excerpt from HTML content within a specified character limit and delegates to an instance method of the same name. A new test suite covers regenerateExcerpt with several test cases. Finally, package.json gains html-to-text and its type definitions as dependencies for HTML content processing.
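
As a rough illustration of the excerpt selection described in this walkthrough, here is a minimal sketch. The `GhostPostLike` shape, the `chooseExcerpt` name, the visibility values, and the ordering of the checks are all assumptions made for illustration; the merged code in src/post/post.entity.ts may differ.

```typescript
// Hypothetical, trimmed-down shape; the real GhostPost interface has more fields.
interface GhostPostLike {
    html: string; // assumed to already have gated sections removed
    excerpt: string | null;
    custom_excerpt: string | null;
    visibility: string; // e.g. 'public', 'members', 'paid', 'tiers' (assumed values)
}

// Picks the excerpt to federate. `regenerate` is injected so the sketch does
// not depend on the real ContentPreparer.
function chooseExcerpt(
    post: GhostPostLike,
    regenerate: (html: string) => string,
): string | null {
    // Public posts keep the excerpt Ghost provided.
    if (post.visibility === 'public') {
        return post.excerpt;
    }
    // Assumption: a non-empty custom excerpt was written for public display.
    if (post.custom_excerpt !== null && post.custom_excerpt !== '') {
        return post.custom_excerpt;
    }
    // Otherwise derive the excerpt from the public portion of the content only.
    return regenerate(post.html);
}
```

The point of the sketch is that, for a non-public post, the auto-generated `excerpt` field is never passed through unchanged, since it may have been computed from the full, ungated content.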

Suggested reviewers

  • allouis

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bca8aad and 046a26b.

📒 Files selected for processing (5)
  • features/step_definitions/stepdefs.js (1 hunks)
  • src/http/api/webhook.ts (1 hunks)
  • src/post/post.entity.ts (3 hunks)
  • src/post/post.entity.unit.test.ts (8 hunks)
  • src/post/post.repository.knex.integration.test.ts (18 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • features/step_definitions/stepdefs.js
  • src/post/post.entity.unit.test.ts
  • src/post/post.repository.knex.integration.test.ts
  • src/post/post.entity.ts
  • src/http/api/webhook.ts
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build, Test and Push


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
src/publishing/content.ts (2)

45-47: Parameter name mismatch between static and instance methods.

The static method uses wordLimit while the instance method uses charLimit. This creates confusion about the actual functionality, as one suggests word-based truncation while the actual implementation is character-based.

-static generateExcerpt(html: string, wordLimit = 50) {
-    return ContentPreparer.instance.generateExcerpt(html, wordLimit);
+static generateExcerpt(html: string, charLimit = 500) {
+    return ContentPreparer.instance.generateExcerpt(html, charLimit);
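
To see why the parameter name matters, the following sketch contrasts the two interpretations. Both `truncateByChars` and `truncateByWords` are hypothetical helpers written for this comparison, not functions from the PR; the character-based variant assumes a limit of at least 3 so the ellipsis fits.

```typescript
// Character-based truncation: the result, including the ellipsis, is at most
// charLimit characters long (assuming charLimit >= 3).
function truncateByChars(text: string, charLimit: number): string {
    if (text.length <= charLimit) {
        return text;
    }
    return `${text.substring(0, Math.max(0, charLimit - 3))}...`;
}

// Word-based truncation: keeps at most wordLimit whitespace-separated words
// and appends an ellipsis if anything was dropped.
function truncateByWords(text: string, wordLimit: number): string {
    const words = text.split(/\s+/).filter((word) => word.length > 0);
    if (words.length <= wordLimit) {
        return text;
    }
    return `${words.slice(0, wordLimit).join(' ')}...`;
}
```

Passing the same number to both gives very different results: a limit of 50 yields roughly one sentence under the character interpretation and a full paragraph under the word interpretation, which is why a `wordLimit` parameter on a character-based method is misleading.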

85-97: Consider HTML-aware truncation for the excerpt generation.

The current implementation does a simple character-based truncation without considering HTML structure. This could potentially result in malformed HTML if truncation happens in the middle of an HTML tag.

Consider using an HTML-aware approach that preserves the integrity of HTML tags in the excerpt. You could:

  1. Use a DOM parser to properly handle HTML structure
  2. Ensure truncation happens at word or sentence boundaries
  3. Properly close any open HTML tags after truncation

generateExcerpt(content: string, charLimit = 500) {
    if (content.length <= charLimit) {
        return content;
    }

    // Simple implementation that at least truncates at word boundaries
    let truncated = content.substring(0, charLimit - 3);
    // Find the last space to avoid cutting words in half
    const lastSpace = truncated.lastIndexOf(' ');
    if (lastSpace > 0) {
        truncated = truncated.substring(0, lastSpace);
    }
    
    return `${truncated}...`;
}

For a more robust solution, consider using a dedicated HTML parsing library that can properly handle HTML tag closure.

src/publishing/content.unit.test.ts (1)

92-110: Good basic test coverage for excerpt generation.

The tests cover the basic functionality of the generateExcerpt method, ensuring it returns the full content when under the limit and properly truncates with an ellipsis when over the limit.

Consider adding tests for additional edge cases:

  1. Empty content
  2. Content exactly at the character limit
  3. Content with HTML tags to ensure proper handling of HTML structure

Example additional test:

it('handles content with HTML tags appropriately', () => {
    const content = '<p>This is a paragraph with <strong>bold text</strong> that should be handled properly</p>';
    const result = preparer.generateExcerpt(content, 30);
    
    // Verify the excerpt is truncated with an ellipsis to exactly the limit
    expect(result).toContain('...');
    expect(result.length).toEqual(30);
});
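
The first two edge cases in the list above (empty content, content exactly at the limit) can be pinned down the same way. The sketch below uses a hypothetical stand-in, `truncateWithEllipsis`, that mirrors the simple character-based behaviour discussed in this review rather than the merged implementation, and plain checks instead of the project's test runner so that it stays self-contained.

```typescript
// Hypothetical stand-in for the character-based truncation discussed in this review.
function truncateWithEllipsis(content: string, charLimit = 500): string {
    if (content.length <= charLimit) {
        return content;
    }
    return `${content.substring(0, Math.max(0, charLimit - 3))}...`;
}

// Edge cases, expressed as [input, limit, expected] rows.
const edgeCases: Array<[string, number, string]> = [
    ['', 10, ''], // empty content stays empty
    ['abcdefghij', 10, 'abcdefghij'], // exactly at the limit: unchanged
    ['abcdefghijk', 10, 'abcdefg...'], // one character over: truncated to the limit
];

for (const [input, limit, expected] of edgeCases) {
    const actual = truncateWithEllipsis(input, limit);
    if (actual !== expected) {
        throw new Error(`expected "${expected}" but got "${actual}"`);
    }
}
```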
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6fa309d and 9b00c03.

📒 Files selected for processing (3)
  • src/post/post.entity.ts (2 hunks)
  • src/publishing/content.ts (2 hunks)
  • src/publishing/content.unit.test.ts (1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
src/post/post.entity.ts (1)
src/publishing/content.ts (1)
  • ContentPreparer (30-150)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build, Test and Push
🔇 Additional comments (3)
src/post/post.entity.ts (3)

221-221: Good initialization of the excerpt variable.

This initialization preserves the original excerpt value from the Ghost post, maintaining backward compatibility for public posts while allowing for customization in non-public posts.


227-228: Excellent fix for the member content leakage issue.

This change addresses the core issue by regenerating the excerpt from the already-sanitized content when dealing with non-public posts. This ensures that any members-only content removed from the main content is also not present in the excerpt.


245-245: Proper usage of the processed excerpt in the Post constructor.

The change correctly uses the potentially modified excerpt variable instead of directly using ghostPost.excerpt, completing the fix for preventing members-only content leakage in excerpts.

@sagzy sagzy force-pushed the fix/excerpt-contains-gated-content branch from 6aa0de4 to 10abc2d Compare March 25, 2025 09:11

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b00c03 and 10abc2d.

⛔ Files ignored due to path filters (1)
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (5)
  • package.json (2 hunks)
  • src/post/post.entity.ts (2 hunks)
  • src/post/post.entity.unit.test.ts (2 hunks)
  • src/publishing/content.ts (3 hunks)
  • src/publishing/content.unit.test.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/post/post.entity.ts
🧰 Additional context used
🪛 GitHub Actions: CI
src/publishing/content.ts

[error] 1-2: Import statements differs from the output.

🔇 Additional comments (6)
src/post/post.entity.unit.test.ts (1)

181-201: Test now correctly verifies both content and excerpt filtering.

The test description and implementation have been appropriately updated to verify that both the content and excerpt in private posts are properly filtered. This ensures that members-only content isn't leaked in excerpts, which aligns with the PR's objective.

package.json (2)

34-36: LGTM!

Simple formatting change that doesn't affect functionality.


41-41: Added appropriate dependencies for HTML-to-text conversion.

The added dependencies (html-to-text and its type definitions) are necessary for the new excerpt regeneration functionality.

Also applies to: 73-73

src/publishing/content.ts (2)

45-47: LGTM!

Clean implementation of the static method that delegates to the instance method.


85-113: Well-implemented excerpt regeneration.

The implementation effectively converts HTML to text while:

  1. Skipping irrelevant elements (images, footnotes, etc.)
  2. Preserving important formatting
  3. Properly handling truncation with ellipsis

This directly addresses the PR's objective of ensuring members-only content isn't leaked in excerpts.
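
The steps above can be sketched without any dependencies. The PR itself relies on html-to-text; the regex-based `regenerateExcerptSketch` below is a simplified, hypothetical stand-in that only shows the shape of the pipeline (skip some elements, reduce to text, truncate). It does not decode HTML entities, may leave a space before punctuation that follows an inline tag, and is not robust against malformed markup.

```typescript
// Simplified stand-in for an HTML-to-text excerpt pipeline (not the PR's code).
function regenerateExcerptSketch(html: string, charLimit = 500): string {
    const text = html
        // Drop elements whose content should not appear in the excerpt.
        .replace(/<figcaption\b[^>]*>[\s\S]*?<\/figcaption>/gi, '')
        // Drop void elements such as images and horizontal rules.
        .replace(/<(img|hr)\b[^>]*>/gi, '')
        // Replace remaining tags with a space so adjacent blocks do not fuse.
        .replace(/<[^>]+>/g, ' ')
        // Collapse whitespace runs left behind by the removed tags.
        .replace(/\s+/g, ' ')
        .trim();

    if (text.length <= charLimit) {
        return text;
    }
    // Reserve three characters for the ellipsis (assumes charLimit >= 3).
    return `${text.substring(0, Math.max(0, charLimit - 3))}...`;
}
```

Because the input is expected to be the already-sanitized public HTML, whatever this function returns cannot contain gated text, regardless of how the truncation behaves.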

src/publishing/content.unit.test.ts (1)

92-132: Comprehensive test coverage for the new functionality.

The test suite thoroughly verifies the regenerateExcerpt method's behavior:

  1. Correctly handles content shorter than the limit
  2. Properly truncates content that exceeds the limit
  3. Ignores image tags as expected
  4. Correctly processes link elements

These tests verify the method across several scenarios and guard against regressions of the members-only content leak.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
src/publishing/content.unit.test.ts (3)

122-131: Consider standardizing newline handling in excerpts.

The test expects newlines (\n\n) between the link text and paragraph content. This seems inconsistent with how other HTML elements are handled in the other tests. Consider whether this behavior is intentional, as different newline handling between element types might create inconsistent excerpt formatting.


144-152: Add length assertion for consistency.

For consistency with the other test cases, consider adding an assertion to verify the result length is equal to 48 characters:

 expect(result).toEqual(
     'I expect content to be truncated exactly here...',
 );
+expect(result.length).toEqual(48);

92-153: Consider adding tests for additional HTML elements and nested structures.

The current tests cover basic HTML elements like <img>, <a>, <figcaption>, and <hr>. To ensure comprehensive coverage, consider adding tests for:

  1. Common elements like <div>, <span>, <strong>, <em>
  2. Nested HTML structures
  3. Content with special characters
  4. Edge cases like very short content or content with exactly the limit length

This would help ensure the excerpt regeneration works reliably across all content types.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 02f83cc and b25c7e8.

📒 Files selected for processing (2)
  • src/post/post.entity.unit.test.ts (3 hunks)
  • src/publishing/content.unit.test.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/post/post.entity.unit.test.ts
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build, Test and Push
🔇 Additional comments (1)
src/publishing/content.unit.test.ts (1)

92-153: Good set of test cases for the new regenerateExcerpt functionality!

The test suite comprehensively covers various scenarios for the new excerpt generation functionality:

  • Handling content shorter than the limit
  • Properly truncating content that exceeds the limit
  • Ignoring HTML tags like <img>, <figcaption>, and <hr>
  • Special handling for <a> tags

This ensures the excerpt generation correctly implements the fix for members-only content leakage in excerpts.

@sagzy sagzy closed this Mar 26, 2025
@sagzy sagzy force-pushed the fix/excerpt-contains-gated-content branch from bca8aad to 046a26b Compare March 27, 2025 08:05
@sagzy sagzy merged commit 24f1811 into main Mar 27, 2025
6 checks passed
@sagzy sagzy deleted the fix/excerpt-contains-gated-content branch March 27, 2025 11:50