-
-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed members-only content being leaked in excerpts #434
Conversation
closes https://linear.app/ghost/issue/AP-943/members-only-content-is-leaked-in-excerpt - when a post visibility is set to members-only/paid-members-only/specific-tiers, we remove the gated content before federation - however, the post excerpt still exposed gated content. This is now fixed, by re-generating the excerpt from the public content
WalkthroughThe changes introduce a new property Suggested reviewers
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (5)
⏰ Context from checks skipped due to timeout of 90000ms (1)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/publishing/content.ts (2)
45-47
: Parameter name mismatch between static and instance methods.The static method uses
wordLimit
while the instance method usescharLimit
. This creates confusion about the actual functionality, as one suggests word-based truncation while the actual implementation is character-based.-static generateExcerpt(html: string, wordLimit = 50) { - return ContentPreparer.instance.generateExcerpt(html, wordLimit); +static generateExcerpt(html: string, charLimit = 500) { + return ContentPreparer.instance.generateExcerpt(html, charLimit);
85-97
: Consider HTML-aware truncation for the excerpt generation.The current implementation does a simple character-based truncation without considering HTML structure. This could potentially result in malformed HTML if truncation happens in the middle of an HTML tag.
Consider using an HTML-aware approach that preserves the integrity of HTML tags in the excerpt. You could:
- Use a DOM parser to properly handle HTML structure
- Ensure truncation happens at word or sentence boundaries
- Properly close any open HTML tags after truncation
import { parseHTML } from 'some-html-parser-library'; generateExcerpt(content: string, charLimit = 500) { if (content.length <= charLimit) { return content; } // Simple implementation that at least truncates at word boundaries let truncated = content.substring(0, charLimit - 3); // Find the last space to avoid cutting words in half const lastSpace = truncated.lastIndexOf(' '); if (lastSpace > 0) { truncated = truncated.substring(0, lastSpace); } return `${truncated}...`; }For a more robust solution, consider using a dedicated HTML parsing library that can properly handle HTML tag closure.
src/publishing/content.unit.test.ts (1)
92-110
: Good basic test coverage for excerpt generation.The tests cover the basic functionality of the
generateExcerpt
method, ensuring it returns the full content when under the limit and properly truncates with an ellipsis when over the limit.Consider adding tests for additional edge cases:
- Empty content
- Content exactly at the character limit
- Content with HTML tags to ensure proper handling of HTML structure
Example additional test:
it('handles content with HTML tags appropriately', () => { const content = '<p>This is a paragraph with <strong>bold text</strong> that should be handled properly</p>'; const result = preparer.generateExcerpt(content, 30); // Verify HTML is preserved correctly in truncation expect(result).toContain('...'); expect(result.length).toEqual(30); });
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/post/post.entity.ts
(2 hunks)src/publishing/content.ts
(2 hunks)src/publishing/content.unit.test.ts
(1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
src/post/post.entity.ts (1)
src/publishing/content.ts (1)
ContentPreparer
(30-150)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Build, Test and Push
🔇 Additional comments (3)
src/post/post.entity.ts (3)
221-221
: Good initialization of the excerpt variable.This initialization preserves the original excerpt value from the Ghost post, maintaining backward compatibility for public posts while allowing for customization in non-public posts.
227-228
: Excellent fix for the member content leakage issue.This change addresses the core issue by regenerating the excerpt from the already-sanitized content when dealing with non-public posts. This ensures that any members-only content removed from the main content is also not present in the excerpt.
245-245
: Proper usage of the processed excerpt in the Post constructor.The change correctly uses the potentially modified excerpt variable instead of directly using ghostPost.excerpt, completing the fix for preventing members-only content leakage in excerpts.
6aa0de4
to
10abc2d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
yarn.lock
is excluded by!**/yarn.lock
,!**/*.lock
📒 Files selected for processing (5)
package.json
(2 hunks)src/post/post.entity.ts
(2 hunks)src/post/post.entity.unit.test.ts
(2 hunks)src/publishing/content.ts
(3 hunks)src/publishing/content.unit.test.ts
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/post/post.entity.ts
🧰 Additional context used
🪛 GitHub Actions: CI
src/publishing/content.ts
[error] 1-2: Import statements differs from the output.
🔇 Additional comments (6)
src/post/post.entity.unit.test.ts (1)
181-201
: Test now correctly verifies both content and excerpt filtering.The test description and implementation have been appropriately updated to verify that both the content and excerpt in private posts are properly filtered. This ensures that members-only content isn't leaked in excerpts, which aligns with the PR's objective.
package.json (2)
34-36
: LGTM!Simple formatting change that doesn't affect functionality.
41-41
: Added appropriate dependencies for HTML-to-text conversion.The added dependencies (
html-to-text
and its type definitions) are necessary for the new excerpt regeneration functionality.Also applies to: 73-73
src/publishing/content.ts (2)
45-47
: LGTM!Clean implementation of the static method that delegates to the instance method.
85-113
: Well-implemented excerpt regeneration.The implementation effectively converts HTML to text while:
- Skipping irrelevant elements (images, footnotes, etc.)
- Preserving important formatting
- Properly handling truncation with ellipsis
This directly addresses the PR's objective of ensuring members-only content isn't leaked in excerpts.
src/publishing/content.unit.test.ts (1)
92-132
: Comprehensive test coverage for the new functionality.The test suite thoroughly verifies the
regenerateExcerpt
method's behavior:
- Correctly handles content shorter than the limit
- Properly truncates content that exceeds the limit
- Ignores image tags as expected
- Correctly processes link elements
These tests ensure the method works correctly in various scenarios and maintain the fix for the members-only content leakage issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/publishing/content.unit.test.ts (3)
122-131
: Consider standardizing newline handling in excerpts.The test expects newlines (
\n\n
) between the link text and paragraph content. This seems inconsistent with how other HTML elements are handled in the other tests. Consider whether this behavior is intentional, as different newline handling between element types might create inconsistent excerpt formatting.
144-152
: Add length assertion for consistency.For consistency with the other test cases, consider adding an assertion to verify the result length is equal to 48 characters:
expect(result).toEqual( 'I expect content to be truncated exactly here...', ); +expect(result.length).toEqual(48);
92-153
: Consider adding tests for additional HTML elements and nested structures.The current tests cover basic HTML elements like
<img>
,<a>
,<figcaption>
, and<hr>
. To ensure comprehensive coverage, consider adding tests for:
- Common elements like
<div>
,<span>
,<strong>
,<em>
- Nested HTML structures
- Content with special characters
- Edge cases like very short content or content with exactly the limit length
This would help ensure the excerpt regeneration works reliably across all content types.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/post/post.entity.unit.test.ts
(3 hunks)src/publishing/content.unit.test.ts
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/post/post.entity.unit.test.ts
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Build, Test and Push
🔇 Additional comments (1)
src/publishing/content.unit.test.ts (1)
92-153
: Good set of test cases for the new regenerateExcerpt functionality!The test suite comprehensively covers various scenarios for the new excerpt generation functionality:
- Handling content shorter than the limit
- Properly truncating content that exceeds the limit
- Ignoring HTML tags like
<img>
,<figcaption>
, and<hr>
- Special handling for
<a>
tagsThis ensures the excerpt generation correctly implements the fix for members-only content leakage in excerpts.
bca8aad
to
046a26b
Compare
closes https://linear.app/ghost/issue/AP-943/members-only-content-is-leaked-in-excerpt