2992: Replace normalize-strings #3108

hannaseithe · 2025-02-19T11:55:29Z

Short Description

The dependency on the library 'normalize-strings' has been replaced by the ES6 String.prototype.normalize() method. The implementation follows the approach of digitalfabrik/entitlementcard/#1782.

Proposed Changes

The dependency on the 'normalize-strings' library was removed.
The implementation of normalizeString follows the implementation in digitalfabrik/entitlementcard/#1782.
A new file normalizeString.ts was created that exports const normalizeString (also as the default export, and re-exported from shared); the export from search.ts was deleted.
Tests were moved to a separate file.

Side Effects

Any functionality that relies on string normalization may be affected, especially search.

Testing

Check that the search functionality still works:
- "munchen" finds "München"
- " test " finds "Test"
- "straße" finds "Straße"
- "strasse" finds "Straße"

Resolved Issues

Fixes: #2992

steffenkleinle · 2025-02-19T12:33:04Z

I think ß should definitely continue to work 👍 Could you try to fix that such that ß just gets normalized as well?

hannaseithe · 2025-02-19T13:28:08Z

#2992 (comment)

[Continuing the discussion here]
@hauf-toni Technically this would not be a problem; I can just make an exception for ß. It's not caused by the normalization itself but by the fact that we now filter out certain characters beforehand.

hannaseithe · 2025-02-24T08:54:11Z

So I have done some more research into the Eszett (ß) situation. This is what I have come up with.

1. Before the PR

* ß was not filtered out
* what finds "Straße"
  * straße (all 6 letters marked bold)
* what doesn't find "Straße"
  * strasse/strae/strase

2. After the change as also implemented in EC

* ß filtered out
* what finds "Straße"
  * straße/strae (only the five letters Straß are bold, e is normal)
* what doesn't find "Straße"
  * strasse/strase

3. After adapting the EC changes to not filter out ß (current commits), but still using string.normalize()

* as in 1.

4. **My suggestion**: Option to replace ß by ss (internally only)

* ß not filtered but replaced by 'ss'
* what would find "Straße" (all six letters get bolded)
  * straße/strasse
* what would not find "Straße"
  * strae/strase

PS: string.normalize() does not replace ß with ss in any normalization form (NFC, NFD, NFKC, NFKD), but this can be done by hand.
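To make the PS concrete, here is a minimal sketch, not the code from this PR. It checks that none of the four normalization forms changes ß, and then shows one way the suggested "replace by hand" step could look. The function body and the order of the steps are assumptions chosen for illustration.

```typescript
// ß (U+00DF) has no canonical or compatibility decomposition,
// so every normalization form returns it unchanged.
const forms = ['NFC', 'NFD', 'NFKC', 'NFKD'] as const
const eszettUnchanged = forms.every(form => 'ß'.normalize(form) === 'ß')

// Hypothetical normalizeString: the real implementation in the PR may differ.
const normalizeString = (value: string): string =>
  value
    .trim()
    .toLowerCase()
    // Replace ß manually, since normalize() will not do it
    .replace(/ß/g, 'ss')
    // Decompose accented letters into base letter + combining mark ...
    .normalize('NFD')
    // ... and drop the combining diacritical marks
    .replace(/[\u0300-\u036f]/g, '')

console.log(eszettUnchanged, normalizeString('Straße'), normalizeString('München'))
```

With this sketch, "straße" and "strasse" normalize to the same string, which is the behaviour described in suggestion 4, while "strae" and "strase" still do not match.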

steffenkleinle · 2025-02-24T09:07:12Z

Sounds great, I like your suggestion. Lets go with that 🔍

steffenkleinle · 2025-02-28T13:36:19Z

@hannaseithe do you want to create an issue in the entitlementcard repository for this problem and with your findings? Thanks :)

steffenkleinle · 2025-03-10T09:16:57Z

I think a solution making use of the default findChunks implementation would be more elegant/less error-prone:

import { findChunks } from 'highlight-words-core'

const findNormalizedChunks = (props: FindChunks): Chunk[] => {
  const chunks: Chunk[] = findChunks(props)
  return chunks.map(chunk => {
    const match = props.textToHighlight.slice(chunk.start, chunk.end)
    const sanitationAdditionalChars = match.split('ß').length - 1
    return { start: chunk.start, end: chunk.end - sanitationAdditionalChars }
  })
}

Also, the current solution crashes for me when entering ).
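The thread does not say what caused the crash. One common cause of this symptom is building a RegExp directly from the raw query, because an unbalanced `)` is not a valid pattern. The sketch below assumes that cause; `escapeRegExp` and `buildSearchRegExp` are hypothetical names, not functions from this PR.

```typescript
// Escape every character that has a special meaning in a regular expression
const escapeRegExp = (value: string): string => value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')

// Hypothetical helper: build a case-insensitive pattern from user input
const buildSearchRegExp = (query: string): RegExp => new RegExp(escapeRegExp(query), 'i')

// Without escaping, the constructor throws a SyntaxError for ')'
let unescapedThrows = false
try {
  new RegExp(')')
} catch (error) {
  unescapedThrows = error instanceof SyntaxError
}

console.log(unescapedThrows, buildSearchRegExp(')').test('a) b'))
```

If the query is passed through highlight-words-core, its `autoEscape` option may serve the same purpose; whether that applies here depends on how the component calls it.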

steffenkleinle · 2025-03-11T09:44:34Z

native/package.json

@@ -79,7 +79,6 @@
    "react-native-calendars": "^1.1306.0",
    "react-native-gesture-handler": "^2.19.0",
    "react-native-get-random-values": "^1.11.0",
-    "react-native-highlight-words": "^1.0.1",


This is unmaintained and does not support a custom findChunks method, so I wrote a custom Highlighter component.

bahaaTuffaha

Great work ✨
Unfortunately, there is one issue left: if you search in Arabic (after switching the language), the whole list is returned, because non-ASCII letters are stripped and Arabic, as far as I know, consists of non-ASCII letters.

You can test it by typing اللغة
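A small sketch of why this happens, under the assumption that the query and the titles are both passed through the same character filter before comparison. The filter functions, the `titles` list and `search` are made up for illustration and are not taken from the PR.

```typescript
// Hypothetical filters: one keeps only ASCII, one keeps any Unicode letter or number
const stripNonAscii = (value: string): string => value.replace(/[^\x00-\x7F]/g, '')
const stripNonAlphanumeric = (value: string): string => value.replace(/[^\p{Letter}\p{Number}]/gu, '')

const titles = ['اللغة العربية', 'Sprache', 'Language']

const search = (query: string, filter: (value: string) => string): string[] => {
  const filteredQuery = filter(query)
  return titles.filter(title => filter(title).includes(filteredQuery))
}

// The ASCII-only filter turns the Arabic query into '', and every string includes ''
const asciiResult = search('اللغة', stripNonAscii)
// The Unicode-aware filter keeps the Arabic letters, so only the Arabic title matches
const unicodeResult = search('اللغة', stripNonAlphanumeric)

console.log(asciiResult.length, unicodeResult.length)
```

The Unicode property escapes `\p{Letter}` and `\p{Number}` require the `u` flag on the regular expression.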

steffenkleinle · 2025-03-28T08:25:27Z

Very good catch, thanks for testing thoroughly. Should be fixed now, could you please check again?

bahaaTuffaha · 2025-03-28T12:35:19Z

"strasse" now doesn't show the same results as "straße". Maybe add a condition that checks whether the language is German?

steffenkleinle · 2025-03-28T13:09:52Z

Yes. I think that check is not necessary, this normalization should always be done imo, no matter the user language.

bahaaTuffaha · 2025-03-28T13:18:09Z

Keep the normalization, but use nonAsciiRegex = /[^\x00-\x7F\xDF]/g for German and nonAsciiRegex = /[^\p{Letter}|\p{Number}]/gu for all other languages.

steffenkleinle · 2025-03-28T13:34:57Z

Why? What benefit does that serve? In what use case would we not want to use the same regex?

bahaaTuffaha · 2025-03-28T13:53:32Z

My bad, I hadn't pulled your latest changes ("strasse" works as expected).

However, the match is not highlighted completely, and I also noticed that some words are highlighted although they are not relevant to the search.
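Incomplete highlighting of this kind can occur when match positions are computed on the normalized text (where ß has become ss) and then applied to the original text unchanged. The following sketch shows one alternative way to map positions back; it is not the approach used in this PR, and `normalizeChar` and `findHighlight` are hypothetical names.

```typescript
// Hypothetical per-character normalization: ß becomes ss, diacritics are dropped
const normalizeChar = (char: string): string => {
  const lower = char.toLowerCase()
  return lower === 'ß' ? 'ss' : lower.normalize('NFD').replace(/[\u0300-\u036f]/g, '')
}

type Highlight = { start: number; end: number }

const findHighlight = (text: string, query: string): Highlight | null => {
  let normalizedText = ''
  // originalIndex[i] is the position in `text` that produced normalizedText[i]
  const originalIndex: number[] = []
  for (let i = 0; i < text.length; i += 1) {
    const normalizedChar = normalizeChar(text.charAt(i))
    normalizedText += normalizedChar
    for (let j = 0; j < normalizedChar.length; j += 1) {
      originalIndex.push(i)
    }
  }

  const normalizedQuery = Array.from(query.trim()).map(normalizeChar).join('')
  const matchStart = normalizedQuery.length > 0 ? normalizedText.indexOf(normalizedQuery) : -1
  if (matchStart === -1) {
    return null
  }
  // Translate the first and last matched position back to the original text
  const matchEnd = matchStart + normalizedQuery.length - 1
  return { start: originalIndex[matchStart], end: originalIndex[matchEnd] + 1 }
}

const highlight = findHighlight('Hauptstraße 5', 'strasse')
console.log(highlight)
```

Because every normalized character remembers where it came from, the highlighted range covers all six letters of "straße" even though the query has seven characters. The sketch only returns the first match and works on UTF-16 code units, which is enough for the examples in this thread.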

hannaseithe self-assigned this Feb 19, 2025

hannaseithe marked this pull request as ready for review February 19, 2025 14:15

hannaseithe requested review from steffenkleinle, f1sh1918, LeandraH, bahaaTuffaha and lunars97 as code owners February 19, 2025 14:15

hannaseithe marked this pull request as draft February 19, 2025 14:49

steffenkleinle changed the title ~~2992: Replaced dependency 'normalize-string' with string.normalize()~~ 2992: Replac normalize-strings Mar 6, 2025

steffenkleinle changed the title ~~2992: Replac normalize-strings~~ 2992: Replace normalize-strings Mar 6, 2025

hannaseithe added 3 commits March 10, 2025 13:32

2992: Replaced dependency 'normalize-string' with string.normalize()

ed4cec7

2992: Add exception for eszett character

c470e00

2992 Add findMatchingSelection

fa6db2d

steffenkleinle force-pushed the 2992--Replace-normalize-strings-dependency branch from bb034a6 to fa6db2d Compare March 10, 2025 12:32

hannaseithe removed their assignment Mar 10, 2025

2992: Reuse findChunks from highlight-words-core and make Highlighter…

32a6ac5

… reusable

steffenkleinle reviewed Mar 11, 2025

steffenkleinle force-pushed the 2992--Replace-normalize-strings-dependency branch from 535f3a4 to cb51398 Compare March 11, 2025 09:51

steffenkleinle added 2 commits March 11, 2025 10:57

2992: Remove unmaintained react-native-highlight-words and replace wi…

87a7c02

…th custom component

2992: Normalize search query and terms

49a5832

steffenkleinle force-pushed the 2992--Replace-normalize-strings-dependency branch from cb51398 to 49a5832 Compare March 11, 2025 09:57

steffenkleinle marked this pull request as ready for review March 19, 2025 07:47

bahaaTuffaha requested changes Mar 27, 2025

2992: Use unicode character classes to keep unicode letters and numbers

9661cc4

steffenkleinle requested a review from bahaaTuffaha March 28, 2025 08:25
