Enhance search operator #3502

Merged: 6 commits merged on Oct 30, 2018


Jermolene
Member

It is proposed to enhance the search operator:

  • to permit an arbitrary list of fields to be specified
  • to add support for flags to control the type of search performed: the current token-based search, a literal search, a whitespace-tolerant search, and a regexp search

For example:

[search:caption,description[this]]
[search:caption,description:literal[this phrase]]
[search:caption,description:literal,casesensitive[this phrase]]
[search::literal,casesensitive[this phrase]]
[search::regexp,casesensitive[(bl|d)o(ggs|e)]]
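As a rough model of the proposed modes, here is a Python sketch. TiddlyWiki itself is JavaScript, and this is not its implementation. The function `search_fields` and the mode names `words`, `literal`, `whitespace` and `regexp` are hypothetical stand-ins for the flags above, and `casesensitive` becomes a boolean parameter. The sketch makes two assumptions. First, the token-based search requires every whitespace-separated term to appear in at least one of the nominated fields. Second, the whitespace-tolerant search lets any run of whitespace in the query match any run of whitespace in the text.

```python
import re


def search_fields(tiddler, fields, query, mode="words", casesensitive=False):
    """Return True if `query` matches at least one of `fields` in `tiddler`.

    Hypothetical model of the proposed flags, not TiddlyWiki's code.
    `tiddler` is a plain dict of field name -> value; `mode` is one of
    "words" (token-based), "literal", "whitespace" or "regexp".
    """
    flags = 0 if casesensitive else re.IGNORECASE
    values = [str(tiddler.get(name, "")) for name in fields]

    if mode == "words":
        # Assumed semantics: every whitespace-separated term must occur
        # literally in at least one of the nominated fields.
        return all(
            any(re.search(re.escape(term), value, flags) for value in values)
            for term in query.split()
        )
    if mode == "literal":
        pattern = re.escape(query)
    elif mode == "whitespace":
        # Assumed semantics: any run of whitespace in the query matches
        # any non-empty run of whitespace in the field value.
        pattern = r"\s+".join(re.escape(term) for term in query.split())
    elif mode == "regexp":
        pattern = query
    else:
        raise ValueError(f"unknown search mode: {mode}")
    return any(re.search(pattern, value, flags) for value in values)


tiddler = {"caption": "This Phrase here", "description": "a dog and a doe"}
print(search_fields(tiddler, ["caption", "description"], "this"))  # True
print(search_fields(tiddler, ["caption", "description"], "this phrase", "literal"))  # True
print(search_fields(tiddler, ["caption", "description"], "this phrase", "literal",
                    casesensitive=True))  # False
print(search_fields(tiddler, ["description"], "(bl|d)o(ggs|e)", "regexp",
                    casesensitive=True))  # True ("doe")
```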

The extended syntax for operator suffixes has been implemented in a manner that allows other filter operators to opt in to using it if they want to. We should review carefully whether the proposed format will be adequate for other operators that we may want to extend.

These changes are backwards compatible as long as existing wikis are not using the search operator with fields whose names contain colons or commas.
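Field names containing colons or commas are the backwards-compatibility risk. This minimal Python sketch splits a suffix in one plausible way to show why, assuming `:` separates groups and `,` separates items within a group. `parse_suffix` and `interpret_search_suffix` are hypothetical names, not TiddlyWiki functions.

```python
def parse_suffix(suffix):
    """Split an operator suffix such as "caption,description:literal" into
    groups: ":" separates groups, "," separates items within a group.
    Hypothetical model; empty items are dropped."""
    return [[item for item in group.split(",") if item]
            for group in suffix.split(":")]


def interpret_search_suffix(suffix):
    """Read the first group as field names and the second as flags
    (an assumed layout matching the examples above)."""
    groups = parse_suffix(suffix)
    fields = groups[0]
    flags = groups[1] if len(groups) > 1 else []
    return fields, flags


print(interpret_search_suffix("caption,description:literal,casesensitive"))
# (['caption', 'description'], ['literal', 'casesensitive'])
print(interpret_search_suffix(":regexp,casesensitive"))
# ([], ['regexp', 'casesensitive'])
print(interpret_search_suffix("my:field"))  # a legacy field name containing ":"
# (['my'], ['field'])
```

Under this scheme a legacy field named `my:field` would be read as the field `my` plus a flag `field`. A field named `my,field` would be read as two separate fields.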

We could extend the advanced search box to allow the new search options to be used, but I'm concerned that it might be overkill for most users while adding a lot to the core, so it might be best in a plugin.

@xcazin
Contributor

xcazin commented Oct 30, 2018

Hi @Jermolene,

Well, this proposal doesn't seem too controversial :-) I find it quite elegant myself, but I wish it could also produce several outputs when the search leads to several matches.

For instance, I'd like to obtain the following list FMCBlWgcrGc 390 out of the following string input:
https://www.youtube.com/watch?v=FMCBlWgcrGc&start=390. I could imagine extending the removeprefix and removesuffix operators to support a regexp flag, and combining them into appropriate runs. Even nicer would be getting the string directly from the url field of the input tiddlers.

Still, in order to build a more versatile YouTube dropper, I'd also need to take into account the multiple variants of a YouTube video URL, depending on whether the host name is youtube.com or youtu.be, whether the video is part of a playlist as in https://www.youtube.com/watch?v=FMCBlWgcrGc&index=3&list=PLJXb5DyfMUF7UwlTeFqIuiCLz4wHanAdt&t=390, whether it has a starting time, etc.

Such an aim leads to monster regexps like https:\/\/(youtu\.be\/|www\.youtube\.com\/(watch\?(.*&)?v=|(embed|v)\/))([^\?&"'>]+)&t=([0-9]+)(.*) for which the prefix/suffix game is no fun anymore. On the other hand, if the search operator could retrieve every group in one go, parsing with filters would become magic again. The issue then is that some groups are bound to be empty. How could I ask such a search operator for matches 5, 6 and 7 (see http://rubular.com/r/yDtKjZ56eR)?
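A quick Python check of the pattern above shows how the numbered groups come out. Python's `re` numbers groups by opening parenthesis, as JavaScript does, so groups 5, 6 and 7 are the video ID, the time and the remainder. Optional groups that did not take part in the match come back as `None` (JavaScript gives `undefined`), which is the empty-group problem. As written, the pattern needs `&t=` immediately after the video ID, so neither the `&start=390` URL nor the playlist URL above matches. The sketch therefore uses URLs of the matching shape. `youtube_groups` is a hypothetical helper.

```python
import re

# The pattern quoted above, unchanged; the raw string keeps the backslashes.
YOUTUBE = re.compile(
    r"""https:\/\/(youtu\.be\/|www\.youtube\.com\/(watch\?(.*&)?v=|(embed|v)\/))([^\?&"'>]+)&t=([0-9]+)(.*)"""
)


def youtube_groups(url, wanted=(5, 6, 7)):
    """Return the requested capture groups, or None if the URL doesn't match.
    Groups that did not participate (None) are mapped to ""."""
    m = YOUTUBE.search(url)
    if m is None:
        return None
    return [m.group(n) or "" for n in wanted]


print(youtube_groups("https://www.youtube.com/watch?v=FMCBlWgcrGc&t=390"))
# ['FMCBlWgcrGc', '390', '']
print(YOUTUBE.search("https://youtu.be/FMCBlWgcrGc&t=390").groups())
# ('youtu.be/', None, None, None, 'FMCBlWgcrGc', '390', '')
```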

@Jermolene
Member Author

Jermolene commented Oct 30, 2018

Hi @xcazin

I'd like to obtain the following list FMCBlWgcrGc 390 out of the following string input:
https://www.youtube.com/watch?v=FMCBlWgcrGc&start=390

I think that this sort of thing needs a full-blown widget that sets variables to the results of a match:

<$match text={{$:/IncomingURL}} regexp="""https:\/\/(youtu\.be\/|www\.youtube\.com\/(watch\?(.*&)?v=|(embed|v)\/))([^\?&"'>]+)&t=([0-9]+)(.*)""" casesensitive="no" global="yes" multiline="yes">

Matching text: <$text text=<<match>>/>

Matching index: <$text text=<<index>>/>

Last index: <$text text=<<last-index>>/>

Capture groups: <$text text=<<match-1>>/> <$text text=<<match-2>>/> etc.

</$match>

We might also support attributes for specifying different variable names for the various outputs (output-match="my-match-var").
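Here is one way to read the proposed `<$match>` widget, sketched in Python. For each match it computes `match`, `index`, `last-index` and `match-1`, `match-2`, ... as strings. This models a proposal, so it rests on assumptions. `match_variables` is a hypothetical name. `last-index` is assumed to mean the position just after the match, like JavaScript's `RegExp.lastIndex`. `global="yes"` is assumed to yield one set of variables per match.

```python
import re


def match_variables(text, regexp, casesensitive=False, multiline=False, global_=False):
    """Hypothetical model of the variables a <$match> widget might set.

    Returns one dict per match (only the first unless global_ is True;
    the trailing underscore is because `global` is a Python keyword).
    All values are strings, as wiki variables are.
    """
    flags = 0
    if not casesensitive:
        flags |= re.IGNORECASE
    if multiline:
        flags |= re.MULTILINE
    results = []
    for m in re.finditer(regexp, text, flags):
        variables = {
            "match": m.group(0),
            "index": str(m.start()),
            # Assumed to mirror JavaScript's RegExp.lastIndex: the
            # position just after the end of the match.
            "last-index": str(m.end()),
        }
        # Unmatched optional groups (None) become empty strings.
        for n, group in enumerate(m.groups(), start=1):
            variables[f"match-{n}"] = group or ""
        results.append(variables)
        if not global_:
            break
    return results


for vars_ in match_variables("a1 b22", r"([a-z])(\d+)", global_=True):
    print(vars_)
# {'match': 'a1', 'index': '0', 'last-index': '2', 'match-1': 'a', 'match-2': '1'}
# {'match': 'b22', 'index': '3', 'last-index': '6', 'match-1': 'b', 'match-2': '22'}
```

An `output-match`-style attribute would amount to renaming the dictionary keys before they are exposed as variables.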

@Marxsal
Contributor

Marxsal commented Oct 30, 2018

Matching text: <$text text=<<match>>/>

Matching index: <$text text=<<index>>/>

Last index: <$text text=<<last-index>>/>

Capture groups: <$text text=<<match-1>>/> <$text text=<<match-2>>/> etc.

</$match>

If you're doing that, shouldn't there be a <<match-list>>, that returns all groups? Otherwise you're forced to know in advance what matches (<<match-1>>, <<match-4>>, etc.) will be there.

@Jermolene
Member Author

Hi @Marxsal

If you're doing that, shouldn't there be a <<match-list>>, that returns all groups? Otherwise you're forced to know in advance what matches (<<match-1>>, <<match-4>>, etc.) will be there.

Yes to the first, and I added a comment to the same effect on #2963. But the reason for having independent access to each capture group is so that you can use multiple capture groups to grab the different components of a URL, say.
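This Python sketch illustrates the trade-off between `<<match-list>>` and individually addressable groups. It splits a URL into components with capture groups and exposes both forms. The regex is a deliberately simple illustration, not a full URL parser, and `url_parts` and the `match-list` key are hypothetical. Keeping empty groups as empty strings keeps every group at a fixed position, which is what makes `match-N` access dependable.

```python
import re

# Deliberately simple illustration, not a complete URL parser:
# group 1 = scheme, 2 = host, 3 = path, 4 = query string.
URL_PATTERN = re.compile(r"^(https?)://([^/?#]+)([^?#]*)(?:\?([^#]*))?")


def url_parts(url):
    """Expose the capture groups both individually (match-1, match-2, ...)
    and together (match-list). Hypothetical variable names."""
    m = URL_PATTERN.match(url)
    if m is None:
        return None
    # Keep empty groups as "" so every group keeps a fixed position.
    groups = [g or "" for g in m.groups()]
    variables = {f"match-{n}": g for n, g in enumerate(groups, start=1)}
    variables["match-list"] = groups
    return variables


parts = url_parts("https://www.youtube.com/watch?v=FMCBlWgcrGc&start=390")
print(parts["match-2"])     # www.youtube.com
print(parts["match-list"])  # ['https', 'www.youtube.com', '/watch', 'v=FMCBlWgcrGc&start=390']
```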

jho1965us pushed a commit to jho1965us/TiddlyWiki5 that referenced this pull request Apr 3, 2019
* Enhance search operator

* Add support for searching all fields

and also searching all fields except nominated fields.

* Docs tweaks

Thanks @pmario

* Error message improvements

* Improve error message formatting
@twMat
Contributor

twMat commented Mar 8, 2020

@Jermolene - first, thanks for your input on this matter. It seems a few of us don't fully understand why a widget is preferable to a filter operator here. For example, the snippet you quote from @xcazin is:

I'd like to obtain the following list FMCBlWgcrGc 390 out of the following string input:
https://www.youtube.com/watch?v=FMCBlWgcrGc&start=390

...but this output is trivial to get if you use two separate filters. Is that ("two") the problem? (BTW, with #4452 it'd be one.) Or is your reasoning perhaps about @xcazin's observation that:

Such an aim leads to monster regexps like https:\/\/(youtu\.be\/|www\.youtube\.com\/(watch\?(.*&)?v=|(embed|v)\/))([^\?&"'>]+)&t=([0-9]+)(.*) for which the prefix/suffix game is no fun anymore.

...but a widget would not simplify this aspect, would it?
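For comparison, the "two separate filters" approach might look like this in Python. One pass pulls out the video ID and another the start time, each as a chain of prefix and suffix trimming. `after` and `before` are hypothetical helpers standing in for such filter steps; they are not TiddlyWiki operators.

```python
def after(text, prefix):
    """Hypothetical helper: the part of `text` after the first `prefix`,
    or "" if `prefix` does not occur."""
    i = text.find(prefix)
    return text[i + len(prefix):] if i != -1 else ""


def before(text, sep):
    """Hypothetical helper: the part of `text` before the first `sep`
    (the whole of `text` if `sep` does not occur)."""
    return text.split(sep, 1)[0]


url = "https://www.youtube.com/watch?v=FMCBlWgcrGc&start=390"
video_id = before(after(url, "v="), "&")   # first pass: the video ID
start = before(after(url, "start="), "&")  # second pass: the start time
print(video_id, start)  # FMCBlWgcrGc 390
```

Each further URL variant (youtu.be, playlists, `t=` instead of `start=`) needs its own extra passes, which is where the prefix/suffix game stops being fun.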

Also, setting aside the effort to build it, would it be inappropriate to have both filter operators and a widget? As I note in #4479, string manipulation does "feel" like a filter operator matter.

Thanks.

@Jermolene
Member Author

It's fair that regexp feels like a filter operator matter, but at the moment the limitations of filter operators make that difficult. In particular, a filter operator can only return a list of results, whereas a regular expression operation yields a match string, a list of captured groups, and a numeric position for the match, all of which can be useful. We could jam all of that information into a single list, but the filter code to manipulate it would become very clumsy and inscrutable.
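A small Python sketch shows the clumsiness of jamming everything into one list. It assumes a TiddlyWiki-style title-list convention: items are separated by spaces, and items containing whitespace are wrapped in `[[...]]`. `stringify_list` and `parse_list` are hypothetical simplified helpers, not TiddlyWiki's own. The sketch flattens the match text, its position and the capture groups of the `youtu.be` example into one list, then reads it back. The empty groups are silently dropped, so the video ID moves from the seventh item to the fourth.

```python
def stringify_list(items):
    """Join items into a space-separated title list, wrapping any item that
    contains whitespace in [[...]] (assumed TiddlyWiki-style convention)."""
    return " ".join(
        f"[[{item}]]" if any(ch.isspace() for ch in item) else item
        for item in items
    )


def parse_list(text):
    """Simplified inverse of stringify_list: split on whitespace, honouring
    [[...]] brackets. Raises ValueError if a [[ is never closed."""
    items, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
        elif text.startswith("[[", i):
            end = text.index("]]", i + 2)
            items.append(text[i + 2:end])
            i = end + 2
        else:
            end = i
            while end < len(text) and not text[end].isspace():
                end += 1
            items.append(text[i:end])
            i = end
    return items


# Match text, match position, then capture groups 1-7 for the youtu.be
# example; groups 2, 3, 4 and 7 are empty.
flat = ["https://youtu.be/FMCBlWgcrGc&t=390", "0",
        "youtu.be/", "", "", "", "FMCBlWgcrGc", "390", ""]
round_trip = parse_list(stringify_list(flat))
print(flat.index("FMCBlWgcrGc"), round_trip.index("FMCBlWgcrGc"))  # 6 3
print(len(flat), len(round_trip))  # 9 5
```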
