
Community Wishlist Survey 2017/Search

From Meta, a Wikimedia project coordination wiki
Search
8 proposals, 184 contributors



Blame tool

  • Problem: When I see suspicious text on a page, I should be able to find who added it and the revision that first introduced it. Some version control systems have a tool for this. Currently I have to check the history manually.
  • Who would benefit: Curators.
  • Proposed solution:
  • More comments: It would also be good to find the last revision that contained a given piece of text from the article's history.
  • Phabricator tickets:
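A naive version of the lookup described above can be sketched in Python. It assumes the page history is already available as an oldest-first list of revisions; the Revision record, the function name and the sample history are hypothetical. Exact substring matching is deliberately simple and misses text that was reworded gradually.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Revision:
    """Hypothetical record for one revision of a page."""
    rev_id: int
    user: str
    text: str


def first_and_last_revision(
    revisions: Iterable[Revision], snippet: str
) -> Tuple[Optional[Revision], Optional[Revision]]:
    """Return the earliest and the latest revisions whose text contains snippet.

    `revisions` must be in chronological order (oldest first). Both values
    are None if no revision contains the snippet.
    """
    first: Optional[Revision] = None
    last: Optional[Revision] = None
    for rev in revisions:
        if snippet in rev.text:
            if first is None:
                first = rev  # the revision that introduced the text
            last = rev  # updated on every match, so it ends at the latest one
    return first, last


# Hypothetical sample history.
history = [
    Revision(1, "Alice", "Paris is the capital of France."),
    Revision(2, "Bob", "Paris is the capital of France. It has 3 moons."),
    Revision(3, "Carol", "Paris is the capital of France. It has 3 moons!"),
    Revision(4, "Dave", "Paris is the capital of France."),
]
first, last = first_and_last_revision(history, "3 moons")
print(first.rev_id, first.user, last.rev_id)  # 2 Bob 3
```

The revision right after `last` (here revision 4) is the one that removed the text, which covers the "last revision" case from the comments.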

Discussion

  • This already exists. MER-C (talk) 12:35, 9 November 2017 (UTC)
    • It does, but it does not work very well :/ We actually removed the XTools version because it was even worse. If there is a rock-solid solution to this, it hasn't been implemented yet (to my knowledge). Either way, I think this proposal is a better fit for the "Search" category, so I am moving it there. Regards, MusikAnimal (WMF) (talk) 19:33, 10 November 2017 (UTC)
      • WikiWho is probably a good starting point for this. The Wiki Ed / P&E Dashboard uses it to show which parts of an article were contributed by which editor, and although it's not perfect, it's been really, really useful. There are some failure modes based on how it tokenizes things, for text that gets transformed gradually at sub-word level, and it just doesn't give results for some kinds of markup (including some refs, and tables, and other harder-to-parse syntax). But in practice, it's got excellent accuracy. It's also still being actively developed, and has been improving quite a bit in recent months.--Ragesoss (talk) 17:05, 11 December 2017 (UTC)
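The basic idea of token-level attribution can be illustrated with a much simpler method than WikiWho: diff each revision against the previous one and credit new tokens to the revision that added them. This is a sketch using Python's difflib, with whitespace tokenization and hypothetical sample data; it is not WikiWho's algorithm. In particular, it credits text that was removed and later restored to whoever restored it, a case dedicated tools aim to handle better.

```python
import difflib
from typing import List, Sequence, Tuple


def attribute_tokens(revisions: Sequence[Tuple[int, str]]) -> List[Tuple[str, int]]:
    """Credit each whitespace-separated token of the newest revision to the
    revision that introduced it, diffing every revision against the previous one.

    `revisions` holds (rev_id, wikitext) pairs, oldest first.
    """
    attributed: List[Tuple[str, int]] = []
    for rev_id, text in revisions:
        old_tokens = [token for token, _ in attributed]
        new_tokens = text.split()
        matcher = difflib.SequenceMatcher(None, old_tokens, new_tokens, autojunk=False)
        updated: List[Tuple[str, int]] = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged tokens keep the revision that originally added them.
                updated.extend(attributed[i1:i2])
            elif tag in ("replace", "insert"):
                # New or rewritten tokens are credited to the current revision.
                updated.extend((token, rev_id) for token in new_tokens[j1:j2])
            # "delete" opcodes need no action: removed tokens are simply dropped.
        attributed = updated
    return attributed


# Hypothetical sample history.
history = [
    (101, "the cat sat"),
    (102, "the black cat sat"),
    (103, "the black cat sat down"),
]
print(attribute_tokens(history))
# [('the', 101), ('black', 102), ('cat', 101), ('sat', 101), ('down', 103)]
```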

Is this essentially phab:T2639? --Vachovec1 (talk) 22:37, 13 December 2017 (UTC)

Voting


Preferences settings to modify crosswiki search results

  • Problem: This year, crosswiki search results from other projects, like Wiktionary and Wikisource, have been implemented across all language editions of Wikipedia. Reactions to the implementation have varied. Before implementation, the en.WP community agreed in one discussion to suppress search results from Wikimedia Commons, Wikiversity, and Wikinews on en.WP. (Side note: Wikimedia Commons isn't fully suppressed; only its sidebar results are. Integrated results from Commons are still shown when the search includes the "File:" namespace, as in this example.) (Another side note: Wikibooks was supposed to be suppressed, but a discussion that ended in "no consensus" led to... Wikibooks being enabled on en.WP, but that's a separate issue that can be raised at en.WP.)

    After implementation, some users wanted all sidebar search results from other projects suppressed. Consequently, following another discussion, the en.WP community was given the "HideInterwikiSearchResults" gadget in Preferences to hide these results. Currently, 181+ editors use the gadget there.

    Edit: The team decided to place Wiktionary at the top of the list of crosswiki results on all Wikipedia sites, while putting Wikibooks at the very bottom only in English Wikipedia search results.

  • Who would benefit: Registered users throughout all projects (especially Wikipedia language sites) that have already implemented sidebar search results, including users who want to enable results from any project(s) and those wanting to disable results from any or all projects
  • Proposed solution: I propose three things as part of the solution:
    1. options to enable (opt in) or disable (opt out) sidebar results from each individual project. For example, an individual user could configure their settings to suppress results from one project but enable results from others.
    2. an option to disable (opt out of) all sidebar search results from sister projects. (en.WP already has this option as a gadget)
    3. (if necessary) a setting to change the order of sidebar search results from sister projects.
  • More comments: Consensus at en.WP regarding individual projects should be respected. In other words, if the proposal succeeds, Commons, Wikinews, and Wikiversity should already be opted out by default, and signed-out users still won't see them. For example, a registered en.WP user who wants to see images from Commons in the sidebar would have to enable the image results in their user settings.

    One more thing: user scripts to suppress crosswiki search results (i.e. sidebar search results from sister projects) are available, especially for non-English Wikimedians. IMHO, considering that more than 100 en.WP users use the opt-out gadget, relying on a custom user script instead would have meant creating or modifying more than 100 user script subpages on en.WP.
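The three options above amount to layering a registered user's preferences over per-wiki defaults. A minimal sketch in Python, assuming hypothetical names throughout (DEFAULT_ENABLED, DEFAULT_ORDER, sidebar_projects and the project keys are illustrative, not how the search extension actually stores preferences). The defaults mirror the en.WP situation described above: Commons, Wikinews and Wikiversity opted out, Wiktionary first, Wikibooks last.

```python
from typing import Dict, List, Optional

# Hypothetical en.WP defaults, following the consensus described above:
# Commons, Wikinews and Wikiversity stay opted out until a user opts in.
# Insertion order doubles as the default display order (Wiktionary first, Wikibooks last).
DEFAULT_ENABLED: Dict[str, bool] = {
    "wiktionary": True,
    "wikisource": True,
    "wikiversity": False,
    "wikinews": False,
    "commons": False,
    "wikibooks": True,
}
DEFAULT_ORDER: List[str] = list(DEFAULT_ENABLED)


def sidebar_projects(
    overrides: Optional[Dict[str, bool]] = None,
    hide_all: bool = False,
    custom_order: Optional[List[str]] = None,
) -> List[str]:
    """Return the sister projects to show in the search sidebar, in display order.

    Signed-out users get the defaults (no arguments). Registered users may opt
    individual projects in or out (`overrides`), hide everything (`hide_all`),
    or move some projects to the front (`custom_order`).
    """
    if hide_all:
        return []
    enabled = dict(DEFAULT_ENABLED)
    # Ignore unknown project names so a stale preference cannot add entries.
    enabled.update({p: on for p, on in (overrides or {}).items() if p in enabled})
    order = custom_order or []
    rank = {project: i for i, project in enumerate(order)}
    shown = [p for p in DEFAULT_ORDER if enabled[p]]
    # Projects named in custom_order come first; the rest keep the default order.
    return sorted(shown, key=lambda p: (rank.get(p, len(order)), DEFAULT_ORDER.index(p)))
```

For example, `sidebar_projects({"commons": True})` would give a user Commons results while keeping Wikibooks at the bottom.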

Discussion


Minor note: including commons results in File: namespace searches predates sidebar sister project results by some time: we implemented it wayyyyyy back in 2013. This isn't a comment on the appropriateness of it, whether they should be included or not. Just a bit of historical data that this was happening long before people noticed it as a result of sister project searches :) 😂 (talk) 00:00, 10 November 2017 (UTC)

Regarding the Wikibooks situation, the notes on the Phabricator ticket explained why Wikibooks was not suppressed. Basically, the developers read the first RfC discussion and thought that consensus to exclude Wikibooks was not strong at all, contrary to the closer's comments. After they decided not to exclude Wikibooks (including it by default), the second RfC was held. Because that second RfC ended with no consensus, Wikibooks stayed included and the ticket was declined and closed. A new ticket was supposed to be opened to put Wikibooks at the bottom of the sidebar. Ca2james (talk) 20:25, 19 November 2017 (UTC)

It was already done, Ca2james (phab:T171803). George Ho (talk) 02:25, 20 November 2017 (UTC)

Voting


Unlimited number of search results

  • Problem: Elasticsearch (the search engine behind Wikipedia's search) has a hard limit of 10,000 search results. This is to prevent DDoS attacks. However, it means anyone wanting more than 10k results needs to download a full database dump and use AWB or homegrown tools, which is costly and slow. The same limit applies to API:Search.
  • Proposed solution: Deploy a solution so the Elasticsearch limit is lifted for trusted users/developers.
  • Who would benefit: Bot writers, anyone needing > 10k search results.
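The dump-based workaround mentioned above can be sketched in Python with a streaming XML parser, so the whole dump never has to fit in memory. This assumes a MediaWiki XML export where each page has a title and one or more revision texts; the function names are hypothetical. The regex in the example is the one GreenC gives in the discussion; note it also matches phrases such as "the theory", so a real bot would probably want word boundaries.

```python
import re
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator


def _local_name(tag: str) -> str:
    """Drop the "{namespace}" prefix that ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def titles_matching(dump: BinaryIO, pattern: str) -> Iterator[str]:
    """Stream a MediaWiki XML export and yield titles of pages whose text matches pattern."""
    regex = re.compile(pattern)
    title, matched = None, False
    for _event, elem in ET.iterparse(dump, events=("end",)):
        name = _local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text" and elem.text and regex.search(elem.text):
            matched = True
        elif name == "page":
            if matched and title is not None:
                yield title  # one title per page, even if several revisions match
            title, matched = None, False
            elem.clear()  # release the finished page's subtree
```

For a real dump, the compressed file can be opened with `bz2.open(path, "rb")` and passed in directly; this is exactly the slow, full-scan approach the proposal wants to avoid.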

Discussion


Probably a good idea to bundle this with the bot flag? Headbomb (talk) 00:33, 10 November 2017 (UTC)

Perhaps it would be nice to elaborate on the use cases here: as described in the Phabricator ticket, there are technical limitations that may be hard to circumvent with the current API parameters. For example, is ranking still important for such use cases? Would the API client be OK with maintaining more state on its side to help the search engine? The main blocker here is that the search engine needs to hold offset+size results in memory on multiple machines. In short, to make this happen we'll certainly have to drop some features or make a dedicated API endpoint with a limited set of features. That's why I suggest discussing the use cases here, so that we can evaluate the feasibility. Thanks! DCausse (WMF) (talk) 09:09, 10 November 2017 (UTC)
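The offset+size cost can be shown with a toy model of a sharded index. This is a simplification with hypothetical data, not CirrusSearch or Elasticsearch code. Because no shard knows the global ranking, each shard must send its own top offset+size hits, and the coordinating node merges all of them just to return one page. With N shards that is N × (offset + size) candidates per request, so a 10-result page near the end of the window costs about as much as fetching 10,000 results from every shard. Elasticsearch caps from+size with its `index.max_result_window` setting, which defaults to 10,000.

```python
import heapq
from typing import List, Tuple

Hit = Tuple[float, str]  # (relevance score, page title)


def search_page(shards: List[List[Hit]], offset: int, size: int) -> Tuple[List[Hit], int]:
    """Return one page of globally ranked hits, plus how many candidates
    the coordinator had to merge to produce it."""
    per_shard = offset + size
    # Each shard only knows its local ranking, so it must send offset + size hits.
    candidates = [heapq.nlargest(per_shard, shard) for shard in shards]
    merged = heapq.nlargest(per_shard, (hit for hits in candidates for hit in hits))
    merged_count = sum(len(hits) for hits in candidates)
    return merged[offset:offset + size], merged_count


# Hypothetical index: 5 shards with 1,000 scored hits each.
shards = [[(float(score), f"shard{n}-page{score}") for score in range(1000)] for n in range(5)]
for offset in (0, 990):
    _page, merged = search_page(shards, offset, 10)
    print(offset, merged)
# 0 50
# 990 5000
```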

Hi User:DCausse (WMF), great, thanks for exploring this further. In my experience the use case is only for generating a list of article titles where the article body (or optionally the title) contains the search string. No snippets, ranking, etc., just a list of titles. It's the same use case as for AWB users, who currently need to download the entire dump, and whose searches can take a long time. A dedicated API endpoint would be great; it could use the API maximum of 500 results per request, or whatever works. The search would ideally support regex via the insource:/<regex>/ syntax. -- GreenC (talk) 15:34, 11 November 2017 (UTC)
@GreenC: What specific use case do you have in mind for this? Ryan Kaldari (WMF) (talk) 00:21, 21 November 2017 (UTC)
@Ryan Kaldari (WMF): Use case: a dedicated API endpoint that generates a list of article titles where the article body (or optionally the title) contains the given search string. This is a very common task for bot operators. For example, a bot that fixes articles with a doubled 'the' ("the The Washington Post"), i.e. search on the regex /[Tt]he[ ][Tt]he/ and generate a list of article titles. This list is then fed to the bot or AWB to make the correction, so it knows which articles to target. -- GreenC (talk) 00:51, 21 November 2017 (UTC)
@GreenC: I'm not sure that having more than 10K results of 'the The' would be helpful. Wouldn't you want a manageable number of results, so the bot or person can go in and make corrections, and then run another query to get the next set of 10K issues? deb (talk) 19:56, 21 November 2017 (UTC)
@DTankersley (WMF): The numbers are managed by the API endpoint, which allows one to pull 500 results per request (defined in the API request). It doesn't work well (or at all) to build an application around an API that doesn't let one get the full search results, for a number of reasons. For example, depending on the complexity of the search and the number of API calls, the full list needs to be de-duplicated before the bot processes it, to avoid processing an article multiple times. It is also hard to build an application that must do a full run every 10k titles before starting over to get a new set of articles. And there are applications that only read from the database and never write to it, in which case the 10k limit is simply a barrier. I could probably think of other reasons, but these are all things I've experienced. -- GreenC (talk) 20:16, 21 November 2017 (UTC)
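The client side GreenC describes (500 results per request, then de-duplication) can be sketched in Python. The parameters `list=search`, `srsearch`, `srlimit` and `sroffset` are API:Search parameters, and the loop follows the usual MediaWiki continuation pattern of copying the returned `continue` values into the next request. The `fetch` callable is a hypothetical stand-in for an HTTP GET to the wiki's api.php that returns parsed JSON. Under the current cap, this loop cannot get past 10,000 titles, which is the limitation this proposal is about.

```python
from typing import Callable, Dict, List

Params = Dict[str, object]
Fetch = Callable[[Params], dict]  # hypothetical: sends params to api.php, returns parsed JSON


def all_search_titles(fetch: Fetch, query: str, per_request: int = 500) -> List[str]:
    """Collect every title for a list=search query, following API continuation
    and keeping only the first occurrence of each title."""
    params: Params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": per_request,
        "format": "json",
    }
    seen = set()
    titles: List[str] = []
    while True:
        data = fetch(params)
        for hit in data.get("query", {}).get("search", []):
            title = hit["title"]
            if title not in seen:  # the same title can show up on more than one page of results
                seen.add(title)
                titles.append(title)
        if "continue" not in data:
            return titles
        # Continuation: copy the returned values (e.g. sroffset) into the next request.
        params = {**params, **data["continue"]}
```

A query such as `insource:/[Tt]he[ ][Tt]he/` would be passed as `query`.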
  • Bundle it with any advanced user level (bot, admin, template editor, edit filter manager, maybe file mover and page mover) since these already require a level of trust and competence. The ability to do this would be especially useful for those doing big piles of maintenance work.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  08:17, 4 December 2017 (UTC)

Voting


Order search results by date of last edit or alphabetically

  • Problem: Search results in Wikipedia are always sorted by relevance. It would often be very helpful if results could be sorted by date of last edit or alphabetically by title.
  • Who would benefit: Everyone
  • Proposed solution:
  • More comments:

Discussion


Can you provide an example or two where your proposed criteria would provide better results? I am constantly amazed that the search feature has no options. Adding options (like the Ti: Au: etc. options for searching back issues of a journal) would add so much functionality for users. Downtowngal (talk) 23:26, 11 November 2017 (UTC)

  • Endorse this (original) proposal. It is really bad not to have any way to sort the list when searching through forum archives and scripts, for example. stjn[ru] 16:42, 12 November 2017 (UTC)
  • I too would like to better understand the use cases for alphabetical results. Would the results be an alphabetical list by page title? As for date-related search features, you can add "prefer-recent:" to your search queries to add more weight to recently edited pages (example) (documentation). It is not perfect. CKoerner (WMF) (talk) 15:49, 13 November 2017 (UTC)
There are so many obvious use cases that I really don’t know where to start... -Arch2all (talk) 17:42, 13 November 2017 (UTC)
Use cases are really important for determining how and / or what to build or adjust. For instance, if you search on 'butterfly', do you really need the results in alphabetical order? What use case would that solve? deb (talk) 01:41, 14 November 2017 (UTC)
"What to build or to adjust?": Making search results sortable by alphabet or last edit! I won’t post an example here for such a general task, because I don’t want to start a discussion about a specific use case (and possible workarounds) --Arch2all (talk) 12:06, 14 November 2017 (UTC)
No-one will build what you want if you can not or will not discuss the rationale. Please add your use case(s). --Izno (talk) 13:56, 14 November 2017 (UTC)
This page is called "Community Wishlist Survey". I just posted my wishes and I don’t want to discuss or justify them. --Arch2all (talk) 20:01, 14 November 2017 (UTC)
Then you probably won’t even make it out of the proposal phase. The main page basically says that you need to provide what we’re asking for. Even if it does make it out of proposals, no-one will vote for it if they don’t understand why it is needed. --Izno (talk) 20:58, 14 November 2017 (UTC)
And even if it’s voted on, and somehow makes it into the top 10-20, then it still won’t be implemented. Are you being stubborn Just Because? Do you actually want this feature or are you attempting to waste a lot of others’ time? --Izno (talk) 21:01, 14 November 2017 (UTC)
Another problem with the current relevance-based ranking: search results (from different dates) are difficult to compare, because relevance to a given search keyword is a complex, unstable thing. --Arch2all (talk) 17:51, 13 November 2017 (UTC)
  • That is partially already supported by mw:User:PerfektesChaos/js/resultListSort.
    In a nutshell:
    • A lot of special page results (actually 30 page types) may be ordered.
    • Search is one of them.
    • The search results which are sorted by best match when arriving may change order interactively (or even programmatically)
      • by page name (or title as significant)
      • by size
      • by last change
    • Listed sequence of the local page may be changed as often as wanted locally.
    • Naturally, this is limited to cases where all results of interest can be retrieved on a single page. If you want to walk through 20,000 pages in alphabetical order, as a category list permits, the current server still delivers them in best-match order.
Greetings --PerfektesChaos (talk) 18:16, 15 November 2017 (UTC)
"PerfektesChaos brings order": Thanks for this JavaScript tool. Indeed, it helps to sort the search results as needed, but as you already mentioned, it is limited to results with at most 500 entries. More results can't be retrieved on a single page. And it should be obvious that sorting results becomes really useful on large result sets. A server-side solution is still necessary. --Arch2all (talk) 18:06, 20 November 2017 (UTC)
I had the same idea: Community Wishlist Survey 2017/Search/Search by date. Livenws (talk) 01:10, 21 November 2017 (UTC)
I would like to give an example of why I would like results sorted by date. If I need to recategorize a group of images from one photographer, I cannot find them by title. But if they were batch-uploaded by a bot, I can find them together by the date of upload. Downtowngal (talk) 01:58, 2 December 2017 (UTC)
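The interactive re-sorting that resultListSort offers boils down to sorting the hits already loaded on the page by a chosen key (page name, size, or last change). A minimal sketch in Python, not the gadget's own JavaScript; the Result fields and SORT_KEYS names are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Dict, List


@dataclass
class Result:
    """Hypothetical shape of one search hit already shown on the page."""
    title: str
    size: int  # page size in bytes
    last_edit: datetime


SORT_KEYS: Dict[str, Callable[[Result], Any]] = {
    "title": lambda r: r.title.casefold(),  # case-insensitive alphabetical order
    "size": lambda r: r.size,
    "last_edit": lambda r: r.last_edit,
}


def sort_results(results: List[Result], by: str, descending: bool = False) -> List[Result]:
    """Reorder the hits already on the page; nothing beyond them is fetched."""
    return sorted(results, key=SORT_KEYS[by], reverse=descending)
```

As both comments note, this cannot reach hits beyond the loaded page, so sorting a large result set still needs support on the server.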

Voting


Search by date

  • Problem: The search function in a Wikimedia project does not allow filtering articles by creation date or publication date.
  • Who would benefit: Wikinews projects: news items are time-bound, unlike the content of other projects.
  • Proposed solution: Add a filter so users can select the date, month, or year they want to search.
  • More comments:
  • Phabricator tickets:
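Selecting a date, month or year reduces to a half-open start/end range, which is the kind of filter a search index can typically apply as a range query. A sketch in Python with hypothetical function names; which date is used (creation or publication) is left to the caller, since the proposal mentions both.

```python
from datetime import date, timedelta
from typing import Iterable, List, Optional, Tuple


def date_range(year: int, month: Optional[int] = None, day: Optional[int] = None) -> Tuple[date, date]:
    """Half-open [start, end) range covering a whole year, month, or single day."""
    if day is not None:
        if month is None:
            raise ValueError("a day filter also needs a month")
        start = date(year, month, day)
        return start, start + timedelta(days=1)
    if month is not None:
        start = date(year, month, 1)
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        return start, end
    return date(year, 1, 1), date(year + 1, 1, 1)


def filter_by_date(
    articles: Iterable[Tuple[str, date]],
    year: int,
    month: Optional[int] = None,
    day: Optional[int] = None,
) -> List[str]:
    """Titles whose (creation or publication) date falls inside the selected period."""
    start, end = date_range(year, month, day)
    return [title for title, when in articles if start <= when < end]
```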

Discussion


Voting

  • Problem: Currently, the "What links here" page shows a list of pages, where each item has buttons for "What links this page" and "Edit page". This list is ordered by page creation date.
I think that it should have more options to make it even more useful.
  • Who would benefit: Editors, and some experienced readers.
  • Proposed solution: For each page, the entry should show the first edit date, last edit date, page size, number of edits, and additional buttons for "View history" and "Page information".
Also, the list should be sortable by page name, page size, first edit date, last edit date, and number of edits.
  • More comments:
  • Phabricator tickets:

Discussion


I note that, without database schema changes, fetching the first edit date and number of revisions would be expensive operations (i.e. they'd be slow and use a lot of server resources). Sorting by pretty much anything besides what it's currently sorted by would also be expensive, and is unlikely to be helped by any reasonable schema change. Anomie (talk) 14:33, 21 November 2017 (UTC)

Sorting can be done client-side. The server only needs to send the data with any default sorting. --186.48.82.146 17:42, 21 November 2017 (UTC)
People would consider it a bug that when they hit "sort by size" it sorts only the 200 already-shown entries instead of bringing in the larger entries from later in the default ordering. Anomie (talk) 14:55, 22 November 2017 (UTC)
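The pitfall Anomie describes can be seen in a few lines of Python: sorting only the entries already shown gives a different answer from sorting the whole list. The (title, size) data and the page size of 2 are hypothetical stand-ins for a real 200-entry page.

```python
from typing import List, Tuple

Link = Tuple[str, int]  # (page title, page size in bytes)


def sort_shown_page(all_links: List[Link], page_size: int) -> List[Link]:
    """Client-side sort: only the entries already delivered can be reordered."""
    shown = all_links[:page_size]
    return sorted(shown, key=lambda link: link[1], reverse=True)


def sort_globally(all_links: List[Link], page_size: int) -> List[Link]:
    """What "sort by size" is expected to mean: the largest pages overall."""
    return sorted(all_links, key=lambda link: link[1], reverse=True)[:page_size]


# Default (server) order of the linking pages, before any sorting.
links = [("A", 10), ("B", 30), ("C", 20), ("D", 50), ("E", 40)]
print(sort_shown_page(links, 2))  # [('B', 30), ('A', 10)]
print(sort_globally(links, 2))    # [('D', 50), ('E', 40)]
```

Producing the second result requires the server to read and order every linking page, which is the expensive operation described above.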

Voting


Improvements in category pages

  • Problem: Currently, a category page includes a list of pages, with no extra information. The list is ordered by category tag.
Each subcategory has an arrow icon to show its own subcategories. However, the supracategories (parent categories) don't have that option.
Also, the pages of the subcategories are not directly accessible.
I think that category pages should have more options to make them even more useful.
  • Who would benefit: Editors and readers alike.
  • Proposed solution: The page list should have a "Show pages of subcategories" option.
The page list should be sortable by both category tag and page name.
The page list should have a "More information" option. When active, each page in the list also shows the first edit date, last edit date, page size, number of edits, and additional buttons for "Edit page", "View history" and "What links here". The list should be sortable by category tag, page name, page size, first edit date, last edit date, and number of edits.
The supracategories list should have the same format as the subcategories list, with an arrow icon to show its supracategories.
  • More comments:
  • Phabricator tickets:
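A "Show pages of subcategories" option means walking the category graph. A minimal breadth-first sketch in Python; the two dictionaries are hypothetical stand-ins for category data a real implementation would read from the database, and the depth limit is an assumption. Category graphs are not strict trees and can contain cycles, so the walk tracks the categories it has already visited.

```python
from collections import deque
from typing import Dict, List, Set


def pages_in_category_tree(
    category: str,
    subcats: Dict[str, List[str]],
    members: Dict[str, List[str]],
    max_depth: int = 3,
) -> Set[str]:
    """Collect pages in `category` and its subcategories, breadth-first,
    descending at most `max_depth` levels."""
    visited = {category}
    found: Set[str] = set()
    queue = deque([(category, 0)])
    while queue:
        cat, depth = queue.popleft()
        found.update(members.get(cat, []))
        if depth == max_depth:
            continue
        for sub in subcats.get(cat, []):
            if sub not in visited:  # skip categories already queued (handles cycles)
                visited.add(sub)
                queue.append((sub, depth + 1))
    return found


# Hypothetical category data, including a cycle back to the parent.
subcats = {
    "Basketball players": ["Centers", "Wheelchair basketball players"],
    "Centers": ["Basketball players"],
}
members = {
    "Basketball players": ["Player A"],
    "Centers": ["Player B"],
    "Wheelchair basketball players": ["Player C"],
}
print(sorted(pages_in_category_tree("Basketball players", subcats, members)))
# ['Player A', 'Player B', 'Player C']
```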

Discussion


Voting


Search page: integrated "incategory" functionality

  • Problem: The search by keywords has some hidden filters. An example is "incategory", which gives results for pages that are included in that category.
For example,
incategory:"Australian male film actors" incategory:"1983 births"
returns a list of Australian male film actors born in 1983.
I think that the search page should have this filter integrated into the interface, to make it easier to find and use. Also, the filter should have more options to make it more powerful.
  • Who would benefit: Editors and readers alike.
  • Proposed solution: The search page should have a section with options to add and remove categories to the filter.
Also, each category in the filter list should have an option "include subcategories".
For example,
Canadian men's basketball players (don't include subcategories)
1980s births (include subcategories)
would return a list of Canadian men's basketball players born in the 1980s (who are in the included subcategories of "1980s births"), but not wheelchair basketball players (who are in an excluded subcategory of "Canadian men's basketball players").
  • More comments:
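The filter list described above is an intersection: each entry contributes either the category's direct members or, with "include subcategories", the members of its whole subtree, and a page must appear in every entry's set (just as several incategory: terms must all match). A minimal sketch in Python with hypothetical category data modelled on the example; it is not how the search engine evaluates incategory:.

```python
from typing import Dict, List, Set, Tuple


def category_members(
    category: str,
    subcats: Dict[str, List[str]],
    members: Dict[str, List[str]],
    include_subcategories: bool,
) -> Set[str]:
    """Direct members of `category`, plus members of all its descendants if requested."""
    if not include_subcategories:
        return set(members.get(category, []))
    found: Set[str] = set()
    stack, visited = [category], {category}
    while stack:
        cat = stack.pop()
        found.update(members.get(cat, []))
        for sub in subcats.get(cat, []):
            if sub not in visited:  # guard against category cycles
                visited.add(sub)
                stack.append(sub)
    return found


def category_filter(
    filters: List[Tuple[str, bool]],
    subcats: Dict[str, List[str]],
    members: Dict[str, List[str]],
) -> Set[str]:
    """Pages that satisfy every (category, include_subcategories) filter."""
    if not filters:
        return set()
    sets = [category_members(cat, subcats, members, deep) for cat, deep in filters]
    return set.intersection(*sets)


# Hypothetical data modelled on the example above.
subcats = {
    "1980s births": ["1983 births", "1986 births"],
    "Canadian men's basketball players": ["Canadian wheelchair basketball players"],
}
members = {
    "Canadian men's basketball players": ["Player A", "Player B"],
    "Canadian wheelchair basketball players": ["Player W"],
    "1983 births": ["Player A", "Player W"],
    "1986 births": ["Someone else"],
}
filters = [("Canadian men's basketball players", False), ("1980s births", True)]
print(category_filter(filters, subcats, members))  # {'Player A'}
```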

Discussion


I think this would be helped by the work being done on Advanced Search. deb (talk) 21:29, 21 November 2017 (UTC)

Once upon a time I made a gadget for this - https://commons.wikimedia.org/wiki/MediaWiki:Gadget-advanced-search.js . Bawolff (talk) 23:22, 28 November 2017 (UTC)

Voting
