Property talk:P5279
positions where a word can be hyphenated
| Represents | syllabification (Q11994045) |
|---|---|
| Data type | String |
| Allowed values | `[^‧](.*[^‧])?` |
| Usage notes | use the symbol ‧ to indicate where the hyphen can be placed |
| Example | no label (L37298) → dåd‧en; no label (L42982) → spæd‧barn‧et |
| Proposal discussion | Proposal discussion |

List of violations of this constraint: Database reports/Constraint violations/P5279#Scope, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P5279#Format, SPARQL
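A minimal Python sketch of what the Allowed values pattern accepts, assuming the pattern has to match the whole string (hence `re.fullmatch`). In words: a value must be non-empty and must neither start nor end with ‧ (U+2027). The pattern does not require a ‧ to be present at all, so a one-syllable form such as "ask" passes.

```python
import re

# Allowed-values pattern copied from the property documentation.
# Assumption: the constraint is checked against the whole string.
ALLOWED = re.compile(r"[^‧](.*[^‧])?")


def is_allowed(value: str) -> bool:
    """True if the value is non-empty and does not start or end with U+2027."""
    return ALLOWED.fullmatch(value) is not None


for candidate in ["dåd‧en", "spæd‧barn‧et", "ask", "‧dåden", "dåden‧", ""]:
    print(f"{candidate!r}: {is_allowed(candidate)}")
```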
Hyphenation character
Did we ever get to a consensus regarding which character to use as an indicator for hyphenation before this property was created? — Finn Årup Nielsen (fnielsen) (talk) 16:38, 11 June 2018 (UTC)
If you want to change it later, Template:Autofix should work. --Jura 19:08, 12 June 2018 (UTC)
- Actually, probably not yet, as I don't think Lexemes are supported. --Jura 18:34, 13 June 2018 (UTC)
I have used "‧"; see Lexeme:L2955. It is a pain to type. The vertical bar "|" would be easier. That character is used, e.g., in Retskrivningsordbogen (Q3398246); see https://dsn.dk/?retskriv=penge. — Finn Årup Nielsen (fnielsen) (talk) 13:09, 14 June 2018 (UTC)
- I have used "•" (see Lexeme:L2977), but I agree the vertical bar would be easier. KaMan (talk) 11:40, 15 June 2018 (UTC)
- @Fnielsen, KaMan: IMHO, ‧ U+2027 HYPHENATION POINT is the best Unicode character to use. The problem of data entry could be solved with tooling, I think – perhaps a user script that replaces | with ‧ when typing values for hyphenation (P5279)? --Lucas Werkmeister (talk) 13:18, 19 June 2018 (UTC)
- perhaps a user script like this one? :) --Lucas Werkmeister (talk) 13:42, 19 June 2018 (UTC)
- @Lucas Werkmeister: generally this script works fine, but sometimes it fails to replace the last character in a sequence. A second try helps. KaMan (talk) 10:01, 21 June 2018 (UTC)
- @KaMan: hm, I’m not sure why that would happen to be honest (and it seems to be working for me)… are there any errors in the browser console? --Lucas Werkmeister (talk) 12:35, 21 June 2018 (UTC)
- @Lucas Werkmeister: No error in the console. You can look at my attempt at shoulder (L3541). Chrome 67.0.3396.87. KaMan (talk) 12:48, 21 June 2018 (UTC)
Some German (Q188) words do not use "‧" (hyphenation character) but instead use "·" (interpunct (Q1067693), as far as I can tell). I have taken the liberty of fixing that for a few forms; see, e.g., [1]. — Finn Årup Nielsen (fnielsen) (talk) 17:44, 4 October 2024 (UTC)
- Watching the Flying Dehyphenator at https://ordia.toolforge.org/flying-dehyphenator/, I see that there are actually quite a lot of languages that use various characters: the dash, the full stop and the interpunct. Shouldn't we converge on the hyphenation character? Finn Årup Nielsen (fnielsen) (talk) 21:36, 4 October 2024 (UTC)
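A minimal sketch of converging on the hyphenation point, assuming a hypothetical helper `normalize_hyphenation` that maps only the stand-in characters mentioned on this page ("|", "•" and "·") to U+2027. It is not the code of the user script or of the Flying Dehyphenator. The dash and the full stop are deliberately left alone, since they also occur as ordinary characters in forms such as "e-mail".

```python
# Hypothetical mapping of stand-in characters to U+2027 HYPHENATION POINT.
# Hyphen-minus and full stop are not mapped: they are real characters in some forms.
SUBSTITUTES = {
    "|": "\u2027",       # VERTICAL LINE, proposed as easy to type
    "\u2022": "\u2027",  # • BULLET
    "\u00b7": "\u2027",  # · MIDDLE DOT (interpunct)
}


def normalize_hyphenation(value: str) -> str:
    """Replace each known stand-in character with U+2027."""
    return value.translate(str.maketrans(SUBSTITUTES))


print(normalize_hyphenation("spæd|barn|et"))  # spæd‧barn‧et
```

A blind replacement of "·" would still be wrong for languages where the middle dot is part of the spelling, such as the Catalan "l·l" in col·lecció, so in practice the mapping would have to be restricted per language.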
if a word can't be hyphenated…
…because it has only one syllable, like ask (L53). Should we set hyphenation (P5279) to no value or to "ask" (no hyphenation marks)? --Shisma (talk) 16:39, 15 September 2018 (UTC)
- @Shisma: Good question. I am perhaps leaning towards the use of novalue. It might be easier to validate and query. For instance, if you want to know whether the form can be hyphenated, it is a question of whether the triple "?form wdt:P5279 ?hyphenation" is there or not. That seems easier than querying for the hyphenation literal and then examining whether it contains a "‧" character. In terms of validation with ShEx, it seems that we can do it with "a [ wdno:P5279 ] | ps:P5279 /.+‧.+/ ;" as I am currently doing at [2]. — Finn Årup Nielsen (fnielsen) (talk) 14:45, 7 June 2019 (UTC)
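A minimal sketch of what the ShEx rule above checks, assuming a form's P5279 state is modelled in Python as one of three things: `None` (no statement), a hypothetical `NOVALUE` sentinel standing for wdno:P5279, or the string value. It mirrors the intent of the shape for one form; it is not a ShEx validator.

```python
import re
from typing import Optional, Union


class NoValue:
    """Hypothetical stand-in for a 'no value' (wdno:P5279) statement."""


NOVALUE = NoValue()

# Literal branch of the ShEx shape. For single-line values it makes no
# difference whether the pattern is searched for or matched in full.
HAS_BREAK = re.compile(".+\u2027.+")


def satisfies_shape(statement: Optional[Union[str, NoValue]]) -> bool:
    """Roughly mirror `a [ wdno:P5279 ] | ps:P5279 /.+‧.+/` for one form."""
    if isinstance(statement, NoValue):
        return True  # explicit "no value": the form cannot be hyphenated
    if isinstance(statement, str):
        return HAS_BREAK.search(statement) is not None  # needs an inner ‧
    return False  # no P5279 statement at all: neither branch applies


print(satisfies_shape("ask"))  # False
```

Under this shape, a one-syllable form stored as the bare word "ask" fails validation; it passes only when recorded as no value.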
@Shisma, Fnielsen: if there is no P5279, it usually means no one has added it yet. We need to distinguish between that and a non-hyphenatable word. I am leaning towards adding the word as is. Also, I don't think there is much value in searching for non-hyphenatable words (if anything, people could search for words that have only one syllable). --Yurik (talk) 21:43, 11 August 2019 (UTC)
- @Shisma, Fnielsen, Yurik: does this mean that it is easier to search for words with only one syllable if we choose Yurik's suggestion (i.e. check the value for the hyphenation dot; if there is none, the word has one syllable), whereas if the value is missing we don't know? It is less typing work for us to enter no value, but from what I understand this cannot be done via QuickStatements (if it ever starts supporting lexemes, it would be nice to be able to add hyphenation through it as well). --So9q (talk) 22:47, 25 November 2019 (UTC)
- Any kind of "find me words with just a single syllable" (or any other specific count) query is always going to be a slow linear search, because you have to go through all the matching lexemes and do a regex match on each. If we want to optimize for that use case (which I doubt is very useful or common), we would store the number of syllables as a separate property/facet. As for typing: typing should not be the priority when it sacrifices clarity/usability of the data. QuickStatements should be fixed/improved, and I'm sure we can work on it soon enough to get all the needed lexeme support into it. Some basic lexeme support is rumored to have been added already; hopefully there will be more soon enough. --Yurik (talk) 23:10, 25 November 2019 (UTC)
- I like the idea of being able to easily list lexeme forms by number of syllables. I also doubt its usefulness, but hey, I can't think of all the wonderful ways of using our data, so let's go ahead and add a property for indicating it. Will you do it? We could surely create some kind of game to help people fill in this data. Kids love to clap words, so we could make a word-clapping game that lets you randomize the number of claps (maybe that is more fun/challenging to play than a game that does not know the number of claps needed for the form). We could even make a game that listens to the user's microphone for claps and stores the number as "number of syllables". --So9q (talk) 12:10, 26 November 2019 (UTC)
- We could also add a property for how long the word is. And how many vowels it has. And a flag for whether the word has more than one consonant in a row. But we shouldn't, unless there is a very direct and practical reason to add that data. Adding easily computable values to Wikidata turns it into a giant unmanageable pile of junk rather than a useful human-curated data source. Let's limit it to what is actually non-trivial data, and let code do the rest, but not store it together with something humans will be looking at. --Yurik (talk) 19:53, 11 December 2019 (UTC)
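To illustrate why a syllable count is easy to compute but only by a linear scan, here is a minimal sketch over hypothetical sample data (a dictionary mapping forms to P5279 values), assuming a non-hyphenatable form is stored as the bare word, as Yurik suggests. It counts hyphenation segments; as discussed under "hyphenation vs syllables", these do not always coincide with syllables.

```python
HYPHENATION_POINT = "\u2027"


def segment_count(hyphenation: str) -> int:
    """Number of pieces between hyphenation points, e.g. 'spæd‧barn‧et' -> 3."""
    return hyphenation.count(HYPHENATION_POINT) + 1


# Hypothetical sample data: form representation -> P5279 value.
forms = {
    "ask": "ask",
    "dåden": "dåd‧en",
    "spædbarnet": "spæd‧barn‧et",
}

# Linear scan: every value has to be inspected; nothing is indexed.
single_piece = [form for form, value in forms.items() if segment_count(value) == 1]
print(single_piece)  # ['ask']
```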
From SPARQL queries I see that I, working on Danish, was the only one using wdno:P5279 (no value), apart from the New Persian نان/נאן/нон/non (L226809) (and an English lexeme that I have changed). For other languages, a form without hyphenation is simply written with no centre dot; see also https://synia.toolforge.org/#hyphenation — Finn Årup Nielsen (fnielsen) (talk) 09:35, 2 August 2024 (UTC)
hyphenation vs syllables
My understanding is that most of the time hyphenation can happen at syllable boundaries, except that you can't hyphenate so that just a single character is left on one line (at least those are the Russian rules; might English be similar?). So should this property store just the breaking boundaries, or should it store syllables and assume the no-single-letter rule is applied by the data consumer?
P.S. what about words with dashes in them - should there be a separation symbol before and after the dash? --Yurik (talk) 20:56, 6 August 2019 (UTC)
- I suppose there might be different rules depending on the language. In Danish, there are even two main rules of hyphenation. Hyphenation in connection with a dash would always (I think) be after the dash in Danish. In principle you can have one letter on one line in Danish, e.g., "æ-ble" [3]. In Danish, the syllable boundary does not necessarily coincide with the hyphenation point (depending on what you call a syllable), e.g., both "æb-le" and "æ-ble" are ok [4]. — Finn Årup Nielsen (fnielsen) (talk) 20:14, 11 August 2019 (UTC)
- @Fnielsen: there might be different rules per language, but can we assume that no language needs to store both the hyphenation and the syllables as part of the word info? Or are they too different and both need to be stored? --Yurik (talk) 21:45, 11 August 2019 (UTC)
- Probably different. I suppose English has strange (to a Russian reader) rules of hyphenation, and they don't correspond to syllabification either. By the way, in Russian there can be several hyphenations of one word (ду-блет and дуб-лет), whereas the syllable division is more often unique. --Infovarius (talk) 09:39, 13 August 2019 (UTC)
- Another example (yes, about a single hanging letter): е·щё for syllables vs. ещё for hyphenation. So the question is still on the agenda: is this property for hyphenation or for syllabification? And how do we express the other? --Infovarius (talk) 10:23, 5 September 2024 (UTC)
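To make the difference concrete, here is a minimal sketch that derives break points from a syllabified form by dropping every boundary that would leave fewer than `min_chars` characters on either side. The function name, the `min_chars` parameter and the use of ‧ for syllable boundaries are assumptions for illustration; the default models only the Russian-style rule described above (Danish "æ-ble" corresponds to `min_chars=1`).

```python
HYPHENATION_POINT = "\u2027"


def breaks_from_syllables(syllabified: str, min_chars: int = 2) -> str:
    """Keep only syllable boundaries with at least `min_chars` characters
    (counted as code points) on each side of the break. Hypothetical helper."""
    syllables = syllabified.split(HYPHENATION_POINT)
    total = sum(len(s) for s in syllables)
    result = []
    position = 0  # characters emitted so far
    for i, syllable in enumerate(syllables):
        result.append(syllable)
        position += len(syllable)
        is_last = i == len(syllables) - 1
        if not is_last and position >= min_chars and total - position >= min_chars:
            result.append(HYPHENATION_POINT)
    return "".join(result)


print(breaks_from_syllables("е‧щё"))               # ещё
print(breaks_from_syllables("е‧щё", min_chars=1))  # е‧щё
```

A derivation like this can only remove syllable boundaries. It cannot add a break that is not among them, such as дуб-лет when the syllables are stored as ду‧блет, which is one argument for storing hyphenation separately where the two differ.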
Forms with dash
I wonder how Wikidata lexeme editors are handling the case where there is a dash in the form. Take the Danish lexeme e-mail (L235600). Here the form "e-mail" can be hyphenated at the dash. I have now indicated that with a hyphenation character "e-‧mail", but perhaps that should be done in another way? E.g., by just "e-mail"? I cannot recollect any Danish word where it cannot be hyphenated after a dash. — Finn Årup Nielsen (fnielsen) (talk) 17:41, 6 October 2024 (UTC)
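A minimal sketch of how a data consumer might render the stored break points, assuming the "e-‧mail" convention from the comment above and a hypothetical `line_breaks` helper. It appends a line-end hyphen only when the first part does not already end in one, which matches the Danish case described here; other languages may handle a break at an existing hyphen differently.

```python
HYPHENATION_POINT = "\u2027"


def line_breaks(value: str) -> list[tuple[str, str]]:
    """Return (end of first line, start of next line) for every break point.

    A hyphen is added at the end of the first line unless that text already
    ends in one, as when the point sits right after the dash in 'e-‧mail'."""
    pieces = value.split(HYPHENATION_POINT)
    options = []
    for i in range(1, len(pieces)):
        head = "".join(pieces[:i])
        tail = "".join(pieces[i:])
        if not head.endswith("-"):
            head += "-"
        options.append((head, tail))
    return options


print(line_breaks("e-‧mail"))       # [('e-', 'mail')]
print(line_breaks("spæd‧barn‧et"))  # [('spæd-', 'barnet'), ('spædbarn-', 'et')]
```

Under the alternative convention of storing plain "e-mail", the consumer would have to treat every existing hyphen as an implicit break point itself, which this sketch does not do.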