User:Zar2gar1
Howdy!
Oh, there are user templates
This user is a WikiHobbit.
This user is a mathematician.
Big list o' reorg ideas
Merger suggestions
Bureaucracy + Civil service (Society)
Trust (business) + Trust company + Corporate group + Holding company (Society)
Academic study of Western esotericism into Western esotericism (Society)
- Already discussed and seconded on the article talk-page
- The content can probably be pruned significantly before / after moving into the primary page
Classification + Classification (general theory) + Categorization (Science Basics)
- May be a tricky one, but all 3 have significant overlap, even if technically different
- Also see Classification of the sciences (Peirce) and the general Typology article (currently a disambig)
Primary alcohol into Alcohol (chemistry) (Chemistry)
- This is an easy one :-)
Primary carbon, Secondary carbon, Tertiary carbon, Quaternary carbon all into Carbon-carbon bond (Chemistry)
- Another easy one
Ice II + Ice III + etc. (Chemistry)
- Essentially all phases of ice except maybe Ice Ih
- Two possible approaches:
- Squeeze them into the current Phases section on the Ice article
- Split out the Phases section into its own article, then consolidate there
Aqueduct (water supply) + Navigable aqueduct + Aqueduct (bridge) + Canal (Tech)
- This will be a challenging one that takes some thinking through
- Some of these articles should definitely remain separate, but there's a lot of overlap across them
Assisted-opening knife into Switchblade (Tech)
- Technically slightly different, but the same idea, with a lot of redundant content
Manchester Baby and Manchester Mark 1 into Manchester computers (Tech)
- A borderline case, so ask for consensus on the article's talk page first
- Definitely a lot of redundancy though, especially in the Background sections
MEMS + Micromachinery + Nanoelectromechanical systems (Tech)
- Even if keeping the micro- and nano-scale articles separate, there's a lot of redundancy
- Also check links on the MEMS article for others that could potentially be absorbed
Solid geometry into Three-dimensional space (Math)
Symmetry + Symmetry (geometry) (Math)
- At first glance, it makes sense that they're separate
- A close reading shows the Symmetry article is still overwhelmingly mathematical though
Engineering optimization into Design optimization (Applied science)
Engineering studies into Science and technology studies (Applied science)
Solar fuel into Power-to-X (Tech)
Engineering research into Applied science#Applied research (Applied science)
New article ideas (or expansions of existing articles)
Benediction sign (Religion)
- Noticed no direct explanation of benedictio latina & benedictio graeca on English wiki
- Especially relevant to art history
- Can migrate the article over from French Wikipedia: fr:Signe de bénédiction
- Will need to disambiguate with:
- Hand of benediction (medical condition with related etymology)
- Benediction (a full ritual, not just the gesture)
- Related concepts include Mudra and (parts of) Priestly Blessing
- Include intro sections and out-link to main articles?
Event notification (Tech)
- Currently just a redirect to a (pretty generic) subsection on Event (computing)
- Noticed while working on Reactor pattern that there's no detailed discussion of the mechanism
- Many specific instances have their own articles:
- Select (Unix), kqueue, epoll, IOCP
- libevent is also related, though it's more of an abstraction layer
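The mechanism such an article would need to explain is readiness notification. A program registers file descriptors with the kernel, blocks until some of them are ready, and then dispatches to handlers. As a small illustration that is not tied to any of the articles above, Python's standard `selectors` module puts several of these backends behind one interface. `DefaultSelector` picks the most efficient one available on the platform (kqueue, epoll, poll, or select). IOCP is completion-based rather than readiness-based, so this module doesn't cover it. The function name `demo_event_loop` exists only for this sketch.

```python
import selectors
import socket


def demo_event_loop() -> list[bytes]:
    """Register one socket for read-readiness, trigger it, and collect the data."""
    sel = selectors.DefaultSelector()  # best available backend on this platform
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    sel.register(reader, selectors.EVENT_READ, data="reader")
    received = []
    try:
        writer.sendall(b"hello")  # makes `reader` readable
        # select() blocks until a registered object is ready or the timeout expires
        for key, events in sel.select(timeout=1.0):
            if events & selectors.EVENT_READ:
                received.append(key.fileobj.recv(1024))  # dispatch: handle the ready socket
    finally:
        sel.unregister(reader)
        sel.close()
        reader.close()
        writer.close()
    return received


if __name__ == "__main__":
    print(demo_event_loop())
```

The register, wait, and dispatch sequence is the same one the Reactor pattern generalizes. The articles on select, epoll, and kqueue each describe one of these waiting primitives.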
Improve separation of related articles
Gossypium vs. Cotton (product)
- Already split, but could improve cross-links and hatnotes
- Also check for redundancy between articles
Sugar vs. Sugarcane vs. Sugar (chemistry) vs. chemical types like Sucrose vs. product classes like Brown sugar
- Another product vs. source one, but this one gets messy really quick
- Current suggested course of action
- Add Sugarcane to hatnote on main article and disambiguation page
- Move Sugar (chemistry) from current redirect (Carbohydrate) to its own page
- Migrate very specific details from Carbohydrate to Sugar (chemistry)
- Migrate chemical details from main article to Sugar (chemistry)
- Consolidate / re-orient specific types like Sucrose to Sugar (chemistry)
- Consolidate / re-orient specific product classes towards the main article
- Consolidate cultivation parts of production from main article onto specific source crops
Palaquium gutta vs. Gutta-percha
- One more product vs. source one
- Improve hatnotes and cross-links, then consolidate redundancies
- Possibly add disambiguation page?
Western esotericism vs. Exoteric
- Already discussed this some on talk for Western esotericism
- Essentially, eso- and exo-teric have two historically related but distinct contexts:
- In the loose sense, esoteric doctrines vs. more mainstream ideas
- More technical in philosophical scholarship, when a philosopher's works are believed to be written for select students vs. a general audience
- Consensus seems to be for the following course of action:
- Rename Exoteric to Esoteric and exoteric
- Move content on the scholarly context from Western esotericism to the new page
Surface vs. Surface (mathematics) vs. Surface (topology) et al.
- Need to discuss and get consensus; no clear course of action yet, but consider the following:
- Migrate out details from Surface (mathematics) to more specific articles, such as:
- Algebraic surface, Coordinate surfaces, and Solid geometry
- This has already been done to an extent for Surface (topology)
- Migrate out generalities from Surface (mathematics) to the main page
- Re-evaluate Surface (mathematics) page
- Make a redirect to the main article section if minor enough at that point
- Re-evaluate specific articles for further consolidation with each other
- E.g. Coordinate surfaces vs. Solid geometry
Simple template & module ideas
Here are a few ideas I've had that I may get around to someday, unless someone else beats me to the punch:
Improve VA link template
Template:VA link is used a lot on the VA discussion pages, and people seem pretty fond of putting it in section headings. However, this results in unstable section anchors. How about...
Update the underlying module at Module:Vital article to accept a dummy control flag in the VA link function, but still default to false
Make the dummy flag functional to inject a plaintext marker ("VA §") instead of the VA bullseye icon
Create a second VA link template, with safesubst, to invoke the module with the plaintext option
- This should minimize any disruption to using the current template while the new one gains traction
Create a custom user.js widget to replace the plaintext marker with the icon in browser
- Have it filter on namespace too, especially on the off chance of collisions in articles
Update the template & module docs to indicate usage
Report the new template on the VA talk page and update VA instructions to indicate usage
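The following rough Python sketch pins down the intended logic of the plan above. The real module is Lua and the widget would be JavaScript. Every name here is hypothetical: `va_link`, `swap_markers`, `ICON_PLACEHOLDER`, and the namespace allow-list. The flag defaults to false, so existing template calls render exactly as before.

```python
# Hypothetical names throughout; the real module is Lua and the widget would be user.js.
PLAINTEXT_MARKER = "VA §"
ICON_PLACEHOLDER = "<VA icon>"  # stands in for the icon markup the module emits today
ALLOWED_NAMESPACES = frozenset({"Wikipedia", "Wikipedia talk"})


def va_link(title: str, plaintext: bool = False) -> str:
    """Template/module side: the new flag defaults to False, so existing calls are unchanged."""
    prefix = PLAINTEXT_MARKER if plaintext else ICON_PLACEHOLDER
    return f"{prefix} [[{title}]]"


def swap_markers(rendered: str, namespace: str) -> str:
    """Widget side: restore the icon in the browser, but only in allowed namespaces."""
    if namespace not in ALLOWED_NAMESPACES:
        return rendered  # e.g. articles, where "VA §" could legitimately appear
    return rendered.replace(PLAINTEXT_MARKER, ICON_PLACEHOLDER)
```

With safesubst, only the plaintext form would end up saved in the heading wikitext. The icon would exist only in the reader's browser.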
Systemize industrial infoboxes
We have infoboxes for products, companies, and even industrial processes.
However, there doesn't seem to be a clean schema connecting them together, and there actually aren't more general infoboxes for industries and technologies (Template:Infobox technology is actually just a redirect to the industrial process infobox)...
Create a general industry infobox
Create a general technology infobox
Refactor the existing infoboxes a bit
Seed 10 articles with the industry infobox
Seed 10 articles with the technology infobox
Update 10 articles each with the other refactored infoboxes
Preliminary research: VA code and data
The VA project is really starting to pick up at level 5, which operates at a whole different scale. Cewbot already does a lot, but I'm interested in trying something new and maybe taking up a bit of the load:
Preliminary research and planning
- Want to try doing an initial version in Lua even though it doesn't have a bot framework
- Shouldn't be too bad, though, if I keep the logic clean and the MediaWiki API calls simple
- Can always fall back onto Python / PyWikibot if necessary
Check I won't be stepping on Cewbot's toes
- Spoke with Kanashimi who said a 2nd bot would be good
- Cewbot's code is available if I decide to fall back onto JS and reuse it
Settle on vitality metrics & figure out sources
- Quarry is good for basic queries & testing
- However, the DB replicas impose a lot of limits
- Particularly with regard to views & indices (and therefore potential joins)
- By creating a user DB on ToolsDB, one can have much fuller control
- Many items will need to be pulled from content though
- Probably via the Wikimedia API
Vitality metrics
After playing with Quarry some, I've determined I'll probably need to create a user DB on ToolsDB. However, the table-based metrics should still be easier to gather than the MediaWiki API ones to start:
Task #1: Compile DB vitality metrics
Get an account set up on ToolsDB
Get enwiki_p as a user clone
Configure all tables, views, & queries
Collate the following result set for VA articles only:
Metric | Frame | Expected dynamics | Breakout? | Other comments | Implementation status |
---|---|---|---|---|---|
Creation date | Historical | Stable | | See Lindy effect | |
Last revision date | Current | Unstable | | Primarily to filter out stale articles | |
Edit density | Moving average (MA) | Cyclical and fluid | 3, 12, & 36 month MAs | | |
Languages | Current | Sticky | | | |
Interwikis | Current | Sticky | | | |
Wikilinks | Current | Sticky | In-, out-, total, and ratio | Article namespace only | |
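This sketch assumes "edit density" means edits per calendar month, bucketed from revision timestamps. Under that assumption, the 3-, 12-, and 36-month moving averages from the table could be computed as below. The function name is an assumption of this sketch, and so is returning `None` when the history is shorter than the window.

```python
from __future__ import annotations

from collections.abc import Sequence


def trailing_moving_averages(
    monthly_edits: Sequence[int], windows: tuple[int, ...] = (3, 12, 36)
) -> dict[int, float | None]:
    """Mean edits per month over the most recent `w` months, for each window `w`."""
    averages: dict[int, float | None] = {}
    for w in windows:
        if len(monthly_edits) < w:
            averages[w] = None  # not enough history yet (e.g. a young article)
        else:
            averages[w] = sum(monthly_edits[-w:]) / w
    return averages
```

Comparing the short window against the long one is one way to separate the cyclical component from the long-run level.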
Task #2: Create Mark I model
It may not be pretty, but I'll probably just download the results and load them into a spreadsheet to start.
Then I'll try building up a few models. The key points to keep in mind:
- Try each factor twice: once raw and once log-transformed (it may follow a power law)
- Set the objective to the VA level, viewed as a log of each list's size ratio relative to VA5 (VA5 is log_10(1) = 0, VA4 is log_10(5), VA3 is log_10(50) ...)
- Don't forget to randomly assign VA datapoints to training & validation sets
- Get effect size estimates too (use ANOVA if the lin-reg solver doesn't return them)
Thoroughly discuss results and share with WP:VA
After discussion and comments, save model as 1st baseline
Task #3: Generate Mark I recommendation
Implement model in code (using my VA bot?)
Gather metrics for all articles
Generate & publish list of likely vital articles
Task #4: Integrate pageview data
Page views are often cited (along with interwikis) in proposals, so it will be really interesting to see how strongly they correlate:
Gather page-view data for VA articles only
- Use the Wikimedia Analytics API
Retrain and re-validate model; discuss results
Take baseline as model Mark II; generate & publish new recommendations
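For the pageview step, the Wikimedia REST API has a per-article pageviews endpoint. It takes the project, access method, agent type, article, granularity, and a start/end date. The sketch below assumes monthly totals and the `user` agent type, which excludes traffic identified as spiders or automated. The helper names are hypothetical. Wikimedia asks API clients to send a descriptive User-Agent.

```python
import json
import urllib.parse
import urllib.request

PAGEVIEWS_API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title: str, start: str, end: str, granularity: str = "monthly") -> str:
    """start/end are YYYYMMDD strings; spaces become underscores and the title is URL-encoded."""
    article = urllib.parse.quote(title.replace(" ", "_"), safe="")
    return f"{PAGEVIEWS_API}/en.wikipedia/all-access/user/{article}/{granularity}/{start}/{end}"


def total_views(payload: dict) -> int:
    """Sum the `views` field over the `items` list of a pageviews response."""
    return sum(item["views"] for item in payload.get("items", []))


def fetch_total_views(title: str, start: str, end: str, user_agent: str) -> int:
    """Network call; `user_agent` should identify the tool and a contact address."""
    request = urllib.request.Request(
        pageviews_url(title, start, end), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return total_views(json.load(response))
```

Slashes in titles are percent-encoded (`safe=""`), so a title like "AC/DC" stays a single path segment.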
Task #5: Integrate page data from XTools
Gather other metrics from pages or XTools (will likely require a bot):
Metric | Frame | Expected dynamics | Breakout? | Other comments | Implementation status |
---|---|---|---|---|---|
WikiProject priorities | Current | Stable | Tally by rank | | |
Prose size | Current | Sticky | | May be symmetric, follow a normal distribution? | |
Assessment | Current | Stable | | Be careful, could be particularly circular | |
Watcher count | Current | Sticky | | Redacted < 30, adjust down | |
Retrain and re-validate model; discuss results
Take baseline as model Mark III; generate & publish new recommendations
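Two of these metrics need a little preprocessing before they can go into the model. This sketch uses hypothetical helper names. It tallies WikiProject priorities by rank, using the common Top/High/Mid/Low scale and ignoring other values. It also handles watcher counts, which are hidden for pages under the 30-watcher threshold. Representing a hidden count as `None` and substituting a placeholder of 15 is an assumption, and only one possible reading of "adjust down".

```python
from collections import Counter

PRIORITY_RANKS = ("Top", "High", "Mid", "Low")
WATCHER_THRESHOLD = 30  # counts below this are hidden


def tally_priorities(priorities) -> dict:
    """Count a page's WikiProject priority ratings by rank, ignoring NA/Unknown/etc."""
    counts = Counter(p for p in priorities if p in PRIORITY_RANKS)
    return {rank: counts.get(rank, 0) for rank in PRIORITY_RANKS}


def watcher_estimate(reported, placeholder: int = 15) -> int:
    """Use the reported count if visible; otherwise a hypothetical value under the threshold."""
    if not 0 <= placeholder < WATCHER_THRESHOLD:
        raise ValueError("placeholder must fall below the redaction threshold")
    return placeholder if reported is None else reported
```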
Task #6: Integrate page data from Wikimedia REST API
Gather other metrics from the REST API and scanning content (will definitely require a bot):
Metric | Frame | Expected dynamics | Breakout? | Other comments | Implementation status |
---|---|---|---|---|---|
Citation density | Current | Stable | | Seems promising, but details need some thought | |
Infobox presence | Current | Stable | Tally several with cap? | | |
Media file density | Current | Stable | By file type? | | |
Retrain and re-validate model; discuss results
Take baseline as model Mark IV; generate & publish new recommendations
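Scanning wikitext for these metrics could start with a few regular expressions. This rough sketch rests on several assumptions:

- Citations are counted as opening `<ref>` tags.
- Infoboxes are counted as `{{Infobox ...` transclusions, capped at a hypothetical 3 per the "tally several with cap?" idea.
- Media files are counted as `[[File:...]]` or `[[Image:...]]` links, broken out by extension.
- Densities are per 1,000 characters of wikitext, which is only a placeholder normalization.

Files placed through `<gallery>` tags or infobox parameters are missed.

```python
import re
from collections import Counter

REF_RE = re.compile(r"<ref[\s>/]", re.IGNORECASE)  # opening <ref> tags, named or not
INFOBOX_RE = re.compile(r"\{\{\s*infobox", re.IGNORECASE)
FILE_RE = re.compile(r"\[\[\s*(?:File|Image)\s*:[^|\]]*\.(\w+)", re.IGNORECASE)


def content_metrics(wikitext: str, infobox_cap: int = 3) -> dict:
    """Rough citation, infobox, and media counts from raw wikitext."""
    kchars = max(len(wikitext), 1) / 1000
    citations = len(REF_RE.findall(wikitext))
    file_types = Counter(ext.lower() for ext in FILE_RE.findall(wikitext))
    return {
        "citations": citations,
        "citations_per_kchar": citations / kchars,
        "infoboxes": min(len(INFOBOX_RE.findall(wikitext)), infobox_cap),
        "files_per_kchar": sum(file_types.values()) / kchars,
        "files_by_type": dict(file_types),
    }
```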
Task #7: Automate recommendation sets
Should actually be pretty straightforward, especially if the model is already coded.
Task #8: Collate historical list size data
This was a request on the VA talk pages and may be more insightful for the Lv 4 and 5 subpages. This should probably get its own bot too. It's obviously a pretty heavy lift, so it won't be implemented anytime soon.
Grab more recent counts from edit summaries
- Probably the simplest strategy going back as far as Cewbot documents the section count
- Obviously, won't be 100% accurate for all times (e.g. if Cewbot was down for maintenance)
Export data dump somewhere
- This may make more sense as a table or page under WP:VA
- The data should mostly (barring corrections) be append-only
Include moving-average calculations in data dump
(Wishlist) Data-mine actual page-versions prior to Cewbot
- This could get tedious so probably won't implement anytime soon
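Here is a sketch of the edit-summary approach that keeps the data append-only. Cewbot's actual summary format isn't reproduced here, so `COUNT_RE` is a hypothetical pattern that would need adapting. Timestamps are assumed to be ISO 8601 strings in the MediaWiki API's `YYYY-MM-DDTHH:MM:SSZ` form, which sort correctly as plain strings.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical pattern: assumes the summary states the count as e.g. "1,234 articles".
COUNT_RE = re.compile(r"(\d[\d,]*)\s+articles")


@dataclass(frozen=True)
class Snapshot:
    timestamp: str  # ISO 8601, e.g. "2024-01-01T00:00:00Z"
    count: int


def parse_count(summary: str) -> int | None:
    match = COUNT_RE.search(summary)
    return int(match.group(1).replace(",", "")) if match else None


def append_snapshots(existing: list[Snapshot], revisions) -> list[Snapshot]:
    """Append-only merge of (timestamp, summary) pairs newer than the last stored row."""
    last = existing[-1].timestamp if existing else ""
    added = []
    for timestamp, summary in sorted(revisions):
        count = parse_count(summary)
        # Summaries without a count (or bot downtime) simply leave a gap.
        if timestamp > last and count is not None:
            added.append(Snapshot(timestamp, count))
    return existing + added
```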
VA bot plans
While it will probably intertwine with my work on the vitality estimator, I'd also like to whip up a more vanilla bot to further automate things at the VA lists.
To start, I think I'm just going to consume the JSON files gathered by Cewbot at Wikipedia:Vital articles/data. Eventually though, I'd like to help Kanashimi out some, and maybe my bot can handle some overlapping functions with Cewbot as a fallback. It could just audit by default, then actively edit only after it notices Cewbot has gone MIA for a few days.
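The audit-by-default fallback comes down to a simple check on Cewbot's most recent edit. The MediaWiki API can return that edit through `list=usercontribs`. In this sketch the 3-day grace period is a hypothetical stand-in for "a few days", and the function names are illustrative.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Newest Cewbot contribution (usercontribs lists newest first by default).
LAST_EDIT_QUERY = (
    "https://en.wikipedia.org/w/api.php?action=query&list=usercontribs"
    "&ucuser=Cewbot&uclimit=1&ucprop=timestamp&format=json"
)
GRACE = timedelta(days=3)  # hypothetical reading of "a few days"


def parse_mw_timestamp(ts: str) -> datetime:
    """MediaWiki API timestamps look like 2024-01-09T12:00:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def last_edit_from_response(payload: dict) -> datetime:
    """Pull the timestamp out of the usercontribs JSON response."""
    return parse_mw_timestamp(payload["query"]["usercontribs"][0]["timestamp"])


def choose_mode(last_cewbot_edit: datetime, now: datetime | None = None, grace: timedelta = GRACE) -> str:
    """Audit only, unless Cewbot has been silent for longer than the grace period."""
    now = now or datetime.now(timezone.utc)
    return "edit" if now - last_cewbot_edit > grace else "audit"
```

A real check would probably look only at Cewbot's edits to the VA pages, since the bot could be active elsewhere while its VA task is stalled.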
Task #1: Create skeletal bot
Start proposal process for new bot
Create a bot account
Create skeletal bot (in Lua for kicks?) to perform actions
- Can always fall back to Python if it's too much work
Perform some allowed test runs on sandbox to ensure I can read & edit
Task #2: Automate updates to VA5 table
Create a new quota subpage (as a single source of truth)
Add wikitable formatting to the bot (if needed)
Write up actual collating logic and test in sandbox
Quick improvement pass on wikitable layout
- For example, supercategories should be genuine roll-up lines, not detachable (e.g. when sorting)
Start running on VA5 page
Update VA5 instructions to note table is automated
Roll out to VA4 page too
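Here is a minimal sketch of the collating step. It assumes quotas and current counts both arrive as nested dicts, mapping supercategory to subcategory to number. That structure is hypothetical, since the quota subpage format is still to be decided. Each supercategory gets a header-style roll-up row computed from its subcategories. The table is left non-sortable so those rows can't be separated from their children.

```python
def render_quota_table(quotas: dict, counts: dict) -> str:
    """Build wikitable markup with a roll-up row above each supercategory's subcategories."""
    lines = ['{| class="wikitable"', "! Category !! Articles !! Quota !! Difference"]
    for supercat, subs in quotas.items():
        sub_counts = counts.get(supercat, {})
        total_count = sum(sub_counts.get(sub, 0) for sub in subs)
        total_quota = sum(subs.values())
        lines.append("|-")
        # Roll-up line, rendered as header cells so it stands out visually
        lines.append(f"! {supercat} !! {total_count} !! {total_quota} !! {total_count - total_quota:+d}")
        for sub, quota in subs.items():
            n = sub_counts.get(sub, 0)
            lines.append("|-")
            lines.append(f"| {sub} || {n} || {quota} || {n - quota:+d}")
    lines.append("|}")
    return "\n".join(lines)
```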
Task #3: Audit and sub-in for all counters
Gather list of all counters in VA project
Implement counting logic
Provide audit report (see database reports like Cewbot)
Check with Kanashimi and allow editing for miscounts older than 72 hrs
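The 72-hour rule could be enforced by remembering when each mismatch was first seen. In this sketch the names are hypothetical, and the caller persists `first_seen` between runs.

```python
from datetime import datetime, timedelta

GRACE = timedelta(hours=72)


def audit(reported: dict, actual: dict, first_seen: dict, now: datetime):
    """Return (report rows, counters safe to fix); `first_seen` is updated in place."""
    rows, fixable = [], []
    for counter, real in actual.items():
        shown = reported.get(counter)
        if shown == real:
            first_seen.pop(counter, None)  # resolved (or never wrong): forget it
            continue
        since = first_seen.setdefault(counter, now)
        rows.append((counter, shown, real, since))
        if now - since > GRACE:
            fixable.append(counter)
    return rows, fixable
```

The rows feed the audit report in every run, while only counters in `fixable` would ever be edited.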
Task #4: Add supplemental list quality checks
Flag duplicates within a single level
- Cewbot already does this too
Detect category crossovers between levels
- For example, if Petroleum is in Chemistry at one level and Tech at another
Auto-resolve redirects
- Cewbot may already do this
Flag other non-article types (lists, disambig, etc.)
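Both list checks reduce to grouping the parsed entries. This sketch assumes each entry has been parsed into a `(level, category, title)` tuple. That representation is hypothetical, and the JSON data pages may be structured differently.

```python
from collections import Counter, defaultdict


def find_duplicates(entries):
    """(level, title) pairs where a title is listed more than once within one level."""
    counts = Counter((level, title) for level, _category, title in entries)
    return sorted(key for key, n in counts.items() if n > 1)


def find_crossovers(entries):
    """Titles filed under more than one category across the lists."""
    categories = defaultdict(set)
    for _level, category, title in entries:
        categories[title].add(category)
    return {title: sorted(cats) for title, cats in categories.items() if len(cats) > 1}
```

Redirect resolution could batch titles through the MediaWiki API's `action=query` with `redirects=1`. The response lists each redirect's source and target.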