Works and Editions

Title page from the 1911 edition of Treasure Island

Just as I’ve been getting back up to speed with Wikisource following an internet-less Christmas, Wikisource has begun to be integrated with Wikidata.  At the moment, this just means interwiki links and new Wikidata pages but, naturally, some small problems have already occurred.  The most significant I’ve seen is the question of how to handle the mainspace.

Wikisource is the third of the Wikimedia sorority to be supported by Wikidata, after big sister Wikipedia and little, adopted sister Wikivoyage (or possibly fourth if we count the seemingly partial support of Commons).  Wikisource is different from these projects because, while the others will usually just have one page for each item, Wikisource can host multiple editions of the same work, each requiring a separate but linked data item.  (Then there is the subject of subpages but we’ll leave that for now.)

The books task force on Wikidata had already come up with a system to implement this: two separate classes of item, a “work” item to cover the text in general and one or more “edition” items to cover individual instances of that text.  A “work” item will usually correspond with the article on Wikipedia (if one exists), listing general metadata common to all instances of the text, such as the title and the author.  An “edition” item lists specific metadata that are not shared between all instances of the text, such as the publisher, the date and place of publication, the illustrator, the translator and the editor.

This is best illustrated with Treasure Island as English Wikisource has two distinct and sourced versions of that text.  (See fig. 1.)

The page “Treasure Island” is a versions page (one of three types of disambiguation page on English Wikisource).  Attached to this are two texts: “Treasure Island (1883)”, the first book publication of the story, published by Cassell & Company (note that it was originally serialised in a magazine, which we do not have yet); and “Treasure Island (1911)”, an American edition published in 1911 by Charles Scribner’s Sons (Wikisource does not strictly require notability but, if it did, this edition would be notable for its N. C. Wyeth illustrations).

The disambiguation page has the associated data item Q185118, which is also the item used by Wikipedia and Commons.  The 1883 edition has the data item Q14944007 and the 1911 edition has the data item Q14944010; both link to the first item with the “edition of” property.
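To make the two-class model concrete, here is a minimal Python sketch of the three items above. The QIDs and publication details come from this post; the statement keys are plain-English placeholders rather than real Wikidata property IDs, and the labels are simplified. It shows how an edition can inherit work-level metadata through its “edition of” link while storing only its own specifics.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Item:
    """A simplified stand-in for a Wikidata item (not the real data model)."""
    qid: str
    label: str
    kind: str                         # "work" or "edition"
    edition_of: Optional[str] = None  # QID of the parent "work" item, if any
    statements: Dict[str, str] = field(default_factory=dict)


# Placeholder statement keys; real items use property IDs instead.
ITEMS: Dict[str, Item] = {
    "Q185118": Item("Q185118", "Treasure Island", "work",
                    statements={"title": "Treasure Island",
                                "author": "Robert Louis Stevenson"}),
    "Q14944007": Item("Q14944007", "Treasure Island (1883)", "edition",
                      edition_of="Q185118",
                      statements={"publisher": "Cassell & Company",
                                  "date": "1883"}),
    "Q14944010": Item("Q14944010", "Treasure Island (1911)", "edition",
                      edition_of="Q185118",
                      statements={"publisher": "Charles Scribner's Sons",
                                  "date": "1911",
                                  "illustrator": "N. C. Wyeth"}),
}


def editions_of(work_qid: str, items: Dict[str, Item]) -> List[str]:
    """Return the QIDs of all edition items that point at the given work."""
    return sorted(i.qid for i in items.values() if i.edition_of == work_qid)


def full_metadata(qid: str, items: Dict[str, Item]) -> Dict[str, str]:
    """Work-level statements, extended (or overridden) by the item's own."""
    item = items[qid]
    merged: Dict[str, str] = {}
    if item.edition_of is not None:
        merged.update(items[item.edition_of].statements)
    merged.update(item.statements)
    return merged


print(editions_of("Q185118", ITEMS))
print(full_metadata("Q14944010", ITEMS))
```

Keeping the shared data on the work item means each edition only records what differs, which is the point of splitting the classes.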

Diagram showing two versions of Treasure Island as children of the disambiguation page

Fig 1: Treasure Island on English Wikisource.

However, other Wikisources only have one translation each of Treasure Island.  If these each have their own “edition” data item, containing its unique metadata, then the interwiki function breaks down.

If the interwiki links are kept at the edition level, then few if any interwiki links will exist between works on Wikisource.  There might be a dozen different editions of Treasure Island in as many languages but, as each is different with different metadata, they will each have separate data items.
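The breakdown follows from how sitelinks work: pages become interwiki partners only when they are attached to the same item, and an item holds at most one page per site. The Python sketch below derives interwiki links from sitelinks and compares edition-level attachment with the work-level attachment of fig. 2. It is hypothetical: the French and German page titles and their QIDs are placeholders, not real pages or items.

```python
from typing import Dict, Tuple

# item QID -> {site ID: page title}; the inner dict allows one page per site.
Sitelinks = Dict[str, Dict[str, str]]


def interwikis(sitelinks: Sitelinks) -> Dict[Tuple[str, str], Dict[str, str]]:
    """For every (site, page), list the pages on other sites sharing its item."""
    result: Dict[Tuple[str, str], Dict[str, str]] = {}
    for links in sitelinks.values():
        for site, title in links.items():
            result[(site, title)] = {s: t for s, t in links.items() if s != site}
    return result


# Every edition on its own item: no page has a partner to link to.
edition_level: Sitelinks = {
    "Q14944007": {"enwikisource": "Treasure Island (1883)"},
    "Q14944010": {"enwikisource": "Treasure Island (1911)"},
    "Q-FR-EDITION": {"frwikisource": "French translation (placeholder)"},
    "Q-DE-EDITION": {"dewikisource": "German translation (placeholder)"},
}

# Fig. 2: a versions page on every Wikisource, all attached to the work item.
work_level: Sitelinks = {
    "Q185118": {
        "enwikisource": "Treasure Island",
        "frwikisource": "French versions page (placeholder)",
        "dewikisource": "German versions page (placeholder)",
    },
}

links_per_page = [len(v) for v in interwikis(edition_level).values()]
print(links_per_page)
print(interwikis(work_level)[("enwikisource", "Treasure Island")])
```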

Ideally, from a database point of view, each Wikisource would also separate the “work” from the “edition(s)”.  English Wikisource already does so in this case because there is a disambiguation page at the “work” level.  Implementing this on a large scale, however, would require a disambiguation page for every work on every Wikisource, even though most would contain only a single link to a text (the “edition”); see fig. 2 for an example.  This would work from a computing point of view, but it is unlikely to be popular or intuitive for humans.

Diagram of ideal situation, with interwiki linking via disambiguation pages

Fig 2: Wikilinking between disambiguation pages.

Practically, the solution is to mix the classes, as shown in fig. 3.  In this case, English Wikisource will (correctly) have the interwikis at the disambiguation level, connecting to the general “work” data item on Wikidata.  The two English versions of Treasure Island will link to the disambiguation page within Wikisource as normal and would each have their own, separate Wikidata item with their individual data (but no interwiki links to any other language).  The non-English Wikisources will have no separate “edition” data items; instead, their texts link directly to the “work” item.  This is messy and may confuse future users, not to mention depriving the non-English editions of their own data items with their individual metadata on Wikidata.  It isn’t good database practice, but it may be the best compromise.
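Here is a minimal Python sketch of the fig. 3 layout, under the same assumptions as before (placeholder French and German titles, placeholder statement keys). It shows both halves of the trade-off: the English versions page picks up interwikis to the foreign texts, while a foreign text can only report work-level metadata, because it has no edition item of its own.

```python
from typing import Dict, Optional

ITEMS: Dict[str, dict] = {
    "Q185118": {  # work item
        "sitelinks": {
            "enwikisource": "Treasure Island",            # a versions page
            "frwikisource": "French text (placeholder)",  # a text, attached directly
            "dewikisource": "German text (placeholder)",
        },
        "statements": {"title": "Treasure Island",
                       "author": "Robert Louis Stevenson"},
    },
    "Q14944007": {  # English edition: own item, English sitelink only
        "sitelinks": {"enwikisource": "Treasure Island (1883)"},
        "statements": {"edition of": "Q185118",
                       "publisher": "Cassell & Company"},
    },
    "Q14944010": {
        "sitelinks": {"enwikisource": "Treasure Island (1911)"},
        "statements": {"edition of": "Q185118",
                       "publisher": "Charles Scribner's Sons"},
    },
}


def item_for_page(site: str, title: str, items: Dict[str, dict]) -> Optional[str]:
    """Find the item a page is attached to (a linear scan, fine for a sketch)."""
    for qid, item in items.items():
        if item["sitelinks"].get(site) == title:
            return qid
    return None


def interwikis_for_page(site: str, title: str, items: Dict[str, dict]) -> Dict[str, str]:
    """Other-language pages attached to the same item as this page."""
    qid = item_for_page(site, title, items)
    if qid is None:
        return {}
    return {s: t for s, t in items[qid]["sitelinks"].items() if s != site}


def metadata_for_page(site: str, title: str, items: Dict[str, dict]) -> Dict[str, str]:
    """Statements of the item this page is attached to."""
    qid = item_for_page(site, title, items)
    return dict(items[qid]["statements"]) if qid else {}


print(interwikis_for_page("enwikisource", "Treasure Island", ITEMS))
print(interwikis_for_page("enwikisource", "Treasure Island (1883)", ITEMS))
print(metadata_for_page("frwikisource", "French text (placeholder)", ITEMS))
```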

Diagram of compromise situation, with interwiki linking via both disambiguation pages and individual instances of the text

Fig 3: Wikilinking split between both levels.

This isn’t just an English vs. Other-Languages situation.  The roles are almost certainly reversed in some cases and the majority of works on English Wikisource stand alone, raising the question of whether they should have their own “edition” data items with specific data or link directly to the general “work” item.

A peripheral issue is that some data items on Wikidata do have metadata, often derived from Wikipedia articles, which would be inconsistent with Wikisource’s texts (or just wrong in some cases).

One long term goal for Wikisource-on-Wikidata is to centralise metadata, which is currently held both on Commons (for the scan file) and on Wikisource (primarily on the scan’s Index page, with some in the mainspace).  It should also facilitate interproject links, to quickly show a Wikipedian (for example) that associated content exists on other projects like Wikisource, Wikivoyage or Commons, possibly with a brief summary.  Neither may be possible without consistent data available.
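Centralising metadata first needs a way to spot the inconsistencies mentioned above. As a rough sketch, assuming the Index-page and Wikidata values have already been fetched into flat dictionaries (the field names and sample values below are hypothetical; the mismatched year imitates a work-level date copied from Wikipedia), a comparison might look like this in Python:

```python
from typing import Dict, List, Tuple


def _normalise(value: str) -> str:
    """Collapse whitespace and ignore case so trivial differences don't count."""
    return " ".join(value.split()).casefold()


def find_conflicts(index: Dict[str, str],
                   wikidata: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Fields present on both sides whose values disagree, as (index, wikidata)."""
    return {
        key: (index[key], wikidata[key])
        for key in sorted(index.keys() & wikidata.keys())
        if _normalise(index[key]) != _normalise(wikidata[key])
    }


def missing_on_wikidata(index: Dict[str, str],
                        wikidata: Dict[str, str]) -> List[str]:
    """Fields the Index page has but Wikidata lacks (candidates to copy over)."""
    return sorted(index.keys() - wikidata.keys())


index_page = {"title": "Treasure Island", "year": "1911",
              "publisher": "Charles Scribner's Sons"}
wikidata_item = {"title": "treasure  island", "year": "1883"}

print(find_conflicts(index_page, wikidata_item))
print(missing_on_wikidata(index_page, wikidata_item))
```

Flagging conflicts for a human, rather than overwriting either side, seems the safer default while the work/edition question is unsettled.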

This problem has not really been solved yet and it might be a while before a stable solution develops.

Author demographics

August 2013 was Female Author Month on Wikisource, with two works by women transcribed from scratch via the community Proofread of the Month and a third work partly validated.[1] This was the result of a request, made on the Scriptorium, for more works by female authors.

However, we don’t actually know if we have a significant dearth of female-authored works. We don’t have any demographic information about our authors beyond era, nationality (usually) and religion (sometimes).

Wikidata may help with that, whenever it is rolled out to the Wikisources. Amending each and every author page on English Wikisource would be hard work at the moment because the process would have to be mostly manual. However, with Wikidata, we wouldn’t even need a bot. The author header template (and maybe a Lua module) could just read the Wikidata “sex” property (P21) and apply a hidden tracking category.
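On Wikisource this would live in the template or a Lua module, but the logic is small enough to sketch in Python. This assumes the author's entity has been fetched as Wikidata JSON (the claims/mainsnak/datavalue layout); Q6581072 and Q6581097 are the real Wikidata items for female and male, but the category names are hypothetical. A category is hidden by putting `__HIDDENCAT__` on the category page itself, so the template only has to emit an ordinary category link.

```python
from typing import Dict, List

# Real Wikidata items for the two most common P21 values.
FEMALE, MALE = "Q6581072", "Q6581097"

# Hypothetical category names; each category page would carry __HIDDENCAT__.
SEX_CATEGORIES: Dict[str, str] = {
    FEMALE: "Female authors (tracking)",
    MALE: "Male authors (tracking)",
}
NO_P21 = "Authors without a sex property (tracking)"


def p21_values(entity: dict) -> List[str]:
    """Item IDs given as P21 values, skipping 'no value'/'unknown value' snaks."""
    ids = []
    for claim in entity.get("claims", {}).get("P21", []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue
        value = snak["datavalue"]["value"]
        # Newer JSON includes "id"; older JSON may only have "numeric-id".
        ids.append(value.get("id") or f"Q{value['numeric-id']}")
    return ids


def sex_tracking_categories(entity: dict) -> List[str]:
    """Hidden tracking categories for one author entity."""
    ids = p21_values(entity)
    if not ids:
        return [NO_P21]
    # Unmapped values still get a category, so nothing is silently dropped.
    return sorted({SEX_CATEGORIES.get(q, f"Authors with P21 {q} (tracking)") for q in ids})


def category_links(categories: List[str]) -> str:
    """Wikitext the author header could emit."""
    return "".join(f"[[Category:{name}]]" for name in categories)


author = {"claims": {"P21": [{"mainsnak": {
    "snaktype": "value",
    "datavalue": {"type": "wikibase-entityid",
                  "value": {"entity-type": "item", "numeric-id": 6581072}}}}]}}
print(category_links(sex_tracking_categories(author)))
```

The explicit "without a sex property" category matters as much as the others: it shows how many author pages still lack the data.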

This could be extended to other metadata. We could have tracking categories for the entire QUILTBAG[2] range with the addition of the “sexual orientation” property (P91) and whatever is used to cover transsexuality. Ethnicity might be possible with the “ethnic group” property (P172). There may be even more demographics worth tracking too, and these could be easily added over time.
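Extending this to other properties is mostly a matter of configuration. In the hypothetical Python sketch below, each tracked property is one line in a table, so adding another later means adding a line. Item labels are passed in rather than looked up (a real module would fetch them from Wikidata), the category naming scheme is invented for illustration, and the ethnic-group QID in the example is a placeholder.

```python
from typing import Dict, Iterator, List

# Property ID -> human-readable name used in the category title.
TRACKED_PROPERTIES: Dict[str, str] = {
    "P21": "sex",
    "P91": "sexual orientation",
    "P172": "ethnic group",
}


def item_values(entity: dict, prop: str) -> Iterator[str]:
    """Yield the item IDs given as values of `prop` (skips novalue/somevalue)."""
    for claim in entity.get("claims", {}).get(prop, []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            value = snak["datavalue"]["value"]
            yield value.get("id") or f"Q{value['numeric-id']}"


def tracking_categories(entity: dict, labels: Dict[str, str],
                        tracked: Dict[str, str] = TRACKED_PROPERTIES) -> List[str]:
    """One hypothetical hidden category per (tracked property, value) pair."""
    categories = {
        f"Authors by {name}: {labels.get(qid, qid)}"
        for prop, name in tracked.items()
        for qid in item_values(entity, prop)
    }
    return sorted(categories)


def _claim(qid: str) -> dict:
    """Build a minimal item-valued claim in Wikidata's JSON shape (example helper)."""
    return {"mainsnak": {"snaktype": "value",
                         "datavalue": {"type": "wikibase-entityid",
                                       "value": {"entity-type": "item", "id": qid}}}}


author = {"claims": {"P21": [_claim("Q6581072")], "P172": [_claim("Q-PLACEHOLDER")]}}
print(tracking_categories(author, labels={"Q6581072": "female"}))
```

Falling back to the bare QID when no label is known keeps the category usable while labels are still missing.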

This might bring to mind the recent controversy over Wikipedia moving female authors into a female-author ghetto category while leaving male authors categorised simply as authors.  Wikisource, however, wouldn’t be discriminating, as this approach would be fully automated and applied equally to all authors in the Authorspace.  Hidden categories would avoid over-labelling authors, as well as avoiding redundant information that many readers could deduce from the name and/or portrait.

Then we would actually know where we stand.

Notes

  1. These were: Marriage as a Trade by Cicely Hamilton; Diaries of Court Ladies of Old Japan edited by Annie Shepley Omori; and Pride and Prejudice by Jane Austen.
  2. QUILTBAG: Queer Intersexual Lesbian Transsexual Bisexual Asexual Gay
    EDIT: Actually, I got this wrong.  The acronym stands for Queer/Questioning Undecided Intersex Lesbian Trans(-gender/-exual) Bisexual Asexual Gay.  See Wiktionary for a full definition and history.  Personally I would have merged the first two into Quantumsexual, but that’s just me.