Works and Editions

Title page from the 1911 edition of Treasure Island

Just as I’ve been getting back up to speed with Wikisource following an internet-less Christmas, Wikisource has begun to be integrated with Wikidata.  At the moment, this just means interwiki links and new Wikidata pages but, naturally, some small problems have already occurred.  The most significant I’ve seen is the question of how to handle the mainspace.

Wikisource is the third of the Wikimedia sorority to be supported by Wikidata, after big sister Wikipedia and little, adopted sister Wikivoyage (or possibly fourth if we count the seemingly partial support of Commons).  Wikisource is different from these projects because, while the others will usually just have one page for each item, Wikisource can host multiple editions of the same work, each requiring a separate but linked data item.  (Then there is the subject of subpages but we’ll leave that for now.)

The books task force on Wikidata had already come up with a system to implement this: two separate classes of item, a “work” item to cover the text in general and one or more “edition” items to cover individual instances of that text.  A “work” item will usually correspond with the article on Wikipedia (if one exists), listing general metadata that is common to all instances of the text, such as the title, the author and so forth.  The “edition” item would list specific metadata that is not shared between all instances of the text, such as the publisher, the date of publication, the place of publication, the illustrator, the translator, the editor, and so forth.

This is best illustrated with Treasure Island as English Wikisource has two distinct and sourced versions of that text.  (See fig. 1.)

The page “Treasure Island” is a versions page (one of three types of disambiguation page on English Wikisource).  Attached to this are two texts: “Treasure Island (1883)” for the first book publication of the story, published by Cassell & Company (note that it was originally serialised in a magazine, which we do not have yet), and “Treasure Island (1911)” for an American edition published in 1911 by Charles Scribner’s Sons (Wikisource does not strictly require notability but, if it did, this edition would be notable for its N.C. Wyeth illustrations).

The disambiguation page has the associated data item Q185118, which is also the item used by Wikipedia and Commons.  The 1883 work has the data item Q14944007 and the 1911 work has the data item Q14944010; both link to the first item with the “edition of” property.
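As a rough sketch of that arrangement, the three items can be modelled as plain records with an “edition of” pointer.  The class and function names here are my own illustration and not part of any Wikidata API; only the item IDs and the “edition of” relationship come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    """A hypothetical stand-in for a Wikidata item (not the real data model)."""
    qid: str
    label: str
    # QID of the parent "work" item if this item is an edition, else None.
    edition_of: Optional[str] = None


def editions_of(work_qid: str, items: List[Item]) -> List[str]:
    """Return the QIDs of every item that points at work_qid via 'edition of'."""
    return [item.qid for item in items if item.edition_of == work_qid]


items = [
    Item("Q185118", "Treasure Island"),  # the general "work" item
    Item("Q14944007", "Treasure Island (1883)", edition_of="Q185118"),
    Item("Q14944010", "Treasure Island (1911)", edition_of="Q185118"),
]

treasure_island_editions = editions_of("Q185118", items)
```

The “work” item carries no pointer of its own; the editions find their parent, rather than the parent listing its children, which is how the “edition of” property is used above.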

Diagram showing two versions of Treasure Island as children of the disambiguation page

Fig 1: Treasure Island on English Wikisource.

However, other Wikisources only have one translation each of Treasure Island.  If these each have their own “edition” data item, containing its unique metadata, then the interwiki function breaks down.

If the interwiki links are kept at the edition level, then few if any interwiki links will exist between works on Wikisource.  There might be a dozen different editions of Treasure Island in as many languages but, as each is different with different metadata, they will each have separate data items.

Ideally, from a database point of view, each Wikisource will also have a separation between the “work” and the “edition(s)”.  This occurs in this case on English Wikisource because there is a disambiguation page at the “work” level.  To implement this on a large scale, however, would require a disambiguation page for every work on every Wikisource, even if most would only contain a single link to a text (the “edition”); see fig. 2 for an example.  This would work from a computing point of view but it is unlikely to be popular or intuitive for humans.

Diagram of ideal situation, with interwiki linking via disambiguation pages

Fig 2: Wikilinking between disambiguation pages.

Practically, the solution is to mix the classes, as shown in fig. 3.  In this case, English Wikisource will (correctly) have the interwikis at the disambiguation level, connecting to the general “work” data item on Wikidata.  The two versions of Treasure Island in English will link to the disambiguation page within Wikisource as normal and would each have their own, separate Wikidata item with their individual data (but would not have interwiki links to any other language).  The non-English Wikisources will have no “work” level page, instead linking their “edition” pages directly to the “work” data item.  This is messy and may confuse future users, not to mention depriving the non-English editions of their own data items with their individual metadata on Wikidata.  It isn’t good practice for a database but it may be the best compromise.

Diagram of compromise situation, with interwiki linking via both disambiguation pages and individual instances of the text

Fig 3: Wikilinking split between both levels.

This isn’t just an English vs. Other-Languages situation.  The roles are almost certainly reversed in some cases and the majority of works on English Wikisource stand alone, raising the question of whether they should have their own “edition” data items with specific data or link directly to the general “work” item.

A peripheral issue is that some data items on Wikidata do have metadata, often derived from Wikipedia articles, which would be inconsistent with Wikisource’s texts (or just wrong in some cases).

One long term goal for Wikisource-on-Wikidata is to centralise metadata, which is currently held both on Commons (for the scan file) and on Wikisource (primarily on the scan’s Index page, with some in the mainspace).  It should also facilitate interproject links, to quickly show a Wikipedian (for example) that associated content exists on other projects like Wikisource, Wikivoyage or Commons, possibly with a brief summary.  Neither may be possible without consistent data available.

This problem has not really been solved yet and it might be a while before a stable solution develops.

Copyright illiteracy redux

Original Weird Tales illustration for

The problems of copyright-renewed works being added to Wikisource continue.  In this case, by me.  I added “Tell Your Fortune” by Robert Bloch to Wikisource as part of Weird Tales (vol. 42, no. 4, May 1950).  Bloch, author of Psycho and mentee of Lovecraft, mostly renewed his copyrights but missed the occasional piece.  He has a few letters hosted on Wikisource already but this would have been his first work of fiction.  I uploaded it, transcribed it, proofread it and eventually transcluded (i.e. “published”) it when the work was done.

And then it transpired that the copyright had been renewed after all and hosting it on Wikisource is illegal.

I honestly did try to make sure that I caught all the copyright renewals.  I checked scans of the copyright renewal catalogues, transcriptions of those scans, the US Copyright Office’s online database and Google searches.  1950 is an odd year as it was transitional; renewals can be recorded in either the old-style printed catalogues or on the newer official database. There is no complete, single source for this type of renewal.  I created Weird Tales and its subpages mostly to record information like this for this precise reason.  I did catch some other renewals in this issue, “The Last Three Ships” by Margaret St. Clair and “The Man on B-17” by August Derleth, and redacted them from the scan accordingly.  This one escaped me, however, despite being clearly entered on the Copyright Office’s database.

So, it’s worth quadruple-checking the copyrights before you do all of the work necessary to get a text on Wikisource.

There are still usable parts of the issue, such as the poem “Luna Aeternalis” by Clark Ashton Smith and the short story “The Triangle of Terror” by William F. Temple.  Smith has many works already on Wikisource but few of them are backed by scans yet (and some were recently deleted and re-hosted in Canada on Wikilivres).  Temple, a British science fiction author, is new to Wikisource.  This story is actually interesting copyright-wise because Temple only died in 1989 and so his works are still under copyright in the UK.  As this work was first published in the US, however, it is in the public domain under American law due to non-renewal.

The internal cost of copyright illiteracy

More so than most other Wikimedia projects, except perhaps Commons, copyright is a big deal for Wikisource.  Obviously we can only host public domain or freely licensed works, which is generally understood.  The problem comes from copyright law itself not being generally understood.  (I can’t claim to be especially knowledgeable about copyright myself but I have picked up a lot as part of the Wikisource community.)

Many people apparently believe certain works must or should be out of copyright without checking, or they do check but miss some detail of copyright law.  Wikisource as a project can deal with this by deletion but it still impacts volunteers.

A recent example is the science fiction short story “Time Pawn” by Philip K. Dick, a story that was published in 1954 in an issue of Thrilling Wonder Stories.  Under the law of the time, the initial copyright period ended in 1982 when it could have been renewed for another period.  As this didn’t happen it would seem to have entered the public domain.  However, while the short story was not renewed, the issue of the magazine itself was, under renewal registration number RE0000112616 in January 1982 by CBS Publications.  It has been established, in Goodis v. United Artists Television, Inc., “that where a magazine has purchased the right of first publication under circumstances which show that the author has no intention to donate his work to the public, copyright notice in the magazine’s name is sufficient to obtain a valid copyright on behalf of the beneficial owner, the author or proprietor.”  Lacking information to the contrary, we must assume that this applies to Dick’s story; the renewal of the copyright on Thrilling Wonder Stories also renewed the copyright on “Time Pawn” so, unless it was reassigned, CBS currently hold the rights on the story until about 2050.
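The dates in that paragraph can be checked with some simple arithmetic.  This is only a simplified sketch for works of this era: it assumes a 28-year initial term, a total of 95 years from publication if the copyright was renewed, and terms that run to the end of the calendar year.  The function names are my own and it ignores every other wrinkle of copyright law.

```python
# Simplified assumptions for US works of this era (not a full legal test):
INITIAL_TERM_YEARS = 28   # first copyright term, counted from publication
RENEWED_TOTAL_YEARS = 95  # total term from publication if renewed


def renewal_year(publication_year: int) -> int:
    """Year in which the initial term ends and a renewal had to be filed."""
    return publication_year + INITIAL_TERM_YEARS


def public_domain_year(publication_year: int, renewed: bool) -> int:
    """First calendar year in which the work is in the public domain,
    assuming protection runs to the end of the final year of the term."""
    term = RENEWED_TOTAL_YEARS if renewed else INITIAL_TERM_YEARS
    return publication_year + term + 1


time_pawn_renewal = renewal_year(1954)                      # 1982
time_pawn_public_domain = public_domain_year(1954, True)    # 2050
```

For a story published in 1954, that gives a renewal deadline of 1982 and, because the magazine issue was renewed, protection lasting through 2049.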

The real issue here is that another user, not the uploader, completed the proofreading of the entire story in good faith.  At that point yet another user noticed it and rightly marked it as a copyright violation.  Now that good-faith user’s effort is wasted and they may be permanently disillusioned with the project.  Everyone loses.

This is actually partly my fault.  I noticed the upload and I tagged a separate, similar upload (“Small Town”) for deletion for the same reason but I didn’t connect the two.

I’m not sure what else can be done to prevent things like this from happening.  Both Wikisource and Commons already have help pages on copyright that should explain the problem.  Constant vigilance (and better awareness on my part, at least) may be the only solution, but that is unlikely to be foolproof.

Note 1: “Small Town” was published in Amazing Stories, which hardly ever had its copyrights renewed, in the very first issue to do so.  Conversely, Thrilling Wonder Stories, along with the entire “Thrilling…” stable of magazines, apparently had consistent copyright renewals across the board.  Ironically, that isn’t true under its earlier incarnation as simply Wonder Stories, a pulp also created by Hugo Gernsback after he lost control of Amazing Stories.

Note 2: A later version of “Time Pawn” (published in Startling Stories, Summer 1955) appears to have been renewed as well, under RE0000190631 in 1983 by Dick’s children.  This may or may not be relevant; a court could declare it close enough.

To Annotate or Not to Annotate?

I have recently been reminded about the topic of annotation.

Annotation remains a vexed issue on the English Wikisource. Not all Wikisources accept annotations; English used to be one that did. After a contentious debate the entire policy ended up being blanked pending any sort of consensus and has remained that way for over a year. That just led to a sort of no-man’s-land, with different editors doing their own, potentially contradictory things.

The main issue, of course, is whether or not Wikisource should host texts with user-generated annotations.

Part of Wikisource’s mission is to provide accessible copies of source texts. Texts that should remain as faithful and pure as possible. Wikisource does not even correct typos.

Being a wiki, however, the texts could have added depth and usefulness if they provided more information. Place names, for example, change over time and perhaps a reader does not know that Constantinople is Istanbul. It’s simple to add this to the text, in many different ways, but if you do, then the text is slightly less faithful and slightly less pure than it could have been.

That leads to the next two issues: what counts as an annotation and how much is allowed, if any. Some say that even a humble wikilink is an annotation and these must all be purged to maintain textual purity. Users have removed wikilinks for this reason in the past. Others go further than wikilinks and add new footnotes, diagrams and maps to help improve the clarity of a text. Most users are somewhere in between; I’ve done all of the above.

A casual reader can be helped by having information put in context, locations pointed out on maps, or names linked to full biographies. However, if a reader wants to know what exactly a reader in the past would have read, or what a specific author actually published, then user annotations start to obfuscate matters, even if marked.

Keeping multiple copies of texts is one solution: a pure text and a clearly marked annotated version. That doubles work load, however, and presents some technology problems. Technology might be a solution, with the mooted “onion skin” Wikisource 3.0, but that remains theoretical at the moment. Hebrew Wikisource, the oldest standalone Wikisource, uses a special namespace just for annotations, although it is currently the only one to do so. If we are going to put the text somewhere else, why not a different project altogether? This does technically fall within Wikibooks’ bailiwick but will simple wikilinks be enough on that project and are they going to be happy with the buck being passed to them? Even if so, how would we stop new users coming along and putting wikilinks on a Wikimedia project?

The case continues.