
Blackletter is broken again


[Example text in a blackletter, or fraktur, typeface]

The Wikisource template blackletter is broken again.  All this template does is render text in a different font, in this case UnifrakturMaguntia, so this isn’t critical.  It is a little annoying, however, as this occurred without warning and it’s the second time it’s happened now.
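In HTML terms the template amounts to little more than a styled span.  Here is a rough sketch in Python of the transformation it performs; the exact markup is my guess, and only the font name comes from the template itself:

    # A rough sketch of what the blackletter template does, reduced to a
    # one-line HTML transform.  The exact markup is an assumption on my
    # part; only the font, UnifrakturMaguntia, is certain.
    def blackletter(text):
        return '<span style="font-family: UnifrakturMaguntia;">%s</span>' % text

    print(blackletter("Example text"))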

Wikisource is different from its sister projects in that it tries to remain as faithful as possible to a specific, pre-existing source.  We don’t make a distinction between serifed and non-serifed fonts but we do like to reproduce some more significant type, like red “ink” and blackletter (aka fraktur or gothic) text, whether it is decorative or meant to confer some meaning.

This was supported by WebFonts but was initially broken by the move to the Universal Language Selector.  That was easily fixed but now the ULS has been changed to an opt-in preference, so the majority of Wikisource’s readers cannot see the font.  Also affected are the language-specific fonts, such as small pieces of Greek or Arabic within otherwise English text in the Latin script.  This will hopefully be fixed soon (it isn’t the biggest effect of the change) but it may be necessary to stop using these extensions in this way and find an alternative.

As an aside: We still don’t have an insular font.  Where needed, there is currently a template inserting an SVG image for each letter, which isn’t ideal.


Works and Editions


[Title page from the 1911 edition of Treasure Island]

Just as I’ve been getting back up to speed with Wikisource following an internet-less Christmas, Wikisource has begun to be integrated with Wikidata.  At the moment, this just means interwiki links and new Wikidata pages but, naturally, some small problems have already occurred.  The most significant I’ve seen is the question of how to handle the mainspace.

Wikisource is the third of the Wikimedia sorority to be supported by Wikidata, after big sister Wikipedia and little, adopted sister Wikivoyage (or possibly fourth if we count the seemingly partial support of Commons).  Wikisource is different from these projects because, while the others will usually just have one page for each item, Wikisource can host multiple editions of the same work, each requiring a separate but linked data item.  (Then there is the subject of subpages but we’ll leave that for now.)

The books task force on Wikidata had already come up with a system to implement this: two separate classes of item, a “work” item to cover the text in general and one or more “edition” items to cover individual instances of that text.  A “work” item will usually correspond to the article on Wikipedia (if one exists), listing the general metadata common to all instances of the text: the title, the author and so forth.  An “edition” item lists the specific metadata that is not shared between all instances: the publisher, the date of publication, the place of publication, the illustrator, the translator, the editor, and so forth.

This is best illustrated with Treasure Island as English Wikisource has two distinct and sourced versions of that text.  (See fig. 1.)

The page “Treasure Island” is a versions page (one of three types of disambiguation page on English Wikisource).  Attached to this are two texts: “Treasure Island (1883)” for the first book publication of the story, published by Cassell & Company (note that it was originally serialised in a magazine, which we do not have yet), and “Treasure Island (1911)” for an American edition published in 1911 by Charles Scribner’s Sons (Wikisource does not strictly require notability but, if it did, this edition would be notable for its N.C. Wyeth illustrations).

The disambiguation page has the associated data item Q185118, which is also the item used by Wikipedia and Commons.  The 1883 edition has the data item Q14944007 and the 1911 edition has the data item Q14944010; both link to the first item with the “edition of” property.

[Diagram showing two versions of Treasure Island as children of the disambiguation page]

Fig 1: Treasure Island on English Wikisource.
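As a sketch of how that linkage can be followed programmatically, here is a minimal query against Wikidata’s public API.  The item IDs are the ones above; I am assuming P629 (“edition or translation of”) is the property behind “edition of”:

    # Minimal sketch: follow the "edition of" claim from an edition item
    # back to its parent work item, via Wikidata's wbgetentities API.
    # P629 is my assumption for the property ID.
    import requests

    API = "https://www.wikidata.org/w/api.php"

    def get_entity(qid):
        r = requests.get(API, params={"action": "wbgetentities",
                                      "ids": qid, "format": "json"})
        return r.json()["entities"][qid]

    edition = get_entity("Q14944010")  # the 1911 edition
    snak = edition["claims"]["P629"][0]["mainsnak"]
    print(snak["datavalue"]["value"]["id"])  # should print Q185118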

However, other Wikisources only have one translation each of Treasure Island.  If each of these has its own “edition” data item, containing its unique metadata, then the interwiki function breaks down.

If the interwiki links are kept at the edition level, then few if any interwiki links will exist between works on Wikisource.  There might be a dozen different editions of Treasure Island in as many languages but, as each is different with different metadata, they will each have separate data items.

Ideally, from a database point of view, each Wikisource would also have a separation between the “work” and the “edition(s)”.  This already happens in this case on English Wikisource because there is a disambiguation page at the “work” level.  To implement this on a large scale, however, would require a disambiguation page for every work on every Wikisource, even if most would contain only a single link to a text (the “edition”); see fig. 2 for an example.  This would work from a computing point of view but it is unlikely to be popular or intuitive for humans.

[Diagram of the ideal situation, with interwiki linking via disambiguation pages]

Fig 2: Wikilinking between disambiguation pages.

Practically, the solution is to mix the classes, as shown in fig. 3.  In this case, English Wikisource will (correctly) have the interwikis at the disambiguation level, connecting to the general “work” data item on Wikidata.  The two versions of Treasure Island in English will link to the disambiguation page within Wikisource as normal and will each have their own, separate Wikidata item with their individual data (but no interwiki links to any other language).  The non-English Wikisources will have no “work” level data item, instead linking their “editions” directly to the “work”.  This is messy and may confuse future users, not to mention depriving the non-English editions of their own data items with their individual metadata on Wikidata.  It isn’t good practice for a database but it may be the best compromise.

[Diagram of the compromise situation, with interwiki linking via both disambiguation pages and individual instances of the text]

Fig 3: Wikilinking split between both levels.
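Spelled out as data, the compromise looks something like this.  These are plain Python dictionaries, not real API output, and the non-English page titles are only illustrative:

    # Where the sitelinks would sit under the compromise in fig. 3.
    work = {  # Q185118, the general "work" item
        "sitelinks": {
            "enwikisource": "Treasure Island",   # the disambiguation page
            "frwikisource": "L'Île au trésor",   # an "edition" linked directly
            "dewikisource": "Die Schatzinsel",   # likewise
        },
    }
    edition_1883 = {  # Q14944007: English-only, so no interwiki links
        "edition of": "Q185118",
        "sitelinks": {"enwikisource": "Treasure Island (1883)"},
    }
    edition_1911 = {  # Q14944010: likewise
        "edition of": "Q185118",
        "sitelinks": {"enwikisource": "Treasure Island (1911)"},
    }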

This isn’t just an English vs. Other-Languages situation.  The roles are almost certainly reversed in some cases and the majority of works on English Wikisource stand alone, raising the question of whether they should have their own “edition” data items with specific data or link directly to the general “work” item.

A peripheral issue is that some data items on Wikidata do have metadata, often derived from Wikipedia articles, which would be inconsistent with Wikisource’s texts (or just wrong in some cases).

One long term goal for Wikisource-on-Wikidata is to centralise metadata, which is currently held both on Commons (for the scan file) and on Wikisource (primarily on the scan’s Index page, with some in the mainspace).  It should also facilitate interproject links, to quickly show a Wikipedian (for example) that associated content exists on other projects like Wikisource, Wikivoyage or Commons, possibly with a brief summary.  Neither may be possible without consistent data available.

This problem has not really been solved yet and it might be a while before a stable solution develops.

Common Wikisource


As a follow-up to my ramblings about Multilingual Wikisource: I have heard some people ask why all Wikisources are not Multilingual Wikisource, like Commons. (I have even heard “Why isn’t Wikisource part of Commons?”)

The latter is easily answered. Aside from the fact that Wikisource needs specific technology to function, it has a different scope and mission to Commons, which would clash if both were part of the same project.

There are many reasons for the former. I think the original reason was something to do with right-to-left text, which has been solved by now. Others still stand, however.

Disambiguation would be a nightmare, for example. The Bible is complicated enough in English on just one project. Multiple editions in each of hundreds of languages would be ridiculous. This could be solved with, say, namespaces but there are a finite number of namespaces in the MediaWiki software. Besides, the difference between a namespace and a language subdomain is negligible from a technological point of view. The same goes for disambiguation for that matter. A language subdomain is just a bigger version of the concept.

On a different tangent, while Commons is technically multilingual—and a lot of work has gone into supporting that—it is still predominantly English. Community communication is overwhelmingly done in English, English is the default for categories and templates, and so forth. Some grasp of English is often necessary to function on Commons. Language subdomains allow the monolingual (and the multilingual but not anglophone) Wikimedians to take part too, which is more important in curating a library than a media depository.

Obviously, now that we actually have language subdomains, we also have the problems of different cultures and communities on the different projects. Italian doesn’t allow translation, German doesn’t allow non-scans, French doesn’t allow annotation; while some languages, like English and Spanish, are pretty promiscuous in their content. There are likely to be many more, seemingly trivial, quirks that are at odds across different projects. If anyone ever did attempt unification, these communities would clash and conflict all over the place, probably ending in either mutually assured destruction or a very small surviving user base.

You may as well ask why Wikipedia bothers with language subdomains when it could just be Multilingual Wikipedia, like Commons.

Multilingual ramblings


“Old Wikisource” (oldwikisource:) is the incubator of the Wikisources. Languages that do not yet have enough works in their library are all held here, from Akkadian to Zulu, before later potentially budding off into their own projects. They are not part of the actual Incubator because Wikisource relies on specific technology that is not installed there (and probably would need to be heavily adapted to fit it).

One problem this creates is that “oldwikisource” is not a recognised ISO 639 language code. Interwiki links do not work. Wikidata will have a hard time indexing it. No one really knows it’s there.

Fortunately, the International Organization for Standardization predicted situations such as this and included a few extra codes in their set. One of these is “mul” for multiple languages, for situations where databases need to categorise things by language but where some of those things have many. This could mean, for example, mul.wikisource, or even mul.wikipedia, mul.wikibooks, etc (although those are just possibilities, not suggestions).

In other words, exactly what Wikimedia requires for Old Wikisource. Mul could be used for interwiki links from other Wikisources, bringing some attention and potential traffic to an otherwise excluded and ostracised project. Mul could be used on Wikidata to collect and connect pages. Mul is already used in some parts of Wikisource to refer to the not-sub-domain.
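To make that concrete, the interwiki map could simply grow a “mul” prefix alongside the existing language codes. A sketch (the URL patterns are illustrative, not the real map; “$1” is MediaWiki’s placeholder for the target page title):

    # Sketch of a "mul" entry in the interwiki map.
    interwiki = {
        "en":  "https://en.wikisource.org/wiki/$1",
        "fr":  "https://fr.wikisource.org/wiki/$1",
        "mul": "https://wikisource.org/wiki/$1",  # today's Old Wikisource
    }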

It also helps that Old Wikisource, while accurate as the original project, is not as easily explained to Wikimedians on our sister projects as is “English Wikisource”, “French Wikisource” or, as it happens, “Multilingual Wikisource”.

So, for preference, I would see Old Wikisource become Multilingual Wikisource. I think it would make lots of things easier, while making the project more visible, more functional, and slightly more obvious to outsiders. It must be said that I am not a regular on Old Wikisource and those that are may not agree.

Fully enabling ISO 639 in Wikimedia would technically affect user language options too. A user could conceivably select “Multiple” as their preferred language, regardless of where they were in Wikimedia. In practice, this would probably just default to English, so I don’t think it would be a big problem.

More serious is the amount of trouble this would be to implement. Just creating an alias for Old Wikisource would be easiest, as the code could then be used as described without really changing much.

In my view, moving the project entirely is still better: with most existing pages going to mul.wikisource.org and just a portal remaining at wikisource.org (in line with its sister projects like Wikipedia). If changes are going to be made, we might as well go all the way rather than patch the system with aliases. That’s a lot of work for relatively little gain, though, and I don’t know how keen the current Old Wikisourcers would be on this option (nor the technical people who would have to do all the heavy lifting).

I haven’t actually made any proposal based on this (some related bug reports have been open for years, however). I’m still not sure what would be best nor what the wider community would prefer and I’m just thinking, or typing, out loud. This is just a blog after all.

As it stands, though, my opinion is that Multilingual Wikisource would probably work better than Old Wikisource.

Wikisource and e-books


Wikimedia in general can now produce e-books in EPUB format on demand.  However, Wikisource was actually there first and is ahead of the pack in this area.

Wikimedia projects, such as Wikisource’s sister project Wikipedia, use the “book tool” to collect pages into books.  These can be printed and bound as print books via PediaPress, as well as produced in electronic formats.  Initially PDF was the main format and recently EPUB has been added.  A problem from the point of view of Wikisource is that this tool does not take into account its specific qualities; it was built for Wikipedia and ignores the other projects.  For example, it adds a licence in the back matter of its output that claims a Creative Commons licence.  This is entirely accurate for many projects but amounts to copyfraud when applied to Wikisource’s public domain works.

Anyway, there is an alternative.  France is the technological home of Wikisource.  Hebrew was the first language-specific Wikisource and English is currently the largest but the technology on which Wikisource runs always seems to emanate from French Wikisource.  In this case, the tool WS Export was originally developed by French wikisourcers for French Wikisource and works for all language domains.  It supported EPUB before the book tool and looks likely to support Mobipocket first too.  More importantly, the tool and its output work better with Wikisource and attend to Wikisource’s quirks.
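For illustration, fetching an EPUB from WS Export could look something like the following.  The endpoint and parameter names are my best recollection of the tool’s interface, so treat them as assumptions rather than documentation:

    # Sketch: download an EPUB of the 1883 Treasure Island via WS Export.
    # The URL and parameters are assumptions, not documented fact.
    import requests

    resp = requests.get("https://wsexport.wmflabs.org/tool/book.php",
                        params={"lang": "en", "format": "epub",
                                "page": "Treasure_Island_(1883)"})
    with open("treasure_island_1883.epub", "wb") as f:
        f.write(resp.content)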

In November 2012, 3,700 EPUB works were produced by this tool.  Not surprisingly, French Wikisource produced the most (1,176), followed by Italian Wikisource (1,049) and English Wikisource (674).  Other EPUBs ranged from Breton (br) to Farsi (fa) to Venetian (vec).