Fixing Adobe InDesign CS3’s ePub files (part 2)

In this second instalment, we look at editing the content.opf file – part of the ePub file package. Our examples are based on the files created for our recent free e-book, Make Do & Cook: Savvy Shopping.

Editing content.opf

The content.opf file is the main metadata file describing the publication and its overall structure. It’s an XML file, and if you don’t know what that is, you may want to do some Googling on the subject first. The key thing to know is that XML uses tags inside angle brackets, much as HTML does. And like HTML, these tags usually come in pairs, with an opening tag and a closing one that includes a forward slash. (A tag with no content can close itself instead, with a slash just before the final angle bracket.) Here’s a typical line from the content.opf file:

<dc:identifier id="wvp-id" opf:scheme="ISBN">1441493379</dc:identifier>

This is the <identifier> tag. Its name is prefixed with ‘dc:’ (which stands for Dublin Core – a metadata standard). Such prefixes refer to namespaces, and if that term is new to you too, then it’s time for more Googling. This tag also has attributes, additional bits of information that go inside the opening tag. Here they are id and opf:scheme.
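If you’d like to see how a program views the prefix and the attributes, here is a minimal Python sketch using the standard library’s xml.etree.ElementTree. The wrapping <metadata> element and the variable names are my own, added only so the snippet parses on its own. The namespace URIs are the ones used in the content.opf file itself.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"

# The sample line, wrapped in a <metadata> element (an assumption for this
# sketch) so that the 'dc' and 'opf' prefixes are bound to their URIs.
snippet = (
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:opf="http://www.idpf.org/2007/opf">'
    '<dc:identifier id="wvp-id" opf:scheme="ISBN">1441493379</dc:identifier>'
    '</metadata>'
)

root = ET.fromstring(snippet)
identifier = root.find(f"{{{DC}}}identifier")

print(identifier.tag)     # the 'dc:' prefix expanded to its full namespace URI
print(identifier.text)    # the content between the opening and closing tags
print(identifier.attrib)  # the attributes, keyed by (namespaced) name
```

The parser replaces each prefix with the URI it stands for, which is why the declarations on the <metadata> tag matter.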

As content.opf is just a plain text file, you can open it with your favourite text editor. Make sure you use a text editor and not a word processor – you want something that can save ordinary text files. I use TextWrangler and Komodo Edit on the Mac. On Windows, you could use Notepad. Linux users are spoiled for choice (Kate was always my favourite).

Here are the highlights from the content.opf file as produced by InDesign (the bits in square brackets are my notes, not part of the file):

<?xml version="1.0"?>
 <package xmlns="http://www.idpf.org/2007/opf" version="2.0">
   <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:title>Make Do &amp; Cook: Savvy Shopping</dc:title>
     <dc:creator>Patricia Mansfield-Devine</dc:creator>
     <dc:subject>shopping</dc:subject>
     <dc:subject>cooking</dc:subject>
     [... ditto for several other subjects ...]
     <dc:description>Shop the smart way so you can enjoy delicious and nutritious meals on the smallest budget.</dc:description>
     <dc:rights>Copyright Patricia Mansfield-Devine. All rights reserved.</dc:rights>
   </metadata>
   <manifest>
     <item id="ncx" href="toc.ncx" media-type="text/xml"/>
     <item id="cover" href="Cover.html" media-type="application/xhtml+xml"/>
     <item id="mdc-cover-front-fmt-jpeg" href="images/MDC_Cover_Front_fmt.jpeg" media-type="image/jpeg"/>
     <item id="mdc-savvyshopping-cove-fmt-jpeg" href="images/MDC_SavvyShopping_cove_fmt.jpeg" media-type="image/jpeg"/>
     <item id="title-page" href="Title_Page.html" media-type="application/xhtml+xml"/>
     <item id="mdc-cover-front-fmt-jpeg-1" href="images/MDC_Cover_Front_fmt.jpeg" media-type="image/jpeg"/>
     <item id="mdc-savvyshopping-cove-fmt-jpeg-1" href="images/MDC_SavvyShopping_cove_fmt.jpeg" media-type="image/jpeg"/>
     <item id="copyright-page" href="Copyright_Page.html" media-type="application/xhtml+xml"/>
     <item id="mdc-cover-front-fmt-jpeg-2" href="images/MDC_Cover_Front_fmt.jpeg" media-type="image/jpeg"/>
     <item id="mdc-savvyshopping-cove-fmt-jpeg-2" href="images/MDC_SavvyShopping_cove_fmt.jpeg" media-type="image/jpeg"/>
     [... ditto for all the other chapters ...]
     <item id="css" href="template.css" media-type="text/css"/>
     <item id="pt" href="page-template.xpgt" media-type="application/vnd.adobe.page-template+xml"/>
   </manifest>
   <spine toc="ncx">
     <itemref idref="cover"/>
     <itemref idref="title-page"/>
     [... ditto for the other chapters ...]
   </spine>
 </package>

The metadata section was created by InDesign based on the metadata we entered in the ID file itself. We also used ID’s ‘book’ feature, where each chapter or section of the book is kept in a separate file. This creates a proper table of contents in the ePub file.

There are two problems with this file. First, it’s missing certain elements that are needed to make the file fully standards compliant. This is the source of that worrying message: ‘The document appears to have minor errors that might cause it to be displayed incorrectly’. And second, there are additional elements we would like in there.

Compliance issues

We’re going to be using some XML namespaces not already referred to in the document. So our first job is to edit the <metadata> tag on line 3. Here’s how it ends up:

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:dcterms="http://purl.org/dc/terms/"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns:opf="http://www.idpf.org/2007/opf">

The opening <package> tag should contain an attribute called ‘unique-identifier’. This is actually a pointer to the <identifier> tag we met earlier. As you can see, that tag included the attribute id="wvp-id", and it is this id value that unique-identifier must match. You can choose whatever value you want for the id (ours is short for WebVivant Press ID), but it pays to be consistent. All of our books will use ‘wvp-id’.

So, we edit the opening <package> tag to read:

<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="wvp-id">

And then we add the <identifier> tag to the metadata section. It can come anywhere in that section, so long as it’s between the metadata opening and closing tags.

<dc:identifier id="wvp-id">{unique-id}</dc:identifier>

You need to replace the {unique-id} bit with something, well, unique. Unique, that is, to this book. If your book has an ISBN, you could use that, in which case you can also add the attribute opf:scheme="ISBN". Here’s what the tag looks like with an ISBN:

<dc:identifier id="wvp-id" opf:scheme="ISBN">1441493379</dc:identifier>

Because Savvy Shopping doesn’t have an ISBN, we went with:

<dc:identifier id="wvp-id">WVP201002-Savvy-Shopping</dc:identifier>
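Since the link between the <package> tag and the <identifier> tag is easy to get wrong, a quick check can help. The following is a rough Python sketch, not an official validator: the function name is my own invention, and the sample is a cut-down package with only the lines that matter here.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

def unique_identifier_ok(opf_text):
    """Return True if <package unique-identifier> matches the id of a dc:identifier."""
    package = ET.fromstring(opf_text)
    wanted = package.get("unique-identifier")
    if wanted is None:
        return False
    ids = [el.get("id") for el in package.iter(f"{{{DC}}}identifier")]
    return wanted in ids

# A cut-down content.opf, for illustration only.
sample = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
         unique-identifier="wvp-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="wvp-id">WVP201002-Savvy-Shopping</dc:identifier>
  </metadata>
</package>"""

print(unique_identifier_ok(sample))
```

To run it against a real file, read content.opf as bytes and pass those in, because ElementTree rejects text strings that carry an encoding declaration.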

Further down the file, in the <manifest> section, InDesign assigns the wrong MIME type (the media-type attribute) to the ‘ncx’ table of contents entry. It says:

<item id="ncx" href="toc.ncx" media-type="text/xml"/>

In fact, it should be:

<item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
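If you have several books to fix, this change is easy to script. Here is a small Python sketch under my own assumptions: the function name is hypothetical, and it simply treats any manifest item whose href ends in .ncx as the table of contents.

```python
import xml.etree.ElementTree as ET

OPF = "http://www.idpf.org/2007/opf"
NCX_TYPE = "application/x-dtbncx+xml"

def fix_ncx_media_type(package):
    """Correct the media-type of manifest items pointing at a .ncx file.

    Returns the number of items that were changed.
    """
    fixed = 0
    for item in package.iter(f"{{{OPF}}}item"):
        is_ncx = item.get("href", "").lower().endswith(".ncx")
        if is_ncx and item.get("media-type") != NCX_TYPE:
            item.set("media-type", NCX_TYPE)
            fixed += 1
    return fixed

# A cut-down package with two manifest entries, for illustration only.
package = ET.fromstring(
    '<package xmlns="http://www.idpf.org/2007/opf" version="2.0"><manifest>'
    '<item id="ncx" href="toc.ncx" media-type="text/xml"/>'
    '<item id="css" href="template.css" media-type="text/css"/>'
    '</manifest></package>'
)

fixed_count = fix_ncx_media_type(package)
ncx_item = package.find(f"{{{OPF}}}manifest")[0]
print(fixed_count, ncx_item.get("media-type"))
```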

There’s one final problem. In the manifest section, you’ll see that each chapter file gets its own entry. The same goes for the images used in the book. Now, Savvy Shopping has only two images – one used as the cover and one used inside. But references are created to these images with every chapter. Here’s what the lines look like for the first two ‘chapters’ – the cover and the title page:

<item id="cover" href="Cover.html" media-type="application/xhtml+xml"/>
<item id="mdc-cover-front-fmt-jpeg" href="images/MDC_Cover_Front_fmt.jpeg" media-type="image/jpeg"/>
<item id="mdc-savvyshopping-cove-fmt-jpeg" href="images/MDC_SavvyShopping_cove_fmt.jpeg" media-type="image/jpeg"/>
<item id="title-page" href="Title_Page.html" media-type="application/xhtml+xml"/>
<item id="mdc-cover-front-fmt-jpeg-1" href="images/MDC_Cover_Front_fmt.jpeg" media-type="image/jpeg"/>
<item id="mdc-savvyshopping-cove-fmt-jpeg-1" href="images/MDC_SavvyShopping_cove_fmt.jpeg" media-type="image/jpeg"/>

The second and third lines refer to the images. These lines are fine and need to be kept. But lines 5 and 6 are references to the same images (the ‘href’ parts are identical) even though the tags get new IDs (by having the -1 suffix added). InDesign did the same with all the other chapters. This is unnecessary and may make e-readers report an error. Certainly, epubcheck doesn’t like it. So we deleted all the superfluous image references, keeping only the first reference to each image.
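We did this by hand, but with a long book the clean-up could be scripted. Here is a minimal Python sketch of the idea, keeping the first <item> for each href and dropping the rest. The function name is hypothetical, and the manifest is shortened to four entries.

```python
import xml.etree.ElementTree as ET

def drop_duplicate_items(manifest):
    """Remove <item> elements whose href has already been seen.

    Returns the ids of the removed items, in document order.
    """
    seen = set()
    removed = []
    for item in list(manifest):  # iterate over a copy, as we remove while looping
        href = item.get("href")
        if href in seen:
            manifest.remove(item)
            removed.append(item.get("id"))
        else:
            seen.add(href)
    return removed

# A shortened manifest, for illustration only.
manifest = ET.fromstring("""<manifest xmlns="http://www.idpf.org/2007/opf">
  <item id="cover" href="Cover.html" media-type="application/xhtml+xml"/>
  <item id="mdc-cover-front-fmt-jpeg" href="images/MDC_Cover_Front_fmt.jpeg" media-type="image/jpeg"/>
  <item id="title-page" href="Title_Page.html" media-type="application/xhtml+xml"/>
  <item id="mdc-cover-front-fmt-jpeg-1" href="images/MDC_Cover_Front_fmt.jpeg" media-type="image/jpeg"/>
</manifest>""")

removed = drop_duplicate_items(manifest)
print(removed)
```

The sketch assumes nothing else in the file points at the removed ids. That holds here because the <spine> only refers to the chapter files, but it is worth checking in your own book.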

Adding lines

That’s it for making the file compliant. Now to add some additional features.

The next step is to add lines to the <metadata> section. Like I said, InDesign has already created a number of entries there. We add:

<dc:publisher>WebVivant Press</dc:publisher>
<dc:language>en-GB</dc:language>
<dc:date xsi:type="dcterms:W3CDTF">{date}</dc:date>

Clearly, you’ll want to use your own publisher name. The ‘en-GB’ entry is for British English. You could use en-US for American English or just en, assuming your book is in English. The contents of this tag need to comply with RFC 3066 (see http://www.ietf.org/rfc/rfc3066.txt).

You need to replace the {date} part with the publication date. For books, the most common formats are YYYY, YYYY-MM or YYYY-MM-DD. The acceptable date formats are defined by ‘Date and Time Formats’ at http://www.w3.org/TR/NOTE-datetime which is based on ISO 8601. We prefer to go with month and year, so the entry for Savvy Shopping looks like this:

<dc:date xsi:type="dcterms:W3CDTF">2010-02</dc:date>
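If you want to sanity-check a date before pasting it in, a simple pattern test will catch the usual slips. This Python sketch covers only the three date-only forms mentioned above, not the full W3C note (which also allows times). The function name is my own, and it checks the format only, so an impossible date such as 2010-02-31 would still pass.

```python
import re

# YYYY, YYYY-MM or YYYY-MM-DD, with month 01-12 and day 01-31.
DATE_ONLY = re.compile(r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?")

def looks_like_w3cdtf_date(value):
    """Rough format check for the date-only W3CDTF forms."""
    return DATE_ONLY.fullmatch(value) is not None

for candidate in ["2010", "2010-02", "2010-02-14", "02/2010", "2010-13"]:
    print(candidate, looks_like_w3cdtf_date(candidate))
```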

The <creator> tag works as it is, but we add a little extra information: opf:file-as gives the form of the name that systems should use to sort and file the document, and opf:role says what role the creator played (‘aut’ means author). So our tag looks like this:

<dc:creator opf:file-as="Mansfield-Devine, Patricia" opf:role="aut">Patricia Mansfield-Devine</dc:creator>

In the references below, you’ll find lots more info about roles, and also the optional <contributor> tag.

Finally, in the <rights> tag we fix that copyright symbol problem by adding the &#169; character reference (the numeric code for ©), so it looks like this:

<dc:rights>Copyright &#169; Patricia Mansfield-Devine. All rights reserved.</dc:rights>
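To confirm that a reference like this really turns into the symbol you expect, you can ask an XML parser. A tiny Python sketch, with the dc: prefix left off the tag names purely to keep the example self-contained:

```python
import xml.etree.ElementTree as ET

# Unprefixed stand-ins for <dc:rights> and <dc:title> (a simplification).
rights = ET.fromstring(
    "<rights>Copyright &#169; Patricia Mansfield-Devine. All rights reserved.</rights>"
)
title = ET.fromstring("<title>Make Do &amp; Cook: Savvy Shopping</title>")

print(rights.text)  # the numeric reference is decoded to the copyright symbol
print(title.text)   # &amp; is decoded to a plain ampersand
```

The same decoding is what turns the &amp; in the <title> tag back into an ampersand when an e-reader displays it.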

And that’s it for the content.opf file.

In the final part, we’ll look at correcting the toc.ncx file and have a quick discussion about CSS.

Resources:

« Part 1 – Introduction
» Part 3 – Fixing the toc.ncx file and CSS

 
