Monday, August 17, 2009

What I learned making an epub eBook

I happen to own an Amazon Kindle and a SONY Reader. I've taken it upon myself to figure out how to make every eBook that I own readable on both devices. Sadly, the only eBook format that works on both devices is PDF. But PDF expects a specific page size, and my readers have different display sizes. So I've decided to convert everything for my Amazon Kindle to MOBI format and everything for my SONY Reader to epub format.

Removing DRM from eBooks is a violation of Federal law, or maybe just talking about it is. So I've been messing about with Gutenberg texts and other freely available texts. A couple of days ago I stumbled upon a CD-ROM full of Puritan texts, picked one, and decided to convert it into an eBook. I intended to use Calibre to convert the eBook from epub to MOBI, but first I'd have to make a good-looking epub file.

Before I even started, I learned that the web was a better source than the CD-ROM: the CD-ROM's copy of the book was a PDF, and it was easier to start with this HTML version.

The first thing I learned was that WinZip can read epub files, because an epub file is just a zip file with a different extension. Put all the HTML files in a zip, rename the file's extension to .epub, and add a few "extras." (I'll come back to this.)
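Because an epub is an ordinary zip archive, any zip library can look inside one, not just WinZip. Here is a minimal sketch using only Python's standard `zipfile` module; `epub_members` and `epub_mimetype` are hypothetical helper names of my own, not part of any epub tool.

```python
import io
import zipfile


def epub_members(source):
    """Open an epub exactly like any other zip archive and list what is inside."""
    with zipfile.ZipFile(source) as zf:
        return zf.namelist()


def epub_mimetype(source):
    """Read the 'mimetype' entry; a well-formed epub holds 'application/epub+zip' there."""
    with zipfile.ZipFile(source) as zf:
        return zf.read("mimetype").decode("ascii").strip()


if __name__ == "__main__":
    # Build a tiny throwaway archive in memory so the example runs on its own.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("chapter1.xhtml", "<html/>")
    print(epub_members(buf))
    print(epub_mimetype(buf))
```

With a real file you'd just pass the path, e.g. `epub_members("book.epub")`.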

The second thing I learned was that an epub file may look good in Calibre and on the Kindle but fail miserably on the SONY Reader. So the next step was to figure out how to validate the epub, and I found this site helpful.

The validation process told me the obvious: convert the HTML to XHTML, mostly by changing every <br> element to <br/> and every <hr> element to <hr/>. The error messages eventually pointed me to all but one of the fixes I needed to make.
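Those two substitutions are easy to script. The sketch below is one way to do it in Python with a regular expression; `close_void_tags` is a hypothetical helper name, and the pattern is deliberately simple (it assumes no `>` appears inside an attribute value). Other empty elements, such as `<img>`, need the same treatment in XHTML, so you may want to add them to the `(br|hr)` alternation.

```python
import re

# Matches <br>, <BR>, <br />, <hr width="50%"> and similar, capturing the tag name
# and any attributes. \b keeps it from matching longer tags such as <brand>.
_VOID_TAG = re.compile(r"<(br|hr)\b([^>]*?)\s*/?>", re.IGNORECASE)


def close_void_tags(html):
    """Rewrite <br>/<hr> tags as lower-case, self-closed XHTML tags (<br/>, <hr/>)."""
    return _VOID_TAG.sub(lambda m: "<%s%s/>" % (m.group(1).lower(), m.group(2)), html)
```

Tags that are already self-closed come through unchanged, so it's safe to run the function over a file more than once.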

If your HTML contains any illegal characters, e.g. an accented 'e' like this one, é, you'll get absolutely no help figuring out what the bad character is or where it occurs. Instead you'll get a useless error message like "I/O error reading" without any clue as to where or what the problem is. The fix is to convert each such character to an escaped version, e.g. &#233;.
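Since the validator won't tell you where the offending character is, it can help to hunt for it yourself. This Python sketch (the helper names are hypothetical) reports the line and column of every non-ASCII character and then replaces each one with a numeric character reference such as &#233;. It assumes the file has already been read into a string using the right encoding for the source, e.g. `open(path, encoding="utf-8").read()`.

```python
def find_non_ascii(text):
    """Yield (line, column, character) for every non-ASCII character, 1-based."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield lineno, col, ch


def escape_non_ascii(text):
    """Replace every non-ASCII character with a decimal reference such as &#233;."""
    # The 'xmlcharrefreplace' error handler emits &#NNN; for anything ASCII can't encode.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

Escaping everything outside ASCII is blunt, but numeric references are valid in XHTML text and attribute values, so the result no longer depends on the file's encoding.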

Once you have an epub that passes validation, you're not done, because the SONY has a limitation: it can't handle any single chapter that's more than 100 KB in size. So you'll have to split the content into pieces smaller than that.
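One way to stay under the limit is to split on paragraph boundaries. Here is a minimal Python sketch, assuming you've already broken a chapter into a list of paragraph strings; `split_paragraphs` is a hypothetical name, and the default of 100,000 bytes is my conservative reading of "100 KB". Each chunk still has to be wrapped in its own XHTML file, so you may want a smaller limit to leave headroom for the `<html>`/`<head>` boilerplate.

```python
# Conservative reading of the 100 KB ceiling: 100,000 bytes rather than 102,400.
DEFAULT_LIMIT = 100000


def split_paragraphs(paragraphs, limit=DEFAULT_LIMIT):
    """Group paragraph strings into chunks whose UTF-8 size stays strictly below `limit`.

    Paragraphs are never cut in half, so a single paragraph that is too big raises ValueError.
    """
    chunks, current, size = [], [], 0
    for para in paragraphs:
        n = len(para.encode("utf-8"))
        if n >= limit:
            raise ValueError("a single paragraph is %d bytes, over the limit" % n)
        if current and size + n >= limit:
            # Adding this paragraph would reach the limit: close the current chunk.
            chunks.append(current)
            current, size = [], 0
        current.append(para)
        size += n
    if current:
        chunks.append(current)
    return chunks
```

For example, `split_paragraphs(paras, limit=90000)` reserves roughly 10 KB per file for markup.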

Let's suppose you've got a set of cherry XHTML files in a zip file. It's still not an epub file until you've added the extras I mentioned above. You'll need to add files named:

1) mimetype, which holds the text "application/epub+zip"
2) toc.ncx, which defines the table of contents
3) content.opf, which lists the contents of the epub
4) META-INF/container.xml, which points to content.opf
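To make those four pieces concrete, here is a minimal sketch that writes a bare-bones epub 2 archive with Python's standard library. The layout (everything except container.xml at the top level), the helper name `build_epub`, and the hard-coded language `en` are my assumptions; Calibre and other tools arrange things differently. One detail worth knowing: the mimetype entry must be the first file in the zip and must be stored uncompressed, which is why it's written separately.

```python
import zipfile
from xml.sax.saxutils import escape, quoteattr

# META-INF/container.xml: tells the reader where content.opf lives.
CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# content.opf: metadata, a manifest of every file, and the reading order (spine).
# The language is hard-coded to "en" for this sketch.
OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="BookId">{uid}</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
{items}
  </manifest>
  <spine toc="ncx">
{itemrefs}
  </spine>
</package>
"""

# toc.ncx: the table of contents the reader displays.
NCX = """<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head>
    <meta name="dtb:uid" content={uid_attr}/>
    <meta name="dtb:depth" content="1"/>
    <meta name="dtb:totalPageCount" content="0"/>
    <meta name="dtb:maxPageNumber" content="0"/>
  </head>
  <docTitle><text>{title}</text></docTitle>
  <navMap>
{navpoints}
  </navMap>
</ncx>
"""


def build_epub(target, title, uid, chapters):
    """Write a minimal epub 2 archive to `target` (a filename or binary file object).

    `chapters` is a list of (label, filename, xhtml_text) tuples in reading order.
    """
    items, itemrefs, navpoints = [], [], []
    for i, (label, name, _body) in enumerate(chapters, start=1):
        items.append('    <item id="ch%d" href=%s media-type="application/xhtml+xml"/>'
                     % (i, quoteattr(name)))
        itemrefs.append('    <itemref idref="ch%d"/>' % i)
        navpoints.append(
            '    <navPoint id="np%d" playOrder="%d"><navLabel><text>%s</text></navLabel>'
            '<content src=%s/></navPoint>' % (i, i, escape(label), quoteattr(name)))
    opf = OPF.format(title=escape(title), uid=escape(uid),
                     items="\n".join(items), itemrefs="\n".join(itemrefs))
    ncx = NCX.format(title=escape(title), uid_attr=quoteattr(uid),
                     navpoints="\n".join(navpoints))
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry must come first and be stored without compression.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER)
        zf.writestr("content.opf", opf)
        zf.writestr("toc.ncx", ncx)
        for _label, name, body in chapters:
            zf.writestr(name, body)
```

Something like `build_epub("book.epub", "My Book", "my-book-001", [("Chapter 1", "ch1.xhtml", xhtml)])` produces a file you can then run through a validator, which is still the real test.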

These extras were a little intimidating to dream up from scratch, so I cheated. I used Calibre to convert a tiny PDF to epub. Then I started replacing and extending the parts and pieces until the tiny PDF's content had been replaced with the desired book's content. Moving step by step through the various files, I could study each change in isolation and get an idea of why things worked or didn't.

In the end, if you're going to mess about with Gutenberg eBooks, you really want to put in the extra effort to make them look pretty. That means googling around to find a picture of the book's cover. Or, if you're artsy, designing your own cover. Or, if you've got the book in dead-tree format, scanning the cover. Then there's the business of setting up the table of contents; I think you'll want to aim for one that fits on a single screen. Finally, you'll want to properly identify the book's publisher and ISBN. I usually look up the book on Amazon and copy whatever metadata I find there. Quality is a matter of attention to detail.
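The publisher, ISBN and cover all end up in the `<metadata>` section of content.opf. Here's a sketch of a Python helper (the name `opf_metadata` is hypothetical) that emits that section in epub 2 (OPF 2.0) form. The `opf:scheme="ISBN"` attribute and the `aut` role are OPF 2.0 idioms, and `<meta name="cover">`, which points at the cover image's manifest id, is a widely used convention rather than something every reader honors, so check the result on your own device. The values you pass in are whatever you copied from Amazon or elsewhere.

```python
from xml.sax.saxutils import escape, quoteattr


def opf_metadata(title, creator, publisher, isbn, cover_id=None, language="en"):
    """Return an OPF 2.0 <metadata> block carrying title, author, publisher and ISBN.

    The identifier gets id="BookId", which assumes the <package> element declares
    unique-identifier="BookId". `cover_id` is the manifest id of the cover image, if any.
    """
    lines = [
        '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"'
        ' xmlns:opf="http://www.idpf.org/2007/opf">',
        "  <dc:title>%s</dc:title>" % escape(title),
        '  <dc:creator opf:role="aut">%s</dc:creator>' % escape(creator),
        "  <dc:publisher>%s</dc:publisher>" % escape(publisher),
        '  <dc:identifier id="BookId" opf:scheme="ISBN">%s</dc:identifier>' % escape(isbn),
        "  <dc:language>%s</dc:language>" % escape(language),
    ]
    if cover_id:
        lines.append('  <meta name="cover" content=%s/>' % quoteattr(cover_id))
    lines.append("</metadata>")
    return "\n".join(lines)
```

Using the ISBN as the book's unique identifier is a choice, not a requirement; a UUID works just as well if the edition has no ISBN.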

This is my latest foray into the realm of "bookmaking" and I know I've got a lot of learning to do.
