Monday, September 13, 2010


It is the same dilemma that Shakespeare put forth.

DRM or No DRM. 

The world is abuzz with people wanting DRM stripped from eBooks. There is no single apparent reason why so many people do not want DRM, yet the noise is clearly in favour of having no DRM.

DRM potentially provides a lot more security to content in terms of copyrights and distribution. The major issue is portability. Today, if a person purchases an Adobe PDF or ePUB protected through ACS4 (Adobe Content Server 4) and downloads it to a Sony Reader, it cannot be ported to an iPad.

This is the cause of frustration. People do want protected content, but portability is the question. As the eBook market grows, many devices in many different versions are coming up. eBook consumers certainly do not want to purchase the same eBook more than once (unless there is a new edition).

What is the solution? The standards must support and provide for portability. If eBooks become portable, that is, if I can read the same eBook on a Sony Reader, Kindle, iPad and similar devices, as well as on Windows, Mac, Linux or any other OS, it is going to be a boon for the consumer.

After all, books are made for consumers.

Comments welcome

Saturday, September 4, 2010

ePUB the monster

Wow, ePUB has become a monster over the years. You don't agree? Why? Haven't you heard the noise about people finding ePUB difficult to understand? Read the reviews and previews, which are plentiful on the internet.

The Monster: For heaven's sake, everyone needs to understand this. ePUB is just a set of XHTML files packaged together. How it will look on this device or that one is not the question. The question is: do you really know your XHTML and CSS well?
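Because every ePUB chapter is an XHTML file, one basic check is that each file is well-formed XML, something web browsers often forgive but an XML-based reading system may not. A minimal sketch using Python's standard xml.etree.ElementTree; the function name is_well_formed and the sample chapters are hypothetical, and well-formedness is only a first step, not full XHTML or ePUB validation (a validator such as epubcheck goes much further).

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def is_well_formed(xhtml: str) -> bool:
    """True if the text parses as XML, a prerequisite for a valid XHTML chapter."""
    try:
        ET.fromstring(xhtml)
    except ET.ParseError:
        return False
    return True

good = (f'<html xmlns="{XHTML_NS}"><head><title>Chapter 1</title></head>'
        '<body><p>It was a dark and stormy night.</p></body></html>')
# The <p> is never closed: a browser guesses what was meant, an XML parser refuses.
bad = f'<html xmlns="{XHTML_NS}"><body><p>It was a dark night.</body></html>'

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```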

How do you do it? 
Understand that the content must look good. With proper styles. With proper structure. It's the same as designing a pBook; the difference lies in the limitations.

Best Way out? 
Use standards. 

Does it really matter that it looks the same as the pBook?
For me it hardly matters; the electronic medium has much more to it than design. If I read a book with a lot of index terms or bibliography references, I prefer correct hyperlinks, nicely styled paragraphs and clearly defined sections. That's it, the eBook is done.

I have always said this and I reiterate: devices will come and go, and what is true today will be obsolete tomorrow.

Think hard on this. 

Next post: XML workflow for publishing.

Friday, July 30, 2010

Multimedia eBooks

There has been a long-standing debate about whether multimedia should be included in eBooks or not! What exactly would multimedia in eBooks be? Audio? Then why not narrate the entire book as an audio book. Video? Then why not make a movie out of it, or put it on popular video streaming sites.

The very idea of multimedia eBooks poses a lot of unanswered questions.

File size?
What happens to the file size of the eBook? Including audio and video in an eBook will always cause an inherent boom in file size. How much would one want to compress the file? Heavier compression means loss of quality, and a drop in quality means the eBook will not sell. Another aspect is that most eBooks are now being created with handheld devices in mind. These handheld devices will mostly be connected through WiFi, which often means slower downloads. An increase in file size will only slow the process down and irritate readers.
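To put rough numbers on the file-size point: media size is bitrate multiplied by duration, and download time is size divided by connection speed. The sketch below assumes 10 minutes of video at 1 Mbit/s and a 2 Mbit/s connection; both figures are illustrative assumptions, not measurements, and the function names are hypothetical.

```python
def media_size_mb(bitrate_kbps: float, minutes: float) -> float:
    """Size in MB (10**6 bytes) of media at the given bitrate and duration."""
    bits = bitrate_kbps * 1000 * minutes * 60
    return bits / 8 / 1_000_000

def download_minutes(size_mb: float, link_mbps: float) -> float:
    """Minutes needed to download size_mb over a link of link_mbps Mbit/s."""
    return size_mb * 8 / link_mbps / 60

# Illustrative assumptions: 10 minutes of video at 1 Mbit/s, a 2 Mbit/s connection.
video_mb = media_size_mb(bitrate_kbps=1000, minutes=10)
print(f"Video adds {video_mb:.0f} MB")
print(f"Extra download time: {download_minutes(video_mb, link_mbps=2):.0f} minutes")
```

Under these assumptions, ten minutes of video adds about 75 MB and about five minutes of download time to a single book.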

Reading Experience?
Once multimedia is included, the reading experience will go for a toss. There will be no reading sequence; it will become a listening and watching sequence.

Multimedia eBooks are good for the academic and education sector, but it’s a small market. People are already creating multimedia education systems, so books are already out of that race.

Device Size?
Multimedia requires crystal-clear sharpness in audio as well as video. Are we looking at 17” handheld devices now? Like holding a monitor; I think it is going that way. Most of us don’t like small screens, which is why we have 42” LCD or LED televisions, don’t we? But we do want to have multimedia in books. Wow, a library of heavy-duty books on a big, compact HDD attached to a large handheld screen. I see it coming :)

Why is everyone rushing into multimedia eBooks? It’s a long way down. I see people going back to hammer and chisel to carve on rocks, because that is what remains. A good book is one with an excellent reading experience, NOT a listening or viewing experience. For that, there are audio books (DTBs, Digital Talking Books), and for video there are movies (created from books).

Content is very important; however, different media have different purposes. Let them be used for the purposes they are meant for. Leave books to be read rather than heard or seen :). There are many other ways of including multimedia with the book rather than within it. Sometime later I might shake off my laziness and express my views on that :)

This is entirely my opinion; you can very well disagree with it!

Tuesday, July 6, 2010

PDF - As I Understand It

Earlier I wrote about PDF and the types of PDF files that can be created. Today I want to elaborate on my understanding of PDF.

Ever since I became accustomed to PDF, I have wondered what "exactly is a PDF file", or to be more precise, what exactly is a "page in a PDF". I compared it to Microsoft Word, OpenOffice Writer and many similar applications. One fine day (this was in early 2002), in one of my conversations with my CEO (Versaware India Pvt. Ltd.), he mentioned "it's just like paper".

When I returned to my desk, I sat for a while just thinking about the statement. I realised he was absolutely right. It is "paper", similar to the paper on which we, day in and day out, keep printing stuff that is not essential. The only difference is that a PDF is electronic paper (nature friendly). Enough of the flashback; back to the real thing.

As per my understanding PDF consists of four layers.

1) Content Layer: This is the uppermost layer, which consists of text and/or images. This is the layer that is visible (mostly; I will come back to why I say mostly).
2) Inline Style Layer: This is the layer that decorates the content, the inline styles bold, italics, underline, superscript, subscript etc.
3) Content Style Layer: This is the layer that defines the structure of the content, the paragraph styles, fonts, font size etc.
4) Canvas Layer: This is the layer that defines the page size, the galley, the margins and the header and footer area.

Seriously, I never knew about this until we were experimenting with content extraction from PDF and I asked one of the programmers to extract as much information from the PDF file as possible. To my surprise, all of the above information is stored very systematically within the PDF. This information can be extracted, reused and repurposed along with the content itself.
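The kind of information that programmer found can be seen in a page's content stream. The toy sketch below works on a hand-written, uncompressed content stream and page dictionary; real PDFs usually compress their streams and use more complex text operators, so in practice a library such as pypdf or pdfminer.six would do this job. The regex approach, the sample data and the function names are assumptions for illustration only. It recovers the text (content layer), the font resource and size set by Tf (content style layer), the position set by Td, and the page's /MediaBox (canvas layer).

```python
import re

# Hand-written, uncompressed content stream for one page (toy example).
CONTENT = b"""BT
/F1 18 Tf
72 720 Td
(Chapter One) Tj
ET
BT
/F2 11 Tf
72 690 Td
(It was a dark and stormy night.) Tj
ET"""

# The page dictionary holds the "canvas": /MediaBox is the page size in points.
PAGE_DICT = b"<< /Type /Page /MediaBox [0 0 612 792] /Contents 4 0 R >>"

# Matches only the simple pattern: /Font size Tf  x y Td  (text) Tj
TEXT_RUN = re.compile(
    rb"/(\w+)\s+([\d.]+)\s+Tf\s+([\d.]+)\s+([\d.]+)\s+Td\s+\((.*?)\)\s*Tj", re.S
)

def extract_runs(stream: bytes) -> list[tuple[str, float, float, float, str]]:
    """Return (font, size, x, y, text) for each simple text run in the stream."""
    return [
        (m[1].decode(), float(m[2]), float(m[3]), float(m[4]), m[5].decode("latin-1"))
        for m in TEXT_RUN.finditer(stream)
    ]

def media_box(page_dict: bytes) -> list[float]:
    """Return the page's [x0, y0, x1, y1] MediaBox in points (1/72 inch)."""
    m = re.search(rb"/MediaBox\s*\[([\d.\s]+)\]", page_dict)
    if m is None:
        raise ValueError("no /MediaBox in page dictionary")
    return [float(v) for v in m.group(1).split()]

for font, size, x, y, text in extract_runs(CONTENT):
    print(f"{text!r}: font {font}, {size} pt, at ({x}, {y})")
print("Page size in points:", media_box(PAGE_DICT))
```

612 × 792 points is 8.5 × 11 inches, since PDF measures in points of 1/72 inch. Inline styles such as bold or italic usually appear as a switch to a different font resource (for example a bold face) rather than as a separate flag.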

It is very important to note that once a PDF is created you cannot do much with it; it is the same as a printed page. At most you can add some remarks, annotations or notes. Nothing much.

PDF is a very good way of storing content in an elegantly styled form.

Coming back to "mostly": there are some PDF files where the entire page is an image and the text content of the page is kept either in front of the image or behind it. If it is behind the image, the text will not be visible. This is mostly done to make an image PDF searchable.

I think content, if styled properly, can be extracted to HTML files, and this content can be used to create ePUB files. By styled properly I mean a page layout that is clearly defined and not too cluttered. This will help the ePUB look closer to the PDF. It is also important to note that making eBooks look pretty will not always help your books. Most devices just go ahead and destroy your layout.

Keep it simple! That's the best bet!


Tuesday, June 29, 2010

Standards, and Adhering to them

Many a time during various phases of my career, I have faced situations where I hated working with standards. Today, as I see it, standards are a must, be it eBook production, publishing or secure document management.

So many times I have had this question in mind: everyone is talking about the creation of eBooks and workflows, yet I have seen no one talking about Section 508 compliance, or about any certification of production for authenticity of content. For example, in Singapore any document that is digitized or digitally created, and needs to be produced in court, needs to be certified under the Evidence Act of Singapore.

In the US, documents produced for federal agencies need to comply with Section 508 so that the content is available to physically impaired people.

My question: is it that difficult to comply with standards? Today ePUB is sweeping everyone off their feet, yet it is a standard in itself.

So many people are talking about ePUB and the readers it can be read on, yet not one note have I seen about making it available to physically impaired people. I may be wrong, as I cannot read or see the billions of notes being made all over the world, but I am certainly not seeing them in the mainstream.
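One small, concrete piece of making an ePUB usable by visually impaired readers is giving every image a text alternative that a screen reader can speak. The sketch below, using Python's standard html.parser, flags img elements in an XHTML chapter that have no alt attribute at all; an empty alt="" is left alone because it is the usual way to mark a purely decorative image. The class and function names are hypothetical, and this is nowhere near a full Section 508 check.

```python
from html.parser import HTMLParser

class MissingAltFinder(HTMLParser):
    """Record the src of every <img> that has no alt attribute at all."""

    def __init__(self) -> None:
        super().__init__()
        self.missing: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> tags also reach here: the default
        # handle_startendtag calls handle_starttag.
        if tag == "img":
            attributes = dict(attrs)
            if "alt" not in attributes:
                self.missing.append(attributes.get("src") or "(no src)")

def images_without_alt(xhtml: str) -> list[str]:
    finder = MissingAltFinder()
    finder.feed(xhtml)
    finder.close()
    return finder.missing

chapter = """<p>
  <img src="map.png" alt="Map of the journey"/>
  <img src="figure2.png"/>
  <img src="divider.png" alt=""/>
</p>"""
print(images_without_alt(chapter))  # ['figure2.png']
```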

For me, compliance with all standards is very important. Today some standards have been made flexible for certain sectors; people should take advantage of this and increase the audience/readership for their works.

Standards are the way to go.... for me at least.

Saturday, June 26, 2010

ePUB Packaging

Umpteen times I have seen, heard or read questions about how to package an ePUB.

The Wikipedia article on ePUB details the format as well as the packaging.

However, what it does not answer is this question: I selected all my files, created a zip out of them and renamed it to .epub, but it still doesn't work. Why?

Here is the structure of the ePUB package (right out from the Wikipedia article).

--ZIP Container--
  mimetype
  META-INF/
    container.xml
  OEBPS/
    content.opf
    toc.ncx
    chapter1.xhtml
    css/
      style.css
If you have created a zip with all the above components and it still does not display as an ePUB file, these are the things that have probably gone wrong:

1) You have created a zip of the folder containing the files. If this is the case, go inside the folder, select all the files and create the zip.
2) You have created the zip file correctly, yet it still does not display as an ePUB file. Perform these steps:
  • Ctrl + A to select all
  • Ctrl + click once on mimetype to de-select it
  • Ctrl + click mimetype again to select it
  • then create the zip
Whenever you open the zip file using WinZip or WinRAR, mimetype must be the first file, and it must be stored without compression. And yes, don't zip the folder; zip the contents of the folder (after making sure all the necessary components, i.e. the files and the content within them, are as per the standard).
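The same packaging rules can be applied without clicking through WinZip. A minimal sketch using Python's standard zipfile module: it writes mimetype first, with the exact content application/epub+zip and no compression, then adds everything else inside the book folder with paths relative to that folder, so the folder itself is never zipped. The function name build_epub and the folder layout are assumptions, and it does not check that the files themselves follow the standard.

```python
import zipfile
from pathlib import Path

MIMETYPE = b"application/epub+zip"  # exact bytes, no trailing newline

def build_epub(book_dir: str, epub_path: str) -> None:
    """Zip the *contents* of book_dir into an ePUB file at epub_path."""
    root = Path(book_dir)
    with zipfile.ZipFile(epub_path, "w") as zf:
        # mimetype must be the first entry in the archive and stored uncompressed.
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        # Everything else (META-INF/container.xml, OEBPS/..., etc.) may be deflated.
        for path in sorted(root.rglob("*")):
            arcname = path.relative_to(root).as_posix()
            if not path.is_file() or arcname == "mimetype":
                continue  # skip directories and any existing top-level mimetype file
            # arcname is relative to book_dir, so the folder name is not included.
            zf.write(path, arcname=arcname, compress_type=zipfile.ZIP_DEFLATED)
```

Usage would be something like build_epub("mybook", "mybook.epub"), where mybook holds META-INF/ and OEBPS/. If the folder already contains a mimetype file, it is skipped and regenerated, so it can never end up in the wrong position or with stray whitespace.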


Saturday, January 9, 2010

PDF - Yesterday, Today and Tomorrow

Back again... I seriously did not get time, but now that I have, I want to present my ideas and thoughts on PDF.

With so many electronic content formats available around the globe, PDF has been compared to all of them since the 1990s. Many people have posted the view that PDF is not a format that is sustainable in the long run. From the very start PDF has been compared with the likes of Softbook, OeB and, currently, ePUB. People fail to realise that it is not just an eBook format.

People also fail to realise that no other format can replicate the paper version of any content the way PDF can. PDF is not just a format; it is a standard in itself. I would rather not compare it with the other eFormats available for electronic distribution.

For me, PDF is a universal format. I have been asked numerous times what types of PDFs are available or can be created.

For me there are two major types of PDF files in terms of structure, each further divided into subtypes:
  1. Scan PDF
  2. Text PDF
Scan PDF or Bitmap PDF: This is where the PDF is created from image files. These could be scanned images of paper or digitally photographed content. In this PDF, the content on the pages is not searchable. This type is further subdivided into:
  • Printable (POD PDF)
    This PDF is used to create the print version of the content to a particular standard, and is used mainly for commercial production of content. The images used in this PDF are high resolution for good-quality reproduction on paper.
  • Non-Printable Scan PDF
    This PDF is generally used only to retain a digital copy of the content in image format. It cannot be used for commercial printing, as the quality of reproduction on paper is not as good. However, it can be used for reasonable-quality printing that is not meant to be commercially sold.
  • On-Line Scan PDF
    This PDF is typically created for online viewing or easy downloads from the internet. These PDF files have very low-resolution images, which render easily and quickly on PCs and other devices. The sole purpose of this PDF is to make content available online, or in a format that is easily distributable, but not printable.
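The difference between these three kinds of scan PDF is mostly resolution, and simple arithmetic shows why it matters: pixel dimensions are page size in inches times dots per inch, so raw image data grows with the square of the resolution. The sketch assumes a US Letter page (8.5 × 11 inches), uncompressed 24-bit colour, and 300, 150 and 72 dpi as typical values for the printable, non-printable and on-line variants; these dpi figures are common rules of thumb, not part of any PDF standard, and the function names are hypothetical.

```python
def page_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)

def raw_rgb_mb(width_in: float, height_in: float, dpi: int) -> float:
    """Uncompressed size in MB (10**6 bytes) of a 24-bit RGB scan of the page."""
    w, h = page_pixels(width_in, height_in, dpi)
    return w * h * 3 / 1_000_000

# Assumed resolutions for the three scan-PDF variants (rules of thumb only).
VARIANTS = {"Printable (POD)": 300, "Non-Printable": 150, "On-Line": 72}

for name, dpi in VARIANTS.items():
    w, h = page_pixels(8.5, 11, dpi)
    print(f"{name:16} {dpi:3} dpi: {w} x {h} px, {raw_rgb_mb(8.5, 11, dpi):.1f} MB raw")
```

At 300 dpi the page is 2550 × 3300 pixels, about 25 MB of raw image data; at 72 dpi it is 612 × 792 pixels, under 1.5 MB. Image compression inside the PDF shrinks both figures, but the gap between them explains why an on-line scan PDF downloads quickly and prints poorly.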

Text PDF: In this PDF file the text is searchable, although images are still not searchable. The content is usually highly structured or styled; however, nothing stops the user from creating a Text PDF from unstructured text.
  • Printable (POD PDF and Traditional Print)
    This PDF is used for commercially viable printing. Hence it is generally used for creating books, magazines, journals, newspapers etc. that can be distributed in print format. The content in this PDF is highly structured and styled, mainly to appeal to the reader and look good.
  • Non-Printable Text PDF
    This PDF cannot be used for commercial printing, but it is very good for distributing searchable content. In this PDF the structure or style does not carry much importance, as the reach of this PDF is very low.
  • On-Line Text PDF
    This PDF is typically created for online viewing or easy downloads from the internet. These PDF files have very low-resolution images, which render easily and quickly on PCs and other devices. The sole purpose of this PDF is to make content available online, or in a format that is easily distributable, but not printable. In this PDF the structure and style may have relatively high value, as these can be the online versions of commercially printed content such as books, magazines, journals, newspapers etc.
There is another format of PDF with a very different purpose: the Text Under/Over Image PDF. In this PDF the main content is rendered or captured as pages made of images, but a text layer is introduced either under or over the image of each page. The purpose is to retain the structure and style of the content as is, while also making it searchable. The stated reason is that the structure or style of the original content needs to be retained, but the underlying reason is cost: recreating the content structure with Text PDF creation methods is far more expensive than creating a Scan PDF with a layer of text under or over it.
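A common way to build the text-over-image variant is to paint the scanned page image and then draw the recognised text over it in PDF text rendering mode 3 (3 Tr), which neither fills nor strokes the glyphs, so the text is invisible yet still selectable and searchable. The sketch below only builds such a page content stream as bytes; it is not a complete PDF file, and the function name, the image resource /Im1, the font resource /F1 and the word coordinates are all hypothetical. In a real file those resources must also be declared in the page's resource dictionary.

```python
def ocr_page_stream(image: str, page_w: float, page_h: float,
                    words: list[tuple[str, float, float]],
                    font_size: float = 10) -> bytes:
    """Build a page content stream: scanned image first, invisible text on top."""

    def pdf_string(text: str) -> str:
        # Escape the characters that are special inside a PDF literal string.
        return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    ops = [
        "q",                                  # save graphics state
        f"{page_w} 0 0 {page_h} 0 0 cm",      # scale the unit-square image to the page
        f"/{image} Do",                       # paint the scanned page image
        "Q",                                  # restore graphics state
        "BT",                                 # begin text object
        "3 Tr",                               # render mode 3: invisible text
        f"/F1 {font_size} Tf",                # hypothetical font resource and size
    ]
    for text, x, y in words:
        # Tm positions each word absolutely, where the OCR engine found it.
        ops.append(f"1 0 0 1 {x} {y} Tm ({pdf_string(text)}) Tj")
    ops.append("ET")                          # end text object
    return "\n".join(ops).encode("latin-1")

stream = ocr_page_stream("Im1", 612, 792, [("Chapter", 72, 720), ("One", 130, 720)])
print(stream.decode("latin-1"))
```

For the text-behind-image version, the order is simply reversed: draw the text first and the image afterwards, so the opaque image hides it.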

Secure PDF
Content security is of utmost importance. Even free-to-distribute content carries rights. A secure PDF can be created at the time the PDF itself is created, or by using third-party DRM servers. Each level of security has its own purpose and carries a certain amount of importance.

PDF as Archive Standard
PDF/A is a standard that defines the requirements for creating PDF files suitable for long-term archiving. There are two sub-standards:
  • PDF/A-1a - Level A compliance
  • PDF/A-1b - Level B compliance
More detailed information about these standards is available in the PDF/A-1 specification (ISO 19005-1).