Page 1          Page 2
..              Paragraph.. %
..              ..text.     %
..              ..
..              ..
Heading !
When instead it should be moved to the top of the next page:
Page 1          Page 2
..              Heading !
..              Paragraph.. %
..              ..text.     %
..              ..
..
Rather than being honest about needing one line…
Heading !
Paragraph.. %
..text. %
…the heading could instead claim it needs three lines, which would ensure it would never be orphaned:
Heading !
        !
        !
Paragraph.. %
..text. %
But now you have a big gap below the heading. If you could then shift the paragraph up from where it should be in the flow, so that the vertical space of the heading and paragraph overlapped…
Heading !
Paragraph.. !%
..text. !%
…then you’d get a heading that would never be orphaned on one line, but which looked as if it only used one line.
.heading { margin-bottom: 30px; }
.paragraph { margin-top: -30px; }
Also:
> the heading could instead claim it needs three lines, which would ensure it would never be orphaned
It wouldn't prevent the heading from being orphaned at this point in your idea. What if there was room for 3 lines at the end of the first page?
h2 { padding-bottom: 3em; }
h2 + p { margin-top: -3em; }
The problem is that CSS doesn't support the lh unit yet, so what's 3em here should actually be three times the height of a line, taking into account the font-family, the font-size, etc. Except maybe if line-height is explicitly set.
The problem is that ebook CSS does not support what you'd expect even a slightly out-of-date browser to support, and almost every ebook renderer has its own problems.
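Back to the heading trick: below is a hedged sketch of one way to write it so the two values always cancel. It hasn't been tested against any particular ebook renderer. Two things differ from the snippets above. First, the reservation uses `padding` on the heading. A positive bottom margin on the heading and a negative top margin on the next paragraph are adjoining margins, so they just collapse to zero. Second, both values use a root-relative unit. `em` resolves against each element's own font-size, so 3em on a larger `h2` is more than 3em on the following `p`. The sketch assumes body text at `1rem` with a line-height of 1.5, and uses the `rlh` unit (root line height, from CSS Values and Units Level 4) only where it is supported.

```css
/* Sketch: the heading claims its own line plus two more lines of body text,
   then the following paragraph is pulled back up into that space.
   Assumes body text is 1rem with a unitless line-height of 1.5. */
:root {
  line-height: 1.5;
}

h2 {
  margin-bottom: 0;          /* keep margin collapsing out of the arithmetic */
  padding-bottom: 3rem;      /* fallback: 2 lines x 1.5rem */
  padding-bottom: 2rlh;      /* 2 root line heights, where rlh is supported */
}

h2 + p {
  margin-top: -3rem;         /* must match the padding above */
  margin-top: -2rlh;
}
```

Under standard CSS error handling, a renderer that doesn't recognise `rlh` should drop only that declaration and keep the `rem` value. Renderers that are stricter than the spec may not behave this way.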
Think HTML-email-sized problems.
There should be some sort of Acid test for email readers as well, of course.
But, IMO, all of them should be limited so as to avoid JS use (think of how well some sites are capable of rendering without JS [0])
https://jsfiddle.net/bqjsu98o/
However, the next problem is: how do you tell the print renderer that paragraphs must have a minimum of 4 lines of text?
(Edit, hah! https://developer.mozilla.org/en-US/docs/Web/CSS/orphans )
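For reference, here is a minimal sketch of the relevant CSS Fragmentation properties. Support varies a lot between ebook renderers and browsers, so treat these as requests to the renderer rather than guarantees. Note that `orphans` and `widows` don't force a paragraph to be at least four lines long. They set the minimum number of lines that may be left at the bottom of a page, or carried to the top of the next, when a paragraph is split.

```css
/* When a paragraph is split across pages, leave at least 4 lines
   at the bottom of the first page and carry at least 4 to the next. */
p {
  orphans: 4;
  widows: 4;
}

/* Ask the renderer not to break the page straight after a heading. */
h1, h2, h3 {
  page-break-after: avoid;   /* legacy name, for older renderers */
  break-after: avoid;
}

/* Keep figures in one piece. */
figure {
  page-break-inside: avoid;
  break-inside: avoid;
}
```

With both values at 4, a paragraph shorter than eight lines can't be split without violating one of them, so in principle it moves to the next page whole.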
https://ebooks.stackexchange.com/questions/7014/how-can-i-pr...
—
¹ As in, it's unavoidable if you've implemented layout in such a way.
On the positive side, on proper clients it will approximately do nothing.
The problem is changes in aspect ratio or wanting to reflow text to have larger font when rendering to small screen.
Not sure if it's doable to generate flexible enough rendering commands in PDF. It was doable in PostScript (again, FSVO), but PDF was explicitly made to be decidable and less Turing-complete.
A given PDF, once created, is fixed to whatever page size it was created to represent.
Acrobat claims to be able to "reflow" some PDFs, but whatever it is doing is more black magic voodoo than anything else, given how text layout is represented inside the PDF format.
The text layout becomes (essentially) a stream of instructions to "move current point to 123,456" and "place this text at current point". And the numerical values used are tied directly to the page size and internal "point size" used by the PDF writing software. I.e., the text is simply "physically positioned" on a virtual x,y canvas.
Kindle, the reading device with by far the largest market share, is basically the IE6 of ereaders - too big to ignore, and at the same time dragging down the entire ebook ecosystem with its crappy renderer. Amazon has shown little interest in improving it for over a decade now, while simultaneously fragmenting its own ecosystem with a variety of different proprietary formats that support different CSS and features.
ADE, while less common in new devices, is still very common in much older devices - B&N's eink Nooks were based on ADE at least as late as a few years ago. (Perhaps they still are?) ADE is closer to IE5 in terms of CSS support!
At Standard Ebooks we're often hamstrung in our attempts to make beautiful ebooks by these big players refusing to improve their renderers. We're forced to dumb down our CSS and use outdated techniques (like occasionally having to use tables for layout!) because ebook renderers are so bad.
iBooks is the top tier renderer, because as far as I can tell it's basically a wrapper for an up-to-date Webkit; next is Kobo - also Webkit-based - along with other Webkit-based indie apps. The rest of the big players are far, far, far distant.
<html>
<head><title>
Can I Use ...?
</title></head>
<body>
No.
</body>
</html>
Which was great when it came out, but is comparable to the HTML 2.0 era.
https://www.amazon.com/Ebook-Formatting-KF8-Mobi-EPUB-ebook/...
Kerning? Yes. Ligatures? Of course. (Adjustable) hyphenation? Absolutely. Line breaking that's more complex than the first-fit that web browsers use? Well, yes, I think they may have borrowed it from InDesign.
And then it's let down by its awful CSS support. No `font-variant: small-caps` for you. And your CSS had better be valid, or it will be completely ignored - that includes `!important` by the way.
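A common workaround where `font-variant: small-caps` isn't honoured is to fake it. You mark up the letters that should appear as small caps, then uppercase and shrink them. The class name below is hypothetical, and the size is a guess that depends on the typeface. Faked small caps also look lighter than true ones, since they're just scaled-down capitals. Given the warning above about invalid CSS, it's worth running any such stylesheet through a validator.

```css
/* Hypothetical class wrapping text that should appear in small caps,
   e.g. <span class="sc">nasa</span>. */
.sc {
  text-transform: uppercase;   /* render the letters as capitals...          */
  font-size: 0.8em;            /* ...scaled down towards lowercase height     */
  letter-spacing: 0.05em;      /* small caps usually want a little tracking   */
}
```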
Adobe essentially abandoned the RMSDK renderer, which is a real shame, because with better CSS support, it would still kick-ass.
As an aside on Kobo ereaders, they use RMSDK to open standard EPUB ebooks. Only their "Kobo EPUB" format is rendered using a WebKit-based engine.
Leave the pocketbook size ereaders for the pocketbooks.
When I wrote my PhD thesis, I made some effort to follow good typography practices (I thought that since I had devoted so many hours to the work, it would make sense to devote some time to make it look as good as possible... and I guess it was also a way of procrastinating from writing content) :)
Various people who "don't pay attention" told me that it caught their attention for being really nice visually, even if they couldn't pinpoint the concrete differences that led to that.
Layout is an art and a craft, and the fact that it's automated by people who lack the specialized knowledge, or for whom it is not a priority (quarter-century-old bug reports, really?), suggests that in 2025 you should still avoid ebooks if you care about quality and aesthetics.
This is a shame because e-ink is just becoming usable. Anyhow, long live the paper book!
- It's almost impossible to align the baseline of the formula to the baseline of the surrounding text.
- Often, images are only used for "complex" formulae, while simple ones are implemented using normal typesetting. This resolves the baseline issue for simple formulae, but now the fonts between simple and complex formulae don't match. (This requires extra concentration for the reader, as in other contexts, different font styles are frequently used meaningfully.)
- The images often have sub-par resolution.
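Where formulas have to be images, some of the baseline and resolution problems can be reduced. Use SVG sized in `em`, so the image scales with the reader's chosen font size, and shift it down by the formula's depth below the baseline with `vertical-align`. This assumes the tool that produced the image reports that depth. MathJax's SVG output, for example, sets a `vertical-align` in `ex` on each formula. The class name and numbers below are placeholders, not values to copy.

```css
/* Hypothetical class for an inline formula exported as SVG.
   Sizes are in em so the image tracks the surrounding text size;
   in practice height and depth differ per formula (e.g. via inline styles). */
img.formula-inline {
  height: 1.2em;             /* placeholder: total height of this formula */
  width: auto;               /* preserve the aspect ratio */
  vertical-align: -0.3em;    /* placeholder: depth below the text baseline */
}
```

This doesn't fix the font mismatch between image-based and text-based formulas. It only keeps the images aligned and legible when the text size changes.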
A fixed-pixel-size image of something you want to read does not go well together with rendered text at all. It's okay for photos, but very definitely not for formulas, which are basically text.
And that's before even getting to the aesthetics, the mismatched fonts (the text font can be set by the user, the one baked into the image can't), and the layout, since what's inside the image is fixed and untouchable by the renderer that handles all the rest of the text.
Now I'm just waiting for the inevitable "AI image scaler" that handles text inside images.
I'm surprised this isn't a thing already, as it seems doable with what people called "AI" 20 years ago. I mean, unless some unusual/non-default font was used, upscaling text on an image should be almost trivial. Ligatures notwithstanding, "printed letters" have a fixed shape, so:
1. Identify the typeface, size, weight, etc. by looking at the pixels of the text;
2. OCR the text (which should be 100% reliable);
3. Blank out the original text pixels; re-render the content (from step 2.) at a larger size (using parameters from step 1.).
I'm hedging here; it feels to me that OCR-ing normal text that never left the digital realm should be 100% reliable, but I'm not a specialist in that subfield so I surely must be missing something...
Word Lens was demonstrated in late 2010.
A string set in a given font at a given size won't always render as a fixed pattern of pixels. The font describes the curves of the letter forms and how that's rasterized depends on lots of factors such as the zoom level, exactly how the font rendering engine is implemented, whether or not anti-aliasing is turned on which is further complicated by the fact that the text can be set in any color with any other color as a background, etc. And there are a LOT of fonts.
Lastly, OCR is not just about recognizing letter shapes but has to contend with how the text flows. It has to understand line-breaks, multi-column layouts, captions, pulled quotes, page-numbers, hyphenation and all the other weird shit that we make text do.
If no vector formats are supported, use PNG or another "lossless" format, not JPEG. JPEG's compression is designed for photos, where the probability of two neighbouring pixels being the same is tiny. Note that PNG doesn't have to be lossless - if you want to shrink the file size you can reduce the resolution or the colour space.
Even GIF is a much better choice than JPEG for a diagram, mathematical formula or logo with hard edges and a small number of colours. SVG is usually the right choice (but don't do what one designer did for me and embed a JPEG in an SVG instead of giving me an SVG direct from Illustrator or Inkscape).
[0] http://koreader.rocks/
[1] https://github.com/notmarek/LanguageBreak
It'll keep updating itself as long as it's powered on, even if you haven't used it in months, and there's no telling how long it'll take for current firmware versions to be supported: the latest jailbroken version is 17 months old.
https://kindlemodding.gitbook.io/kindlemodding/getting-start...
Simple math, maybe. But for anything complex, every other symbol would need a completely different way of being expressed, if your aim is to make it more readable for newcomers, that is.
I imagine blackboards and chalk will be used in advanced mathematics for a few centuries yet.
Fortunately, some folks did work to preserve the craft and beauty of books --- Dr. Donald Knuth taking a decade off from writing _The Art of Computer Programming_ to create TeX (though initially he thought he'd do it over a sabbatical) is one shining example.
Robert Bringhurst's writing of _The Elements of Typographic Style_ also made a huge difference (I've lost count of how many copies I've given as gifts to folks).
A further issue is that doing a good page layout over an entire chapter (or book if the pagination is continuous) is an NP-hard problem --- I've had a chapter come out correctly on a first pass exactly once in my career (fastest 40 minutes of my life). The usual workflow is something like:
- check all characters to ensure that hyphens are properly set, en and em dashes replace them where appropriate, and correct the setting of any instances of what should be special characters such as primes or double primes
- assign all formatting and ensure that all heads and paragraphs have settings which will forbid widows/orphans and verify that the callouts for all figures/photos/tables are correct
- review the entire chapter from beginning to end, page by page, verifying that each ends as it should at the bottom of the page, and that a referenced element shows on that page spread
- for instances where things don't work out, check to see which paragraphs can be adjusted to run longer or shorter by one or more lines, adjusting this until one finds a set of adjustments which results in a proper appearance for the page/spread --- repeat for all future pages --- if a particular spread/figure placement is a problem, back up and see if changing previous pages will fix it --- check the last page to ensure that it is full enough, if not, adjust previous spread, if that doesn't work, see if running the entire chapter long or short by a line will fix it.
- review the entire chapter again to ensure that there are no bad breaks or stacks, add discretionary hyphens or non-breaking spaces or adjust paragraph settings as necessary, ensuring that pages still base-align
If someone wants to write an ePub reader or page formatter which can do that, I'd be glad to see it.
1. Have a background image that looks like paper. No, a solid white or tan background doesn't look like paper. Paper has imperfections in it, dirt, a grain, and you can faintly see the other side of the page. Except for the latter, this is easy to achieve. Simply scan a bunch of blank pages, and use the scan for backgrounds!
2. No, having a background with a fake coffee stain ring doesn't work, because it's the same on every page. You need a few dozen pages, each with a different stain.
3. A printed page is imperfect. The letters can be uneven and blotchy. No, don't have a blotchy font. Have maybe 20 slightly different versions of the same font, and randomly select a glyph from one of them.
4. Books open to two pages. Not one. Two. The ereader should show two pages side by side, like a book.
5. Book fonts tend to look better than ereader fonts, though I cannot explain why.
But most of all, the sterile perfection of the ereader is like a drummer who is too perfect. Introduce error in it, it makes the music easier on the ears, and the books more pleasant to read.
Modern paper books have much more precise printing than books printed 20, 40, 60 years ago. I find the irregularity of the older ones rather charming, the new ones are also sterile - being too perfect.
I suppose it is like my indifference to autotuned singers. Too perfect. Doesn't sound human.
P.S. have you ever visited a medieval village in Europe? I find them wonderful, as there are no straight lines anywhere. It's all pleasingly crooked. I considered building my house slightly crooked, but that turned out to be far too expensive as everything would have to be custom made.
That's the efficiency of standardisation at work...
I get that this isn't an easy problem to solve, with many different screen sizes, resolutions, and zoom levels. But the status quo is awful. I refuse to buy tech books in ebook format. Anything with any kind of diagrams (let alone actual pictures) is an automatic no because I know it'll suck.
My Kobo is pretty good for reading sci-fi tomes. Fantasy isn't as fun, as they often come with maps and the maps always render poorly.
There really should be a better eBook-specific authoring system. Ideally with importing from the print document, as that's the canonical version. It's just a website, after all, and we have great tools for building those.
details here: http://theroadchoseme.com/how-i-self-published-a-professiona...
Perhaps authors could also produce PDFs designed for common tablets as well, and therefore get the exact expected format.
I do agree with the author that paged formats are difficult to do in browsers to this day, and I also hope this can improve.
I'll take a slightly-less-great ePub over a PDF I have to scroll around or use terrible reflow heuristics via some reader any day.
I would have thought that a PDF of a book would normally be made in the size of the physical book, which could be A4, but usually isn't (at least not when I look at my bookshelves).
(Also I feel that by default, non-book PDFs tend to show up in the US "Letter" size, which looks deceptively similar to A4, until you try to print it.)
Also: What did people screw up with HTML in your opinion?
The problem with PDFs is that you need to create multiple layouts to make them look good in print and on a variety of commonly used screen sizes; all those layouts is extra work. HTML, by its very nature, doesn't have this problem, and yet somehow today we still have to design multiple layouts to support print and common screen sizes. And in practice, we usually don't - instead, we design one layout optimized for mobile phones, and ignore how bad it looks on desktop or in print. "Responsive web design" turned into forcing HTML to behave like a PDF, except using "iPhone" instead of "A4" as the size.
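As a rough illustration of the "HTML doesn't have this problem" point, one stylesheet can let text reflow on screens of any width, cap the line length on wide desktop displays, and switch to page-oriented rules for print. The measurements are arbitrary assumptions, and `size: A4` would be `size: letter` for US readers.

```css
/* Screen: let text reflow, but keep lines readable on wide displays. */
body {
  max-width: 38em;
  margin: 0 auto;
  padding: 0 1em;
  line-height: 1.5;
}

/* Print: drop the screen constraints and describe the page instead. */
@media print {
  @page {
    size: A4;
    margin: 20mm;
  }
  body {
    max-width: none;
    padding: 0;
  }
  /* Show link targets, since nobody can click on paper. */
  a[href^="http"]::after {
    content: " (" attr(href) ")";
  }
}
```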
As for responsive HTML, it's the responsibility of the designer to make it work, if he/she is worth their salt. Like you say, HTML without CSS is already responsive. If businesses understood that there is a big segment of customers who will always use their computer and never their phone when it's time to make a purchase, perhaps they'd be better at it.
I hate to break it to you but, https://blog.developer.adobe.com/adobe-sensei-makes-responsi...
Edit: Also, are there any advantages with large papers like A4/Letter for physical prints, except that you can fit more on a single page?
You must be from North America. In the rest of the world, it’s always A4. I encounter A4 PDFs fairly often, but don’t know how long it would be since I encountered Letter, but easily years.
Sometimes one can use a landscape-oriented display to avoid horizontal scrolling, but even if the same word count fits on the screen I seem to be annoyed by the low line count.
Providing large type and huge type PDFs would not entirely solve this problem as sometimes even one with poor eyesight might prefer a smaller font for scanning/skimming. Having to acquire two PDFs and switch between them based on mode of use seems suboptimal.
Fixed paged presentation has significant advantages for familiar reference material; some people seem to have a spatial memory that makes finding specific content by flipping pages faster than trying several search phrases (with the occasional benefit of serendipity).
Poetry often benefits from not reflowing lines and page breaking within a stanza is often more jarring than within a paragraph of prose. Yet a reader might prefer inferior typography over having to use a magnifying glass or carry a very large display.
One might be able to get some of the advantages of paged media for figures and tables by having header (or footer) pop-up links to such content when it is on the same "page" as the displayed text. This is not as low effort as moving one's eyes, but it might be better than inlined presentation on a small (relative to font size) display.
Even with print, there would be times when breaking the text to fit a figure is more disruptive than having the figure on a separate page. There would also be times when all the relevant figures would not fit on the same page as the related text.
Having a separate booklet of illustrations might make going back and forth between text and illustration easier, similar to having a lexicon or commentary open while reading. However, that also introduces position tracking in another book and other inconveniences.
Even when my vision was better, reading academic papers distributed as PDFs (usually 2-column) on a computer screen was less enjoyable than reading similar material in a reflowable format. Academic papers also do not seem to benefit as much from pagination as other writings.
I work for a children's reading platform, and the book publishers universally send us PDFs of everything. We have a bespoke system to convert the PDFs into SVG for higher flexibility and added interactivity.
Literal kids' board books are getting better treatment!
I sometimes read interactive fiction, and even a format as intentionally simple as Twine supports variables so content can be dynamic. There's no reason ebooks couldn't do things like that, but they are so far behind that they haven't even caught up with where print publishing was 50 years ago yet.
The mistake is largely not in the specs, but in the lack of support for them. Page breaking controls, weirdly breaking tables, lack of access to the area outside the page box to influence headers/footers without weird hacks, etc. For printing, the 1990s never ended.
This leads to the bizarre situation where basically everyone who has semi-complex printing needs in web applications will create a PDF and then print that - and for creating those PDFs, HTML-to-PDF conversion is often used, just with CSS for paged media that is actually implemented. Which again proves that the spec is at least 99% there, if somebody would just kindly implement it in a browser, too.
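For a sense of what that "99% there" spec looks like, here is a hedged sketch. It uses CSS Paged Media margin boxes for a running header and page numbers, and `string-set` from CSS Generated Content for Paged Media to pull the chapter title into the header. HTML-to-PDF tools such as WeasyPrint or PrinceXML accept this kind of stylesheet. Browser support for margin boxes has historically been missing or partial, and `string-set` in particular shouldn't be assumed in a browser. The `chapter-title` name is an arbitrary choice.

```css
@page {
  size: A4;
  margin: 25mm 20mm;

  /* Running header in the page margin: text of the most recent h1. */
  @top-center {
    content: string(chapter-title);
  }

  /* Footer in the page margin, e.g. "3 / 12". */
  @bottom-right {
    content: counter(page) " / " counter(pages);
  }
}

h1 {
  string-set: chapter-title content(text);  /* capture the heading text */
  break-before: page;                        /* start each chapter on a new page */
}

/* Tables: allow page breaks between rows, but not inside a row. */
tr {
  break-inside: avoid;
}
```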
Won't be more complex than having the latest WebGL whatever thing in your browser engine ;-)
In the end I exported the document to LibreOffice, and got something way more usable in a few hours just by editing the styles than whatever I was able to do in days of fiddling with HTML+browser.
iBooks on Apple might get a pass as it doesn't need to paginate, but truth be told it seems that EPUB/ebooks and ereaders in general are being targeted at novels and romance, where form factor, typesetting and formatting don't matter that much.
I have access to ebooks through my local library and there's no way I would use, let alone buy, any technical ebook.
Not to mention, I've seen a steady average decline in the quality of printed media in general over the last ~15 years. A lot less attention is put in the typesetting and layout. Even the print quality itself is lower, which I think is due to the smaller and cheaper print runs being done now also for more popular titles.
I am a fan of the old mass market paperbacks. These had a reputation of being low quality books back in the day because they are cheap and not super-durable but I think they are high quality from a Deming point of view because they are made by a process that is highly repeatable. Circa 2000 I thought my 1970s paperbacks were in great shape, but by 2010 they were seriously yellowing.
I just looked at my bookshelf and found a '59 James Blish anthology that I bought for 50 cents maybe ten years ago, it is in "poor" condition and will probably crack if I read it without taking great care. Next to that I found a copy of Galbraith's The Affluent Society from 1958 which is perfectly usable except I'd be worried about the cover coming off. A Frank Herbert book from '68 is stained but in great shape other than the cover also being at risk. A '74 Herbert book is a touch discolored but has no problems at all.
(My collection includes not just science fiction of that era but also both self-help and serious books on psychology as well as books about science, politics, social sciences, etc. Government reports about inflation or race relations would be published as mass market paperbacks. You could get Plato and Sartre and Freud and the rest of the Western literary heavyweights)
The construction, materials, process, and such were repeatable enough that they even fail consistently. Not permanent, but 50 years is not bad. The right size to go in a purse or the side pocket of a backpack (e.g. part of the loadout of a bibliomaniac who has 12 books in his backpack). I've got to find a good way to reinforce the cover (adhesive tape?)
Those are no longer produced, today it is trade paperbacks. There is wide variation in the dimension, construction, materials and processes for these. You sometimes find a trade paperback that is beautiful, strongly constructed and printed on acid free paper. Others you pay $50 for and the binding breaks the first time you lay the book open on the table.
Don't - it yellows too.
I understand that printers are Satan incarnate and run on the concentrated sins of cost-cutting engineers, and that nobody has time to read that 1.5-page article someone wrote with proper effort, but scientific articles, books, nice blog posts we want to delve into, etc. are real, and regardless of the substance they run on, printers are real things and they are used nevertheless.
The tendency to assume that everyone is running on high end laptops and cutting edge, network connected tablets is making me angry sometimes. Implementing features like this will make life easier for the many programmers who need to be able to generate and print reports from their web applications, too.
It's not only about books and shrewd people who want to print a blog post and study/read that on a table or wherever.
> This was actually remarkably difficult to get right: the trick is setting line-height and height to < 1em. If they were, eg, 7rem to match the font size then FireFox and Safari render very differently, with Safari showing a much taller gap in the text that the dropcap was centered in.
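Here is a hedged sketch of the float-based drop cap the quote describes, with line-height and height set below 1em so the enlarged letter doesn't push the first line down. It assumes a hypothetical `span.dropcap` wrapping the first letter (a floated span becomes a block box, so `height` applies). The numbers are illustrative, not taken from the stylesheet linked below, and will need tuning per typeface and per browser. Where it's supported, the newer `initial-letter` property (CSS Inline Layout Level 3) is designed to size and align a drop cap to a given number of lines without this kind of tuning.

```css
/* Hypothetical wrapper around the first letter of a chapter's opening paragraph. */
span.dropcap {
  float: left;
  font-size: 7rem;          /* matches the 7rem mentioned in the quote */
  line-height: 0.8em;       /* below 1em: no extra leading above/below the glyph */
  height: 0.8em;            /* below 1em: the float doesn't claim an extra line */
  margin: 0.05em 0.08em 0 0;
}
```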
The same CSS (https://daveon.design/manuscript-vintagedave.css) applies typographical sentence spacing to sentences within a paragraph, which you can see on any article, eg: https://daveon.design/what-are-you-optimising-for.html
I would love to see more CSS support for _typography_: not letter spacing, but actual typographical layout. The article is dead-on accurate when it says we still can't reach what was normal in layout for six centuries.
Apple Books doesn't treat these like pages that display only after you tap/click a link though. They render them as a sort of popup above the regular page, styled differently and even with a different page width, as some sort of endnote-not-at-the-end functionality. If that non-linear page also has a link to tap/click on, too bad. Nothing in the spec even slightly suggests their interpretation of it, and the longer they make it the norm, the more impossible it will become to implement the correct functionality because other book titles will have become dependent on it.
Sad to see how little we've moved on in twelve years.
On a semi-related note, Typst had similar issues but the devs are actively working on fixing issues like this.
My hot(?) take as a reader: I want none of this.
Just give me proper semantics and let my reader (while respecting my personal preferences) figure out presentation. I'm quite tired of publishers thinking that they somehow need to "preserve the unique aesthetics" of a paper-based medium, despite that usually working out quite poorly.
Fonts that look great on paper don't necessarily look great on the eInk, LCD, or OLED display I'm reading the book on, for one thing. Margins/padding are usually a world of pain, and one of the first things I end up doing when opening a new ePub is often to disable all publisher formatting.
While I love the idea of an open format and really don't like the idea of being stuck in Amazon's walled garden, at least they figured that part out in their Kindle ecosystem from the beginning. With ePub, it's been hit and miss.