Pagination widows, or, why I'm embarrassed about my eBook (2023)

217 points by OuterVale 3 days ago | 132 comments

gorgoiler 2 days ago |
In the page model, a heading says it needs only one line of vertical space, so if there’s a tiny bit of space at the bottom of the page it’ll get orphaned. (Vertical box space shown as ! and % for the heading and paragraph, respectively.)
  Page 1      Page 2
  ..          Paragraph..%
  ..          ..text. %
  ..
  ..
  ..
  ..
  !Heading
When instead it should be moved to the top of the next page:
  Page 1      Page 2
  ..          Heading !
  ..          Paragraph.. %
  ..          ..text. %
  ..
  ..
  ..
Rather than being honest about needing one line…
  Heading     !
  Paragraph.. %
  ..text.     %
…the heading could instead claim it needs three lines, which would ensure it would never be orphaned:
  Heading     !
              !
              !
  Paragraph.. %
  ..text.     %
But now you have a big gap below the heading.
If you could then shift the paragraph up from where it should be in the flow such that the vertical space of the heading and paragraph overlapped…
  Heading     !
  Paragraph.. !%
  ..text.     !%
…then you’d get a heading that would never be orphaned on one line, but which looked as if it only used one line.
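The idea above can be tried out in a toy page model. This is only a sketch under simplifying assumptions: blocks are unbreakable, heights are counted in whole lines, and the names `Block`, `claimed`, `pull_up` and `place_blocks` are invented for illustration. No real renderer is being modelled.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    claimed: int      # lines the block tells the paginator it needs
    pull_up: int = 0  # lines to shift up over the previous block (a negative margin)


def place_blocks(blocks, page_lines, start_line=0):
    """Place blocks top to bottom on pages of `page_lines` lines.

    A block that does not fit in the remaining space moves to the top of
    the next page. Returns a list of (name, page, first_line) tuples.
    """
    page, cursor = 0, start_line
    placed = []
    for block in blocks:
        top = max(cursor - block.pull_up, 0)
        if top + block.claimed > page_lines:
            page += 1
            top = 0
        placed.append((block.name, page, top))
        cursor = top + block.claimed
    return placed


PAGE = 7  # lines per page; six are already used, so one line is left

# Honest heading: it fits in the last line and is separated from its paragraph.
honest = place_blocks([Block("heading", 1), Block("paragraph", 2)], PAGE, start_line=6)

# Heading claims three lines; the paragraph is pulled up two lines to overlap them.
padded = place_blocks(
    [Block("heading", 3), Block("paragraph", 2, pull_up=2)], PAGE, start_line=6
)
```

In this model the honest heading stays on the first page while its paragraph moves on, and the padded heading moves to the next page with the paragraph directly beneath it.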
Izkata 2 days ago |
I don't know if it would fix splitting across pages, but that sounds like negative margins. They've been in CSS forever. The basic idea:
  .heading { margin-bottom: 30px; }
  .paragraph { margin-top: -30px; }
Also:
> the heading could instead claim it needs three lines, which would ensure it would never be orphaned
It wouldn't prevent being orphaned at this point in your idea. What if there was room for 3 lines at the end of the first page?
gorgoiler 2 days ago |
If the bottom of a page has room for a heading and 3 lines of text (or some number of lines that looks appealing) then that’s enough space for the heading to not be considered orphaned.
Izkata 2 days ago |
But without moving the paragraph up, there wouldn't be text there, just a big empty area.
dotancohen 2 days ago |
As the author of the fine article states, that is preferable to the orphaned heading.
Izkata 2 days ago |
The article says it's better to have blank space instead of the header, not next to the header.
p4bl0 2 days ago |
I'm not sure the proposed approach would work, in particular on ebook readers. But if it does, I believe margins may not be the right way to do it. Padding might be however, because it is actually inside the box (while margins of elements may be "blended" inside the margins of their parents).
  h2 { padding-bottom: 3em; }
  h2 + p { margin-top: -3em; }
The problem is CSS doesn't support the lh unit yet so what's 3em here should actually be 3 times the height of a line, taking into account the font-family, the font-size, etc. Except maybe if line-height is explicitly set.
bryanrasmussen 2 days ago |
>The problem is CSS doesn't support the lh unit yet so
the problem is that ebooks css does not support what you expect a slightly out of date browser to support, and almost each ebook renderer has its own problems.
Think html email sized problems.
benmanns 2 days ago |
Email rendering is a great analogy.
rolandog 2 days ago |
I wonder if there's some sort of acid tests for eBook readers.
There should be some sort of acid test as well for email readers, of course.
But, IMO, all of them should be limited to avoid JS use (think of how well some sites are capable of rendering without JS [0])
[0]: https://git-send-email.io/
gorgoiler 2 days ago |
That worked quite nicely:
https://jsfiddle.net/bqjsu98o/
However the next problem is how do you tell the print renderer that paragraphs must have a minimum of 4 lines of text?
(Edit, hah! https://developer.mozilla.org/en-US/docs/Web/CSS/orphans )
wbl 2 days ago |
Don't they send the copy editor to kill all the widows and orphans anymore?
fragmede 2 days ago |
Pragmatism wins out over waiting for CSS properties to get implemented, and div display inline block works today in epubs and doesn't need to be backported to iBooks.
https://ebooks.stackexchange.com/questions/7014/how-can-i-pr...
chrismorgan 2 days ago |
That's not pragmatism: that's poor judgement. That's a hack that shouldn't work at all; and if it does, it must¹ cause other problems, such as preventing breaks inside the entire first paragraph, or breaking graphically rather than linewise (that is, you could get the top half of one line on one page, and the rest on the next).
—
¹ As in, it's unavoidable if you've implemented layout in such a way.
fragmede 2 days ago |
What's the name of this site again?
justinator 2 days ago |
VC-Influenced-News just doesn't have the same ring to it.
chrismorgan 2 days ago |
There are hacks and there are hacks. This is a bad hack, one that might fix a problem, but only by causing a worse problem.
On the positive side, on proper clients it will approximately do nothing.
mjevans 2 days ago |
I thought Div as well, but as a BLOCK element containing the Heading line AND the first paragraph to ensure they're rendered on the same page, or at least start on one and then overflow.
cratermoon 2 days ago |
Maybe the reason we're still stuck with LaTeX and PDFs is that ebook software can't be bothered to implement decent typesetting.
HKH2 2 days ago |
For printing, yes.
pseingatl 2 days ago |
PDF's are size agnostic, by the way. We think of them as being restricted to A4 or letter paper, but you can generate a pdf of almost any size.
willvarfar 2 days ago |
Do you mean they can be created for any page size, or do you mean they sensibly resize when you change page size?
lou1306 2 days ago |
He (probably) means that the geometry of a PDF "page" can be customized. You can even have different sizes within the same document. But most people using LaTeX or even basic plotting utilities which export to PDF know this.
p_l 2 days ago |
A properly prepared PDF will scale "perfectly" (FSVO - needs all drawings in vector form) as long as you keep the aspect ratio the same.
The problem is changes in aspect ratio or wanting to reflow text to have larger font when rendering to small screen.
Not sure if it's doable to generate flexible enough rendering commands in PDF. Was doable in PostScript (again, FSVO) but PDF was explicitly made to be decidable and less Turing complete.
pwg 2 days ago |
The PDF specification is size agnostic.
A given, created, PDF, is fixed to whatever size it was created to represent.
Acrobat claims to be able to "reflow" some PDF's, but whatever it is doing is more black magic voodoo than anything else, given how text layout is represented inside the PDF format.
The text layout becomes (essentially) a stream of instructions to "move current point to 123,456" and "place this text at current point". And the numerical values used are tied directly to the page size and internal "point size" used by the PDF writing software. I.e, the text is simply "physically positioned" on a virtual x,y canvas.
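A rough illustration of that "move, then place text" model, using the standard PDF text operators (`BT`/`ET`, `Tf`, `Td`, `Tj`). This is a sketch only: it builds a content-stream fragment, not a complete PDF file, and the font resource name `/F1` and the helper names are assumptions made for the example.

```python
def pdf_escape(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_ops(lines, x, y, font_size, leading):
    """Build content-stream operators that paint `lines` starting at (x, y).

    Coordinates are in points, measured from the bottom-left of the page.
    """
    ops = ["BT", f"/F1 {font_size} Tf", f"{x} {y} Td"]
    for i, line in enumerate(lines):
        if i:
            ops.append(f"0 {-leading} Td")  # move down to the next line
        ops.append(f"({pdf_escape(line)}) Tj")
    ops.append("ET")
    return "\n".join(ops)


stream = text_ops(["Hello", "world"], x=72, y=720, font_size=12, leading=14)
```

Nothing in the resulting stream says that the two strings belong to one paragraph. A reader that wants to reflow has to infer that from the coordinates.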
cratermoon a day ago |
There is a practical limit. "Page dimensions are not limited by the format itself. However, Adobe Acrobat imposes a limit of 15 million by 15 million inches, or 225 trillion in² (145,161 km²)." <https://en.wikipedia.org/wiki/PDF>
acabal 2 days ago |
If you think it's bad that `break-*` isn't supported in Firefox or Chrome, wait till you see what your ebook looks like in Kindle, or worse, ADE-based readers, of which there are still many in use!
Kindle, the reading device with by far the largest market share, is basically the IE6 of ereaders - too big to ignore, and at the same time dragging down the entire ebook ecosystem with its crappy renderer. Amazon has shown little interest in improving it for over a decade now, while simultaneously fragmenting its own ecosystem with a variety of different proprietary formats that support different CSS and features.
ADE, while less common in new devices, is still very common in much older devices - B&N's eink Nooks were based on ADE at least as late as a few years ago. (Perhaps they still are?) ADE is closer to IE5 in terms of CSS support!
At Standard Ebooks we're often hamstrung in our attempts to make beautiful ebooks by these big players refusing to improve their renderers. We're forced to dumb down our CSS and use outdated techniques (like occasionally having to use tables for layout!) because ebook renderers are so bad.
iBooks is the top tier renderer, because as far as I can tell it's basically a wrapper for an up-to-date Webkit; next is Kobo - also Webkit-based - along with other Webkit-based indie apps. The rest of the big players are far, far, far distant.
Uvix 2 days ago |
I don't rate Kobo's renderer very highly. When the user turns on justifying body text, the device ends up justifying everything, including headings, and it just looks awkward.
omnimus 2 days ago |
You are not rating the renderer but ux decisions made by the Kobo reader devs.
lidavidm 2 days ago |
Neat, I didn't realize Kobo was WebKit-based, but given they offer a full browser (experimentally) on their readers, that makes sense. They also support some 'nice' ePub 3 features: fixed layouts so that comic books and manga work properly; right-to-left pagination for Japanese books, and vertical layout (also for Japanese books). Though, I feel like their page numbering gets mixed up with vertical layout (sometimes opening and reopening the book changes the # of pages, and I lose my spot...)
n_plus_1_acc 2 days ago |
Is there a caniuse for ebook readers?
akdor1154 2 days ago |
You could write one pretty easily:
<html> <head><title> Can I Use ...? </title></head> <body> No. </body> </html>
p_l 2 days ago |
As long as you want wide support on Kindles, you're pretty much stuck on Mobi.
Which was great when it came out, but is comparable to HTML 2.0 era.
__mharrison__ 2 days ago |
I wrote a book a while back that for every feature has a "chapter" using the native rendering and a second version of the chapter with an improved CSS applied.
https://www.amazon.com/Ebook-Formatting-KF8-Mobi-EPUB-ebook/...
shermp 2 days ago |
The most frustrating thing is that in some areas ADE (also known as RMSDK) still has some of the best typography amongst ebook rendering engines.
Kerning? Yes. Ligatures? Of course. (Adjustable) hyphenation? Absolutely. Line breaking that's more complex than the first-fit that web browsers use? Well, yes, I think they may have borrowed it from InDesign.
And then it's let down by its awful CSS support. No `font-variant: small-caps` for you. And your CSS had better be valid, or it will be completely ignored - that includes `!important` by the way.
Adobe essentially abandoned the RMSDK renderer, which is a real shame, because with better CSS support, it would still kick-ass.
As an aside on Kobo ereaders, they use RMSDK to open standard EPUB ebooks. Only their "Kobo EPUB" format is rendered using a webkit based engine.
WalterBright 2 days ago |
You can ship an ebook as a pdf file! Then no rendering problems.
robin_reala 2 days ago |
But no reflow.
WalterBright 2 days ago |
True, but if you're reading a book with equations in it, it is probably a textbook and textbooks are suited to large page, not pocketbook pages. You'd want a full size ereader anyway.
Leave the pocketbook size ereaders for the pocketbooks.
pseingatl 2 days ago |
Equations should render on A5 paper, you don't need A4/letter. And many here advocate for paper, 6x9 or 5x8 should handle equations easily.
pseingatl 2 days ago |
You can't, at least on Amazon. Amazon is the largest ebook market; no pdf's allowed. You could use Gumroad or sell from your own site, but these are not attractive alternatives.
WalterBright 2 days ago |
Then you can use jpgs in the ebook format! That's what I did for the hieroglyphs in a history book I put up on Amazon.
nephanth 2 days ago |
When the author mentioned their book has "javascript-driven syntax highlighting", my first thought was "no way this works on e-readers"
userbinator 2 days ago |
The fact that it's a book about typography may mean the requirements are a little different, because I personally (and likely many others) don't really pay attention to such things.
philk10 2 days ago |
First half of my career was at a company writing pagination systems for books/magazines and so I can now never not notice such things as widows, orphans, kerning...
justsomehnguy 2 days ago |
Give a man a fish and you feed him for a day; teach a man to notice an improper keming and he would curse you for the eternity.
nextIt 2 days ago |
I see what you did there...
Al-Khwarizmi 2 days ago |
You might be paying more attention than you think.
When I wrote my PhD thesis, I made some effort to follow good typography practices (I thought that since I had devoted so many hours to the work, it would make sense to devote some time to make it look as good as possible... and I guess it was also a way of procrastinating from writing content) :)
Various people who "don't pay attention" told me that it caught their attention for being really nice visually, even if they couldn't pinpoint the concrete differences that led to that.
jll29 2 days ago |
Not to mention the ugly/unusable rendering of mathematical formulae in ebooks on my Kindle, which is gathering dust.
Layouting is an art and a craft, and the fact that it's automated by people who lack the specialized knowledge, or for whom it is not a priority (quarter century old bug reports, really?) suggests that in 2025, you should still avoid ebooks if you care about quality and aesthetics.
This is a shame because e-ink is just becoming usable. Anyhow, long live the paper book!
WalterBright 2 days ago |
You could include the formulae as jpg's, not html.
tagawa 2 days ago |
The downside of that is screenreaders can't read them out.
pohuing 2 days ago |
Alt texts are a thing on epubs. I would hope Amazon's format can do them as well.
WalterBright 2 days ago |
Problem solved!
_Algernon_ 2 days ago |
Doesn't work well with theming.
Skeime 2 days ago |
I have not once seen a formulae-as-images solution that I would consider acceptable, aesthetically. Common problems are:
- It is almost impossible to align the baseline of the formula to the baseline of the surrounding text.
- Often, images are only used for "complex" formulae, while simple ones are implemented using normal typesetting. This resolves the baseline issue for simple formulae, but now the fonts between simple and complex formulae don't match. (This requires extra concentration for the reader, as in other contexts, different font styles are frequently used meaningfully.)
- The images often have sub-par resolution.
spookie 2 days ago |
Furthermore, you would need to express them in text anyways for accessibility.
nosianu 2 days ago |
One advantage of ereaders is that the fonts can be set to any size convenient for the device and the person reading it.
A fixed pixel size image of something you want to read does not go well together with rendered text at all. It's okay for photos, but very definitely not for formulas, which are basically mostly text.
I'm not even talking about the aesthetics, different fonts because they too can be set by the user, and layout, since what's inside the image is fixed and untouchable by the renderer that handles all the rest of the text.
WalterBright 2 days ago |
On my tablet I can use two fingers to zoom. But I pretty much never need to do that with a full size tablet. That's why I bought one with the retina display.
nosianu 2 days ago |
But zooming scales the fonts only. For pixel images you have the pixels that are in it and that's it. Scaling those either up or down does not produce good text.
Now I'm just waiting for the inevitable "AI image scaler" that handles text inside images.
TeMPOraL 2 days ago |
> Now I'm just waiting for the inevitable "AI image scaler" that handles text inside images.
I'm surprised this isn't a thing already, as it seems doable with what people called "AI" 20 years ago. I mean, unless some unusual/non-default font was used, upscaling text on an image should be almost trivial. Ligatures notwithstanding, "printed letters" have a fixed shape, so:
1. Identify the typeface, size, weight, etc. by looking at the pixels of the text;
2. OCR the text (which should be 100% reliable);
3. Blank out the original text pixels; re-render the content (from step 2.) at a larger size (using parameters from step 1.).
I'm hedging here; it feels to me that OCR-ing normal text that never left the digital realm should be 100% reliable, but I'm not a specialist in that subfield so I surely must be missing something...
ben_w 2 days ago |
It's definitely possible, as Google Translate (and before it, Word Lens) does exactly that.
Word Lens was demonstrated in late 2010.
staplung 2 days ago |
> I'm hedging here; it feels to me that OCR-ing normal text that never left the digital realm should be 100% reliable, but I'm not a specialist in that subfield so I surely must be missing something...
A string set in a given font at a given size won't always render as a fixed pattern of pixels. The font describes the curves of the letter forms and how that's rasterized depends on lots of factors such as the zoom level, exactly how the font rendering engine is implemented, whether or not anti-aliasing is turned on which is further complicated by the fact that the text can be set in any color with any other color as a background, etc. And there are a LOT of fonts.
Lastly, OCR is not just about recognizing letter shapes but has to contend with how the text flows. It has to understand line-breaks, multi-column layouts, captions, pulled quotes, page-numbers, hyphenation and all the other weird shit that we make text do.
auggierose 2 days ago |
That attitude leads to the shitty epubs we currently have. You either do a fixed-size PDF layout, or you have a proper dynamic solution. For technical/mathematical content, I am not interested in anything in between, given that PDF just works for me, and is easily achieved with tools today.
rjmunro 2 days ago |
JPEG is the absolute worst possible solution here. If MathML or similar is not supported, use an SVG or PDF so that it's zoomable and not made of pixels. It's also slightly readable by screen readers (although you probably want some sort of alt-text for those anyway).
If no vector formats are supported, use PNG, or another "lossless" format not JPEG. JPEG's compression is designed for photos where the probability of 2 neighbouring pixels being the same is tiny. Note that PNG doesn't have to be lossless - if you want to shrink the file size you can reduce the resolution or the colour space.
Even GIF is a much better choice than JPEG for a diagram, mathematical formula or logo with hard edges and a small number of colours. SVG is usually the right choice, (but don't do what one designer did for me and embed a JPEG in an SVG instead of giving me an SVG direct from Illustrator or Inkscape).
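The point about hard-edged, few-colour images can be sketched with DEFLATE, the lossless compression PNG uses internally. The bitmap below is a synthetic stand-in for a rendered formula (an assumption for illustration), and the code compresses raw pixel rows with `zlib` rather than writing a real PNG file.

```python
import zlib

WIDTH, HEIGHT = 200, 50


def synthetic_formula_bitmap():
    """8-bit greyscale pixels: white background with a few black strokes."""
    rows = []
    for y in range(HEIGHT):
        row = bytearray([255]) * WIDTH
        if 20 <= y < 24:              # a horizontal fraction bar
            row[40:160] = bytes(120)
        if 5 <= y < 18:               # a few short vertical strokes
            for x in (60, 100, 140):
                row[x:x + 3] = bytes(3)
        rows.append(bytes(row))
    return b"".join(rows)


raw = synthetic_formula_bitmap()
packed = zlib.compress(raw, 9)
restored = zlib.decompress(packed)

print(len(raw), "bytes raw,", len(packed), "bytes compressed")
```

The round trip is exact, so edges stay sharp, and long runs of identical pixels shrink to a small fraction of the raw size.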
MilanTodorovic 2 days ago |
Have you tried KOReader[0]? It supports multiple ebook formats, including epub and cbz. You'll need to jailbreak[1] your Kindle though.
[0] http://koreader.rocks/ [1] https://github.com/notmarek/LanguageBreak
dns_snek 2 days ago |
For anyone who might want to jailbreak their Kindle in the future, you'll want to enable airplane mode otherwise it will automatically update its firmware (patching the jailbreak) and there's no way to disable that.
It'll keep updating itself as long as it's powered on, even if you haven't used it in months and there's no telling how long it'll take for current firmware versions to be supported, latest jailbroken version is 17 months old.
https://kindlemodding.gitbook.io/kindlemodding/getting-start...
amadeuspagel 2 days ago |
If this is still a problem thirty years after the invention of the web, then I say: So much the worse for mathematical notation. In the future, mathematical ideas will be expressed in other ways.
spookie 2 days ago |
Mathematical notation, even when considering all its faults, won't be easy to replace.
Simple math, maybe. But, for anything complex, any other symbol would require completely different ways of being expressed, if your aim is to make it more readable for newcomers that is.
shiandow 2 days ago |
You severely underestimate mathematicians' ability to cling to older more convenient forms of expression.
I imagine blackboards and chalk will be used in advanced mathematics for a few centuries yet.
WillAdams 2 days ago |
Blame the (metal) compositor unions which back in the day bargained for sinecures where all their members were guaranteed perpetual employment rather than choosing to participate in the digital revolution.
Fortunately, some folks did work to preserve the craft and beauty of books --- Dr. Donald Knuth taking a decade off from writing _The Art of Computer Programming_ to create TeX (though initially he thought he'd do it over a sabbatical) is one shining example.
Robert Bringhurst's authoring _The Elements of Typographic Style_ also made a huge difference (I've lost count of how many copies I've given as gifts to folks).
A further issue is that doing a good page layout over an entire chapter (or book if the pagination is continuous) is an NP-hard problem --- I've had a chapter come out correctly on a first pass exactly once in my career (fastest 40 minutes of my life). The usual work-flow is something like:
- check all characters to ensure that hyphens are properly set, en and em dashes replace them where appropriate, and correct the setting of any instances of what should be special characters such as prime or double primes
- assign all formatting and ensure that all heads and paragraphs have settings which will forbid widows/orphans and verify that the callouts for all figures/photos/tables are correct
- review the entire chapter from beginning to end, page by page, verifying that each ends as it should at the bottom of the page, and that a referenced element shows on that page spread
- for instances where things don't work out, check to see which paragraphs can be adjusted to run longer or shorter by one or more lines, adjusting this until one finds a set of adjustments which results in a proper appearance for the page/spread --- repeat for all future pages --- if a particular spread/figure placement is a problem, back up and see if changing previous pages will fix it --- check the last page to ensure that it is full enough, if not, adjust previous spread, if that doesn't work, see if running the entire chapter long or short by a line will fix it.
- review the entire chapter again to ensure that there are no bad breaks or stacks, add discretionary hyphens or non-breaking spaces or adjust paragraph settings as necessary, ensuring that pages still base-align
If someone wants to write an ePub reader or page formatter which can do that, I'd be glad to see it.
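For contrast with the manual workflow above, here is roughly what the simplest automated approach looks like: a greedy first-fit paginator that only enforces widow and orphan limits. It is a sketch under stated assumptions (paragraphs are given as line counts, there are no headings or figures, and the name `paginate_lines` is invented). It never backs up to adjust earlier pages; it just leaves a page short.

```python
def paginate_lines(paragraphs, page_lines, orphans=2, widows=2):
    """Greedy first-fit pagination with widow/orphan control.

    paragraphs: list of line counts, one per paragraph.
    Returns a list of pages; each page is a list of
    (paragraph_index, lines_placed_on_this_page) tuples.
    """
    pages, page, room = [], [], page_lines
    for idx, remaining in enumerate(paragraphs):
        while remaining > 0:
            if remaining <= room:
                take = remaining
            else:
                # Leave at least `widows` lines for the top of the next page.
                take = min(room, remaining - widows)
                # Fewer than `orphans` lines at the bottom is not allowed.
                if take < orphans:
                    take = 0
            if take == 0:
                if page:
                    # End this page early and retry on a fresh one.
                    pages.append(page)
                    page, room = [], page_lines
                    continue
                # Fresh page and still no legal split: give up on the rule.
                take = min(room, remaining)
            page.append((idx, take))
            remaining -= take
            room -= take
            if room == 0:
                pages.append(page)
                page, room = [], page_lines
    if page:
        pages.append(page)
    return pages


controlled = paginate_lines([2, 5], page_lines=6)
uncontrolled = paginate_lines([2, 5], page_lines=6, orphans=1, widows=1)
```

With the limits on, the first page ends one line short so that two lines carry over. With the limits off, a single widowed line lands on page two. Getting both full pages and no widows is the part that needs the backtracking described in the list.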
fn-mote 2 days ago |
Fascinating, but as an ebook consumer my standards are quite a bit lower. I’m happy if the relevant figures are on the same page as the text (but that’s important), and the spacing is not absolutely awful.
WalterBright 2 days ago |
If I was designing an ebook reader, the display would look like a book's display. None of the ereaders I know of do this.
1. Have a background image that looks like paper. No, a solid white or tan background doesn't look like paper. Paper has imperfections in it, dirt, a grain, and you can faintly see the other side of the page. Except for the latter, this is easy to achieve. Simply scan a bunch of blank pages, and use the scan for backgrounds!
2. No, having a background with a fake coffee stain ring doesn't work, because it's the same on every page. You need a few dozen pages, each with a different stain.
3. A printed page is imperfect. The letters can be uneven and blotchy. No, don't have a blotchy font. Have maybe 20 slightly different versions of the same font, and randomly select a glyph from one of them.
4. Books open to two pages. Not one. Two. The ereader should show two pages side by side, like a book.
5. Book fonts tend to look better than ereader fonts, though I cannot explain why.
But most of all, the sterile perfection of the ereader is like a drummer who is too perfect. Introduce error in it, it makes the music easier on the ears, and the books more pleasant to read.
OscarCunningham 2 days ago |
Do all of that using a PRNG seeded for the book and the reader's ID. So your copy of the book is the same every time, but different from everybody else's.
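A minimal sketch of that seeding scheme, assuming the book and reader are identified by strings and that each character picks one of 20 glyph variants (the count suggested upthread). The function names are made up for illustration. The seed is derived with SHA-256 because Python's built-in `hash()` for strings varies between runs.

```python
import hashlib
import random


def page_rng(book_id: str, reader_id: str, page: int) -> random.Random:
    """Return a generator that is stable for one page of one reader's copy."""
    key = f"{book_id}|{reader_id}|{page}".encode("utf-8")
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return random.Random(seed)


def glyph_variants(text: str, rng: random.Random, n_variants: int = 20):
    """Pick which of the slightly different glyph drawings each character uses."""
    return [rng.randrange(n_variants) for _ in text]


line = "It was the best of times, it was the worst of times"
mine = glyph_variants(line, page_rng("book-123", "reader-A", page=1))
mine_again = glyph_variants(line, page_rng("book-123", "reader-A", page=1))
yours = glyph_variants(line, page_rng("book-123", "reader-B", page=1))
```

Seeding per page rather than per book means jumping to an arbitrary page gives the same result as reading up to it.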
hydrolox 2 days ago |
To be honest I don't know what ereader you've looked at or what books you read. The way eink works, if you look closely there is a slight grain very reminiscent of paper. Similarly, the font/letters can have small imperfections as well, and it all looks just like a new book might.
WalterBright 2 days ago |
I have probably 8 or so Amazon ereaders over the years, of different models. I also use the Apple ipad, and the pdf readers on many other devices. I've bought a couple ereaders from other companies, but they didn't "take" with me. My current favorite ereader is the retina ipad, though its battery life is short.
Modern paper books have much more precise printing than books printed 20, 40, 60 years ago. I find the irregularity of the older ones rather charming, the new ones are also sterile - being too perfect.
I suppose it is like my indifference to autotuned singers. Too perfect. Doesn't sound human.
P.S. have you ever visited a medieval village in Europe? I find them wonderful, as there are no straight lines anywhere. It's all pleasingly crooked. I considered building my house slightly crooked, but that turned out to be far too expensive as everything would have to be custom made.
Liftyee 2 days ago |
> I considered building my house slightly crooked, but that turned out to be far too expensive as everything would have to be custom made.
That's the efficiency of standardisation at work...
okasaki 2 days ago |
A lot of pirated ebooks are scans of physical books stored as page images in DJVU or PDF. I do agree they have a certain charm to them.
eleveriven 2 days ago |
Adding a touch of imperfection
elric 2 days ago |
I wouldn't be embarrassed about that particular ebook, it's probably the best looking one out there. The other 99% of ebooks however are atrocious.
I get that this isn't an easy problem to solve, with many different screen sizes, resolutions, and zoom levels. But the status quo is awful. I refuse to buy tech books in ebook format. Anything with any kind of diagrams (let alone actual pictures) is an automatic no because I know it'll suck.
My Kobo is pretty good for reading sci-fi tomes. Fantasy isn't as fun, as they often come with maps and the maps always render poorly.
nephanth 2 days ago |
Interestingly, this related bug had very recent activity: https://bugzilla.mozilla.org/show_bug.cgi?id=775617
benmanns 2 days ago |
Most eBooks are produced by creating a new InDesign ebook document from the existing InDesign print document. Then you fiddle with a bunch of stuff to make it look right (removing the forced line breaks and such you did to get the print document looking how you want it, etc). That's then exported to ePub, which is a zip of some HTML, CSS, images, and fonts. The code outputted is absolutely terrible, as one used to get creating webpages with WYSIWYG Dreamweaver et al. and causes a lot of issues that have to be fiddled with in InDesign, manually corrected in the unzipped export, or frequently just left in the final book.
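The "zip of some HTML, CSS, images, and fonts" structure can be seen by building a tiny container in memory. This is a sketch, not a valid ebook: the package document that `container.xml` points to is left out, and the `OEBPS/` file names and contents are placeholders chosen for the example.

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_container_skeleton() -> bytes:
    """Write an EPUB-shaped zip archive into memory and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        # The `mimetype` entry comes first and is stored uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        # Placeholder content files; a real book also needs OEBPS/content.opf.
        archive.writestr("OEBPS/chapter1.xhtml", "<html><body><h2>One</h2></body></html>",
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/style.css", "h2 { break-after: avoid; }",
                         compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


with zipfile.ZipFile(io.BytesIO(build_container_skeleton())) as archive:
    names = archive.namelist()
    first_entry = archive.infolist()[0]
```

Unzipping an exported ePub the same way is how the "manually corrected in the unzipped export" step is usually done.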
There really should be a better eBook specific authoring system. Ideally with importing from the print document, as that's the canonical version. It's just a website, after all, and we have great tools for building those.
grecy 2 days ago |
I use pandoc to convert the source LaTeX into an epub, then massage the result a little to tweak a few things. It’s all scripted and works extremely well.
details here: http://theroadchoseme.com/how-i-self-published-a-professiona...
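A scripted conversion along these lines might look like the sketch below. It is not the script from the linked post: the file names are hypothetical, and only common pandoc options (`--from`, `--to`, `--toc`, `--css`, `-o`) are used.

```python
import subprocess


def pandoc_epub_command(source, output, css=None, toc=True):
    """Build the pandoc argument list for a LaTeX-to-EPUB conversion."""
    argv = ["pandoc", source, "--from", "latex", "--to", "epub3"]
    if toc:
        argv.append("--toc")
    if css:
        argv += ["--css", css]  # stylesheet to embed in the EPUB
    argv += ["-o", output]
    return argv


def build_epub(source, output, **options):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_epub_command(source, output, **options), check=True)


# Hypothetical file names, for illustration only.
command = pandoc_epub_command("book.tex", "book.epub", css="ebook.css")
```

Any post-conversion "massaging" would then operate on the generated archive.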
zargon 2 days ago |
Another frustrating thing with ebooks is that you can't get them in PDF format any more. So much time is spent making a nicely formatted hardcopy edition, then the ebook is only available as a terribly auto-converted epub that throws away all the layout and style. Particularly cookbooks, as well as anything technical, I just can't stand how lazy, ugly, and difficult to read the epubs are. All the tooling already exists to produce PDFs identical to the print version, but no, we can't have those.
jiehong 2 days ago |
I think it's a great point!
Perhaps authors could also produce PDFs designed for common tablets as well, and therefore get the exact expected format.
I do agree with the author that paged formats are difficult with browsers to this day, and I also hope this can improve.
Tepix 2 days ago |
For me, it's the opposite. Whenever I have the choice, I want EPUB, not PDF. The problem with PDFs is that you don't have a device that is the same size as the original page size in most cases.
Ma8ee 2 days ago |
It depends on what kind of book it is. Fiction EPUB all day. Technical books with figures, tables or code listings, PDF!
eleveriven 2 days ago |
Exactly! EPUBs are great for novels or anything you just want to read line-by-line
lxgr 2 days ago |
They're really great for anything I want to read on my phone, including technical books.
I'll take a slightly-less-great ePub over a PDF I have to scroll around or use terrible reflow heuristics via some reader any day.
zargon 2 days ago |
Really, the point is, why can't we have both? I use and enjoy epubs extensively, but also in many contexts I strongly prefer PDF.
lxgr 2 days ago |
Getting a passable PDF from an ePub is probably significantly easier than the reverse, so I'm all for having both, yes! (And please don't charge me twice for the privilege, publishers.)
eleveriven a day ago |
Each format has its strengths
eleveriven a day ago |
PDFs on smaller screens - it’s like wrestling with the page just to read a single line
carlosjobim 2 days ago |
One question: Why do people make PDFs in A4 format? Wouldn't it make better sense to start making them in A5 or A6, so that they could be better read on e-readers, phones, and on part of a computer screen (which is landscape oriented)?
ileonichwiesz 2 days ago |
Surely the only difference between an A4 PDF and an A6 one would be text size?
carlosjobim 2 days ago |
Yes, exactly. That's what matters.
dagw 2 days ago |
Depends which parameter you choose to hold fixed. You could shrink your text and keep the layout and page count or keep your text size fixed and increase the page count. If people were doing layout for a fixed A5 or A6 size they will probably make many different choices compared to laying out for A4.
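The two options can be put in rough numbers using the ISO 216 sheet sizes. This is back-of-the-envelope arithmetic only: it ignores margins, and the page-count estimate assumes text simply fills area. The function names are invented for the example.

```python
import math

# ISO 216 sheet sizes in millimetres (width, height).
A_SERIES_MM = {"A4": (210, 297), "A5": (148, 210), "A6": (105, 148)}


def shrunk_font_size(src, dst, font_pt):
    """Keep layout and page count: everything, text included, scales with width."""
    return font_pt * A_SERIES_MM[dst][0] / A_SERIES_MM[src][0]


def estimated_page_count(src, dst, pages):
    """Keep text size: the page count grows roughly with the ratio of page areas."""
    src_w, src_h = A_SERIES_MM[src]
    dst_w, dst_h = A_SERIES_MM[dst]
    return math.ceil(pages * (src_w * src_h) / (dst_w * dst_h))


print(shrunk_font_size("A4", "A6", 12))       # 12 pt body text after shrinking
print(estimated_page_count("A4", "A6", 100))  # pages needed at the original text size
```

Going from A4 to A6 halves the text size in the first case and roughly quadruples the page count in the second.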
rjmunro 2 days ago |
I think the point is that if you are designing with that size in mind you may make different decisions about the column layout etc.
I would have thought that a PDF of a book would normally be made in the size of the physical book, which could be A4, but usually isn't (at least not when I look at my bookshelves).
Maken 2 days ago |
Text size decides everything: paragraph size, heading breaks, figure placement, etc.
SAI_Peregrinus 2 days ago |
Probably also column layout. 2-column documents are fine for A4 sizes, but terrible for A6 or most e-reader screen sizes. Scroll down, then up & across, then down, then across, then repeat. Versus just scroll down or just turn pages.
TeMPOraL 2 days ago |
Adapting PDFs for devices won't save us. HTML, being designed around reflow, had the ultimate solution from day one - and yet we've managed to screw that up so badly it spawned a whole industry sub-specialty of "responsive design". When authors start producing multiple PDF versions for different devices and print, how long until someone gets tired of "extra work" and comes up with "responsive PDFs"?
(Also I feel that by default, non-book PDFs tend to show up in the US "Letter" size, which looks deceptively similar to A4, until you try to print it.)
carlosjobim 2 days ago |
Like you mention, HTML already exists for adaptive text reflow. I assume that people making PDFs want their layouts fixed. But maybe an A5 format would make more sense, even if you're printing it?
Also: What did people screw up with HTML in your opinion?
TeMPOraL 2 days ago |
> Also: What did people screw up with HTML in your opinion?
The problem with PDFs is that you need to create multiple layouts to make them look good in print and on a variety of commonly used screen sizes; all those layouts are extra work. HTML, by its very nature, doesn't have this problem, and yet somehow today we still have to design multiple layouts to support print and common screen sizes. And in practice, we usually don't - instead, we design one layout optimized for mobile phones, and ignore how bad it looks on desktop or in print. "Responsive web design" turned into forcing HTML to behave like a PDF, except using "iPhone" instead of "A4" as the size.
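A minimal sketch of the "let HTML reflow" approach, assuming a page whose content sits in a `main` element next to `nav` and `footer` elements (illustrative selectors, not taken from any particular site): one column with a capped line length that works at any width, plus a small print override, rather than a layout per device.

```css
/* Hypothetical selectors; adjust to the actual markup. */
main {
  max-width: 65ch;      /* cap the line length instead of targeting a device */
  margin-inline: auto;  /* centre the column on wide screens */
  padding: 1rem;
}

img {
  max-width: 100%;      /* images shrink with the column */
  height: auto;
}

@media print {
  nav, footer { display: none; }         /* drop screen-only chrome on paper */
  main { max-width: none; padding: 0; }  /* let the page margins do the work */
}
```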
carlosjobim 2 days ago |
If you make your PDFs in A5, you can print two of them on an A4 paper and read the paper in landscape orientation. For the same reasons the size fits well for displaying on a computer screen and on a tablet/e-reader. It's still a bit too big to squeeze down to a cell phone, but at least better than A4/Letter size.
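If the PDF is generated from HTML, the page size can be requested in the stylesheet. A minimal sketch, assuming the converter or browser honours the `size` descriptor of `@page` (support varies between tools); the margin value is only a placeholder:

```css
/* Assumption: the HTML-to-PDF tool in use respects @page rules. */
@page {
  size: A5 portrait;  /* 148mm x 210mm; two side by side fill a landscape A4 sheet */
  margin: 15mm;       /* placeholder margin, tune to taste */
}
```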
As for responsive HTML, it's the responsibility of the designer to make it work if he/she is worth their salt. Like you say, HTML without CSS is already responsive. If businesses understood that there is a big segment of customers who will always use their computer and never their phone when it's time to make a purchase, perhaps they'd be better at it.
grncdr 2 days ago |
> how long until someone gets tired of "extra work" and comes up with "responsive PDFs"?
I hate to break it to you but, https://blog.developer.adobe.com/adobe-sensei-makes-responsi...
SilasX 2 days ago |
Yes! This is what I keep complaining about! HTML likewise solved accessibility, but then it goes right through the cycle: someone extends it in a way that requires a special reader, then they focus on people with that reader at the expense of everyone else. Unless you stop the cycle from happening, going to a new format doesn't help!
https://news.ycombinator.com/item?id=38032832
messe 2 days ago |
What's even worse is the PDFs in Letter format.
eleveriven 2 days ago |
Constant zooming and scrolling just to read a single page
freeone3000 2 days ago |
PDFs are in Letter (rarely A4) format, quite simply, to be printed on Letter paper :) The computer screen view is secondary.
carlosjobim 2 days ago |
Yes, but today most of them never exit cyberspace. Wouldn't it be more reasonable to consider that instead of printing?
Edit: Also, are there any advantages with large papers like A4/Letter for physical prints, except that you can fit more on a single page?
freeone3000 2 days ago |
It’s as simple as most consumer printers printing A4/Letter, and most paper being A4/Letter.
carlosjobim 2 days ago |
The more I think about it, the more I'm getting convinced that A4/Letter was a mistake. Maybe we'll see something like A5 as a standard in the future, that would be neat.
chrismorgan a day ago |
> PDFs are in Letter (rarely A4)
You must be from North America. In the rest of the world, it’s always A4. I encounter A4 PDFs fairly often, but I don’t know how long it has been since I last encountered Letter; easily years.
Paul_Clayton 2 days ago |
Another problematic aspect is if one has poor eyesight and wishes to use a larger font size. One ends up having to scroll horizontally for each line of text for single column pages. For two-column pages, one has to scroll back a page after reading the first column.
Sometimes one can use a landscape-oriented display to avoid horizontal scrolling, but even if the same word count fits on the screen I seem to be annoyed by the low line count.
Providing large type and huge type PDFs would not entirely solve this problem as sometimes even one with poor eyesight might prefer a smaller font for scanning/skimming. Having to acquire two PDFs and switch between them based on mode of use seems suboptimal.
Fixed paged presentation has significant advantages for familiar reference material; some people seem to have a spatial memory that makes finding specific content by flipping pages faster than trying several search phrases (with the occasional benefit of serendipity).
Poetry often benefits from not reflowing lines and page breaking within a stanza is often more jarring than within a paragraph of prose. Yet a reader might prefer inferior typography over having to use a magnifying glass or carry a very large display.
One might be able to get some of the advantages of paged media for figures and tables by having header (or footer) pop-up links to such content when it is on the same "page" as the displayed text. This is not as low effort as moving one's eyes, but it might be better than inlined presentation on a small (relative to font size) display.
Even with print, there would be times when breaking the text to fit a figure is more disruptive than having the figure on a separate page. There would also be times when all the relevant figures would not fit on the same page as the related text.
Having a separate booklet of illustrations might make going back and forth between text and illustration easier, similar to having a lexicon or commentary open while reading. However, that also introduces position tracking in another book and other inconveniences.
Even when my vision was better, reading academic papers distributed as PDFs (usually 2-column) on a computer screen was less enjoyable than reading similar material in a reflowable format. Academic papers also do not seem to benefit as much from pagination as other writings.
donatj 2 days ago |
I agree, I absolutely hate reading basically everything other than novels/non-fiction narratives in epub. All the work they did laying out the pages is just thrown away! Trying to read any sort of instructional book as an epub is straight up infuriating.
I work for a children's reading platform, and the book publishers universally send us PDFs of everything. We have a bespoke system to convert the PDFs into SVG for higher flexibility and added interactivity.
Literal kids' board books are getting a better treatment!
eleveriven 2 days ago |
It really shows that when there’s care taken with formatting, it makes a huge difference in how engaging and accessible the material is
globular-toast 2 days ago |
Part of this is that books and screens are not interchangeable pieces of technology. Books are still supreme when it comes to reference material and lookup speed but each page needs static typesetting. Screens are more fragile, expensive, generally smaller, with lower contrast and/or resolution, but they allow a fully flexible display with variable font sizes, the ability to scroll half way between pages etc. A PDF on a screen is all of the downsides of books with none of the upsides of screens.
benrutter 2 days ago |
Ebooks were such an exciting prospect when they came out, but whether it's because of anti-consumer monopolies, focus on DRM protection instead of features, or just plain old inertia, they haven't really advanced at all past their first ever implementation.
I sometimes read interactive fiction, and even a format as intentionally simple as Twine supports variables so content can be dynamic. There's no reason ebooks couldn't do things like that, but they are so far behind that they haven't even caught up with where print publishing was 50 years ago yet.
atoav 2 days ago |
As someone who learned layouting from a grumpy old typographer who still had experience in cutting letters by hand I have to say epub layouts are often quite horrible — especially when figures and tables are involved.
warpspin 2 days ago |
While everyone here seems to talk about the epub angle to the story, there's also simply the deeper story here, that "the web's" handling of paged media and the CSS paged media specs (to which his epub problem is related) is a never ending shitshow. Not only for epubs, for everybody who actually wants to print to real paper, too, ideally with a working cross browser solution.
The mistake is largely not in the specs, but in the lack of support for them. Page breaking controls, weirdly breaking tables, lack of access to the area outside the page box to influence headers/footers without weird hacks, etc. For printing, the 1990s never ended.
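For reference, a minimal sketch of the kind of controls the specs already define (CSS Fragmentation and CSS Paged Media). The selectors and the header text are placeholders, and whether a given browser honours each rule is exactly the support problem described above:

```css
/* Keep a heading together with the content that follows it. */
h1, h2, h3 {
  break-after: avoid;
  page-break-after: avoid;  /* legacy alias for older engines */
}

/* Ask for figures, tables and code blocks not to be split across pages. */
figure, table, pre {
  break-inside: avoid;
}

/* Require at least two lines of a paragraph on either side of a page break. */
p {
  orphans: 2;
  widows: 2;
}

/* Margin boxes: content placed outside the page area, e.g. a running header. */
@page {
  margin: 20mm;
  @top-center {
    content: "Book title";  /* placeholder text */
  }
}
```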
This leads to the bizarre situation where basically everyone who has semi complex printing needs in web applications will create PDF and then print that - and for creating those PDFs, often HTML to PDF conversion is used, just with actually implemented CSS for paged media. Which again proves that the spec is at least 99% there, if somebody would just kindly implement it in a browser, too.
Won't be more complex than having the latest WebGL whatever thing in your browser engine ;-)
wakeupcall 2 days ago |
Some months ago I wanted to format/print some documents, and given the existing tooling I had I decided to try the html->pdf route. I fully agree it is a shitshow. The way things break across pages is hard to fix even when hand-tuning the html itself (not just by working around it with CSS) to avoid content being cut across margins and pages no matter what. I've found Chrome to be "less bad", but still unusable. Column handling is an even bigger joke.
In the end I exported the document to libreoffice, and got something way more usable in a few hours just by editing the styles than whatever I was able to do in days of fiddling with html+browser.
iBooks on Apple might get a pass as it doesn't need to paginate, but truth be told it seems that epub/ebooks and ereaders in general are being targeted at novels and romance, where form factor, typesetting and formatting don't matter that much.
I have access to ebooks through my local library and there's no way I would use, let alone buy, any technical ebook.
Not to mention, I've seen a steady average decline in the quality of printed media in general over the last ~15 years. A lot less attention is put into the typesetting and layout. Even the print quality itself is lower, which I think is due to the smaller and cheaper print runs now being done for more popular titles as well.
spookie 2 days ago |
Paper quality in general has seen quite a decline as well.
onetokeoverthe 2 days ago |
Suggestions for better quality paper providers?
PaulHoule 2 days ago |
I thought book quality started going downhill circa 1990.
I am a fan of the old mass market paperbacks. These had a reputation of being low quality books back in the day because they are cheap and not super-durable, but I think they are high quality from a Deming point of view because they are made by a process that is highly repeatable. Circa 2000 I thought my 1970s paperbacks were in great shape, but by 2010 they were seriously yellowing.
I just looked at my bookshelf and found a '59 James Blish anthology that I bought for 50 cents maybe ten years ago, it is in "poor" condition and will probably crack if I read it without taking great care. Next to that I found a copy of Galbraith's The Affluent Society from 1958 which is perfectly usable except I'd be worried about the cover coming off. A Frank Herbert book from '68 is stained but in great shape other than the cover also being at risk. A '74 Herbert book is a touch discolored but has no problems at all.
(My collection includes not just science fiction of that era but also both self-help and serious books on psychology as well as books about science, politics, social sciences, etc. Government reports about inflation or race relations would be published as mass market paperbacks. You could get Plato and Sartre and Freud and the rest of the Western literary heavyweights)
The construction, materials, process, and such were repeatable enough that they even fail consistently. Not permanent, but 50 years is not bad. The right size to go in a purse or side pocket of a backpack (e.g. part of the loadout of a bibliomaniac who has 12 books in his backpack). I've got to find a good way to reinforce the cover (adhesive tape?)
Those are no longer produced, today it is trade paperbacks. There is wide variation in the dimension, construction, materials and processes for these. You sometimes find a trade paperback that is beautiful, strongly constructed and printed on acid free paper. Others you pay $50 for and the binding breaks the first time you lay the book open on the table.
SSLy 2 days ago |
Plenty of high quality books are being printed in the indie RPG community.
jamesfinlayson 2 days ago |
> adhesive tape?
Don't - it yellows too.
fuzzythinker 2 days ago |
I use thin 2" tape to wrap the corners and the oldest one is probably 10 years old and no yellowing.
jamesfinlayson 2 days ago |
Might depend on the age and/or brand of the tape - I've seen old tape (30+ years maybe) that has yellowed. I have a 15-year-old book at home with some tape and it's okay, except for the tape that wasn't in contact with the book (which is yellowed).
bayindirh 2 days ago |
I think being able to format a page just for printing, esp. with HTML/CSS itself is a killer feature and is gigantically underestimated.
I understand that printers are Satan incarnate and run on the concentrated sins of cost-cutting engineers, and nobody has time to read that 1.5 page article someone wrote by giving proper effort, but scientific articles, books, nice blog posts we want to delve into, etc. are real, and regardless of the substance they run on, printers are real things and they are used nevertheless.
The tendency to assume that everyone is running on high-end laptops and cutting-edge, network-connected tablets makes me angry sometimes. Implementing features like this would also make life easier for the many programmers who need to generate and print reports from their web applications.
It's not only about books and shrewd people who want to print a blog post and study/read that on a table or wherever.
jamesfinlayson 2 days ago |
Completely agree - I wrote a book that had some specific layout requirements in HTML, and while it was easier to get something up and running than LaTeX, getting the printing part right was very painful (not least of all because no browsers seem to support things like page numbering).
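For context, this is roughly what page numbering looks like in the CSS Paged Media spec. It is a sketch only: it relies on margin boxes and the `page`/`pages` counters, which dedicated HTML-to-PDF converters tend to implement more completely than browsers do.

```css
/* Page number in the bottom margin of every page. */
@page {
  @bottom-center {
    content: "Page " counter(page) " of " counter(pages);
  }
}

/* Suppress the number on the first page (e.g. a title page). */
@page :first {
  @bottom-center {
    content: none;
  }
}
```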
vintagedave 2 days ago |
Another Safari CSS failure for typography is drop-cap support, which is where the first letter of a paragraph is drawn larger and in capitals. It's a year since I looked at this, but the comments explaining the weird CSS say:
> This was actually remarkably difficult to get right: the trick is setting line-height and height to < 1em. If they were, eg, 7rem to match the font size then FireFox and Safari render very differently, with Safari showing a much taller gap in the text that the dropcap was centered in.
The same CSS (https://daveon.design/manuscript-vintagedave.css) applies typographical sentence spacing to sentences within a paragraph, which you can see on any article, eg: https://daveon.design/what-are-you-optimising-for.html
I would love to see more CSS support for _typography_: not letter spacing, but actual typographical layout. The article is dead-on accurate when it says we still can't reach what was normal in layout for six centuries.
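For anyone who hasn't fought with drop caps: a minimal sketch of the common float-based approach, not the CSS linked above. The class name `opening` and every number are assumptions that need tuning per font, which is where the cross-browser differences show up. There is also an `initial-letter` property specified for this purpose, but support is uneven.

```css
/* Hypothetical class on the first paragraph of a chapter. */
p.opening::first-letter {
  float: left;                /* let the following lines wrap around the letter */
  font-size: 3.5em;           /* roughly three lines tall; depends on the font */
  line-height: 0.8;           /* below 1 so the letter box does not push lines apart */
  padding-right: 0.08em;      /* small gap between the cap and the text */
  text-transform: uppercase;
}
```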
NoMoreNicksLeft 2 days ago |
My personal gripe is "non-linear". These are pages you shouldn't be able to see just by flipping page-to-page... ideal for "choose your own adventure" type books.
Apple Books doesn't treat these like pages that display only after you tap/click a link though. They render them as a sort of popup above the regular page, styled differently and even with a different page width, as some sort of endnote-not-at-the-end functionality. If that non-linear page also has a link to tap/click on, too bad. Nothing in the spec even slightly suggests their interpretation of it, and the longer they make it the norm, the more impossible it will become to implement the correct functionality because other book titles will have become dependent on it.
__mharrison__ 2 days ago |
When I published my first book, I had to learn all about ebook best practices and ended up writing a book about the state of the art in 2012.
Sad to see how little we've moved on in twelve years.
On a semi-related note, Typst had similar issues but the devs are actively working on fixing issues like this.
lxgr 2 days ago |
> I set myself some pretty stiff criteria for the ebook – it needed to replicate the design of print edition as far as possible [...]
My hot(?) take as a reader: I want none of this.
Just give me proper semantics and let my reader (while respecting my personal preferences) figure out presentation. I'm quite tired of publishers thinking that they somehow need to "preserve the unique aesthetics" of a paper-based medium, despite that usually working out quite poorly.
Fonts that look great on paper don't necessarily look great on the eInk, LCD, or OLED display I'm reading the book on, for one thing. Margins/padding are usually a world of pain, and one of the first things I end up doing when opening a new ePub is often to disable all publisher formatting.
While I love the idea of an open format and really don't like the idea of being stuck in Amazon's walled garden, at least they figured that part out in their Kindle ecosystem from the beginning. With ePub, it's been hit and miss.