You can use Adobe InDesign to output cross-references in both PDF and HTML files and maintain both outputs using the same source material.
Adobe InDesign does an exceptional job of helping you create file-to-file cross-references when you prepare a book for export to PDF. InDesign also does a creditable job of automatically outputting an HTML file and a CSS file, although you must export file by file, even when you’ve gathered the files together into an InDesign book.
One of the things that InDesign does not do is preserve those PDF cross-references when you output the file to HTML. Adobe states this up front. I had no illusions on this matter.
Output a .indd file that contains a cross-reference as HTML, and you’ll see HTML code that looks something like this:
<p class="paragraphs_paragraph-std"><a href="">
<a href=""> is what remains of the cross-reference bookmark, and that spanned-and-bold GetData is the text of the linked paragraph: in this case, a level-1 head that began the target file. It also happened to be the name of the target file.
Leaving the href empty instead of filling it with the filename plus .html is a design decision on Adobe’s part, and I won’t debate it.
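Before committing to a workaround, it can help to see how many cross-references were flattened to empty hrefs in an export. The following is a minimal sketch, not part of the original workflow: the function names and the idea of scanning one export directory are assumptions made for illustration.

```python
import re
from pathlib import Path

# Matches an anchor tag whose href attribute is empty, e.g. <a href="">
EMPTY_HREF = re.compile(r'<a\b[^>]*\bhref\s*=\s*(""|\'\')', re.IGNORECASE)


def count_empty_hrefs(html: str) -> int:
    """Return the number of anchors in html whose href is empty."""
    return len(EMPTY_HREF.findall(html))


def report_empty_hrefs(export_dir: str) -> dict[str, int]:
    """Map each .html file name in export_dir to its count of empty hrefs.

    export_dir is a hypothetical folder holding the exported HTML files.
    Files with no empty hrefs are left out of the report.
    """
    report = {}
    for path in sorted(Path(export_dir).glob("*.html")):
        count = count_empty_hrefs(path.read_text(encoding="utf-8"))
        if count:
            report[path.name] = count
    return report
```

Running report_empty_hrefs on the export folder gives a per-file tally of the links that still need a replacement.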
But I can show you how to get around it.
It’s a Big Book
I created a substantial API document for a client, consisting of over 100 files, plus a table of contents and a multi-page index. The document is loaded with “See Alsos” implemented as cross-references, each “See Also” pointing at a separate file. The guide was acceptable as a PDF, but the client wanted an HTML rendition of the same document. As is generally customary, all the HTML files would live in the same directory, and the CSS would live in a different directory at the same hierarchical level as the HTML files. I had no control over where in their server structure the client would choose to put the HTML and CSS, only over the relationship of one to the other.
The issue was how to create the hyperlinked cross-references between HTML files living in the same directory — all those “See Alsos.”
InDesign Already Creates HTML Hyperlinks
After confirming that InDesign outputs cross-references only as empty hrefs, not as usable links, I turned to InDesign’s Hyperlinks window.
The window is intended as a way for you to insert a hyperlink into a PDF, and it has some restrictions on the choices in the Link To pulldown.
- URL: Requires a fully qualified URL. That is, it prefixes your file name with http:// and produces an <a href= in the HTML file. This choice is meant to link to a website, not to a web page. In my case, I couldn’t dictate that each file from the book have its own URL or dictate the structure from
- File: Produces an <a href= in the HTML file that links to a file on the local machine or server. It prefixes the file name with a full pathname (C:\users\documents, etc.).
- Email: Is inappropriate to my purpose of linking one HTML file to another.
- Page, Text Anchor: Both link to a .indd document, not an HTML file.
- Shared Destination: While the help system states that this selection will allow you to link to any document, my experiments showed that it would link only to .indd documents.
I tested the URL output because it produced the shortest and most regular file prefix: just http:// rather than a full (and varying) pathname. Its output still included that pesky http:// prefix:
<p class="paragraphs_paragraph-std"><a href="http://test_link.html">
<span class="Hyperlink">Test Link</span></a></p>
Light Dawns Over Marblehead
There were several hundred cross-references in the whole document, but I could manually duplicate each one with a URL Hyperlink. It would be the work of a day or so, but once done, all those links would be in place.
Note: There may be a scripted way of duplicating the cross-references as Hyperlinks; I didn’t spend the time trying to create one. Doing this process manually also let me see whether there were any cases where my method wouldn’t work. I didn’t find any.
By good fortune, I had adopted a file naming convention for the book that began each file name with api_. Thus, each hyperlink that InDesign created would begin with <a href="http://api_
I could do a search-and-replace on the exported HTML files in Dreamweaver for
<a href="http://api_ and replace it with
<a href="api_
This replacement would:
- Remove the http:// that InDesign inserts
- Not remove any http:// links that really had to be there (they would not be followed by api_)
- Let one HTML file call another within the same directory without a fully qualified URL
- Not dictate where in the directory structure the files had to be located
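The same search-and-replace can be run outside Dreamweaver. The sketch below is one possible way to do it, assuming the replacement string is <a href="api_ (inferred from the goals listed above) and that the exported files sit together in one folder; the function names and the export_dir parameter are hypothetical.

```python
from pathlib import Path

# The prefix InDesign's URL hyperlink produces, and the assumed replacement.
OLD_PREFIX = '<a href="http://api_'
NEW_PREFIX = '<a href="api_'


def strip_http_prefix(html: str) -> str:
    """Drop http:// only from links whose target starts with api_."""
    return html.replace(OLD_PREFIX, NEW_PREFIX)


def fix_links(export_dir: str) -> int:
    """Rewrite every .html file in export_dir in place.

    Returns the number of files that were actually changed.
    """
    changed = 0
    for path in Path(export_dir).glob("*.html"):
        original = path.read_text(encoding="utf-8")
        fixed = strip_http_prefix(original)
        if fixed != original:
            path.write_text(fixed, encoding="utf-8")
            changed += 1
    return changed
```

Because the match includes api_, a genuine external link such as <a href="http://www.example.com"> passes through untouched.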
Now I had two sets of links in each document — conventional cross-references and Hyperlink references. I created two text conditions, PDF and HTML, and applied the conditions to the appropriate links.
With the two conditions, I could switch on PDF links for PDF output and HTML links for HTML output, and I could maintain both sets of links in the same source material. I could remove the unused http:// in Dreamweaver during post-production, and the resulting HTML version of the book was hyperlinked in much the same way as the PDF was by its built-in cross-references.
TOC and Index
This client’s document is a work in progress, but it appears that the table of contents may be amenable to removing the http:// from <a href="http://api_… The index, however, is a much gnarlier problem. The hrefs in the index are individual page numbers. This makes sense, because in a PDF index, it is the page numbers that are hyperlinked, not the indexed topics. I’m not sure yet whether there is any reasonable automated way to replace the page numbers with HTML bookmarks. My project may need to go without its index for now.
This HTML cross-reference method requires post-processing of the HTML files to remove the http://, and the export also requires post-processing for stylesheet use. InDesign’s HTML output process creates a directory for each file called <filename>-web-resources. In that directory is a directory named CSS, and in the CSS directory, the export process puts a stylesheet named idGeneratedStyles.css based on the styles from your InDesign stylesheet.
All my pages use the same InDesign stylesheet definitions, but not every page uses all styles. Consequently, idGeneratedStyles.css varies from exported file to exported file depending on which styles the file used.
After I exported the first file, I copied idGeneratedStyles.css to a CSS directory and renamed the file main.css.
In succeeding exports, I used the HTML export dialog box to tell InDesign both to generate a stylesheet from the styles used on the page and to use main.css. (In future output, I won’t need to do that; the files will just use main.css.)
I found that the export process copied the pre-existing main.css to the <filename>-web-resources folder and linked to it there rather than linking to its original location. I had to make sure that I made any changes to the main.css stylesheet I was building, and not to the copy that export had placed in the web-resources directory. I wanted one main.css file for all files to access.
As I exported each succeeding file, in the browser I could see the styles that main.css did not already include. I added those styles from idGeneratedStyles.css to main.css. Thus, main.css slowly accumulated the definitions required by each export and made them available for use in all the files. Before I left it, I modified each HTML file to point only to main.css.
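The accumulation step can be approximated with a small script that lists the rules a freshly generated stylesheet has and main.css lacks. This is only a sketch under stated assumptions: it treats the generated CSS as flat selector-and-body rules (no nested @media blocks), compares selectors as plain text, and uses hypothetical function names.

```python
import re

# Matches one flat rule: selector text followed by a { ... } body.
RULE = re.compile(r'([^{}]+)\{([^{}]*)\}')


def parse_rules(css: str) -> dict[str, str]:
    """Map each selector to its declaration body (flat rules only)."""
    css = re.sub(r'/\*.*?\*/', '', css, flags=re.DOTALL)  # drop comments
    return {sel.strip(): body.strip() for sel, body in RULE.findall(css)}


def missing_rules(main_css: str, generated_css: str) -> str:
    """Return CSS text for rules in generated_css whose selectors
    do not yet appear in main_css."""
    known = parse_rules(main_css)
    new = parse_rules(generated_css)
    return "".join(
        f"{sel} {{\n\t{body}\n}}\n"
        for sel, body in new.items()
        if sel not in known
    )
```

Appending the returned text to main.css after each export mirrors the manual copy step; rules whose selectors already exist are skipped even if their bodies differ, so any conflicts still need a human eye.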
In future HTML outputs, I will have to search-and-replace the link to the generated stylesheet and re-point it at main.css. There does not seem to be any good way to overcome InDesign’s practice of copying main.css into the web-resources directory. However, after working through the book file-by-file, I won’t have to go through the process of accumulating definitions in main.css (unless, for some reason, I add new definitions).
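Re-pointing the stylesheet link is another candidate for a script. The sketch below assumes the exported link carries an href ending in idGeneratedStyles.css or main.css inside the web-resources folder; the SHARED_CSS path is hypothetical and should be adjusted to wherever the shared stylesheet sits relative to the HTML files.

```python
import re

# Hypothetical location of the shared stylesheet relative to the HTML files.
SHARED_CSS = "../css/main.css"

# An href whose file name is one of the two stylesheets the export links to,
# with or without a leading folder path.
CSS_HREF = re.compile(r'href="(?:[^"]*/)?(?:idGeneratedStyles|main)\.css"')


def repoint_stylesheet(html: str, target: str = SHARED_CSS) -> str:
    """Point generated or copied stylesheet links at the shared main.css."""
    # A function replacement avoids backslash escapes being read from target.
    return CSS_HREF.sub(lambda match: f'href="{target}"', html)
```

If a file links to both the generated stylesheet and the copied main.css, both links end up pointing at the same shared file, which is redundant but harmless.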
Admittedly, this is a work-around that requires duplicating the built-in cross-references as hyperlinks and setting up conditional text to switch between the two sets. It requires post-processing handwork once during the initial HTML output, and far less handwork later whenever the HTML files need to be regenerated or new ones added. But the method does achieve a result that Adobe says InDesign does not deliver: HTML output with hyperlinked cross-references to other HTML files.
By building the duplicate links and conditions into the files as the files are written, and planning for the post-processing time needed to build a single CSS file, you can use InDesign to output both PDF and HTML and maintain both outputs using the same source material.