ISO/IEC JTC1 SC22 WG21 N3325 = 12-0015 - 2012-01-15
Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org
Introduction
Issues
Approaches
Good HTML
HTML Standard Compliance
Semantic Font Markup
Semantic Block Tags
Quoted Text
Preformatted Text
Examples
Tables
Deleted and Inserted Text
WG21 Front Matter
Document Identification
Headings and the Table of Contents
Representing C++ Standards Concepts in HTML
Code and Code Blocks
Grammar Terms and Rules
Notes, etc.
Representing Library Function Specifications
Showing Edits
Quoting the Standard
Inserted and Deleted Text
Inserted and Deleted Paragraphs
Formatting the HTML Source
Literate Programming
Rendering Style
Color and Contrast
Tables
Examples
Examples
References
Scripts
quote_code.sh
block_code.sh
block_extract.sh
extract_code.sh
contents.sh
dynacontents.js
outline.sh
outline_with_names.sh
style.hinc
Makefile
At the Summer 2011 meeting, there were several problems with the readability of various presentations. Readability is another side of accessibility, the ability of a wide variety of readers to read the page under a wide variety of conditions.
This paper provides guidelines and tools for producing widely accessible WG21 papers. It is based on my experience in dealing with inaccessible web pages and my experience writing accessible web pages.
These guidelines are for the production of WG21 papers. While many of the concepts and techniques carry over into other uses, they are incomplete with respect to those other uses.
Contrast was the primary problem at the summer meeting. When contrast is low, readability is poor. Further, low contrast exaggerates focus problems.
Reliance on color is a significant problem. First, close to 10% of men are color deficient, which means they cannot see colors normally. There are several kinds of deficiency, but by far the most common is an inability to distinguish red and green. Second, many browsers support a "high contrast" mode, which generally ignores page-specified colors. Third, to save costs, WG21 papers are often printed without color. The net effect is that color differences, and particularly red versus green, are not sufficient to convey information.
Reliance on font face is a significant problem. The "high contrast" mode generally ignores page-specified fonts, so font differences are also not sufficient to convey information.
Reliance on long lines is a significant problem. Low-vision readers rely on being able to increase font size to read the text. Larger fonts mean relatively shorter lines. Smaller windows also achieve the effect of forcing shorter lines. Pages need to adapt to those shorter lines.
Reliance on external tools is not presently a problem, but could become one. Browsers behave differently. They are configured in various ways. They have different sets of extensions and plug-ins. All of this variety leads to problems when straying from plain HTML.
The primary approach to solving these problems is to rely on text to convey information, and secondarily, to enable that text to adapt to the reader's needs. One can decorate with style and color; make it easier to read with style and color; but one must make the text itself convey all needed information.
A consequence of a reliance on text is that pages should avoid technologies that displace or obscure text. Examples include embedding text in images and using Flash.
Text more effectively adapts to readers' needs when the semantic structure of the paper is separated from the presentational choices. In other words, the HTML elements should carry the paper's meaning, and separate CSS should specify presentation. Readers can alter the applied CSS, but altering the elements is much harder.
Plain HTML is the most accessible and most reliable way to convey information. So, we should encode documents with HTML elements that best represent semantics.
Reading is more comfortable when the author respects and accepts the users' choices in browser, settings, colors and sizes.
Finally, papers should avoid reliance on problematic technologies, like Javascript, Java, and video.
Use clean, well-structured HTML. Doing so reduces document construction and maintenance costs, as well as making documents easier to read.
Where possible, comply with all relevant standards. We cannot control where our documents go, so we should help them travel easily.
Avoid machine generation of HTML, as the results tend to work towards a particular paper-based layout rather than provide general readability. In particular, word processors, such as Microsoft Word, produce really bad HTML.
Never put style information within the body of the document. [HTMLstyleinline] Instead, use the class attribute to give an element additional semantic information, which can then be decorated with CSS specified in the document head.
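One rough way to check this rule mechanically is to search the HTML source for style attributes. The following is a minimal sketch, not one of this paper's scripts. The function name inline_styles is hypothetical, and the pattern only catches a style attribute that sits on the same line as the opening bracket of its tag.

```shell
# Hypothetical helper, not part of the scripts in this paper.
# Print, with line numbers, each line that has a style attribute inside a tag.
# Reads the files named as arguments, or standard input when none are given.
inline_styles()
{
    grep -n -i -E '<[a-z][a-z0-9]*[^>]* style=' "$@"
}
```

The style element in the document head is not reported, because the pattern requires a space before the attribute name.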
Write documents with strict HTML 4.01 [HTML401] standards compliance.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
Write documents with only the ASCII character set. It is the common base on most systems, and by design, is sufficient to represent C++ source code.
<meta http-equiv="Content-Type" content="text/html;charset=US-ASCII">
Use character entities for non-ASCII letters. [HTMLentities]
name | value | name | value | name | value |
---|---|---|---|---|---|
&amp; | "&" | &uuml; | "ü" | &nbsp; | (non-breaking space) |
&lt; | "<" | &Ntilde; | "Ñ" | &mdash; | "—" (em dash) |
&gt; | ">" | &oacute; | "ó" | &ndash; | "–" (en dash) |
Use an HTML validator. The W3C provides one at validator.w3.org.
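Before running a validator, the ASCII-only rule can be checked locally. The following is a minimal sketch, not one of this paper's scripts. The function name non_ascii_lines is hypothetical, and the sketch assumes a grep that treats input as single bytes in the C locale.

```shell
# Hypothetical helper, not part of the scripts in this paper.
# Print, with line numbers, each line containing a byte that is neither
# printable ASCII nor ASCII whitespace.
# LC_ALL=C asks grep to treat the input as single bytes.
non_ascii_lines()
{
    LC_ALL=C grep -n '[^[:print:][:space:]]' "$@"
}
```

Each reported character is a candidate for replacement by a character entity.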
Browsers may ignore font specifications, but they generally do not ignore the phrase markup elements [HTMLphrase] em, strong, dfn, code, samp, kbd, var, cite, and abbr. So, one should use one of them rather than the font style elements [HTMLfontstyle] tt, i, b, big, small, strike, s, and u. One should certainly not use the font element. [HTMLfontstyle]
Normal emphasis should use the em element, rather than the i element.
Strong emphasis should use the strong element, rather than the b element.
The definition of a term should use the dfn element, rather than the i element.
Citations should use the cite element.
Abbreviations should use the abbr element. This element requires extra work to be effective, and so may not find much application within WG21 papers.
Text that is variable, that is intended for substitution, should use the var element, rather than the i element. Grammar symbols fall directly into this category.
C++ identifiers, keywords, punctuation, and the like should use the code element.
Sample output should use the samp element.
User input should use the kbd element.
Within such text, any part that is variable, that is intended for substitution, should also use the var element.
Within C++ code, the characters &, <, and > must be quoted. The script quote_code.sh will convert C++ code into properly quoted HTML.
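As an illustration of the quoting step alone, the following sketch replaces the three characters with their character entities. The function name quote_html is hypothetical. The complete script, which also wraps the result in a code element, is in the Scripts section.

```shell
# Hypothetical helper illustrating the quoting step only.
# The ampersand is replaced first, so that the ampersands introduced
# by the other two replacements are not quoted a second time.
quote_html()
{
    sed -e 's|&|\&amp;|g' -e 's|<|\&lt;|g' -e 's|>|\&gt;|g' "$@"
}
```

For example, the input line "if (a < b && c > d)" comes out with the two comparison operators and both ampersands replaced by entities.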
Browsers often use the same representation for more than one phrase element. The usual representations are in the following table.
representation | tags |
---|---|
normal | abbr acronym |
italic | cite dfn em i var |
bold | b strong |
fixed-width | code kbd samp tt |
Authors should exercise care to ensure that these overloaded elements are used in contexts where the intent is reasonably clear. Fortunately, some of these can be mixed, such as var with code.
Use elements to indicate document structure, not just for their effect on formatting.
When the semantics of an element are not fine enough, identify additional distinctions with class attributes. [HTMLclass]
There are two types of quotes: block quotes and inline quotes. [HTMLquote]
Block quotes use the blockquote element and denote paragraph-level quotations. As such they always have a block-level element within them, such as an explicit p element.
Inline quotes use the q element, and generally enclose short quotations. Use inline quotes in place of quotation marks. Some browsers fail to add the quotation marks as specified, so this element may require some more time before it is reliable.
Use the pre element to enclose preformatted text. [HTMLpre] The pre element has definitional problems. In particular, the browser may or may not change to a fixed-width font, which means the author can neither avoid nor rely on a fixed-width font. Therefore, authors should always specify a fixed-width font immediately within the preformatted text and ensure that it is active throughout the block. That is,
<pre>
line of wait for it text
followed by some indentation
</pre>
is not reliable. Instead, authors should specify
<pre>
<code>line of ... wait for it ... code
some of which is indented</code>
</pre>
Furthermore, while
<pre><code>
line of ... wait for it ... code
some of which is indented
</code></pre>
is cleanest, some browsers incorrectly [HTMLline] add an extra blank line at the beginning of the preformatted text.
In any event, preformatted text does not wrap lines, which makes it very difficult to read when the line width is greater than the window width. (This problem happens when either characters are large or windows are narrow.) Therefore, authors should strive to keep preformatted lines short.
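A simple check can report overly long source lines. The following is a minimal sketch, not one of this paper's scripts. The function name long_lines is hypothetical and the limit is the author's choice. The sketch examines every line of its input, not only those inside pre elements, and it counts character entities at their source length.

```shell
# Hypothetical helper, not part of the scripts in this paper.
# Usage: long_lines <max> [<file>...]
# Print the number of each input line longer than <max> characters.
long_lines()
{
    max=$1
    shift
    awk -v max="$max" 'length($0) > max { print FNR }' "$@"
}
```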
Examples are a significant part of WG21 documents. Represent examples with class=example applied to p paragraphs, pre preformatted text, or div document divisions. Divisions contain any number of paragraphs.
For example, the example
int main() {
    return 0;
}
is represented as
<pre class="example">
<code>int main() {
    return 0;
}</code>
</pre>
The script block_code.sh will convert C++ code into a properly quoted, preformatted example code block.
Tables [HTMLtable] may consist of a caption (caption), a head (thead), a body (tbody), and a foot (tfoot). The last three elements contain rows. The head and foot elements enable browsers to duplicate headings and footings when splitting a table across multiple pages.
<table>
<caption>Common Phrase Representations</caption>
<thead>
<tr><th>representation</th><th>tags</th></tr>
</thead>
<tbody>
<tr><td>normal</td><td><code>abbr acronym</code></td></tr>
<tr><td>italic</td><td><code>cite dfn em i var</code></td></tr>
<tr><td>bold</td><td><code>b strong</code></td></tr>
<tr><td>fixed-width</td><td><code>code kbd samp tt</code></td></tr>
</tbody>
</table>
Avoid wide tables; these tend to get truncated when printed. Test table width by deliberately making the browser window very narrow.
HTML provides direct representation of deleted and inserted text. [HTMLdelins] These should be used in preference to ad hoc mechanisms.
The HTML standard intended these elements for showing modifications to the document itself. However, that is rarely a problem with WG21 papers. Instead they need to show edits to the working draft, and this repurposing of the elements is reasonable.
For example, to achieve
This text was deleted inserted today.
use
This text was <del>deleted</del> <ins>inserted</ins> today.
and do not use
This text was <span class="del">deleted</span>
<span class="ins">inserted</span> today.
and certainly not
This text was <strike>deleted</strike>
<u>inserted</u> today.
and especially not
This text was <span style="color:red;">deleted</span>
<span style="color:green;">inserted</span> today.
When the deletion occurs before an insertion, readers can use the deletion to set the context for the insertion. So, when paired, the deletion should come before its insertion.
Unless spacing is critical to the changes, deletions and insertions should be spaced. However, in the presence of changing punctuation, non-spacing markup is preferable to excessive markup, particularly when readers may not notice it. For example,
This text <del>glues</del><ins>joins</ins> words.
This text <del>modifies</del><ins> changes</ins> spacing.
This text <del>highlights</del> <ins>clarifies</ins> changes.
Red<ins>, yellow</ins> and green are hard to distinguish.
yields
This text gluesjoins words.
This text modifies changes spacing.
This text highlights clarifies changes.
Red, yellow and green are hard to distinguish.
The del and ins elements are supposed to act as either block-level or inline-level elements; however, some browsers fail to render them properly as block-level elements. Therefore, authors should use these elements as inline elements only. (This workaround is most annoying for tables and lists.)
The front matter for WG21 documents includes document identification and possibly a table of contents or a revision history.
The document identification includes a title, which is specified in the title element within the head element and in the h1 element at the top of the body element.
<title>HTML for C++ Standards Documents</title>
</head>
<body>
<h1>HTML for C++ Standards Documents</h1>
Follow the title with the document identification line, which is composed of "ISO/IEC JTC1 SC22 WG21", the WG21 paper number, the INCITS paper number, and the ISO date.
<p>
ISO/IEC JTC1 SC22 WG21 N3325 = 12-0015 - 2012-01-15
</p>
Follow the document identification with the authors. Email addresses are optional.
<address>
Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org
</address>
Other optional front matter includes a table of contents and a revision history.
When headings follow a simple format, they can be easily and automatically converted into a table of contents. The format consists of a single line containing a heading element and directly within that an anchor element. The anchor provides not a reference, but a name. That name must be unique within the file. Names taken from the standard's own section tags are usually unique, but not always.
For example, the header,
30.6.6 Class template future [futures.unique_future]
is encoded on one source line as
<h3><a name="futures.unique_future">30.6.6 Class template <code>future</code> [futures.unique_future]</a></h3>
The script contents.sh generates a table of contents. The resulting file can be simply included into the HTML source.
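Because the generated links depend on anchor names being unique within the file, a quick check for duplicates can be useful. The following is a minimal sketch, not one of this paper's scripts. The function name duplicate_names is hypothetical, and the sketch assumes at most one named anchor per source line, written exactly in the heading format described here.

```shell
# Hypothetical helper, not part of the scripts in this paper.
# Print each anchor name that occurs more than once in the input.
# Assumes at most one <a name="..."> per source line.
duplicate_names()
{
    sed -n -e 's|.*<a name="\([^"]*\)".*|\1|p' "$@" | sort | uniq -d
}
```

No output means that no name was repeated.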
Alternatively, one can use Javascript within the HTML itself to dynamically generate the contents. The script dynacontents.js, courtesy of Jeffrey Yasskin, does this task. It assumes that the user has previously included
<script
src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"
type="text/javascript">
</script>
It also assumes a div element, somewhere in the document, with the id "toc", which it will fill in with the table of contents.
While the table of contents serves as an outline, a more specific command-line tool that emits the outline can be helpful during development. There are two such scripts, one that emits just the headings and one that also emits the anchor names. They are available as outline.sh and outline_with_names.sh, respectively.
Conventions on the use of HTML in representing concepts of the C++ standard will help in cooperative editing, sharing of helpful tools, and automatic translation into the LaTeX source of the standard itself.
Code should use the code element. Non-normative code within the standard should use the var phrase element. Examples include parameter names, expository member variables, standard meta-variables, and example implementations. The standard often marks comments with an italic font. Mark these with the em element. Emphasized code must use the strong element, because normal emphasis is usually visually indistinct from variable text.
Avoid long lines in code blocks, as they may interfere with the readability of the document.
For example, the partial description of a standard counter might look something like the following.
template< typename T >
class counter
{
    // expository fields:
private:
    T value;
public:
    // construct and destruct:
    counter() : value( 0 ) { }
    counter(const counter& d) = default;
    ~counter() = default;
    // operations:
    void inc( T b) { value += b; }
    T get() { return value; }
};
It is encoded as follows.
<pre class="example">
<code>template&lt; typename T &gt;
class counter
{
    // <em>expository fields:</em>
<var>private:</var>
    <var>T value;</var>
public:
    // <em>construct and destruct:</em>
    counter() <var>: value( 0 ) { }</var>
    counter(const counter&amp; <var>d</var>) <strong>= default</strong>;
    ~counter() <strong>= default</strong>;
    // <em>operations:</em>
    void inc( T <var>b</var>) <var>{ value += b; }</var>
    T get() <var>{ return value; }</var>
};</code>
</pre>
The C++ grammar has the structure of a descriptive list, several terms each of which may have several definitions. We exploit that parallel structure by representing C++ grammar rules with descriptive lists.
Grammar terms are denoted by a var variable-phrase element. When a grammar term is defined, it is contained within a dt descriptive-term element, and marked by a dfn definition phrase element. (The colon outside the dfn element makes automatic indexing easier.) Each substitution rule is denoted by a dd descriptive-definition element. The optional marker is denoted by a sub subscript element within the var element. Literal code is denoted by the code phrase element.
The grammar
declaration-seq:
    declaration declaration-seq_opt
static_assert-declaration:
    static_assert ( constant-expression , string-literal ) ;
is encoded as
<dl>
<dt><dfn>declaration-seq</dfn>:</dt>
<dd><var>declaration</var>
<var>declaration-seq<sub>opt</sub></var></dd>
<dt><dfn>static_assert-declaration</dfn>:</dt>
<dd><code>static_assert (</code>
<var>constant-expression</var> <code>,</code>
<var>string-literal</var> <code>) ;</code></dd>
</dl>
Represent notes, footnotes and examples by surrounding them with markers. These markers have the form
[Footnote: Here is some non-normative text. —end footnote]
and are encoded as
[<i>Footnote:</i>
Here is some non-normative text.
&mdash;<i>end footnote</i>]
Note that here we use the font style element i because the marker does not really fit any of the phrase elements and because it makes searching for such uses easier.
While not part of the final standard, rationale, editor's notes and notes to the editor can also be represented this way.
Comments on the paper itself, and particularly notes on work still to be done, can be marked the same way, except using the b element instead of the i element. This change enables rapid searching for unfinished parts of the document.
The library has special formatting requirements for representing functions and their attributes. Each function prototype is contained within a p class="function" element. The attribute paragraphs are all contained within a dl class="attribute" element. Each attribute is labeled with a dt element and has its body in a dd element.
For example, the function definition
dynarray(size_type c);
Requires: The constructor parameter shall be greater than zero.
Effects: May or may not invoke the global operator new.
is represented as
<p class="function">
<code>dynarray(size_type <var>c</var>);</code>
</p>
<dl class="attribute">
<dt>Requires:</dt>
<dd><p>
The constructor parameter shall be greater than zero.
</p></dd>
<dt>Effects:</dt>
<dd><p>
May or may not invoke the global <code>operator new</code>.
</p></dd>
</dl>
In the end, papers are effective only when they edit the working draft. This section explains how to do that.
The first step in editing the standard is to quote the standard. For that we use the blockquote element with class="std". Each quoted portion of the standard must be preceded by a paragraph indicating where in the standard it comes from. The section may be known from context, but if not, it should be stated explicitly. So, the quote appears as
Section 1.8 [intro.object] paragraph 5 says:
Unless it is a bit-field (9.6), a most derived object shall have a non-zero size and shall occupy one or more bytes of storage. Base class subobjects may have zero size. An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage.
and is encoded as
<p>
Section 1.8 [intro.object] paragraph 5 says:
</p>
<blockquote class="std">
<p>
Unless it is a bit-field (9.6),
a most derived object shall have a non-zero size
and shall occupy one or more bytes of storage.
Base class subobjects may have zero size.
An object of trivially copyable or standard-layout type (3.9)
shall occupy contiguous bytes of storage.
</p>
</blockquote>
One can show edits to a paragraph by combining the quoting of the standard with the delete and insert markup described above. So, an edit appears as
Edit section 1.8 [intro.object] paragraph 5 as follows.
Unless it is a bit-field (9.6), a most derived object shall have a non-zero size and shall occupy one or more bytes of storage. Base class subobjects may have zero size. An object of trivially copyable, trivially movable or standard-layout type (3.9) shall occupy contiguous bytes of storage.
and is encoded as
<p>
Edit section 1.8 [intro.object] paragraph 5 as follows.
</p>
<blockquote class="std">
<p>
Unless it is a bit-field (9.6),
a most derived object shall have a non-zero size
and shall occupy one or more bytes <del>of storage</del>.
Base class subobjects may have zero size.
An object of trivially copyable<ins>, trivially movable</ins>
or standard-layout type (3.9)
shall occupy contiguous bytes of storage.
</p>
</blockquote>
When deleting or inserting whole paragraphs or sections, the del and ins elements need not be used, but the introductory text should clearly indicate the edit. In addition, the blockquote elements use class="stddel" or class="stdins", respectively. So, full paragraph deletions and insertions appear as
Delete paragraph 12 of 2.14.5 String literals [lex.string].
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined. The effect of attempting to modify a string literal is undefined.
After paragraph 12 of 2.14.5 String literals [lex.string], insert a new paragraph.
All string literals are distinct; their characters never share addresses.
and are encoded as
<p>
Delete paragraph 12 of 2.14.5 String literals [lex.string].
</p>
<blockquote class="stddel">
<p>
Whether all string literals are distinct
(that is, are stored in nonoverlapping objects)
is implementation-defined.
The effect of attempting to modify a string literal is undefined.
</p>
</blockquote>
<p>
After paragraph 12 of 2.14.5 String literals [lex.string],
insert a new paragraph.
</p>
<blockquote class="stdins">
<p>
All string literals are distinct;
their characters never share addresses.
</p>
</blockquote>
The format of the HTML source itself can improve its interaction with tools.
Starting each sentence on a new line improves the stability of diff, and hence of source code version control systems. The same applies to putting block-level elements on lines separate from live text.
When editing the source, separating block-level elements makes them more quickly identifiable.
More regularity in the HTML source makes it easier to write tools that convert the HTML source to other forms, like the LaTeX of the standard itself.
The C++ standard's papers are a good application of literate programming. [LPcom] [LPwiki] Particularly when a paper includes normative declarations or sample implementations, an automatic process for extracting the code from the paper itself helps ease adoption concerns.
The essential idea is to identify code to be extracted with a distinct class, e.g. "extract", and then remove everything except the text within those code elements. That process is eased considerably when all code text is on lines separate from other text. Typically, this is accomplished with HTML of the form:
<pre class="extract"><code class="extract">
echo "Hello, World!"
</code></pre>
Within the code, all HTML elements should be removed, which enables links, phrase tags, and other markup within the code. Further, the critical HTML character entities, &lt;, &gt;, &amp;, and &nbsp;, must be recognized and substituted.
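The removal and substitution steps can be sketched on their own, apart from the selection of which lines to extract. The following is a minimal sketch with a hypothetical function name, unquote_html. It assumes that every tag opens and closes on the same line.

```shell
# Hypothetical helper illustrating tag removal and entity substitution only.
# Tags are removed first. The ampersand entity is substituted last, so that
# a quoted entity name such as "&amp;lt;" yields the text "&lt;", not "<".
unquote_html()
{
    sed -e 's|<[^<>]*>||g' \
        -e 's|&lt;|<|g' \
        -e 's|&gt;|>|g' \
        -e 's|&nbsp;| |g' \
        -e 's|&amp;|\&|g' "$@"
}
```

The ordering is the reverse of the quoting direction, where the ampersand must be handled first.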
For presentational purposes, it is also helpful to identify the pre element containing the code for extraction. Again, use the same class as above, e.g. "extract".
The script block_extract.sh will convert C++ code into a properly quoted, preformatted extraction code block.
The HTML files can contain either a single code file, as in N2648, or multiple code files, as in N2427. In the latter case, the multiple files are actually generated from a single-file shell script. The scripts in this paper follow that approach.
The script extract_code.sh will extract the code from the HTML source of this paper. Simply execute the resulting shell script to get copies of the scripts.
Once the paper is well structured and independent of the presentation, we must address creating a readable presentation. We encode that presentation in a style element within the head element of the document. (Alternatively, we could create a standard location for a separately read CSS file.) The proposed style element is style.hinc in the Scripts section.
Color and contrast must meet specific technical requirements. These are embodied in the Web Content Accessibility Guidelines (WCAG). [WCAG] In particular, the intensity of the foreground and background must be sufficiently different. In addition the hue of the foreground and background should be sufficiently different. Web pages exist to test colors against the various criteria. [Snook] Further, consideration must be given to red-green color deficiency.
By far, the most common use of color in WG21 is to mark inserted and deleted text. The normal convention is to use red for deleted text and green for inserted text. However, this color combination is problematic for red-green deficient readers. Instead, we use magenta in place of red. The added blue to the color makes it visually distinct from green.
The other typical problem with the colors chosen is that they are too bright to provide good contrast with the typical light background. So, these colors need to be reasonably dark, but still light enough to be distinct. The foreground colors #005100 green and #8B0040 red-magenta meet the criteria against a fairly broad range of light backgrounds.
Unfortunately, once we specify a foreground color, we must also specify a background color. A white background reduces printing costs.
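The intensity criterion can also be checked locally rather than through a web page. The following sketch computes the contrast ratio as defined by WCAG 2.0: the relative luminance of each color is computed from its linearized sRGB channels, and the ratio is (L1 + 0.05) / (L2 + 0.05) with L1 the lighter color. WCAG 2.0 asks for at least 4.5:1 for normal text at level AA and 7:1 at level AAA. The function names hex_channels and contrast_ratio are hypothetical, and the sketch is not one of this paper's scripts. It assumes a shell whose arithmetic expansion accepts hexadecimal constants.

```shell
# Hypothetical helpers, not part of the scripts in this paper.

# Print the red, green and blue channels of a six-digit hex color
# (given without the leading '#') as decimal numbers.
hex_channels()
{
    rest=${1#??}
    echo "$((0x${1%????})) $((0x${rest%??})) $((0x${1#????}))"
}

# Usage: contrast_ratio <rrggbb> <rrggbb>
# Print the WCAG 2.0 contrast ratio of the two colors to two decimals.
contrast_ratio()
{
    echo "$(hex_channels "$1") $(hex_channels "$2")" |
    awk '
        # Convert an 8-bit sRGB channel to its linear value.
        function linear(c) {
            c = c / 255
            return (c <= 0.03928) ? c / 12.92 : ((c + 0.055) / 1.055) ^ 2.4
        }
        # Relative luminance as defined by WCAG 2.0.
        function luminance(r, g, b) {
            return 0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b)
        }
        {
            l1 = luminance($1, $2, $3)
            l2 = luminance($4, $5, $6)
            # The lighter color goes in the numerator.
            if (l1 < l2) { t = l1; l1 = l2; l2 = t }
            printf "%.2f\n", (l1 + 0.05) / (l2 + 0.05)
        }'
}
```

For example, contrast_ratio 005100 FFFFFF reports the ratio of the proposed green against a white background, and contrast_ratio 8B0040 FFFFFF does the same for the red-magenta.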
However, these colors alone are not sufficient to identify inserted and deleted text. For that we must add text decoration. In particular, we follow existing convention and mark deleted text with a line struck through and inserted text with an underline. Now, even in the absence of color, deletions and insertions are distinct.
Earlier, we described the need to mark whole quoted paragraphs of the standard as deleted or inserted. We do this by changing the background for the paragraph. In particular, deleted quotes have a #FFEBFF light magenta background and inserted quotes have a #C8FFC8 light green background.
Regular quoted paragraphs of the standard have a #F1F1F1 light grey background.
Extracted code has a #F5F6A2 light yellow background.
Finally, each of these backgrounds is surrounded by a thin, slightly darker, border. This border provides an attractive edge to the quote. More importantly, when browsers ignore color, as in high-contrast mode, they typically do not ignore borders. These thin subtle borders become very visible when the color is lost.
The default formatting of tables makes identifying table cells difficult. To address this problem and to be consistent with the formatting of many (but not all) of the standard's tables, we make several formatting choices.
Cell text is vertically aligned to the top, which makes identifying rows easier.
Cell text is horizontally aligned to the left, which makes identifying columns easier. (Authors may choose to use right alignment for numeric columns.)
Cells are given a little bit of extra spacing.
The table has a thin border around it, but individual cells and the caption do not.
For examples, we just indent a bit.
For extracted code, we indent a bit and use the background color for extracted code.
References to WG21 papers can simply use the N-number. References to WG14 papers can simply use "WG14" and the N-number. These references should link to the appropriate documents, via HTML like the following.
This paper analyses the compatibility between the draft standards,
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3035.pdf">
N3035</a> and WG14
<a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1425.pdf">
N1425</a> with respect to alignment.
All other references should be elaborated in a references section, such as this one. The purpose of the references section is to enable following references from printed documents. NOTE: when there is a reference section, it is unclear whether references should link to the reference section entry or link directly to the referent.
The remainder of this section lists references for this document.
This section contains several sh/sed scripts supporting the methods in this document. The scripts are contained within a sh script that generates those files.
This script quotes code. The result may be used within paragraphs.
cat <<"EOF" >quote_code.sh
# Quote &, < and > as character entities; & goes first.
exec sed -e '
1 i<code>
s|&|\&amp;|g
s|<|\&lt;|g
s|>|\&gt;|g
$ a</code>
' "$@"
EOF
This script creates an example block of code.
cat <<"EOF" >block_code.sh
# Quote &, < and > as character entities; & goes first.
exec sed -e '
1 i<pre class="example">
s|&|\&amp;|g
s|<|\&lt;|g
s|>|\&gt;|g
1 s|^|<code>|
$ s|$|</code>|
$ a</pre>
' "$@"
EOF
This script creates an example block of code intended for extraction.
cat <<"EOF" >block_extract.sh
# Quote &, < and > as character entities; & goes first.
exec sed -e '
1 i<pre class="extract"><code class="extract">
s|&|\&amp;|g
s|<|\&lt;|g
s|>|\&gt;|g
$ a</code></pre>
' "$@"
EOF
This script extracts code from an HTML source. It can serve as the inverse function of the above, but is intended to extract more generally annotated code. This script takes the class name as the first parameter.
cat <<"EOF" >extract_code.sh
class=$1
shift
# Delete everything outside the selected code elements, strip the
# remaining tags, then substitute the entities; &amp; goes last.
exec sed -e '
1,/<code class="'$class'">/ d
/<\/code>/,/<code class="'$class'">/ d
/<\/code>/,$ d
s|<[^<>]*>||g
s|&lt;|<|g
s|&gt;|>|g
s|&nbsp;| |g
s|&amp;|\&|g
' "$@"
EOF
This script creates a table of contents. The first parameter to the script is the depth of headings to include in the contents.
cat <<"EOF" >contents.sh
usage()
{
echo "usage: $0 <depth in [2-6]> [<file>...]" 1>&2
}
if test $# -lt 1
then
usage
exit 1
fi
case $1 in
[2-6])
DEPTH=$1
shift
;;
*)
usage
exit 1
;;
esac
# Indentation for each nested heading level, as non-breaking spaces.
IN1="\&nbsp;\&nbsp;\&nbsp;\&nbsp;"
IN2="${IN1}${IN1}"
IN3="${IN2}${IN1}"
IN4="${IN3}${IN1}"
sed -e '
1 i<p>
$ a</p>
/<h[2-'${DEPTH}']>/ ! d
s|name="|href="#|
s|</h[2-6]>|<br>|
s|<h2>||
s|<h3>|'${IN1}'|
s|<h4>|'${IN2}'|
s|<h5>|'${IN3}'|
s|<h6>|'${IN4}'|
' "$@"
EOF
This script creates a table of contents dynamically within the web page. Place the script within the head of the HTML. It assumes that the user has previously included
<script
src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"
type="text/javascript">
</script>
It also assumes a div element, somewhere in the document, with the id "toc", which it will fill in with the table of contents.
The browser must support Javascript, and must have network access, for the contents to appear.
Thanks to Jeffrey Yasskin for the code.
cat <<"EOF" >dynacontents.js
<script type="text/javascript">$(function() {
var next_id = 0
function find_id(node) {
// Look down the first children of 'node' until we find one
// with an id. If we don't find one, give 'node' an id and
// return that.
var cur = node[0];
while (cur) {
if (cur.id) return cur.id;
if (cur.tagName == 'A' && cur.name)
return cur.name;
cur = cur.firstChild;
};
// No id.
node.attr('id', 'gensection-' + next_id++);
return node.attr('id');
};
// Put a table of contents in the #toc nav.
// This is a list of <ol> elements, where toc[N] is the list for
// the current sequence of <h(N+2)> tags. When a header of an
// existing level is encountered, all higher levels are popped,
// and an <li> is appended to the level
var toc = [$("<ol/>")];
$(':header').not('h1').each(function() {
var header = $(this);
// For each <hN> tag, add a link to the toc at the appropriate
// level. When toc is one element too short, start a new list
var levels = {H2: 0, H3: 1, H4: 2, H5: 3, H6: 4};
var level = levels[this.tagName];
if (typeof level == 'undefined') {
throw 'Unexpected tag: ' + this.tagName;
}
// Truncate to the new level.
toc.splice(level + 1, toc.length);
if (toc.length < level) {
// Omit TOC entries for skipped header levels.
return;
}
if (toc.length == level) {
// Add a <ol> to the previous level's last <li> and push
// it into the array.
var ol = $('<ol/>');
toc[toc.length - 1].children().last().append(ol);
toc.push(ol);
}
var header_text = header.text();
toc[toc.length - 1].append(
$('<li/>').append($('<a href="#' + find_id(header) + '"/>')
.text(header_text)));
});
$('#toc').append(toc[0]);
})
</script>
EOF
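As a rough sketch of the assumptions listed above, the following shell fragment writes a minimal page with the jQuery include in the head and an empty div with the id "toc" in the body. The temporary file, the title, and the headings are hypothetical, and the comment marks where the contents of dynacontents.js would be included.

```shell
# Hypothetical page skeleton; the file name, title, and headings are
# illustrative only and do not come from the paper's sources.
page=$(mktemp)
cat >"$page" <<'EOF'
<html>
<head>
<title>Example Paper</title>
<script
src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"
type="text/javascript">
</script>
<!-- the contents of dynacontents.js go here, after jQuery -->
</head>
<body>
<h1>Example Paper</h1>
<div id="toc"></div>
<h2><a name="intro">Introduction</a></h2>
<h3><a name="issues">Issues</a></h3>
</body>
</html>
EOF

# Check that the page has exactly one element for the script to fill in.
toc_count=$(grep -c 'id="toc"' "$page")
printf 'toc elements: %s\n' "$toc_count"
rm -f "$page"
```

The jQuery include comes before the table of contents script because the script calls `$` as soon as it is parsed.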
This script creates an outline of the document.
cat <<"EOF" >outline.sh
SPC="[ ]"
SPCSOPT="${SPC}*"
IN1=" "
IN2="${IN1}${IN1}"
IN3="${IN2}${IN1}"
IN4="${IN3}${IN1}"
exec sed -e "
/<h[1-6]>/ ! d
s/${SPCSOPT}<h1>//
s/${SPCSOPT}<h2>//
s/${SPCSOPT}<h3>/${IN1}<h3>/
s/${SPCSOPT}<h4>/${IN2}<h4>/
s/${SPCSOPT}<h5>/${IN3}<h5>/
s/${SPCSOPT}<h6>/${IN4}<h6>/
s/<[^>]*>//g
" "$@"
EOF
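To give a sense of the output, the following sketch applies the same kind of sed commands to a small sample file. The sample headings are hypothetical, and the four-space indent is an assumption made for the illustration.

```shell
# Hypothetical sample input; the headings are made up for illustration.
sample=$(mktemp)
cat >"$sample" <<'EOF'
<h1>Paper Title</h1>
<p>Some text.</p>
<h2><a name="intro">Introduction</a></h2>
<h3><a name="issues">Issues</a></h3>
EOF

# Assumed indent of four spaces per level below h2.
IN1="    "

# Keep only heading lines, indent by level, then strip all tags.
outline=$(sed -e "
/<h[1-6]>/!d
s/[ ]*<h1>//
s/[ ]*<h2>//
s/[ ]*<h3>/${IN1}<h3>/
s/<[^>]*>//g
" "$sample")

printf '%s\n' "$outline"
rm -f "$sample"
```

With this input, the title and the Introduction heading appear flush left, the Issues heading is indented by one level, and the paragraph line is dropped.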
This script creates an outline of the document, including the anchor names.
cat <<"EOF" >outline_with_names.sh
SPC="[ ]"
SPCSOPT="${SPC}*"
SPCSREQ="${SPC}${SPCSOPT}"
DQT='"'
QUOTE='\("[^"]*"\)'
IDENT='\([^ >]*\)'
ANAME="<a${SPCSREQ}name="
ENDA="${SPCSOPT}>"
IN1=" "
IN2="${IN1}${IN1}"
IN3="${IN2}${IN1}"
IN4="${IN3}${IN1}"
LBL1='"[^"]\{0,5\}"'
LBL2='"[^"]\{6,13\}"'
LBL3='"[^"]\{14,21\}"'
exec sed -e "
/<h[1-6]>/ ! d
s|${SPCSOPT}<h1>||
s|${SPCSOPT}<h2>||
s|${SPCSOPT}<h3>|${IN1}<h3>|
s|${SPCSOPT}<h4>|${IN2}<h4>|
s|${SPCSOPT}<h5>|${IN3}<h5>|
s|${SPCSOPT}<h6>|${IN4}<h6>|
s|${ANAME}${QUOTE}${ENDA}|\1 |
s|${ANAME}${IDENT}${ENDA}|"'"\1"'" |
s|\(.*\)${QUOTE} \(.*\)$|\2 \1\3|
s|\(${LBL1}\)|\1 |
s|\(${LBL2}\)|\1 |
s|<[^>]*>||g
" "$@"
EOF
This style element implements the style choices described above. It is intended for inclusion in WG21 papers.
cat <<"EOF" >style.hinc
<style type="text/css">
body { color: #000000; background-color: #FFFFFF; }
del { text-decoration: line-through; color: #8B0040; }
ins { text-decoration: underline; color: #005100; }
p.example { margin-left: 2em; }
pre.example { margin-left: 2em; }
div.example { margin-left: 2em; }
code.extract { background-color: #F5F6A2; }
pre.extract { margin-left: 2em; background-color: #F5F6A2;
border: 1px solid #E1E28E; }
p.function { }
.attribute { margin-left: 2em; }
.attribute dt { float: left; font-style: italic;
padding-right: 1ex; }
.attribute dd { margin-left: 0em; }
blockquote.std { color: #000000; background-color: #F1F1F1;
border: 1px solid #D1D1D1;
padding-left: 0.5em; padding-right: 0.5em; }
blockquote.stddel { text-decoration: line-through;
color: #000000; background-color: #FFEBFF;
border: 1px solid #ECD7EC;
padding-left: 0.5em; padding-right: 0.5em; }
blockquote.stdins { text-decoration: underline;
color: #000000; background-color: #C8FFC8;
border: 1px solid #B3EBB3; padding: 0.5em; }
table { border: 1px solid black; border-spacing: 0px;
margin-left: auto; margin-right: auto; }
th { text-align: left; vertical-align: top;
padding-left: 0.8em; border: none; }
td { text-align: left; vertical-align: top;
padding-left: 0.8em; border: none; }
</style>
EOF
This paper was put together with make, so as to avoid manual conversions on all the examples.
It may be useful as a starting point for other papers.
The extensions used in the Makefile are as follows.
extension | file use | comment |
---|---|---|
.hsrc | source | an HTML source file with include directives |
.hinc | source | an included HTML source file |
.cc | source | a C/C++ source code file |
.sh | source | a Bourne shell source code file |
.html | product | a complete HTML file |
.qinc | intermediate | a quoted version of a .hinc |
.vinc | intermediate | a verbatim <pre> version of a .hinc |
.cinc | intermediate | a contents file for an .hsrc file |
.qcod | intermediate | a verbatim version of a source code file |
.vcod | intermediate | a verbatim version of a .qcod file |
.qext | intermediate | a code extraction version of a source code file |
.vext | intermediate | a verbatim version of a .qext file |
.xext | intermediate | the code extracted from an HTML file |
cat <<"EOF" >Makefile
default : help
help :
@echo "make help -- do nothing but print this message"
@echo "make variables -- do nothing but show important build variables"
@echo "make outline -- show the paper outline"
@echo "make documents -- build the HTML documents"
@echo "make codefiles -- build the code files"
@echo "make all -- build the documents and code files"
@echo "make test -- test that extracted code files match sources"
@echo "make clean -- remove the documents and intermediate files"
INTERMEDIATE = *.qinc *.vinc *.cinc *.qcod *.vcod *.qext *.vext *.xext *.d
CPP = cpp -MMD -MP -w -P -C -traditional-cpp
DIFF = for f in *; do echo comparing $$f; diff ../$$f $$f; done
%.html : %.hsrc
$(CPP) -MT $@ $< $@
%.qinc : %.hinc
sh quote_code.sh $< > $@
%.vinc : %.hinc
sh block_code.sh $< > $@
%.cinc : %.hsrc
sh contents.sh 6 $< > $@
%.qcod : %.cc
sh block_code.sh $< > $@
%.vcod : %.qcod
sh block_code.sh $< > $@
%.qext : %.sh
sh block_extract.sh $< > $@
%.qext : %.js
sh block_extract.sh $< > $@
%.qext : %.hinc
sh block_extract.sh $< > $@
%.vext : %.qext
sh block_code.sh $< > $@
%.xext : %.html
sh extract_code.sh extract $< > $@
Makefile.qext : Makefile
sh block_extract.sh $< > $@
SOURCES := $(shell echo *.hsrc)
PREBUILD := $(shell sh prebuild.sh $(SOURCES))
DOCUMENTS := $(SOURCES:.hsrc=.html)
CODEFILES = htmlcppstd.xext
variables :
@echo "SOURCES = $(SOURCES)"
@echo "DOCUMENTS = $(DOCUMENTS)"
@echo "PREBUILD = $(PREBUILD)"
outline :
sh outline_with_names.sh htmlcppstd.hsrc
$(DOCUMENTS) : $(PREBUILD)
documents : $(DOCUMENTS)
codefiles : $(CODEFILES)
all : documents codefiles
testing :
mkdir testing
test : htmlcppstd.xext testing
cd testing ; sh ../htmlcppstd.xext ; $(DIFF)
clean :
rm -rf testing $(DOCUMENTS) $(INTERMEDIATE)
-include *.d
EOF
cat <<"EOF" >prebuild.sh
SPC="[ ]"
SPCSOPT="${SPC}*"
INCLDIR="^${SPCSOPT}#${SPCSOPT}include${SPCSOPT}"
exec sed -e '
/#include/ ! d
/\.hinc/ d
s/'"${INCLDIR}"'"\(.*\)".*/\1/
' "$@"
EOF
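As a rough illustration of prebuild.sh, the sketch below applies equivalent sed commands to a hypothetical source file. From the sed program, the script appears to print the names in #include directives, except .hinc files, and the Makefile uses that list as prerequisites for the documents. The file names in the sample are made up.

```shell
# Hypothetical .hsrc fragment; the included file names are illustrative.
sample=$(mktemp)
cat >"$sample" <<'EOF'
#include "style.hinc"
<p>Body text.</p>
#include "quote_code.qext"
#include "example.vcod"
EOF

# Drop lines without #include, drop .hinc includes (they are sources,
# not built files), and reduce each remaining line to the quoted name.
deps=$(sed -e '
/#include/!d
/\.hinc/d
s/^[ ]*#[ ]*include[ ]*"\(.*\)".*/\1/
' "$sample")

printf '%s\n' "$deps"
rm -f "$sample"
```

Only the two intermediate files are listed, which is what make needs to build before running the preprocessor on the document.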