txt2html README
===============

 txt2html - Text to HTML converter - Korn Shell 93 Script Function
 Copyright (C) 2006-2015 Dana French 
 Home Page: http://www.mtxia.com/fancyIndex/Tools/Scripts/Korn/K93_Unix/

Intro
-----

txt2html is a plain text to HTML converter written as a Korn Shell 93 script function. It converts subtle text markup for lists, bold, italics, tables and headings into the corresponding HTML markup, so the source text files stay readable.

No installation is necessary; txt2html can be used from the command line or as a function called from another shell script. The program "txt2html" is itself a Korn Shell 93 script.

txt2html someFile.txt

txt2html accepts several command-line arguments; list them with

txt2html -?

This README file contains all the marks that txt2html is able to understand, so it also serves as a demo. To see how this file looks after being converted by txt2html, just run

txt2html < txt2html.README > README.html

and see its effects.
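As a rough illustration of calling txt2html from another script, the sketch below converts every .txt file in a directory. It is written in portable POSIX shell, which ksh93 also accepts. The helper name convert_all is made up for this example, and it assumes only what the command above shows: that the converter reads text on standard input and writes HTML to standard output.

```shell
# Hypothetical helper, not part of txt2html itself.
# $1 = converter command (defaults to txt2html, assumed to be on PATH)
# $2 = directory to scan (defaults to the current directory)
convert_all() {
    converter=${1:-txt2html}
    dir=${2:-.}
    for src in "$dir"/*.txt; do
        [ -f "$src" ] || continue      # the glob matched nothing
        # Same stdin/stdout form as the README.html command above.
        "$converter" < "$src" > "${src%.txt}.html"
    done
}

# Example call (commented out so that sourcing this file runs nothing):
# convert_all txt2html ./docs
```

Passing the converter as the first argument keeps the loop easy to try out with a stand-in command such as cat.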

Text paragraphs
---------------

A text paragraph is any group of lines containing text, delimited by one or more blank lines, provided that none of them begins with a blank space. So you just write lines as usual (wrapped or not) and separate paragraphs as in a word processor.

Headings
--------

A line is understood as a heading if it's immediately followed by another line that contains only a repetition of a special character (see 'Text paragraphs' and 'Headings' for an example). There are three heading levels, chosen by this special character: a line of = (equal signs) makes a first level heading, used for titles and tagged with h1 HTML tags; a line of - (hyphens) makes a second level heading; and a line of ~ (tildes) makes a third level one. This document shows all three heading levels. The first level heading should be used only once, as it is automatically taken as the title of the HTML page unless one is given as a command line argument.
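To make the look-ahead rule concrete, here is a minimal sketch of how underlined headings could be detected with awk. It is not the code txt2html uses; the function name mark_headings is invented, and it assumes that an underline of any length counts and that heading text needs no further escaping.

```shell
# Hypothetical sketch: turn "text + underline" pairs into <h1>/<h2>/<h3>.
mark_headings() {
    awk '
        # Map an underline made only of =, - or ~ to a heading level.
        function level(s) {
            if (s ~ /^==*$/) return 1
            if (s ~ /^--*$/) return 2
            if (s ~ /^~~*$/) return 3
            return 0
        }
        {
            n = level($0)
            # An underline right after a non-empty line makes a heading.
            if (have && n && prev != "") {
                printf "<h%d>%s</h%d>\n", n, prev, n
                have = 0
                next
            }
            if (have) print prev       # previous line was ordinary text
            prev = $0
            have = 1
        }
        END { if (have) print prev }
    '
}
```

Each line is held back for one step, because whether it is a heading is only known once the following line has been read.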

Text effects
------------

If some text is surrounded by asterisks, as *this one*, it's marked as bold (you have probably written text this way in email to emphasize something). Likewise, text surrounded by the _ symbol (underscore), as _this one_, is marked as italic. Bold can also be marked up by surrounding the text with three apostrophes ('''this way''') and italics with two (''this way''). If you have ever used a WikiWikiWeb system, these will be familiar.
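The four inline marks described above can be approximated with a few sed substitutions. This is only a sketch: the function name inline_effects is invented, the choice of <b> and <i> as output tags is an assumption, and no attempt is made to protect identifiers that contain several underscores.

```shell
# Hypothetical sketch of the bold/italic marks as sed substitutions.
inline_effects() {
    # Triple apostrophes are handled before double ones, so that
    # '''bold''' is not consumed as italics.
    sed -e "s|'''\([^']\{1,\}\)'''|<b>\1</b>|g" \
        -e "s|''\([^']\{1,\}\)''|<i>\1</i>|g" \
        -e 's|\*\([^*]\{1,\}\)\*|<b>\1</b>|g' \
        -e 's|_\([^_]\{1,\}\)_|<i>\1</i>|g'
}
```

For instance, feeding it the line "a *b* and _c_" should yield "a <b>b</b> and <i>c</i>".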

Other special text is automatically recognized, such as URLs (so that the URL http://www.mtxia.com should be clickable). Text beginning with ./ is interpreted as a relative URL, so ./index.html should also be clickable.

txt2html can also be useful when documenting source code, as function names like printf() or variables like $username are also highlighted. There are command line arguments to make the parentheses and/or the leading dollar sign disappear from the output document.

URLs are simply substituted as shown above; if a URL is followed by a phrase surrounded by parentheses (just as you naturally would to explain the contents of a web page), this phrase is used as the link text, as in this example pointing to http://www.mtxia.com/fancyIndex/Tools/Scripts/Korn/K93_Unix/txt2html.html (the txt2html Home Page).
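The two URL rules (a bare URL, and a URL followed by a parenthesised phrase) could be sketched with sed as follows. The function name link_urls is invented, only http and https URLs are handled, and a URL is assumed to end at the next space, so trailing punctuation would be swallowed. Relative ./ URLs are left out.

```shell
# Hypothetical sketch: make URLs clickable, using a following
# "(phrase)" as the link text when there is one.
link_urls() {
    sed -E \
        -e 's#(https?://[^ ]+) \(([^)]*)\)#<a href="\1">\2</a>#g' \
        -e 's#(^| )(https?://[^ <]+)#\1<a href="\2">\2</a>#g'
}
```

The second substitution only matches a URL at the start of the line or after a space, so it leaves the href values written by the first one alone.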

Lists
-----

txt2html is powerful at rendering lists. There are three types of lists: unnumbered (bulleted) ones, numbered ones and definition lists. List items are recognized as lines starting with a blank (space or tab) immediately followed by a special character.

  • Unnumbered lists start with some blanks, followed by an asterisk, followed by another blank. If the following lines are space indented, they are assumed to be part of the same list element. The asterisk can also be replaced by a - (hyphen).
  • Lists can have multiple levels. To add another level,
    • Just indent a bit deeper,
      • and have hours of fun
        • nesting.
      • unindent 1 level
    • unindent a 2nd level
  • Numbered lists are marked up almost the same, just substituting the asterisk with a # (sharp) or 1 (number one).
  • Definition lists are marked up almost the same, but delimiting the definition term from the definition itself by a colon.
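Nesting by indentation can be pictured as a stack of indent widths. The sketch below prints the nesting depth of each unnumbered item; it illustrates the idea and is not txt2html's own logic. The function name list_depths is invented, and only the asterisk and hyphen markers are handled.

```shell
# Hypothetical sketch: report the nesting depth of each list item.
list_depths() {
    awk '
        # Item lines: blanks, then "*" or "-", then one space.
        match($0, /^[ \t]+[*-] /) {
            indent = RLENGTH - 2            # width of the leading blanks
            # Going back out: drop levels indented deeper than this item.
            while (depth > 0 && stack[depth] > indent) depth--
            # Going further in: open a new level.
            if (depth == 0 || stack[depth] < indent) stack[++depth] = indent
            print depth ": " substr($0, RLENGTH + 1)
        }
    '
}
```

An item indented exactly like an earlier one lands back on that level, which is why items at the same level must use the same number of spaces.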

List examples
~~~~~~~~~~~~~

Unnumbered list:

  • First element. Elements at the same level must be indented by the same number of spaces.
  • The second one.
    • The second element has one sub-element.
    • And another...
      • that, itself, has another one
    • unindent 1 level
  • The third one...
    • Has another extremely long sub-element to show that long ones are rendered correctly. Please note that the elements of a list cannot be separated by blank lines or they will be interpreted as different lists.
  • The 4th and final one...
    • And its final child.

Ordered list:

  1. First element.
  2. The second one.
    1. The second element has one sub-element.
    2. And another...
      1. that, itself, has another one
    3. unindent 1 level
  3. The third one...
    1. Has another extremely long sub-element to show that long ones are rendered correctly. Please note that the elements of a list cannot be separated by blank lines or they will be interpreted as different lists.
    2. And another sub-element, to show this is not a cut & paste from the unsorted example.
  4. The 4th and final one. Note also that ordered and unsorted lists cannot be combined.

Definition list:

  • first: the first element and this is the second line of the first definition list and it will wrap around the full line of the browser so that it is visible across multiple lines
  • second: the second element
  • third: the third element

Preformatted text
-----------------

Text that should be rendered as is must be written with at least one blank at the beginning of every line. Here is an example:

 int main(int argc, char * argv[])
 {
	/* an example of useless C code */
	return(0);
 }

If you ever wrote any Perl POD documentation, you'll be familiar with this.

If you write preformatted text and its first line collides with the list markup (i.e. text with lines beginning with blanks and an asterisk or sharp), just insert a line containing only spaces before it.

Cites
-----

If you want to quote a (possibly long) paragraph of text, use a blank followed by a " (double quote) in its first line, as in the following example:

 "BRAIN, n. An apparatus with which we think what we think. That which distinguishes the man who is content to _be_ something from the man who wishes to _do_ something. A man of great wealth, or one who has been pitchforked into high station, has commonly such a headful of brain that his neighbors cannot keep their hats on. In our civilization, and under our republican form of government, brain is so highly honored that it is rewarded by exemption from the cares of office." -- Ambrose Bierce

The leading double quote remains as part of the cited paragraph.

HTML
----

If you need to insert HTML as is (for rendering, say, images or complicated layouts), you can also do it. Anything between two < symbols and two > symbols will be passed without any further processing. So, to insert an image, just do this:

<<
<center>
<img src=http://www.mtxia.com/icons/mtxia.gif alt="Mt Xia Logo">
</center>
>>

Passthrough code can also appear inline, in the middle of a line of ordinary text.

Any other HTML outside these boundaries is escaped.
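The difference between passthrough and escaped text can be sketched as below. This is an illustration only: the function name escape_outside_passthrough is invented, and it handles just the block form, where << and >> each sit alone on a line, not the inline form.

```shell
# Hypothetical sketch: copy << ... >> blocks verbatim, escape the rest.
escape_outside_passthrough() {
    awk '
        $0 == "<<" { raw = 1; next }    # start of a passthrough block
        $0 == ">>" { raw = 0; next }    # end of a passthrough block
        raw { print; next }             # inside: copy the HTML untouched
        {
            gsub(/&/, "\\&amp;")        # escape & first, then < and >
            gsub(/</, "\\&lt;")
            gsub(/>/, "\\&gt;")
            print
        }
    '
}
```

The ampersand has to be escaped before the angle brackets; otherwise the & in each new &lt; would be escaped a second time.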

Tables
------

But where txt2html is really awesome is rendering tables. They are created using the + (plus) sign for corners, the - (hyphen) for horizontal lines and the | (pipe) for vertical lines. So this is a table:

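+------------------------+----------------------+-----------------+
| Band Name              | Album Name           | Number of Songs |
| second Band Name       | second Album Name    | second Songs    |
| third Band Name        | third Album Name     | third Songs     |
+------------------------+----------------------+-----------------+
| Dead Can Dance         | A Passage in Time    | 16              |
| second line            | second passage       | 216             |
+------------------------+----------------------+-----------------+
| Bel Canto              | White-Out Conditions | 10              |
+------------------------+----------------------+-----------------+
| Depeche Mode           | Speak and Spell      | 16              |
+------------------------+----------------------+-----------------+
| Love Spirals Downwards | Temporal             | 13              |
+------------------------+----------------------+-----------------+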

One or more header rows can be embedded in a table by marking the row with an exclamation point (!) immediately after the first pipe of the data row, as in "|!". Only one "!" in the first cell is necessary, but every cell in a header row may be marked with a "!" for consistency, if desired. A header row may also be marked by using an asterisk (*) instead of a plus sign (+) for the cell divisions on the table border line above each data cell.

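+------------------------+----------------------+-----------------+
|! Band Name             | Album Name           | Number of Songs |
| second Band Name       | second Album Name    | second Songs    |
| third Band Name        | third Album Name     | third Songs     |
+------------------------+----------------------+-----------------+
| Dead Can Dance         | A Passage in Time    | 16              |
+------------------------+----------------------+-----------------+
| Bel Canto              | White-Out Conditions | 10              |
+------------------------+----------------------+-----------------+
| Depeche Mode           | Speak and Spell      | 16              |
+------------------------+----------------------+-----------------+
| Love Spirals Downwards | Temporal             | 13              |
+------------------------+----------------------+-----------------+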

The following is a table with multiple header lines identified.

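+----------+----------+----------+----------+
|! Head 1  |! Head 2  |! Head 3  |! Head 4  |
+----------+----------+----------+----------+
| Cell 1-1 | Cell 1-2 | Cell 1-3 | Cell 1-4 |
+----------+----------+----------+----------+
| Cell 2-1 | Cell 2-2 | Cell 2-3 | Cell 2-4 |
+----------+----------+----------+----------+
| Cell 3-1 | Cell 3-2 | Cell 3-3 | Cell 3-4 |
+----------+----------+----------+----------+
|! Head 5  |! Head 6  |! Head 7  |! Head 8  |
+----------+----------+----------+----------+
| Cell 4-1 | Cell 4-2 | Cell 4-3 | Cell 4-4 |
+----------+----------+----------+----------+
| Cell 5-1 | Cell 5-2 | Cell 5-3 | Cell 5-4 |
+----------+----------+----------+----------+
| Cell 6-1 | Cell 6-2 | Cell 6-3 | Cell 6-4 |
+----------+----------+----------+----------+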
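To show how the pipe and "!" conventions map onto HTML, here is a simplified sketch that turns table lines into rows of td or th cells. It is not txt2html's implementation: the function name table_rows is invented, every physical line becomes its own row (so multi-line cells are not merged), and header rows marked with an asterisk on the border line are not handled.

```shell
# Hypothetical sketch: convert "| a | b |" lines into <tr> rows.
table_rows() {
    awk '
        /^[+*][-+*]*[+*]$/ { next }         # border line: nothing to emit
        /^\|/ {
            # A "!" right after the first pipe marks a header row.
            tag = ($0 ~ /^\|!/) ? "th" : "td"
            line = $0
            sub(/^\|/, "", line)            # drop the leading pipe
            sub(/\|[ \t]*$/, "", line)      # drop the trailing pipe
            n = split(line, cell, "|")
            row = "<tr>"
            for (i = 1; i <= n; i++) {
                c = cell[i]
                sub(/^!/, "", c)            # per-cell header mark
                gsub(/^[ \t]+|[ \t]+$/, "", c)
                row = row "<" tag ">" c "</" tag ">"
            }
            print row "</tr>"
        }
    '
}
```

Only the first cell is checked for "!", matching the rule that a single mark is enough for the whole row.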

Separators
----------

A separator line (horizontal rule) can be inserted by typing four or more hash marks (#) on a line. At the end of this document there should be a separator, above my signature.
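The separator rule is simple enough to sketch in one substitution. The function name separators is invented, and emitting a bare <hr> tag is an assumption about the output.

```shell
# Hypothetical sketch: a line of four or more "#" becomes a rule.
separators() {
    sed -E 's/^#{4,}[[:space:]]*$/<hr>/'
}
```

Lines with fewer than four hash marks, or with other text after them, are left unchanged.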


####

Dana French http://www.mtxia.com