Tool of the day: lynx

lynx is a browser that renders HTML files as text, optionally using color and bolding. A client needed a word count of a bunch of HTML files, so after installing lynx I used it in batch mode:
lynx *.html --dump --nolist | wc
The --dump option writes the rendered text to standard output rather than presenting it interactively; the --nolist option prevents the link targets of all the anchor tags from being listed at the end of each document. The pipe character sends that output to the wc (word count) command, which displays the number of lines, words and bytes in what it was fed. Following the UNIX principle of “small tools, loosely joined” this command could easily be included in a larger script, with perhaps sed or cut to isolate and format just the one number of interest, or the -w option added to the wc command to return only the word count. There are many ways to do it, which is another very Good Thing.
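
As a rough sketch of the "larger script" idea, the pipeline could be wrapped in a small shell function. The function name html_word_count is hypothetical, not something from lynx or from the original command. The sketch also assumes it is safest to hand lynx one file per invocation, so it loops over the files instead of passing the whole glob at once, and it uses wc -w to keep only the word count.

```shell
#!/bin/sh
# html_word_count: hypothetical helper name, chosen for illustration only.
# Prints the combined word count of the rendered text of the given HTML files.
html_word_count() {
    for f in "$@"; do
        # Render one file at a time as plain text, omitting the link list.
        lynx -dump -nolist "$f"
    done | wc -w | tr -d ' '   # wc -w: words only; tr strips padding some wc versions add
}
```

It would be called as html_word_count *.html. Because the loop's combined output is piped into a single wc, the result is one total for all files, not a count per file.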

This work by Ted Roche is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States.