Lynx is a text-mode web browser that renders HTML files as plain text, optionally using color and bold. A client needed a word count for a batch of HTML files, so after installing Lynx I ran it in batch mode:
lynx *.html --dump --nolist | wc
The --dump option writes the rendered text to standard output instead of presenting it interactively; the --nolist option suppresses the numbered list of links that Lynx otherwise appends to the end of each document. The pipe character (|) sends that output to wc (word count), which reports the number of lines, words, and bytes it received. Following the UNIX principle of "small tools, loosely joined," this pipeline could easily become part of a larger one, perhaps using sed or cut to isolate and format just the one number of interest, or adding the -w (--words) option to wc so that it prints only the word count. There are many ways to do it, another very Good Thing.
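A minimal sketch of the same idea wrapped in a reusable shell function, assuming lynx is installed and on the PATH. The name total_words is hypothetical. It renders each file with lynx -dump -nolist (the single-dash spellings listed in the lynx manual page), pipes the combined text into wc -w so that only the word count is printed, and uses tr to strip the padding spaces some wc implementations put before the number.

```shell
#!/bin/sh
# Sketch only: total_words is a hypothetical helper name.
# Assumes lynx is installed and on the PATH.
total_words() {
  # Render each HTML file as plain text (no trailing link list),
  # count the words across all of them, and strip any padding
  # spaces that some wc implementations print before the number.
  for f in "$@"; do
    lynx -dump -nolist "$f"
  done | wc -w | tr -d ' '
}
```

Running total_words *.html prints a single number. If you want the full wc report and then pick out one value, wc | awk '{print $2}' extracts the word column, since wc prints lines, words, and bytes in that order; awk copes with wc's variable spacing more gracefully than cut does.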
Happy Halloween
Treats this year included getting some of my annual charitable contributions done before the 31st of December. A new contribution this year was becoming a member of the Free Software Foundation, the group responsible for the GNU General Public License (GPL), along with many other good works. Help them to help you. I did.