On Mon, 27 Jan 1997, Fons Rademakers wrote:

> I am sure there must be standard programs out there that
> do exactly that: stripping HTML out of a file. Probably
> some small Perl script will do. Anybody has such a script?
>
> Cheers, Fons.
>

I found such a script that converts HTML to ASCII. It does not seem to be
copyrighted :-)

Frank

_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
_/_/_/  eMail: Frank.Ziegler@cern.ch   _/_/_/
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/

-------- start ---------
#!/bin/sh
#
# Name: html2ascii
#
# Description:
#	Convert HTML files to ascii.  Makes empty lines empty.
#
# Exit Values:
#	0	Successful completion.
#	1
#
# Inputs:
#	0 or more files (stdin is read for 0 files)
#
# Outputs:
#
# Environment:
#	DEBUG		is list of scripts to debug
#	DEBUG_LOG	is log file for debugging output
#
# Modified by:
#	11/01/95  Howard Fear	Original
#
# Notes:
#	01/01/96  Howard Fear	Change LYNX to be correct on your system.
#
LYNX=/usr/local/bin/lynx

set -h					# remember functions
self="`basename $0`"
die () { echo ${1:+"$self: $*"} >&2; kill -TERM $$; }
warn () { echo ${1:+"$self: $*"} >&2; }
usage () { echo "usage: $self files..." >&2; exit 1; }

: ${TMPDIR:=/tmp}
# Remove any core file on exit. Only the TERM handler forces status 1;
# an "exit 1" inside the EXIT trap would override the final "exit 0".
trap 'rm core >/dev/null 2>&1' 0
trap 'rm core >/dev/null 2>&1; exit 1' 15

case ":${DEBUG:=$debug}:" in
*:$self:* | :all: | :true: | :on: )
	set -vx
	test "$DEBUG_LOG" && exec 2>>"$DEBUG_LOG"
	;;
esac

# use stdin if no files present
test $# -eq 0 && set -- -

for i
do
	case "$i" in
	- )	# process stdin
		# lynx occasionally core dumps so we remove core as well
		trap "rm $TMPDIR/$$.html core >/dev/null 2>&1" 0
		trap "rm $TMPDIR/$$.html core >/dev/null 2>&1; exit 1" 15
		touch $TMPDIR/$$.html || {
			warn "Could not create $TMPDIR/$$.html"
			continue
		}
		while read line; do echo "$line" >> $TMPDIR/$$.html; done
		filename="$TMPDIR/$$.html"
		;;
	* )	filename="$i"
		;;
	esac
	# die runs in the subshell; it signals the parent ($$) to stop the script
	( $LYNX -dump "$filename" || die "Could not index file $filename" ) \
	| sed -e 's/[ ]*$//'
	echo
done
exit 0
------ schnipp -----
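
For anyone who only wants the core idea without the debugging and temp-file handling: the script boils down to piping `lynx -dump` through a `sed` that strips trailing whitespace. Below is a minimal sketch of just that part. The function names `strip_trailing_blanks` and `html_to_text` are my own and do not appear in the script above, and it assumes a `lynx` binary is on your PATH unless you set LYNX yourself.

```sh
#!/bin/sh
# Sketch only: these function names are hypothetical, not from html2ascii.

# Strip trailing spaces and tabs so whitespace-only lines become empty.
strip_trailing_blanks () {
    sed -e 's/[[:space:]]*$//'
}

# Render one HTML file as plain text.
# Assumes lynx is installed; override the path with LYNX=/path/to/lynx.
html_to_text () {
    "${LYNX:-lynx}" -dump "$1" | strip_trailing_blanks
}
```

After sourcing the file, something like `html_to_text page.html > page.txt` should do (page.html is just a placeholder name).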
This archive was generated by hypermail 2b29 : Tue Jan 04 2000 - 00:26:17 MET