FFN HTML Stripper

Discussion in 'General Discussion' started by Cervus, Feb 22, 2006.

  1. Cervus

    Cervus Raptured to Hell

    Joined:
    Aug 29, 2005
    Messages:
    849
    Location:
    Manchester, England.
    In a different thread, bornagainpenguin and david9 were looking for a program capable of converting ff.net’s HTML files to plain text files. Because I was bored, I decided to try my hand at making a program that could do this. I’ve now finished it and am making it available for download.

    Screenshots
    http://www.geocities.com/cervus0123/Screen1.jpg
    http://www.geocities.com/cervus0123/Screen2.jpg
    http://www.geocities.com/cervus0123/Screen3.jpg

    The FFN HTML Stripper can be downloaded from the link below.

    http://rapidshare.de/files/13904314/FFN_HTML_Stripper_v1.zip.html
    (Click the free button on the page that loads, then enter the code shown into the text box and click the download button next to it.)

    The program can convert chapters downloaded with the De-FFNet-izer (download available at http://deffnet.blogspot.com/ ). It doesn’t work on fics manually downloaded from ff.net at this point.

    The De-FFNet-izer html files can be converted to an rtf file or a basic htm webpage.

    When converted to an rtf file the original paragraph structure of the fic is retained. Emboldened, italicized, or underlined text, however, isn’t.

    When converted to an htm file, all unneeded HTML and JavaScript tags are removed, the original paragraph structure is retained, and any emboldened, italicized, or underlined text is also retained.
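
    To illustrate the kind of stripping described above, here is a minimal Python sketch. It is not the FFN HTML Stripper's actual code, which isn't shown in this thread; the tag whitelist and the names TagStripper and strip_html are assumptions made up for the example. It keeps paragraph and bold/italic/underline tags, drops every other tag, and discards script and style contents entirely.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tags that survive stripping.
KEEP = {"p", "br", "b", "i", "u", "em", "strong"}
# Elements whose contents are thrown away along with the tags.
SKIP_CONTENT = {"script", "style"}


class TagStripper(HTMLParser):
    """Collects text plus whitelisted tags; drops everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif tag in KEEP and self.skip_depth == 0:
            # Attributes (class, style, ...) are dropped on purpose.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEEP and tag != "br" and self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Re-escape &, < and > so the output is still valid HTML.
            self.out.append(escape(data, quote=False))


def strip_html(source: str) -> str:
    parser = TagStripper()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


sample = '<div class="x"><p>Hello <b>bold</b></p><script>var a = 1;</script></div>'
print(strip_html(sample))
```

    For an rtf-style result with no formatting, the same approach would work with KEEP reduced to the paragraph tags, which are then turned into line breaks.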

    Once converted to .htm, these files are totally independent of the css stylesheets, images, and JavaScript files saved on your computer by the De-FFNet-izer, so those can be deleted along with the original .html files. (The FFN HTML Stripper saves files as .htm to make it easier to tell the De-FFNet-izer files from the newly stripped ones. DO NOT DELETE THE .HTM FILES.) As well as stripping the unneeded HTML and JavaScript tags, this means the stripped files take roughly half the hard drive space to store.

    I’ve done some testing on the fics in bornagainpenguin’s archive. Before being stripped, the files in the Azkaban section had a total size of 10.8 megabytes; once stripped, the total was 5.32 megabytes. The Dark Arts section went from 11.7 megabytes to 5.49 megabytes, and the Dark but Lightsided section from 20.8 megabytes to 12.4 megabytes. Similarly, when converted to rtf, all six chapters of my fic went from a total of 692kb to 316kb.
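
    The savings can be worked out from the sizes quoted in this post. The short Python sketch below only does that arithmetic; the helper name percent_saved is made up for the example.

```python
def percent_saved(before: float, after: float) -> float:
    """Percentage of the original size removed by stripping."""
    return (before - after) / before * 100.0


# (before, after) sizes as quoted in the post; units match within each pair.
results = {
    "Azkaban (MB)": (10.8, 5.32),
    "Dark Arts (MB)": (11.7, 5.49),
    "Dark but Lightsided (MB)": (20.8, 12.4),
    "My fic, rtf (kb)": (692, 316),
}

for name, (before, after) in results.items():
    print(f"{name}: {percent_saved(before, after):.1f}% smaller")
# Azkaban (MB): 50.7% smaller
# Dark Arts (MB): 53.1% smaller
# Dark but Lightsided (MB): 40.4% smaller
# My fic, rtf (kb): 54.3% smaller
```

    So three of the four tests come out at a little over half, while the Dark but Lightsided section saves closer to 40%.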


    HOW TO USE.
    First download and install the program (Duh!). The program can be downloaded here and is only 333kb.

    Once installed a shortcut will be placed on your start menu. Use this to start the program.

    Once the application is loaded, click the ‘Open’ button, locate the file you want stripped or converted, and click ‘Open’ again (or double-click the file). If a file has been opened, the HTML source code will appear in the box on the right.

    Now simply click on ‘strip to .htm’ or ‘strip to .rtf’, and in the save box that appears locate the directory you wish to save to, give the file a name, and click save. (Personally, to save time, I save the new file to the same directory as the original and, once all the chapters are stripped, delete the .html files, css stylesheet, JavaScript files, and images for that fic.)

    KNOWN ISSUES.
    Sometimes 'codes' such as “ will appear in place of a punctuation character. This is because ff.net doesn’t use one specific character set, but many different ones. If this happens in any of your chapters, please reply here giving the code that appears and, if possible, the ff.net story ID and the chapter number of the story it appears in. I believe I’ve eliminated the majority of these ‘codes’, but there may be more in fics the program hasn’t been tested on.
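
    The example code in the paragraph above didn't survive in this copy of the post, so the following is only a guess at the general problem: assuming the stray 'codes' are HTML character references (for instance &ldquo; or &#8220; for an opening curly quote), Python's standard html.unescape can turn them back into the characters they stand for. The function name decode_codes is made up for the example, and this is not how the FFN HTML Stripper itself handles them.

```python
import html


def decode_codes(text: str) -> str:
    """Replace HTML character references with the characters they name."""
    return html.unescape(text)


# Named and numeric references for the same curly quotes.
print(decode_codes("&ldquo;Hello,&rdquo; he said."))
print(decode_codes("&#8220;Hello,&#8221; he said."))
```

    If the codes turn out to be something else, such as bytes read with the wrong character set, the file would instead need to be re-read with the right encoding.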

    Rtf files don’t retain bold, italicised, or underlined text.

    Only one file can be converted at a time.

    Only works on files downloaded with the De-FFNet-izer.

    Uninstall
    FFN HTML Stripper can be removed from your system at any time through the Add or Remove Programs section of the Control Panel.

    If anybody wishes to comment or ask a question, feel free to post here.

    Cervus.
     
  2. Dark Syaoran

    Dark Syaoran No. 4 Admin

    Joined:
    Jun 4, 2005
    Messages:
    6,141
    Gender:
    Male
    Wow... good stuff Cervus.
     
  3. LINKed up

    LINKed up Chief Warlock DLP Supporter

    Joined:
    Jan 28, 2006
    Messages:
    1,406
    Location:
    A certain place in a certain area of space-time, a
    Nice job there Cervus. Very nice job.
     
  4. Tinder

    Tinder Seventh Year

    Joined:
    Feb 3, 2006
    Messages:
    262
    Nicely explained, but this topic has a slightly misleading title. Not that I was getting my hopes up or anything...
     
  5. Xanatos

    Xanatos Professor

    Joined:
    Aug 26, 2005
    Messages:
    470
    Location:
    Edinburgh
    :( i thought it was something else too
     
  6. david9

    david9 Banned

    Joined:
    Jan 30, 2006
    Messages:
    224
    I emailed the author of deffn. He said there would be a way to rip it straight to plain text in the future, but he also said that wasn't going to be done for a few months...