Is there anything to convert FF.net's pages into plain text?

Discussion in 'General Discussion' started by david9, Feb 12, 2006.

  1. david9

    david9 Banned

    Joined:
    Jan 30, 2006
    Messages:
    224
    While at work or in class I get really bored and need something to read. This is where the handy dandy notes function of my iPod comes in. I can convert, using this, plain text files to a format (basically plain text, with hyperlinks between the parsed chapters) that I am able to read on my iPod.

    The problem is that if I try just a plain html stripper, I will end up with the following text on the top of every page, which is very annoying:

    Has anybody created a tool to convert FF's pages to plain text or am I just SOL?

    Edit: Before anybody says something, yes I realize I could just copy/paste all the text into notepad, but when you have a couple hundred files you want to convert, it would take forever.
     
  2. bornagainpenguin

    bornagainpenguin DLP Archivist

    Joined:
    Oct 27, 2005
    Messages:
    780
    Location:
    Illinois
    Hear! Hear! This is one of the main reasons I had to upgrade to a newer Palm recently. I got tired of having to basically read the beginning and the end of every chapter of a fic I was converting for use in my eBook readers, and if I left all the extra crap in the file I ran out of space waaaay too quickly! These days I just keep it all there knowing that I have plenty of space on my 256 MB SD card, but it'd be nice to be able to automate this stuff and make nice little eBooks of my favorite fics without taking hours to do so...per fic...

    --bornagainpenguin (who is still a believer in hard-disk real estate having started out on 800kb floppies and recalls when a 1.0 gigabyte hard drive was considered HUGE... :lol: )
     
  3. david9

    david9 Banned

    Joined:
    Jan 30, 2006
    Messages:
    224
    Well I have a bit more space (40gigs), but the main problem is screen space because the header ends up taking up a couple pages..
     
  4. Zevrillion

    Zevrillion Founder Retired Staff

    Joined:
    Apr 2, 2005
    Messages:
    753
    Location:
    New Zealand
    I just copy and paste, but it only takes like 4 minutes to copy a 30 chapter long story. So it's not that big of a problem.
     
  5. bornagainpenguin

    bornagainpenguin DLP Archivist

    Joined:
    Oct 27, 2005
    Messages:
    780
    Location:
    Illinois
    Yeah but a simple cut and paste usually loses all the linebreaks and other assorted formatting. Back when I was still all gung-ho about making the perfect eBooks I used to open the html file as a text file and do my best to cut off the top and bottom of the html and remove the javascripts. These days I don't torment myself like that anymore but it's still annoying to be in the middle of a great fic and suddenly lose my train of thought when the next chapter starts and there's all this header crap.

    Does no one know?

    --bornagainpenguin (who'd ask this in the anime forums he's a member of but he left the group when the administrator started threatening to ban his IP because he was annoyed at him blocking the ad banners... bornagainpenguin doesn't take to threats well. :twisted: Besides I was one of maybe twenty members who showed up there, quite a falling-off for a group that used to have well over fifty regulars and posted well over hundreds of messages a day...)
     
  6. Zevrillion

    Zevrillion Founder Retired Staff

    Joined:
    Apr 2, 2005
    Messages:
    753
    Location:
    New Zealand

    I copy and paste into word. I use "documents to go" on my palm.

    Screenshot
     
  7. bornagainpenguin

    bornagainpenguin DLP Archivist

    Joined:
    Oct 27, 2005
    Messages:
    780
    Location:
    Illinois
    Nice. It still doesn't fix the lost character formatting and the linebreaks tho' and that can be the difference between understanding something and totally missing the point. I actually have an OEM copy of Documents to Go for my Palm Zire 72S...I've never used it because I do not own MS-Word. I actually tend to use iSilioX more than anything these days, but I was a fan of MobiPocket back when I still had my iPaq 1915. I didn't bother to convert when I used that because the PocketPC edition was capable of reading the HTML files directly.

    --bornagainpenguin
     
  8. david9

    david9 Banned

    Joined:
    Jan 30, 2006
    Messages:
    224
    I have about 1200 files that I've downloaded off FF.net (using a script). I don't want to have to spend 3 hours going through each one, then on top of that go back and fix things like spacing, line breaks, paragraph indentations, FF's stupid quotes “ and ” instead of ", the ... character FF uses, the weird dash sign they use, etc. that don't show up correctly.
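
A minimal sketch of how the character cleanup described above could be automated, assuming the downloaded files are already plain text saved as UTF-8 in a single folder. The folder name `stories` and both function names are made up for illustration; none of this comes from an existing tool.

```python
from pathlib import Path

# Typographic characters mapped to plain ASCII stand-ins.
ASCII_REPLACEMENTS = str.maketrans({
    "\u201c": '"',    # left curly double quote
    "\u201d": '"',    # right curly double quote
    "\u2018": "'",    # left curly single quote
    "\u2019": "'",    # right curly single quote / apostrophe
    "\u2026": "...",  # single-character ellipsis
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
})


def to_ascii_punctuation(text: str) -> str:
    """Swap curly quotes, the ellipsis character and long dashes for ASCII."""
    return text.translate(ASCII_REPLACEMENTS)


def convert_folder(folder: str) -> int:
    """Rewrite every .txt file in `folder` in place; return how many changed."""
    changed = 0
    for path in Path(folder).glob("*.txt"):
        original = path.read_text(encoding="utf-8")
        cleaned = to_ascii_punctuation(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed


if __name__ == "__main__":
    # "stories" is a hypothetical folder name.
    print(convert_folder("stories"), "files updated")
```

Because the files are rewritten in place, it would be sensible to try this on a copy of the folder first.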
     
  9. Zevrillion

    Zevrillion Founder Retired Staff

    Joined:
    Apr 2, 2005
    Messages:
    753
    Location:
    New Zealand
    Never noticed those things.
     
  10. bornagainpenguin

    bornagainpenguin DLP Archivist

    Joined:
    Oct 27, 2005
    Messages:
    780
    Location:
    Illinois
    Gah! I thought I was the only one who noticed that! Thank Ghu for metapad's search and replace feature and its compatibility with my wheel mouse! Again though, I'd love it if there were a way to automate this type of thing.

    Anyone?

    --bornagainpenguin
     
  11. david9

    david9 Banned

    Joined:
    Jan 30, 2006
    Messages:
    224
    I use a macro for the weird characters, but then again I use textpad not metapad.
     
  12. Cervus

    Cervus Raptured to Hell

    Joined:
    Aug 29, 2005
    Messages:
    849
    Location:
    Manchester, England.
    I've had a few days off work (and got bored) so I thought I'd dust off the old programming fingers and see if I could write a program to do this. It only works on chapters downloaded via De-FFNet-izer, though.

    It basically changes this, Before
    Into this, After
    and then saves it as a .txt file.

    Sorry about the picture quality but I didn't want the file size to be too large.

    Unfortunately all the dashes are a necessary evil, I can't keep the text formatted into paragraphs any other way. It also, unfortunately, only does one file at a time. It replaces the page break bars with a line like so; -_-_-_-_-. None of ff.net's HTML remains anywhere in the text.

    If it would be any use to you I can upload it once I've made some finishing touches; if not, it doesn't matter. It kept me from being bored and that's all that counts.
     
  13. bornagainpenguin

    bornagainpenguin DLP Archivist

    Joined:
    Oct 27, 2005
    Messages:
    780
    Location:
    Illinois
    I'd like to give it a go, if you don't mind emailing it or posting a link to it online! If nothing else I might be able to do a search and replace much easier than before! The only thing is (and yes I know that david9 wanted it that way, but) I don't like the fact that by converting to plain text you lose formatting like italics and bold text. Is there any way you could make stripping those attributes optional? As in, one pass for HTML cleanup (remove FF.not javascript and assorted junk while leaving the file as basic html) and the other an option to do a full scan through and output as plain text?

    --bornagainpenguin (who is happy that you chose this to do while bored!)
     
  14. Cervus

    Cervus Raptured to Hell

    Joined:
    Aug 29, 2005
    Messages:
    849
    Location:
    Manchester, England.
    I suppose I could do that. I'll make it so that you can choose between stripping everything and saving it as a .txt file or leaving some HTML behind and saving it as an HTML file. I think with the HTML remaining it won't need all those dashes to keep the paragraphs formatted either, so it'll look a lot neater.

    Give me a couple of days and I'll see what I can do.
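
The two modes discussed here (full strip to plain text versus leaving basic HTML behind) could look roughly like the sketch below. It is only an illustration, not the program from this thread: the set of tags to keep and the name `clean_page` are assumptions, and regular expressions are a shortcut that will not cope with every malformed page.

```python
import re
from html import unescape

# Tags kept in "basic HTML" mode (an assumed list); everything else is dropped.
KEPT_TAGS = {"b", "i", "em", "strong", "u", "p", "br", "hr"}

SCRIPT_OR_STYLE = re.compile(
    r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
ANY_TAG = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")


def clean_page(html: str, keep_formatting: bool) -> str:
    """Remove scripts and styles, then strip all tags or all but KEPT_TAGS."""
    html = SCRIPT_OR_STYLE.sub("", html)

    def replace_tag(match):
        name = match.group(1).lower()
        if keep_formatting and name in KEPT_TAGS:
            # Re-emit the tag without its attributes.
            closing = "/" if match.group(0).startswith("</") else ""
            return f"<{closing}{name}>"
        return ""

    result = ANY_TAG.sub(replace_tag, html)
    # Entities such as &amp; only need decoding for plain-text output.
    return result if keep_formatting else unescape(result)
```

Calling `clean_page(page, keep_formatting=True)` leaves italics and bold in place for an HTML-capable reader, while `keep_formatting=False` gives text suitable for a .txt file.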
     
  15. bornagainpenguin

    bornagainpenguin DLP Archivist

    Joined:
    Oct 27, 2005
    Messages:
    780
    Location:
    Illinois
    Okay, thanks!

    --bornagainpenguin
     
  16. Lord Ravenclaw

    Lord Ravenclaw DLP Overlord Admin DLP Supporter

    Joined:
    Apr 2, 2005
    Messages:
    4,372
    Location:
    Denver, CO
    I'll look into writing a simple PHP app for it.
     
  17. Cervus

    Cervus Raptured to Hell

    Joined:
    Aug 29, 2005
    Messages:
    849
    Location:
    Manchester, England.
    I've managed to make it keep the text formatted into paragraphs without the need for all the dashes.

    It now looks like this
    The screenshot was taken from WordPad. It's kept the formatting in every program I've tried it with except Notepad, which still displays it as one continuous block of text.

    The only thing that is causing me problems now is the speech marks ff.net uses. When saved after being stripped of all the HTML and converted to a text document the speech marks show up like â€œ for opening speech and â€ (followed by a little square box thing that isn't showing up on this site) for closing speech. I'm trying to figure it out now.

    I've still not coded it to save as a basic HTML, though it shouldn't take too long to do. As soon as I've figured this speech mark problem out I'll start with the basic HTML page and then upload it somewhere.
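
One plausible explanation for those garbled speech marks, not confirmed anywhere in this thread, is an encoding mismatch: the page stores curly quotes as UTF-8 (three bytes each) and the text is then read or displayed as Windows-1252 (one character per byte). The sketch below reproduces that effect; the function names are made up for illustration.

```python
OPENING = "\u201c"  # left curly double quote
CLOSING = "\u201d"  # right curly double quote


def misread_as_cp1252(text: str) -> str:
    """Encode as UTF-8, then decode the bytes as Windows-1252 by mistake."""
    # Byte 0x9D has no character in Windows-1252, so it becomes U+FFFD,
    # which many programs draw as a box or a question mark.
    return text.encode("utf-8").decode("cp1252", errors="replace")


def repair(garbled: str) -> str:
    """Undo the misreading; only works if no byte was lost along the way."""
    return garbled.encode("cp1252").decode("utf-8")


# Three characters (a-circumflex, euro sign, oe ligature) instead of one quote.
print(ascii(misread_as_cp1252(OPENING)))
# The closing quote loses its last byte to the replacement character.
print(ascii(misread_as_cp1252(CLOSING)))
# The opening quote survives a round trip.
print(repair(misread_as_cp1252(OPENING)) == OPENING)
```

If this is the cause, reading the downloaded file as UTF-8 in the first place avoids the problem entirely, and the curly quotes can then be replaced with straight ones before saving.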
     
  18. david9

    david9 Banned

    Joined:
    Jan 30, 2006
    Messages:
    224
    I made a "stripped" template for DeFFNetizer which just contains the following:
    Still trying to find a program that will convert
    ,
    , and

    to line breaks and strip everything else. Also for some reason some stories keep those stupid quotes and apostrophes, and others don't even when using DeFFN... Nothing is simple, is it?
     
  19. Cervus

    Cervus Raptured to Hell

    Joined:
    Aug 29, 2005
    Messages:
    849
    Location:
    Manchester, England.
    I've managed to convert the
    </br> and

    tags so that any text that follows starts on a new line. I've coded it to do two carriage returns so that there's a gap between paragraphs. I've managed to change the single quote to one that is not so stupid as well.
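
For anyone wanting to try the same idea, here is a rough sketch of turning break and paragraph tags into a blank line between paragraphs and dropping every other tag. It is an illustration only, not the program described in this post; `html_to_paragraphs` is a made-up name and the regular expressions assume reasonably tidy HTML.

```python
import re
from html import unescape

# <br>, <br/>, </br>, <p ...> and </p> all mark a line or paragraph end.
BREAK_TAG = re.compile(r"</?(?:br|p)\b[^>]*>", re.IGNORECASE)
OTHER_TAG = re.compile(r"<[^>]+>")
EXTRA_BLANK_LINES = re.compile(r"\n{3,}")


def html_to_paragraphs(html: str) -> str:
    """Turn break and paragraph tags into blank lines, drop the other tags."""
    text = re.sub(r"\s+", " ", html)      # HTML ignores source line breaks
    text = BREAK_TAG.sub("\n\n", text)    # two newlines leave a gap
    text = OTHER_TAG.sub("", text)
    text = unescape(text)                 # decode entities such as &amp;
    lines = [line.strip() for line in text.split("\n")]
    text = EXTRA_BLANK_LINES.sub("\n\n", "\n".join(lines))
    return text.strip() + "\n"
```

When saving on Windows, opening the output file with `open(path, "w", encoding="utf-8", newline="\r\n")` writes CR+LF line endings, which older versions of Notepad need in order to show line breaks at all.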

    The problem that's presented itself is with one set of speech marks. Some of the fics on ff.net use strange speech marks that I'm having trouble with. My own fic, as well as Dark Syaoran's, are two of these. Twists of Fate by Lanindur Du'Undarian uses a different set of speech marks that are fine and easily written to a text file, though. Then to complicate matters even further, Harry's Madness by SilverLocke980 uses a different set of speech marks again, these have to be changed before the text file is saved. I swear ff.net is running some sort of absurd speech mark collection website.

    So far I've run across three different sets of speech marks and for all I know there could be more.
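
One way to find out how many different speech marks are actually in use is to count every non-ASCII character in a downloaded chapter and list it by its Unicode name. A small sketch, assuming the text has already been decoded correctly; `survey_non_ascii` is a made-up name.

```python
import unicodedata
from collections import Counter


def survey_non_ascii(text: str) -> list:
    """Return (code point, Unicode name, count) for each non-ASCII character."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"), n)
        for ch, n in counts.most_common()
    ]
```

Running it over a handful of stories and printing the result shows which quote, dash and ellipsis characters need an entry in a replacement table.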
     