View Full Version : Is there anything to convert FF.net's pages into plain text?


david9
02-12-2006, 06:52 PM
While at work or in class I get really bored and need something to read. This is where the handy dandy notes function of my iPod comes in. I can convert, using this (http://www.ambience.sk/ipod-ebook-creator/), plain text files to a format (basically plain text, with hyperlinks between the parsed chapters) that I am able to read on my iPod.

The problem is that if I try just a plain HTML stripper, I end up with the following text at the top of every page, which is very annoying:

Just In . C2 . Search . Forums . Directory . Dictionary Log In . Register
Books Harry Potter Stupid Is As Stupid Does B s : A A A
Author: sugerplumfairy
Fiction Rated: T - English - General/Romance - Reviews: 393 - Published: 05-26-05 - Updated: 01/28/06 - 15 days ago

Has anybody created a tool to convert FF's pages to plain text or am I just SOL?

Edit: Before anybody says something, yes I realize I could just copy/paste all the text into notepad, but when you have a couple hundred files you want to convert, it would take forever.
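
A minimal sketch of one way to drop that header after a generic HTML stripper has run. It assumes the header always ends with the line beginning "Fiction Rated:", as in the sample above; the function name and the default marker are illustrative, not part of any existing tool.

```python
def drop_site_header(text: str, last_header_prefix: str = "Fiction Rated:") -> str:
    """Return the text that follows the last header line.

    Assumption: the site header ends with a line starting with
    `last_header_prefix`. If no such line exists, the text is returned
    unchanged so nothing is lost.
    """
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.strip().startswith(last_header_prefix):
            # Everything up to and including this line is site navigation.
            return "\n".join(lines[index + 1:]).lstrip("\n")
    return text


sample = "\n".join([
    "Just In . C2 . Search . Forums . Directory . Dictionary Log In . Register",
    "Books Harry Potter Stupid Is As Stupid Does B s : A A A",
    "Author: sugerplumfairy",
    "Fiction Rated: T - English - General/Romance - Reviews: 393",
    "",
    "Chapter text starts here.",
])
print(drop_site_header(sample))
```

Run over a folder of already-stripped files, this would avoid the copy/paste step, provided the header really is the same on every page.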

bornagainpenguin
02-12-2006, 07:03 PM
Before anybody says something, yes I realize I could just copy/paste all the text into notepad, but when you have a couple hundred files you want to convert, it would take forever.

Hear! Hear! This is one of the main reasons I had to upgrade to a newer Palm recently. I got tired of having to basically read the beginning and the end of every chapter of a fic I was converting for use in my eBook readers, and if I left all the extra crap in the file I ran out of space waaaay too quickly! These days I just keep it all there knowing that I have plenty of space on my 256 MB SD card, but it'd be nice to be able to automate this stuff and make nice little eBooks of my favorite fics without taking hours to do so...per fic...

--bornagainpenguin (who is still a believer in hard-disk real estate, having started out on 800 KB floppies, and recalls when a 1.0 gigabyte hard drive was considered HUGE... :lol: )

david9
02-12-2006, 07:24 PM
Well, I have a bit more space (40 gigs), but the main problem is screen space, because the header ends up taking up a couple of pages.

Zevrillion
02-14-2006, 11:31 AM
I just copy and paste, but it only takes like 4 minutes to copy a 30 chapter long story. So it's not that big of a problem.

bornagainpenguin
02-14-2006, 12:29 PM
I just copy and paste, but it only takes like 4 minutes to copy a 30 chapter long story. So it's not that big of a problem.

Yeah, but a simple cut and paste usually loses all the linebreaks and other assorted formatting. Back when I was still all gung-ho about making the perfect eBooks I used to open the HTML file as a text file and do my best to cut off the top and bottom of the HTML and remove the JavaScript. These days I don't torment myself like that anymore, but it's still annoying to be in the middle of a great fic and suddenly lose my train of thought when the next chapter starts and there's all this header crap.

Does no one know?

--bornagainpenguin (who'd ask this in the anime forums he's a member of, but he left the group when the administrator started threatening to ban his IP because he was annoyed at him blocking the ad banners... bornagainpenguin doesn't take to threats well. :twisted: Besides, I was one of maybe twenty members who showed up there, quite a falling-off for a group that used to have well over fifty regulars and posted well over hundreds of messages a day...)

Zevrillion
02-14-2006, 01:44 PM
I copy and paste into Word. I use "Documents To Go" on my Palm.

Screenshot (http://hem.bredband.net/andreas_sembrant/screenshot.v.1.png)

bornagainpenguin
02-14-2006, 02:45 PM
I copy and paste into Word. I use "Documents To Go" on my Palm.

Screenshot (http://hem.bredband.net/andreas_sembrant/screenshot.v.1.png)

Nice. It still doesn't fix the lost character formatting and the linebreaks tho', and that can be the difference between understanding something and totally missing the point. I actually have an OEM copy of Documents To Go for my Palm Zire 72S...I've never used it because I do not own MS-Word. I actually tend to use iSiloX more than anything these days, but I was a fan of MobiPocket back when I still had my iPaq 1915. I didn't bother to convert when I used that because the PocketPC edition was capable of reading the HTML files directly.

--bornagainpenguin

david9
02-14-2006, 10:33 PM
I just copy and paste, but it only takes like 4 minutes to copy a 30 chapter long story. So it's not that big of a problem.

I have about 1200 files that I've downloaded off FF.net (using a script). I don't want to have to spend 3 hours going through each one, and then on top of that have to go back and fix things like spacing, line breaks, paragraph indentations, FF's stupid quotes instead of ", the ... character FF uses, the weird dash sign they use, etc. that don't show up correctly.
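
A rough sketch of automating the character clean-up described here. It assumes the "stupid quotes", the "..." character and the "weird dash" are the Unicode curly quotes, the single-character ellipsis and the en/em dashes; the thread never names the exact characters, so the table is a guess that can be extended.

```python
# Assumed mapping from typographic characters to plain ASCII stand-ins.
REPLACEMENTS = {
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u2026": "...",  # horizontal ellipsis
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
}
TABLE = str.maketrans(REPLACEMENTS)


def to_ascii_punctuation(text: str) -> str:
    """Replace the typographic characters listed above with ASCII ones."""
    return text.translate(TABLE)


print(to_ascii_punctuation("\u201cWait\u2026\u201d she said \u2014 it\u2019s late"))
```

Applied in a loop over every downloaded file, this would replace the manual search-and-replace pass.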

Zevrillion
02-15-2006, 08:50 AM
Never noticed those things.

bornagainpenguin
02-15-2006, 09:25 AM
I have about 1200 files that I've downloaded off FF.net (using a script). I don't want to have to spend 3 hours going through each one, and then on top of that have to go back and fix things like spacing, line breaks, paragraph indentations, FF's stupid quotes instead of ", the ... character FF uses, the weird dash sign they use, etc. that don't show up correctly.

Gah! I thought I was the only one who noticed that! Thank Ghu for metapad (http://www.liquidninja.com/metapad/)'s search and replace feature and its compatibility with my wheel mouse! Again though, I'd love it if there were a way to automate this type of thing.

Anyone?

--bornagainpenguin

david9
02-15-2006, 02:38 PM
I use a macro for the weird characters, but then again I use TextPad, not metapad.

Cervus
02-19-2006, 12:58 AM
I've had a few days off work (and got bored), so I thought I'd dust off the old programming fingers and see if I could write a program to do this. It only works on chapters downloaded via De-FFNet-izer, though.

It basically changes this, Before (http://www.geocities.com/cervus0123/Before.jpg)
Into this, After (http://www.geocities.com/cervus0123/After.jpg)
and then saves it as a .txt file.

Sorry about the picture quality but I didn't want the file size to be too large.

Unfortunately all the dashes are a necessary evil; I can't keep the text formatted into paragraphs any other way. It also, unfortunately, only does one file at a time. It replaces the page break bars with a line like so: -_-_-_-_-. None of ff.net's HTML remains anywhere in the text.

If it would be any use to you I can upload it once I've made some finishing touches; if not, it doesn't matter. It kept me from being bored and that's all that counts.

bornagainpenguin
02-19-2006, 02:00 AM
I'd like to give it a go, if you don't mind emailing it or posting a link to it online! If nothing else I might be able to do a search and replace much more easily than before! The only thing is (and yes, I know that david9 wanted it that way, but) I don't like the fact that by converting to plain text you lose formatting such as italics and bold text. Is there any way you could make stripping those attributes optional? As in, one pass for HTML cleanup (remove FF.not javascript and assorted junk while leaving the file as basic HTML), and the other an option to do a full scan through and output as plain text?

--bornagainpenguin (who is happy that you chose this to do while bored!)

Cervus
02-19-2006, 02:14 AM
I suppose I could do that. I'll make it so that you can choose between stripping everything and saving it as a .txt file or leaving some HTML behind and saving it as an HTML file. I think with the HTML remaining it won't need all those dashes to keep the paragraphs formatted either, so it'll look a lot neater.
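
A minimal sketch of the two-mode idea, not the actual program discussed in this thread. The function name, the `keep_basic_html` flag and the list of tags worth keeping are all assumptions made for illustration. It uses regular expressions, which is only adequate for simple, well-behaved pages.

```python
import re

# Assumed whitelist of "basic" formatting tags to keep in HTML mode.
KEEP = {"p", "br", "i", "b", "em", "strong"}

SCRIPT = re.compile(r"(?is)<script\b.*?</script\s*>")
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def clean(html_text: str, keep_basic_html: bool) -> str:
    """Strip scripts and tags; optionally keep a few bare formatting tags."""
    without_scripts = SCRIPT.sub("", html_text)

    def handle(match: re.Match) -> str:
        closing, name = match.group(1), match.group(2).lower()
        if keep_basic_html and name in KEEP:
            return f"<{closing}{name}>"  # keep the tag, drop its attributes
        return ""  # drop every other tag

    return TAG.sub(handle, without_scripts)


page = '<p class="x">He <i>ran</i>.</p><script>var a = 1;</script>'
print(clean(page, keep_basic_html=True))
print(clean(page, keep_basic_html=False))
```

In plain-text mode this sketch only removes tags; turning paragraph and line-break tags into actual line breaks would be a separate step.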

Give me a couple of days and I'll see what I can do.

bornagainpenguin
02-19-2006, 02:16 AM
Give me a couple of days and I'll see what I can do.

Okay, thanks!

--bornagainpenguin

Lord Ravenclaw
02-19-2006, 02:39 AM
I'll look into writing a simple PHP app for it.

Cervus
02-20-2006, 10:23 AM
I've managed to make it keep the text formatted into paragraphs without the need for all the dashes.

It now looks like this (http://www.geocities.com/cervus0123/After2.jpg)
The screenshot was taken from WordPad. It's kept the formatting with every program I've tried it with except Notepad, which still displays it as one continuous block of text.

The only thing that is causing me problems now is the speech marks ff.net uses. When saved after being stripped of all the HTML and converted to a text document the speech marks show up like “ for opening speech and ” (followed by a little square box thing that isn't showing up on this site) for closing speech. I'm trying figure it out now.

I've still not coded it to save as a basic HTML, though it shouldn't take too long to do. As soon as I've figured this speech mark problem out I'll start with the basic HTML page and then upload it somewhere.
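
One plausible explanation (an assumption, since the thread never identifies the encoding) is that the files are UTF-8 but are being read as Windows-1252, which turns each curly quote into a run of several characters, with an unprintable one at the end of the closing quote. The sketch below reproduces that effect and shows that decoding the same bytes as UTF-8 gives the quotes back.

```python
original = "\u201cHello,\u201d she said."  # text with curly speech marks
raw_bytes = original.encode("utf-8")       # how the text might be stored on disk

# Reading UTF-8 bytes as Windows-1252 yields three characters per curly quote.
# Byte 0x9D has no Windows-1252 meaning, so it becomes a replacement character.
wrong = raw_bytes.decode("cp1252", errors="replace")

# Reading the same bytes as UTF-8 restores the original characters.
right = raw_bytes.decode("utf-8")

print(repr(wrong))
print(repr(right))
```

If this is what is happening, opening the input with the right encoding would fix the marks without any search and replace.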

david9
02-20-2006, 11:34 AM
I made a "stripped" template for DeFFNetizer which just contains the following:

%(chapname)s

%(body)s
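
For reference, the placeholders in that template look like Python's %-style named formatting. Assuming that is how they are filled in (the thread does not show the tool's internals), a template like this could be rendered as follows; the `render` helper and the sample values are made up for illustration.

```python
TEMPLATE = "%(chapname)s\n\n%(body)s\n"


def render(chapname: str, body: str) -> str:
    """Fill the named placeholders using Python's % operator with a dict."""
    return TEMPLATE % {"chapname": chapname, "body": body}


print(render("Chapter 1", "It was a dark night."))
```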


Still trying to find a program that will convert the line-break and paragraph tags to line breaks and strip everything else. Also, for some reason some stories keep those stupid quotes and apostrophes and others don't, even when using DeFFN... Nothing is simple, is it?

Cervus
02-20-2006, 07:25 PM
Still trying to find a program that will convert the line-break and paragraph tags to line breaks and strip everything else. Also, for some reason some stories keep those stupid quotes and apostrophes and others don't, even when using DeFFN... Nothing is simple, is it?

I've managed to convert the </br> and paragraph tags so that any text that follows starts on a new line; I've coded it to do two carriage returns so that there's a gap between paragraphs. I've managed to change the single quote to one that is not so stupid as well.
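
A small sketch of that conversion, written independently of the program described above. It assumes the tags in question are the usual line-break and paragraph tags in any spelling (including the non-standard closing form), and it uses Windows-style "\r\n" line endings on the guess that the files are meant for Windows editors.

```python
import re

GAP = "\r\n\r\n"  # two line endings, leaving a blank line between paragraphs


def html_to_paragraphs(html_text: str) -> str:
    """Turn break/paragraph tags into blank-line gaps and drop other tags."""
    # Any opening or closing br/p tag becomes a paragraph gap.
    text = re.sub(r"(?i)</?(br|p)\b[^>]*>", GAP, html_text)
    # Every remaining tag is removed outright.
    text = re.sub(r"<[^>]+>", "", text)
    # Adjacent tags produce several gaps in a row; collapse them to one.
    text = re.sub(r"(\r\n){3,}", GAP, text)
    return text.strip()


print(repr(html_to_paragraphs("<p>One.</p><p>Two.<br>Three.</p><b>End</b>")))
```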

The problem that's presented itself is with one set of speech marks. Some of the fics on ff.net use strange speech marks that I'm having trouble with. My own fic, as well as Dark Syaoran's, are two of these. Twists of Fate by Lanindur Du'Undarian uses a different set of speech marks that are fine and easily written to a text file, though. Then, to complicate matters even further, Harry's Madness by SilverLocke980 uses a different set of speech marks again; these have to be changed before the text file is saved. I swear ff.net is running some sort of absurd speech mark collection website.

So far I've run across three different sets of speech marks, and for all I know there could be more.
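
One guess at why there are several "sets" (purely an assumption; the thread does not say what the sets are) is that the same curly quotes arrive written in different ways: as named HTML entities, as numeric character references, or as the raw characters. Under that assumption, decoding the entities first and then replacing the curly quotes covers all of them in one pass.

```python
import html


def normalize_speech_marks(text: str) -> str:
    """Decode HTML entities, then turn curly double quotes into plain ones."""
    text = html.unescape(text)  # handles &ldquo;, &#8220;, &#147; and so on
    for mark in ("\u201c", "\u201d"):
        text = text.replace(mark, '"')
    return text


variants = [
    "&ldquo;Hi&rdquo;",    # named entities
    "&#8220;Hi&#8221;",    # numeric references to the Unicode code points
    "&#147;Hi&#148;",      # legacy Windows-1252 numeric references
    "\u201cHi\u201d",      # raw curly quote characters
]
for variant in variants:
    print(normalize_speech_marks(variant))
```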