Tech – for Everyone

Tech Tips and Tricks & Advice – written in plain English.

Extracting text from Web pages*

Today’s quick tip was inspired by a reader question. The gentleman used to use an old technique to “print” webpages to text files so that he could edit and incorporate the text into his documents, and he wanted to know if he could still do this, but in a more modern way.
I would like to take a moment here to remind my readers that I do answer questions sent to me; and also that if I believe the question-and-answer will benefit “everyone”, you could very well see it posted here.

Q: How do I copy the text on a webpage to my document?
A: There are actually a couple of different ways to do this, including the old “print-to-file” method that DOS users remember. The trick is to get just the text and information you want, and not all the advertising and hyperlinks and graphics/logos that most webpages incorporate.

1) If all you need is a small portion of text from a webpage, the easiest way to get it from your browser to your word processor is to ‘highlight’ the sentence (or paragraph) on the webpage, press Ctrl+C to Copy, click on the place in your document that you’d like to insert the text and hit Ctrl+V to Paste the selection into your document (you may have to change the font and text size to match the rest of your document’s format).
Sometimes, working in the browser, it can be a little tricky to get your cursor to change from an arrow (navigation) to the vertical bar so you can select the page’s text. But rest assured that you can ‘select’ the text on a webpage. Usually you have to get the point of the arrow very close to the edge of the first letter, and make small, gentle mouse movements until the cursor changes. You could also try clicking in an easier part of the text, and using your arrow keys to move the cursor to where you want it.
(As a writer, I simply must express my hope that you will pay some mind to the concept of Copyrights, and original work, and properly attribute your “borrowed” material.)

2) But if you want all the information on the webpage, and you want it to be available as a file you can reference at your leisure, the Copy>Paste method is not the best and another technique will serve you better.
Some people prefer to download the webpages in a method called “Offline webpages”, which is a whole ‘nother topic. Offline gives you the whole webpage — logos/graphics, links, ads — as if you were connected to the Internet, and this is more info than we need for today’s topic… we just want the text.

In Firefox and the older Internet Explorer 6 (Please, folks; IE 6 is quite probably the most hacked program ever written– update to IE7, or use an “alternative” browser), you can click on the “File” menu on your browser’s toolbar. IE7 users (who haven’t re-enabled the old Menu bar) should click on the “Page” button. Whichever manner you used, now click on “Save As”.

Now the Save As window will open, and here is where we will make our important decisions.

As usual, you will be able to select where the file will be Saved, and give it a name. But the primary thing is to select the “Save as type”, so that we will have a file we can use as we want to: in this case, a text file (.txt).
Once the webpage is Saved as a text file, you will be able to Open it with any word processor. You will be able to edit it to your heart’s content, and it will be available whenever you need it.

*If you decide to Save the webpage as one of the other options in the “Save as type” selection (or make a mistake there), and Save the page as an *.htm or *.html file, or even an “archive”, you will still be able to Open it with a word processor [by default, it will open with your browser] and edit it. It will just contain a whole bunch of junk-looking code, as well as the text you want.
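For readers comfortable with a little scripting, that junk-looking code can also be stripped out automatically. What follows is only a minimal sketch in Python, using the standard library's html.parser module; the names TextExtractor and html_to_text are made up for this illustration, and it is not what the browser's "Save as type" option does internally. It keeps the readable text and skips the contents of script and style blocks.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the readable text of an HTML page (hypothetical helper)."""

    # Tags whose contents are code, not text a reader would want.
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self._chunks = []     # pieces of text found so far

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside the skipped tags.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def get_text(self):
        # One piece of text per line.
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the readable text found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.get_text()


if __name__ == "__main__":
    sample = (
        "<html><head><style>p {color: red}</style></head>"
        "<body><h1>Hello</h1><p>Plain &amp; simple text.</p>"
        "<script>var x = 1;</script></body></html>"
    )
    print(html_to_text(sample))
```

To try this on a page you Saved as *.htm, you would read the file into a string first (for example with open() and read()) and pass that string to html_to_text. Note that this sketch puts every run of text on its own line, so a sentence containing bold or linked words will be split across lines; it also keeps menu and advertising text, which you would still need to trim by hand.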

Today’s free link: I am not a real big fan of free all-in-one “optimization” programs, but I do have one that I like and can recommend: Advanced WindowsCare Personal. From the publisher: “is a comprehensive PC care utility that takes an one-click approach to help protect, repair and optimize your computer. It provides an all-in-one and super convenient solution for PC maintenance and protection.” (Vista compatible.)

Copyright 2007-8 © Tech Paul. All rights reserved.

August 26, 2008 - Posted by | advice, computers, how to, IE 7, Internet, software, tech

3 Comments

  1. Hey Paul,

    One more tip to add to your advice on copying and pasting web page text: when pasting in a word processor, open “Edit” in the menu bar, select “Paste Special”, and in the Paste Special window select “Unformatted Text”, then click “OK”. In this way, you’ll save a lot of work having to reformat to match the existing content of the document.

    Thanks for the great advice.

    BM

    Comment by Bill Mullins | August 26, 2008

  2. I love my Firefox Abduction Addon to get a “screenshot” of large webpages. Doesn’t work for the text to text but captures the entire page for future reference and/or cropping.

    I’ll often copy/paste into Notepad to remove html code and/or formatting I don’t need, then on to whatever editor I’m using.

    Comment by gadzooks64 | August 26, 2008

  3. Gadzooks64–
    Yes, Abduction is a winner.

    The Copy > Paste > edit > Copy > Paste routine works, but tools like EverNote or a “clipboard extender” may be a better way to go if you find yourself doing this often.

    Comment by techpaul | August 26, 2008

