How To Extract Text From Web Pages
Today’s quick tip was inspired by a reader question. The gentleman used to use an old technique to “print” webpages to text files so that he could edit and incorporate the text into his documents, and he wanted to know if he could still do this, but in a more modern way.
I would like to take a moment here to remind my readers that I do answer questions sent to me; and also that if I believe the question-and-answer will benefit “everyone”, you could very well see it posted here.. like today’s.
Q: How do I copy the text on a webpage to my document?
A: There are actually a couple of different ways to do this, including the old “print-to-file” method that DOS users remember. The trick is to get just the text and information you want, and not all the advertising and hyperlinks and graphics/logos that most webpages incorporate.
1) If all you need is a small portion of text from a webpage, the easiest way to get it from your browser to your word processor is to ‘highlight’ the sentence (or paragraph) on the webpage, press Ctrl+C to Copy, click on the place in your document that you’d like to insert the text and hit Ctrl+V to Paste the selection into your document (you may have to change the font and text size to match the rest of your document’s format).
Sometimes, when working in the browser, it can be a little tricky getting your cursor to change from an arrow (navigation) to the vertical bar so that you can select the page’s text. But rest assured that you can ‘select’ the text on a webpage. Usually you have to get the point of the arrow very close to the edge of the first letter, and make small, gentle mouse movements until the cursor changes. You could also try clicking in an easier part of the text, and using your arrow keys to move the cursor to where you want it.
(As a writer, I simply must express my hope that you will pay some mind to the concept of Copyrights, and original work, and properly attribute your “borrowed” material.)
2) But if you want all the information on the webpage, and you want it to be available as a file you can reference at your leisure, the Copy>Paste method is not the best, and another technique will serve you better.
Some people prefer to download the webpages in a method called “Offline webpages”, which is a whole ‘nother topic. Offline gives you the whole webpage — logos/graphics, links, ads — as if you were connected to the Internet, and this is more info than we need for today’s topic… we just want the text.
In Firefox and the older Internet Explorer 6 (Please, folks; IE 6 is quite probably the most hacked program ever written– update to IE8, or use an “alternative” browser), you can click on the “File” menu on your browser’s toolbar. IE7 users (who haven’t re-enabled the old Menu bar) should click on the “Page” button. Whichever manner you used, now click on “Save As”.
Now the Save As window will open, and here is where we will make our important decisions.
As usual, you will be presented with the ability to select “where” the file will be Saved, and to give it a name. But the primary thing is to select the “Save as type”, so that we will have a file we can use as we want to– in this case, a text file (.txt).
Once the webpage is Saved as a text file, you will be able to Open it with any word processor. And you will be able to edit it to your heart’s content.. and it will be available whenever you need it.
Note: If you decide to Save the webpage as one of the other options in the “file type” selection (or make a mistake here), and Save the page as an *.htm or *.html file, or even an “archive”, you will still be able to Open it with a word processor [by default, it will open with your browser, so right-click on the file and choose “Open with” and then click your word processor] and edit it… it will just contain a whole bunch of junk-looking code, as well as the text you want.
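For readers comfortable with a little scripting, here is a minimal sketch of one way to strip that junk-looking code out of a Saved *.htm file and keep only the text. It is not part of the browser method above; it assumes you have Python 3 installed, and the names `TextOnly`, `html_to_text` and `convert_file` are simply ones I made up for illustration. It uses Python's built-in `html.parser` module, and it puts each run of text on its own line, so a sentence containing bold or linked words may be split across lines.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects the readable text of a page and skips script/style blocks."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []      # pieces of visible text, in page order
        self.skip_depth = 0   # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html):
    """Return the text of an HTML string, one run of text per line."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "\n".join(parser.chunks)


def convert_file(html_path, txt_path):
    """Read a Saved webpage and write its text to a plain .txt file."""
    # Assumption: the file is UTF-8; undecodable bytes are replaced, not fatal.
    with open(html_path, encoding="utf-8", errors="replace") as src:
        text = html_to_text(src.read())
    with open(txt_path, "w", encoding="utf-8") as dst:
        dst.write(text)
```

To use it, you would call something like `convert_file("saved_page.htm", "saved_page.txt")` (both file names are placeholders) and then Open the resulting .txt file in your word processor as described above.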
Today’s free download: I am not a real big fan of free all-in-one “optimization” programs, but I do have one that I like, use (occasionally), and can recommend: Advanced WindowsCare Personal. From the publisher: “is a comprehensive PC care utility that takes an one-click approach to help protect, repair and optimize your computer. It provides an all-in-one and super convenient solution for PC maintenance and protection.”
Copyright 2007-9 © Tech Paul. All rights reserved.
How To Play Online At Work
Don Reisinger over at C/Net writes, “We all spend some of our time at work doing things that have nothing to do with our job. We surf the Web. We play games. Sure, we all need our downtime, and the enlightened manager knows that. But still, we’d rather just surf in private than deal with the raised eyebrows.
That’s why we need ways to ensure that when our boss surprises us or sneaks up behind us, she’ll think that we’re actually working.”
Now, I hope you won’t think me too much of a curmudgeon, but…
* At a recent major IT “expo”, the keynote speaker kicked off the show by saying, “IT lost 100,000 jobs last month but..”
* Far more companies are downsizing than are hiring, and there are now more job seekers-per-job-opening than ever before in history.
* You’re not twelve any more.
* No amount of camouflage or quick Alt-Tab-ing is going to fool your machine’s logs, or the corporate screen-capture “productivity” monitoring software.
However, if you want to see the latest Fool The Boss tools, click here. There are some new ones on me.
(And.. here’s another thought; maybe your boss saw this article too..)
Scare Tactics
The shadow Internet economy is worth over $105 billion. Online crime is bigger than the global drugs trade√. No country, no person, no business and no government is immune from CyberCrime.
Currently there is an epidemic of fake anti-malware software on the Internet– which is collectively called “rogue anti-malware”. Marketed under hundreds of different names, such as VirusRemover 2008 and Antivirus XP 2009, this type of rogue software scares people by giving false alarms, and then tries to deceive them into paying for removal of nonexistent malware.
This video (produced by the good folks at WOT) shows what happens when a legitimate site gets infected and redirected to one of these bogus anti-malware scams.
Yes, folks, legitimate websites are being ‘hacked’.
The people behind this scourge use many different ways to try to entice you to click– realistic looking pop-up windows appear, offers of “free trials” arrive in e-mail, and “free scan” buttons show up on legit-looking ‘fight malware’ websites.. the means are quite varied!
As this video shows, the user is tricked into (scared into, really) providing their credit card # to clean infections that weren’t there before they clicked and aren’t really there now.
* The ‘false positives’ are not “cleaned”, BUT more adware and spyware are installed.
* A good percentage of my calls at Aplus Computer Aid are folks needing help with getting rid of these rogues. These clever programs use the latest techniques to combat removal, and it can be quite tough, if not impossible, to truly remove them.. without formatting your hard-drive.
* For more, please read Is that anti-spyware program really spyware?
* One Website dedicated to combating this epidemic is Spyware Warrior. It has a pretty good list of known rogues, and much more detailed information. Another excellent resource is Bleeping Computer.
* I have written several How-To’s on protecting yourself from malware, and how to clean your machines as well. Click here to see those titles.
√ From the new MessageLabs whitepaper. (This eye-opening report provides a disturbing look into the ‘dark’ world of cyber-crime. This link is the online version.. you need to scroll a bit..)
Today’s free download: WOT is a free Internet security addon for your browser. It will help keep you safe(r) from online scams, identity theft, spyware, spam, viruses and unreliable shopping sites. WOT warns you before you interact with a risky Website. It’s easy and it’s free.
- Ratings for over 20 million websites
- Downloaded 1 million times
- The WOT browser addon is light and updates automatically
- WOT rating icons appear beside search results in Google, Yahoo!, Wikipedia, Gmail, etc.
- Settings can be customized to better protect your family
- WOT Security Scorecard shows rating details and user comments
Copyright 2007-8 © Tech Paul. All rights reserved.