To Site Search or to Google: That is the Question (September 7, 2010)

Posted: September 7, 2010 by docbabad in Apple Mac Tips, Multiple Products, Software Reviews

By Harry {doc} Babad, © Copyright 2010, All Rights Reserved.

Introduction

In a moment of passing enlightenment, I finally figured out how to speed up my almost daily searches for specific articles I want to collect and archive. These are the items I tab for downloading while reading my paper magazine subscriptions. The consolidated information in most downloaded articles appears as illustrated below.

However, articles in the paper edition of The Economist carry the magazine’s name, issue date, and page numbers in the page header or footer. The actual article name or category is exemplified above. The article’s headline shows up first. A descriptor, perhaps a subtitle, sits below the headline. At times an author’s name is listed. Now, readers, what is the real ‘searchable’ title of the article? We can all figure out its subject, if not from the listed information, then from the article’s first paragraph.

But what do you search for on the publisher’s web site? Well, folks, it depends… on both the magazine and the mechanics of the site’s search engine. If one entry doesn’t work, say “Good Policy and Bad”, then try another, such as “Some mitigation policies are effective, some are efficient, and some are neither”. If there’s an author listed, that’s the third item I check. Struck out?

Dig deep and look at all the special reports, or whatever category the article lives in, but do so within a range of publication dates. Why a range? An article initially posted on the web edition of a magazine may carry a different date than the one printed on the paper copy. Sometimes it takes more than three strikes before you get a hit.

Dumb, Harry, you’re getting long in the tooth. The solution, for me, is to Google it.

Most of the time, any of the three choices I’ve provided works when googling, and by using your browser’s Find feature you can skip through irrelevant items at the click of a mouse. I usually do this by entering the magazine’s name into the page’s Find field. No, not into the search field!
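To make the “Google it” step concrete, here is a minimal Python sketch that turns each candidate search string into a Google search URL. It assumes Google’s standard `https://www.google.com/search?q=` address, wraps each string in double quotes to request an exact-phrase match, and optionally adds Google’s `site:` operator; the `economist.com` domain is my assumption for illustration.

```python
from typing import Optional
from urllib.parse import urlencode


def google_search_url(phrase: str, site: Optional[str] = None) -> str:
    """Build a Google search URL for an exact phrase, optionally limited to one site."""
    # Double quotes ask Google to treat the words as a single phrase.
    query = f'"{phrase}"'
    if site:
        # The site: operator restricts hits to a single domain.
        query += f" site:{site}"
    # urlencode percent-encodes quotes, commas and colons and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


# The headline and subtitle tried above; an author name would be the third choice.
candidates = [
    "Good Policy and Bad",
    "Some mitigation policies are effective, some are efficient, and some are neither",
]

for text in candidates:
    print(google_search_url(text, site="economist.com"))  # assumed domain
```

Paste any printed URL into the browser, then use the browser’s Find command (Command-F on the Mac) to jump to the magazine’s name among the results.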

Read on for the rest of the story; I’ve already given you the punch line. Don’t be a kitten on the keys.

Background

As many of you are aware, when I’m not actually writing, I spend most of my time doing searches on my Macintosh. What I’m looking for is fairly eclectic. My interests range from climate change, nuclear science and energy, folk music, and technology (especially energy-related themes) to all things Macintosh, with side trips to find obscure widgets and gadgets needed by someone in my family. After burnout, usually around 10:00 p.m., I turn toward recipes, cooking-related (free) eBooks and, on occasion, obscure movies that I saw in art deco movie houses from the 1950s through the ’70s.

I have not yet worked with Microsoft’s “Bing” [http://www.bing.com/] or the new beta search engine from Wolfram, and will not until I read that they’ve gotten more robust; it’s an omission I can live with. I also, for now, have not learned to use data mining software and methods.

I also subscribe to, and read almost cover to cover, a variety of magazines (no e-editions for me), ranging from Time to The Economist and Scientific American. As I read these, I mark (with Post-it tabs) articles I want to collect for future use, either as references or as a basis for future exploration. Most of my curiosity cats are dead, but there’s so much of interest out there; so… I keep on truckin’.

A few generalizations: searching individual websites for information can be either easy or maddening. If a site has opted to use the Google engine to power its search, it is easy to use, tolerant of syntax errors and even forgiving of my frequent misspellings. But first: I’m from the government and am here to help you! Let us count the ways.

Department of Energy [DOE] and most other Federal sites, such as those of the EPA and NRC. Please note, I have published over 100 documents, papers and articles during my 30-year career as a supporter of DOE’s waste management effort. Therefore my criterion for success is: how many of these Babad co-authored goodies will a search find? In addition, I have an extensive list of citations, again from my professional work at the Hanford Site; how many of these can I find, so I can replace shelf-hogging paper copies?

Note this does not include my academic or industrial career, or the articles I’ve written about folk music and the Macintosh. They would be out of scope for most of the Federal databases to which I have free access. (I’m sure big brother is watching me, but I can’t check what he’s seen.)

Searching the various government sites I need to check for background or reference materials, supposedly peer reviewed or at least checked for quality, when writing books and articles is pure horror. I habitually check the DOE, NRC, EPA and IAEA sites and, on occasion, the NSF and NIH portals.

Among the most exasperating are the two DOE OSTI (Office of Scientific and Technical Information) sites, on which I can usually find public domain references to R&D and DOE programs, but only after multiple rephrasings of either the search criteria or the syntax. Take a simple search, say for the publications of Harry Babad (me).

First I Tried To Work With Science Accelerator. This is a ‘newish’ gateway to science, including R&D results, project descriptions, accomplishments, and more, via resources made available by the Office of Scientific and Technical Information (OSTI), U.S. Department of Energy. Using Science Accelerator to do a global database search turned up one item, a shared patent issued in 1980. [http://www.scienceaccelerator.gov/index] Checking “Harry Babad” turned up zilch, as did “Harry Babad” Author and other {thinking cap} input variations.

Okay, Science Accelerator doesn’t do authors. However, my search for “Desalinization” gave me 184 hits, which I could sort by date or even burrow into by limiting the list to subsets; the latter did not help, because the indexer and I obviously didn’t see eye to eye on what a subset defines. That’s a matter of selecting the key words we’d both use to label a document. Since I don’t have the data dictionary for Science Accelerator, I can’t get into the site sysop’s mind. However, Science Accelerator contains helpful links to Wikipedia, which I used to my advantage as a springboard to digging deeper [http://en.wikipedia.org/wiki/Desalinization].

A data dictionary is a concept most often used in database creation and use. In part, a data dictionary is a bit like the tag clouds you find on a few websites, or the tags you now find on many individual web pages, like our blog. The difference is that a data dictionary is more formal and constrains the choice of key words a user can search with. See: http://en.wikipedia.org/wiki/Data_dictionary
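A data dictionary in this controlled-vocabulary sense can be sketched in a few lines of Python. The terms and variants below are hypothetical, not taken from Science Accelerator; the point is that a user’s word only finds anything if the dictionary maps it to a preferred index term, which is one way a searcher and an indexer fail to see eye to eye.

```python
from typing import Optional

# Hypothetical controlled vocabulary: preferred index term -> accepted variants.
DATA_DICTIONARY = {
    "desalination": {"desalinization", "desalinisation", "desalting"},
    "radioactive waste": {"nuclear waste", "radwaste"},
}


def to_index_term(user_term: str) -> Optional[str]:
    """Map a user's search word to the preferred index term, or None if it is not allowed."""
    term = user_term.strip().lower()
    for preferred, variants in DATA_DICTIONARY.items():
        if term == preferred or term in variants:
            return preferred
    # Terms outside the dictionary cannot be used to search the index.
    return None


print(to_index_term("Desalinization"))  # desalination
print(to_index_term("folk music"))      # None
```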

Let’s Try The OSTI Bridge Site [http://www.osti.gov/bridge/]. It comes in two flavors, only one of which is accessible to the general public. Although it deals with cleared and released documents, the DOE/DOE contractor option, which also covers so-called Freedom of Information Act [FOIA] content such as correspondence or guidance, requires a password, which I no longer have. Wow! Instantly I got 101 matches, which I could sort by date or even focus by doing an advanced (field-related) search. Great, I’ve solved my problem and have my citations to guide me to the references I want. Not so fast, Doc.

That’s the good news. The bad news is that those 101 hits contained articles by many of my colleagues on which I was not a contributing author. My only relationship to those papers was that some of my work was referenced therein. Only 11 of the papers contained my work. Searching H Babad turned up 991 hits, some of which were clearly mine. A narrower ‘field’ search [H. Babad] correctly turned up 30 relevant documents, while Harry Babad turned up none. Hmm!

The Mostly Private Sector: My Magazine Article Collecting Experiences

As mentioned earlier in passing, I subscribe to and skim/read/study Consumer Reports, Business Week (now Bloomberg’s BW), Chemical and Engineering News (ACS), Chemical Heritage, Discover, Time, Nuclear News (ANS), National Geographic, The Economist, Technology Review (MIT) and Scientific American. Where applicable, I have subscriber access to online content. Of course, this does not count other subscriptions, both electronic and hard copy, that are science and technology oriented, including my Macintosh-related items.

Periodically, usually every other month, I recheck the paper copies, go to the publisher’s web site and download the articles of interest, as well as any other closely associated documents linked to the highlighted original. All of this lives in nested folders on a 40 GB partition. Although I’ve developed a database (FileMaker Pro), I’m too busy to do the data entry, so I live with a combination of title searches (EasyFind by DEVONtechnologies) and content searches (Houdah Software’s HoudahSpot, a great front end for Spotlight).

Now, I could give you a blow-by-blow account of the strengths and weaknesses of searching each magazine publisher’s web site. Search capability ranges from fair to good and often requires either varying the search terms or changing the display order (usually by date). But no, I will not; it would be a waste of all our time.

However, after blundering around individual sites for years, I finally made a discovery: the closer a site has come to adopting or mimicking the Google search engine, the easier it is to find things. Our macCompanion site uses this tool, although the site also provides search by ixquick, which did not meet my needs, since its output was broadly focused and mostly irrelevant stuff from the entire WWW. However, the Google engine on the macC site turned up over 200 hits. Searching “Doc Babad” turned up 1580 hits, many more than the 250 or so items I’ve published. The truth lies somewhere in between; it just takes more time to ferret it out.

Conclusion

Okay, this is a little bit like counting the angels on the head of a pin. “To Site Search or to Google: that is the question.” The answer is both!

If you only need to search a few web sites, over and over again, consider mastering their search tools. If you stay close to home, all the hits and misses are limited to the site you are searching.

If you however have broader multiple-site specific needs… Google them to pieces!

Doc.

– – – – – – – – – – – – – – – – – – – – – – – – – — – – – – – – – –

Appendix: Advanced Searches

Almost all search sites, such as Google, MSN (Bing), and Yahoo, as well as many others including MacUpdate, astute magazines and newspapers, have advanced search features. The image below is what Google offers. Alternatively, like a good reference librarian, you can take advantage of Boolean search methods to narrow down your search. More on this can be found in my July 6, 2010 blog posting called Google to the Max at https://mhreviews.wordpress.com/2010/07/06/july-6-2010-google-to-the-max/. In addition, there’s lots of general information on Wikipedia at http://en.wikipedia.org/wiki/Boolean_search#Boolean_operations/. It’s a bit of heavy reading, but well worth the time.
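As a rough illustration of Boolean narrowing, the Python sketch below assembles a query string from “all of”, “any of”, and “none of” word lists, loosely mirroring the fields on Google’s Advanced Search page. It assumes Google’s conventions of double quotes for exact phrases, an uppercase OR between alternatives, and a leading minus sign to exclude a word; other site search engines may treat these differently. The example terms are illustrative, echoing the author-name variants from the OSTI Bridge experiment.

```python
def phrase(term: str) -> str:
    """Quote multi-word terms so they are matched as exact phrases."""
    return f'"{term}"' if " " in term else term


def boolean_query(all_of=(), any_of=(), none_of=()) -> str:
    """Combine word lists into a Google-style Boolean query string."""
    parts = [phrase(t) for t in all_of]                        # every term must appear
    if any_of:
        parts.append(" OR ".join(phrase(t) for t in any_of))   # at least one alternative
    parts += ["-" + phrase(t) for t in none_of]                # exclude unwanted terms
    return " ".join(parts)


query = boolean_query(
    all_of=["Hanford"],
    any_of=["H. Babad", "Harry Babad"],
    none_of=["patent"],
)
print(query)  # Hanford "H. Babad" OR "Harry Babad" -patent
```

How well the OR and minus operators work depends on the engine; on a site with its own search, its advanced-search form is the safer route.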

– – – – – – – – – – – – – – – – – – – – – – – – – — – – – – – – – –

End Notes:

An earlier version of this article was posted in the March 2000 edition of the now-defunct eZine macCompanion. Since it’s no longer accessible, I updated it and am posting it on our blog.

Product and company names and logos in this review may be registered trademarks of their respective companies.

Reviews and tests were carried out on my iMac 2.8 GHz Intel Core 2 Duo with 2 GB 667 MHz DDR2 SDRAM running Mac OS X version 10.6.4 (Snow Leopard).
