Seek and Ye Shall Find: Locate original documents on the Web
by Wendy Boswell

Finding original source documents on the Web, otherwise known as firsthand accounts, primary sources, etc., has thankfully become a pretty easy task. There are plenty of great sites out there that have made it easy for the average searcher to locate pretty much anything from the Bill of Rights to Civil War journals to the Islamic History Sourcebook to the archived Godey’s Lady Books – seriously, there is NO end to the fun stuff you can find (and don’t we all need a bit of a break from reading about Google’s latest exploits?). Most of these I found using the tips and tricks I outlined in another Lifehacker article, How to Search the Invisible Web, along with a bit of special Web searchin’ sauce (I’m lying about the special sauce, although Thousand Island does seem to perk things up a bit).
Here are just a few of the sites on the Web that you can use to locate primary documents; of course there are many, many more, but these are the favorites I find myself coming back to again and again, just to browse (I’m such a dork, but that’s okay). Remember, if you’re searching for original documents in order to cite them in an academic paper, you need to make sure you evaluate them first and use proper citation procedures.
- Duke University Libraries Database List: Over 60 huge databases are profiled on this list, anything from Early Canadiana to Historic American Sheet Music. Note: not all of these databases allow you access; some are behind university or library firewalls and require authentication, but most do allow access to their information.
- The Library of Congress: An astonishingly huge repository of information. This is the first place to go for American documents of any kind.
- United Nations Documents: Read resolutions from United Nations sessions clear back to 1946.
- A Chronology of US Historical Documents: Presented in chronological order for your reading enjoyment; everything from the Magna Carta to papers documenting the atomic bomb decision.
- Cold War Hot Links: A very long list of links compiled by an anthropologist. Makes for interesting reading, especially the declassified newsreels and CIA Intelligence Studies.
- Internet Ancient History Sourcebook: From Fordham University; “hundred of local files as well as links to source texts throughout the net.” Arranged in country sections, e.g., Mesopotamia and Egypt.
- Public Papers of the Presidents of the United States: You have a few different ways to search this large database; I recommend viewing the sample searches just to get a feel for what you need to do.
- Supreme Court Collection: You can even subscribe to this site via RSS or email to get digests of daily or recent decisions. Updated frequently.
- Our Documents: “100 milestone documents of American history;” you can view high-resolution photographs of the actual documents as well as document transcripts and .pdf downloads.
- Historical Maps of the World: Probably one of the most interesting sites I’ve come across on the Web. Lots of original maps scanned in from as early as the 1600s.
- Internet Modern History Sourcebook: From the same folks who brought us the Internet Ancient History Sourcebook. A huge collection of resources from the Reformation to the 21st century.
- Internet Sacred Texts: There is a mind-boggling array of sacred and religious texts here, from Confucianism’s Li Ki text to Freemasonry’s The Builders.
- Online Exhibits from the National Archives: Very cool and lots of pictures! Yay! Includes the 1918 Influenza Epidemic, When Nixon Met Elvis, and a lot more.
- Founding Documents: Yep, we’ve already got some American document sources on this list, but this site has them in seven different possible formats.
- American Civil War: Nicely put together database of original source Civil War documents.
This is by no means an exhaustive list. There’s the Australian National Archives Database, the Ad Access Project that “presents images and database information for over 7,000 advertisements printed in U.S. and Canadian newspapers and magazines between 1911 and 1955,” the Internet Library of 18th and 19th century Journals, Making of America, a database of American social history primary sources, Chinese cultural texts, Asian history sources… it goes on and on.
Again, even though you might not need to check out any of these primary sources, it’s nice to know that after you’ve downloaded the latest iTunes episode of “The Office,” you DO have the option of browsing through the original Louisiana Purchase documents. I mean, come on – can you imagine what kind of mad office credibility that would give you?
Data Mining using Google
Google has quickly become one of the best-known words in the world and is used by millions daily, me included. In an advanced database class back in university, we spent a couple of weeks studying the inner workings of search engines, and one topic which happened to come up was data mining using Google. Much to my surprise, out of a class of 80 fourth-year computer engineers, maybe four or five knew how to use Google to perform any sort of advanced query.
Google (and many other search engines) can search not only on keywords but also with a more “database-ish” query language that really narrows down your search results. Below is a summary of a few of the most useful lesser-known features.
Note: in the examples, replace cyberwyre.com with your own domain.
Basic Usage:
- Use quotation marks “ ” to locate an entire string.
eg. “bill gates conference” will only return results with that exact string.
- Mark essential words with a +
If the results must contain certain words or phrases, mark them with a + symbol. eg: +“bill gates” conference will return all results containing “bill gates”, but not necessarily those pertaining to a conference.
- Negate unwanted words with a -
If you search for the term bass, meaning the fish, you may be returned a list of music links as well. To narrow down your search a bit more, try: bass -music. This will return all results with “bass” and NOT “music”.
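If you ever generate queries from a script, the three rules above fit in a few lines. Here is a minimal Python sketch; build_query and its parameter names are made up for illustration (they are not anything Google provides), and it uses plain ASCII quotes, since that is what you would actually type into the search box.

```python
def _wrap(term: str) -> str:
    """Put double quotes around a term only if it contains a space."""
    return f'"{term}"' if " " in term else term


def build_query(terms=(), exact=(), required=(), excluded=()) -> str:
    """Combine keywords using the quote, + and - rules described above.

    All parameter names are hypothetical, chosen for this sketch.
    """
    parts = []
    parts += ["+" + _wrap(t) for t in required]   # +term: must appear
    parts += [f'"{p}"' for p in exact]            # "phrase": exact string
    parts += list(terms)                          # ordinary keywords
    parts += ["-" + _wrap(t) for t in excluded]   # -term: must not appear
    return " ".join(parts)


print(build_query(terms=["bass"], excluded=["music"]))
# bass -music
print(build_query(required=["bill gates"], terms=["conference"]))
# +"bill gates" conference
print(build_query(exact=["bill gates conference"]))
# "bill gates conference"
```

The helper only assembles the text of the query; what the search engine does with it is exactly as described in the list above.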
General Tips: (I use many of these almost on a daily basis)
- site:www.cyberwyre.com
This will search only pages which reside on this domain.
- related:www.cyberwyre.com
This will display all pages which Google finds to be related to your URL.
- link:www.cyberwyre.com
This will display a list of all pages which Google has found to be linking to your site. Useful to see how popular your site is.
- spell:word
Runs a spell check on your word.
- define:word
Returns the definition of the word.
- stocks: [symbol, symbol, etc]
Returns stock information. eg. stocks: msft
- maps:
A shortcut to Google Maps.
- phone: name_here
Attempts to look up the phone number for a given name.
- cache:
Shows the version of a web page that Google has in its cache. If you include other words in the query, Google will highlight those words within the cached document. For instance, cache:www.cyberwyre.com web will show the cached content with the word “web” highlighted.
- info:
The query [info:] will present some information that Google has about that web page. For instance, info:www.cyberwyre.com will show information about the CyberWyre homepage. Note there can be no space between the “info:” and the web page url.
- weather:
Used to find the weather in a particular city. eg. weather: new york
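The operator:value pattern in these tips is also easy to produce in code. The sketch below is Python; operator_query and search_url are hypothetical helper names, and the https://www.google.com/search?q=... address is an assumption about how the search page receives its query, so treat this as an illustration rather than a supported API.

```python
from urllib.parse import urlencode


def operator_query(operator: str, value: str, *words: str) -> str:
    """Join an operator and its value with no space after the colon,
    then append any extra keywords separated by spaces."""
    return " ".join([f"{operator}:{value.strip()}", *words])


def search_url(query: str) -> str:
    """URL-encode the query (the endpoint is an assumption, see above)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


q = operator_query("site", "www.cyberwyre.com", "web")
print(q)              # site:www.cyberwyre.com web
print(search_url(q))  # https://www.google.com/search?q=site%3Awww.cyberwyre.com+web
```

Stripping the value keeps the operator glued to its argument, which matters for operators like info: where no space is allowed after the colon.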
Advanced Tips:
- filetype:
Searches for a specific file type or, if you put a minus sign (-) in front of it, leaves out any results with that file type. Try it with .mp3, .mpg or .avi if you like.
- daterange:
Restricts results to a range of dates. Dates are supported in Julian date format only. 2452384 is an example of a Julian date.
- allinurl:
If you start a query with [allinurl:], Google will restrict the results to those with all of the query words in the url. For instance, [allinurl: google search] will return only documents that have both “google” and “search” in the url.
- inurl:
If you include [inurl:] in your query, Google will restrict the results to documents containing that word in the url. For instance, [inurl:google search] will return documents that mention the word “google” in their url, and mention the word “search” anywhere in the document (url or not). Note there can be no space between the “inurl:” and the following word.
- allintitle:
If you start a query with [allintitle:], Google will restrict the results to those with all of the query words in the title. For instance, [allintitle: google search] will return only documents that have both “google” and “search” in the title.
- intitle:
If you include [intitle:] in your query, Google will restrict the results to documents containing that word in the title. For instance, [intitle:google search] will return documents that mention the word “google” in their title, and mention the word “search” anywhere in the document (title or not). Note there can be no space between the “intitle:” and the following word.
- allinlinks:
Searches only within links, not text or title.
- allintext:
Searches only within text of pages, but not in the links or page title.
- bphonebook:
If you start your query with bphonebook:, Google shows U.S. business white page listings for the query terms you specify. For example, [ bphonebook: google mountain view ] will show the phonebook listing for Google in Mountain View.
- phonebook:
If you start your query with phonebook:, Google shows all U.S. white page listings for the query terms you specify. For example, [ phonebook: Krispy Kreme Mountain View ] will show the phonebook listing of Krispy Kreme donut shops in Mountain View.
- rphonebook:
If you start your query with rphonebook:, Google shows U.S. residential white page listings for the query terms you specify. For example, [ rphonebook: John Doe New York ] will show the phonebook listings for John Doe in New York (city or state). Abbreviations like [ rphonebook: John Doe NY ] generally also work.
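The Julian dates that daterange: expects are awkward to work out by hand, so here is a small Python sketch that converts calendar dates for you. The function names are hypothetical, and the start-end form of the operator is an assumption on my part (the tip above only says the dates must be Julian). The conversion itself is standard: the integer Julian Day Number equals Python's proleptic Gregorian ordinal plus 1721425.

```python
from datetime import date

# date.toordinal() counts 0001-01-01 as day 1; adding this offset
# gives the integer Julian Day Number for that calendar date.
JDN_OFFSET = 1721425


def to_julian_day(d: date) -> int:
    """Convert a calendar date to its Julian Day Number."""
    return d.toordinal() + JDN_OFFSET


def from_julian_day(jdn: int) -> date:
    """Convert a Julian Day Number back to a calendar date."""
    return date.fromordinal(jdn - JDN_OFFSET)


def daterange(start: date, end: date) -> str:
    """Build a daterange: term (assumed start-end syntax)."""
    return f"daterange:{to_julian_day(start)}-{to_julian_day(end)}"


print(to_julian_day(date(2002, 4, 19)))   # 2452384
print(from_julian_day(2452384))           # 2002-04-19
print(daterange(date(2002, 4, 19), date(2002, 4, 20)))
# daterange:2452384-2452385
```

So the example number 2452384 from the list corresponds to April 19, 2002.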
Putting it all Together:
Now that we have the basics, it’s time to get creative and combine them into a single search that really narrows down our results.
Example #1: Search for some MP3s
Let’s say you’re a Beatles fan and want to see if you can find some of their songs on the Internet without using Kazaa, etc. Try this query:
“index of” + “mp3” + “beatles” -html -htm -php
or you could try this query:
“index of/mp3” -playlist -html -lyrics beatles
Right away on the first few results returned by Google you can download MP3s.
Example #2: Mixing some techniques together
Here’s a simple exercise. We’ll mix around a few terms to get more accurate results. Let’s say we want to research sleep recommendations. One assumption could be that research papers on this topic would most likely be on an educational website — perhaps with a .edu domain. We could try this query:
sleep recommendations site:edu
Maybe you’re in my situation and are thinking of applying to grad school. Let’s see if we can find the Graduate Studies Admissions Requirements at the University of Toronto. We could try this query:
grad school admission requirements site:utoronto.ca
Summary:
After reading this article, you might be thinking “well, I could probably find those results without remembering these advanced search terms”. Well, the truth is that you probably could. The reason to start using these advanced search tips is that they help you find what you’re looking for faster. They greatly narrow down the results, and more often than not, the information you were looking for will be in the first two or three results.