
Deep Searching and Indexing

What is Deep Searching?
Deep searching revolves around data that can’t easily be found through traditional search engines. Content such as technical papers, scientific research, and private databases makes up what is called the Deep Web. Deep searching generally entails connecting to these otherwise inaccessible databases in real time and serving up relevant results.

Take, for example, a website found with the popular and competitive search term “same day payday loans.” The website itself is easily accessible, but its database of payday lenders will never be found by the search engine, because the bot simply can’t access it.

The Difference Between the Surface Web and The Deep Web
Unlike the Surface Web, which is composed of all of the content we typically see on a day-to-day basis, the Deep Web can’t be accessed by search engines’ bots or crawlers. Search engines like Google and Yahoo use bots to copy the data from webpages and create a searchable index for users. When content is restricted to authorized users, or isn’t visible to the bot, it will never be found on these search engines.
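To make that copy-and-index step concrete, here is a minimal sketch of what a crawler does, written in Python with the widely used requests and BeautifulSoup libraries; the start URL and the in-memory index are illustrative assumptions rather than how any particular search engine works.

```python
# Minimal sketch of the "copy and index" step a search-engine bot performs.
# Assumptions: requests and beautifulsoup4 are installed, and the start URL
# is purely illustrative. Anything behind a login or generated on demand
# never reaches this loop, which is exactly why it stays out of the index.
from collections import defaultdict

import requests
from bs4 import BeautifulSoup

def crawl_and_index(start_urls):
    """Fetch publicly reachable pages and build a word -> URLs index."""
    index = defaultdict(set)
    for url in start_urls:
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
        except requests.RequestException:
            continue  # pages the bot cannot reach simply never get indexed
        text = BeautifulSoup(response.text, "html.parser").get_text()
        for word in text.lower().split():
            index[word].add(url)
    return index

if __name__ == "__main__":
    idx = crawl_and_index(["https://example.com/"])
    print(sorted(idx.get("example", [])))
```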

How Big Is the Deep Web?
The Deep Web represents a treasure trove of information that can’t readily be found online. Webpages and other forms of online content that aren’t accessible through traditional search engines are estimated to be as much as 500,000 times larger than the web we typically browse. In some cases, the material that comprises parts of the Deep Web is not even accessible through common web browsers. Instead, the computer must have specific software installed to connect to the networks that house these Deep Web databases.

Private Databases
Private databases are data stores or information centers that can only be accessed by authorized users. Many social networking sites, government databases, and subscription-based media outlets have large private databases that are open only to registered users or subscribers.

The Office of Scientific and Technical Information, or OSTI, created a real-time, multi-database, deep web search engine at science.gov, which can query a multitude of government data stores at once.
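The same federated, real-time approach can be sketched in a few lines: fan one query out to several back-end databases at once and merge whatever comes back. The endpoint URLs and the JSON response shape below are assumptions made for illustration and are not science.gov’s actual interface.

```python
# Sketch of a federated, real-time "deep" search: one query fanned out to
# several databases at once, with results merged as they arrive. The backend
# URLs and their JSON response shape are assumptions for illustration only.
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests

BACKENDS = [
    "https://db-one.example/search",
    "https://db-two.example/search",
]

def query_backend(url, term):
    response = requests.get(url, params={"q": term}, timeout=10)
    response.raise_for_status()
    return response.json().get("results", [])  # assumed response shape

def federated_search(term):
    merged = []
    with ThreadPoolExecutor(max_workers=len(BACKENDS)) as pool:
        futures = {pool.submit(query_backend, url, term): url for url in BACKENDS}
        for future in as_completed(futures):
            try:
                merged.extend(future.result())
            except requests.RequestException:
                pass  # an unreachable backend simply contributes nothing
    return merged

if __name__ == "__main__":
    print(federated_search("solar energy"))
```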

Most social networks have developed their own internal search engines for registered users, alongside sets of APIs, which allow limited third-party querying of their data.
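As a rough illustration of that kind of limited third-party querying, the sketch below sends an authenticated search request to a hypothetical social network API; the endpoint, parameters, and token are placeholders, since every real platform defines its own endpoints, scopes, and rate limits.

```python
# Sketch of limited third-party querying through a social network's API.
# The endpoint, parameters, and token are hypothetical placeholders; real
# platforms each define their own endpoints, scopes, and rate limits.
import requests

API_URL = "https://api.social-network.example/v1/search"  # hypothetical
ACCESS_TOKEN = "YOUR_TOKEN_HERE"  # issued to a registered application

def search_posts(term, limit=10):
    response = requests.get(
        API_URL,
        params={"q": term, "limit": limit},
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()  # assumed JSON payload of matching posts
```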

Media outlets have increasingly begun deploying what are referred to as paywalls, which allow public access to a portion of their content while still requiring a subscription for full access.
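A metered paywall of this sort boils down to a very small check, sketched below; the free-article quota and the subscriber lookup are stand-ins for whatever session store and billing system a real outlet would use.

```python
# Sketch of a metered paywall: a handful of free article views per month,
# then a subscription check. The quota, the user record, and the view
# counter are hypothetical stand-ins for a real session store and billing
# system.
FREE_ARTICLES_PER_MONTH = 5  # assumed quota

def can_read_article(user, views_this_month):
    """Return True if the visitor may see the full article."""
    if user.get("is_subscriber"):
        return True
    return views_this_month < FREE_ARTICLES_PER_MONTH

if __name__ == "__main__":
    print(can_read_article({"is_subscriber": False}, views_this_month=3))  # True
    print(can_read_article({"is_subscriber": False}, views_this_month=7))  # False
```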

Dynamic Applications
Dynamically generated content has become increasingly common on the Internet, and mainstream search engines are struggling to catch up with this trend. When content is served dynamically, or on the fly, search engines that rely on a locally stored copy of the page’s data are unable to determine what might be displayed on that page at another point in time. As such, it becomes very difficult for these search engines to recognize and recommend relevant pages from these websites.
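A tiny example shows why cached copies go stale: the hypothetical page below is assembled at request time, so the HTML a crawler stored yesterday says nothing about what a visitor sees today. Flask is used here only for brevity, and the inventory lookup is a stand-in for a real database query.

```python
# Sketch of dynamically generated content: the page is assembled at request
# time, so a copy stored earlier by a crawler quickly stops matching what
# visitors actually see. Flask is used only for brevity; the deals lookup is
# a stand-in for a real database query.
from datetime import datetime, timezone

from flask import Flask

app = Flask(__name__)

def current_deals():
    # Stand-in for a query whose answer changes from one request to the next.
    return ["deal generated at " + datetime.now(timezone.utc).isoformat()]

@app.route("/deals")
def deals():
    items = "".join(f"<li>{deal}</li>" for deal in current_deals())
    return f"<ul>{items}</ul>"

if __name__ == "__main__":
    app.run()
```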

Alternate Protocols
The most significant hindrance to the old-fashioned search engine is the rise of alternate internet protocols. Most of our web activity takes place over HTTP or HTTPS, the primary protocols of the mainstream web. In recent years, however, a variety of alternate protocols have taken hold. Some have been designed for anonymity or to circumvent censorship, while others have been around since before the web as we know it and are being used by a growing number of underground web users.

Some of the most common alternate protocols include:

  • Tor (The Onion Router): The Tor Project uses a network of peers, consisting of over four thousand relays, to obscure a user’s identity and location. Tor provides access to unique .onion addresses through its free, distributed, open-source software. An NSA report leaked in 2013 referred to Tor as the king of secure, low-latency anonymity software. (A minimal sketch of routing a request through Tor appears after this list.)
  • Freenet is a peer-to-peer protocol designed to circumvent Internet censorship. Freenet stores encrypted pieces of content on users’ computers and retrieves that content through intermediate peers.
  • Usenet has been around since 1980 and is used to post and view content on a worldwide, distributed network of servers.
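As a concrete illustration of the first item above, the sketch below routes an ordinary web request through a locally running Tor client’s SOCKS proxy. It assumes Tor is installed and listening on its default port (9050) and that the requests library was installed with SOCKS support (pip install "requests[socks]").

```python
# Sketch: fetching a page through a locally running Tor client. Assumes the
# Tor daemon is listening on its default SOCKS port (9050) and that requests
# was installed with SOCKS support (pip install "requests[socks]").
import requests

TOR_PROXY = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

def fetch_over_tor(url):
    # socks5h asks the proxy to resolve hostnames, which is what makes
    # .onion addresses reachable at all.
    response = requests.get(url, proxies=TOR_PROXY, timeout=30)
    response.raise_for_status()
    return response.text

if __name__ == "__main__":
    # check.torproject.org reports whether the request arrived via Tor.
    print(fetch_over_tor("https://check.torproject.org/")[:200])
```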
