What Is the Dark Web?


The World Wide Web is vaster than you might think, but most of us see only its surface.

In fact, when someone says that Google or other search engines know about everything on the internet, they're simply mistaken. According to statistics from Worldwidewebsize.com, search engines had indexed at least 4.57 billion pages as of the start of November 2017. The part of the internet indexed by Google and other search engines is known as the "Visible Web" or "Surface Web." Despite its enormous size in human terms, the Surface Web is merely a fraction of the entire web: less than 10 percent.

Three segments of various scale represent the internet as we know it, with the Dark Web being a subset of the Deep Web

In this article, we will dive deeper and look below the “ocean surface,” where the remaining 90 percent of information lives. This includes the Deep Web and its smaller chunk, the Dark Web. We’ll discuss the aspects of their operation and usage. Additionally, we’ll outline the differences between the two, since most people and even some media outlets mistakenly use them interchangeably.     

The Deep Web: How Is It Different From the Surface Web?

The main characteristic of the Surface Web is its visibility to search engine crawlers: bots programmed to visit a set of pages, detect the links on those pages, and add them to a queue for further crawling. Crawlers may be directed to download some pages' contents or to revisit some of the links periodically.
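The crawling loop described above can be sketched in a few lines. This is a minimal illustration, not a production crawler: the `fetch` function is injected so the sketch needs no network access, and the names are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href target of every <a> tag it sees."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_page, fetch, limit=100):
    """Breadth-first crawl: visit a page, queue its links, repeat.
    `fetch(url)` returns the page's HTML."""
    queue, seen = deque([start_page]), set()
    while queue and len(seen) < limit:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        parser = LinkExtractor()
        parser.feed(fetch(url))
        queue.extend(parser.links)
    return seen
```

Feeding it a small fake "site" (a dict of page contents) shows how the queue discovers pages reachable only through links, which is exactly what unlinked or access-controlled Deep Web pages escape.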


Search engines process all this information to index the web, which makes searching extremely fast and accurate. Most of the sites you visit daily are indexed by search engines and are part of the Surface Web: blogs, news sites, Wikipedia, e-commerce sites like Amazon, etc.

With the Deep Web, things are different.

The Deep Web is effectively walled off from indexation, but it doesn’t require any special skills to access. Most of us spend a considerable amount of time on the Deep Web, albeit unknowingly. A few examples include:

  • Sites requiring some form of authentication, like email and cloud service accounts, banking sites, and even subscription-based online media restricted by paywalls or ad block walls
  • Companies’ internal networks and various databases
  • Education and certain government-related pages  
  • Dynamic content, where what you see is based on a submission in a website’s search box or a form (Crawlers can’t do these things.)
  • Content formats “unreadable” by crawlers
  • Content intentionally restricted from crawlers in a page’s configuration (robots.txt) (Incidentally, a page that has no inbound links also becomes part of the invisible web.)
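The robots.txt restriction mentioned in the last bullet is a plain-text convention that well-behaved crawlers check before fetching a page. A minimal sketch, using Python's standard `urllib.robotparser` and a hypothetical policy for an example.com site:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that hides /private/ from all crawlers
robots_txt = """\
User-agent: *
Disallow: /private/
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("*", "https://example.com/index.html"))  # → True
print(rp.can_fetch("*", "https://example.com/private/report.pdf"))  # → False
```

Note that robots.txt is purely advisory: it keeps content out of search indexes, but it is not an access control and does nothing to stop a visitor (or a misbehaving bot) who requests the URL directly.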

Even though there have been attempts to build automated crawlers that access and index portions of Deep Web content, there hasn’t been any notable breakthrough.

Attempts at Deep Web Crawling Research

Among possible solutions for Deep Web crawling and indexing, a method proposed and validated by researchers from Shandong University may help to solve the issue and grant access to the immense data hidden behind search interfaces. Their publication, “An Approach to Incremental Deep Web Crawling Based on Incremental Harvest Model,” focuses on a method for maintaining an accurate, organized local copy of crawled databases and file systems from the Deep Web while keeping the process cost-effective.

In order to crawl a Deep Web database, a crawler needs to issue appropriate queries to obtain incremental database records, since full crawling would be costly in terms of time and resources. The proposed “incremental crawler” tackles the problem by employing machine learning to determine appropriate queries to the database, so that the crawler issues selective queries that maximize consistency between the Deep Web database and its local copy. In this manner, incremental crawling also handles the three kinds of records the researchers identify: new records, deleted records, and updated records. The researchers ran a set of tests on three Deep Web databases, demonstrating a reduction in costs without loss of incremental coverage.
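The intuition behind selective querying can be illustrated with a toy greedy sketch. This is not the paper's actual model (which learns to estimate query yields); here each candidate query's result set is simply given, and the names are hypothetical:

```python
def pick_queries(local, candidate_queries, budget):
    """Greedy sketch of incremental crawling: repeatedly issue the
    query that would add the most records missing from the local copy,
    until the query budget runs out or nothing new is gained.

    local:             set of record ids already held locally
    candidate_queries: dict mapping query name -> set of record ids
                       that query would return
    budget:            maximum number of queries to issue
    """
    local = set(local)
    issued = []
    for _ in range(budget):
        best = max(candidate_queries,
                   key=lambda q: len(candidate_queries[q] - local),
                   default=None)
        if best is None or not (candidate_queries[best] - local):
            break  # nothing left to gain; stop spending budget
        local |= candidate_queries[best]
        issued.append(best)
    return issued, local
```

The point of the greedy choice is cost control: instead of re-crawling the whole database, each query is picked for the new or changed records it is expected to surface.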

However, parts of the information that belong to the Deep Web may be picked up by search engines in the case of a data breach or targeted attack. In February 2017, Cloudflare, a US-based content delivery platform, announced that some of their clients’ data, such as passwords and cookies, had been leaked in plaintext due to a software bug. Search engines worsened the situation by caching the leaked data. Google researcher Tavis Ormandy, who discovered the bug, commented:

“Data was leaked accidentally over the last six months by crawlers, and regular users downloading files and visiting websites. That data could contain passwords, cookies, private data, etc. We don’t know what’s out there, private messages, passwords, credit card details.”

The lesson is that no matter how “deep” the webpage is, security measures still apply.

Overall, the Deep Web has grown along with the expansion of the internet and has a utilitarian purpose at its core. Interestingly, there are all kinds of fascinating and helpful resources that are not indexed by search engines and are part of the invisible web! This list includes some of the most useful Deep Web resources.

One level below the Deep Web is the Dark Web—an area of the internet that has made headlines and is primarily associated with illegal activity and crime. It differs from the layers we have already discussed both regarding architecture and use cases.

The Dark Web: The Underbelly of the Internet

Usage of the terms “Dark Web” and “darknet” dates back as early as the 1970s, when they described networks separate from ARPANET, the predecessor of the modern internet.

In current terms, the Dark Web is a layer of information and pages that runs on overlay networks, which operate on top of the internet and obscure access.

The Dark Web is the smallest, most secluded part of the internet—hardly visible to outsiders. Here’s why:

  • Most pages are anonymously hosted.
  • There is a high level of encryption.
  • Access requires special, correctly configured software.

The New York Times was the first major digital media outlet to open its website on the Dark Web. You can try visiting it here, with caveats: accessing it from your usual browser won’t get you anywhere, because you need special tools to reach that page. This brings us to one of the leading traits of the Dark Web: it has far more prerequisites for access than a typical website or even a Deep Web page.

Software Used for Accessing the Dark Web

There are several tools for reaching these parts of the internet. The Tor Project (The Onion Router) maintains the most popular one: the Tor browser, which provides a relatively secure connection for those seeking anonymity.

How Tor works, high-level scheme. Source: Electronic Frontier Foundation

On the Tor network, internet traffic is directed through a network of random relays. The browser builds a route of encrypted connections one by one. Each relay knows only the previous and the next relay, so the full connection route stays untraceable. The Tor browser uses new encryption keys for each hop along the way, so the source and destination of the data stay unknown even if someone attempts to intercept it. These multiple layers of encryption resemble the structure of an onion, and now you understand why the domain name for the Dark Web New York Times page ends in .onion. That page is accessible only through the Tor browser. This approach provides a certain level of anonymity, though there may be exceptions (misconfiguration, vulnerabilities, etc.). Other platforms that allow users to browse the Dark Web include I2P (the Invisible Internet Project) and Freenet.
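The layered “onion” encryption described above can be sketched with a toy cipher. To be clear, this is an illustration of the layering idea only: a real Tor client negotiates fresh keys with each relay and uses proper authenticated encryption, while the hash-based keystream below is deliberately simple and not secure.

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Deterministic toy keystream derived from a key (illustrative,
    NOT cryptographically secure)."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def xor_layer(data: bytes, key: bytes) -> bytes:
    """Applying the same key twice removes the layer (XOR is symmetric)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def wrap(message: bytes, relay_keys):
    """Client side: add the exit relay's layer first and the entry
    relay's layer last, so the entry layer is outermost."""
    for key in reversed(relay_keys):
        message = xor_layer(message, key)
    return message

def unwrap(cell: bytes, relay_keys):
    """Each relay along the path peels exactly one layer; only after
    the last relay does the plaintext emerge."""
    for key in relay_keys:
        cell = xor_layer(cell, key)
    return cell
```

Because each relay holds only its own key, a single relay peeling its layer still sees ciphertext; the plaintext appears only once every layer has been removed, mirroring why no single relay can link the sender to the content.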

Tor Project logo

However, it isn’t just the Dark Web’s technical implementation that sparks controversy in society and in the media around the world.

Why Is the Dark Web Considered a Safe Haven for Criminal Activity?

Greater anonymity invites the worst of human vice, and for criminals wanting to capitalize on it, the Dark Web is a perfect fit. The rise of newer payment methods, like the Bitcoin cryptocurrency, which allows incognito payments, has also contributed to illegal trade.

According to the article “Cryptopolitik and the Darknet,” published by Thomas Rid and Daniel Moore (King’s College London), of 2,723 active sites found on the Tor Dark Web during several weeks, 1,547, or 56.8 percent, contained illicit material of some kind.

Categories of illegal Dark Web sites according to the research “Cryptopolitik and the Darknet” by Thomas Rid and Daniel Moore (King’s College London)

In other words, cybercriminal content, selling everything from compromised personal and financial data to drugs and hacking tools, constitutes over half of the Dark Web. The article was published in February 2016, but the history of online black marketplaces dates back to 2011, when Ross William Ulbricht launched the infamous Silk Road, notable for the scale and assortment of drugs sold there. Despite Ulbricht’s arrest in 2013 and the original site being taken down, other Dark Web marketplaces emerged shortly after.

Screenshot of the notable Silk Road Dark Web marketplace

In July 2017, it was announced that two of the largest Dark Web black markets, AlphaBay and Hansa, had been shut down as a result of cooperation between American and European authorities. After AlphaBay’s operations ceased, many buyers and vendors moved to Hansa, not knowing that Dutch police had secretly taken it over in June. Despite these crackdowns, law enforcement agencies acknowledge that closed Dark Web markets are rapidly replaced by their competitors.

It isn’t just the illegal trade of substances and stolen sensitive data that makes the Dark Web a core component of the cybercrime ecosystem. Adversaries also leverage features of the Tor and I2P networks to sustain the operations of malware ranging from botnets to banking trojans.

In the course of an attack, cybercriminals deploying malware are occupied with several tasks:

  • Delivering malicious payloads and infecting victims by exploiting system vulnerabilities
  • Enabling malware propagation across networks and devices
  • Extracting and transferring sensitive data from victims and more

All of these tasks need to be done while staying undetected and untraceable by security researchers and law enforcement.

In order to control and orchestrate malware operations, criminals rely on a command and control (C&C) server, a vital part of malware campaign infrastructure. Once the malicious program initializes on the affected computer and secures its persistence, it signals to the C&C server that it is ready to receive instructions. Adversaries often use several servers of which the malware is “aware”: it will try to connect to at least one of the domains that resolve to such a server’s IP address.

Researchers and law enforcement use these same mechanics to gain access to the malware. In this scenario, researchers identify requests to the attackers’ C&C servers and spoof the connection, effectively redirecting the traffic for analysis instead of to the C&C domain or host. Law enforcement agencies work with hosting providers and other organizations to alter the DNS records of identified malicious domains in order to disrupt the malicious connections. This technique is known as “sinkholing.”
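The beacon-and-sinkhole interplay can be shown in a few lines of benign pseudologic. This is a sketch under stated assumptions: the domains and IPs are made up, and DNS resolution is modeled as an injected dictionary rather than real network lookups.

```python
def beacon(fallback_domains, dns):
    """Malware-style check-in sketch: try each hard-coded domain in
    order and 'connect' to the first one that resolves.
    `dns` is a name -> IP table standing in for real DNS resolution."""
    for domain in fallback_domains:
        ip = dns.get(domain)
        if ip is not None:
            return domain, ip
    return None  # no C&C reachable

# Hypothetical fallback domains baked into a sample
domains = ["cc1.example.net", "cc2.example.net"]
dns = {"cc1.example.net": "203.0.113.10", "cc2.example.net": "203.0.113.11"}
print(beacon(domains, dns))  # → ('cc1.example.net', '203.0.113.10')

# Sinkholing: defenders re-point the DNS record at their own server,
# so the very same beacon now reports to the analysts instead.
dns["cc1.example.net"] = "198.51.100.1"  # sinkhole IP
print(beacon(domains, dns))  # → ('cc1.example.net', '198.51.100.1')
```

Because the infected machine only knows domain names, whoever controls the DNS answer controls where the traffic lands, which is precisely what makes sinkholing effective against Clear Web C&C infrastructure.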

To keep their operations going, some threat actors have resorted to deploying their control infrastructure on the Dark Web using the capabilities of the Tor network. This configuration allows attackers to run encrypted communication between the malware and a C&C server hosted as a Tor hidden service, making it harder to detect with security tools. The C&C servers are accessible only via Tor, and their real identifiers (IP addresses) aren’t exposed. When criminals create a hidden service, a random domain ending in .onion is generated. This configuration is a major obstacle to sinkholing measures and takedowns.
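Why can’t such a domain simply be re-pointed? In the legacy (v2) hidden service scheme, the .onion name is not registered anywhere: it is derived from the service’s own public key, as the base32 encoding of the first 80 bits of a SHA-1 hash of the key. A sketch of that derivation (current v3 addresses use a different, ed25519-based construction, and the random bytes below merely stand in for a real DER-encoded key):

```python
import base64
import hashlib
import os

def v2_onion_address(der_public_key: bytes) -> str:
    """Legacy v2-style onion name: base32 of the first 10 bytes
    (80 bits) of SHA-1 over the service's DER-encoded public key."""
    digest = hashlib.sha1(der_public_key).digest()
    name = base64.b32encode(digest[:10]).decode("ascii").lower()
    return name + ".onion"

# Random stand-in for a real key; yields 16 base32 chars + '.onion'
addr = v2_onion_address(os.urandom(140))
print(addr)
```

Because only the holder of the matching private key can answer for that name inside the Tor network, there is no DNS record to alter and thus nothing for a sinkhole operation to seize short of taking over the server itself.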

A Fortinet researcher describes a Vawtrak banking Trojan that first connects to a Tor2Web proxy server, which serves as a bridge for establishing a connection to Tor. From there, a connection is made to the Dark-Web-based hidden service that controls Vawtrak. This scheme is not without weaknesses and may be spotted with certain traffic-inspection tools, but those tools are not part of the average user’s arsenal.

It is worth noting that infection with malware that runs operations on the Dark Web may occur elsewhere, including compromised Clear Web and Deep Web sites.

While this cat-and-mouse game gives little hope for a decisive cleanup of the Dark Web, it’s not all about criminal activity. The Dark Web and Tor are nothing but technologies, neutral by design. There are use cases where tools like Tor are genuinely helpful for maintaining anonymity under possible surveillance, for example for political activists, reporters, and human rights defenders. The Tor Project does a great job of outlining the benefits of the Dark Web to the general population.

Have you ever used the Dark Web? Given the potential security benefits, would you ever try it?