Internet cached webpages: What are they and how can you deal with them?

During the course of my practice, I’m often asked to provide social media law advice, typically to organizations or individuals who want defamatory, proprietary or privacy-invasive content removed from the Internet. Once content has been successfully removed from a website, I often discuss issues related to Internet cached webpages with clients. So I thought it would be helpful to provide some general information on what Internet cached webpages are and how you can deal with them when you don’t want certain content to be available online.

What are Internet cached webpages? 

When search engines such as Google, Yahoo, and Bing crawl a website, they take a snapshot of what the site looks like at that time. This “snapshot” is known as a cached webpage and it’s stored by the search engine. The cached webpage is what the search engine then uses to determine whether the site matches a user’s specific query.

When a user clicks on the link to the cached webpage, they are taken to the version of the webpage that was online when the site was last crawled, not the current version. The cached webpage can be accessed even when the current version of the site is unavailable, whether because of Internet congestion, a slow website, or because the webpage has recently been removed.
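
To make the idea of a snapshot concrete, here is a toy sketch in Python. It is not how any real search engine works, and the names `ToySearchCache`, `crawl` and `cached_copy` are made up purely for illustration. It only shows that the stored copy stays frozen until the next crawl.

```python
class ToySearchCache:
    """A toy stand-in for a search engine's store of cached pages."""

    def __init__(self):
        # Maps a URL to the page text seen at the last crawl.
        self._snapshots = {}

    def crawl(self, url, live_pages):
        """Take a snapshot of whatever is online at the URL right now."""
        if url in live_pages:
            self._snapshots[url] = live_pages[url]
        else:
            # The page is gone, so this crawl drops the stale snapshot.
            self._snapshots.pop(url, None)

    def cached_copy(self, url):
        """Return the snapshot from the last crawl, or None if there is none."""
        return self._snapshots.get(url)


# The "live" web, modelled as a simple dictionary of URL -> page text.
live_pages = {"http://example.com/post": "Original text with a harmful statement."}

cache = ToySearchCache()
cache.crawl("http://example.com/post", live_pages)

# The site owner edits the page, but the search engine has not crawled again.
live_pages["http://example.com/post"] = "Corrected text."
stale = cache.cached_copy("http://example.com/post")

# After the next crawl, the cached copy catches up with the live page.
cache.crawl("http://example.com/post", live_pages)
fresh = cache.cached_copy("http://example.com/post")
```

In this sketch, `stale` still holds the original text even though the live page has been corrected, and `fresh` holds the corrected text only once the page has been crawled again.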

How can you deal with Internet cached webpages?

It is very difficult to remove a cached webpage from a search engine if you’re not the webmaster of the website. If you’re not the webmaster and you’d like a cached page removed from Google, Yahoo or Bing, you have three options:

  1. You can contact the site administrator and ask them to take the steps necessary to have the cached webpage removed from the search engine in question. Google typically responds to such requests from webmasters within 2-3 hours and Yahoo typically responds in 5 hours;
  2. You can seek a court order or other legal document to have certain content removed from the website/cached webpage; or
  3. You can just wait until the search engine crawls the site again and updates its cached webpages. This option can take the longest amount of time to see results since it might be 1-2 months before the site is crawled again.

Google offers an additional tool that lets a party who does not own the webpage in question ask to have a cached webpage removed. My understanding is that this process only works for HTML pages and will not work for PDFs or .doc files. The requirements and steps are:

  1. The webpage must have been updated since the cached version;
  2. Go to (http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1663691) and click on Google Public URL Removal Tool;
  3. Sign in to Google;
  4. Then it will prompt you for the URL you’d like removed;
  5. Click Continue;
  6. Type a word that appears on the out-of-date cached version of the page but not anywhere on the live version. It’s better to use single words and not phrases; and
  7. Click Remove cache.
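
Step 6 can be fiddly to do by eye. As a rough aid, the Python sketch below lists single words that appear in the cached text but nowhere in the live text. It assumes you have already copied the visible text of both versions into two strings. The function name `words_only_in_cached` and the sample text are hypothetical, and Google's tool may treat words differently than this simple comparison does.

```python
import re


def words_only_in_cached(cached_text, live_text):
    """Return sorted single words found in the cached text but not in the live text."""
    def to_words(text):
        # Lowercase the text and treat runs of letters, digits and apostrophes as words.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    return sorted(to_words(cached_text) - to_words(live_text))


# Hypothetical sample text standing in for the two versions of a page.
cached_text = "Our manager John Smith was dismissed for misconduct."
live_text = "Our manager has left the company."

candidates = words_only_in_cached(cached_text, live_text)
print(candidates)
```

Any word in the printed list is a candidate to type into the tool. Distinctive words (a name, for example) are probably a safer choice than common ones.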

Given the technical nature of this topic, and the fact that I’m not a techie, I’d recommend that you work with a technology professional and not rely on this post. That being said, I hope this helps and at least provides you with some general information to get you started.
