MirrorWeb Modifies Crawling Process to Reduce Carbon Footprint

Sean Stapleton

Leading communications surveillance platform, MirrorWeb, has announced wholesale changes to its web crawling technology in order to become more energy efficient.

MirrorWeb operates Amazon Web Services (AWS) accounts in London, Ohio, Virginia and Frankfurt, which are used according to its clients’ preferences. Over any given 24-hour period, each of these accounts runs thousands of web crawls, a vital element of the digital archiving service that MirrorWeb provides.

In recent months, the company has been moving from Intel-based crawl servers to ARM-based (Advanced RISC Machine) crawl servers. ARM processors were originally developed by Acorn Computers, and later by a joint venture that included Apple. They provide a low-power, energy-efficient alternative to their Intel counterparts.

Graviton, AWS’s own ARM-based processor family, uses up to 60% less energy for the same performance than comparable EC2 instances, such as those running on Intel processors. Thanks to the performance gains achieved with Graviton, MirrorWeb was able to halve the size of its crawl servers.

Additionally, MirrorWeb is introducing the practice of ‘upload on rotation’. Traditionally, the archival data for each crawl was stored on the crawl server for the duration of the crawl. Because it was unclear how large the crawl would end up being, the storage capacity had to expand as the crawl progressed, with additional space repeatedly requested from Amazon. At the end of the crawl, uploading the data to the cloud would take some time, depending on the size of the crawl.

Under the new ‘upload on rotation’ process, every time an archive file is completed, a new file is created and the previous file is uploaded right away. This avoids the energy spent on repeatedly growing the storage and on the long upload period at the end of the crawl, further increasing energy efficiency.
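For readers curious how this works in practice, the ‘upload on rotation’ idea can be sketched roughly as follows. This is a minimal illustration, not MirrorWeb's actual implementation: the class name, the file naming scheme, the size-based rotation threshold and the `upload` callback are all assumptions made for the example. In a real crawler the callback would send the file to cloud storage.

```python
import os
import tempfile
from typing import Callable, List, Tuple


class RotatingArchiveWriter:
    """Hypothetical writer: uploads each archive file as soon as it is complete."""

    def __init__(self, directory: str, max_bytes: int,
                 upload: Callable[[str], None]) -> None:
        self.directory = directory
        self.max_bytes = max_bytes      # assumed rotation rule: rotate by size
        self.upload = upload            # assumed callback, e.g. a cloud upload
        self.index = 0
        self.current = open(self._path(), "wb")

    def _path(self) -> str:
        # Illustrative naming scheme only.
        return os.path.join(self.directory, f"archive-{self.index:05d}.bin")

    def _rotate(self) -> None:
        finished = self.current.name
        self.current.close()
        self.index += 1
        self.current = open(self._path(), "wb")  # start the next file first
        self.upload(finished)                    # then upload the finished one
        os.remove(finished)                      # local disk use stays bounded

    def write(self, record: bytes) -> None:
        self.current.write(record)
        if self.current.tell() >= self.max_bytes:
            self._rotate()

    def close(self) -> None:
        # Upload whatever is left in the last, partially filled file.
        finished = self.current.name
        self.current.close()
        if os.path.getsize(finished) > 0:
            self.upload(finished)
        os.remove(finished)


def demo() -> Tuple[List[str], List[str]]:
    """Simulate a small crawl; the 'upload' only records file names."""
    uploaded: List[str] = []
    with tempfile.TemporaryDirectory() as tmp:
        writer = RotatingArchiveWriter(
            tmp, max_bytes=10,
            upload=lambda path: uploaded.append(os.path.basename(path)),
        )
        for _ in range(5):
            writer.write(b"12345")
        writer.close()
        leftover = os.listdir(tmp)
    return uploaded, leftover
```

Because each completed file leaves the server immediately, local storage never needs to hold more than roughly one file's worth of data, and only the final partial file remains to be uploaded when the crawl ends.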

Philip Clegg, Chief Technology Officer of MirrorWeb, said: “The changes that we’ve made have been on the agenda for a while now, and we’re very happy to make the transition over to ARM processors. The performance benefits are remarkable, and we can use up to 60% less energy to get the same results. From an environmental perspective, it’s a no-brainer.

“Further tweaks to our crawling process should complement that perfectly. ‘Upload on rotation’ saves energy on every one of our crawls. It shows our commitment to honing our processes while embracing our responsibilities.”

For more information about MirrorWeb web archiving, visit https://www.mirrorweb.com/solutions/capabilities/website-archiving
