Introducing Malwatch, a new malware scanning system created at Pagely.

Malware is the bane of most site owners’ online journey. New threats emerge daily and a successful attack can bring your operation down by defacement, tanking your SEO, or harvesting your private data. Pagely has created a new malware file scanning tool under our PressARMOR™ security framework that is not only more capable than existing tools, but also faster.

Malwatch Highlights

  • 15% Better Detection
  • 25% Faster Scan Time
  • 14% Less CPU Usage
  • Real Time Monitoring and API Reporting Engine
  • Already deployed across the entire Pagely fleet.

Existing malware scanning tools used across most web hosts are proven and effective – having many years of development and market use. While we were satisfied with the existing tools, we asked the simple question – how can we make it better?

A typical one-off website owner may not be as concerned with the performance of a scanning tool as long as it works. As a web hosting service provider, Pagely requires a malware scanning solution that uses resources efficiently, can be easily deployed and maintained platform-wide, and integrates closely with our other tools and platform.

Most existing malware scanning solutions today do not place high emphasis on threat intelligence – they do a decent job of reporting a hit or quarantined file, yet the analytics and reporting leave much to be desired. Basic log files and email reports do not solve our complex needs and more robust reporting capabilities were necessary. Most existing solutions are heavily based on technologies such as ClamAV but our needs were for an in house solution that can be easily maintained and include the latest technology, while still offering the same or better feature set.

Roscoe Skeens, a member of our Info-Sec team, took this question of scanning performance and threat intelligence to heart and began working on an answer last year.

Introducing Malwatch

Malwatch is a fully featured malware scanner designed for Linux based general purpose web server environments. It can scan both files and processes together with the inclusion of real time scanning capabilities.

Malwatch is built upon the incredible Yara library, which is the same technology used by XProtect that comes included for macOS.

Development started in January 2020 and within a few weeks we had a basic scan engine implementation working. It became clear by March that the concept was achievable and additional core requirements such as automated malware cleanup and an embedded database began to materialize.

Real time file monitoring was added during the April sprints, which concluded the minimum core set of functions that were originally scoped.

During testing we used large sets of malware samples to compare accuracy and Malwatch started to show higher detection rates than other solutions towards later builds, even though the same signature bases were used. We completed tens of thousands of test runs. See benchmarks below.

Deployment to production environments was a very well planned and controlled process. Individual AWS regions were chosen, each with a week long trial run alongside Pagely’s previous malware scanning setup. Over the course of Q4 2020 Malwatch was deployed across the entire Pagely fleet.

Only a couple of bugs were identified post-rollout and each was fixed on the same day. Neither caused any serious problems, and each could only be reproduced roughly once every thousand scan runs.

Malwatch features an API driven reporting engine offering JSON based compatibility across our platform and tooling. In an upcoming release, customers will be able to see informative scanning reports and findings within our Atomic hosting dashboard, while our information security team can monitor trends, scope, and active threats in a purpose built threat intelligence dashboard.
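To give a feel for what a JSON based scan report might look like, here is a minimal sketch in Go. The `Report` and `Finding` types, their field names, and the sample values are all assumptions made for illustration; this post does not describe Malwatch's actual schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Finding is a hypothetical record for one suspected match.
type Finding struct {
	Path      string `json:"path"`
	Signature string `json:"signature"`
	Action    string `json:"action"` // e.g. "reported", "quarantined", "cleaned"
}

// Report is a hypothetical summary of one scan run.
type Report struct {
	Host         string    `json:"host"`
	StartedAt    time.Time `json:"started_at"`
	FilesScanned int       `json:"files_scanned"`
	Findings     []Finding `json:"findings"`
}

// Encode serializes a report as indented JSON.
func Encode(r Report) (string, error) {
	b, err := json.MarshalIndent(r, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// Decode parses JSON produced by Encode back into a Report.
func Decode(s string) (Report, error) {
	var r Report
	err := json.Unmarshal([]byte(s), &r)
	return r, err
}

func main() {
	// Illustrative values only, not real scan data.
	r := Report{
		Host:         "example-host",
		StartedAt:    time.Now().UTC(),
		FilesScanned: 3,
		Findings: []Finding{
			{Path: "/var/www/example/wp-content/x.php", Signature: "example-rule", Action: "quarantined"},
		},
	}
	out, err := Encode(r)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

A structured payload like this is easier for dashboards and other tooling to consume than log files or email reports, since every consumer can parse the same fields.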

What is Malware?

We’ve all heard about malware or been affected by it. Malware is any software intentionally designed to steal information, gain access to, or cause damage to a system. Computer viruses, worms, Trojan horses, ransomware, spyware, Facebook, and adware are all forms of malware.

Malware typically infects a WordPress site in one of two ways. The first is through a code vulnerability in a plugin or theme that has been exploited for nefarious means. The second is through a hacker (or an automated program written by the hacker) using a stolen or brute forced password to gain access to a WordPress user account. With that access they typically upload other malware to perform some nefarious activity. Keeping your code updated and your passwords strong and rotated are the best lines of defense a user can take.

What is Malware scanning?

Tools like Malwatch, Maldet, and others are very much like the antivirus software running on your PC – except they tend to be focused on systems like web/cloud/file servers. They scan files or running processes and compare the contents to a list of known malware signatures (think fingerprints). When a suspected match is found the tools can be configured to send a notification, quarantine the file or process, and in some cases automatically clean/repair the file or process.
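The signature comparison described above can be sketched in a few lines of Go. This is a deliberately naive illustration that checks each signature with a plain byte search; it is not how Malwatch or Yara work internally, and the signature names and patterns are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// Signature pairs a human-readable name with a byte pattern to look for.
// Real scanners use far richer rules (Yara rules, hashes, regular expressions).
type Signature struct {
	Name    string
	Pattern []byte
}

// Scan returns the names of all signatures whose pattern occurs in content.
func Scan(content []byte, sigs []Signature) []string {
	var hits []string
	for _, s := range sigs {
		if bytes.Contains(content, s.Pattern) {
			hits = append(hits, s.Name)
		}
	}
	return hits
}

func main() {
	// Hypothetical signatures, for illustration only.
	sigs := []Signature{
		{Name: "php-eval-base64", Pattern: []byte("eval(base64_decode(")},
		{Name: "php-shell-exec", Pattern: []byte("shell_exec($_GET")},
	}
	file := []byte("<?php eval(base64_decode($payload)); ?>")
	for _, name := range Scan(file, sigs) {
		fmt.Println("suspected match:", name)
	}
}
```

Once a match is reported, the surrounding tool decides what to do with it: notify, quarantine, or attempt a cleanup.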

Why do hosts like Pagely invest in and run Malware scanning?

The most basic reason is that malware costs everyone time and money. Malware infections may not only cause harm to the site (and the site owner’s business) of the initial infection – but may also affect the underlying systems of the host by slowing down servers, infecting other sites, or harvesting data. In a nutshell, it is in everyone’s interest to mitigate the risk of malware, which is why malware scanning is typically a value add included with many hosting plans, as it is at Pagely.

Scanning Benchmarks during testing

We started this project with a simple question – how can we make our malware scanning platform better? That question set Roscoe down a path that ultimately yielded a brand new tool altogether. It was important for us to make sure the new Malwatch tool performed as well as or better than the many capable and successful malware scanning tools in use today.

Benchmarks have been conducted on idle Amazon Web Services C4 instances. Each benchmark category’s result is the average from three runs. Malwatch was tested first in each category.

The benchmarks comprise Malwatch, ClamAV, and a web hosting related ClamAV frontend. Most established Linux based web hosts do not use ClamAV in isolation but rather a web hosting specific frontend geared for the industry. The standalone ClamAV installation was configured to be as optimal as possible to deliver the best possible results.

Throttling tools such as “nice”, “ionice” and “cpulimit” had all been disabled for the benchmarks. Other industry specific solutions such as Linux Malware Detect (Maldet) were also tested but with similar or worse results.

Detection Rate Benchmark

A third party malware zoo was used to compare against a large assortment of known and unknown malware samples. The same signature base was loaded for all three series of tests and the file count was also verified.

Malwatch scanning detection rate
One possible reason for the lower than expected ClamAV result is the Yara signature set not being fully compatible, although a fair number of Yara rules were being matched.

Scan Duration Benchmark

malwatch scan duration benchmark
25% faster scanning time for 35,000 files.

Process CPU Usage Benchmark

malwatch CPU process benchmark
14% savings in CPU usage with Malwatch

Process RSS Usage Benchmark

malwatch process rss graph
Roughly the same memory footprint

Roscoe’s thoughts on his choice of using Yara and Go

The Go programming language offers extremely stable and well structured support for concurrency, which delivers orders of magnitude better performance when shifted from single threaded operation.
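As a rough illustration of the kind of concurrency Go makes easy, the sketch below fans a list of files out to a fixed pool of goroutines. The `File` type, the `ScanConcurrently` function, and the single byte marker are hypothetical stand-ins for illustration; this is not Malwatch's actual design.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// File is an in-memory stand-in for a file on disk.
type File struct {
	Path    string
	Content []byte
}

// ScanConcurrently checks files for marker using a fixed number of worker
// goroutines and returns the paths that contain it, in no particular order.
func ScanConcurrently(files []File, marker []byte, workers int) []string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan File)
	results := make(chan string)
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for f := range jobs {
				if bytes.Contains(f.Content, marker) {
					results <- f.Path
				}
			}
		}()
	}

	// Feed the jobs, then close results once every worker has finished.
	go func() {
		for _, f := range files {
			jobs <- f
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()

	var hits []string
	for p := range results {
		hits = append(hits, p)
	}
	return hits
}

func main() {
	// Illustrative sample data only.
	files := []File{
		{Path: "index.php", Content: []byte("<?php echo 'hello'; ?>")},
		{Path: "cache.php", Content: []byte("<?php eval(base64_decode($p)); ?>")},
	}
	for _, p := range ScanConcurrently(files, []byte("eval(base64_decode("), 4) {
		fmt.Println("suspected match:", p)
	}
}
```

Channels carry the work and the results, so no explicit locking is needed around the shared slice of hits; only the collecting goroutine ever appends to it.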

The true power of what Yara can offer is attributable to further performance gains throughout each new build. It became logical to rely on Yara for more optimal algorithms such as Aho-Corasick instead of devising our own form of it. Although it has origins from the 1970s, the Aho-Corasick algorithm is still incredibly quick because it performs all matches in one pass.
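To show why a single pass is enough, here is a compact, textbook-style Aho-Corasick matcher in Go. It is a teaching sketch written for this post's readers, not Yara's implementation; the `Matcher`, `Match`, and `NewMatcher` names are our own, and the sample patterns are the classic example from the algorithm's literature.

```go
package main

import "fmt"

// node is one state of the automaton.
type node struct {
	next map[byte]int // goto transitions
	fail int          // failure link: longest proper suffix that is also a trie prefix
	out  []int        // indices of patterns that end at this state
}

// Matcher is a minimal Aho-Corasick automaton over bytes.
type Matcher struct {
	nodes    []node
	patterns []string
}

// Match records one occurrence; End is the index just past its last byte.
type Match struct {
	Pattern string
	End     int
}

// NewMatcher builds the trie and failure links for the given patterns.
func NewMatcher(patterns []string) *Matcher {
	m := &Matcher{nodes: []node{{next: map[byte]int{}}}, patterns: patterns}
	// Phase 1: insert every non-empty pattern into the trie.
	for i, p := range patterns {
		if p == "" {
			continue
		}
		cur := 0
		for j := 0; j < len(p); j++ {
			nxt, ok := m.nodes[cur].next[p[j]]
			if !ok {
				m.nodes = append(m.nodes, node{next: map[byte]int{}})
				nxt = len(m.nodes) - 1
				m.nodes[cur].next[p[j]] = nxt
			}
			cur = nxt
		}
		m.nodes[cur].out = append(m.nodes[cur].out, i)
	}
	// Phase 2: breadth-first walk to compute failure links.
	// Depth-1 states keep the default failure link to the root (0).
	var queue []int
	for _, child := range m.nodes[0].next {
		queue = append(queue, child)
	}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for c, child := range m.nodes[cur].next {
			f := m.nodes[cur].fail
			for f != 0 {
				if _, ok := m.nodes[f].next[c]; ok {
					break
				}
				f = m.nodes[f].fail
			}
			if t, ok := m.nodes[f].next[c]; ok {
				m.nodes[child].fail = t
			}
			// Inherit matches that end at the failure state.
			fc := m.nodes[child].fail
			m.nodes[child].out = append(m.nodes[child].out, m.nodes[fc].out...)
			queue = append(queue, child)
		}
	}
	return m
}

// Search reports every pattern occurrence in text in a single left-to-right pass.
func (m *Matcher) Search(text []byte) []Match {
	var found []Match
	state := 0
	for i, c := range text {
		for state != 0 {
			if _, ok := m.nodes[state].next[c]; ok {
				break
			}
			state = m.nodes[state].fail
		}
		if nxt, ok := m.nodes[state].next[c]; ok {
			state = nxt
		}
		for _, pi := range m.nodes[state].out {
			found = append(found, Match{Pattern: m.patterns[pi], End: i + 1})
		}
	}
	return found
}

func main() {
	m := NewMatcher([]string{"he", "she", "his", "hers"})
	for _, hit := range m.Search([]byte("ushers")) {
		fmt.Printf("%q ends at offset %d\n", hit.Pattern, hit.End)
	}
}
```

The input is read once regardless of how many patterns are loaded, which is what makes the approach attractive when a scanner has to check each file against a large signature set.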

In Summary

Malwatch is not a Pagely owned tool. Roscoe did much of the early development on his own time – and the later stages utilized company time with collaboration from other Pagely software and DevOps engineers. We agreed early on if anything came of the project we would work out a fair IP agreement for both parties. Therefore all the source code and IP (that which was owned by Pagely) has been assigned to Mr. Skeens in return for a perpetual use agreement/license with Pagely, Inc.

Roscoe is considering the option of open sourcing his work and we fully intend to support him and Malwatch with further resources – not the least of which will be its continued development and testing inside our hosting platform.

Pagely has been a market leader in Managed WordPress hosting for over a decade – we were first-to-market in the now multi-billion dollar channel we created after all. A core commitment to our customers has always been a focus on information security and a trouble free experience for hosting clients. Adding Malwatch to our PressARMOR™ risk mitigation framework allows Pagely to achieve greater detection accuracy with quicker completion times, while reducing resource usage on customer instances across the entire fleet.

Well done Roscoe, well done indeed.

1 Comment

  1. Anastasios

    Exciting news, thank you for sharing and hope it’ll be open-sourced, Joshua!