Smart Recovery: A Smartly Automated, Smartly Designed Software Recovery System

Every engineer is familiar with how small corruptions work their way into a PC’s software and slow it down or crash it. As industrial automation (IA) technology hastens its convergence with IT systems, these slowdowns become much more than mere irritations. For systems where five nines of availability and precisely benchmarked high-speed communications are fundamental requirements, eliminating software corruption is a critical concern. Fortunately, there is a safe, reliable, and economical way to address the issue: Moxa Smart Recovery, an automated recovery utility that can dependably restore a software environment to a pristine post-install state.

One important reason automated industrial environments are being populated with Ethernet and IT devices is the powerful and convenient utilities these devices provide for remote administration. The on-site staff who maintain and operate automated facilities such as electrical substations, wind farms, or transport environments like trains and ships often lack strong competency in computer administration, troubleshooting, and repair. For these environments, automation that can be quickly and reliably activated, either remotely or locally, is imperative. At the same time, preventive maintenance schedules have always been a critical consideration in industrial environments. Taking our bearings from these two fundamental features of today’s industrial landscape, Moxa has created a recovery utility that rewrites the full software platform: one that supports reliable remote administration, provides scheduled rewrites for preventive maintenance, and can rewrite a system that has catastrophically crashed. We call it Moxa Smart Recovery.

The Current State of the Art

These requirements are beyond what current market-standard rescue software in the IT industry provides. Platform rescue solutions like Acronis and Norton Ghost, and open-source solutions like Clonezilla, all suffer from the same design flaw: they operate in user space, which means that, as shipped, recoveries cannot be fully automated. If an operating system crashes from software corruption, human intervention is required to reset the system. While it is theoretically possible to automate these tools, doing so would require costly coding and reliability testing.

Bit-Level Re-writes

Before any work on Smart Recovery could begin, a decision had to be made about the level at which the backup system would be stored. To guarantee the most accurate and uncorrupted re-writes possible, it was decided to write the backups and perform the re-writes at the block level, bit by bit, rather than simply copying over files. This protects the backup system from corruptions that might creep in during the re-write process. Additionally, it is much harder for file-level rewrites to compensate for corruption of the physical storage device than it is for bit-level copies. Only by guaranteeing that every bit is successfully re-written to the platform’s physical storage medium, whether disk or solid state, can the system recovery mechanism guarantee that a successful recovery procedure has been completed and that every fragment of data has been returned to its initial post-install state.
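To make the idea of a verified block-level re-write concrete, here is a minimal Python sketch. It is not Moxa’s implementation: the function names, the 4096-byte block size, and the use of a SHA-256 comparison as the verification step are all assumptions chosen for illustration, and ordinary files stand in for the backup image and the storage device.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # assumed block size, purely illustrative


def sha256_of(path, block_size=BLOCK_SIZE):
    """Hash a file (or device node) one block at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def rewrite_blocks(image_path, target_path, block_size=BLOCK_SIZE):
    """Copy the image onto the target block by block.

    Returns the number of blocks written. The target is opened with
    "wb", which suits the ordinary files used in this sketch; a real
    block device would need different handling.
    """
    blocks = 0
    with open(image_path, "rb") as src, open(target_path, "wb") as dst:
        for block in iter(lambda: src.read(block_size), b""):
            dst.write(block)
            blocks += 1
        dst.flush()
        os.fsync(dst.fileno())  # push the data to the storage medium
    return blocks


def restore_and_verify(image_path, target_path):
    """Re-write the target, then confirm it matches the image exactly."""
    rewrite_blocks(image_path, target_path)
    return sha256_of(image_path) == sha256_of(target_path)
```

Calling `restore_and_verify("image.bin", "target.bin")` returns `True` only when the target is identical to the image after the copy, which is the kind of whole-medium check a file-level copy cannot easily provide.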

A Robot to Raise a System from the Dead

Unlike user-space rescue tools, Moxa Smart Recovery is integrated with BIOS hooks that allow engineers to configure a watchdog timer that will trigger a full re-write of the entire suite of installed software (operating system, all applications, and the full system configuration) at the block level, using a cached copy that is created when the system is first set up for deployment. When software errors bring down a system protected by Smart Recovery, the computer will automatically go into a soft reboot. If the watchdog then recognizes that the system can no longer boot up, the next soft reboot will flip the system into Smart Recovery mode. Smart Recovery then re-writes the entire software platform and attempts to reboot the system. If the failure was caused by software corruption, Smart Recovery will have fixed it.
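The reboot sequence described above can be modeled as a small state machine. The following Python sketch is only an illustration of that logic, not the actual BIOS-level code: the class, the function names, and the threshold of one failed boot before recovery are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryState:
    failed_boots: int = 0
    max_failed_boots: int = 1  # hypothetical threshold before recovery


def on_watchdog_timeout(state):
    """Called when the OS fails to report a healthy boot in time.

    Returns the next action: "soft_reboot" while the failure count is
    within the threshold, "recovery" once it has been exceeded.
    """
    state.failed_boots += 1
    if state.failed_boots > state.max_failed_boots:
        state.failed_boots = 0  # start fresh after the re-write
        return "recovery"
    return "soft_reboot"


def on_healthy_boot(state):
    """Called when the OS boots normally; clears the failure count."""
    state.failed_boots = 0
    return "run"
```

With the default threshold, the first timeout yields a soft reboot and the second yields a recovery, mirroring the sequence in the text: crash, soft reboot, failed boot detected, recovery mode.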

Software Routines Tailored for Industrial Automation

In addition, Moxa’s long familiarity with automated industrial environments has allowed us to identify a broad spectrum of features that are specifically tailored to the needs of industrial automation engineers. Going back to the watchdog example, only minor tweaks to the code are needed to let system administrators and engineers set the computer to rewrite the software platform at scheduled intervals, each time returning the entire software system to a pristine pre-deployment operating state. These scheduled rewrites give administrators an extremely powerful new tool to use when setting up preventive maintenance procedures, because they eliminate the slowdowns associated with operating systems that have been in use for a long time. In this way, system engineers can ensure that a Smart Recovery-equipped device keeps functioning at the benchmarks to which it was initially configured.
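As a rough illustration of interval-based scheduling, the Python sketch below decides whether a scheduled rewrite is due. The function names and the 30-day interval in the usage note are assumptions made for the example; the source does not describe how Smart Recovery stores or evaluates its schedule.

```python
from datetime import datetime, timedelta


def next_recovery_time(last_recovery, interval):
    """Return the moment at which the next scheduled rewrite falls due."""
    return last_recovery + interval


def recovery_due(last_recovery, interval, now):
    """True once the configured interval has elapsed since the last rewrite."""
    return now >= next_recovery_time(last_recovery, interval)
```

For example, with `last_recovery = datetime(2024, 1, 1)` and `interval = timedelta(days=30)`, `recovery_due` is `False` on 15 January and `True` from 31 January onward.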

Remotely Initiated and Manual Recoveries

While full automation is useful, certain situations will also demand user-initiated recoveries. “Remote” recoveries include not only procedures that are initiated from a far-distant control room, but also those called from a local control station located on-site. The mechanism is simple: when a system administrator perceives a need to restore a device’s software platform, he or she simply sends a call to the device that begins an automated re-write. With little more than a click of a mouse, the administrator can take the device offline and return the platform to its original, pristine configuration. Remote re-writes give administrators the power to tune up or rescue a system remotely, whenever the need arises.

Manual recoveries, however, are very different, because they serve a different type of user. Manual recoveries are initiated at the device’s physical location using a system recovery key. Essentially, any user can simply insert the key, hit the computer’s reset button, and let Smart Recovery do the rest. Once the process is completed, either the platform will have been returned to its original post-install state or its permanent failure will have been confirmed. These manual recovery keys are perfect for users who are not trained for low-level IT work, as is often the case when computers are used as HMIs for heavy industrial machinery, as on ships, oil platforms, or in solar farms.

Moxa’s Smart Recovery utility offers industrial engineers a new, powerful tool in their administrative toolbox, something they cannot find anywhere else. As a secure, fully automated, intelligent BIOS-level platform recovery utility that copies all software at the block level, with software enhancements that allow it to be tailored to virtually any industrial automation need, Smart Recovery offers industrial engineers the tantalizing possibility of permanently eliminating the failures and slowdowns that come with the normal wear and tear of continuous use.

For more information on Moxa Smart Recovery and the details of the technology behind it, read our white paper: Intelligent Automation for the Maintenance and Recovery of Software Platforms.
