Can data be recovered after a restore card has restored the system?

Hi! My name is Daulet Tymbayev, and today I want to share my experience of developing a system that can, in theory, bring a disk back into service much faster than a traditional recovery. Let's start from the beginning and cover all the project stages.
Before joining Acronis, I pursued my master's degree at Innopolis University (MSIT-SE program). Innopolis is a relatively new university, and the MSIT-SE program is even newer. Nevertheless, it is built upon Carnegie Mellon University programs and therefore includes such things as industrial projects.
The ultimate goal of an industrial project is to involve students in a real software development process and let them put their newly acquired theoretical knowledge into practice. To do so, the university partners with companies such as Yandex, Acronis, MTS and dozens of others (as of 2018, the university had 144 partners). Under this collaboration, companies propose their projects to the university, and students choose one of them according to their interests and level of technical skill.
Two years ago, I was on "the other side": I was a student working on another Acronis project. Last year I was appointed technical consultant for a new team of students, and I presented the Active Restore project to the university. The idea behind Active Restore came from the Kernel team at Acronis, but development started together with Innopolis University.
Why do we need Active Restore?
The traditional recovery process goes as follows: after a problem compromises the computer, the user opens the interface of whatever backup system is available and clicks the "emergency button" to restore a saved state. Then, after N minutes, the system is ready to resume work.

As you can see, N has a direct impact on the business. This value represents the recovery time objective (from now on, RTO), and it depends on several factors, such as the connection speed (if the selected solution implements cloud recovery), hard drive bandwidth, the size of the files being recovered, and many others.
But the traditional approach restores everything without prioritizing files. If we prioritize instead, we first restore the files the system needs to boot, and only later bring back everything else, such as pictures and videos.
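To make the effect of prioritization concrete, here is a rough back-of-the-envelope sketch. It assumes a deliberately simplified model in which restore time is just size divided by a steady throughput; the function name and every number in it are hypothetical and are not measurements from the project.

```python
def restore_seconds(size_mb, throughput_mb_per_s):
    """Time to copy size_mb at a steady throughput, ignoring all overhead."""
    return size_mb / throughput_mb_per_s


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
TOTAL_MB = 500_000   # the whole backup
BOOT_MB = 20_000     # files the OS needs in order to boot
THROUGHPUT = 100     # MB/s, the slowest link in the restore path

full_wait = restore_seconds(TOTAL_MB, THROUGHPUT)  # wait for everything
boot_wait = restore_seconds(BOOT_MB, THROUGHPUT)   # wait for boot files only
```

Under these made-up inputs the user waits 200 seconds instead of 5000 before the system is usable. The total amount of data to restore does not change; only the order does.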
Driver needed...
The operating system expects to start up with a fully ready drive, and it performs a series of checks to ensure the drive is consistent. If one or more system files are absent or corrupted, it simply won't boot. To solve this problem, we decided to put file redirectors on the disk, which replace absent or corrupted files. These file redirectors are empty, so creating them takes very little time.
The recovery continues in the background. While the operating system works, the "empty" files are filled with data. The background process takes the disk load into account and does not exceed preset limits. But the user or the OS can request a file that has not been recovered yet. In this case, we launch the second recovery mode: the priority of the requested file is raised to the maximum, and the recovery system transfers it to the disk as fast as possible. This way the OS gets the file it needs, but with some latency.
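The two recovery modes described above can be pictured as a priority queue in which a requested file jumps ahead of the background work. The following is a minimal sketch of that idea only, not the actual Acronis service; the class `RecoveryQueue` and its method names are hypothetical.

```python
import heapq
import itertools

URGENT = 0      # requested by the OS or the user right now
BACKGROUND = 1  # restored in the background, in the original order


class RecoveryQueue:
    """Toy restore queue: a lower priority value is restored first."""

    def __init__(self, paths):
        self._counter = itertools.count()  # tie-breaker keeps insertion order
        self._heap = []
        self._priority = {}                # pending path -> current priority
        for path in paths:
            self._push(path, BACKGROUND)

    def _push(self, path, priority):
        self._priority[path] = priority
        heapq.heappush(self._heap, (priority, next(self._counter), path))

    def request(self, path):
        """Called when something asks for a file that is not restored yet."""
        if self._priority.get(path) == BACKGROUND:
            self._push(path, URGENT)       # the old heap entry becomes stale

    def next_file(self):
        """Return the next file to restore, or None when nothing is pending."""
        while self._heap:
            priority, _, path = heapq.heappop(self._heap)
            if self._priority.get(path) == priority:  # skip stale entries
                del self._priority[path]
                return path
        return None
```

Re-pushing the file with a higher priority and skipping the stale entry later avoids having to search and reorder the heap on every request. Throttling against disk load, which the real background process performs, is left out of this sketch.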
That's the ideal situation. In the real world, there are a lot of problems and potential deadlocks. Together with Innopolis undergraduate students, we decided to research this recovery scenario, evaluate the RTO advantages, and find out whether such an approach is feasible at all. At that time, there were no such solutions on the market.
I decided to leave service development to the Innopolis students. At Acronis, we started developing a mini-filter file system driver; the Windows Kernel team was responsible for that. We had a plan:
- Launch a driver at the early OS startup stage;
- Launch a service when userspace is ready;
- The service processes driver requests and coordinates the further recovery operation.

Driver construction details
My colleagues will describe the service in detail in the next post. In this post, we will disclose some details about our driver development. Our mini-filter driver has two operation modes: one for when the system starts up in a normal state, and one for when there was a fault and a recovery has been launched. Before user-space libraries and applications (and our service) are loaded, the driver acts the same way in either situation, because it cannot yet tell which state the system is in. That's why every create, read and write operation is logged with all its metadata. When the service comes online, the driver hands over these logs for further analysis.

In the case of normal operation, the service tells the driver to work in "relaxed" mode, which stops it from logging all metadata. From then on the driver logs only disk changes and passes these updates to the service. Other Acronis tools keep the backup as up to date as possible on the user-defined media. It can be a cloud, remote, incremental or night-only backup, but that is another story.
In the case of a recovery, the service tells the driver to work in "Recovery" mode. During the recovery process, the driver intercepts requests for partially recovered files and checks whether those files are on disk and whether they are readable.
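The mode switching described above can be modeled as a small state machine. This is a user-space illustration only, assuming that "disk changes" means create and write operations; the names `Mode`, `FilterModel`, `on_operation` and `service_online` are hypothetical and do not come from the real kernel driver.

```python
from enum import Enum


class Mode(Enum):
    BOOT = "boot"          # service not loaded yet, system state unknown
    RELAXED = "relaxed"    # normal start-up: track disk changes only
    RECOVERY = "recovery"  # restore in progress: intercept file requests


class FilterModel:
    """Toy model of the driver's modes; not real kernel code."""

    CHANGES = ("create", "write")  # assumption: what counts as a disk change

    def __init__(self):
        self.mode = Mode.BOOT
        self.log = []

    def on_operation(self, op, path):
        """Record an operation according to the current mode."""
        if self.mode is Mode.BOOT:
            self.log.append((op, path))    # log everything
        elif self.mode is Mode.RELAXED and op in self.CHANGES:
            self.log.append((op, path))    # log only changes
        # In RECOVERY mode this sketch records nothing.

    def service_online(self, recovering):
        """Hand the boot log to the service and switch to the mode it picks."""
        boot_log, self.log = self.log, []
        self.mode = Mode.RECOVERY if recovering else Mode.RELAXED
        return boot_log
```

The point of the `BOOT` state is that the decision is deferred: the model records everything until the service is able to say which situation the system is actually in.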
If the file is absent, the mini-filter sends this information to the service, which raises the recovery priority for that file (recall that the recovery process is also running in the background), so the file jumps to the front of the queue. The service recovers the file (by itself or using other Acronis tools) and reports "OK" to the driver. The operating system can now access the data, and the driver "releases" the original request to the disk.
If recovery is not possible because there is no such file in the backup, the service reports this to the driver. Our mini-filter driver then releases the system request without acting on it, and the OS or application receives a "file not found" error. That is fine: if the file really is absent both on the disk and in the backup, the user simply asked for a nonexistent file by mistake.
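The decision flow for one intercepted request can be summarized in a few lines. This is a hedged sketch of the logic only: `handle_open` and its parameters are hypothetical names, and sets of paths stand in for the disk and the backup.

```python
def handle_open(path, on_disk, backup, restore):
    """Decide what a file-open request sees while recovery is in progress.

    on_disk: set of paths already restored and readable
    backup:  set of paths present in the backup
    restore: callable that restores one path at top priority
    """
    if path in on_disk:
        return "ok"              # nothing to do, pass the request through
    if path not in backup:
        return "file not found"  # nothing to restore, release the request
    restore(path)                # the service moves this file to the front
    on_disk.add(path)
    return "ok"                  # release the original request to the disk
```

In the real system the middle step blocks the caller until the service reports back, which is where the extra latency comes from.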
Of course, the OS will work much slower, because reading any file or library takes several steps, possibly including remote data access. That is the price we pay for being able to start working earlier, while the restoration is still in progress.
We need to go deeper, much deeper...
The prototype proved the concept, but we discovered that we needed to dive deeper to avoid deadlocks. They appeared, for example, when the OS requested different libraries from several threads and the service looped back.
I'm currently working on raising Active Restore's speed and enhancing system security. For cases where the system requests only part of a file, we developed an additional driver: a storage filter driver operating at the block level. The principle of operation is the same. In standard mode, the driver just logs block changes on the disk. In restore mode, however, it tries to read blocks and, if that fails, requests reprioritization from the service. All the other parts of the system remain the same; the OS-level service doesn't even know that there is another driver. Our main goal is to provide the OS with the data it needs, but there is room for further development, because the service is still not able to operate at the block level.
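To illustrate what working at the block level means for a partial read, the sketch below maps a byte range onto block indices and reports which of them are still missing. It is an assumption-laden illustration: the 4096-byte block size and both function names are hypothetical, and a set of block indices stands in for the driver's real bookkeeping.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes


def blocks_for_range(offset, length):
    """Return the block indices touched by a byte range."""
    if length <= 0:
        return range(0)
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    return range(first, last + 1)


def missing_blocks(offset, length, restored):
    """Blocks in the range that still have to come from the backup."""
    return [b for b in blocks_for_range(offset, length) if b not in restored]
```

Only the blocks returned by `missing_blocks` would need to be reprioritized, so a small read from a large file does not have to wait for the whole file.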
The next phase is to take the driver down to the UEFI level and to turn the service into a Native Windows application, so that everything starts even earlier. For that reason, we have developed a UEFI boot driver (DXE driver), which is started and stopped even before the OS starts up. UEFI drivers, their construction, and their installation will be discussed in the following posts. Subscribe to our blog; I will be happy to see your comments!
Translated from: https://habr.com/en/company/acronis/blog/496584/