Rebooting in Recovery: A technical dive...

So recently, most of my posts have been about the passing of one of our beloved cats, and soon they'll be about the pending passing of *another* one of our cats. Yes, that's right, the cats in our household are cursed, apparently. The good news is that the two conditions aren't related. The bad news is that that doesn't help.

So instead of talking about my personal life, while I've got a few moments of a clear brain, I'm going to take a bit of a technical dive into the design of Phoenix, as well as the reasons for some of the architectural decisions I'm making.

To better understand the "Phoenix Recovery System" as I've been calling it, it's helpful to understand the parts of the system. The first, and most obvious portion of Phoenix is the "recovery mode". Recovery mode is a special mode in Android devices that allows you access to the main partitions of the device to apply updates or repair damage. The second portion, and less obvious to some, is the Android system itself. While some recovery engines have a UI that you can use for preparing the recovery partition to perform some tasks, we're taking it a step further. Phoenix is a complete package, not just a recovery mode with the ability to tack on a UI. The flow between the recovery mode and the application will be seamless and easy. The final portion of the system is the computer portion.

Now this needs a bit more explaining. For some devices, when things go wrong, you can just remove the battery, hold down some buttons, and start up. But some devices aren't this friendly. Most of the Samsung devices are like this. If a problem occurs with the kernel, the device won't boot to recovery, either. And this is where a computer recovery is necessary. And this is why we're investigating Windows and Linux support for bringing these devices back up and restoring a valid image. Since I'm not a Mac developer, we'll have to see what can be done to get Mac users the same full experience.

Because the most important part of the recovery system is the recovery engine itself, that is what I'm going to focus on. So the first question I will answer is: what's wrong with the recoveries from Google, CWM, Amon_RA, TWRP, etc.? Well, at a high level, they are all based on some very simple designs that met the needs of Google at the time. The underlying recovery system is the same between them, with the exception of the nandroid portion. But even the nandroid mechanisms are very similar. And I disagree with their architecture, with the exception of Google's, since Google really only needed it to deliver a minimal set of functionality.

So what does the design look like? Well, for starters, the whole recovery is being written in C++. Now, I'm very pragmatic about my approach to a language, so don't expect to find a lot of templates of templates of templates. And yes, I realize that with the software not being open-source, you won't find anything at all, but come on, cut me a break here and just follow along. C++ brings to the table the ability to use namespaces and classes. And believe me, they will both be used. Nothing will exist that doesn't either belong to a namespace or a class.

I use a pretty common naming scheme: all interfaces (pure virtual classes which represent abstract models) start with I. So the generic interface for a volume is IVolume. Let me give a good example of how a format call would behave on a device...

IVolume* volume = VolumeManager::FindVolume("/system");
if (!volume) return -1;

return volume->Format();

So what did that do? Well, it called FindVolume, a static member function of the VolumeManager class, and retrieved an IVolume interface pointer.
But what is the IVolume interface pointer a pointer to? Well, it actually points to an instance of the GenericVolume class. And what does the Format function do? Well, let's see...

int GenericVolume::Format(FSTYPE fsType /* = FST_Default */)
{
    int ret = 0;
    // Format every partition backing this volume, accumulating any failures.
    vector<IPartition*>::iterator iter;
    for (iter = mPartitions.begin(); iter != mPartitions.end(); ++iter)
        ret += (*iter)->Format(fsType);
    return (ret == 0 ? 0 : -1);
}

Well *that* was easy. But it also doesn't do anything except iterate through a vector of IPartition pointers and call their Format member. So what does each IPartition point to? In this example case, it's an older device and the only element of the vector is an MTD partition, named MTDPartition of course.

int MTDPartition::Format(FSTYPE fsType /* = FST_Default */)
{
    // For this example, we don't support setting the fsType to anything but default.
    if (fsType != FST_Default)
        return -1;

    // To format this device, we just erase all the blocks, using the MTDBlockReaderWriter class
    return getBlockReaderWriter()->Erase();
}

Again, a very simple and straightforward function. But again, doesn't *actually* do anything. But it's getting closer... So what does getBlockReaderWriter do? Let's look...

IBlockReaderWriter* MTDPartition::getBlockReaderWriter()
{
    // Create the reader/writer lazily, on first use.
    if (!mBlockReaderWriter)
        mBlockReaderWriter = new MTDBlockReaderWriter(mDeviceName);
    return mBlockReaderWriter;
}

So our final dive into this design pattern is the MTDBlockReaderWriter, which implements the IBlockReaderWriter interface, which describes the erase call...

int MTDBlockReaderWriter::Erase(long int startBlock /* = 0 */, long int endBlock /* = 0xFFFFFFFF */)
{
    // For simplicity, I'm not going to describe the exact calls to erase either a single or a
    // group of MTD blocks, but the implementation of that code goes here. Very easy
    // to locate, very easy to unit test.
    return 0; // placeholder so the example compiles
}

So as you can see from the flow, formatting the system partition of the MTD device may look like it goes down a complex path, but think about how simple that path really is. If this were an EMMC device, the only differences would be EMMCPartition and EMMCBlockReaderWriter. That assumes EMMCPartition uses the IBlockReaderWriter to perform the format. It's just as likely that an IFileSystem would handle EMMC, so you could format to different file system types. But IFileSystem doesn't know how to read or write to a block device. It gets the IBlockReaderWriter from the IPartition implementation that instantiates it.
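To make the "only the partition changes" point concrete, here's a minimal, self-contained sketch of the layering. To be clear, this is not the actual Phoenix code: FakeBlockReaderWriter, RawPartition, FsPartition, AddPartition, and the FST_Ext4 value are all names made up for illustration, and the fake "flash" is just a buffer in memory. RawPartition stands in for the MTD case (format means erase) and FsPartition stands in for one possible EMMC approach (format to a requested file system type).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum FSTYPE { FST_Default, FST_Ext4 };  // FST_Ext4 is a made-up value for this sketch

// Lowest layer: raw block access.
struct IBlockReaderWriter {
    virtual ~IBlockReaderWriter() {}
    virtual int Erase() = 0;
};

// Middle layer: a partition decides *how* it gets formatted.
struct IPartition {
    virtual ~IPartition() {}
    virtual int Format(FSTYPE fsType = FST_Default) = 0;
};

// Hypothetical stand-in for a real block device: an in-memory buffer.
class FakeBlockReaderWriter : public IBlockReaderWriter {
public:
    explicit FakeBlockReaderWriter(std::size_t size) : mData(size, 0x00) {}
    int Erase() override {
        for (unsigned char& b : mData) b = 0xFF;  // erased flash reads back as 0xFF
        return 0;
    }
    bool IsErased() const {
        for (unsigned char b : mData)
            if (b != 0xFF) return false;
        return true;
    }
private:
    std::vector<unsigned char> mData;
};

// Raw-flash style partition (the MTD case): formatting is just an erase.
class RawPartition : public IPartition {
public:
    explicit RawPartition(IBlockReaderWriter* rw) : mRw(rw) {}
    int Format(FSTYPE fsType = FST_Default) override {
        if (fsType != FST_Default) return -1;  // no file system choice on raw flash
        return mRw->Erase();
    }
private:
    IBlockReaderWriter* mRw;  // not owned
};

// File-system style partition: accepts any type and records what was requested.
class FsPartition : public IPartition {
public:
    int Format(FSTYPE fsType = FST_Default) override {
        mLastFormat = fsType;
        return 0;
    }
    FSTYPE LastFormat() const { return mLastFormat; }
private:
    FSTYPE mLastFormat = FST_Default;
};

// Top layer: identical no matter which partition types sit underneath.
class GenericVolume {
public:
    void AddPartition(IPartition* p) { mPartitions.push_back(p); }  // not owned
    int Format(FSTYPE fsType = FST_Default) {
        int ret = 0;
        for (IPartition* p : mPartitions) ret += p->Format(fsType);
        return ret == 0 ? 0 : -1;
    }
private:
    std::vector<IPartition*> mPartitions;
};
```

Notice that GenericVolume never mentions MTD, EMMC, or file systems. Build one volume on a RawPartition and another on an FsPartition, call Format on both, and the volume code is exactly the same in each case.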

This design is meant to keep every piece of logic restricted to doing *exactly* what that piece of logic is intended to do. Nandroid isn't a piece of logic, it's a massive set of operations against numerous block devices, file systems, etc. But because each volume isolates the underlying access mechanisms, the nandroid backup/restore can focus on doing its own job, but better.

And how hard is it to support new devices? Well, since a large portion of the design is self-detecting (meaning that we don't need a bunch of config files and flags to tell us what the partition types and block devices are, etc.), the code is very easy to launch on different platforms. And this model will extend through every task of Phoenix. The screen has an interface for rendering. The touch panel and buttons have an event interface. There are even entire objects dedicated to management of worker threads.

Even the data storage format for the nandroid backup is kept in this type of interface pattern. There is an INandroidContainer interface which gives access to the contents of a nandroid backup. And to support old TWRP backups, there is a TWRPNandroidContainer. Phoenix has its own container, PhoenixNandroidContainer, which has support for incremental backups. Because the model is designed from the ground up on paper with the interfaces being laid out in advance, we'll be able to develop and test for numerous devices more easily than ever before. Unit tests can be combined into packages that people can fastboot on their device which only perform read tests, to ensure core functionality. Write tests can be written which intentionally use known-safe blocks for read/erase/write/compare tests to prove we can erase and write blocks. All of this makes implementing new features and functionality easy and far safer than today's recovery systems allow.
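Here's roughly what one of those read/erase/write/compare tests could look like. Again, this is a sketch under assumptions and not the Phoenix implementation: the Read and Write members, the Block typedef, FakeFlash, and EraseWriteCompare are hypothetical names, and the "device" is an in-memory fake so the test logic can be exercised without touching real hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> Block;

// Hypothetical block-level interface for this sketch.
struct IBlockReaderWriter {
    virtual ~IBlockReaderWriter() {}
    virtual int Read(long block, Block& out) = 0;
    virtual int Write(long block, const Block& in) = 0;
    virtual int Erase(long startBlock, long endBlock) = 0;
};

// In-memory fake device, so the test itself can be tested off-device.
class FakeFlash : public IBlockReaderWriter {
public:
    FakeFlash(long blocks, std::size_t blockSize)
        : mBlocks(blocks, Block(blockSize, 0x00)) {}
    int Read(long block, Block& out) override {
        if (!InRange(block)) return -1;
        out = mBlocks[block];
        return 0;
    }
    int Write(long block, const Block& in) override {
        if (!InRange(block) || in.size() != mBlocks[block].size()) return -1;
        mBlocks[block] = in;
        return 0;
    }
    int Erase(long startBlock, long endBlock) override {
        if (!InRange(startBlock) || !InRange(endBlock) || startBlock > endBlock)
            return -1;
        for (long b = startBlock; b <= endBlock; ++b)
            mBlocks[b] = Block(mBlocks[b].size(), 0xFF);  // erased state
        return 0;
    }
private:
    bool InRange(long b) const {
        return b >= 0 && b < static_cast<long>(mBlocks.size());
    }
    std::vector<Block> mBlocks;
};

// Returns true if a known-safe block survives a full
// read / erase / write / compare cycle and can be restored afterwards.
bool EraseWriteCompare(IBlockReaderWriter& dev, long block, const Block& pattern) {
    Block original;
    if (dev.Read(block, original) != 0) return false;        // read

    bool ok = dev.Erase(block, block) == 0                    // erase
           && dev.Write(block, pattern) == 0;                 // write

    Block readBack;
    ok = ok && dev.Read(block, readBack) == 0
            && readBack == pattern;                           // compare

    // Always try to put the original contents back, pass or fail.
    bool restored = dev.Erase(block, block) == 0
                 && dev.Write(block, original) == 0;
    return ok && restored;
}
```

Because EraseWriteCompare only talks to the interface, the same function could be pointed at a real device's reader/writer instead of FakeFlash, as long as the block number it's given really is safe to scribble on.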

Friday, January 13, 2012
