22-08-2018 | Remy van Elst
Are you running Linux on Microsoft Azure? Then by default anyone with access to your Azure portal can run commands as root in your VM, reset SSH keys, user passwords and SSH configuration. This article explains what the backdoor is, what it is meant to do, how it can be disabled and removed and what the implications are.
If you like this article, consider sponsoring me by trying out a Digital Ocean VPS. With this link you'll get a $5 VPS for 2 months free (as in, you get $10 credit). (referral link). This link is on all my articles, but Digital Ocean also install an agent. However, they have a checkbox to disable it.
Azure is Microsoft's Cloud platform. It provides virtual machines and related datacenter virtualization, next to software as a service (hosted stuff like databases). I currently work on a project where a hosted application platform is built on Azure using Linux (Ubuntu, CentOS), and recently found out about this backdoor. Or useful feature, since it's not just a blatant and deliberate backdoor.
I have no idea how the situation on Windows on Azure is, and have not researched this.
Azure, like any other good Cloud provider, has images which you use to create new VM's (sometimes called VPSes, instances or droplets). This speeds up the rollout of new VM's, because mounting an ISO and manually installing a VM is time-consuming. Providers often change things in that image to make it run better on their cloud. When I created images for an OpenStack provider, we pre-installed haveged, for example. Microsoft does that as well: they install their own agent, the wa-linux-agent. Later on in this article I explain what this agent is meant to do. It is not just a backdoor, it provides actual useful features. One of those features is root access from outside of the VM.
In the Azure portal one can log in and configure their Azure cloud. For this project I use Ansible and Terraform, so I don't have to regularly interact with this webpage (which is kind of slow to work with). However, I also have a personal Azure account for playing around, which is where I found this feature.
Via the Azure portal one can execute commands as root inside a VM and change SSH keys, user passwords and SSH configuration.
I'll let the pictures speak for themselves:
Scripts run by default as elevated user on Linux
Think of the Azure VMAccess extension as that KVM switch that allows you to access the console to reset access to Linux or perform disk level maintenance.
Microsoft presents this feature in a positive light, so that it looks less like a backdoor:
The disk on your Linux VM is showing errors. You somehow reset the root password for your Linux VM or accidentally deleted your SSH private key. If that happened back in the days of the datacenter, you would need to drive there and then open the KVM to get at the server console. Think of the Azure VMAccess extension as that KVM switch that allows you to access the console to reset access to Linux or perform disk level maintenance. This article shows you how to use the Azure VMAccess Extension to check or repair a disk, reset user access, manage administrative user accounts, or update the SSH configuration on Linux when they are running as Azure Resource Manager virtual machines.
For command execution as well:
This capability is useful in all scenarios where you want to run a script within a virtual machines, and is one of the only ways to troubleshoot and remediate a virtual machine that doesn't have the RDP or SSH port open due to improper network or administrative user configuration.
I believe that this is a useful feature. However, I am also of the opinion that this is a backdoor. It is not made obvious that this agent is running in your VM or that it provides root access. Only after looking (in the output of ps aux and in the Microsoft docs) did I find out what it is and what risks are connected to it. It took me a good hour.
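As a quick way to repeat that check, here is a minimal sketch that filters process listings for the agent. The function name agent_processes is made up for illustration, and the match patterns are an assumption based on the package and service names that appear in this article; other images may name the process differently.

```shell
# Filter `ps aux`-style output (read from stdin) down to lines that look like
# the Azure Linux agent. The patterns are assumptions taken from the names
# used in this article (waagent, walinuxagent, WALinuxAgent).
agent_processes() {
    # -i makes the match case-insensitive so WALinuxAgent is caught as well.
    # `|| true` keeps the exit status at 0 when nothing matches.
    grep -Ei 'waagent|walinuxagent' | grep -v 'grep' || true
}

# Usage on a real VM:
#   ps aux | agent_processes
```

Empty output only means nothing matched these patterns, not that no agent of any kind is installed.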
A tickbox when deploying VM's (or API flag) to disable this agent would be nice.
Anybody with access to VM's in the Azure portal is able to execute commands as root inside any VM they have access to.
This also means anybody working for or on Microsoft (Azure) is able to run commands as root inside your VM. (They can already take a live snapshot of your VM with RAM included, but that is a known risk you take when using a cloud/VPS provider. So consider all your private keys, certificates and data compromised as soon as you don't control the entire chain of equipment).
Microsoft Azure however is audited regularly and has an ISO 27001 certificate, so let's hope they don't abuse this power.
If you are the only one with an Azure portal account, the impact is probably not that bad.
If you have multiple people working on a project (multiple people having access to the portal), the risk is larger. Any one of those people (and all that had access to the portal in the past, like contractors or employees that moved on) has (or had) root access to all your VM's.
Any one (or all) of your VM's could be compromised. Maybe your manager has access to the portal but not SSH, or not root, and wants to put you in a bad position. Installing that rootkit or cryptocoin miner under your account and removing all the logging just got way easier.
The agent is just a package, so using your package manager you can remove it:
# dpkg -l | grep walinuxagent
ii  walinuxagent  2.2.21+really2.2.20-0ubuntu1~16.04.1  amd64  Windows Azure Linux Agent

# rpm -qa | grep LinuxAgent
WALinuxAgent-2.2.18-1.el7.centos.noarch
apt-get purge walinuxagent
yum remove WALinuxAgent
If you just want to stop the service (for example, to see what the impact is), you can do so using your init system of choice:
# systemctl list-unit-files | grep agent
waagent.service enabled
systemctl disable waagent
systemctl stop waagent

# systemctl list-unit-files | grep agent
walinuxagent.service enabled
systemctl stop walinuxagent
systemctl disable walinuxagent
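To confirm afterwards that the service really is disabled, a small check like the following could be scripted. This is a sketch, not an official procedure: the function name agent_units_enabled is hypothetical, and it only knows the two unit names shown above.

```shell
# Read `systemctl list-unit-files` output on stdin and print every agent unit
# that is still enabled. Exit status is 0 when none is enabled, 1 otherwise.
# Only the two unit names from this article are checked (an assumption; other
# images may ship the agent under a different unit name).
agent_units_enabled() {
    awk '($1 == "waagent.service" || $1 == "walinuxagent.service") && $2 == "enabled" {
             print $1
             found = 1
         }
         END { exit (found ? 1 : 0) }'
}

# Usage on a real VM:
#   systemctl list-unit-files | agent_units_enabled && echo "agent not enabled"
```

The non-zero exit status on a hit makes it easy to use in a provisioning script or a monitoring check.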
I believe your commentary has more to do with RBAC than with the agent, but I'm trying to better understand your concerns.
Even if you remove waagent (link to the code on GitHub, thanks for mentioning the code is open source in your post) from an Azure VM, an administrator in your subscription could lock you out with a firewall rule, can restart or stop the VM or can delete it altogether.
If your primary partition isn't encrypted, they can stop the VM and attach that disk to another running VM they control and change user passwords, SSH configuration, etc. And without getting into those weeds, even without waagent you can pass custom data to cloud-init (cloud-init and waagent aren't mutually exclusive in Azure) using, say, the Azure CLI.
In an organizational setting (a team, a company) it's likely you as the VM operator have been granted less permissions than the administrator of the subscription, so it'd actually be expected that they (the administrator) can perform operations on your VM. Your subscription (your team) shouldn't be an adversarial scenario, but there are still ways the team can use RBAC here.
If you run az provider operation list you'll see that adding an extension to a virtualMachine is an operation you can actually write a custom role for. If you're trying to enforce rather than delegate, you can also use a custom Azure Policy. All of those methods are enforced at the API level, so the end result is the same whether you use the portal, the CLI or 3rd party tools like Ansible or Terraform.
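As an illustration of the custom role idea, the sketch below writes a role definition that allows managing VMs but leaves out the extension and Run Command operations. The role name, the file name and the subscription placeholder are made up for this example, and the operation strings are my assumption of the relevant ones; verify them against the output of az provider operation list before relying on this.

```shell
# Write a custom role definition to a JSON file. ROLE_FILE, the role name and
# the subscription placeholder are illustrative assumptions, not values from
# the original discussion.
ROLE_FILE="${ROLE_FILE:-./vm-operator-no-extensions.json}"

cat > "$ROLE_FILE" <<'EOF'
{
  "Name": "VM Operator Without Extensions (example)",
  "Description": "Manage VMs, but do not add extensions or use Run Command.",
  "Actions": [
    "Microsoft.Compute/virtualMachines/*"
  ],
  "NotActions": [
    "Microsoft.Compute/virtualMachines/extensions/*",
    "Microsoft.Compute/virtualMachines/runCommand/action"
  ],
  "AssignableScopes": [
    "/subscriptions/<your-subscription-id>"
  ]
}
EOF

# On a machine with the Azure CLI installed and logged in, the role could then
# be created with:
#   az role definition create --role-definition @"$ROLE_FILE"
```

Keep in mind that NotActions only subtracts from this role's own Actions. A user who also holds another role granting those operations (Contributor or Owner, for example) can still use them, which is why the Azure Policy route is mentioned for enforcement.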
It's also worth mentioning that when you run a custom script from the portal, the operation is not only logged in the VM but also in the Activity Log for that resource in Azure itself - even if that was someone else in your team that is a subscription administrator.
If you open up SSH and have concerns that someone could manipulate SSH or PAM configuration from outside the VM, there are Azure features such as just in time access that are designed to help you exactly with that. But I still think your commentary has more to do with RBAC than it has with waagent or the VMAccess extension. Other redditors have commented, like you did, that there's a troubleshooting aspect to this. It is true that many features you see in the portal such as the ability to reset SSH configuration, run a particular command or see the serial console output are used for troubleshooting (including when guided by our own Linux Support Escalation team) but they are also there for composability.
By that I mean someone that has heavily automated/scripted their Linux setup in Azure and instead of maintaining their own custom image (a documented scenario including extensive discussion on the agent) they rely on standard images and attach extensions with custom scripts, cloud-init custom data, pulling SSH keys from Azure Key Vault (or using a third-party tool like Hashicorp's Vault) or using the AAD login extension (in preview, and not much to do with the AD we love to hate).
You said it's not made obvious this agent runs in the VM, and imply that the backdoor nature is concealed.
I personally always make the point to introduce the agent at any public presentation (including recently at OSCON) so I'm genuinely interested in your suggestions for changes in documentation, portal prompts, etc., so people are more aware that this agent is running and how it helps them? I work on Azure.
(Edit - adding details on Azure Policy and Activity Log)
When executing a command, the following appears on Ubuntu in
2018/08/22 11:58:07.546949 INFO [Microsoft.OSTCExtensions.VMAccessForLinux-188.8.131.52] Target handler state: enabled
2018/08/22 11:58:07.657715 INFO [Microsoft.OSTCExtensions.VMAccessForLinux-184.108.40.206] [Enable] current handler state is: enabled
2018/08/22 11:58:07.769027 INFO [Microsoft.OSTCExtensions.VMAccessForLinux-220.127.116.11] Update settings file: 1.settings
2018/08/22 11:58:07.883364 INFO [Microsoft.OSTCExtensions.VMAccessForLinux-18.104.22.168] Enable extension [./vmaccess.py -enable]
2018/08/22 11:58:08 VMAccess started to handle.
2018/08/22 11:58:08 [Microsoft.OSTCExtensions.VMAccessForLinux-22.214.171.124] cwd is /var/lib/waagent/Microsoft.OSTCExtensions.VMAccessForLinux-126.96.36.199
2018/08/22 11:58:08 [Microsoft.OSTCExtensions.VMAccessForLinux-188.8.131.52] Change log file to /var/log/azure/Microsoft.OSTCExtensions.VMAccessForLinux/184.108.40.206/extension.log
2018/08/22 11:58:10.012703 INFO Event: name=Microsoft.OSTCExtensions.VMAccessForLinux, op=Enable, message=Launch command succeeded: ./vmaccess.py -enable, duration=2025
2018/08/22 11:58:10.161467 INFO [Microsoft.CPlat.Core.RunCommandLinux-1.0.0] Target handler state: enabled
2018/08/22 11:58:10.212205 INFO [Microsoft.CPlat.Core.RunCommandLinux-1.0.0] [Enable] current handler state is: enabled
2018/08/22 11:58:10.262973 INFO [Microsoft.CPlat.Core.RunCommandLinux-1.0.0] Update settings file: 1.settings
2018/08/22 11:58:10.313158 INFO [Microsoft.CPlat.Core.RunCommandLinux-1.0.0] Enable extension [bin/run-command-shim enable]
2018/08/22 11:58:11.444806 INFO Event: name=Microsoft.CPlat.Core.RunCommandLinux, op=Enable, message=Launch command succeeded: bin/run-command-shim enable, duration=1031
2018/08/22 11:58:11.704326 INFO Event: name=WALinuxAgent, op=ProcessGoalState, message=Incarnation 5, duration=4836
When changing an SSH key, the following is in that same log:
2018/08/22 11:27:00.603980 INFO [Microsoft.OSTCExtensions.VMAccessForLinux-220.127.116.11] Target handler state: enabled
2018/08/22 11:27:00.713193 INFO [Microsoft.OSTCExtensions.VMAccessForLinux-18.104.22.168] [Enable] current handler state is: notinstalled
2018/08/22 11:27:01.042711 INFO Event: name=Microsoft.OSTCExtensions.VMAccessForLinux, op=Download, message=Download succeeded, duration=0
2018/08/22 11:27:01.364254 INFO [Microsoft.OSTCExtensions.VMAccessForLinux-22.214.171.124] Initialize extension directory
2018/08/22 11:27:01.509190 INFO [Microsoft.OSTCExtensions.VMAccessForLinux-126.96.36.199] Update settings file: 0.settings
2018/08/22 11:27:01.656294 INFO [Microsoft.OSTCExtensions.VMAccessForLinux-188.8.131.52] Install extension [./vmaccess.py -install]
2018/08/22 11:27:05 VMAccess started to handle.
2018/08/22 11:27:05 [Microsoft.OSTCExtensions.VMAccessForLinux-184.108.40.206] cwd is /var/lib/waagent/Microsoft.OSTCExtensions.VMAccessForLinux-220.127.116.11
2018/08/22 11:27:05 [Microsoft.OSTCExtensions.VMAccessForLinux-18.104.22.168] Change log file to /var/log/azure/Microsoft.OSTCExtensions.VMAccessForLinux/22.214.171.124/extension.log
2018/08/22 11:27:05.788982 INFO Event: name=Microsoft.OSTCExtensions.VMAccessForLinux, op=Install, message=Launch command succeeded: ./vmaccess.py -install, duration=0
2018/08/22 11:27:06.105463 INFO [Microsoft.OSTCExtensions.VMAccessForLinux-126.96.36.199] Enable extension [./vmaccess.py -enable]
2018/08/22 11:27:06 VMAccess started to handle.
2018/08/22 11:27:06 [Microsoft.OSTCExtensions.VMAccessForLinux-188.8.131.52] cwd is /var/lib/waagent/Microsoft.OSTCExtensions.VMAccessForLinux-184.108.40.206
2018/08/22 11:27:06 [Microsoft.OSTCExtensions.VMAccessForLinux-220.127.116.11] Change log file to /var/log/azure/Microsoft.OSTCExtensions.VMAccessForLinux/18.104.22.168/extension.log
2018/08/22 11:27:08.271645 INFO Event: name=Microsoft.OSTCExtensions.VMAccessForLinux, op=Enable, message=Launch command succeeded: ./vmaccess.py -enable, duration=0
2018/08/22 11:27:08.438678 INFO Event: name=WALinuxAgent, op=ProcessGoalState, message=Incarnation 2, duration=8933
Since this is a root access backdoor, the logging on the VM can be compromised. If you have a centralized logging system, now would be a good time to check if any of your VM's could have been exploited with this feature.
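For that check, a rough sketch of a filter that pulls the extension events out of agent log lines might look like this. The function name extension_events and the AGENT_LOG variable are hypothetical, and the field layout is inferred only from the log excerpts in this article, so other agent versions may log differently.

```shell
# Read agent log lines on stdin and print "date time extension operation" for
# every extension event (Download, Install, Enable, ...). The agent's own
# ProcessGoalState bookkeeping events are dropped.
# Assumption: lines follow the "INFO Event: name=..., op=..., ..." layout seen
# in the excerpts in this article.
extension_events() {
    sed -n 's|^\([0-9/]* [0-9:.]*\) INFO Event: name=\([^,]*\), op=\([^,]*\),.*$|\1 \2 \3|p' |
        grep -v ' WALinuxAgent ' || true
}

# Usage, with AGENT_LOG pointing at a copy of the agent log, preferably the
# copy held by your central log server rather than the one on the VM:
#   extension_events < "$AGENT_LOG"
```

Any VMAccessForLinux or RunCommandLinux event at a time nobody on the team can account for is worth a closer look.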
This agent is a piece of software created by Microsoft to make life in "the cloud" easier. OpenStack and Digital Ocean, for example, have comparable pieces of software: cloud-init and the qemu-guest-agent. The code is open source and can be found on GitHub.
It states the following features:
Since OpenStack is not as widespread as Azure and cloud-providers all build their own images, the impact of this is much lower.
The OpenStack provider I used to work for included this in their images (since it can help freeze the VM when a snapshot is made, to keep data consistent).
Using the nova set-password command one can reset a user password via this agent.
Bottom line: inspect the software running in your VM before you put it in production. Check daemons and agents you don't know, check for rogue SSH keys and users, make use of the firewall, build multiple layers of security (defense in depth) and, most important, use your head.
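As one concrete starting point for the SSH key check, here is a minimal sketch that lists every authorized key per account. The function name list_authorized_keys is made up for illustration, and it assumes keys live in the default ~/.ssh/authorized_keys location; an AuthorizedKeysFile override in sshd_config is not handled.

```shell
# Print "user: key" for every authorized_keys entry, given passwd-format data
# on stdin. Only the default ~/.ssh/authorized_keys path is checked (an
# assumption; a custom AuthorizedKeysFile setting is not taken into account).
list_authorized_keys() {
    while IFS=: read -r user pw uid gid gecos home shell; do
        keyfile="$home/.ssh/authorized_keys"
        [ -r "$keyfile" ] || continue
        # Skip blank lines and comments, prefix each key with the account name.
        grep -Ev '^[[:space:]]*(#|$)' "$keyfile" |
            while IFS= read -r key; do
                printf '%s: %s\n' "$user" "$key"
            done
    done
    return 0
}

# Usage (as root, so every home directory is readable):
#   getent passwd | list_authorized_keys
```

Comparing this output against the keys you expect to be there is a cheap way to spot a key that was added from outside the VM.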