<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Canopy Blog</title>
  <subtitle>Engineering insights and project updates from the Canopy open-source BMC firmware project.</subtitle>
  <link href="https://canopybmc.org/feed.xml" rel="self"/>
  <link href="https://canopybmc.org/blog/"/>
  <updated>2026-04-16T00:00:00.000Z</updated>
  <id>https://canopybmc.org/blog/</id>
  <author>
    <name>Canopy Team</name>
  </author>
  <entry>
    <title>Why We Are Building Canopy</title>
    <link href="https://canopybmc.org/blog/why-we-are-building-canopy/"/>
    <updated>2026-03-17T00:00:00.000Z</updated>
    <id>https://canopybmc.org/blog/why-we-are-building-canopy/</id>
    <summary>Server firmware shouldn&#39;t be a black box. We are building an open-source BMC firmware platform on OpenBMC — here is where we are and where we are heading.</summary>
    <content type="html"><p>Every server has a small computer inside it that most people never think about. The Baseboard Management Controller — the BMC — is responsible for monitoring hardware health, controlling fans, reading temperatures, managing power, and providing remote access. It runs its own operating system, has its own network stack, and it boots before the host CPU even powers on. If you have ever used iLO, iDRAC, or any other out-of-band management interface, you have talked to a BMC.</p>
<p>The problem is that this critical piece of infrastructure is almost entirely proprietary. The firmware running on your BMC is a black box, written by your server vendor, with no source code, no ability to audit, and no way to fix bugs yourself. You get what the vendor ships, on their schedule, with their priorities.</p>
<p>We think that needs to change.</p>
<h2>What Canopy is</h2>
<p>Canopy is an open-source BMC firmware distribution built on <a href="https://github.com/openbmc/openbmc">OpenBMC</a>. We take upstream OpenBMC and add long-term support, hardware CI testing on real servers, and the integration work needed to run on production platforms.</p>
<p>Our first target is the HPE ProLiant Gen11 platform — specifically the DL360, DL380, and related SKUs running on HPE's GXP SoC. We chose this platform because it is widely deployed, the hardware is capable, and there is real demand for an open firmware alternative.</p>
<p>We are not a fork. We track OpenBMC upstream weekly, rebase continuously, and contribute fixes back. The goal is to stay as close to upstream as possible while providing what upstream alone does not: tested releases, long-term maintenance, and production readiness for specific hardware.</p>
<h2>Where we are today</h2>
<p>Canopy is running on real hardware. Not in simulation, not &quot;coming soon&quot; — on actual HPE ProLiant Gen11 servers in our lab. Here is what works today:</p>
<ul>
<li><strong>Multi-platform support</strong> for 12+ Gen11 SKUs, with runtime device tree overlay selection so a single image supports multiple baseboard variants</li>
<li><strong>PSU monitoring</strong> with a custom <code>gxp-psu</code> hwmon driver we wrote from scratch — 8 sensor channels per power supply (voltage, current, power, temperature, fan speed)</li>
<li><strong>CPU and DIMM temperatures</strong> via Intel PECI and AMD SBTSI</li>
<li><strong>Fan control</strong> with PID thermal regulation through phosphor-pid-control — 7 fans, with GPIO-based presence and fault detection</li>
<li><strong>Host power control</strong> via x86-power-control with platform-specific GPIO timing</li>
<li><strong>BIOS communication</strong> through a clean-room CHIF implementation — SMBIOS table transfer, persistent configuration storage, and BIOS event logging</li>
<li><strong>Full Redfish API</strong> for remote management, plus a Canopy-branded web interface</li>
<li><strong>NVMe drive monitoring</strong> with temperature and FRU data for Samsung, SK hynix, and Micron drives</li>
<li><strong>67 kernel patches</strong> on Linux 6.18, bringing a modern kernel to a platform that originally shipped with 5.10</li>
</ul>
<p>All of this is upstream-aligned. We use entity-manager for hardware discovery, dbus-sensors for monitoring, and the standard OpenBMC service stack. Where we had to write kernel drivers — like the PSU monitoring driver and the POST code capture driver — we followed upstream coding standards and ran <code>checkpatch.pl --strict</code> before every commit.</p>
<h2>Where we are heading</h2>
<p>The work so far covers the basics. Servers boot, sensors report, fans spin at the right speed, and you can manage the machine over Redfish. But there is more to do:</p>
<ul>
<li><strong>KVM / remote console</strong> — porting the GXP video capture driver to the 6.18 kernel so you can see the host display remotely</li>
<li><strong>IPMI in-band and over LAN</strong> — because a lot of existing tooling still depends on it</li>
<li><strong>BIOS firmware update</strong> via Redfish — flashing host firmware from the BMC</li>
<li><strong>BMC health monitoring</strong> — tracking the BMC's own CPU, memory, and storage usage</li>
</ul>
<p>Further out, we want to expand to additional server platforms and processor architectures. The meta-layer approach makes this practical — the platform-specific configuration lives in JSON and device tree overlays, not in code forks.</p>
<h2>Why a blog</h2>
<p>BMC firmware is one of those areas where the gap between &quot;how things actually work&quot; and &quot;what is publicly documented&quot; is enormous. If you have ever tried to understand how a baseboard management controller discovers hardware, how fan PID loops are tuned, or how the BMC talks to the host BIOS during POST, you know the pain. The information either does not exist, is locked behind vendor NDAs, or is scattered across mailing list archives.</p>
<p>This blog is where we will share what we learn — real code, real configs, and the engineering decisions behind them. You can follow along via <a href="/feed.xml">RSS</a>.</p>
<p>We are building Canopy because we believe server firmware should be open, auditable, and fixable. If that matters to you too, stick around.</p>
</content>
    <author>
      <name>Christian Walter</name>
    </author>
  </entry>
  <entry>
    <title>QEMU to Hardware: Our Testing Approach</title>
    <link href="https://canopybmc.org/blog/qemu-to-hardware-testing-approach/"/>
    <updated>2026-04-16T00:00:00.000Z</updated>
    <id>https://canopybmc.org/blog/qemu-to-hardware-testing-approach/</id>
    <summary>How we test every commit against emulated and real hardware, from a QEMU AST2600 with virtual sensors to an HPE DL320 Gen11 in the lab.</summary>
    <content type="html"><p>BMC firmware has a testing problem. The software runs on constrained embedded hardware, interacts with dozens of physical sensors and buses, and controls safety-critical systems like fan speed and host power. But most OpenBMC development still follows the same pattern: build an image, flash it onto hardware, SSH in, poke around, and hope nothing regressed. There is no <code>npm test</code>. There is no CI that catches a broken Redfish route or a failed user authentication before it reaches the lab. OpenBMC has the Robot Framework test suite, but it is not integrated into upstream CI — most developers never see it run on their patches.</p>
<p>We built Canopy alongside FirmwareCI, so automated testing was never optional: it was built in from day one. Every commit to Canopy is tested against a QEMU-emulated AST2600 running the full firmware image and against an HPE DL320 Gen11 in our lab, whose flash is re-imaged at the start of every test run. This post describes how that works.</p>
<h2>The problem with &quot;just test on hardware&quot;</h2>
<p>Testing exclusively on physical hardware has obvious limitations. Hardware is scarce, flashing is slow, and the feedback loop is measured in hours rather than seconds. Worse, many regressions have nothing to do with hardware at all. A misconfigured systemd unit, a broken Redfish RBAC rule, or a D-Bus service that fails to register — these are software problems that happen to run on a BMC. You should not need a server rack to catch them.</p>
<p>At the same time, testing exclusively in emulation misses an entire class of problems. QEMU does not have real I2C buses, real PECI interfaces, or real power supply PMBus devices. You cannot test whether your entity-manager JSON correctly discovers an HPE GXP SoC's temperature sensors in QEMU, because those sensors do not exist there. Fan PID control is meaningless without real PWM outputs and real tachometer feedback. But you can test Redfish RBAC enforcement, user authentication, and WebUI accessibility — problems that have nothing to do with hardware.</p>
<p>We need both. So we built both.</p>
<h2>Two targets, one repository</h2>
<p>The <code>canopybmc</code> repository defines two machine configurations. The first is <code>hpe-proliant-g11</code>, which is the production target — the actual HPE ProLiant Gen11 platform running on HPE's GXP baseboard management controller. The second is <code>canopy-qemu</code>, a machine configuration that targets the AST2600 EVB in QEMU:</p>
<pre><code># meta-canopy/conf/machine/canopy-qemu.conf
KERNEL_DEVICETREE = &quot;aspeed/aspeed-ast2600-evb.dtb&quot;
UBOOT_MACHINE = &quot;ast2600_openbmc_spl_defconfig&quot;
UBOOT_DEVICETREE = &quot;ast2600-evb&quot;
FLASH_SIZE = &quot;65536&quot;
PREFERRED_PROVIDER_virtual/kernel = &quot;linux-aspeed&quot;
</code></pre>
<p>The QEMU machine deliberately excludes the HPE GXP layers. It uses the upstream Aspeed kernel, the standard AST2600 EVB device tree, and a 64 MB SPI flash layout. This is intentional — the QEMU target is not trying to emulate the HPE hardware. It is testing the Canopy distribution layer: our distro configuration, our security hardening, our user management policies, our Redfish interface, our service stack. Everything that is not platform-specific.</p>
<p>Both machines share the same <code>canopy</code> distro configuration:</p>
<pre><code># meta-canopy/conf/distro/canopy.conf
DISTRO = &quot;canopy&quot;
DISTRO_NAME = &quot;CanopyBMC (based on Phosphor OpenBMC)&quot;
DISTROOVERRIDES .= &quot;:canopy&quot;
</code></pre>
<p>This means the QEMU image and the hardware image are built from the same codebase, the same package versions, and the same distro policies. When we test user RBAC enforcement in QEMU, the result applies to hardware too, because the Redfish stack, the user manager, and the access control configuration are identical.</p>
<h2>GitHub Actions: build both, test both</h2>
<p>Every push to <code>main</code> and every pull request triggers two GitHub Actions workflows. Each workflow builds the firmware image on a self-hosted runner (we call these Hydra nodes), then hands the resulting <code>.static.mtd</code> binary off to FirmwareCI for testing.</p>
<p>The QEMU workflow is straightforward — build, upload the artifact, trigger the test pipeline:</p>
<pre class="language-yaml"><code class="language-yaml"><span class="token comment"># .github/workflows/build-canopy-qemu.yml</span>
<span class="token key atrule">jobs</span><span class="token punctuation">:</span>
  <span class="token key atrule">build</span><span class="token punctuation">:</span>
    <span class="token key atrule">runs-on</span><span class="token punctuation">:</span> <span class="token punctuation">[</span>self<span class="token punctuation">-</span>hosted<span class="token punctuation">,</span> Hydra<span class="token punctuation">,</span> Large<span class="token punctuation">]</span>
    <span class="token key atrule">steps</span><span class="token punctuation">:</span>
      <span class="token punctuation">-</span> <span class="token key atrule">uses</span><span class="token punctuation">:</span> actions/checkout@v6
        <span class="token key atrule">with</span><span class="token punctuation">:</span>
          <span class="token key atrule">submodules</span><span class="token punctuation">:</span> recursive
      <span class="token punctuation">-</span> <span class="token key atrule">uses</span><span class="token punctuation">:</span> ./.github/actions/build
        <span class="token key atrule">with</span><span class="token punctuation">:</span>
          <span class="token key atrule">board</span><span class="token punctuation">:</span> canopy<span class="token punctuation">-</span>qemu

  <span class="token key atrule">firmwareci</span><span class="token punctuation">:</span>
    <span class="token key atrule">needs</span><span class="token punctuation">:</span> build
    <span class="token key atrule">steps</span><span class="token punctuation">:</span>
      <span class="token punctuation">-</span> <span class="token key atrule">uses</span><span class="token punctuation">:</span> actions/download<span class="token punctuation">-</span>artifact@v8
        <span class="token key atrule">with</span><span class="token punctuation">:</span>
          <span class="token key atrule">name</span><span class="token punctuation">:</span> obmc<span class="token punctuation">-</span>phosphor<span class="token punctuation">-</span>image<span class="token punctuation">-</span>canopy<span class="token punctuation">-</span>qemu.static.mtd
      <span class="token punctuation">-</span> <span class="token key atrule">uses</span><span class="token punctuation">:</span> docker<span class="token punctuation">:</span>//firmwareci/action<span class="token punctuation">:</span>v5.2
        <span class="token key atrule">with</span><span class="token punctuation">:</span>
          <span class="token key atrule">WORKFLOW_NAME</span><span class="token punctuation">:</span> $
          <span class="token key atrule">BINARIES</span><span class="token punctuation">:</span> firmware=obmc<span class="token punctuation">-</span>phosphor<span class="token punctuation">-</span>image<span class="token punctuation">-</span>canopy<span class="token punctuation">-</span>qemu.static.mtd</code></pre>
<p>The hardware workflow is similar but the build is more involved. HPE ProLiant Gen11 systems use a GXP SoC with a secure boot chain that requires signed firmware. The build step injects a signing key and appends the GXP bootblock, producing a complete 32 MB flash image directly. The resulting binary is uploaded to FirmwareCI without any post-build processing.</p>
<p>The hardware pipeline also has a <code>firmwareci-main</code> job that only runs on pushes to <code>main</code> (not on PRs). This triggers a more comprehensive test suite that includes boot timing regressions, fan control validation, and VUART console verification — tests that take longer and exercise the physical server more aggressively.</p>
<h2>FirmwareCI: the test execution engine</h2>
<p><a href="https://firmwareci.com">FirmwareCI</a> runs the tests. GitHub Actions builds the images, uploads them to FirmwareCI, and FirmwareCI handles device control, test execution, and result reporting. We define two device-under-test (DUT) configurations in the repository: <code>dut-canopy-qemu</code> and <code>dut-hpe-dl320</code>.</p>
<p>The QEMU DUT starts a <code>qemu-system-arm</code> process with an AST2600 EVB, forwards SSH to port 2222 and Redfish to port 2443, waits 100 seconds for boot, then runs the test suite.</p>
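<p>To reproduce something close to the QEMU target locally, the sketch below assembles a comparable command line in Python. It is an illustration, not the FirmwareCI DUT definition: the function name <code>qemu_argv</code> and the use of legacy <code>-net user</code> networking are assumptions. Only the machine type, the image name, and the two forwarded ports come from the description above.</p>
<pre class="language-python"><code class="language-python">def qemu_argv(image_path, ssh_port=2222, redfish_port=2443):
    """Build a qemu-system-arm command line for an AST2600 EVB.

    image_path is the flash image to boot; the helper itself is
    hypothetical and not part of the Canopy repository.
    """
    # User-mode networking: forward host ports to the guest's
    # SSH (22) and HTTPS/Redfish (443) ports.
    net_user = (
        "user,"
        f"hostfwd=tcp:127.0.0.1:{ssh_port}-:22,"
        f"hostfwd=tcp:127.0.0.1:{redfish_port}-:443"
    )
    return [
        "qemu-system-arm",
        "-M", "ast2600-evb",
        "-nographic",
        "-drive", f"file={image_path},format=raw,if=mtd",
        "-net", "nic",
        "-net", net_user,
    ]


if __name__ == "__main__":
    print(" ".join(qemu_argv("obmc-phosphor-image-canopy-qemu.static.mtd")))
</code></pre>
<p>Returning a list rather than a single string means the result can be handed to <code>subprocess.Popen</code> as is, with no shell quoting involved.</p>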
<p>The hardware DUT uses <a href="https://github.com/BlindspotSoftware/dutctl"><code>dutctl</code></a> to control a physical HPE DL320 Gen11 in the lab. The test sequence: power off, write the firmware image to a flash emulator connected to the server's SPI bus, power on, wait for the &quot;Phosphor OpenBMC&quot; boot banner on the serial console (10-minute timeout), then run the test suite. This is hardware-in-the-loop testing — the BMC boots real firmware on a real GXP SoC.</p>
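<p>The "wait for the boot banner" step can be pictured as a loop over serial console lines with a deadline. The Python sketch below shows the idea under some assumptions: <code>wait_for_banner</code> is a hypothetical helper, not part of <code>dutctl</code> or FirmwareCI, and the console is modelled as any iterable of text lines.</p>
<pre class="language-python"><code class="language-python">import time


def wait_for_banner(lines, banner="Phosphor OpenBMC", timeout_s=600.0,
                    clock=time.monotonic):
    """Return the seconds elapsed when banner first appears, else None.

    lines is any iterable of console lines (for example a wrapper around
    a serial port). The deadline is only checked when a new line
    arrives, so a completely silent console needs a read timeout in the
    underlying iterable as well.
    """
    start = clock()
    for line in lines:
        elapsed = clock() - start
        if elapsed > timeout_s:
            return None  # deadline passed before the banner showed up
        if banner in line:
            return elapsed
    return None  # console closed without printing the banner
</code></pre>
<p>The <code>clock</code> parameter exists so the function can be exercised with a fake time source. The 600-second default mirrors the 10-minute timeout mentioned above.</p>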
<h2>What we test</h2>
<h3>QEMU: 12 tests covering the platform-independent stack</h3>
<p>The QEMU test suite validates everything that does not require real hardware. Every test connects via SSH or Redfish.</p>
<p><strong>Boot, D-Bus, networking.</strong> Clean boot (zero failed units), core D-Bus services registered (<code>Inventory.Manager</code>, <code>Logging</code>, <code>Settings</code>), bmcweb and IPMI running, eth0 configured.</p>
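<p>A "zero failed units" check can be as small as parsing one <code>systemctl</code> call. The sketch below assumes the test captures the output of <code>systemctl list-units --state=failed --no-legend --plain</code> over SSH; the helper name <code>failed_units</code> and the sample unit name are made up for illustration.</p>
<pre class="language-python"><code class="language-python">def failed_units(systemctl_output):
    """Extract unit names from the output of
    `systemctl list-units --state=failed --no-legend --plain`.

    With --plain and --no-legend each non-empty line starts with the
    unit name, followed by the LOAD, ACTIVE, SUB and DESCRIPTION columns.
    """
    units = []
    for line in systemctl_output.splitlines():
        fields = line.split()
        if fields:
            units.append(fields[0])
    return units


if __name__ == "__main__":
    # Hypothetical output with one failed unit.
    sample = "example-sensor.service loaded failed failed Example sensor daemon\n"
    print(failed_units(sample))
</code></pre>
<p>A clean boot then amounts to asserting that the returned list is empty, and a failing run can print the offending unit names directly.</p>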
<p><strong>User management.</strong> Six tests covering CRUD operations via Redfish, RBAC enforcement (ReadOnly/Operator restrictions), password policy, session management, account limits, and root account disable/re-enable.</p>
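<p>RBAC tests of this kind tend to be table-driven: for each role, decide which status code a request should return. The sketch below uses the standard role-to-privilege mapping from the Redfish specification. Whether Canopy's tests are structured this way is an assumption, and <code>expected_status</code> is a hypothetical helper.</p>
<pre class="language-python"><code class="language-python"># Standard Redfish roles and their assigned privileges (DSP0266).
ROLE_PRIVILEGES = {
    "Administrator": {"Login", "ConfigureManager", "ConfigureUsers",
                      "ConfigureSelf", "ConfigureComponents"},
    "Operator": {"Login", "ConfigureSelf", "ConfigureComponents"},
    "ReadOnly": {"Login", "ConfigureSelf"},
}


def expected_status(role, required_privilege, success=200):
    """Return the HTTP status a test should expect for this role.

    A role holding the required privilege gets the success code; any
    other authenticated role should be rejected with 403 Forbidden.
    """
    if required_privilege in ROLE_PRIVILEGES[role]:
        return success
    return 403


if __name__ == "__main__":
    # Creating an account requires ConfigureUsers and returns 201 on success.
    for role in ROLE_PRIVILEGES:
        print(role, expected_status(role, "ConfigureUsers", success=201))
</code></pre>
<p>Keeping the expectation in one table makes it obvious when a role gains or loses access. The failing case names the role and privilege instead of a bare status code mismatch.</p>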
<p><strong>Event logging and web interface.</strong> Event creation/retrieval/deletion at all severity levels, logging service restart resilience, HTTPS connectivity, TLS validation, Redfish authentication enforcement.</p>
<h3>Hardware CI: 13 tests for platform-specific validation</h3>
<p>The hardware CI test suite runs on every pull request.</p>
<p><strong>Service health and D-Bus.</strong> 24 systemd services must start (EntityManager, CHIF, sensors, power control, bmcweb, FRU, logging, etc.). Platform-specific D-Bus names registered (<code>GxpChif</code>, <code>Smbios.MDR_V2</code>, fan/PSU/CPU sensors).</p>
<p><strong>Inventory and power control.</strong> Host powers on via Redfish, POST completes, inventory populated (20+ sensors, CPU/DIMM/PSU data, system identity). Full power cycle: On → verify state → ForceOff → On again, console output validated at each step.</p>
<p><strong>KVM.</strong> Video capture driver (<code>/dev/video0</code>), UDC driver, and HID gadgets (<code>/dev/hidg0</code>, <code>/dev/hidg1</code>). HID devices only appear when a VNC client connects, so the test opens a TCP connection to port 5900, waits 3 seconds for gadget binding, then checks for the devices. Userspace validation: <code>obmc-ikvm</code> service args, VNC port listening, ConfigFS configuration, bmcweb WebSocket endpoint.</p>
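<p>The connect-then-check sequence for the HID gadgets could look roughly like the Python sketch below. It assumes the check runs on the BMC itself, so that device nodes are local paths. The function name and the injectable <code>connect</code>, <code>exists</code>, and <code>sleep</code> parameters are illustrative choices, not the actual test code.</p>
<pre class="language-python"><code class="language-python">import os
import socket
import time


def hid_gadgets_after_vnc_connect(host="127.0.0.1", port=5900, settle_s=3.0,
                                  devices=("/dev/hidg0", "/dev/hidg1"),
                                  connect=socket.create_connection,
                                  exists=os.path.exists,
                                  sleep=time.sleep):
    """Open a TCP connection to the VNC port, wait for gadget binding,
    then report which HID device nodes exist.

    The connection is held open during the check because the gadgets
    are only present while a client is connected.
    """
    conn = connect((host, port), 5.0)  # 5 second connect timeout
    try:
        sleep(settle_s)
        return {dev: exists(dev) for dev in devices}
    finally:
        conn.close()
</code></pre>
<p>A test would then assert that every value in the returned dictionary is <code>True</code>. Injecting the three callables keeps the logic testable on a workstation that has neither a VNC server nor <code>/dev/hidg*</code> nodes.</p>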
<p><strong>CHIF.</strong> I2C proxy and PlatDef download handlers. Verifies that the BMC journal shows PlatDef extraction and I2C segment mapping at startup, and that the host console log has zero CHIF response-format errors after POST.</p>
<h3>Hardware main: 8 deep tests on merge to main</h3>
<p><strong>Boot time regression.</strong> Reflash, power on, assert boot banner within 80 seconds and SSH within 25 seconds after that. Baseline: 74s to banner, 19s to SSH. Catches service dependency or kernel changes that slow boot.</p>
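<p>The two thresholds translate into a very small check. The sketch below is one way to express it in Python: the limits and baseline figures are the ones quoted above, while the function name <code>boot_time_failures</code> and the message format are assumptions.</p>
<pre class="language-python"><code class="language-python">def boot_time_failures(banner_s, ssh_after_banner_s,
                       banner_limit_s=80.0, ssh_limit_s=25.0):
    """Return a list of human-readable limit violations (empty if none).

    banner_s: seconds from power-on to the boot banner.
    ssh_after_banner_s: seconds from the banner until SSH answers.
    """
    failures = []
    if banner_s > banner_limit_s:
        failures.append(
            f"boot banner took {banner_s:.0f}s (limit {banner_limit_s:.0f}s)"
        )
    if ssh_after_banner_s > ssh_limit_s:
        failures.append(
            f"SSH took {ssh_after_banner_s:.0f}s after banner "
            f"(limit {ssh_limit_s:.0f}s)"
        )
    return failures


if __name__ == "__main__":
    # Baseline from the post: 74s to banner, 19s to SSH.
    print(boot_time_failures(74, 19))
</code></pre>
<p>Collecting both violations before failing, rather than stopping at the first, shows in a single run whether a regression sits in early boot, in late service startup, or in both.</p>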
<p><strong>Fan control and user persistence.</strong> PID control active, 5 PWM channels reporting 5-100% range. User creation via Redfish survives BMC power cycle (catches filesystem persistence issues).</p>
<p><strong>VUART console stack.</strong> Full console pipeline: <code>obmc-console-server</code>, udev symlinks, Unix socket, SSH console on port 2200, log file. Power cycle host, verify log grew and contains POST output.</p>
<p><strong>VUART kernel and data path.</strong> Driver-level validation: <code>ttyS3</code> registered as 16550A, sysfs attributes correct, raw data capture from <code>/dev/ttyS3</code> during boot (stops obmc-console, uses <code>dd</code>, verifies bytes + interrupt count + POST strings). TX path: write to <code>/dev/ttyVUART0</code> and <code>obmc-console-client</code>, confirm <code>/proc/tty/driver/serial</code> TX counter increments.</p>
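<p>Confirming that the TX counter increments means reading <code>/proc/tty/driver/serial</code> before and after the write and comparing the <code>tx:</code> field for the port. A minimal Python sketch of that parsing step follows. The helper name <code>tx_count</code> and the sample values are invented, and it assumes that line 3 of the file corresponds to <code>ttyS3</code>.</p>
<pre class="language-python"><code class="language-python">import re


def tx_count(proc_serial_text, line_no):
    """Return the tx byte counter for one serial line, or None.

    proc_serial_text is the content of /proc/tty/driver/serial, where
    each port appears as "N: uart:TYPE ... tx:BYTES rx:BYTES".
    """
    pattern = re.compile(rf"^{line_no}: .*\btx:(\d+)", re.MULTILINE)
    match = pattern.search(proc_serial_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Invented snapshots taken before and after writing 16 bytes.
    before = "3: uart:16550A mmio:0x00000000 irq:30 tx:10 rx:512\n"
    after = "3: uart:16550A mmio:0x00000000 irq:30 tx:26 rx:512\n"
    print(tx_count(after, 3) - tx_count(before, 3))
</code></pre>
<p>Returning <code>None</code> for a missing port keeps "driver not registered" distinguishable from "registered but nothing transmitted", which are different failures.</p>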
<p><strong>SMBIOS and CHIF stability.</strong> After POST, verify D-Bus objects for TPM, PCIe slots, BIOS, DIMM, CPU. Then 3 host reboot cycles via SSH, checking for zero CHIF errors, zero PlatDef errors, CHIF service never restarted.</p>
<h2>What this catches</h2>
<p>In practice, the two-tier approach catches different failure modes at different stages.</p>
<p>The QEMU tier catches packaging errors (a recipe dropped from the image), configuration errors (a D-Bus service file with the wrong bus name), Redfish API regressions (a bmcweb change that breaks authentication), and security regressions (RBAC rules that stopped enforcing). These are typically caught on PRs, before the code ever touches hardware.</p>
<p>The hardware tier catches integration failures: an entity-manager JSON that does not match the actual I2C topology, a kernel driver that fails to probe on the real SoC, a fan control configuration that produces out-of-range PWM values, a VUART driver that works in isolation but breaks the console pipeline. These are caught either on PRs (the CI suite) or on merge to main (the deep suite).</p>
<p>The boot time regression test is worth calling out specifically. OpenBMC boot time tends to creep upward as features are added — a new service here, a new dependency there. The 80-second hard limit forces us to notice and address boot time regressions as they happen, rather than discovering months later that the BMC now takes two minutes to boot.</p>
<h2>What is next</h2>
<p>We are expanding the QEMU sensor model to include more of the emulated I2C devices so we can test entity-manager discovery and dbus-sensors without hardware. We are also integrating the DMTF Redfish Service Validator into the pipeline — the storage configuration for it is already in the repository — to catch Redfish schema compliance issues automatically.</p>
<p>The longer-term goal is to make every hardware test that checks software behavior, rather than hardware integration, runnable in QEMU as well. The less often an engineer has to wait for a hardware flash cycle to find out their change broke something, the faster we all move.</p>
<p>The test definitions are all in the <code>canopybmc</code> repository under <code>.firmwareci/</code>. If you are evaluating Canopy for your platform, they serve as a concrete reference for what we validate on every commit.</p>
</content>
    <author>
      <name>Christian Walter</name>
    </author>
  </entry>
</feed>
