Virtualizing Time
Oxide and Friends - A podcast by Oxide Computer Company
Categorie:
Jordan Hendricks joined Bryan and Adam to talk about her work virtualizing time--particularly challenging when migrating virtual machines from one physical machine to another!We've been hosting a live show weekly on Mondays at 5p for about an hour, and recording them all; here is the recording from June 12th, 2023.In addition to Bryan Cantrill and Adam Leventhal, we were joined by Oxide colleague Jordan Hendricks.The (lightly edited) live chat from the show:DanCrossNYC: The TSC ticks at a fixed rate now days, regardless of voltage scaling on the CPU.jbk: just x86 doesn't provide a consistent want to determine what the rate isjbk: (I guess some chips will tell you via CPUID, but I've yet to actually encounter such chips)jbk: some hypervisors will tell you via an MSRzorg24: Looks the Linux kernel docs have some documentation on the x86 TSC and PIT https://www.kernel.org/doc/html/next/virt/kvm/x86/timekeeping.htmlDanCrossNYC: CPUID or an MSR, but yeah, most systems sample over a fixed interval (determined by another time source) to figure it out.jbk: no, versus some other present component that allows you to measure the frequencyDanCrossNYC: No, the PIT or HPET or something.jbk: https://src.illumos.org/source/xref/illumos-gate/usr/src/uts/i86pc/os/tscc_pit.c?r=236cb9a8jbk: is how it uses the PITjbk: (the HPET code needs to improve it's accuracy, so it's only used when the PIT isn't there at the moment)jbk: some Intel NUCs have no PITjbk: so HPET is the only optionbcantrill: https://github.com/illumos/illumos-gate/commit/717646f7112314de3f464bc0b75f034f009c861eDanCrossNYC: Two big ones: system maintenance without disturbing guest workloads, and also load balancing across a rack."Sevan: ah, thanks.https://github.com/illumos/illumos-gate/blob/717646f7112314de3f464bc0b75f034f009c861e/usr/src/test/bhyve-tests/tests/common/common.c#L166"bcantrill: https://github.com/oxidecomputer/tsc-simulator/tree/masterDanCrossNYC: The guest may well be running NTP itself.iangrunert: I assume you could also check that NTP is alive / has synced recently before doing a migration right?aka_pugs: Do people use IEEE 1588/PTP in datacenters? Maybe finance wackos?zorg24: also it might be tricky to check if NTP synced recently if it is happening in usermodeiangrunert: Might've missed this - is it just the hypervisor that has to run NTP recently or the VM as well?saone: I believe it was just the hypervisorDanCrossNYC: The host.DanCrossNYC: A guest may or may not; that's up to the guest.jbk: but IIUC, if the guest IS running NTP, then the host definitely needs it to avoid any time warpsDanCrossNYC: Yup.DanCrossNYC: Fortunately, there's a bit of an out for the blackout window during migration: SMM mode can effectively pause a machine for an indefinite period of time.DanCrossNYC: We don't USE SMM anywhere, but robust systems software kinda needs to handle the case where the machine goes out to lunch for a minute.zorg24: 🙌 hooray for hardware with no SMM useDanCrossNYC: We have done everything we can to turn it off.ahl: https://github.com/dtolnay/case-studies/blob/master/autoref-specialization/README.mdahl: https://github.com/oxidecomputer/propolisearltea: it worked so well I almost thought the VM didn't migrate 😅saone: It's easy to forget that there's a world outside the cloud, but edge deployments that have physical peripherals hooked up need to maintain those connections to peripherals; migrating those peripherals to cloud environments and managing that integration has been a big challenge for my group.iangrunert: https://signalsandthreads.com/clock-synchronization/ Good listen about clock synchronization and PTP in the ""finance weirdos"" world. MiFID 2 time sync requirements require timestamping key trading event records to within 100 microseconds of UTC.jhendricks: a bit belated, but the propolis side of these changes: https://github.com/oxidecomputer/propolis/commit/7ed480843d3b5cfd9fd07dce41772f8eac4e9171saethlin: The calvalry??saethlin: Are we just going to let that slidesaethlin: Is this a pronunciation situation againzorg24: not the first time I've heard it pronounced that way 🤷saethlin: Well maybe it's me learning this timeDanCrossNYC: CalvaryDanCrossNYC: That's the religious thing.ahl: https://github.com/illumos/illumos-gate/blob/0c5967db436935325af441af2b27d337f4e64af5/usr/src/uts/common/os/cyclic.c#L44zooooooooo: thought this was rust typescript at first 😳DanCrossNYC: Dunno... I missed it. 🙂ahl: * Starting in about 1994, chip architectures began specifying high resolution * timestamp registers. As of this writing (1999), all major chip families * (UltraSPARC, PentiumPro, MIPS, PowerPC, Alpha) have high resolution * timestamp registers, and two (UltraSPARC and MIPS) have added the capacity * to interrupt based on timestamp values. These timestamp-compare registers * present a time-based interrupt source which can be reprogrammed arbitrarily * often without introducing error. Given the low cost of implementing such a * timestamp-compare register (and the tangible benefit of eliminating * discrete timer parts), it is reasonable to expect that future chip * architectures will adopt this feature. aka_pugs: Bryan's TSC is overflowing.DanCrossNYC: That's Tom.DanCrossNYC: Riding in with the cavalry.aka_pugs: Good session.ahl: Thanks...