SuperTuner Technical Documentation

1. What SuperTuner is

SuperTuner is a chromatic instrument tuner combined with a polyphonic reference-tone generator, built specifically for iPhone and iPad. It is designed for serious musicians who need a tuner that is accurate, fast, robust in noisy environments, and respectful of the iOS device's audio hardware limitations.

The product is intentionally small and focused. It does one job, tuning instruments, and pairs that with a high-fidelity reference-tone surface so the user can sound out chords, intervals, or unison drones against their instrument. It is not a recording app, DAW, effects suite, looper, or multi-tool.

What's in the box, conceptually:

A chromatic pitch detector accurate across a four-octave musical range, designed to track low-bass fundamentals (down to ~30 Hz) and to recover gracefully from noisy, harmonic-rich, or partially-masked signals.
A three-voice polyphonic reference-tone surface with strum-style decay and per-finger touch tracking. It plays sustained reference pitches for any note in the active tuning preset.
Three primary visual meters (LED arc, chromatic note wheel, stroboscopic harmonic display) and three sub-meter readouts (chromatic strip, info bar, ±100 cents fine-tune ruler). The meters can be paired or cycled independently.
13 instrument groups containing 100+ tuning presets, from standard guitar through orchestral strings, brass, woodwinds, and a wide collection of world and folk instruments (sitar, bouzouki, charango, oud, dilruba, esraj, and more).
Ten reference-pitch calibration options: nine fixed presets (415 Baroque through 444 European Bright) plus a free 400 to 480 Hz slider.
A skeuomorphic CRT-style visual design housed in a physical-looking device chassis. This is the PeaceDrone hardware aesthetic: vintage gear language, not a glassy modern app.
A power button that fully halts audio processing while keeping the app open, for thermal and battery management during long tuning sessions.

2. Privacy and data handling

SuperTuner makes the following hard commitments, all verifiable by reading the codebase:

No analytics SDKs: no Firebase, no Mixpanel, no Amplitude, no Apple Analytics opt-in. No event tracking of any kind.
No network calls. The codebase contains no URLSession, no HTTP client, no networking imports. The app cannot phone home because it has no mechanism to.
No third-party SDKs or trackers. No Swift Packages, no CocoaPods, no Carthage. Every line of audio DSP and UI code is in-tree.
No subscriptions, no in-app purchases, no advertising.
No user account. There is nothing to sign up for.
No backups, no cloud sync, no shared state across devices. Settings are persisted locally only, via standard UserDefaults / @AppStorage.

The only system permission SuperTuner requests is microphone access, used solely for live pitch detection on the audio thread. Audio is processed in real time and is not buffered to disk, uploaded, or transmitted anywhere.

This stance is structural. To "add analytics later" would require importing a networking framework that currently has no presence in the project.

3. User-facing features

Pitch detection

When the user plays a note, SuperTuner displays:

The note name (e.g. A4, C#3).
The detected frequency in Hz, smoothed for stability.
The cents offset from the target preset note (e.g. +12.3¢ or −4.7¢).
An in-tune indicator when pitch locks within ±5 cents of the target.

The displayed pitch is latched. Once a note is identified, the display holds on it until either silence is detected for ~200 ms or a different note is confirmed by several consecutive detection frames. This makes the readout stable enough to read against rather than flickering between candidates.

When a non-chromatic tuning preset is active, the target note is the nearest valid note in the preset. For example, in Guitar Standard (E A D G B E), if the user plays something at 110 Hz the detector recognises it as A2 (a low E2 being too far off) and shows cents offset from A2, not from the chromatic nearest semitone. This means the cents reading is always relative to the note the user is actually trying to tune.

The three primary meters

A "Mode" button cycles through three paired meter combinations. Each combination links an upper meter (the large CRT display) with a lower sub-meter (the small bar below). Tapping the upper meter or the lower sub-meter individually also cycles only that face.

Mode	Upper meter	Lower sub-meter
1	LED arc	Chromatic strip
2	Note wheel	Fine-tune ruler (±100¢)
3	Stroboscope	Info bar

LED arc meter

A 19-segment arc of LED-style lights, with 9 red on each side and a central green. The user reads tuning by which side lights up and how far out: no red segments lit means within ±5 cents; a single segment lit next to the centre indicates a small adjustment is needed; more segments mean a larger correction. The centre LED glows white when the note is in tune.

This is the classic LED tuner readout, rendered using cached arc-length geometry so the segments stay correctly placed on the curved CRT chassis regardless of device size.

Note wheel

A large rotating chromatic dial. The full circle holds all twelve semitones; the visible arc shows the seven nearest semitones around the currently-detected note. The wheel "rotates" continuously as pitch drifts, with the centre triangle indicator at twelve o'clock pointing at the current note. The target note glows white when in-tune.

The wheel's geometry is mathematically locked to the CRT chassis shape itself: the rim lines, label baselines, and tick fan all live on parallel translations of the CRT chassis arch curve. Tick lines fan radially toward a virtual wheel centre below the canvas, so every label and its centre tick lie on the same radial spoke, verifiable by eye at any rotation.

Stroboscopic meter

Four horizontal harmonic bands. Each band's brightness tracks the energy at its respective harmonic of the target note (1st, 2nd, 3rd, 4th). Each band's horizontal motion shows the phase deviation of that harmonic from perfect tuning. When the note is in tune, all bands stop moving. Sharp pitch makes bands drift one direction; flat pitch the other.

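The readout is strobe-style, but bounded by a photosensitivity safeguard: phase velocity is capped at 120 rad/sec, which is 120 / 2π ≈ 19 cycles per second of band drift. Note that 19 Hz falls inside the commonly cited 3 to 30 Hz photosensitive-epilepsy risk window rather than above it, so the cap limits how fast the bands can scroll; it does not by itself move the motion outside that window.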

The three sub-meters

Chromatic strip

A horizontal strip showing the nine nearest chromatic notes around the target preset note. Two triangle indicators (top and bottom) glide horizontally to show where the detected pitch sits relative to the target. When the indicators sit dead-centre and turn solid white, the note is in tune.

Info bar

A three-column readout:

Left: Detected frequency in Hz (e.g. 440.2 Hz)
Centre: Note name with flat/sharp triangle pair indicators that light progressively as deviation grows (0 within ±5¢, 1 at 5 to 25¢, 2 at 25+¢)
Right: Cents offset (e.g. +12.5¢)

Fine-tune ruler

A static ±100 cents graduated scale with a moving vertical line + triangle indicator. The graduation marks have a hierarchy of heights so the player can read at a glance: tallest at 0¢ (centre), then 25¢ multiples, then 10¢, then 5¢, then individual cents. The indicator pins to the edge at ±100¢ rather than disappearing off-screen.

Polyphonic ribbon controller

The bottom of the screen is a touch-sensitive "ribbon" with one key per note in the currently-active tuning preset. Tap a label to latch a reference tone. It plays continuously until tapped again. Slide a finger across the ribbon to play notes transiently as you go (strum-style). Multiple fingers play multiple notes simultaneously.

The ribbon enforces a hard three-voice cap. Latching a fourth note (or moving a transient touch onto a fourth distinct key) automatically releases the oldest latched note. The UI updates to reflect this immediately. What you see lit is exactly what is sounding.

Latching is deliberately gated to tap-on-label: a quick press-and-release on the visible note-name area. Sliding, dragging, or holding does not latch. It strums or sustains. This separation prevents accidental latches during expressive finger gestures.

Multi-touch is fully independent. Putting a second finger on the ribbon while the first is held does not interrupt or modify the first finger's behaviour in any way. Each touch is tracked individually from the moment it lands to the moment it lifts, with no inter-touch interference.

Mode and Power buttons

Mode cycles the meter+sub-meter combination (see "The three primary meters" above). Tapping the meter face directly also advances the upper meter on its own, and tapping the sub-meter advances only the sub-meter.
Power halts audio processing while keeping the app open. The CRT display goes dark, the ribbon dims, the Power button glows red. The audio engine fully stops after a 300 ms fade so no clicks are audible mid-strum, and CPU/battery drop to baseline. Tapping Power again silently restarts the engine. Useful for putting the app aside between songs without burning battery or thermal headroom.

Instrument presets

13 instrument groups containing 100+ tunings:

Group	Examples
Open/Chromatic	Latches any note across the chromatic range
Guitar	Standard EADGBE, Drop D, DADGAD, Open G, Open D, Nashville, math-rock variants. 17 tunings total.
Extended Guitar	Baritone B standard, A standard, Drop A; 7- and 8-string
Bass Guitar	4/5/6-string standard, Eb, D, Drop D
Orchestral String	Violin, viola, cello, double bass (solo and standard)
Mediterranean Folk	Mandolin, mandola, Irish bouzouki, Greek bouzouki, bandurria
Ukulele	Soprano, tenor, baritone, U-Bass
Banjo	Open G, Sawmill, C, Double-C, Double-D, A tuning, plectrum, tenor
Steel Guitar	Lap C6, E7, Open E, Open D, dobro, pedal E9, pedal C6
Latin Folk	Charango, ronroco, cuatro (Venezuelan, Puerto Rican), jarana, cavaquinho, and others
Indian Classical	Sitar (Kaharaj, Gandhar), tanpura, sarod, surbahar, veena, dilruba, esraj, erhu, sarangi
Lute	Renaissance, baroque, theorbo, vihuela, cittern, oud (Arabic, Turkish)
Orchestral Woodwind	Flute, piccolo
Reed	Oboe, English horn, clarinet (Bb, A), bass clarinet, bassoon, contrabassoon, saxes (soprano, alto, tenor, baritone)
Brass	Trumpet, cornet, French horn, trombone (tenor, bass), euphonium, tuba (Bb, C)

Each tuning preset stores its open-string note set as MIDI numbers. The detector uses this set to compute the target note (the preset note nearest to the detected pitch), which drives the cents-offset readout, the chromatic strip, and the fine-tune ruler. The Open/Chromatic preset disables this constraint and reports the nearest chromatic semitone instead.
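
The target-note lookup described above can be sketched in a few lines. This is a minimal illustration, not the app's code: the function names, the `guitarStandard` constant, and the choice to measure "nearest" in cents (log-frequency distance) are assumptions.

```swift
import Foundation

/// Frequency of a MIDI note under a given reference pitch for A4 (MIDI 69).
func frequency(ofMIDINote note: Int, referenceA4: Double = 440.0) -> Double {
    referenceA4 * pow(2.0, Double(note - 69) / 12.0)
}

/// Signed offset in cents from `target` to `detected` (positive means sharp).
func centsOffset(from target: Double, to detected: Double) -> Double {
    1200.0 * log2(detected / target)
}

/// Picks the preset note closest to the detected pitch (distance measured in
/// cents) and returns it with the signed cents offset. Returns nil when the
/// preset is empty or the detected frequency is not positive.
func nearestPresetNote(toHz detected: Double,
                       preset: [Int],
                       referenceA4: Double = 440.0) -> (midi: Int, cents: Double)? {
    guard detected > 0 else { return nil }
    let scored = preset.map { note in
        (midi: note,
         cents: centsOffset(from: frequency(ofMIDINote: note, referenceA4: referenceA4),
                            to: detected))
    }
    return scored.min(by: { abs($0.cents) < abs($1.cents) })
}

// Guitar Standard as MIDI numbers: E2 A2 D3 G3 B3 E4.
let guitarStandard = [40, 45, 50, 55, 59, 64]
```

With this sketch, a 110 Hz input against `guitarStandard` resolves to MIDI 45 (A2) with a zero offset, matching the Guitar Standard example earlier in this section. Passing the reference pitch as a parameter lets the same function serve every calibration preset.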

The currently-selected preset persists between sessions.

Reference pitch calibration

Ten options for the tuning standard (A above middle C): nine factory presets plus a free slider.

Preset	Hz	Use
415 Baroque	415.0	Period instrument performance
416 Half-Step Down	416.0	Half-step-down ensembles
430.54 Scientific	430.54	Scientific pitch (C4 = 256 Hz)
432 Verdi	432.0	Verdi tuning, "natural" pitch advocates
440 Standard	440.0	ISO concert pitch
441 Boston Symphony	441.0	Boston Symphony Orchestra
442 NY Philharmonic	442.0	NY Philharmonic, many European orchestras
443 European Concert	443.0	Some European orchestras
444 European Bright	444.0	Higher-pitched European concert standard
Free	400 to 480	Custom value via slider, 0.1 Hz granularity

Both the selected preset and the current Hz value are persisted between sessions.

Adaptive layout

Four distinct layouts are selected automatically based on the device idiom and the viewport's actual width, not just the device class. This means iPad Split View / Slide Over windows that drop below the iPad width threshold get the iPhone layout, which scales down cleanly.

iPhone portrait: vertical stack: header, meter, sub-meter, mode/power buttons, ribbon
iPhone landscape: two-column layout matching the iPad landscape proportions
iPad portrait: vertical stack with iPad-specific scaling and the wider one-line SuperTuner wordmark
iPad landscape: two-column with the meter + buttons on the left and the ribbon on the right

The CRT chassis shape is the same in every layout; only its size and the surrounding chrome differ.

Splash screen and chime

On launch, a brief splash sequence runs:

The SuperTuner two-line wordmark and the PeaceDrone pd lettermark fade in (~0.4 s ease-out).
The audio engine initialises (microphone permission request on first launch, then DSP graph build).
A 3-note ascending arpeggio chime plays at low volume: E5, G5, C6 (the 3rd, 5th, and octave root of C major).
The splash dismisses to the main tuner.

The chime serves two purposes: it gives the user a distinctive audio identity ("the PeaceDrone audio logo") and it warms up the speaker DAC and microphone path before the user reaches the tuner, eliminating the click-or-pop that some iOS audio sessions produce on first activation.

The three note onsets span ~254 ms in total; the splash dismisses ~500 ms after the last note's release tail completes.

4. System architecture

SuperTuner is a SwiftUI app with an Objective-C++ bridge to the audio engine. The high-level dependency graph is small:

SwiftUI views (MainTunerView, SplashView, meter components)
        │
        ▼
AudioManager (Swift, ObservableObject)
        │
        ├──► FrequencyDetector (Swift) ──► PitchEstimator + SpectralHarmonicFingerprint
        │
        ├──► ToneGenerator (Swift, AVAudioSourceNode render callback)
        │
        └──► AudioEngine (Swift) ──► SuperTunerBridge (ObjC++) ──► AVAudioEngine + RemoteIO

The bridge layer is intentionally thin. It exists for one reason: AVAudioEngine requires careful configuration (session category, preferred I/O buffer duration, hardware sample-rate negotiation, render-thread-safe buffer pooling) that is cleaner to express in Objective-C++ with direct access to CoreAudio C structures than to wrestle out of pure Swift.

Above the bridge, Swift owns:

The detection state machine: ring buffer, smoothing filters, latch logic, display throttling.
The synthesis engine: voice management, oscillators, filters, voicing chain, limiter.
The UI state: currently-selected preset, reference pitch, mode, latched ribbon keys.

Below the bridge, CoreAudio owns:

The real-time render thread and its strict no-allocation, no-locking guarantees.
The microphone input tap and hardware route management.

There are no external libraries used for DSP. Every line of audio processing, including pitch detection, synthesis, filtering, and limiting, is implemented in-tree, by hand, primarily in Swift (with Apple's Accelerate framework / vDSP used for vectorised inner loops in the autocorrelation core).

5. Audio engine

Configuration

Default sample rate: 48 000 Hz, with hardware-rate negotiation on engine start.
I/O buffer duration: 32 ms (1536 frames at 48 kHz). This balances responsiveness against battery consumption; it's the buffer size that gives the detector enough samples per analysis window without spinning the CPU more than necessary.
AVAudioSession category: PlayAndRecord with MixWithOthers and DefaultToSpeaker options. This means SuperTuner does not duck or mute other audio (so the user can have a metronome or backing track playing while they tune), and the reference tone outputs through the built-in speaker by default rather than the earpiece.
AVAudioSession mode: Default (not Measurement). This permits simultaneous I/O without the harsher gain and routing constraints of measurement mode.
Haptics: Enabled during recording (iOS 13+), so the ribbon's per-touch haptic taps work even while the microphone is open.

Lifecycle

Initialise: build the AVAudioEngine, configure the session, attach the tone-generator source node, install the input tap. This step does not start the engine; if anything fails, the function returns false and the UI surfaces the failure cleanly.
Start: boot the engine, begin pulling audio frames from the microphone and dispatching them to the detector.
Stop: halt the engine. The detector's 200 ms silence timeout subsequently releases any held display latch.
Shutdown: full cleanup, including an explicit FrequencyDetector.reset() so no stale "last detected" note freezes on screen if the user comes back later.

Render-thread safety

The bridge maintains a pre-allocated pool of three AVAudioPCMBuffer instances and rotates through them on the input tap, so the render callback never has to allocate memory. Frame dispatch to the detector happens via a serial dispatch queue, not the render thread directly, so the detector's Swift code can do its work (including allocations and array creation) without violating real-time constraints.

Interruption and route changes

The tone generator subscribes to AVAudioSession.routeChangeNotification and updates an internal usingBuiltInSpeaker flag. This drives a route-aware voicing chain (see section 8): when audio is routed to the built-in speaker, a 4th-order high-pass at 140 Hz engages to protect the speaker from cone-excursion distortion. When the user plugs in headphones, switches to AirPlay, or connects Bluetooth/USB audio, the high-pass disengages and the full bandwidth is restored. The switch is automatic and silent.

6. Pitch-detection pipeline

This is the heart of the app and the part that most differentiates it from typical "just call FFT" tuner code. The detector is designed around three goals:

Accuracy under noise. Real-world tuning happens in living rooms, rehearsal spaces, and stages with ambient noise. The detector tolerates considerable noise without losing its lock.
Robustness to weak fundamentals. Many real instruments, such as open low bass strings, a sax low Bb, or a cello C2, have a fundamental that's quiet or partially masked by the room. The detector recovers the perceived pitch from upper harmonics when needed.
Snappy attack, stable sustain. The display reacts immediately to a fresh attack but does not jitter during a sustained note.

Signal flow

microphone samples (16-bit / float)
       │
       ▼
ring buffer (2048 samples, 512-sample hop)
       │
       ▼
latch-aware Butterworth low-pass (only if latched to ≤400 Hz fundamental)
       │
       ▼
HybridEstimator (YIN + per-lag-normalised autocorrelation)
       │
       ▼  (if both estimators fail to agree)
Spectral Harmonic Fingerprint rescue
       │
       ▼
harmonic gate (enforced on new notes only)
       │
       ▼
display latching + cents-offset smoothing
       │
       ▼
UI update (throttled to display rate)

Each stage exists to handle a specific kind of failure mode that the simpler stages can't.

Time-domain core: autocorrelation and YIN

Pitch detection happens primarily in the time domain via two estimators that run on the same windowed signal.

Per-lag normalised autocorrelation: Computed as Σ x[i]·x[i+L] / sqrt(E1·E2), where E1 and E2 are the segment energies at the lag-zero and lag-L positions. This gives a value in [-1, 1] that is invariant to overall signal amplitude. The first peak that exceeds 90% of the global maximum wins; the first peak is used rather than the largest because choosing the largest can bias toward octave-down errors. Harmonic coherence is then scored as a weighted sum of autocorrelation values at lags 2L, 3L, 4L, 5L, which gives a quality measurement separate from peak height.
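
A minimal sketch of the per-lag normalisation and the first-strong-peak rule follows. It uses plain loops for readability, whereas the text says the real inner loops are vectorised with vDSP. The function names and the `minLag`/`maxLag` search bounds are assumptions.

```swift
import Foundation

/// Per-lag normalised autocorrelation of `x` at `lag`, in [-1, 1].
func normalisedAutocorrelation(_ x: [Float], lag: Int) -> Float {
    let n = x.count - lag
    guard lag >= 0, n > 0 else { return 0 }
    var cross: Float = 0, e1: Float = 0, e2: Float = 0
    for i in 0..<n {
        cross += x[i] * x[i + lag]
        e1 += x[i] * x[i]               // energy of the lag-zero segment
        e2 += x[i + lag] * x[i + lag]   // energy of the lag-L segment
    }
    let denom = (e1 * e2).squareRoot()
    return denom > 0 ? cross / denom : 0
}

/// Returns the first lag in minLag...maxLag that is a local peak and reaches
/// at least `fraction` of the global maximum over that range.
func firstStrongPeak(_ x: [Float], minLag: Int, maxLag: Int,
                     fraction: Float = 0.9) -> Int? {
    guard minLag >= 1, maxLag > minLag, maxLag < x.count else { return nil }
    let r = (minLag...maxLag).map { normalisedAutocorrelation(x, lag: $0) }
    guard let globalMax = r.max(), globalMax > 0 else { return nil }
    for k in 1..<(r.count - 1)
    where r[k] >= fraction * globalMax && r[k] >= r[k - 1] && r[k] >= r[k + 1] {
        return minLag + k
    }
    return nil
}
```

Because a periodic signal also peaks at twice and three times its period, scanning from short lags upward and stopping at the first qualifying peak is what keeps the estimate on the true period rather than a multiple of it.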

YIN: A specific algorithm in the autocorrelation family that is sharper than naive autocorrelation at the cost of being more sensitive to noise. The implementation:

Computes the difference function d[τ] = E1 + E2 − 2·R[τ] using vectorised vDSP.
Converts to the cumulative mean normalised difference (CMND): d'[τ] = d[τ] / ((1/τ)·Σ d[j]), with the sum taken over j = 1 to τ.
Looks for the first τ where the CMND dips below an absolute threshold of 0.15 and sits at a local minimum.
If no candidate passes, applies a relaxed threshold of 0.40 to the global argmin as a rescue.
Refines the chosen τ with 3-point parabolic interpolation for sub-sample precision.

YIN returns a confidence score equal to 1 − CMND[τ_selected], scaled into [0, 1] where 1.0 represents perfect periodicity.
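
The YIN steps listed above can be sketched as follows. This is a simplified illustration under stated assumptions: the difference function is computed directly rather than through the vDSP-based E1 + E2 − 2·R[τ] identity, the relaxed 0.40 rescue pass is omitted, and the name `yinPeriod` and its signature are hypothetical.

```swift
import Foundation

/// Minimal YIN sketch: difference function, CMND, first dip below the
/// threshold, parabolic refinement. Returns the period in samples and a
/// confidence in [0, 1], or nil when no lag passes the threshold.
func yinPeriod(_ x: [Float], maxLag: Int,
               threshold: Float = 0.15) -> (period: Float, confidence: Float)? {
    let w = x.count - maxLag          // integration window length
    guard maxLag >= 2, w > 0 else { return nil }

    // d[tau] = sum over the window of (x[i] - x[i + tau])^2.
    var d = [Float](repeating: 0, count: maxLag + 1)
    for tau in 1...maxLag {
        var sum: Float = 0
        for i in 0..<w {
            let diff = x[i] - x[i + tau]
            sum += diff * diff
        }
        d[tau] = sum
    }

    // CMND: d'[tau] = d[tau] / ((1 / tau) * sum of d[1...tau]).
    var cmnd = [Float](repeating: 1, count: maxLag + 1)
    var running: Float = 0
    for tau in 1...maxLag {
        running += d[tau]
        cmnd[tau] = running > 0 ? d[tau] * Float(tau) / running : 1
    }

    // First tau below the threshold, walked down to its local minimum.
    var chosen: Int? = nil
    var tau = 2
    while tau < maxLag {
        if cmnd[tau] < threshold {
            while tau + 1 < maxLag && cmnd[tau + 1] < cmnd[tau] { tau += 1 }
            chosen = tau
            break
        }
        tau += 1
    }
    guard let t = chosen else { return nil }

    // 3-point parabolic interpolation around the chosen minimum.
    let a = cmnd[t - 1], b = cmnd[t], c = cmnd[t + 1]
    let denom = a - 2 * b + c
    let shift: Float = denom != 0 ? 0.5 * (a - c) / denom : 0
    return (period: Float(t) + shift, confidence: max(0, min(1, 1 - b)))
}
```

The detected frequency is then the sample rate divided by the returned period. Silence produces no result because every CMND value stays at 1.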

Latch-aware threshold relaxation: When the detector is already latched to a note, YIN searches a narrow window around the expected lag with a relaxed threshold of 0.25 (instead of 0.15). This lets the detector track a note through its natural decay without lowering the gate for new notes. The relaxed threshold applies only in the window around the latched lag.

The hybrid combiner

Below about 95 Hz, YIN's CMND normalisation becomes unreliable. There are simply fewer cycles per analysis window to compute meaningful statistics on. Autocorrelation does not have this problem (it's a structural measurement, not a statistical one) but is more vulnerable to room noise. The detector combines them across three frequency bands:

Band	Strategy
Above 95 Hz	Trust YIN directly. Fastest response, snappiest attack.
65 to 95 Hz (soft)	Accept YIN if confidence ≥ 0.88; otherwise require autocorrelation to agree on MIDI class.
Below 65 Hz (strict)	Require both YIN and autocorrelation to return a result and round to the same MIDI note.

This means that at high frequencies, including guitar and most things above bass, the detector is snappy. At very low frequencies, including the bottom of bass, the lowest notes of orchestral strings, and tuba, it is deliberately more cautious, requiring agreement across two independent measurement methods before declaring a pitch.
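
The band table above reduces to a small decision function. In this sketch the `Estimate` type and `combine` function are hypothetical, the band is chosen from YIN's own estimate, and a missing YIN result simply returns nil (leaving the case to the rescue path); those choices are assumptions, since the text does not spell them out.

```swift
import Foundation

struct Estimate {
    let hz: Double
    let confidence: Double
    /// Nearest MIDI note number, assuming A4 = 440 Hz.
    var midi: Int { Int((69.0 + 12.0 * log2(hz / 440.0)).rounded()) }
}

/// Combines YIN and autocorrelation according to the three frequency bands.
func combine(yin: Estimate?, autocorrelation ac: Estimate?) -> Estimate? {
    guard let y = yin else { return nil }      // no YIN result: nothing to combine
    if y.hz >= 95 { return y }                 // above 95 Hz: trust YIN directly
    let agrees = ac.map { $0.midi == y.midi } ?? false
    if y.hz >= 65 {                            // soft band: confident YIN or agreement
        return (y.confidence >= 0.88 || agrees) ? y : nil
    }
    return agrees ? y : nil                    // strict band: both must agree
}
```

For example, a low E1 near 41 Hz is reported only when both estimators round to MIDI 28, while an A2 at 110 Hz passes on YIN alone.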

The latch-aware low-pass filter

When the detector is locked onto a note at or below 400 Hz, a 2nd-order Butterworth low-pass filter is applied to the analysis window before YIN runs. The filter:

Has a cutoff of 12× the latched fundamental, clamped to 4 kHz maximum.
Runs forward then backward over the buffer (zero-phase), which doubles the effective rolloff to −24 dB/octave.
Is bypassed entirely for notes above 400 Hz (where high-frequency noise isn't the problem).
Is only applied to the YIN input, not to the Spectral Harmonic Fingerprint rescue path (which needs the raw upper spectrum).

For a latched A2 (110 Hz), the cutoff sits at 1320 Hz: high enough to pass the first ~12 harmonics, low enough to substantially attenuate the broadband noise that would otherwise raise YIN's CMND at long lags and de-lock the detector.
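
A minimal sketch of the cutoff rule and the forward-backward pass, assuming the widely used RBJ cookbook form for the Butterworth biquad. The `Biquad` type and function names are hypothetical, and a zero initial filter state is assumed at the start of each pass.

```swift
import Foundation

/// Second-order Butterworth low-pass section (RBJ cookbook form, Q = 1/sqrt(2)).
struct Biquad {
    var b0, b1, b2, a1, a2: Double

    static func butterworthLowPass(cutoff: Double, sampleRate: Double) -> Biquad {
        let w0 = 2.0 * Double.pi * cutoff / sampleRate
        let q = 1.0 / 2.0.squareRoot()
        let alpha = sin(w0) / (2.0 * q)
        let cosw = cos(w0)
        let a0 = 1.0 + alpha
        return Biquad(b0: (1 - cosw) / 2 / a0, b1: (1 - cosw) / a0, b2: (1 - cosw) / 2 / a0,
                      a1: -2 * cosw / a0, a2: (1 - alpha) / a0)
    }

    /// One causal pass (direct form I) starting from zero state.
    func apply(_ x: [Double]) -> [Double] {
        var y = [Double](repeating: 0, count: x.count)
        var x1 = 0.0, x2 = 0.0, y1 = 0.0, y2 = 0.0
        for i in x.indices {
            let out = b0 * x[i] + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
            x2 = x1; x1 = x[i]; y2 = y1; y1 = out
            y[i] = out
        }
        return y
    }
}

/// Cutoff rule from the text: 12x the latched fundamental, capped at 4 kHz.
/// nil means "bypass" (latched note above 400 Hz).
func latchCutoff(forLatchedHz f0: Double) -> Double? {
    f0 <= 400 ? min(12 * f0, 4000) : nil
}

/// Forward then backward filtering of one analysis window (zero phase).
func zeroPhaseLowPass(_ window: [Double], latchedHz: Double,
                      sampleRate: Double = 48_000) -> [Double] {
    guard let fc = latchCutoff(forLatchedHz: latchedHz) else { return window }
    let lp = Biquad.butterworthLowPass(cutoff: fc, sampleRate: sampleRate)
    let forward = lp.apply(window)
    return Array(lp.apply(Array(forward.reversed())).reversed())
}
```

Running the same section in both directions cancels its phase shift, so the period structure that YIN measures is not skewed, while the magnitude response is applied twice.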

This is the single largest behavioural improvement over a vanilla YIN implementation: tracking through bass decay in noisy environments.

Spectral Harmonic Fingerprint rescue

When both YIN and autocorrelation fail to agree on a pitch, a third path runs. It's specifically designed for situations where the fundamental is masked or absent but the upper harmonics are clearly present. For instance, a low bass note where the room's noise floor sits above the fundamental's amplitude, or a sax low Bb where the body resonance suppresses the actual lowest partial.

The implementation uses Goertzel single-frequency detectors (essentially per-frequency IIRs that compute a DFT bin at O(N) instead of O(N log N)) to probe harmonics 2 through 8 of each candidate pitch. For each candidate:

SNR at each harmonic is measured against the ambient power half a semitone away (the "noise reference band"). Harmonics must be 3× above the noise band to count as present.
Power-line rejection discards probes within ±3 Hz of 50 Hz, 60 Hz, and their harmonics. Up to three out of seven probes may fall in power-line bands before the candidate is rejected entirely. This prevents a noisy power line from being identified as a note.
Decay validation checks that each higher harmonic is no more than 2× the amplitude of the one below it. Real instrumental harmonics decay; rising harmonics indicate noise, not a note. Up to two violations across the 7-harmonic stack are tolerated.
Composite coherence score is a weighted sum: 40% × (number of present harmonics / 7) + 40% × clamped total SNR + 20% × (1 − decay-violation penalty).
Temporal persistence boost adds 0.06 per prior frame the same MIDI candidate appeared in a rolling 8-frame window, capped at +0.25. Sustained candidates are more trustworthy than one-shot blips.

The minimum acceptance score is 0.35. Candidates that pass go forward as the detected pitch.
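
The building blocks of the rescue path can be sketched as below: a Goertzel probe, the composite score, and the persistence boost. All names are hypothetical, and how the SNR term is clamped and how the decay penalty is normalised to [0, 1] are assumptions, since the text gives only the weights.

```swift
import Foundation

/// Goertzel power of `x` at a single probe frequency (O(N) per probe).
func goertzelPower(_ x: [Float], frequency: Double, sampleRate: Double) -> Double {
    let coeff = 2.0 * cos(2.0 * Double.pi * frequency / sampleRate)
    var s1 = 0.0, s2 = 0.0
    for sample in x {
        let s0 = Double(sample) + coeff * s1 - s2
        s2 = s1
        s1 = s0
    }
    return s1 * s1 + s2 * s2 - coeff * s1 * s2
}

/// Composite coherence: 40% harmonic count, 40% SNR term, 20% decay term.
/// `clampedSNR` and `decayPenalty` are assumed to be pre-scaled to [0, 1].
func coherenceScore(presentHarmonics: Int, clampedSNR: Double, decayPenalty: Double) -> Double {
    let snr = min(max(clampedSNR, 0), 1)
    let penalty = min(max(decayPenalty, 0), 1)
    return 0.4 * Double(presentHarmonics) / 7.0 + 0.4 * snr + 0.2 * (1 - penalty)
}

/// Temporal persistence boost: 0.06 per prior frame, capped at +0.25.
func persistenceBoost(priorFrames: Int) -> Double {
    min(0.06 * Double(max(priorFrames, 0)), 0.25)
}

/// Acceptance test against the 0.35 minimum score.
func isAccepted(score: Double, priorFrames: Int) -> Bool {
    score + persistenceBoost(priorFrames: priorFrames) >= 0.35
}
```

A candidate with only three of seven harmonics present and a weak SNR term can still be accepted once it has persisted for a few frames, which is the behaviour the persistence boost is meant to provide.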

The persistence-and-decay validation is the part that lets the rescue work reliably in messy real-world recordings rather than just clean test tones.

The harmonic gate (for new notes only)

After a pitch candidate is selected, one final gate runs before the candidate is accepted as a new note (already-latched notes bypass it, since their harmonics weaken naturally during decay):

Frequency	Required harmonic evidence
< 80 Hz	≥ 2 harmonics present or harmonic score ≥ 0.35
80 to 130 Hz	≥ 1 harmonic present or harmonic score ≥ 0.20
130 to 200 Hz	≥ 1 harmonic present or harmonic score ≥ 0.15
> 200 Hz	No gate (YIN + AC agreement is sufficient)

Strong autocorrelation peaks (≥ 0.90) also bypass the gate. These come from pure-sine sources like a tone generator, where harmonic evidence is genuinely absent and shouldn't be required.

The gate rejects spurious low-frequency "notes" that have no harmonic structure behind them: HVAC rumble, mic-stand thumps, table bumps.
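
The gate table and the pure-sine bypass translate directly into a predicate. The function name and parameter names in this sketch are hypothetical; the thresholds are the ones quoted above.

```swift
/// Returns true when a new note candidate may be accepted.
/// Already-latched notes are assumed to skip this check entirely.
func passesHarmonicGate(hz: Double,
                        harmonicsPresent: Int,
                        harmonicScore: Double,
                        autocorrelationPeak: Double) -> Bool {
    if autocorrelationPeak >= 0.90 { return true }   // pure-sine bypass
    if hz > 200 { return true }                      // no gate above 200 Hz
    if hz < 80 { return harmonicsPresent >= 2 || harmonicScore >= 0.35 }
    if hz < 130 { return harmonicsPresent >= 1 || harmonicScore >= 0.20 }
    return harmonicsPresent >= 1 || harmonicScore >= 0.15   // 130 to 200 Hz
}
```

A 50 Hz thump with no harmonics and a weak autocorrelation peak is rejected, while a clean 50 Hz sine from a tone generator passes through the bypass.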

Smoothing and noise floor

Frequency smoothing: Volume-weighted alpha blending. Large jumps (>6%) snap immediately so the display tracks fresh attacks. Small jumps (<2%) get heavy smoothing for stable sustain.
Noise floor: A rolling ambient-level tracker with asymmetric attack/release (fast on quiet, slow on loud) keeps the detector calibrated to the room. The effective noise floor is ambient + 12 dB, clamped to [-60, -30] dB. There's no hard threshold gate. The autocorrelation accept threshold rejects noise structurally, not by level.
Cents smoothing: Computed relative to the latched note (not the nearest chromatic semitone), so the cents reading doesn't drift mid-tuning when the player approaches centre.

Display latching

The displayed note is held stable by a small consecutive-detection state machine. By default, three consecutive detections of a new candidate are required before the display switches. For octave-distance jumps (≥11 semitones) the threshold rises to eight detections. This resists false octave errors caused by decay changing the balance between the second harmonic and the fundamental.

The preset further biases the threshold:

If the current latched note is in the preset and the proposed new note is an octave above (and not in the preset), the threshold rises to 12 (highly resistant to false octave-up errors).
If the current latched note is not in the preset but the proposed octave-down candidate is in the preset, the threshold drops to 3 (eager to fall into a preset note when the data supports it).

A 200 ms silence timeout releases the display when no pitch has been detected; the screen returns to ---.
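
The latching rules above fit in a small state machine. This is a sketch only: the `DisplayLatch` type is hypothetical, and treating any jump of 11 or more semitones upward as "an octave above" (and downward as "octave-down") is an assumption about how the preset bias is applied.

```swift
/// Consecutive-detection latch using the thresholds quoted above.
struct DisplayLatch {
    private(set) var latched: Int? = nil      // MIDI note currently on screen
    private var candidate: Int? = nil
    private var streak = 0
    let preset: Set<Int>

    init(preset: Set<Int> = []) { self.preset = preset }

    /// Detections required before switching from the latched note to `note`.
    func requiredDetections(for note: Int) -> Int {
        guard let current = latched else { return 3 }
        let jump = note - current
        guard abs(jump) >= 11 else { return 3 }
        if jump > 0 && preset.contains(current) && !preset.contains(note) { return 12 }
        if jump < 0 && !preset.contains(current) && preset.contains(note) { return 3 }
        return 8
    }

    /// Feeds one detection frame and returns the note currently displayed.
    mutating func feed(_ note: Int) -> Int? {
        if note == latched { candidate = nil; streak = 0; return latched }
        if note == candidate { streak += 1 } else { candidate = note; streak = 1 }
        if streak >= requiredDetections(for: note) {
            latched = note; candidate = nil; streak = 0
        }
        return latched
    }

    /// Called when the silence timeout (200 ms in the text) expires.
    mutating func releaseAfterSilence() { latched = nil; candidate = nil; streak = 0 }
}
```

Any frame that confirms the latched note resets the candidate streak, so a new note has to win several frames in a row rather than accumulate votes over time.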

7. Reference tone synthesis and the ribbon controller

The polyphonic ribbon plays sustained reference tones for any note in the active tuning preset. The synthesis engine and the touch surface are independent layers.

Voice architecture

Each voice in the engine is a subtractive synthesis chain:

PolyBLEP band-limited sawtooth oscillator
        │
        ▼
2-pole state-variable lowpass filter (Andrew Simper TPT/ZDF form)
        │
        ▼
soft-clip saturation  (x / (1 + |x|))
        │
        ▼
ADSR-style fade envelope (attack, sustain, release)

The oscillator is a sawtooth, a tone rich in odd and even harmonics. A naive saw has harmonics that extend past Nyquist at any fundamental, and the resulting aliasing becomes increasingly audible as the fundamental rises; the implementation uses PolyBLEP to band-limit the discontinuity at the wrap point, suppressing audible aliasing across the musical range.
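
For reference, the widely used two-sample PolyBLEP correction looks like the sketch below. It is the textbook form, not necessarily the exact variant in the app, and the `SawOscillator` type is a hypothetical name.

```swift
/// PolyBLEP residual: a polynomial correction applied in the one sample
/// before and the one sample after the phase wrap.
/// `t` is the phase in [0, 1), `dt` the phase increment per sample.
func polyBLEP(_ t: Double, _ dt: Double) -> Double {
    if t < dt {                       // just after the wrap
        let u = t / dt
        return u + u - u * u - 1.0
    } else if t > 1.0 - dt {          // just before the wrap
        let u = (t - 1.0) / dt
        return u * u + u + u + 1.0
    }
    return 0.0
}

struct SawOscillator {
    var phase = 0.0

    /// Renders one sample of a band-limited rising saw within [-1, 1].
    mutating func next(frequency: Double, sampleRate: Double) -> Double {
        let dt = frequency / sampleRate
        let naive = 2.0 * phase - 1.0
        let sample = naive - polyBLEP(phase, dt)
        phase += dt
        if phase >= 1.0 { phase -= 1.0 }
        return sample
    }
}
```

The correction only touches the samples adjacent to the discontinuity, so the cost per sample stays close to that of a naive saw.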

The filter is a state-variable lowpass with:

Q = 1/√2 = 0.707… (Butterworth response: maximally flat passband, no resonant peak).
Cutoff = 2× the voice's fundamental, clamped between 220 Hz and 5000 Hz, and never above 45% of Nyquist.

The 2× key-tracking ratio means the timbre stays consistent wherever neither clamp is active, that is, for fundamentals between 110 Hz and 2.5 kHz. In that span the relative balance between fundamental, second harmonic, and upper harmonics is the same on a high E5 as on an A2. The 220 Hz minimum cutoff is a deliberate choice for bass voices: the fundamental of an E1 (41 Hz) sits well below the cutoff and the filter passes harmonics 2 through 5 (82, 123, 164, 205 Hz) into the speaker's good band, where the brain reconstructs the perceived pitch via the missing-fundamental psychoacoustic effect.

The saturator is a soft-clip x/(1+|x|), providing odd-harmonic warmth without harsh clipping artefacts. At the voice amplitude used (1.4), saturation is gentle: essentially linear at normal levels, contributing harmonic warmth only during peaks.
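
The filter and saturator stages described above can be sketched as follows. The coefficient layout follows the trapezoidal state-variable form published by Andrew Simper; the type and function names are hypothetical, and the 48 kHz default is taken from the engine configuration.

```swift
import Foundation

/// Key-tracked cutoff: 2x the fundamental, clamped to 220...5000 Hz and
/// never above 45% of Nyquist.
func voiceCutoff(fundamental: Double, sampleRate: Double = 48_000) -> Double {
    let tracked = min(max(2.0 * fundamental, 220.0), 5_000.0)
    return min(tracked, 0.45 * (sampleRate / 2.0))
}

/// Two-pole state-variable low-pass in the trapezoidal (TPT/ZDF) form.
struct SVFLowPass {
    private var ic1eq = 0.0, ic2eq = 0.0
    private let a1: Double, a2: Double, a3: Double

    init(cutoff: Double, sampleRate: Double, q: Double = 1.0 / 2.0.squareRoot()) {
        let g = tan(Double.pi * cutoff / sampleRate)
        let k = 1.0 / q
        a1 = 1.0 / (1.0 + g * (g + k))
        a2 = g * a1
        a3 = g * a2
    }

    mutating func process(_ v0: Double) -> Double {
        let v3 = v0 - ic2eq
        let v1 = a1 * ic1eq + a2 * v3
        let v2 = ic2eq + a2 * ic1eq + a3 * v3
        ic1eq = 2.0 * v1 - ic1eq
        ic2eq = 2.0 * v2 - ic2eq
        return v2                      // low-pass output
    }
}

/// Soft-clip saturator x / (1 + |x|); output stays strictly inside (-1, 1).
func softClip(_ x: Double) -> Double { x / (1.0 + abs(x)) }
```

For an E1 at about 41 Hz the cutoff function returns the 220 Hz floor, and for an E5 at about 659 Hz it returns roughly 1319 Hz, which is the key-tracking behaviour described above.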

Polyphonic voice management

The engine maintains 8 voice slots but caps simultaneously playing voices at 3. The remaining 5 slots are reserved for decaying release tails, so triggering a fresh voice never hard-kills another voice that is still audibly fading.

Voice stealing (when no idle slot is free): the engine takes the quietest decaying voice (lowest current fadeGain) and resets it to play the new note. Phase, filter state, and envelope are all reset on the steal so there's no carry-over from the previous voice.

Click-free same-frequency retrigger: if a voice is mid-release and the same frequency is requested again (the user re-taps the same key while it's still decaying), the engine lifts the fadeGain back into the attack phase without resetting phase or filter state. The result is a continuous, click-free re-trigger.

Envelope timing:

Attack: 25 ms (snappy)
Release: 250 ms (strum-like)

These are short enough to feel responsive on the ribbon but long enough that releases sound musical rather than abrupt.

The ribbon touch surface

The ribbon is built on a UIKit UIView with isMultipleTouchEnabled = true, wrapped in a SwiftUI UIViewRepresentable. SwiftUI's standard DragGesture does not handle multi-touch cleanly. Adding a second finger can end or modify the first finger's gesture in subtle ways. The custom UIKit layer eliminates this.

Each touch is identified by a fresh UUID from the moment it lands. Callbacks fire on the main thread:

onBegan(id, position): a finger went down
onMoved(id, position): a finger moved (only on actual movement)
onEnded(id, startPos, endPos): a finger lifted normally; both start and end positions are passed so the receiver can tell tap from drag
onCancelled(id): a system interruption (alert, palm rejection) cleared the touch; never treated as a latch

No UIGestureRecognizer is involved on the ribbon, so there is no recognizer arbitration that could steal one finger's behaviour because of what another finger is doing.

Latch logic

The ribbon UI maintains two independent state collections:

activeRibbonTouches: [UUID: Int]: currently-held touches and which key each is over
ribbonLatchedKeys: [Int]: explicitly latched keys, in oldest-first order

A key's voice is audible if any touch is on it or it is in the latched list. The voice starts when the number of holders for that key (touches plus latch) goes from zero to one, and stops when it returns to zero.

Latch detection is gated on three conditions, all of which must hold:

The touch began on the visible note-label region (not the chrome above).
The touch ended on the note-label region.
Total movement during the touch was less than 10 points.

If all three hold, the touch is interpreted as a tap on label and toggles the latch state for that key. Otherwise the touch is a strum (transient sustain that ends when the finger lifts).
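
The three-condition test and the latch toggle can be sketched as below. The `Point` type, the `isOnLabel` closure, and both function names are hypothetical stand-ins for whatever hit-testing the real view performs.

```swift
struct Point { var x: Double; var y: Double }

/// True when a finished touch counts as a tap on a label: it began on the
/// label region, ended on it, and moved less than `maxMovement` points.
func isTapOnLabel(start: Point, end: Point, totalMovement: Double,
                  isOnLabel: (Point) -> Bool, maxMovement: Double = 10.0) -> Bool {
    isOnLabel(start) && isOnLabel(end) && totalMovement < maxMovement
}

/// Toggles `key` in the oldest-first latched list.
func toggleLatch(_ key: Int, in latched: inout [Int]) {
    if let index = latched.firstIndex(of: key) {
        latched.remove(at: index)
    } else {
        latched.append(key)       // newest latch goes to the end
    }
}
```

Appending new latches at the end keeps the list in oldest-first order, so the first element is always the one to release when the voice cap is reached.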

Latching a currently-touched key does not add a new voice. It converts the existing transient voice into a sustained one. Voice stealing therefore fires only when a genuinely new distinct key enters the active set (via a new touch or a drag onto a different key), never when an existing touch is being converted to a latch. This is what allows three keys to be latched simultaneously without losing one to a phantom steal.

Voice stealing logic at the ribbon level

When adding a new distinct active key would push the count to four, the UI calls stealOldestRibbonLatchedIfAtCap(). This removes ribbonLatchedKeys.first (the oldest latched entry) and stops the associated voice if no touch is still holding it. The visible "lit" state of the displaced key clears immediately, so the UI never lies about what is actually playing.

This mirrors the audio engine's voice cap exactly. The audio engine would steal a voice anyway; doing it explicitly in the UI keeps the visual state consistent with what the speaker is producing.
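A rough model of the two collections and the steal rule is shown below. The property names follow the text; the method bodies, the adding: parameter, and the touchBegan helper are assumptions made for illustration, and the case where three held touches (with no latches) meet a fourth is deliberately left to the audio engine.

```swift
import Foundation

/// Minimal model of the ribbon state. Bodies are a sketch, not the shipped implementation.
struct RibbonState {
    static let voiceCap = 3
    var activeRibbonTouches: [UUID: Int] = [:]   // touch id -> key index
    var ribbonLatchedKeys: [Int] = []            // oldest first

    /// Every key with at least one holder (a touch or a latch).
    var audibleKeys: Set<Int> {
        Set(activeRibbonTouches.values).union(ribbonLatchedKeys)
    }

    /// Drops the oldest latch if adding `newKey` would exceed the cap.
    /// Returns the key whose voice should stop, or nil if nothing must stop.
    mutating func stealOldestRibbonLatchedIfAtCap(adding newKey: Int) -> Int? {
        guard !audibleKeys.contains(newKey),          // converting an existing key is not a new voice
              audibleKeys.count >= Self.voiceCap,
              !ribbonLatchedKeys.isEmpty else { return nil }
        let stolen = ribbonLatchedKeys.removeFirst()
        // Only stop the voice if no finger is still holding that key.
        return activeRibbonTouches.values.contains(stolen) ? nil : stolen
    }

    /// Registers a new touch and reports which voice, if any, was displaced.
    @discardableResult
    mutating func touchBegan(id: UUID, key: Int) -> Int? {
        let stopped = stealOldestRibbonLatchedIfAtCap(adding: key)
        activeRibbonTouches[id] = key
        return stopped
    }
}
```

Because the steal check runs only when the key is not already audible, touching or latching a key that is already sounding never displaces another voice.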

8.Output stage and speaker voicing

The polyphonic mix is processed through a four-stage output chain before reaching the device's audio output:

voice mix
   │
   ▼
polyphony AGC  ──►  speaker HPF (if route = built-in speaker)
   │                       │
   │                       ▼
   │                speaker LP (if route = built-in speaker)
   │                       │
   └────────────►─────────┴────►  look-ahead brick-wall limiter  ──►  output

Polyphony AGC

A simple 1 / max(1, weightedActiveCount) gain reduction, where the count weights each voice by its current envelope amplitude (so fading voices contribute proportionally less). Smoothed with a 20 ms attack and 200 ms release time constant.
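As an illustration of the gain law and its smoothing, here is a small sketch. The PolyphonyAGC type and the one-pole exponential smoothing are assumptions; the text only specifies the target gain and the two time constants.

```swift
import Foundation

/// One-pole smoothed polyphony gain (sketch; the smoothing form is assumed).
struct PolyphonyAGC {
    private(set) var gain: Double = 1.0
    private let attackCoeff: Double
    private let releaseCoeff: Double

    init(sampleRate: Double, attackMs: Double = 20, releaseMs: Double = 200) {
        attackCoeff = exp(-1.0 / (attackMs * 0.001 * sampleRate))
        releaseCoeff = exp(-1.0 / (releaseMs * 0.001 * sampleRate))
    }

    /// `envelopes` holds each voice's current envelope amplitude in 0...1,
    /// so a fading voice contributes proportionally less to the count.
    mutating func nextGain(envelopes: [Double]) -> Double {
        let weightedActiveCount = envelopes.reduce(0, +)
        let target = 1.0 / max(1.0, weightedActiveCount)
        // Falling gain uses the fast attack constant; rising gain uses the slow release.
        let coeff = target < gain ? attackCoeff : releaseCoeff
        gain = target + (gain - target) * coeff
        return gain
    }
}
```

With three voices at full envelope the gain settles at 1/3; with a weighted count at or below one it stays at unity.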

The purpose: when the user latches a chord, the raw sum of three sawtooth waves can hit the limiter hard, producing audible pumping as the limiter reacts to the beat patterns between voices. The AGC pre-reduces the gain proportionally to the active voice count, keeping the limiter mostly inactive and eliminating the pumping artefact.

Speaker-bypass high-pass

When audio is routed to the built-in iPhone or iPad speaker, a 4th-order Linkwitz-Riley high-pass at 140 Hz (two cascaded 2nd-order Butterworth biquads, each at Q = 1/√2) is engaged in series with the output. This is bypassed when audio is routed to headphones, AirPlay, Bluetooth, or USB. Those routes can reproduce the full bandwidth, and the high-pass would only diminish them.

The reason: iPhone and iPad built-in speakers have a sharp acoustic rolloff below ~200 Hz, and trying to push high-amplitude signal into that band causes cone-excursion distortion that sounds buzzy and unpleasant. The 140 Hz cutoff is a deliberately gentle compromise: it rolls off the deep sub-bass that would cause excursion, but passes the 140 to 280 Hz band (where the speaker can still produce sound, even if attenuated) so bass notes' second and third harmonics get through. Combined with the missing-fundamental psychoacoustic reconstruction described in 7.1, this lets bass voices on built-in speakers be audible and pitch-clear without distortion.

The high-pass engages and disengages automatically based on AVAudioSession.routeChangeNotification. The switch is silent.
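The filter itself can be sketched with two identical biquads. The coefficient formulas below are the widely used RBJ "Audio EQ Cookbook" high-pass form, which is an assumption about how the Butterworth sections are derived; Biquad, SpeakerHighPass, and the isEngaged flag are hypothetical names, and the route-change handling is reduced to setting that flag.

```swift
import Foundation

/// One 2nd-order section, transposed direct form II, coefficients normalised so a0 == 1.
struct Biquad {
    let b0: Double, b1: Double, b2: Double, a1: Double, a2: Double
    var z1 = 0.0
    var z2 = 0.0

    /// RBJ cookbook high-pass with Q = 1/sqrt(2) (Butterworth).
    static func butterworthHighPass(cutoff: Double, sampleRate: Double) -> Biquad {
        let q = 1.0 / sqrt(2.0)
        let w0 = 2.0 * Double.pi * cutoff / sampleRate
        let alpha = sin(w0) / (2.0 * q)
        let cosW0 = cos(w0)
        let a0 = 1.0 + alpha
        return Biquad(b0: (1.0 + cosW0) / 2.0 / a0,
                      b1: -(1.0 + cosW0) / a0,
                      b2: (1.0 + cosW0) / 2.0 / a0,
                      a1: -2.0 * cosW0 / a0,
                      a2: (1.0 - alpha) / a0)
    }

    mutating func process(_ x: Double) -> Double {
        let y = b0 * x + z1
        z1 = b1 * x - a1 * y + z2
        z2 = b2 * x - a2 * y
        return y
    }

    /// Magnitude response of this section at `frequency`.
    func magnitude(at frequency: Double, sampleRate: Double) -> Double {
        let w = 2.0 * Double.pi * frequency / sampleRate
        let numRe = b0 + b1 * cos(w) + b2 * cos(2.0 * w)
        let numIm = -(b1 * sin(w) + b2 * sin(2.0 * w))
        let denRe = 1.0 + a1 * cos(w) + a2 * cos(2.0 * w)
        let denIm = -(a1 * sin(w) + a2 * sin(2.0 * w))
        return sqrt(numRe * numRe + numIm * numIm) / sqrt(denRe * denRe + denIm * denIm)
    }
}

/// Two cascaded Butterworth sections form the 4th-order Linkwitz-Riley high-pass.
struct SpeakerHighPass {
    var stages: [Biquad]
    var isEngaged = false   // set true only when the route is the built-in speaker

    init(cutoff: Double = 140, sampleRate: Double = 48_000) {
        let section = Biquad.butterworthHighPass(cutoff: cutoff, sampleRate: sampleRate)
        stages = [section, section]
    }

    mutating func process(_ x: Double) -> Double {
        guard isEngaged else { return x }
        var y = x
        for i in stages.indices { y = stages[i].process(y) }
        return y
    }
}
```

Each section is 3 dB down at the cutoff, so the cascade is 6 dB down at 140 Hz, which is the defining property of a Linkwitz-Riley response.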

Look-ahead brick-wall limiter

The final stage. A 64-sample look-ahead buffer (~1.33 ms at 48 kHz) tracks the peak amplitude of the next 64 samples. When a peak would exceed 0.95 of full scale, the gain is smoothly reduced to bring it under, with a hold of 64 samples on each new peak (so the gain release doesn't chase fast transients).

Parameters:

Ceiling: 0.95 of full scale (about −0.45 dBFS, i.e. 5% headroom)
Attack: ~0.1 ms (very fast: catches new transients before they exceed the ceiling)
Release: ~80 ms (musical, no pumping)

The limiter is a safety net, not a creative effect. With the polyphony AGC working upstream, the limiter is mostly inactive during normal playback. It exists to guarantee the output never clips, even on edge cases like rapid retrigger or large voice-stealing transients.
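The look-ahead idea can be sketched as follows. This is a simplified model under stated assumptions: the attack is instantaneous rather than ~0.1 ms, the peak scan is a plain linear search, and the hold falls out of the sliding window (a peak keeps the gain down for as long as it stays inside the 64-sample window). LookAheadLimiter is a hypothetical name.

```swift
import Foundation

/// Simplified look-ahead peak limiter (sketch, instant attack).
struct LookAheadLimiter {
    let ceiling: Double
    let lookAhead: Int
    private var buffer: [Double]
    private var writeIndex = 0
    private var gain = 1.0
    private let releaseCoeff: Double

    init(sampleRate: Double = 48_000, lookAhead: Int = 64,
         ceiling: Double = 0.95, releaseMs: Double = 80) {
        self.ceiling = ceiling
        self.lookAhead = lookAhead
        buffer = [Double](repeating: 0, count: lookAhead + 1)
        releaseCoeff = exp(-1.0 / (releaseMs * 0.001 * sampleRate))
    }

    /// Takes the newest input sample and returns the sample from `lookAhead` steps ago, gain-reduced.
    mutating func process(_ x: Double) -> Double {
        buffer[writeIndex] = x
        writeIndex = (writeIndex + 1) % buffer.count
        let delayed = buffer[writeIndex]   // oldest sample in the window

        // Peak over the delayed sample and everything that follows it.
        let peak = buffer.reduce(0.0) { max($0, abs($1)) }
        let target = peak > ceiling ? ceiling / peak : 1.0

        if target < gain {
            gain = target                                        // clamp down before the peak arrives
        } else {
            gain = target + (gain - target) * releaseCoeff       // recover slowly
        }
        return delayed * gain
    }
}
```

Since the gain never exceeds ceiling / peak and the delayed sample is never larger than the window peak, the output magnitude stays at or below the ceiling.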

9.Visual meters

This section documents the rendering pipelines of the three primary meters and three sub-meters for completeness; user-facing behaviour is described in sections 3.2 and 3.3.

The CRT chassis shape

Every meter and panel is housed in a CRTDisplayShape, a single Shape defined by a split-cubic-Bezier arch at the top, vertical sides angled inward by 13% width, and tangent-arc rounded corners at the bottom. Key geometric parameters:

Arch depth: 56% of the shape's height
Arch apex: 30.4% above the shoulders
Side inset: 13% of width
Bottom-corner radius: 7% of width

The shape exposes its tangent vectors at every point on the arch, used by the LED meter and strobe meter to align their content with the chassis curvature.

LED arc meter

A Canvas-backed render that lays out 19 LED-shaped polygons (9 left red, 1 centre green, 9 right red) along an arc inside the chassis. The arc-length parameterisation is computed once and cached: subsequent frames just light up the appropriate segments based on the smoothed cents offset and the in-tune flag. This is the cheapest meter in the app to render per frame.

Note wheel

The chromatic note wheel has been redesigned to share the chassis arch curve mathematically:

Rim lines (the two horizontal arc lines at the top) are drawn directly from the chassis arch path with a small vertical translation. They are literally the chassis arch shape, just shifted down.
Label baselines sit on a parallel translation of the same chassis arch, with letters drawn at theta-derived X positions that are then projected onto the radial line from a virtual wheel centre below the canvas. This ensures each letter and its corresponding centre tick lie on the same straight radial spoke, not just at the same X.
Tick fan radiates from the virtual centre (at 2× the canvas height below the apex), so ticks tilt outward at the edges in a natural wheel-spoke pattern.
Tick groups: each note has a group of 5 ticks (short, medium, long centre, medium, short). The boundaries are shared between adjacent groups. The letter sits directly above the long centre tick.

The wheel rotates continuously as pitch is detected; at exact in-tune, the rotation pauses and the target note glows white.

Per-device tuning: only the visible angular sweep (halfArc = 0.42 on iPad, 0.50 on iPhone) differs between devices. All offsets are identical. The chassis arch is the same shape on both, so the layer hierarchy needs the same proportions to clear visual overlaps.

Stroboscopic meter

Four bands following the chassis arch curve. Each band's vertical position cycles based on the phase deviation of its respective harmonic (1st, 2nd, 3rd, 4th of the target). When phase deviation is zero, motion is frozen.

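The phase velocity is capped at 120 rad/sec, which bounds the visible flicker rate at full strobe motion to ~19 Hz (120 / 2π ≈ 19.1). The cap exists to limit exposure in the photosensitive epilepsy risk band (3 to 30 Hz visible flicker). Note that ~19 Hz sits inside that range rather than below it, so the cap constrains the flicker rate but does not take it out of the band.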

Lower-panel sub-meters

Chromatic strip renders a 9-note window via an HStack of StyledNoteText views, with two Canvas-drawn triangle indicators that move horizontally and turn solid white at in-tune.
Info bar is a 3-column HStack of Text views with shadow layers for the CRT-glow effect.
Fine-tune ruler is a single Canvas drawing the graduation marks and the moving indicator each frame.

All sub-meter content uses metrics.subMeterFontScale (1.4× on iPad, 1.0× on iPhone) so the readout is legible at iPad's larger physical cell size without forcing iPhone to render comically large text.

10.Instrument presets and reference pitch

These are described in sections 3.6 and 3.7 from the user's perspective; this section adds technical detail.

Preset internal representation

Each TuningOption stores:

A canonical name (e.g. "Standard EADGBE")
A display name override (for the info bar)
An array of MIDI note numbers representing the open strings

The detector consumes this as Set<Int> for O(1) membership checks. At every detection step, the target note is the nearest member of this set to the detected pitch, which is what the cents-offset readout and the chromatic-strip / fine-tune-ruler display are computed against.

The Open/Chromatic preset returns nil for the allowed-notes set, which disables the constraint and computes target-note as the chromatic nearest semitone.
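The target-note selection can be sketched like this. The function names are hypothetical, and the sketch assumes "nearest" means nearest in semitones (log-frequency distance), with the standard relation between frequency and MIDI note number around A4 = 69.

```swift
import Foundation

/// Fractional MIDI note for a frequency, relative to the A4 reference pitch.
func midiNote(forFrequency hz: Double, referenceHz: Double = 440.0) -> Double {
    return 69.0 + 12.0 * log2(hz / referenceHz)
}

/// Picks the target note and the signed cents offset from it.
/// `allowedNotes == nil` models the Open/Chromatic preset (no constraint).
func targetNote(forFrequency hz: Double,
                allowedNotes: Set<Int>?,
                referenceHz: Double = 440.0) -> (note: Int, cents: Double)? {
    guard hz > 0 else { return nil }
    let exact = midiNote(forFrequency: hz, referenceHz: referenceHz)
    let target: Int
    if let allowed = allowedNotes,
       let nearest = allowed.min(by: { abs(Double($0) - exact) < abs(Double($1) - exact) }) {
        target = nearest                    // nearest string in the preset
    } else {
        target = Int(exact.rounded())       // chromatic nearest semitone
    }
    return (target, (exact - Double(target)) * 100.0)
}
```

For example, with the standard guitar set [40, 45, 50, 55, 59, 64], a 90 Hz input resolves to note 40 (low E) and reads well over 100 cents sharp, whereas the chromatic preset resolves the same input to note 42.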

Reference pitch internals

The reference pitch is stored in two places:

@AppStorage("referencePitchPreset"): the name of the active preset (e.g. "432", "free")
@AppStorage("referencePitchHz"): the current Hz value (defaults to 440.0)

On engine start, the detector reads referencePitchHz and calls frequencyDetector.setReferencePitch(_:), which recomputes the MIDI-to-frequency mapping table from that root. Because the update is only a table recomputation, changing the reference pitch is essentially instantaneous and does not require restarting the engine.

The free slider has a range of 400 to 480 Hz with 0.1 Hz step granularity. Changes propagate to the detector in real time.
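A sketch of what such a table recomputation might look like is given below. PitchTable is a hypothetical type; only the setReferencePitch(_:) name, the 400 to 480 Hz range, and the 0.1 Hz step come from the text, and the clamping and snapping inside the setter are assumptions.

```swift
import Foundation

/// MIDI-to-frequency table rebuilt from the reference pitch (sketch).
struct PitchTable {
    private(set) var referenceHz: Double = 440.0
    private(set) var frequencies: [Double] = []

    init(referenceHz: Double = 440.0) {
        setReferencePitch(referenceHz)
    }

    mutating func setReferencePitch(_ hz: Double) {
        // Assumed: clamp to the slider range and snap to 0.1 Hz steps.
        let clamped = min(max(hz, 400.0), 480.0)
        let root = (clamped * 10.0).rounded() / 10.0
        referenceHz = root
        // Equal temperament: each semitone is a factor of 2^(1/12); note 69 is A4.
        frequencies = (0...127).map { root * pow(2.0, (Double($0) - 69.0) / 12.0) }
    }
}
```

Rebuilding 128 entries is cheap enough to do on every slider change, which is consistent with the real-time propagation described above.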

No tuning data is bundled as a separate file

The full preset catalog (all 13 groups, 100+ tunings) is statically defined in InstrumentPreset.swift. Adding a new preset is a code change, not a data file change. This is deliberate. There is no preset import/export, no community sharing, no remote update mechanism. The app's preset list is exactly the list shipped in the binary.

11.Adaptive layout system

The layout system is defined in LayoutMetrics.swift, which computes proportional dimensions from the current viewport's actual width and height, not from the device class. This means iPad Split View / Slide Over windows that shrink below the iPad layout's width threshold automatically fall back to the iPhone layout, which scales down cleanly.

Reference design: iPhone 14 (390 × 844 pt). All proportional metrics are computed as multipliers of min(widthScale, heightScale) of the actual viewport relative to this reference.

Key clamps:

General UI scale: clamped to [0.85, 1.6] (iPhone SE doesn't shrink too small; iPad Pro doesn't balloon too large)
Meter scale: clamped to [0.85, 1.8] (a higher ceiling than the general UI scale, because the LED meter is the hero element)
iPad sub-meter font scale: 1.4× (chromatic strip, info bar, fine-tune ruler, note wheel labels and ticks)
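The scale computation and its clamps can be sketched as follows. LayoutScale and its property names are hypothetical and are not taken from LayoutMetrics.swift; the sketch also assumes a portrait viewport measured against the portrait reference.

```swift
import Foundation

/// Proportional scale factors relative to the iPhone 14 reference (sketch).
struct LayoutScale {
    static let referenceWidth = 390.0
    static let referenceHeight = 844.0

    let uiScale: Double
    let meterScale: Double

    init(viewportWidth: Double, viewportHeight: Double) {
        // The limiting dimension decides the raw scale.
        let raw = min(viewportWidth / Self.referenceWidth,
                      viewportHeight / Self.referenceHeight)
        uiScale = min(max(raw, 0.85), 1.6)
        meterScale = min(max(raw, 0.85), 1.8)
    }
}
```

On a 1024 × 1366 pt viewport the raw scale is about 1.62, so the general UI scale clamps to 1.6 while the meter scale keeps the unclamped value.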

Layout selection thresholds:

useTwoColumnLayout: iPad AND landscape AND screenWidth ≥ 700 pt
useIPadPortraitLayout: iPad AND portrait AND screenWidth ≥ 600 pt
useIPhoneLandscapeLayout: otherwise landscape
Else: iPhone portrait
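The threshold cascade reads naturally as a small decision function. LayoutKind and selectLayout are hypothetical names, and treating "landscape" as width greater than height is an assumption.

```swift
import Foundation

enum LayoutKind: Equatable {
    case twoColumn, iPadPortrait, iPhoneLandscape, iPhonePortrait
}

/// Picks a layout from the device idiom and the actual viewport size (sketch).
func selectLayout(isPad: Bool, width: Double, height: Double) -> LayoutKind {
    let isLandscape = width > height
    if isPad && isLandscape && width >= 700 { return .twoColumn }
    if isPad && !isLandscape && width >= 600 { return .iPadPortrait }
    // Narrow iPad windows (Split View, Slide Over) fall through to the iPhone layouts.
    return isLandscape ? .iPhoneLandscape : .iPhonePortrait
}
```

A 320 pt wide Slide Over window on iPad fails both iPad width checks and lands on the iPhone portrait layout, matching the fallback described at the start of this section.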

12.Performance characteristics

These numbers come from on-device measurement on a representative range of supported hardware.

CPU baseline (engine off): <1% on every supported device.
CPU during active tuning: 3 to 8% on iPhone 14 / iPad Pro, peaking briefly during attack transients. Below 12% on iPhone SE (1st generation).
Memory footprint: <40 MB resident.
Audio latency (input to display): typically 32 to 96 ms end-to-end, dominated by the audio I/O buffer (32 ms) and the detection window hop (10 to 20 ms).
Audio latency (touch to tone): ~32 to 64 ms (one to two audio buffers).
Frame rate: 60 fps on all supported hardware. The animated meters (note wheel rotation, strobe band motion) update on Canvas redraws driven by the detector's state changes; no animation loops run when the detector is silent.

Supported devices:

iPhone SE (1st generation) and later
iPad (5th generation) and later
iPad mini (5th generation) and later
iPad Air (3rd generation) and later
iPad Pro (all generations supported)

Minimum OS: iOS 16.6 / iPadOS 16.6.

13.Engineering decisions and rationale

This section documents the why behind the major architectural choices. It's useful for engineers evaluating SuperTuner against alternatives, and for understanding why certain design constraints were imposed.

Why autocorrelation over FFT for the pitch core

FFT-based pitch detection (find the spectral peak, snap to the nearest semitone) has two specific failure modes: spectral leakage from non-integer bin frequencies introduces a small but consistent sharp bias, and the bin spacing limits resolution unless the FFT size is very large. Autocorrelation measures periodicity directly in the time domain. There is no bin grid, no window function, no leakage bias. With 3-point parabolic interpolation, sub-sample lag precision is straightforward.
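The 3-point parabolic refinement mentioned above fits a parabola through the peak sample and its two neighbours and takes the vertex. A minimal sketch, with refinedPeakLag as a hypothetical name:

```swift
import Foundation

/// Refines an integer peak index to sub-sample precision by fitting a parabola
/// through the peak and its two neighbours.
func refinedPeakLag(_ values: [Double], peakIndex i: Int) -> Double {
    guard i > 0, i < values.count - 1 else { return Double(i) }
    let a = values[i - 1]
    let b = values[i]
    let c = values[i + 1]
    let denominator = a - 2.0 * b + c
    guard denominator != 0 else { return Double(i) }   // flat neighbourhood: no refinement
    return Double(i) + 0.5 * (a - c) / denominator
}
```

The detected frequency is then sampleRate / refinedLag. For a curve that is exactly parabolic around its peak, such as values sampled from −(x − 10.3)², the function recovers the true vertex (10.3) from the integer peak at 10.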

Why both YIN and per-lag-normalised autocorrelation

They fail differently. YIN is sharp and confident on clean signals but its CMND inflates in noisy environments, especially at low frequencies. Per-lag-normalised autocorrelation is more noise-tolerant but less sharp, with a tendency to favour octave-down errors on harmonic signals. Running both and combining their verdicts gives the snappy attack of YIN with the noise tolerance of autocorrelation.

Why a custom Spectral Harmonic Fingerprint rescue

In real-world recording (rooms with HVAC, fluorescent lights, instruments with body-resonance suppression of their own fundamental), the fundamental is sometimes simply not present in the spectrum. A YIN-only detector loses tracking in these cases even though a musician would clearly hear "that's a low E". The SHF rescue measures the upper harmonics (2 through 8), tests them for consistency with a missing-fundamental hypothesis, and reports the inferred pitch. It is gated tightly enough (3× SNR per harmonic, decay-violation limit, temporal persistence requirement) that it does not produce false positives on noise.
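Measuring an individual harmonic is the kind of job the Goertzel algorithm (see the glossary) is suited to. The sketch below shows only a single-frequency probe; the SNR gate, decay check, and persistence logic are not reproduced, and goertzelMagnitude is a hypothetical name.

```swift
import Foundation

/// Magnitude of `samples` at one chosen frequency, computed with the Goertzel recurrence.
func goertzelMagnitude(_ samples: [Double], frequency: Double, sampleRate: Double) -> Double {
    let omega = 2.0 * Double.pi * frequency / sampleRate
    let coeff = 2.0 * cos(omega)
    var s1 = 0.0
    var s2 = 0.0
    for x in samples {
        let s0 = x + coeff * s1 - s2
        s2 = s1
        s1 = s0
    }
    // Squared magnitude from the final two state values.
    let power = s1 * s1 + s2 * s2 - coeff * s1 * s2
    return sqrt(max(power, 0.0))
}
```

Probing a signal that contains only the 2nd and 3rd harmonics of a 41.2 Hz low E shows strong energy at 82.4 Hz and very little at 41.2 Hz, which is the missing-fundamental situation the rescue is designed for.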

Why a subtractive synthesis engine (rather than additive)

Earlier versions used additive synthesis (a manual sum of 10 sine harmonics with per-harmonic amplitude shaping). The downside was a "thin, digital-reed" character that sounded harsh on small speakers. Subtractive synthesis (a single PolyBLEP saw through a tuned lowpass filter) produces a continuously-varying harmonic balance as a function of cutoff vs. fundamental, which is more pleasant tonally and naturally limits high-frequency energy on the lowest voices.
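For readers unfamiliar with PolyBLEP, here is a sketch of a band-limited saw oscillator. It uses the common two-sample polynomial correction, which is simpler than the 4-sample correction the glossary attributes to the shipped oscillator, and it omits the lowpass filter stage. PolyBLEPSaw is a hypothetical name.

```swift
import Foundation

/// Two-sample PolyBLEP residual around the phase wrap.
/// `t` is the phase in 0..<1, `dt` the phase increment per sample.
func polyBLEP(phase t: Double, increment dt: Double) -> Double {
    if t < dt {                       // just after the discontinuity
        let x = t / dt
        return x + x - x * x - 1.0
    }
    if t > 1.0 - dt {                 // just before the discontinuity
        let x = (t - 1.0) / dt
        return x * x + x + x + 1.0
    }
    return 0.0
}

/// Naive saw with the PolyBLEP correction subtracted at each wrap (sketch).
struct PolyBLEPSaw {
    private var phase = 0.0
    let increment: Double

    init(frequency: Double, sampleRate: Double) {
        increment = frequency / sampleRate
    }

    mutating func next() -> Double {
        let naive = 2.0 * phase - 1.0
        let sample = naive - polyBLEP(phase: phase, increment: increment)
        phase += increment
        if phase >= 1.0 { phase -= 1.0 }
        return sample
    }
}
```

The correction rounds off the vertical edge of the saw over the two samples around each wrap, which suppresses most of the aliasing a naive saw would produce.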

Why a 3-voice cap rather than unlimited polyphony

Three voices give comfortable chord coverage (root + third + fifth or seventh) without the engine straining the polyphony AGC or the limiter. More voices would require more aggressive gain reduction, which would either pump or clip. Three was the largest count that consistently sounded clean across the iPhone speaker.

Why 140 Hz for the speaker high-pass (was 280 Hz, then 220 Hz, now 140 Hz)

iPhone and iPad speakers reproduce usable content from ~140 Hz upward, with severe rolloff below. A 280 Hz high-pass removed all bass, including harmonics that the speaker could have produced. The 140 Hz cutoff is the compromise that passes the speaker's full usable band while still removing the sub-bass that would cause cone excursion and audible distortion. The filter only engages on built-in speaker routes; headphones, AirPlay, and Bluetooth/USB are not affected.

Why a polyphony AGC and a limiter (rather than just a limiter)

A standalone brick-wall limiter tracks the envelope of the polyphonic signal. With three sawtooths summed, the envelope has clear pumping at the beat frequencies between the voices, which the limiter reacts to audibly. The pre-AGC reduces the gain in a content-aware way (proportional to active voice count, weighted by envelope amplitude), keeping the limiter mostly idle and eliminating the pumping artefact.

Why a Power button (rather than just stopping the audio session)

Two reasons:

Thermal and battery. During long tuning sessions in a hot room, sustained CPU usage can push the device toward its thermal limits. The Power button fully halts the engine so CPU drops to idle.
False positives. Leaving the tuner running while not actively tuning (e.g. between songs in a long set) can pick up audience noise, instrument cable handling, and other sounds that produce nuisance latches. Power off, then on when ready.

Why a custom UIKit multi-touch surface for the ribbon (rather than SwiftUI gestures)

SwiftUI's DragGesture(minimumDistance: 0) does not provide consistent multi-touch semantics. Specifically, adding a second finger to a ribbon that already has a held finger can end or modify the first finger's gesture in ways that depend on the gesture-recognition arbitration between the body's drag gesture and the label's tap gesture. In SuperTuner ≤ 1.1.3 this caused a real bug where adding a second finger anywhere on the ribbon would unexpectedly latch the held note.

The custom RibbonTouchSurface (UIKit UIView with isMultipleTouchEnabled = true, wrapped via UIViewRepresentable) eliminates the gesture-recognition layer entirely. Touches arrive directly at the view's touchesBegan/Moved/Ended/Cancelled methods, identified by their UITouch references, with no arbitration possible. Each finger is independent in a way SwiftUI cannot match.

Why the note-wheel curves all match the CRT chassis arch

In versions of the design where rim and label curves were near-but-not-quite identical (e.g. rim drawn from the chassis Bezier, labels drawn on a circle), the small mismatch was visible at the corners as a wobble between layers. The current implementation draws rim, label baselines, and tick tops as exact translations of the chassis arch, with label X positions projected onto the same radial spoke their centre tick uses. The result is that every visible curve in the wheel is mathematically the chassis arch curve, just translated. There is no curve mismatch to perceive.

14.Glossary

ADSR: Attack, Decay, Sustain, Release. The four phases of a synthesiser's amplitude envelope. SuperTuner uses a simplified attack/sustain/release envelope (no separate decay).

AGC: Automatic Gain Control. A processor that reduces signal gain in response to input level, used here to pre-attenuate polyphonic content so a downstream limiter doesn't pump.

Autocorrelation: A mathematical operation that measures how similar a signal is to a delayed copy of itself. Used to find periodic structure (i.e. pitch) without a frequency-domain transform.

Cents: A logarithmic unit of pitch. 100 cents = one semitone. A "5 cents flat" pitch is one-twentieth of a semitone below the target, which is quite tight, comparable to the precision required for orchestral string tuning.

CMND: Cumulative Mean Normalised Difference. The core measurement YIN uses to find the period of a signal. Lower CMND means stronger periodicity.

Goertzel algorithm: An efficient single-frequency DFT. Computes the magnitude at a chosen frequency in O(N) time instead of computing the entire spectrum.

Harmonic: A frequency that is an integer multiple of a fundamental. The 2nd harmonic of 440 Hz is 880 Hz, the 3rd is 1320 Hz, and so on. Real instruments produce a fundamental plus a series of harmonics, in proportions that define the instrument's timbre.

Linkwitz-Riley filter: A filter topology made by cascading two Butterworth filters of half the desired order. A 4th-order Linkwitz-Riley high-pass is two cascaded 2nd-order Butterworth high-passes; it has a −24 dB/octave rolloff with no resonant peak.

Look-ahead limiter: A peak limiter that delays the audio signal by a small amount (here, 64 samples, about 1.33 ms at 48 kHz) while peak-detecting the future signal. This lets the limiter begin reducing gain before a peak arrives, eliminating overshoot.

Missing-fundamental effect: A psychoacoustic phenomenon in which the human brain perceives the pitch of a tone from its harmonic series alone, even when the actual fundamental frequency is absent. Used here to make bass voices audible through a small speaker that physically can't reproduce the fundamental.

MIDI note number: A standard integer-based pitch representation where 60 is middle C, 69 is A4 (440 Hz at standard tuning), and each integer step is one semitone. Used internally for all pitch logic.

PolyBLEP: Polynomial Band-Limited Step. A small correction added to naive oscillator waveforms (saw, square) at their discontinuity points to suppress aliasing. Cheap and effective; the implementation here is a 4-sample polynomial correction.

SHF: Spectral Harmonic Fingerprint. SuperTuner's name for its third-line rescue detector, which probes upper harmonics via Goertzel and scores them for harmonic-series consistency.

SVF: State Variable Filter. A filter topology that gives access to lowpass, highpass, bandpass, and notch outputs from a single internal state. The implementation here uses Andrew Simper's TPT/ZDF (Topology-Preserving Transform / Zero-Delay Feedback) form, which preserves the filter's analogue response accurately at high cutoff frequencies near Nyquist.

vDSP: Apple's vectorised signal-processing library, part of the Accelerate framework. Used in SuperTuner's autocorrelation and difference-function calculations to take advantage of NEON SIMD.

YIN: A specific pitch-detection algorithm in the autocorrelation family. The name comes from the yin and yang of Chinese philosophy and alludes to the interplay between autocorrelation and cancellation that the method involves. Published by de Cheveigné and Kawahara in 2002; widely used in music software.

This document is maintained by PeaceDrone LLC. Corrections and clarifications welcome.
