Why it happens (likely)
- Some vocal styles are biased toward “warm-up” syllables in the intro.
- Open-ended intros with no explicit first phoneme invite filler.
- Long/soft first lines can get treated like “pre-chorus ambience”.
Primary fix: force the first consonant
Give the model a clean first strike: a hard consonant and clear diction instruction. The first sung sound matters more than a paragraph of rules.
Suppression wording you can reuse
[INTRO — NO VOCAL FILL]
[no humming, no la-la, no mm-mm]
[first lyric begins immediately on a consonant]
[precise diction, crisp consonants]
[no ad-lib syllables before line 1]
Structural trick: “cold open” boot line
Add a very short “cold open” line that starts with a hard consonant and ends quickly. It acts like a clapperboard: a sharp, unambiguous marker for where the vocal begins.
LINE 1 (cold open): "Click. Confirm. Begin."
LINE 2 (normal verse starts): "Now the piano snaps, the snare cuts clean..."
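If you assemble prompts in a script, the suppression tags and the cold open line can be kept as reusable pieces. The following is a minimal Python sketch, not a required workflow: the names SUPPRESSION_TAGS and build_prompt are made up for illustration, and it assumes the tool accepts the tags and lyrics as plain text, one item per line.

```python
# Hypothetical helper: joins the suppression tags, the cold open line,
# and the verse into one block of text, one item per line.
SUPPRESSION_TAGS = [
    "[INTRO \u2014 NO VOCAL FILL]",  # \u2014 is the dash used in the header tag
    "[no humming, no la-la, no mm-mm]",
    "[first lyric begins immediately on a consonant]",
    "[precise diction, crisp consonants]",
    "[no ad-lib syllables before line 1]",
]


def build_prompt(cold_open, verse_lines, tags=SUPPRESSION_TAGS):
    """Return tags first, then the cold open line, then the verse lines."""
    lines = list(tags) + [cold_open] + list(verse_lines)
    return "\n".join(lines)


prompt = build_prompt(
    "Click. Confirm. Begin.",
    ["Now the piano snaps, the snare cuts clean..."],
)
print(prompt)
```

Keeping the cold open as its own argument makes it easy to swap in a different boot line without touching the tags.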
If it still sneaks in
- Shorten the intro section header.
- Strengthen the diction instructions (“precise diction”, “clear enunciation”, “no slur”).
- Try changing the first word to start with T, K, P, B, or D.
- Replace any leading ellipses or soft vowels with a hard start.
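The last two checks are easy to automate before you submit a lyric. The sketch below is a rough, spelling-based approximation in Python with hypothetical helper names. It strips leading ellipses and punctuation, then tests the first letter against T, K, P, B, D. As an added assumption (not from the list above), it also accepts a C that is not followed by E, I, Y, or H, so that a /k/-sounding opener like "Click" passes. It does not know pronunciation, so words such as "Phone", "The", or "Knock" will be misjudged.

```python
# Spelling-based check only; it cannot tell how a word is actually sung.
HARD_STARTS = set("TKPBD")          # the consonants suggested above
LEADING_JUNK = " \t.\u2026,;:-\"'"  # spaces, dots, the ellipsis character, etc.


def clean_first_line(line):
    """Drop leading ellipses, whitespace, and punctuation."""
    return line.lstrip(LEADING_JUNK)


def starts_hard(line):
    """Return True if the cleaned line begins with a hard-consonant letter."""
    cleaned = clean_first_line(line).upper()
    if not cleaned:
        return False
    if cleaned[0] in HARD_STARTS:
        return True
    # Assumption: treat "C" as a /k/ sound unless followed by E, I, Y, or H.
    return cleaned[0] == "C" and cleaned[1:2] not in ("E", "I", "Y", "H")


for candidate in [
    "...Ooh, the night is young",
    "Click. Confirm. Begin.",
    "Take the stage",
]:
    print(candidate, "->", starts_hard(candidate))
```

A line that fails the check is a candidate for rewording or for a cold open line placed in front of it.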
Next experiment
Test whether the model respects suppression better when the first line is a “command” line: short, percussive, and rhythmically tight.