Quote top-level string scalars whose bare form re-parses as a container#2656
Quote top-level string scalars whose bare form re-parses as a container#2656barry3406 wants to merge 1 commit intomikefarah:masterfrom
Conversation
When UnwrapScalar is enabled (the default for yaml output), the yaml encoder writes node.Value verbatim as a bare line. Any string whose content is itself a valid YAML mapping, sequence, or alias then round trips as that container instead of as a string. For example, the input document `"this: should really work"` previously re-emitted as the bare line `this: should really work`, which the next reader parses as a one key map, destroying the original scalar. The same problem surfaces whenever a multiline string literal happens to contain `key: value` lines, which is the form the bug report uses for its second reproducer. Guard the fast-path by re-parsing node.Value with yaml.v4: if the bare form decodes to a non-scalar, fall through to the regular encoder so it can apply the quoting style required by the YAML spec. The check is limited to `!!str` nodes and to structural reinterpretations, so tag expressions such as `!!int` and plain strings that re-read as integers, booleans, or nulls are unaffected. An unparseable value (e.g. one containing NUL) stays on the fast-path so downstream NUL-aware writers still see the raw bytes. Updates the base64 "decode yaml document" scenario whose expected output was `a: apple\n` bare; it is now emitted as a block literal, which round-trips back to the same string. Reproducer: ``` printf '"this: should really work"\n' | yq -p yaml -o yaml ``` Before this fix the second run of yq parses the output as a map; after, it remains the original string. Fixes mikefarah#2608
| // The decoded payload ("a: apple\n") would re-parse as a map if | ||
| // emitted bare, so the yaml encoder keeps it as a block literal to | ||
| // preserve roundtrip safety. See issue #2608. | ||
| expected: "|\n a: apple\n", |
There was a problem hiding this comment.
This is what worries me:
a) how would you now emit this as a bare string if you wanted to;
b) this change could break a whole bunch of scripts / CI/CD pipelines for people
There was a problem hiding this comment.
Yeah that's a valid concern. A couple thoughts:
(a) Users can still get the raw value with -o=raw / --outputformat=raw, which bypasses the yaml encoder entirely. So there is an escape hatch.
(b) On breakage — this only kicks in for strings whose bare form re-parses as a map or sequence (like a: b or - item). Plain strings, numbers, bools are untouched. So it's a pretty narrow change, but I get that even narrow changes can bite someone.
If you'd prefer a flag to opt into this behavior instead of making it the default, I can rework it that way.
The yaml encoder has a fast-path for unwrapping top-level scalars: when
UnwrapScalaris set (the default for yaml output) it writesnode.Valueverbatim and skips the regular encoder. The fast-path never checks whether the bare form round-trips, so any!!strwhose value happens to be a valid YAML mapping or sequence is silently reinterpreted on the next read.Reproducer:
Before this fix the second invocation parses the output as a single-key map, destroying the original string; the reporter's second example (
yq .fooon afoo: |block containinga: a/b: b) hits the same path through the literal style.Guard the fast-path with
bareStringNeedsQuoting, which re-parsesnode.Valuewith yaml.v4 and reports whether the result is something other than a scalar. When it returns true the encoder falls through to the normalencoder.Encodepath, which picks upLiteralStyleorDoubleQuotedStyleas appropriate. The check is deliberately narrow:!!strnodes are examined, so existing tag tests that emit!!int/!!stras a bare scalar are untouched;printer.gostill sees the raw bytes.The new table-driven test
TestYamlEncoderUnwrapScalarRoundtripSafetycovers the colon, dash, and multiline-map cases from the report and asserts that safe plain strings, digits,null, and!!intstill emit bare. Thedecode yaml documentbase64 scenario previously asserted the buggya: apple\nbare output; it now expects the block-literal form that round-trips back to the same string.Fixes #2608