Files
zrec/zrecins-opus-again.org
T
billmanz 4894ada68f Initial commit of the zrec function.
`zrec` exists on the internet now! For anyone who ever looks at
this, yes, this was entirely vibe-coded. Sorry, I saw a niche that
an LLM could help me fill. I find this to work for my usecase.
2026-06-18 09:14:05 -04:00

23 KiB

Help me update this script. It functions well, but I want to change one aspect. When prompted for a field that is a regexp type, I would like the type name to appear in parentheses instead of the long regex. For example, if I define a the phonenum type with "%typedef: phonenum regexp ^(:digit:{3})\-(:digit:{3})\-(:digit:{4})$" then use "%type: phone phonenum" I would prefer the displayed type to say (phonenum). The long regex is distracting and not helpful to me.

Code Review & Solution

This is a well-structured script with good documentation. Your requested change is reasonable—displaying the typedef alias name is far more user-friendly than dumping a raw regex into the prompt.

The Core Issue

Currently, get_field_type() resolves the alias to its underlying type string and discards the alias name. By the time prompt_field() receives the type, the friendly name (phonenum) is gone—only the regex remains.

To fix this cleanly, we need to preserve the alias name for display purposes while keeping the resolved type for validation. These are two separate concerns, so I recommend separating them.

Recommended Approach

I suggest adding a small helper that returns the display label for a field, and passing both a display label and the validation type into prompt_field(). This keeps validation logic untouched (lower risk) while giving you the display you want.

Step 1: Add a display-name resolver

Add this function near get_field_type():

# ---------------------------------------------------------------------------
# Determine a human-friendly type label for display purposes.
#   - If %type: names a base type, show that base type.
#   - If %type: names a %typedef: alias, show the alias name (NOT its
#     expanded definition), which is far more readable for things like
#     long regexp patterns.
#   Prints the label (may be empty if no type is declared).
# ---------------------------------------------------------------------------
get_field_type_label() {
    local block="$1"
    local field="$2"

    # Find the raw token(s) %type: declares for this field.
    local raw_type
    raw_type=$(echo "$block" | awk -v f="$field" '
        index($0, "%type:") == 1 {
            rest = substr($0, length("%type:") + 1)
            sub(/^[[:space:]]+/, "", rest)
            n = split(rest, parts, /[[:space:]]+/)
            names = parts[1]
            typedef = ""
            for (i = 2; i <= n; i++) {
                typedef = typedef (i==2 ? "" : " ") parts[i]
            }
            m = split(names, namearr, /,/)
            for (j = 1; j <= m; j++) {
                if (namearr[j] == f) {
                    print typedef
                    exit
                }
            }
        }
    ')

    [ -z "$raw_type" ] && return 0

    local base="${raw_type%% *}"
    case "$base" in
    int | real | bool | date | line | url | email | uuid | regexp | range)
        # Concrete inline type: show it as-is (already concise).
        printf '%s' "$raw_type"
        ;;
    *)
        # Named alias: prefer the alias name itself for display.
        if [ -n "$(resolve_typedef "$block" "$base")" ]; then
            printf '%s' "$base"
        else
            printf '%s' "$raw_type"
        fi
        ;;
    esac
}

Step 2: Update prompt_field() to accept a display label

Add a fourth parameter used only for the prompt text:

prompt_field() {
    local field="$1"
    local is_mandatory="$2" # "yes" / "no"
    local ftype="$3"        # resolved type, used for validation
    local display_type="$4" # friendly label, used only in the prompt

    # Fall back to the validation type if no display label was provided.
    [ -z "$display_type" ] && display_type="$ftype"

    local label="$field"
    local tags=""
    [ "$is_mandatory" = "yes" ] && tags="${tags} [MANDATORY]"
    [ -n "$display_type" ] && tags="${tags} (type: ${display_type})"

    # ... rest unchanged ...

The validation block inside still uses $ftype (the resolved type), so phonenum validates against the underlying regex exactly as before.

Step 3: Update the three call sites in main()

In each of the three prompt locations, compute the label alongside the type:

# --- Allowed fields loop ---
local ftype ftype_label
ftype=$(get_field_type "$block" "$f")
ftype_label=$(get_field_type_label "$block" "$f")
val=$(prompt_field "$f" "$mand" "$ftype" "$ftype_label")
# --- Mandatory fields loop ---
local ftype ftype_label
ftype=$(get_field_type "$block" "$f")
ftype_label=$(get_field_type_label "$block" "$f")
val=$(prompt_field "$f" "yes" "$ftype" "$ftype_label")
# --- Additional free-form fields loop ---
local ftype ftype_label
ftype=$(get_field_type "$block" "$newfield")
ftype_label=$(get_field_type_label "$block" "$newfield")
val=$(prompt_field "$newfield" "no" "$ftype" "$ftype_label")

Result

With your example:

%typedef: phonenum regexp /^([[:digit:]]{3})\-([[:digit:]]{3})\-([[:digit:]]{4})$/
%type: phone phonenum

The prompt now reads:

phone (type: phonenum):

…while validation still enforces the underlying regex.

Additional Observations (optional, not required for your change)

A few items worth noting for future maintenance:

  1. Lost exit status after command substitution (existing bug):

    validated=$(validate_field_type "$value" "$ftype")
    if [ $? -ne 0 ]; then

    local validated=$(...) would mask $?, but here you've correctly split the declaration from assignment in =prompt_field=—so this is actually fine. Worth keeping that pattern consistent if you ever refactor.

  2. Redundant awk invocations. get_field_type and the new get_field_type_label share nearly identical awk logic to extract the raw type. If this script grows, consider a single helper that returns the raw %type: token, with thin wrappers for "resolved" vs "display" forms—reducing duplication and keeping the two in sync.
  3. range display is already concise (range 1 100), so I deliberately left it as a concrete type in the label resolver. If you later add named range typedefs (e.g. %typedef: score range 1 100), the alias-name logic will handle them automatically and show (type: score).

These are minor and don't block your change. The three-step modification above achieves exactly what you asked with minimal risk to the working validation logic.

*

Code Review: Displaying Type Aliases Instead of Resolved Regex

Good request — showing (phonenum) is far more user-friendly than dumping a raw regex at the prompt. The challenge is that your current architecture resolves the alias too early: get_field_type returns the resolved base type (e.g. regexp /^.../), so by the time you build the prompt label, the original alias name is already lost.

The Core Issue

You have two competing needs:

  1. Validation needs the resolved type string (e.g. regexp /pattern/).
  2. Display wants the alias name (e.g. phonenum) when one exists.

The cleanest fix is to separate the display name from the validation type, rather than overloading a single return value.

Recommended Approach

1. Add a helper to get the raw (unresolved) declared type

Extract the lookup logic that's currently buried inside get_field_type so you can reuse it:

# Returns the raw type string as declared in %type: (may be an alias).
get_field_raw_type() {
    local block="$1"
    local field="$2"

    echo "$block" | awk -v f="$field" '
        index($0, "%type:") == 1 {
            rest = substr($0, length("%type:") + 1)
            sub(/^[[:space:]]+/, "", rest)
            n = split(rest, parts, /[[:space:]]+/)
            names = parts[1]
            typedef = ""
            for (i = 2; i <= n; i++) {
                typedef = typedef (i==2 ? "" : " ") parts[i]
            }
            m = split(names, namearr, /,/)
            for (j = 1; j <= m; j++) {
                if (namearr[j] == f) {
                    print typedef
                    exit
                }
            }
        }
    '
}

Then refactor get_field_type to call it (DRY):

get_field_type() {
    local block="$1"
    local field="$2"

    local raw_type
    raw_type=$(get_field_raw_type "$block" "$field")
    [ -z "$raw_type" ] && return 0

    local base="${raw_type%% *}"
    case "$base" in
    int | real | bool | date | line | url | email | uuid | regexp | range)
        printf '%s' "$raw_type"
        ;;
    *)
        local resolved
        resolved=$(resolve_typedef "$block" "$base")
        if [ -n "$resolved" ]; then
            printf '%s' "$resolved"
        else
            printf '%s' "$raw_type"
        fi
        ;;
    esac
}

2. Compute a display label

The friendliest behavior: show the alias when one exists, otherwise show the base type (and notably, avoid showing regexes for inline regexp types too, since you said the long regex is distracting). Add a small helper:

# Produce a human-friendly type label for prompting.
#   - If %type: uses a named alias  -> show the alias name.
#   - If the resolved type is regexp -> show "regexp" (hide the pattern).
#   - Otherwise                      -> show the base type name.
display_type_label() {
    local block="$1"
    local field="$2"

    local raw resolved base
    raw=$(get_field_raw_type "$block" "$field")
    [ -z "$raw" ] && return 0

    base="${raw%% *}"
    case "$base" in
    int | real | bool | date | line | url | email | uuid | regexp | range)
        # Inline/base type: hide regexp pattern, show just the keyword.
        printf '%s' "$base"
        ;;
    *)
        # It's an alias name — show it verbatim if it resolves.
        resolved=$(resolve_typedef "$block" "$base")
        if [ -n "$resolved" ]; then
            printf '%s' "$base"
        else
            printf '%s' "$raw"
        fi
        ;;
    esac
}

3. Pass both values into prompt_field

prompt_field should validate against the resolved type but display the label. Add a fourth parameter:

prompt_field() {
    local field="$1"
    local is_mandatory="$2"  # "yes" / "no"
    local ftype="$3"         # resolved type, used for validation
    local type_label="$4"    # friendly label, used for display

    local label="$field"
    local tags=""
    [ "$is_mandatory" = "yes" ] && tags="${tags} [MANDATORY]"
    [ -n "$type_label" ] && tags="${tags} (type: ${type_label})"
    # ... rest unchanged ...

4. Update the three call sites

In each of the three prompting loops in main, compute the label alongside the type:

local ftype label
ftype=$(get_field_type "$block" "$f")
label=$(display_type_label "$block" "$f")

local val
val=$(prompt_field "$f" "$mand" "$ftype" "$label")

Apply the same pattern to the mandatory-fields loop and the free-form loop.

Result

With your example:

%typedef: phonenum regexp /^([[:digit:]]{3})\-([[:digit:]]{3})\-([[:digit:]]{4})$/
%type: phone phonenum

The prompt now reads:

phone (type: phonenum):

…while validation still runs against the fully resolved regexp /.../ pattern. Best of both worlds.

Additional Observations (not blocking your change)

  1. $? capture bug — In prompt_field:

    validated=$(validate_field_type "$value" "$ftype")
    if [ $? -ne 0 ]; then

    This works only because nothing runs between the assignment and the test. It's fragile. Prefer:

    if validated=$(validate_field_type "$value" "$ftype"); then
        printf '%s' "$validated"
    else
        continue
    fi
  2. Repeated awk parsingget_field_raw_type is now called twice per field (once via get_field_type, once via display_type_label). For small recfiles this is negligible, but if you ever profile this, consider computing both the resolved type and the label in a single helper that returns both (e.g. via a delimiter or two global vars).
  3. recins exit-code capture — Same $? fragility at the end of main:

    recins -t "$rectype" "${args[@]}" "$recfile"
    local status=$?

    This is fine as written, but worth keeping consistent with the if recins ...; then idiom for robustness.

These are optional cleanups. The four changes above fully satisfy your stated requirement.

I think you have raised a great point. Let's modify this to compute both the resolved type and the label in a single helper that returns both to prevent repeated awk parsing.

  • Consolidating the Type Lookup into a Single AWK Pass

Good call. Let's eliminate the redundant parsing by having one helper do the awk work once and return both the validation type and the display label. The cleanest, bash-3.2-safe way to "return two values" is to print them on one line with a delimiter the data can't contain, then split with parameter expansion.

The Combined Helper

A tab is the safest delimiter here — recfile field types are whitespace-separated tokens, but a literal tab won't appear inside %type: declarations in practice, and we control both halves.

# ---------------------------------------------------------------------------
# Look up a field's type ONCE and return two tab-separated values:
#
#       <resolved-type><TAB><display-label>
#
#   - resolved-type : the fully-resolved type string used for VALIDATION
#                     (e.g. "regexp /^.../", "int", "range 1 100").
#   - display-label : a human-friendly label used for the PROMPT
#                     (e.g. the alias "phonenum", or "regexp" for an
#                     inline pattern, or the base type name).
#
#   Both fields may be empty if the field has no declared type.
#   This performs a single awk pass over the descriptor block.
# ---------------------------------------------------------------------------
get_field_type_info() {
    local block="$1"
    local field="$2"

    # Single awk pass: pull the raw %type: token(s) for this field.
    local raw_type
    raw_type=$(echo "$block" | awk -v f="$field" '
        index($0, "%type:") == 1 {
            rest = substr($0, length("%type:") + 1)
            sub(/^[[:space:]]+/, "", rest)
            n = split(rest, parts, /[[:space:]]+/)
            names = parts[1]
            typedef = ""
            for (i = 2; i <= n; i++) {
                typedef = typedef (i==2 ? "" : " ") parts[i]
            }
            m = split(names, namearr, /,/)
            for (j = 1; j <= m; j++) {
                if (namearr[j] == f) {
                    print typedef
                    exit
                }
            }
        }
    ')

    # No declared type: emit an empty resolved/label pair.
    if [ -z "$raw_type" ]; then
        printf '\t'
        return 0
    fi

    local base="${raw_type%% *}"
    local resolved label

    case "$base" in
    int | real | bool | date | line | url | email | uuid | regexp | range)
        # Concrete inline type.
        resolved="$raw_type"
        # Hide the regexp pattern in the label; show just the keyword.
        label="$base"
        ;;
    *)
        # Named alias: resolve it (one typedef lookup, only when needed).
        local typedef
        typedef=$(resolve_typedef "$block" "$base")
        if [ -n "$typedef" ]; then
            resolved="$typedef"   # validate against the underlying type
            label="$base"         # but display the friendly alias name
        else
            # Unknown alias: fall back to raw for both.
            resolved="$raw_type"
            label="$raw_type"
        fi
        ;;
    esac

    printf '%s\t%s' "$resolved" "$label"
}

Splitting the Result at the Call Sites

Use a literal tab in the parameter expansion. To keep it readable and avoid embedding a raw tab in the source, define a tab once near the top of the script:

# Near the top of the script, after the shebang/comments:
TAB=$(printf '\t')

Then each call site becomes:

local info ftype label
info=$(get_field_type_info "$block" "$f")
ftype="${info%%"$TAB"*}"   # everything before the tab  -> resolved type
label="${info#*"$TAB"}"    # everything after the tab   -> display label

local val
val=$(prompt_field "$f" "$mand" "$ftype" "$label")

Apply the identical three-line split in all three loops (allowed, mandatory, free-form), substituting $f / $newfield as appropriate.

Note on quoting: ${info%%"$TAB"*} quotes $TAB so the tab is treated literally rather than as a glob pattern. This is correct in bash 3.2. Since both halves always exist (the helper always prints exactly one tab), %% and # behave predictably even when one side is empty.

Retiring the Now-Redundant Helpers

With get_field_type_info in place, you can delete both:

  • get_field_type
  • display_type_label (the earlier proposal)

…and you no longer need a standalone get_field_raw_type, since its logic now lives inline in the single pass. resolve_typedef stays — it's still called (lazily, only for actual aliases).

Why This Is Better

Aspect Before After
awk passes over block (per field) 2 1
resolve_typedef calls (per field) up to 2 at most 1, only for aliases
Source of truth for "raw type" duplicated in 2 functions single location
Risk of label/type drifting out of sync real eliminated

The validation behavior is unchanged — phonenum still validates against its underlying regex — while the prompt shows the clean (type: phonenum) label, and we've halved the parsing work per field.

One Caveat Worth Flagging

If a descriptor block could ever legitimately contain a tab character inside a %type: line, this delimiter scheme would break. In standard recutils files that doesn't happen (type tokens are space-separated), so a tab is safe. If you wanted to be bulletproof, you could split on a control character like $'\x1f' (ASCII Unit Separator) instead — same mechanism, even less chance of collision. For this script, tab is perfectly adequate.

Finally, help me fix the $?= fragility at the end of main.

  • Fixing the $? Capture at the End of main

The current end-of-main pattern relies on $? surviving from one statement to the next, which works here but is fragile — any future edit that slips a command between the recins call and the capture will silently break the error handling. Folding the call directly into the if condition removes that footgun entirely.

  • The Current Code
    # ---- Build and run recins ---------------------------------------------
    echo "" >&2
    echo "Inserting record..." >&2
    if [ -n "$rectype" ]; then
        recins -t "$rectype" "${args[@]}" "$recfile"
    else
        recins "${args[@]}" "$recfile"
    fi

    local status=$?
    if [ "$status" -eq 0 ]; then
        echo "Record inserted successfully into '$recfile'." >&2
    else
        echo "recins failed (exit code $status)." >&2
        exit "$status"
    fi

The $? here actually reflects the if=/=fi block's exit status (which is the exit status of whichever recins branch ran), so it works — but only by happy coincidence of nothing intervening.

  • The Fix

Capture the status immediately after each branch, so it's bound tightly to the command that produced it:

    # ---- Build and run recins ---------------------------------------------
    echo "" >&2
    echo "Inserting record..." >&2

    local status
    if [ -n "$rectype" ]; then
        recins -t "$rectype" "${args[@]}" "$recfile"
        status=$?
    else
        recins "${args[@]}" "$recfile"
        status=$?
    fi

    if [ "$status" -eq 0 ]; then
        echo "Record inserted successfully into '$recfile'." >&2
    else
        echo "recins failed (exit code $status)." >&2
        exit "$status"
    fi
  • Why This Is Better

The status=$? now sits on the line immediately following each recins invocation, inside the same branch. There is no possibility of an intervening command — and crucially, the intent is now self-evident to anyone editing later: "this captures the exit code of this command."

Note: local status is declared once before the if so the variable has function scope, but the assignment happens inside each branch right after the relevant call. Splitting the declaration from the assignment also avoids the classic bash pitfall where local status=$(...) would mask the command's exit code — though that doesn't apply to a plain $? capture, it's a good habit to keep declaration and value-capture separate when $? matters.

  • An Even Cleaner Alternative

If you'd rather avoid the explicit status variable altogether, you can branch on recins directly. This requires duplicating the success/failure messaging slightly, or factoring it out:

    # ---- Build and run recins ---------------------------------------------
    echo "" >&2
    echo "Inserting record..." >&2

    run_recins() {
        if [ -n "$rectype" ]; then
            recins -t "$rectype" "${args[@]}" "$recfile"
        else
            recins "${args[@]}" "$recfile"
        fi
    }

    if run_recins; then
        echo "Record inserted successfully into '$recfile'." >&2
    else
        status=$?
        echo "recins failed (exit code $status)." >&2
        exit "$status"
    fi

Here the if consumes recins's exit status directly, and $? inside the else branch correctly reflects the failed command (the else branch runs because the condition was non-zero, and $? still holds that value at the first line of the branch).

Caveat for the alternative: run_recins would need access to rectype, args, and recfile. Since these are local to main, defining run_recins inside main (as shown) keeps them in scope. If you'd prefer not to nest a function definition inside main, stick with the first fix.

  • Recommendation

Go with the first fix. It's the smallest possible change, requires no nested function or scope juggling, reads clearly, and directly addresses the fragility you identified. The status=$? immediately after each call is the idiomatic, defensive pattern for capturing exit codes in bash.

*