zrec/zrecins-opus-again.org

:PROPERTIES:
:GPTEL_MODEL: claude-opus-4-8
:GPTEL_BACKEND: Claude
:GPTEL_SYSTEM: Act as a Code Review Specialist. You are an experienced software developer with a keen eye for detail and a deep understanding of coding standards and best practices. \n\nYour task is to review the code provided by the user, focusing on areas such as:\n- Code quality and readability\n- Adherence to coding standards\n- Potential bugs and security vulnerabilities\n- Performance optimization\n\nYou will:\n- Provide constructive feedback on the code\n- Suggest improvements and refactoring where necessary\n- Highlight any security concerns\n- Ensure the code follows best practices\n\nRules:\n- Be objective and professional in your feedback\n- Prioritize clarity and maintainability in your suggestions\n- Consider the specific context and requirements provided with the code
:GPTEL_TOOLS:
:GPTEL_MAX_TOKENS: 100000
:GPTEL_BOUNDS: ((response (1503 7598) (7606 13674) (13724 13821) (13856 19454) (19485 19519) (19522 23716)))
:END:

*** Help me update this script. It functions well, but I want to change one aspect. When prompted for a field that is a regexp type, I would like the type name to appear in parentheses instead of the long regex. For example, if I define a the phonenum type with "%typedef: phonenum regexp /^([[:digit:]]{3})\-([[:digit:]]{3})\-([[:digit:]]{4})$/" then use "%type: phone phonenum" I would prefer the displayed type to say (phonenum). The long regex is distracting and not helpful to me.

* Code Review & Solution

This is a well-structured script with good documentation. Your requested change is reasonable—displaying the typedef alias name is far more user-friendly than dumping a raw regex into the prompt.

** The Core Issue

Currently, =get_field_type()= resolves the alias to its underlying type string and discards the alias name. By the time =prompt_field()= receives the type, the friendly name (=phonenum=) is gone—only the regex remains.

To fix this cleanly, we need to *preserve the alias name* for display purposes while *keeping the resolved type* for validation. These are two separate concerns, so I recommend separating them.

** Recommended Approach

I suggest adding a small helper that returns the /display label/ for a field, and passing both a display label and the validation type into =prompt_field()=. This keeps validation logic untouched (lower risk) while giving you the display you want.

*** Step 1: Add a display-name resolver

Add this function near =get_field_type()=:

#+begin_src bash
# ---------------------------------------------------------------------------
# Determine a human-friendly type label for display purposes.
#   - If %type: names a base type, show that base type.
#   - If %type: names a %typedef: alias, show the alias name (NOT its
#     expanded definition), which is far more readable for things like
#     long regexp patterns.
#   Prints the label (may be empty if no type is declared).
# ---------------------------------------------------------------------------
get_field_type_label() {
    local block="$1"
    local field="$2"

    # Find the raw token(s) %type: declares for this field.
    local raw_type
    raw_type=$(echo "$block" | awk -v f="$field" '
        index($0, "%type:") == 1 {
            rest = substr($0, length("%type:") + 1)
            sub(/^[[:space:]]+/, "", rest)
            n = split(rest, parts, /[[:space:]]+/)
            names = parts[1]
            typedef = ""
            for (i = 2; i <= n; i++) {
                typedef = typedef (i==2 ? "" : " ") parts[i]
            }
            m = split(names, namearr, /,/)
            for (j = 1; j <= m; j++) {
                if (namearr[j] == f) {
                    print typedef
                    exit
                }
            }
        }
    ')

    [ -z "$raw_type" ] && return 0

    local base="${raw_type%% *}"
    case "$base" in
    int | real | bool | date | line | url | email | uuid | regexp | range)
        # Concrete inline type: show it as-is (already concise).
        printf '%s' "$raw_type"
        ;;
    *)
        # Named alias: prefer the alias name itself for display.
        if [ -n "$(resolve_typedef "$block" "$base")" ]; then
            printf '%s' "$base"
        else
            printf '%s' "$raw_type"
        fi
        ;;
    esac
}
#+end_src

*** Step 2: Update =prompt_field()= to accept a display label

Add a fourth parameter used only for the prompt text:

#+begin_src bash
prompt_field() {
    local field="$1"
    local is_mandatory="$2" # "yes" / "no"
    local ftype="$3"        # resolved type, used for validation
    local display_type="$4" # friendly label, used only in the prompt

    # Fall back to the validation type if no display label was provided.
    [ -z "$display_type" ] && display_type="$ftype"

    local label="$field"
    local tags=""
    [ "$is_mandatory" = "yes" ] && tags="${tags} [MANDATORY]"
    [ -n "$display_type" ] && tags="${tags} (type: ${display_type})"

    # ... rest unchanged ...
#+end_src

The validation block inside still uses =$ftype= (the resolved type), so =phonenum= validates against the underlying regex exactly as before.

*** Step 3: Update the three call sites in =main()=

In each of the three prompt locations, compute the label alongside the type:

#+begin_src bash
# --- Allowed fields loop ---
local ftype ftype_label
ftype=$(get_field_type "$block" "$f")
ftype_label=$(get_field_type_label "$block" "$f")
val=$(prompt_field "$f" "$mand" "$ftype" "$ftype_label")
#+end_src

#+begin_src bash
# --- Mandatory fields loop ---
local ftype ftype_label
ftype=$(get_field_type "$block" "$f")
ftype_label=$(get_field_type_label "$block" "$f")
val=$(prompt_field "$f" "yes" "$ftype" "$ftype_label")
#+end_src

#+begin_src bash
# --- Additional free-form fields loop ---
local ftype ftype_label
ftype=$(get_field_type "$block" "$newfield")
ftype_label=$(get_field_type_label "$block" "$newfield")
val=$(prompt_field "$newfield" "no" "$ftype" "$ftype_label")
#+end_src

** Result

With your example:

#+begin_src
%typedef: phonenum regexp /^([[:digit:]]{3})\-([[:digit:]]{3})\-([[:digit:]]{4})$/
%type: phone phonenum
#+end_src

The prompt now reads:

#+begin_src
phone (type: phonenum):
#+end_src

…while validation still enforces the underlying regex.

---

** Additional Observations (optional, not required for your change)

A few items worth noting for future maintenance:

1. *Lost exit status after command substitution* (existing bug):
   #+begin_src bash
   validated=$(validate_field_type "$value" "$ftype")
   if [ $? -ne 0 ]; then
   #+end_src
   =local validated=$(...)= would mask =$?=, but here you've correctly split the declaration from assignment in =prompt_field=—so this is actually fine. Worth keeping that pattern consistent if you ever refactor.

2. *Redundant awk invocations.* =get_field_type= and the new =get_field_type_label= share nearly identical awk logic to extract the raw type. If this script grows, consider a single helper that returns the raw =%type:= token, with thin wrappers for "resolved" vs "display" forms—reducing duplication and keeping the two in sync.

3. *=range= display* is already concise (=range 1 100=), so I deliberately left it as a concrete type in the label resolver. If you later add named range typedefs (e.g. =%typedef: score range 1 100=), the alias-name logic will handle them automatically and show =(type: score)=.

These are minor and don't block your change. The three-step modification above achieves exactly what you asked with minimal risk to the working validation logic.

***

* Code Review: Displaying Type Aliases Instead of Resolved Regex

Good request — showing =(phonenum)= is far more user-friendly than dumping a raw regex at the prompt. The challenge is that your current architecture *resolves the alias too early*: =get_field_type= returns the /resolved/ base type (e.g. =regexp /^.../=), so by the time you build the prompt label, the original alias name is already lost.

** The Core Issue

You have two competing needs:

1. *Validation* needs the /resolved/ type string (e.g. =regexp /pattern/=).
2. *Display* wants the /alias/ name (e.g. =phonenum=) when one exists.

The cleanest fix is to separate the *display name* from the *validation type*, rather than overloading a single return value.

** Recommended Approach

*** 1. Add a helper to get the raw (unresolved) declared type

Extract the lookup logic that's currently buried inside =get_field_type= so you can reuse it:

#+begin_src bash
# Returns the raw type string as declared in %type: (may be an alias).
get_field_raw_type() {
    local block="$1"
    local field="$2"

    echo "$block" | awk -v f="$field" '
        index($0, "%type:") == 1 {
            rest = substr($0, length("%type:") + 1)
            sub(/^[[:space:]]+/, "", rest)
            n = split(rest, parts, /[[:space:]]+/)
            names = parts[1]
            typedef = ""
            for (i = 2; i <= n; i++) {
                typedef = typedef (i==2 ? "" : " ") parts[i]
            }
            m = split(names, namearr, /,/)
            for (j = 1; j <= m; j++) {
                if (namearr[j] == f) {
                    print typedef
                    exit
                }
            }
        }
    '
}
#+end_src

Then refactor =get_field_type= to call it (DRY):

#+begin_src bash
get_field_type() {
    local block="$1"
    local field="$2"

    local raw_type
    raw_type=$(get_field_raw_type "$block" "$field")
    [ -z "$raw_type" ] && return 0

    local base="${raw_type%% *}"
    case "$base" in
    int | real | bool | date | line | url | email | uuid | regexp | range)
        printf '%s' "$raw_type"
        ;;
    *)
        local resolved
        resolved=$(resolve_typedef "$block" "$base")
        if [ -n "$resolved" ]; then
            printf '%s' "$resolved"
        else
            printf '%s' "$raw_type"
        fi
        ;;
    esac
}
#+end_src

*** 2. Compute a display label

The friendliest behavior: show the alias when one exists, otherwise show the base type (and notably, *avoid showing regexes for inline =regexp= types too*, since you said the long regex is distracting). Add a small helper:

#+begin_src bash
# Produce a human-friendly type label for prompting.
#   - If %type: uses a named alias  -> show the alias name.
#   - If the resolved type is regexp -> show "regexp" (hide the pattern).
#   - Otherwise                      -> show the base type name.
display_type_label() {
    local block="$1"
    local field="$2"

    local raw resolved base
    raw=$(get_field_raw_type "$block" "$field")
    [ -z "$raw" ] && return 0

    base="${raw%% *}"
    case "$base" in
    int | real | bool | date | line | url | email | uuid | regexp | range)
        # Inline/base type: hide regexp pattern, show just the keyword.
        printf '%s' "$base"
        ;;
    *)
        # It's an alias name — show it verbatim if it resolves.
        resolved=$(resolve_typedef "$block" "$base")
        if [ -n "$resolved" ]; then
            printf '%s' "$base"
        else
            printf '%s' "$raw"
        fi
        ;;
    esac
}
#+end_src

*** 3. Pass both values into =prompt_field=

=prompt_field= should validate against the resolved type but /display/ the label. Add a fourth parameter:

#+begin_src bash
prompt_field() {
    local field="$1"
    local is_mandatory="$2"  # "yes" / "no"
    local ftype="$3"         # resolved type, used for validation
    local type_label="$4"    # friendly label, used for display

    local label="$field"
    local tags=""
    [ "$is_mandatory" = "yes" ] && tags="${tags} [MANDATORY]"
    [ -n "$type_label" ] && tags="${tags} (type: ${type_label})"
    # ... rest unchanged ...
#+end_src

*** 4. Update the three call sites

In each of the three prompting loops in =main=, compute the label alongside the type:

#+begin_src bash
local ftype label
ftype=$(get_field_type "$block" "$f")
label=$(display_type_label "$block" "$f")

local val
val=$(prompt_field "$f" "$mand" "$ftype" "$label")
#+end_src

Apply the same pattern to the mandatory-fields loop and the free-form loop.

** Result

With your example:

#+begin_src
%typedef: phonenum regexp /^([[:digit:]]{3})\-([[:digit:]]{3})\-([[:digit:]]{4})$/
%type: phone phonenum
#+end_src

The prompt now reads:

#+begin_src
phone (type: phonenum):
#+end_src

…while validation still runs against the fully resolved =regexp /.../= pattern. Best of both worlds.

** Additional Observations (not blocking your change)

1. *=$?= capture bug* — In =prompt_field=:
   #+begin_src bash
   validated=$(validate_field_type "$value" "$ftype")
   if [ $? -ne 0 ]; then
   #+end_src
   This works /only because/ nothing runs between the assignment and the test. It's fragile. Prefer:
   #+begin_src bash
   if validated=$(validate_field_type "$value" "$ftype"); then
       printf '%s' "$validated"
   else
       continue
   fi
   #+end_src

2. *Repeated =awk= parsing* — =get_field_raw_type= is now called twice per field (once via =get_field_type=, once via =display_type_label=). For small recfiles this is negligible, but if you ever profile this, consider computing both the resolved type and the label in a single helper that returns both (e.g. via a delimiter or two global vars).

3. *=recins= exit-code capture* — Same =$?= fragility at the end of =main=:
   #+begin_src bash
   recins -t "$rectype" "${args[@]}" "$recfile"
   local status=$?
   #+end_src
   This is fine as written, but worth keeping consistent with the =if recins ...; then= idiom for robustness.

These are optional cleanups. The four changes above fully satisfy your stated requirement.

*** I think you have raised a great point. Let's modify this to compute both the resolved type and the label in a single helper that returns both to prevent repeated awk parsing.

- Consolidating the Type Lookup into a Single AWK Pass

Good call. Let's eliminate the redundant parsing by having one helper do the =awk= work once and return /both/ the validation type and the display label. The cleanest, bash-3.2-safe way to "return two values" is to print them on one line with a delimiter the data can't contain, then split with parameter expansion.

* The Combined Helper

A tab is the safest delimiter here — recfile field types are whitespace-separated tokens, but a literal /tab/ won't appear inside =%type:= declarations in practice, and we control both halves.

#+begin_src bash
# ---------------------------------------------------------------------------
# Look up a field's type ONCE and return two tab-separated values:
#
#       <resolved-type><TAB><display-label>
#
#   - resolved-type : the fully-resolved type string used for VALIDATION
#                     (e.g. "regexp /^.../", "int", "range 1 100").
#   - display-label : a human-friendly label used for the PROMPT
#                     (e.g. the alias "phonenum", or "regexp" for an
#                     inline pattern, or the base type name).
#
#   Both fields may be empty if the field has no declared type.
#   This performs a single awk pass over the descriptor block.
# ---------------------------------------------------------------------------
get_field_type_info() {
    local block="$1"
    local field="$2"

    # Single awk pass: pull the raw %type: token(s) for this field.
    local raw_type
    raw_type=$(echo "$block" | awk -v f="$field" '
        index($0, "%type:") == 1 {
            rest = substr($0, length("%type:") + 1)
            sub(/^[[:space:]]+/, "", rest)
            n = split(rest, parts, /[[:space:]]+/)
            names = parts[1]
            typedef = ""
            for (i = 2; i <= n; i++) {
                typedef = typedef (i==2 ? "" : " ") parts[i]
            }
            m = split(names, namearr, /,/)
            for (j = 1; j <= m; j++) {
                if (namearr[j] == f) {
                    print typedef
                    exit
                }
            }
        }
    ')

    # No declared type: emit an empty resolved/label pair.
    if [ -z "$raw_type" ]; then
        printf '\t'
        return 0
    fi

    local base="${raw_type%% *}"
    local resolved label

    case "$base" in
    int | real | bool | date | line | url | email | uuid | regexp | range)
        # Concrete inline type.
        resolved="$raw_type"
        # Hide the regexp pattern in the label; show just the keyword.
        label="$base"
        ;;
    *)
        # Named alias: resolve it (one typedef lookup, only when needed).
        local typedef
        typedef=$(resolve_typedef "$block" "$base")
        if [ -n "$typedef" ]; then
            resolved="$typedef"   # validate against the underlying type
            label="$base"         # but display the friendly alias name
        else
            # Unknown alias: fall back to raw for both.
            resolved="$raw_type"
            label="$raw_type"
        fi
        ;;
    esac

    printf '%s\t%s' "$resolved" "$label"
}
#+end_src

* Splitting the Result at the Call Sites

Use a literal tab in the parameter expansion. To keep it readable and avoid embedding a raw tab in the source, define a tab once near the top of the script:

#+begin_src bash
# Near the top of the script, after the shebang/comments:
TAB=$(printf '\t')
#+end_src

Then each call site becomes:

#+begin_src bash
local info ftype label
info=$(get_field_type_info "$block" "$f")
ftype="${info%%"$TAB"*}"   # everything before the tab  -> resolved type
label="${info#*"$TAB"}"    # everything after the tab   -> display label

local val
val=$(prompt_field "$f" "$mand" "$ftype" "$label")
#+end_src

Apply the identical three-line split in all three loops (allowed, mandatory, free-form), substituting =$f= / =$newfield= as appropriate.

#+begin_quote
/Note on quoting:/ =${info%%"$TAB"*}= quotes =$TAB= so the tab is treated literally rather than as a glob pattern. This is correct in bash 3.2. Since both halves always exist (the helper always prints exactly one tab), =%%= and =#= behave predictably even when one side is empty.
#+end_quote

* Retiring the Now-Redundant Helpers

With =get_field_type_info= in place, you can /delete/ both:

- =get_field_type=
- =display_type_label= (the earlier proposal)

…and you no longer need a standalone =get_field_raw_type=, since its logic now lives inline in the single pass. =resolve_typedef= stays — it's still called (lazily, only for actual aliases).

* Why This Is Better

| Aspect | Before | After |
|--------+--------+-------|
| =awk= passes over block (per field) | 2 | 1 |
| =resolve_typedef= calls (per field) | up to 2 | at most 1, only for aliases |
| Source of truth for "raw type" | duplicated in 2 functions | single location |
| Risk of label/type drifting out of sync | real | eliminated |

The validation behavior is unchanged — =phonenum= still validates against its underlying regex — while the prompt shows the clean =(type: phonenum)= label, and we've halved the parsing work per field.

* One Caveat Worth Flagging

If a descriptor block could ever legitimately contain a /tab character/ inside a =%type:= line, this delimiter scheme would break. In standard recutils files that doesn't happen (type tokens are space-separated), so a tab is safe. If you wanted to be bulletproof, you could split on a control character like =$'\x1f'= (ASCII Unit Separator) instead — same mechanism, even less chance of collision. For this script, tab is perfectly adequate.

*** Finally, help me fix the $?= fragility at the end of =main=.

- Fixing the =$?= Capture at the End of =main=

The current end-of-=main= pattern relies on =$?= surviving from one statement to the next, which works here but is fragile — any future edit that slips a command between the =recins= call and the capture will silently break the error handling. Folding the call directly into the =if= condition removes that footgun entirely.

- The Current Code

#+begin_src bash
    # ---- Build and run recins ---------------------------------------------
    echo "" >&2
    echo "Inserting record..." >&2
    if [ -n "$rectype" ]; then
        recins -t "$rectype" "${args[@]}" "$recfile"
    else
        recins "${args[@]}" "$recfile"
    fi

    local status=$?
    if [ "$status" -eq 0 ]; then
        echo "Record inserted successfully into '$recfile'." >&2
    else
        echo "recins failed (exit code $status)." >&2
        exit "$status"
    fi
#+end_src

The =$?= here actually reflects the =if=/=fi= block's exit status (which is the exit status of whichever =recins= branch ran), so it /works/ — but only by happy coincidence of nothing intervening.

- The Fix

Capture the status immediately after each branch, so it's bound tightly to the command that produced it:

#+begin_src bash
    # ---- Build and run recins ---------------------------------------------
    echo "" >&2
    echo "Inserting record..." >&2

    local status
    if [ -n "$rectype" ]; then
        recins -t "$rectype" "${args[@]}" "$recfile"
        status=$?
    else
        recins "${args[@]}" "$recfile"
        status=$?
    fi

    if [ "$status" -eq 0 ]; then
        echo "Record inserted successfully into '$recfile'." >&2
    else
        echo "recins failed (exit code $status)." >&2
        exit "$status"
    fi
#+end_src

- Why This Is Better

The =status=$?= now sits on the line /immediately/ following each =recins= invocation, inside the same branch. There is no possibility of an intervening command — and crucially, the intent is now self-evident to anyone editing later: "this captures the exit code of /this/ command."

#+begin_quote
/Note:/ =local status= is declared once before the =if= so the variable has function scope, but the /assignment/ happens inside each branch right after the relevant call. Splitting the declaration from the assignment also avoids the classic bash pitfall where =local status=$(...)= would mask the command's exit code — though that doesn't apply to a plain =$?= capture, it's a good habit to keep declaration and value-capture separate when =$?= matters.
#+end_quote

- An Even Cleaner Alternative

If you'd rather avoid the explicit =status= variable altogether, you can branch on =recins= directly. This requires duplicating the success/failure messaging slightly, or factoring it out:

#+begin_src bash
    # ---- Build and run recins ---------------------------------------------
    echo "" >&2
    echo "Inserting record..." >&2

    run_recins() {
        if [ -n "$rectype" ]; then
            recins -t "$rectype" "${args[@]}" "$recfile"
        else
            recins "${args[@]}" "$recfile"
        fi
    }

    if run_recins; then
        echo "Record inserted successfully into '$recfile'." >&2
    else
        status=$?
        echo "recins failed (exit code $status)." >&2
        exit "$status"
    fi
#+end_src

Here the =if= consumes =recins='s exit status directly, and =$?= inside the =else= branch correctly reflects the failed command (the =else= branch runs /because/ the condition was non-zero, and =$?= still holds that value at the first line of the branch).

#+begin_quote
/Caveat for the alternative:/ =run_recins= would need access to =rectype=, =args=, and =recfile=. Since these are =local= to =main=, defining =run_recins= /inside/ =main= (as shown) keeps them in scope. If you'd prefer not to nest a function definition inside =main=, stick with the first fix.
#+end_quote

- Recommendation

Go with the /first fix/. It's the smallest possible change, requires no nested function or scope juggling, reads clearly, and directly addresses the fragility you identified. The =status=$?= immediately after each call is the idiomatic, defensive pattern for capturing exit codes in bash.

***