zrec/zrecins-opus-again.org
billmanz 4894ada68f Initial commit of the zrec function.
`zrec` exists on the internet now! For anyone who ever looks at
this, yes, this was entirely vibe-coded. Sorry, I saw a niche that
an LLM could help me fill. I find this to work for my usecase.
2026-06-18 09:14:05 -04:00

:PROPERTIES:
:GPTEL_MODEL: claude-opus-4-8
:GPTEL_BACKEND: Claude
:GPTEL_SYSTEM: Act as a Code Review Specialist. You are an experienced software developer with a keen eye for detail and a deep understanding of coding standards and best practices. \n\nYour task is to review the code provided by the user, focusing on areas such as:\n- Code quality and readability\n- Adherence to coding standards\n- Potential bugs and security vulnerabilities\n- Performance optimization\n\nYou will:\n- Provide constructive feedback on the code\n- Suggest improvements and refactoring where necessary\n- Highlight any security concerns\n- Ensure the code follows best practices\n\nRules:\n- Be objective and professional in your feedback\n- Prioritize clarity and maintainability in your suggestions\n- Consider the specific context and requirements provided with the code
:GPTEL_TOOLS:
:GPTEL_MAX_TOKENS: 100000
:GPTEL_BOUNDS: ((response (1503 7598) (7606 13674) (13724 13821) (13856 19454) (19485 19519) (19522 23716)))
:END:
*** Help me update this script. It functions well, but I want to change one aspect. When prompted for a field that is a regexp type, I would like the type name to appear in parentheses instead of the long regex. For example, if I define the phonenum type with "%typedef: phonenum regexp /^([[:digit:]]{3})\-([[:digit:]]{3})\-([[:digit:]]{4})$/" then use "%type: phone phonenum" I would prefer the displayed type to say (phonenum). The long regex is distracting and not helpful to me.
* Code Review & Solution
This is a well-structured script with good documentation. Your requested change is reasonable—displaying the typedef alias name is far more user-friendly than dumping a raw regex into the prompt.
** The Core Issue
Currently, =get_field_type()= resolves the alias to its underlying type string and discards the alias name. By the time =prompt_field()= receives the type, the friendly name (=phonenum=) is gone—only the regex remains.
To fix this cleanly, we need to *preserve the alias name* for display purposes while *keeping the resolved type* for validation. These are two separate concerns, so I recommend separating them.
** Recommended Approach
I suggest adding a small helper that returns the /display label/ for a field, and passing both a display label and the validation type into =prompt_field()=. This keeps validation logic untouched (lower risk) while giving you the display you want.
*** Step 1: Add a display-name resolver
Add this function near =get_field_type()=:
#+begin_src bash
# ---------------------------------------------------------------------------
# Determine a human-friendly type label for display purposes.
# - If %type: names a base type, show that base type.
# - If %type: names a %typedef: alias, show the alias name (NOT its
# expanded definition), which is far more readable for things like
# long regexp patterns.
# Prints the label (may be empty if no type is declared).
# ---------------------------------------------------------------------------
get_field_type_label() {
local block="$1"
local field="$2"
# Find the raw token(s) %type: declares for this field.
local raw_type
raw_type=$(echo "$block" | awk -v f="$field" '
index($0, "%type:") == 1 {
rest = substr($0, length("%type:") + 1)
sub(/^[[:space:]]+/, "", rest)
n = split(rest, parts, /[[:space:]]+/)
names = parts[1]
typedef = ""
for (i = 2; i <= n; i++) {
typedef = typedef (i==2 ? "" : " ") parts[i]
}
m = split(names, namearr, /,/)
for (j = 1; j <= m; j++) {
if (namearr[j] == f) {
print typedef
exit
}
}
}
')
[ -z "$raw_type" ] && return 0
local base="${raw_type%% *}"
case "$base" in
int | real | bool | date | line | url | email | uuid | regexp | range)
# Concrete inline type: show it as-is (already concise).
printf '%s' "$raw_type"
;;
*)
# Named alias: prefer the alias name itself for display.
if [ -n "$(resolve_typedef "$block" "$base")" ]; then
printf '%s' "$base"
else
printf '%s' "$raw_type"
fi
;;
esac
}
#+end_src
*** Step 2: Update =prompt_field()= to accept a display label
Add a fourth parameter used only for the prompt text:
#+begin_src bash
prompt_field() {
local field="$1"
local is_mandatory="$2" # "yes" / "no"
local ftype="$3" # resolved type, used for validation
local display_type="$4" # friendly label, used only in the prompt
# Fall back to the validation type if no display label was provided.
[ -z "$display_type" ] && display_type="$ftype"
local label="$field"
local tags=""
[ "$is_mandatory" = "yes" ] && tags="${tags} [MANDATORY]"
[ -n "$display_type" ] && tags="${tags} (type: ${display_type})"
# ... rest unchanged ...
#+end_src
The validation block inside still uses =$ftype= (the resolved type), so =phonenum= validates against the underlying regex exactly as before.
*** Step 3: Update the three call sites in =main()=
In each of the three prompt locations, compute the label alongside the type:
#+begin_src bash
# --- Allowed fields loop ---
local ftype ftype_label
ftype=$(get_field_type "$block" "$f")
ftype_label=$(get_field_type_label "$block" "$f")
val=$(prompt_field "$f" "$mand" "$ftype" "$ftype_label")
#+end_src
#+begin_src bash
# --- Mandatory fields loop ---
local ftype ftype_label
ftype=$(get_field_type "$block" "$f")
ftype_label=$(get_field_type_label "$block" "$f")
val=$(prompt_field "$f" "yes" "$ftype" "$ftype_label")
#+end_src
#+begin_src bash
# --- Additional free-form fields loop ---
local ftype ftype_label
ftype=$(get_field_type "$block" "$newfield")
ftype_label=$(get_field_type_label "$block" "$newfield")
val=$(prompt_field "$newfield" "no" "$ftype" "$ftype_label")
#+end_src
** Result
With your example:
#+begin_src
%typedef: phonenum regexp /^([[:digit:]]{3})\-([[:digit:]]{3})\-([[:digit:]]{4})$/
%type: phone phonenum
#+end_src
The prompt now reads:
#+begin_src
phone (type: phonenum):
#+end_src
…while validation still enforces the underlying regex.
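To sanity-check the alias-versus-resolved split outside the real script, here is a minimal sketch in plain bash. Everything in it is hypothetical: =demo_resolve_typedef= and =demo_type_and_label= are stand-ins written for this illustration (the script's actual =resolve_typedef= is not shown in this conversation), and the =|= separator is only there to make the two values visible side by side.

#+begin_src bash
# Hypothetical stand-in for the script's resolve_typedef: print the
# definition that follows "%typedef: NAME", or nothing if NAME is unknown.
demo_resolve_typedef() {
    local block="$1" name="$2" key alias def
    while read -r key alias def; do
        if [ "$key" = "%typedef:" ] && [ "$alias" = "$name" ]; then
            printf '%s' "$def"
            return 0
        fi
    done <<EOF
$block
EOF
}

# Print "<display label>|<validation type>" for one field (illustration only).
demo_type_and_label() {
    local block="$1" field="$2" key names decl base resolved
    while read -r key names decl; do
        [ "$key" = "%type:" ] || continue
        case ",$names," in
            *",$field,"*) ;;       # field appears in the comma-separated list
            *) continue ;;
        esac
        base="${decl%% *}"
        resolved=$(demo_resolve_typedef "$block" "$base")
        if [ -n "$resolved" ]; then
            printf '%s|%s' "$base" "$resolved"   # alias: show name, validate definition
        else
            printf '%s|%s' "$decl" "$decl"       # base type (or unknown alias)
        fi
        return 0
    done <<EOF
$block
EOF
}

demo_block='%rec: Contact
%typedef: phonenum regexp /^([[:digit:]]{3})\-([[:digit:]]{3})\-([[:digit:]]{4})$/
%type: phone phonenum
%type: age int'

demo_type_and_label "$demo_block" phone; echo   # alias name, then the full regexp definition
demo_type_and_label "$demo_block" age; echo     # prints: int|int
#+end_src

The left-hand value is what the prompt would show and the right-hand value is what validation would receive, which is the separation the three steps above introduce.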
-----
** Additional Observations (optional, not required for your change)
A few items worth noting for future maintenance:
1. *Exit status after command substitution* (not a bug today, but easy to break):
#+begin_src bash
validated=$(validate_field_type "$value" "$ftype")
if [ $? -ne 0 ]; then
#+end_src
=local validated=$(...)= would mask =$?=, but here you've correctly split the declaration from assignment in =prompt_field=—so this is actually fine. Worth keeping that pattern consistent if you ever refactor.
2. *Redundant awk invocations.* =get_field_type= and the new =get_field_type_label= share nearly identical awk logic to extract the raw type. If this script grows, consider a single helper that returns the raw =%type:= token, with thin wrappers for "resolved" vs "display" forms—reducing duplication and keeping the two in sync.
3. *=range= display* is already concise (=range 1 100=), so I deliberately left it as a concrete type in the label resolver. If you later add named range typedefs (e.g. =%typedef: score range 1 100=), the alias-name logic will handle them automatically and show =(type: score)=.
These are minor and don't block your change. The three-step modification above achieves exactly what you asked with minimal risk to the working validation logic.
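The =local= masking pitfall from observation 1 is easy to reproduce in isolation. This is a minimal sketch with made-up function names; =false= stands in for a failing =validate_field_type= call.

#+begin_src bash
# Fragile: `local` itself succeeds, so $? is 0 even though `false` failed.
masked_status() {
    local out=$(false)
    echo "$?"
}

# Safe: declare first, assign separately; $? now comes from the substitution.
preserved_status() {
    local out
    out=$(false)
    echo "$?"
}

masked_status      # prints 0
preserved_status   # prints 1
#+end_src

The second form is the one =prompt_field= already uses, which is why the existing check works.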
***
* Code Review: Displaying Type Aliases Instead of Resolved Regex
Good request — showing =(phonenum)= is far more user-friendly than dumping a raw regex at the prompt. The challenge is that your current architecture *resolves the alias too early*: =get_field_type= returns the /resolved/ base type (e.g. =regexp /^.../=), so by the time you build the prompt label, the original alias name is already lost.
** The Core Issue
You have two competing needs:
1. *Validation* needs the /resolved/ type string (e.g. =regexp /pattern/=).
2. *Display* wants the /alias/ name (e.g. =phonenum=) when one exists.
The cleanest fix is to separate the *display name* from the *validation type*, rather than overloading a single return value.
** Recommended Approach
*** 1. Add a helper to get the raw (unresolved) declared type
Extract the lookup logic that's currently buried inside =get_field_type= so you can reuse it:
#+begin_src bash
# Returns the raw type string as declared in %type: (may be an alias).
get_field_raw_type() {
local block="$1"
local field="$2"
echo "$block" | awk -v f="$field" '
index($0, "%type:") == 1 {
rest = substr($0, length("%type:") + 1)
sub(/^[[:space:]]+/, "", rest)
n = split(rest, parts, /[[:space:]]+/)
names = parts[1]
typedef = ""
for (i = 2; i <= n; i++) {
typedef = typedef (i==2 ? "" : " ") parts[i]
}
m = split(names, namearr, /,/)
for (j = 1; j <= m; j++) {
if (namearr[j] == f) {
print typedef
exit
}
}
}
'
}
#+end_src
Then refactor =get_field_type= to call it (DRY):
#+begin_src bash
get_field_type() {
local block="$1"
local field="$2"
local raw_type
raw_type=$(get_field_raw_type "$block" "$field")
[ -z "$raw_type" ] && return 0
local base="${raw_type%% *}"
case "$base" in
int | real | bool | date | line | url | email | uuid | regexp | range)
printf '%s' "$raw_type"
;;
*)
local resolved
resolved=$(resolve_typedef "$block" "$base")
if [ -n "$resolved" ]; then
printf '%s' "$resolved"
else
printf '%s' "$raw_type"
fi
;;
esac
}
#+end_src
*** 2. Compute a display label
The friendliest behavior: show the alias when one exists, otherwise show the base type (and notably, *avoid showing regexes for inline =regexp= types too*, since you said the long regex is distracting). Add a small helper:
#+begin_src bash
# Produce a human-friendly type label for prompting.
# - If %type: uses a named alias -> show the alias name.
# - If the resolved type is regexp -> show "regexp" (hide the pattern).
# - Otherwise -> show the base type name.
display_type_label() {
local block="$1"
local field="$2"
local raw resolved base
raw=$(get_field_raw_type "$block" "$field")
[ -z "$raw" ] && return 0
base="${raw%% *}"
case "$base" in
int | real | bool | date | line | url | email | uuid | regexp | range)
# Inline/base type: hide regexp pattern, show just the keyword.
printf '%s' "$base"
;;
*)
# It's an alias name — show it verbatim if it resolves.
resolved=$(resolve_typedef "$block" "$base")
if [ -n "$resolved" ]; then
printf '%s' "$base"
else
printf '%s' "$raw"
fi
;;
esac
}
#+end_src
*** 3. Pass both values into =prompt_field=
=prompt_field= should validate against the resolved type but /display/ the label. Add a fourth parameter:
#+begin_src bash
prompt_field() {
local field="$1"
local is_mandatory="$2" # "yes" / "no"
local ftype="$3" # resolved type, used for validation
local type_label="$4" # friendly label, used for display
local label="$field"
local tags=""
[ "$is_mandatory" = "yes" ] && tags="${tags} [MANDATORY]"
[ -n "$type_label" ] && tags="${tags} (type: ${type_label})"
# ... rest unchanged ...
#+end_src
*** 4. Update the three call sites
In each of the three prompting loops in =main=, compute the label alongside the type:
#+begin_src bash
local ftype label
ftype=$(get_field_type "$block" "$f")
label=$(display_type_label "$block" "$f")
local val
val=$(prompt_field "$f" "$mand" "$ftype" "$label")
#+end_src
Apply the same pattern to the mandatory-fields loop and the free-form loop.
** Result
With your example:
#+begin_src
%typedef: phonenum regexp /^([[:digit:]]{3})\-([[:digit:]]{3})\-([[:digit:]]{4})$/
%type: phone phonenum
#+end_src
The prompt now reads:
#+begin_src
phone (type: phonenum):
#+end_src
…while validation still runs against the fully resolved =regexp /.../= pattern. Best of both worlds.
** Additional Observations (not blocking your change)
1. *=$?= capture fragility* in =prompt_field=:
#+begin_src bash
validated=$(validate_field_type "$value" "$ftype")
if [ $? -ne 0 ]; then
#+end_src
This works /only because/ nothing runs between the assignment and the test. It's fragile. Prefer:
#+begin_src bash
if validated=$(validate_field_type "$value" "$ftype"); then
printf '%s' "$validated"
else
continue
fi
#+end_src
2. *Repeated =awk= parsing.* =get_field_raw_type= is now called twice per field (once via =get_field_type=, once via =display_type_label=). For small recfiles this is negligible, but if profiling ever shows it matters, consider a single helper that computes both the resolved type and the label and returns them together (e.g. joined by a delimiter, or in two global variables).
3. *=recins= exit-code capture* — Same =$?= fragility at the end of =main=:
#+begin_src bash
recins -t "$rectype" "${args[@]}" "$recfile"
local status=$?
#+end_src
This is fine as written, but worth keeping consistent with the =if recins ...; then= idiom for robustness.
These are optional cleanups. The four changes above fully satisfy your stated requirement.
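For observation 1, here is a self-contained sketch of the =if var=$(...)= idiom inside a retry loop. =demo_validate_int= and =demo_prompt_int= are hypothetical stand-ins, not the script's =validate_field_type= or =prompt_field=, and the loop reads candidate values from stdin rather than prompting.

#+begin_src bash
# Hypothetical validator: echoes the value and succeeds only for
# non-negative integers.
demo_validate_int() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) printf '%s' "$1" ;;
    esac
}

# Read candidate values from stdin until one validates, then print it.
demo_prompt_int() {
    local value validated
    while IFS= read -r value; do
        # The assignment's exit status is the substitution's exit status,
        # so no separate $? test is needed.
        if validated=$(demo_validate_int "$value"); then
            printf '%s\n' "$validated"
            return 0
        fi
        echo "Invalid value: '$value'" >&2
    done
    return 1    # input ended without a valid value
}

printf 'abc\n\n42\n' | demo_prompt_int    # prints 42 (two warnings go to stderr)
#+end_src

Because the test and the assignment are one statement, nothing can be inserted between them by a later edit.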
*** I think you have raised a great point. Let's modify this to compute both the resolved type and the label in a single helper that returns both to prevent repeated awk parsing.
* Consolidating the Type Lookup into a Single AWK Pass
Good call. Let's eliminate the redundant parsing by having one helper do the =awk= work once and return /both/ the validation type and the display label. The cleanest, bash-3.2-safe way to "return two values" is to print them on one line with a delimiter the data can't contain, then split with parameter expansion.
** The Combined Helper
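A tab is the safest delimiter here. The =awk= pass splits the declaration on whitespace and rejoins the tokens with single spaces, so the raw type can never contain a tab. The only other input is whatever =resolve_typedef= prints, and a literal /tab/ is unlikely to appear in a =%typedef:= line in practice.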
#+begin_src bash
# ---------------------------------------------------------------------------
# Look up a field's type ONCE and return two tab-separated values:
#
# <resolved-type><TAB><display-label>
#
# - resolved-type : the fully-resolved type string used for VALIDATION
# (e.g. "regexp /^.../", "int", "range 1 100").
# - display-label : a human-friendly label used for the PROMPT
# (e.g. the alias "phonenum", or "regexp" for an
# inline pattern, or the base type name).
#
# Both fields may be empty if the field has no declared type.
# This performs a single awk pass over the descriptor block.
# ---------------------------------------------------------------------------
get_field_type_info() {
local block="$1"
local field="$2"
# Single awk pass: pull the raw %type: token(s) for this field.
local raw_type
raw_type=$(echo "$block" | awk -v f="$field" '
index($0, "%type:") == 1 {
rest = substr($0, length("%type:") + 1)
sub(/^[[:space:]]+/, "", rest)
n = split(rest, parts, /[[:space:]]+/)
names = parts[1]
typedef = ""
for (i = 2; i <= n; i++) {
typedef = typedef (i==2 ? "" : " ") parts[i]
}
m = split(names, namearr, /,/)
for (j = 1; j <= m; j++) {
if (namearr[j] == f) {
print typedef
exit
}
}
}
')
# No declared type: emit an empty resolved/label pair.
if [ -z "$raw_type" ]; then
printf '\t'
return 0
fi
local base="${raw_type%% *}"
local resolved label
case "$base" in
int | real | bool | date | line | url | email | uuid | regexp | range)
# Concrete inline type.
resolved="$raw_type"
# Hide the regexp pattern in the label; show just the keyword.
label="$base"
;;
*)
# Named alias: resolve it (one typedef lookup, only when needed).
local typedef
typedef=$(resolve_typedef "$block" "$base")
if [ -n "$typedef" ]; then
resolved="$typedef" # validate against the underlying type
label="$base" # but display the friendly alias name
else
# Unknown alias: fall back to raw for both.
resolved="$raw_type"
label="$raw_type"
fi
;;
esac
printf '%s\t%s' "$resolved" "$label"
}
#+end_src
** Splitting the Result at the Call Sites
Use a literal tab in the parameter expansion. To keep it readable and avoid embedding a raw tab in the source, define a tab once near the top of the script:
#+begin_src bash
# Near the top of the script, after the shebang/comments:
TAB=$(printf '\t')
#+end_src
Then each call site becomes:
#+begin_src bash
local info ftype label
info=$(get_field_type_info "$block" "$f")
ftype="${info%%"$TAB"*}" # everything before the tab -> resolved type
label="${info#*"$TAB"}" # everything after the tab -> display label
local val
val=$(prompt_field "$f" "$mand" "$ftype" "$label")
#+end_src
Apply the identical three-line split in all three loops (allowed, mandatory, free-form), substituting =$f= / =$newfield= as appropriate.
#+begin_quote
/Note on quoting:/ =${info%%"$TAB"*}= quotes =$TAB= so the tab is treated literally rather than as a glob pattern. This is correct in bash 3.2. Since both halves always exist (the helper always prints exactly one tab), =%%= and =#= behave predictably even when one side is empty.
#+end_quote
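The split is easy to exercise on its own, including the empty-pair case that the helper emits for untyped fields. This is a minimal sketch; =split_type_info=, =show=, and the =DEMO_*= globals are names invented for the demonstration.

#+begin_src bash
TAB=$(printf '\t')

# Split "<resolved><TAB><label>" into two globals (hypothetical demo names).
split_type_info() {
    local info="$1"
    DEMO_FTYPE="${info%%"$TAB"*}"   # text before the first tab
    DEMO_LABEL="${info#*"$TAB"}"    # text after the first tab
}

show() {
    split_type_info "$1"
    printf 'ftype=[%s] label=[%s]\n' "$DEMO_FTYPE" "$DEMO_LABEL"
}

show "$(printf 'regexp /^[0-9]+$/\tphonenum')"   # ftype=[regexp /^[0-9]+$/] label=[phonenum]
show "$(printf 'int\tint')"                       # ftype=[int] label=[int]
show "$(printf '\t')"                             # ftype=[] label=[]
#+end_src

One thing to keep in mind: if the input ever contained no tab at all, neither pattern would match and both expansions would return the whole string unchanged, so the helper's guarantee of exactly one tab is what makes this safe.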
** Retiring the Now-Redundant Helpers
With =get_field_type_info= in place, you can /delete/ both:
- =get_field_type=
- =display_type_label= (the earlier proposal)
…and you no longer need a standalone =get_field_raw_type=, since its logic now lives inline in the single pass. =resolve_typedef= stays — it's still called (lazily, only for actual aliases).
** Why This Is Better
| Aspect | Before | After |
|--------+--------+-------|
| =awk= passes over block (per field) | 2 | 1 |
| =resolve_typedef= calls (per field) | up to 2 | at most 1, only for aliases |
| Source of truth for "raw type" | duplicated in 2 functions | single location |
| Risk of label/type drifting out of sync | real | eliminated |
The validation behavior is unchanged — =phonenum= still validates against its underlying regex — while the prompt shows the clean =(type: phonenum)= label, and we've halved the parsing work per field.
** One Caveat Worth Flagging
If a descriptor block could ever legitimately contain a /tab character/ inside a =%type:= line, this delimiter scheme would break. In standard recutils files that doesn't happen (type tokens are space-separated), so a tab is safe. If you wanted to be bulletproof, you could split on a control character like =$'\x1f'= (ASCII Unit Separator) instead — same mechanism, even less chance of collision. For this script, tab is perfectly adequate.
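To make the caveat concrete, the sketch below builds a contrived type string with a tab inside the pattern and splits it both ways. The helper names are hypothetical, and whether a tab can actually reach this point in the real script depends on what =resolve_typedef= prints, which is an assumption here.

#+begin_src bash
US=$(printf '\037')     # ASCII Unit Separator (0x1f)
TAB=$(printf '\t')

# Hypothetical helpers: split "<type><sep><label>" on the first separator.
demo_ftype() { local sep="$1" info="$2"; printf '%s' "${info%%"$sep"*}"; }
demo_label() { local sep="$1" info="$2"; printf '%s' "${info#*"$sep"}"; }

# A contrived resolved type whose pattern contains a literal tab.
tricky_type="regexp /a${TAB}b/"

# Tab-delimited: the split lands on the tab inside the pattern.
printf 'tab: ftype=[%s]\n' "$(demo_ftype "$TAB" "${tricky_type}${TAB}mytype")"

# US-delimited: the embedded tab is left alone.
printf 'us:  ftype=[%s] label=[%s]\n' \
    "$(demo_ftype "$US" "${tricky_type}${US}mytype")" \
    "$(demo_label "$US" "${tricky_type}${US}mytype")"
#+end_src

The first line prints a truncated =regexp /a=, while the second keeps the whole pattern and the =mytype= label intact.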
*** Finally, help me fix the =$?= fragility at the end of =main=.
* Fixing the =$?= Capture at the End of =main=
The current end-of-=main= pattern relies on =$?= surviving from one statement to the next. That works today, but it is fragile: any future edit that slips a command between the =recins= call and the capture will silently break the error handling. Capturing the status on the line directly after each =recins= call, or folding the call into the =if= condition, removes that footgun.
** The Current Code
#+begin_src bash
# ---- Build and run recins ---------------------------------------------
echo "" >&2
echo "Inserting record..." >&2
if [ -n "$rectype" ]; then
recins -t "$rectype" "${args[@]}" "$recfile"
else
recins "${args[@]}" "$recfile"
fi
local status=$?
if [ "$status" -eq 0 ]; then
echo "Record inserted successfully into '$recfile'." >&2
else
echo "recins failed (exit code $status)." >&2
exit "$status"
fi
#+end_src
The =$?= here actually reflects the =if=/=fi= block's exit status (which is the exit status of whichever =recins= branch ran), so it /works/ — but only by happy coincidence of nothing intervening.
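Both halves of that claim can be checked with a small sketch. =fake_recins= is a hypothetical stand-in that always fails with status 3; the two functions differ only in whether a command sits between the =fi= and the read of =$?=.

#+begin_src bash
# Hypothetical stand-in for a failing recins call.
fake_recins() { return 3; }

status_after_if() {
    if [ -n "$1" ]; then
        fake_recins
    else
        fake_recins
    fi
    echo "$?"                  # status of the branch that just ran
}

status_with_intervening_command() {
    if [ -n "$1" ]; then
        fake_recins
    else
        fake_recins
    fi
    echo "Done." >/dev/null    # any command here overwrites $?
    echo "$?"
}

status_after_if typed                   # prints 3
status_with_intervening_command typed   # prints 0
#+end_src

The second function reports success even though the insert failed, which is the silent breakage a stray log line would cause.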
** The Fix
Capture the status immediately after each branch, so it's bound tightly to the command that produced it:
#+begin_src bash
# ---- Build and run recins ---------------------------------------------
echo "" >&2
echo "Inserting record..." >&2
local status
if [ -n "$rectype" ]; then
recins -t "$rectype" "${args[@]}" "$recfile"
status=$?
else
recins "${args[@]}" "$recfile"
status=$?
fi
if [ "$status" -eq 0 ]; then
echo "Record inserted successfully into '$recfile'." >&2
else
echo "recins failed (exit code $status)." >&2
exit "$status"
fi
#+end_src
** Why This Is Better
The =status=$?= now sits on the line /immediately/ following each =recins= invocation, inside the same branch. There is no possibility of an intervening command — and crucially, the intent is now self-evident to anyone editing later: "this captures the exit code of /this/ command."
#+begin_quote
/Note:/ =local status= is declared once before the =if= so the variable has function scope, but the /assignment/ happens inside each branch right after the relevant call. Splitting the declaration from the assignment also avoids the classic bash pitfall where =local status=$(...)= would mask the command's exit code — though that doesn't apply to a plain =$?= capture, it's a good habit to keep declaration and value-capture separate when =$?= matters.
#+end_quote
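The placement of the =local= declaration matters more than it looks, because a bare =local status= is itself a command that succeeds. A minimal sketch with invented function names, using =false= in place of a failing =recins=:

#+begin_src bash
# Declaration placed between the command and the capture: status is lost.
capture_too_late() {
    false
    local status    # `local` succeeds and resets $? to 0
    status=$?
    echo "$status"
}

# Declaration placed first: the capture sees the command's status.
capture_immediately() {
    local status
    false
    status=$?
    echo "$status"
}

# One-line form: $? is expanded before `local` runs, so it is preserved.
capture_inline() {
    false
    local status=$?
    echo "$status"
}

capture_too_late      # prints 0
capture_immediately   # prints 1
capture_inline        # prints 1
#+end_src

The third form is the one the original code used after the =fi=, which is why that capture was correct even though its position was fragile.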
** An Even Cleaner Alternative
If you'd rather avoid the explicit =status= variable altogether, you can branch on =recins= directly. This requires duplicating the success/failure messaging slightly, or factoring it out:
#+begin_src bash
# ---- Build and run recins ---------------------------------------------
echo "" >&2
echo "Inserting record..." >&2
run_recins() {
if [ -n "$rectype" ]; then
recins -t "$rectype" "${args[@]}" "$recfile"
else
recins "${args[@]}" "$recfile"
fi
}
if run_recins; then
echo "Record inserted successfully into '$recfile'." >&2
else
local status=$?   # $? is expanded before `local` runs; keeps status out of the global scope
echo "recins failed (exit code $status)." >&2
exit "$status"
fi
#+end_src
Here the =if= consumes =recins='s exit status directly, and =$?= inside the =else= branch correctly reflects the failed command (the =else= branch runs /because/ the condition was non-zero, and =$?= still holds that value at the first line of the branch).
#+begin_quote
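/Caveat for the alternative:/ =run_recins= needs access to =rectype=, =args=, and =recfile=, which are =local= to =main=. Bash uses dynamic scoping, so the helper can read them whenever it is called from =main=, wherever it is defined. Defining it inside =main= (as shown) does not make it private: the definition becomes global the first time =main= runs. If you'd prefer not to nest a function definition inside =main=, stick with the first fix.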
#+end_quote
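Two behaviours the alternative relies on can be checked in a few lines: a helper reads its caller's locals through bash's dynamic scoping, and =$?= at the top of an =else= branch still holds the condition's exit status. The names below are hypothetical, and =demo_run= is deliberately defined outside =demo_main= to show that nesting is not what provides access.

#+begin_src bash
# Hypothetical stand-in for run_recins, defined at top level on purpose.
demo_run() {
    return "$code"      # $code is a local of the caller, found via dynamic scope
}

demo_main() {
    local code="$1" status
    if demo_run; then
        echo "ok"
    else
        status=$?       # still the condition's exit status at this point
        echo "failed ($status)"
    fi
}

demo_main 0    # prints: ok
demo_main 4    # prints: failed (4)
#+end_src

Declaring =status= with =local= before the =if= keeps it out of the global scope without putting a command between the failure and the capture.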
** Recommendation
Go with the /first fix/. It's the smallest possible change, requires no nested function or scope juggling, reads clearly, and directly addresses the fragility you identified. The =status=$?= immediately after each call is the idiomatic, defensive pattern for capturing exit codes in bash.
***