Add per-request AI logging, DB batch queue, WS entity updates, and UI polish
- log_thread.py: thread-safe ContextVar bridge so executor threads can log
individual LLM calls and archive searches back to the event loop
- ai_log.py: init_thread_logging(), notify_entity_update(); WS now pushes
entity_update messages when book data changes after any plugin or batch run
- batch.py: replace batch_pending.json with batch_queue SQLite table;
run_batch_consumer() reads queue dynamically so new books can be added
while batch is running; add_to_queue() deduplicates
- migrate.py: fix _migrate_v1 (clear-on-startup bug); add _migrate_v2 for
batch_queue table
- _client.py / archive.py / identification.py: wrap each LLM API call and
archive search with log_thread start/finish entries
- api.py: POST /api/batch returns {already_running, added}; notify_entity_update
after identify pipeline
- models.default.yaml: strengthen ai_identify confidence-scoring instructions;
warn against placeholder data
- detail-render.js: book log entries show clickable ID + spine thumbnail;
book spine/title images open full-screen popup
- events.js: batch-start handles already_running+added; open-img-popup action
- init.js: entity_update WS handler; image popup close listeners
- overlays.css / index.html: full-screen image popup overlay
- eslint.config.js: add new globals; fix no-redeclare/no-unused-vars for
multi-file global architecture; all lint errors resolved
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
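The ContextVar bridge described for log_thread.py can be sketched roughly as follows. This is a minimal illustration only: the `ThreadLogHandle` class and `log_handle` variable are hypothetical names, and the real module presumably records structured start/finish entries rather than plain strings. The core idea is that worker threads never touch the log directly; they hand each entry to the event loop via `call_soon_threadsafe`.

```python
# Hypothetical sketch of the log_thread.py idea: a ContextVar carries a
# thread-safe handle into executor threads, and call_soon_threadsafe hands
# each log entry back to the event loop that owns the entries list.
import asyncio
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Illustrative names, not the real API.
log_handle = contextvars.ContextVar("log_handle", default=None)

class ThreadLogHandle:
    def __init__(self, loop: asyncio.AbstractEventLoop, entries: list):
        self._loop = loop
        self._entries = entries

    def log(self, message: str) -> None:
        # Safe to call from any thread: the append itself runs on the loop.
        self._loop.call_soon_threadsafe(self._entries.append, message)

def worker(task_name: str) -> None:
    # Runs in an executor thread; reads the handle from the copied context.
    handle = log_handle.get()
    if handle is not None:
        handle.log(f"start {task_name}")
        handle.log(f"finish {task_name}")

async def main() -> list:
    entries: list = []
    log_handle.set(ThreadLogHandle(asyncio.get_running_loop(), entries))
    ctx = contextvars.copy_context()  # carry the ContextVar into the executor
    with ThreadPoolExecutor() as pool:
        await asyncio.get_running_loop().run_in_executor(
            pool, ctx.run, worker, "llm_call"
        )
    await asyncio.sleep(0)  # let any still-queued callbacks drain
    return entries

entries = asyncio.run(main())
```

Note the `contextvars.copy_context()` step: unlike `asyncio.to_thread`, `run_in_executor` does not propagate the caller's context into the worker thread, so the sketch wraps the worker in `ctx.run`.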
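The `notify_entity_update` push might look like this in miniature. Everything beyond the function name is an assumption: real clients would be WebSocket connections rather than plain callables, and the message fields are guessed from the commit description.

```python
# Hedged sketch: after book data changes, broadcast an entity_update message
# to every connected client so open detail views can refresh.
import json

def notify_entity_update(clients, entity_type: str, entity_id: int) -> str:
    message = json.dumps(
        {"type": "entity_update", "entity": entity_type, "id": entity_id}
    )
    for send in clients:  # each client is modeled as a send-callable here
        send(message)
    return message

received = []
msg = notify_entity_update([received.append], "book", 42)
```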
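The batch_queue table and its consumer loop could plausibly work as below. The schema and function bodies are assumptions (only the names `batch_queue`, `add_to_queue`, and `run_batch_consumer` come from the commit): a `PRIMARY KEY` plus `INSERT OR IGNORE` gives the deduplication, and the consumer re-queries the table on every iteration, which is what lets books added mid-run still be picked up.

```python
# Illustrative sketch of a SQLite-backed batch queue with deduplication.
import sqlite3

def init_queue(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS batch_queue ("
        " book_id INTEGER PRIMARY KEY)"  # PRIMARY KEY makes duplicates no-ops
    )

def add_to_queue(conn: sqlite3.Connection, book_id: int) -> bool:
    # True if newly queued, False if it was already pending (deduplicated).
    cur = conn.execute(
        "INSERT OR IGNORE INTO batch_queue (book_id) VALUES (?)", (book_id,)
    )
    conn.commit()
    return cur.rowcount == 1

def run_batch_consumer(conn: sqlite3.Connection, process) -> list:
    # Re-query one row at a time so rows inserted while the batch is
    # running are consumed too, instead of snapshotting the queue up front.
    done = []
    while True:
        row = conn.execute(
            "SELECT book_id FROM batch_queue ORDER BY book_id LIMIT 1"
        ).fetchone()
        if row is None:
            break
        process(row[0])
        conn.execute("DELETE FROM batch_queue WHERE book_id = ?", (row[0],))
        conn.commit()
        done.append(row[0])
    return done

conn = sqlite3.connect(":memory:")
init_queue(conn)
added = [add_to_queue(conn, b) for b in (1, 2, 2, 3)]
processed = run_batch_consumer(conn, lambda book_id: None)
```

Deleting each row only after `process` succeeds also means a crash mid-batch leaves unfinished books queued, unlike the old batch_pending.json snapshot.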
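The `_migrate_v1`/`_migrate_v2` names suggest a numbered-migration scheme. One common shape is sketched here; the migration bodies and the use of `PRAGMA user_version` as the version store are assumptions, but the structure shows why a clear-on-startup bug matters: each step must run exactly once, gated by the stored version.

```python
# Sketch of numbered, run-once schema migrations tracked via user_version.
import sqlite3

def _migrate_v1(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS books (id INTEGER PRIMARY KEY)")

def _migrate_v2(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS batch_queue (book_id INTEGER PRIMARY KEY)"
    )

MIGRATIONS = [_migrate_v1, _migrate_v2]

def migrate(conn: sqlite3.Connection) -> int:
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    # Only run steps newer than the stored version; never re-run old ones.
    for i, step in enumerate(MIGRATIONS[version:], start=version + 1):
        step(conn)
        conn.execute(f"PRAGMA user_version = {i}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]

conn = sqlite3.connect(":memory:")
final = migrate(conn)
```

Calling `migrate` again is a no-op once the stored version matches `len(MIGRATIONS)`, which is exactly the property a clear-on-startup bug would violate.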
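The POST /api/batch contract can be reduced to a tiny sketch (the handler body is invented; only the `{already_running, added}` response shape comes from the commit): when a batch is already running, new books are merely appended to the queue, and the response reports both facts so the UI can phrase its notice accordingly.

```python
# Minimal sketch of the assumed /api/batch response contract.
def start_batch(book_ids, queue: set, running: bool) -> dict:
    added = [b for b in book_ids if b not in queue]  # dedup against pending
    queue.update(added)
    # When not running, the real handler would also kick off the consumer.
    return {"already_running": running, "added": len(added)}

queue = {1}
resp = start_batch([1, 2, 3], queue, running=True)
```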
@@ -30,14 +30,16 @@ functions:
     rate_limit_seconds: 0
     timeout: 30

-  # ── Book identification: raw_text → {title, author, year, isbn, publisher, confidence}
+  # ── Book identification: VLM result + archive results → ranked identification blocks
+  # is_vlm: true means the model also receives the book's spine and title-page images.
   book_identifiers:
     identify:
       model: ai_identify
       confidence_threshold: 0.8
       auto_queue: false
       rate_limit_seconds: 0
-      timeout: 30
+      timeout: 60
+      is_vlm: true

   # ── Archive searchers: query → [{source, title, author, year, isbn, publisher}, ...]
   archive_searchers:
@@ -42,9 +42,33 @@ models:
     credentials: openrouter
     model: "google/gemini-flash-1.5"
     prompt: |
-      # ${RAW_TEXT} — text read from the book spine (multi-line)
-      # ${OUTPUT_FORMAT} — JSON schema injected by BookIdentifierPlugin
-      The following text was read from a book spine:
+      # ${RAW_TEXT} — text read from the book spine (multi-line)
+      # ${ARCHIVE_RESULTS} — JSON array of candidate records from library archives
+      # ${OUTPUT_FORMAT} — JSON schema injected by BookIdentifierPlugin
+      Text read from the book spine:
       ${RAW_TEXT}
-      Identify this book. Search for it if needed. Return ONLY valid JSON, no explanation:
+
+      Archive search results (may be empty):
+      ${ARCHIVE_RESULTS}
+
+      Your task:
+      1. Search the web for this book if needed to find additional information.
+      2. Combine the spine text, archive results, and your web search into identification candidates.
+      3. Collapse candidates that are clearly the same book (same title + author + year + publisher) into one entry, listing all contributing sources.
+      4. Rank candidates by confidence (highest first). Assign a score 0.0-1.0.
+      5. Remove any candidates you believe are irrelevant or clearly wrong.
+
+      IMPORTANT — confidence scoring rules:
+      - The score must reflect how well the found information matches the spine text and recognized data.
+      - If the only available evidence is a title with no author, year, publisher, or corroborating archive results, the score must not exceed 0.5.
+      - Base confidence on: quality of spine text match, number of matching fields, archive result corroboration, and completeness of the identified record.
+      - A record with title + author + year that appears in multiple archive sources warrants a high score; a record with only a guessed title warrants a low score.
+
+      IMPORTANT — output format rules:
+      - The JSON schema below is a format specification only. Do NOT use it as a source of example data.
+      - Do NOT return placeholder values such as "The Great Gatsby", "Unknown Author", "Example Publisher", or any other generic example text unless that exact text literally appears on the spine.
+      - Return only real books that could plausibly match what is shown on this spine.
+      - If you cannot identify the book with reasonable confidence, return an empty array [].
+
+      Return ONLY valid JSON matching the schema below, no explanation:
+      ${OUTPUT_FORMAT}
@@ -1,3 +1,5 @@
 # UI settings. Override in ui.user.yaml.
 ui:
   boundary_grab_px: 14      # pixel grab threshold for dragging boundary lines
+  spine_padding_pct: 0.30   # extra fraction of book width added on each side of spine crop
+  ai_log_max_entries: 100   # max AI request log entries kept in memory