# Get OCR word bounding boxes

> Retrieves word-level OCR bounding box data for a processed PDF document. Each word includes its text, a bounding box `[x0, y0, x1, y1]` normalized to the 0-1 range, an OCR confidence score, and a global (document-wide, 0-based) word index. Use the word indices returned by search results to look up the matching boxes and highlight those text ranges in a PDF viewer.
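
The highlighting workflow described above can be sketched in Python. This is a minimal sketch, not an official client. It assumes the 200 response has already been parsed into dicts that match the schema in the OpenAPI spec. The helper names (`words_in_range`, `to_pixel_rect`, `highlight_rects`) are hypothetical. The spec does not say where the bounding-box origin sits, so the default top-left origin is an assumption. Pass `origin_top_left=False` if your boxes turn out to use a bottom-left (PDF user space) origin.

```python
from __future__ import annotations

from dataclasses import dataclass

# Example payload copied from the spec's 200 response example.
EXAMPLE_RESPONSE = {
    "objectId": "obj_abc123",
    "pageCount": 5,
    "totalWords": 2500,
    "pages": [
        {
            "page": 1,
            "words": [
                {"text": "The", "bbox": [0.12, 0.71, 0.15, 0.75], "confidence": 0.98, "wordIndex": 0},
                {"text": "witness", "bbox": [0.16, 0.71, 0.28, 0.75], "confidence": 0.99, "wordIndex": 1},
            ],
        }
    ],
    "createdAt": "2024-01-15T10:30:00Z",
}


@dataclass
class Rect:
    """A highlight rectangle in rendered-page pixel coordinates."""
    page: int
    left: float
    top: float
    width: float
    height: float


def words_in_range(response: dict, word_start: int, word_end: int) -> list[tuple[int, dict]]:
    """Return (page number, word) pairs whose global wordIndex is in [word_start, word_end]."""
    hits = []
    for page in response.get("pages", []):
        for word in page.get("words", []):
            if word_start <= word["wordIndex"] <= word_end:
                hits.append((page["page"], word))
    return hits


def to_pixel_rect(page: int, bbox: list[float], page_width: float,
                  page_height: float, origin_top_left: bool = True) -> Rect:
    """Scale a normalized [x0, y0, x1, y1] box to the size of a rendered page.

    ASSUMPTION: y grows downward from the top edge by default. Set
    origin_top_left=False to flip boxes that use a bottom-left origin.
    """
    x0, y0, x1, y1 = bbox
    if not origin_top_left:
        # Mirror the y axis so the rectangle is measured from the top edge.
        y0, y1 = 1.0 - y1, 1.0 - y0
    return Rect(
        page=page,
        left=x0 * page_width,
        top=y0 * page_height,
        width=(x1 - x0) * page_width,
        height=(y1 - y0) * page_height,
    )


def highlight_rects(response: dict, word_start: int, word_end: int,
                    page_width: float, page_height: float,
                    origin_top_left: bool = True) -> list[Rect]:
    """Pixel rectangles for every word in an inclusive word-index range."""
    return [
        to_pixel_rect(page, word["bbox"], page_width, page_height, origin_top_left)
        for page, word in words_in_range(response, word_start, word_end)
    ]


if __name__ == "__main__":
    # Highlight words 0-1 ("The witness") on a page rendered at 612x792 px.
    for rect in highlight_rects(EXAMPLE_RESPONSE, 0, 1, 612, 792):
        print(rect)
```

The word range can come straight from a search result. You can also pass the same bounds as the `wordStart`/`wordEnd` query parameters, so the server returns only those words and the client-side filter becomes a no-op.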


## OpenAPI

````yaml /openapi.json get /vault/{id}/objects/{objectId}/ocr-words
openapi: 3.1.0
info:
  title: Case.dev API
  description: >-
    The AI-native platform for legal technology. Build smarter legal
    applications with our suite of AI-powered APIs.
  version: 1.0.0
  contact:
    name: Case.dev Support
    email: support@casemark.com
    url: https://case.dev
  license:
    name: Proprietary
    url: https://case.dev/terms
servers:
  - url: https://api.case.dev
    description: Production
security:
  - bearerAuth: []
tags:
  - name: Vaults
    description: Secure document storage with semantic search and GraphRAG
  - name: Memory
    description: >-
      Persistent memory for AI agents with semantic search and 12 generic
      indexed tag fields
  - name: OCR
    description: Extract text from PDFs, images, and scanned documents
  - name: Voice
    description: Audio transcription and text-to-speech
  - name: LLMs
    description: Access 40+ language models through a unified API
  - name: Search
    description: Web search, AI answers, and deep research
  - name: Mail
    description: Managed inboxes for agent email workflows
  - name: Media
    description: Transcript retrieval and captioned media clip generation
  - name: Legal
    description: Legal research tools including citation verification
  - name: Privilege
    description: Privilege detection for e-discovery and litigation workflows
  - name: Compute
    description: Serverless GPU and CPU infrastructure
  - name: Format
    description: Document formatting and template rendering (MD/JSON to PDF/DOCX)
  - name: SuperDoc
    description: Document conversion and template automation
  - name: Webhooks
    description: Webhook endpoint management
  - name: System
    description: Public system metadata and discovery endpoints
  - name: Usage
    description: Usage reporting and webhook subscriptions
  - name: Database
    description: Serverless PostgreSQL databases with instant branching
  - name: Translation
    description: Language detection and translation for multilingual legal workflows
  - name: Skills
    description: Search and read legal AI skills for agents
  - name: Agents
    description: >-
      Create, manage, and execute AI agents with tool access, sandbox
      environments, and async run workflows
  - name: Matters
    description: Matter-native legal workspaces and orchestration primitives
  - name: Applications Projects
    description: Web application project management
  - name: Applications Deployments
    description: Web application deployment management
  - name: Applications Domains
    description: Custom domain configuration for applications
  - name: Applications Env Vars
    description: Environment variable management for applications
paths:
  /vault/{id}/objects/{objectId}/ocr-words:
    get:
      tags:
        - Vaults
      summary: Get OCR word bounding boxes
      description: >-
        Retrieves word-level OCR bounding box data for a processed PDF document.
        Each word includes its text, a bounding box normalized to the 0-1 range,
        an OCR confidence score, and a global word index. Use the word indices
        returned by search results to look up the matching boxes and highlight
        those text ranges in a PDF viewer.
      operationId: getVaultObjectOcrWords
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: string
          description: The vault ID
        - name: objectId
          in: path
          required: true
          schema:
            type: string
          description: The object ID
        - name: page
          in: query
          required: false
          schema:
            type: integer
          description: >-
            Filter to a specific page number (1-indexed). If omitted, returns
            all pages.
        - name: wordStart
          in: query
          required: false
          schema:
            type: integer
          description: >-
            Return only words whose global `wordIndex` is greater than or equal
            to this value (inclusive). Combine with `wordEnd` to retrieve the
            words for a specific chunk.
        - name: wordEnd
          in: query
          required: false
          schema:
            type: integer
          description: >-
            Return only words whose global `wordIndex` is less than or equal to
            this value (inclusive). Combine with `wordStart` to retrieve the
            words for a specific chunk.
      responses:
        '200':
          description: Successfully retrieved OCR word data
          content:
            application/json:
              schema:
                type: object
                properties:
                  objectId:
                    type: string
                    description: The object ID
                  pageCount:
                    type: integer
                    description: Total number of pages in the document
                  totalWords:
                    type: integer
                    description: Total number of words extracted from the document
                  pages:
                    type: array
                    description: Per-page word data with bounding boxes
                    items:
                      type: object
                      properties:
                        page:
                          type: integer
                          description: Page number (1-indexed)
                        words:
                          type: array
                          items:
                            type: object
                            properties:
                              text:
                                type: string
                                description: The word text
                              bbox:
                                type: array
                                items:
                                  type: number
                                minItems: 4
                                maxItems: 4
                                description: >-
                                  Bounding box [x0, y0, x1, y1] normalized to
                                  the 0-1 range
                              confidence:
                                type:
                                  - number
                                  - 'null'
                                description: OCR confidence score (0-1); may be null
                              wordIndex:
                                type: integer
                                description: >-
                                  Global word index across the entire document
                                  (0-based)
                  createdAt:
                    type: string
                    format: date-time
                    description: When the OCR data was extracted
              example:
                objectId: obj_abc123
                pageCount: 5
                totalWords: 2500
                pages:
                  - page: 1
                    words:
                      - text: The
                        bbox:
                          - 0.12
                          - 0.71
                          - 0.15
                          - 0.75
                        confidence: 0.98
                        wordIndex: 0
                      - text: witness
                        bbox:
                          - 0.16
                          - 0.71
                          - 0.28
                          - 0.75
                        confidence: 0.99
                        wordIndex: 1
                createdAt: '2024-01-15T10:30:00Z'
        '400':
          description: Bad request - missing parameters, or the object has not been processed yet
        '401':
          description: Unauthorized - invalid API key
        '403':
          description: Forbidden - API key lacks vault service access
        '404':
          description: >-
            Object or vault not found, or no OCR word data available (non-PDF
            document)
components:
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      bearerFormat: API Key
      description: API key starting with `sk_case_`

````
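
This is a minimal sketch of calling the endpoint with only the Python standard library. The base URL, path, query parameter names, and bearer scheme come from the spec above. The function names, the 30-second timeout, and returning `None` on a 404 are illustrative choices, not part of the API. A 404 can mean the object or vault was not found, or that no OCR word data exists (non-PDF), so `None` conflates those cases.

```python
from __future__ import annotations

import json
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://api.case.dev"  # Production server from the spec


def build_ocr_words_request(api_key: str, vault_id: str, object_id: str,
                            page: int | None = None,
                            word_start: int | None = None,
                            word_end: int | None = None) -> urllib.request.Request:
    """Build GET /vault/{id}/objects/{objectId}/ocr-words with optional filters."""
    path = "/vault/{}/objects/{}/ocr-words".format(
        urllib.parse.quote(vault_id, safe=""),
        urllib.parse.quote(object_id, safe=""),
    )
    # Send only the filters the caller set; omitting `page` returns all pages.
    filters = {"page": page, "wordStart": word_start, "wordEnd": word_end}
    params = {name: value for name, value in filters.items() if value is not None}
    url = BASE_URL + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
    )


def fetch_ocr_words(api_key: str, vault_id: str, object_id: str,
                    **filters: int) -> dict | None:
    """Fetch OCR word data; returns None when the API answers 404."""
    request = build_ocr_words_request(api_key, vault_id, object_id, **filters)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        # Per the spec, 404 also covers "no OCR word data" (e.g. a non-PDF object).
        if err.code == 404:
            return None
        raise  # 400, 401 and 403 are surfaced to the caller unchanged
```

For example, `fetch_ocr_words(api_key, vault_id, object_id, word_start=120, word_end=145)` would request only words 120 through 145, which is the range a search-result chunk typically needs for highlighting.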