NYC Nonprofit Search

Search API

A same-origin API implemented via a Service Worker. It supports GET and POST and returns JSON with the top results (5 by default). Each result includes a confidence value (0–1), a SoftTF-IDF score, and a Jaro–Winkler score.

Endpoints

  • GET /api/search?q=<query>&field=name|ein&limit=5&topk=80&tokThresh=0.85&minSoft=0&phonetic=true
  • POST /api/search with JSON body using the same fields.
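
As a minimal sketch of how the two request forms line up, the Python below builds a GET URL and the equivalent POST body from the same fields. The base URL reuses the host and port from the curl examples on this page; the helper names build_get_url and build_post_body are hypothetical and exist only for illustration.

```python
import json
from urllib.parse import urlencode

# Host and port as used in the curl examples; adjust for your deployment.
BASE_URL = "http://localhost:8000/api/search"

def build_get_url(q, **params):
    """Return a GET URL with q and any optional fields percent-encoded."""
    fields = {"q": q, **params}
    # Booleans go out as lowercase true/false, matching the endpoint template.
    encoded = {
        key: (str(value).lower() if isinstance(value, bool) else value)
        for key, value in fields.items()
    }
    return f"{BASE_URL}?{urlencode(encoded)}"

def build_post_body(q, **params):
    """Return the JSON body carrying the same fields for POST /api/search."""
    return json.dumps({"q": q, **params})

if __name__ == "__main__":
    print(build_get_url("habitat for humanity nyc", field="name", limit=5))
    print(build_post_body("alzheimer new york", limit=10, phonetic=True))
```

Any field left out of either form is assumed to fall back to the default listed under Parameters.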

Parameters

Param | Type | Default | Description
q | string | (none) | Query text (org name) or EIN (9 digits for field=ein).
field | string | name | name uses SoftTF-IDF+JW; ein is exact.
limit | int | 5 | Max items to return.
topk | int | 80 | Candidate count from TF-IDF blocker.
tokThresh | float | 0.85 | Min Jaro–Winkler for token matches in SoftTF-IDF.
minSoft | float | 0 | Min SoftTF-IDF to keep (after small boosts).
phonetic | bool | true | Use NYSIIS fallback when no lexical hits.
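
The defaults above can be applied on the client before a request is sent. The sketch below merges caller-supplied fields over those defaults and checks only what this page states: q is needed, field is name or ein, and an EIN query is 9 digits. The function name normalize_params is hypothetical, and the server's actual validation rules are not documented here, so treat these checks as assumptions.

```python
# Defaults copied from the parameter table; q has no default.
DEFAULTS = {
    "field": "name",
    "limit": 5,
    "topk": 80,
    "tokThresh": 0.85,
    "minSoft": 0.0,
    "phonetic": True,
}

def normalize_params(raw):
    """Merge caller fields over the documented defaults and run basic checks."""
    params = {**DEFAULTS, **raw}
    q = str(params.get("q", "")).strip()
    if not q:
        raise ValueError("q is required")
    if params["field"] not in ("name", "ein"):
        raise ValueError("field must be 'name' or 'ein'")
    # The table specifies 9 digits for EIN lookups.
    if params["field"] == "ein" and not (len(q) == 9 and q.isascii() and q.isdigit()):
        raise ValueError("EIN queries must be exactly 9 digits")
    params["q"] = q
    return params

if __name__ == "__main__":
    print(normalize_params({"q": "habitat for humanity nyc"}))
    print(normalize_params({"q": "002022084", "field": "ein"}))
```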

Responses

200 OK

{
  "meta": {
    "query": "alzheimer nyc",
    "field": "name",
    "limit_requested": 5,
    "limit_returned": 5,
    "candidates_considered": 80,
    "took_ms": 7,
    "thresholds": {"topk":80,"tokThresh":0.85,"minSoft":0,"phonetic":true},
    "corpus_size": 199873
  },
  "results": [
    {
      "rank": 1,
      "ein": "01XXXXXXX",
      "name": "ALZHEIMER'S ASSOCIATION GREATER NEW YORK CHAPTER",
      "city": "NEW YORK", "state": "NY", "zip": "100XX",
      "ntee": "H92",
      "assets": 12345678, "income": 2345678, "revenue": 1987654,
      "confidence": 0.964,
      "soft_tfidf": 0.951,
      "jw_full": 0.987,
      "hits": 6,
      "explain": {"numbersMatched":0,"rareTokenMatches":2}
    }
  ]
}

Errors: 400 Bad Request (missing q or invalid parameters); 503 Service Unavailable (index still warming).
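
A client might handle the three documented outcomes roughly as follows. This is a sketch under assumptions: handle_response is a hypothetical helper that takes an HTTP status and raw body from whatever HTTP client is in use, and the sample body is a trimmed copy of the 200 response shown above.

```python
import json

def handle_response(status, body):
    """Return (rank, name, confidence) tuples, or raise on a documented error."""
    if status == 400:
        raise ValueError("bad request: missing q or bad params")
    if status == 503:
        # The index is still warming; callers may choose to retry after a delay.
        raise RuntimeError("index still warming; retry later")
    if status != 200:
        raise RuntimeError(f"unexpected status {status}")
    payload = json.loads(body)
    return [(r["rank"], r["name"], r["confidence"]) for r in payload["results"]]

# Trimmed version of the 200 example, built as a dict to avoid quoting issues.
SAMPLE_BODY = json.dumps({
    "meta": {"query": "alzheimer nyc", "field": "name", "limit_returned": 1},
    "results": [{
        "rank": 1,
        "name": "ALZHEIMER'S ASSOCIATION GREATER NEW YORK CHAPTER",
        "confidence": 0.964,
        "soft_tfidf": 0.951,
        "jw_full": 0.987,
    }],
})

if __name__ == "__main__":
    for rank, name, confidence in handle_response(200, SAMPLE_BODY):
        print(rank, name, confidence)
```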

Examples

# GET (name, top 5 default)
curl -G "http://localhost:8000/api/search" \
  --data-urlencode "q=habitat for humanity nyc"

# GET (ein)
curl -G "http://localhost:8000/api/search" \
  --data-urlencode "q=002022084" --data-urlencode "field=ein"

# POST (JSON)
curl -X POST "http://localhost:8000/api/search" \
  -H "Content-Type: application/json" \
  -d '{"q":"alzheimer new york","field":"name","limit":10,"topk":120,"tokThresh":0.86,"minSoft":0.1,"phonetic":true}'

Scoring & Confidence

SoftTF-IDF sums TF-IDF-weighted token matches whose token-level Jaro–Winkler similarity is at least tokThresh. Small domain boosts are added for exact number matches (“PS 118”, “Chapter 12”) and for rare-token matches. confidence is a calibrated blend, 0.8 × SoftTF-IDF + 0.2 × JW(full), clipped to [0, 1]. For exact EIN matches, confidence is 1.0.
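
The scoring described above can be sketched in Python. This is an illustration, not the Service Worker's code: it uses the textbook Jaro–Winkler definition (prefix scale 0.1, prefix capped at 4 characters) and the standard SoftTF-IDF form, assumes token weights arrive already TF-IDF-weighted and normalized, and omits the domain boosts and any calibration step. All function names are hypothetical.

```python
def jaro(s, t):
    """Textbook Jaro similarity between two strings."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i, ch in enumerate(s):
        if s_hit[i]:
            while not t_hit[k]:
                k += 1
            if ch != t[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3

def jaro_winkler(s, t, prefix_scale=0.1):
    """Jaro similarity boosted by a shared prefix of up to 4 characters."""
    base = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)

def soft_tfidf(query_w, doc_w, tok_thresh=0.85):
    """Sum weight products for query tokens whose best match clears tok_thresh."""
    total = 0.0
    for q_tok, q_weight in query_w.items():
        best_tok, best_sim = max(
            ((d_tok, jaro_winkler(q_tok, d_tok)) for d_tok in doc_w),
            key=lambda pair: pair[1],
            default=(None, 0.0),
        )
        if best_sim >= tok_thresh:
            total += q_weight * doc_w[best_tok] * best_sim
    return total

def confidence(soft, jw_full, ein_exact=False):
    """Blend 0.8 x SoftTF-IDF + 0.2 x JW(full), clipped to [0, 1]."""
    if ein_exact:
        return 1.0
    return min(1.0, max(0.0, 0.8 * soft + 0.2 * jw_full))

if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
    print(confidence(0.9, 0.95))
```

Raising tok_thresh makes token matching stricter, so fewer near-miss tokens contribute to the sum; lowering it admits more fuzzy matches at the cost of precision.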