| EIN | NAME | CITY | STATE | ZIP | NTEE | RULING | ASSETS | INCOME | REVENUE | SoftTF-IDF | JW(full) | CONF | Hits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

GET /api/search?q=<query>&field=name|ein&limit=5&topk=80&tokThresh=0.85&minSoft=0&phonetic=true

POST /api/search with JSON body using the same fields.

| Param | Type | Default | Description |
|---|---|---|---|
| q | string | – | Query text (org name) or EIN (9 digits for field=ein). |
| field | string | name | name uses SoftTF-IDF+JW; ein is exact. |
| limit | int | 5 | Max items to return. |
| topk | int | 80 | Candidate count from TF-IDF blocker. |
| tokThresh | float | 0.85 | Min Jaro–Winkler for token matches in SoftTF-IDF. |
| minSoft | float | 0 | Min SoftTF-IDF to keep (after small boosts). |
| phonetic | bool | true | Use NYSIIS fallback when no lexical hits. |

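
As a rough client-side illustration of the parameter table, the sketch below builds a GET URL and sends only the parameters that differ from the documented defaults. The helper name `build_search_url` and its validation rules are assumptions made for this example, not part of the service; the base URL is the one used in the curl examples.

```python
from urllib.parse import urlencode

# Defaults copied from the parameter table.
DEFAULTS = {"field": "name", "limit": 5, "topk": 80,
            "tokThresh": 0.85, "minSoft": 0.0, "phonetic": True}


def build_search_url(q, base="http://localhost:8000", **overrides):
    """Hypothetical helper: return a GET URL for /api/search."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    if params["field"] not in ("name", "ein"):
        raise ValueError("field must be 'name' or 'ein'")
    # Assumed client-side check: EIN queries are exactly 9 ASCII digits.
    if params["field"] == "ein" and not (q.isascii() and q.isdigit() and len(q) == 9):
        raise ValueError("EIN queries must be exactly 9 digits")
    query = {"q": q}
    for key, value in params.items():
        if value != DEFAULTS[key]:
            # Booleans are written as lowercase true/false, as in the GET form.
            query[key] = str(value).lower() if isinstance(value, bool) else value
    return f"{base}/api/search?{urlencode(query)}"


if __name__ == "__main__":
    print(build_search_url("habitat for humanity nyc"))
    print(build_search_url("002022084", field="ein"))
```

Omitting defaults keeps URLs short; sending every parameter explicitly should be equivalent if the server applies the same defaults.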
200 OK
{
  "meta": {
    "query": "alzheimer nyc",
    "field": "name",
    "limit_requested": 5,
    "limit_returned": 5,
    "candidates_considered": 80,
    "took_ms": 7,
    "thresholds": {"topk": 80, "tokThresh": 0.85, "minSoft": 0, "phonetic": true},
    "corpus_size": 199873
  },
  "results": [
    {
      "rank": 1,
      "ein": "01XXXXXXX",
      "name": "ALZHEIMER'S ASSOCIATION GREATER NEW YORK CHAPTER",
      "city": "NEW YORK", "state": "NY", "zip": "100XX",
      "ntee": "H92",
      "assets": 12345678, "income": 2345678, "revenue": 1987654,
      "confidence": 0.964,
      "soft_tfidf": 0.951,
      "jw_full": 0.987,
      "hits": 6,
      "explain": {"numbersMatched": 0, "rareTokenMatches": 2}
    }
  ]
}
Errors: 400 Bad Request (missing q or invalid parameters), 503 Service Unavailable (index still warming).
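
One way a client might consume these responses is sketched below. It maps the documented status codes to exceptions and pulls a few fields out of each result. The `Hit` dataclass and `parse_search_response` function are hypothetical names introduced for illustration; only the JSON field names come from the sample response.

```python
import json
from dataclasses import dataclass


@dataclass
class Hit:
    """Subset of the per-result fields shown in the sample response."""
    rank: int
    ein: str
    name: str
    confidence: float
    soft_tfidf: float
    jw_full: float


def parse_search_response(status, body):
    """Hypothetical helper: turn an HTTP status and body text into Hit objects."""
    if status == 503:
        # Index still warming: the caller may want to retry after a delay.
        raise RuntimeError("index still warming; retry later")
    if status == 400:
        raise ValueError("missing q or bad params")
    if status != 200:
        raise RuntimeError(f"unexpected status {status}")
    payload = json.loads(body)
    return [
        Hit(r["rank"], r["ein"], r["name"],
            r["confidence"], r["soft_tfidf"], r["jw_full"])
        for r in payload["results"]
    ]
```

Keeping the parser independent of any HTTP library makes it easy to test against saved response bodies.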
# GET (name, top 5 default)
curl -G "http://localhost:8000/api/search" \
  --data-urlencode "q=habitat for humanity nyc"

# GET (ein)
curl -G "http://localhost:8000/api/search" \
  --data-urlencode "q=002022084" --data-urlencode "field=ein"

# POST (JSON)
curl -X POST "http://localhost:8000/api/search" \
  -H "Content-Type: application/json" \
  -d '{"q":"alzheimer new york","field":"name","limit":10,"topk":120,"tokThresh":0.86,"minSoft":0.1,"phonetic":true}'

SoftTF-IDF sums TF-IDF-weighted token matches, counting two tokens as a match when their Jaro–Winkler similarity is ≥ tokThresh. We add small domain boosts for exact number matches (“PS 118”, “Chapter 12”) and for matches on rare tokens.

confidence is a calibrated blend, 0.8×SoftTF-IDF + 0.2×JW(full), clipped to [0,1], where JW(full) is the Jaro–Winkler similarity computed over the full names. For exact EIN matches, confidence is 1.0.
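
The scoring described above can be approximated with the following sketch. It uses the textbook SoftTF-IDF formulation (unit-normalised TF-IDF weights, best Jaro–Winkler match per query token) and a standard Jaro–Winkler with prefix scale 0.1. Both are assumptions, since the service's exact tokenisation, IDF table, boost sizes, and calibration are not specified here. The `idf` mapping and all function names are hypothetical, and the domain boosts are left out.

```python
import math


def jaro(s, t):
    """Standard Jaro similarity between two strings."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_seq = [c for c, hit in zip(s, s_hit) if hit]
    t_seq = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) / 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s, t, p=0.1):
    """Jaro similarity boosted by a shared prefix of up to 4 characters."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def soft_tfidf(query_tokens, cand_tokens, idf, tok_thresh=0.85):
    """Textbook SoftTF-IDF; idf maps token -> weight (1.0 if unseen)."""
    def weights(tokens):
        raw = {tok: tokens.count(tok) * idf.get(tok, 1.0) for tok in set(tokens)}
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        return {tok: w / norm for tok, w in raw.items()}

    qw, cw = weights(query_tokens), weights(cand_tokens)
    score = 0.0
    for q_tok, q_weight in qw.items():
        # Closest candidate token for this query token.
        best_tok, best_sim = max(
            ((c, jaro_winkler(q_tok, c)) for c in cw),
            key=lambda pair: pair[1], default=(None, 0.0))
        if best_tok is not None and best_sim >= tok_thresh:
            score += q_weight * cw[best_tok] * best_sim
    return score


def confidence(soft, jw_full, ein_exact=False):
    """0.8 x SoftTF-IDF + 0.2 x JW(full), clipped to [0, 1]; 1.0 for exact EIN."""
    if ein_exact:
        return 1.0
    return min(1.0, max(0.0, 0.8 * soft + 0.2 * jw_full))
```

With the sample response values (soft_tfidf 0.951, jw_full 0.987), the plain blend works out to about 0.958, slightly below the 0.964 shown there. The service may apply further calibration, or the sample figures may be illustrative.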