Python API¶
NBER-CLI exposes the same core functionality as importable Python functions. The API is asynchronous because lookup, search, and downloads perform network I/O.
Feed cache helpers are synchronous because they perform local SQLite work and a synchronous RSS fetch.
API Boundaries¶
NBER-CLI has three layers, and the stability contract is different for each:
- Top-level public API: the names listed in
nber_cli.__all__. Importing them from thenber_clipackage is the supported way to use the package. This is the only layer with a stability promise. Removing or renaming a name in__all__is treated as a breaking change. - Module-level helpers: non-underscore names defined in modules such as
nber_cli.formatters,nber_cli.fetcher,nber_cli.config_store, andnber_cli.cli. These are usable directly and useful for advanced callers, but they are not part of the public top-level contract and may change between minor versions. - Compatibility wrappers: a few names exist only for backward compatibility with callers written against earlier versions. The
feedmodule re-exportsinit_feed_database,migrate_feed_database, andget_feed_database_path, which now forward to the database layer. They are kept for one minor release after their replacement ships and may be removed later.
__all__ is the source of truth for what is "officially exported" from the package. If a name is not in __all__ and not documented as a module-level helper, treat it as private even if it is not underscore-prefixed.
Install¶
For a script outside a project:
Search Papers¶
import asyncio
from nber_cli import search_nber, search_results
async def main() -> None:
results = await search_nber("labor economics", per_page=20)
payload = search_results(results)
print(payload["total_results"])
for paper in payload["results"]:
print(paper["id"], paper["title"])
asyncio.run(main())
Fetch Paper Metadata¶
import asyncio
from nber_cli import get_nber, info
async def main() -> None:
paper = await get_nber(25000)
payload = info(paper)
print(payload["title"])
print(payload["abstract"])
asyncio.run(main())
Download a PDF¶
import asyncio
from pathlib import Path
from nber_cli import download_paper, download_paper_to_file
async def main() -> None:
await download_paper("w34567", Path("papers"))
await download_paper_to_file("w25000", Path("papers/w25000.pdf"))
asyncio.run(main())
Download Multiple Papers¶
import asyncio
from pathlib import Path
from nber_cli import download_multiple_papers
async def main() -> None:
result = await download_multiple_papers(
["w34567", "w25000", "w32000"],
Path("papers"),
)
print(f"Downloaded {len(result.paths)} papers")
for failure in result.failures:
print(f"Failed: {failure.paper_id} - {failure.error}")
asyncio.run(main())
Work with the Local Database¶
Initialize the default database:
Fetch the NBER RSS feed and display new cached items:
from nber_cli import feed_results, fetch_feed
result = fetch_feed()
payload = feed_results(result)
print(payload["new_count"])
for item in payload["results"]:
print(item["id"], item["title"])
Display all fetched RSS items and limit the returned items:
Move the database:
from pathlib import Path
from nber_cli import migrate_database
old_path, new_path = migrate_database(Path("~/data/nber.db"))
Clean cached feed database records:
from nber_cli import clean_feed_cache
preview = clean_feed_cache(days=30, dry_run=True)
print(preview.matched_count)
result = clean_feed_cache(days=30)
print(result.deleted_count)
clean_feed_cache deletes local cache records only. If deleted records still appear in the RSS feed, a later fetch_feed call may return them as new items again.
Read or write the paper metadata cache directly:
from nber_cli import read_info_cache, write_info_cache
from nber_cli import get_nber
import asyncio
paper = read_info_cache(None, "w25000")
if paper is None:
paper = asyncio.run(get_nber(25000))
write_info_cache(None, paper)
print(paper.title)
Read the paper metadata cache through the high-level helper, which respects the user-config TTL and the global cache toggle:
import asyncio
from nber_cli.info_cache import get_paper_with_info_cache_result
async def main() -> None:
result = await get_paper_with_info_cache_result(25000)
if result.from_cache:
print("Served from local info cache")
print(result.paper.title)
asyncio.run(main())
refresh=True skips the cache lookup and re-fetches from NBER before optionally writing back:
import asyncio
from nber_cli.info_cache import get_paper_with_info_cache_result
async def main() -> None:
result = await get_paper_with_info_cache_result(25000, refresh=True)
print(result.paper.title)
asyncio.run(main())
Manage the user config (~/.nber-cli/config.json):
from nber_cli import (
get_info_cache_settings,
set_info_cache_enabled,
set_info_cache_ttl_days,
)
print(get_info_cache_settings())
set_info_cache_enabled(False)
set_info_cache_ttl_days(7)
Clean cached paper metadata:
from nber_cli import clear_info_cache, count_info_cache
print(f"Cached rows: {count_info_cache()}")
preview = clear_info_cache(days=30, dry_run=True)
print(f"Matched: {preview.matched_count}")
result = clear_info_cache(days=30)
print(f"Deleted: {result.deleted_count}")
The same clear_info_cache function also supports delete_all=True and start_date / end_date filters that mirror clean_feed_cache.
Database and Logging Helpers¶
These helpers are part of the top-level public API and are safe to call from user code. They wrap the SQLite layer that info, search, download, and feed use internally. Logging and cache writers fail soft: when a database error occurs, the helper prints a one-line warning to stderr and returns None (for recorders) instead of raising, so they do not break the calling command. Cache readers return None or 0 on partial database errors. Callers that need stronger guarantees should talk to SQLite directly.
get_database_path(db_path=None) -> Path¶
Return the resolved SQLite database path. When db_path is None, NBER-CLI uses the path configured in ~/.nber-cli/config.json, or falls back to the default ~/.nber-cli/nber.db, or the legacy ~/.nber-cli/feed.db file when present. The returned path is always absolute. The database file is not required to exist.
get_schema_version(db_path=None) -> int¶
Return the current PRAGMA user_version of the database. Returns 0 when the file does not exist. The package sets the user version to 2 after init_database or an automatic v1-to-v2 upgrade.
record_query(db_path, keyword, conditions, result_count)¶
Append a row to query_log. conditions is a JSON-serialisable dict describing the filters actually applied to the call. Failures (for example a read-only filesystem or a corrupted database) print warning: failed to record_query: ... to stderr and return without raising.
record_download(db_path, paper_id, status, saved_path=None, error=None)¶
Append a row to download_log. status is typically "success" or "failed". Failures are swallowed and printed to stderr; the calling download command still exits with its normal code.
record_info(db_path, paper_id)¶
Append a row to info_log for the looked-up paper. paper_id may be an int or a string with or without the w prefix. Failures are swallowed and reported to stderr only.
is_info_cache_enabled() -> bool¶
Return the current global info_cache toggle, as configured in ~/.nber-cli/config.json.
get_info_cache_ttl_days() -> int¶
Return the current info_cache refresh interval in days, as configured in ~/.nber-cli/config.json. Defaults to 30 when the field is missing or non-positive.
is_info_cache_expired(last_fetched_at, ttl_days=None, *, now=None) -> bool¶
Return True when the timestamp string last_fetched_at is older than ttl_days (or the configured TTL when ttl_days is None). ttl_days <= 0 is treated as "always expired". Malformed timestamps are treated as expired. now is for testing and accepts a datetime.
touch_info_cache(db_path, paper_id)¶
Update the info_cache row for paper_id: set last_fetched_at to the current UTC time and increment fetch_count. Because this is invoked on every cache hit, the TTL check uses the touch time, not the original write time. This is what makes the cache a sliding TTL rather than a fixed window from the first write. The function is a no-op when the row does not exist or the database is missing. Errors are logged to stderr and otherwise ignored.
parse_feed_xml(xml_text) -> list[NBERFeedItem]¶
Parse raw NBER RSS XML into a list of NBERFeedItem objects. Items must carry a link or guid that matches r"/papers/(w\d+)"; items without a paper ID are skipped. The parser repairs unescaped < characters followed by whitespace or a digit only inside title and description text, then retries strict parsing. Other malformed XML raises a ValueError beginning with "invalid NBER RSS XML" and includes the line and column when available. The function performs no network I/O and never touches the database; feed.fetch_feed wraps it to persist items and write a feed_fetches summary.
Data Models¶
NBER¶
| Field | Type | Description |
|---|---|---|
paper_id |
int |
Numeric paper ID. |
title |
str |
Paper title. |
authors |
list[str] |
Author names. |
date |
str |
Publication date as exposed by NBER. |
abstract |
str |
Paper abstract. |
url |
str or None |
NBER paper URL when available. |
published_version |
str or None |
Published-version text when available. |
topic |
str or None |
Topic metadata when available. |
programs |
str or None |
Program metadata when available. |
NBERSearchResults¶
| Field | Type | Description |
|---|---|---|
query |
str |
Original query. |
total_results |
int |
NBER result count. |
results |
list[NBER] |
Papers on the current page. |
page |
int |
Current page. |
per_page |
int |
Results per page. |
start_date |
str or None |
Applied start date. |
end_date |
str or None |
Applied end date. |
NBERFeedItem¶
| Field | Type | Description |
|---|---|---|
paper_id |
str |
Paper ID, for example w35254. |
title |
str |
Paper title parsed from the RSS item. |
authors |
list[str] |
Author names parsed from the RSS title. |
abstract |
str |
RSS item description. |
url |
str |
Canonical NBER paper URL without the RSS fragment. |
source_url |
str |
Original RSS item URL. |
guid |
str |
RSS item GUID. |
NBERFeedFetchResult¶
| Field | Type | Description |
|---|---|---|
source_url |
str |
RSS feed URL. |
database_path |
Path |
SQLite cache database path. |
total_fetched |
int |
Number of RSS items fetched. |
new_count |
int |
Number of fetched items that were not already in the cache. |
display_all |
bool |
Whether returned items include all fetched items. |
items |
list[NBERFeedItem] |
Items selected for display or structured output. |
max_items |
int or None |
Display limit when provided. |
NBERFeedCleanResult¶
| Field | Type | Description |
|---|---|---|
database_path |
Path |
SQLite cache database path. |
matched_count |
int |
Number of cache records matching the clean criteria. |
deleted_count |
int |
Number of cache records deleted. |
mode |
str |
Clean mode: days, all, or date-range. |
days |
int or None |
Day threshold for days mode. |
start_date |
str or None |
Inclusive start date for date-range mode. |
end_date |
str or None |
Inclusive end date for date-range mode. |
dry_run |
bool |
Whether the operation only counted matching records. |
NBERInfoCacheClearResult¶
| Field | Type | Description |
|---|---|---|
database_path |
Path |
SQLite cache database path. |
matched_count |
int |
Number of cache records matching the clean criteria. |
deleted_count |
int |
Number of cache records deleted. |
mode |
str |
Clean mode: days, all, or date-range. |
days |
int or None |
Day threshold for days mode. |
start_date |
str or None |
Inclusive start date for date-range mode. |
end_date |
str or None |
Inclusive end date for date-range mode. |
dry_run |
bool |
Whether the operation only counted matching records. |
InfoCacheSettings¶
| Field | Type | Description |
|---|---|---|
cache_enabled |
bool |
Global toggle for the info_cache lookup. |
cache_ttl_days |
int |
Cache refresh interval in days. |
InfoCacheLookupResult¶
| Field | Type | Description |
|---|---|---|
paper |
NBER |
The paper returned by the lookup. |
from_cache |
bool |
True when the paper was served from the local info_cache. |
DownloadFailure¶
| Field | Type | Description |
|---|---|---|
paper_id |
str |
The paper ID that failed to download. |
error |
BaseException |
The exception raised during the download attempt. |
DownloadBatchResult¶
| Field | Type | Description |
|---|---|---|
paths |
list[Path] |
Paths of successfully downloaded PDFs. |
failures |
list[DownloadFailure] |
Failed downloads with their errors. |
Formatter Helpers¶
Use formatter helpers when you want stable dictionaries for JSON output or MCP-style responses:
info(paper)returns core metadata.related(paper)returns related optional fields.search_results(results)returns a structured search payload.feed_results(result)returns a structured feed fetch payload.
For human-readable text output, use the text formatters from nber_cli.formatters:
info_text(paper, include_all=False)returns a formatted text string with paper details. Setinclude_all=Trueto include topic, programs, and published version.search_results_text(results)returns a formatted text string with search results.feed_results_text(result)returns a formatted text string with feed items.
JSON Output Structures¶
--format json for info, search, and feed fetch produces the same dictionaries that the matching *_results formatters build. The JSON payload is always written to stdout, while the cache hit hint (for info) and any error message are written to stderr. This split lets scripts capture the payload with > redirection or pipes without picking up hint or error text.
info --format json¶
Produced by info(paper) plus, when --all is set, related(paper) and a conditional published_version field:
| Field | Type | Always present | Notes |
|---|---|---|---|
id |
str |
yes | paper_id formatted as wNNNN. |
title |
str |
yes | Empty string when NBER does not expose one. |
authors |
list[str] |
yes | Empty list when NBER does not expose any. |
date |
str |
yes | Publication date as exposed by NBER; may be empty. |
abstract |
str |
yes | Empty string when NBER does not expose an abstract. |
url |
str |
no | Present only when the paper has a non-empty NBER URL. |
topic |
str |
only with --all |
None-able; emitted as null when unknown. |
programs |
str |
only with --all |
None-able; emitted as null when unknown. |
published_version |
str |
only with --all and truthy |
Omitted entirely when NBER does not expose one. |
search --format json¶
Produced by search_results(results):
| Field | Type | Always present | Notes |
|---|---|---|---|
query |
str |
yes | The original query. |
total_results |
int |
yes | NBER-reported total. |
page |
int |
yes | Current page. |
per_page |
int |
yes | Page size, one of 20, 50, 100. |
start_date |
str |
no | Present only when the call applied a start date. |
end_date |
str |
no | Present only when the call applied an end date. |
results |
list[object] |
yes | Per-paper dictionaries with the same fields as search_result(paper). |
Each entry in results carries id, title, authors, date, abstract, and url. Unlike info, url is always emitted (possibly empty).
feed fetch --format json¶
Produced by feed_results(result):
| Field | Type | Notes |
|---|---|---|
source_url |
str |
The RSS feed URL that was fetched. |
database_path |
str |
Absolute path of the SQLite database the items were written to. |
total_fetched |
int |
Total items parsed from the feed. |
new_count |
int |
Items that were not already in the local cache. |
display_all |
bool |
true when results includes all fetched items, false when limited to new ones. |
max_items |
int or null |
The cap from --max-items when provided. |
displayed_count |
int |
Number of items actually included in results. |
results |
list[object] |
Per-item dictionaries: id, title, authors, abstract, url, source_url, guid. |
Compatibility Notes¶
The JSON structures are the published output contract used by both the CLI and the MCP tools. Additive fields (new optional keys) may appear in a minor version. Renaming or removing an existing key, or changing the type of an existing field, is treated as a breaking change. Scripts that consume --format json should treat unknown keys as ignored data rather than asserting on the full key set.