Python API¶

NBER-CLI exposes the same core functionality as importable Python functions. The lookup, search, and download functions are asynchronous because they perform network I/O.

Feed cache helpers are synchronous because they perform local SQLite work and a synchronous RSS fetch.
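Since the feed helpers block, an async program may prefer to run them in a worker thread. The following is a minimal sketch using only the standard library; `blocking_fetch` is a hypothetical stand-in for a synchronous helper such as fetch_feed, not the package function itself.

```python
import asyncio
import time


def blocking_fetch() -> str:
    """Hypothetical stand-in for a synchronous helper such as fetch_feed."""
    time.sleep(0.05)  # simulates blocking SQLite and HTTP work
    return "done"


async def main() -> str:
    # asyncio.to_thread runs the blocking call in a worker thread,
    # so the event loop stays free for other coroutines.
    return await asyncio.to_thread(blocking_fetch)


result = asyncio.run(main())
print(result)
```

asyncio.to_thread requires Python 3.9 or later.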

API Boundaries¶

NBER-CLI has three layers, and the stability contract is different for each:

Top-level public API: the names listed in nber_cli.__all__. Importing them from the nber_cli package is the supported way to use the package. This is the only layer with a stability promise. Removing or renaming a name in __all__ is treated as a breaking change.
Module-level helpers: non-underscore names defined in modules such as nber_cli.formatters, nber_cli.fetcher, nber_cli.config_store, and nber_cli.cli. These are usable directly and useful for advanced callers, but they are not part of the public top-level contract and may change between minor versions.
Compatibility wrappers: a few names exist only for backward compatibility with callers written against earlier versions. The feed module re-exports init_feed_database, migrate_feed_database, and get_feed_database_path, which now forward to the database layer. They are kept for one minor release after their replacement ships and may be removed later.

__all__ is the source of truth for what is "officially exported" from the package. If a name is not in __all__ and not documented as a module-level helper, treat it as private even if it is not underscore-prefixed.
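One way to apply this rule in a script is to check `__all__` before relying on a name. The sketch below uses the standard-library `json` module as a stand-in so that it runs without NBER-CLI installed; `is_exported` is a hypothetical helper, not part of the package.

```python
import importlib


def is_exported(module_name: str, name: str) -> bool:
    """Return True when `name` is listed in the module's __all__."""
    module = importlib.import_module(module_name)
    # Modules without __all__ are treated as exporting nothing here.
    return name in getattr(module, "__all__", ())


exported = is_exported("json", "dumps")        # listed in json.__all__
not_exported = is_exported("json", "decoder")  # a submodule, not in __all__
print(exported, not_exported)
```

The same check should work with `"nber_cli"` as the module name once the package is installed.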

Install¶

uv add nber-cli

For a script outside a project:

uvx --from nber-cli python your_script.py

Search Papers¶

import asyncio

from nber_cli import search_nber, search_results


async def main() -> None:
    results = await search_nber("labor economics", per_page=20)
    payload = search_results(results)
    print(payload["total_results"])
    for paper in payload["results"]:
        print(paper["id"], paper["title"])


asyncio.run(main())

Fetch Paper Metadata¶

import asyncio

from nber_cli import get_nber, info


async def main() -> None:
    paper = await get_nber(25000)
    payload = info(paper)
    print(payload["title"])
    print(payload["abstract"])


asyncio.run(main())

Download a PDF¶

import asyncio
from pathlib import Path

from nber_cli import download_paper, download_paper_to_file


async def main() -> None:
    await download_paper("w34567", Path("papers"))
    await download_paper_to_file("w25000", Path("papers/w25000.pdf"))


asyncio.run(main())

Download Multiple Papers¶

import asyncio
from pathlib import Path

from nber_cli import download_multiple_papers


async def main() -> None:
    result = await download_multiple_papers(
        ["w34567", "w25000", "w32000"],
        Path("papers"),
    )
    print(f"Downloaded {len(result.paths)} papers")
    for failure in result.failures:
        print(f"Failed: {failure.paper_id} - {failure.error}")


asyncio.run(main())
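The shape of the batch result, successes in one list and failures paired with their exceptions in another, can be illustrated with plain asyncio. This is a sketch of the general pattern only and makes no claim about how download_multiple_papers is implemented; `Failure`, `BatchResult`, `fake_download`, and `download_all` are hypothetical names.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Failure:  # hypothetical, mirrors the fields of DownloadFailure
    paper_id: str
    error: BaseException


@dataclass
class BatchResult:  # hypothetical, mirrors the fields of DownloadBatchResult
    paths: list[str] = field(default_factory=list)
    failures: list[Failure] = field(default_factory=list)


async def fake_download(paper_id: str) -> str:
    """Stand-in download that fails for one ID to show the failure path."""
    await asyncio.sleep(0)
    if paper_id == "w00000":
        raise ValueError("paper not found")
    return f"papers/{paper_id}.pdf"


async def download_all(paper_ids: list[str]) -> BatchResult:
    # return_exceptions=True keeps one failure from cancelling the rest.
    outcomes = await asyncio.gather(
        *(fake_download(pid) for pid in paper_ids), return_exceptions=True
    )
    batch = BatchResult()
    for pid, outcome in zip(paper_ids, outcomes):
        if isinstance(outcome, BaseException):
            batch.failures.append(Failure(pid, outcome))
        else:
            batch.paths.append(outcome)
    return batch


batch = asyncio.run(download_all(["w25000", "w00000"]))
print(len(batch.paths), len(batch.failures))
```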

Work with the Local Database¶

Initialize the default database:

from nber_cli import init_database

db_path = init_database()
print(db_path)

Fetch the NBER RSS feed and display the items that are new to the cache:

from nber_cli import feed_results, fetch_feed

result = fetch_feed()
payload = feed_results(result)
print(payload["new_count"])
for item in payload["results"]:
    print(item["id"], item["title"])

Display all fetched RSS items and limit the returned items:

from nber_cli import fetch_feed

result = fetch_feed(display_all=True, max_items=5)

Move the database:

from pathlib import Path

from nber_cli import migrate_database

old_path, new_path = migrate_database(Path("~/data/nber.db"))

Clean cached feed database records:

from nber_cli import clean_feed_cache

preview = clean_feed_cache(days=30, dry_run=True)
print(preview.matched_count)

result = clean_feed_cache(days=30)
print(result.deleted_count)

clean_feed_cache deletes local cache records only. If deleted records still appear in the RSS feed, a later fetch_feed call may return them as new items again.

Read or write the paper metadata cache directly:

import asyncio

from nber_cli import get_nber, read_info_cache, write_info_cache

paper = read_info_cache(None, "w25000")
if paper is None:
    paper = asyncio.run(get_nber(25000))
    write_info_cache(None, paper)
print(paper.title)

Read the paper metadata cache through the high-level helper, which respects the user-config TTL and the global cache toggle:

import asyncio

from nber_cli.info_cache import get_paper_with_info_cache_result


async def main() -> None:
    result = await get_paper_with_info_cache_result(25000)
    if result.from_cache:
        print("Served from local info cache")
    print(result.paper.title)


asyncio.run(main())

refresh=True skips the cache lookup and re-fetches from NBER before optionally writing back:

import asyncio

from nber_cli.info_cache import get_paper_with_info_cache_result


async def main() -> None:
    result = await get_paper_with_info_cache_result(25000, refresh=True)
    print(result.paper.title)


asyncio.run(main())

Manage the user config (~/.nber-cli/config.json):

from nber_cli import (
    get_info_cache_settings,
    set_info_cache_enabled,
    set_info_cache_ttl_days,
)

print(get_info_cache_settings())
set_info_cache_enabled(False)
set_info_cache_ttl_days(7)

Clean cached paper metadata:

from nber_cli import clear_info_cache, count_info_cache

print(f"Cached rows: {count_info_cache()}")

preview = clear_info_cache(days=30, dry_run=True)
print(f"Matched: {preview.matched_count}")

result = clear_info_cache(days=30)
print(f"Deleted: {result.deleted_count}")

The same clear_info_cache function also supports delete_all=True and start_date / end_date filters that mirror clean_feed_cache.

Database and Logging Helpers¶

These helpers are part of the top-level public API and are safe to call from user code. They wrap the SQLite layer that info, search, download, and feed use internally. The logging helpers (the record_* recorders) and the cache writers fail soft: when a database error occurs, the helper prints a one-line warning to stderr and, for recorders, returns None instead of raising, so a logging failure does not break the calling command. Cache readers return None or 0 on partial database errors. Callers that need stronger guarantees should talk to SQLite directly.
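The fail-soft behaviour described above can be sketched with the standard sqlite3 module. `record_event` and the `event_log` table are hypothetical and exist only for illustration; the real recorders and their schema may differ.

```python
import sqlite3
import sys
import tempfile
from pathlib import Path


def record_event(db_path: Path, message: str) -> None:
    """Hypothetical recorder: append a row, warn on failure, never raise."""
    try:
        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("CREATE TABLE IF NOT EXISTS event_log (message TEXT)")
                conn.execute("INSERT INTO event_log (message) VALUES (?)", (message,))
        finally:
            conn.close()
    except sqlite3.Error as exc:
        # One-line warning on stderr; the caller keeps running.
        print(f"warning: failed to record_event: {exc}", file=sys.stderr)


with tempfile.TemporaryDirectory() as tmp:
    # The parent directory does not exist, so SQLite cannot open the file.
    outcome = record_event(Path(tmp) / "missing" / "log.db", "hello")
```

The trade-off is that a caller cannot tell from the return value whether the row was written, which is why code that needs guarantees should use SQLite directly.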

`get_database_path(db_path=None) -> Path`¶

Return the resolved SQLite database path. When db_path is None, NBER-CLI uses the path configured in ~/.nber-cli/config.json, or falls back to the default ~/.nber-cli/nber.db, or the legacy ~/.nber-cli/feed.db file when present. The returned path is always absolute. The database file is not required to exist.

`get_schema_version(db_path=None) -> int`¶

Return the current PRAGMA user_version of the database. Returns 0 when the file does not exist. The package sets the user version to 2 after init_database or an automatic v1-to-v2 upgrade.
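For reference, `PRAGMA user_version` can be read with the standard sqlite3 module. The sketch below assumes nothing about the package internals; `read_user_version` is a hypothetical helper that reproduces the documented "0 when the file does not exist" behaviour.

```python
import sqlite3
import tempfile
from pathlib import Path


def read_user_version(db_path: Path) -> int:
    """Return PRAGMA user_version, or 0 when the file does not exist."""
    if not db_path.exists():
        return 0  # checking first avoids creating an empty database file
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("PRAGMA user_version").fetchone()[0]
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.db"
    before = read_user_version(path)  # file is missing at this point

    conn = sqlite3.connect(path)
    conn.execute("PRAGMA user_version = 2")
    conn.close()
    after = read_user_version(path)

print(before, after)
```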

`record_query(db_path, keyword, conditions, result_count)`¶

Append a row to query_log. conditions is a JSON-serialisable dict describing the filters actually applied to the call. Failures (for example a read-only filesystem or a corrupted database) print warning: failed to record_query: ... to stderr and return without raising.

`record_download(db_path, paper_id, status, saved_path=None, error=None)`¶

Append a row to download_log. status is typically "success" or "failed". Failures are swallowed and printed to stderr; the calling download command still exits with its normal code.

`record_info(db_path, paper_id)`¶

Append a row to info_log for the looked-up paper. paper_id may be an int or a string with or without the w prefix. Failures are swallowed and reported to stderr only.

`is_info_cache_enabled() -> bool`¶

Return the current global info_cache toggle, as configured in ~/.nber-cli/config.json.

`get_info_cache_ttl_days() -> int`¶

Return the current info_cache refresh interval in days, as configured in ~/.nber-cli/config.json. Defaults to 30 when the field is missing or non-positive.

`is_info_cache_expired(last_fetched_at, ttl_days=None, *, now=None) -> bool`¶

Return True when the timestamp string last_fetched_at is older than ttl_days (or the configured TTL when ttl_days is None). ttl_days <= 0 is treated as "always expired". Malformed timestamps are treated as expired. now is for testing and accepts a datetime.
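The rules above can be sketched as follows. This is an illustration, not the package implementation: `is_expired` is a hypothetical name, and the sketch assumes ISO 8601 timestamps, treats naive timestamps as UTC, and counts a row as expired only when it is strictly older than the TTL.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_expired(
    last_fetched_at: str, ttl_days: int, now: Optional[datetime] = None
) -> bool:
    """Hypothetical TTL check following the documented rules."""
    if ttl_days <= 0:
        return True  # non-positive TTL means "always expired"
    try:
        fetched = datetime.fromisoformat(last_fetched_at)
    except ValueError:
        return True  # malformed timestamps count as expired
    if fetched.tzinfo is None:
        fetched = fetched.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    # `now` must be timezone-aware when provided.
    current = now or datetime.now(timezone.utc)
    return current - fetched > timedelta(days=ttl_days)


fixed_now = datetime(2024, 2, 15, tzinfo=timezone.utc)
old = is_expired("2024-01-01T00:00:00+00:00", 30, fixed_now)    # 45 days old
fresh = is_expired("2024-02-01T00:00:00+00:00", 30, fixed_now)  # 14 days old
broken = is_expired("not-a-date", 30, fixed_now)
print(old, fresh, broken)
```

Injecting `now` keeps the check deterministic in tests, which is the purpose the documented `now` parameter serves.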

`touch_info_cache(db_path, paper_id)`¶

Update the info_cache row for paper_id: set last_fetched_at to the current UTC time and increment fetch_count. Because this is invoked on every cache hit, the TTL check uses the touch time, not the original write time. This is what makes the cache a sliding TTL rather than a fixed window from the first write. The function is a no-op when the row does not exist or the database is missing. Errors are logged to stderr and otherwise ignored.
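The touch operation amounts to a single UPDATE statement. The sketch below runs against an in-memory database; the column names last_fetched_at and fetch_count come from the description above, while the `paper_id` key column, the table definition, and the `touch` helper are assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def touch(conn: sqlite3.Connection, paper_id: str) -> int:
    """Refresh last_fetched_at and bump fetch_count; return rows changed."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits the update
        cursor = conn.execute(
            "UPDATE info_cache "
            "SET last_fetched_at = ?, fetch_count = fetch_count + 1 "
            "WHERE paper_id = ?",
            (now, paper_id),
        )
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE info_cache "
    "(paper_id TEXT PRIMARY KEY, last_fetched_at TEXT, fetch_count INTEGER)"
)
conn.execute(
    "INSERT INTO info_cache VALUES ('w25000', '2020-01-01T00:00:00+00:00', 1)"
)

touched = touch(conn, "w25000")  # existing row is updated
missing = touch(conn, "w99999")  # no such row, so nothing changes
row = conn.execute(
    "SELECT last_fetched_at, fetch_count FROM info_cache WHERE paper_id = 'w25000'"
).fetchone()
conn.close()
print(touched, missing, row[1])
```

Because every hit moves last_fetched_at forward, a frequently read row never reaches the TTL, which is the sliding behaviour described above.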

`parse_feed_xml(xml_text) -> list[NBERFeedItem]`¶

Parse raw NBER RSS XML into a list of NBERFeedItem objects. Items must carry a link or guid that matches r"/papers/(w\d+)"; items without a paper ID are skipped. When strict parsing fails, the parser escapes unescaped < characters that are followed by whitespace or a digit, but only inside title and description text, and then retries strict parsing. Any other malformed XML raises a ValueError whose message begins with "invalid NBER RSS XML" and includes the line and column when available. The function performs no network I/O and never touches the database; feed.fetch_feed wraps it to persist items and write a feed_fetches summary.
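The ID-matching and error-reporting rules can be sketched with the standard library. This is not the package parser: `extract_ids`, the sample feed, its URLs, and the error wording are illustrative, and the repair step for unescaped < characters is left out.

```python
import re
import xml.etree.ElementTree as ET

PAPER_ID = re.compile(r"/papers/(w\d+)")

# Illustrative feed: one item with a paper ID, one without.
SAMPLE = """<rss version="2.0"><channel>
<item><title>Paper A</title><link>https://www.nber.org/papers/w35254#fromrss</link></item>
<item><title>No ID</title><link>https://www.nber.org/news/item</link></item>
</channel></rss>"""


def extract_ids(xml_text: str) -> list[str]:
    """Return paper IDs found in item links or guids, skipping other items."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        line, column = exc.position
        raise ValueError(f"invalid RSS XML at line {line}, column {column}") from exc

    ids = []
    for item in root.iter("item"):
        candidates = (item.findtext("link") or "", item.findtext("guid") or "")
        match = next((m for m in map(PAPER_ID.search, candidates) if m), None)
        if match:
            ids.append(match.group(1))
    return ids


ids = extract_ids(SAMPLE)
print(ids)
```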

Data Models¶

NBER¶

Field	Type	Description
`paper_id`	`int`	Numeric paper ID.
`title`	`str`	Paper title.
`authors`	`list[str]`	Author names.
`date`	`str`	Publication date as exposed by NBER.
`abstract`	`str`	Paper abstract.
`url`	`str` or `None`	NBER paper URL when available.
`published_version`	`str` or `None`	Published-version text when available.
`topic`	`str` or `None`	Topic metadata when available.
`programs`	`str` or `None`	Program metadata when available.

NBERSearchResults¶

Field	Type	Description
`query`	`str`	Original query.
`total_results`	`int`	NBER result count.
`results`	`list[NBER]`	Papers on the current page.
`page`	`int`	Current page.
`per_page`	`int`	Results per page.
`start_date`	`str` or `None`	Applied start date.
`end_date`	`str` or `None`	Applied end date.

NBERFeedItem¶

Field	Type	Description
`paper_id`	`str`	Paper ID, for example `w35254`.
`title`	`str`	Paper title parsed from the RSS item.
`authors`	`list[str]`	Author names parsed from the RSS title.
`abstract`	`str`	RSS item description.
`url`	`str`	Canonical NBER paper URL without the RSS fragment.
`source_url`	`str`	Original RSS item URL.
`guid`	`str`	RSS item GUID.

NBERFeedFetchResult¶

Field	Type	Description
`source_url`	`str`	RSS feed URL.
`database_path`	`Path`	SQLite cache database path.
`total_fetched`	`int`	Number of RSS items fetched.
`new_count`	`int`	Number of fetched items that were not already in the cache.
`display_all`	`bool`	Whether returned items include all fetched items.
`items`	`list[NBERFeedItem]`	Items selected for display or structured output.
`max_items`	`int` or `None`	Display limit when provided.

NBERFeedCleanResult¶

Field	Type	Description
`database_path`	`Path`	SQLite cache database path.
`matched_count`	`int`	Number of cache records matching the clean criteria.
`deleted_count`	`int`	Number of cache records deleted.
`mode`	`str`	Clean mode: `days`, `all`, or `date-range`.
`days`	`int` or `None`	Day threshold for `days` mode.
`start_date`	`str` or `None`	Inclusive start date for `date-range` mode.
`end_date`	`str` or `None`	Inclusive end date for `date-range` mode.
`dry_run`	`bool`	Whether the operation only counted matching records.

NBERInfoCacheClearResult¶

Field	Type	Description
`database_path`	`Path`	SQLite cache database path.
`matched_count`	`int`	Number of cache records matching the clean criteria.
`deleted_count`	`int`	Number of cache records deleted.
`mode`	`str`	Clean mode: `days`, `all`, or `date-range`.
`days`	`int` or `None`	Day threshold for `days` mode.
`start_date`	`str` or `None`	Inclusive start date for `date-range` mode.
`end_date`	`str` or `None`	Inclusive end date for `date-range` mode.
`dry_run`	`bool`	Whether the operation only counted matching records.

InfoCacheSettings¶

Field	Type	Description
`cache_enabled`	`bool`	Global toggle for the `info_cache` lookup.
`cache_ttl_days`	`int`	Cache refresh interval in days.

InfoCacheLookupResult¶

Field	Type	Description
`paper`	`NBER`	The paper returned by the lookup.
`from_cache`	`bool`	`True` when the paper was served from the local `info_cache`.

DownloadFailure¶

Field	Type	Description
`paper_id`	`str`	The paper ID that failed to download.
`error`	`BaseException`	The exception raised during the download attempt.

DownloadBatchResult¶

Field	Type	Description
`paths`	`list[Path]`	Paths of successfully downloaded PDFs.
`failures`	`list[DownloadFailure]`	Failed downloads with their errors.

Formatter Helpers¶

Use formatter helpers when you want stable dictionaries for JSON output or MCP-style responses:

from nber_cli import feed_results, info, related, search_results

info(paper) returns core metadata.
related(paper) returns related optional fields.
search_results(results) returns a structured search payload.
feed_results(result) returns a structured feed fetch payload.

For human-readable text output, use the text formatters from nber_cli.formatters:

from nber_cli.formatters import feed_results_text, info_text, search_results_text

info_text(paper, include_all=False) returns a formatted text string with paper details. Set include_all=True to include topic, programs, and published version.
search_results_text(results) returns a formatted text string with search results.
feed_results_text(result) returns a formatted text string with feed items.

JSON Output Structures¶

--format json for info, search, and feed fetch produces the same dictionaries that the matching formatter helpers (info, search_results, and feed_results) build. The JSON payload is always written to stdout, while the cache hit hint (for info) and any error message are written to stderr. This split lets scripts capture the payload with > redirection or pipes without picking up hint or error text.
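A script can rely on this split by capturing the two streams separately. The sketch below uses a small Python child process as a stand-in for the CLI so that it runs anywhere; the payload and the hint text are made up for illustration.

```python
import json
import subprocess
import sys

# Stand-in child process: JSON payload on stdout, a hint on stderr.
CHILD = (
    "import json, sys; "
    "print(json.dumps({'id': 'w25000', 'title': 'Example'})); "
    "print('hint: served from cache', file=sys.stderr)"
)

completed = subprocess.run(
    [sys.executable, "-c", CHILD],
    capture_output=True,  # keeps stdout and stderr in separate buffers
    text=True,
    check=True,
)

payload = json.loads(completed.stdout)  # stdout holds only the JSON document
hint = completed.stderr.strip()
print(payload["id"], "|", hint)
```

With the real CLI, the argument list would be replaced by the command to run; json.loads would still see only the payload because hints and errors go to stderr.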

`info --format json`¶

Produced by info(paper) plus, when --all is set, related(paper) and a conditional published_version field:

Field	Type	Always present	Notes
`id`	`str`	yes	`paper_id` formatted as `wNNNN`.
`title`	`str`	yes	Empty string when NBER does not expose one.
`authors`	`list[str]`	yes	Empty list when NBER does not expose any.
`date`	`str`	yes	Publication date as exposed by NBER; may be empty.
`abstract`	`str`	yes	Empty string when NBER does not expose an abstract.
`url`	`str`	no	Present only when the paper has a non-empty NBER URL.
`topic`	`str` or `null`	only with `--all`	Emitted as `null` when unknown.
`programs`	`str` or `null`	only with `--all`	Emitted as `null` when unknown.
`published_version`	`str`	only with `--all` and truthy	Omitted entirely when NBER does not expose one.

`search --format json`¶

Produced by search_results(results):

Field	Type	Always present	Notes
`query`	`str`	yes	The original query.
`total_results`	`int`	yes	NBER-reported total.
`page`	`int`	yes	Current page.
`per_page`	`int`	yes	Page size, one of `20`, `50`, `100`.
`start_date`	`str`	no	Present only when the call applied a start date.
`end_date`	`str`	no	Present only when the call applied an end date.
`results`	`list[object]`	yes	Per-paper dictionaries with the same fields as `search_result(paper)`.

Each entry in results carries id, title, authors, date, abstract, and url. Unlike info, url is always emitted (possibly empty).

`feed fetch --format json`¶

Produced by feed_results(result):

Field	Type	Notes
`source_url`	`str`	The RSS feed URL that was fetched.
`database_path`	`str`	Absolute path of the SQLite database the items were written to.
`total_fetched`	`int`	Total items parsed from the feed.
`new_count`	`int`	Items that were not already in the local cache.
`display_all`	`bool`	`true` when `results` includes all fetched items, `false` when limited to new ones.
`max_items`	`int` or `null`	The cap from `--max-items` when provided.
`displayed_count`	`int`	Number of items actually included in `results`.
`results`	`list[object]`	Per-item dictionaries: `id`, `title`, `authors`, `abstract`, `url`, `source_url`, `guid`.

Compatibility Notes¶

The JSON structures are the published output contract used by both the CLI and the MCP tools. Additive fields (new optional keys) may appear in a minor version. Renaming or removing an existing key, or changing the type of an existing field, is treated as a breaking change. Scripts that consume --format json should treat unknown keys as ignored data rather than asserting on the full key set.
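A consumer that follows this advice reads the keys it needs and ignores the rest. The sketch below parses an illustrative info payload; `read_info`, the sample data, and the `future_field` key are hypothetical.

```python
import json

# Illustrative payload with an extra key a later minor version might add.
RAW = (
    '{"id": "w25000", "title": "Example", "authors": [], '
    '"date": "", "abstract": "", "future_field": 1}'
)

# Keys documented as always present in the info payload.
REQUIRED = ("id", "title", "authors", "date", "abstract")


def read_info(raw: str) -> dict:
    """Pick the documented keys and ignore anything unknown."""
    payload = json.loads(raw)
    missing = [key for key in REQUIRED if key not in payload]
    if missing:
        raise KeyError(f"missing required keys: {missing}")
    record = {key: payload[key] for key in REQUIRED}
    record["url"] = payload.get("url")  # optional key: None when absent
    return record


record = read_info(RAW)
print(record["id"], record["url"])
```

Selecting keys explicitly, rather than asserting on the full key set, keeps the script working when additive fields appear.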