Bulk Data Client

clients.bulk_data - Client for USPTO bulk data API.

This module provides a client for interacting with the USPTO Open Data Portal (ODP) Bulk Data API. It allows you to search for and download bulk data products.

class pyUSPTO.clients.bulk_data.BulkDataClient(config=None, base_url=None)[source]

Bases: BaseUSPTOClient[BulkDataResponse]

Client for interacting with the USPTO bulk data API.

ENDPOINTS = {'download_file': 'api/v1/datasets/products/files/{productIdentifier}/{fileName}', 'product_by_id': 'api/v1/datasets/products/{product_id}', 'products_search': 'api/v1/datasets/products/search'}
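The endpoint values above are path templates with named placeholders. As a rough illustration of how such templates can be expanded, here is a minimal sketch using only the standard library. The `build_url` helper and the `https://example.invalid` host are assumptions made for this example; the source does not show how the client itself builds its request URLs.

```python
from urllib.parse import quote

# Copied from BulkDataClient.ENDPOINTS above.
ENDPOINTS = {
    "download_file": "api/v1/datasets/products/files/{productIdentifier}/{fileName}",
    "product_by_id": "api/v1/datasets/products/{product_id}",
    "products_search": "api/v1/datasets/products/search",
}


def build_url(base_url: str, endpoint: str, **path_params: str) -> str:
    """Fill an endpoint template and join it to a base URL (hypothetical helper)."""
    # Percent-encode each value so "/" or spaces stay inside one path segment.
    encoded = {name: quote(str(value), safe="") for name, value in path_params.items()}
    path = ENDPOINTS[endpoint].format(**encoded)
    return f"{base_url.rstrip('/')}/{path}"


# "https://example.invalid" is a placeholder host, not the real USPTO base URL.
url = build_url(
    "https://example.invalid/",
    "product_by_id",
    product_id="patent-grant-data-text",
)
print(url)
```

Note that the placeholder names differ between templates (`productIdentifier` and `fileName` versus `product_id`), so the keyword arguments must match the template being filled.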

__init__(config=None, base_url=None)[source]

Initialize the BulkDataClient.

Parameters:

config (USPTOConfig | None) – USPTOConfig instance containing API key and settings. If not provided, creates config from environment variables (requires USPTO_API_KEY).
base_url (str | None) – Optional base URL override for the USPTO Bulk Data API. If not provided, uses config.bulk_data_base_url or default.

download_file(file_data, destination=None, file_name=None, overwrite=False, extract=False)[source]

Download a file from the bulk data API.

Does not extract archives (tar.gz, zip) by default. The download uses base class helpers for consistent behavior across all clients.

Parameters:

file_data (FileData) – FileData object containing download info and product_identifier.
destination (str | None) – Directory to save/extract to. Defaults to current directory.
file_name (str | None) – Override filename. Defaults to file_data.file_name.
overwrite (bool) – Whether to overwrite existing files. Defaults to False.
extract (bool) – Whether to auto-extract archives. Defaults to False.

Returns:

Path to downloaded file or extracted directory.

Return type:

str

Raises:

FileExistsError – If file exists and overwrite=False.

Examples

Download and extract a file:

>>> product = client.get_product_by_id("product-123", include_files=True)
>>> file_data = product.product_file_bag.file_data_bag[0]
>>> path = client.download_file(file_data, destination="./downloads", extract=True)

Download without extraction:

>>> path = client.download_file(file_data, extract=False)
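The overwrite rule documented for download_file (raise FileExistsError unless overwrite=True) can be illustrated without touching the network. The sketch below is a stand-alone approximation, not the library's implementation: `save_file` is a hypothetical helper that writes bytes to disk and applies the same guard.

```python
import tempfile
from pathlib import Path


def save_file(data: bytes, destination: str, file_name: str, overwrite: bool = False) -> str:
    """Write data to destination/file_name, refusing to replace a file unless asked."""
    target = Path(destination) / file_name
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target} already exists; pass overwrite=True to replace it")
    # Create the destination directory if it is missing.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return str(target)


with tempfile.TemporaryDirectory() as tmp:
    first = save_file(b"v1", tmp, "sample.zip")
    try:
        # Second write to the same name with the default overwrite=False.
        save_file(b"v2", tmp, "sample.zip")
        blocked = False
    except FileExistsError:
        blocked = True
    # An explicit overwrite=True replaces the existing file.
    save_file(b"v3", tmp, "sample.zip", overwrite=True)
    final_bytes = Path(first).read_bytes()
```

Callers who re-run a download job would typically either pass overwrite=True or catch FileExistsError and skip the file.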

get_product_by_id(product_id, file_data_from_date=None, file_data_to_date=None, offset=None, limit=None, include_files=None, latest=None)[source]

Get a specific bulk data product by ID.

Parameters:

product_id (str) – The product identifier.
file_data_from_date (str | None) – Filter files by data from date (YYYY-MM-DD).
file_data_to_date (str | None) – Filter files by data to date (YYYY-MM-DD).
offset (int | None) – Number of product file records to skip.
limit (int | None) – Number of product file records to collect.
include_files (bool | None) – Whether to include product files in the response.
latest (bool | None) – Whether to return only the latest product file.

Returns:

The requested product.

Return type:

BulkDataProduct

Raises:

ValueError – If product not found in response.

Examples

Get product without files:

>>> product = client.get_product_by_id("patent-grant-data-text")

Get product with files:

>>> product = client.get_product_by_id(
...     "patent-grant-data-text",
...     include_files=True,
...     latest=True
... )
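Because every filter on get_product_by_id is optional and the date filters expect YYYY-MM-DD strings, it can help to validate and prune arguments before making a call. The following is a minimal sketch under stated assumptions: `check_date` and `build_filters` are hypothetical helpers, and the dictionary keys reuse the Python parameter names from the signature above. The source does not say whether the client validates dates itself.

```python
import re
from datetime import date

_DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def check_date(value: str) -> str:
    """Return value unchanged if it is a real calendar date in YYYY-MM-DD form."""
    if not _DATE_RE.fullmatch(value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    date.fromisoformat(value)  # raises ValueError for dates such as 2024-02-30
    return value


def build_filters(**options):
    """Drop options left as None and validate the two date filters."""
    filters = {}
    for name, value in options.items():
        if value is None:
            continue
        if name in ("file_data_from_date", "file_data_to_date"):
            value = check_date(value)
        filters[name] = value
    return filters


filters = build_filters(
    file_data_from_date="2024-01-01",
    file_data_to_date=None,
    include_files=True,
    latest=None,
)
```

The resulting dictionary could then be unpacked into a call such as `client.get_product_by_id(product_id, **filters)`.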

paginate_products(**kwargs)[source]

Paginate through all products matching the search criteria.

Parameters:

**kwargs (Any) – Keyword arguments passed to search_products.

Yields:

BulkDataProduct objects.
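To make the pagination idea concrete, here is a generic offset/limit loop written as a generator. It is a sketch of the general technique, not the library's code: `paginate` and `fake_search` are hypothetical, the stand-in search returns a plain list rather than a BulkDataResponse, and the stopping rule (a page shorter than the limit) is an assumption.

```python
from typing import Any, Callable, Iterator, List


def paginate(search: Callable[..., List[Any]], limit: int = 25, **kwargs: Any) -> Iterator[Any]:
    """Yield items page by page until a short or empty page signals the end."""
    offset = 0
    while True:
        page = search(offset=offset, limit=limit, **kwargs)
        yield from page
        if len(page) < limit:
            break
        offset += limit


# Stand-in for a search call: slices a fixed list instead of calling the API.
CATALOG = [f"product-{n}" for n in range(5)]


def fake_search(offset: int = 0, limit: int = 25, **_: Any) -> List[str]:
    return CATALOG[offset : offset + limit]


results = list(paginate(fake_search, limit=2))
print(results)
```

Because the function is a generator, a caller can stop iterating early and no further pages are requested.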

search_products(query=None, offset=None, limit=None, facets=None, fields=None)[source]

Search for Bulk Data Products.

Note: The USPTO Bulk Data API only supports full-text search in the query parameter. Field-specific queries (e.g., field:value) do not work despite being documented in the API swagger specification.

Parameters:

query (str | None) – Full-text search query string. Field-specific syntax like “productIdentifier:value” is not supported by the API.
offset (int | None) – Number of product records to skip.
limit (int | None) – Number of product records to collect.
facets (bool | None) – Whether to enable facets in the response.
fields (list[str] | None) – List of field names to include in the response.

Returns:

Response containing matching products.

Return type:

BulkDataResponse

Examples

Search with full-text query:

>>> response = client.search_products(query="Patent", limit=50)
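Since field-specific queries are not honoured by the API, one possible workaround is to run a full-text search and then narrow the results on the client side. The sketch below shows that filtering step only. `Product` is a hypothetical stand-in for BulkDataProduct, its attribute names are assumptions, and the sample records are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Product:
    # Hypothetical stand-in for BulkDataProduct; real attribute names may differ.
    product_identifier: str
    title: str


def filter_by_identifier(products: Iterable[Product], wanted: str) -> List[Product]:
    """Keep only products whose identifier matches exactly (case-insensitive)."""
    wanted_lower = wanted.lower()
    return [p for p in products if p.product_identifier.lower() == wanted_lower]


# Invented sample data standing in for the products of a full-text search.
hits = [
    Product("PROD-A", "Example product A"),
    Product("PROD-B", "Example product B"),
]
exact = filter_by_identifier(hits, "prod-a")
```

When the identifier is already known, calling get_product_by_id directly avoids the search step altogether.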