Bulk Data Client

clients.bulk_data - Client for USPTO bulk data API.

This module provides a client for interacting with the USPTO Open Data Portal (ODP) Bulk Data API. It allows you to search for and download bulk data products.

class pyUSPTO.clients.bulk_data.BulkDataClient(api_key=None, base_url=None, config=None)[source]

Bases: BaseUSPTOClient[BulkDataResponse]

Client for interacting with the USPTO bulk data API.

ENDPOINTS = {'download_file': 'api/v1/datasets/products/files/{productIdentifier}/{fileName}', 'product_by_id': 'api/v1/datasets/products/{product_id}', 'products_search': 'api/v1/datasets/products/search'}
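The endpoint templates above are ordinary Python format strings, so their placeholders can be filled with `str.format`. The sketch below copies the documented dictionary and fills two of the templates. `fill_endpoint` is a hypothetical helper written for this illustration, and how the client itself joins these paths with `base_url` is not shown in this reference, so that step is left out.

```python
# Endpoint templates copied from the documented BulkDataClient.ENDPOINTS value.
ENDPOINTS = {
    "download_file": "api/v1/datasets/products/files/{productIdentifier}/{fileName}",
    "product_by_id": "api/v1/datasets/products/{product_id}",
    "products_search": "api/v1/datasets/products/search",
}


def fill_endpoint(name: str, **params: str) -> str:
    """Fill the placeholders of one endpoint template (hypothetical helper)."""
    return ENDPOINTS[name].format(**params)


# Placeholder names differ between templates (camelCase vs. snake_case),
# so the keyword arguments must match each template exactly.
product_path = fill_endpoint("product_by_id", product_id="product-123")
file_path = fill_endpoint(
    "download_file", productIdentifier="product-123", fileName="data.zip"
)
```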

__init__(api_key=None, base_url=None, config=None)[source]

Initialize the BulkDataClient.

Parameters:

api_key (Optional[str]) – Optional API key for authentication
base_url (Optional[str]) – The base URL of the API, defaults to config.bulk_data_base_url or “https://api.uspto.gov/api/v1/datasets”
config (Optional[USPTOConfig]) – Optional USPTOConfig instance

download_file(file_data, destination=None, file_name=None, overwrite=False, extract=True)[source]

Download a file from the bulk data API.

Automatically extracts archives (tar.gz, zip) by default. The download uses base class helpers for consistent behavior across all clients.

Parameters:

file_data (FileData) – FileData object containing download info and product_identifier.
destination (Optional[str]) – Directory to save/extract to. Defaults to current directory.
file_name (Optional[str]) – Override filename. Defaults to file_data.file_name.
overwrite (bool) – Whether to overwrite existing files. Defaults to False.
extract (bool) – Whether to auto-extract archives. Defaults to True.

Returns:

Path to downloaded file or extracted directory.

Return type:

str

Raises:

FileExistsError – If file exists and overwrite=False.

Examples

Download and extract a file:

>>> product = client.get_product_by_id("product-123", include_files=True)
>>> file_data = product.product_file_bag.file_data_bag[0]
>>> path = client.download_file(file_data, destination="./downloads")

Download without extraction:

>>> path = client.download_file(file_data, extract=False)
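Because download_file raises FileExistsError when the target exists and overwrite=False, a batch download may want to skip files that are already present rather than abort. The following is a minimal sketch of that pattern, not part of the library: `download_all_files` is a hypothetical helper, and it assumes the product returned with include_files=True exposes `product_file_bag.file_data_bag` as in the example above.

```python
from typing import Any, List


def download_all_files(
    client: Any, product_id: str, destination: str = "./downloads"
) -> List[str]:
    """Download every file of a product, skipping files that already exist.

    Hypothetical helper: `client` is expected to behave like BulkDataClient.
    """
    product = client.get_product_by_id(product_id, include_files=True)
    paths: List[str] = []
    for file_data in product.product_file_bag.file_data_bag:
        try:
            # extract=False keeps the archive as downloaded.
            path = client.download_file(
                file_data, destination=destination, extract=False
            )
        except FileExistsError:
            # overwrite defaults to False, so an existing file is skipped.
            continue
        paths.append(path)
    return paths
```

The helper returns only the paths of files downloaded in this run; files that were skipped are not reported.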

get_product_by_id(product_id, file_data_from_date=None, file_data_to_date=None, offset=None, limit=None, include_files=None, latest=None)[source]

Get a specific bulk data product by ID.

Parameters:

product_id (str) – The product identifier.
file_data_from_date (Optional[str]) – Filter files by data from date (YYYY-MM-DD).
file_data_to_date (Optional[str]) – Filter files by data to date (YYYY-MM-DD).
offset (Optional[int]) – Number of product file records to skip.
limit (Optional[int]) – Number of product file records to collect.
include_files (Optional[bool]) – Whether to include product files in the response.
latest (Optional[bool]) – Whether to return only the latest product file.

Returns:

The requested product.

Return type:

BulkDataProduct

Raises:

ValueError – If product not found in response.

Examples

Get product without files:

>>> product = client.get_product_by_id("patent-grant-data-text")

Get product with files:

>>> product = client.get_product_by_id(
...     "patent-grant-data-text",
...     include_files=True,
...     latest=True
... )
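The file_data_from_date and file_data_to_date parameters take strings in YYYY-MM-DD form. One way to avoid formatting mistakes is to build them from `datetime.date` objects, as in this sketch. `file_date_filters` is a hypothetical helper, and the range check is an added convenience, not something the API is documented to require.

```python
from datetime import date
from typing import Dict


def file_date_filters(start: date, end: date) -> Dict[str, str]:
    """Build the date-filter keyword arguments for get_product_by_id.

    Hypothetical helper; date.isoformat() produces YYYY-MM-DD strings.
    """
    if start > end:
        raise ValueError("start must not be after end")
    return {
        "file_data_from_date": start.isoformat(),
        "file_data_to_date": end.isoformat(),
    }


filters = file_date_filters(date(2024, 1, 1), date(2024, 3, 31))
# The result can then be unpacked into the call, for example:
# client.get_product_by_id("patent-grant-data-text", include_files=True, **filters)
```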

paginate_products(post_body=None, **kwargs)[source]

Paginate through all products matching the search criteria.

Supports both GET and POST requests.

Parameters:

post_body (Optional[dict[str, Any]]) – Optional POST body for complex search queries
**kwargs (Any) – Keyword arguments for GET-based pagination

Yields:

BulkDataProduct objects

Return type:

Iterator[BulkDataProduct]
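Since paginate_products returns an iterator, results can be consumed lazily and the iteration stopped early. The sketch below caps the number of products with `itertools.islice`. `first_products` is a hypothetical helper, and which keyword arguments paginate_products accepts for GET-based pagination is an assumption here; the `query` keyword in the usage comment is borrowed from search_products.

```python
from itertools import islice
from typing import Any, List


def first_products(client: Any, max_items: int, **search_kwargs: Any) -> List[Any]:
    """Collect at most `max_items` products from paginate_products.

    Hypothetical helper: islice stops pulling from the iterator once the
    cap is reached, so no more items are requested from it than needed.
    """
    return list(islice(client.paginate_products(**search_kwargs), max_items))


# Usage (assumes `query` is accepted as a GET-based keyword argument):
# products = first_products(client, 25, query="Patent")
```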

search_products(query=None, offset=None, limit=None, facets=None, fields=None)[source]

Search for Bulk Data Products.

Note: The USPTO Bulk Data API supports only full-text search in the query parameter. Field-specific queries (e.g., field:value) do not work, even though they are documented in the API's Swagger specification.

Parameters:

query (Optional[str]) – Full-text search query string. Field-specific syntax like “productIdentifier:value” is not supported by the API.
offset (Optional[int]) – Number of product records to skip.
limit (Optional[int]) – Number of product records to collect.
facets (Optional[bool]) – Whether to enable facets in the response.
fields (Optional[list[str]]) – List of field names to include in the response.

Returns:

Response containing matching products.

Return type:

BulkDataResponse

Examples

Search with full-text query:

>>> response = client.search_products(query="Patent", limit=50)
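Because field-specific queries are not supported by the API, one workaround is to run a full-text search and narrow the results on the client side. The sketch below filters any iterable of products by one attribute. `filter_by_identifier` is a hypothetical helper, and the default attribute name `product_identifier` is an assumption about BulkDataProduct that should be checked against the model before use.

```python
from typing import Any, Iterable, Iterator


def filter_by_identifier(
    products: Iterable[Any], wanted: str, attr: str = "product_identifier"
) -> Iterator[Any]:
    """Yield only the products whose `attr` equals `wanted`.

    Hypothetical helper: `attr` defaults to an assumed attribute name;
    products lacking the attribute are silently skipped.
    """
    for product in products:
        if getattr(product, attr, None) == wanted:
            yield product


# Usage (assumes paginate_products accepts a `query` keyword):
# matches = list(filter_by_identifier(
#     client.paginate_products(query="Patent"), "patent-grant-data-text"))
```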