Bulk Data Client
clients.bulk_data - Client for USPTO bulk data API.
This module provides a client for interacting with the USPTO Open Data Portal (ODP) Bulk Data API. It allows you to search for and download bulk data products.
- class pyUSPTO.clients.bulk_data.BulkDataClient(config=None, base_url=None)
Bases: BaseUSPTOClient[BulkDataResponse]
Client for interacting with the USPTO bulk data API.
- ENDPOINTS = {'download_file': 'api/v1/datasets/products/files/{productIdentifier}/{fileName}', 'product_by_id': 'api/v1/datasets/products/{product_id}', 'products_search': 'api/v1/datasets/products/search'}
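The endpoint templates above use named placeholders, and the names differ between entries (productIdentifier and fileName versus product_id). The following is a minimal sketch of how such templates could be expanded into full URLs. The build_url helper and the BASE_URL value are assumptions for illustration only; how the client itself assembles URLs is not shown in this reference.

```python
# ENDPOINTS is copied from the class attribute documented above.
ENDPOINTS = {
    "download_file": "api/v1/datasets/products/files/{productIdentifier}/{fileName}",
    "product_by_id": "api/v1/datasets/products/{product_id}",
    "products_search": "api/v1/datasets/products/search",
}

# Assumption: a placeholder host, not the real API base URL.
BASE_URL = "https://api.example.invalid"


def build_url(endpoint: str, **path_params: str) -> str:
    """Fill a template's placeholders and join it to the base URL.

    Hypothetical helper; not part of pyUSPTO.
    """
    path = ENDPOINTS[endpoint].format(**path_params)
    return f"{BASE_URL.rstrip('/')}/{path}"


product_url = build_url("product_by_id", product_id="patent-grant-data-text")
file_url = build_url(
    "download_file",
    productIdentifier="patent-grant-data-text",
    fileName="example.zip",  # made-up file name
)
search_url = build_url("products_search")
```

Passing a keyword that does not match a template's placeholder name (for example product_id for download_file) raises KeyError from str.format, which makes the naming difference between the templates easy to spot.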
- __init__(config=None, base_url=None)
Initialize the BulkDataClient.
- Parameters:
config (USPTOConfig|None) – USPTOConfig instance containing API key and settings. If not provided, creates config from environment variables (requires USPTO_API_KEY).
base_url (str|None) – Optional base URL override for the USPTO Bulk Data API. If not provided, uses config.bulk_data_base_url or default.
- download_file(file_data, destination=None, file_name=None, overwrite=False, extract=False)
Download a file from the bulk data API.
Does not extract archives (tar.gz, zip) by default. The download uses base class helpers for consistent behavior across all clients.
- Parameters:
file_data (FileData) – FileData object containing download info and product_identifier.
destination (str|None) – Directory to save/extract to. Defaults to current directory.
file_name (str|None) – Override filename. Defaults to file_data.file_name.
overwrite (bool) – Whether to overwrite existing files. Defaults to False.
extract (bool) – Whether to auto-extract archives. Defaults to False.
- Returns:
Path to downloaded file or extracted directory.
- Return type:
- Raises:
FileExistsError – If file exists and overwrite=False.
Examples
Download and extract a file:
>>> product = client.get_product_by_id("product-123", include_files=True)
>>> file_data = product.product_file_bag.file_data_bag[0]
>>> path = client.download_file(file_data, destination="./downloads", extract=True)
Download without extraction:
>>> path = client.download_file(file_data, extract=False)
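The documented interplay of destination, file_name, and overwrite can be illustrated with a small standalone sketch. The resolve_target helper below is hypothetical and only mirrors the behavior described in the parameter list (default to the current directory, raise FileExistsError when the file exists and overwrite is False); it is not the client's actual implementation.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_target(
    file_name: str,
    destination: Optional[str] = None,
    overwrite: bool = False,
) -> Path:
    """Return the path a download would be written to (hypothetical helper).

    Falls back to the current directory when no destination is given and
    refuses to reuse an existing path unless overwrite is True.
    """
    directory = Path(destination) if destination is not None else Path.cwd()
    target = directory / file_name
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target} already exists; pass overwrite=True to replace it")
    return target


# Demonstrate both outcomes inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "data.zip"  # made-up file name
    existing.write_bytes(b"")

    try:
        resolve_target("data.zip", destination=tmp)
        blocked = False
    except FileExistsError:
        blocked = True

    replaced = resolve_target("data.zip", destination=tmp, overwrite=True)
    replaced_name = replaced.name
```

Checking for the existing file before any bytes are transferred avoids starting a large bulk download only to discard it.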
- get_product_by_id(product_id, file_data_from_date=None, file_data_to_date=None, offset=None, limit=None, include_files=None, latest=None)
Get a specific bulk data product by ID.
- Parameters:
product_id (str) – The product identifier.
file_data_from_date (str|None) – Filter files by data from date (YYYY-MM-DD).
file_data_to_date (str|None) – Filter files by data to date (YYYY-MM-DD).
offset (int|None) – Number of product file records to skip.
limit (int|None) – Number of product file records to collect.
include_files (bool|None) – Whether to include product files in the response.
latest (bool|None) – Whether to return only the latest product file.
- Returns:
The requested product.
- Return type:
- Raises:
ValueError – If product not found in response.
Examples
Get product without files:
>>> product = client.get_product_by_id("patent-grant-data-text")
Get product with files:
>>> product = client.get_product_by_id(
...     "patent-grant-data-text",
...     include_files=True,
...     latest=True
... )
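Because file_data_from_date and file_data_to_date are strings in YYYY-MM-DD form, dates computed in code need to be rendered in that format before the call. The sketch below does this with the standard library; the file_date_window helper is hypothetical and not part of pyUSPTO.

```python
from datetime import date, timedelta
from typing import Dict


def file_date_window(end: date, days: int) -> Dict[str, str]:
    """Build YYYY-MM-DD filter strings for a window ending on `end`.

    Hypothetical helper; the keys match the parameter names documented above.
    """
    start = end - timedelta(days=days)
    return {
        "file_data_from_date": start.isoformat(),
        "file_data_to_date": end.isoformat(),
    }


window = file_date_window(date(2024, 3, 1), days=30)
# The result could then be unpacked into the call, for example:
# client.get_product_by_id("patent-grant-data-text", include_files=True, **window)
```

date.isoformat() always zero-pads the month and day, so the output matches the documented format without manual string formatting.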
- paginate_products(**kwargs)
Paginate through all products matching the search criteria.
- Parameters:
**kwargs (Any) – Keyword arguments passed to search_products
- Yields:
BulkDataProduct objects
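Offset/limit pagination of this kind is commonly written as a generator that requests one page at a time and stops on a short or empty page. The sketch below shows that general pattern against an in-memory stand-in; paginate, fake_search, and CATALOG are all hypothetical, and the actual paginate_products implementation may differ.

```python
from typing import Any, Callable, Iterator, List


def paginate(
    search: Callable[..., List[Any]],
    page_size: int = 2,
    **kwargs: Any,
) -> Iterator[Any]:
    """Yield items page by page until the search function runs out of results."""
    offset = 0
    while True:
        page = search(offset=offset, limit=page_size, **kwargs)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            # A short page means there is nothing further to fetch.
            return
        offset += len(page)


# In-memory stand-in for a remote search endpoint (made-up data).
CATALOG = ["product-a", "product-b", "product-c", "product-d", "product-e"]


def fake_search(query: Any = None, offset: int = 0, limit: int = 2) -> List[str]:
    matches = [p for p in CATALOG if query is None or query in p]
    return matches[offset:offset + limit]


results = list(paginate(fake_search, page_size=2, query="product"))
```

Since the generator is lazy, a caller that stops iterating early never triggers the remaining page requests.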
- search_products(query=None, offset=None, limit=None, facets=None, fields=None)
Search for Bulk Data Products.
Note: The USPTO Bulk Data API only supports full-text search in the query parameter. Field-specific queries (e.g., field:value) do not work despite being documented in the API swagger specification.
- Parameters:
- Returns:
Response containing matching products.
- Return type:
Examples
Search with full-text query:
>>> response = client.search_products(query="Patent", limit=50)