Appendix: Technical Implementation Details

This appendix contains technical implementation details for developers and power users who need to understand the internal workings of pyBDL. For user-facing documentation, see the main sections.

Rate Limiting Implementation

Architecture

The rate limiting system consists of three main components:

  1. RateLimiter: Thread-safe synchronous rate limiter
  2. AsyncRateLimiter: Asyncio-compatible asynchronous rate limiter
  3. PersistentQuotaCache: Thread-safe persistent storage for quota usage

Algorithm

The rate limiter uses a sliding window algorithm with multiple time periods:

  1. Each quota period maintains a deque of timestamps for recent API calls
  2. When acquire() is called:
    • Old timestamps (outside the current window) are removed
    • If current count >= limit, calculate wait time or raise exception
    • Record current timestamp for all periods
    • Save state to persistent cache
  3. The longest wait time across all periods is used (most restrictive limit)
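The steps above can be sketched as follows. This is a hypothetical, simplified re-implementation for illustration (it omits the persistent-cache save step and thread locking); class and method names are not pyBDL's actual API.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Illustrative sliding-window limiter over multiple periods."""

    def __init__(self, quotas):
        # quotas: {period_seconds: max_calls_within_that_period}
        self.quotas = quotas
        self.calls = {period: deque() for period in quotas}

    def acquire(self):
        now = time.monotonic()
        wait = 0.0
        for period, limit in self.quotas.items():
            window = self.calls[period]
            # Drop timestamps that have fallen out of the window
            while window and now - window[0] >= period:
                window.popleft()
            if len(window) >= limit:
                # Time until the oldest call leaves this window
                wait = max(wait, period - (now - window[0]))
        if wait > 0:
            time.sleep(wait)  # the most restrictive limit wins
            now = time.monotonic()
        # Record this call against every period
        for window in self.calls.values():
            window.append(now)
```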

Time Handling

The rate limiter uses time.monotonic() instead of time.time() to ensure:

  • Clock adjustments (NTP, daylight saving) don't affect quota calculations
  • Accurate elapsed time measurements
  • Consistent behavior across different system clock configurations

Thread Safety

  • RateLimiter: Uses threading.Lock() for thread-safe operations
  • AsyncRateLimiter: Uses asyncio.Lock() for async-safe operations
  • PersistentQuotaCache: Uses threading.Lock() for thread-safe cache access

Both limiters can be safely used in concurrent environments.

Cache Implementation

The persistent cache uses atomic file writes:

  1. Write quota data to a temporary file (quota_cache.json.tmp)
  2. Atomically rename temp file to final location (quota_cache.json)
  3. This ensures cache integrity even if the process crashes during write
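A minimal sketch of this write-then-rename pattern (the function name is illustrative, not pyBDL's internal helper):

```python
import json
import os


def save_quota_cache(path, data):
    """Write quota data atomically: temp file first, then rename."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())  # ensure bytes hit disk before the rename
    # os.replace is atomic on both POSIX and Windows, so readers see
    # either the old file or the complete new one, never a partial write.
    os.replace(tmp_path, path)
```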

Cache keys are unified for sync and async limiters:

  • Anonymous users: anon_<period>
  • Registered users: reg_<period>

This allows sync and async limiters to share quota state.

Exception Hierarchy

GUSBDLError (base exception)
└── RateLimitError
    └── RateLimitDelayExceeded
  • GUSBDLError: Base exception for all GUS BDL API errors
  • RateLimitError: Raised when rate limit is exceeded
  • RateLimitDelayExceeded: Raised when required delay exceeds max_delay
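The hierarchy can be expressed as plain exception classes; this sketch shows only the inheritance chain, while pyBDL's real classes may carry extra attributes (e.g. the required wait time):

```python
class GUSBDLError(Exception):
    """Base exception for all GUS BDL API errors."""


class RateLimitError(GUSBDLError):
    """Raised when a rate limit is exceeded."""


class RateLimitDelayExceeded(RateLimitError):
    """Raised when the required delay exceeds max_delay."""
```

Catching GUSBDLError therefore catches every library-specific error, while catching RateLimitError also covers RateLimitDelayExceeded.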

Rate Limiter Configuration Options

RateLimiter and AsyncRateLimiter support the following parameters:

  • quotas: Dictionary mapping period (seconds) to limit or (anon_limit, reg_limit) tuple
  • is_registered: Whether the user is registered (affects quota selection)
  • cache: Optional PersistentQuotaCache instance for persistent storage
  • max_delay: Maximum seconds to wait (None = wait forever, 0 = raise immediately)
  • raise_on_limit: If True, raise exception immediately; if False, wait
  • buffer_seconds: Small buffer time added to wait calculations (default: 0.05s)
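The quotas parameter's two accepted forms (plain limit, or an (anon_limit, reg_limit) tuple) can be resolved to effective per-period limits like this. The selection logic below is an illustrative sketch, not pyBDL's internal code:

```python
# Period in seconds -> limit, using the tuple form for split quotas
quotas = {
    1: (5, 10),       # per second: 5 anonymous, 10 registered
    900: (100, 500),  # per 15 minutes
}
is_registered = False

# Pick the anonymous or registered limit for each period
effective = {
    period: (limit[1] if is_registered else limit[0])
    if isinstance(limit, tuple) else limit
    for period, limit in quotas.items()
}
```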

Configuration Implementation Details

Cache File Management

The request cache system stores responses in JSON files:

Cache location

  • Project-local (default): .cache/pybdl/ directory in the project root
  • Global: Platform-specific cache directory:
    • Linux: ~/.cache/pybdl/
    • macOS: ~/Library/Caches/pybdl/
    • Windows: %LOCALAPPDATA%\pybdl\cache\
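The platform-specific lookup can be reproduced with a small helper. This is a hypothetical re-implementation of the behavior described above; pyBDL's actual get_default_cache_path() may differ in details such as fallbacks:

```python
import os
import sys
from pathlib import Path


def default_global_cache_dir():
    """Return the platform-appropriate global cache directory."""
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Caches" / "pybdl"
    if sys.platform == "win32":
        return Path(os.environ["LOCALAPPDATA"]) / "pybdl" / "cache"
    # Linux and other POSIX systems
    return Path.home() / ".cache" / "pybdl"
```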

Cache file structure

Cache files are named based on request parameters:

  • Format: {method}_{endpoint_hash}.json
  • Hash includes: URL, query parameters, headers (API key excluded)
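A sketch of that naming scheme follows. The hash function, header name for the API key (X-ClientId is assumed here), and truncation length are illustrative; pyBDL's real implementation may differ:

```python
import hashlib
import json


def cache_file_name(method, url, params, headers):
    """Build a cache file name from the request, excluding the API key."""
    # Drop the API-key header so identical requests share one cache entry
    safe_headers = {k: v for k, v in headers.items()
                    if k.lower() != "x-clientid"}
    key = json.dumps([url, params, safe_headers], sort_keys=True)
    endpoint_hash = hashlib.sha256(key.encode()).hexdigest()[:16]
    return f"{method}_{endpoint_hash}.json"
```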

Cache expiry

  • Responses are cached with timestamps
  • Expired entries are automatically ignored
  • Cache files are not automatically cleaned (can be manually deleted)

Internal cache helpers

  • get_default_cache_path(): Returns platform-appropriate cache directory
  • get_cache_file_path(filename, use_global_cache=False, custom_path=None): Returns a file path inside the resolved cache directory
  • resolve_cache_file_path(filename, use_global_cache=False, custom_file=None): Resolves an explicit file path or falls back to the default cache directory

Caching Internals

pyBDL uses hishel on top of httpx for both synchronous and asynchronous HTTP caching.

HTTP client selection

  • Sync without cache: httpx.Client
  • Sync with cache: hishel.SyncCacheClient
  • Async without cache: httpx.AsyncClient
  • Async with cache: hishel.AsyncCacheClient

Cache backends

  • cache_backend="file":
    • Stores cached responses in http_cache.db
    • The file lives in the same directory as the quota cache file
    • Sync and async clients point to the same cache database
  • cache_backend="memory":
    • Uses SQLite :memory:
    • Cache is process-local and not persisted
    • Sync and async clients each get their own in-memory cache
  • cache_backend=None:
    • Bypasses Hishel entirely and uses plain httpx clients

Cache file placement

When the file backend is enabled, pyBDL resolves the quota cache path first and then places the HTTP cache beside it:

<cache_dir>/quota_cache.json
<cache_dir>/http_cache.db

If quota_cache_file is explicitly set, that file's parent directory is reused. Otherwise pyBDL uses the default project-local or global cache directory, depending on configuration.
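This resolution order can be sketched as a small helper (names are illustrative, not pyBDL's internals):

```python
from pathlib import Path


def http_cache_path(quota_cache_file=None, default_dir=".cache/pybdl"):
    """Place http_cache.db beside the quota cache file."""
    if quota_cache_file is not None:
        # Reuse the explicitly configured file's parent directory
        cache_dir = Path(quota_cache_file).parent
    else:
        # Fall back to the project-local (or global) cache directory
        cache_dir = Path(default_dir)
    return cache_dir / "http_cache.db"
```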

Cache expiration model

  • cache_expire_after is applied as the default TTL for stored responses
  • Expired entries are treated as stale and will not be reused as fresh cache hits
  • A later request for the same URL may refresh the stored entry

Quota interaction with cache

Rate limiting and caching are intentionally coordinated:

  1. pyBDL reserves a quota slot before making a request
  2. The HTTP client returns either a network response or a cached response
  3. If the response was served from cache, the reservation is released

This design keeps quota accounting safe in mixed sync/async scenarios while ensuring cached responses do not consume quota in normal use.
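The reserve-then-refund flow above can be sketched as follows. Both classes are illustrative stand-ins (pyBDL's limiter and response objects differ); the point is only the ordering of reserve, fetch, and refund:

```python
class CountingLimiter:
    """Toy limiter that just counts reserved slots."""

    def __init__(self):
        self.used = 0

    def acquire(self):
        self.used += 1
        return object()  # opaque reservation token

    def release(self, token):
        self.used -= 1


class QuotaAccountant:
    def __init__(self, limiter):
        self.limiter = limiter

    def request(self, fetch):
        token = self.limiter.acquire()            # 1. reserve a slot
        response = fetch()                        # 2. network or cache
        if getattr(response, "from_cache", False):
            self.limiter.release(token)           # 3. refund on cache hit
        return response
```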

Practical quota effects

  • A cache miss counts against quota
  • A cache hit does not count against quota after refund
  • File-backed cache can reduce repeated quota usage across separate runs
  • Memory-backed cache only helps within the current process lifetime

Proxy Configuration Internals

The proxy configuration is handled at the HTTP client level:

Proxy stack

  • Uses httpx.Client / hishel.SyncCacheClient for synchronous requests
  • Uses httpx.AsyncClient / hishel.AsyncCacheClient for asynchronous requests
  • Proxy authentication uses HTTP Basic Auth

Proxy configuration precedence

  1. Direct parameter in BDLConfig
  2. Environment variables (BDL_PROXY_URL, etc.)
  3. Default values (None)
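The precedence chain can be sketched as a one-step fallback (the function name is hypothetical; BDL_PROXY_URL is the variable named above):

```python
import os


def resolve_proxy(direct=None):
    """Resolve the proxy URL: direct parameter > env var > None."""
    if direct is not None:
        return direct                        # 1. explicit config wins
    return os.environ.get("BDL_PROXY_URL")  # 2. env var, else 3. None
```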

Supported proxy URL forms

  • HTTP proxy: http://proxy.example.com:8080
  • HTTPS proxy: https://proxy.example.com:8080
  • SOCKS proxy: Not directly supported (requires additional configuration)

Proxy authentication

  • Username and password are sent via HTTP Basic Auth headers
  • Credentials are not logged or exposed in error messages
  • For security, prefer environment variables over hardcoded credentials

Access Layer Implementation

DataFrame Conversion

The access layer converts API responses to pandas DataFrames through several steps:

  1. Column Name Normalization: camelCase → snake_case using regex patterns
  2. Data Type Inference:
    • Attempts numeric conversion (int/float)
    • Detects boolean values
    • Preserves strings/objects
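The camelCase → snake_case step is commonly done with a regex like the one below. This is a generic sketch; pyBDL's exact patterns may handle additional edge cases such as consecutive capitals:

```python
import re


def to_snake_case(name):
    """Insert '_' before each capital that follows a lowercase/digit."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()
```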

Nested Data Normalization

For data endpoints with nested values arrays:

  1. Extract parent-level fields (e.g., id, name)
  2. Flatten nested array: each nested item becomes a row
  3. Combine parent fields with nested fields
  4. Rename fields for clarity (e.g., id → unit_id)

Example transformation:

# API response:
[{"id": "1", "name": "Warsaw", "values": [{"year": 2021, "val": 1000}]}]

# Access layer output:
# DataFrame with columns: unit_id, unit_name, year, val

API Client Architecture

Request Handling

All API clients inherit from a base client class that handles:

  1. Rate Limiting: Automatic quota enforcement before requests
  2. Caching: Optional response caching (if enabled)
  3. Error Handling: Converts HTTP errors to Python exceptions
  4. Pagination: Automatic page fetching and aggregation
  5. Internationalization: Language parameter handling

HTTP Client Selection

  • Synchronous: Uses httpx.Client (or hishel.SyncCacheClient when caching is enabled)
  • Asynchronous: Uses httpx.AsyncClient (or hishel.AsyncCacheClient when caching is enabled)
  • Both clients share the same configuration and rate limiting state

Response Processing

  1. Parse JSON response
  2. Extract data array or object
  3. Handle pagination metadata
  4. Return structured data (dict/list)

Error Handling

  • HTTP 4xx/5xx errors → GUSBDLError or subclasses
  • Rate limit errors → RateLimitError
  • Network errors → Standard Python exceptions
  • JSON parsing errors → ValueError
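One plausible shape for the HTTP-error mapping is shown below. The status-code checks (429 for rate limits, 4xx/5xx for the base error) are assumptions based on common API conventions; pyBDL's actual handler may also inspect response bodies:

```python
class GUSBDLError(Exception):
    """Base exception for all GUS BDL API errors."""


class RateLimitError(GUSBDLError):
    """Raised when the rate limit is exceeded."""


def raise_for_status(status_code):
    """Map an HTTP status code to a library exception."""
    if status_code == 429:  # Too Many Requests
        raise RateLimitError(f"rate limit exceeded (HTTP {status_code})")
    if 400 <= status_code < 600:
        raise GUSBDLError(f"HTTP error {status_code}")
    # 2xx/3xx: no exception
```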

See also

- [Rate limiting](rate_limiting.md) — user-facing rate limiting documentation
- [Configuration](config.md) — user-facing configuration documentation
- [API clients](api_clients.md) — API client usage
- [Access layer](access_layer.md) — access layer documentation