Appendix: Technical Implementation Details¶
This appendix contains technical implementation details for developers and power users who need to understand the internal workings of pyBDL. For user-facing documentation, see the main sections.
Rate Limiting Implementation¶
Architecture¶
The rate limiting system consists of three main components:
- RateLimiter: Thread-safe synchronous rate limiter
- AsyncRateLimiter: Asyncio-compatible asynchronous rate limiter
- PersistentQuotaCache: Thread-safe persistent storage for quota usage
Algorithm¶
The rate limiter uses a sliding window algorithm with multiple time periods:
- Each quota period maintains a deque of timestamps for recent API calls
- When acquire() is called:
  - Old timestamps (outside the current window) are removed
  - If current count >= limit, calculate wait time or raise exception
  - Record current timestamp for all periods
  - Save state to persistent cache
- The longest wait time across all periods is used (most restrictive limit)
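The steps above can be sketched as follows. This is a minimal illustration, not pyBDL's actual class: the name SlidingWindowSketch, its methods, and the injectable clock argument are hypothetical, and persistence and locking are left out.

```python
import time
from collections import deque


class SlidingWindowSketch:
    """Illustrative multi-period sliding window (hypothetical, not pyBDL's class)."""

    def __init__(self, quotas, clock=time.monotonic):
        # quotas maps a period length in seconds to the allowed call count.
        self.quotas = dict(quotas)
        self.calls = {period: deque() for period in self.quotas}
        self.clock = clock

    def wait_time(self):
        """Return the seconds to wait before another call fits (0.0 if none)."""
        now = self.clock()
        longest = 0.0
        for period, limit in self.quotas.items():
            window = self.calls[period]
            # Drop timestamps that have slid out of this period's window.
            while window and now - window[0] >= period:
                window.popleft()
            if len(window) >= limit:
                # This call must expire before a slot frees up.
                blocking = window[len(window) - limit]
                longest = max(longest, period - (now - blocking))
        # The most restrictive period decides the wait.
        return longest

    def record(self):
        """Record one call in every period's window."""
        now = self.clock()
        for window in self.calls.values():
            window.append(now)
```

A caller would check wait_time(), sleep or raise as configured, and then call record() before sending the request.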
Time Handling¶
The rate limiter uses time.monotonic() instead of time.time() to ensure:
- System clock adjustments (for example NTP corrections or manual changes) don't affect quota calculations
- Accurate elapsed time measurements
- Consistent behavior across different system clock configurations
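As a small illustration of why a monotonic clock suits window arithmetic, the sketch below computes how long a recorded call still occupies its window. The helper name seconds_until_free is hypothetical and only demonstrates the idea.

```python
import time


def seconds_until_free(oldest_call, period, now=None):
    """Seconds until a call stamped with time.monotonic() leaves the window."""
    now = time.monotonic() if now is None else now
    # Monotonic readings never go backwards, so the elapsed time is never negative.
    return max(0.0, period - (now - oldest_call))


stamp = time.monotonic()
remaining = seconds_until_free(stamp, period=1.0)
```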
Thread Safety¶
- RateLimiter: Uses threading.Lock() for thread-safe operations
- AsyncRateLimiter: Uses asyncio.Lock() for async-safe operations
- PersistentQuotaCache: Uses threading.Lock() for thread-safe cache access
Both limiters can be safely used in concurrent environments.
Cache Implementation¶
The persistent cache uses atomic file writes:
- Write quota data to a temporary file (quota_cache.json.tmp)
- Atomically rename temp file to final location (quota_cache.json)

This ensures cache integrity even if the process crashes during write.
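The write-then-rename pattern can be sketched with the standard library. The function name save_atomically and the sample payload are hypothetical, and the fsync call is an extra precaution assumed here rather than something the text confirms.

```python
import json
import os
import tempfile
from pathlib import Path


def save_atomically(path, data):
    """Write JSON to '<name>.tmp', then swap it into place with os.replace."""
    path = Path(path)
    tmp_path = path.with_name(path.name + ".tmp")
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(data, fh)
        fh.flush()
        os.fsync(fh.fileno())  # push bytes to disk before the rename
    # os.replace is atomic when both paths are on the same filesystem.
    os.replace(tmp_path, path)


# Usage: readers see either the old file or the new one, never a partial write.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "quota_cache.json"
    save_atomically(target, {"anon_1": [0.0]})
    restored = json.loads(target.read_text(encoding="utf-8"))
    leftover_tmp = target.with_name("quota_cache.json.tmp").exists()
```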
Cache keys are unified for sync and async limiters:
- Anonymous users: anon_<period>
- Registered users: reg_<period>
This allows sync and async limiters to share quota state.
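A key of this shape could be built as below. The helper name quota_cache_key is hypothetical; only the anon_/reg_ prefixes come from the description above.

```python
def quota_cache_key(is_registered, period):
    """Build the shared cache key for one quota period (in seconds)."""
    prefix = "reg" if is_registered else "anon"
    return f"{prefix}_{period}"
```

Because the key depends only on registration status and period, a sync and an async limiter with the same settings would read and write the same entry.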
Exception Hierarchy¶
- GUSBDLError: Base exception for all GUS BDL API errors
- RateLimitError: Raised when rate limit is exceeded
- RateLimitDelayExceeded: Raised when required delay exceeds max_delay
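One plausible shape for this hierarchy is sketched below. The flat list above does not state each class's parent, so the nesting shown here (RateLimitDelayExceeded under RateLimitError under GUSBDLError) is an assumption.

```python
class GUSBDLError(Exception):
    """Base exception for all GUS BDL API errors."""


class RateLimitError(GUSBDLError):
    """Raised when a rate limit is exceeded (assumed to extend the base class)."""


class RateLimitDelayExceeded(RateLimitError):
    """Raised when the required delay exceeds max_delay (parent is assumed)."""


# With this nesting, one handler for the base class covers every case.
try:
    raise RateLimitDelayExceeded("would need to wait 120s")
except GUSBDLError as exc:
    caught = type(exc).__name__
```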
Rate Limiter Configuration Options¶
RateLimiter and AsyncRateLimiter support the following parameters:
- quotas: Dictionary mapping period (seconds) to limit or (anon_limit, reg_limit) tuple
- is_registered: Whether the user is registered (affects quota selection)
- cache: Optional PersistentQuotaCache instance for persistent storage
- max_delay: Maximum seconds to wait (None = wait forever, 0 = raise immediately)
- raise_on_limit: If True, raise exception immediately; if False, wait
- buffer_seconds: Small buffer time added to wait calculations (default: 0.05s)
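The sketch below shows how these options might interact. The helper names resolve_limit and sleep_or_raise are hypothetical, the exception classes are local stand-ins, and comparing the buffered delay against max_delay is an assumption.

```python
class RateLimitError(Exception):
    """Local stand-in for the library exception of the same name."""


class RateLimitDelayExceeded(RateLimitError):
    """Local stand-in for the library exception of the same name."""


def resolve_limit(quota, is_registered):
    """Pick the limit from an int or an (anon_limit, reg_limit) tuple."""
    if isinstance(quota, tuple):
        anon_limit, reg_limit = quota
        return reg_limit if is_registered else anon_limit
    return quota


def sleep_or_raise(wait, max_delay=None, raise_on_limit=False, buffer_seconds=0.05):
    """Return the seconds to sleep, or raise when waiting is not allowed."""
    if wait <= 0:
        return 0.0
    if raise_on_limit:
        raise RateLimitError("rate limit exceeded")
    delay = wait + buffer_seconds
    # max_delay=None waits forever; max_delay=0 rejects any positive delay.
    if max_delay is not None and delay > max_delay:
        raise RateLimitDelayExceeded(f"need {delay:.2f}s, max_delay={max_delay}")
    return delay
```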
Configuration Implementation Details¶
Cache File Management¶
The request cache system stores responses in JSON files:
Cache location¶
- Project-local (default): .cache/pybdl/ directory in the project root
- Global: Platform-specific cache directory:
  - Linux: ~/.cache/pybdl/
  - macOS: ~/Library/Caches/pybdl/
  - Windows: %LOCALAPPDATA%\pybdl\cache\
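A platform lookup matching the locations listed above could look like this. The function name global_cache_dir and its parameters are hypothetical, and the fallback used when LOCALAPPDATA is unset is an assumption.

```python
import os
import sys
from pathlib import Path


def global_cache_dir(platform=sys.platform, env=os.environ, home=None):
    """Return the global cache directory for the given platform string."""
    home = Path(home) if home is not None else Path.home()
    if platform.startswith("win"):
        base = env.get("LOCALAPPDATA")
        # Assumed fallback when the variable is missing.
        root = Path(base) if base else home / "AppData" / "Local"
        return root / "pybdl" / "cache"
    if platform == "darwin":
        return home / "Library" / "Caches" / "pybdl"
    return home / ".cache" / "pybdl"
```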
Cache file structure¶
Cache files are named based on request parameters:
- Format: {method}_{endpoint_hash}.json
- Hash includes: URL, query parameters, headers (API key excluded)
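A file name of this form could be derived as follows. This is a sketch under assumptions: the function name cache_file_name, the choice of SHA-256 truncated to 16 hex characters, and X-ClientId as the API key header are all illustrative and not confirmed by the text above.

```python
import hashlib
import json


def cache_file_name(method, url, params=None, headers=None,
                    excluded_headers=("x-clientid",)):
    """Build '{method}_{endpoint_hash}.json' without hashing the API key."""
    safe_headers = {
        key.lower(): value
        for key, value in (headers or {}).items()
        if key.lower() not in excluded_headers  # assumed API key header name
    }
    payload = json.dumps(
        {"url": url, "params": params or {}, "headers": safe_headers},
        sort_keys=True,  # stable ordering gives a stable hash
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return f"{method.lower()}_{digest}.json"
```

Leaving the key out of the hash means the same request maps to the same file regardless of which credential sent it.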
Cache expiry¶
- Responses are cached with timestamps
- Expired entries are automatically ignored
- Cache files are not automatically cleaned (can be manually deleted)
Internal cache helpers¶
- get_default_cache_path(): Returns platform-appropriate cache directory
- get_cache_file_path(filename, use_global_cache=False, custom_path=None): Returns a file path inside the resolved cache directory
- resolve_cache_file_path(filename, use_global_cache=False, custom_file=None): Resolves an explicit file path or falls back to the default cache directory
Caching Internals¶
pyBDL uses hishel on top of httpx for both synchronous and
asynchronous HTTP caching.
HTTP client selection¶
- Sync without cache: httpx.Client
- Sync with cache: hishel.SyncCacheClient
- Async without cache: httpx.AsyncClient
- Async with cache: hishel.AsyncCacheClient
Cache backends¶
- cache_backend="file":
  - Stores cached responses in http_cache.db
  - The file lives in the same directory as the quota cache file
  - Sync and async clients point to the same cache database
- cache_backend="memory":
  - Uses SQLite :memory:
  - Cache is process-local and not persisted
  - Sync and async clients each get their own in-memory cache
- cache_backend=None:
  - Bypasses hishel entirely and uses plain httpx clients
Cache file placement¶
When the file backend is enabled, pyBDL resolves the quota cache path first and then places the HTTP cache beside it:
- If quota_cache_file is explicitly set, that file's parent directory is reused.
- Otherwise pyBDL uses the default project-local or global cache directory, depending on configuration.
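The placement rule can be sketched with pathlib. The function name http_cache_path and the default_dir parameter are hypothetical; only the http_cache.db file name and the "beside the quota cache" rule come from the text.

```python
from pathlib import Path


def http_cache_path(quota_cache_file=None, default_dir=".cache/pybdl"):
    """Place http_cache.db in the same directory as the quota cache file."""
    if quota_cache_file:
        directory = Path(quota_cache_file).parent
    else:
        # Stand-in for the resolved project-local or global directory.
        directory = Path(default_dir)
    return directory / "http_cache.db"
```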
Cache expiration model¶
- cache_expire_after is applied as the default TTL for stored responses
- Expired entries are treated as stale and will not be reused as fresh cache hits
- A later request for the same URL may refresh the stored entry
Quota interaction with cache¶
Rate limiting and caching are intentionally coordinated:
- pyBDL reserves a quota slot before making a request
- the HTTP client returns either a network response or a cached response
- if the response was served from cache, the reservation is released
This design keeps quota accounting safe in mixed sync/async scenarios while ensuring cached responses do not consume quota in normal use.
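The reserve-then-release flow can be illustrated with a toy ledger and a dictionary cache. All names here (QuotaLedger, fetch, send) are hypothetical stand-ins for the real limiter and HTTP client, and error handling is omitted.

```python
class QuotaLedger:
    """Toy quota counter standing in for the real rate limiter."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def reserve(self):
        if self.used >= self.limit:
            raise RuntimeError("quota exhausted")
        self.used += 1

    def release(self):
        self.used = max(0, self.used - 1)


def fetch(url, ledger, cache, send):
    """Reserve a slot, fetch, and refund the slot when the cache answered."""
    ledger.reserve()  # reserve before it is known whether the cache will hit
    if url in cache:
        ledger.release()  # cache hit: no network call, so refund the slot
        return cache[url]
    response = send(url)  # cache miss: the reservation stays consumed
    cache[url] = response
    return response
```

Reserving first means two concurrent callers cannot both assume a free slot and then both hit the network.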
Practical quota effects¶
- A cache miss counts against quota
- A cache hit does not count against quota, because the reserved slot is released
- File-backed cache can reduce repeated quota usage across separate runs
- Memory-backed cache only helps within the current process lifetime
Proxy Configuration Internals¶
The proxy configuration is handled at the HTTP client level:
Proxy stack¶
- Uses httpx.Client / hishel.SyncCacheClient for synchronous requests
- Uses httpx.AsyncClient / hishel.AsyncCacheClient for asynchronous requests
- Proxy authentication uses HTTP Basic Auth
Proxy configuration precedence¶
- Direct parameter in BDLConfig
- Environment variables (BDL_PROXY_URL, etc.)
- Default values (None)
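This precedence could be resolved as sketched below. The function name resolve_proxy_url and its arguments are hypothetical; BDL_PROXY_URL is the only variable named above, so the sketch covers just that one.

```python
import os


def resolve_proxy_url(explicit=None, env=None):
    """Apply the order: direct parameter, then environment, then None."""
    env = os.environ if env is None else env
    if explicit is not None:
        return explicit
    # dict.get returns None when the variable is unset, which is the default.
    return env.get("BDL_PROXY_URL")
```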
Supported proxy URL forms¶
- HTTP proxy: http://proxy.example.com:8080
- HTTPS proxy: https://proxy.example.com:8080
- SOCKS proxy: Not directly supported (requires additional configuration)
Proxy authentication¶
- Username and password are sent via HTTP Basic Auth headers
- Credentials are not logged or exposed in error messages
- For security, prefer environment variables over hardcoded credentials
Access Layer Implementation¶
DataFrame Conversion¶
The access layer converts API responses to pandas DataFrames through several steps:
- Column Name Normalization: camelCase → snake_case using regex patterns
- Data Type Inference:
  - Attempts numeric conversion (int/float)
  - Detects boolean values
  - Preserves strings/objects
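Both steps can be sketched in plain Python. The helper names to_snake_case and infer_value are hypothetical, the regex is one possible pattern, and inferring types value by value is a simplification of whatever column-wise logic the library applies.

```python
import re

# Zero-width boundary between a lowercase letter or digit and an uppercase letter.
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def to_snake_case(name):
    """Convert camelCase to snake_case, e.g. 'unitName' to 'unit_name'."""
    return _BOUNDARY.sub("_", name).lower()


def infer_value(value):
    """Try bool, then int, then float; otherwise keep the original value."""
    if not isinstance(value, str):
        return value
    text = value.strip()
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return value
```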
Nested Data Normalization¶
For data endpoints with nested values arrays:
- Extract parent-level fields (e.g., id, name)
- Flatten nested array: each nested item becomes a row
- Combine parent fields with nested fields
- Rename fields for clarity (e.g., id → unit_id)
Example transformation:
# API response:
[{"id": "1", "name": "Warsaw", "values": [{"year": 2021, "val": 1000}]}]
# Access layer output:
# DataFrame with columns: unit_id, unit_name, year, val
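The transformation above can be reproduced with a short flattening routine. The function name flatten_values and its rename mapping are hypothetical; the sketch builds a list of row dictionaries, which could then be passed to pandas.DataFrame.

```python
def flatten_values(records, rename=None):
    """Turn each nested 'values' item into one row with the parent fields."""
    if rename is None:
        rename = {"id": "unit_id", "name": "unit_name"}
    rows = []
    for record in records:
        # Parent-level fields, renamed for clarity.
        parent = {rename.get(k, k): v for k, v in record.items() if k != "values"}
        for item in record.get("values", []):
            rows.append({**parent, **item})
    return rows


response = [{"id": "1", "name": "Warsaw", "values": [{"year": 2021, "val": 1000}]}]
rows = flatten_values(response)
```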
API Client Architecture¶
Request Handling¶
All API clients inherit from a base client class that handles:
- Rate Limiting: Automatic quota enforcement before requests
- Caching: Optional response caching (if enabled)
- Error Handling: Converts HTTP errors to Python exceptions
- Pagination: Automatic page fetching and aggregation
- Internationalization: Language parameter handling
HTTP Client Selection¶
- Synchronous: Uses httpx.Client (or hishel.SyncCacheClient when caching is enabled)
- Asynchronous: Uses httpx.AsyncClient (or hishel.AsyncCacheClient when caching is enabled)
- Both clients share the same configuration and rate limiting state
Response Processing¶
- Parse JSON response
- Extract data array or object
- Handle pagination metadata
- Return structured data (dict/list)
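A page-aggregation loop in this spirit might look like the sketch below. The function name fetch_all and the fetch_page callable are hypothetical, and the payload keys "results" and "totalRecords" are assumptions about the response shape, not something this appendix specifies.

```python
def fetch_all(fetch_page, page_size=100):
    """Collect results across pages until the reported total is reached."""
    results = []
    page = 0
    while True:
        payload = fetch_page(page, page_size)  # already-parsed JSON (a dict)
        batch = payload.get("results", [])     # assumed key for the data array
        results.extend(batch)
        total = payload.get("totalRecords", len(results))  # assumed metadata key
        # Stop on an empty page or once everything has been collected.
        if not batch or len(results) >= total:
            return results
        page += 1
```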
Error Handling¶
- HTTP 4xx/5xx errors → GUSBDLError or subclasses
- Rate limit errors → RateLimitError
- Network errors → Standard Python exceptions
- JSON parsing errors → ValueError
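The mapping above can be sketched as a single function. The exception classes are local stand-ins, the name parse_response is hypothetical, and treating HTTP 429 as the rate limit signal is an assumption.

```python
import json


class GUSBDLError(Exception):
    """Local stand-in for the library's base exception."""


class RateLimitError(GUSBDLError):
    """Local stand-in for the library's rate limit exception."""


def parse_response(status_code, body):
    """Map an HTTP status and body text to parsed data or an exception."""
    if status_code == 429:  # assumed status for a server-side rate limit
        raise RateLimitError("rate limit exceeded")
    if status_code >= 400:
        raise GUSBDLError(f"HTTP {status_code}")
    # json.JSONDecodeError subclasses ValueError, so bad JSON raises ValueError.
    return json.loads(body)
```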