Access Layer
============

The access layer is the **primary user-facing interface** of pyBDL. It provides a pandas DataFrame-based API that handles data conversion and normalization automatically.

Overview
--------

The access layer sits on top of the raw API clients and provides:

- **Automatic DataFrame conversion**: all responses are converted to pandas DataFrames
- **Column name normalization**: camelCase API fields are converted to snake_case
- **Data type inference**: column types (integers, floats, booleans) are detected and applied automatically
- **Nested data normalization**: complex nested structures are flattened into a tabular format

The main client provides two interfaces:

- **Access layer** (default): returns pandas DataFrames; use ``bdl.levels``, ``bdl.data``, etc.
- **API layer**: returns raw dictionaries; use ``bdl.api.levels``, ``bdl.api.data``, etc.

The access layer is recommended for most users because it offers a more Pythonic interface whose output is ready for data analysis.

Quick Start
-----------

.. code-block:: python

   from pybdl import BDL, BDLConfig

   # Initialize the client
   bdl = BDL(BDLConfig(api_key="your-api-key"))

   # The access layer returns DataFrames
   levels_df = bdl.levels.list_levels()
   print(levels_df.head())

   # The data is ready for analysis
   print(levels_df.dtypes)
   print(levels_df.columns)

Key Features
------------

DataFrame Conversion
~~~~~~~~~~~~~~~~~~~~

All access layer methods return pandas DataFrames, so the data is immediately ready for analysis:

.. code-block:: python

   # Get variables as a DataFrame
   variables_df = bdl.variables.list_variables()

   # Use pandas operations directly (na=False treats missing names as non-matches)
   filtered = variables_df[
       variables_df['name'].str.contains('population', case=False, na=False)
   ]
   sorted_vars = variables_df.sort_values('name')

Column Name Normalization
~~~~~~~~~~~~~~~~~~~~~~~~~

API responses use camelCase field names (e.g., ``variableId``, ``unitName``). The access layer converts them to snake_case (e.g., ``variable_id``, ``unit_name``) for Pythonic access:

.. code-block:: python

   df = bdl.variables.get_variable("3643")

   # Column names are snake_case (e.g. variable_id), not camelCase (variableId)
   print(df.columns)

Data Type Inference
~~~~~~~~~~~~~~~~~~~

The access layer infers data types and converts columns accordingly:

.. code-block:: python

   df = bdl.data.get_data_by_variable("3643", years=[2021])

   # The year column is Int64 and the val column is float
   print(df.dtypes)

Nested Data Normalization
~~~~~~~~~~~~~~~~~~~~~~~~~

The data endpoints return nested structures in which each unit carries a ``values`` array. The access layer flattens them automatically:

.. code-block:: python

   # The API returns:
   # [{"id": "1", "name": "Warsaw", "values": [{"year": 2021, "val": 1000}, ...]}]

   # The access layer returns a flat DataFrame:
   df = bdl.data.get_data_by_variable("3643", years=[2021])

   # Columns include unit_id, unit_name, year, val, attr_id
   print(df.head())

Available Endpoints
-------------------

The access layer provides endpoints for all BDL API resources:

.. list-table:: Available Access Endpoints
   :header-rows: 1

   * - Endpoint
     - Access Method
     - Description
   * - Aggregates
     - ``bdl.aggregates``
     - Aggregation level metadata
   * - Attributes
     - ``bdl.attributes``
     - Attribute metadata
   * - Data
     - ``bdl.data``
     - Statistical data access
   * - Levels
     - ``bdl.levels``
     - Administrative unit levels
   * - Measures
     - ``bdl.measures``
     - Measure unit metadata
   * - Subjects
     - ``bdl.subjects``
     - Subject hierarchy
   * - Units
     - ``bdl.units``
     - Administrative units
   * - Variables
     - ``bdl.variables``
     - Variable metadata
   * - Years
     - ``bdl.years``
     - Available years

Endpoint Details
----------------

Levels
~~~~~~

Administrative unit aggregation levels (for example country, voivodeship, county and municipality):

.. code-block:: python

   # List all levels
   levels_df = bdl.levels.list_levels()

   # Get a specific level
   level_df = bdl.levels.get_level(2)  # Level 2 = voivodeship

   # Get level metadata
   metadata_df = bdl.levels.get_levels_metadata()

Subjects
~~~~~~~~

Subject categories and their hierarchy:

.. code-block:: python

   # List all top-level subjects
   subjects_df = bdl.subjects.list_subjects()

   # Get the subjects under a parent
   child_subjects = bdl.subjects.list_subjects(parent_id="P0001")

   # Search subjects by name
   results = bdl.subjects.search_subjects(name="population")

   # Get a specific subject
   subject_df = bdl.subjects.get_subject("P0001")

Variables
~~~~~~~~~

Statistical variables (indicators):

.. code-block:: python

   # List all variables
   variables_df = bdl.variables.list_variables()

   # Filter variables by category and name
   filtered = bdl.variables.list_variables(
       category_id="P0001",
       name="population",
   )

   # Search variables by name
   results = bdl.variables.search_variables(name="unemployment")

   # Get a specific variable
   variable_df = bdl.variables.get_variable("3643")

Data
~~~~

Statistical data retrieval:

.. code-block:: python

   # Get data by variable (the most common case)
   df = bdl.data.get_data_by_variable(
       variable_id="3643",
       years=[2021],
       unit_level=2,  # Voivodeship level
   )

   # Get data for multiple years
   df = bdl.data.get_data_by_variable(
       variable_id="3643",
       years=[2020, 2021, 2022],
       unit_level=2,
   )

   # Get data with an aggregate filter
   df = bdl.data.get_data_by_variable(
       variable_id="3643",
       years=[2021],
       aggregate_id=1,
   )

   # Get data by administrative unit
   df = bdl.data.get_data_by_unit(
       unit_id="020000000000",
       variable_ids=["3643"],
       years=[2021],
   )

   # Get data for the localities under a parent unit
   df = bdl.data.get_data_by_variable_locality(
       variable_id="3643",
       unit_parent_id="1465011",
       years=[2021],
   )

   # Get data by unit locality
   df = bdl.data.get_data_by_unit_locality(
       unit_id="1465011",
       variable_id="3643",
       years=[2021],
   )

The data endpoints automatically normalize nested ``values`` arrays into flat rows.

Units
~~~~~

Administrative units (regions, cities, etc.):

.. code-block:: python

   # List units by level
   voivodeships = bdl.units.list_units(level=2)  # Level 2 = voivodeship

   # Search units by name
   warsaw = bdl.units.search_units(name="Warsaw")

   # Get a specific unit
   unit_df = bdl.units.get_unit("020000000000")

   # List statistical localities
   localities = bdl.units.list_localities(level=6)  # Level 6 = municipality

   # Search localities by name
   warsaw_localities = bdl.units.search_localities(name="Warsaw", level=6)

   # Get a specific locality
   locality_df = bdl.units.get_locality("1465011")

Attributes
~~~~~~~~~~

Data attributes (dimensions):

.. code-block:: python

   # List all attributes
   attributes_df = bdl.attributes.list_attributes()

   # Get a specific attribute
   attr_df = bdl.attributes.get_attribute("1")

Measures
~~~~~~~~

Measure units:

.. code-block:: python

   # List all measures
   measures_df = bdl.measures.list_measures()

   # Get a specific measure
   measure_df = bdl.measures.get_measure(1)

Aggregates
~~~~~~~~~~

Aggregation types:

.. code-block:: python

   # List all aggregates
   aggregates_df = bdl.aggregates.list_aggregates()

   # Get a specific aggregate
   aggregate_df = bdl.aggregates.get_aggregate("1")

Years
~~~~~

Years for which data is available:

.. code-block:: python

   # List all available years
   years_df = bdl.years.list_years()

   # Get metadata for a specific year
   year_df = bdl.years.get_year(2021)

Pagination
----------

Most list methods support pagination:

.. code-block:: python

   # Fetch all pages (the default, max_pages=None)
   all_data = bdl.variables.list_variables()

   # Fetch only the first page
   first_page = bdl.variables.list_variables(max_pages=1, page_size=50)

   # Limit the number of pages
   limited = bdl.variables.list_variables(max_pages=5, page_size=100)

Parameters:

- ``max_pages``: maximum number of pages to fetch. ``None`` (default) fetches all pages, ``1`` fetches only the first page, and ``N`` fetches up to N pages.
- ``page_size``: number of results per page (defaults to the value in the configuration, or 100 if none is set).

Async Usage
-----------

Every access layer method has an async counterpart, prefixed with ``a``:

.. code-block:: python

   import asyncio

   from pybdl import BDL


   async def main():
       bdl = BDL()

       # Async methods also return DataFrames
       levels_df = await bdl.levels.alist_levels()
       variables_df = await bdl.variables.alist_variables()

       # Run multiple requests concurrently
       levels_task = bdl.levels.alist_levels()
       variables_task = bdl.variables.alist_variables()
       levels_df, variables_df = await asyncio.gather(levels_task, variables_task)

       return levels_df, variables_df


   asyncio.run(main())

Available async methods:

- ``alist_levels()``, ``alist_variables()``, ``alist_subjects()``, etc.
- ``aget_level()``, ``aget_variable()``, ``aget_subject()``, etc.
- ``aget_data_by_variable()``, ``aget_data_by_unit()``, etc.

Examples
--------

Basic Usage
~~~~~~~~~~~

.. code-block:: python

   from pybdl import BDL, BDLConfig

   bdl = BDL(BDLConfig(api_key="your-api-key"))

   # Get administrative levels
   levels = bdl.levels.list_levels()
   print(f"Found {len(levels)} administrative levels")

   # Get variables related to population
   population_vars = bdl.variables.search_variables(name="population")
   print(f"Found {len(population_vars)} population-related variables")

   # Get data for a specific variable
   data = bdl.data.get_data_by_variable(
       variable_id="3643",
       years=[2021],
       unit_level=2,  # Voivodeship level
   )
   print(data.head())

Filtering and Analysis
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Get all variables
   variables = bdl.variables.list_variables()

   # Filter using pandas (na=False treats missing names as non-matches)
   economic_vars = variables[
       variables['name'].str.contains('economic', case=False, na=False)
   ]

   # Fetch data for the first five matching variables
   for var_id in economic_vars['id'].head(5):
       data = bdl.data.get_data_by_variable(var_id, years=[2021])
       print(f"Variable {var_id}: {len(data)} records")

Getting Data
~~~~~~~~~~~~

.. code-block:: python

   # Get data
   df = bdl.data.get_data_by_variable("3643", years=[2021])

   # The DataFrame includes unit and attribute identifiers alongside the values
   print(df[['unit_name', 'attr_name', 'val']].head())

   # Average value per attribute
   by_attr = df.groupby('attr_name')['val'].mean()
   print(by_attr)

Working with Nested Data
~~~~~~~~~~~~~~~~~~~~~~~~

The data endpoints automatically normalize nested structures:

.. code-block:: python

   # The API returns a nested structure, but the access layer flattens it
   df = bdl.data.get_data_by_variable("3643", years=[2021])

   # Each row represents one data point
   # Columns include unit_id, unit_name, year, val, attr_id, attr_name
   print(df.head())

   # The flat layout is easy to analyze
   avg_by_unit = df.groupby('unit_name')['val'].mean()
   print(avg_by_unit)

   # Get data for multiple years
   multi_year_df = bdl.data.get_data_by_variable("3643", years=[2020, 2021, 2022])

   # Analyze trends over time
   yearly_avg = multi_year_df.groupby('year')['val'].mean()
   print(yearly_avg)

See :doc:`examples` for more comprehensive real-world examples.

API Reference
-------------

.. automodule:: pybdl.access
   :members:
   :undoc-members:
   :show-inheritance:

.. seealso::

   - :doc:`main_client` for main client usage
   - :doc:`api_clients` for low-level API access
   - :doc:`examples` for real-world examples
   - :doc:`config` for configuration options
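
The camelCase-to-snake_case renaming described under Column Name Normalization can be illustrated with a short, self-contained sketch. It assumes a simple rule: insert an underscore before each uppercase letter that follows a lowercase letter or digit, then lowercase everything. The helpers ``to_snake_case`` and ``normalize_keys`` are hypothetical and are not part of pyBDL's API. pyBDL's actual conversion may handle more edge cases, such as acronyms.

.. code-block:: python

   import re


   def to_snake_case(name: str) -> str:
       """Convert a camelCase field name such as 'variableId' to 'variable_id'.

       Hypothetical helper for illustration; not a pyBDL function.
       """
       # Prefix an underscore to every uppercase letter that is preceded by a
       # lowercase letter or digit, then lowercase the whole string.
       return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


   def normalize_keys(record: dict) -> dict:
       """Return a copy of an API record with snake_case keys (hypothetical)."""
       return {to_snake_case(key): value for key, value in record.items()}


   print(normalize_keys({"variableId": 3643, "unitName": "Warsaw", "attrId": 1}))
   # {'variable_id': 3643, 'unit_name': 'Warsaw', 'attr_id': 1}

Applying such a function to every record before building a DataFrame yields snake_case columns. Keys that are already lowercase, such as ``id`` or ``name``, pass through unchanged.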
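
The flattening described under Nested Data Normalization amounts to emitting one row per entry in each unit's ``values`` array while repeating that unit's identifiers on every row. The sketch below assumes the response shape shown in that section, plus an ``attrId`` field on each value. ``flatten_values`` is a hypothetical helper, not pyBDL's implementation, and the sample records are illustrative only.

.. code-block:: python

   def flatten_values(results: list) -> list:
       """Expand per-unit records with a nested 'values' list into flat rows.

       Hypothetical helper for illustration; not a pyBDL function.
       """
       rows = []
       for unit in results:
           # Units without a 'values' key contribute no rows.
           for value in unit.get("values", []):
               rows.append({
                   "unit_id": unit["id"],
                   "unit_name": unit["name"],
                   "year": value.get("year"),
                   "val": value.get("val"),
                   "attr_id": value.get("attrId"),  # assumed field name
               })
       return rows


   # Illustrative response in the shape shown above (not real BDL data).
   sample = [
       {"id": "1", "name": "Warsaw", "values": [
           {"year": 2020, "val": 990, "attrId": 1},
           {"year": 2021, "val": 1000, "attrId": 1},
       ]},
   ]

   rows = flatten_values(sample)
   for row in rows:
       print(row)

With pandas, ``pandas.DataFrame(rows)`` turns the result into a table. ``pandas.json_normalize(sample, record_path="values", meta=["id", "name"])`` performs a similar expansion directly, but it keeps the original API field names.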
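
The ``max_pages`` semantics listed under Pagination can be modelled as a loop that stops when the page limit is reached or when a page comes back shorter than ``page_size``. This is a sketch under stated assumptions:

- pages are numbered from 0;
- a short page marks the end of the data;
- ``fetch_pages`` and ``fake_fetch`` are hypothetical stand-ins for the client and an HTTP call.

pyBDL's own client may detect the last page differently, for example from paging links in the API response.

.. code-block:: python

   from typing import Callable, List, Optional


   def fetch_pages(
       fetch_page: Callable[[int, int], List[int]],
       page_size: int = 100,
       max_pages: Optional[int] = None,
   ) -> List[int]:
       """Collect results page by page (hypothetical helper, not pyBDL's API).

       max_pages=None fetches every page; max_pages=N fetches at most N pages.
       """
       results: List[int] = []
       page = 0  # assumption: pages are numbered from 0
       while max_pages is None or page < max_pages:
           batch = fetch_page(page, page_size)
           results.extend(batch)
           # A page shorter than page_size means there is nothing left to fetch.
           if len(batch) < page_size:
               break
           page += 1
       return results


   # Hypothetical in-memory "API" holding 250 records.
   ITEMS = list(range(250))


   def fake_fetch(page: int, page_size: int) -> List[int]:
       start = page * page_size
       return ITEMS[start:start + page_size]


   print(len(fetch_pages(fake_fetch)))                             # 250 (all pages)
   print(len(fetch_pages(fake_fetch, max_pages=1)))                # 100 (first page only)
   print(len(fetch_pages(fake_fetch, page_size=50, max_pages=3)))  # 150 (three pages)

With the default ``max_pages=None`` the loop ends only on a short page. When the total is an exact multiple of ``page_size``, this costs one extra request that returns an empty page.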