This function constructs time series of counts for Germany's municipalities (Gemeinden) and districts (Kreise).
xwalk_ags(
data,
ags,
time,
xwalk,
variables = NULL,
strata = NULL,
weight = NULL,
fuzzy_time = FALSE,
verbose = TRUE
)
data: A data frame or a data frame extension (e.g. a tibble).
ags: Name of the character variable (quoted) with municipality AGS (Gemeinden, 8 digits) or district AGS (Kreise, 5 digits).
time: Name of the variable (quoted) identifying the year (YYYY format). Values will be coerced to integers.
xwalk: Name of the crosswalk. The following crosswalks are available:
xd19, xd20: for district-level data between 1990 and 2019/2020.
xm19, xm20: for municipality-level data between 1990 and 2019/2020.
variables: Either a vector of names (quoted) for
variables to interpolate or NULL to disable interpolation and
return the data matched with the xwalk.
strata: Vector of variable names (quoted) or NULL. See
details.
weight: Name of the interpolation weight (quoted) or NULL.
The following are available:
pop: Population weights.
size: Area weights.
emp: Weights based on the number of employees (1998 onwards).
fuzzy_time: If FALSE, the crosswalk and the data
are matched exactly by ags and time. If TRUE,
they are matched exactly by ags and to the closest available
crosswalk year on time. See details below.
verbose: If TRUE, the function outputs information on
the number of matched and unmatched rows.
If interpolation is requested, the crosswalked and interpolated
data are returned. If interpolation is not requested, the data matched
with the crosswalk are returned. The following variables are added:
row_id: row number of data before matching.
ags[*]: the crosswalked AGS.
year_xw: the matched year from the crosswalk.
[*]_conv: the interpolation weight.
diff: the absolute difference between year_xw and time.
This function facilitates the use of crosswalks constructed by the BBSR for municipalities and districts in Germany (Milbert 2010). The crosswalks map one year's set of district/municipality identifiers to a later year's identifiers and provide weights for area- or population-weighted interpolation.
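The weighted interpolation described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the toy crosswalk, the unit labels (A, B, X, Y), the weights, and the column names ags_old, ags_new and pop_conv are all hypothetical.

```r
# Toy crosswalk (hypothetical values): each row maps an old unit to a new
# unit together with a population-based conversion weight.
xw <- data.frame(
  ags_old  = c("A", "A", "B"),
  ags_new  = c("X", "Y", "Y"),
  pop_conv = c(0.6, 0.4, 1.0),
  stringsAsFactors = FALSE
)

# Toy counts reported for the old units.
counts <- data.frame(
  ags_old = c("A", "B"),
  voters  = c(1000, 500),
  stringsAsFactors = FALSE
)

# Attach the weights, scale each count, and sum within the new units.
m <- merge(counts, xw, by = "ags_old")
m$voters_w <- m$voters * m$pop_conv
interpolated <- aggregate(voters_w ~ ags_new, data = m, FUN = sum)
interpolated
```

Because the weights of each old unit sum to 1 in this toy crosswalk, the total count is the same before and after interpolation.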
All data rows with NAs in either the ags or time
variable are excluded. The same applies to all rows with a value in
ags or time that never appears in the crosswalk.
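To see in advance which rows would be dropped, a check along the following lines may help. It is a sketch only: the toy data and the vectors xw_ags and xw_years stand in for the identifiers and years covered by a crosswalk and are hypothetical.

```r
# Toy data (hypothetical) with a missing AGS, a missing year and an AGS
# that does not occur in the crosswalk.
d <- data.frame(
  ags  = c("14511", NA, "14521", "99999"),
  year = c(1998L, 1998L, NA, 1998L),
  stringsAsFactors = FALSE
)

# Stand-ins for the identifiers and years covered by a crosswalk (assumed).
xw_ags   <- c("14511", "14521")
xw_years <- 1990:2020

# Rows with NA in either key variable.
has_na <- is.na(d$ags) | is.na(d$year)

# Complete rows whose AGS or year never appears in the crosswalk.
unknown <- !has_na & (!(d$ags %in% xw_ags) | !(d$year %in% xw_years))

# Rows that would remain for matching.
kept <- d[!has_na & !unknown, ]
kept
```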
Fuzzy matching uses the absolute difference between the year reported in the data and a crosswalk year. If there is a tie, crosswalk years from before the year reported in the data are preferred.
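The tie-breaking rule can be illustrated with a small helper. The function closest_year() is hypothetical and written only for this illustration; it is not part of the package and may differ from how the matching is implemented internally.

```r
# Hypothetical helper: return the crosswalk year closest to `year`.
# On a tie, the earlier crosswalk year is preferred.
closest_year <- function(year, xw_years) {
  d <- abs(xw_years - year)
  candidates <- xw_years[d == min(d)]
  min(candidates)
}

xw_years <- c(2000, 2002, 2004)

closest_year(2003, xw_years)  # 2002 and 2004 are equally close: the earlier year wins
closest_year(2005, xw_years)  # 2004 is the single closest year
```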
If area- or population-weighted interpolation is requested (i.e., when
variables are supplied), the combination of the variables set
in ags, time and strata needs to uniquely
identify a row in data.
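One way to verify this requirement before calling the function is to look for duplicated key combinations. The sketch below uses toy data; the column names district, year and party are assumptions chosen for illustration.

```r
# Toy data (hypothetical): exactly one row per district, year and stratum
# is expected, but the last two rows share the same key.
d <- data.frame(
  district = c("14511", "14511", "14521", "14521"),
  year     = c(1998L, 1998L, 1998L, 1998L),
  party    = c("a", "b", "a", "a"),
  voters   = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Variables that would be passed as ags, time and strata.
key <- c("district", "year", "party")

# Flag every row whose key combination occurs more than once.
dup <- duplicated(d[key]) | duplicated(d[key], fromLast = TRUE)

d[dup, ]   # offending rows
any(dup)   # TRUE means the key does not uniquely identify rows
```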
Caution: Data from https://www.regionalstatistik.de/ sometimes include
annual values for merged units (e.g., Städteregion Aachen, 05334) and
for their former parts (Kreis Aachen, 05354, and Stadt Aachen, 05313).
When such data are crosswalked with fuzzy_time = TRUE and
interpolated, the final counts will be off by a factor of approximately 2.
The reason is that the final output is the sum of the interpolated counts
for the parts and the measured count of the merged unit.
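The double counting can be made concrete with toy numbers. The counts and the year below are invented for illustration, and the sketch assumes that both former parts map entirely onto the merged unit. Dropping the former parts, as shown at the end, is one possible remedy and not a recommendation taken from the package.

```r
# Toy counts (hypothetical values) for a year in which the source reports
# both the merged unit and its former parts.
d <- data.frame(
  ags   = c("05334", "05354", "05313"),
  year  = 2005L,
  count = c(300, 180, 120),
  stringsAsFactors = FALSE
)

# If all three rows end up under the merged unit's AGS, the same
# population is counted twice.
sum(d$count)                     # 600: parts plus merged unit
sum(d$count[d$ags == "05334"])   # 300: merged unit alone

# Possible remedy (assumption): drop the former parts in years for which
# the merged unit is already reported.
former_parts <- c("05354", "05313")
has_merged   <- d$year %in% d$year[d$ags == "05334"]
d_clean      <- d[!(d$ags %in% former_parts & has_merged), ]
d_clean
```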
Milbert, Antonia. 2010. "Gebietsreformen – politische Entscheidungen und Folgen für die Statistik." BBSR-Berichte kompakt 6/2010. Bundesinstitut für Bau-, Stadt- und Raumforschung.
data(btw_sn)
btw_sn_ags20 <- xwalk_ags(
data = btw_sn,
ags = "district",
time = "year",
xwalk = "xd20",
variables = c("voters", "valid"),
weight = "pop"
)
#>
#> Total number of obs: 155
#>
#> Excluded obs:
#> id/time NA    AGS unk   Year unk
#>          0          0          0
#>
#> Matched obs:
#> exact fuzzy
#>   126    NA
#>
#> Unmatched obs: 29
#>
head(btw_sn_ags20)
#> # A tibble: 6 × 4
#> # Groups: year [1]
#>    year ags20  voters   valid
#>   <dbl> <chr>   <dbl>   <dbl>
#> 1  1998 14511 234333. 190286.
#> 2  1998 14521 344668  284724
#> 3  1998 14522 297625. 242744.
#> 4  1998 14523 228689  181538
#> 5  1998 14524 306745. 247699.
#> 6  1998 14612 402716. 328183.