
## Data

I construct a house price index from a granular and rich set of house price data. The housing stock and floorspace, population, and employment data are all obtained from the German Regional Statistical Offices Database.

### House prices {#sec-hedonic-hse}

I use the RWI-GEO-Real Estate Data of the FDZ Ruhr at RWI [@rwi_redhk_2020] to construct quality-adjusted house prices.[^04-data-1] The data are highly detailed (reported on a 1 km^2^ grid), cover all of Germany, and are available from 2007 onward. Moreover, they come with a rich set of property characteristics, which allows me to compute a hedonic price index that quality-adjusts house prices.

[^04-data-1]: The original data are provided by *ImmobilienScout24*, Germany's largest online platform for listing real estate (for both selling and renting houses and apartments). The house prices are offer prices self-reported by the respective home seller or agent and may therefore differ from actual transaction prices.

I construct a mix-adjusted house price index from the following panel hedonic regression 
$$
\ln P_{hit} = \delta_{it} + \mathbf{X}_{hit}\boldsymbol{\beta} + e_{hit},
$$ {#eq-hedonic-hse} 
where $h$ indexes houses, $i$ districts, and $t$ years (2008-2019); $P_{hit}$ is the house price in euros per m^2^; $\delta_{it}$ denotes the district-year fixed effects, which are the parameters of main interest; and $\mathbf{X}_{hit}$ is a vector of house characteristics.[^04-data-2] In @eq-hedonic-hse, the estimated fixed effects $\widehat{\delta}_{it}$ represent the quality-adjusted (log) prices for each district $i$ in every year $t$. After estimating @eq-hedonic-hse with fixed effects, the hedonic price index is given by the district-year average of the log price net of the contribution of the characteristics, $\widehat{\delta}_{it} = \frac{1}{N_{it}}\sum_{h}\left(\ln P_{hit} - \mathbf{X}_{hit}\widehat{\boldsymbol{\beta}}\right)$, where the sum runs over the $N_{it}$ houses observed in district $i$ and year $t$.

[^04-data-2]: The house attributes included in the hedonic regression are: floorspace, plot area, number of rooms, number of floors, number of bedrooms, number of bathrooms, type of house, type of heating, years of construction and renovation, condition and facilities of the property, and whether the property has a basement, has a guest washroom, is a protected building or part of one, and is usable as a holiday house. The computation of the mix-adjusted house price index follows the method described by @ahlfeldt_etal_2020.
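
The estimation of @eq-hedonic-hse can be illustrated with a minimal sketch. It is not the code used for this paper: it assumes that all characteristics are already numeric (categorical attributes would need to be dummy-coded first) and that each district-year cell is encoded as a single label. The function name `hedonic_index` and its arguments are hypothetical. The sketch removes the district-year fixed effects by demeaning within cells, estimates $\boldsymbol{\beta}$ by least squares, and recovers each $\widehat{\delta}_{it}$ as the cell mean of the log price net of the characteristics.

```python
import numpy as np


def hedonic_index(log_price, X, cell):
    """Return (cell labels, delta_hat per cell, beta_hat).

    log_price : (n,) log price per square metre
    X         : (n, k) numeric house characteristics (no constant column)
    cell      : (n,) district-year label of each listing (hypothetical encoding)
    """
    log_price = np.asarray(log_price, dtype=float)
    X = np.asarray(X, dtype=float)
    labels, idx = np.unique(np.asarray(cell), return_inverse=True)
    idx = idx.ravel()
    counts = np.bincount(idx)

    # Cell means of the outcome and of every characteristic.
    y_bar = np.bincount(idx, weights=log_price) / counts
    X_bar = np.column_stack(
        [np.bincount(idx, weights=X[:, j]) / counts for j in range(X.shape[1])]
    )

    # Within transformation: demeaning removes the district-year fixed effects.
    beta_hat, *_ = np.linalg.lstsq(X - X_bar[idx], log_price - y_bar[idx], rcond=None)

    # Fixed effects: cell mean of the log price net of the characteristics.
    delta_hat = y_bar - X_bar @ beta_hat
    return labels, delta_hat, beta_hat


# Toy data: two district-year cells, one characteristic, no noise.
cell = np.array(["A-2008", "A-2008", "A-2008", "B-2008", "B-2008", "B-2008"])
X = np.array([[1.0], [2.0], [4.0], [0.0], [3.0], [5.0]])
true_delta = np.array([1.0, 1.0, 1.0, 2.0, 2.0, 2.0])
log_price = true_delta + 0.5 * X[:, 0]

labels, delta_hat, beta_hat = hedonic_index(log_price, X, cell)
```

Because the toy data contain no noise, the sketch recovers the coefficient of 0.5 and the two cell intercepts of 1 and 2 up to floating-point error. With many district-year cells, a dedicated fixed-effects routine would be the more practical choice.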

### Housing quantity

Housing quantity is measured by housing units (stock) and housing services proxied by total residential floorspace. As houses vary in both observable and unobservable characteristics, they need to be standardized to account for these differences. In fact, the housing production literature views houses as only differing in the housing services they provide, which are homogeneous and perfectly divisible [@epple_etal_2010; @combes_etal_2021].

In light of that, this paper uses total residential floorspace as the main measure of housing quantity, as a count of houses or buildings may not accurately reflect the true level of housing supply in a city or region. In particular, a count of housing units does not account for differences in size or other attributes. For example, newly built or renovated houses are often larger and better equipped than older houses, differences that a unit count does not capture. Data on housing quantity variables such as residential units, floorspace, and construction activities (i.e., permits and completions) were obtained from the Regional Atlas of Germany.

### Geographic data

I use geographical data to construct land development intensity, land unavailability, and terrain ruggedness index (TRI) measures.[^04-data-3]

[^04-data-3]: TRI is an objective measure of terrain heterogeneity. I computed TRI according to @riley_etal_1999, who compare the elevation of a central pixel with that of its eight neighbors and define TRI as the square root of the sum of the squared elevation differences. Grid-cell-level TRI values were then averaged across the grid cells within each district to obtain the district-level TRI. TRI is derived from the DEM data described in the main text.
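
The TRI calculation can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure used here: elevation is assumed to be a two-dimensional array, border cells are handled by replicating edge values (the treatment of borders is my assumption), and the function names and the integer district codes are hypothetical.

```python
import numpy as np


def terrain_ruggedness_index(elevation):
    """TRI per cell: square root of the sum of squared elevation
    differences between the cell and its eight neighbours."""
    z = np.asarray(elevation, dtype=float)
    rows, cols = z.shape
    # Assumption: border cells are handled by replicating edge values.
    padded = np.pad(z, 1, mode="edge")
    sq_diff_sum = np.zeros_like(z)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the central cell itself
            neighbour = padded[1 + dr : 1 + dr + rows, 1 + dc : 1 + dc + cols]
            sq_diff_sum += (neighbour - z) ** 2
    return np.sqrt(sq_diff_sum)


def district_mean(values, district_id):
    """Average cell values within districts (integer codes 0..D-1,
    every code assumed to occur at least once)."""
    ids = np.asarray(district_id).ravel()
    vals = np.asarray(values, dtype=float).ravel()
    return np.bincount(ids, weights=vals) / np.bincount(ids)


# Toy raster: a single 1 m bump in otherwise flat terrain.
elevation = np.zeros((3, 3))
elevation[1, 1] = 1.0
tri = terrain_ruggedness_index(elevation)

district_id = np.zeros((3, 3), dtype=int)  # all cells in one hypothetical district
tri_by_district = district_mean(tri, district_id)
```

In the toy raster, the central cell differs by 1 m from each of its eight neighbors, so its TRI is $\sqrt{8}$, while perfectly flat terrain yields a TRI of zero everywhere.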

From the Digital Elevation Model (DEM) of Germany at $200\times200$ meter resolution, I calculated slope to extract the share of land with steep slopes, which, together with land covered by wetlands and water bodies, makes up the undevelopable land measure.[^04-data-4] An area exhibiting steep slopes (for example, above 15% in the US context) is considered unsuitable for construction [@saiz_2010].

[^04-data-4]: [Digitales Geländemodell Gitterweite 200 m (DGM200)](https://gdz.bkg.bund.de/index.php/default/digitale-geodaten/digitale-gelandemodelle/digitales-gelandemodell-gitterweite-200-m-dgm200.html)
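
The slope step can be illustrated with a short sketch. It assumes that slope is obtained from simple finite differences of the elevation raster (GIS packages may use other slope algorithms, so this is an assumption, not the method used here) and that slope is expressed in percent, i.e., 100 times rise over run. The function names and the boolean `developable` mask are hypothetical.

```python
import numpy as np


def slope_percent(elevation, cell_size=200.0):
    """Slope in percent (100 * rise / run) from finite differences."""
    z = np.asarray(elevation, dtype=float)
    dz_dy, dz_dx = np.gradient(z, cell_size)
    return 100.0 * np.hypot(dz_dx, dz_dy)


def steep_share(elevation, developable, cell_size=200.0, threshold=15.0):
    """Share of developable cells whose slope exceeds the threshold."""
    steep = slope_percent(elevation, cell_size) > threshold
    developable = np.asarray(developable, dtype=bool)
    return (steep & developable).sum() / developable.sum()


# Toy terrain: the left half is flat, the right half rises 40 m per 200 m cell (20%).
col = np.arange(6)
elevation = np.tile(np.maximum(col - 2, 0) * 40.0, (4, 1))
developable = np.ones_like(elevation, dtype=bool)  # every cell counts as developable
share = steep_share(elevation, developable)
```

In this toy terrain, the three right-hand columns have a slope of 20% and the remaining columns stay at or below 10%, so half of the developable cells exceed the 15% threshold.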

From the Corine Land Cover (CLC) data for Germany, compiled by the [German Remote Sensing Data Center (DFD) of DLR](https://www.dlr.de/eoc/en/desktopdefault.aspx/tabid-11882/20871_read-48836) and the [Federal Agency for Cartography and Geodesy (BKG)](https://www.bkg.bund.de), I calculated the exact share of land covered by wetlands and water bodies to measure land unavailability. Moreover, development intensity is constructed from the exact share of "artificial surfaces", a comprehensive land cover class defined by CLC that includes continuous and discontinuous urban fabric.

#### Undevelopable and developed land {#sec-constraints}

I define undevelopable or unavailable land as land covered by wetlands and water bodies or as potentially developable land with an average slope greater than 15%. Undevelopable land could also include already-developed land (an area covered by "artificial surfaces" such as buildings) if redevelopment through renovation or demolition were ruled out as a development option. Because developed land can be redeveloped, however, already-developed land is not counted as undevelopable land in this paper. I use the land cover classes defined by [Corine Land Cover (CLC)](https://land.copernicus.eu/user-corner/technical-library/corine-land-cover-nomenclature-guidelines/html/index.html) that are relevant to Germany.

The share of land with a slope greater than 15% is defined relative to the district's total "developable stock" of land, that is, the district's total administrative area excluding the area covered by wetlands and water bodies. In other words, areas covered by forests or agriculture make up the developable stock. Note that the developable stock does not exclude areas with a slope greater than 15%; it serves as the denominator of the steep-slope fraction.

More formally, let $T$ denote the district's total administrative area, $T^{\text{artificial}}$ developed land, $T^{\text{wetland}}$ the area covered by wetlands, $T^{\text{water}}$ the area covered by water bodies, $T^{\text{agri}}$ agricultural land, and $T^{\text{forest}}$ forests. The developable stock $T^{\text{developable}}$ is then given by
$$
T^{\text{developable}} = T - T^{\text{wetland}} - T^{\text{water}} = T^{\text{agri}}  + T^{\text{forest}}.
$$

Then, the fraction of developable land that is lost to steep slopes is defined as $r^{\text{steep}} = \frac{T^{\text{steep}}}{T^{\text{developable}}}$, where $T^{\text{steep}} = T^{\text{developable}} \cdot\mathbb{1}\left[{\text{slope} > 15\%}\right]$ is the developable area with a slope greater than 15%.

The fraction of undevelopable land is defined as the ratio of the total undevelopable land to the total administrative land of the district. Undevelopable land, $T^{\text{undevelopable}}$, is given by
$$
T^{\text{undevelopable}} = T^{\text{steep}} + T^{\text{wetland}} + T^{\text{water}}.
$$
The share of undevelopable land in total land is then $r^{\text{undevelopable}} = \frac{T^{\text{undevelopable}}}{T}$. Similarly, the share of "developed land" is $r^{\text{developed}} = \frac{T^{\text{artificial}}}{T}$.

The undevelopable share is this paper's main measure of geographical constraints, with TRI and slope as alternative measures. Finally, the developed share measures the existing level of development intensity.
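
The three land shares defined in this subsection can be computed as in the following sketch. The function name `land_shares`, its argument names, and the numbers in the example are hypothetical and purely illustrative. The developable stock is computed as total area minus wetlands and water bodies, following the verbal definition above.

```python
def land_shares(total, wetland, water, artificial, steep):
    """Return the steep, undevelopable, and developed land shares.

    All arguments are areas in the same unit (e.g. km^2); `steep` is the
    developable area with a slope above 15%.
    """
    developable = total - wetland - water    # developable stock
    undevelopable = steep + wetland + water  # undevelopable land
    return {
        "r_steep": steep / developable,
        "r_undevelopable": undevelopable / total,
        "r_developed": artificial / total,
    }


# Hypothetical district of 100 km^2 (numbers are illustrative, not from the data).
shares = land_shares(total=100.0, wetland=2.0, water=8.0, artificial=20.0, steep=9.0)
```

For this hypothetical district, the developable stock is 90 km^2^, so the steep share is 9/90 = 0.10, the undevelopable share is (9 + 2 + 8)/100 = 0.19, and the developed share is 20/100 = 0.20.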

### Bartik shocks: Predicted employment growth

In the main regression analysis, the fundamental source of variation in changes in housing demand is predicted employment growth, also known as a Bartik or local labor demand shock. I construct this shock from employment data for seven industries, combining the 2008 industry employment levels in German districts with the national industry-specific employment growth rates from 2008 to 2019. The labor demand shock is the local employment growth that each district would have experienced, given its industry composition in the initial period (2008), had employment in each of its industries grown over 2008-2019 at the national (German) rate.
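
A minimal sketch of this construction is given below. In my own notation, which does not appear elsewhere in the paper, the shock for district $i$ is $\sum_k s_{ik,2008}\, g_k$, where $s_{ik,2008}$ is the 2008 employment share of industry $k$ in district $i$ and $g_k$ is the national growth rate of industry $k$ between 2008 and 2019. The sketch assumes that national growth rates are computed by summing employment over all districts, including the district itself (no leave-one-out correction). The function name `bartik_shock` and the numbers are hypothetical.

```python
def bartik_shock(emp_base, emp_end):
    """Predicted employment growth per district.

    emp_base, emp_end : lists of rows, one per district, each row holding
    employment by industry in the base year (2008) and the end year (2019).
    """
    n_industries = len(emp_base[0])

    # National industry growth rates between the base and the end year
    # (assumption: totals include every district, no leave-one-out).
    national_growth = []
    for k in range(n_industries):
        base_total = sum(row[k] for row in emp_base)
        end_total = sum(row[k] for row in emp_end)
        national_growth.append(end_total / base_total - 1.0)

    # Weight national growth rates by each district's base-year industry shares.
    shocks = []
    for row in emp_base:
        district_total = sum(row)
        shocks.append(
            sum(e / district_total * g for e, g in zip(row, national_growth))
        )
    return shocks


# Toy example with two districts and two industries (illustrative numbers).
emp_2008 = [[80.0, 20.0], [20.0, 80.0]]
emp_2019 = [[88.0, 30.0], [22.0, 120.0]]
shocks = bartik_shock(emp_2008, emp_2019)
```

In the toy example, the two industries grow nationally by 10% and 50%. The first district, with 80% of its 2008 employment in the slow-growing industry, receives a predicted growth of 0.8 × 0.10 + 0.2 × 0.50 = 0.18, whereas the second district receives 0.2 × 0.10 + 0.8 × 0.50 = 0.42.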

Finally, the remaining variables used in this study, including the price of land, population, income, and other socioeconomic controls, are all obtained from the @atlasde_2022.