
Primary and secondary data: concepts, concerns, errors, and issues. (Features).
Subject:
Valuation (Analysis)
Valuation (Information management)
Valuation (Methods)
Author:
Rabianski, Joseph S.
Pub Date:
01/01/2003
Publication:
Name: Appraisal Journal Publisher: The Appraisal Institute Audience: Trade Format: Magazine/Journal Subject: Business; Real estate industry Copyright: COPYRIGHT 2003 The Appraisal Institute ISSN: 0003-7087
Issue:
Date: Jan, 2003 Source Volume: 71 Source Issue: 1
Topic:
Event Code: 260 General services; 310 Science & research Computer Subject: Company systems management
Geographic:
Geographic Scope: United States Geographic Code: 1USA United States

Accession Number:
96694281
Full Text:
abstract

Appraisals and market studies use two types of data--primary and secondary data. The appraiser or market analyst must know what they are and what affects them. All data used in appraisals and market studies should be current, relevant, reliable, accurate, and conceptually correct. This article presents a discussion of each of these terms and their significance in the context of the data and in the analysis. The article then discusses the nature of potential errors that can affect primary and secondary data. Several categories of errors can exist. The analyst needs to be able to recognize the error, understand its significance and evaluate the applicability of that data in the analysis.

**********

As the foundation of any appraisal or market study, data needs to be current, relevant, reliable, accurate, and conceptually correct. Data can be primary or secondary and the appraiser or market analyst needs to know which type he is using and understand its characteristics and limitations. He also needs to recognize the nature of potential errors that can affect each type of data and how these errors can be handled. This article provides definitions, practical methodology, and a discussion of issues relating to the use of primary and secondary data.

Primary and Secondary Data

Primary and secondary data are defined in The Dictionary of Real Estate Appraisal as follows:

* Primary data--Information that researchers gather first hand. (1)

* Secondary data--Information from secondary sources, i.e., not directly compiled by the analyst; may include published or unpublished work based on research that relies on primary sources or any material other than primary sources used to prepare a written work. (2)

The Appraisal of Real Estate provides only a brief discussion of primary data when it states that "specific data and competitive supply and demand data, which deal with the subject property and comparable properties in the subject market, is most often obtained by appraisers themselves and therefore qualifies as primary data." (3) This statement is correct but short of being an explicit definition.

The Appraisal of Real Estate also provides only a brief discussion of secondary data when it states that "this secondary data describes the general real estate market and is usually collected by a research firm or government agency." (4) Again, this is close to a definition, but it is not an explicit definition.

Consider the following definitions for primary and secondary data taken from marketing literature:

* Primary data is facts and information collected specifically for the purpose of the investigation at hand.

* Secondary data is facts and information gathered not for the immediate study at hand but for some other purpose. (5)

Secondary data has been gathered by others for their own purposes, but the data could be useful in the analysis of a wide range of real property. In general, secondary data exists in published sources.

These definitions relate these two forms of data. Secondary data is generated by means of primary data gathering techniques. One person or entity's primary data become another person or entity's secondary data. The best example of this relationship is the data of the U.S. Bureau of the Census. The Bureau is responsible for counting the population every ten years to establish the districts of representation in the House of Representatives. Over time, this charge "to count the population" evolved into the analysis that the Bureau currently conducts of the demographic and economic characteristics of the population, characteristics of retail trade, composition of the industrial sector, nature of the housing stock, etc. This wide array of data is all primary data to the Bureau of the Census.

When census data is used in a market evaluation for an appraisal or a market study, it becomes secondary data. Appraisers and market analysts use many data sets that are gathered as primary data. Any demographic and economic data generated by any government agency (federal, state, and local) for whatever purpose they deem necessary becomes someone else's secondary data. The data on a subject property submitted to a database of comparable sales goes in as primary data gathered for and used in a specific appraisal. When another appraiser retrieves this information, it becomes secondary data when used in that appraiser's report.

Appraisers need to understand the issues involved in primary data gathering because they are involved in primary data gathering whenever they do field research and because the secondary data they use from published sources was previously gathered as primary data.

Primary and secondary data are used in appraisals, highest and best use studies, market analysis sections of appraisals, and full-scale market studies. A conceptual link between these two types of data and their use appears in the "Levels of Study" discussion presented in the Appraisal Institute's Course 520, "Highest and Best Use and Market Analysis" and in Chapter 2 of Market Analysis for Valuation Appraisals, which is published by the Appraisal Institute. A detailed discussion of this relationship appears later in this paper.

Methods for Obtaining Primary Data

The analyst can obtain primary data through the process of direct observation or by explicit questioning of people.

Observation. Observation as a data gathering technique focuses attention on an observable fact or inanimate entity such as a building or on an observable action or behavior by an animate entity such as a homeowner or shopper. Observation of an inanimate object is the easier of the two activities, but it is not free from error or misinterpretation.

Relatively accurate information can be gathered by observing inanimate objects such as land and improvements. The size, shape, frontage, depth, topography, and soil, as well as the physical features and conditions of the improvements, can be described and measured with a high degree of accuracy.

Activity can be observed with varying degrees of accuracy. This category of data includes pedestrian and automobile traffic counts, directions of traffic flow at intersections, male and female shopper proportions, number of cars in parking lots, vacant space in property markets, volume of new construction, space absorption, queuing problems at checkout counters, and other activity which can be counted or described. Inaccuracy in the measurement of activity is typically greater than the degree of inaccuracy in the measurement of an inanimate or inactive item.

The activity that an observer sees is generally a true fact. Unless the shopper is deceiving the observer, the observed purchases reflect the shopper's needs and the observed pattern of behavior reflects the shopper's true intent. Observation should not interpret the action or behavior; it should report it. Observation occurs at the place and time of the action, and proper recording of all pertinent facts by the observer at the time of the observation should prevent loss of information due to forgetfulness, disorganization, or mismanagement of the facts.

Observation does not provide accurate demographic, economic, and psychographic data. This data must be inferred from observation. How old does that man appear to be? What is that woman's household income? Attempting to sort shoppers by age category is difficult unless very broad age groupings are used. Accurate inference of household, family, or per capita income from manner of dress or automobile driven to a shopping center is not possible.

Psychographic data include attitudes, habits, lifestyle, perceptions, preferences, and tastes that cannot always be ascertained by observation. Consider the following. A man bought a townhouse in the urban center (an observable fact) so he must work downtown. No, he wants to live near sports venues and commute against the flow of rush-hour traffic. A woman bought running shoes (an observable fact) so she must have a personal orientation toward fitness. No, they are a present for her daughter. Any inferences drawn from observed data are subject to the biases and misinterpretations of the observer. Inappropriate inferences such as these can affect the estimates in a demand analysis and the determination of financial feasibility in a highest and best use study.

When observation provides true data on current actions or behaviors, it might not provide data about or identify what will be done in the future or what was done in the past. A true observation could be made of a single action never done before and never envisioned for the future by the individuals being observed. Conversely, an observation could have been made of a true response by these individuals, but it might not translate to other individuals with similar attributes or characteristics.

Inferences about psychographic variables from observation must be made with great care. For example, a market analyst might discover that young, male professionals purchased renovated loft housing in the old industrial section of the downtown area of the city. These unmarried buyers worked in the offices downtown. Twelve units were sold to 12 members of this market segment who represented 10% of the single male and female workers in the downtown area. What is the future demand for loft apartments constructed in old industrial buildings in the downtown area? Can the analyst expect the other 108 single male and female workers to buy in this market? These 12 purchasers could be unique in their attitudes, desires, lifestyle, tastes, and preferences. Future demand could well be zero.
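The arithmetic behind this example can be laid out in a short sketch. The 12 buyers and the 10% share come from the example above; the function name and the capture rates are hypothetical, chosen only to show how widely a demand estimate can swing when it rests on an inference about other people's preferences.

```python
def segment_size(observed_buyers: int, share_of_segment: float) -> int:
    """Back out the size of a market segment from the buyers observed in it."""
    return round(observed_buyers / share_of_segment)


# Figures taken from the loft example in the text.
observed_buyers = 12
share_of_segment = 0.10

total_workers = segment_size(observed_buyers, share_of_segment)
remaining_workers = total_workers - observed_buyers

# Hypothetical capture rates (assumptions, not market evidence):
# nobody else buys, the observed 10% rate repeats, or everyone buys.
capture_rates = (0.0, 0.10, 1.0)
future_demand = {rate: round(remaining_workers * rate) for rate in capture_rates}

print("Single workers downtown:", total_workers)
print("Workers who have not bought:", remaining_workers)
for rate, units in future_demand.items():
    print(f"Capture rate {rate:.0%}: {units} units")
```

Observation alone cannot tell the analyst which of these capture rates applies, which is why the text warns that future demand could well be zero.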

Questioning. The questioning process can provide data on the economic, demographic, and psychographic characteristics of the population in a given market. An individual can provide the analyst with information about age and income as well as about personal characteristics such as attitudes, opinions, habits, tastes, and preferences. But data obtained from questioning can be inaccurate because a question might be misunderstood, an answer might be fabricated, or the subject of the interview might choose not to reply. The question(s) might be improperly phrased and, thereby, elicit inappropriate or incorrect answers. The sample of people established for the interviews might not represent the population group that the analyst wanted to study. Actions, attitudes, and mannerisms of the interviewer might creep into the questioning process and generate responses that are not appropriate or correct. The importance of survey research in market and marketability analysis requires the analyst to be aware of these potential problems.

Primary data gathering procedures, by their very nature, can lead to error. Some procedures lead to more error, or less accuracy, than others. The appraiser must be aware of these issues when gathering primary data and when using secondary data that originated as primary data. Some forms of secondary data are more accurate than other forms.

Primary and Secondary Data as Inputs to the Levels of Market Analysis

Primary and secondary data are significant inputs into appraisal and market analysis. Both forms of data enter into the analysis that underpins the three valuation approaches and the market analysis that enters into those analyses. An explicit discussion of the nature of the market analysis is presented in the Appraisal Institute's Market Analysis for Valuation Appraisals. (6) Chapter 2 of that text is titled "Levels of Market Analysis."

Figure 2.5 in Chapter 2 of Market Analysis for Valuation Appraisals serves as a good focal point for a more complete discussion of the significance of primary and secondary data in appraisals and market analysis. The columns titled "Work Item" and "Level of Study" are taken directly from Figure 2.5 and placed in Exhibit A of this article. The column on the right in Exhibit A titled "Primary and Secondary Data" is an addition to the original table; it shows when primary and secondary data are used for the various levels of market studies.

The "Primary and Secondary Data" column in Exhibit A presents a brief statement about the nature and use of the primary and secondary data in regard to each "Work Item" and "Level of Study." The significance of Exhibit A is that it displays how primary and secondary data enter the appraiser's analysis at many different levels of study and in each of the three valuation approaches. Therefore, it is important for the appraiser to appreciate the concern this article expresses for the errors, issues, and problems that can arise in gathering and using both primary and secondary data.

Sampling and Non-Sampling Errors

Statisticians use two terms to discuss the full extent of the issues related to handling data: "sampling" and "non-sampling" error. When primary data is generated by either observation or questioning, the resulting data contains whatever bias and error arose in the process of data gathering. The questioning process can produce both sampling and non-sampling errors. These errors are present in the data when they are summarized and presented as secondary data.

Sampling Error

Sampling error raises statistical issues surrounding the sample selection. It occurs when the sample chosen by the analyst does not accurately reflect the total group (the population) being studied. In a typical appraisal assignment, this would be the error arising when the best comparable properties are not selected for the appraisal or the survey of competition or when an inferior (inaccurate) population forecast is chosen for the study. In a statistical sense, sampling error happens because the sample is not a random sample of the population, i.e., each element of the population does not have an equal chance of being selected. This error could arise if the population is stratified but the sample represents only one or some of the strata. For example, if the true population is bimodal with regard to age, but the sample contains only the young, there has been a sampling error.
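The bimodal age example can be illustrated with a minimal sketch. The ages below are invented for illustration, and the proportionate draw from both strata is only a simple stand-in for a sample that represents the whole population; neither comes from the article.

```python
from statistics import mean

# Hypothetical bimodal population: two age clusters of equal size.
young_stratum = [22, 24, 25, 26, 28] * 10
older_stratum = [62, 64, 65, 66, 68] * 10
population = young_stratum + older_stratum

true_mean_age = mean(population)

# A sample that reaches only one stratum misstates the population.
young_only_sample = young_stratum[:10]
young_only_mean = mean(young_only_sample)

# A sample that draws equally from both strata (the strata are equal
# in size here) reflects the population's structure.
balanced_sample = young_stratum[:10] + older_stratum[:10]
balanced_mean = mean(balanced_sample)

print("True mean age:", true_mean_age)
print("Young-only sample mean:", young_only_mean)
print("Balanced sample mean:", balanced_mean)
```

In this illustration the young-only sample understates the average age by 20 years, and no amount of care in questioning those respondents would correct it, because the error lies in who was selected.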

Non-sampling Error

Non-sampling error arises from problems during the observation or questioning phase of primary data gathering. Five general types of non-sampling error could arise in this phase: frame error, measurement error, sequence bias, interviewer bias, or non-response bias.

* Frame error occurs when the list that the analyst generates to represent the population omits certain individuals whose opinions, attitudes, or other characteristics will not otherwise be represented. For example, a telephone survey cannot contact people who do not have access to a telephone. An email survey cannot contact people who do not have access to a computer. Frame error can also occur when physical entities are considered. An appraiser might omit the best comparable from the sample of comparable properties used in the appraisal, thus introducing frame error.

* Measurement error (response error) arises when the individuals who respond to the questions give information that is not true. It also occurs when the analyst misinterprets observable facts. For example, if the physical condition of a structure is actually worse than the analyst believes, the measure of deterioration used in the appraisal analysis is incorrect. If the total number of shoppers in a store is misinterpreted as the number of buyers (the true customers), the estimate of total sales volume and of value would be overstated.

* Sequence bias occurs when the order of the questions on a questionnaire or in an interview suggests or induces an idea or opinion in the mind of the respondent as a direct consequence of the manner in which the questions are sequenced.

* Interviewer bias occurs because of the presence or influence of an interviewer in a face-to-face or telephone interview. The interviewer might unknowingly bring out an untrue response to sensitive questions, e.g., the respondent may craft an answer to please the interviewer instead of answering truthfully or the interviewer might record a verbal response incorrectly because the statement is interpreted with the interviewer's bias. Interviewer bias can also occur if the interviewer asks questions that are designed to generate a given response, i.e., leading questions.

* Non-response bias occurs because individuals in the sample do not respond even though the analyst tried to contact them, when individuals who are contacted refuse to participate, or when individuals do not answer certain questions.

Sampling or non-sampling errors can underlie the generation of primary data. These errors then are carried forward when the data is summarized and presented as secondary data. Such errors are often undetectable when the secondary data is employed and can cause errors in the analysis. For example, a traffic count could be underestimated because of a mechanical malfunction of the counting machine. The traffic count is then recorded into the database, and subsequent users of this data rely on an underestimated traffic volume. This error could result in failure to widen a road if those making the decision to do so base their justification on an erroneous traffic count.

Bias and Errors in Secondary Data

Primary data is gathered to provide a body of descriptive statistics such as the U.S. Census publications and local traffic counts and made available to others. Primary data is gathered to answer specific questions that are not necessarily the same ones being addressed in an appraisal or market analysis. Very often the underlying purpose of the primary data, subsequently presented as secondary data, is not known. This can present a problem for the appraiser as shown in the following discussion.

Secondary data might be available at no cost to the public or obtained by membership in a trade organization or through a subscription to its publications. Often private data sources provide summarized versions of raw data collected by organizations involved in various types of research. These groups generate and make available the data as a by-product of their work. In other instances, the private proprietary organizations compile and market a database specifically to supplement secondary data available from public sources.

Secondary data is often available from both the original source, which collects and organizes the data, and from sources that simply summarize data collected by others and market the information. For example, the original source of secondary data for population characteristics is the U.S. Census of Population. When data is obtained directly from census publications, all the backup information is provided about data collection techniques and statistical methodologies used, possible inaccuracies, and other valuable background inputs. A detailed breakdown of all population characteristics for numerous geographical subdivisions is also available. However, an analyst might find it more convenient to use summarized versions of this population data presented in the Statistical Abstract of the U.S., which does not collect and organize, but simply distributes the secondary data. Organizations that perform the task of summarizing and distributing data are referred to as secondary sources of secondary data. The analyst should be aware of the advantages and disadvantages of using original sources of secondary data and of obtaining data that has been summarized by others.

Secondary data exposes the analysis in which it is used to a variety of possible errors and bias, but precautions are available to deal with them. The analyst will not be able to remove or overcome some of the errors, but knowledge of their existence will help in drawing informed conclusions and establishing some level of confidence in the judgments that result.

The analyst strives for accuracy by reducing the error in his or her investigation. Four categories of potential error that can reduce accuracy in secondary data are

1. Sampling and non-sampling errors (Discussed in a previous section)

2. Errors that invalidate the data

3. Errors that require data reformulation

4. Errors that reduce reliability

Secondary data should be checked for errors to verify its accuracy. If such validation cannot be accomplished, then secondary data should be regarded as suspect. Whenever it is possible, visual and/or statistical techniques should be employed to eliminate errors or to explicitly take them into account. Before moving to a discussion of these errors, a discussion of several related concepts is appropriate.

Issues Concerning Secondary Data

Both primary and secondary data should be accurate, reliable, precise, unbiased, valid, appropriate, and timely.

Accuracy. The accuracy of the data used in an analysis should be checked. The analyst generally collects primary data from a sample and, in specifying and selecting the sample, the analyst must be careful to reflect the nature of the true population accurately. This same principle underlies the use of secondary data; it must accurately reflect what is being studied. Accurate data reflects the true population parameter. A way of envisioning the issue of accuracy is to imagine a target that looks like Exhibit B, a bull's-eye with cross hairs. The intersection of the cross hairs at the center of the bull's-eye is the true population parameter. Data is accurate when several estimates of the population parameter are centered on the intersection of the cross hairs. In Exhibit B, the pattern of the asterisks (*) is accurate; the pattern of the number signs (#) is not accurate; it does not reflect the population's true parameters.

Reliability. (7) Reliability refers to reproducibility or replication of estimates. If the analyst measures the same variable several times, the data is reliable if the estimates are approximately the same. In other words, if the analyst draws two or more samples from the same population and the results are close, there is reliability in the sampling process. If the analyst uses two or more techniques to measure the same value (e.g., population), and the estimates are close together, the estimates can be considered reliable. The reliability of the sales comparison approach would be enhanced if the appraiser drew two sets or groups of comparable properties, made the necessary adjustments, and obtained approximately the same value indication. The reliability of the cost approach would be enhanced if the reproduction cost estimate from the Marshall and Swift Manual (adjusted as necessary even for location and time) is similar to the estimate obtained from local market sources.

In Exhibit B, the pattern of the asterisks represents reliable data because the pattern is tight. The pattern of the number signs is not reliable. The pattern of the dollar signs is also reliable, but it is not accurate.

To highlight the significance of these concepts, consider the following discussion. When the sales comparison and the cost approaches yield markedly different value indications even after a thoughtful and thorough reconciliation, there is a problem of reliability. If the sales comparison approach and the cost approach yield the same value indication but they do not correctly evaluate the effects of the subject property's location, the value indications can be reliable but not accurate.

Bias. Bias is the deviation of a statistical estimate from the true parameter the statistical procedure is designed to estimate. It is systematic error introduced into an analysis by the failure to follow proper procedure or by other errors in the database. The analyst strives for unbiased estimates. In Exhibit B the pattern of dollar signs is biased. It is a tight, reliable pattern that is not accurate. Some form of bias error pushes the pattern off to the left and up.
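The distinction among accuracy, reliability, and bias can be made concrete with a small sketch of the bull's-eye. The coordinates are invented to mimic the three patterns described for Exhibit B, with the true population parameter placed at (0, 0). The two measures, distance of the pattern's center from the target and average scatter around that center, are one simple way to quantify the ideas and are an assumption of this sketch, not a method given in the article.

```python
from math import hypot
from statistics import mean


def centroid(points):
    """Center of a pattern of estimates."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return mean(xs), mean(ys)


def bias(points):
    """Distance from the pattern's center to the true parameter at (0, 0)."""
    cx, cy = centroid(points)
    return hypot(cx, cy)


def spread(points):
    """Average distance of the estimates from their own center."""
    cx, cy = centroid(points)
    return mean([hypot(x - cx, y - cy) for x, y in points])


# Hypothetical patterns (illustrative coordinates only).
asterisks = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]    # tight, on target
dollars = [(-2.9, 3.0), (-3.1, 3.0), (-3.0, 3.1), (-3.0, 2.9)]    # tight, up and left
numbers = [(3.0, 1.0), (-1.0, 4.0), (2.0, -3.0), (4.0, 2.0)]      # scattered

for name, pattern in [("*", asterisks), ("$", dollars), ("#", numbers)]:
    print(f"{name}: bias={bias(pattern):.2f}, spread={spread(pattern):.2f}")
```

A small spread corresponds to reliability and a small bias to accuracy. The dollar-sign pattern shows that the two are separate properties: its spread is as small as that of the asterisks, yet its center sits well away from the target.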

Validity. Validation is the process of checking to make sure proper procedures were followed in collecting, organizing, and analyzing the data. Data that has been validated is considered more accurate because more is known about its origin and characteristics. Consequently, more confidence can be placed in the use of validated data. An example of validation is the process of review appraisal, either internal or external to the firm. The array of comparables is checked to see that the best are used in the analysis and adjustments to the comparables are checked for appropriateness. Initial figures and growth rates in a discounted cash flow analysis (DCF) are checked for substantiation from market evidence.

Appropriateness. The analyst is also concerned that the data is appropriate. It must measure what it is supposed to measure; the sample must be taken from the correct population. Rent comparables are not sales comparables. Neighborhood shopping centers are not direct competitors of regional malls. The most egregious form of this error is boilerplate in a report. For example, the study focuses on residential or retail sales in a specified neighborhood, but the analyst provides data for the metropolitan area or the state.

Timeliness. The data must reflect the time period that governs the analysis. If current sales are the issue, 1990 income levels and customer counts are not timely in a 2002 study. Historical boilerplate included in many studies is a form of this error. A study focuses, for example, on the sales potential of the shopping center in the near term from 2002 through 2010, but the data reported in the study includes retail sales back to 1970.

Errors That Can Invalidate the Data

Secondary data might be contaminated and rendered invalid for use because of actions or attitudes of the person(s) or the orientation of the organization assembling the data. Data might reflect manipulation; contamination by ineptness, confusion, or carelessness; or concept error.

Manipulation. The organization gathering the data might manipulate or reorganize the data to meet a purpose that is unknown to others. The data could have been reorganized so that the collecting agency could show that its organizational goals were met. Similarly, the data might be manipulated to generate adverse conclusions about situations that the collecting agency opposes. If any such manipulation occurs, or even if there is a reasonable suspicion that it has occurred, the data should not be used.

This situation could occur if a city or county agency such as the building inspection department shows inspection dates and approvals in its records, but the inspections were not made or were made only superficially. The volume of inspections makes the department look good on paper, but the work was not done. In another situation, the county does not have the funds to widen a road, so the agency deliberately undercounts the traffic volume.

Contamination by Ineptness, Confusion, or Carelessness. Organizations might collect, organize, and distribute data without properly specifying the particulars of the collection process, their data assembly procedures, or any data synthesis that was used. They also might not care about the data's quality and validity. The organization's staff may not know how to collect data. Whenever ineptness, confusion, or carelessness is suspected, the analyst should not use the data. This situation can occur in organizations that have a well-defined primary function and collect data only as a secondary activity. In such situations, the necessary care might not be given to the data collection activities. A hint to the existence of this problem arises when the appraiser asks for the data and the organization cannot find it or when the staff takes an inordinate amount of time to retrieve the data. Disorganization in the office might also suggest ineptness, confusion, or carelessness. For example, if the office staff is not sure of just who handles the data, the analyst might want to look for the data from another source.

Concept Error. Another broad class of error that can invalidate data is called concept error. Data containing concept error can still be used, however, if the analyst can obtain information about the nature of the error. Concept error is defined as the error that arises because of the difference between the concept to be measured and the indicator, or specific item, that is used to measure that concept. Market analysis is replete with indicator variables that are surrogates for the data that the analyst cannot obtain. For example, the analyst may be seeking information about household income, which includes wages, salaries, rental income, interest income, and dividends. The indicator used to measure household income might report only wage and salary data. In this case, the indicator contains a large component of household income but does not include all sources of income that the household can receive. Use of this indicator variable might cause only a small error in a neighborhood in which all households are headed by wage earners, but it could cause a large error in a retirement community.
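The household income example can be sketched numerically. The dollar amounts below are hypothetical and serve only to show why the same wage-and-salary indicator produces a small error in one neighborhood and a large one in another.

```python
def wage_only_shortfall(wages: float, other_income: float) -> float:
    """Share of total household income missed by a wage-and-salary indicator."""
    total_income = wages + other_income
    return other_income / total_income


# Hypothetical household headed by a wage earner: most income is wages.
wage_earner_shortfall = wage_only_shortfall(wages=60_000, other_income=3_000)

# Hypothetical retiree household: most income is interest, dividends, and rent.
retiree_shortfall = wage_only_shortfall(wages=5_000, other_income=45_000)

print(f"Wage-earner household: indicator misses {wage_earner_shortfall:.1%}")
print(f"Retiree household: indicator misses {retiree_shortfall:.1%}")
```

Under these assumed figures the indicator misses under 5% of income for the wage-earner household but 90% for the retiree household, so the size of the concept error depends on where the indicator is applied.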

Concept error also can occur if the analyst is looking for information on the number of families that reside in a particular market area. If the analyst is seeking information about groups of individuals who are related by blood, marriage, or adoption and uses the indicator variable "households," the indicator overstates the number of families because it includes "primary individual households" (single person households) in addition to "family households." Although there is a distortion because of the concept differences in this case, adjustments can be made to bring the indicator data closer to the concept. The analyst may discover that the average family household comprises 3.5 persons and that 92 percent of all households are families. Adjustments could be made based on these figures to make the indicator data more useful. However, the nature of the adjustment and the extent of any error in the adjustment should be identified so that others may judge the reliability of the conclusions.
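The adjustment described above is simple enough to write out. The 92 percent family share and the 3.5-person average come from the text; the count of 10,000 households and the function name are hypothetical.

```python
def households_to_families(households: int, family_share: float,
                           avg_family_size: float) -> tuple[int, int]:
    """Convert a household count into estimated families and persons in families."""
    families = round(households * family_share)
    persons_in_families = round(families * avg_family_size)
    return families, persons_in_families


# Hypothetical market area with 10,000 households.
families, persons_in_families = households_to_families(
    households=10_000, family_share=0.92, avg_family_size=3.5
)

print("Estimated families:", families)
print("Estimated persons in family households:", persons_in_families)
```

Reporting the share and the average alongside the result lets a reader of the study see how the indicator was adjusted and judge the adjustment for themselves.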

In another example, the analyst could be seeking information about the cost of living in a particular market area to find out the "real" expenditures for goods and services by households residing in that area. If the indicator variable used is the national consumer price index (CPI) data, the indicator may not accurately represent local expenditures. The cost of filling the market basket for the typical household nationwide may not be the same as the cost paid to fill that same basket by the typical household residing in the market area. Moreover, there may be a substantial difference between the typical national household and the typical local household.

An error can result from an indicator variable that does not grasp the complexity of the concept variable. A minimum number of customers in a trade area may have been postulated for the success of a new store. However, the critical issue is not the number of customers; rather, it is the purchasing power directed toward the product or service being provided by that store. The 6,000 people living in the store's trade area may be irrelevant as a success criterion if the

* Tastes and preferences of the people do not favor the product or service of the store

* Preference patterns of the people change over time

* Income level of the people changes

* Prices of substitute products change relative to the people's income level

* Price of the product or service changes relative to the people's income level

* Price of substitute products changes relative to the price of the product being offered

Finally, market analysts sometimes try to gauge the purchasing power in a retail trade area by multiplying the number of households by the median household income. Here the appropriate income measure is the mean household income. Median income distorts the measure of purchasing power in an income distribution that is skewed.
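
A small numerical sketch may make the point concrete. The seven household incomes below are hypothetical and deliberately skewed to the right; only the comparison of the mean-based and median-based totals is of interest.

```python
from statistics import mean, median

# Hypothetical household incomes for a small, right-skewed trade area.
incomes = [30_000, 35_000, 40_000, 45_000, 50_000, 60_000, 250_000]

households = len(incomes)
true_total = sum(incomes)                     # actual aggregate income
mean_based = households * mean(incomes)       # reproduces the aggregate
median_based = households * median(incomes)   # understates it when skewed right

print(true_total, round(mean_based), median_based)
```

With these figures the median-based estimate falls well short of the aggregate income, while the mean-based estimate matches it.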

Concept error can, but does not necessarily, invalidate the data and the analysis. The analyst may decide to use the data even though concept error is present and handle it with a variety of techniques. The decision to use the data depends on the following considerations.

* The size of the discrepancy between the concept and the indicator. If the size of the discrepancy is small and the indicator responds similarly to the same causal factors that affect the concept, the data could be used.

* The purpose of the analysis. An exploratory study is able to tolerate larger errors than a study that is designed to test fairly explicit hypotheses.

* The availability of valid or accurate data. If accurate data exists, it should be used. But this statement must be tempered by a recognition of the cost of accurate data and the time constraints under which a study is being made. If accurate data is costly and cannot be obtained in a timely manner, the analyst might decide to use data that contains some degree of concept error. In such instances, the analyst should understand the nature of the concept error as well as its magnitude and direction.

In any event, the analyst needs to be aware that concept error may be present in the data used for the analysis. In too many instances the existence of concept error is never realized and its possible effect on the study is never considered.

Errors That Require Data Reformulation

Secondary data is sometimes not directly useful to the analyst because it does not adequately measure the concept being studied. Errors commonly result from the following four types of situations:

* Changing circumstances

* Inappropriate transformations

* Inappropriate temporal extrapolations

* Inappropriate temporal recognition

Errors Caused by Changing Circumstances. This type of error is caused by a change that affects a data series but is not readily apparent in that data series. For example, a change in the geographic boundaries of a Metropolitan Statistical Area (MSA) can occur when the Bureau of the Census adds a county to the MSA. If the geographic size of an MSA increased between 1990 and 2000 because a new county was added in 2000, MSA population statistics for each of these years would not be comparable until the analyst added the county to the 1990 MSA figure. In reality, the population growth figure from 1990 to 2000 reflects a growth in population of the 1990 spatial area and the addition of the population in the county that was added to the MSA in 2000. Since this additional county's population did not appear in 1990, its inclusion in the population figure for 2000 exaggerates the real growth rate in population.
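
The effect of the boundary change can be illustrated with a short calculation. All population figures below are hypothetical; the sketch only shows how adding the county's 1990 population to the 1990 MSA figure restores comparability.

```python
def growth_rate(start, end):
    """Simple percentage change between two counts, expressed as a fraction."""
    return (end - start) / start


# Hypothetical figures, for illustration only.
msa_1990_old_boundary = 1_000_000   # 1990 population, MSA as defined in 1990
added_county_1990 = 100_000         # 1990 population of the county added in 2000
msa_2000_new_boundary = 1_265_000   # 2000 population, including the added county

apparent = growth_rate(msa_1990_old_boundary, msa_2000_new_boundary)
comparable = growth_rate(
    msa_1990_old_boundary + added_county_1990, msa_2000_new_boundary
)

print(f"apparent growth: {apparent:.1%}, comparable growth: {comparable:.1%}")
```

In this illustration the unadjusted comparison shows growth of 26.5 percent, while the comparison on a constant geographic area shows 15.0 percent.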

A change in the underlying unit of measurement can occur. For example, a data series that presented monthly statistics is now presented on a bimonthly basis. For the sake of consistency, the analyst might need to choose one or the other presentation format. The analyst must either combine previous monthly data into bimonthly groupings as currently used or split the current bimonthly data into monthly statistics.
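
As a minimal sketch of the first option, the function below combines monthly figures into two-month groupings. It assumes that "bimonthly" means every two months and that the series is a flow, such as sales, that can be summed; the sales figures and the function name are hypothetical.

```python
def monthly_to_bimonthly(monthly):
    """Sum consecutive pairs of monthly totals into two-month totals."""
    if len(monthly) % 2 != 0:
        raise ValueError("need an even number of monthly observations")
    return [monthly[i] + monthly[i + 1] for i in range(0, len(monthly), 2)]


# Hypothetical monthly sales for one year under the old reporting format.
old_monthly = [10, 12, 11, 13, 15, 14, 16, 18, 17, 15, 13, 12]
# Hypothetical two-month totals for the following year under the new format.
new_bimonthly = [23, 25, 30, 35, 33, 26]

consistent_series = monthly_to_bimonthly(old_monthly) + new_bimonthly
print(consistent_series)
```

Splitting bimonthly data into monthly figures, by contrast, requires an assumption about how the total is distributed between the two months, so combining is usually the safer direction.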

The unit of measurement could also have changed because of a shift in the collection time period. If sales that were initially measured from September 1 to August 31 of the following year are now given for the calendar year, there is a point in the data where a 9-month unit of measure is followed by a 15-month unit of measure. This could appear as a spike in the data that does not really exist.

An error can arise because the concept being measured is redefined over time and across space. An example is the gross leaseable area in an office building. It may have been presented as an interior measure in the past but, due to space shortages, current standards use the exterior dimensions of the structure. Moreover, the concept of net leaseable area can also change with local customs regarding space and its inclusion in or exclusion from the common-area definition.

Errors That Arise from Inappropriate Transformations. Original data is often presented in secondary data sources in categories that were created to make the data more presentable in a tabular format, or the original categories do not reflect the analyst's needs to handle the task at hand. Moreover, data may be presented as a ratio that made sense for their original purpose but do not make sense in the context of the analyst's current study. To make this type of error more specific, consider the following situations.

The indicator variable can use the wrong base measurement. For example, the data might be housing occupancy costs (utility payments, insurance, property tax) per capita when in fact housing occupancy costs are more appropriate on a household basis. Another example would be educational expenses per capita when the more appropriate measure would be educational expenses per student. In each of these examples, the per capita base was appropriate for the local government's study but is not the most appropriate measure for the real estate analyst's current study.

Secondary data can be presented in groupings such as household income, which distribute the population characteristic. The categories can change from one reporting to another. For example, the top income category in a series might have been $75,000 and up; then a decade later, the top income category might be $100,000 and up. In this instance, a new category or categories will have to be created for the $75,000 to $100,000 range. This classification problem can exist within a data series and/or between data series. Sometimes the analyst can reassemble one set of categories to resemble the categories or classification scheme used in another data series. Inappropriate base measures or inappropriate and possibly changing classification of data result in data that does not perfectly provide the information needed to resolve the analyst's problem. Unless the analyst transforms the data, the analysis can be flawed.
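
Reassembling categories can be sketched as follows. Here the later report's two top categories are merged so that it matches the earlier report's "$75,000 and up" category; the household counts, the lower category labels, and the function name are hypothetical.

```python
def merge_categories(counts, to_merge, merged_label):
    """Return a copy of counts with the listed categories summed into one."""
    merged = {label: n for label, n in counts.items() if label not in to_merge}
    merged[merged_label] = sum(counts[label] for label in to_merge)
    return merged


# Hypothetical household counts by income category.
earlier = {"under $50,000": 6000, "$50,000-$74,999": 2500, "$75,000 and up": 1500}
later = {
    "under $50,000": 5500,
    "$50,000-$74,999": 2800,
    "$75,000-$99,999": 1200,
    "$100,000 and up": 900,
}

later_regrouped = merge_categories(
    later, ["$75,000-$99,999", "$100,000 and up"], "$75,000 and up"
)
print(later_regrouped)
```

Merging toward the coarser scheme loses detail but requires no assumption about how households are distributed within a category, which splitting the earlier top category would.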

Errors from Inappropriate Temporal Extrapolations. Secondary data often is not available for the intervening periods (months, quarters or years) between published reports. Data for intervening period(s) has to be interpolated from the two nearest reporting years. An example is the situation in which the analyst needs population information for a specific census tract for 1998, but the secondary data exists for only 1995 and 2000. Using only these two points, the interpolation for 1998 can be made as a straight line or as an exponential rate of change at an increasing or decreasing rate. Without knowing the true path of change between these two points, any one of three answers can be obtained for the 1998 figure. Typically, the interpolation would be made using an average, annual, straight-line rate of change between 1995 and 2000.
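
The difference between two of these paths can be shown with a short sketch. The tract populations for 1995 and 2000 are hypothetical, and the second function assumes a constant compound annual rate of change, which is only one of the possible non-linear paths.

```python
def linear_interp(year0, value0, year1, value1, year):
    """Straight-line interpolation: equal absolute change each year."""
    return value0 + (value1 - value0) * (year - year0) / (year1 - year0)


def constant_rate_interp(year0, value0, year1, value1, year):
    """Interpolation assuming a constant compound annual rate of change."""
    return value0 * (value1 / value0) ** ((year - year0) / (year1 - year0))


# Hypothetical census tract populations.
pop_1995, pop_2000 = 4000, 5000

straight_line_1998 = linear_interp(1995, pop_1995, 2000, pop_2000, 1998)
constant_rate_1998 = constant_rate_interp(1995, pop_1995, 2000, pop_2000, 1998)

print(round(straight_line_1998), round(constant_rate_1998))
```

With a growing population the constant-rate estimate falls below the straight-line estimate, so the choice of path changes the 1998 figure even though both paths pass through the same two known points.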

The shape of the curve showing exponential change can be estimated by analyzing another related data series in which the variable moves in approximately the same direction and at the same magnitude. For example, employment data exists for each year between 1995 and 2000 and there is a stable relationship between employment and population. Here the known pattern of change in employment can reflect the unknown pattern of change in population from 1995 to 2000.
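
One way to apply this idea is to distribute the known change in population across the years in proportion to the cumulative change in the related series. This is a sketch under the stated assumption of a stable relationship between the two series; the employment and population figures are hypothetical.

```python
def interpolate_with_indicator(value_start, value_end, indicator):
    """Spread the change in a series across periods in proportion to the
    cumulative change in a related indicator series (first to last entry)."""
    total_change = indicator[-1] - indicator[0]
    if total_change == 0:
        raise ValueError("indicator shows no net change; cannot apportion")
    return [
        value_start + (value_end - value_start) * (x - indicator[0]) / total_change
        for x in indicator
    ]


# Hypothetical annual employment, 1995 through 2000.
employment = [2000, 2020, 2060, 2150, 2300, 2500]
# Hypothetical tract population known only for 1995 and 2000.
population = interpolate_with_indicator(4000, 5000, employment)

years = range(1995, 2001)
print(dict(zip(years, (round(p) for p in population))))
```

Because most of the employment growth in this illustration occurs late in the period, the 1998 population estimate is 4,300 rather than the 4,600 that a straight-line interpolation between the same endpoints would give.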

Errors from Inappropriate Temporal Recognition. The most common error of this type arises from a misunderstanding of the time dimension of the secondary data. The 2000 census reports information as of April 1, 2000, for almost all of the variables. The most notable exception is the income variable, for which data is gathered as of 1999. Very often this data is used as 2000 income, which can either overestimate or underestimate the true income for the individual and the market area.

An even more glaring error is the use of data in the year of its publication instead of the year for which and during which it was gathered. There is always a time lag between the time the primary data is gathered and the time it is made available. A 2002 publication more than likely contains data gathered at an earlier date such as 2001; in some instances the time lag can be even longer.

Errors That Reduce Reliability

A data set is reliable if successive counts produce the same result. Reliability is not accuracy; the data set is accurate only if it is free from procedural and measurement errors. An inaccurate data set can be reliable if it maintains the same degree of inaccuracy.

The reliability of data is a function of the organization that gathers, organizes, records, and publishes the secondary data. Several issues should be considered when evaluating the organization that is collecting and disseminating the data such as whether data collection is the stated purpose of the organization or merely a secondary or adjunct function. Another issue is whether the individuals and staff that undertake data collection are trained and experienced in data-collection procedures. The analyst should also determine if the organization has adequate resources to do a thorough job.

Errors causing the analyst to question the reliability of the data fall into three categories: clerical, changes in collection procedures, and failure to use correct data.

Clerical Errors. Clerical errors are a frequent occurrence, and they happen to the most careful people. To detect the existence of clerical errors, the data might be displayed in an easily comprehended manner (e.g., a scatter plot diagram or a simple table). In this way, outliers (data entries that are substantially different from the rest of the data set) can be detected more easily. This procedure will allow the analyst to catch the misplaced decimal, the added zero, or the extra digit.

Another error of this type is the transposition of numbers in a series of numbers with the same number of digits. The true number 2907 can become 2097 or even 9270. A plot of the values will allow the analyst to catch this outlier error.
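
Where a plot is impractical, a simple numerical screen can flag the same entries. The sketch below is not a procedure from this article; it uses a modified z-score based on the median absolute deviation, with a commonly used cutoff of 3.5. The data values are hypothetical and include an added zero (29070) and a transposition (2097) of the true value 2907.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the positions of entries whose modified z-score exceeds threshold."""
    med = median(values)
    deviations = [abs(v - med) for v in values]
    mad = median(deviations)  # median absolute deviation
    if mad == 0:
        return []  # no spread to measure against
    return [
        i for i, dev in enumerate(deviations)
        if 0.6745 * dev / mad > threshold
    ]


# Hypothetical series with two clerical errors.
series = [2890, 2907, 2915, 2898, 29070, 2921, 2097, 2903]
suspect_positions = flag_outliers(series)
print([(i, series[i]) for i in suspect_positions])
```

A flagged entry is only a candidate for review; the analyst still has to check it against the source before treating it as a clerical error.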

Error Due to Changes in Collection Procedures. When error results from a change in collection procedures, the data generated may be quite different from previous data in the same data set. This error can arise because of different methods of collection or different circumstances surrounding the collection. For example, the time of collection (time of day, day of week, season, year, etc.) might have changed. The manner in which the data is summarized might also change. The use of the scatter plot or a simple review of the raw data could reveal discontinuity or a jump in the data points attributable to the change in the collection procedures.

Error Due to Corrected Data. Data can be inconsistent from one report to another in the same published series because of errors that have been discovered, corrected, and then reflected in subsequent versions of the data set. Most often these are clerical errors. The analyst needs to use the most recent version to reduce errors. Also, if possible, the analyst should know when data is checked and when a clean version of that data is printed. When using secondary data that has been reorganized at some point, always check it against the newest versions of that data set. Another situation of data series correction occurs when secondary data vendors are able to calibrate their prior estimates or forecasts for a decennial year such as 2000 against actual Census numbers for 2000. The previous estimates or forecasts are adjusted and corrected in subsequent publications. Here the same advice is offered: always check the data against the newest versions of the data set.

Interesting Questions

Is information about comparable properties primary or secondary data?

Property sales in the form of deeds are recorded in the county clerk's office to provide constructive notice of ownership; this is an example of primary data. When a sale price extrapolated from transfer tax stamps is used for a comparable property, it is secondary data. The information about the attributes or characteristics of a property that a listing agent places on a listing form to describe the property is primary data. When the appraiser uses that same data from a multiple listing service (MLS) database to select a comparable property, the data becomes secondary data. When the appraiser inspects the subject property, measuring its size; describing its shape, topography, structural components, and characteristics; and inspecting its interior design and condition, primary data is being gathered. When that same data is placed into a database, it becomes someone else's secondary data. When the appraiser inspects a prospective comparable property and specifies its attributes, characteristics, and condition, it is primary data. When this information is used in a subsequent report for another subject property, the result is secondary data.

Is all secondary data the same?

There are primary sources of secondary data and secondary sources of secondary data. Census publications are primary sources of secondary data. When this secondary data is taken into a process that augments, modifies, summarizes, synthesizes, updates, or in any way manipulates the data, the output of that process is a secondary source of secondary data. All private vendors who sell updated census information for any spatial area are secondary sources of secondary data. The Chamber of Commerce data is a secondary source of secondary data. The significance of this distinction is simply that each time data passes through a process, the chance of error increases.

Summary

This article deals with issues the appraiser and the analyst must face when obtaining and using data in an appraisal or a real estate market analysis. The first task is to decide whether to use primary or secondary data. That decision is often based on the type of data needed for the study and whether it is available from secondary sources. The characteristics of the two types of data and the collection issues associated with each need to be recognized. Then the data must be examined to determine if it is sufficiently accurate, reliable, unbiased, valid, appropriate, and timely for the analyst's use. Finally, the types of errors found in data, the risks they present, and the options for handling them must be weighed. These topics as well as methods of determining the adequacy of secondary data are discussed in this article.

(1.) The Appraisal Institute, The Dictionary of Real Estate Appraisal, 4th ed. (Chicago: The Appraisal Institute, 2002), 219.

(2.) Ibid., 259.

(3.) Appraisal Institute, The Appraisal of Real Estate, 12th ed. (Chicago: The Appraisal Institute, 2001), 135.

(4.) Ibid., 438.

(5.) Gilbert A. Churchill, Marketing Research: Methodological Foundations, 3rd ed. (Chicago: The Dryden Press, 1983), 740-741.

(6.) Stephen F. Fanning, Terry V. Grissom, and Thomas D. Pearson, Market Analysis for Valuation Appraisals (Chicago: The Appraisal Institute, 1994).

(7.) The term precision is often used as a synonym for reliability.

Joseph S. Rabianski, PhD, is the chairman of the department of real estate and the Richard Bowers & Company professor of real estate at Georgia State University in Atlanta. He has taught graduate and undergraduate classes in real estate market analysis for three decades. He was co-author of the Appraisal Institute's first market analysis course in 1982. He has been an approved instructor of five courses and a developer of several seminars in the 1980s and early 1990s for the Appraisal Institute. His articles have appeared in previous issues of The Appraisal Journal and he is a co-author of an Appraisal Institute text, Shopping Center Appraisal and Analysis. Contact: Department of Real Estate, P.O. Box 4020, Georgia State University, Atlanta, GA 30302-4020. (404) 651-4609; fax: (404) 651-3396; email: jrabianski@gsu.edu.

Exhibit A

Levels of Market Analysis

                                          Level of Study
Work Item                    A         B         C         D

Location

General description--        X         X         X         X
  city and neighborhood

Specific analysis of                   X         X         X
  site linkages

Specific analysis                      X         X         X
  of urban growth
  determinants

Detailed competitive                             X         X
  location rating

Detailed probable future                                   X
  land use analysis

Demand Analysis

General evidence of          X         X         X         X
  sales/leasing activity

General city growth          X         X         X         X
  trends

Analysis of overall                    X         X         X
  market absorption from
  secondary sources

Demand forecast by                               X         X
  specific projections
  of population,
  employment, and
  income

Demand forecast for                              X         X
  subject market
  segment

Direct attitudinal                                         X
  survey of target
  market

Competitive Supply
Analysis

Vacancy rates for                      X         X         X
  selected comparables

Vacancy rate from                                X         X
  secondary data--broad
  market surveys

Field research on all                            X         X
  competitive properties

Research on proposed                             X         X
  properties--field
  inspection, building
  permit analysis,
  identification of
  potential sites

Detail competitive                               X         X
  amenities rating

Direct interviews                                          X
  with developers

Highest and Best Use
  Conclusion and
  Marketability or Timing

Vacant Land

Probable use and timing,     X
  but no specific timetable
  for development

Generalized land use plan

* Probable use supported               X
  by present value analysis

* Timing supported by                  X         X         X
  secondary data

Specific land use plan

* Probable use supported by            X         X         X
  present value analysis

* Land plan drawn to site                        X         X

* Timing based on marginal                       X         X
  demand and competitive
  rating analysis

* Cost estimate of subject                                 X
  development

* Value impact analysis of                                 X
  alternative marketing/
  development strategy

Improved Properties

General ad hoc judgments     X

NOI projection supported by  X         X         X         X
  performance of selected
  comparables

Use, timing, NOI projection            X         X         X
  supported by analysis of
  secondary data

Capture rate/NOI projection                      X         X
  supported by marginal
  demand of market segment
  and competitive ratings

Risk analysis of NOI                                       X
  forecast

Value impact analysis of                                   X
  alternative marketing/
  development strategies

Work Item                    Primary And Secondary Data

Location

General description--        Secondary data in each level of study.
  city and neighborhood

Specific analysis of         Primary data in each level of study
  site linkages                gathered by observation in the form
                               of field research.

Specific analysis            Primary data in each level of study
  of urban growth              gathered by observation in the form
  determinants                 of field research. Secondary data
                               from public and private sources.

Detailed competitive         Primary data in each level of study
  location rating              gathered by observation in the form
                               of field research. Could involve
                               primary data gathered by questioning
                               of market participants and experts.

Detailed probable future     Primary data gathered by observation
  land use analysis            in the form of field research
                               and could involve primary data
                               gathered by questioning of market
                               participants and experts.

Demand Analysis

General evidence of          Secondary data in each level. Level C
  sales/leasing activity       and D studies can be enhanced by
                               primary data gathered by observation
                               of development patterns and
                               questioning of market participants
                               and experts.

General city growth          Secondary data in each level. Level C
  trends                       and D studies can be enhanced by
                               primary data gathered by observation
                               of development patterns and
                               questioning of market participants
                               and experts.

Analysis of overall          Secondary descriptive data can be
  market absorption from       supplemented in level C and D
  secondary sources            studies by primary data gathered by
                               questioning market experts and
                               participants to generate predictions
                               of future absorption phenomenon.

Demand forecast by           Secondary descriptive data and, where
  specific projections         available, secondary predictive
  of population,               data. Primary data in the form of
  employment, and              judgment leading to the prediction
  income                       and primary data gathered by
                               questioning market experts.

Demand forecast for          Secondary descriptive data and, where
  subject market               available, secondary predictive
  segment                      data. Primary data in the form of
                               judgment leading to the prediction
                               and primary data gathered by
                               questioning market experts.

Direct attitudinal           Primary data gathered by questioning
  survey of target             of market participants and experts.
  market

Competitive Supply
Analysis

Vacancy rates for            Primary data gathered by observation;
  selected comparables         secondary data, where available.

Vacancy rate from            Secondary data
  secondary data--broad
  market surveys

Field research on all        Primary data gathered by observation;
  competitive properties       secondary data where available.

Research on proposed         Primary data gathered by observation
  properties--field            of potential sites and by
  inspection, building         observation of public records;
  permit analysis,             secondary data generated from
  identification of            public records where available.
  potential sites

Detail competitive           Primary data gathered by observation
  amenities rating             of the existing and proposed
                               competitive properties; primary
                               data gathered by questioning of
                               market participants and experts.

Direct interviews            Primary data gathered by questioning
  with developers              of developers, public officials,
                               and market experts.

Highest and Best Use
  Conclusion and
  Marketability or Timing

Vacant Land

Probable use and timing,     Primary data gathered by observation of
  but no specific timetable    the subject site and the market area it
  for development              will serve.

Generalized land use plan

* Probable use supported     Primary data gathered by observation of
  by present value analysis    the subject site and the market area it
                               will serve. Secondary data for market
                               norms for the line items in the
                               discounted DCF analysis.

* Timing supported by        Primary data gathered by observation of
  secondary data               the subject site and the market area it
                               will serve. Secondary data for market
                               norms for the line items in the
                               discounted DCF analysis. Secondary data
                               used to predict the appropriate time to
                               bring the development to the market.

Specific land use plan

* Probable use supported by  Primary data gathered by observation of
  present value analysis       the subject site and the market area it
                               will serve. Secondary data for market
                               norms for the line items in the
                               discounted DCF analysis.

* Land plan drawn to site    Primary data gathered by observation of
                               the subject site and the competitive
                               properties.

* Timing based on marginal   Primary data gathered by observation of
  demand and competitive       the subject site and the market area it
  rating analysis              will serve. Secondary data for market
                               norms for the line items in the
                               discounted DCF analysis. Secondary data
                               used to predict the appropriate time to
                               bring the development to the market.

* Cost estimate of subject   Secondary data from cost service manuals
  development                  to estimate land development costs.
                               Primary data gathered by questioning
                               local land developers and contractors
                               about costs.

* Value impact analysis of   Primary data gathered by questioning
  alternative marketing/       potential users regarding marketing
  development strategy         tactics for the property and the target
                               market's evaluation of the structural
                               and design amenities that can be offered
                               by the property.

Improved Properties

General ad hoc judgments     Primary data gathered by observation to
                               obtain a sense of competitive properties
                               in the market. Primary data gathered by
                               questioning about the values for line
                               items in the DCF analysis. Secondary data
                               for operating expense norms. Primary
                               data gathered by questioning market
                               experts and market participants about
                               customer beliefs concerning what the
                               market does offer and what it could
                               offer.

NOI projection supported by  Primary data gathered by observation of
  performance of selected      the subject site and the market area it
  comparables                  will serve. Secondary data for market
                               norms for the line items in the
                               discounted DCF analysis. Secondary data
                               used to predict the appropriate time to
                               bring the development to the market.

Use, timing, NOI projection  Primary data gathered by observation of
  supported by analysis of     the subject site and the market area it
  secondary data               will serve. Secondary data for market
                               norms for the line items in the
                               discounted DCF analysis. Secondary data
                               used to predict the appropriate time to
                               bring the development to the market.

Capture rate/NOI projection  Secondary data from cost service manual
  supported by marginal        to estimate land development costs and
  demand of market segment     the construction cost of the
  and competitive ratings      improvements. Primary data gathered by
                               questioning local land developers and
                               contractors about costs.

Risk analysis of NOI         Primary data gathered by questioning of
  forecast                     market experts about the prospects for
                               rent growth, vacancy rates, sale price,
                               interest rates, and inflation rates over
                               the holding period.

Value impact analysis of     Primary data gathered by questioning
  alternative marketing/       potential users regarding marketing
  development strategies       tactics for the property and the target
                               market's evaluation of the structural
                               and design amenities explicitly offered
                               by the property.