Locating Health Statistics

This guide includes health statistics guides, tools, and resources, as well as related general statistics resources.

Why can't information be found on X?

Short Answer: The statistics you are looking for may not have been collected, or they are not easily located.

Long Answer

Most organizations and agencies collect statistics solely to support their own objectives, which do not always coincide with the needs of the searcher. At times the statistics you are looking for have not been collected at all, or they have been analyzed in a way that does not relate to your research question.* You may need to obtain the original data and perform your own analysis.*** Remember that data collection is done to meet organizational needs, which may not reflect your needs!

Statistical information is generated by a large number of international, national, state, and local entities.

Many entities publish statistics on-line. However, the particular number, answer, or data you need may not be retrievable through search engines. Data contained in Web sites is often hidden because search engines do not search the entire contents of Web pages.

Statistics collected by government agencies and nonprofit organizations are more likely to be publicly available than those collected by for-profit entities. However, not all government statistics are mandated to be made public or provided free of charge. Private organizations often charge for copies of their findings. **

At present, subject access through most indexing sources (such as PubMed and CINAHL) is still developing, so many statistics are challenging to locate because of inadequate subject headings. This is especially true of federal government resources. When possible, see if the reporting agency has the statistics you need. If not, you will need to search the journal literature in a bibliographic database like PubMed. Be sure to read each retrieved article in full for any statistical information it contains. Some topics, such as disability statistics, are challenging to find partly because the terms (such as disability) are open to a wide range of definitions. Again, the lack of good controlled vocabularies hinders data location.

Also note that a source's data collection coverage does not always go back as far as one would wish. For example, US government agencies have only been mandated to collect certain statistics since 1956***. In addition, agencies and organizations are decentralized, and they may vary in how they collect, describe, and report their findings over time. ***

It may take several years for an entity to collect, analyze, and publish statistical information on a topic or group of topics. (Click here for a diagram of the processes involved in publishing health statistics.) This is especially true when large populations are involved, as with the US Census. The quality of statistics varies among organizations and agencies. Factors include how the data was collected and how it was analyzed. **


A strategy for finding statistical information



Please do not hesitate to contact a Mulford or Carlson Reference Librarian with any challenging research question (whether or not it is statistics related).

Here is a strategy that may be useful, including at times when a librarian is temporarily unavailable (such as nights and weekends).

  • Think about what kind of statistics are needed. For example, are they about disease occurrence or epidemiology? Vital statistics? Demographics?
  • Name the variables being addressed, such as a population group, time period, or geographic location.
  • Consider resources (print or on-line) that may contain the needed information.
    • Think about which government or private organization might collect the statistic needed, and search its Web site as well as the library catalog (author search by agency/organization).
    • Also consider secondary sources such as journal literature (for example, through PubMed or CINAHL) and agency reports.
      Many agencies publish statistics in articles. (A list of UT journals and research databases may be found here.)

      If a related statistic is found, locate the Web site of the agency or organization.
      Search that agency's or organization's Web site for the needed information.

      Search any bibliographies at the agency's Web site for articles or reports that may contain the needed information.
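As a rough illustration of the steps above, the sketch below turns the named variables (topic, population group, place, time period) into a PubMed search string and a request URL for NCBI's E-utilities ESearch service. The function names, the example topic, and the choice of the "statistics and numerical data" and "epidemiology" subheadings as a filter are assumptions made for this example; they are not part of the guide, and a librarian may recommend different search terms.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Assumption: these two MeSH subheadings are a reasonable first filter
# for articles that report statistics. Adjust to suit the question.
STATISTICS_FILTER = '("statistics and numerical data"[sh] OR epidemiology[sh])'


def build_query(topic, population=None, place=None, years=None):
    """Combine the variables of a statistics question into one search string."""
    parts = [topic]
    if population:
        parts.append(population)
    if place:
        parts.append(place)
    parts.append(STATISTICS_FILTER)
    query = " AND ".join(parts)
    if years:
        start, end = years
        # Publication-date range, e.g. 2010:2015[dp]
        query += f" AND {start}:{end}[dp]"
    return query


def build_url(query, max_results=20):
    """Return an ESearch URL for the query (no request is sent here)."""
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": max_results}
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical question: disability among older adults in Ohio, 2010 to 2015.
    q = build_query("disability", population="aged", place="Ohio", years=(2010, 2015))
    print(q)
    print(build_url(q))
```

The printed search string can also be pasted directly into the PubMed search box. Writing the variables out this way makes it easier to loosen one at a time (for example, dropping the place or widening the years) when a search returns nothing.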


Why Search Is Not a Solved (by Google) Problem, and Why Universities Should Care: Ophir Frieder’s Talk

Many consider “searching” a solved problem, and for digital text processing, this belief is factually based. The problem is that many “real world” search applications involve “complex documents”, and such applications are far from solved. Complex documents, or less formally, “real world documents”, comprise a mixture of images, text, signatures, tables, etc., and are often available only in scanned hardcopy formats. Some of these documents are corrupted. Some of these documents, particularly those of a historical nature, contain multiple languages. Accurate search systems for such document collections are currently unavailable.

The talk discussed three projects. The first project involved developing methods to search collections of complex digitized documents which varied in format, length, genre, and digitization quality; contained diverse fonts, graphical elements, and handwritten annotations; and were subject to errors due to document deterioration and from the digitization process. A second project involved developing methods to enable searchers who arrive with sparse, fragmentary, error-ridden clues about places and people to successfully find relevant connected information in the Archives Section of the United States Holocaust Memorial Museum. A third project involved monitoring Twitter for public health events without relying on a prespecified hypothesis.

Across these projects, Frieder raised a number of themes:

  • Searching on complex objects is very different from searching the web. Substantial portions of complex objects are invisible to current search. And current search engines do not understand the semantics of relationships within and among objects, making the right answers hard to find.
  • Searching across most online content now depends on proprietary algorithms, indices, and logs.
  • Researchers need to be able to search collections of content that may never be made available publicly online by Google or other companies.


Some areas of science, such as the social sciences, increasingly rely on proprietary collections of big data from commercial sources. Much of this growing evidence base is currently accessible only through proprietary APIs. To meet the heightened requirements for transparency and reproducibility, stewards are needed for these data who can ensure nondiscriminatory long-term research access.

More generally, it is increasingly well recognized that the evidence base of science includes not only published articles and community datasets (and benchmarks), but may also extend to scientific software, replication data, workflows, and even electronic lab notebooks.