Canberra Analytics Group

Record Linkage and Machine Learning
William E. Winkler, Principal Researcher, US Census Bureau

WHEN

11:30-12:30, Friday 15 April 2005

ABSTRACT

Although the terminology differs, there is considerable overlap between record linkage (data cleaning) methods based on the Fellegi-Sunter model (JASA 1969) and the Bayesian networks used in machine learning (Mitchell 1997). Both rest on formal probabilistic models that can be shown to be equivalent in many practical situations (Winkler 2000). When the identifying fields contain no missing data and training data are available, both approaches can efficiently estimate the parameters of interest. When missing data are present, the EM algorithm can be used for parameter estimation, both in Bayesian networks when training data exist (Friedman 1997) and in record linkage when no training data exist (unsupervised learning; Winkler 1988, 1989; Ravikumar and Cohen, UAI 2004).
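The sketch below illustrates the Fellegi-Sunter weighting idea and unsupervised EM estimation under textbook simplifications. It assumes binary agree/disagree comparisons, fields that are conditionally independent given match status, and hypothetical starting values, thresholds and data. It is not the speaker's or the Census Bureau's software; the function names and the clipping constant are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

EPS = 1e-6  # illustrative floor: keeps probabilities off 0 and 1 so log weights stay finite


def _clip(x: float) -> float:
    return min(max(x, EPS), 1 - EPS)


def match_weight(gamma: Sequence[int], m: Sequence[float], u: Sequence[float]) -> float:
    """Sum of per-field log2 likelihood ratios for a binary agreement vector.

    m[k] = P(field k agrees | pair is a match)
    u[k] = P(field k agrees | pair is a non-match)
    Fields are assumed conditionally independent given match status.
    """
    total = 0.0
    for g, mk, uk in zip(gamma, m, u):
        if g:
            total += math.log2(mk / uk)  # agreement weight
        else:
            total += math.log2((1 - mk) / (1 - uk))  # disagreement weight
    return total


def classify(weight: float, upper: float, lower: float) -> str:
    """Two-threshold decision rule; pairs in between go to clerical review."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


def em_estimate(
    vectors: Sequence[Sequence[int]],
    m: List[float],
    u: List[float],
    p: float,
    iters: int = 50,
) -> Tuple[List[float], List[float], float]:
    """Estimate m, u and the match proportion p from unlabeled comparison vectors."""
    n, n_fields = len(vectors), len(m)
    for _ in range(iters):
        # E-step: posterior probability that each pair is a match.
        post = []
        for gamma in vectors:
            pm, pu = p, 1 - p
            for k, g in enumerate(gamma):
                pm *= m[k] if g else 1 - m[k]
                pu *= u[k] if g else 1 - u[k]
            post.append(pm / (pm + pu))
        # M-step: re-estimate parameters from the expected match counts.
        n_match = max(sum(post), EPS)
        n_non = max(n - n_match, EPS)
        p = _clip(n_match / n)
        m = [_clip(sum(w * gamma[k] for w, gamma in zip(post, vectors)) / n_match)
             for k in range(n_fields)]
        u = [_clip(sum((1 - w) * gamma[k] for w, gamma in zip(post, vectors)) / n_non)
             for k in range(n_fields)]
    return m, u, p


# Hypothetical agreement patterns on three fields (1 = agree, 0 = disagree).
pairs = [(1, 1, 1)] * 20 + [(0, 0, 0)] * 60 + [(1, 0, 0)] * 10 + [(0, 1, 0)] * 10
m, u, p = em_estimate(pairs, m=[0.9] * 3, u=[0.1] * 3, p=0.1)
print(p, classify(match_weight((1, 1, 1), m, u), upper=10.0, lower=0.0))
```

No labels are used: EM treats match status as the missing data, which is what makes the record linkage case unsupervised. The thresholds 10.0 and 0.0 are arbitrary here; in practice they are chosen to control false match and false non-match rates.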

This talk describes some current methods of approximate string comparison that account for typographical errors between strings, hidden Markov models for adaptive name and address parsing, methods of semi-supervised learning, fast indexing and retrieval methods for comparing records across files with hundreds of millions or billions of records (Yancey and Winkler 2003), and methods of creating information and data structures during linkage processes.
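As one illustration of approximate string comparison, here is a minimal sketch of the commonly published Jaro-Winkler comparator: the Jaro score counts characters that agree within a sliding window and penalises transpositions, then the Winkler adjustment boosts strings that share a prefix. The prefix scale of 0.1 and the four-character prefix limit are the usual textbook defaults, assumed here; published variants such as strcmp95 add further adjustments (for example, partial credit for commonly confused characters) that are not reproduced.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if equal and within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order, halved.
    seq1 = [c for c, f in zip(s1, matched1) if f]
    seq2 = [c for c, f in zip(s2, matched2) if f]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for strings that share a common prefix."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

The prefix boost reflects the observation that typographical errors are less common at the start of names, so agreement on the first few characters is treated as stronger evidence of a match.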

LOCATION

Room G35, Ground Floor, John Dedman Building 27, ANU. The building is located between the Union Building and the Drill Hall, and G35 is on its western side, near Sullivans Creek. There is a paid parking area on the corner of Childers St and Hutton St, near the eastern side of the John Dedman Building.

LUNCH

For those who wish to socialise after the presentation, we will adjourn to a nearby eatery for lunch.

DINNER

We will be having dinner with Bill on the same evening. Warwick Graco (Warwick.Graco@ato.gov.au) is organising it and will advise of the location closer to the date. Please let Warwick know if you would like to attend.