Monash University

Scalable Non-Markovian Sequential Modelling for Natural Language Processing

Thesis, posted on 2017-10-16, authored by Ehsan Shareghi Nojehdeh
We show that finite-order Markov models fail to capture long-range dependencies in human language, and we propose infinite-order non-Markovian models (both Bayesian and non-Bayesian) capable of capturing unbounded dependencies. Representing an infinite-order model incurs significant memory usage, and its very large parameter space introduces computational and statistical burdens in the learning phase. We propose a framework based on compressed data structures that keeps the memory usage of the modelling, learning, and inference steps independent of the order of the model. Our approach scales gracefully with the Markov order and the data size, and is highly competitive with the state of the art in memory and runtime, while allowing us to develop more accurate models.
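The key idea — one data structure answering count queries for contexts of any length, so memory does not grow with the Markov order — can be sketched with a plain (uncompressed) suffix array. This is only an illustrative stand-in for the compressed suffix trees the thesis builds on; the corpus and function names below are hypothetical:

```python
# Illustrative sketch only: a suffix array over a token sequence answers
# count queries for a context of ANY length, so one structure serves every
# Markov order. The thesis achieves this with *compressed* suffix trees at
# far lower memory cost; this uncompressed version shows the idea only.

def build_suffix_array(tokens):
    """Indices of all suffixes of `tokens`, sorted lexicographically."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])

def _bound(tokens, sa, pattern, strict):
    """First suffix-array position whose length-|pattern| prefix is
    >= pattern (or > pattern when strict=True)."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = tokens[sa[mid]:sa[mid] + m]
        if prefix < pattern or (strict and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo

def count_occurrences(tokens, sa, pattern):
    """Occurrences of `pattern` (any length) in O(|pattern| log n) time."""
    return (_bound(tokens, sa, pattern, strict=True)
            - _bound(tokens, sa, pattern, strict=False))

def mle_prob(tokens, sa, context, word):
    """Maximum-likelihood P(word | context) for an arbitrarily long context."""
    c = count_occurrences(tokens, sa, context)
    return count_occurrences(tokens, sa, context + [word]) / c if c else 0.0

corpus = "the cat sat on the mat the cat ran".split()
sa = build_suffix_array(corpus)
print(count_occurrences(corpus, sa, ["the", "cat"]))  # counts a 2-token context
print(mle_prob(corpus, sa, ["the"], "cat"))           # P(cat | the) = 2/3
```

Note that extending the context by one token only adds one step to the binary-search pattern; no order-specific count tables are ever materialised, which is the property the compressed-data-structure framework exploits at scale.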

History

Campus location: Australia
Principal supervisor: Gholamreza Haffari
Additional supervisor 1: Trevor Cohn
Additional supervisor 2: Ann Nicholson
Year of Award: 2017
Department, School or Centre: Information Technology (Monash University Clayton)
Course: Doctor of Philosophy
Degree Type: Doctorate
Faculty: Faculty of Information Technology