Abstract
We provide a general framework for the integration of formal semantics with probabilistic reasoning. This framework is conservative, in the sense that it relies only on typed \(\lambda \)-calculus and is thus compatible with logical systems already in use. The framework is also presented modularly, in that it regards probabilistic effects (i.e., sampling and marginalization) as side effects, using continuations. We show how our framework may be used to build probabilistic programs compositionally within typed \(\lambda \)-calculus and then illustrate its use on two applications: semantic learning and pragmatic inference within the Rational Speech Act framework.
Notes
- 1.
The \(\lambda \)-homomorphisms that we employ map one higher-order language into another, preserving variables, abstractions, applications, pairing, and projection. They are accompanied by type homomorphisms \(\overline{ \alpha }\) which, for us, preserve implication and products (i.e., \(\overline{ \alpha \rightarrow \beta } = \overline{ \alpha } \rightarrow \overline{\beta }\) and \(\overline{ \alpha \times \beta } = \overline{ \alpha } \times \overline{\beta }\)), but which may in principle affect base types. In general, if \(M : \alpha \), then \(\llparenthesis M\rrparenthesis : \overline{ \alpha }\). The motivation for these constraints is that such homomorphisms provide meanings to the constants of the source language while leaving the surrounding \(\lambda \)-calculus unaffected (analogously to a traditional model-theoretic interpretation). In this case, both \(\llparenthesis \cdot \rrparenthesis \) and its associated type homomorphism are trivial, mapping constants and base types onto themselves.
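For concreteness, here is a hypothetical Haskell sketch (the datatype and names are our own illustration, not the paper's) of why such a homomorphism is determined entirely by its action on constants:

```haskell
-- Terms of a small λ-calculus with pairing and projections, plus
-- constants of the source language (a hypothetical illustration).
data Term
  = Var String
  | Lam String Term
  | App Term Term
  | Pair Term Term
  | Fst Term
  | Snd Term
  | Con String

-- A λ-homomorphism commutes with variables, abstraction, application,
-- pairing, and projection, so it is fixed by an interpretation 'i' of
-- the constants alone.
hom :: (String -> Term) -> Term -> Term
hom i (Var x)    = Var x
hom i (Lam x m)  = Lam x (hom i m)
hom i (App m n)  = App (hom i m) (hom i n)
hom i (Pair m n) = Pair (hom i m) (hom i n)
hom i (Fst m)    = Fst (hom i m)
hom i (Snd m)    = Snd (hom i m)
hom i (Con c)    = i c
```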
- 2.
There is some precedent for this representation of probabilistic programs, by Mohammed Ismail and Shan [17], who describe a small typed probabilistic programming language and provide a denotational semantics for it in terms of continuations. Our formulation is chiefly inspired by the dependently typed language of Bernardy et al. [3]. See also Jansson et al. [13].
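To make the representation concrete, here is a minimal Haskell sketch (our own illustration; the names are assumptions, with eta and star corresponding to the \(\eta \) and \(\star \) of the main text):

```haskell
-- A probabilistic program over 'a', represented as a function from
-- continuations ("measurable functions" a -> Double) to expectations.
type Prob a = (a -> Double) -> Double

-- Return a value with certainty (the η of the text).
eta :: a -> Prob a
eta x = \k -> k x

-- Sequencing (the ⋆ of the text): run m, feed each result to f.
star :: Prob a -> (a -> Prob b) -> Prob b
star m f = \k -> m (\x -> f x k)

-- A discrete distribution given by explicit (unnormalized) weights.
categorical :: [(a, Double)] -> Prob a
categorical xs = \k -> sum [w * k x | (x, w) <- xs]
```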
- 3.
Here, we leave \(\mathcal {N} : d_{tall} \times d_{tall} \rightarrow (d_{tall} \rightarrow r) \rightarrow r\) unanalyzed. In general, computing a continuous distribution \(\mathcal {D} : p_1 \times \ldots \times p_n \rightarrow (d \rightarrow r) \rightarrow r\) over \(d\) amounts to computing
$$\lambda \langle p_1, \ldots , p_n\rangle , f.\int _{-\infty }^\infty \text {PDF}_{\mathcal {D}(p_1, \ldots , p_n)}(x) * f(x)\, dx$$
where \(\text {PDF}_{\mathcal {D}(p_1, \ldots , p_n)}\) provides the probability density function associated with \(\mathcal {D}\) (given parameters \(p_1, \ldots , p_n\)). Such integrals don't in general admit closed-form solutions, and so one must resort to approximations. We implement this via Markov chain Monte Carlo sampling in our Haskell implementation, using the library at https://github.com/jyp/ProbProg.
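As a gloss on what such an approximation looks like, here is a naive Haskell sketch using plain Monte Carlo with a direct sampler (our own illustration only; the actual implementation uses the MCMC machinery of the ProbProg library, whose API we do not reproduce here):

```haskell
import System.Random (randomRIO)  -- from the 'random' package

-- Approximate ∫ PDF(x) · f(x) dx by averaging f over n draws from the
-- distribution.
monteCarlo :: Int -> IO Double -> (Double -> Double) -> IO Double
monteCarlo n sample f = do
  xs <- sequence (replicate n sample)
  pure (sum (map f xs) / fromIntegral n)

-- A rough sampler for N(mu, sigma), using the classical trick that the
-- sum of 12 independent uniforms on [0,1] has mean 6 and variance 1.
normalSample :: Double -> Double -> IO Double
normalSample mu sigma = do
  us <- sequence (replicate 12 (randomRIO (0, 1)))
  pure (mu + sigma * (sum us - 6))
```

For instance, monteCarlo 10000 (normalSample 180 10) k approximates \(\mathcal {N}(180, 10)(k)\) (the parameters here are, of course, made up).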
- 4.
Some may recognize it as akin to the \(guard\) function of Haskell’s MonadPlus and Alternative classes.
- 5.
Note that we define this posterior in terms of a joint prior distribution \(P_{L_1}(w, \theta )\). Lassiter and Goodman [14] assume the prior distributions over world states and linguistic parameters to be independent, with an effectively uniform prior over parameters.
- 6.
That is, \(observe(\phi )(f) = factor(\mathbbm {1}(\phi ))(f)\).
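In the representation sketched in note 2, both effects come out as one-liners (assumed definitions, consistent with the equation above); note that observe False zeroes out its branch, much as guard False makes a branch fail:

```haskell
type Prob a = (a -> Double) -> Double

-- Reweight the current execution path by w.
factor :: Double -> Prob ()
factor w = \k -> w * k ()

-- Keep the path iff φ holds, i.e., factor by the indicator 𝟙(φ).
observe :: Bool -> Prob ()
observe phi = factor (if phi then 1 else 0)
```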
- 7.
An alternative, syntactically closer to the discrete case, relies on the Dirac \( \delta \) distribution, whose value is zero everywhere except at zero, and whose total mass integrates to one. Thus we recover a non-zero result after integration:
$$\text {PDF}_p = \lambda x.p(\lambda y. \delta (x - y))$$
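In the discrete case, the analogous recipe is directly computable: the indicator function of \(\{x\}\) plays the role of \( \delta \). A minimal sketch in the representation of note 2:

```haskell
type Prob a = (a -> Double) -> Double

-- Probability mass at x: apply the program to the indicator of {x},
-- the computable analogue of integrating against a Dirac δ at x.
pmf :: Eq a => Prob a -> a -> Double
pmf p x = p (\y -> if x == y then 1 else 0)
```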
- 8.
More accurately, we would take \(U\) to be uniform over a finite set, \(S_U\). Thus we would define it as \(U = \lambda k.\sum _{u \in S_U}k(u)\).
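Transcribed into the Haskell sketch of note 2, this definition is one line (as in the note, the weights are left unnormalized):

```haskell
type Prob a = (a -> Double) -> Double

-- U = λk. Σ_{u ∈ S_U} k(u), for a finite set of utterances S_U.
uniform :: [a] -> Prob a
uniform su = \k -> sum (map k su)
```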
- 9.
To implement the definition of cost employed by RSA models, for example, \(U^*\) could be \(U \star \lambda u. factor (e^{- \alpha * C(u)}) \star \lambda \diamond . \eta (u)\), given some uniform distribution \(U\).
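A sketch of this \(U^*\) in the Haskell representation of the earlier notes (our own illustration; alpha and the cost function cost are assumed RSA parameters):

```haskell
type Prob a = (a -> Double) -> Double

eta :: a -> Prob a
eta x = \k -> k x

star :: Prob a -> (a -> Prob b) -> Prob b
star m f = \k -> m (\x -> f x k)

factor :: Double -> Prob ()
factor w = \k -> w * k ()

uniform :: [a] -> Prob a
uniform su = \k -> sum (map k su)

-- U ⋆ λu. factor(e^{-α·C(u)}) ⋆ λ⋄. η(u): draw an utterance uniformly,
-- then downweight it exponentially by its cost.
uStar :: Double -> (a -> Double) -> [a] -> Prob a
uStar alpha cost us =
  uniform us `star` \u ->
  factor (exp (negate alpha * cost u)) `star` \_ ->
  eta u
```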
- 10.
Emerson [5] advocates yet a third approach to RSA, in which linguistic parameters are marginalized out in the listener model altogether.
- 11.
Systematically, if \( \alpha \) tends to \(\infty \); probabilistically, otherwise.
- 12.
Available at https://github.com/juliangrove/grove-bernardy-lenls18.
References
Barker, C., Shan, C.C.: Continuations and Natural Language. Oxford Studies in Theoretical Linguistics, vol. 53. Oxford University Press, Oxford (2014)
Bernardy, J.P., Blanck, R., Chatzikyriakidis, S., Lappin, S., Maskharashvili, A.: Predicates as boxes in Bayesian semantics for natural language. In: Proceedings of the 22nd Nordic Conference on Computational Linguistics, Turku, Finland, pp. 333–337. Linköping University Electronic Press (2019). https://www.aclweb.org/anthology/W19-6137
Bernardy, J.P., Blanck, R., Chatzikyriakidis, S., Maskharashvili, A.: Bayesian natural language semantics and pragmatics. In: Bernardy, J.P., Blanck, R., Chatzikyriakidis, S., Lappin, S., Maskharashvili, A. (eds.) Probabilistic Approaches to Linguistic Theory. CSLI Publications (2022)
Charlow, S.: On the semantics of exceptional scope. Ph.D. thesis, NYU, New York (2014). https://semanticsarchive.net/Archive/2JmMWRjY
Emerson, G.: Probabilistic lexical semantics: from Gaussian embeddings to Bernoulli fields. In: Bernardy, J.P., Blanck, R., Chatzikyriakidis, S., Lappin, S., Maskharashvili, A. (eds.) Probabilistic Approaches to Linguistic Theory. CSLI Publications (2022)
Girard, J.Y.: Interprétation fonctionnelle et élimination des coupures de l’arithmétique d’ordre supérieur. Ph.D. thesis, Université Paris 7 (1972)
Goodman, N.D., Frank, M.C.: Pragmatic language interpretation as probabilistic inference. Trends Cogn. Sci. 20(11), 818–829 (2016). ISSN 1364-6613. https://doi.org/10.1016/j.tics.2016.08.005. https://www.sciencedirect.com/science/article/pii/S136466131630122X
Goodman, N.D., Lassiter, D.: Probabilistic semantics and pragmatics: uncertainty in language and thought. In: Lappin, S., Fox, C. (eds.) The Handbook of Contemporary Semantic Theory, pp. 655–686. Wiley (2015). ISBN 978-1-118-88213-9. https://doi.org/10.1002/9781118882139.ch21
Goodman, N.D., Mansinghka, V.K., Roy, D., Bonawitz, K., Tenenbaum, J.B.: Church: a language for generative models. In: Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, UAI 2008, Arlington, Virginia, USA, pp. 220–229. AUAI Press (2008). ISBN 978-0-9749039-4-1
Goodman, N.D., Stuhlmüller, A.: Knowledge and implicature: modeling language understanding as social cognition. Top. Cogn. Sci. 5(1), 173–184 (2013). ISSN 1756-8765. https://doi.org/10.1111/tops.12007. https://onlinelibrary.wiley.com/doi/abs/10.1111/tops.12007
Grice, H.P.: Logic and conversation. In: Cole, P., Morgan, J.L. (eds.) Syntax and Semantics. Speech Acts, vol. 3, pp. 41–58. Academic Press, New York (1975)
Grove, J., Bernardy, J.P., Chatzikyriakidis, S.: From compositional semantics to Bayesian pragmatics via logical inference. In: Proceedings of the 1st and 2nd Workshops on Natural Logic Meets Machine Learning (NALOMA), Groningen, The Netherlands, pp. 60–70. Association for Computational Linguistics (2021). https://aclanthology.org/2021.naloma-1.8
Jansson, P., Ionescu, C., Bernardy, J.P.: Probability theory. In: Domain-Specific Languages of Mathematics. Texts in Computing, no. 24, pp. 223–246 (2022)
Lassiter, D., Goodman, N.D.: Context, scale structure, and statistics in the interpretation of positive-form adjectives. Semant. Linguist. Theory 23(0), 587–610 (2013). ISSN 2163-5951. https://doi.org/10.3765/salt.v23i0.2658. https://journals.linguisticsociety.org/proceedings/index.php/SALT/article/view/2658
Lassiter, D., Goodman, N.D.: Adjectival vagueness in a Bayesian model of interpretation. Synthese 194(10), 3801–3836 (2015). https://doi.org/10.1007/s11229-015-0786-1
Lebedeva, E.: Expressing discourse dynamics through continuations. Ph.D. thesis, Université de Lorraine (2012). https://tel.archives-ouvertes.fr/tel-01749193
Mohammed Ismail, W., Shan, C.C.: Deriving a probability density calculator (functional pearl). In: Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming, ICFP 2016, pp. 47–59. Association for Computing Machinery, New York (2016). ISBN 978-1-4503-4219-3. https://doi.org/10.1145/2951913.2951922
Shan, C.C.: Monads for natural language semantics. arXiv:cs/0205026 (2002). http://arxiv.org/abs/cs/0205026