Activities and events in our lives are structured, whether a vacation, a camping trip, or a wedding. While individual details vary, each of these scenarios has characteristic patterns. For example, a wedding typically consists of a sequence of events such as walking down the aisle, exchanging vows, and dancing. This work focuses on learning hierarchical and temporal event knowledge from a large collection of photo albums that depict common scenarios. Hierarchical knowledge identifies which events make up a scenario. In the wedding example, walking down the aisle, exchanging vows, and dancing are the events of the scenario.
Temporal knowledge captures whether there is an order to these events that is fundamental to the scenario. In a wedding, walking down the aisle generally must happen before the exchange of vows, and this order is crucial to understanding the scenario. By contrast, on a trip to Paris or New York City, events such as climbing the Eiffel Tower or visiting the Louvre follow a less rigid temporal structure. Either can generally be done before the other without compromising the nature of the scenario.
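To make the two kinds of knowledge concrete, here is a minimal, hand-written sketch of how they might be represented. It is not the model learned in this work. The `Scenario` class, its field names, and the `respects_order` helper are all hypothetical, chosen only for illustration. Hierarchical knowledge is the set of events belonging to a scenario, and temporal knowledge is a set of "must come before" pairs, which may be empty when the order is flexible.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Illustrative container only; names and fields are assumptions."""
    name: str
    # Hierarchical knowledge: which events make up the scenario.
    events: set[str]
    # Temporal knowledge: (earlier, later) pairs that must hold.
    before: set[tuple[str, str]] = field(default_factory=set)

    def respects_order(self, sequence: list[str]) -> bool:
        """Check an observed event sequence against the ordering pairs.

        Pairs whose events do not both appear in the sequence are skipped.
        If an event repeats, its last occurrence is the one compared.
        """
        position = {event: i for i, event in enumerate(sequence)}
        return all(
            position[a] < position[b]
            for a, b in self.before
            if a in position and b in position
        )


wedding = Scenario(
    name="wedding",
    events={"walking down the aisle", "exchanging vows", "dancing"},
    before={("walking down the aisle", "exchanging vows")},
)

# A scenario with no ordering pairs accepts any order of its events.
paris_trip = Scenario(
    name="trip to Paris",
    events={"climbing the Eiffel Tower", "visiting the Louvre"},
)
```

Under this representation, a wedding album in which the vows precede the walk down the aisle would fail `wedding.respects_order`, while any ordering of the two Paris events would pass for `paris_trip`.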
Antoine Bosselut, Jianfu Chen, David Warren, Hannaneh Hajishirzi, Yejin Choi (2016). Learning Prototypical Event Structure from Photo Albums. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL).