20-23 August 2019
Hotel Amfora, Hvar, Croatia
Europe/Berlin timezone

Learning to flock with reinforcement learning.

Not scheduled
20m
Terrace ballroom (Hotel Amfora, Hvar, Croatia)

Ul. Biskupa Jurja Dubokovica 5, 21450, Hvar, Croatia
Talk Session 4

Speaker

Mr Mihir Durve (Department of Physics, University of Trieste)

Description

In many biological systems, individual agents cluster together in space and exhibit collective behavior [1]. Examples include thousands of starlings performing spectacular collective aerial maneuvers near their home, migratory birds traveling as a flock, schools of fish foraging together, and thousands of insects marching across and feeding on crop fields [2-4].
Many simulation models have been proposed to understand the fundamental principles governing collective behavior in such systems [5-7], yet the underlying rules are still not well understood.
We study multi-agent systems with machine learning techniques to understand the optimal decision-making process by which agents come to exhibit collective behavior. The widely used machine learning technique that we implement is reinforcement learning [8]. Its broad scheme can be summarized as follows. An agent, acting as a decision maker, takes an action in its environment, which is in state s. As a consequence of that action, the environment returns a reward signal and its new state s' to the agent. The goal of the agent is to discover, by trial and error, a policy that maximizes the total reward. A policy is a map from states to actions; the optimal policy dictates the best action a* to perform in each state s.
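The agent-environment loop described above can be illustrated with a minimal tabular Q-learning sketch. This is not the authors' implementation: the toy chain environment, the function names (`step`, `q_learning`, `greedy_policy`) and all parameter values are assumptions chosen only to show how a state s, an action, a reward and a new state s' combine into a learned policy.

```python
import random

N_STATES = 5          # hypothetical toy chain: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)    # action 0 steps left, action 1 steps right


def step(state, action_index):
    """Environment: return (new_state, reward) after the agent acts."""
    new_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    reward = 1.0 if new_state == N_STATES - 1 else 0.0
    return new_state, reward


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values Q(s, a) by trial and error (epsilon-greedy)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore: random action
            else:
                best = max(q[s])      # exploit: best known action, ties broken at random
                a = rng.choice([i for i in range(2) if q[s][i] == best])
            s_new, r = step(s, a)
            terminal = s_new == N_STATES - 1
            # Q-learning target: reward plus discounted value of the new state
            target = r if terminal else r + gamma * max(q[s_new])
            q[s][a] += alpha * (target - q[s][a])
            s = s_new
            if terminal:
                break
    return q


def greedy_policy(q):
    """Policy: map each state s to the action with the highest learned value."""
    return [max(range(2), key=lambda a: q[s][a]) for s in range(N_STATES)]
```

Calling `greedy_policy(q_learning())` returns one action index per state; in this toy setting the learned policy should point toward the rewarded end of the chain.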
We use reinforcement learning to understand the decision-making process by which individual agents form a flock. For that purpose, we set a reward scheme that encourages congregation of the agents. We observe that learning agents discover multiple policies that maximize the total reward for congregation. While following these policies, the agents not only congregate but also form highly polar ordered states, as observed in real flocks [5]. In a highly polar ordered state, all agents move in the same heading direction. One of the policies that the agents discover is equivalent to a well-known statistical physics model, the Vicsek model [6].
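For readers unfamiliar with the Vicsek model, the sketch below shows one common form of its update rule together with a polar order parameter that equals 1 when all agents share the same heading. It is an illustration under stated assumptions, not the code used in this work: the function names, the periodic square box, and the uniform angular noise of width `noise` are choices made here for the example.

```python
import math
import random


def polar_order(headings):
    """Polar order parameter: 1 if all headings agree, near 0 if they cancel."""
    n = len(headings)
    cx = sum(math.cos(t) for t in headings) / n
    cy = sum(math.sin(t) for t in headings) / n
    return math.hypot(cx, cy)


def vicsek_step(positions, headings, box, radius, speed, noise, rng):
    """One synchronous Vicsek-style update in a periodic square box.

    Each agent adopts the mean heading of all agents within `radius`
    (itself included), perturbed by uniform noise in [-noise/2, noise/2],
    then moves a distance `speed` along its new heading.
    """
    n = len(positions)
    new_headings = []
    for i in range(n):
        sx = sy = 0.0
        for j in range(n):
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            # minimum-image convention for periodic boundaries
            dx -= box * round(dx / box)
            dy -= box * round(dy / box)
            if dx * dx + dy * dy <= radius * radius:
                sx += math.cos(headings[j])
                sy += math.sin(headings[j])
        mean_heading = math.atan2(sy, sx)
        new_headings.append(mean_heading + noise * (rng.random() - 0.5))
    new_positions = [
        ((x + speed * math.cos(t)) % box, (y + speed * math.sin(t)) % box)
        for (x, y), t in zip(positions, new_headings)
    ]
    return new_positions, new_headings
```

With zero noise and an interaction radius that covers the whole box, a single step aligns every agent, so `polar_order` of the new headings is 1 up to rounding.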
References:
[1] J. Parrish and W. Hamner, Animal Groups in Three Dimensions (Cambridge University Press, 1997).
[2] M. Ballerini et al., PNAS 105, 1232 (2008).
[3] T. Pitcher et al., Behav. Ecol. Sociobiol. 10, 149 (1982).
[4] J. Buhl et al., Science 312, 1402 (2006).
[5] A. Cavagna et al., J. Stat. Phys. 158, 601 (2015).
[6] T. Vicsek et al., Phys. Rev. Lett. 75, 1226 (1995).
[7] I. Couzin et al., J. Theor. Biol. 218, 1 (2002).
[8] R. Sutton and A. Barto, Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA, 1998).

Primary authors

Mr Mihir Durve (Department of Physics, University of Trieste)
Dr Fernando Peruani (University of Nice Sophia Antipolis)
Prof. Antonio Celani (The Abdus Salam International Centre for Theoretical Physics)
