GIACS Focused Workshop on

"Large databases in social and economic complex systems research"

Jerusalem, Israel, September 17-18 2008



Esther Adi-Japha - "Large databases vs individual analysis: two complementary approaches in the study of education and learning"

In this talk I will discuss two methods for understanding how children learn and the factors that affect their learning. The first method concerns the most comprehensive child care study conducted to date to determine how variations in child care are related to children's development. This large-scale study was conducted in the USA, and its data are available for secondary analysis. In the first part of the talk I will review the major findings of this study and briefly describe a secondary analysis we conducted. For various reasons related to data assessment and data sharing procedures, this study, like other large-scale studies, does not allow inferences regarding the development of individuals. However, it is becoming widely accepted that group data may mask critical phases in an individual's development. The second method presented in this talk concerns small-scale studies of learning in a simple motor task. Results of several studies that extend over hours, days, and weeks of practice on this specific task, which has been extensively studied as a model for skill learning, suggest that learning is not a smooth, continuous process but is instead composed of discrete phases. In the talk I will review developmental results and their possible implications for curricular planning.

Stefan Bornholdt - "Physics of complex networks: applications in large online markets"

Newly emerging internet phenomena such as online markets and social networking sites provide an exciting new possibility for detailed studies of large socio-economic systems. They form large complex networks of interactions, which call for new methods of analysis. In recent years, physicists have started to fill this gap by developing methods and tools. I will give a short overview of some of these tools and their scope of application. The focus will lie on methods of community detection and their application to the complex networks of large online markets. One large case study will be presented, firstly to demonstrate the power and limits of analysing large online markets with physical methods and, secondly, to discuss data issues in this context, from collection and availability to analysis and interpretation.

Robert Boruch - "Ethics, evidence grading systems, and evidence based decision making in Complex Systems research"

Boruch's segment of the workshop will consider three topics that are pertinent to the conference themes. The presentation briefly considers ethics related to individual privacy in research, the ethics of scientific inquiry, and resolving the tensions between them. Evaluating the quality of evidence is construed as part of the scientific ethic, and various systems for screening and grading evidence will be considered. This covers some international, national, and state/provincial systems, with a limited focus on certain research designs. The last part of the presentation concerns practical and theoretical conditions that enhance the likelihood that evidence will be used in policy and practice contexts.

David Brée - "Energy policy: a complex systems perspective"

Energy generation is comparatively simple; energy policy is not. Can complex systems science help us to understand the national and international changes in technology past and present? How should the amount of subsidy, if any, be decided? - an economic perspective. What are the strengths of a complex systems representation? Two examples of micro-energy policy from Torino. Tidal energy: an interesting case. What might be a sensible way to proceed?

Mauro Gallegati - "Financially constrained fluctuations in an evolving network economy"

We explore the properties of a credit network characterized by inside credit, i.e. credit relationships connecting downstream (D) and upstream (U) firms, and outside credit, i.e. credit relationships connecting firms and banks. The structure of the network changes over time due to the preferred-partner choice rule: each agent chooses the partner who charges the lowest price. The net worth of D firms turns out to be the driver of fluctuations. U production, in fact, is determined by the demand for intermediate inputs on the part of D firms, and production of the latter is financially constrained, i.e. determined by the availability of internal finance proxied by net worth. The output of simulations shows that at the macroeconomic level a business cycle can develop as a consequence of the complex interaction of the agents' financial conditions.
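
The preferred-partner choice rule mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the function name `choose_partner`, the idea that an agent compares its current partner with a given list of candidates, and the tie-breaking in favour of the current partner are all assumptions made for illustration.

```python
def choose_partner(current, candidates, price):
    """Return the lowest-price partner among the current one and the candidates.

    `price` maps each potential partner to the price it charges.
    Assumption: on a tie the agent keeps its current partner, because
    `min` returns the first minimal element and `current` is listed first.
    """
    options = [current] + [c for c in candidates if c != current]
    return min(options, key=lambda partner: price[partner])


# Hypothetical prices charged by three upstream (U) firms.
price = {"U1": 1.2, "U2": 0.9, "U3": 1.0}

# A downstream (D) firm currently linked to U1 inspects U3 and U2.
new_partner = choose_partner("U1", ["U3", "U2"], price)
print(new_partner)
```

Applying such a rule to every agent in every period rewires the links over time, which is the sense in which the network structure evolves.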

Byungnam Kahng - "Quantifying the Complete Trajectory of the Coauthorship Network Evolution"

We collect empirical datasets to study the evolution of complex networks. The datasets include the coauthorship relations of scientists working on complex networks and on string theory, respectively, and word collocation data from infants. The dataset for the coauthorship network of scientists working on complex networks covers 115 months and, most importantly, spans from the initial point of its evolution until today. The other datasets are also available from the beginning of the evolution, providing us with a unique opportunity to study the complete trajectory of complex network evolution from the seed. Based on the statistics of the evolution rates of various types of edges obtained from the empirical data, we find that the largest cluster of the coauthorship network grows by continuous aggregation with finite clusters during the whole period of evolution. For the word collocation network of infants, however, the largest cluster grows incrementally but dominantly, without developing finite clusters. The largest clusters of the coauthorship networks form tree-like structures in the early stage, and large-scale loop structures follow at later times. The largest cluster of the word collocation network, on the contrary, rapidly evolves into a dense cluster with excessive links and an ultra-short diameter. We can detect the transition from the tree-like structure to a large-scale loop structure by measuring the fluctuation in the diameter of the giant cluster. In tree-like structures, the diameter of the giant cluster can be curtailed abruptly by the emergence of even a single link between distant nodes, and it can be expanded abruptly by cluster merging. Thus, although the number of nodes in the giant cluster of the coauthorship networks grows in a relatively gradual manner, the fluctuation in the largest-cluster diameter is significant.
Following these empirical results, we construct a simple model for coauthorship-type networks to pinpoint the origins and major driving forces of the observed non-trivial evolution pattern. The key ingredient of the model is the so-called locality constraint, motivated by the empirical finding that most new links in the real networks are created between nodes at short distances, while new links connecting distant nodes are created rarely. With the locality constraint imposed, the model reproduces the observed evolution pattern of the coauthorship networks.
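
The remark that a single long-range link can abruptly shorten the diameter of a tree-like cluster can be checked on a toy example. The sketch below is an illustration only, not the authors' analysis: it uses a simple chain of ten nodes as a stand-in for a tree-like cluster, and the helper names (`path_graph`, `eccentricity`, `diameter`) are hypothetical.

```python
from collections import deque


def eccentricity(adj, source):
    """Largest shortest-path distance from `source`, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(adj):
    """Diameter of a connected graph given as {node: set of neighbours}."""
    return max(eccentricity(adj, s) for s in adj)


def path_graph(n):
    """Chain of n nodes, the simplest tree-like structure."""
    adj = {i: set() for i in range(n)}
    for i in range(n - 1):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    return adj


chain = path_graph(10)
before = diameter(chain)   # 9: the two ends of the chain

# Add one link between the two most distant nodes.
chain[0].add(9)
chain[9].add(0)
after = diameter(chain)    # 5: the chain has become a ring of ten nodes

print(before, after)
```

One added edge nearly halves the diameter while the number of nodes is unchanged, which is the kind of fluctuation the abstract describes.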

Janos Kertesz - "Searching people's digital footprints"

Recent developments in information technology, together with multidisciplinary efforts, have opened a new avenue in the social sciences. Instead of using the classical tool of a rather limited number of questionnaires, research focuses on the "digital footprints" of people, i.e., on the electronic data we leave behind in almost all our activities, from communicating to working and from shopping to leisure. These data reflect the social interactions, habits and attitudes of people, and their proper evaluation gives new insight into the structure and the dynamics of society. The use of such data is a scientific challenge and, at the same time, it raises ethical problems. Recent studies on databases of email and phone communications are typical examples of this kind. Some data are publicly available, like the Enron email data set; others can be collected by software tools; and some are difficult to obtain because of commercial interests and/or privacy issues. Results based on financial data, on the Enron email data set and on records of mobile phone conversations will be shown.

Alan Kirman - "The advantages and disadvantages of large market databases: the case of perishable goods and financial markets"

By large databases we typically mean ones with many observations. However, in markets the size may be generated by the number of dimensions of each observation. In the case of the perishable goods markets that we have studied, such as that for fish, we have many characteristics for each transaction, and for each of the problems that we have analysed we had to abandon some of these dimensions. To what extent are we conditioning our results on the choice of omitted variables? This recalls the "identification" problem raised by Manski for social interaction models. Secondly, the frequency of observation may change the nature of the appropriate model. In financial markets, high frequency data are used in conjunction with "temporary market equilibrium" models. But one can ask whether this is appropriate, since each observation reflects an individual transaction and not a market clearing price in any standard sense. If we are to use this sort of data appropriately, we would do better to model an order book explicitly, and I will suggest how this might be done.

Imre Kondor - "Instability of downside risk measures"

We have recently shown that the axioms for coherent risk measures imply that whenever a dominant portfolio can be formed on a given sample (which happens with finite probability even for large samples), portfolio optimization cannot be performed under any coherent measure on that sample, and the risk measure diverges to minus infinity. The fundamental reason for this instability is that, despite the abundance of financial data, we never have sufficient information for optimizing large portfolios. Here we extend this result and demonstrate that this instability is present in an even larger class of risk measures, including the most popular measure, Value at Risk (VaR). An exact replica calculation allows us to determine the phase boundary where the instability of VaR sets in and where the estimation error diverges. The reason why this instability has not been noticed before is also discussed.
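
The divergence described in this abstract can be illustrated with a toy sample. The sketch below is not the authors' calculation: it uses maximal loss as one example of a coherent risk measure, two made-up return series in which asset A beats asset B in every observation, and a hypothetical `tilt` parameter that goes long A and short B while keeping the weights summing to one.

```python
# Made-up sample returns; A outperforms B in every observation,
# so A dominates B on this sample.
returns_a = [0.02, -0.01, 0.03, 0.00]
returns_b = [0.01, -0.03, 0.01, -0.02]


def max_loss(tilt, ra=returns_a, rb=returns_b):
    """Maximal loss of the portfolio with weights (1 + tilt) on A and -tilt on B.

    Maximal loss (the worst outcome in the sample) is used here as a simple
    example of a coherent risk measure; this choice is an assumption.
    """
    portfolio = [(1 + tilt) * a - tilt * b for a, b in zip(ra, rb)]
    return max(-r for r in portfolio)


for tilt in (0, 1, 10, 100):
    print(tilt, max_loss(tilt))
```

Because A minus B is positive in every observation, the measured risk keeps falling as the tilt grows and has no lower bound, so minimizing it over the weights has no finite solution on this sample.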

Fabrizio Lillo - "The evolution of high frequency financial databases: from daily data to agent resolved data"

I will sketch the historical development of high frequency financial databases from daily (or even quarterly) data to databases with an increasing level of resolution. I will focus on two detailed types of databases, specifically order book data and agent resolved data. In the first case I will present some recent results on the microstructure of financial markets and on the insight one obtains into the price formation process. The agent resolved datasets may contain information on the trading activity of brokers, accounts, or traders. I will present some recent results obtained with this type of data, which allow us to classify agents according to their trading strategy and to study their interaction and the effect of agents' activity on price formation.