Modelling the evolution of language for modellers and non-modellers
Benefits of modelling
Pitfalls
How to communicate your results?
Modelling language origins and evolution, IJCAI-05

Recapitulation
Computer simulation is a synthetic science (as opposed to an analytic science).
– A theory is implemented as a model.
– The model is simulated using a computer.

Advantages of computer modelling
Computer models (CMs) give us a view on processes that are difficult to study.
– Old, complex or single-occurrence processes.
CMs allow us to study mathematically intractable problems.
– Complex non-linear systems such as language.
CMs are explicit, detailed, consistent, and clear.
– But that is also their weak point. More on that later.
CMs, through their relative simplicity, allow verification.
– Experimental reproduction is rare in other disciplines.

More advantages of computer modelling
CMs produce falsifiable claims.
– This is conducting science in the Popperian tradition.
CMs produce quantitative predictions.
– These allow clear and unambiguous comparison with real data.
CMs allow exploring different parameter settings.
– Evolutionary, environmental, individual and social factors can be easily varied.
CMs allow experiments that would be unethical with human subjects.
– No permission is needed from your ethics committee to do language deprivation experiments on agents.

Caveats
Of course, to balance all the advantages, computer modelling also has some disadvantages.
– Being aware of possible problems might enable us to dodge them.

Caveat 1: CMs are explicit, detailed, consistent, and clear
Computer models contain simplifications and abstractions, which are immediately obvious because of their clear specification.
– This makes models lightning rods for criticism.

Caveat 1: CMs are explicit, detailed, consistent, and clear
Solutions
– Obfuscate your model so everyone is awed by its complexity and dares not criticise it.
– Or better, justify every choice made during the construction of your model and stress its relevance for linguistics.

Caveat 2: Too far from reality
We want computer models to explain cognitive or linguistic phenomena, yet they are often highly abstract.
– Examples: a grammar is a symbol G with a learning probability; an individual creates utterances consisting of strings drawn from an alphabet {a,b,c}.
– These abstractions make it hard for non-modellers to accept CM results.

Caveat 2: Too far from reality
The field should understand that abstraction is not necessarily bad.
– Most scientific disciplines use abstraction. Think of physics or theoretical biology.
– Verbal models and field research use abstraction and assumptions as well, but these are hardly ever doubted.

Caveat 3: CM is too much fun
Too often computer models are run just for the fun of it, and the goal of modelling is neglected.
– It is all too tempting to try yet another variation of a simulation or add yet another neat feature.
– Eventually you end up with too much data, making a proper analysis impossible.

Caveat 3: CM is too much fun
Solution
– Define a hypothesis which you will test using CM, and work towards testing this hypothesis.
– Demonstration is good, understanding is better.
– Do exploratory data analysis: look beneath immediate results for explanations.
– Look for variability: which parameters have an influence on the results? What you are looking for is a causal effect.

Caveat 4: CMs are not embedded in the field
Sometimes CMs and their results are “solitary”.
– Models and results are not related to existing theories or existing empirical data.

Caveat 4: data should be related back to other disciplines
Solution
– Start from a claim, and look for existing theories in the field.
– Empirical data is wonderful if you can lay your hands on it. But be aware that making the link between empirical data and your results is often very difficult.
– Explain how your results might shed new light on existing theories, but don’t be overconfident.

Caveat 5: magic numbers
When building models, one inescapably ends up introducing magic numbers.
– Learning rate for a neural network, merging parameter for categories, number of possible grammars, ...
– Sometimes magic numbers are inherent to the phenomenon you are studying (as in physics).

Caveat 5: magic numbers
Solution
– Try to avoid magic numbers (easier said than done).
– Try to choose extreme values; this polarises your argument. The learning rate is either 0 for a memory-less learner, or 1 for a batch learner (cf. Gold, 1967; Nowak, 2001; Zuidema, 2003).
– Find optimal values for magic numbers, using some kind of optimisation (e.g. K. Smith, 2003).
– Justify the magic numbers as well as possible.
– Could the magic numbers be the important result of your research?
– Try to make your results insensitive to them.

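One way to check whether results are insensitive to a magic number is to sweep it over several values and several random seeds, then compare the outcomes. The sketch below is an illustration only: `toy_model` is a hypothetical stand-in for a real simulation (a learner tracking how often a signal occurs), and the names `sweep`, `learning_rate`, `steps` and `target` are assumptions, not part of the tutorial.

```python
import random
from statistics import mean, stdev

def toy_model(learning_rate, seed, steps=500, target=0.7):
    """Toy stand-in for a simulation (hypothetical, for illustration only).

    A learner tracks how often a signal occurs and returns its final estimate.
    """
    rng = random.Random(seed)
    estimate = 0.5
    for _ in range(steps):
        observation = 1.0 if rng.random() < target else 0.0
        # Move the estimate towards the observation by a fraction learning_rate.
        estimate += learning_rate * (observation - estimate)
    return estimate

def sweep(model, values, seeds):
    """Run the model for every parameter value over all seeds.

    Returns {value: (mean outcome, standard deviation of outcomes)}.
    """
    results = {}
    for value in values:
        outcomes = [model(value, seed) for seed in seeds]
        results[value] = (mean(outcomes), stdev(outcomes))
    return results

if __name__ == "__main__":
    table = sweep(toy_model, [0.01, 0.05, 0.1, 0.5, 1.0], range(20))
    for lr, (m, s) in table.items():
        print(f"learning_rate={lr:<5} mean={m:.3f} sd={s:.3f}")
```

If the mean or the spread changes strongly across the swept values, the result depends on the magic number and that dependence should be reported.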
Caveat 6: reification
Your model is an abstraction of reality.
– Even though it behaves like the real thing, are you allowed to make claims about the real thing based on an abstract model?
– Are you sure that the dynamics of your model are similar to what goes on in the real world? Do submarines swim?

Caveat 6: reification
Solutions
– Again, the field should understand that abstraction is not necessarily bad.
– Make sure that you do not present simulation results as the truth and nothing but the truth. CMs do not provide proof!
– CM is an exploratory tool, and should, if possible, be checked against hard data.

Some more practical advice
Good advice for doing computational modelling (advice that each of us neglected once upon a time).

Control
A control is an experiment in which the hypothesized cause is left out.
– So the hypothesized effect should not occur either.
– Be aware that placebo effects might occur, rendering your control experiment worthless.

Control
Control experiments provide a baseline to check your results against.
– How successful are agents at communicating if they randomly generate syntactic rules (instead of using grammatical induction)?
– Are the results where agents use grammatical induction significantly better?
Without a baseline, your results are meaningless.

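Whether one condition is significantly better than the baseline can be checked with a permutation test, which needs no distributional assumptions. This is a minimal sketch and one possible choice of test, not a method prescribed by the tutorial. The function name `permutation_test` and the scores in the usage example are placeholders invented for illustration; they are not results from any real simulation.

```python
import random
from statistics import mean

def permutation_test(treatment, control, n_perm=10_000, seed=0):
    """One-sided permutation test for mean(treatment) > mean(control).

    Returns the fraction of random relabellings whose difference in means
    is at least as large as the observed difference.
    """
    rng = random.Random(seed)
    observed = mean(treatment) - mean(control)
    pooled = list(treatment) + list(control)
    n_t = len(treatment)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign scores to the two conditions
        diff = mean(pooled[:n_t]) - mean(pooled[n_t:])
        if diff >= observed:
            hits += 1
    return hits / n_perm

if __name__ == "__main__":
    # Placeholder communicative-success scores, one per run (NOT real results).
    induction = [0.81, 0.78, 0.85, 0.80, 0.83, 0.79, 0.84, 0.82]
    random_rules = [0.52, 0.49, 0.55, 0.51, 0.47, 0.53, 0.50, 0.54]
    print("p =", permutation_test(induction, random_rules))
```

A small returned value means that a difference this large rarely arises when the condition labels are assigned at random.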
Hypothesis testing
Different ways to interpret results
– Exploratory data analysis: looking for patterns in the data, often after filtering the data with statistical methods.
– Hypothesis testing, however, remains superior.

Hypothesis testing
Example: toss a coin ten times and observe eight heads. Is the coin fair (i.e., what is its long-run behavior?) and what is your residual uncertainty?
You say, “If the coin were fair, then eight or more heads is pretty unlikely, so I think the coin isn’t fair.”
Proof by contradiction: assert the opposite (the coin is fair), show that the sample result (≥ 8 heads) has low probability p, and reject the assertion, with residual uncertainty related to p.
Estimate p with a sampling distribution.
(From Cohen, Gent & Walsh)

Hypothesis testing
If the coin were fair (p = .5, the null hypothesis), what is the probability distribution of r, the number of heads obtained in N tosses of a fair coin? Get it analytically or estimate it by simulation (on a computer):
Loop K times
    r := 0                      ;; r is the number of heads in N tosses
    Loop N times                ;; simulate the tosses
        Generate a random x with 0 ≤ x < 1.0
        If x < p increment r    ;; p is the probability of a head
    Push r onto sampling distribution
Print sampling distribution

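The loop on this slide translates almost directly into Python. The following is a minimal sketch, not the tutorial's own code: the function names `sampling_distribution` and `tail_probability`, the default seed and the use of `collections.Counter` are assumptions made for illustration.

```python
import random
from collections import Counter

def sampling_distribution(k=10_000, n=10, p=0.5, seed=0):
    """Estimate the distribution of r, the number of heads in n tosses.

    The n-toss experiment is repeated k times. The seed is an assumed
    default so that the run is reproducible.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(k):
        # x is uniform on [0, 1); a toss counts as a head when x < p
        r = sum(1 for _ in range(n) if rng.random() < p)
        counts[r] += 1
    return counts

def tail_probability(counts, threshold):
    """Fraction of simulated experiments with at least `threshold` heads."""
    total = sum(counts.values())
    return sum(c for r, c in counts.items() if r >= threshold) / total

if __name__ == "__main__":
    dist = sampling_distribution()
    for r in range(11):
        print(r, dist[r])
    print("estimated P(r >= 8):", tail_probability(dist, 8))
```

Because the estimate comes from random sampling, different seeds give slightly different tail probabilities.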
Hypothesis testing
10,000 times 10 tosses produces this distribution.
– This is an estimated distribution using Monte Carlo sampling.
[Histogram: number of heads (0 to 10) on the horizontal axis, frequency (0 to 2500) on the vertical axis]
The probability of 8 or more heads in N = 10 tosses is 0.057.
As this probability is low, we can reject the null hypothesis (H0: the coin is fair), although 0.057 lies just above the conventional 0.05 significance level. p = 0.057 is the residual uncertainty.

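For a fair coin the same tail probability can also be obtained analytically from the binomial distribution, which gives a check on the Monte Carlo estimate of 0.057. This is a small sketch; the function name `binomial_tail` is an assumption, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def binomial_tail(n, k_min, p=0.5):
    """Exact P(r >= k_min) for r ~ Binomial(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k_min, n + 1))

if __name__ == "__main__":
    # (45 + 10 + 1) / 1024 = 56/1024
    print(binomial_tail(10, 8))  # prints 0.0546875
```

The exact value, about 0.055, is close to the simulated 0.057; the gap is sampling noise from the 10,000 simulated experiments.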
Dos and don’ts
Don’t throw away old code.
– When programming, keep a log of all program code and all parameter settings.
– Use version control.
Don’t change two things at once in your simulation.
– You will never know which parameter caused what.
Do collect all your data.
– But be reasonable about this. Gigabyte-sized data files are often of little use.

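Keeping a log of parameter settings can be as simple as appending one JSON record per run to a file. The sketch below shows one possible layout, assuming a hypothetical `toy_simulation` and a log file named `runs.jsonl`; the record fields and all function names are illustrative choices, not part of the tutorial.

```python
import json
import random
import time

def make_record(params, seed, result):
    """Bundle everything needed to repeat a run into one JSON-friendly dict."""
    return {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "seed": seed,
        "params": params,
        "result": result,
    }

def append_record(record, log_path):
    """Append one record as a line of JSON, so older runs are never overwritten."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")

def toy_simulation(params, rng):
    """Hypothetical stand-in for a real simulation: fraction of successful games."""
    wins = sum(rng.random() < params["success_rate"] for _ in range(params["games"]))
    return wins / params["games"]

if __name__ == "__main__":
    params = {"success_rate": 0.8, "games": 1000}
    seed = 42
    result = toy_simulation(params, random.Random(seed))
    append_record(make_record(params, seed, result), "runs.jsonl")
```

Storing the seed next to the parameters means any logged run can be repeated exactly, provided the code version is also tracked in version control.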
Dos and don’ts
Repeat your experiments.
– Using different settings, different random seeds, ...
– Make sure your experiments are reproducible (don’t end up with a “cold fusion” experience).
Don’t trust yourself on bugs.
– Time and time again tiny bugs are discovered in code that was thought to be flawless.
Do look at the raw data.
– Statistical measures often obfuscate results (e.g. outliers are averaged away).

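Repeating a run over several seeds while keeping every raw outcome makes it easy to spot the outliers that an average hides. The following sketch assumes a hypothetical `toy_experiment` in which an occasional run fails to converge; the failure rate and score ranges are made up for illustration.

```python
import random
from statistics import mean, median

def toy_experiment(seed):
    """Hypothetical stand-in: returns a success score; some runs fail."""
    rng = random.Random(seed)
    if rng.random() < 0.1:  # an occasional failed run
        return rng.uniform(0.0, 0.2)
    return rng.uniform(0.8, 0.9)

def repeat(experiment, seeds):
    """Run the experiment once per seed and keep every raw outcome."""
    return {seed: experiment(seed) for seed in seeds}

if __name__ == "__main__":
    raw = repeat(toy_experiment, range(30))
    scores = list(raw.values())
    print(f"mean={mean(scores):.3f} median={median(scores):.3f}")
    print(f"min={min(scores):.3f} max={max(scores):.3f}")
    # Inspect the worst runs individually instead of trusting the mean alone.
    for seed, score in sorted(raw.items(), key=lambda kv: kv[1])[:3]:
        print(f"lowest: seed={seed} score={score:.3f}")
```

Because each outcome is stored under its seed, any suspicious run can be rerun and examined on its own.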
Dos and don’ts
Make a fast implementation.
– When your program runs faster, you will do more experiments and explore more parameter settings.

Communication
Eventually you want to communicate your simulation results to others. How to do that?
Bridging the gap between modellers and non-modellers using communication.

Hallmarks of a good experimental paper
Clearly define your goals and claims.
Perform a large-scale test.
– Both in number and size of instances.
Use a mixture of problems.
– Real-world, random, standard benchmarks, ...
Do a statistical analysis of results.
(Source: Bernard Moret & David Johnson)

Hallmarks continued
Place your work in context.
– Compare your work to other work in the field.
– Mention work by others.
Ensure reproducibility.
– Forces you to be clear.
– Adds support to your claims.
– Publish code and data on the web.
Ensure comparability.
– Makes it easier for others to check your results.
– Report all experimental settings.
– Do not hide anomalous results.

Pitfalls
The result could have been predicted by a back-of-the-envelope calculation.
Bad experimental setup
– Too few experiments.
– Being happy with one “lucky run”.
Poor presentation of data
– Lack of statistics.
– No mention of a baseline.
– Too many statistics, thus neglecting the raw data.

Pitfalls continued
Failing to report key implementation issues.
Extrapolating from tiny samples.
Drawing conclusions not supported by the data.
Ignoring the literature.

Resistance against modelling
Modellers often have to answer critical remarks from non-modellers.
– A survey among 30 experienced researchers in the field yielded the following themes.

“How can you validate this model?”
Often this rests on the mistaken assumption that simulation models must be realistic and hence “calibrated” against real data.
Or the modeller has neglected to make the results falsifiable.

“You’ve built in the result”
Show that there are parameter settings for the model where the particular result in question does not emerge.
Be clear about which hypotheses the model is testing, and maintain a clear distinction between data, model and theory.

“This model stands on its own and has no relation with any linguistic phenomenon”
This is simply caused by neglecting the existing literature.
– Always embed your model in the proper cognitive/linguistic context.
Often modellers do not start from empirical data.
– An appeal for building models on existing research.

“It is possible to build models which come up with contrary results - how can you 'prove' which is correct?”
Every model hinges on its initial assumptions; these should be clearly defined and maintained throughout the model.
Your model is only as good as the initial assumptions it is based on.

“Your model uses evolutionary computing techniques, but language does not evolve - it is learned”
There is often confusion between the techniques used and the phenomena which are studied.
– Optimising some parameter with a genetic algorithm does not make the phenomenon evolutionary.
– One should also realize that genetic algorithms are by no means a model of evolution, but rather an optimization technique.

“I liked your talk. I study Mayan grammatical constructions, can you incorporate this in your model?”
This is a misapprehension about simple idealised models: they are not intended to be exhaustive, but are instead directed at testing a specific hypothesis.

Where do modellers publish?
Journals sympathetic to computational modelling
– Artificial Life
– Adaptive Behavior
– Journal of Artificial Societies and Social Simulation
– Artificial Intelligence
Others
– Complex Systems
– Journal of Theoretical Biology
– Connection Science
– Studies in Language
– Advances in Complex Systems
– Proceedings of the Royal Society of London, Series B
– Brain and Language
– Cognitive Science
– Trends in Cognitive Science
– Verbum
– Language Typology
– Sprachtypologie und Universalienforschung
– Language and Cognitive Processes
– Cognitive Brain Research
– Journal of Phonology
– Acoustic research letters online
– Behavioral & Brain Sciences

Where do modellers gather?
Evolution of Language Conference
International Conference on Artificial Life
European Conference on Artificial Life
From Animals to Animats: Simulation of Adaptive Behavior conference
Emergence and Evolution of Linguistic Communication

What tools do modellers use?
Programming languages
– C, C++, Lisp, Objective CAML, Prolog, Scheme, Perl, Java, ...
Mathematical packages
– Matlab, Maxima, ...
Visualization tools
– GNUplot, xfig, Grace (open source and free tools)
– MS Excel (for graph plotting)
Miscellaneous
– Tlearn (neural net package)
– PHYLIP (phylogenetic tree reconstruction)
– NSL simulation environment (neural networks)
– SPSS (statistics)
– Praat (phonetics simulator)
– gawk

Take home messages
Non-modellers have a hard time understanding your terminology and techniques. Explain and justify everything you do.
Non-modellers often fail to see the usefulness of modelling. Place your model in a context and place your results in that context. Demonstrate how your results provide insights that could not be obtained from pen-and-paper analysis.
Don’t do modelling for the sake of modelling. Take a concrete problem and tackle it.

Resources
Evolution of language resources
http://www.isrl.uiuc.edu/amag/langev
These slides, code and miscellaneous stuff
http://www.ling.ed.ac.uk/ paulv/tutorial.html