Shakespeare in bits II

April 12, 2009

This one adds a few more bits to one of my previous posts.
Just for fun, it displays similarities in the rolling of characters (personages) across Shakespeare’s plays.

[Figure: neighbor-joining algorithm applied to Shakespeare’s plays, under a certain data-rank distance]

The code behind it was written in .NET (C#) and implements the neighbor-joining algorithm.
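The C# code itself is not shown in this post, and the data-rank distance is not defined here. Purely as an illustration, here is a minimal Python sketch of textbook neighbor-joining. The function name `neighbor_joining`, the labels A to E and the distance matrix are all made up for the example; they are not the data behind the picture.

```python
def neighbor_joining(names, dist):
    """Textbook neighbor-joining; returns a list of (parent, child, branch_length).

    names: list of leaf labels; dist: symmetric matrix (list of lists).
    Illustrative sketch only, not the C# implementation mentioned in the post.
    """
    # Work on a dict-of-dicts copy so new internal nodes can be added by name.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums of the current distance matrix.
        total = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimising Q(a, b) = (n - 2) d(a, b) - total(a) - total(b).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - total[a] - total[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        new = "node%d" % next_id
        next_id += 1
        # Branch lengths from the new internal node to the joined pair.
        len_a = 0.5 * d[a][b] + (total[a] - total[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges.append((new, a, len_a))
        edges.append((new, b, len_b))
        # Distances from the new node to every remaining node.
        d[new] = {}
        for c in nodes:
            if c not in (a, b):
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    # Connect the last two nodes directly.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Made-up distances between five hypothetical plays A..E (not real data).
labels = ["A", "B", "C", "D", "E"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbor_joining(labels, matrix)
for parent, child, length in tree:
    print(parent, child, length)
```

For n leaves the result is an unrooted tree with 2n - 3 branches; drawing it (the hard part for the neck) is left out of the sketch.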

Update #1
I almost broke my neck trying to read what is written in the pic above.
I also tried turning the monitor 90 degrees, but it still hurt. So I decided to add a second pic, especially for people who live in harmony with their necks…

[Figure: protect your neck!]

Copyright © 2009 Marius Iancu @ Revue Roumaine de Linguistique
Cahiers de Linguistique Théorique et Appliquée
oop! Copyright ©2009 http://marius09.wordpress.com

Shakespeare in bits

March 18, 2009

This post is about how the average entropy per character (personage) varies as a plot develops.
Honestly, I don’t like Shakespeare. Maybe I don’t have the proper complexity to be excited by stuff like “to be or not to be”… To be what? “A horse”, maybe “for a kingdom”? No, thanks.
However, Shakespeare is interesting, first of all because he can be statistically processed. His work is large enough to allow the selection of representative samples, even by simulating repeated readings of the text. 39 plays, 5 acts each! Rolling about 1000 characters… Imagine, he had no reason to go jogging or to the gym, just writing and writing and writing… I suppose he developed the toughest fingers of that age, some sort of kung-fu master’s secret skill…

Ok, now have a look here:

[Figure: 5 comedies]
Copyright © 2009 Revue Roumaine de Linguistique Cahiers de Linguistique Théorique et Appliquée

The charts above do not reflect any semantic comparison at all. The literature is full of crazy links between numbers and semantics. Instead, they suggest a way of measuring surprise (in the sense of departing from the statistical mean) in how characters are rolled out during the plot.
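The exact entropy computation is not spelled out here. One possible reading, sketched below in Python and labeled loudly as an assumption, is to take each act, count how many speeches each character has, and compute the Shannon entropy of that distribution. The names `shannon_entropy`, `entropy_per_act` and the toy input are invented for the illustration; they are not the .NET app or its data.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy, in bits, of a list of non-negative counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_per_act(acts):
    """For each act (a list of speaker names, one entry per speech), return
    the entropy of the distribution of speeches over characters."""
    return [shannon_entropy(list(Counter(speakers).values()))
            for speakers in acts]


# Toy input, invented for illustration: the speaker of each speech, act by act.
toy_play = [
    ["X", "Y", "X", "Y"],  # two characters sharing the stage evenly
    ["X", "X", "X", "Y"],  # one character dominates
    ["X", "Y", "Z", "W"],  # four characters, one speech each
]
print(entropy_per_act(toy_play))
```

An act where everybody speaks equally often gets the highest value, and an act dominated by one character gets a low one, which is one way to put a number on "surprise" as the plot moves from act to act.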

Enjoy the colours!


;)   The app I used for parsing and processing the text is an ordinary .NET web app that I proudly wrote. I also forced my dog, Darius, to fake a bit of pride for me…

:(   Update March 25
Trying to find out what another homo shakespearicus had to say about statistics on Shakespeare, I came across this strange but interesting page: STATISTICS of plays by William Shakespeare, copyrighted by one Hartmut Ilsemann. It deals with the ancestral issue of counting how many words a character says, bla-bla-bla… (see more quotations at the same level of clever investigation in my previous delirium; …don’t click on that link! It would be sad to learn that you, too, have nothing better to do…).
In the spirit of that approach, one could be interested, why not, in how many scratches the men performed compared to the women… A correlation coefficient for these two variables would save the day for some people… Yeah! Anyway, it is encouraging that we have so much time for all of this…

Fractal law…? No, fractal shit.

February 21, 2009

These days I happened (bad karma; I surely committed a lot of crimes in my previous life…) to read some linguistics stuff [2], [1].
What it is about, briefly (see [2] to join me in the frustration…): those guys started to count words in different literary texts. Let’s say there is a set X of words (the literary piece). They actually perform some magic and detect a subset Y of distinct words. Pretty exciting… but ok, we all have hobbies. Well, now comes a new step: they compute the ratio(s)
K = card(Y)/card(X)*100, k = card(Y)/card(X)
“which corresponds to the lexical wealth in terms of percentage” ([2], p.3) … Tough enough for you?
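For the record, the whole "magic" fits in a few lines. A minimal Python sketch, assuming that card(X) counts every word occurrence and card(Y) counts the distinct words; the tokenizer (lowercased runs of letters and apostrophes) and the name `lexical_wealth` are my own choices for the example, not something taken from [2].

```python
import re


def lexical_wealth(text):
    """Return (K, k): distinct words over total words, as a percentage
    and as a plain ratio."""
    # Crude tokenizer (an assumption): lowercased runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        raise ValueError("text contains no words")
    k = len(set(words)) / len(words)
    return 100 * k, k


# Six words, four of them distinct: to, be, or, not.
K, k = lexical_wealth("To be, or not to be")
print(K, k)
```

Note that the ratio depends heavily on text length (longer texts repeat more words), so comparing K between texts of different sizes needs some care.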
Well, wait a little. There is much more than this.
They quote Zipf’s law in a Mandelbrot way:
N = A * exp(k, -FI)
where exp(x,y) = “x to the power y” (my innovation, for lack of math writing tools in the wordpress editor…), and “A is a constant amplitude and FI exponent which should be characteristics of a given author.” ([2], p.3).
The next step is even deeper: they compute the A and FI values involved through linear regression. All related to the initial X. Or set of Xes.
That’s all? Yes.
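To show how little is going on: taking logarithms of N = A * exp(k, -FI) gives ln N = ln A - FI * ln k, a straight line in log-log coordinates, so A and FI fall out of an ordinary least-squares fit. Below is a hedged Python sketch of that step; the function name `fit_power_law` and the synthetic points are mine and are not the data or the code of [2].

```python
import math


def fit_power_law(ks, ns):
    """Least-squares fit of ln N = ln A - phi * ln k; returns (A, phi)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(n) for n in ns]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The slope of the log-log line is -phi, the intercept is ln A.
    return math.exp(intercept), -slope


# Synthetic points generated from N = 3 * k^(-1.5), so the fit should
# recover A close to 3 and phi close to 1.5. Not data from [2].
ks = [0.1, 0.2, 0.4, 0.8]
ns = [3 * k ** -1.5 for k in ks]
A, phi = fit_power_law(ks, ns)
print(A, phi)
```

A fit like this always returns some A and some FI for any set of positive points; by itself it says nothing about whether a power law is a good description of them.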
Take a deep breath and look at this typical approach:
(A) at the beginning, in paragraph “2. ZIPF’S LAW IN LITERATURE”:
“By assuming a power law behaviour for these quantities…” bla-bla-bla (…… some black box in the article…..)
and, you know what?
(B) in “4. CONCLUDING REMARKS”: “We have shown, from the corpus analysis of the literary production of English authors and non-literary texts, that a power fractal law can be associated with the lexical wealth of the authors.”
What sort of pseudo-science is this..? Same as the Dehmer papers I quoted in Random graphs (1/4): [1], [2], [3].
Actually, this is what it is about: a lot of pseudo-scientists stating various things with no beginning or end. Just academic bla-bla-bla.
Coming back to these Brazilian guys, with their fractals and so on: there are two possible ways to present their work: (1) as a mathematical theorem, under several axioms, or (2) as a scientific theory.
We can agree that [2] is whatever one wants it to be, but it is 100% not a theorem. Is it a scientific theory? Supposing that its predictions are somehow to be taken into consideration, there is not even a minimal verification of the targeted hypothesis. Don’t believe me? Ok, brothers and sisters, let’s agree that there is a “statistical signature” in the sense of [2]. It should be easy to check: take several random texts (in the sense of a meaningful statistical sample) and let’s check each one: does it belong to poet Alpha? To the great storyteller Beta? To …Dehmer (see my random graphs posts… plz, plz, plz!) …?
Relax, none of these minimal things that a decent approach requires was done.

Anyway, a nasty type of article. No one forces us to read them, but there are plenty of similarly dumb ways of talking about nothing.
That’s all. Yeah!
By the way, I have put some of my early-days approaches in the pdfs below… I was almost one of them!


Homework :)
[1] Entropy, Gary Davis, Adam Callahan
[2] Fractal Power law in literary english, L. L. Goncalves, L. B. Goncalves, arXiv:cond-mat/0501361v2 [cond-mat.other], 3 Jun 2005


peace, man! let’s rock the math!
(1) Information connections between the systemic and the semiotic aspects of a literary text
(2) Some information-probabilistic aspects of literary texts
(3) The narrative structure in terms of simple contextual grammars
Marius Iancu


Regrettable update
The referenced [1] was somehow mysteriously assassinated a couple of days after my post. It actually hosted the link to [2]. I don’t know, do we have to thank them, or maybe just continue the “hunt”…?


Encouraging update (March 12th)
Be proud of me, old and tough teachers from my childhood!
It is my turn to say: My name is Bond. James Bond. I discovered how to put math formulas inside wordpress.
Therefore, (2) should stand in place of (1):

(1) | (2)
N = A * exp(k, -FI) | \mathcal{N} = A \cdot k^{-\phi}
K = card(Y)/card(X)*100 | \mathcal{K} = \frac{card(Y)}{card(X)} \cdot 100
k = card(Y)/card(X) | k = \frac{card(Y)}{card(X)}

I hunted them around, evaluated the best moment for the lethal attack, prepared my shot… and those damn formulas had no chance.
…Or should I rather try: …Doe. John Doe…? Nope. The first one.
