Leon Stafford

Intro to Quantitative Analysis – Not so average after all

I’m a month into a quantitative analysis course as part of my Associate
of Science in Information Systems Development at CityU, Hong Kong.

Having missed most of the first month’s lectures and tutorials, I’ll recap
here the main points gleaned from the lecture notes and try to prepare
for a tutorial at lunchtime tomorrow.

What is quantitative analysis?

A business or financial analysis technique that seeks to
understand behavior by using complex mathematical and statistical
modeling, measurement and research. By assigning a numerical value to
variables, quantitative analysts try to replicate reality
mathematically.

Quantitative analysis can be done for a number of reasons such as
measurement, performance evaluation or valuation of a financial
instrument. It can also be used to predict real world events such as
changes in a share price.

Yes, what they said. I could oversimplify it by saying it is the taking of
real data and reinterpreting it to prove a point with data which is not
100% real but fits more conveniently in a PowerPoint.

Take a census of 100 people. The REAL data is what 100 people
answered for “Are you male or female?”. Stats are often used selectively
to prove a point, e.g.,

ONLY 1 person in the whole census was female – that proves why crime is so high in this area!

But, in checking the real data, you may see that 45 people wrote “Demi-God” and the rest wrote “Bruce”.

It can be used properly though, which would entail something like:

1% were women and 99% were recorded inaccurately

Averages

The first thing we covered in the lecture I attended on the first
day was averages. I learned that it is one of the slickest words used by
reporters and analysts to present numbers in a certain way.

On its own, we assume the word average means the sum of all units
divided by the number of units, but this is not always so. Wikipedia gives an
intro:

In mathematics, an average, or central tendency[1] of a
data set is a measure of the “middle” value of the data set. Average is
one form of central tendency. Not all central tendencies should be
considered definitions of average.

There are many different descriptive statistics that can be chosen as
a measurement of the central tendency of the data items. These include arithmetic mean, the median and the mode.
Other statistical measures such as the standard deviation and the range
are called measures of spread and describe how spread out the data is.

So, we have at least these 3 words to define different types of averages, which can greatly vary the resulting central tendency (another word for average – I know, I’m already falling asleep!).

Arithmetic mean

This is the one I always associated with the word average (sum of all
units, divided by the number of units) and is the easiest to calculate
for me. But here is a formula, just to complicate things:

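If we have this sample set, $\{x_1, x_2, \ldots, x_n\}$, then we will use this formulaic equation to calculate arithmetic mean:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$$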

**I still need to get used to reading formulas again after 12 years out of school

One drawback of this method of calculating central tendency is that
extreme values can greatly affect the result ^1. For such
cases, a better representation of the central tendency may be the median, covered below.

^1 I believe this is why everyone thinks Hong Kong is one of the most
expensive places to live, because you have some extremely high values
affecting the “average”. Actually, most Hong Kong people earn very
little compared to the West, so they are good at living on the cheap
(though the country is far from stable, middle and lower class people
far from content).
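As a rough sketch of how one extreme value drags the arithmetic mean, here is a small Python example. The income figures are invented purely for illustration and the function name `arithmetic_mean` is just a label chosen here, not something from the course notes.

```python
def arithmetic_mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)

# Hypothetical incomes (arbitrary units), made up for this example only.
typical = [10, 11, 12, 13, 14]
with_outlier = typical + [300]  # one extremely high earner joins the sample

print(arithmetic_mean(typical))       # 12.0
print(arithmetic_mean(with_outlier))  # 60.0
```

A single large value moves the mean from 12 to 60, even though five of the six values are still below 15.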

more about arithmetic means @ Wikipedia

Median

This one is pretty simple – the value in the middle of an ordered
sample set. You take all values, arrange them from lowest to highest and then
grab the middle value. If there is an even number of values in the
sample set, I (possibly mistakenly) remember being told at uni to
take the next value higher, but Wikipedia now tells me it should be the
mean of the two values either side of the divide… that makes more sense – I
should dust off my lecture notes (actually, they are online
PowerPoints, no dusting required).

I like this extra definition related to medians from Wikipedia (yes, Wikipedia is easy!):

At most, half the population have values less than the median,
and, at most, half have values greater than the median. If both groups
contain less than half the population, then some of the population is
exactly equal to the median. For example, if a < b < c, then the median of the list {a, b, c} is b, and, if a < b < c < d, then the median of the list {a, b, c, d} is the mean of b and c; i.e., it is (b + c)/2.

Here is a quick image n bit of text gleaned from the Murdoch Uni
website, showing one of those charts which looks like a game of Connect
Four:

The following sample has a mean of 7.69 and median of 7.65:

4.2, 4.4, 5.1, 5.6, 6.0, 6.4, 6.8, 7.1, 7.4, 7.4, 7.9, 8.2, 8.2, 8.7, 9.1, 9.6, 9.6, 10.0, 10.5, 11.6
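The quoted figures can be checked with Python's built-in `statistics` module. The variable names below are arbitrary. The sample holds 20 values, so the median is the mean of the 10th and 11th values (7.4 and 7.9).

```python
import statistics

sample = [4.2, 4.4, 5.1, 5.6, 6.0, 6.4, 6.8, 7.1, 7.4, 7.4,
          7.9, 8.2, 8.2, 8.7, 9.1, 9.6, 9.6, 10.0, 10.5, 11.6]

sample_mean = statistics.mean(sample)      # sum of values / 20
sample_median = statistics.median(sample)  # mean of the two middle values

print(round(sample_mean, 2))    # 7.69
print(round(sample_median, 2))  # 7.65
```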

Mode

The last of the 3 types of measuring central tendency I will cover
here is quite simply the value which occurs most frequently in a sample
set. I’ll use my new best friends, a stem and leaf plot and a ~~something or other plot~~ dot plot to show the number 17 is the most commonly occurring:

This one taken from somewhere on the BBC:

And below is from a spreadsheet in Excel I created:

**Ignore the “Base integer” there, I’m too lazy to remake an image at 3AM….

So, in this format, it is super quick n easy to spot the mode – did you find Wally/Waldo yet?
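For anyone without the plots to hand, here is a small Python sketch that finds the mode and prints a text stem and leaf plot. The data set is invented for this example (chosen so that 17 comes out as the mode) and is not the data behind the original images. The helper name `stem_and_leaf` is likewise just a label used here.

```python
from collections import Counter, defaultdict

# Invented two-digit sample, for illustration only.
data = [12, 15, 17, 17, 17, 21, 23, 23, 30]

# The mode is the value with the highest count.
mode, frequency = Counter(data).most_common(1)[0]
print(mode, frequency)  # 17 3

def stem_and_leaf(values):
    """Group values by their tens digit (stem) and list the units digits (leaves)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)
    rows = []
    for stem, leaves in sorted(stems.items()):
        leaf_text = " ".join(str(leaf) for leaf in leaves)
        rows.append(f"{stem} | {leaf_text}")
    return rows

for row in stem_and_leaf(data):
    print(row)
# 1 | 2 5 7 7 7
# 2 | 1 3 3
# 3 | 0
```

The run of repeated 7s on the first row is what makes the mode easy to spot by eye.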

I need to make some notes for myself of the terminology used for populations and different types of sample sets… after a nap!

October 3, 2011
11:25 am
