Intro to Quantitative Analysis – Not so average after all

I’m a month into a quantitative analysis course as part of my Associate
of Science in Information Systems Development at CityU, Hong Kong.

Having missed most of the first month’s lectures and tutorials, I’ll recap
here the main points gleaned from the lecture notes and try to prepare
for a tutorial at lunchtime tomorrow.

What is quantitative analysis?

A business or financial analysis technique that seeks to
understand behavior by using complex mathematical and statistical
modeling, measurement and research. By assigning a numerical value to
variables, quantitative analysts try to replicate reality
mathematically.

Quantitative analysis can be done for a number of reasons such as
measurement, performance evaluation or valuation of a financial
instrument. It can also be used to predict real world events such as
changes in a share price.


:P

Yes, what they said. I could oversimplify it by saying it is the taking of
real data and reinterpreting it to prove a point with data which is not
100% real but fits more conveniently in a PowerPoint.

Take a census of 100 people. The REAL data is what those 100 people
answered for “Are you male or female?”. Stats are often used selectively
to prove a point, e.g.,

ONLY 1 person in the whole census was female – that proves why crime is so high in this area!

But, in checking the real data, you may see that 45 people wrote “Demi-God” and the rest wrote “Bruce”.

It can be used properly though, which would entail something like:

1% were women and 99% were recorded inaccurately

Averages

The first thing we covered in the lecture I attended on the first
day was averages. I learned that it is one of the slickest words used by
reporters and analysts to present numbers in a certain way.

On its own, we assume the word average means the sum of all units
divided by the number of units, but this is not always so. Wikipedia gives an
intro:

In mathematics, an average, or central tendency of a
data set is a measure of the “middle” value of the data set. Average is
one form of central tendency. Not all central tendencies should be
considered definitions of average.

There are many different descriptive statistics that can be chosen as
a measurement of the central tendency of the data items. These include the arithmetic mean, the median and the mode.
Other statistical measures such as the standard deviation and the range
are called measures of spread and describe how spread out the data is.

So, we have at least these 3 words to define different types of averages, which can greatly vary the resulting central tendency (another word for average – I know, I’m already falling asleep!).

Arithmetic mean

This is the one I always associated with the word average (sum of all
units, divided by the number of units) and is the easiest to calculate
for me. But here is a formula, just to complicate things:

If we have this sample set, $x_1, x_2, \ldots, x_n$, then we will use this formula to calculate the arithmetic mean:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

**I still need to get used to reading formulas again after 12 years out of school

One drawback of this method of calculating central tendency is that
extreme values can greatly affect the result ^1. For such
cases, a better representation of the central tendency may be the median, covered below.
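
To see how much one extreme value can drag the arithmetic mean around, here is a minimal Python sketch. The income figures are made up purely for illustration (they are not real data from anywhere), and `arithmetic_mean` is just a name I picked for the helper.

```python
from statistics import median


def arithmetic_mean(values):
    """Sum of all units divided by the number of units."""
    return sum(values) / len(values)


# Hypothetical monthly incomes, invented for this example only.
incomes = [12_000, 13_000, 14_000, 15_000, 16_000]

# The same sample with one extremely high earner added.
with_outlier = incomes + [500_000]

print("mean without outlier:", arithmetic_mean(incomes))
print("mean with outlier:   ", arithmetic_mean(with_outlier))
print("median with outlier: ", median(with_outlier))
```

With these made-up numbers the mean jumps from 14,000 to 95,000 once the single high value is added, while the median only moves to 14,500.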

^1 I believe this is why everyone thinks Hong Kong is one of the most
expensive places to live: some extremely high values are
affecting the “average”. Actually, most Hong Kong people earn very
little compared to the West, so they are good at living on the cheap
(though the city is far from stable, and middle and lower class people
are far from content).

more about arithmetic means @ Wikipedia

Median

:)

This one is pretty simple – the value in the middle of an ordered
sample set. You take all values, arrange them from lowest to highest and then
grab the middle value. If there is an even number of values in the
sample set, at uni, I (possibly mistakenly) remember being told to
take the next value higher, but Wikipedia now tells me it should be the
mean of the two values either side of the divide… that makes more sense – I
should dust off my lecture notes (actually, they are online
PowerPoints, no dusting required).

I like this extra definition related to medians from Wikipedia (yes, Wikipedia is easy!):

At most, half the population have values less than the median,
and, at most, half have values greater than the median. If both groups
contain less than half the population, then some of the population is
exactly equal to the median. For example, if a < b < c, then the median of the list {a, b, c} is b, and, if a < b < c < d, then the median of the list {a, b, c, d} is the mean of b and c; i.e., it is (b + c)/2.

Here is a quick image and a bit of text gleaned from the Murdoch Uni
website, showing one of those charts which looks like a game of Connect
Four:

The following sample has a mean of 7.69 and median of 7.65:

4.2, 4.4, 5.1, 5.6, 6.0, 6.4, 6.8, 7.1, 7.4, 7.4, 7.9, 8.2, 8.2, 8.7, 9.1, 9.6, 9.6, 10.0, 10.5, 11.6
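
The figures quoted for this sample can be checked with a few lines of Python. This is only a sketch: `median_of` is a hypothetical helper name of my own, and it assumes the "mean of the two middle values" rule for even-sized samples described above.

```python
sample = [4.2, 4.4, 5.1, 5.6, 6.0, 6.4, 6.8, 7.1, 7.4, 7.4,
          7.9, 8.2, 8.2, 8.7, 9.1, 9.6, 9.6, 10.0, 10.5, 11.6]


def median_of(values):
    """Middle value of the ordered sample; mean of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


sample_mean = sum(sample) / len(sample)
sample_median = median_of(sample)

print("mean:  ", round(sample_mean, 2))
print("median:", round(sample_median, 2))
```

There are 20 values, so the median is the mean of the 10th and 11th values (7.4 and 7.9), which gives 7.65; the values sum to 153.8, giving a mean of 7.69.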

Mode

The last of the 3 ways of measuring central tendency I will cover
here is quite simply the value which occurs most frequently in a sample
set. I’ll use my new best friends, a stem and leaf plot and a dot plot, to show that the number 17 is the most commonly occurring:

This one taken from somewhere on the BBC:

And below is from a spreadsheet in Excel I created:

**Ignore the “Base integer” there, I’m too lazy to remake an image at 3AM….

So, in this format, it is super quick n easy to spot the mode – did you find Wally/Waldo yet?
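
Counting occurrences works just as well as eyeballing a plot. Below is a rough Python sketch using `collections.Counter`; the values are hypothetical, chosen only so that 17 comes out as the most frequent, and the printed rows are a crude text stand-in for a dot plot.

```python
from collections import Counter

# Hypothetical sample set, invented so that 17 occurs most often.
values = [12, 15, 17, 17, 17, 18, 21, 21, 24]

# Count how many times each value occurs.
counts = Counter(values)

# most_common(1) returns a list with the single (value, count) pair that has the highest count.
mode_value, mode_count = counts.most_common(1)[0]

# Crude text dot plot: one star per occurrence of each value.
for value in sorted(counts):
    print(f"{value:>2} | {'*' * counts[value]}")

print("mode:", mode_value, "occurring", mode_count, "times")
```

If two values tie for the highest count, this sketch simply reports one of them, so a sample with more than one mode would need extra handling.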

I need to make some notes for myself of the terminology used for populations and different types of sample sets… after a nap!

Apple to abandon AppleScript… I think not!

A good geek friend of mine just warned me that AppleScript may be phased out when Apple
starts sandboxing OS X apps delivered through the App Store.

Never one to agree with friends, I am writing this post to give my opinion on why this would not eventuate.

AppleScript has been an integral part of the Mac OS for many years.
Most personal Mac users may not even know about AppleScript, as its
default editor is hidden in the Utilities folder. I have, however, seen
it in use at large companies, which depend on it for creating streamlined,
automated workflows. Some geeks, like myself, think it is the best
thing since sliced bread and use it for automating large parts of our
business and personal computer usage.

The business usage alone should be reason enough for Apple not to
cut such an integral part of its operating system. Think of printing companies
that drop EPS files into folders to have them color separated and sent
to pre-press for CMYK plate making, while simultaneously emailing a PDF
copy of the same document to admin staff for confirmation, etc. That is
one of a million+ ways AppleScript is helping businesses large and small
automate their workflows. Dropping AppleScript might force them to look
at other OSes, not something Apple would like to see.

The term “sandboxing” is the other reason I believe Apple would not
drop support for AppleScript from any apps. AppleScript is just like an
API which apps need to build in, so I’m assuming Apple would not care
what data people send TO the apps, but rather what the apps can ACCESS on
the computer itself… i.e., Apple would not like apps delivered through
the App Store to have unapproved access to every file on your computer
(perhaps they will address this by asking your permission each time?).
This point does not really apply to AppleScript, as AppleScript support
is something which is programmed into an app to allow INCOMING requests
for information or action.

While AppleScript can also be used by developers inside an ordinary
OS X app to interact with the OS, Apple would likely apply the same
quality checking to this code as it does to the Objective-C or other
code inside the app before approving it for the App Store.

So there… that’s what I think!