Talk:Optimal Classification

This page or book was previously nominated for deletion, but was kept. Please see the discussion in the Wikibooks:Requests for deletion archives for justifications and discussion. Old discussions should be taken into account before nominating again for deletion.

Data submittal

You may submit an unoptimized data set in CSV format here, and I will post the optimization below it. Put the topic name in the first column of the first row, followed by the name or label of each characteristic. Each following row starts with the name of an element in the first column, followed by the states that link each characteristic to that element, and so forth. Please limit submissions to 1,000 elements, 50 characteristics and 20 states. Typative (talk) 11:54, 2 August 2008 (UTC)

Example

FLAGS/LOC,A,B,C,D,E,F,G,H,I
BELGIUM,BLACK,YELLOW,ORANGE,BLACK,YELLOW,ORANGE,BLACK,YELLOW,ORANGE
FRANCE,BLUE,WHITE,RED,BLUE,WHITE,RED,BLUE,WHITE,RED
GERMANY,BLACK,BLACK,BLACK,RED,RED,RED,YELLOW,YELLOW,YELLOW
IRELAND,GREEN,WHITE,ORANGE,GREEN,WHITE,ORANGE,GREEN,WHITE,ORANGE
ITALY,GREEN,WHITE,RED,GREEN,WHITE,RED,GREEN,WHITE,RED
JAPAN,WHITE,WHITE,WHITE,WHITE,RED,WHITE,WHITE,WHITE,WHITE
LUXEMBOURG,RED,RED,RED,WHITE,WHITE,WHITE,BABY,BABY,BABY
NETHERLANDS,RED,RED,RED,WHITE,WHITE,WHITE,BLUE,BLUE,BLUE
SPAIN,RED,RED,RED,YELLOW,YELLOW,YELLOW,RED,RED,RED
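A minimal sketch of reading a submission in the layout described above, using the flag example as input. The function name parse_submission and its limit parameters are hypothetical; reading "20 states" as a cap on the distinct states of each characteristic is an assumption.

```python
import csv
import io

# The example submission from this page, verbatim.
EXAMPLE = """\
FLAGS/LOC,A,B,C,D,E,F,G,H,I
BELGIUM,BLACK,YELLOW,ORANGE,BLACK,YELLOW,ORANGE,BLACK,YELLOW,ORANGE
FRANCE,BLUE,WHITE,RED,BLUE,WHITE,RED,BLUE,WHITE,RED
GERMANY,BLACK,BLACK,BLACK,RED,RED,RED,YELLOW,YELLOW,YELLOW
IRELAND,GREEN,WHITE,ORANGE,GREEN,WHITE,ORANGE,GREEN,WHITE,ORANGE
ITALY,GREEN,WHITE,RED,GREEN,WHITE,RED,GREEN,WHITE,RED
JAPAN,WHITE,WHITE,WHITE,WHITE,RED,WHITE,WHITE,WHITE,WHITE
LUXEMBOURG,RED,RED,RED,WHITE,WHITE,WHITE,BABY,BABY,BABY
NETHERLANDS,RED,RED,RED,WHITE,WHITE,WHITE,BLUE,BLUE,BLUE
SPAIN,RED,RED,RED,YELLOW,YELLOW,YELLOW,RED,RED,RED
"""

def parse_submission(text, max_elements=1000, max_characteristics=50, max_states=20):
    """Split a submission into (topic, characteristics, {element: [states]})."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]  # skip blank lines
    topic, characteristics = rows[0][0], rows[0][1:]
    elements = {}
    for name, *states in rows[1:]:
        if len(states) != len(characteristics):
            raise ValueError(f"{name}: expected {len(characteristics)} states, got {len(states)}")
        elements[name] = states
    # Limits requested on this page (per-characteristic state cap is an assumption).
    if len(elements) > max_elements or len(characteristics) > max_characteristics:
        raise ValueError("submission exceeds the element or characteristic limit")
    for i, label in enumerate(characteristics):
        if len({states[i] for states in elements.values()}) > max_states:
            raise ValueError(f"characteristic {label} has more than {max_states} states")
    return topic, characteristics, elements

topic, characteristics, elements = parse_submission(EXAMPLE)
```

Here topic is "FLAGS/LOC", the characteristics are the nine flag areas A to I, and each of the nine flags maps to its list of colors.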

comments and questions

This book is a work in progress and is undergoing revisions, expansion, etc. Please feel free to add your comments and questions about anything concerning this topic, including the method, formatting, etc., to help editors improve this book should you choose not to make any edits yourself. Thanks. Typative (talk) 17:32, 1 August 2008 (UTC)

Splitting into smaller sub-pages

Why should this be split into smaller sub-pages? I don't find the length confusing in the least. On the contrary, keeping everything here allows for a quick overview. --Swift (talk) 23:10, 31 July 2008 (UTC)

I felt the same way, since for one thing the application example tends to capture one's attention immediately; otherwise one might keep on "thumbing". It never hurts to give something a shot, but let me know if you do not like it and I'll be happy to restore the single page. Typative (talk) 11:20, 1 August 2008 (UTC)

I also don't see a need to split it into sub-pages at this time (the book is too recent and still very small). The split request tag was added by Mike.lifeguard, but without any specific comment or post to indicate the reason. I'll leave a post on his talk page. --Panic (talk) 18:28, 3 August 2008 (UTC)

It is customary (though you are by no means bound by custom) to have the main page of a book as a cover or TOC, and have content on subpages. -- Mike.lifeguard | talk 18:48, 3 August 2008 (UTC)

Yes, it's customary. It's customary because the size of most books makes that a useful approach, not because of an inherent value in that setup. --Swift (talk) 00:30, 4 August 2008 (UTC)

One of the problems I find when contributing to a book, even one whose size rules out a monolithic approach, is that if it is too segmented there is no way to see the whole structure; it becomes very hard to give the work any flow or sense of direction, or to reduce duplication of content. --Panic (talk) 00:43, 4 August 2008 (UTC)

Expand this

The theory of optimal classification is a large area of research. This book, however, only covers a single algorithm, like an article would. This book really needs to be expanded to include more information, like a book should. There are a lot of important topics, like the Bayes classification algorithm, that aren't covered here. To help expand this book quickly, you can request imports of related articles from Wikipedia and turn those into book pages. --Whiteknight (Page) (Talk) 23:53, 31 July 2008 (UTC)

What I had in mind was limiting it to algorithms and processes that perform the same function. Is that what you have in mind as well? Typative (talk) 11:23, 1 August 2008 (UTC)
I just tested a method of hierarchical expansion, similar to the folder tree hierarchy of a computer (e.g. Microsoft Windows), with:

"Optimal Classification/Application Example/Flag Recognition".

The tree structure can easily accommodate expansion or contraction.

For instance, if one assumes Optimal Classification to be at "book" level then a "chapter" level for each method of Optimal Classification can be inserted into the hierarchy as follows:

Optimal Classification/Chapter 1/Application Example/Flag Recognition.

Revisions of the tree will require that links be updated manually or perhaps by bot.

Typative (talk) 14:47, 1 August 2008 (UTC)

This is the benefit of the flat database structure of the wiki. Don't introduce a hierarchy except where absolutely needed. To indicate which pages are part of a book, you place them under <Bookname>/<pagename>. Where mutual back-links are clearly useful (as is the case with Japanese/Vocabulary: there are links to this page just under the title on the sub-pages, e.g. Japanese/Vocabulary/Animals), pages can be placed in deeper sub-pages. Otherwise, the structure of the book should be created simply by linking.

That allows for a more flexible system of interlinking and breaks us out of the confines of linear books. It also allows for reorganising of content such as moving a page between chapters simply by changing the link, rather than having to move the page as well. A page can even appear in the learning path more than once. --Swift (talk) 00:46, 4 August 2008 (UTC)

So it sounds like you are saying that the tree structure is already automated here, and that taking advantage of its automation only requires a tag-like object (<bookname>/<pagename>) at the head of the page? Typative (talk) 07:37, 4 August 2008 (UTC)

The back-links don't require any "tags", just a hierarchy. --Swift (talk) 11:28, 7 August 2008 (UTC)

Further testing indicates that a lot of work is required to manually fix the links when the hierarchy structure is changed. Although a tree structure is logically superior it may not be feasible without a bot to follow and make the changes. Typative (talk) 15:23, 1 August 2008 (UTC)

No tags; it is based only on the location of the pages. Take a look at C++ Programming: notice that if you select TOC1, the top of the window shows "< C++ Programming". The order is given by the location of the page in the book's namespace. Move a page to another subpage location and you reorganize the tree, but that requires the top page (the root) to be the TOC. --Panic (talk) 17:23, 4 August 2008 (UTC)

Is there a tutorial for using this method? Typative (talk) 10:15, 7 August 2008 (UTC)

It's not a method but a feature provided by the software. See the corresponding links at Optimal Classification/Application Example/Flag Recognition (the ones that link to Optimal Classification and Application Example). --Swift (talk) 11:28, 7 August 2008 (UTC)

Additional methods being sought to fill additional chapters...

Starting with this query at the Wikipedia mathematics reference desk.

Note: Because of the set-size limit shown in the primary reference and its mention of a sub-scheme, the method of permutation was used in the evaluation program. The set-size limit and the sub-scheme were presumably needed because of the limits of the computing facility available at the time (1971), namely a Burroughs 5700 time-sharing terminal. Current PC technology can accommodate a much larger set before a sub-scheme must be used for the remaining characteristics that fall outside the set. If there is a decision tree trimming method that can accomplish the same function without requiring a limit on set size or a sub-scheme, it should be included here. Typative (talk) 13:42, 12 August 2008 (UTC)

more general book

I think a discussion of classification algorithms, such as this algorithm, should stay at Wikibooks. However, rather than put each algorithm in its own book, I think it would be better to collect several algorithms per book. So I suggest moving this "Optimal Classification" "book" to make it part of a more general book.

Which book is the appropriate place for this algorithm?

Algorithm implementation?
Artificial Intelligence?
Advanced Data Structures and Algorithms?
Systems Theory/Decision Structure?
Start a new book that covers only classification and clustering algorithms?
Some other book I've overlooked?

Which book do you feel is most appropriate for discussing this algorithm? --DavidCary (talk) 15:56, 13 August 2008 (UTC)

I agree with you; my preference goes to a work that "covers only classification and clustering algorithms". But why not let this evolve into a more complete book, or wait for someone else to create a book along the same lines and then propose a merge? (From what I've understood, the current author only intended to cover this one algorithm. That gives us a usable book on a given subject, not the beginning of a great project; with luck and more contributions, maybe it will get there.)

I take it you aren't committing to help extend it, just advancing the proposal? --Panic (talk) 17:34, 13 August 2008 (UTC)

Although it is (still) in need of improvement and expansion, I liked this "book" much better in the form of an article than in the form of a book. As a book, I agree that it is very thin, although very potent, and it has already been expanded about as much as possible without making it more complicated than it needs to be by covering every minute detail as in the primary reference. After all, my goal in putting it online anywhere in the first place was to show its:

universal application and
practical simplicity in combination with its solid mathematical base.

I therefore feel that a new book project which "...covers only classification and clustering algorithms", showing and comparing the advantages, disadvantages and applications of each method, is not only acceptable but may also be highly beneficial and supportive of the understanding of Dr. Rypka's method. (In that case Dr. Rypka's method might become only a section within a chapter on optimal classification, with additional sections covering other methods of optimal classification.)

Although I find one swimsuit contestant attractive, I find it much easier to judge one by looking at the others as well! ;-} Typative (talk) 13:27, 14 August 2008 (UTC)

clustering

Hi David,

I noticed your clustering link at Optimal Classification.

I had not previously read the Wikipedia cluster analysis article.

Dr. Rypka refers to clustering on page 171, section 3(a) of the primary reference[1]. The "number of clusters" label on the graphic in the "Elbow" clustering paragraph corresponds directly to the "number of attributes" in the target set of Dr. Rypka's method.

While Dr. Rypka stopped at three attributes and thereafter incorporated a sub-scheme to complete the classification, presumably due to the computing limits imposed by his Burroughs (now UNISYS) 5700 TimeShare Terminal, it is easy, using the computing power of a modern PC, to optionally match the number of attributes in the target set with the number of attributes in the group or to stop computing when separation of the elements reaches 100%. The failure to find 100% separation within the group indicates the need for additional attributes to be incorporated into the group. However, since the process is dynamic and dependent upon attribute values, attributes which are included in the target set may still be necessary when values change.
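The incremental search described above can be sketched as follows: try every combination of 1, 2, 3, ... attributes (in the spirit of the "method of permutation" mentioned earlier) and stop at the first combination that separates every element. This is an illustration, not Dr. Rypka's published procedure; the pairwise separation measure, the function names, and max_size (standing in for a set-size cap such as three) are all hypothetical.

```python
from itertools import combinations

# Flag example from this page: element -> states for areas A..I.
NAMES = list("ABCDEFGHI")
FLAGS = {
    "BELGIUM": "BLACK YELLOW ORANGE BLACK YELLOW ORANGE BLACK YELLOW ORANGE".split(),
    "FRANCE": "BLUE WHITE RED BLUE WHITE RED BLUE WHITE RED".split(),
    "GERMANY": "BLACK BLACK BLACK RED RED RED YELLOW YELLOW YELLOW".split(),
    "IRELAND": "GREEN WHITE ORANGE GREEN WHITE ORANGE GREEN WHITE ORANGE".split(),
    "ITALY": "GREEN WHITE RED GREEN WHITE RED GREEN WHITE RED".split(),
    "JAPAN": "WHITE WHITE WHITE WHITE RED WHITE WHITE WHITE WHITE".split(),
    "LUXEMBOURG": "RED RED RED WHITE WHITE WHITE BABY BABY BABY".split(),
    "NETHERLANDS": "RED RED RED WHITE WHITE WHITE BLUE BLUE BLUE".split(),
    "SPAIN": "RED RED RED YELLOW YELLOW YELLOW RED RED RED".split(),
}

def separation(elements, columns):
    """Hypothetical measure: fraction of element pairs that differ on at least one chosen column."""
    patterns = [tuple(states[c] for c in columns) for states in elements.values()]
    pairs = list(combinations(patterns, 2))
    if not pairs:
        return 1.0  # zero or one element is trivially separated
    return sum(p != q for p, q in pairs) / len(pairs)

def smallest_separating_set(elements, names, max_size=None):
    """Try all sets of 1, 2, 3, ... attributes; return the first that reaches 100% separation."""
    limit = len(names) if max_size is None else max_size
    for size in range(1, limit + 1):
        for columns in combinations(range(len(names)), size):
            if separation(elements, columns) == 1.0:
                return tuple(names[c] for c in columns)
    return None  # nothing within the limit separates every element: more attributes are needed

print(smallest_separating_set(FLAGS, NAMES))   # ('A', 'I')
```

For the flag data no single area separates all nine flags, and the first pair found is areas A and I. A None result corresponds to the case described above, where failing to reach 100% separation signals that additional attributes are needed.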

Since Dr. Rypka's method employs attribute-value clustering as shown here, should the level of the book's chapter or page hierarchy be changed?

Coincidentally, I am very much aware of this clustering method having discovered and applied a sort routine which the clustering method serves to demonstrate perfectly.

Typative (talk) 16:04, 9 September 2008 (UTC)

notes

Biological Identification with Computers, edited by R. J. Pankhurst, British Museum (Natural History), London, England. Proceedings of a meeting held at King's College, Cambridge, 27 and 28 September 1973. Systematics Association Special Volume No. 7. Academic Press, 1975. Noting the work of Eugene W. Rypka, Dept. of Microbiology, Lovelace Center for Health Sciences, Albuquerque, New Mexico, "Pattern Recognition and Microbial Identification." ISBN 0125448503

A lot of people are tricked [1] into thinking that book writers and newspaper journalists first come up with a title, then create an outline, then start filling in the information -- merely because that's the order they read in.

So I strongly recommend that people first write down everything they know about a subject. Once you can't think of anything else about the subject, start organizing. So ... yes, I agree that the book page hierarchy should be changed, someday. But rather than put it into the "final" order, I much prefer putting the pages of the book into an "easy to shuffle and reorganize" order -- the "flat page structure" (WB:NP).

As I already said on Talk:Optimal Classification, I think this algorithm needs to be placed into a more general book that includes more than just this one algorithm, but I haven't yet decided which one is best (or whether we should start yet another book).

I am confused. When you mention "this clustering method", are you referring to Rypka's algorithm, or to one of the other 10 algorithms listed at Optimal Classification/clustering?

When I talk about clustering, I use the term "number of attributes" to describe the amount of information attached to each element -- in my case, it's almost always "3" integers. But I see that other people often work with far more information, and with textual tags rather than simple integers.

I use the term "number of clusters" to describe the number of different "buckets" that I use to sort the elements. I toss each element into one and only one bucket. I often use 256 buckets.

So you can see how I would be confused if someone told me that "number of attributes" and "number of clusters" are the same thing. I suspect it's a terminology thing -- those terms mean something else to you.

You know what would help? It would help if someone wrote a book about this subject. :-) --DavidCary (talk) 05:20, 10 September 2008 (UTC)

I will have to look at the other "clustering" algorithms before I can comment further. Over the years I have studied many algorithms, and many of them may have changed since without my having looked at the changes.

Wikipedia Cluster analysis - "Elbow criterion" - obvious misinterpretation/original research

Theoretical and Empirical Separation Curves

In the context we are discussing a set or multiset of values or states is called an attribute when it is used in combination with other attributes to define a bounded class of elements that have the same attributes in common. It appears that the point of confusion here is that you (and others) are using the term "cluster" to refer to a "subset of a set" or to a group of attributes which define a bounded class rather than using the term "cluster" to refer to the multiset count for each value or state of an attribute. In this context the set or multiset of values of an attribute will always have a number of clusters equal to the count of its values or states.
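To make the sense of "cluster" used in the paragraph above concrete, here is a small sketch that takes one attribute of the flag example (area E) and counts how many elements carry each state. The variable names are illustrative only.

```python
from collections import Counter

# Area E of the flag example, read top to bottom (BELGIUM ... SPAIN).
AREA_E = ["YELLOW", "WHITE", "RED", "WHITE", "WHITE", "RED", "WHITE", "WHITE", "YELLOW"]

# Multiset count for one attribute: each state and how many elements carry it.
clusters = Counter(AREA_E)

# In this sense each distinct state is one cluster, so the cluster count
# equals the number of distinct states (here WHITE: 5, YELLOW: 2, RED: 2).
number_of_clusters = len(clusters)
print(number_of_clusters)   # 3
```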

The number of attributes selected to derive the target set size of the subset is usually not fixed: it is initially set to one and then incremented until 100% separation is achieved, the target set size exceeds computer capacity, or the time allocated for classification runs out. The minimum number of attributes can be determined mathematically as follows:

t_{\min} = \frac{\log G}{\log V}

where:[1]

t_min is the minimal number of characteristics to result in theoretical separation,
G is the number of elements in the bounded class and
V is the highest value of logic in the group.
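The formula yields a real number, while an attribute count must be whole, so a sketch of it rounds up: the smallest t with V^t ≥ G, which equals ⌈log G / log V⌉. Integer arithmetic avoids floating-point surprises (in double precision, log 1000 / log 10 evaluates slightly below 3). Reading V as the largest number of distinct states of any single attribute is an assumption, as is the function name t_min.

```python
def t_min(g, v):
    """Smallest whole t with v**t >= g, i.e. ceil(log G / log V) without floating point."""
    if g < 1:
        raise ValueError("G must be at least 1")
    if v < 2:
        raise ValueError("V must be at least 2")
    t, patterns = 0, 1   # t attributes of v states each allow at most v**t distinct patterns
    while patterns < g:
        t += 1
        patterns *= v
    return t

# Flag example under the stated reading: G = 9 flags, and area G has 7 distinct colors.
print(t_min(9, 7))   # 2
```

For the flags this gives a theoretical minimum of two areas, which agrees with the pair A and I being enough to tell all nine flags apart.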

Typative (talk) 11:27, 10 September 2008 (UTC)

edit break

My comments are indented below... Typative (talk) 22:05, 10 September 2008 (UTC)

One of the difficult things about teaching is trying to describe some concept to a student without dragging in a bunch of other abstract concepts that the student hasn't learned yet.

I like the flags example.

Look at it again. Recall the statement under the flag overlay grid: the areas represent attributes, the colors represent attribute values and the flags represent elements. Typative (talk) 18:50, 10 September 2008 (UTC)

An example for clustering:

I have a pile of rocks. And I'm sorting them into some literal buckets.

To sort, you first need to separate. To separate you first need a criteria upon which to separate. The criterion for rocks might be weight or size or color or shape. What criteria are you going to use to separate the rocks? Typative (talk) 18:50, 10 September 2008 (UTC)

As you probably already know, any one rock can be put into at most one bucket.

Correct but irrelevant. The rock itself cannot be split between two buckets without using a mallet or sledgehammer. Typative (talk) 18:50, 10 September 2008 (UTC)

Once I'm done, each bucket contains one cluster of rocks.

What makes these rocks a cluster? The weight or size or color or shape? Being in the same bucket without a criterion does not make them a cluster, any more than a "cluster" of stars in the heavens is one with no criterion except proximity; that is, you can't rationally say they are a cluster just because they are close together in the bucket. Typative (talk) 18:50, 10 September 2008 (UTC)

I suppose you could call that cluster "a subset of a set or group of attributes which define a bounded class". Forgive me for thinking that is unnecessarily confusing.

No. It is improper to use the term cluster in this way. The word set is a general term but the word cluster is not. Think about the flags above. A flag is made up of different colors in different locations, but a flag is not properly defined as a cluster of colors. Typative (talk) 18:50, 10 September 2008 (UTC)

Another example for clustering:

I'm drawing a map. I want to use 3 colors for "forest", "plains", and "houses/roads", and I draw a thin black line between each of those colors. But the photograph I'm using for a reference has jillions of tiny little spots, one for each tree.

I draw a loop around each "forest", and fill it in with the dark green "forest" color.

In several places there's a flat path that winds through an area that is dense with trees on both sides of the path. In some of those places, I say there are 2 different forests bordering that path: I draw 2 loops of black line, and the path between those loops has the "plains" color. Other times I call it one big forest: I draw one big loop of black line, and points on that path have the "forest" color.

After I'm done, each forest is a cluster of trees.

Okay, I believe I see the problem. You are deriving the meaning of the word cluster from its colloquial uses. Another example would be an oak leaf cluster.

Also, you are using "forest", "plains", and "houses/roads" as if they were attributes when in fact they are elements in your bounded class. (See attribute-value system.) Typative (talk) 18:50, 10 September 2008 (UTC)

Even though I only know 2 attributes about each tree (its x and y location on the photograph), I may draw dozens of small forests on this map.

You are describing "trees" here as if they are elements but using them above as if they are attributes. Typative (talk) 18:50, 10 September 2008 (UTC)

This kind of spatial clustering is hard-wired in the human eye/brain. But just because humans can quickly draw loops around each forest doesn't make it easy to program a computer to do it.

In particular, counting "how many forests are in this photograph?" is surprisingly difficult for current computer algorithms, given how quickly and easily humans can do it "without giving it much thought".

--DavidCary (talk) 16:45, 10 September 2008 (UTC)

edit break

Okay, I have inserted my comments above instead of placing them down here, to help eliminate confusion, but I need to eat supper before concluding. Please review the attribute-value systems Wikipedia article linked to above and I will return as soon as possible. Typative (talk) 18:50, 10 September 2008 (UTC)

A brief look at the list of clustering algorithms now tends to convince me that most do not belong here. Clustering and classification are two different things: clustering is but one step in the process of classification, just as it is but one step in executing an artificial neural network, and it appears to be but one step in the other examples of so-called clustering algorithms in the above list. Putting it here would be a bit like putting a page or chapter about multiplication in a book about linear programming. Sure, linear programming relies upon multiplication, but so do many other techniques and methodologies. Typative (talk) 22:01, 10 September 2008 (UTC)

Typative (talk) 19:49, 11 September 2008 (UTC)

Dear fellow Wikibookian,

I see that what I've been doing is actually vector quantization.

So we shouldn't be surprised that what I am doing is technically not exactly the same thing as "classification".

I've been asked To separate you first need a criteria upon which to separate. ... What criteria are you going to use to separate the rocks? What makes these rocks a cluster? The weight or size or color or shape?

I don't know ahead of time which criteria I will use.

I have a bunch of rocks. (I actually work with other things, but I don't want to bring in a bunch of irrelevant details, so let's pretend I'm working with rocks).

For each one, I measure its density and hardness.

I suppose I *could* decide, before looking at any rocks (top-down), exactly which range of density and hardness to assign to each bucket.

However, many times I do something more like spread them across the ground in a density-vs-hardness graph.

In an ideal world, there would be a few discrete points in the density-vs-hardness graph ("sandstone", "granite", "salt"), and every rock would fall exactly on top of one of those points, making a tower.

And in that ideal world, it would be easy to discover exactly how many kinds of rocks I have, *after* I have already divided up all the rocks into a few discrete towers (bottom-up), by counting how many towers I see.

Alas, because of my own measurement error, and also because of variations in the rocks, very rarely do 2 rocks fall at exactly the same point in this graph.

Still, I can usually visually pick out one group of closely spaced rocks, call them all "Type 1" and put them into bucket #1, and pick out another group of closely spaced rocks (far away from the first group), call them "Type 2", put them into bucket #2, etc.

Now it may turn out that *all* of tomorrow's rocks have the same hardness, and my criterion ends up being density alone.

Or it may turn out that *all* of tomorrow's rocks have the same density, and my criterion ends up being hardness alone.

Or it may turn out that *all* of tomorrow's rocks fall along a diagonal line, and *either* density alone *or* hardness alone is sufficient to separate the piles I discover.

Or it may turn out that they end up in one tight pile, and I may decide to use color or shape or some other measurement to divide them.

But I won't know until I try it.

I don't know ahead of time which criteria I will use.

Tomorrow, *after* I divide up all the rocks, *then* I could give you some criteria -- a range of hardness and density (or something else) that describes the rocks in each bucket.
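A minimal sketch of the bottom-up process described above, using Lloyd's k-means algorithm, which is one common way to build a vector-quantization codebook (the text does not say which VQ method is used, so this choice is an assumption). The rock measurements are hypothetical, the initialization simply takes the first k points, and density and hardness are compared unscaled; after grouping, bucket_ranges states each bucket's criteria after the fact as a range of density and hardness.

```python
import math

def assign(points, centroids):
    """Put each point into the bucket of its nearest centroid (Euclidean distance)."""
    buckets = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        buckets[nearest].append(p)
    return buckets

def kmeans(points, k, max_iter=50):
    """Lloyd's algorithm: alternate between assigning points and moving centroids to bucket means."""
    centroids = list(points[:k])   # naive start: the first k points (an assumption)
    for _ in range(max_iter):
        buckets = assign(points, centroids)
        updated = [
            tuple(sum(axis) / len(b) for axis in zip(*b)) if b else centroids[j]
            for j, b in enumerate(buckets)
        ]
        if updated == centroids:   # centroids stopped moving, so assignments are stable
            break
        centroids = updated
    return centroids, assign(points, centroids)

def bucket_ranges(bucket):
    """Criteria stated after the fact: (min, max) of each measurement in a bucket."""
    return [(min(axis), max(axis)) for axis in zip(*bucket)]

# Hypothetical (density, hardness) measurements for nine rocks.
ROCKS = [(2.2, 2.5), (2.7, 6.5), (3.5, 4.0),
         (2.25, 2.4), (2.68, 6.6), (3.45, 4.1),
         (2.15, 2.6), (2.72, 6.4), (3.55, 3.9)]

centroids, buckets = kmeans(ROCKS, k=3)
for b in buckets:
    print(len(b), bucket_ranges(b))
```

The number of buckets k still has to be chosen; the "elbow" heuristic mentioned earlier on this page is one way people pick it.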

Let me say again -- what I'm actually doing in this example is vector quantization.

The reason I'm bringing up vector quantization is that I think that the "Optimal Classification" algorithm *should* go into some book at Wikibooks. However, I think a book that discussed that algorithm *and* other closely-related algorithms would be *better* than a book about a single algorithm.

And "vector quantization" is the most closely related algorithm that I am familiar with.

I suspect other algorithms have been developed that are even more closely related -- the book should talk about those as well.

If we discover that there are 100 other algorithms that are even more closely related to "Optimal Classification" than "vector quantization", then I would agree that "vector quantization" doesn't belong in this book -- we would put the most-closely-related algorithms in this book, and put the other algorithms (and vector quantization) in some other book or books.

I agree that "vector quantization" and "optimal classification" are "two different things".

But how can a book discuss several algorithms unless those algorithms are different?

If X is the first step in the process of doing Y, then I think a Wikibook about Y should *either* also cover X, *or* have a link to some other prerequisite book. (For example, Microprocessor Design points out that Digital Circuits is a prerequisite).

So would you prefer me to keep talking about clustering and classification and "vector quantization" in this book? Or are you going to point out a more appropriate prerequisite book?

--DavidCary (talk) 21:17, 11 September 2008 (UTC)

edit break

Okay, let's see. First, it might help to understand that you don't have to know what label you are going to put on a bucket, because the first rock to go into any bucket will do that for you. It's really that simple. Of course, you then have to decide 1.) what criteria the next rock you pick up embodies and 2.) whether a bucket already contains a rock with the same criteria or you have to recruit another bucket. This might help.

How about other types of classification besides optimal? For instance, how about a chapter on temporal classification, such as is used to create a family tree or a biological taxonomy, whose hierarchy is not optimal but based on a sequence of events? A chapter I think would be really great would be on the classification used in soil taxonomy. In fact, I'm working on a way to classify soil using optimal classification instead of the method currently used, so it would be really helpful to have such a chapter. Lots of things just do not fit the optimal classification scheme. There could also be chapters on how to create a classification table (that is, how to select elements, attributes and values) or on how to find applications. There could be a chapter on examples, and a chapter on the theory behind classification: why do it and what advantages it offers. That way the book would not be entitled Optimal Classification but rather Classification, with Optimal Classification as one chapter, just as the other types of classification and topics about classification would have their own chapters. Typative (talk) 23:04, 11 September 2008 (UTC)

Yes. I agree that it would be good to open this book up to all kinds of classification algorithms and related techniques. --DavidCary (talk) 23:11, 3 October 2008 (UTC)
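The bucket procedure in the paragraph above can be sketched as: compute the chosen criterion for each rock, drop the rock into the bucket already holding that value, and recruit a new bucket when none matches, so the first rock in each bucket effectively labels it. The function name, the rocks, and the use of color as the criterion are all hypothetical.

```python
def sort_into_buckets(items, criterion):
    """Put each item into the bucket for its criterion value, recruiting a new bucket when none matches."""
    buckets = {}   # criterion value -> items, in the order buckets were recruited
    for item in items:
        label = criterion(item)   # the first item with this value effectively labels the bucket
        buckets.setdefault(label, []).append(item)
    return buckets

# Hypothetical rocks described by (name, color); color is the chosen criterion.
rocks = [("rock1", "grey"), ("rock2", "red"), ("rock3", "grey"), ("rock4", "black")]
buckets = sort_into_buckets(rocks, criterion=lambda rock: rock[1])
print(list(buckets))   # ['grey', 'red', 'black']
```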

See page 157, Primary Schemes footnote of the primary reference
