What Came After IZE? Three Domains, Three Answers

In the previous post in this series, I discussed the technical details of IZE and its reception. Here I want to look at what came after — and where IZE-like ideas might still have potential.

The short version: IZE was forgotten, but the ideas it embodied — hierarchical clustering, single-word splits, dynamic navigation — were re-invented independently in several different domains. Each domain found a different answer, for reasons that are worth understanding.

IZE was introduced in 1988. The first graphical web browser, NCSA Mosaic, was released in 1993. Within a few years, hundreds of millions of people were using the web, and the need for better search and navigation tools became apparent.

The initial solution was a category structure. Web sites were manually curated into hierarchical directories for discovery — Yahoo! being the canonical example. As Hearst (2009) notes:

A fixed category structure helps define the information space, organizing information into a familiar structure for those who know the field, and providing a novice with scaffolding to help begin to understand the domain. (Hearst, 2009)

This worked until it didn't. The web grew faster than any editorial team could curate it, and hierarchical directories collapsed under scale. Engines like Google (1998) took a different approach: don't organize, rank. Powerful keyword-based retrieval combined with algorithmic ranking (PageRank) made web site categorization largely unnecessary. The search UI eventually settled into "10 blue links" with pagination and relatively minimal use of filters.

In academia, meanwhile, clustering approaches tried to impose post-hoc hierarchy on web results. The Scatter/Gather algorithm (Cutting et al., 1992) is an example. It grouped similar documents together and let users navigate the resulting clusters. Each cluster was defined by a set of keywords, and documents were assigned based on their similarity to those keywords. Where IZE used a single keyword to split results at each stage of a hierarchy, Scatter/Gather and other algorithms in this class used many keywords per cluster, often hundreds. The clusters may have been coherent, but the lists of words defining them were not intuitive to users. Hearst's verdict is blunt: "The disadvantages of [many-word] clusters for user interfaces include their lack of predictability, their conflation of many dimensions simultaneously, the difficulty of labeling the groups, and the counter-intuitiveness of cluster subhierarchies."

The approach most similar to IZE from this era was Findex (Kummamuru et al., 2004) — almost certainly an independent re-invention, since IZE was not widely cited by this point. Findex used a hierarchical clustering algorithm to group web search results into a tree, then let users navigate it. Unlike Scatter/Gather, Findex used a single word or phrase to split results at each level — much closer to IZE's design. It used document titles and search snippets to identify frequent keywords, automatically handling stopwords and morphology. User studies showed it was most helpful with broad or ambiguous queries, or in early exploration of unfamiliar document sets.

Findex clustered search results, from Hearst (2009)
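To make the single-keyword idea concrete, here is a minimal sketch in Python. It is not the published Findex algorithm or IZE's: the stopword list, the `min_size` threshold, the tie-breaking order, and the function names are all assumptions of mine, and it skips morphology entirely. It only shows the shape of the technique: count frequent words across titles and snippets, make each frequent word a category, and recurse inside each category.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use far larger ones).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with", "at", "is"}


def tokens(text):
    """Return the set of lowercase words in one result, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def split_by_keyword(docs, used=frozenset(), min_size=2, max_depth=2):
    """Recursively split docs into categories named by a single keyword.

    Returns {keyword: {"docs": [...], "children": {...}}}.
    A document may land in several categories (categories overlap).
    """
    if max_depth == 0:
        return {}
    counts = Counter()
    for doc in docs:
        counts.update(tokens(doc) - used)
    tree = {}
    # Most frequent words first; alphabetical order breaks ties.
    for word, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        # Skip words too rare to form a group, or present in every document.
        if n < min_size or n == len(docs):
            continue
        members = [d for d in docs if word in tokens(d)]
        tree[word] = {
            "docs": members,
            "children": split_by_keyword(members, used | {word}, min_size, max_depth - 1),
        }
    return tree


# Hypothetical titles for the query "Walt Disney"; the query words are excluded up front.
results = [
    "Walt Disney World tickets and hours",
    "Walt Disney World park maps",
    "Hotels near the Walt Disney World park",
    "Walt Disney biography and early life",
    "Walt Disney biography for kids",
    "Disney collectables price guide",
]
tree = split_by_keyword(results, used=frozenset({"walt", "disney"}))
```

On this toy input the top-level categories are "world", "biography", and "park", with "park" also appearing as a child of "world". Each label is one word a user can read at a glance, which is the property the many-word clusters lacked.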

Clusty.com was a commercial attempt at this: clustering plus incremental refinement, applied to web search results.

As users entered queries — for example, "Walt Disney" — Clusty automatically created categories such as "Walt Disney World," "Collectables," "History," and "Biography". This categorized view allowed users to focus on the subtopic most relevant to them, reducing the need to sift through irrelevant results. (From an article about Clusty on Yippy.com)

You probably haven't used Clusty.com in a while. It eventually became clear that clustering was not a good fit for web search at scale:

It appears that, although hierarchical structures are commonly used and are useful when applied to smaller collections, as a navigation structure, they become unwieldy when applied to very large collections. (Hearst, 2009)

This domain is, I think, settled. Dynamic hierarchical clustering lost on the open web. It just doesn't work at that scale, with such heterogeneous content, and such a diverse user base.

The most successful personal information management tools (Evernote, Obsidian, Tana, Roam) let users create, search, and navigate their own documents. They rely on a combination of full-text search (Evernote excels here), manual or semi-automated tagging, and graph views (Obsidian). A graph view lets users see relationships between documents and navigate to related ones.

These tools are not based on clustering or faceted navigation. They're bets on different models of how people think about their own knowledge: as a flat searchable archive, as a network of linked ideas, as an outline. Each has devoted users. None has clearly won.

Enterprise search is the large-scale cousin of this problem: same challenge, higher stakes, often worse tools. A new employee trying to find relevant internal documents faces the same orientation problem as a user with a large personal archive. They don't know the vocabulary, they don't know what's there, and keyword search only helps once you know what to search for. On top of that, with many contributors, sources vary wildly, content goes stale or contradicts itself, and hand-curating a table of contents typically fails.

An IZE-like approach, one that dynamically clusters your documents as you accumulate them and surfaces a navigable hierarchy without requiring you to tag everything manually, doesn't seem to have been seriously tried here. I'm not sure it would fail. The collection is smaller than the web, the user is often a domain expert, and orientation matters a lot. Whether algorithmically generated hierarchies would help, or whether the lack of user control would just feel alienating, is genuinely unclear to me.

E-commerce went a different direction from web search. Because sites control their own catalogs, they can manually or automatically add consistent metadata to their products. In the early 2000s, academic projects such as Flamenco (Hearst, 2000; Yee et al., 2003) investigated faceted navigation, where each item has multiple orthogonal categories and users can filter by any of them. This work was quickly adopted by Amazon, eBay, and others. Today, faceted search and navigation is a standard feature of most e-commerce sites.

Flamenco faceted search, from Hearst (2009)
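The mechanics of faceted navigation are simple enough to sketch in a few lines of Python. This is an illustration under my own assumptions, not Flamenco's implementation: the catalog, the brand names, the facet list, and the function names are all hypothetical. Each item carries one value per facet, a filter is an AND across the selected facets, and the interface shows counts for the facets the user has not yet chosen.

```python
from collections import Counter

# Hypothetical catalog: every item has exactly one value for each facet.
CATALOG = [
    {"name": "Trail Runner", "brand": "Acme", "category": "shoes", "color": "red"},
    {"name": "City Sneaker", "brand": "Acme", "category": "shoes", "color": "black"},
    {"name": "Rain Shell", "brand": "Borealis", "category": "jackets", "color": "red"},
    {"name": "Down Parka", "brand": "Borealis", "category": "jackets", "color": "black"},
]
FACETS = ("brand", "category", "color")


def apply_filters(items, selected):
    """Keep the items that match every selected facet value (AND across facets)."""
    return [it for it in items if all(it[f] == v for f, v in selected.items())]


def facet_counts(items, selected):
    """For each facet not yet selected, count its values among the remaining items."""
    remaining = apply_filters(items, selected)
    return {f: Counter(it[f] for it in remaining) for f in FACETS if f not in selected}


# The user picks one facet value; the other facets update to show what is left.
selected = {"color": "red"}
matches = apply_filters(CATALOG, selected)
counts = facet_counts(CATALOG, selected)
```

Note what the sketch takes for granted: someone already decided that "brand", "category", and "color" are the right dimensions and labeled every item consistently. That is exactly the control over the catalog that e-commerce sites have and the open web does not.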

Facets work because the domain structure is pre-existing and stable. The catalog owner controls the vocabulary; the facets can be designed to match how users think about the product space. As Hearst points out, "If the facets do not reflect a user's mental model of the space, or if items are not assigned facet labels appropriately, the interface will suffer some of the same problems as directory structures." A mismatch signals to users that the site doesn't understand them, and they'll trust it less.

But faceted search has a known failure mode: users who don't know the vocabulary can't filter effectively. Someone browsing a new-to-them catalog — unfamiliar with the product categories, the brand names, the relevant attributes — often can't use facets productively. They bounce, or fall back to keyword search and struggle. The orientation problem is still unsolved.

Could a dynamic clustering layer above facets help users orient before they filter? That's the question I'll be taking up next. AI changes the picture too — LLM-powered query understanding, conversational search, and semantic retrieval all attack the orientation problem from different angles, and I'll get to those as well.

The idea of algorithmically generating a navigable hierarchy from a document collection didn't disappear with IZE — it was re-invented, independently, in each domain. On the open web, it lost to ranking. In personal information management and enterprise search, it was never really tried. In e-commerce, faceted navigation won the orientation battle, though not completely. The places where IZE-like ideas still seem interesting are exactly the ones where users arrive without a clear vocabulary: a new enterprise knowledge base, an unfamiliar product catalog, a personal archive they haven't touched in years.


Notes: This post was primarily human-authored, with AI assistance for research, editing, and organization. The AI filled a Secondary author role. The core ideas and final voice are mine.