SharePoint metadata design principles

November 30, 2012 | April 26, 2012 | Technology

Metadata is a messy subject. So messy, in fact, that many SharePoint projects never truly deal with it.

Metadata is complicated, full of dependencies we cannot control and—to put it mildly—not exactly liked by end users. Once you move beyond the initial “oh wow” moment of demonstrating how metadata can help arrange documents in multiple useful dimensions and users encounter the additional work that’s required to tag a document—each and every time!—before they’re allowed to upload it, it’s typically game over for metadata.

You’re really only left with one of two options: an unenthusiastically adopted SharePoint implementation (grudgingly, your users do what they’re expected to do, but they’re openly hostile to it), or you scale back your metadata to “next to nothing,” which is the only way to ensure that people will keep using SharePoint. And what you’ve achieved is not much better than a shared folder—the very thing you wanted to get away from. (The open secret of Microsoft SharePoint is that what people really want is a folder.)

There is lots of really great advice out there about the practical ins and outs of working with stakeholders to capture and negotiate metadata properties for their libraries. And there are a lot of examples of poorly thought-out attempts at creating giant catalogues of enterprise metadata that look and act like a library classification system, and that SharePoint program managers seem to want to apply everywhere (whether appropriate to the purpose or not).

But beyond the typical high-level ECM advice that you should develop an enterprise-wide taxonomy before you engage business units at a lower level, nobody seems to have developed a pragmatic and outcomes-oriented framework for SharePoint metadata—a theoretically sound and practical approach that you can explain to end users, that they’ll embrace, and that will help you make metadata decisions.

So let me propose such an approach. I call it the “magic intersection” of metadata, as illustrated in the Venn diagram above.

In my ~10 years with SharePoint, it has emerged over time that there are three reasons—and three reasons only—for creating metadata. Three distinct business purposes make metadata necessary and useful in SharePoint. They are:

To support business processes: This is the most immediately useful application of metadata—and the only one that users can own unequivocally. Metadata to support business processes consists of fields that have immediate applicability for day-to-day document-centric work processes. They may include things like “Status,” “Task Owner” or “Contract Value.” This kind of metadata is in daily use to organize libraries and views, for example to determine a project’s status, group items by owner or track summary financials of a project portfolio. Here, metadata is clearly an enhancement over a simple shared folder, and users are happy to embrace it (and the extra work it creates) because it is useful.
To support search and findability: The pain related to searching for documents is delayed pain. While your document is still new and in active use, you don’t really care about whether you’ll be able to find it again in the future because you already know where it is. So the value of adding metadata fields that will support future findability is limited in the present moment, especially since the fields that will help you the most in the future (for example, “Customer Sector,” “Deal Size” or other broadly descriptive fields) may actually be a distraction right now. Search/findability oriented metadata fields are much harder to ‘sell’ to users than properties that support day-to-day work processes. (One approach that I’ve been tooling away at recently is to talk about some sort of social contract in relation to search-enhancing metadata: Tagging your own documents may not seem useful to you right now, but it is useful to others. Similarly, they would tag their documents now, for your benefit. Reciprocity may be the key to ‘selling’ this kind of metadata to users.)
To support records management: Records management in SharePoint unfortunately requires certain metadata properties in order to work properly. Since you can’t keep everything forever (because that’s a compliance risk), you need to tell SharePoint’s records management functionality what it is and how long to keep it (from that point forward, most other activities related to records management could, conceivably, be automated). The problem with this kind of metadata is that users don’t typically have the information and don’t care about it—at all. Others are afraid of making the wrong determination in relation to records management, and their instinct is to overstate the importance of a document and ‘save it forever.’ From a records management perspective, the last thing you want is to have to train all users in the records management principles your organization is subject to. This is cumbersome, costly and a thankless, unsustainable task.

All other possible purposes for metadata can reasonably be subordinated to one of the above categories.
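One way to make the three-purpose model concrete is to record, for each proposed field, which purposes it serves and then see which fields land in the intersection. What follows is only a minimal sketch in Python: the field names are taken from the examples above, but the purpose assignments and the helper names (`magic_intersection`, `hard_sells`) are assumptions made for illustration and are not part of any SharePoint API.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    BUSINESS_PROCESS = "business process"
    SEARCH = "search and findability"
    RECORDS = "records management"


@dataclass(frozen=True)
class Field:
    name: str
    purposes: frozenset


# Hypothetical purpose assignments, for illustration only.
FIELDS = [
    Field("Status", frozenset({Purpose.BUSINESS_PROCESS, Purpose.SEARCH, Purpose.RECORDS})),
    Field("Task Owner", frozenset({Purpose.BUSINESS_PROCESS})),
    Field("Contract Value", frozenset({Purpose.BUSINESS_PROCESS, Purpose.SEARCH})),
    Field("Customer Sector", frozenset({Purpose.SEARCH})),
]


def magic_intersection(fields):
    """Return the names of fields that serve all three purposes."""
    all_purposes = frozenset(Purpose)
    return [f.name for f in fields if f.purposes == all_purposes]


def hard_sells(fields):
    """Return the names of fields with no day-to-day business process use."""
    return [f.name for f in fields if Purpose.BUSINESS_PROCESS not in f.purposes]
```

With these assumed assignments, `magic_intersection(FIELDS)` returns only "Status", and `hard_sells(FIELDS)` flags "Customer Sector" as a field that users would have to be persuaded to fill in.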

Next, it is important to discuss the strategic intent of proposing such a model.

I follow two core principles for metadata design:

Principle #1 is that the number of metadata fields should be as small as possible while still meeting a reasonable number of business goals.
Principle #2 is that—ideally—all metadata fields should be generated during the active portion of the document lifecycle, to support business process.

The reason for #1 should be self-evident: to minimize the amount of work required to tag each document.

The reason for #2 is that it’s easiest to ask users to create metadata when there’s a clear and practical motivation for doing so. Another way to look at it is that I’ve seen a number of SharePoint deployments where re-tagging of documents is required at the point of transition from one phase of the document lifecycle to another: for example, before entering the records management stage, users need to enter or select RM-specific metadata and ‘close’ a document to turn it into a record. Re-tagging is tedious and results in user frustration (and subsequent lack of adoption).

What follows from the principles is that the discussion about metadata should focus primarily and deeply on business process. Metadata design workshops should approximately follow these proportions:

The majority of the time should be spent discussing metadata that has immediate, practical application in the business process realm. While it is important to pay some attention to the search and records management buckets, the hope is that the majority of the required tags can actually be gathered under the business process rubric, making metadata collection easier and more transparent to end users.

Let’s look at a practical example to illustrate the proposed framework.

A local government legal department of around 5 lawyers and 5 legal and administrative assistants manages the day-to-day legal work in the municipal offices. Much of it is requests for legal opinions from other departments, but a substantial share of the department’s resources is in fact dedicated to managing claims against the municipality. Claims can include litigation brought against the city as a result of accidents that allegedly happened because of poorly maintained roads, municipally managed trees falling on private roofs during rain storms, etc. The city is subject to freedom of information legislation but also needs to be aware of privacy regulations, especially if children are involved in any of the claims.

We would start by mapping out the document lifecycle for this team once we have completed a basic content audit to understand the types of documents that are to be managed here. Let’s pick a SharePoint document set called “Claim” for our example. This set collects and manages all documents related to a claim, such as the original filing, any supporting evidence (scanned and uploaded), the initial legal opinion from the city’s in-house counsel, the court filing, etc.

The document set grows and changes as the claim moves through the document lifecycle, which is—to an extent—designed to be a mirror of the associated business process. When an assistant initially creates the document set, a small but significant subset of the full metadata is captured—we enter what we know at this time; this includes information such as “Opposing Counsel,” “Location of Accident,” etc. (information that will support both our short term business processes and our searchability objectives later).

As the document set increases in terms of the number of documents it contains, its related business activities—and metadata—also ramp up. From a business process perspective, the primary operating parameters of the libraries where these document sets are kept are the “Claim Status” and “Assigned to” fields, which allow us to support basic claim status tracking by employee using SharePoint views. These help us understand objectively where we are at and can be used as the basis for a weekly team meeting.

As more information about the claim becomes available, we are reminded to add it to the metadata for the document, for example by means of a workflow which asks for the now-required additional fields prior to allowing us to advance the status of a document (for example, from “Preparing” to “Ready for Court,” we now also need to capture whether the claim involves children under the age of 12, information that is relevant for counsel at this point).
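The gating logic described here can be sketched independently of any workflow engine. The Python below is a rough illustration, not a SharePoint workflow: the status names come from the example, while the field name "Involves Children Under 12", the mapping of fields to statuses, and the function names are assumptions.

```python
# Fields that must be filled in before a claim may enter each status.
# The exact mapping is assumed for illustration.
REQUIRED_BY_STATUS = {
    "Preparing": ["Opposing Counsel", "Location of Accident"],
    "Ready for Court": [
        "Opposing Counsel",
        "Location of Accident",
        "Involves Children Under 12",
    ],
}


def missing_fields(metadata, target_status):
    """List required fields that are absent or empty for the target status."""
    required = REQUIRED_BY_STATUS.get(target_status, [])
    # A value of False counts as an answer; only None and "" count as missing.
    return [name for name in required if metadata.get(name) in (None, "")]


def advance_status(metadata, target_status):
    """Return a copy of the metadata with the new status, or refuse."""
    missing = missing_fields(metadata, target_status)
    if missing:
        raise ValueError(
            "Cannot move to " + target_status + "; missing: " + ", ".join(missing)
        )
    updated = dict(metadata)
    updated["Claim Status"] = target_status
    return updated
```

The point of the design is that the user is asked for a field only at the moment the business process needs it, which is when the answer is most likely to be known.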

Once the claim winds down, either through a judgment or a settlement, those final documents are included in the document set, and setting the set’s status to “Closed” prompts the assistant to enter the validity period of the settlement. This readies the document set as a record and applies the relevant SharePoint information management policies (IMPs) to set retention periods, etc.
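To show how a single value entered at closing time can drive the records side, here is a small sketch that derives a retention date from the settlement's validity period. It is plain Python rather than a SharePoint information management policy, and the seven-year retention default, the field names and the function names are placeholders, not legal or product guidance.

```python
from datetime import date


def add_years(start, years):
    """Add whole years to a date, mapping Feb 29 to Feb 28 when needed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def close_claim(metadata, closed_on, validity_years, retention_years=7):
    """Return a copy of the metadata marked as a closed record.

    retention_years=7 is a placeholder value, not a legal requirement.
    """
    settlement_expires = add_years(closed_on, validity_years)
    record = dict(metadata)
    record.update(
        {
            "Claim Status": "Closed",
            "Closed On": closed_on,
            "Settlement Valid Until": settlement_expires,
            "Retain Until": add_years(settlement_expires, retention_years),
        }
    )
    return record
```

The assistant supplies only the validity period; everything else is computed, which is the kind of automation the records management stage is meant to allow.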

This example illustrates the application of the proposed metadata design principles as follows:

As much as possible, metadata is collected while the documents are in the active part of the lifecycle.
Most of the metadata fields are captured to facilitate day-to-day business processes in the legal department, but many are also very useful to support searchability objectives (for example, this model enables us to search for ‘all claims we’ve settled where opposing counsel was Peter Smith’).
Basic records management information is collected during the active part of the lifecycle: we know how long to manage this record for, and we know under what conditions we may disclose it under freedom of information if children were involved.
The majority of metadata helps keep track of our daily work and makes the department’s activities more transparent and easier to manage. (While this may not be comfortable for all staff members, it’s a big selling point for management.)

As these “magic intersection” metadata design principles are very much a work in progress for me, I would welcome any comments and input you have from your own best practices in metadata design.

11 thoughts on “SharePoint metadata design principles”

Ruven Gotz

April 26, 2012 at 11:12 pm

This post crystalizes (and very clearly articulates) a line of reasoning that has been circulating in my mind for a while. My experience with clients mirrors Carsten’s, and has led me to pare back the use of metadata.

Just this week I had the opportunity to hear USAF Lt. Col. David Sanchez speak at SHARE 2012 in Atlanta about ‘weaponized SharePoint’. His metadata involves life & death decision making and protection of content vital to national security. The gist of his argument was that the metadata is crucial to the operation of the system, and that people are totally unreliable.

His solution is to leverage a lot of ‘out of the box’ policy and workflow functionality along with automated concept classification tools that automate the assignment of content types and associated metadata in ways that ensure compliance when combined with workflows.

This work is complex in his scenario due to the nature of the multiple dimensions of security and audiences that he has to deal with.

I think that we in the SharePoint community need to think hard about the ideas laid out here and validate (or refute) them, leading to an adjustment in the way we design and build our SharePoint solutions.

-Ruven
Sarah Haase

April 27, 2012 at 3:11 pm

I completely agree, Ruven. Carsten does a great job describing the intersection between metadata needs and end-user willingness. In the end, we have to focus on the metadata that will solve our users’ pain. If we can do that, we have a hope of getting them to embrace metadata population. If not, we’re sunk before we even begin.
Paul Culmsee

April 27, 2012 at 7:34 pm

Hi Carsten…

As I read this I was thinking about one of my current clients. They are the project and contract management practice department. They are accountable for improving the quality of how projects are managed and the contracts that underpin them. This is the world of policies, procedures, knowledge management, changing prevailing attitudes towards project delivery.

But they do not run projects themselves, and the organisation has many different project delivery models, contract types and is geographically dispersed across all of Western Australia. To put it more bluntly, this is essentially a giant community of practice scenario.

Now in this scenario, when I apply your model, search tends to be an equal (or bigger) bubble than business process. Granted, the vast majority of scenarios are more like the situation you describe here: more transactional and process focused, because ultimately it’s about getting shit done better than before.

So this led me to (re)thinking about my facets of collaboration model. http://www.cleverworkarounds.com/2011/01/19/the-facets-of-collaboration-part-2enter-the-matrix/. That was an attempt to provide a better framing for collaborative scenarios and if I was to put a label onto the example you described here, I would argue that it was primarily a task-based scenario, incorporating transactional as well as knowledge work. But the scenario I describe is on the trait side of the fence.

So I wonder how your model would work with the facets model. Imagine “sizing” each bubble according to each quadrant?

regards

Paul
Carsten Knoch

April 27, 2012 at 7:47 pm

Hi Paul,

I will take a look at your blog post; sounds like there are interesting connections to be made.

For me, the point was mainly to foreground the collection of metadata to support day-to-day business processes… and ideally, to try and collect _all_ metadata in this bucket, as I see the other buckets as a “hard sell” to users. Of course, how hard of a sell depends on the sophistication of your users. My point was that if we could fit our entire minimal term set into the “magic intersection” we would by definition no longer have to justify metadata to users at all because its value would be self-evident.

I realize of course that this is an ideal state and no project is ever likely to get there entirely. But that’s what I’d like to try and strive for because it solves several problems with a single approach.

Best,
Carsten
Sue Hanley

April 28, 2012 at 11:06 am

I agree with Ruven that this is a great way to think about metadata. I am firmly in the “just enough to do the job” camp of creating and assigning metadata. I think your three business purposes for metadata are spot on. Whether you are a power user or a consultant, being thoughtful about metadata is going to result in a better outcome. I think you typically get about 15 seconds or less of “metadata patience” when you ask people to add content to SharePoint in a collaboration scenario so you better have a really good reason for every metadata value. In a publishing scenario (for example, HR publishing benefits information on the intranet), you can ask for more time to classify content and, if the metadata choices meet your criteria, you can be a little more flexible. As with pretty much all things in SharePoint, just because you CAN (create a gazillion possible metadata columns), doesn’t mean you SHOULD!
Johann Visagie

May 3, 2012 at 9:21 am
Carsten Knoch

May 3, 2012 at 2:09 pm

Looking for a freely available term set can be a good starting point. But the point I’m making is that–whether you’re constructing, downloading or buying one–off-the-shelf ontologies are never a match for a specific organization’s needs because those needs are specific to the business objectives, software implementation, user behaviour, etc. Which is why I’m advocating for this “ground-up” model of creating a minimal term set that’s anchored in people’s actual day-to-day work rather than an outside classification.
Ken Burgess

May 5, 2012 at 2:16 pm

An excellent description of the problem and some thoughtful and useful comments. I would like to throw a wrench into all that by heading out on a tangent (a “talent” I seem to be genetically predisposed to).

At what point do we stop asking users to manage their own information (something they are not good at and have no interest in doing) and hire someone qualified to do it for them? (oops, I think my bias is showing…)

Yes, the business process has to be aimed at, and completed by, the business users. But as you point out, it is a hard (impossible?) sell to get them to give a damn about search or records management. So rather than rely on them to supply that metadata (which is vital to the business even if nobody recognizes it), perhaps someone else has to step in to add the necessary information. Maybe this is what Ruven is suggesting could be done via workflows? Inferring metadata values based on other metadata values would be great, but as a colleague of mine keeps pointing out – “Nobody has invented “ESP.exe” yet.”

As you can tell, I am no expert and have no solutions to offer, just some comments to “stir the pot”.
Carsten Knoch

May 8, 2012 at 3:29 pm

Some organizations in fact do this. I once interviewed with a “big four” consulting firm for a position in their international knowledge management team, and they actually have a “knowledge harvesting team” (could never quite figure out if that sounded more like farming or organ harvesting, but neither association made me feel like “wow, that sounds like a fun job!”).

I can certainly see some validity to the approach (if you want it done right, just do it yourself), but it is expensive and labour-intensive. It’s also not scalable at all. It results in creating another chain of command (the chain of harvesting?) that is required to identify, scrub, nominate, curate, bless and publish. So while it would possibly work, I feel philosophically (and economically) opposed to not letting people manage their own information.

Instead, I continue to toil away at figuring out how to make better systems and user experiences to encourage people to do it themselves. There are, of course, some indications out there in the world that certain tactics can result in taking ownership for managing your own information: hash tags on Twitter are a good example. So there’s hope.
David Roper

July 3, 2012 at 8:35 am

An interesting article that reinforces and very clearly articulates views that I too had developed in the course of attempting to specify requirements for a new legal document management system.

Users like folders; they understand folders; and they don’t like being required to enter metadata that they, themselves, don’t perceive to be of value. Somewhere I found a quote along the lines that, to willingly enter metadata (even tags), users have to accept a social contract based on reciprocity. Absent obvious reciprocity, the social contract will break unless it is punitively enforced by management; and that never happens because management is more focussed on productivity today than tomorrow. This may even be a correct view; there is a very good chance in legal document management that the great majority of documents will never be retrieved, so the cost in time and aggravation in correctly tagging every document may not in fact be justified. Users get shot for not doing today’s work; they rarely get shot for making somebody else’s future work more difficult. This means that a reasonably coherent folder structure in which each document is initially filed in exactly one place is actually the optimal (lowest cost) solution for the majority of users and the majority of documents.

The question seems, therefore, to be how best to exploit the short term (day-by-day) utility of a simple folder tree without losing the long term (yearly or longer) flexibility of a metadata driven system. The answer I propose is that every folder should be a “smart” folder, populated by querying tags/ labels/ categories (pick your term). Filing a document in that folder should assign the folder’s and all its parent folders’ labels to the document. Moving a document from one folder to another should delete and reassign the document’s labels; copying a document should merge the two sets of labels. All other metadata should follow your principle #2.

This doesn’t, of course, obviate the need to create a simple ontology of categories and to manage additions to that ontology as business needs dictate (e.g. new projects, new clients); and clearly the queries embodied in the smart folder must be restricted to those that can be expressed solely in terms of AND conjunctions. NOT and OR don’t give unique values for each category of label to be applied to the document. However, it seems (at least to me) that this is a reasonable compromise; certainly it’s no worse than a typical arrangement of folders on a shared drive. Because all folders would be “smart” folders, it has the distinct advantage that folder hierarchies can be readily adjusted to meet differing business requirements – do I want to search first by client or by project or by work type or what – and can be tuned to produce rapidly navigable structures. These are (IMHO) the principal benefits of label driven classification.
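The smart-folder rules above (file, move, copy, and AND-only folder queries) can be modelled in a few lines. This is a plain Python sketch of the idea, not a SharePoint implementation; the class and function names, and the choice to give each folder exactly one (category, value) label, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Folder:
    name: str
    label: tuple  # one (category, value) pair, e.g. ("Client", "Acme")
    parent: Optional["Folder"] = None

    def labels(self):
        """Labels of this folder and all of its parent folders."""
        found = set()
        node = self
        while node is not None:
            found.add(node.label)
            node = node.parent
        return found


@dataclass
class Document:
    title: str
    labels: set = field(default_factory=set)


def file_in(doc, folder):
    """Filing assigns the folder's and its parents' labels to the document."""
    doc.labels |= folder.labels()


def move(doc, source, target):
    """Moving deletes the source folder's labels and assigns the target's."""
    doc.labels = (doc.labels - source.labels()) | target.labels()


def copy_to(doc, target):
    """Copying yields a document carrying both sets of labels."""
    return Document(doc.title, doc.labels | target.labels())


def contents(folder, documents):
    """A smart folder is an AND query: every folder label must be present."""
    wanted = folder.labels()
    return [d for d in documents if wanted <= d.labels]
```

Because a folder's contents are computed from labels rather than stored, the same documents can be presented under a different hierarchy (client first, or project first) by defining different `Folder` trees over the same label set.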

The question I have, not being a SharePoint programmer, is: can it be done?