Unstructured Data
Posted: January 17, 2012
The nontraditional data: Unstructured Data
According to zdnet.com’s article, “Unstructured data: the elephant in the Big Data room,” organizations are at every stage from first encountering “unstructured data” to being nearly overwhelmed by it, because it falls outside the traditional “structured data” box. Attempts to define this out-of-the-box data have described it as “data that does not reside in fixed locations,” as well as “any corporate information that is not in a database - and it can be textual or non-textual”. Textual examples of unstructured data include Word documents, PDFs, emails, blogs, web pages, PowerPoint presentations, and instant messages. Non-textual examples are media files such as JPEGs, MP3s, and Flash files (described as bitmap objects). Information-management.com defined this data as follows:
“Unstructured data consists of any data stored in an unstructured format at an atomic level. That is, in the unstructured content, there is no conceptual definition and no data type definition – in textual documents, a word is simply a word. Some current technologies used for content searches on unstructured data require tagging entities such as names or applying keywords and meta tags. Therefore, human intervention is required to help make the unstructured data machine readable.”
Opposite: Structured Data
To understand unstructured data, we must also look at the original data category. Structured data is “Data in fixed fields within a record or file; it can be tagged and accurately identified - for example, XML files, databases, and spreadsheets”. This type of data, as defined by information-management.com in the article “Two Worlds of Data–Unstructured and Structured,” refers to “anything that has an enforced composition to the atomic data types. Structured data is managed by technology that allows for querying and reporting against predetermined data types and understood relationships”. In other words, this data is organized so that computers can process it efficiently and people can readily query it and make sense of the results.
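To make the contrast concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table, its columns, and the sample tweet are all invented for illustration and do not come from either article. The point is only that a question about structured data maps directly onto predetermined fields, while the same question asked of free text has nothing to hold on to.

```python
import sqlite3

# Hypothetical complaint records, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE complaints ("
    "id INTEGER PRIMARY KEY, product TEXT, severity INTEGER)"
)
conn.executemany(
    "INSERT INTO complaints (product, severity) VALUES (?, ?)",
    [("router", 3), ("modem", 1), ("router", 5)],
)

# Structured: the question maps directly onto known fields and types.
severe_router = conn.execute(
    "SELECT COUNT(*) FROM complaints WHERE product = ? AND severity >= ?",
    ("router", 3),
).fetchone()[0]

# Unstructured: similar facts buried in free text. Nothing tells the
# computer which word names a product or how serious the complaint is.
tweet = "third time my router died this week, seriously considering switching"
naive_match = "severity" in tweet  # a field name means nothing in prose

conn.close()
```

The query counts the severe router complaints without any interpretation step. The tweet plainly describes a severe problem to a human reader, yet a literal search for the field name finds nothing.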
Among companies, there is growing concern about this “unstructured data”. As many are realizing, ignoring unstructured data can have damaging, even catastrophic, consequences, especially for a company’s relationships with its customers. Popular applications such as Twitter and Facebook have been at the forefront of fueling this unstructured data wildfire. Why write a letter of complaint to a company when you can instantly tweet it?
With that tweet come words whose meaning may be harder for a reader to decipher than one might expect. In unstructured text such as a tweet or post, the exact intended meaning often gets lost without the oral delivery. If a human reader has a hard time working out the true meaning of a text, the challenge must be even greater for a computer. Here lies one of the biggest problems with unstructured data. As expressed in the zdnet.com article:
“There are still loads of things that we can’t do. There is a whole aspect of computing which PhD students are working on, which is basically trying to understand text. Understand sentiment. A five-year-old child can say in 30 seconds whether Mom or Dad is angry, or happy, or whatever. Sense the mood in the room. A computer program still has a hard time figuring that out…. The analysis of text, the analysis of video, the analysis of audio — it works a lot better in James Bond movies. In real life, it is extremely hard from a fundamental computer science perspective to understand all that information.”
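A small Python sketch shows why this is hard. It assumes the crudest possible approach, counting words from two hand-picked lists. The word lists and example sentences are invented here, and this is not how any particular product works. The sketch is deliberately naive so that the failure is easy to see.

```python
# Hypothetical word lists, chosen only for this illustration.
POSITIVE = {"great", "love", "thanks", "happy"}
NEGATIVE = {"broken", "angry", "terrible", "hate"}


def naive_sentiment(text):
    """Label text by counting positive and negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


sincere = naive_sentiment("I love the new phone, thanks!")
sarcastic = naive_sentiment("Great, my flight got cancelled again. Thanks a lot!")
```

Both sentences come back labeled as positive, although any human reader hears the second one as an angry complaint. The tone that makes the difference is exactly what the word count cannot see.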
This data is constantly growing, and many companies face the problem of what to do with it all: how to store it, where to store it, and frankly what to even make of it. Suggested approaches include building concepts and category relationships from the unstructured data, and bringing parts of that data into the structured world by identifying its context so that it can be categorized and organized alongside structured data (data bridging).
But bridging the unstructured with the structured may not be enough to address the growing concern within companies about this technological hot topic. Relationships between customers and corporations are more important than ever, especially with all the instant feedback technologies at a consumer’s fingertips.
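The bridging idea can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: the category keywords, the BridgedRecord fields, and the sample tweet are invented, and a real system would need far richer vocabularies or trained models. The sketch only shows the shape of the approach, which is to pull out the parts of a message that have recognizable context and store them as fixed fields.

```python
import re
from dataclasses import dataclass

# Hypothetical category vocabulary; invented for this example.
CATEGORY_KEYWORDS = {
    "billing": {"charge", "charged", "refund", "invoice"},
    "outage": {"down", "offline", "outage"},
}


@dataclass
class BridgedRecord:
    """Structured view of one unstructured message."""
    author: str
    mentions: list
    hashtags: list
    categories: list
    text: str


def bridge(author, text):
    """Extract fixed fields from free text using simple patterns."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    categories = sorted(
        name for name, keywords in CATEGORY_KEYWORDS.items() if words & keywords
    )
    return BridgedRecord(
        author=author,
        mentions=re.findall(r"@(\w+)", text),
        hashtags=re.findall(r"#(\w+)", text),
        categories=categories,
        text=text,
    )


record = bridge(
    "jane_doe",
    "@AcmeNet charged me twice and the service is down again #fail",
)
```

The resulting record can be stored in an ordinary database and queried by category or mentioned company, while the original text is kept alongside it for anything the keywords missed.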
What some companies are doing:
Companies are realizing that these tweets and other unstructured data are important to corporate survival, and that staying ahead of competitors requires getting a handle on the ever-growing volume of unstructured data. Though many companies acknowledge the importance of this data, only a minuscule number actually have procedures and policies in place to handle it. Currently, unstructured data is hard to get a grasp of, but according to zdnet.com’s article, awareness that this data exists is a start. As we continue to learn what to do with this data, more applications will be created to form that bridge. Until that firm hold is obtained, corporations can only take one step at a time, organizing unstructured information into manageable databases and starting to manage these smaller portions of the data.