Applying big data analytics to human-generated data

Table of Contents

  1. Summary
  2. Human-generated data
  3. Big data analytics
  4. The integration challenge
  5. Mitigating risk
  6. Other use cases
  7. Key takeaways
  8. About Paul Miller

1. Summary

As the analytics industry moves to address an explosion in machine-generated data, another opportunity is already here. Emails, texts, documents, and other unstructured human-generated data, along with the metadata associated with them, can deliver significant insight to businesses with the resources and will to mine them.

Taking control of human-generated data provides companies with a more complete understanding of their intellectual property, enables them to aggregate business intelligence for sharing with employees, and allows security professionals to identify and mitigate both casual and deliberate breaches of policy. However, the operational cost of normalizing and mining this data is significant, and doing it well requires a sound strategic understanding of both the technologies involved and the goals being pursued. This research report evaluates the opportunities and challenges associated with analyzing human-generated data. It examines early adoption in risk management and governance, and evaluates the potential impact of these analytics on other use cases and industries.

Key findings include:

  • Human-generated data in word-processed documents, presentations, spreadsheets, and emails typically comprises an organization’s most prized assets, including key intellectual property, operating procedures, and the plans and strategies that shape future development.
  • Most organizations fail to adequately manage the creation, use, and dissemination of these key assets. As a result, they either introduce friction into collaboration through excessively strict access controls or risk serious data loss by sharing data too permissively.
  • Tools and techniques from the big data sector offer the means to monitor human-generated data across an organization’s different IT environments, protecting key assets and ensuring that regulatory obligations are met in a cost-effective and timely manner.
  • Data governance, audits, and other regulatory requirements are typically the initial drivers for deployment of these technologies, but other opportunities present themselves once systems and procedures are in place. The same tools, for example, can identify individuals and teams in different parts of a large organization who happen to be accessing similar resources without knowledge of one another, and can broker introductions between teams that may unwittingly be tackling complementary problems.
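
The last finding, spotting people in different teams who access similar resources, can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not a description of any particular product: the access log format, the `team_of` mapping, the function names, and the 0.5 threshold are all hypothetical, and Jaccard similarity is only one of several overlap measures that could be used.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of resources two users have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_introductions(access_log, team_of, threshold=0.5):
    """Return cross-team user pairs whose accessed resources overlap.

    access_log: iterable of (user, resource) pairs (hypothetical format).
    team_of:    dict mapping every user in the log to a team name.
    threshold:  minimum Jaccard similarity to report (arbitrary choice).
    """
    # Collect the set of resources each user has touched.
    resources_by_user = {}
    for user, resource in access_log:
        resources_by_user.setdefault(user, set()).add(resource)

    suggestions = []
    for u, v in combinations(sorted(resources_by_user), 2):
        # Colleagues on the same team presumably know each other already.
        if team_of[u] == team_of[v]:
            continue
        score = jaccard(resources_by_user[u], resources_by_user[v])
        if score >= threshold:
            suggestions.append((u, v, round(score, 2)))

    # Strongest overlaps first.
    return sorted(suggestions, key=lambda s: -s[2])


# Made-up sample data, for illustration only.
sample_log = [
    ("ana", "plan.docx"), ("ana", "budget.xlsx"), ("ana", "roadmap.pptx"),
    ("raj", "plan.docx"), ("raj", "budget.xlsx"),
    ("lee", "plan.docx"), ("lee", "budget.xlsx"), ("lee", "roadmap.pptx"),
    ("kim", "notes.docx"),
]
sample_teams = {"ana": "finance", "lee": "finance",
                "raj": "engineering", "kim": "legal"}

print(suggest_introductions(sample_log, sample_teams))
```

In this made-up sample, the finance users overlap heavily with one engineering user, so those cross-team pairs are reported, while the two finance colleagues are not paired with each other. A real deployment would also need to respect access controls and privacy policy before surfacing such suggestions.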

Thumbnail image courtesy of OGGM/Thinkstock.