Cloudera Data Platform’s integration with Azure delivers enterprise security and governance 

2 min read

Modern analytics and the resulting business insights unlock new opportunities to optimize company performance and open new revenue streams. Because these initiatives also heighten the need to secure and govern company data, Identity and Access Management (IAM) must be a foundational component of any corporate security plan. Critical …

Hyperspace, an indexing subsystem for Apache Spark™, is now open source 

1 min read

For Microsoft’s internal teams and external customers, we store datasets in our data lake that range from a few GBs to hundreds of PBs. The analytics run on these datasets range from traditional batch-style queries (e.g., OLAP) to exploratory “finding the needle in a haystack” queries (e.g., point lookups, summarization). Resorting to linear …
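The excerpt mentions point lookups over datasets of up to hundreds of PBs, where touching every row is expensive. The sketch below illustrates the general idea behind indexing in plain Python: a one-time pass builds a hash index from a key column to row positions, and each lookup then probes the index instead of scanning all rows. This is not Hyperspace's API or storage format; the function names (`linear_lookup`, `build_index`, `indexed_lookup`) and the in-memory row layout are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def linear_lookup(rows: List[Row], column: str, value: Any) -> List[Row]:
    # Full scan: every row is inspected, so each query costs O(n).
    return [r for r in rows if r[column] == value]


def build_index(rows: List[Row], column: str) -> Dict[Any, List[int]]:
    # One-time O(n) pass mapping each key to the positions of its rows.
    index: Dict[Any, List[int]] = defaultdict(list)
    for pos, r in enumerate(rows):
        index[r[column]].append(pos)
    return dict(index)


def indexed_lookup(rows: List[Row], index: Dict[Any, List[int]], value: Any) -> List[Row]:
    # Average O(1) hash probe, plus O(k) to fetch the k matching rows.
    return [rows[pos] for pos in index.get(value, [])]


if __name__ == "__main__":
    # Hypothetical sample data: rows whose id is divisible by 3 are "west".
    rows = [{"id": i, "region": "east" if i % 3 else "west"} for i in range(10)]
    idx = build_index(rows, "region")
    assert indexed_lookup(rows, idx, "west") == linear_lookup(rows, "region", "west")
    print([r["id"] for r in indexed_lookup(rows, idx, "west")])
```

The trade-off is the usual one for indexes: extra storage and a build step in exchange for lookups whose cost depends on the number of matches rather than the size of the dataset.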

What’s new in SandDance 3 

2 min read

SandDance, the open source data visualization tool from Microsoft Research, is launching several new features in version 3, starting with facets on all chart types. We’ve added much more control over faceted data: all chart types now have the Facet By column feature. When a Facet By column contains quantitative data, you can specify the number of …

Microsoft open sources SandDance, a visual data exploration tool 

2 min read

SandDance, the beloved data visualization tool from Microsoft Research, has been re-released as an open source project on GitHub. This new version has been rewritten from the ground up as an embeddable component that works with modern JavaScript toolchains. The release consists of several components that work in native JavaScript or React …

Trill 103: Ingress, Egress, and Trill’s notion of time 

8 min read

Congratulations! You’ve made it to the next installment of our overview of Trill, Microsoft’s open source streaming data engine. As noted in our previous posts about basic queries and joins, Trill is a temporal query processor. Trill works with data that has some intrinsic notion of time. However, Trill doesn’t assign any semantics to that …

AzureR now available: Create, manage, and monitor Azure services with R 

4 min read

AzureR, a family of packages that provides tools to manage Azure resources from the open source R language, is now available. If you code in Python, C#, Java, or JavaScript, you already have a rich selection of SDKs to choose from to interact with Azure. AzureR extends SDK support to the R language by providing …

Trill 102: Temporal Joins 

5 min read

This post is the second in a series introducing developers to the Trill streaming query engine, its programming model, and its capabilities. In the previous post, we introduced the concept of snapshot semantics for temporal query processing. Here, we go deeper into the mechanics of snapshot semantics by showing its impact on one …
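Under snapshot semantics, a temporal join should produce, at every instant, the same result as an ordinary join of the events alive at that instant. One way to get that behavior is to give each matching pair the intersection of its inputs' lifetimes. The Python sketch below shows this with a naive nested loop over interval events. It is not Trill's API (Trill is a .NET library and processes streams incrementally); the `Event` type with half-open `[start, end)` lifetimes and the helpers `temporal_join` and `snapshot` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Set, Tuple


@dataclass(frozen=True)
class Event:
    start: int    # lifetime start (inclusive)
    end: int      # lifetime end (exclusive)
    key: Any      # join key
    payload: Any


def temporal_join(left: List[Event], right: List[Event]) -> List[Event]:
    # Naive O(n*m) equi-join: a pair matches when keys are equal and
    # lifetimes overlap; the output lives only while both inputs are alive.
    out: List[Event] = []
    for l in left:
        for r in right:
            if l.key != r.key:
                continue
            start, end = max(l.start, r.start), min(l.end, r.end)
            if start < end:  # non-empty intersection of [start, end) intervals
                out.append(Event(start, end, l.key, (l.payload, r.payload)))
    return out


def snapshot(events: List[Event], t: int) -> Set[Tuple[Any, Any]]:
    # Set view of the events alive at time t (duplicates collapse).
    return {(e.key, e.payload) for e in events if e.start <= t < e.end}


if __name__ == "__main__":
    left = [Event(0, 10, "a", "L1")]
    right = [Event(5, 15, "a", "R1"), Event(12, 20, "a", "R2")]
    # R2 starts after L1 ends, so only the L1/R1 pair appears, alive over [5, 10).
    print(temporal_join(left, right))
```

The `snapshot` helper makes the defining property checkable: for any time `t`, the snapshot of the joined stream equals the ordinary join of the two input snapshots at `t`.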

Microsoft open sources Data Accelerator for Apache Spark 

4 min read

Welcome to Data Accelerator! Data Accelerator for Apache Spark simplifies streaming big data using Spark. Data Accelerator has been used for two years within Microsoft to process streamed data across many internal deployments, handling data volumes at Microsoft scale. Offering an easy-to-use platform to learn and evaluate your streaming needs and requirements, we …

Trill 101: how to add temporal queries to your applications 

6 min read

Last December, we released Trill, an open source .NET library designed to process one trillion events a day. Trill provides a temporal query language that lets you embed real-time analytics in your own application. In this blog post, we introduce how to get started with Trill, starting with Trill’s query and data model …
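Trill is a temporal query processor that follows snapshot semantics. As a rough illustration of what snapshot semantics means for an aggregate, the Python sketch below counts active events: it splits the timeline at every event endpoint and, within each resulting segment (where the set of active events cannot change), reports how many events are alive. The `(start, end)` half-open tuples and the `snapshot_count` name are hypothetical; this is not Trill's API and not how its engine computes aggregates.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # hypothetical event lifetime: [start, end)


def snapshot_count(events: List[Interval]) -> List[Tuple[int, int, int]]:
    # Every point where the set of active events can change.
    points = sorted({t for s, e in events for t in (s, e)})
    result: List[Tuple[int, int, int]] = []
    for lo, hi in zip(points, points[1:]):
        # No endpoint falls strictly inside [lo, hi), so the active set is
        # constant there and one evaluation at lo covers the whole segment.
        n = sum(1 for s, e in events if s <= lo < e)
        if n:  # skip gaps where nothing is alive
            result.append((lo, hi, n))
    return result


if __name__ == "__main__":
    # Three overlapping events; the count changes at each endpoint.
    print(snapshot_count([(0, 5), (3, 8), (6, 10)]))
```

Adjacent segments with equal counts are not merged, and the scan is quadratic in the number of events; a streaming engine would typically update the count incrementally as events start and end.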

Microsoft open sources Trill, a powerful query processor for analytics at incredible speeds 

1 min read

In today’s demanding business environment, processing massive amounts of data every millisecond is becoming a common business requirement. We are excited to announce that an internal Microsoft project known as Trill (for processing “a trillion events per day”) is now being open sourced. Trill started as a research project at Microsoft Research in 2012, and has …

How to process streams of data with Apache Kafka and Spark 

23 min read

Data is produced every second; it comes from millions of sources and is constantly growing. Have you ever thought about how much data you personally generate every day? Some of it is a direct result of our actions and activities: browsing Twitter, using mobile apps, performing financial transactions, using …
