Nova Publications

Extensive thought leadership across peer-reviewed research, industry journals, technical blogs, and enterprise SAP publications

150+ Publications

10+ SAP Books Authored

200+ Technical Articles

Featured books: In-Memory Data Management, Supply Chain Management Based on SAP Systems, and Practical Data Science with SAP

Nova Research Center

Massachusetts Institute of Technology

A Comparative Study of Data Storage and Processing Architectures for the Smart Grid

A number of governments and organizations around the world agree that the first step to address national and international problems such as energy independence, global warming, or emergency resilience is the redesign of electricity networks, known as Smart Grids. Traditionally, power grids have broadcast power from generation plants to large populations of consumers in a suboptimal way. However, the fusion of energy delivery networks and digital information networks, along with the introduction of intelligent monitoring systems (Smart Meters) and renewable energies, would enable two-way electricity trading relationships between electricity suppliers and electricity consumers. The availability of real-time information on electricity demand and pricing would enable suppliers to optimize their delivery systems, while consumers would have the means to minimize their bills by turning on appliances at off-peak hours. The construction of the Smart Grid entails the design and deployment of information networks and systems with unprecedented requirements on storage, real-time event processing, and availability. In this paper, a series of system architectures to store and process Smart Meter reading data are explored and compared, aiming to establish a solid foundation on which future intelligent systems can be built.
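
The paper presents full system architectures rather than code; purely as an illustration of the workload those architectures must handle, the sketch below (our own, with all names invented) shows a minimal in-memory store for Smart Meter readings with the kind of hourly demand query that real-time pricing depends on.

```python
# Illustrative sketch only (not from the paper): an in-memory store for
# smart meter readings, aggregated per hour so demand can be queried live.
from collections import defaultdict
from datetime import datetime

class MeterReadingStore:
    """Append-only store of (meter_id, timestamp, kWh) readings."""
    def __init__(self):
        self._by_hour = defaultdict(float)  # (meter_id, hour) -> kWh

    def append(self, meter_id: str, ts: datetime, kwh: float) -> None:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        self._by_hour[(meter_id, hour)] += kwh

    def hourly_demand(self, meter_id: str, hour: datetime) -> float:
        """Demand for one meter in a given hour (0.0 if no readings)."""
        return self._by_hour[(meter_id, hour)]

store = MeterReadingStore()
store.append("meter-42", datetime(2010, 6, 1, 14, 15), 0.8)
store.append("meter-42", datetime(2010, 6, 1, 14, 45), 0.6)
print(store.hourly_demand("meter-42", datetime(2010, 6, 1, 14, 0)))  # ~1.4
```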

Read the full publication

Dr. Alexander Zeier

Co-Author

Hasso Plattner Institute

Finally, A Real Business Intelligence System Is at Hand

In this chapter, we offer our most important insights regarding operational and analytical systems from a business perspective. We describe how they can be unified to create a fast combined system. We also discuss how benchmarking can be changed to evaluate the unified system from both an operational and an analytical processing perspective. As we saw earlier, it used to be possible to perform Business Intelligence (BI) tasks on a company's operational data. By the end of the 1990s, however, this was no longer possible, as data volumes had increased to such an extent that executing long-running analytical queries slowed systems down so much that they became unusable. BI solutions thus evolved over the years from the initial operational systems through to the current separation into operational and analytical domains. As we have seen, this separation causes a number of problems. Benchmarking systems have followed a similar trajectory, with current benchmarks measuring either operational or analytical performance, making it difficult to ascertain a system's true performance. With IMDB technology we now have the opportunity to reunite the operational and analytical domains. We can create BI solutions that analyze a company's up-to-the-minute data without the need to create expensive secondary analytical systems. We are also able to create unified benchmarks that give us a much more accurate view of the performance of the entire system. This chapter describes these two topics in detail. In Sect. 7.1, we cover the evolution of BI solutions from the initial operational systems through the separation into two domains, and then give a recommendation regarding the unification of analytical and operational systems based on in-memory technology. In Sect. 7.2, we examine benchmarking across the operational and analytical domains.
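
As a rough illustration of the unification the chapter argues for (our sketch, not the authors' system; all names are invented), a single column-organized in-memory table can serve both transactional inserts and analytical aggregation over the same up-to-the-minute data:

```python
# Minimal sketch: one in-memory, column-organized table handles OLTP writes
# and OLAP reads, so no separate warehouse or ETL pipeline is needed.
class SalesTable:
    def __init__(self):
        # Column-wise storage: one Python list per attribute.
        self.region, self.amount = [], []

    def insert(self, region: str, amount: float) -> None:   # OLTP write
        self.region.append(region)
        self.amount.append(amount)

    def total_by_region(self, wanted: str) -> float:        # OLAP read
        # Scans only the two columns it needs.
        return sum(a for r, a in zip(self.region, self.amount) if r == wanted)

t = SalesTable()
t.insert("EMEA", 120.0)
t.insert("APAC", 80.0)
t.insert("EMEA", 40.0)
print(t.total_by_region("EMEA"))  # 160.0, reflecting all inserts immediately
```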

Read the full publication

Dr. Alexander Zeier

Co-Author

Hasso Plattner Institute

Main memory databases for enterprise applications

Enterprise applications are traditionally divided into transactional and analytical processing. This separation became essential as growing data volumes and increasingly complex queries could no longer be processed with acceptable performance on conventional relational databases. While much research in recent years, particularly in the last decade, focused on optimizing this separation, both databases and hardware have continued to develop. On the one hand, there are data management systems that organize data column-wise and thereby ideally match the requirement profile of analytical queries. On the other hand, significantly more main memory is now available to applications, which, combined with equally significant performance gains, allows the complete compressed database of an enterprise to be held in memory. Both developments enable complex analytical queries to be processed in a fraction of a second and thus facilitate completely new business processes and applications. The obvious question is whether the artificially introduced separation between OLTP and OLAP can be revoked and all queries handled on a single combined data set. This paper focuses on the characteristics of data processing in enterprise applications and demonstrates how selected technologies can optimize data processing. A further trend is the use of cloud computing and, with it, the outsourcing of the data center to improve cost efficiency. Here, too, column-oriented in-memory databases are helpful, as they permit greater throughput, which in turn enables more effective use of the hardware and thus saves costs.
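
One concrete ingredient behind holding "the complete compressed database of an enterprise" in main memory is lightweight compression such as dictionary encoding. The sketch below illustrates that general technique; it is our example, not code from the paper:

```python
# Dictionary encoding: repeated string values in a column are replaced by
# small integer codes, shrinking memory footprint and speeding up scans.
def dictionary_encode(column):
    """Return (dictionary, codes) for a column of repeated values."""
    dictionary, codes, index = [], [], {}
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes

column = ["open", "open", "closed", "open", "shipped", "closed"]
dictionary, codes = dictionary_encode(column)
print(dictionary)  # ['open', 'closed', 'shipped']
print(codes)       # [0, 0, 1, 0, 2, 1]
# A predicate like status = 'open' now compares integers instead of strings.
```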

Read the full publication

Dr. Alexander Zeier

Co-Author

Precise and Scalable Querying of Syntactical Source Code Patterns Using Sample Code Snippets and a Database

While analyzing a log file of a text-based source code search engine, we discovered that developers search for fine-grained syntactical patterns in 36% of queries. Currently, to cope with queries of this kind, developers need to use regular expressions, add redundant terms to the query, or combine searching with other tools provided by the development environment. To improve the expressiveness of queries, these can be formulated as tree patterns over abstract syntax trees (ASTs). Such search patterns can be expressed using query languages such as XPath. However, developers usually work with neither XPath nor ASTs. To shield developers from the complexity of query formulation, we propose using sample code snippets as queries. The novelty of our approach is the combination of a query language that is very close to the surface programming language with a special database technology for storing a large number of abstract syntax trees. The advantage of this approach over existing source code query languages and search engines is the performance of both query formulation and query execution. This paper describes the technical details of the method and illustrates the value of the approach with performance measurements and a controlled industrial experiment. All developers were able to complete the tasks of the experiment faster and more accurately using our tool (ACS) than using a text-based search engine, and the number of false positives in the result lists was significantly decreased.
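
To make the snippet-as-query idea concrete, here is a toy sketch using Python's ast module (the paper's tool, ACS, works differently and stores syntax trees in a database; everything below is our own illustration): the query snippet's node-type shape is matched against subtrees of the target code.

```python
# Toy structural search: a sample code snippet is the query, and we match
# its tree of AST node types against every subtree of the searched code.
import ast

def shape(node) -> tuple:
    """Node type plus the shapes of its AST children."""
    return (type(node).__name__,
            tuple(shape(c) for c in ast.iter_child_nodes(node)))

# Query: "a for loop whose body is a single if statement".
query = ast.parse("for x in xs:\n    if cond:\n        pass").body[0]

target = ast.parse("""
for item in orders:
    if flag:
        pass
count = 0
""")

for node in ast.walk(target):
    if shape(node) == shape(query):
        print("structural match at line", node.lineno)  # line 2
```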

Read the full publication

Dr. Alexander Zeier

Co-Author

New Frontiers in Information and Software as Services

Towards Analytics-as-a-Service Using an In-Memory Column Database

Traditional data warehouses mostly run on large and expensive server and storage systems. For small and medium-sized companies, it is often too expensive to implement and run such systems. Given this situation, the SaaS model comes in handy, since these companies might opt to run their OLAP workloads as a service. The challenge for the analytics service provider is then to minimize TCO by consolidating as many tenants onto as few servers as possible, a technique often referred to as multi-tenancy. In this article, we report three results from our research on building a cluster of multi-tenant main memory column databases for analytics as a service. For this purpose, we ported SAP's in-memory column database TREX to run in the Amazon cloud. We evaluated the relation between a tenant's data size and its number of queries per second, and derived a formula that allows us to estimate how many tenants of different sizes and request rates can be placed on one instance of our main memory database. We discuss findings on the cost/performance tradeoff between reliably storing a tenant's data on a single node using highly available network-attached storage, such as Amazon EBS, and replicating tenant data to a secondary node where the data resides on less resilient storage. We also describe a mechanism to support historical queries across older snapshots of tenant data, which are lazily loaded from Amazon's S3 near-line archival storage and cached on the local VM disks.
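
The article's actual sizing formula is not reproduced here; the sketch below is a hedged stand-in with the same general shape, where a tenant's load is its request rate times a per-query cost that grows with its data size (all constants and names are invented):

```python
# Stand-in capacity model, NOT the paper's formula: per-tenant load =
# queries/second x per-query cost, with cost proportional to data size
# (a full-column-scan assumption). Constants are made up for illustration.
def per_query_seconds(data_size_gb: float, scan_cost_s_per_gb: float = 0.02) -> float:
    return data_size_gb * scan_cost_s_per_gb

def fits_on_instance(tenants, max_utilization: float = 0.8) -> bool:
    """tenants: list of (data_size_gb, queries_per_second) pairs."""
    load = sum(qps * per_query_seconds(gb) for gb, qps in tenants)
    return load <= max_utilization  # fraction of one saturated instance

tenants = [(1.0, 5.0), (0.5, 20.0), (2.0, 3.0)]
print(fits_on_instance(tenants))  # True: ~0.42 instance-seconds per second
```

A model of this shape lets the provider pack tenants greedily onto instances until the predicted utilization bound is reached, which is the consolidation decision the article studies.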

Read the full publication

Dr. Alexander Zeier

Co-Author

RFID SysTech 2011

A Distributed EPC Discovery Service based on Peer-to-peer Technology

Supply chain visibility and real-time awareness are two of the major drivers for the implementation of Auto-ID technologies in Supply Chain Management. A prerequisite for achieving real-time awareness and cross-company visibility is an infrastructure that enables companies to share supply chain information in a reliable and secure way. EPCglobal proposes an application layer protocol called the EPC Discovery Service, which is supposed to provide services for gathering supply chain information from a number of independent resources across company borders. Our investigations of pharmaceutical and tobacco supply chains revealed the tremendous data volumes and network traffic generated by RFID-enabled supply chain networks. It is highly questionable whether a single global discovery service can cope with such requirements. In this paper, we address the question of how distributed discovery services can deliver their service in a Peer-to-Peer (P2P) manner. For this purpose, we analyzed the applicability of distinctive distribution schemes and present an approach that enforces a distribution scheme allowing product managers to decide at which discovery service their data is stored. Furthermore, we present a prototypical implementation based on the open source P2P protocol JXTA. Our architecture uses an unstructured P2P network coupled with cache optimizations to lower query response times.
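
As a minimal illustration of the approach (the real prototype is built on JXTA; everything below is invented for exposition), the sketch shows peers that hold the EPC records their owners chose to place on them, flood queries to neighbours under a TTL, and cache answers to lower response times for repeated queries:

```python
# Toy unstructured P2P discovery: each peer stores some EPC -> resource
# mappings, forwards unanswered queries to neighbours, and caches results.
class DiscoveryPeer:
    def __init__(self, name: str):
        self.name, self.records, self.neighbours, self.cache = name, {}, [], {}

    def store(self, epc: str, resource: str) -> None:
        self.records.setdefault(epc, []).append(resource)

    def query(self, epc: str, ttl: int = 3) -> list:
        if epc in self.cache:          # cache hit: answered with no hops
            return self.cache[epc]
        hits = list(self.records.get(epc, []))
        if ttl > 0:                    # flood to neighbours, bounded by TTL
            for peer in self.neighbours:
                hits += peer.query(epc, ttl - 1)
        self.cache[epc] = hits
        return hits

a, b = DiscoveryPeer("manufacturer"), DiscoveryPeer("wholesaler")
a.neighbours.append(b)
b.store("urn:epc:id:sgtin:0614141.107346.2017", "https://wholesaler.example/epcis")
print(a.query("urn:epc:id:sgtin:0614141.107346.2017"))
```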

Read the full publication

Dr. Alexander Zeier

Co-Author
