IT Field of Competence Data Driven Innovations: Reference Story Berlin Big Data Center


Data Tools for a New Digital World

The amount of information we produce every day in our daily lives, at doctors’ offices, in workplaces, while shopping, or even while driving is growing rapidly. We are in the process of creating a complete digital image of our reality. Not only are we facing a digital “second world”; this world is also becoming more comprehensive, more precise, and more objective than our human perception. Today, sensors can capture extended physical dimensions all around us around the clock and in real time. Computers can not only process these data, they can also understand them and learn from them. All of this makes augmented realities possible. Those who know how to use the data correctly can not only understand reality better, but also make better predictions.

Big Data – Not a Solution, but a Problem above All

Big data is a hot topic today; we have the processing power, the storage capacities, and above all the technologies to store and process these enormous amounts of data. Yet these data per se are neither information nor knowledge. Just like oil, data is initially a raw material that yields a wide range of benefits only through numerous processing and refining steps. At the moment, there is a lack of qualified data scientists who are able to carry out big data analysis with the existing, very young, and still immature technologies. To be able to do so, data scientists need very broad knowledge of various fields of mathematics, computer science, and big data applications.
To address this shortage of data scientists, we need systems and tools that simplify complex data analysis with distributed data management and artificial intelligence methods. A good example of such a system is Apache Flink, an open source project originating from Berlin.

Apache Flink – A Success Story made in Berlin

Apache Flink, or simply Flink as the community calls it, is an internationally recognized open source system that specializes in processing continuous data streams. It can be used to program applications that process and analyze large volumes of fast-arriving data in real time.
The history of Flink's development is a model for how a research idea can successfully become a software system with a life of its own and a growing international community behind it. Currently, Flink is used not only in universities and research institutes, but also in companies such as Zalando and the Otto Group. Flink stands as a European counterbalance in the market for big data systems, which is otherwise dominated by US systems and providers.

The origins of Apache Flink can be traced back to June 2008, when Prof. Volker Markl founded the Database Systems and Information Management (DIMA) Group at the Technische Universität (TU) Berlin. His vision of an innovative approach to the processing and analysis of big data resulted in the Stratosphere research project in 2010. Together with scientists from HU Berlin and the Hasso Plattner Institute, the group developed an open source system for the scalable processing of massive data sets, which was ultimately transferred to the Apache Software Foundation in 2014 under the name "Apache Flink".
Later the same year, “data Artisans” was created as a company focused on making Flink the next-generation open source platform for programming data-intensive applications. It currently has 16 employees in Berlin and San Francisco working on the commercial use of Flink.

In addition, Apache Flink was introduced as the basic technology for the "Berlin Big Data Center", Berlin’s big data competence center, which was founded in 2014.

Berlin Big Data Center – Big Data Tools for All

The task of the Berlin Big Data Center is to develop improved data analysis systems and languages that enable big data analysis without system programming skills. Analysis programs are automatically translated to the selected execution platform and adapted to the computer architecture, data distribution, and system load. This automatic optimization, parallelization, and adaptation not only leads to broader access to and broader application of data analyses in business, science, and society, but also reduces analysis costs and shortens analysis times. It is only with the help of these technologies that we can give big data, and thus digitization, a broader basis.

Website Berlin Big Data Center

Photo: Flink Forward 2017 – © CC BY 2.0 by iStream
