Tag: big data

  • Models All The Way Down

    Christo Buschek and Jer Thorp set out, for a Knowing Machines project, to describe how “Large Language Models” (LLMs) work. You scroll through the website and can follow, step by step, what happens and why: which data is used to train such LLMs, which sources the content comes from, how it is composed, and which influences shape that composition.

    Above all, the site also describes what is not automated and how all of this is potentially problematic. I have not made it all the way through myself yet, but the week has not quite reached its midpoint either.

    Here is the link to Models All The Way Down.

  • Google Data Cloud Summit – Recap

    Google’s Data Cloud Summit took place on May 26th, 18PT. The summit showcases their big data products and offerings, which aim to help customers succeed in data-driven businesses. Here is a summary of news and announcements:

    • Dataplex, an intelligent data fabric. The product manages data across multiple sources, including data lakes, data warehouses and data marts, with the goal of centralizing management and governance. From there, Dataplex makes the data available for analytics and data science.
    • Datastream, a serverless change data capture (CDC) and replication service. The service synchronizes datasets across multiple systems by transferring only the changes, which reduces the amount of data transferred and improves performance and reliability.
    • Announcement of Analytics Hub, a fully managed service built on BigQuery. The service aims to provide an open ecosystem for sharing and exchanging data across organisations at scale. Part of the offering will be controls and monitoring over data usage and sharing. The hub will offer self-service and monetization for data owners, while reducing their need to operate infrastructure.
    • Dataflow Prime, a no-ops, serverless data processing platform. Dataflow Prime is a managed offering for data processing pipelines based on Apache Beam. The product will autoscale infrastructure.
    • Cloud Spanner will allow more flexible and granular instance sizing
    • Key Visualizer, an interactive monitoring tool to analyze usage patterns in Cloud Spanner
    • Cloud Bigtable lifts its SLA to 99.999% and introduces new security features, namely customer-managed encryption keys (Google’s acronym: CMEK) and audit logs. Together with the SLA, the product now aims to meet the compliance needs of regulated industries.
    • Sessions are available on demand
    Google Data Cloud Summit

    Join us to learn how leading companies are powering innovation with our data solutions. Attend sessions, demos, and live Q&As to discover how data can help you make smarter business decisions and solve your organization’s most complex challenges.

    Source: Home – Data Cloud Summit
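
    The change data capture idea mentioned for Datastream above can be illustrated with a minimal sketch. This is not Datastream's API: the `ChangeEvent` type and the `diff_snapshots` and `apply_changes` helpers are hypothetical names chosen for illustration, and the sketch derives changes by comparing two in-memory snapshots, whereas real CDC services typically read the source database's change log. It only shows why shipping changes alone can be cheaper than copying the full dataset.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]
Table = Dict[str, Row]  # primary key -> row


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record: one insert, update, or delete."""
    op: str                      # "insert", "update", or "delete"
    key: str                     # primary key of the affected row
    value: Optional[Row] = None  # new row contents (None for deletes)


def diff_snapshots(old: Table, new: Table) -> List[ChangeEvent]:
    """Derive the change events that turn `old` into `new`."""
    events: List[ChangeEvent] = []
    for key, row in new.items():
        if key not in old:
            events.append(ChangeEvent("insert", key, row))
        elif old[key] != row:
            events.append(ChangeEvent("update", key, row))
    for key in old:
        if key not in new:
            events.append(ChangeEvent("delete", key))
    return events


def apply_changes(replica: Table, events: List[ChangeEvent]) -> None:
    """Bring a replica up to date by applying only the changes."""
    for event in events:
        if event.op == "delete":
            replica.pop(event.key, None)
        else:
            # Inserts and updates both overwrite the row for this key.
            replica[event.key] = dict(event.value or {})


# Example data, made up for this sketch.
source_before: Table = {"1": {"name": "Ada"}, "2": {"name": "Bob"}}
source_after: Table = {"1": {"name": "Ada L."}, "3": {"name": "Cy"}}

# The replica starts as a copy of the old source state.
replica: Table = {k: dict(v) for k, v in source_before.items()}

events = diff_snapshots(source_before, source_after)
apply_changes(replica, events)
```

    Only the change events cross the wire in this sketch; the unchanged rows of a large table would never be transferred again.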

  • DataRobot announces Zepl acquisition

    Zepl offers a data science and big data platform. The company was founded in 2014 to build a Jupyter-like experience with added collaboration capabilities. Today TechCrunch reports its acquisition by Boston-based DataRobot.

    DataRobot, the Boston-based automated machine learning startup, had a bushel of announcements this morning as it expanded its platform to give technical and nontechnical users alike something new. It also announced it has acquired Zepl, giving it an advanced development environment where data scientists can bring their own code to DataRobot. The two companies did […]

    Source: DataRobot expands platform and announces Zepl acquisition | TechCrunch

  • Talend to be Acquired by Thoma Bravo

    Talend (NASDAQ: TLND), a leader in data integration and data integrity, to be acquired by private equity leader Thoma Bravo for approximately $2.4 billion.

    Source: Talend to be Acquired by Thoma Bravo in a $2.4 Billion Transaction

  • Sumo Logic aims to raise $310 million in US IPO

    Sumo Logic plans to go public. The company offers log management services, along with analytics for managing and observing IT systems. Its offering is differentiated as a fully managed solution delivered from the cloud. Reuters now reports that the company is planning an IPO.

    Source: Reuters – Big data firm Sumo Logic aims to raise $310 million in U.S. IPO

  • SAP and AWS announce IoT interoperability

    Meanwhile, after Microsoft announced its cooperation with SAP on IoT with Leonardo, AWS also announced tighter integration with SAP’s business processes.

    High Level Architecture

    via aws.amazon.com/blogs

  • SAP and Microsoft bring IoT data together

    The previously announced cooperation shows first results, particularly in the form of Microsoft’s announcement that it has integrated SAP Leonardo into its Azure IoT Hub.

    Microsoft and SAP Announce to bring IoT and Business Data together

    Source: SAP and Microsoft bring IoT data to the core of the business applications | Blog | Microsoft Azure

  • Amazon could write books.

    Today in dystopian news: Amazon, the bookseller that controls about 40% of the US book market, collects reading habits from its sales and from Kindle. By now the corporation knows enough about its customers that it could generate best-selling books. Spooky. And potentially game-changing, when machines replace creative professions.

    Amazon has the ability to track vast amounts of reader data and use it to change the landscape of American fiction.

    Source: Amazon has so much data it could make algorithm-driven fiction — Quartz

  • The terrifying, hidden reality of Ridiculously Complicated Algorithms

    Reading recommendation: a journalist talks with an anonymous big data engineer/analyst about the complexity of algorithms, and about how frightening our dependence on opaque components has become, given the influence machines thereby have on our lives.

    I think a layperson can follow this too; my own understanding of big data only goes far enough to judge it as realistic.

    “I’ll lose my job if anyone knows about this.” There was a long silence which I didn’t dare to break. I had begged to make this meeting happen. And now the person I had long been trying to meet leaned towards me. “Someone is going to go through your book line by line,” he said, “to try to work out who I am.” He’d been a talented researcher, an academic, until his friend started a small technology company. He had joined the company and helped it to grow. It eventually became so big that the company had been acquired by one of the tech giants. And so, then, was he. He was now paid a fortune to help design the algorithms that were central to what the tech giant did. And he had signed solemn legal documents prohibiting him from speaking to me, or to anyone, about his work. But as the…

    Source: The terrifying, hidden reality of Ridiculously Complicated Algorithms