Christo Buschek and Jer Thorp have set out, for a Knowing Machines project, to describe how “Large Language Models” (LLMs) work. You scroll through the website and can follow, step by step, what happens and why: which data is used to train such LLMs, which sources the content comes from, how it is composed, and which influences determine that composition.
Above all, the site also describes what is not automated and how all of this is potentially problematic. I haven't quite made it through myself yet, but then the week hasn't quite reached its midpoint either.
Just after Facebook lost 500M user profiles to the public Internet, it’s LinkedIn’s turn a week later. Wherever data is collected, data is subject to breach or theft.
Like the Facebook incident earlier this week, the information — including user profile IDs, email addresses and other PII — was scraped from the social-media platform.
Talend (NASDAQ: TLND), a leader in data integration and data integrity, to be acquired by private equity leader Thoma Bravo for approximately $2.4 billion.
According to the announcement by the Bundesministerium für Wirtschaft und Energie (BMWi), 11 German and 11 French founding members today signed the notarial documents founding an “Association internationale sans but lucratif”, AISBL for short. This is a “non-profit association”, a Belgian legal form comparable to the German gemeinnütziger Verein. The association will be based in Brussels.
The legal birth of the GAIA-X concept is a big step forward for European data infrastructure.
Twelve Million Phones, One Dataset, Zero Privacy is part one of One Nation, Tracked, a New York Times investigative series on smartphone location tracking by Stuart A. Thompson and Charlie Warzel, published within the paper's Privacy Project. The research covers multiple topics and starts out with an analysis of the potential contained in smartphone tracking information.
What we learned from the spy in your pocket.
Twelve Million Phones, One Dataset, Zero Privacy
The authors analyse a large dataset of location information from cell phone users in New York and Washington, DC. With the analysis, the article debunks myths about data privacy. The key takeaways of the analysis, in my interpretation, are:
Data is not anonymous – the authors successfully identified a senior Defense Department official and his wife, and this was possible even during the Women's March. According to the authors, nearly half a million people descended on the capital for this event. (Other sources only mention one hundred thousand attendees.)
Data is not safe – the authors point out the complex relationships between companies in the tracking business. That complexity makes it impossible to ensure ownership. There is no foolproof way for anyone anywhere in the chain to prevent data from falling into the hands of a foreign security service.
Affected persons cannot consent – the authors' criticism seems reasonable. Virtually all companies involved with tracking require user consent, and even cell phones make the geo-tracking feature visible to users. Yet hardly anyone in the business makes the purpose transparent. In other words, no company prominently announces how it packages and sells data or insight.
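To make the first takeaway a bit more tangible, here is a minimal sketch of why "anonymous" location pings rarely stay anonymous. It is not the authors' method, just a common illustration: the place a device rests at night and the place it sits during office hours are usually enough to narrow it down to one person. The function names, the time windows and the sample coordinates are all made up for this example.

```python
from collections import Counter
from datetime import datetime


def round_cell(lat, lon, precision=3):
    """Snap a coordinate to a coarse grid cell (3 decimals is roughly 100 m)."""
    return (round(lat, precision), round(lon, precision))


def infer_home_and_work(pings):
    """Guess home and work cells for ONE device ID.

    pings: iterable of (iso_timestamp, lat, lon) tuples.
    Assumption: night = 22:00-06:00, work = weekdays 09:00-17:00.
    """
    night, day = Counter(), Counter()
    for ts, lat, lon in pings:
        t = datetime.fromisoformat(ts)
        cell = round_cell(lat, lon)
        if t.hour >= 22 or t.hour < 6:
            night[cell] += 1
        elif 9 <= t.hour < 17 and t.weekday() < 5:
            day[cell] += 1
    home = night.most_common(1)[0][0] if night else None
    work = day.most_common(1)[0][0] if day else None
    return home, work


# Made-up pings for a single hypothetical device in the Washington, DC area.
sample_pings = [
    ("2019-12-16T23:10:00", 38.9001, -77.0402),
    ("2019-12-17T02:30:00", 38.9002, -77.0401),
    ("2019-12-17T10:15:00", 38.8719, -77.0563),
    ("2019-12-17T14:40:00", 38.8721, -77.0561),
    ("2019-12-17T19:00:00", 38.8895, -77.0353),
]
home, work = infer_home_and_work(sample_pings)
print("likely home cell:", home)
print("likely work cell:", work)
```

A home/work pair like this can then be looked up against public records, which is why removing the name from a dataset does not make it anonymous.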
One Nation, Tracked
The article is a creepy read, but worth the time. The series One Nation, Tracked continues with 6 other parts:
Moore's Law in Action: You'll probably remember the prediction from your days at university. In essence, Gordon Moore, co-founder of Fairchild Semiconductor and later co-founder and CEO of Intel, predicted that the number of transistors on an integrated circuit would double about every two years (the often-quoted 18 months is a popular variant of the prediction). He was right for a long time, even as many predicted the end of his law. Visual Capitalist today linked an illustration showing the law in action up to 2019.
Can the predictions from Moore’s Law keep up with technological innovation spanning almost 50 years? Watch this stunning animation to find out.
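If you want to play with the numbers yourself, the prediction boils down to simple exponential growth. The sketch below is only an illustration: it assumes the Intel 4004 from 1971 with roughly 2,300 transistors as the starting point, and the function name and parameters are my own.

```python
def projected_transistors(year, base_year=1971, base_count=2300, doubling_years=2.0):
    """Project a transistor count assuming a fixed doubling period.

    Assumption: starts from the Intel 4004 (1971, about 2,300 transistors).
    """
    return base_count * 2 ** ((year - base_year) / doubling_years)


# Compare the two commonly quoted doubling periods.
for period in (1.5, 2.0):
    count = projected_transistors(2019, doubling_years=period)
    print(f"doubling every {period} years: {count:.2e} transistors in 2019")
```

With a two-year doubling period the projection for 2019 lands at roughly 39 billion transistors, while the 18-month variant overshoots into the trillions, so the choice of period matters a lot over almost 50 years.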
WTF of the day: the most advanced MySQL raytracer on the market right now. A raytracer, written in a single SELECT statement, that MySQL is able to process into an image. Pure demoscene spirit here: whatever the platform is, make it run an animation or raytrace some spheres, something beautiful it was not meant to produce in the first place.
Data-driven product management requires measurements and metrics. Product Management Insider shares some detail about the pirate (“AARRR!”) system and the HEART model.
In an ideal world, product managers have plenty of data they can use to validate their idea before building the wrong product. Yana Yushkina describes her journey from a Data Analyst to a Product Manager.
She talks about the characteristics a good PM should bring, which include a foundational analytical understanding, curiosity not just for technology but for finding the right answers in data, a sense of responsibility and the ability to communicate.
All of that, combined with the right metrics at hand and a self-sufficient mindset, will give a Product Manager the right answer from data.
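As a rough idea of what such metrics look like in practice, here is a small sketch of the pirate funnel (Acquisition, Activation, Retention, Referral, Revenue) as step-to-step conversion rates. The function name and the user counts are invented for illustration and are not taken from either article.

```python
# Stage order of the AARRR ("pirate") funnel.
STAGES = ["acquisition", "activation", "retention", "referral", "revenue"]


def funnel_conversion(counts):
    """Return the conversion rate from each funnel stage to the next.

    counts: dict mapping stage name -> number of users who reached it.
    """
    rates = {}
    for prev, cur in zip(STAGES, STAGES[1:]):
        # Guard against division by zero for an empty stage.
        rates[f"{prev}->{cur}"] = counts[cur] / counts[prev] if counts[prev] else 0.0
    return rates


# Invented numbers, purely for illustration.
example_counts = {
    "acquisition": 1000,
    "activation": 400,
    "retention": 200,
    "referral": 50,
    "revenue": 20,
}
example_rates = funnel_conversion(example_counts)
for step, rate in example_rates.items():
    print(f"{step}: {rate:.0%}")
```

The step with the lowest rate is usually the first place to look before building anything new.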
In today's edition of privacy-related topics, it is Google that apparently stored customer passwords in plaintext. Google didn't disclose which (enterprise) customers were affected, but was clear that improper access is out of the question. With this recent incident, Google joins the ranks of Facebook and Instagram, but also Twitter and LinkedIn.
Google says it discovered a bug that caused some of its enterprise G Suite customers to have their passwords stored in an unhashed form for about 14 years.
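For contrast, this is roughly what storing a password in hashed form means. It is a minimal sketch using PBKDF2 from Python's standard library, not a description of how Google or anyone else mentioned here does it. The function names and the iteration count are assumptions for this example, and a production system may well prefer a dedicated scheme such as Argon2, scrypt or bcrypt.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 600_000  # assumed work factor; tune for your own hardware


def hash_password(password: str, iterations: int = ITERATIONS) -> Tuple[bytes, bytes, int]:
    """Derive a salted hash; store salt, digest and iterations, never the password."""
    salt = secrets.token_bytes(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest, iterations


def verify_password(password: str, salt: bytes, digest: bytes, iterations: int) -> bool:
    """Re-derive the hash from the login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, digest)
```

The point is that only the salt, the digest and the iteration count end up in the database, so a leaked table does not directly reveal anyone's password.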