· 2 min read
Michael Robinson

Join us on Thursday, September 12th, 2024, from 6:00-9:00 pm PT at the Astronomer offices in San Francisco to learn more about the present and future of OpenLineage. Meet other members of the ecosystem, learn about the project’s goals and fundamental design, and participate in a discussion about the future of the project. Bring your ideas and vision for OpenLineage!

Agenda:

  • Unlocking Data Products with OpenLineage at Astronomer by Julian LaNeve and Jason Ma, Astronomer
  • OpenLineage: From Operators to Hooks by Maciej Obuchowski, Astronomer+GetInData/Xebia
  • Activating Operational Metadata with Airflow, Atlan and OpenLineage by Kacper Muda, GetInData/Xebia
  • Hamilton, a Scaffold for all Your Python Platform Concerns (and a New OpenLineage Producer) by Stefan Krawczyk
  • Lightning Talk on New Marquez Features and the Marquez Project Roadmap by Willy Lulciuc, Marquez Lead, and Peter Hicks, Marquez Committer

· One min read
Michael Robinson

Join us on Tuesday, March 19th, 2024, from 5:30-8:00 pm at the Microsoft New England Conference Center in Boston to learn more about the current state of lineage in general and static lineage support in data catalogs in particular. Bring your ideas and vision for data lineage!

· One min read
Michael Robinson

At this year's Kafka Summit in London, two project committers, Paweł Leszczyński and Maciej Obuchowski, will give a talk entitled OpenLineage for Stream Processing on March 19th at 2:00 PM GMT.

As the abstract available on the summit website says, the talk will cover some of the 'many useful features completed or begun' recently related to stream processing, including:

  • a seamless OpenLineage & Apache Flink integration,
  • support for streaming jobs in Marquez,
  • progress on a built-in lineage API within the Flink codebase.

As the abstract goes on to say,

Cross-platform lineage allows for a holistic overview of data flow and its dependencies within organizations, including stream processing. This talk will provide an overview of the most recent developments in the OpenLineage Flink integration and share what’s in store for this important collaboration. This talk is a must-attend for those wishing to stay up-to-date on lineage developments in the stream processing world.

Register and attend this interesting talk if you can. And keep an eye out for an announcement about a recording if and when one becomes available.

Thanks, Maciej and Paweł, for spreading the word about these exciting developments in the project.

· One min read
Michael Robinson

Join us on Wednesday, January 31st, 2024, from 6:00-8:00 pm at the Confluent offices in London to learn more about the current state of lineage in general and streaming support in particular. Bring your ideas and vision for OpenLineage!

· One min read
Michael Robinson

Join us on Wednesday, November 29th, 2023, from 17:30-20:30 CET in Warsaw, Poland, to contribute to a discussion of the future of OpenLineage. On the tentative agenda:

  1. Mary Idamkina on OpenLineage in GCP Dataplex
  2. Paweł Leszczyński on recent developments in the Spark Integration
  3. Jakub Dardziński on Extracting lineage from PythonOperator - how come this is possible?
  4. Paweł Leszczyński on How to Become a Spark-OpenLineage Contributor in 5 Steps

· 3 min read
Yi Wang
Mars Lan

In the ever-evolving landscape of data management and governance, organizations constantly seek innovative solutions to streamline their processes, foster collaboration, and maximize the value of their data assets. Metaphor, born out of the minds behind LinkedIn's DataHub, has emerged as a modern data catalog and social platform for data. We take a unique approach by combining technical metadata with social collaboration, making data governance accessible and engaging for everyone in the organization. In this blog post, we explain the motivation behind Metaphor’s adoption of OpenLineage, delve into the integration methodology, and discuss its current status and benefits.

· 5 min read
Michael Robinson
Maciej Obuchowski
Julien Le Dem

This one is big. With the release of Airflow 2.7.0, the Airflow integration is now officially an Airflow Provider. This means the integration formerly distributed as the openlineage-airflow package now lives in Airflow itself as apache-airflow-providers-openlineage: a built-in feature of Airflow rather than an externally managed integration. Why does it matter where the integration “lives”? The short answer: as an Airflow Provider, the integration will offer improved reliability, broader support for operators, enhanced lineage, and easier implementation in custom operators going forward.

Although still a work in progress in some key respects, the OpenLineage Provider promises to pay dividends to users and contributors alike while accelerating the growth of the OpenLineage Ecosystem.

· 2 min read
Michael Robinson

Join us on Monday, September 18th, 2023, from 5:00-8:00 pm ET in Toronto to contribute to a discussion of the future of OpenLineage. On the tentative agenda:

  • Intros
  • Evolution of spec presentation/discussion (project background/history)
  • State of the community
  • Integrating OpenLineage with Metaphor (by special guests Ye & Ivan)
  • Spark/Column lineage update
  • Airflow Provider update
  • Roadmap Discussion
  • Action items review/next steps

Bring your ideas and vision for OpenLineage!

· 2 min read
Michael Robinson

Join us on Wednesday, August 30th, 2023, from 5:30-8:30 pm PT at the Astronomer offices in San Francisco to learn more about the present and future of OpenLineage. Meet other members of the ecosystem, learn about the project’s goals and fundamental design, and participate in a discussion about the future of the project. Bring your ideas and vision for OpenLineage!

Also on the agenda: a presentation by new contributor/partner John Lukenoff, who will be speaking about his lineage use case.