BI

Extracting Metadata from the Power BI Report File

The more and more is Power BI used everywhere it’s starting to be really painful to incorporate it in any metadata management solution. This article is a result of 10 hours investigation on how to get metadata from Power BI files. To be honest, a lot of workarounds, community tools, and nearly zero support by Microsoft. Correct me if I’m wrong.

The original requirement was to extract metadata from more than five hundred Power BI reports deployed in on-premises Power BI Report Server. It’s not an issue to download all these reports to the local folder via a custom tool (I will publish one soon) or using the Report Server REST API. But this API doesn’t expose metadata contained inside the Power BI report file (*.pbix). And opposite there is nothing directly inside the Power BI Desktop tool to export metadata in some human-readable or further processing suitable format. A little bit more (means not a full metadata-export solution) can be realized with the Power BI Cloud Service API but this isn’t an option for on-premises running Power BI Report Server.

Let’s concentrate on the *.pbix file itself and keep the API option to another post.

If I’m talking about Power BI “metadata” I mean everything that is hidden inside the report: data sources, tables, columns, measures, relations, visuals used, etc. “External” metadata like where the report is sitting on the Report Server, how often it is used, and who can access it that’s easy to collect directly from the Report Server. But users, especially data analysts, DWH developers, and metadata solution specialists are more interested to see what is referenced from the DWH inside the Power BI report, how the whole data lineage looks like, how measures are calculated and relations used.

There basically two ways how to get this metadata:

  • via opening the *.pbix file in Power BI and connecting to the underlying SSAS service hosting an instance of the Power BI model
  • extracting the *.pbix file content and processing various files (*.json) from it.

I don’t want to describe the SSAS option there because we have realized this will require a significant amount of time to automate it somehow for hundreds of reports. It may also need dedicated hardware to process report changes in time. But still, it’s a valid solution and you can view more on it there:

Power BI files (*.pbix, *.pbit)

The *.pbix file is the main file of the Power BI report. It contains all metadata and also all data imported inside the model using the Import option for source queries. *.pbit is the Power BI Template file. It has in general the same content regarding metadata but it doesn’t contain any data imported from data sources like the *.pbix file. Both files are in fact *.zip files with other files packed inside. So rename them with *.zip extension and review the content.

It looks like that both files have the same content. But if you will review it twice there is a significant difference in how the DataModel is stored. Inside the *.pbix file is the model stored using Xpress9 compression to make the file much smaller because it contains data too. Inside the *.pbit template file it is an extracted DataModelShema in JSON format instead of it.

The Xpress9 compression is my point of view the main issue of why it’s so complicated to get the metadata from the Power BI. I did several attempts to extract the file playing with a lot of stuff from C++ libraries thru Python scripts, 7z, ending inside UWP Windows. Compression namespace but without success. All these tools can work with the original Express only compression algorithm which is a part of the Windows encryption API for a long time. But I really didn’t found anything to extract the new Xpress9 algorithm.

What does it mean? Simple answer: Without the ability to decompress the Xpress9 DataModel file only two options are in-game on how to get the Power BI metadata from the file:

  • via connection to running Power BI SSAS instance (as described above)
  • or via the option to save Power BI Template in the Power BI Desktop application and processing the *.pbit file.

Saving Power BI files as a template looks to be a simple solution at first look. But let’s go back to the initial requirement: process metadata for more than five hundred Power BI reports deployed to Report Server or even tens of files stored somewhere locally. There is no option to call Power BI from the command line to do this conversion in the background or something like that. It really means that every single Power bI report must be opened in Power BI Desktop and manually resaved as a Power BI Template file.

There is one solution in this video. We can use Power Automate Desktop to capture the manual conversion from *.pbix to *.pbit and then replay it automatically for other *.pbix file in the folder. This solution may be suitable for the initial conversion of multiple *.pbix files to templates but it may be hard to implement it on regular basis to the customer’s infrastructure. It’s still a workaround solution only.

The *.pbit file content

What is inside the Power BI Report file is pretty well described in this blog post.

The DataMashup file, containing Query Editor metadata, may be extracted with 7z:

7z x DataMashup -o".\DataMashupDir\" -aoa

It’s reporting some warnings but looks like all files are extracted properly:

For working with the DataModelSchema JSON file inside the *.pbit file there is one great tool available: https://tabulareditor.com. You can use it to open the *.pbit file directly and review its metadata:

It’s doing a great job in parsing the DataModelSchema JSON and it is open source, so you can get inspired in the source a lot on what is inside the JSON.

For the live connection to the running Power BI model, DAX Studio is the ultimate solution.

Conclusion

Without the Xpress9 issue, it will be easy to have some simple solution to download *.pbix files from dbo.Catalog table in Report Server, convert it to ZIP, extract DataModel JSON schema, parse it, and upload it to metadata catalog. Conversion to *.pbit makes it cumbersome and workarounds are complicated.

Further reading:

  • R library to get Power BI metadata (using the SSAS connection solution but with interesting info about DataModel file)

4 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *