Format of XML files in Application extract zip file

MarcusH
Contributor III

Does anyone know whether it is possible to format the XML files that are created by the application extract? In particular, I want to format the Metadata extract so that each member definition is on a single line. I want to run a compare process to check the differences between two files, and I want the member name included in each difference line, rather than (for example) a line saying the currency has changed with no indication of which member it belongs to. So I want this:

<member name...><properties><property name...>

instead of:

<member name...>
  <properties>
    <property name...>

Thanks

1 ACCEPTED SOLUTION

I get what you're doing. It makes sense. However, Git can track these changes without needing to mangle the XML. In VS you can open your XML in a diff view to see the individual changes. It's built in. 🙂

 


6 replies

JackLacava
Community Manager

Comparing XML files by relying on text-formatting rules is the wrong approach; you want a dedicated tool, like xmldiff. There are other options, like the ones suggested here.
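For illustration only (this is not xmldiff, just the same idea done in-process): LINQ to XML's `XNode.DeepEquals` compares two parsed elements node by node, so a change in indentation or line breaks alone is not reported. A minimal sketch, assuming `<member>` elements identified by a unique `name` attribute; `MemberTreeCompare` and `ChangedMembers` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

internal static class MemberTreeCompare
{
    // Returns the names of members present in both documents whose element
    // trees differ. Indentation and line breaks do not count as differences,
    // because XDocument.Load/Parse drop insignificant whitespace by default.
    public static List<string> ChangedMembers(XDocument doc1, XDocument doc2)
    {
        // Index the second document by member name (assumes names are unique;
        // ToDictionary throws on duplicate keys).
        Dictionary<string, XElement> byName2 = doc2.Descendants("member")
            .Where(m => m.Attribute("name") != null)
            .ToDictionary(m => (string)m.Attribute("name"));

        var changed = new List<string>();
        foreach (XElement m1 in doc1.Descendants("member"))
        {
            string name = (string)m1.Attribute("name");
            if (name != null
                && byName2.TryGetValue(name, out XElement m2)
                && !XNode.DeepEquals(m1, m2))
            {
                changed.Add(name);
            }
        }
        return changed;
    }
}
```

This only tells you *which* members changed; a per-property comparison is still needed to say what changed inside each one.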

Hi Jack

Thanks for the reply. I have already written a script for comparing XML files, but it's cumbersome, and I was wondering if there was a better way. What I'm after is the format OS produces the file in, e.g. whether there's a setting somewhere in the API that controls the XML output. That would make Git much simpler.

Unfortunately, there's no "setting" that controls the formatting of particular tags in the XML output. By default, the XDocument library writes human-readable, indented XML. As an experiment, you can literally "flatten" each member element with a regex and put it back into the XDocument object. When you write that document back to a file, you will find your previously flattened XML nicely indented again.

    using System;
    using System.Linq;
    using System.Text.RegularExpressions;
    using System.Xml.Linq;

    internal static class XdocumentFormatting
    {
        public static void XdocumentFormatExperiment(string inputFilePath, string outputFilePath)
        {
            XDocument doc = XDocument.Load(inputFilePath);

            // ToList() takes a snapshot so elements can be replaced while iterating
            foreach (var member in doc.Descendants("member").ToList())
            {
                // Convert the member element and its children to a single-line string
                string memberAsString = member.ToString();
                string singleLineMember = Regex.Replace(memberAsString, @"\r\n?|\n", "");
                singleLineMember = Regex.Replace(singleLineMember, @">\s*<", "><");
                Console.WriteLine(singleLineMember); // all nicely flattened on one line...

                // Replace the original <member> element with the flattened one
                member.ReplaceWith(XElement.Parse(singleLineMember));
            }

            // Write out the new XML document with the flattened member elements;
            // open the file and you'll find they're all indented again.
            doc.Save(outputFilePath);
        }
    }
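LINQ to XML does expose one formatting switch, `SaveOptions.DisableFormatting` (accepted by `XDocument.Save` and `ToString`), but as far as I know it is all-or-nothing: the default output indents every element, while `DisableFormatting` writes the entire document on one line. A small sketch comparing the two; `XDocumentFormattingDemo` and `Serialize` are made-up names for illustration.

```csharp
using System.Xml.Linq;

internal static class XDocumentFormattingDemo
{
    // Serializes the same tree both ways. doc.Save(path) and
    // doc.Save(path, SaveOptions.DisableFormatting) behave the same way for files.
    // Neither option produces "one line per <member>".
    public static (string Indented, string Unformatted) Serialize(XDocument doc) =>
        (doc.ToString(), doc.ToString(SaveOptions.DisableFormatting));
}
```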

 

RobbSalzmann
Valued Contributor

Why do you want to use string mangling to compare differences in XML files? It's better to use the XML libraries for this.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

internal class XmlMetadataMemberComparer
{
  public static void CompareXmlFiles(string filePath1, string filePath2)
  {
    XDocument doc1 = XDocument.Load(filePath1);
    XDocument doc2 = XDocument.Load(filePath2);

    var members1 = doc1.Descendants("member");
    var members2 = doc2.Descendants("member");

    foreach (var member1 in members1)
    {
      var memberName = member1.Attribute("name")?.Value;
      var member2 = members2.FirstOrDefault(m => m.Attribute("name")?.Value == memberName);

      if (member2 != null)
      {
        CompareMemberProperties(memberName, member1, member2);
      }
    }
  }

  private static void CompareMemberProperties(string memberName, XElement member1, XElement member2)
  {
    var properties1 = member1.Descendants("property").ToDictionary(p => p.Attribute("name")?.Value, p => p.Attribute("value")?.Value);
    var properties2 = member2.Descendants("property").ToDictionary(p => p.Attribute("name")?.Value, p => p.Attribute("value")?.Value);

    foreach (var prop1 in properties1)
    {
      if (properties2.TryGetValue(prop1.Key, out var value2) && prop1.Value != value2)
      {
        Console.WriteLine($"Difference found in member '{memberName}', property '{prop1.Key}': File1='{prop1.Value}', File2='{value2}'");
      }
    }
  }
}

If you really must use string mangling, this should get you close:

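using System.IO;
using System.Text.RegularExpressions;

internal class XmlMangler
{
    public static void FlattenMemberTagContents(string inputFilePath, string outputFilePath)
    {
        string xmlContent = File.ReadAllText(inputFilePath);
        // \b keeps the pattern from matching <members>, and (?<!/) skips
        // self-closing <member ... /> tags, which have no </member> of their own.
        string pattern = @"<member\b[^>]*(?<!/)>[\s\S]*?</member>";
        string formattedXml = Regex.Replace(xmlContent, pattern, SingleLineMember);
        File.WriteAllText(outputFilePath, formattedXml);
    }

    private static string SingleLineMember(Match m)
    {
        // Remove whitespace between tags, then fold any remaining line breaks
        // (e.g. inside text content) into single spaces.
        string singleLine = Regex.Replace(m.Value, @">\s+<", "><");
        return Regex.Replace(singleLine, @"\s*\r?\n\s*", " ");
    }
}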
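The `XmlMetadataMemberComparer` above reports changed values only for members and properties that exist in both files (and its `ToDictionary` calls throw if a member has two properties with the same name, or one without a name). A sketch covering the other case, members added or removed between two extracts, assuming as before that members are `<member>` elements keyed by their `name` attribute; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

internal static class XmlMemberSetDiff
{
    // Returns one line per member name that exists in only one of the two documents,
    // sorted so the output is stable between runs.
    public static List<string> DiffMemberNames(XDocument doc1, XDocument doc2)
    {
        var names1 = new HashSet<string>(MemberNames(doc1));
        var names2 = new HashSet<string>(MemberNames(doc2));

        var lines = new List<string>();
        foreach (var name in names1.Where(n => !names2.Contains(n)).OrderBy(n => n, StringComparer.Ordinal))
            lines.Add($"Member '{name}' only in File1");
        foreach (var name in names2.Where(n => !names1.Contains(n)).OrderBy(n => n, StringComparer.Ordinal))
            lines.Add($"Member '{name}' only in File2");
        return lines;
    }

    // Member names; <member> elements without a name attribute are skipped.
    private static IEnumerable<string> MemberNames(XDocument doc) =>
        doc.Descendants("member")
           .Select(m => (string)m.Attribute("name"))
           .Where(n => n != null);
}
```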

 

It's to use with Git. I wanted to know if there were any options to control the format of the XML to make the Git versioning simpler. Thanks for the code; it looks really good.

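Git can also show a member-per-line diff on the command line without changing the committed XML, using a textconv diff driver: Git runs a conversion command on each version of the file and diffs that command's output instead of the raw text. A sketch, with a hypothetical driver name `osmembers`: add `*.xml diff=osmembers` to `.gitattributes` (or a narrower pattern matching only the metadata extracts), then run `git config diff.osmembers.textconv "<command that runs the converter>"`. Git calls that command with the file's path as its argument and reads its standard output. The converter body could look like this (the class is hypothetical; a console program's `Main` would call `Run(args[0], Console.Out)`):

```csharp
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Hypothetical Git textconv converter: prints one line per <member> so that a
// changed property appears on the same diff line as the member's name.
internal static class MemberTextConv
{
    // Assumes <member> elements are not nested inside each other
    // (a nested member would be printed twice).
    public static string[] MemberLines(XDocument doc) =>
        doc.Descendants("member")
           .Select(m => m.ToString(SaveOptions.DisableFormatting))
           .ToArray();

    // Writes the converted text for the file Git passes in.
    public static void Run(string xmlPath, TextWriter output)
    {
        foreach (string line in MemberLines(XDocument.Load(xmlPath)))
            output.WriteLine(line);
    }
}
```

Because textconv only changes what `git diff` displays, the XML stored in the repository stays exactly as OS produced it.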