I have been working with XML files produced by a third-party tool which had almost no documentation. I wanted to find ways to automatically parse them in Java and it seemed that this involves getting or re-creating the XSD schema.There is a thread on Stackoverflow where multiple possible solutions are suggested. Unfortunately, the thread is closed for further contributions so I am going to describe my solution here.
I stumbled almost by chance on the XML toolset in IntelliJ which includes such functionality. My solution involved the following simple steps:
1. Open a (hopefully representative) XML file in IntelliJ
I used the free community version but I presume that the same functionality is included in the commercial version, too.
2. Go to "Tools"/"XML actions"/"Generate XSD schema from XML file"
I used the default values for all parameters apart from "Detect enumerations limit" which I changed from 10 to 2:
3. (optional) Verify the resulting XSD schema
The resulting XSD schema seems OK but optional elements may cause issues. If the XML file you have opened is missing them than the XSD schema will miss them, too. A better solution would be to create the XSD schema based on a set of files (I have dozens) rather than just on a single file. Unfortunately, I know of no tool that does that. Instead, I decided to verify the schema using the following method (based on Grzegorz Szpetkowski's excellent post on Stackoverflow):
4. (optional) Add optional elements to the XSD schema and adjust the data types
It turned out that a few of the other XML files did, indeed, contain optional elements that were not part of the XSD schema (i.e. they failed the xml validation) so I added them manually. Another manual change was setting the data type for the elements. I tried initially running the IntelliJ wizzards with automatic data type detection enabled ("Detect simple content type" set to "smart") but it did not work well. So I used in step 2 the "dumb" option of setting all data types to "string". Once I got the XSD schema I than manually set the data types.
The above is a relatively simple recipe - it takes about 20 minutes. Still, I am somewhat disappointed that currently no tool seems to be able to generate a XSD schema based on multiple files - such functionality would cut the manual effort by half and would make the recipe more robust.