As I noted in my previous post, text mining is a goal for this project. To do this kind of analysis on a body of works (like a collection of poetry), it’s easiest to combine the text from every entry into one document that can be analyzed as a whole. One could brute-force this by copy-pasting every piece by hand, but in all honesty, that time is much better spent working on something else. So, we automate it.
That brings us to XQuery, a language designed to help, well, query and transform large collections of (typically) structured text like XML. With it, we can write a very simple script that iterates over each document, pulls out its text (the whole thing or just specific elements), and writes it all into one big master document, all at the click of a button. Pretty standard stuff, really.
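The actual XQuery isn't shown here, so below is a rough analogue of the same idea written in Python rather than XQuery, using the standard library's `xml.etree.ElementTree`. It is a sketch under assumptions: each poem is its own XML document, the poem text sits inside a `poem` element, and there are no XML namespaces. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def extract_text(xml_source: str, tag: str = "poem") -> str:
    """Return the whitespace-normalized text of every <tag> element in one document."""
    root = ET.fromstring(xml_source)
    parts = []
    # root.iter(tag) checks the root itself and all of its descendants
    for element in root.iter(tag):
        # itertext() yields the element's text plus all nested text in document order;
        # joining with spaces keeps words from separate <l> lines apart, and
        # split()/join collapses runs of whitespace into single spaces
        parts.append(" ".join(" ".join(element.itertext()).split()))
    return "\n".join(parts)


def combine_documents(xml_sources: Iterable[str], tag: str = "poem") -> str:
    """Concatenate the extracted text of many documents into one master string."""
    return "\n\n".join(extract_text(src, tag) for src in xml_sources)
```

To build the master document from files on disk, each file could be read with `pathlib.Path.read_text()` and the result of `combine_documents` written to a single text file. One trade-off of joining text fragments with spaces: a word split by inline markup (say, `heart<hi>beat</hi>`) would come out as two words.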
Things get interesting when you run this whole operation while you’re still creating the collection you’re querying, like I am. As it stands, my poetry collection is only about halfway to its finished state, so I have quite a few more poems to write, but I decided to get a head start on the querying early. Analyzing what I’ve written so far has already given me quite a few insights. Insights that may influence the way I write in the future…
I noticed that I lean heavily on certain words: “heartbeat”, “eyes”, “feel”, “like”, and “body”. All of my poems, every single one, are in first person (“I” was used 10 times across 4 poems). The poems also have an overall negative vibe, though I was aware of that even before the analysis.
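The tool behind these counts isn't named, so here is one hedged illustration of a basic word-frequency count over the combined text. The tokenizer (lowercase everything, keep only letters and apostrophes, straight or curly) is an assumption; dedicated text-mining tools usually do more, such as stemming or stop-word filtering. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List


def word_frequencies(text: str) -> Counter:
    """Count how often each word appears, ignoring case and punctuation."""
    # Lowercasing merges "Heartbeat" and "heartbeat"; apostrophes (' and ’)
    # are kept inside tokens so contractions like "don’t" stay whole.
    words = re.findall(r"[a-z'’]+", text.lower())
    return Counter(words)


def count_per_document(texts: Iterable[str], word: str) -> List[int]:
    """Return how many times a single word appears in each document."""
    target = word.lower()
    return [word_frequencies(t)[target] for t in texts]
```

Calling `word_frequencies(master_text).most_common(5)` would list the five most repeated words. Because everything is lowercased, “I” is counted under the key `"i"`.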
After reviewing the results a few days ago, I’ve found myself contemplating what they might mean going forward. I have many ideas in my head for new poems, ideas that fall very much in line with these trends, and I now feel hesitant to engage with them. A surge of questions floods through me whenever I sit down to write.
Do I really want a collection of homogeneous writing that all sounds the same? Shouldn’t I attempt some level of variety, a difference in perspective? Will a lack of diversity in themes and images cheapen the work?
And yet, on the other hand… wouldn’t it be inauthentic to alter the way I write, directly from the heart, to fit some arbitrary standard? Will it devalue the entire project to write from a place of feeling put-upon, of insecurity in the writing itself, rather than trusting my own intuition?
But how else can I improve my writing -- if I don’t adjust, will I stagnate?
It’s as exciting as it is overwhelming, because I don’t have the answers to any of these questions. As I sit down in the coming days and weeks and force words onto the page, we’ll see whether that influence is great enough to justify all of this ruminating.
Beyond its influence on my poetry, there is also the query itself to consider. I structure each poem document to include the poem itself and a section of lineInfo elements that annotate certain lines or stanzas (I think I mentioned them before). Looking back over the query, I realized that, without even thinking, I had written it to skip these lineInfo elements and iterate only over the poem element itself. But why?
Are they not part of the document, my own writing? I think of them as “extras,” borderline afterthoughts to the actual text. But that may be my own bias as a ~poet~ shining through. They’re part of the project, after all; why shouldn’t they be represented in the analysis?
I have much to think about, it seems, and even more querying to do...