Last November, when Microsoft announced that it was using the recently released JFK assassination records as a test sample for its new artificial intelligence Cognitive Services software, it said it would make the tool available to the public and to serious researchers.
They also noticed some things right away: the Cuban connection was key, something I too noticed when I first started reading the 26-volume Warren Commission testimony and documents in college.
A few weeks later, Microsoft also announced that the test system was meant as an example for potential customers, not a system that others could use.
As some other, more computer-savvy researchers have commented, what is needed is a complete scan and digitization of the entire JFK Collection, not just the small percentage of records that were withheld and recently released over the internet.
More to come on this.
BK
Using Azure and AI to Explore the JFK Files
This post is by Corom Thompson, Principal Software Engineer at Microsoft.
On November 22nd, 1963, the President of the United States, John F. Kennedy, was assassinated. He was shot by a lone gunman named Lee Harvey Oswald while riding in his motorcade through the streets of Dallas. The assassination has been the subject of so much controversy that, 25 years ago, an act of Congress mandated that all documents related to the assassination be released this year. The first batch of released files has more than 6,000 documents totaling 34,000 pages, and the latest release contains at least twice as many documents.
We’re all curious to know what’s inside them, but it would take decades to read through them all. We approached the problem of gaining insights by using Azure Search and Cognitive Services to extract knowledge from this deluge of documents, with a continuous process that ingests raw documents and enriches them into structured information you can use to explore the underlying data.
Today, at the Microsoft Connect(); 2017 event, we presented the demo website* shown in Figure 1 below. It is a web application built on the AzSearch.js library and designed to give you interesting insights into this vast trove of information.
Figure 1 – JFK Files web application for exploring the released files
On the left, you can see that the documents are broken down by the entities that were extracted from them. Already we know these documents are related to JFK, the CIA, and the FBI. Leveraging several Cognitive Services, including optical character recognition (OCR), Computer Vision, and custom entity linking, we were able to annotate all the documents to create a searchable tag index.
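The post does not show how the tag index is built. As a rough illustration only, the sketch below assumes the OCR step has already produced plain text and swaps in a tiny hypothetical keyword dictionary (`KNOWN_ENTITIES`) for the Cognitive Services entity recognizer. It maps each entity to the documents that mention it and counts documents per entity, similar in spirit to the facets on the left of the demo site. The function names and sample documents are made up for this example; they are not part of Azure Search or AzSearch.js.

```python
from collections import defaultdict

# Hypothetical stand-in for an entity-recognition service:
# a fixed dictionary mapping lowercase surface forms to canonical entity names.
KNOWN_ENTITIES = {
    "oswald": "Lee Harvey Oswald",
    "cia": "CIA",
    "fbi": "FBI",
    "nosenko": "Yuri Nosenko",
    "cuba": "Cuba",
}

def extract_entities(text):
    """Return the canonical entities whose surface form appears as a word in text."""
    words = {w.strip(".,;:()\"'").lower() for w in text.split()}
    return {name for form, name in KNOWN_ENTITIES.items() if form in words}

def build_tag_index(docs):
    """Map each entity tag to the sorted list of document IDs annotated with it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for entity in extract_entities(text):
            index[entity].add(doc_id)
    return {tag: sorted(ids) for tag, ids in index.items()}

def facet_counts(index):
    """Count documents per tag, most frequent first, ties broken alphabetically."""
    return sorted(((tag, len(ids)) for tag, ids in index.items()),
                  key=lambda pair: (-pair[1], pair[0]))

# Made-up sample "OCR output" for three documents.
docs = {
    "doc1": "CIA memo on Oswald travel to Mexico City.",
    "doc2": "FBI report: Oswald contacts in Cuba.",
    "doc3": "CIA interrogation of Nosenko.",
}

index = build_tag_index(docs)
print(facet_counts(index))
```

A production pipeline would replace `extract_entities` with calls to an entity-recognition service and keep the index in a search engine rather than a Python dictionary.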
We were also able to create a visual map of these linked entities to show the relationships between the different tags and data. Figure 2 below shows the visualization produced when we searched this index for “Oswald”.
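The post does not say how the demo decides which tags to connect. One simple approach, assumed here purely for illustration, is co-occurrence: two tags are linked whenever they annotate the same document, and each link is weighted by the number of documents they share. The sketch below takes hypothetical pre-tagged documents and returns the tags linked to a search term, roughly the spokes a graph like the one in Figure 2 would draw around “Oswald”.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_graph(doc_tags):
    """Weight each unordered pair of tags by how many documents contain both."""
    edges = Counter()
    for tags in doc_tags.values():
        # Sorting makes every pair appear in a single canonical order.
        for a, b in combinations(sorted(tags), 2):
            edges[(a, b)] += 1
    return edges

def neighbors(edges, term):
    """Return {tag: weight} for every tag linked to the search term."""
    result = {}
    for (a, b), weight in edges.items():
        if a == term:
            result[b] = weight
        elif b == term:
            result[a] = weight
    return result

# Hypothetical documents that have already been tagged with entities.
doc_tags = {
    "doc1": {"Oswald", "CIA", "Mexico City"},
    "doc2": {"Oswald", "FBI", "Cuba"},
    "doc3": {"Oswald", "CIA"},
    "doc4": {"Nosenko", "CIA", "KGB"},
}

edges = cooccurrence_graph(doc_tags)
print(neighbors(edges, "Oswald"))
```

Co-occurrence only says that two names appear together; it does not say why, which is why the entity links back to sources such as Wikipedia still matter.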
Figure 2 – Visualization of the entity-linked mapping of tags for the search term “Oswald”
Through further investigation and linking, we were even able to see that the entity linking Cognitive Service had annotated the term “Nosenko” with a connection to Wikipedia. We quickly realized that the Nosenko identified in the documents was actually Yuri Nosenko, a KGB defector interrogated by the CIA, and that the related records are audio tapes of the actual interrogation. It would have taken years to figure out these connections, but we were instead able to do this in minutes thanks to the power of Azure Search and Cognitive Services.
Another fun fact we learned is that the government was actually using SQL Server and a secured architecture to manage these documents in 1997, as seen in the architecture diagram in Figure 3 below.
Figure 3 – Architecture diagram from 1997 indicating SQL Server was used to manage these documents
We have created an architecture diagram of our own to show how this new AI-powered approach orchestrates the data and pulls insights from it (see Figure 4 below).
Figure 4 – Updated architecture of Azure Search and Cognitive Services
This is the updated architecture we used to apply the latest and greatest Azure-powered developer tools to create these insightful web apps. Figure 4 displays this architecture using the same style from 54 years ago.
We’ll be making this code available soon, along with tutorials on how we built the solution. Stay tuned for more updates and links on this blog.
Meanwhile, you can navigate through the online version of our application* and draw your own insights!
Corom
* To get started, try typing a keyword such as “Oswald” into the Search bar at the top of the demo site.
“Cognitive search” is Microsoft’s concept and tool: use computing power to capture and analyze massive amounts of data, such as that found in the latest JFK files.
But the example these engineers give is not exactly inspiring.
“We just ran all the content through, and right away you could see that the CIA, the FBI, and even Cuba were involved in all of this.”
This statement is certainly true: the CIA, the FBI, and Cuba all figure in the assassination story. But we don’t need artificial intelligence to reach that conclusion. Anybody who has read the Warren Commission, Church Committee, or HSCA final reports knows this.
Sociable notes that it was “purely coincidental” that Microsoft brought this new service to market at the very same time the last of the JFK files were in the news. In April, President Trump released thousands of JFK files while keeping redactions in at least 15,834 other JFK documents.
The question is whether AI and machine learning can bring us to new conclusions about the JFK case. I don’t dismiss the possibility. I just don’t think we yet have the right questions for the cognitive search technology.
So I want to put this to the technologists and programmers out there: Can AI give us new, empirical, verifiable insights about the JFK story? If so, how?
Those who do not see value in this must, for some odd reason, have missed things in this video.
The software can read and recognise names and can therefore group related documents together, but it can also see patterns in stamps, scribbles, and signatures. What researcher going through this trove of documents would not want that kind of technology to assist him? Think of the man-hours saved.
The spider web part shows how these names relate to one another. I love it.