Sunday, July 15, 2018

Microsoft JFK Experiment

Last November, when Microsoft announced that they were using the recently released JFK assassination records as a test sample for their new Artificial Intelligence (AI) cognitive search software, they said they would make it available to the public and to serious researchers.

They also noticed some things right away: the Cuban connection was key. I noticed that too when I first started reading the 24-volume Warren Commission testimony and documents in college.

A few weeks later, Microsoft also announced that the test system was meant to be an example for potential customers and not a system that others could use.

As some other, more computer-savvy researchers have commented, what is needed is a complete scan and digitization of the entire JFK Collection, not just the small percentage of records that were withheld and recently released over the internet.

More to come on this. 

BK



Using Azure and AI to Explore the JFK Files

November 15, 2017, by ML Blog Team

This post is by Corom Thompson, Principal Software Engineer at Microsoft.

On November 22nd, 1963, the President of the United States, John F. Kennedy, was assassinated. He was shot by a lone gunman named Lee Harvey Oswald while riding in his motorcade through the streets of Dallas. The assassination has been the subject of so much controversy that, 25 years ago, an act of Congress mandated that all documents related to the assassination be released this year. The first batch of released files has more than 6,000 documents totaling 34,000 pages, and the last drop of files contains at least twice as many documents.

We’re all curious to know what’s inside them, but it would take decades to read through them all. We approached the problem of gaining insights by using Azure Search and Cognitive Services to extract knowledge from this deluge of documents, using a continuous process that ingests the raw documents and enriches them into structured information that lets you explore the underlying data.
Today, at the Microsoft Connect(); 2017 event, we created the demo web site* shown in Figure 1 below. It is a web application that uses the AzSearch.js library and is designed to give you interesting insights into this vast trove of information.

Figure 1 – JFK Files web application for exploring the released files

On the left you can see that the documents are broken down by the entities that were extracted from them. Already we know these documents are related to JFK, the CIA, and the FBI. Leveraging several Cognitive Services, including optical character recognition (OCR), Computer Vision, and custom entity linking, we were able to annotate all the documents to create a searchable tag index.
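The "searchable tag index" can be pictured as an inverted index that maps each extracted entity to the documents that mention it. Here is a minimal sketch in Python. It assumes that OCR and entity linking have already produced a set of tags for each document. The document IDs and tags are hypothetical, and none of this is Microsoft's code or the Azure Search API.

```python
from collections import defaultdict


def build_tag_index(doc_tags):
    """Map each entity tag to the sorted list of document IDs tagged with it."""
    index = defaultdict(set)
    for doc_id, tags in doc_tags.items():
        for tag in tags:
            index[tag].add(doc_id)
    return {tag: sorted(ids) for tag, ids in index.items()}


def search(index, tag):
    """Return the IDs of documents tagged with `tag`, or an empty list if none are."""
    return index.get(tag, [])


# Hypothetical per-document tags, standing in for the output of OCR + entity linking.
docs = {
    "doc-001": {"Oswald", "CIA", "Mexico City"},
    "doc-002": {"Nosenko", "KGB", "CIA"},
    "doc-003": {"Oswald", "FBI"},
}

index = build_tag_index(docs)
print(search(index, "Oswald"))  # ['doc-001', 'doc-003']
print(search(index, "CIA"))     # ['doc-001', 'doc-002']
```

A lookup by tag then returns every document carrying that entity. This is what lets the left-hand panel break the collection down by entity.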

We were also able to create a visual map of these linked entities to demonstrate the relationships between the different tags and data. Below, in Figure 2, is the visualization of what happened when we searched this index for “Oswald”.
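One simple way to build such a map is to link two tags whenever they appear in the same document and to weight each link by the number of documents they share. This is offered as an assumption, not a description of how Microsoft built its visualization. The sketch below does that in Python and then lists the tags most strongly linked to a search term. As before, the documents and tags are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(doc_tags):
    """Count, for each unordered pair of tags, how many documents contain both."""
    edges = Counter()
    for tags in doc_tags.values():
        # Sorting gives each pair a canonical (a, b) order, so (a, b) and (b, a) merge.
        for a, b in combinations(sorted(tags), 2):
            edges[(a, b)] += 1
    return edges


def neighbors(edges, term):
    """Tags linked to `term`, strongest link first (ties broken alphabetically)."""
    linked = Counter()
    for (a, b), weight in edges.items():
        if a == term:
            linked[b] += weight
        elif b == term:
            linked[a] += weight
    return sorted(linked.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical tagged documents.
docs = {
    "doc-001": {"Oswald", "CIA", "Mexico City"},
    "doc-002": {"Nosenko", "KGB", "CIA", "Oswald"},
    "doc-003": {"Oswald", "FBI"},
}

edges = cooccurrence_edges(docs)
print(neighbors(edges, "Oswald"))
# [('CIA', 2), ('FBI', 1), ('KGB', 1), ('Mexico City', 1), ('Nosenko', 1)]
```

Each (tag, weight) pair corresponds to one strand of the "spider web" around the search term. A heavier weight means the two names appear together in more documents.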

Figure 2 – Visualization of the entity linked mapping of tags for the search term “Oswald”

Through further investigation and linking, we were even able to see that the entity linking Cognitive Service had annotated this term with a connection to Wikipedia. We quickly realized that the Nosenko identified in the documents was actually a KGB defector interrogated by the CIA, and that these are audio tapes of the actual interrogation. It would have taken years to figure out these connections, but we were able to do it in minutes thanks to the power of Azure Search and Cognitive Services.

Another fun fact we learned is that the government was actually using SQL Server and a secured architecture to manage these documents in 1997, as seen in the architecture diagram in Figure 3 below.

Figure 3 – Architecture diagram from 1997 indicating SQL Server was used to manage these documents

We have created an architecture diagram of our own to demonstrate how this new AI-powered approach orchestrates the data and pulls insights from it, as shown in Figure 4 below.

This is the updated architecture we used to apply the latest and greatest Azure-powered developer tools to create these insightful web apps. Figure 4 displays this architecture using the same style from 54 years ago.

Figure 4 – Updated architecture of Azure Search and Cognitive Services

We’ll be making this code available soon, along with tutorials on how we built the solution. Stay tuned for more updates and links on this blog.

Update to original blog post:
The code is now available on GitHub.

Meanwhile, you can navigate through the online version of our application* and draw your own insights!

Corom

* To get started, try typing a keyword into the Search bar at the top of the demo site, e.g. “Oswald”.


“Cognitive search” is Microsoft’s concept and tool: using computing power to capture and analyze massive amounts of data, in this case the data found in the latest JFK files.

But the example these engineers give is not exactly inspiring.

We just ran all the content through, and right away you could see that the CIA, the FBI, and even Cuba were involved in all of this.

This statement is certainly true: the CIA, the FBI and Cuba all figure in the assassination story. But we don’t need artificial intelligence to reach that conclusion. Anybody who has read the Warren Commission, Church Committee or HSCA final reports knows this.

Sociable notes that it was “purely coincidental” that Microsoft brought this new service to market at the very same time that the last of the JFK files were in the news. In April, President Trump released thousands of JFK files while keeping redactions in at least 15,834 other JFK documents.

The question is whether AI and machine learning can bring us to new conclusions about the JFK case. I don’t dismiss the possibility. I just don’t think we have the right questions for the cognitive search technology yet.

So I want to put this to the technologists and programmers out there: Can AI give us new, empirical, verifiable insights about the JFK story? If so, how?

Those who do not see value in this must have somehow missed things in this video.

The software can read and recognise names and can therefore group related documents together. It can also see patterns in stamps, scribbles and signatures. What researcher going through the trove of documents would not want that kind of technology to assist him? Think about the man-hours saved.

The spider web part shows how these names inter-relate to each other. I love it.


