
Machine Readable News: Hands-On

Warm-up: Reading a Single News Item

  • How many days do the news items in the dataset cover?
  • List the top 5 languages by item count.
  • List the top 4 topics by item count.
  • List the top 5 sources by item count.
  • What is the proportion of alerts to articles in the dataset?
  • What is the maximum number of alerts reported for any single story in the dataset?
  • Take any company identifier in any news item and retrieve the name of the company.
In [1]:
import pandas as pd
import json
import os

data_dir = "./data/mrn/"
sample_news_item = os.path.join(data_dir, "MRN-JSON-Sample.JSON")
In [2]:
with open(sample_news_item, "r", encoding="utf8") as f:
    sample_json = json.load(f)
In [3]:
print(json.dumps(sample_json,indent=4, sort_keys=False))
{
    "guid": "20170102-001056000-nL4N1ES007-1-2",
    "timestamps": [
        {
            "source": "EMEA",
            "name": "recorded",
            "timestamp": "2017-01-02T00:10:56.613Z"
        }
    ],
    "data": {
        "body": " (Repeats story issued on Sunday)\n    BEIJING, Jan 1 (Reuters) - China's manufacturing sector\nexpanded for a fifth month in December, but growth slowed a\ntouch more than expected in a sign that government measures to\nrein in soaring asset prices are starting to have a knock-on\neffect on the broader economy. \n    The official Purchasing Managers' Index (PMI) stood at 51.4\nin December compared with 51.7 in November. A reading above 50\nindicates an expansion on a monthly basis while one below 50\nsuggests a contraction.\n    December's reading was slightly below the forecast in a\nReuters poll for 51.5.\n    A housing boom in the second half of 2016 and a government\nspending spree on infrastructure have helped boost prices for\ncommodities from cement to steel, giving the manufacturing\nsector a much-needed lift. \n    But the government is cracking down on speculative property\nbuying, and signals from policymakers that more will be done to\ncontain asset bubbles and rising debt - even at the expense of\nslower growth - means extra stimulus measures could be limited.\n    \"Today's PMI figures suggest that the change of policy tone\nhas taken its toll, as the authorities are seriously concerned\nabout the asset bubbles,\" said Zhou Hao, senior economist at\nCommerzbank. \n    Factory output slowed in December, with the sub-index\nhitting 53.3 compared with 53.9 the previous month.\n    Total new orders were flat at 53.2, logging the same as in\nNovember, while new export orders fell to 50.1 from 50.3.\n    Jobs were again lost, with the employment sub-index sitting\nat 48.9, compared to 49.2 in November, as the country pledged to\ncut excess capacity over a range of industries. \n    A sub-index for smaller firms fell, and performance for\nlarger companies also worsened.   \n    The Markit/Caixin PMI, a private gauge of manufacturing\nactivity which focuses more on small- and mid-sized firms, is\ndue on Jan. 3.\n    Analysts in a Reuters poll expect it to fall to 50.7 from\nthe previous month's reading of 50.9.\n    A separate reading on the services sector showed the pace of\ngrowth slowed in December, Sunday's data showed.\n    The official non-manufacturing Purchasing Managers' Index\n(PMI) stood at 54.5 in December, down from 54.7 in November, \nbut well above the 50-point mark.\n    China is counting on growth in services - which account for\nmore than half of gross domestic product - to offset persistent\nsoftness in exports that is dragging on the economy. Private\ninvestment has also remained stubbornly weak. \n    But GDP still looks set to hit Beijing's 2016 growth target\nof 6.5 to 7 percent, after expanding 6.7 percent for each of the\nfirst three quarters.\n\n (Reporting by Ben Blanchard, Elias Glenn and Ryan Woo; Editing\nby Richard Pullin)\n ((ben.blanchard@thomsonreuters.com; +86 10 6627 1201; Reuters\nMessaging: ben.blanchard.thomsonreuters.com@reuters.net))\n\nKeywords: CHINA ECONOMY/PMI FACTORY OFFICIAL (REPEAT, UPDATE",
        "mimeType": "text/plain",
        "firstCreated": "2017-01-02T00:10:56.000Z",
        "language": "en",
        "altId": "nL4N1ES007",
        "headline": "RPT-UPDATE 2-Growth in China's factories, services slows in December - official PMI",
        "takeSequence": 1,
        "pubStatus": "stat:usable",
        "subjects": [
            "N2:MCE",
            "N2:ECON",
            "N2:ECI",
            "N2:EMRG",
            "N2:CN",
            "N2:ASIA",
            "N2:PMI",
            "N2:NEWS1",
            "N2:ITSE",
            "N2:ISER",
            "N2:CMSS",
            "N2:BSUP",
            "N2:TCOM",
            "N2:BSVC",
            "N2:ENDOCR",
            "N2:IPR",
            "N2:FINS",
            "N2:BISV",
            "N2:TECH",
            "N2:SWIT",
            "N2:TMT",
            "N2:INDS",
            "N2:GEN",
            "N2:HEA",
            "N2:LEN",
            "N2:RTRS",
            "R:CNPMIB=ECI"
        ],
        "audiences": [
            "NP:C",
            "NP:D",
            "NP:E",
            "NP:M",
            "NP:O",
            "NP:T",
            "NP:U",
            "NP:OIL",
            "NP:NAT",
            "NP:UKI",
            "NP:GRO",
            "NP:MTL",
            "NP:SOF",
            "NP:Z",
            "NP:DNP"
        ],
        "versionCreated": "2017-01-02T00:10:56.000Z",
        "provider": "NS:RTRS",
        "instancesOf": [],
        "id": "20170102-001056000-nL4N1ES007-1-2",
        "urgency": 3
    }
}
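The exercises below rely on only a handful of these fields. The following is a minimal sketch that pulls them out of one item, assuming every item has the same shape as this sample (a top-level "data" dict). The helper name `summarize_item` is hypothetical, and two readings are assumptions rather than facts stated in this notebook: that `R:` subject codes are RICs, and that a headline-only item (empty body) is an alert.

```python
def summarize_item(item):
    """Extract the fields used by the exercises from one MRN news item (a dict)."""
    data = item.get("data", {})
    subjects = data.get("subjects", [])
    return {
        "headline": data.get("headline"),
        "language": data.get("language"),
        "provider": data.get("provider"),
        "story_id": data.get("altId"),
        "created": data.get("versionCreated"),
        # Topic codes carry the "N2:" prefix, e.g. "N2:ECON".
        "topics": [s for s in subjects if s.startswith("N2:")],
        # Assumption: "R:" codes are RICs, e.g. "R:CNPMIB=ECI" -> "CNPMIB=ECI".
        "rics": [s[2:] for s in subjects if s.startswith("R:")],
        # Assumption: a headline-only item (empty body) is an alert.
        "has_body": bool((data.get("body") or "").strip()),
    }
```

Calling `summarize_item(sample_json)` on the item loaded above returns its headline, language, provider, story ID, N2 topic codes and RICs in one dict.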

Write sample code to answer the following questions:

Use the Sample_News_File.JSON file from the data directory.

  1. How many days do the news items in the dataset cover?
  2. List the top 5 languages by item count.
  3. List the top 4 topics by item count (topic codes carry the prefix N2).
  4. List the top 5 sources by item count.
  5. What is the proportion of alerts to articles in the dataset?
  6. What is the maximum number of alerts reported for any single story in the dataset?
  7. Take any company mentioned in any news item and retrieve its country of incorporation.
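All of these questions start from the list of news items in the file. The exact layout of Sample_News_File.JSON is not shown in this notebook, so the loader below is a sketch under an assumption: it accepts a bare JSON list of items, a dict wrapping them under an "Items" key, or a single item shaped like MRN-JSON-Sample.JSON. The function names are hypothetical; `pd.json_normalize` flattens nested keys into columns such as "data.language".

```python
import json

import pandas as pd


def items_from_json(obj):
    """Return the list of news items from a parsed MRN JSON document.

    Assumption: the file is either a list of items, a dict with an "Items"
    list, or one item shaped like the single-item sample.
    """
    if isinstance(obj, list):
        return obj
    if isinstance(obj, dict) and "Items" in obj:
        return obj["Items"]
    return [obj]


def load_news_frame(path):
    """Load a news file into a flat DataFrame (columns like 'data.language')."""
    with open(path, "r", encoding="utf8") as f:
        items = items_from_json(json.load(f))
    return pd.json_normalize(items)
```

If the file turns out to use a different wrapper key, only `items_from_json` needs to change.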

Q1. How many days do the news items in the dataset cover?

In [ ]:
 
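"Days covered" can mean either the number of distinct calendar days with news or the span from the first to the last item, so this sketch reports both. It assumes each item carries an ISO-8601 `data.versionCreated` timestamp like the sample's "2017-01-02T00:10:56.000Z" and counts days in UTC; `days_covered` is a hypothetical name.

```python
import pandas as pd


def days_covered(items):
    """Return (distinct UTC days, first timestamp, last timestamp) for the items."""
    created = pd.to_datetime(
        [item["data"]["versionCreated"] for item in items], utc=True
    )
    # normalize() truncates each timestamp to midnight, so equal values share a day.
    distinct_days = created.normalize().nunique()
    return distinct_days, created.min(), created.max()
```

With `items` read from Sample_News_File.JSON, `(last - first).days` gives the span in whole days alongside the distinct-day count.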

Q2. List the top 5 languages by item count

In [ ]:
 
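A minimal sketch, assuming the language of each item is the `data.language` field seen in the sample ("en"). `Series.value_counts` already sorts counts in descending order, so taking the head gives the top entries; `top_languages` is a hypothetical name.

```python
import pandas as pd


def top_languages(items, n=5):
    """Return the n most frequent data.language values with their item counts."""
    languages = pd.Series(
        [item["data"].get("language") for item in items], dtype=object
    )
    # value_counts drops missing languages and sorts by count, highest first.
    return languages.value_counts().head(n)
```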

Q3. List the top 4 topics by item count

In [ ]:
 
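Topics live in the `data.subjects` list alongside other codes (the sample mixes "N2:ECON" with "R:CNPMIB=ECI"), so the sketch below keeps only subjects with the "N2:" prefix before counting. The function name `top_topics` is hypothetical.

```python
import pandas as pd


def top_topics(items, n=4, prefix="N2:"):
    """Count subject codes starting with `prefix` across items; return the n most common."""
    codes = pd.Series(
        [
            subject
            for item in items
            for subject in item["data"].get("subjects", [])
            if subject.startswith(prefix)
        ],
        dtype=object,
    )
    return codes.value_counts().head(n)
```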

Q4. List the top 5 sources by item count

In [ ]:
 
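This sketch assumes "source" means the news provider in `data.provider` (the sample has "NS:RTRS"), not the "source" entry inside `timestamps`. It strips the "NS:" namespace for readability and skips items without a provider; `top_sources` is a hypothetical name.

```python
import pandas as pd


def top_sources(items, n=5):
    """Return the n most frequent providers, e.g. "NS:RTRS" reported as "RTRS"."""
    codes = []
    for item in items:
        provider = item["data"].get("provider")
        if not provider:
            continue  # skip items with no provider field
        # Drop the "NS:" namespace so "NS:RTRS" is counted as "RTRS".
        codes.append(provider[3:] if provider.startswith("NS:") else provider)
    return pd.Series(codes, dtype=object).value_counts().head(n)
```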

Q5. What is the proportion of alerts to articles in the dataset?

In [ ]:
 
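The notebook does not define how an alert is marked, so this sketch assumes an alert is a headline-only item (empty or missing `data.body`) and every other item is an article. If the dataset flags alerts another way, only the `is_alert` test needs to change. Both function names are hypothetical.

```python
def is_alert(item):
    """Assumption: an item with an empty or missing body is a headline-only alert."""
    return not (item["data"].get("body") or "").strip()


def alert_article_ratio(items):
    """Return (alerts, articles, alerts / articles); the ratio is NaN with no articles."""
    alerts = sum(1 for item in items if is_alert(item))
    articles = len(items) - alerts
    ratio = alerts / articles if articles else float("nan")
    return alerts, articles, ratio
```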

Q6. What is the maximum number of alerts reported for any single story in the dataset?

In [ ]:
 
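A sketch under two assumptions: items belonging to the same story share `data.altId` (in the sample, "nL4N1ES007" also appears inside the guid), and an alert is an item with an empty body. It counts alerts per story with `collections.Counter` and returns the story ID together with its count; `max_alerts_per_story` is a hypothetical name.

```python
from collections import Counter


def max_alerts_per_story(items):
    """Return (story_id, alert_count) for the story with the most alerts, or (None, 0)."""
    counts = Counter(
        item["data"].get("altId")
        for item in items
        # Assumption: empty or missing body marks a headline-only alert.
        if not (item["data"].get("body") or "").strip()
    )
    if not counts:
        return None, 0
    return counts.most_common(1)[0]
```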

Q7. Take any company identifier in any news item and retrieve the name of the company

In [ ]:
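A first step is collecting candidate identifiers from `data.subjects`. This sketch assumes "R:" codes are RICs (the sample only has "R:CNPMIB=ECI", an economic indicator rather than a company) and "P:" codes are PermIDs; `company_identifiers` is a hypothetical name. Turning an identifier into a company name or country of incorporation needs a reference-data lookup (for example the PermID service or a Refinitiv data API), which is not sketched here because its endpoint and field names depend on your access.

```python
from collections import Counter


def company_identifiers(items, prefixes=("R:", "P:")):
    """Count subject codes that may identify companies, most frequent first.

    Assumption: "R:" marks a RIC and "P:" marks a PermID in MRN subjects.
    """
    counts = Counter()
    for item in items:
        for subject in item["data"].get("subjects", []):
            if subject.startswith(prefixes):  # str.startswith accepts a tuple
                counts[subject] += 1
    return counts.most_common()
```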