Golang : Grab news article text and use NLP to get each paragraph's sentences
This tutorial is a slight improvement over the previous tutorial, which uses the prose
natural language processing library to get all the sentences in each paragraph. Instead of feeding in the text as a constant, this tutorial uses the goquery
package to extract the paragraphs from a news article. The example below is configured to read HTML from the dailyfx.com website and will not work on other websites until its CSS selector is adjusted for them.
Here you go!
package main

import (
	"fmt"
	"log"
	"os"
	"strings"

	"github.com/PuerkitoBio/goquery"
	"gopkg.in/jdkato/prose.v2"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Printf("Usage : %s url\n", os.Args[0])
		os.Exit(1)
	}

	url := os.Args[1]

	// perform a simple sanity check
	if url == "" {
		fmt.Println("URL address cannot be empty!")
		os.Exit(1)
	}

	fmt.Println("Grabbing text from : ", url)
	allTextData := GetArticleText(url)
	fmt.Println("All : ", allTextData)
	fmt.Println("-----------------------------------------------------")

	// GetArticleText separates paragraphs with newlines
	paragraphs := strings.Split(allTextData, "\n")

	//fmt.Println("Paragraph 0 : ", paragraphs[0])
	//fmt.Println("Paragraph 1 : ", paragraphs[1])
	//fmt.Println("Paragraph 2 : ", paragraphs[2])

	// Create a new document for each paragraph
	for k, v := range paragraphs {
		fmt.Println("Processing paragraph ", k, " : ")
		doc, err := prose.NewDocument(v)
		if err != nil {
			log.Fatal(err)
		}

		// Iterate over the doc's sentences:
		for _, sent := range doc.Sentences() {
			fmt.Println("[" + sent.Text + "]")
		}
	}
}

// GetArticleText downloads the page at url and returns the text of every
// matching paragraph, each one preceded by a newline.
func GetArticleText(url string) string {
	// Note: goquery.NewDocument is marked deprecated in newer goquery
	// releases in favour of net/http plus goquery.NewDocumentFromReader.
	doc, err := goquery.NewDocument(url)
	if err != nil {
		panic(err)
	}

	allText := ""

	// Modify this selector for other websites.
	doc.Find(".story_paragraph p").Each(func(i int, s *goquery.Selection) {
		// For each item found, get the paragraph
		paragraph := s.Text()
		//fmt.Println("Paragraph >>>> ", paragraph)
		allText = allText + "\n" + paragraph
	})

	return allText
}
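Because the selector in GetArticleText is tied to dailyfx.com, one way to support more sites is to look the selector up by host name instead of hard-coding it. The following is only a minimal sketch under stated assumptions: the `selectors` map and the `selectorFor` helper are hypothetical names that are not part of the tutorial, and the `news.example.com` entry is a placeholder rather than a tested selector. It uses only the standard library, so it runs without goquery.

```go
package main

import (
	"fmt"
	"net/url"
)

// selectors maps a host name to the CSS selector that wraps its article
// paragraphs. Only the dailyfx.com entry comes from the tutorial; the
// second entry is a hypothetical placeholder.
var selectors = map[string]string{
	"www.dailyfx.com":  ".story_paragraph p",
	"news.example.com": "article p", // hypothetical, untested
}

// selectorFor returns the selector registered for the host of rawURL,
// or an error when the URL is malformed or the host is not configured.
func selectorFor(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	sel, ok := selectors[u.Hostname()]
	if !ok {
		return "", fmt.Errorf("no selector configured for host %q", u.Hostname())
	}
	return sel, nil
}

func main() {
	sel, err := selectorFor("https://www.dailyfx.com/forex/some-article.html")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sel) // prints ".story_paragraph p"
}
```

The returned string could then be passed to doc.Find in place of the literal selector. The sample output that follows comes from the tutorial's original program, not from this sketch.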
Sample output:
$ ./grabdailyfxtext https://www.dailyfx.com/forex/fundamental/dailybriefing/sessionbriefing/euro_open/2019/06/25/EURUSD-Uptrend-May-be-Accelerated-by-Powell-Commentary-US-Data.html
Grabbing text from : https://www.dailyfx.com/forex/fundamental/dailybriefing/sessionbriefing/euro_open/2019/06/25/EURUSD-Uptrend-May-be-Accelerated-by-Powell-Commentary-US-Data.html
All :
See our free guide to learn how to use economic news in your trading strategy! .....Twitter
Processing paragraph 0 :
Processing paragraph 1 :
[See our free guide to learn how to use economic news in your trading strategy!]
Processing paragraph 2 :
[During the Asia Pacific trading hours, the Japanese Yen and the New Zealand Dollar were outperforming their peers.]
[Rising US-Iran tensions induced risk aversion and sent JPY higher, though the cycle sensitive NZD was insulated from the sour market sentiment.]
[Better-than-expected trade data appeared to have been the culprit behind the Kiwi’s resilience, though it may soon fizzle out and succumb to the fate of its peers.]
[]
Processing paragraph 3 :
[EURUSD’s uptrend may be accelerated if US consumer confidence comes in below the 131.0 estimate and Fed Chairman Jerome Powell’s economic outlook bolsters market expectations of rate cuts.]
[Overnight index swaps are pricing in a 100 percent probability of a cut from the July meeting through year-end.]
[However, rhetoric from the central bank has not indicated that policymakers are feeling dovish to that degree.]
[]
Processing paragraph 4 :
[However, hawkish members of the Fed are finding it increasingly difficult to justify their position in light of US growth.]
[Since February, economic activity out of the US has been broadly underperforming relative to economists’ expectations – signaling that analysts are over estimating the economy’s strength.]
[Inflationary pressure has also been waning alongside a deterioration in global trade due to the US-China trade war.]
[]
Processing paragraph 5 :
[Interested in the impact of the US-China trade war on APAC equities?]
[Be sure to follow me on Twitter at @ZabelinDimitri.]
Processing paragraph 6 :
[There is the possibility that if the Chairman’s commentary is pessimistic enough, it could put a premium on liquidity and cause investors to flock to the US Dollar.]
[However, this would fall out of line with the overall trend of EURUSD’s price action in light of increasing rate cut expectations caused by waning economic growth prospects.]
[The pair’s recent rise has primarily been the result of a weaker Greenback.]
[]
Processing paragraph 7 :
[After breaking above 18-month descending resistance (red parallel channel), EURUSD has jumped almost two percent and is now approaching what appears to be a ceiling at 1.1424. Poor confidence data and dovish rhetoric from Powell could propel the pair beyond resistance, though the longer-term outlook suggests EURUSD will capitulate and slide lower as risk aversion causes investors to crowd the US Dollar.]
[]
Processing paragraph 8 :
[CHART OF THE DAY: EURUSD AIMING AT 1.1424 RESISTANCE]
Processing paragraph 9 :
[--- Written by Dimitri Zabelin, Jr Currency Analyst for DailyFX.com]
Processing paragraph 10 :
[To contact Dimitri, use the comments section below or @ZabelinDimitrion Twitter]
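Paragraph 0 in the sample output is empty because GetArticleText puts a newline in front of every paragraph, so splitting on "\n" always yields an empty first element. A small filter can drop such blanks before the text is handed to prose. This is a sketch only: `splitParagraphs` is a hypothetical helper that is not part of the original program, and it assumes that whitespace-only paragraphs carry no useful content.

```go
package main

import (
	"fmt"
	"strings"
)

// splitParagraphs splits text on newlines, trims surrounding whitespace
// from each piece, and keeps only the non-empty paragraphs.
func splitParagraphs(text string) []string {
	var out []string
	for _, p := range strings.Split(text, "\n") {
		p = strings.TrimSpace(p)
		if p != "" {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	// Same shape as GetArticleText's return value: a leading newline
	// plus a blank line between paragraphs.
	text := "\nFirst paragraph. \n\nSecond paragraph.\n"
	for i, p := range splitParagraphs(text) {
		fmt.Printf("%d: %q\n", i, p)
	}
}
```

In the tutorial's main function, this helper could stand in for the strings.Split call so that the loop only processes paragraphs that contain text.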
See also : Golang : Get FX sentiment from website example
By Adam Ng(黃武俊)
If you gained some knowledge or the information here solved your programming problem, please consider donating to the less fortunate or to a charity that you like. Apart from donating, planting trees, volunteering or reducing your carbon footprint would be great too.