GitHub - jdkato/prose: :book: A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.

prose

prose is a natural language processing library (English only, at the moment) in pure Go. It supports tokenization, segmentation, part-of-speech tagging, and named-entity extraction.

A more detailed summary of the library's performance is available in "Introducing prose v2.0.0: Bringing NLP to Go".

Installation

$ go get github.com/jdkato/prose/v2

Usage

Overview

package main

import (
    "fmt"
    "log"

    "github.com/jdkato/prose/v2"
)

func main() {
    // Create a new document with the default configuration:
    doc, err := prose.NewDocument("Go is an open-source programming language created at Google.")
    if err != nil {
        log.Fatal(err)
    }

    // Iterate over the doc's tokens:
    for _, tok := range doc.Tokens() {
        fmt.Println(tok.Text, tok.Tag, tok.Label)
        // Go NNP B-GPE
        // is VBZ O
        // an DT O
        // ...
    }

    // Iterate over the doc's named-entities:
    for _, ent := range doc.Entities() {
        fmt.Println(ent.Text, ent.Label)
        // Go GPE
        // Google GPE
    }

    // Iterate over the doc's sentences:
    for _, sent := range doc.Sentences() {
        fmt.Println(sent.Text)
        // Go is an open-source programming language created at Google.
    }
}

The document-creation process adheres to the following sequence of steps:

tokenization -> POS tagging -> NE extraction
            \
             segmentation

Each step may be disabled (assuming later steps aren't required) by passing the appropriate functional option. To disable named-entity extraction, for example, you'd do the following:

doc, err := prose.NewDocument(
        "Go is an open-source programming language created at Google.",
        prose.WithExtraction(false))
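
The step dependencies described above can be sketched with the functional-option pattern using only the standard library. This is a minimal illustration, not prose's implementation: the settings type, the lowercase option helpers, and the plan function are hypothetical names, and the rule that disabling a step also skips the steps that depend on it is an assumption drawn from the diagram.

```go
package main

import "fmt"

// settings records which steps are enabled. Every name in this sketch is
// hypothetical; none of it is taken from prose's internals.
type settings struct {
    tokenize, segment, tag, extract bool
}

// option mutates settings, in the style of a functional option.
type option func(*settings)

func withTokenization(on bool) option { return func(s *settings) { s.tokenize = on } }
func withSegmentation(on bool) option { return func(s *settings) { s.segment = on } }
func withTagging(on bool) option      { return func(s *settings) { s.tag = on } }
func withExtraction(on bool) option   { return func(s *settings) { s.extract = on } }

// plan returns the steps that would run, in order. Tagging needs tokens and
// extraction needs tags, so a disabled step also skips what depends on it
// (an assumption of this sketch). Segmentation is treated as independent.
func plan(opts ...option) []string {
    s := settings{tokenize: true, segment: true, tag: true, extract: true}
    for _, opt := range opts {
        opt(&s)
    }
    var steps []string
    if s.tokenize {
        steps = append(steps, "tokenization")
        if s.tag {
            steps = append(steps, "POS tagging")
            if s.extract {
                steps = append(steps, "NE extraction")
            }
        }
    }
    if s.segment {
        steps = append(steps, "segmentation")
    }
    return steps
}

func main() {
    fmt.Println(plan())                      // every step enabled
    fmt.Println(plan(withExtraction(false))) // skip only NE extraction
    fmt.Println(plan(withTagging(false)))    // NE extraction is skipped as well
}
```

Options are applied in order over a default configuration, so callers only name the steps they want to change.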

Tokenizing

prose includes a tokenizer capable of processing modern text, including the non-word character spans shown below.

| Type            | Example                         |
|-----------------|---------------------------------|
| Email addresses | [email protected]               |
| Hashtags        | #trending                       |
| Mentions        | @jdkato                         |
| URLs            | https://github.com/jdkato/prose |
| Emoticons       | :-), >:(, o_0, etc.             |

package main

import (
    "fmt"
    "log"

    "github.com/jdkato/prose/v2"
)

func main() {
    // Create a new document with the default configuration:
    doc, err := prose.NewDocument("@jdkato, go to http://example.com thanks :).")
    if err != nil {
        log.Fatal(err)
    }

    // Iterate over the doc's tokens:
    for _, tok := range doc.Tokens() {
        fmt.Println(tok.Text, tok.Tag)
        // @jdkato NN
        // , ,
        // go VB
        // to TO
        // http://example.com NN
        // thanks NNS
        // :) SYM
        // . .
    }
}
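
To give a rough idea of what recognizing these spans involves, here is a small standard-library sketch that classifies an already-split token with regular expressions. The patterns, the classify function, and the sample address user@example.com are illustrative assumptions. They are much simpler than a real tokenizer, do not cover emoticons, and are not the rules prose uses.

```go
package main

import (
    "fmt"
    "regexp"
)

// Deliberately simple patterns for illustration only.
var (
    urlRe     = regexp.MustCompile(`^https?://\S+$`)
    emailRe   = regexp.MustCompile(`^[^@\s]+@[^@\s]+\.[^@\s]+$`)
    hashtagRe = regexp.MustCompile(`^#\w+$`)
    mentionRe = regexp.MustCompile(`^@\w+$`)
)

// classify names the kind of non-word span a token looks like, or "word"
// when none of the patterns match.
func classify(tok string) string {
    switch {
    case urlRe.MatchString(tok):
        return "URL"
    case emailRe.MatchString(tok):
        return "email"
    case hashtagRe.MatchString(tok):
        return "hashtag"
    case mentionRe.MatchString(tok):
        return "mention"
    default:
        return "word"
    }
}

func main() {
    samples := []string{
        "@jdkato",
        "#trending",
        "https://github.com/jdkato/prose",
        "user@example.com", // hypothetical address
        "thanks",
    }
    for _, tok := range samples {
        fmt.Printf("%-32s %s\n", tok, classify(tok))
    }
}
```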

Segmenting

prose includes one of the most accurate sentence segmenters available, according to the Golden Rules created by the developers of the pragmatic_segmenter. In the table below, GRS is the share of Golden Rules a segmenter passes (52 rules for English).

| Name                | Language | License   | GRS (English)  | GRS (Other) | Speed†  |
|---------------------|----------|-----------|----------------|-------------|---------|
| Pragmatic Segmenter | Ruby     | MIT       | 98.08% (51/52) | 100.00%     | 3.84 s  |
| prose               | Go       | MIT       | 75.00% (39/52) | N/A         | 0.96 s  |
| TactfulTokenizer    | Ruby     | GNU GPLv3 | 65.38% (34/52) | 48.57%      | 46.32 s |
| OpenNLP             | Java     | APLv2     | 59.62% (31/52) | 45.71%      | 1.27 s  |
| Stanford CoreNLP    | Java     | GNU GPLv3 | 59.62% (31/52) | 31.43%      | 0.92 s  |
| Splitta             | Python   | APLv2     | 55.77% (29/52) | 37.14%      | N/A     |
| Punkt               | Python   | APLv2     | 46.15% (24/52) | 48.57%      | 1.79 s  |
| SRX English         | Ruby     | GNU GPLv3 | 30.77% (16/52) | 28.57%      | 6.19 s  |
| Scalpel             | Ruby     | GNU GPLv3 | 28.85% (15/52) | 20.00%      | 0.13 s  |

† The original tests were performed using a MacBook Pro 3.7 GHz Quad-Core Intel Xeon E5 running OS X 10.9.5, while prose was timed using a MacBook Pro 2.9 GHz Intel Core i7 running macOS 10.13.3.

package main

import (
    "fmt"
    "log"
    "strings"

    "github.com/jdkato/prose/v2"
)

func main() {
    // Create a new document with the default configuration:
    doc, err := prose.NewDocument(strings.Join([]string{
        "I can see Mt. Fuji from here.",
        "St. Michael's Church is on 5th st. near the light."}, " "))
    if err != nil {
        log.Fatal(err)
    }

    // Iterate over the doc's sentences:
    sents := doc.Sentences()
    fmt.Println(len(sents)) // 2
    for _, sent := range sents {
        fmt.Println(sent.Text)
        // I can see Mt. Fuji from here.
        // St. Michael's Church is on 5th st. near the light.
    }
}
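
The example above is hard because a period does not always end a sentence. The following standard-library sketch shows one naive way to handle that: break only after a period followed by a space and an uppercase letter, unless the word before the period is a known abbreviation. This heuristic is not prose's segmenter; the splitSentences function and the tiny abbreviation list are assumptions made for illustration.

```go
package main

import (
    "fmt"
    "strings"
    "unicode"
)

// abbreviations is a tiny, hypothetical list; a real segmenter needs far more.
var abbreviations = map[string]bool{"mt": true, "st": true, "dr": true, "mr": true}

// splitSentences breaks text after a period that is followed by a space and
// an uppercase letter, unless the word before the period is an abbreviation.
func splitSentences(text string) []string {
    runes := []rune(text)
    var sents []string
    start := 0
    for i, r := range runes {
        if r != '.' {
            continue
        }
        // Require ". X" where X is uppercase; anything else is not a break.
        if i+2 >= len(runes) || runes[i+1] != ' ' || !unicode.IsUpper(runes[i+2]) {
            continue
        }
        // Walk back to the start of the word that precedes the period.
        j := i
        for j > start && !unicode.IsSpace(runes[j-1]) {
            j--
        }
        if abbreviations[strings.ToLower(string(runes[j:i]))] {
            continue
        }
        sents = append(sents, string(runes[start:i+1]))
        start = i + 2
    }
    // Whatever remains after the last break is the final sentence.
    if rest := strings.TrimSpace(string(runes[start:])); rest != "" {
        sents = append(sents, rest)
    }
    return sents
}

func main() {
    text := "I can see Mt. Fuji from here. St. Michael's Church is on 5th st. near the light."
    for _, s := range splitSentences(text) {
        fmt.Println(s)
    }
}
```

A fixed abbreviation list breaks down quickly on real text, which is why rule sets such as the Golden Rules are useful for comparing segmenters.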

Tagging

prose includes a tagger based on TextBlob's "fast and accurate" POS tagger. The table below compares its performance with NLTK's implementation of the same tagger on the Treebank corpus:

| Library | Accuracy | 5-Run Average (sec) |
|---------|----------|---------------------|
| NLTK    | 0.893    | 7.224               |
| prose   | 0.961    | 2.538               |

(See scripts/test_model.py for more information.)
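
As a reading aid for the Accuracy column, here is a minimal sketch of token-level accuracy: the share of tokens whose predicted tag equals the reference tag. It assumes that this is how the column is defined. The accuracy function and the four-tag sample are hypothetical and are not taken from scripts/test_model.py.

```go
package main

import "fmt"

// accuracy returns the share of positions where the predicted tag equals the
// reference (gold) tag. It returns 0 for empty or mismatched inputs.
func accuracy(gold, predicted []string) float64 {
    if len(gold) == 0 || len(gold) != len(predicted) {
        return 0
    }
    correct := 0
    for i := range gold {
        if gold[i] == predicted[i] {
            correct++
        }
    }
    return float64(correct) / float64(len(gold))
}

func main() {
    // Hypothetical reference and predicted tags for a four-token sentence.
    gold := []string{"NNP", "VBZ", "DT", "NN"}
    pred := []string{"NNP", "VBZ", "DT", "JJ"}
    fmt.Printf("%.3f\n", accuracy(gold, pred)) // 0.750 (3 of 4 tags match)
}
```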

The full list of supported POS tags is given below.

TAG DESCRIPTION
( left round bracket
) right round bracket
, comma
: colon
. period
'' closing quotation mark
`` opening quotation mark
# number sign
$ currency
CC conjunction, coordinating
CD cardinal number
DT determiner
EX existential there
FW foreign word
IN conjunction, subordinating or preposition
JJ adjective
JJR adjective, comparative
JJS adjective, superlative
LS list item marker
MD verb, modal auxiliary
NN noun, singular or mass
NNP noun, proper singular
NNPS noun, proper plural
NNS noun, plural
PDT predeterminer
POS possessive ending
PRP pronoun, personal
PRP$ pronoun, possessive
RB adverb
RBR adverb, comparative
RBS adverb, superlative
RP adverb, particle
SYM symbol
TO infinitival to
UH interjection
VB verb, base form
VBD verb, past tense
VBG verb, gerund or present participle
VBN verb, past participle
VBP verb, non-3rd person singular present
VBZ verb, 3rd person singular present
WDT wh-determiner
WP wh-pronoun, personal
WP$ wh-pronoun, possessive
WRB wh-adverb
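
Because related tags share a prefix (NN, NNP, NNPS, and NNS are all nouns, for example), it is often convenient to collapse them into coarse classes. The sketch below does this with the standard library only; the token struct is a hypothetical stand-in with Text and Tag fields, and the coarse function and its class names are assumptions, not part of prose.

```go
package main

import (
    "fmt"
    "strings"
)

// token is a hypothetical stand-in carrying just a token's text and POS tag.
type token struct {
    Text string
    Tag  string
}

// coarse maps a tag from the list above to a broad word class by prefix.
func coarse(tag string) string {
    switch {
    case strings.HasPrefix(tag, "NN"):
        return "noun"
    case strings.HasPrefix(tag, "VB"):
        return "verb"
    case strings.HasPrefix(tag, "JJ"):
        return "adjective"
    case strings.HasPrefix(tag, "RB"):
        return "adverb"
    default:
        return "other"
    }
}

func main() {
    // Tags copied from the Overview example output.
    toks := []token{
        {Text: "Go", Tag: "NNP"},
        {Text: "is", Tag: "VBZ"},
        {Text: "an", Tag: "DT"},
    }
    for _, t := range toks {
        fmt.Println(t.Text, t.Tag, coarse(t.Tag))
    }
}
```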

NER

prose v2.0.0 includes a much-improved version of v1.0.0's chunk package, which can identify people (PERSON) and geographical/political entities (GPE) by default.

package main

import (
    "fmt"
    "log"

    "github.com/jdkato/prose/v2"
)

func main() {
    doc, err := prose.NewDocument("Lebron James plays basketball in Los Angeles.")
    if err != nil {
        log.Fatal(err)
    }

    // Iterate over the doc's named-entities:
    for _, ent := range doc.Entities() {
        fmt.Println(ent.Text, ent.Label)
        // Lebron James PERSON
        // Los Angeles GPE
    }
}

However, to make this feature more useful, we've made it straightforward to train your own models for specific use cases. See "Prodigy + prose: Radically efficient machine teaching in Go" for a tutorial.
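
The token labels in the Overview output (B-GPE, O) suggest a begin/inside/outside scheme, where multi-token entities are assembled from consecutive labeled tokens. The sketch below shows how that grouping could work using only the standard library. The token and entity structs, the mergeEntities function, and the I- prefix for continuation tokens are assumptions, and the labels are hand-written rather than produced by prose.

```go
package main

import (
    "fmt"
    "strings"
)

// token and entity are hypothetical stand-ins carrying only text and a label.
type token struct{ Text, Label string }
type entity struct{ Text, Label string }

// mergeEntities joins a B- token with the I- tokens of the same type that
// directly follow it. Tokens labeled O, or anything else, end the open span.
func mergeEntities(toks []token) []entity {
    var ents []entity
    var words []string
    label := ""
    // flush stores the span collected so far, if any, and resets the state.
    flush := func() {
        if len(words) > 0 {
            ents = append(ents, entity{Text: strings.Join(words, " "), Label: label})
        }
        words, label = nil, ""
    }
    for _, t := range toks {
        switch {
        case strings.HasPrefix(t.Label, "B-"):
            flush()
            words, label = []string{t.Text}, strings.TrimPrefix(t.Label, "B-")
        case strings.HasPrefix(t.Label, "I-") && len(words) > 0 &&
            strings.TrimPrefix(t.Label, "I-") == label:
            words = append(words, t.Text)
        default:
            flush()
        }
    }
    flush()
    return ents
}

func main() {
    // Hand-labeled tokens for illustration.
    toks := []token{
        {"Lebron", "B-PERSON"}, {"James", "I-PERSON"}, {"plays", "O"},
        {"basketball", "O"}, {"in", "O"}, {"Los", "B-GPE"},
        {"Angeles", "I-GPE"}, {".", "O"},
    }
    for _, e := range mergeEntities(toks) {
        fmt.Println(e.Text, e.Label)
    }
}
```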
